ERDDAP (the Environmental Research Division's Data Access Program) is a data server that gives you
a simple, consistent way to download subsets of scientific datasets in common file formats and
make graphs and maps.
The Problems that ERDDAP Tries To Solve
Without ERDDAP, when a person (or a computer program) looks on the Internet for a
specific type of scientific data (for example, satellite sea surface temperature data),
there are problems ...
- Interesting datasets are hard to find because they are at many different web sites.
- Each site requires a different protocol to request the data
(for example,
HTTP GET,
XML,
SOAP+XML,
OPeNDAP,
WCS,
WFS,
SOS,
or an HTML form).
- Each site returns the data in a different format (for example, XML, SOAP+XML, OPeNDAP binary
data stream, ASCII text, HDF 4, HDF 5, NetCDF, ...) and it isn't the common file format
that you want.
- Data from different sites is hard to compare because the dates+times are expressed in
different formats (for example, "Jan 2, 1985", "02-JAN-1985", "1/2/85", "2/1/85",
"1985-01-02", or days since Jan 1, 1980, or ...).
- For a quick introduction to ERDDAP,
watch the first half of this
video. (5 minutes)
In it, a scientist downloads ocean currents forecast data from ERDDAP to model a toxic
spill in the ocean using
NOAA's GNOME software (in 5 minutes!).
Thanks to Rich Signell.
(One tiny error in the video:
when searching for datasets, don't use AND between search terms. It is implicit.)
- ERDDAP can get data from local (on the server's hard drive) and
remote (accessed via the web) data sources.
See the
list
of types of data sources
that ERDDAP can access.
- ERDDAP can serve many types of scientific data, not just oceanographic data.
ERDDAP is a Data Access Program that was written at
NOAA
NMFS
SWFSC
ERD.
The ERDDAP server at ERD serves oceanographic data,
but ERDDAP (the program) can access and serve any gridded or tabular data.
- ERDDAP offers several ways to search for interesting datasets.
For example,
full text search,
search by category
(also known as faceted search), and
Advanced Search.
Advanced Search combines all of the search techniques
and adds searches for datasets that have data within longitude, latitude, and time ranges,
so you can search for datasets based on many different criteria simultaneously.
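As a rough illustration, an Advanced Search that combines a text search with longitude, latitude,
and time ranges can be written as a single URL. The sketch below assumes a hypothetical server at
https://example.org/erddap and assumes the parameter names searchFor, minLon, maxLon, minLat,
maxLat, minTime, and maxTime. Check the Advanced Search form of the ERDDAP installation
you are using for the exact names it accepts.

```python
from urllib.parse import urlencode

# Hypothetical ERDDAP base URL, used only for illustration.
BASE_URL = "https://example.org/erddap"


def advanced_search_url(search_for, min_lon, max_lon, min_lat, max_lat,
                        min_time, max_time, base_url=BASE_URL):
    """Build one search URL that combines text and range criteria.

    The parameter names are assumptions; confirm them against the
    Advanced Search form of the server you are using.
    """
    params = {
        "searchFor": search_for,
        "minLon": min_lon,
        "maxLon": max_lon,
        "minLat": min_lat,
        "maxLat": max_lat,
        "minTime": min_time,
        "maxTime": max_time,
    }
    # urlencode percent-encodes spaces, colons, and other reserved characters.
    return f"{base_url}/search/advanced.json?{urlencode(params)}"


url = advanced_search_url("sea surface temperature", -130, -120, 30, 40,
                          "2020-01-01T00:00:00Z", "2020-12-31T00:00:00Z")
print(url)
```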
- ERDDAP lets you request data in a standardized way,
regardless of the data source's request protocol.
ERDDAP also provides Data Access Forms (web pages) which help humans create the OPeNDAP
requests. OPeNDAP's
Data Access Protocol (DAP)
is one of the recommended
IOOS
DMAC
data transport mechanisms and a
NASA EOSDIS standard.
(OPeNDAP is great!)
ERDDAP translates your request from the OPeNDAP, WMS, or SOS format to the data source's
request format and converts the response to one of ERDDAP's internal data structures.
Then ERDDAP reformats the data in the common file format of your choice (for example, as an
.html table, ESRI .asc, Google Earth .kml, .mat, .nc, ODV .txt, .csv, .tsv, .json, .xhtml, .png, .pdf)
and sends the file to you.
See the list of
griddap file types
and the list of
tabledap file types.
Other protocols for requesting the data (for example,
WCS)
may be added in the future.
ERDDAP is structured for these additions and there don't seem to be any impediments.
- Requests for gridded data can be made in user units.
Although requests for gridded data in ERDDAP can be made with array indices
(following the OPeNDAP specification), requests can also be in user units
(for example, degrees east), using
a parentheses notation,
since users think in those units,
not indices.
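The two request styles can be sketched as follows. This is only an illustration: the server URL,
the dataset ID (exampleSST), and the variable name (sst) are hypothetical, and the axis order
(time, latitude, longitude) is an assumption about that made-up dataset.

```python
# Hypothetical server and dataset ID, used only for illustration.
BASE_URL = "https://example.org/erddap"
DATASET_ID = "exampleSST"


def index_range(start, stride, stop):
    """Array-index form: [start:stride:stop]."""
    return f"[{start}:{stride}:{stop}]"


def unit_range(start, stop, stride=1):
    """User-unit form: the same syntax, with start and stop in parentheses."""
    return f"[({start}):{stride}:({stop})]"


def griddap_url(variable, ranges, file_type=".csv",
                base_url=BASE_URL, dataset_id=DATASET_ID):
    """Append one range per axis, in the dataset's axis order, to the variable name."""
    return f"{base_url}/griddap/{dataset_id}{file_type}?{variable}{''.join(ranges)}"


# Request by array indices: the caller has to know which index holds which value.
by_index = griddap_url("sst", [index_range(0, 1, 0),
                               index_range(100, 1, 200),
                               index_range(500, 1, 600)])

# Request by user units: a time string, degrees north, and degrees east.
by_units = griddap_url("sst", [unit_range("2020-01-01T00:00:00Z", "2020-01-01T00:00:00Z"),
                               unit_range(30, 40),
                               unit_range(-130, -120)])
print(by_index)
print(by_units)
```

Some HTTP clients require the square brackets in the query to be percent-encoded before the
request is sent.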
- ERDDAP sends results in common data file formats.
The results can be returned in any of several common data file formats
(for example,
.html table,
ESRI .asc,
Google Earth .kml,
.mat,
.nc,
ODV .txt,
.csv,
.tsv,
.json,
.xhtml),
instead of just the original format or just the OPeNDAP transfer format (which has no
standard file manifestation). These files are created on-the-fly.
Since there are few internal data structures, it is easy to add additional file-type drivers.
See the complete list of
grid file types
and table file types.
- ERDDAP standardizes the variable names and units for longitude, latitude, altitude,
depth, and time in the results.
To facilitate comparisons of data from different datasets, the requests and results in ERDDAP
use standardized space/time axis units:
- longitude is always in degrees_east.
- latitude is always in degrees_north.
- altitude is always in meters with positive=up.
- depth is always in meters with positive=down.
- time, when formatted as a number, is always in "seconds since 1970-01-01T00:00:00Z"
(known as Unix time or epoch seconds,
which is
UDUNITS-compatible)
and, when formatted as a string, is formatted according
to the
ISO 8601:2004 "extended" format
standard (YYYY-MM-DDThh:mm:ssZ, for example,
"1985-01-02T00:00:00Z").
(You can convert numeric times to/from ISO string times with ERDDAP's
time converter.)
Also, to avoid time zone and daylight saving time confusion, time values are always
converted to the Zulu (UTC, GMT) time zone.
This makes it easy to specify constraints in requests without having to worry about the altitude
data format (are positive values up or down? in meters or fathoms?) or the time data format
(a nightmarish realm of possible formats and time zones, for example, "Jan 2, 1985",
"02-JAN-1985", "1/2/85", "2/1/85", "1985-01-02", or days since Jan 1, 1980).
This makes the results from different data sources easy to compare.
ERDDAP has a utility to
Convert a Numeric Time to/from a String Time.
For more details, see
How ERDDAP Deals with Time.
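If you want to do the same conversions in your own program, the following sketch shows one way
with the Python standard library. The function names are made up for this example; they are not
part of ERDDAP.

```python
from datetime import datetime, timedelta, timezone

# ISO 8601 "extended" format, always in the Zulu (UTC) time zone.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def epoch_to_iso(epoch_seconds):
    """Convert "seconds since 1970-01-01T00:00:00Z" to an ISO 8601 string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(ISO_FORMAT)


def iso_to_epoch(iso_string):
    """Convert an ISO 8601 string (ending in Z) to epoch seconds."""
    moment = datetime.strptime(iso_string, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def days_since_to_epoch(days, base_iso):
    """Convert a source value such as "days since 1980-01-01" to epoch seconds."""
    base = datetime.strptime(base_iso, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int((base + timedelta(days=days)).timestamp())


print(iso_to_epoch("1985-01-02T00:00:00Z"))                # 473472000
print(epoch_to_iso(473472000))                             # 1985-01-02T00:00:00Z
print(days_since_to_epoch(1828, "1980-01-01T00:00:00Z"))   # 473472000
```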
Because the longitude, latitude, altitude, and time variables are specifically recognized,
ERDDAP is aware of the geo/temporal features of each dataset.
This is useful when making images with maps or time-series, and when saving data
in geo-referenced file types (e.g., .esriAscii, .geoJson, and .kml).
Two common standards for writing units of measure are:
- UDUNITS, from
Unidata,
which is used in
COARDS,
CF, and
NetCDF
data files. For example,
UDUNITS has many options for degrees Celsius, including "degree_C" and "degC".
- UCUM, the Unified Code for Units of Measure.
OGC
services such as
SOS,
WCS, and
WMS
often refer to UCUM as UOM (Units Of Measure). For example, UCUM has just one
case-sensitive option for degrees Celsius: "Cel".
Although ERDDAP doesn't require the use of either units standard, most ERDDAP installations
favor one or the other.
(ERDDAP administrators: you can specify this with the <units_standard>
tag in setup.xml.)
You can convert UDUNITS to/from UCUM units with ERDDAP's
units converter.
When you request data or a graph from a tabledap dataset,
you can append &units("UDUNITS")
or &units("UCUM") to the end of the URL to request UDUNITS or UCUM units.
more information
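For example, a program could append the units request like this. The tabledap URL in the sketch
is hypothetical, and percent-encoding the double quotes is an assumption about what your HTTP
client needs (a browser usually does this for you).

```python
from urllib.parse import quote


def with_units(tabledap_url, standard):
    """Append &units("UDUNITS") or &units("UCUM") to the end of a tabledap URL."""
    if standard not in ("UDUNITS", "UCUM"):
        raise ValueError("standard must be 'UDUNITS' or 'UCUM'")
    # Percent-encode the double quotes (%22) but keep the parentheses readable.
    return tabledap_url + "&" + quote(f'units("{standard}")', safe="()")


# Hypothetical server, dataset ID, and variable names.
base = "https://example.org/erddap/tabledap/exampleBuoys.csv?time,temperature"
print(with_units(base, "UCUM"))
```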
- ERDDAP can add or modify metadata.
Many data sources have little or no
metadata
(for example,
CF metadata)
describing the data.
ERDDAP lets (and encourages) the administrator to describe metadata which will be added
to datasets and their variables on-the-fly.
See the
addAttributes section
of the
directions for administrators.
- ERDDAP lets you request .png and .pdf image files with graphs and maps
of the data in addition to the actual data. And ERDDAP's Make A Graph lets you customize the images.
Some special uses of these images are:
- Requesting Compressed Files
ERDDAP doesn't offer results stored in compressed (e.g., .zip or .gzip) files.
Instead, ERDDAP looks for
accept-encoding
in the HTTP GET request header sent by the client.
If a supported compression type ("gzip", "x-gzip", or "deflate") is found in the
accept-encoding list, ERDDAP includes "content-encoding" in the HTTP response header
and compresses the data as it transmits it.
It is up to the client program to look for "content-encoding" and decompress the data.
Browsers and OPeNDAP clients do this by default. They request compressed data and
decompress the returned data automatically.
Other clients (e.g., Java programs) have to do this explicitly.
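Here is a minimal sketch of a client that does this explicitly, using only the Python standard
library (whose urllib does not decompress responses for you). The function names are made up
for this example, and the URL you pass to fetch() is up to you.

```python
import gzip
import urllib.request
import zlib


def decode_body(body, content_encoding):
    """Decompress a response body according to its content-encoding header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)                    # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)   # raw deflate stream
    return body  # no content-encoding: the data was sent uncompressed


def fetch(url):
    """Ask for compressed data, then decompress whatever the server sends back."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```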
- ERDDAP
makes different types of data servers (OPeNDAP, OBIS, SOS, WMS, ...) interoperable.
Different types of data servers are used in different scientific communities.
In the foreseeable future, it is unlikely that any one type will become dominant and
replace the others. So ERDDAP acts as a bridge between different types of
client programs (web browsers, IDV, Matlab, netCDF programs, ODV, WMS clients, etc.)
and the different types of data servers.
- ERDDAP accepts client requests for data in different formats (e.g., OPeNDAP, WMS).
- ERDDAP converts a given request into the request format used by the source data server
(e.g., OPeNDAP, SOS, OBIS, ...) and sends that to the source data server.
- ERDDAP converts the response data from the source data server into an internal format,
including converting all time data to a common format: "seconds since 1970-01-01T00:00:00Z".
- ERDDAP converts the data from the internal format into the file format requested by the client
(e.g., .csv, Google Earth .kml, .htmlTable, .dods, .mat, .nc, ODV .txt, .png).
Clients don't have to worry about, or know about, the type of the source data server.
They just get the data they want, in the file format they want.
- ERDDAP uses just two basic data structures to hold data.
- Since it is difficult for human clients and computer clients to deal with a complex set of
possible dataset structures, ERDDAP uses just two basic data structures:
a grid data structure (for gridded data) and a table data structure (for tabular data).
- Certainly, not all data can be expressed in these structures, but much of it can.
Tables, in particular, are very flexible data structures (look at the phenomenal success
of
relational database programs).
- This makes data queries easier to construct.
- This makes data responses have a simple structure, which makes it easier to serve the data
in a wider variety of standard file types (which often just support simple data structures).
This is the main reason that we set up ERDDAP this way.
- This, in turn, makes it very easy for us (or anyone) to write client software which works
with all ERDDAP datasets.
- This makes it easier to compare data from different sources, for example for an
Integrated Ecosystem Analysis (IEA).
- We are very aware that if you are used to working with data in other data structures
you may initially think that this approach is simplistic or insufficient.
But all data structures have tradeoffs. None is perfect.
Even the do-it-all structures have their downsides: working with them is complex and
the files can only be written or read with special software libraries.
If you accept ERDDAP's approach enough to try to work with it, you may find that it has
its advantages (notably the support for multiple file types that can hold the data responses).
The
original ERDDAP slide show
(particularly the
data
structures slide)
talks about these issues.
- And even if this approach sounds odd to you, most ERDDAP clients will never notice --
they will simply see that all of the datasets have a nice simple structure and they will
be thankful that they can get data from a wide variety of sources returned in a wide
variety of file formats.
- ERDDAP offers
email/URL and
RSS subscription services,
so you can be notified whenever a dataset changes.
- ERDDAP is very good at detecting changes to gridded datasets because it can detect
when the axis values (e.g., the time values) change.
- ERDDAP is not very good at detecting changes to tabular datasets because there are usually
no changes to the metadata when new data is added.
- ERDDAP will detect if a dataset becomes unavailable (but perhaps not immediately).
- ERDDAP will detect when that dataset becomes available again.
- ERDDAP makes no promises about the suitability or accuracy of these services
(see ERDDAP's DISCLAIMERS).
Email/URL Subscriptions
(not available at some ERDDAP installations)
Whenever a dataset changes, the email/URL subscription system will immediately
send you an email or contact a URL that you specify.
To set up an email/URL subscription, click on one of the envelope icons
that appear at the far right on ERDDAP web pages with lists of datasets
(example)
and on the Data Access Forms and Make A Graph web pages for individual datasets
(example)
if this ERDDAP installation supports email/URL subscriptions.
(Computer programmers: if you write web services, you can use the URL system
to have ERDDAP notify your web service immediately whenever a dataset changes.)
RSS Subscriptions
RSS is a standard system for notifying users when the content at a web site has changed.
Modern web browsers have an RSS client built in or you can use a separate
RSS Reader.
ERDDAP offers a separate RSS 2.01 feed for each dataset so that you can find out
when interesting datasets have changed.
To subscribe to a dataset's RSS feed, click on one of the RSS icons
that appear at the far right on ERDDAP web pages with lists of datasets
(example)
or on the Data Access Forms and Make A Graph web pages for individual datasets
(example).
Comparison
The RSS service may be just what you are looking for. It is a nice standard.
But if you need to know as soon as possible when a dataset changes, use the
email/URL system, not RSS. RSS clients periodically (every hour?) request and read
the RSS XML document to look for changes.
So typically, an RSS client will not detect a change to a dataset quickly (average 30 minutes?).
In contrast, the email/URL subscription system acts immediately whenever ERDDAP detects
a change to a dataset.
The more pro-active approach of the email/URL system is also much more efficient:
You may be able to set your RSS client to check for changes every minute (don't do it!),
but that would just lead to lots of unnecessary requests to the ERDDAP server
and it still wouldn't detect changes immediately.
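To make the difference concrete, this is roughly what an RSS client has to do on every polling
cycle: download the feed, read the newest item, and compare it with what it saw last time.
The sketch parses a made-up feed document (the titles and dates are invented for the example)
and leaves out the download step.

```python
import xml.etree.ElementTree as ET

# A made-up RSS document standing in for a dataset's feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example dataset</title>
    <item>
      <title>This dataset changed 2020-01-02T00:00:00Z</title>
      <pubDate>Thu, 02 Jan 2020 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def latest_item(rss_xml):
    """Return (title, pubDate) of the first item in the feed, or None if there is none."""
    item = ET.fromstring(rss_xml).find("./channel/item")
    if item is None:
        return None
    return (item.findtext("title", default=""), item.findtext("pubDate", default=""))


def has_changed(previously_seen, rss_xml):
    """True if the newest item differs from the one seen on the last poll."""
    return latest_item(rss_xml) != previously_seen


seen = latest_item(SAMPLE_FEED)
print(has_changed(seen, SAMPLE_FEED))   # False: nothing new since the last poll
```

A change is noticed only on the next poll, however long that is, which is why the email/URL
system is the better choice when you need to react right away.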
- ERDDAP is a
web application
(for humans with browsers)
and a
web service
(with services for computer programs).
- ERDDAP has
REST-
and
ROA-style
links to make its services available to computer programs.
These features can be used to build another web service on top of ERDDAP
(making ERDDAP do all the work!).
ERDDAP is not intended to be a high-level data exploration/graphing service.
Instead, ERDDAP is intended to provide services for such web sites and programs.
So if you have an idea for a better interface to the data that ERDDAP serves,
we encourage you to build your own web application or web service, and use ERDDAP
as the foundation. Read more about ERDDAP's
Services for Computer Programs.
- Security - By default, ERDDAP runs as an entirely public server with no login
system and no restrictions to data access.
However, an ERDDAP administrator can configure ERDDAP to restrict access to some
or all datasets to users who log in and have been assigned certain roles.
ERDDAP has built-in methods for authentication (logging in).
If an ERDDAP installation has authentication turned on, there will be a "log in" link
at the top of each web page.
Users never have to log in to access the publicly available datasets.
Users who have logged in can access public datasets and the private datasets to which
they are allowed access.
ERDDAP uses http: URLs for users who aren't logged in, and https: (Secure Sockets Layer)
URLs for users who are.
more information
- ERDDAP processes data in chunks.
To save memory (a big issue) and make responses start sooner, ERDDAP processes
data requests in chunks -- repeatedly getting a chunk of data from the source,
cleaning it up (for example, adding
metadata),
and sending that to the client.
For many data sources, this means that the first chunk of data (for example, from the
first sensor) gets to the client in seconds instead of minutes (for example, after data
from the last sensor has been retrieved), reassuring the client that the data is coming.
From a memory standpoint, this allows numerous large requests (each larger than available
memory) to be handled simultaneously.
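The idea can be sketched with a Python generator. This is not ERDDAP's code, just an
illustration of the pattern: the data source is simulated with a dictionary, and the sensor
names, values, and units are made up.

```python
def read_source(sensors):
    """Simulated data source: yields one chunk (a list of rows) per sensor."""
    for sensor, values in sensors.items():
        yield [(sensor, value) for value in values]


def add_units(chunk, units="degree_C"):
    """Clean up one chunk, here by attaching a units string to every row."""
    return [(sensor, value, units) for sensor, value in chunk]


def stream_response(chunks, clean_up):
    """Pass each chunk on as soon as it is ready; only one chunk is in memory at a time."""
    for chunk in chunks:
        yield clean_up(chunk)


sensors = {"sensor1": [10.1, 10.3], "sensor2": [11.0]}
response = stream_response(read_source(sensors), add_units)

# The first chunk is available before the second sensor has been read.
first_chunk = next(response)
print(first_chunk)
```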
- ERDDAP has a modular structure.
ERDDAP is structured so that it is easy to add different components
(for example, a class to request data from a SOS server and store it as a table).
The new component then gains all the features and capabilities of the parent
(for example, support for OPeNDAP requests and the ability to save the data in several
common file formats).
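A small sketch of this kind of structure is shown below. The class names are hypothetical and
are not ERDDAP's actual classes. The point is only that a new component supplies the rows,
and everything else is inherited from the parent.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class TableSource(ABC):
    """Parent class: knows how to write a table in several file formats."""

    column_names = ()

    @abstractmethod
    def rows(self):
        """Return an iterable of rows; each new component implements only this."""

    def to_csv(self):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(self.column_names)
        writer.writerows(self.rows())
        return buffer.getvalue()

    def to_json(self):
        return json.dumps({"columnNames": list(self.column_names),
                           "rows": [list(row) for row in self.rows()]})


class InMemorySource(TableSource):
    """A new component: it gets to_csv() and to_json() from the parent for free."""

    column_names = ("time", "temperature")

    def __init__(self, data):
        self._data = data

    def rows(self):
        return iter(self._data)


source = InMemorySource([("1985-01-02T00:00:00Z", 10.5)])
print(source.to_csv())
print(source.to_json())
```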
- Data Dissemination / Data Distribution Networks: Push and Pull Technology
Normally, ERDDAP acts as an intermediary: it takes a request from a user;
gets data from a remote data source; reformats the data; and sends it to the user.
Pull Technology:
But ERDDAP also has the ability to actively get all of the available data
from a remote data source and
store
a local copy of the data.
Push Technology:
By using ERDDAP's subscription services,
other data servers can be notified
as soon as new data is available so that they can request the data (by pulling the data).
ERDDAP's
EDDGridFromErddap and EDDTableFromErddap use ERDDAP's subscription services and
flag system
so that they will be notified immediately when new data is available.
You can combine these to great effect:
if you wrap an EDDGridCopy around an EDDGridFromErddap dataset
(or wrap an EDDTableCopy around an EDDTableFromErddap dataset),
ERDDAP will automatically create and maintain a local copy of another ERDDAP's dataset.
Because the subscription services work as soon as new data is available,
push technology disseminates data very quickly (within seconds).
This architecture puts each ERDDAP administrator in charge of determining where the data
for his/her ERDDAP comes from. Other ERDDAP administrators can do the same.
There is no need for coordination between administrators.
If many ERDDAP administrators link to each other's ERDDAPs,
a data distribution network is formed.
Data will be quickly, efficiently, and automatically disseminated from data sources
(ERDDAPs and other servers) to data re-distribution sites (ERDDAPs) anywhere in the network.
A given ERDDAP can be both a source of data for some datasets and a re-distribution site
for other datasets.
The resulting network is roughly similar to data distribution networks set up with programs
like
Unidata's IDD/LDM,
but less rigidly structured.
Is ERDDAP a solution to everyone's data distribution / data access problems?
No. ERDDAP tries to find a sweet spot that is a really good solution to most of the
data distribution problems that we confronted.
ERDDAP takes a middleware approach:
It can get data from lots of different types of remote data servers
and it can give that data to clients in lots of different file formats.
It is designed to be an agnostic solution which seeks to make other data servers
(OPeNDAP, SOS, OBIS, WMS, ...) interoperable.
Is there one perfect data server that meets everyone's needs perfectly? We don't think so.
And even if you think there is or will be, it will be a long time before everyone switches
to it, if ever. Until then, ERDDAP is available right now to make other data servers
interoperable and to serve data right now.
ERDDAP can handle many/most datasets as is, but not all.
It isn't that the remaining datasets (e.g., model data using a cubed sphere projection)
aren't important. It's just that ERDDAP's goal of returning data in common file formats
(some of which are pretty simple) precludes a more complex internal data structure.
Groups of researchers working with more complex data structures often already have specialized
data servers and specialized client software which are customized to their community's needs.
ERDDAP, as a general purpose data server, doesn't try to compete with these specialized data servers.
They are customized to the needs of their community and do a great job.
However, those datasets are often only "understood" by the specialized software in that community.
A Work-Around for Complex Datasets - ERDDAP has a way to handle complex datasets that it
can't handle directly. Just as a
relational database
can store a complex dataset by using just
one simple data structure (a table), ERDDAP can serve the data from more complex datasets by
breaking the source dataset into a few ERDDAP datasets, each with similar, simple data structures.
For example, some gridded environmental model datasets can be stored in ERDDAP by
putting the sea surface variables ([time][latitude][longitude]) in one ERDDAP dataset,
and by putting the variables with altitude ([time][altitude][latitude][longitude])
in another ERDDAP dataset. We know this isn't ideal, but it is necessary to allow ERDDAP
to return data in common file formats (some of which are pretty simple).
Another approach to dealing with complex datasets (e.g., for model data using a cubed
sphere projection) is to also offer a reprojected version of the dataset
([time][altitude][latitude][longitude]) which ERDDAP can work with easily.
These simpler data structures aren't meant to replace the original data structures,
but they can be a useful way to distribute the data to a wider audience.
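The first work-around (breaking one source dataset into a few simpler datasets) amounts to
grouping the variables by the dimensions they use. Here is a minimal sketch; the variable names
are made up for the example.

```python
from collections import defaultdict


def split_by_dimensions(variables):
    """Group variable names by their dimensions; each group can become one simple dataset."""
    groups = defaultdict(list)
    for name, dimensions in variables.items():
        groups[tuple(dimensions)].append(name)
    return dict(groups)


# Hypothetical variables from one gridded model dataset.
model_variables = {
    "sea_surface_temperature": ("time", "latitude", "longitude"),
    "sea_surface_height": ("time", "latitude", "longitude"),
    "air_temperature": ("time", "altitude", "latitude", "longitude"),
}

for dimensions, names in split_by_dimensions(model_variables).items():
    print(dimensions, names)
```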
How to Cite ERDDAP in a Paper
If you want to cite ERDDAP itself in a scientific paper, please use something like
Simons, R.A. 2011. ERDDAP - The Environmental Research Division's
Data Access Program. http://coastwatch.pfeg.noaa.gov/erddap .
Pacific Grove, CA: NOAA/NMFS/SWFSC/ERD.
If you want to cite a specific dataset in ERDDAP, please generate the citation based on the information in the dataset's metadata. If you are referring to a specific subset of a dataset, please include the complete URL needed to replicate that download.
Guidelines for Data Distribution Systems
Bob's opinions about the design
and evaluation of data distribution systems can be found
here.
You can
Set Up Your Own ERDDAP Server
and serve your own data.
- The small effort to set up ERDDAP brings many benefits.
- If you already have a web service for distributing your data, you can set up ERDDAP
to access your data via the existing service or via the source files or database.
Then, people will have another way to access your data and will be able to download
the data in additional file formats or as graphs or maps.
- If you have datasets that are in high demand, you can install
multiple ERDDAPs
that work together to scale up and meet the needs of a large data
distribution center.
If you have questions, suggestions, or comments about ERDDAP in general (not this specific
ERDDAP installation), please send an email to bob dot simons at noaa dot gov
and include the ERDDAP URL directly related to your question or comment.