Ecological Metadata Language Increases Research Capability
John Porter, VCR LTER
Ecological Metadata Language (EML) is a metadata standard developed by
the ecological community for the ecological discipline. This article will
provide a brief background on EML and discuss its possible benefits for
LTER sites and ecological research in general. EML is based on prior work
done by the LTER Information Managers Committee, and the Ecological Society
of America Future of Long Term Data Sets (FLED) Committee (Michener et
al. 1997, Ecological Applications).
EML was originally developed at the National Center for Ecological Analysis
and Synthesis (NCEAS) and is now a community-based project with an increasingly
broad list of players. EML is implemented as a series of XML (eXtensible
Markup Language) document types that can be used in a modular and extensible
manner to document ecological data. Each EML module is designed to describe
one logical part of the total metadata that should be included with any
ecological dataset.
Development of EML is an ongoing process, and it requires comment and
input from the community it will serve. The development of an ecological
metadata standard (EML) has broad implications for LTER research. Currently
each site within the LTER Network is responsible for its own metadata
management system. These metadata management systems have developed over
time as each site's needs have dictated. This has led to heterogeneity
in site metadata content, format and storage systems. Examples of this
heterogeneity run the gamut from text residing in a file on a computer
to complicated centralized relational database management systems with
customized interfaces. This heterogeneity makes the development of software
tools for cross-site searching, sharing, integrating, and analyzing of
metadata and data extremely difficult. EML helps solve this problem by
providing a standard format for metadata content so that software tools
can work together seamlessly. EML could also be used as a guide by information
managers and researchers for their own site or research specific metadata
needs. EML version 2.0 will be finalized at a workshop at the Sevilleta
Research Station in April 2002 and released to the community for review
as a standard soon after. The ability of sites to generate EML-standard
metadata makes possible the creation of general
tools to support ecological research (Figure 1). A list of tools and capabilities
(made possible by the availability of EML metadata) was identified during
a January 2002 workshop aimed at facilitating the implementation of EML
at LTER sites. For the individual researcher, EML makes possible:
- Access to on-line analytical engines that integrate a variety of analytical
tools such as SAS, MATLAB etc. with point-and-click access to LTER data
from multiple sites
- Seamless, automated preparation for analysis of downloaded data from
LTER sites that are generating EML
- The ability to search, browse and locate available data using sophisticated
search engines
- Automated production of customized data input forms that check data
for errors or inconsistencies as data is being entered, either in the
laboratory or on palmtop-computers in the field
- Use of metadata development as a tool for research design. Research
design tools can have their parameters specified based on metadata.
For the individual site, EML makes possible:
- Use of generic software for metadata and data management, which may
include:
  - Sophisticated programs for tracking changes in data
  - Easy-to-use metadata entry forms
  - Powerful search, query, and retrieval interfaces
  - Tools for managing model and application data
- Tools for providing data in a variety of forms (e.g., spreadsheet,
statistical package, GIS)
- Tools for production of alternative forms of metadata that allow the
site to easily participate in national databases and clearinghouses
(e.g., Global Change Master Directory, Mercury, National Biological
Information Infrastructure, ISO TC211)
- Reductions in software and personnel costs through use of generic
software that may have been produced by other projects
For the LTER Network as a whole, EML makes possible:
- Improved facilitation of intersite synthesis by standardizing procedures
for use of data from different sites
- Development of software useful to multiple LTER sites
- Easier development of Network Information System Modules that regularly
integrate data from multiple LTER sites

All of these tools and capabilities have three things in
common. First, they will improve our ability to conduct cutting-edge ecological
research such as the recent cross-site primary production comparison (Knapp,
A. and M.D. Smith 2001, Science 291: 481-484). Currently, large amounts
of time are required to process and integrate data that originate at different
sites, or even within the same site. Reducing the time required to standardize
data months or years after it has been collected will facilitate research
at larger scales of time and space. Secondly, because of the extreme heterogeneity
of metadata and data at sites it is virtually impossible to create tools
without LTER site participation in the generation of EML. Only at the
site does the expertise exist to translate site metadata into alternative
forms (e.g., EML). Once created, EML documents allow for consistent
exchange, so that tools can be developed on-site, according to site needs,
and will be able to access metadata and data from any site that is generating
it. Finally, the standardization of LTER metadata representation in EML
is a foundation we can build on, but erecting our structures atop it,
i.e., generating the EML-based metadata documents and the tools that use
them, will come at a price, demanding additional work and imagination by
information managers and software developers at LTER sites and elsewhere.
Additional information
on EML is available at:
http://www.ecoinformatics.org
Individual Research
A sudden beep from her palmtop computer brought Shirley Wright,
graduate student, back from her musings on the role caterpillar
frass could play in the local nitrogen cycle. Looking at her display,
she saw an error message that the pH value in the data record she
was entering was out of the specified bounds. Shirley had specified
the valid range for pH values when she prepared the metadata for
her data set weeks before. She had estimated an appropriate pH range
by querying an online analysis engine for pH values from data collected
on the same soil type. Someone told her that the data for that analysis
came from three different LTER sites and required an analytical
system that integrated GIS software and statistical packages, but
she didn't need to know the details because the user-friendly
web interface hid unnecessary complexities. The actual analysis
had been done on a computer at some other site, but again Shirley
did not need to be immersed in the technology to use it. Although
time consuming, the preparation of the metadata for Shirley's
study had been a useful exercise. It allowed her to think through
which variables would need to be measured and what units of measurement
should be used. Preparing the metadata ahead of the study 1) gave
her a customized data input program for her palmtop, and 2) automatically
generated quality checking features (like the one that is now beeping
at her). A sudden cold gust of wind reminds Shirley that it might
be important to add a variable for snow depth to her dataset. With
this tool, all she needs to do is add a new variable for snow to
the metadata and download a new input module to her wirelessly-connected
palmtop. The module used to gather snow depth metadata running on
Shirley's palmtop is a collaborative effort between the information
manager at her site and information managers at two other sites.
The three of them can collaborate on modules that can be used by
all sites because the modules are built upon EML, which is standard across
all sites. However, they aren't there to help her with her
current problem: the beeping palm-device. Why the beep? Inspection
of the pH value indicates that Shirley had mistakenly entered two
8s instead of one. Having a measured pH of 88.5 could
definitely have caused problems in the model she is working on.
Glad the program caught it now! -JP
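The kind of metadata-driven quality check described in the box above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ATTRIBUTES` dictionary is a hypothetical, simplified stand-in for the variable descriptions a researcher might prepare, not the actual EML schema, and the bounds and the name `check_record` are invented for the example.

```python
# Hypothetical, simplified stand-in for attribute-level metadata.
# This is NOT the real EML structure; the names and bounds are assumptions.
ATTRIBUTES = {
    "pH": {"unit": "dimensionless", "minimum": 3.0, "maximum": 10.0},
    "snow_depth": {"unit": "centimeter", "minimum": 0.0, "maximum": 500.0},
}


def check_record(record, attributes=ATTRIBUTES):
    """Return a list of error messages for one data record.

    A value is flagged when its variable has no metadata entry or when
    it falls outside the bounds declared for that variable.
    """
    errors = []
    for name, value in record.items():
        spec = attributes.get(name)
        if spec is None:
            errors.append(f"{name}: no metadata entry for this variable")
            continue
        if not (spec["minimum"] <= value <= spec["maximum"]):
            errors.append(
                f"{name}: value {value} is outside "
                f"[{spec['minimum']}, {spec['maximum']}]"
            )
    return errors


if __name__ == "__main__":
    # The mistyped value from the story is rejected by the declared bounds.
    for message in check_record({"pH": 88.5}):
        print(message)
```

Because the check is driven entirely by the metadata, adding a new variable such as snow depth only requires a new metadata entry; the checking code itself does not change.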
Collaborative Studies
Things have not gone well for Dr. Publish R. Parish today. The
working group he assembled to study the effects of caterpillar frass
on soil nitrogen levels is spending all its time dealing with issues
of data formatting and unit conversions, leaving no time for real
analyses. Moreover, a rift is developing in the group between people
who want to analyze the data using Excel and those who want to use
MATLAB. Unfortunately, with 24 sites to deal with, it is excruciating
enough to import the data into one or the other. Doing both is out
of the question! However, things are starting to look up. His new
graduate student, Shirley Wright, has been helping the participants
search the LTER Data Catalog, which is based around the EML metadata
standard. Workshop participants are able to access data from all
of the LTER sites using a common interface, one that gives them
the option of receiving the data either as an Excel spreadsheet
or a MATLAB file, among other options.
They don't even need to worry about the fact that some of
the underlying data was originally archived as comma-separated ASCII,
some as tab-separated and yet more using column formatting.
Because that formatting information is stored in the metadata, automated
programs written at different LTER sites can read the data in its
raw form and transform it into the forms requested by his group.
Shirley is now trying out an online data integration engine. It
allows her to identify equivalent variables in different datasets
through a point-and-click process. Possible matches
are identified based on similar units and she can read the specific
methods for each variable to see if they should, or should not,
be matched with one another. Once she has identified the appropriate
matches, a new dataset (along with its own set of metadata and citations
for the original datasets used) will be produced in the format she
requests. It was never like this in the old days! -JP
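The idea that formatting information stored in the metadata lets a program read raw data files, as described in the box above, might look roughly like the following Python sketch. The format descriptions (`kind`, `delimiter`, `widths`) and the function `read_table` are hypothetical names chosen for illustration; they are not taken from EML or from any LTER tool.

```python
import csv
import io


def read_table(raw_text, fmt):
    """Parse raw text into rows of strings using a format description.

    `fmt` is a hypothetical, simplified stand-in for the physical-format
    section of a metadata record:
      {"kind": "delimited", "delimiter": ","}   comma- or tab-separated
      {"kind": "fixed", "widths": [5, 3]}       column-formatted
    """
    if fmt["kind"] == "delimited":
        reader = csv.reader(io.StringIO(raw_text), delimiter=fmt["delimiter"])
        return [row for row in reader if row]
    if fmt["kind"] == "fixed":
        rows = []
        for line in raw_text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            row, start = [], 0
            for width in fmt["widths"]:
                row.append(line[start:start + width].strip())
                start += width
            rows.append(row)
        return rows
    raise ValueError(f"unknown format kind: {fmt['kind']!r}")


if __name__ == "__main__":
    # Three differently archived files yield the same rows once the
    # metadata says how each one is laid out.
    print(read_table("site,pH\nA,6.5\n", {"kind": "delimited", "delimiter": ","}))
    print(read_table("site\tpH\nA\t6.5\n", {"kind": "delimited", "delimiter": "\t"}))
    print(read_table("site pH\nA    6.5\n", {"kind": "fixed", "widths": [5, 3]}))
```

Once every file has been reduced to the same in-memory rows, writing them out in whatever form a working group requests becomes a separate and much simpler step.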
* Note: The tools and capabilities described in these boxes are not currently
in existence, although in a few cases prototypes are actually under construction.
All rely on development of EML-formatted metadata to provide the common
platform for sharing metadata among sites. Additional development regarding
use of information from EML data sets to directly access data itself will
be required.
Acknowledgements: The author thanks participants in the January 2002
Metadata Workshop at CAP LTER for such stimulating discussion of the wealth
of possibilities opened by the development and implementation of EML.
Patty Sprott and Owen Eddins made constructive comments and additions
to the manuscript. However, any errors or absurdities remain the author's.
Particular thanks go to Matt Jones, Peter McCartney and others who have
worked so hard on developing the EML standard.