Lola M. Olsen and Gene R. Major
Researchers, educators, and students can take advantage of an on-line tool that provides rapid and accurate retrieval of information about Earth science data from around the world. The Global Change Master Directory (GCMD) provides descriptions of all types of Earth science data sets (not just those related to global change), including more than 3400 descriptions of data sets held at all of the Distributed Active Archive Centers of the Earth Observing System Data and Information System (EOSDIS), at many U.S. federal agency environmental and Earth science data centers, at research laboratories and universities, and at international agencies and programs. The service recently upgraded its locator services, expanding its references to Earth science data, updating existing entries, and taking advantage of current technology in system development. Use of the GCMD skyrocketed after the introduction of its Web interface: usage in the last 6 months of 1995 was three times that of the same period in 1994.
The GCMD offers users a choice of client-server hosts, including access through the World Wide Web (WWW). A basic ASCII-based client is maintained for users with only VT100 terminal capabilities. Support for two other dedicated client-server hosts will be discontinued: a review of usage statistics showed a significant decline in the use of dedicated clients and indicated that effort should be redirected to WWW applications. The GCMD Home Page can be reached at the uniform resource locator (URL) http://gcmd.gsfc.nasa.gov. Graphical Web browsers that support forms allow users to submit queries to the GCMD database directly from the Home Page. The GCMD Web server also offers embedded URLs that link to the data centers and workstations where the data reside, and users will find up-to-date hyperlinked sensor descriptions, data center information, sample images, and more. The GCMD Internet site was recently rated a "3-Star" site by The McKinley Group's professional editorial team, a special mark of achievement in Magellan, McKinley's comprehensive Internet directory of over 1.5 million sites and 40,000 reviews.
The versatility of the GCMD lies in its multiple, flexible routes for searching data set entries. Large-scale updates can be made in a single transaction, minimizing maintenance, and all maintenance and updating will continue to be done in the Oracle database. However, to overcome several limitations of the underlying relational database, a gateway based on the Z39.50 information retrieval protocol was implemented; it offers users a free-text search interface to directory information. The GCMD Z39.50 gateway supports full-text, fielded, and temporal/spatial searches of data set information and serves as a useful alternative for search and retrieval. For this application, text files are generated from the database on a regular basis and indexed for the gateway search. Z39.50 is an evolving standard that holds promise for distributed searching and is gaining acceptance around the world; protocols are being developed for this standard to provide compatibility across platforms and among formats.
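The three kinds of gateway search described above (full-text, fielded, and temporal/spatial) can be illustrated with a small in-memory index. This is a minimal sketch under stated assumptions, not the GCMD's Z39.50 software: the `Entry` layout, the `DirectoryIndex` class, and the sample entries are all hypothetical, and the bounding-box test ignores boxes that cross the 180th meridian.

```python
import re
from dataclasses import dataclass
from datetime import date

@dataclass
class Entry:
    """One directory entry as exported to text for indexing (hypothetical layout)."""
    entry_id: str
    fields: dict   # field name -> free text, e.g. {"Entry_Title": "..."}
    start: date    # start of temporal coverage
    stop: date     # end of temporal coverage
    bbox: tuple    # (south, north, west, east) in decimal degrees

def tokenize(text):
    """Split text into lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boxes_overlap(a, b):
    """True if two (south, north, west, east) boxes intersect.
    Boxes crossing the 180th meridian are not handled, for brevity."""
    return a[0] <= b[1] and a[1] >= b[0] and a[2] <= b[3] and a[3] >= b[2]

class DirectoryIndex:
    def __init__(self, entries):
        self.entries = {e.entry_id: e for e in entries}
        self.full_text = {}  # term -> set of entry ids (any field)
        self.fielded = {}    # (field name, term) -> set of entry ids
        for e in entries:
            for name, text in e.fields.items():
                for term in tokenize(text):
                    self.full_text.setdefault(term, set()).add(e.entry_id)
                    self.fielded.setdefault((name, term), set()).add(e.entry_id)

    def search(self, text=None, field_name=None, start=None, stop=None, bbox=None):
        """AND together every constraint supplied; return sorted entry ids."""
        ids = set(self.entries)
        if text:
            for term in tokenize(text):
                if field_name:
                    ids &= self.fielded.get((field_name, term), set())
                else:
                    ids &= self.full_text.get(term, set())
        if start is not None and stop is not None:
            # temporal search: keep entries whose coverage overlaps [start, stop]
            ids = {i for i in ids
                   if self.entries[i].start <= stop and self.entries[i].stop >= start}
        if bbox is not None:
            # spatial search: keep entries whose box intersects the query box
            ids = {i for i in ids if boxes_overlap(self.entries[i].bbox, bbox)}
        return sorted(ids)

# Hypothetical sample entries for illustration only.
SAMPLE_ENTRIES = [
    Entry("SST_1", {"Entry_Title": "Global Sea Surface Temperature",
                    "Keywords": "ocean temperature"},
          date(1981, 1, 1), date(1994, 12, 31), (-90, 90, -180, 180)),
    Entry("OZONE_1", {"Entry_Title": "Total Ozone Mapping",
                      "Keywords": "atmosphere ozone"},
          date(1978, 11, 1), date(1993, 5, 6), (-90, 90, -180, 180)),
    Entry("CHES_TEMP", {"Entry_Title": "Chesapeake Bay Water Temperature",
                        "Keywords": "estuary temperature"},
          date(1990, 1, 1), date(1995, 6, 30), (36.5, 39.6, -77.5, -75.5)),
]
```

With these samples, `DirectoryIndex(SAMPLE_ENTRIES).search("ocean", field_name="Keywords")` returns `['SST_1']`, while the same term restricted to `Entry_Title` returns nothing. That difference is what a fielded search adds over plain full-text matching.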
The GCMD collects and displays its metadata in the Directory Interchange Format (DIF), although the database can store information for any number of fields and output it in any specified format. For a particular application, only the fields that application requires are selected from the database: for example, the limited set of fields needed for the Government Information Locator Service standard, or the larger set required by the Content Standard for Digital Geospatial Metadata. The latter was instituted by executive order through the Federal Geographic Data Committee and is therefore referred to as the FGDC standard.
The DIF content was recently expanded to include new fields mandated by the FGDC standard. The new fields provide a more complete set of descriptors, allowing researchers to make more informed choices among data sets. The current fields include Entry_Title and Entry_ID, Data_Set_Citation, Discipline, Topic, Terms, Variables, Keywords, Start_Date, Stop_Date, Coverage, Location, Earth_Data_Resolution, Sensor_Name, Source_Name, Campaign, Constraints, Quality, Summary, Aggregation, Investigator, Technical_Contact, Originating_Center, Data_Center, Distribution, Data_Set_Progress, Browse, Reference, DIF_Author, and Review_Date. Several of these fields hold compound content; the Distribution field, for example, records cost, media, size, and format.
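The idea of storing many fields but emitting only those a given application needs can be sketched in a few lines. This is a minimal illustration, assuming a simplified "Field: value" text layout rather than the full DIF syntax; the `PROFILES` field selections are hypothetical and are NOT the actual GILS or FGDC crosswalks.

```python
import re

# A field line looks like "Entry_Title: Global Sea Surface Temperature".
FIELD_LINE = re.compile(r"^([A-Z][A-Za-z_]*):\s*(.*)$")

# Hypothetical output profiles: a short and a longer selection of DIF field
# names. These are NOT the real GILS or FGDC field mappings.
PROFILES = {
    "brief": ["Entry_ID", "Entry_Title", "Summary"],
    "extended": ["Entry_ID", "Entry_Title", "Data_Set_Citation", "Start_Date",
                 "Stop_Date", "Data_Center", "Summary"],
}

def parse_dif(text):
    """Parse a simplified DIF-style record into {field name: [values]}.

    Repeated fields (e.g. several Keywords lines) keep every value; lines
    that do not start with a field name continue the previous value."""
    record = {}
    last = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        m = FIELD_LINE.match(line)
        if m:
            last = m.group(1)
            record.setdefault(last, []).append(m.group(2))
        elif last is not None:
            record[last][-1] = (record[last][-1] + " " + line.strip()).strip()
    return record

def export(record, profile):
    """Emit only the fields a profile asks for, in the profile's order.
    Fields missing from the record are simply skipped."""
    out = []
    for name in PROFILES[profile]:
        for value in record.get(name, []):
            out.append(f"{name}: {value}")
    return "\n".join(out)

# Hypothetical sample record for illustration only.
SAMPLE = """\
Entry_ID: SST_1
Entry_Title: Global Sea Surface Temperature
Keywords: ocean
Keywords: temperature
Start_Date: 1981-01-01
Stop_Date: 1994-12-31
Summary: Weekly sea surface temperature fields
  on a 1-degree grid.
"""
```

Keeping one parsed record and a table of field selections means a new output standard needs only a new profile entry, not a second copy of the metadata.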
Of the new fields, one of the most important for scientists and data set producers is Data_Set_Citation. This field provides a formal reference, analogous to a bibliographic reference for a printed publication, and thus offers a standard way of crediting a data set producer. Another significant addition is Earth_Data_Resolution, which specifies the temporal and spatial resolution of the data set and helps researchers determine whether they have identified appropriate data sets. An improved set of valid keywords (often used when searching for data sets) is becoming widely accepted throughout the Earth science community, contributing to the growing acceptance of the DIF standard.
Over the past 18 months, both the quantity and the quality of directory entries have improved, in part because of new DIF writing tools that help data holders provide directory-level information easily and accurately. The most convenient is DIFweb, a World Wide Web form that lets the information provider choose items from scrollable lists of valid keywords and move information from an external source directly onto the form. DIFweb was originally targeted at the infrequent contributor who may hold a single data set on his or her own workstation, but it has become very popular with more frequent DIF contributors as well. A second tool, DIFmacs, uses the database to validate a DIF while it is being written. Targeted at the frequent contributor, it virtually guarantees that a data set description will be properly registered in the database. Suggestions from our collaboration with representatives of the United Nations Environment Programme (UNEP) and encouragement from international partners inspired a third tool that can be used in the field with minimal computing resources. This tool, DIFwrite, runs on a PC and requires neither database connectivity nor other supporting packages. Interest within the European community in expanding Earth science metadata documentation has inspired a private company in the UK to write another DIF writing tool, DIFent (for "DIF entry"), for PCs running MS-Windows. This tool is intended to encourage greater participation in documenting data sets in Europe.
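Checking entries against a controlled vocabulary while a description is being written, as DIFmacs does against the database, can be illustrated with a short validator. This is a minimal sketch, not DIFmacs itself: the `VALID_KEYWORDS` excerpt, the `REQUIRED_FIELDS` tuple, and the `validate` function are hypothetical, and the "did you mean" hint uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical excerpt of a valid-keyword list; the real GCMD lists are
# maintained in the database and are far larger.
VALID_KEYWORDS = {
    "Ozone", "Precipitation", "Sea Ice", "Sea Surface Temperature", "Vegetation Index",
}

# Hypothetical set of fields a record must contain before it is accepted.
REQUIRED_FIELDS = ("Entry_ID", "Entry_Title", "Summary")

def validate(record, valid=VALID_KEYWORDS):
    """Check a record {field name: [values]}; return a list of problems.

    An empty list means the record passes. An unknown keyword gets a
    closest-match suggestion when a reasonably similar valid one exists."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing required field {name}")
    choices = sorted(valid)
    for kw in record.get("Keywords", []):
        if kw in valid:
            continue
        msg = f"invalid keyword {kw!r}"
        hint = difflib.get_close_matches(kw, choices, n=1)
        if hint:
            msg += f"; did you mean {hint[0]!r}?"
        problems.append(msg)
    return problems
```

For example, a record whose Keywords value is misspelled as "Sea Surface Temprature" is rejected with a suggestion of "Sea Surface Temperature", so the contributor can correct it before the entry is registered.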
Three science coordinators, representing oceanography, atmospheric science, and geoscience, provide in-house expertise in gathering the information needed to generate directory entries. Their work will soon be complemented by a new staff member, funded by the National Biological Service (NBS), who will help acquire data set descriptions for the ecological and biological sciences. In addition, NBS and NASA will cooperate in developing new techniques to help users find the existing sources of data and information most applicable to their needs. This partnership means that the scientific community and the public will have more efficient access to a broader range of information about existing sources of biological data. By working together, the agencies are also avoiding duplicated costs.
The GCMD project also supports three operations, maintenance, and system development staff, who work with the science coordinators on development issues and channel user feedback from the research community into upgrades and improvements. Additional feedback is collected through the "comments" area of the GCMD Web site and through demonstrations at scientific conferences and meetings, such as those sponsored by the American Geophysical Union, the American Meteorological Society, and The Oceanography Society.
The GCMD coordinates activities for the International Directory Network of the Committee on Earth Observation Satellites (CEOS IDN). The IDN's goal is to broaden the sharing of standardized data set descriptions by adding new nodes and by increasing contributions from countries worldwide, through shared software and the acceptance of data set descriptions into the database. It consists of three coordinating nodes representing the international science community: the American node, the Global Change Master Directory at NASA Goddard Space Flight Center in Greenbelt, Md.; the Asian node at the National Space Development Agency of Japan in Saitama, Japan; and the European node at the European Space Agency/European Space Research Institute in Frascati, Italy. These coordinating nodes currently maintain duplicate copies of the database and the operational software. Distributed capabilities are planned, however, and a prototype distributed search capability is scheduled for demonstration in May 1996 at a CEOS meeting in Japan.
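The contrast between duplicated databases and a distributed search can be made concrete with a small fan-out sketch, assuming each node can be queried independently: the same query goes to every node in parallel, and the answers are merged by Entry_ID. The `make_node` and `distributed_search` functions, the node labels, and the sample entries are hypothetical; this is not the CEOS IDN prototype.

```python
from concurrent.futures import ThreadPoolExecutor

def make_node(label, titles):
    """Hypothetical stand-in for a remote directory node. It answers a
    one-word query by scanning its own {Entry_ID: Entry_Title} copy."""
    def query(term):
        term = term.lower()
        return [(entry_id, label) for entry_id, title in titles.items()
                if term in title.lower()]
    return query

def distributed_search(nodes, term, timeout=5.0):
    """Send the same query to every node in parallel and merge the hits.

    Results are de-duplicated by Entry_ID; when several nodes hold the same
    entry, the node listed first is credited. A node that raises an error or
    does not answer within `timeout` seconds is skipped, so the remaining
    nodes can still contribute a partial answer."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [pool.submit(node, term) for node in nodes]
        for fut in futures:
            try:
                hits = fut.result(timeout=timeout)
            except Exception:
                continue  # node unavailable; keep the partial answer
            for entry_id, label in hits:
                merged.setdefault(entry_id, label)
    return merged
```

De-duplicating by Entry_ID matters because, as long as nodes hold overlapping copies, the same description can come back from more than one of them.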
Canada, France, Germany, the Netherlands, Italy, Brazil, and Argentina, as well as UNEP/GRID and several U.S. organizations represented by NOAA, USGS, and CIESIN, operate "cooperating nodes." These nodes give researchers in these countries a path for exchanging information with the CEOS IDN. Russia, Australia, New Zealand, and possibly China are expected to join the network in the coming years, and several data set entries have already been received from these countries.
The GCMD also serves as NASA's contribution to the Global Change Data and Information System (GCDIS), an interagency federation of directories. GCDIS was created in response to a recommendation of the Interagency Working Group on Data Management for Global Change, to coordinate and unite separate federal agency directory efforts. Many recent software development activities address these broader interagency requirements.
The importance of metadata, and the demand for it, will continue to rise as cross-disciplinary studies increase, long-term global studies are undertaken, more complex research and modeling activities require more variables from more sources, and technological advances make data sets more accessible. Metadata directory efforts will therefore become increasingly important.
Future development will focus on expanding the use of the GCMD through a "distributed" system. The GCMD is already exploring the nature and feasibility of distributed systems through work on the Distributed Oceanographic Data System, in collaboration with developers at the University of Rhode Island. This experience with "distributed" technology will help determine its feasibility for GCDIS and the CEOS IDN. The GCMD's utility will also be extended by concentrating on methods for automatically updating directory information; exploring ways to populate the directory beyond the public sector; integrating other levels of information services, such as the future EOSDIS advertising service; and handling data set descriptions at varying spatial scales. To begin addressing variable spatial scales, the GCMD is focusing on the local level through its joint participation in a USGS-funded University of Maryland project, "Community Resource of Spatial Data," via the National Spatial Data Infrastructure.
An active science User Working Group, chaired by USGS scientist Lou Steyaert, represents the broad range of Earth science disciplines, including ecology, oceanography, geophysics, and atmospheric science. Members are chosen for their interest in NASA's directory effort, their knowledge of available data sets, and their appreciation of the importance of high-quality data management and of effective ways to access appropriate data.
Project direction is informed by the project manager and staff, the User Working Group, interagency and international communications, emerging technology, and the assessment of user statistics. Above all, the GCMD staff recognizes the importance of directory "content" and plans to maintain its focus on uncovering and carefully documenting existing Earth science data sets and making them available to interested users through the most effective means. The GCMD is funded through NASA's Mission to Planet Earth Program.
Lola M. Olsen, NASA/Goddard Space Flight Center, and Gene R. Major of Hughes/STX