GBIF Metadata Strategy v.06

Metadata Requirements for Datasets delivered via the Global Biodiversity Information Facility (GBIF) Network
GBIF (2008) Compiled by Éamonn Ó Tuama, GBIF Secretariat
Acknowledgements
In addition to those who contributed to the metadata scoping exercise via the wiki (see footnote 1), the GBIF Secretariat wishes to acknowledge the valuable contributions to this document of the following individuals: Eliot Christian, U.S. Geological Survey; Lynn Kutner, NatureServe; Hannu Saarenmaa, GBIF Finnish National Node; Ángela M. Suárez-Mayorga, Instituto de Investigación de Recursos Biológicos Alexander von Humboldt.
1 http://wiki.gbif.org/dadiwiki/wikka.php?wakka=CategoryMetadata
Version 0.6; 03 June 08
Table of Contents
1 Introduction
2 The Present GBIF Metadata System
3 Progressive Enhancement of Metadata in GBIF Network
3.1 New metadata system for existing GBIF mediated datasets
3.1.1 Metadata Catalogue Interface
3.2 GBIF's Primary Dataset Types
3.2.1 Specimens
3.2.2 Observations
3.2.3 Names
3.3 Developing a Minimum Conceptual Profile
3.4 Infrastructure
3.4.1 Metadata Management
3.4.2 Interoperability across Metadata Networks
4 Recommendations
5 Appendix A References & Acronyms Expansion
1 Introduction
The mission of the Global Biodiversity Information Facility (GBIF) (www.gbif.org) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development. Priorities, with an emphasis on promoting participation and working through partners, include mobilising biodiversity data, developing protocols and standards to ensure scientific integrity and interoperability, building an informatics architecture to allow the interlinking of diverse data types from disparate sources, promoting capacity building and catalysing development of analytical tools for improved decision-making.
The current GBIF informatics architecture features a distributed network of data providing nodes linked to a central data caching system (Fig. 1). Nodes make their data available by installing provider software which acts as an interface to their databases. The provider software enables a mapping from the local database schema to a federation or interchange schema such as Darwin Core, and communication between distributed databases using the DiGIR, BioCASe or TAPIR access protocols. The nodes first advertise their presence by registering an access URI in a central UDDI registry. GBIF uses the registry to look up the access point for a node and then communicates with it using the access protocols to retrieve data records encoded according to various versions of the Darwin Core and ABCD schemas. These data are stored in a central cache at GBIF where they undergo indexing. Databases use indices to allow fast and efficient searches of their tables. By maintaining an index of a subset of the most commonly used elements selected from the superset of all elements provided by the data providers, GBIF helps users to search efficiently for biodiversity data.
The mandatory record fields required for indexing are:
1. Scientific Name
2. Institution Code
3. Collection Code
4. Catalogue Number
But certain other fields are also highly desirable:
1. Geospatial Location - where specimen was collected or observation made
2. Collection Date - when specimen was collected or observation made
3. Higher Taxon Information (to avoid ambiguities with homonyms, etc., during indexing)
4. Basis of Record (whether specimen or observation)
5. DateLastModified (to reduce load during the re-indexing process)
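The two field lists above suggest a simple check that could be run on a record before it is indexed. The following is a minimal sketch in Python, not a description of the actual GBIF indexing code: the key names (e.g. scientific_name) and the function check_record are hypothetical and chosen only for illustration, since the real element names depend on the Darwin Core or ABCD version in use.

```python
# Hypothetical key names; real Darwin Core / ABCD element names differ by schema version.
MANDATORY_FIELDS = (
    "scientific_name", "institution_code", "collection_code", "catalogue_number",
)
DESIRABLE_FIELDS = (
    "geospatial_location", "collection_date", "higher_taxon",
    "basis_of_record", "date_last_modified",
)


def _is_missing(record, field):
    """A field counts as missing if it is absent, None, or blank text."""
    value = record.get(field)
    return value is None or (isinstance(value, str) and not value.strip())


def check_record(record):
    """Return (indexable, missing_mandatory, missing_desirable) for one record."""
    missing_mandatory = [f for f in MANDATORY_FIELDS if _is_missing(record, f)]
    missing_desirable = [f for f in DESIRABLE_FIELDS if _is_missing(record, f)]
    # A record is treated as indexable only when every mandatory field is present.
    return (not missing_mandatory, missing_mandatory, missing_desirable)


# Invented example record, for illustration only.
example = {
    "scientific_name": "Puma concolor",
    "institution_code": "XYZ",
    "collection_code": "Mammals",
    "catalogue_number": "12345",
    "basis_of_record": "specimen",
}
indexable, missing_mandatory, missing_desirable = check_record(example)
print(indexable, missing_mandatory, missing_desirable)
```

Reporting the missing desirable fields separately, rather than rejecting the record, reflects the distinction drawn above between what indexing requires and what merely improves it.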
It is inherently difficult to maintain full dynamic access to a network comprising hundreds of data serving nodes, any of which may become unavailable temporarily. GBIF has succeeded in developing a centralised data cache for its network with a strategy for its periodic updating. A centralised cache also ensures the fastest possible response time for services built on the network. The design of the GBIF data portal is thus strongly tied to mobilising and making directly accessible primary species occurrence records delivered as Darwin Core records (or ABCD equivalent). The emphasis is on providing access to individual data records rather than describing in detail the datasets of which they are constituents.
Under consideration in this document is whether GBIF should adopt more fully the well known model of a Service Oriented Architecture (SOA) [1], some elements of which are already present in the existing network. The essence of the SOA is that consumers discover providers. The key term here is "discover", and the discovery mechanism is usually implemented as a central registry, or system of communicating registries, for catalogues of metadata which describe providers and the resources and services they make available.
To share data effectively and facilitate re-use in ways not envisaged at the time of collection requires that the data are well documented, especially in regard to fitness for various uses. Moreover, for all types of observation data, there is a need for additional documentation on, e.g., sampling techniques and protocols. These are key requirements for the data made available on the GBIF network, and thus it is critical that we devise an adequate metadata profile to serve the diverse needs of our users.
The objective of this document is to outline a strategy that will allow the GBIF network to develop the infrastructure to support the management and delivery of high-quality metadata, enabling potential end users to discover easily which datasets are available and, critically, to evaluate the appropriateness of such datasets for particular purposes. The original scoping exercise has been made available as a wiki (http://wiki.gbif.org/dadiwiki/wikka.php?wakka=CategoryMetadata) that includes contributions from the wider GBIF community.
The development of the metadata strategy must be aligned with the planned, more distributed architecture of a greatly expanded GBIF network where the nodes (see footnote 2) play the crucial role of data publishing centres, with metadata as an essential requirement, and develop customised portals and specialised web services to best serve local needs. The strategy must thus be executed in close consultation with the nodes, recognising their existing arrangements and future needs.
The metadata under consideration are for datasets rather than the constituent individual records that make up a dataset. The GBIF network currently handles primary species occurrence data that can be expressed through Darwin Core, which is a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections (see footnote 3). The Darwin Core provides a basis of record field to allow distinction (amongst others) between a specimen and an observation. The two categories are otherwise indistinguishable. The dichotomy, in any case, is a rather artificial one, as a specimen can be considered as a type of observation for which a voucher (physical evidence in a museum or herbarium) of some kind exists.
2 Node: A data provider designated by a GBIF Participant that maintains a stable computer gateway that makes data available through the GBIF network. A GBIF Participant is a Signatory of the GBIF-establishing Memorandum of Understanding (MoU).
3 http://wiki.tdwg.org/twiki/bin/view/DarwinCore/WebHome
The other data type handled by GBIF is names data also delivered as part of the Darwin Core record. These are cross referenced to the ITIS/Catalogue of Life to reconcile synonyms and act as the taxonomic route into the data cache.
Specimens, observations and names thus constitute the three broad classes of data that are currently being handled by GBIF and that need to be addressed. The Metadata Work Group will consider whether a single profile can capture the description of these different resources, and recommend the minimum conceptual profile that will be required to support data discovery and exchange through the GBIF network. In addition, following acceptance of the general approach outlined in this document, a timeline of what can be achieved in one year (2008), covering schema development, management tools/applications and network infrastructure, will need to be developed.
This document proceeds by describing the present GBIF metadata system and how it is being enhanced, then describes the three primary dataset types currently being handled by GBIF, suggests options for the development of a minimum conceptual profile for these datasets, and concludes with some recommendations for a metadata network infrastructure and component tools and applications.
2 The Present GBIF Metadata System

GBIF provides a UDDI (Universal Description, Discovery and Integration) [2] registry (http://registry.gbif.net/uddi/web) where data providers can list their data and services. UDDI lists business information (i.e., contact details, etc.) and the binding template, i.e., the URL by which the provider installation can be accessed. In the first instance, the access point URL of the provider installation (e.g., DiGIR [3] or BioCASe [4]) is submitted manually through an online form (http://www.gbif.org/DataProviders/registerme), or, for those using the GBIF DiGIR installation package, automatically. GBIF then uses the access point to communicate with the provider to gather metadata about the data being served and populate the UDDI registry.
Apart from the UDDI registry, GBIF does not offer a separate metadata catalogue or dedicated client for browsing/searching metadata. Instead, metadata are deeply integrated in the data portal and presented as part of the four general search routes into data (species, countries, datasets, occurrences).
The following dataset-level metadata elements are currently processed: Data provider details (Name; Website; GBIF participant; Description; Country; Added to portal; Information updated); Provider (DiGIR, BioCASe, TAPIR) binding; Name (of dataset); Website; Description; Citation; How to cite this dataset; Basis of record; Access point URL; Added to portal; Information updated; Contacts (Name, Role, Address, Email, Telephone); Data networks.
Through the GBIF indexing process, the following additional information on a dataset is derived: Occurrence records indexed; Number of records shared by provider; Occurrences with coordinates; Occurrences with no geospatial issues; Number of species; Number of taxa.
3 Progressive Enhancement of Metadata in GBIF Network

GBIF is proposing to progressively enhance its metadata system. The goals of such a system include the advertising and discovery of data providers and datasets, enabling dataset owners to document their datasets, assisting users to decide on fitness for use (i.e., documenting so-called quality), and enabling Participant Nodes to document and inventory collections and their related datasets that have not yet been fully shared.
Initially, for the occurrence datasets already available on the network, it is proposed to develop a part-automated metadata entry form where data custodians can contribute additional information. The metadata thus captured would be made available through a catalogue offering browse/search functionality and would provide a significant enhancement to the data portal. Thereafter, it is proposed to develop a minimum conceptual metadata profile that can be used to describe the distinct varieties of datasets that GBIF currently mediates (specimens, observations, names) and to recommend software tools and applications that can be used to manage, publish and harvest metadata over a network.
3.1 New metadata system for existing GBIF mediated datasets

Work has already commenced to enhance the metadata profile for occurrence datasets (specimens and observations) currently mediated by GBIF and to develop an online registration form that can be part-filled automatically from the metadata information that is already harvested or derived through the GBIF indexing process. The dataset custodian can then be contacted and requested to complete any remaining fields. In addition to the metadata derived from DiGIR and TAPIR metadata requests to providers of datasets, the following elements will be added through the portal indexing process:
1. taxonomic coverage - kingdom, common names, family/genus/species
2. geospatial coverage - countries, and bounding box (represented using the GeoRSS model [8])
3. temporal coverage - a date range for the dataset, and months of the year for which it has data
The metadata database is designed in such a way that metadata content derived through the indexing process can later be overwritten by authoritative statements submitted by the dataset custodian. Metadata will be made available in Ecological Metadata Language (EML) [9] format.
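The derivation of coverage elements from indexed records, and their later replacement by custodian statements, can be illustrated with a minimal Python sketch. It makes no claim about how the portal implements this: the record keys (kingdom, country, latitude, longitude, collection_date) and both function names are hypothetical. The bounding box is returned in south, west, north, east order, which is the order used by a GeoRSS box.

```python
from datetime import date


def derive_coverage(records):
    """Summarise taxonomic, geospatial and temporal coverage of occurrence records."""
    # Use only records that carry both coordinates.
    coords = [
        (r["latitude"], r["longitude"])
        for r in records
        if r.get("latitude") is not None and r.get("longitude") is not None
    ]
    dates = [r["collection_date"] for r in records if r.get("collection_date")]
    bounding_box = None
    if coords:
        lats = [lat for lat, _ in coords]
        lons = [lon for _, lon in coords]
        # (south, west, north, east); datasets spanning the antimeridian are not handled.
        bounding_box = (min(lats), min(lons), max(lats), max(lons))
    return {
        "kingdoms": sorted({r["kingdom"] for r in records if r.get("kingdom")}),
        "countries": sorted({r["country"] for r in records if r.get("country")}),
        "bounding_box": bounding_box,
        "date_range": (min(dates), max(dates)) if dates else None,
        "months": sorted({d.month for d in dates}),
    }


def merge_metadata(derived, custodian):
    """Custodian statements overwrite derived values, field by field."""
    merged = dict(derived)
    merged.update({k: v for k, v in custodian.items() if v is not None})
    return merged


# Invented sample records, for illustration only.
records = [
    {"kingdom": "Animalia", "country": "CO", "latitude": 4.6, "longitude": -74.1,
     "collection_date": date(2001, 3, 14)},
    {"kingdom": "Animalia", "country": "CO", "latitude": 6.2, "longitude": -75.6,
     "collection_date": date(2004, 11, 2)},
    {"kingdom": "Plantae", "country": "EC", "latitude": None, "longitude": None,
     "collection_date": None},
]
derived = derive_coverage(records)
merged = merge_metadata(derived, {"countries": ["CO"], "date_range": None})
print(derived)
print(merged)
```

In this sketch a custodian value of None means "no statement", so the derived value is kept; any other custodian value takes precedence.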
3.1.1 Metadata Catalogue Interface

The client interface to the metadata catalogue will provide another route into the data mediated by GBIF by allowing a user to browse and search through the metadata catalogue, and, ultimately, link back to the portal taxonomy and overview pages for the datasets returned by the query. Common Query Language (CQL) [11] is being used for the query syntax and Search/Retrieval via URL (SRU) [12] as the internet protocol. The ZeeRex specification [34] will be used for the service description of the metadata catalogue. ZeeRex records provide descriptions of servers, the databases they contain, and the capabilities of those databases, thereby allowing discovery and access to metadata catalogues. The GBIF metadata catalogue will thus be accessible to other metadata clearing-houses.
Recommendation 1: GBIF will provide on the data portal a search and browse interface to a metadata catalogue populated with enhanced dataset metadata derived through DiGIR/TAPIR provider requests, the data indexing process, and input from dataset custodians. These data will be presented in a fashion which allows benchmarking the nodes, like the current statistics in the www.gbif.org/DataProviders pages do, possibly providing information on some aspects of the quality (e.g., spatial accuracy) in datasets.
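To illustrate how a client might query a catalogue using CQL over SRU, as described in this section, the following Python sketch builds an SRU searchRetrieve request URL. It is an illustration under assumptions, not the catalogue's actual interface: the endpoint address is a placeholder, the helper names are hypothetical, and the CQL index names (dc.title, dc.description) are examples only, since the indexes a given server supports are what its ZeeRex record would advertise.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Placeholder endpoint; the document does not give the catalogue's SRU address.
BASE_URL = "http://example.org/metadata/sru"


def cql_all(terms):
    """AND together (index, term) pairs into a CQL query, quoting each term."""
    clauses = []
    for index, term in terms:
        # Escape backslashes and double quotes inside the quoted term.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{index} = "{escaped}"')
    return " and ".join(clauses)


def sru_search_url(query, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


query = cql_all([("dc.title", "birds"), ("dc.description", "Colombia")])
url = sru_search_url(query)
print(url)
```

Because the whole request is a URL, such a query could be issued from a browser or any HTTP client, which is part of what would make the catalogue accessible to other clearing-houses.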
3.2 GBIF's Primary Dataset Types

To date, GBIF has worked with three broad categories of data: specimens, observations and names. Within each category, there are further sub-categories, and the challenge for GBIF is to develop a minimum conceptual metadata profile that adequately describes such datasets.
3.2.1 Specimens
Some 40% of the records currently mediated by GBIF are specimens (vouchers, type specimens, etc.) housed in collections that are typically under the custodianship of natural history museums, herbaria, etc. The European BioCASE project [13] developed a metadata profile and schema for biological collections based on Dublin Core and other standards, but continuing development has now been transferred to the TDWG Natural Collections Description (NCD) [14] group, who aim to generalise and extend the schema. The schema is also being used as the basis for the TDWG NCD LSID Ontology [10].
The relationship of the term "collection" to specimen datasets is somewhat ambiguous. It may be used to describe a physical collection of specimens, and so is the equivalent of a dataset, or it may be used in the sense defined by NCD as any association of data given any grouping criteria, where the mapping of constituent datasets to collections is more complex. The metadata for specimen datasets should thus, where appropriate, reference their parent collections.
The recently commenced Biodiversity Collections Index (BCI) project [15], created by a consortium of major institutions with joint funding by GBIF and Biodiversity Information Standards (BIS) TDWG, and open to all institutions globally, aims to be a single, internet-based shared resource providing information about collections. It will deliver metadata and a Universally Unique Identifier (UUID) for each collection. The metadata will conform to the NCD schema. Another project, nearing completion, will deliver a toolkit (NCD Toolkit) that allows data custodians to manage and publish NCD conforming metadata for their collections. It is envisaged that such a toolkit would be installed on GBIF nodes to assist in data management and to allow the metadata to be harvested into the central BCI repository. The central repository would deal with cases of multiple (metadata) records per collection: it must be understood that data managers in many cases choose to register/publish their collections in more than one place, e.g., in a national repository and in a regional one. BCI would disambiguate the metadata records, issue a UUID for each collection and deliver them to GBIF (and other users).
Recommendation 2: GBIF will develop a metadata profile that adequately describes specimen datasets and, where appropriate, documents links to parent collections via BCI issued UUIDs.
3.2.2 Observations
Many of the records made available through the GBIF network are not georeferenced. Although these may provide a location in textual form (e.g., a county name) this information is not processed by the GBIF data portal in a geospatial way, e.g., by mapping species counts within county boundaries. In dealing with georeferenced data, the portal is limited to working with point occurrence data, i.e., point features that indicate where a particular taxon was observed. Other geospatial features such as lines and polygons that might delineate the extent of the occurrence or represent transects and sampling areas in more complex ecological datasets are not represented. Such features are less easy to process, ideally requiring full geospatial databases to store the data, and sophisticated clients that can issue complex geospatial queries to the databases. Furthermore, Darwin Core, the main federation schema used by the GBIF network, can only represent single occurrences: it provides no mechanism for describing associated points (occurrences) along a transect or within a sampling area. It remains to be seen whether the TDWG Interest Group on Observations and Specimen Records [16] addresses this issue. It is clear that GBIF needs to move beyond the relatively simple point occurrence data that it currently mediates to include polygon or line representations of species occurrences typical of ecological datasets that our users will increasingly demand in order to perform the modelling and analyses required to monitor and halt the loss of biodiversity, and that will underpin sustainable development. It must also be noted that no distinction is currently made in the GBIF system between the metadata for specimen and observational records other than reporting a basis of record as either specimen or observation. The observational datasets served via the GBIF network thus require more comprehensive metadata that adequately document them. 
Recommendation 3: GBIF will develop a metadata profile that adequately describes observations datasets.
3.2.3 Names
Names data consist of all the variations in, and variety of, names (taxonomic, common, multilingual) that are retrieved through the GBIF central indexing process. As such, they represent a corpus of tagged entities that require reconciling with authoritative nomenclators and checklists. Through its ECAT work programme, GBIF proposes to develop a Global Names Architecture (GNA) to address the provision of names-oriented taxonomic tools, services, and content that GBIF and its partners can use to develop better methods for organising, accessing, and serving the primary species occurrence data mediated by the GBIF network. One key component of the GNA will be a system of data repositories and registries that allow existing checklists to be catalogued, indexed and made accessible via standard protocols. The data providers, in turn, will require tools to help them manage and publish their checklists.
There is thus a requirement to define a metadata profile in order to catalogue checklists and other name resources. Such a profile should, wherever possible, reuse components from the collections and observations metadata profiles and will make use of the TDWG Taxonomic Concept Transfer Schema (TCS) (http://www.tdwg.org/standards/117/), which has been designed to support the exchange of taxonomic information.
Recommendation 4: GBIF will ensure that the metadata developments for the GNA are in alignment with, and build upon, the general metadata profile and architecture of the GBIF network.
3.3 Developing a Minimum Conceptual Profile

Because of the wide range of dataset types that the GBIF data portal is required to support, and noting, in particular, the exponential growth in the number of datasets of the ecological type as a result of research activities and surveys and monitoring programmes, it is advisable to use an extensible, modular metadata standard that can be adapted to provide the minimum recommended profile for data discovery and exchange on the GBIF network. Of the main standards that we have identified (ISO 19115/19139 [20, 21], the Content Standard for Digital Geospatial Metadata (CSDGM) [22] and EML), the last appears to match our requirements most closely. The features that make EML attractive as our chosen standard include the following (see the EML FAQ, http://knb.ecoinformatics.org/software/eml/eml-2.0.1/eml-faq.html):
1. EML is an Open Source, community oriented project developed within the ecological community for the ecological community.
2. EML is modular and extensible.
   1. By contrast, CSDGM is monolithic and not easily split into independent units for integration with other standards.
   2. EML is designed as a set of modules: these can be linked together and to other metadata standards.
   3. Unlike in EML, the level of detail permitted by CSDGM is not adequate to describe the wide variety of methods and non-standard data formats encountered in ecological research.
3. EML has striven for compatibility with other metadata standards, building on the expertise that went into shaping them and sharing much of its syntax with them. These standards are: Dublin Core [23], CSDGM and the Biological Profile of the CSDGM [24], ISO 19115, ISO 8601 Date and Time Standard [25], OGC Geography Markup Language (GML), the Scientific, Technical and Medical Markup Language (STMML) [26] and the Extensible Scientific Interchange Language (XSIL) [27].
It is important for GBIF to use the most expressive standard available for its purposes. Conversion to other less expressive metadata languages is always an option and the compatibility of EML with other standards facilitates such conversions. It is recognised that many organisations are required to deliver metadata that complies with a particular standard (e.g., FGDC in the USA). Stylesheets and conversion tools will thus be critical to minimise the metadata burden on such providers, who will be much more likely to create metadata if they can create one record that can be converted rather than needing to create different versions of metadata for different needs.
In developing the conceptual metadata profile, GBIF will draw on the experience of others, e.g., that of the Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, which developed the SiB (Sistema de Información sobre Biodiversidad de Colombia) metadata profile.
Recommendation 5: GBIF will develop a minimum recommended metadata profile in EML that can be used with specimens, observations and names datasets. Stylesheets to transform EML to other metadata formats will be developed as required.
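To give a sense of what a small EML record for a dataset might contain, the following Python sketch assembles a skeletal EML 2.0.1 document with a title, creator, geographic and temporal coverage, and contact. It is an illustration only and does not represent the GBIF profile, which remains to be defined: the function names, the packageId, the "example" system value and all sample values are hypothetical, and the output has not been validated against the EML schema.

```python
import xml.etree.ElementTree as ET

# Namespace of EML 2.0.1, the version referenced by the EML FAQ cited above.
EML_NS = "eml://ecoinformatics.org/eml-2.0.1"
ET.register_namespace("eml", EML_NS)


def _person(parent, tag, surname):
    """Append a creator or contact element that holds only a surname."""
    person = ET.SubElement(parent, tag)
    name = ET.SubElement(person, "individualName")
    ET.SubElement(name, "surName").text = surname


def build_eml(package_id, title, surname, bbox, begin, end):
    """bbox is (west, east, north, south) in decimal degrees; dates are ISO 8601 strings."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "example"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    _person(dataset, "creator", surname)

    coverage = ET.SubElement(dataset, "coverage")
    geo = ET.SubElement(coverage, "geographicCoverage")
    ET.SubElement(geo, "geographicDescription").text = "Bounding box of indexed records"
    box = ET.SubElement(geo, "boundingCoordinates")
    tags = ("westBoundingCoordinate", "eastBoundingCoordinate",
            "northBoundingCoordinate", "southBoundingCoordinate")
    for tag, value in zip(tags, bbox):
        ET.SubElement(box, tag).text = str(value)

    temporal = ET.SubElement(coverage, "temporalCoverage")
    date_range = ET.SubElement(temporal, "rangeOfDates")
    ET.SubElement(ET.SubElement(date_range, "beginDate"), "calendarDate").text = begin
    ET.SubElement(ET.SubElement(date_range, "endDate"), "calendarDate").text = end

    _person(dataset, "contact", surname)
    return root


# Invented sample values, for illustration only.
root = build_eml("example.1.1", "Example specimen dataset", "Doe",
                 (-75.6, -74.1, 6.2, 4.6), "2001-03-14", "2004-11-02")
eml_text = ET.tostring(root, encoding="unicode")
print(eml_text)
```

A record of this shape is the kind of input a stylesheet would take when converting EML to a less expressive format.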
3.4 Infrastructure
Ultimately, the responsibility for providing metadata lies with dataset custodians working through the GBIF nodes network. They will require tools for managing and publishing metadata and making it searchable online. Thus, there is a need to work closely with nodes to obtain their support and engagement. It is envisaged that nodes will provide a search and browse interface to both locally held (centralised at the Participant node) and distributed (at other Participant nodes) metadata catalogues. Likewise, the GBIF data portal will allow searching of metadata across the whole GBIF network, and also, ideally, offer search interoperability across other metadata networks, e.g., those that may be available from such organisations as ICES, FAO, ILTER, EEA, etc.
Recommendation 6: GBIF will consult nodes about currently used metadata systems and implications of GBIF developing/recommending a new system.
3.4.1 Metadata Management

Nodes will require tools for editing, managing, publishing and accessing metadata (both locally held and distributed). Ideally, these tools should be able to manage different metadata profiles. GBIF will investigate and make recommendations on suitable tools. For example, the EML community has developed Morpho [28], a multi-platform tool for managing metadata, and Metacat [29], an XML database that allows storage, query and retrieval of metadata documents. Morpho acts as a client talking to the Metacat server. The Oak Ridge National Laboratory in the USA has developed the Mercury system [37] for distributed metadata management, data discovery and access. It supports several metadata standards. The National Biological Information Infrastructure (NBII) [40] is one of its primary users and it has also been used in a wide variety of projects including the Inter American Biodiversity Information Network (IABIN) [41].
It is recognised that many nodes may already have metadata management systems in place. There will thus be a need to work with these on an individual basis to explore how their catalogues can be integrated with the GBIF metadata network. Many options are available, including, but not limited to, the following:
1. transformation from the local metadata profile to the GBIF EML metadata profile with storage of the transformed metadata in a local Metacat server (or equivalent) so that it can be accessed online;
2. use of TAPIR to harvest metadata and cache it centrally on the GBIF data portal; this also would require mapping from the local metadata profile to the GBIF EML profile; it may also be possible to use TAPIR in a federated search rather than caching centrally;
3. use of OAI-PMH for harvesting metadata; OAI-PMH provides a simplified method for populating a central source that then provides a consolidated search service; it is an appropriate technique for consolidating many small collections, provided that each has records that can be losslessly cast into a common format; its appeal seems greatest where smaller sites have data to share but lack IT expertise and/or network computer resources;
4. implementation of an OGC Catalogue Service for Web (CSW) [30] system; it should be noted that, from an interoperability perspective, OGC CSW is not really a single standard. It encompasses multiple, non-interoperable search technologies. One of the search technologies is an ISO 23950 [36] search service, which can be implemented with SRU and CQL (see section 3.1.1). Another of the search technologies uses the OGC Filter Expression, a complex search expression language which has some similarity to SQL, makes use of GML, and uses complex features of XML. (Note that the OGC CQL described in the CSW document is NOT the same as the CQL used with SRU.)
The implementation of the GBIF distributed metadata system should take into consideration the advantages/disadvantages of offering federated searching through distributed metadata catalogues as opposed to harvesting catalogues into a central cache. The latter may be required for performance reasons. Many questions remain to be decided and these will influence which options should be adopted.
Recommendation 7: GBIF will investigate, and make recommendations on, tools for metadata management at the Nodes level.
Recommendation 8: GBIF will work with at least two nodes to establish a prototype of the metadata network.
Recommendation 9: GBIF will assist Nodes in finding solutions to integrate their existing metadata systems into the GBIF metadata network.
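As an illustration of the OAI-PMH harvesting option discussed in this section, the following Python sketch builds ListRecords requests and reads record identifiers and a resumption token from a response. It is a sketch under assumptions rather than a proposed implementation: the base URL, the record identifiers and the sample response are invented for illustration, and no network request is made. The oai_dc metadata prefix is the one every OAI-PMH repository must support; a GBIF EML prefix would be an additional, hypothetical one.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    When continuing a harvest, the resumption token is the only argument
    sent besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=OAI_NS)
    # An absent or empty token means the list is complete.
    return ids, (token.strip() or None)


# Invented response from a hypothetical node, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example.org:dataset-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai", metadata_prefix="oai_dc")
ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(first_url, ids, next_url)
```

A harvester would repeat the request with each returned token until none is supplied, storing the records in the central cache as it goes.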
3.4.2 Interoperability across Metadata Networks

It is envisaged that GBIF will ultimately encourage development of a search interoperability mechanism for metadata by cooperating with other organisations that maintain appropriate metadata catalogues. Potential partners include (but are not limited to) ILTER, ICES, OBIS, FAO, EEA. Such a system can contribute to the broader Clearing-House Mechanism of the Convention on Biological Diversity [38] whose mission is to contribute significantly to the implementation of the Convention through the promotion and facilitation of technical and scientific cooperation, among Parties, other Governments and stakeholders.
The kind of architecture that will be adopted to support the biodiversity datasets clearing-house remains to be determined, as do implementation choices. The OGC CSW specification and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [35] are two prime candidates. We can look to the experiences of the Architecture Implementation Pilot (AIP) [32] of GEOSS [33] for examples and best practices in implementing a clearing-house mechanism for a very diverse range of geospatial data. GBIF, in turn, can play an essential role in GEOSS by fostering a coordinated mechanism for integrating biodiversity data in support of several Societal Benefit Areas identified for GEOSS, e.g., Biodiversity, Agriculture, Ecosystems [39].
Recommendation 10: In designing its new distributed architecture, GBIF will ensure that it is aligned with the wider GEOSS initiative, particularly with regard to participating in the GEOSS clearing-house mechanism.
4 Recommendations
Recommendation 1: GBIF will provide on the data portal a search and browse interface to a metadata catalogue populated with enhanced dataset metadata derived through DiGIR/TAPIR provider requests, the data indexing process, and input from dataset custodians. These data will be presented in a fashion which allows benchmarking the nodes, like the current statistics in the www.gbif.org/DataProviders pages do, possibly providing information on some aspects of the quality (e.g., spatial accuracy) in datasets.
Recommendation 2: GBIF will develop a metadata profile that adequately describes specimen datasets and, where appropriate, documents links to parent collections via BCI issued UUIDs.
Recommendation 3: GBIF will develop a metadata profile that adequately describes observations datasets.
Recommendation 4: GBIF will ensure that the metadata developments for the GNA are in alignment with, and build upon, the general metadata profile and architecture of the GBIF network.
Recommendation 5: GBIF will develop a minimum recommended metadata profile in EML that can be used with specimens, observations and names datasets. Stylesheets to transform EML to other metadata formats will be developed as required.
Recommendation 6: GBIF will consult nodes about currently used metadata systems and implications of GBIF developing/recommending a new system.
Recommendation 7: GBIF will investigate, and make recommendations on, tools for metadata management at the Nodes level.
Recommendation 8: GBIF will work with at least two nodes to establish a prototype of the metadata network.
Recommendation 9: GBIF will assist Nodes in finding solutions to integrate their existing metadata systems into the GBIF metadata network.
Recommendation 10: In designing its new distributed architecture, GBIF will ensure that it is aligned with the wider GEOSS initiative, particularly with regard to participating in the GEOSS clearing-house mechanism.
5 Appendix A References & Acronyms Expansion

1. SOA; Service Oriented Architecture; http://en.wikipedia.org/wiki/Service-oriented_architecture
2. UDDI; Universal Description Discovery and Integration; http://uddi.xml.org/
3. DiGIR; Distributed Generic Information Retrieval; http://digir.sourceforge.net/
4. BioCASE (Biological Collection Access Service for Europe) provider software; http://www.biocase.org/products/provider_software/index.shtml
5. TAPIR; TDWG Access Protocol for Information Retrieval; http://www.tdwg.org/activities/tapir/
6. Darwin Core; http://wiki.tdwg.org/twiki/bin/view/DarwinCore/WebHome
7. ABCD; Access to Biological Collection Data; http://wiki.tdwg.org/twiki/bin/view/ABCD/WebHome
8. GeoRSS model; http://www.georss.org/model
9. EML; Ecological Metadata Language; http://knb.ecoinformatics.org/software/eml/
10. TDWG NCD (Natural Collections Description) LSID Ontology; http://rs.tdwg.org/ontology/voc/Collection.rdf
11. CQL; Common Query Language; http://en.wikipedia.org/wiki/Common_Query_Language
12. SRU; Search/Retrieval via URL; http://www.loc.gov/standards/sru/
13. BioCASE project; www.biocase.org
14. NCD; Natural Collections Description TDWG Interest Group; http://www.tdwg.org/activities/ncd/
15. BCI; Biological Collections Index; http://www.biodiversitycollectionsindex.org/
16. TDWG Interest Group on Observations and Specimen Records; http://www.tdwg.org/activities/osr/
17. OGC; Open Geospatial Consortium; http://www.opengeospatial.org/
18. TDWG Geospatial Interest Group; http://www.tdwg.org/activities/geospatial/
19. GML; Geography Markup Language; http://www.opengeospatial.org/standards/gml
20. ISO 19115; the International Standards Organisation Geographic Information Metadata XML schema definition; http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020
21. ISO 19139; the International Standards Organisation Geographic Information Metadata XML schema implementation; http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32557
22. CSDGM; Content Standard for Digital Geospatial Metadata; http://www.fgdc.gov/metadata/csdgm/
23. Dublin Core Metadata Initiative; http://dublincore.org/
24. Biological Profile of the CSDGM; http://www.nbii.gov/portal/server.pt?open=512&objID=255&&PageID=337&mode=2&in_hi_userid=2&cached=true
25. ISO 8601 Date and Time Standard; http://en.wikipedia.org/wiki/ISO_8601
26. STMML; The Scientific, Technical and Medical Markup Language; http://www.ch.ic.ac.uk/rzepa/codata2/
27. XSIL; The Extensible Scientific Interchange Language; http://www.cacr.caltech.edu/SDA/xsil/
28. Morpho; http://knb.ecoinformatics.org/software/morpho/
29. Metacat; http://knb.ecoinformatics.org/software/metacat/
30. CSW; OGC Catalogue Service for Web; http://www.opengeospatial.org/standards/cat
31. INSPIRE; http://inspire.jrc.it/
32. GEOSS Architecture Implementation Pilot; http://www.ogcnetwork.net/AIpilot
33. GEOSS; Global Earth Observation System of Systems; http://www.earthobservations.org/geoss.shtml
34. ZeeRex; http://explain.z3950.org/overview/index.html
35. OAI-PMH; Open Archives Initiative Protocol for Metadata Harvesting; http://www.openarchives.org/pmh/
36. ISO 23950; ISO 23950:1998 Information and documentation -- Information retrieval (Z39.50) -- Application service definition and protocol specification; http://www.iso.org/iso/catalogue_detail?csnumber=27446
37. Mercury Metadata System; http://mercury.ornl.gov/
38. Convention on Biological Diversity Clearing-House Mechanism; http://www.cbd.int/chm/
39. GEO; Group on Earth Observations; http://www.earthobservations.org/
40. NBII; National Biological Information Infrastructure; http://www.nbii.gov/
41. IABIN; Inter American Biodiversity Information Network; http://www.iabin.net
