
Metadata Interoperability and Harmonization

Mr. Samarendra Dash, National Informatics Centre, Bhubaneswar - 751001. Email: samardash@gmail.com
Dr. Jyotshna Sahoo, Lecturer, P.G. Dept. of Lib. & Inf. Sc., Sambalpur University, Jyoti Vihar, Burla. Email: jyotshna.basu@gmail.com; jyotshna_sahoo@rediffmail.com. Phone: 06632542154; 9437922397

Abstract: The number of resources on the Internet has grown to an unmanageable level. The resources being described are not only far more diverse in nature, but also of different kinds and distributed over many more computer systems. With the rise of the web as a platform for disseminating subject-specific information, a new usage pattern of metadata has surfaced. This paper discusses different aspects of metadata harvesting techniques and their interoperability across a number of metadata standards. It also describes the methods used to achieve or improve interoperability among different metadata schemas in order to facilitate the conversion and exchange of metadata.

Keywords: Metadata, Metadata Harvesting, Crosswalk, Metadata Harmonization, Metadata Interoperability.

Introduction: The ever-growing Internet has changed the way we look at the whole process from information generation to information dissemination. It has also revolutionized the methods and techniques of information organisation and retrieval. The exact number of pages available on the Web is very difficult to determine because the Web is a distributed system. According to one estimate, however, there were roughly 234 million websites as of December 2009. Finding the required information in such a huge pool is a Herculean task, and the problem is compounded by the divergent formats of web pages. Retrieving information from digital collections across the web is therefore difficult. A web page is a source of information, but it becomes an information resource only if its content is organized so that it can be retrieved as and when required. The reusability of a digital collection is achieved through metadata. The present paper centers on the following aspects of metadata:
Metadata: the concept
Metadata extraction and harvesting
The process of metadata harvesting
Interoperability among schemas and principles of metadata interoperability
Metadata harmonization

Metadata - The Concept

Metadata has become a buzzword in the information society. It is equally important for writers, digital archivists, database developers and end users of electronic information. Metadata is simply information about information. The metadata of digital information can be compared with the card catalogue for printed materials in libraries: both are retrieval tools and act as surrogates for the information resource. Metadata is indispensable for searching, which is a process of matching the query terms with the terms embedded in the source content. Metadata improves this matching by standardizing the structure and content of cataloguing information. Metadata can be defined as a structured description of the essential attributes of an information object. In the NISO definition, metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use or manage an information resource (NISO, 2004).

1. Metadata Extraction and Harvesting

With the emergence of a number of metadata schemas, information retrieval from large digital libraries or repositories becomes problematic because each one adopts the metadata schema that suits it. This retrieval problem is partially handled by the technique of metadata harvesting. Metadata extraction and metadata harvesting are two different automatic methods.

1.1 Metadata extraction occurs when an algorithm automatically extracts metadata from a resource's content as displayed via a Web browser. Metadata used for object representation are extracted from the resource content to produce structured (labeled) metadata. These structured metadata are specific to Web resources and are extracted from the body section of an HTML or XHTML document.

1.2 Harvesting, the other automatic metadata generation method, occurs when metadata is automatically collected from META tags found in the header source code of an HTML resource, or from the encoding of another resource format (e.g., Microsoft WORD documents). The harvesting process depends on the metadata provided by the content developer (a human) or by fully or semi-automatic processes supported by web development software. For example, Web editing software (e.g., Dreamweaver and FrontPage) and selected document software (e.g., Microsoft WORD and Adobe) automatically produce metadata when a resource is created or updated. Metadata such as format, date of creation and revision date are handled by this software without human intervention. The software automatically converts the metadata to META tags (or another tag form, depending on the document format) and places them in the resource header. These metadata not only aid searching, but can also be harvested by a generator to create a structured metadata record.

1.3 Harvesting Process

At present, different metadata standards support different element sets for describing digital information, e.g. DC (Dublin Core), VRA (Visual Resources Association), CDWA (Categories for the Description of Works of Art), IEEE LOM (Learning Object Metadata), ONIX (Online Information Exchange), CSDGM (Content Standard for Digital Geospatial Metadata), IMS LRM (Learning Resource Meta-data), MARC, MPEG-21 DIDL (Digital Item Declaration Language), etc. These metadata standards were all devised to meet the requirements of particular user communities, types of materials, subject domains, project needs, etc., and their semantics and content are well accepted in their respective disciplines. Metadata harvesting is the process that collects metadata across these different metadata standards. OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is one widely used technique.

OAI-PMH data model - This model assumes a hierarchy of Resource, Item and Records for any digital object. The metadata used in an item may adopt any of the above standards, depending on the document type and the creator's intention.
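The header-based harvesting described in section 1.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular harvester: the class name `MetaTagHarvester`, the function `harvest_meta`, and the sample page with its `DC.title` and `DC.creator` tags and its `ExampleEditor` generator are all invented for the example. Only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser


class MetaTagHarvester(HTMLParser):
    """Collect name/content pairs from <meta> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content:
                # A name may repeat (several creators), so keep a list per name.
                self.record.setdefault(name.lower(), []).append(content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def harvest_meta(html_text):
    """Return a structured record built from the header META tags."""
    harvester = MetaTagHarvester()
    harvester.feed(html_text)
    return harvester.record


# Hypothetical page; the tag names and values are assumptions for this sketch.
SAMPLE_PAGE = """<html><head>
<title>Sample</title>
<meta name="DC.title" content="Metadata Interoperability and Harmonization">
<meta name="DC.creator" content="Dash, Samarendra">
<meta name="DC.creator" content="Sahoo, Jyotshna">
<meta name="generator" content="ExampleEditor 1.0">
</head><body>
<meta name="ignored" content="tags outside the header are skipped">
<p>Body text is the domain of extraction, not harvesting.</p>
</body></html>"""

record = harvest_meta(SAMPLE_PAGE)
for name, values in record.items():
    print(name, values)
```

The sketch deliberately ignores the document body, which mirrors the distinction drawn above: harvesting reads what the author or authoring software placed in the header, while extraction would analyse the body content.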
Figure 1 - OAI-PMH data model. [Diagram labels: Resource; Items (OAI-PMH identifier); Records (DC, MARC, MPEG-21 DIDL, other metadata standards); increasingly complex metadata schemas.]

OAI-PMH and other models are used to harvest metadata from information resources described with a gamut of metadata standards. Harvesting alone, however, does not solve the problems of large digital libraries or repositories that hold metadata records in diverse schemas. This is where metadata interoperability comes into the picture. The purpose of metadata interoperability and harmonization is to facilitate the conversion and exchange of metadata and to enable cross-domain metadata harvesting and federated searches.
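The harvesting step of OAI-PMH can be illustrated with a short Python sketch. It builds a `ListRecords` request URL and parses a response carrying unqualified Dublin Core (`oai_dc`) records. The base URL `http://repository.example.org/oai`, the identifier and the record content are hypothetical, and the sample response is a simplified fragment embedded in the code (a real response carries further elements such as the response date); no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and its Dublin Core format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Turn a ListRecords response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:  # a record may arrive without a metadata part
            for element in dc:
                # '{namespace}title' -> 'title'
                name = element.tag.rsplit("}", 1)[-1]
                fields.setdefault(name, []).append(element.text or "")
        records.append({"identifier": identifier, "metadata": fields})
    return records


# Simplified, made-up response fragment used only for this sketch.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2010-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://repository.example.org/oai")
harvested = parse_records(SAMPLE_RESPONSE)
print(request_url)
print(harvested)
```

Each harvested dictionary pairs the item identifier with one record in one metadata format, which corresponds to the Item and Record levels of the data model. Requesting the same item with a different `metadataPrefix` would yield a different record for the same identifier, provided the repository supports that format.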

Figure 2 - Schematic view of metadata harvesting using OAI-PMH. [Diagram labels: Web page collection; Metadata harvesting; Database load; Metadata repository; Create OAI server tables; OAI server; Queries; Services.]

Crosswalk, Interoperability & Harmonization

The following three methods are presently used to improve or fully achieve interoperability among metadata schemas:
Crosswalk
Interoperability
Harmonization

2. Crosswalk

A crosswalk provides a specification for mapping the metadata elements of one standard to those of another. Crosswalks, also known as mappings, are used to translate between different element sets: the elements of one metadata set are correlated with the elements of another set that have similar meanings. One-to-one mapping of elements between two metadata standards requires a clear and precise definition of the elements in each standard and of the relationships among them. The most common crosswalk method is direct mapping, that is, establishing equivalency between and among elements in different schemas. Equivalent fields or elements are mapped in order to allow conversion from one to the other. A number of crosswalks are available. A few of them are:
Dublin Core to EAD
Dublin Core to IEEE LOM
Dublin Core/MARC/GILS
EAD to ISAD(G)
FGDC to MARC
ISAD(G) to EAD
MARC to Dublin Core
ONIX to MARC 21
USMARC to EAD
USMARC to FGDC
VRA 3.0 to MARC
VRA 2.0/VRA 3.0/Dublin Core/EAD
and others

Because Dublin Core is the simplest of these metadata standards, a crosswalk works well for mapping MARC fields to Dublin Core elements. The reverse is also possible using another crosswalk: the MARC to Dublin Core crosswalk should be used to convert MARC into Dublin Core, and the Dublin Core to MARC crosswalk to map Dublin Core to MARC. Both crosswalks use two sets of mappings, one for the unqualified DC core elements and the other for qualified elements. The qualified mapping includes refinements of the original fifteen elements as well as syntax and vocabulary encoding schemes. In the MARC to Dublin Core crosswalk, many MARC fields are mapped into a single Dublin Core element.

Figure 3 - Dataflow of a crosswalk (MARC to DC). [Diagram: MARC input -> Convert to input structure -> Translate to DC -> Convert to output structure -> DC metadata output.]
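The three stages of this dataflow can be sketched in Python as follows. The mapping table `MARC_TO_DC` is an illustrative subset chosen for this example and is not the full published MARC to Dublin Core crosswalk. The simplified "tag value" input lines, the local field `999` and all function names are likewise assumptions. The sketch shows the many-to-one behaviour noted above: three MARC subject fields collapse into the single DC element `subject`.

```python
from xml.sax.saxutils import escape

# Illustrative subset only (assumption): several MARC tags share one DC element.
MARC_TO_DC = {
    "245": "title",
    "260": "publisher",
    "520": "description",
    "600": "subject",
    "650": "subject",
    "653": "subject",
}


def to_input_structure(marc_lines):
    """Stage 1: parse simplified 'TAG value' lines into (tag, value) pairs."""
    fields = []
    for line in marc_lines:
        tag, _, value = line.partition(" ")
        fields.append((tag, value.strip()))
    return fields


def translate_to_dc(fields):
    """Stage 2: apply the crosswalk; unmapped fields are dropped."""
    dc = {}
    for tag, value in fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no equivalent in this subset, so the value is lost
        dc.setdefault(element, []).append(value)
    return dc


def to_output_structure(dc):
    """Stage 3: serialise the translated elements as DC XML lines."""
    lines = []
    for element, values in dc.items():
        for value in values:
            lines.append(f"<dc:{element}>{escape(value)}</dc:{element}>")
    return "\n".join(lines)


# Made-up input record for the sketch.
marc_input = [
    "245 Metadata basics",
    "650 Metadata",
    "653 Interoperability",
    "999 Local note",
]

dc_record = translate_to_dc(to_input_structure(marc_input))
dc_output = to_output_structure(dc_record)
print(dc_output)
```

Note that the conversion is lossy in two ways: the unmapped field disappears, and once the subject fields are merged the output no longer records which MARC field each value came from. This is why a separate crosswalk is needed for the reverse direction.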

3. Metadata Interoperability

In its generic sense, interoperability is defined by NISO as follows: "Interoperability is the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality". In the context of metadata, interoperability can be defined as the ability of two or more systems or components to exchange descriptive data about digital resources, and to interpret the exchanged data in a way that is consistent with the interpretation of the creator of the data. Metadata interoperability can be attained if the participating systems are capable of exchanging data consistently. Duval, Hodgins, Sutton, and Weibel (2002) suggested four fundamental principles for metadata interoperability. These are:

3.1.1 Extensibility, or the ability to create structural additions to a metadata standard for application-specific or community-specific needs. Given the diversity of resources and information, extensibility is a critical feature of metadata standards and formats.

3.1.2 Modularity, or the ability to combine metadata fragments adhering to different standards. Modularity is stronger than simple extensibility in that it requires that metadata from different standards, including metadata extensions from different sources, be usable in combination without causing ambiguities or incompatibilities.

3.1.3 Refinements, or the ability to create semantic extensions, i.e., more fine-grained descriptions that are compatible with more coarse-grained metadata, and to translate a fine-grained description into a more coarse-grained description.

3.1.4 Multilingualism, or the ability to express, process and display metadata in a number of different linguistic and cultural circumstances. One important aspect of this is the ability to distinguish between what needs to be human-readable and what needs to be machine-processable.

Nilsson et al. (2006) suggested a fifth principle:

3.1.5 Machine-processability, or the ability to automate processing of different aspects of the metadata specifications, so that machines can handle extensions, manage modules, understand refinements and provide support for multilingualism.

3.2 Metadata interoperability can be attained at different levels:
1. Schema level: At this level the elements of the schemas are considered. Interoperability at this level results in derived element sets or encoded schemas, crosswalks, application profiles, and element registries.
2. Record level: The metadata records are integrated by mapping the elements according to their semantic meanings. This results in converted records and in new records that combine values of existing records.
3. Repository level: At this level the focus is on mapping the value strings associated with particular elements (e.g., terms associated with subject or format elements) in harvested or integrated records from varying sources.

4. Metadata Harmonization

Harmonization is the process of enabling consistency across several standards. Metadata harmonization is the ability of two or more systems to exchange combined metadata consistently across many standards. It makes it possible to create and maintain only one set of metadata, which is consistent with the intentions of the creators of the metadata and can also be mapped to any number of related metadata standards. Metadata harmonization thus refers to the ability to correctly process several different metadata standards in combination within a single software system. Two different approaches can be adopted to improve metadata harmonization, namely vertical and horizontal harmonization:
Vertical harmonization - Pre-coordinated harmonization on different levels within a given set of standards, based on pre-coordination of a base standard.
Horizontal harmonization - Post-coordinated harmonization based on interoperability between independent standards.

4.1 Metadata harmonization of sources encoded with different metadata standards can be considered with respect to the following three specific functions:

4.1.1 Data transformation - Data may be transformed from one environment into another, or an information source may be migrated to another standard. Transformation typically attempts an optimal interpretation of the source in the terms of the target.

4.1.2 Data merging - Information elements from different sources may be combined into new information units that allow for more reasoning. Before doing so, the relevant parts of the source data structures need to be merged.

4.1.3 Query mediation - Instead of the data, the outgoing requests are transformed. The answer sets nevertheless typically require a transformation for post-processing. The access engine contains a global schema, similar to that of data merging (4.1.2), which is not necessarily used to actually store data.
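Query mediation can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the two in-memory sources `dc_repo` and `marc_repo`, their records, the global fields `title` and `author`, and the function `mediate`. The global schema exists only as a mapping and stores no data. Each outgoing request is rewritten into the local field names of a source, and each answer is rewritten back into global terms.

```python
# Global schema field -> local field name, per source (hypothetical mappings).
GLOBAL_TO_LOCAL = {
    "dc_repo": {"title": "title", "author": "creator"},
    "marc_repo": {"title": "245", "author": "100"},
}

# Made-up records, each kept in the native field names of its source.
SOURCES = {
    "dc_repo": [
        {"title": "Metadata basics", "creator": "Example, A."},
    ],
    "marc_repo": [
        {"245": "Metadata harvesting", "100": "Sample, B."},
        {"245": "Cataloguing rules", "100": "Example, A."},
    ],
}


def mediate(global_field, term):
    """Search every source for `term` in the field named by the global schema."""
    results = []
    for source, mapping in GLOBAL_TO_LOCAL.items():
        # Transform the outgoing request into the source's own vocabulary.
        local_field = mapping[global_field]
        inverse = {local: glob for glob, local in mapping.items()}
        for record in SOURCES[source]:
            if term.lower() in record.get(local_field, "").lower():
                # Transform the answer back into the global schema.
                answer = {inverse[k]: v for k, v in record.items() if k in inverse}
                answer["source"] = source
                results.append(answer)
    return results


title_hits = mediate("title", "metadata")
author_hits = mediate("author", "sample")
print(title_hits)
print(author_hits)
```

The source records are never converted or copied in advance, which is the difference from data transformation (4.1.1) and data merging (4.1.2): only the request and the answer set pass through the mapping.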

CONCLUSION: The adoption of a multitude of metadata standards in digital libraries, subject-specific depositories and other digital repositories for describing digital resources has caused a serious problem for the retrieval of digital information. A number of processes, such as crosswalks and interoperability, have been formulated, yet metadata harmonization is still, to a large extent, only a vision, and metadata standards still live in relative isolation from each other. To achieve complete harmonization among metadata schemas, there is a need for a radical restructuring of metadata standards, modularization of metadata vocabularies, and formalization of abstract frameworks. RDF (Resource Description Framework) Schema, SKOS (Simple Knowledge Organization System) and the Semantic Web provide an inspiring approach to metadata modeling. Moreover, projects like the Joint DCMI/IEEE LTSC Taskforce and RDA (Resource Description and Access) demonstrate important progress towards harmonization of several important metadata standards, but it remains to be seen whether that will be adopted as a basis for a wide variety of web-oriented metadata.

References:
http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/ Accessed on 30/11/2010
Duval, E., Hodgins, W., Sutton, S. & Weibel, S. L. (2002), Metadata Principles and Practicalities, D-Lib Magazine, April 2002. http://www.dlib.org/dlib/april02/weibel/04weibel.html Accessed on 06/12/2010
Nelson, Michael L. et al. Efficient, Automatic Web Resource Harvesting. http://Citeseerx.ist.psu.edu Accessed on 03/12/2010
Nilsson, M., Johnston, P., Naeve, A., Powell, A. (2006), The Future of Learning Object Metadata Interoperability, in Koohang A. (ed.) Learning Objects: Standards, Metadata, Repositories, and LCMS. http://kmr.nada.kth.se/papers/SemanticWeb/FutureOfLOMI.pdf Accessed on 06/12/2010
NISO. Understanding Metadata. Available at http://www.niso.org/standards/resources/UnderstandingMetadata.pdf Accessed on 01/12/2010
Sivaraman, G. et al. Metadata Standard Harvesting. (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 1 (3), 2010, pp. 158-162. Accessed on 05/12/2010
http://www.wikipedia.org
http://www.dublincore.org/specifications/
