
Metadata for your Digital Resource


Metadata for your Digital Resource
Introduction
What do we mean by Documentation?
Why bother with Documentation?
Documenting your project
What do we mean by contextual documentation?
What exactly is Metadata?
What does metadata do?
How is metadata different from other kinds of documentation?
Why is metadata important?
What different kinds of metadata are there?
1. Resource discovery metadata
2. Preservation metadata
3. Metadata at different levels
What are Metadata Standards?
Can't I just make it up myself?
How will I know which standard to choose?
What kinds of standards are out there?
Don't all these standards just lead to confusion?
What is Dublin Core?
What about Controlled Vocabularies and Thesauri?
Further Issues to Consider
So why do we need Preservation Metadata as well?
What purpose does Preservation Metadata serve?
Preservation Metadata Initiatives
What does Preservation Metadata look like?
This sounds complicated!
The Future
Summary
What steps can I take to ensure that the metadata I provide is of a high standard?
Links and Bibliography
Content written on: 8th June 2004 by Iain Wallace and Eileen Maitland.
Content updated: 8th June 2004 by Iain Wallace and Eileen Maitland.
Introduction
Documentation is a crucial part of any digitisation project. Careful recording of all aspects of
a digital collection and the circumstances surrounding its creation can make the difference between a
resource which has limited value beyond the context in which it was originally created, and one whose
value extends far beyond this context and may be used extensively by the academic community in
perpetuity.
This paper will discuss different forms of documentation, from unstructured information to
resource discovery and preservation metadata. The paper should enable anyone embarking on a
digitisation project to make informed choices about how to successfully document their digital resources.
Readers of this Information Paper may also be interested in the article on Guidelines for
Documenting Data (http://www.ahds.ac.uk/creating/information-papers/documentation/index.htm), which
discusses less structured forms of documentation at greater length.
What do we mean by Documentation?
At its most general, a resource's documentation should outline the reasons for and
circumstances surrounding its creation. As an absolute minimum, the document should provide details of
the resource's provenance, contents, structure, and the terms and conditions that apply to its use.
Why bother with Documentation?
As creator of the resource, it is natural that your main concern will be with its primary data,
and the value this offers to the academic community. However, in order to maximise that value, both now
and in the future, it is just as important to take pains to provide adequate accompanying information.
Anyone with relevant computer skills but no knowledge of your resource should be able to find it, and
then exploit it fully and effectively. In this way, the potential for its future re-use within other contexts and
for an audience well beyond its initial target community is significantly enhanced. Creating this
documentation should be as integral a part of your project as the research on which it is focused - not an
add-on or afterthought when the main body of work has been completed.
Documenting your project
The digital resources you create should be described using a structured metadata schema.
An account of how to approach metadata creation follows later in this paper. However, most research
projects require or produce some documentation which does not fit within the conventions of a metadata
schema. This type of information may be far less structured but forms, nevertheless, a valuable part of
the deposit, and can enhance its future use significantly. The vital role played by informal contextual
documentation might not immediately appear to merit the priority or consideration given to other areas of
the project. However, the absence of such material can in itself render the resource meaningless.
What do we mean by contextual documentation?
Contextual documentation can include any unstructured material which does not comprise
part of the resource itself, but which supports and enhances its use. For example, documentation relating
to the provenance of a data collection should include how, why, when and by whom the data collection
was created and used. This type of information would include: the aims and objectives of the research,
the funding arrangements supporting it, its scope and subject matter, related research, strengths and
weaknesses, methodologies chosen, rights associated with material used, keys, codes, guides,
glossaries, abbreviations, encryption schemes etc necessary in order to understand the resource(s)
created.
A data collection's intellectual context should be documented thoroughly enough to enable
someone who has not been involved in the project to understand the intellectual framework in which it
was created.
Information about contents, structure and terms and conditions can generally be recorded in
a much more structured way using metadata.
What exactly is Metadata?
In terms of a traditional library environment, metadata would be described as cataloguing
information. Metadata is a more recent term, which relates more specifically to digital resources. It is
information relating to and describing other information: data about data, and has been described as "the
sum total of what one can say about any information object at any level of aggregation." (Baca, ed., 2000)
What does metadata do?
Metadata summarises not only the content of a resource, but a whole range of factors
associated with its creation, content, context and structure. The metadata can be separate from the
resource it describes, or it can actually be embedded within it. This was the case even before the advent
of digital resources, as we can see in things like the CIP (Cataloguing in Publication) data on
the back of a book's title page, or the information printed on the label of a vinyl record. Even right-clicking
on a digital image on a website will immediately provide you with a moderate amount of metadata, telling
you what type of image it is (e.g. jpeg, gif etc), its web address, its size (no. of bytes), dimension (in
pixels), and dates of creation and modification.
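As a rough illustration of how much metadata a file already carries, the sketch below reads a few of the properties mentioned above using only the Python standard library. The function name and the dictionary keys are assumptions made for this example; pixel dimensions are left out because reading them would need an image library.

```python
import mimetypes
import os
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return a small dictionary of metadata the file system already holds."""
    info = os.stat(path)
    # The type is guessed from the file extension only, not from the content.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "type": mime_type or "unknown",
        "size_bytes": info.st_size,
        # Modification time as an ISO 8601 timestamp in UTC.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Calling `basic_file_metadata("photo.jpg")` on an existing file returns its name, guessed media type, size in bytes and date of last modification.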
How is metadata different from other kinds of documentation?
Metadata is distinguishable from other forms of documentation in that it is structured and
exhibits consistency. Standardisation in the way that metadata is created means that information about
resources can be presented in a meaningful and consistent way, and is crucial for effective resource
discovery. It is also vital in facilitating interoperability, which allows integrated access to and searching of
a wide range of resources across different systems. Standardised structures for organising and
presenting metadata are known as schemas. Dublin Core (described later in this document) is one such
schema, which comprises 15 key elements.
Why is metadata important?
Metadata enriches the resource it describes by extending the user's understanding of its
content and the factors surrounding its creation. It places the resource in context and provides a
background to it, thus enhancing the user's appreciation and understanding of it.
Metadata also extends the usefulness of the resource to the wider research community by
facilitating access to it beyond the confines of the individual project or institution in which it was created.
Metadata enhances the value of the resource to researchers by enabling them to locate it,
and by allowing them to make informed decisions as to whether or not a particular resource is relevant to
their purposes.
Metadata allows the resource to be managed effectively by the party responsible for it.
Recording the resource's physical features and inherent qualities, together with the events and
activities to which it has been subjected since its creation, significantly increases its viability as a
useful and usable resource over the longer term.
What different kinds of metadata are there?
1. Resource discovery metadata
This type of metadata is primarily related to the content of the resource, and describes the
resource in such a way as to allow it to be located in a search, and differentiated from other, similar
resources.
2. Preservation metadata
Preservation metadata can be broadly divided into the categories "technical" and
"administrative", and basically comprises any information essential to continued use of the resource.
Technical metadata documents the resource's history, such as the processes involved in its creation (e.g.
file formats, date of digitisation etc.) or any manipulation it has undergone (e.g. colour adjustments to an
image). Administrative metadata includes anything related to its management, delivery or distribution,
such as rights information.
3. Metadata at different levels
Metadata can describe resources at different levels of aggregation. One record can refer to a
whole collection, or could be confined to a single item.
What are Metadata Standards?
Metadata standards are designed to impose structure and consistency on the way metadata
is recorded. This consistency ensures accuracy and reliability in information retrieval and allows users to
cross-search different disciplines, collections and domains by promoting interoperability. Whatever
approach is adopted in terms of developing metadata, it is crucial to use established standards as part of
the process.
A metadata standard is a specification that outlines a set of fields or elements, each of which
is designed to contain information on a particular aspect of the resource.
The standard defines a meaning for each element, and guidelines as to its application.
Can't I just make it up myself?
Intimate knowledge of your resource is no guarantee that your description of it will make
sense and be meaningful and accessible to the wider world. In order to do that, you must be, in a sense,
describing it in the same terms as others are describing theirs. It is acceptable to use your own set of
fields for recording data about your digital resources, as long as your in-house schema can be mapped to
existing metadata standards.
However, inconsistency and a lack of precision in description and data entry can lead to
resources being missed in searches (or appearing as irrelevant in a list of results, which is just as bad),
and will not enhance their value to the research community - possibly the reverse.
To ensure consistency within descriptive elements, from the way in which a date is
expressed, to the use of corporate and personal names, established standards should be adhered to
where possible. Even something as simple as spelling can cause problems - a carelessly misspelled listing
on eBay is missed by most searches, which is why there is always a good chance of finding a bargain
there! For example, try searching for "plam pilots".
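One way to enforce consistency in how a date is expressed is to normalise every date to a single form at data entry. The following is a minimal sketch, assuming that dates arrive in one of a handful of known patterns and that slashed dates are day-first (UK usage); the list of patterns and the function name are illustrative only.

```python
from datetime import datetime

# Input patterns this sketch accepts; the list is illustrative, not exhaustive.
# Slashed dates are assumed to be day/month/year.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y"]


def to_iso_date(text):
    """Return the date in yyyy-mm-dd form, or raise ValueError if unrecognised."""
    cleaned = text.strip()
    for pattern in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, pattern).date().isoformat()
        except ValueError:
            continue  # try the next pattern
    raise ValueError(f"Unrecognised date: {text!r}")
```

Rejecting anything unrecognised, rather than guessing, keeps doubtful values out of the records so that they can be checked by hand.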
How will I know which standard to choose?
There is no such thing as a "one size fits all" standard, and the following considerations will
all influence the selection of the most appropriate standard.
Aggregation (will the metadata describe collections/groups of resources or individual items?)
Granularity (will the metadata provide considerable detail or is a broader approach more
appropriate/all that is manageable within resources available to the project?)
Context (do the resources fall into a very specialised subject grouping or are they part of a
larger collection which covers a number of disciplines?)
Concept (what is the collection for? What kind of metadata will best represent the resource
to its users now and also in the future?)
What kinds of standards are out there?
Different authoritative bodies have developed many different Metadata Standards. Some of
these are associated with different levels of aggregation, while others relate to material in particular
subjects. These are a few examples:
EAD (Encoded Archival Description) was developed as a means of marking up the data
contained in a finding aid so that it can be structured, displayed and searched online. Basic
finding aids include guides, inventories, card catalogues, checklists, shelflists, and indexes. In
general, finding aids are highly structured and hierarchical, and relate to a group of materials.
The VRA (Visual Resources Association) metadata element set is for the description of visual
materials. These might be paintings, buildings or sculpture, but in terms of a repository of
information are more likely to be surrogates of those originals such as photographs and slides.
SPECTRUM - The UK Museum Documentation Standard
The RSLP Collection Description schema is a structured set of metadata attributes, for describing
collections in a consistent and machine-readable way.
MARC - Machine-Readable Cataloguing - a standard primarily used for library catalogue data
The TEI (Text Encoding Initiative) is a set of tags and rules defined in XML, which describe the
structure and elements of a type of document. TEI is designed for marking up electronic texts
such as novels, plays and poetry.
Don't all these standards just lead to confusion?
Inevitably, with so many different systems and schemas in place, it is often necessary to be
able to "translate" or "map" the elements from one system to another. Such mapping systems, which
allow metadata created by one community to be used by a group using a different metadata standard are
known as crosswalks. The success of any such mapping arrangement depends on the similarity between
the two schemes, the granularity of elements in the target scheme compared to that in the source, and
the compatibility of the rules on content within each element.
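A crosswalk can be pictured as a simple lookup table from one scheme's element names to another's. The sketch below assumes a hypothetical in-house schema for photographs and maps it to Dublin Core element names; the in-house field names are invented for illustration.

```python
# Hypothetical in-house field names mapped to Dublin Core element names.
CROSSWALK = {
    "photographer": "creator",
    "caption": "description",
    "date_taken": "date",
    "catalogue_no": "identifier",
    "copyright_holder": "rights",
}


def apply_crosswalk(record, crosswalk):
    """Translate a record's field names; return (mapped, unmapped) dictionaries."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value  # no equivalent in the target scheme
        else:
            # Several source fields may share one target, so collect a list.
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped
```

Returning the unmapped fields separately makes visible what is lost when the target scheme is less granular than the source.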
What is Dublin Core?
Dublin Core began as an initiative to improve discovery of digital resources, primarily on the
Web. Dublin Core is an international protocol for resource discovery, which can encompass both digital
and non-digital resource formats, and was designed to be used by individuals who do not necessarily
have any kind of background in information management. It was not designed for complex resource
description: its simplicity is intended to facilitate effective retrieval in a networked environment of the
resources it describes, and to accommodate researchers across a range of disciplines, whose
perspectives on the resources they create and require access to might be widely different.
Effectively, what Dublin Core offers is a compromise whose beauty lies in its simplicity and
breadth of application. In short, Dublin Core can be regarded as metadata's lowest common denominator.
Its 15 elements are as follows:
1. Title - Name of resource
2. Creator - Party responsible for content of resource
3. Subject - What the resource is about - usually expressed as keywords/phrases/classification
codes; should be drawn from authority list/formal classification scheme
4. Description - Account of content of resource; could be list of contents/abstract/free text
description
5. Publisher - Person(s)/institution(s) responsible for making the resource available
6. Contributor - Person(s)/institution(s) responsible for making contributions to the resource
7. Date - A date associated with the life cycle of the resource; very often the date it was created or
made available. Best expressed in yyyy-mm-dd format, as defined in ISO 8601 [W3CDTF]
8. Type - Nature or genre of the content of the resource
9. Format - Physical or digital manifestation of the resource - includes information re associated
software or hardware. Best practice - select a value from a controlled vocabulary such as the list
of MIME types defining computer media formats.
10. Identifier - a unique reference to the resource. Examples = URI (Uniform Resource Identifier) -
including URL, Digital Object Identifier (DOI) & ISBN
11. Source - Reference to a resource from which the present resource is derived. Best practice is to
use a string or number conforming to a formal identification system.
12. Language - Language of intellectual content of resource. Best practice is to use RFC 3066,
which, in conjunction with ISO 639, defines 2- and 3-letter primary language tags with optional
subtags. Examples = "en" or "eng" for English, "mar" for Marathi, and "en-GB" for English used in the
UK.
13. Relation - Reference to a related resource. Best practice is to use a string or number conforming
to a formal identification system.
14. Coverage - The extent or scope of the content of a resource. This might be spatial (a place
name/geographic coordinates), temporal (period label, date or date range) or jurisdiction (e.g. a
named administrative entity). Best practice is to select a value from a controlled vocabulary such
as the TGN (Thesaurus of Geographic Names) and, where appropriate, to use named places or time
periods in preference to numeric identifiers such as coordinates or date ranges.
15. Rights - Information about rights held in or over a resource. This could be a rights management
statement, or reference to a service providing this information. Rights information often
encompasses Intellectual Property Rights (IPR), copyright and various property rights. If this
element is absent, no assumptions should be made.
Here is an example of a record describing a photograph, using Dublin Core:
Creator: Donald Cooper
    Role=Photographer
Subject: Shakespeare, William, 1564-1616, Antony and Cleopatra [LC]
Description: Vanessa Redgrave as Cleopatra
Date: 1973-08-09
Type: Image
Format: JPEG
Identifier: 4150 [catalogue no]
Source: negative no 235
Relation: Antony and Cleopatra: Thompson/73-8
    IsPartOf
Coverage: Bankside Globe
    Role=Spatial
Rights: Donald Cooper
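To show how such a record might be made machine-readable, the sketch below writes the example record above as XML, using the Dublin Core element namespace. The wrapper element name `record` and the function name are assumptions made for this example, and the qualifiers (Role, IsPartOf) are left out for simplicity.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The example record from the text, as (element, value) pairs.
RECORD = [
    ("creator", "Donald Cooper"),
    ("subject", "Shakespeare, William, 1564-1616, Antony and Cleopatra [LC]"),
    ("description", "Vanessa Redgrave as Cleopatra"),
    ("date", "1973-08-09"),
    ("type", "Image"),
    ("format", "JPEG"),
    ("identifier", "4150 [catalogue no]"),
    ("source", "negative no 235"),
    ("relation", "Antony and Cleopatra: Thompson/73-8"),
    ("coverage", "Bankside Globe"),
    ("rights", "Donald Cooper"),
]


def to_xml(pairs):
    """Serialise (element, value) pairs as Dublin Core elements in one XML string."""
    root = ET.Element("record")  # wrapper name is an assumption, not part of Dublin Core
    for element, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

A list of pairs is used rather than a dictionary because Dublin Core elements are repeatable: a record may carry several Subject or Creator values.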
What about Controlled Vocabularies and Thesauri?
Controlled vocabulary lists and thesauri offer consistency in terminology for use in elements
like Subject (where the metadata creator wants to indicate what the resource is about). The more
consistency that can be applied to this procedure, the more fruitful searches will be, both within one set of
metadata records and across records held by different organisations. If multiple organisations describe
their collections consistently by using terms from a controlled list, the common approach will reap great
benefits during searches.
For example, if a group of resources on the history of theatre in India all attach the Subject
Heading 'Theatre History, India' (which comes from Library of Congress Subject Headings), as opposed
to making up their own headings (e.g. Indian Theatre History), individual records will not slip through the
net during a search. It is important also to state which list your term is selected from (e.g. by putting [LC]
in brackets after the heading).
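The check described here can be automated at data entry. The following is a minimal sketch, assuming a tiny hard-coded stand-in for a controlled list; a real project would load the published vocabulary instead, and the function name is hypothetical.

```python
# A tiny stand-in for a controlled list, holding only the heading used in the
# example above. A real project would load the published vocabulary.
CONTROLLED_TERMS = {"Theatre History, India"}


def check_subject(heading, scheme="LC"):
    """Return the heading tagged with its source list, or raise if it is not listed."""
    term = " ".join(heading.split())  # collapse stray whitespace
    if term not in CONTROLLED_TERMS:
        raise ValueError(f"{term!r} is not in the controlled list")
    return f"{term} [{scheme}]"
```

With this list, `check_subject("Theatre History, India")` returns the heading with `[LC]` appended, while a home-made heading such as "Indian Theatre History" is rejected.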
There are also controlled lists for terms within particular disciplines. These are produced by
authoritative bodies, and are often available online. The National Monuments Record (NMR) Thesauri,
the Humanities and Social Sciences Electronic Thesaurus (HASSET), the Art & Architecture Thesaurus
(AAT), the Union List of Artists Names (ULAN) and the Thesaurus of Geographic Names (TGN) are all examples
of these. The last three were all developed at the Getty Research Institute in California, which promotes
innovative scholarship in the arts and humanities.
In addition to these, there are also established standards for expressing elements like Date,
Type and Language, such as ISO 639 for language abbreviations and RFC 2045 and 2046 for Media
(MIME) types.
Further Issues to Consider
Each project will have different metadata requirements. The schema employed should be
tailored to the characteristics particular to the project's digital resources and can be mapped later on to
standards such as Dublin Core for the purposes of interoperability. These standards should be seen less
as "off-the-shelf" commodities and more as reference points to which bespoke schemas can map.
Metadata must be "fit for purpose" - think carefully about the level of complexity required to describe your
resource. The content and number of fields to be used in a specific record may vary according to the
requirements of a particular collection and the nature of the individual digital object. This important
flexibility allows the time and attention given to cataloguing to vary according to the size, significance and
location of the digital resource being described.
There are now quite a number of software tools (many of them online) available to aid the
process of metadata creation. DC.dot (a web-based tool for creating Dublin Core tags for networked
resources), The Nordic Metadata Project DC Template, and the RSLP Collection Level Description
metadata generator are all examples of these.
It is very important to have consistency and authority in your metadata records. The most
important aspect of metadata standardisation is not that all records must contain the same fields, but that
where the same field exists in records belonging to different collections then it should be used for the
same purpose with the same standards.
So why do we need Preservation Metadata as well?
There are few environments that change as rapidly as the digital environment. The speed
with which technical innovations emerge and processes change is such that within a very few years
hardware and software on which valuable data has been stored can be rendered obsolete. Fortunately,
solutions to meet these challenges are constantly being developed and updated. Nevertheless,
whatever the particular circumstances, it is never enough simply to have the resource to hand. If you do
not also have the technical and administrative information which provides the background to its creation,
delivery, operation and administration, you cannot be said to have full access to it, and it cannot be said
to have been preserved. Metadata which informs the user of the technical context for the resource (its file
format, file size, associated software, version etc), and other information (e.g. copyright information)
crucial to its ongoing management are as integral a part of its preservation as its physical security.
Without this information, some or all of the following vital questions, and others, will remain unanswered:
what is the resource?
how can it be used?
how has it been changed?
who has been involved in its creation/alteration?
What purpose does Preservation Metadata serve?
Five key functions of preservation metadata identified by the National Library of Australia are
as follows:
To store technical information that supports preservation decisions and action
To document preservation action taken (e.g. migration or emulation)
To record effects of preservation strategies
To ensure authenticity of digital resources over time (e.g. by using digital signatures)
To note information on Collection and Rights Management
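As one illustration of the authenticity function listed above, the sketch below records a checksum for a file and later verifies that the file has not changed. A checksum is a simpler technique than the digital signatures mentioned in the list: it detects alteration but does not identify who produced the file. The function names are assumptions made for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Compare the file's current digest with the one stored in its metadata."""
    return sha256_of_file(path) == recorded_digest
```

The digest would be stored in the preservation metadata when the resource is deposited and checked again after each migration or transfer.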
Preservation Metadata Initiatives
As digital resources become an ever more dominant method of recording and disseminating
information in academia and beyond, the management of preservation metadata is inevitably an area of
increasing concern to those responsible for such resources. There are various initiatives attempting to
develop a framework for preservation metadata, including OCLC/RLG, CEDARS (Leeds University),
PADI (National Library of Australia) and NEDLIB (based in the Netherlands).
There is as yet no agreed standard, but the general principles governing discussions within
these groups do seem to be moving in the same direction, and are in some cases already converging.
The "CEDARS Guide to Preservation Metadata", for example, is closely based on the OCLC Preservation
Metadata Framework, which is very much at the forefront of developments in Preservation Metadata.
What does Preservation Metadata look like?
This depends on the resource to which it relates. Technical metadata relating to an image,
for example, might record features such as its format (e.g. JPEG), bit depth (e.g. 24-bit), and
colourspace (e.g. RGB, CMYK, etc.).
A record for an audio file would record the bit rate, number of channels, sample rate, etc., and
one for a video would provide details of its resolution, the format of the video content (the codec used,
e.g. DivX) and the format of the video sound (e.g. .wma, .mov, .ra, etc.).
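A technical metadata record of this kind can be modelled as a small fixed structure. The sketch below does so for an image, with field names chosen for this example rather than taken from any preservation metadata standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ImageTechnicalMetadata:
    """Technical properties of one image file (field names are illustrative)."""
    file_format: str    # e.g. "JPEG"
    bit_depth: int      # e.g. 24
    colour_space: str   # e.g. "RGB" or "CMYK"


def to_json(record):
    """Serialise the record with sorted keys so that the output is stable."""
    return json.dumps(asdict(record), sort_keys=True)
```

A record for an audio or video file would follow the same pattern with its own fields, such as bit rate, number of channels and sample rate.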
This sounds complicated!
But the good news is that the AHDS can undertake some of this preservation
documentation work for you. On receipt of a collection the AHDS will add metadata relating to the future
handling of the resource. When such information is added to the preservation metadata developed by a
resource creator (recording details such as the original technical specifications of the resource and
information on collection and rights management), a significant body of data has been created to help
ensure the long-term maintenance of the resource.
The Future
The emergence of the Semantic Web, which changes the way information is created and
organised, makes metadata creation a more crucial activity than ever before.
"The Semantic Web is an extension of the current web in which information is given
well-defined meaning, better enabling computers and people to work in cooperation." (Berners-Lee et al,
2001)
The Semantic Web provides a common framework that allows data to be shared and reused
across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with
participation from a large number of researchers and industrial partners. It is based on the Resource
Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs
for naming. In this way metadata can be easily re-used in a number of different ways by intelligent agents
and other programs, as well as human users.
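In RDF, each metadata statement is a triple: the resource (named by a URI), a property, and a value. The sketch below writes Dublin Core statements as triples in N-Triples, a plain-text RDF syntax used here instead of the XML syntax mentioned above because it is simpler to show. The subject URI under example.org and the function names are hypothetical.

```python
# Namespace of the fifteen Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri, pairs):
    """Return one N-Triples line per (element, value) pair."""
    lines = []
    for element, value in pairs:
        lines.append(f'<{subject_uri}> <{DC_NS}{element}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Usage with a hypothetical URI for the photograph.
triples = to_ntriples(
    "http://example.org/photos/4150",
    [("creator", "Donald Cooper"), ("date", "1973-08-09")],
)
```

Because every property is itself named by a URI, a program that has never seen this collection can still recognise `creator` and `date` as the Dublin Core elements.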
Developments in metadata schemas continue to expand rapidly, as the effort to keep pace
with and manage the explosion of digital materials afforded by new technology continues. Every day new
groups of users and creators emerge, with their own unique perspectives and requirements - and
consequently new schemas are developed or existing ones adapted to meet these.
Summary
The creation of good quality metadata (to be embedded within a digital resource or to
accompany it to the repository) should always go hand in hand with that of the object(s) it describes. This,
more than any other activity in the digitisation process, will add value to a digital object as a resource for
teaching, learning and research, aid its promotion throughout the educational community, and increase its
longevity. Metadata is the key to ensuring that your resource is meaningful and accessible both now and
in the future, and should be a central consideration from the outset by anyone undertaking a digitisation
project, with its development costed in as an integral part of the project plan.
What steps can I take to ensure that the metadata I provide is of a high standard?
Contact the AHDS as early as possible in a project that will result in a deposit. We can
provide advice and guidance on creating and submitting metadata which will maximise the usefulness
and viability of the material it describes.
Links and Bibliography
Murtha Baca (ed.), 2000, Introduction to Metadata: Pathways to Digital Information
Encoded Archival Description (EAD)
VRA Core Categories
Spectrum
RSLP Collection Description
MARC
Text Encoding Initiative (TEI)
Crosswalks : mapping between schemas
Dublin Core
ISO 8601
RFC 3066
ISO 639
Library of Congress
Art & Architecture Thesaurus (AAT)
National Monuments Record Thesauri
Thesaurus of Geographic Names (TGN)
Humanities & Social Science Electronic Thesaurus (HASSET)
Union List of Artists Names (ULAN)
MIME types
DC.dot
Nordic Metadata Project DC Template
OCLC/RLG
PADI (National Library of Australia)
CEDARS (Leeds University)
NEDLIB
Guidelines for Documenting Data
Tim Berners-Lee, James Hendler, Ora Lassila, 2001, The Semantic Web, Scientific American
Gail Hodge, Metadata made simpler
Daniel Gelaw Alemneh, Samantha Kelly Hastings, and Cathy Nelson Hartman, "A Metadata approach to the preservation of digital resources", First Monday
Cory Doctorow, Metacrap: Putting the torch to seven straw men of the meta-utopia
