
Theme 9

INL 120 CREATION OF METADATA: ENCODING

2018/09/17
Learning outcomes
After completing this theme, you need to be able to:
• Explain how and why we do encoding.
• Discuss the aims and advantages of encoding.
• Discuss the four types of encoding standards.
• Discuss and provide examples of the following concepts:
RDF, Semantic Web, IBA, ISA, Ontologies and OWL.

9.1 How and why do we encode surrogate
records for machine manipulation?
Encoding = providing the syntax of the metadata
(Syntax = the grammatical arrangement of words in a
language)

Metadata content and encoding for the content are entwined:
• Can choose to create metadata records by first
determining the descriptive content and then encoding
the content
• Or, can start with the shell comprising the codes, and then
fill in the contents of each field
9.2 Why encoding standards?
Surrogate records must be encoded for machine manipulation if
they are to be placed into an online database because:
• fundamentally computers deal with numbers
• they store characters by assigning a code to
each one
therefore each letter of the alphabet, each numeral, each
symbol in every language has to be represented by a code;
01001000
01000101
01001100
010011

Figure panels: What you see on the screen. | 8-bit ASCII binary code. | What is stored and manipulated inside the computer.

Binary     Letter   Binary     Letter   Binary     Letter
0100 0001  A        0100 1010  J        0101 0011  S
0100 0010  B        0100 1011  K        0101 0100  T
0100 0011  C        0100 1100  L        0101 0101  U
0100 0100  D        0100 1101  M        0101 0110  V
0100 0101  E        0100 1110  N        0101 0111  W
0100 0110  F        0100 1111  O        0101 1000  X
0100 0111  G        0101 0000  P        0101 1001  Y
0100 1000  H        0101 0001  Q        0101 1010  Z
0100 1001  I        0101 0010  R
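
The mapping in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names are made up for illustration, and it assumes plain ASCII input.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return the 8-bit binary code for each character of an ASCII string."""
    # encode("ascii") raises UnicodeEncodeError if a character is not ASCII
    return [format(byte, "08b") for byte in text.encode("ascii")]


def from_ascii_bits(codes: list[str]) -> str:
    """Turn a list of 8-bit binary codes back into the characters they stand for."""
    return bytes(int(code, 2) for code in codes).decode("ascii")


codes = to_ascii_bits("HEL")
print(codes)                   # ['01001000', '01000101', '01001100']
print(from_ascii_bits(codes))  # HEL
```

The three codes printed match the rows for H, E and L in the table.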
9.3 Aims and advantages of encoding

1 Encoding enables the creation of searching programmes that allow the searching of certain fields.

2 Encoding is used for display.

3 Encoding allows for integration of many languages and scripts to be displayed and searched in the same file.

4 Encoding is used for data transmission.


9.3.1 Encoding enables the creation of searching
programmes that allow the searching of certain
fields.
• Surrogate records are encoded by assigning tags,
numbers, letters or words (i.e. codes) to discrete
(separate) pieces of information.

• e.g. In MARC coding the personal author name is given the tag 100

• In an XML mark-up language such as TEI it is preceded with <author> and followed with </author>
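
To illustrate why coded fields make field-specific searching possible, here is a small sketch in Python. The records are invented, the dictionary layout is a simplification rather than real MARC, and tag 245 (title) is added only to have a second field to contrast with tag 100. The XML snippet is likewise a simplified stand-in, not a complete TEI document.

```python
import xml.etree.ElementTree as ET

# Hypothetical surrogate records keyed by MARC-style tags
# (100 = personal author; 245 = title, added here for contrast).
records = [
    {"100": "Austen, Jane", "245": "Pride and prejudice"},
    {"100": "Dickens, Charles", "245": "Great expectations"},
]


def search_field(records, tag, term):
    """Return the records whose field `tag` contains `term` (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r.get(tag, "").lower()]


# Searching only the author field finds Austen; the same term in the title field finds nothing.
author_hits = search_field(records, "100", "jane")
title_hits = search_field(records, "245", "jane")

# The same idea with XML mark-up: the <author> element identifies the field.
xml_record = "<record><author>Austen, Jane</author><title>Pride and prejudice</title></record>"
author = ET.fromstring(xml_record).findtext("author")
```

Because the code (tag or element name) labels each piece of information, the programme can restrict a search to one field instead of scanning the whole record.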
9.3.2 Encoding is also used for display
As with searching in certain fields (above), computer
programmes can be written so that each field will
display in a certain position, e.g. author at the top or
after the title.

9.3.3 Encoding allows for integration of many
languages and scripts to be displayed and
searched in the same file

Records in scripts other than Roman, for example, do not need
to be “romanised”. Online, the coding allows for identification
of the fields in surrogate records regardless of the language
of the human doing the organising.
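
A small sketch of this point, under stated assumptions: the records are invented, the field codes reuse the MARC-style tags 100 (author) and 245 (title), and UTF-8 is assumed as the character encoding (the slides do not name one). The field codes stay the same whatever script the content is written in.

```python
# Hypothetical records in three scripts, all using the same field codes.
records = [
    {"100": "Толстой, Лев", "245": "Война и мир"},
    {"100": "夏目, 漱石", "245": "こころ"},
    {"100": "Tolstoy, Leo", "245": "War and peace"},
]


def search_field(records, tag, term):
    """Return the records whose field `tag` contains `term`."""
    return [r for r in records if term in r.get(tag, "")]


# All three scripts can be stored together in one file (here: one UTF-8 byte string).
stored = "\n".join(f"100={r['100']}|245={r['245']}" for r in records).encode("utf-8")
restored = stored.decode("utf-8").split("\n")

# A title search in Cyrillic works without romanising anything.
cyrillic_hits = search_field(records, "245", "мир")
```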
9.3.4 Encoding is used for data transmission

Allows for exchange of surrogate records between
institutions without having to worry about duplication.

Cooperative arrangements can be made to exchange
surrogate records between institutions so that each one
does not have to create every record from scratch.
9.4 Types of encoding standards
❖ 9.4.1 MARC (Machine-Readable Cataloguing)
o Standard for encoding library catalogue records since
1968.
o Used for transmitting data from one system to another.
o The MARC directory was a major contribution to the
information science world as it was the first to allow for
variable-length fields, which are now common in most
databases.
NB: The four (4) types of encoding standards
- You need to know what the acronyms stand for, and be able to
discuss each of the encoding standards
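
The directory idea mentioned under MARC can be sketched as follows. This is a simplified illustration, not a full MARC implementation: it keeps only the 12-character directory entries (3-character tag, 4-digit field length, 5-digit starting position) and the field terminator, and leaves out the leader, indicators and subfield codes of a real record. Lengths are counted in characters, which matches bytes only for ASCII data. The function names and sample fields are made up.

```python
FIELD_END = "\x1e"  # field terminator character


def build_record(fields):
    """Build (directory, data) from a list of (3-character tag, value) pairs."""
    directory, data, start = "", "", 0
    for tag, value in fields:
        body = value + FIELD_END
        # One directory entry: tag (3) + field length (4) + starting position (5)
        directory += f"{tag}{len(body):04d}{start:05d}"
        data += body
        start += len(body)
    return directory, data


def read_field(directory, data, wanted_tag):
    """Look the tag up in the directory and slice the field out of the data."""
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        if tag == wanted_tag:
            return data[start:start + length].rstrip(FIELD_END)
    return None


directory, data = build_record([("100", "Austen, Jane"), ("245", "Pride and prejudice")])
title = read_field(directory, data, "245")
```

Because the directory records where each field starts and how long it is, fields can be of any length and still be found without scanning the whole record.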
❖ 9.4.2 SGML (Standard Generalised Markup
Language)
o SGML is a meta-language; that is, a language for
describing mark-up languages. It is not itself a mark-up
language
o International standard for document mark-up, defined in
ISO 8879:1986.
❖ 9.4.3 XML (eXtensible Markup Language)
o XML is a subset of SGML
o Said to be just as easy to use as HTML, but as powerful
as SGML
o Incorporates techniques needed for multimedia files
such as ability to identify the format used to encode an
illustration.
❖ 9.4.4 HTML (HyperText Markup Language)
o HTML was developed to enable the creation of web
pages.
o Provides for creation of simple structure
o Enables display of images
o Enables establishing links between documents
9.5 Frameworks
• Each of the above encoding standards is a kind of shell or
container waiting for text to be inserted.
• This text can be suggested by the encoding standard
o or it may be controlled by another standard
• Bigger “containers” are called frameworks:
o hold more than one record and provide a means for
linking together metadata for different kinds of sources
e.g. metadata for documents linked with metadata for
persons responsible for them etc.
9.5.1 Warwick framework (named after the location of
the conference where it was framed)
• Described as Container Architecture – container for
containers
o pulling together packages of metadata that are related
to the same information package but that need to be
separately controlled e.g. AACR2/MARC record, Dublin
Core record, EAD finding aid etc. all describing the same
collection
o allowing for interchange of data among different
communities
o allowing for selective access to certain metadata records
while ignoring others
• This whole process eventually led to the next step in the
evolution of frameworks namely the RDF (Resource
Description Framework)
9.5.2 RDF (Resource Description Framework)
• RDF stands for Resource Description Framework
o is a framework for describing resources on the web
o is designed to be read and understood by computers
o is not designed for being displayed to people
o is commonly written in XML (RDF/XML is one of several ways to write RDF)
o is a part of the W3C's Semantic Web Activity
• The Resource Description Framework (RDF) is a W3C standard
for describing Web resources, such as the title, author,
modification date, content, and copyright information of a Web
page.
• It provides an infrastructure that enables the encoding,
exchange and re-use of metadata:
o in a way that is unambiguous (that does not have more
than one interpretation)
o so that machines can understand the semantics
(meaning) of the metadata and therefore can use it in
resource discovery.
• In other words, RDF was designed to provide a common way to
describe information so that it can be read and understood by
computer applications.
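
As a rough illustration of how RDF describes a resource, the sketch below stores statements as (subject, property, value) triples in plain Python. It does not use an RDF library; the page URL and the values are hypothetical, and only the Dublin Core element namespace is a real identifier. The output format imitates the simple line-based N-Triples notation and skips the escaping a real serializer would need.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
PAGE = "https://example.org/page.html"    # hypothetical web page being described

# Each statement says: this resource has this property with this value.
triples = {
    (PAGE, DC + "title", "Encoding metadata"),
    (PAGE, DC + "creator", "A. Author"),
    (PAGE, DC + "date", "2018-09-17"),
}


def objects(triples, subject, prop):
    """Return all values recorded for one property of one resource."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)


def to_ntriples(triples):
    """Write the statements one per line (simplified: no escaping of quotes)."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in sorted(triples))


print(objects(triples, PAGE, DC + "title"))
print(to_ntriples(triples))
```

Because both the resource and the property are identified by URLs, a programme reading these statements does not have to guess what "title" or "creator" means.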
Examples of use:
• Describing properties for shopping items, such as price and
availability
• Describing time schedules for web events
• Describing information about web pages (content,
author, created and modified date)
• Describing content and rating for web pictures
• Describing content for search engines
• Describing electronic libraries
9.6 The Semantic Web (Web 3.0)
Semantic = the meaning of…
Semantic Web = web with a meaning

• The RDF language is a part of the W3C's Semantic Web
Activity.
• The term “semantic web” was coined by the World Wide Web
Consortium director Tim Berners-Lee.
• Semantic Web is a group of methods and technologies to
allow machines to understand the meaning - or "semantics" -
of information on the WWW
• W3C's "Semantic Web Vision" is a future where:
o Web information has exact meaning
o Web information can be understood and processed
by computers
o Computers can integrate information from the Web
• Many of the technologies proposed by the W3C already exist
and are used in various projects.
• The Semantic Web as a global vision, however, has remained
largely unrealized and its critics have questioned the feasibility
of the approach.
Example of Semantic Web facility:

• Suppose a semantic web system was built to administer the
selling and buying of used cars over the Internet.
• The system would contain two main applications:
o One for people who wanted to buy a car
o One for people who wanted to put up a car for sale
• Let's call the Internet applications IBA (I Buy Application), and
ISA (I Sell Application).
IBA - The I Buy Application
People who want to buy a car could use an IBA application
much like this:
• In a "real-life" application you would be asked to identify
yourself the first time you used it. Your ID would be stored in
an RDF file. Your ID would identify you as a person with name,
address, email, and ID number.
• When you submitted the query, the application would return
a list of cars for sale, and the list could be drilled down and
sorted by year, price, location and availability. This
information would be returned from a web spider
continuously searching the web for RDF files.
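
The query step of the IBA can be sketched in Python as below. Everything here is hypothetical: the `Listing` class, the sample cars and the `find_cars` function are invented for illustration, and the list stands in for what the web spider would have collected from RDF files.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    model: str
    year: int
    price: int
    location: str
    available: bool


# Stand-in for the listings gathered by the web spider.
listings = [
    Listing("Volvo V70", 2012, 85000, "Pretoria", True),
    Listing("Volvo S60", 2015, 120000, "Pretoria", True),
    Listing("Volvo V40", 2010, 60000, "Durban", True),
    Listing("Volvo XC90", 2014, 70000, "Pretoria", False),
]


def find_cars(listings, max_price, location=None):
    """Drill down by price, location and availability, then sort cheapest first."""
    hits = [
        car for car in listings
        if car.available
        and car.price <= max_price
        and (location is None or car.location == location)
    ]
    # Cheapest first; for equal prices, the newer car comes first.
    return sorted(hits, key=lambda car: (car.price, -car.year))


for car in find_cars(listings, max_price=100000):
    print(car.model, car.year, car.price, car.location)
```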
ISA - The I Sell Application
People who want to sell a car could use an ISA application
much like this:

• When you submitted the form, the application would ask you
for more information and store your ID and the information
in an RDF file made available to the web.
• The RDF file would contain information like:
o Your ID: name, address, email, ID number
o Your selling item: type, model, picture, price, description
Behind the scenes
• Behind the scenes, the "ISA" application creates an RDF file
with a lot of RDF pointers.
• It creates an RDF pointer to a file with information about you,
an RDF pointer to information about Volvo and Volvo models,
an RDF pointer to Volvo dealers and resellers, about parts,
about prices, and much more.
• An RDF pointer is a pointer (actually a URL) to information
about things (like a knowledge database).
• The beauty of this is that you don't have to
describe yourself, or the car model. The RDF application will
sort it out for you.
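
The idea of RDF pointers can be sketched as follows. The URLs, the `KNOWLEDGE` dictionary (a stand-in for information published elsewhere on the web) and the `resolve` function are all assumptions made for this illustration; a real application would fetch and parse RDF from the URLs instead.

```python
# Stand-in for the web: each URL points to information about one thing.
KNOWLEDGE = {
    "https://example.org/people/42": {"name": "Sam Seller", "email": "sam@example.org"},
    "https://example.org/cars/volvo-v70": {"make": "Volvo", "model": "V70"},
}

# What the ISA might store for one car: pointers (URLs) instead of repeated descriptions.
listing = {
    "seller": "https://example.org/people/42",
    "car": "https://example.org/cars/volvo-v70",
    "price": 85000,
    "description": "One owner",
}


def resolve(listing, knowledge):
    """Replace every value that is a known pointer with the information it points to."""
    return {
        key: knowledge.get(value, value) if isinstance(value, str) else value
        for key, value in listing.items()
    }


full = resolve(listing, KNOWLEDGE)
print(full["seller"]["name"], "sells a", full["car"]["make"], full["car"]["model"])
```

The seller never typed in the details of the car model; the listing only points to where that information already exists.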
Behind the scenes (cont…)
• However, remember that the semantic web cannot work all
by itself. The “ISA” and “IBA” applications above will have to
be developed by someone. Someone will have to build a
search engine database for all the items, and someone will
have to develop a standard for it.

Can you think of other examples of IBA and ISA applications?


9.7 Schemas and Ontologies
• The Semantic web uses RDF as a graph model to describe web
resources. This, however, is not enough.
• Everything can be put into a graph, but:
o how do we tell computers that one part of the graph can
be joined to another part because they refer to the same
thing?
o And also how do we put restrictions on how the graph is
built to make it interesting and not become a mess of
“stuff” with no relationships?

This can be done by using schemas and ontologies.


Using schemas and ontologies:
• These are intended to provide a formal description of
concepts, terms, and relationships within a given knowledge
domain.
• RDF Schema is one of the simplest tools that allows modelling
these restrictions.
• Ontologies also provide ways to restrict how the graph is
modelled and how it should be interpreted by the computer.
• For the web, ontology is about the exact description of web
information and relationships between web information.
• The Web Ontology Language (OWL) is used in order to define
Semantic Web ontologies.
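
The two questions raised earlier (joining parts of the graph that refer to the same thing, and restricting how the graph is built) can be sketched in plain Python. The prefixed names `rdf:type`, `rdfs:domain` and `owl:sameAs` are real RDF, RDF Schema and OWL terms; the `ex:` and `dealer:` names, the data and the functions are invented. Note one simplification: in RDF Schema a domain statement is formally used to infer types, whereas this sketch treats it as a check, to match the "restriction" wording above.

```python
RDF_TYPE, RDFS_DOMAIN, OWL_SAME_AS = "rdf:type", "rdfs:domain", "owl:sameAs"

triples = [
    ("ex:price", RDFS_DOMAIN, "ex:Car"),          # schema: price is a property of cars
    ("ex:car1", RDF_TYPE, "ex:Car"),
    ("ex:car1", "ex:price", "85000"),
    ("ex:sam", "ex:price", "10"),                 # a person with a price: breaks the schema
    ("ex:car1", OWL_SAME_AS, "dealer:v70-123"),   # two identifiers, one car
    ("dealer:v70-123", "ex:colour", "blue"),
]


def canonical(node, aliases):
    """Follow sameAs links until a single representative identifier is reached."""
    seen = set()
    while node in aliases and node not in seen:
        seen.add(node)
        node = aliases[node]
    return node


def describe(subject, triples):
    """Collect everything said about a thing under any of its identifiers."""
    aliases = {o: s for s, p, o in triples if p == OWL_SAME_AS}
    target = canonical(subject, aliases)
    return {(p, o) for s, p, o in triples
            if p != OWL_SAME_AS and canonical(s, aliases) == target}


def domain_violations(triples):
    """List (subject, property) pairs where the subject lacks the required type."""
    domains = {s: o for s, p, o in triples if p == RDFS_DOMAIN}
    types = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return [(s, p) for s, p, o in triples
            if p in domains and (s, domains[p]) not in types]
```

Here `describe("dealer:v70-123", triples)` also returns the price recorded under `ex:car1`, and `domain_violations(triples)` flags the statement about `ex:sam`.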
9.7.1 OWL (Web Ontology Language)
• is built on top of RDF
• is for processing information on the web
• was designed to be interpreted by computers
• was not designed for being read by people
• is commonly written in XML
• has sublanguages
• is a W3C standard
9.7.2 OWL vs. RDF
• OWL and RDF are similar in purpose, but OWL is a
stronger language with greater machine interpretability than
RDF.
• OWL comes with a larger vocabulary and stronger syntax than
RDF.
9.8 The World Wide Web Consortium
(W3C)
For this INL 120 course with its emphasis on standards, you
also have to be aware of the W3C.

• The World Wide Web Consortium (W3C) is the main
international standards organisation for the WWW.
• Founded and headed by Sir Tim Berners-Lee, the
consortium is made up of member organisations which
maintain full-time staff for the purpose of working
together in the development of standards for the WWW.
9.8 The World Wide Web Consortium
(W3C) (cont…)

• The W3C is responsible for various standards. Some of
these you may recognise (e.g. HTML, OWL, RDF, XHTML,
XML).
• A full list is available in your notes and can be accessed
on the Web if you are interested.
