
Nrityakosha: Preserving the Intangible Heritage of Indian

Classical Dance
ANUPAMA MALLIK and SANTANU CHAUDHURY, Indian Institute of Technology, Delhi
HIRANMAY GHOSH, Tata Consultancy Services

Preservation of intangible cultural heritage, such as music and dance, requires encoding of background knowledge together with
digitized records of the performances. We present an ontology-based approach for designing a cultural heritage repository for
that purpose. Since dance and music are recorded in multimedia format, we use Multimedia Web Ontology Language (MOWL)
to encode the domain knowledge. We propose an architectural framework that includes a method to construct the ontology with
a labeled set of training data and use of the ontology to automatically annotate new instances of digital heritage artifacts.
The annotations enable creation of a semantic navigation environment in a cultural heritage repository. We have demonstrated
the efficacy of our approach by constructing an ontology for the cultural heritage domain of Indian classical dance, and have
developed a browsing application for semantic access to the heritage collection of Indian dance videos.
Categories and Subject Descriptors: H.m [Information Systems]: Miscellaneous
General Terms: Design, Experimentation
Additional Key Words and Phrases: Heritage preservation, ontology construction, concept recognition, video annotation, multimedia web ontology language
ACM Reference Format:
Mallik, A., Chaudhury, S., and Ghosh, H. 2011. Nrityakosha: Preserving the intangible heritage of Indian classical dance. ACM
J. Comput. Cult. Herit. 4, 3, Article 11 (December 2011), 25 pages.
DOI = 10.1145/2069276.2069280 http://doi.acm.org/10.1145/2069276.2069280

1. INTRODUCTION

Preservation of heritage and ensuring its accessibility over a prolonged period of time has received
a boost with digital multimedia technology. The tangible heritage resources like monuments, handicrafts, and sculpture can be preserved through digitization and 2D and 3D modeling techniques
[Foni et al. 2010; Aliaga et al. 2011]. Preservation of intangible resources like language, art and culture, music and dance, is more complex and requires a knowledge intensive approach. Such a living
heritage lies with a set of people, who naturally become the custodians of this heritage, by practising

This work was funded under the heritage project entitled Managing Intangible Cultural Assets through Ontological Interlinking of the Department of Science and Technology of the Government of India.
Authors' addresses: A. Mallik, Multimedia Lab, Room 405, Block II, IIT Delhi, New Delhi-110016, India; email: ansimal@gmail.com; S. Chaudhury, Multimedia Lab, Room 405, Block II, IIT Delhi, New Delhi-110016, India; email: schaudhury@gmail.com; H. Ghosh, TCS Innovation Labs Delhi, 249 D&E Udyog Vihar Phase IV, Gurgaon 122016, India; email: hiranmay.ghosh@tcs.com.


and passing on its legacy, often in an undocumented form. Thus, intangible cultural heritage is very
fragile and its preservation must capture the background knowledge that lies with its exponents, such
as dancers, musicians, poets, writers, historians, and the communities at large.
India has a rich cultural heritage, where music and dance have been interwoven into the social
fabric. Indian classical music and dance portray human emotions, love and devotion, narrate stories
from myths and religious scriptures, and are integral parts of the celebration of life. A list pertaining
to intangible cultural heritage compiled by UNESCO [UNESCO-HeritageList 2010] includes many
Indian dances, theatre and music forms. We present an ontology-based approach to the preservation
of intangible cultural heritage with a case study in the domain of Indian classical dance (ICD).
Indian classical dance is an ancient heritage, more than 5000 years old. The depictions of many forms
of this dance can be seen in dance postures in the form of sculptures on the walls of ancient Indian
temples and monuments. Contemporary dancers still learn from these sculptures and keep them alive
in their dance performances. An ancient Indian scripture called Natya Shastra,1 which is almost 2000
years old, and is probably the oldest surviving text on the subject, provides a detailed listing of the
grammar and rules covering music, dance, makeup, stage design and virtually every aspect of stagecraft. The dance forms whose theory and practices can be traced back to the Natya Shastra are known
as Indian classical dances. With the passage of time, these dance forms have been interpreted and
performed by different artists in different ways, and have become associated with a rich set of body postures and gestures (a grammar for the performance), mythological stories, sculptural depictions, and
various other cultural artifacts. Thus, these dance forms embody a correlated collection of knowledge
sources which can be presented through a variety of manifestations in digital form, such as digitized
scriptures, digital close-ups of the dance postures, and gestures and video recordings of performances.
Preservation of this intangible knowledge is assisted by the availability and integrity of these varied
sources. Although large multimedia repositories of ICD heritage resources are available [Kalasampada; ASI-India], there are none that correlate the digital resources with the traditional knowledge.
The digital medium offers an attractive means for preserving and presenting tangible, intangible,
and natural heritage artifacts. It not only preserves a digital representation of heritage objects against
the passage of time, but also creates unique opportunities to present the objects in different ways, such
as virtual walkthroughs and time-lapse techniques over the Internet. In recent times, the economics
of computing and networking resources have created an opportunity for large-scale digitization of the
heritage artifacts for their broader dissemination over the Internet and for their preservation. Several
renowned museums and cultural heritage groups have put up virtual galleries in cyberspace [LouvreMuseum; EuropeanArt; Kalasampada; ASI-India] to reach out to a global audience. Currently, most of the presentations available on these portals are hand-crafted and static. With the increase in the
size of collections on the web, correlating different heritage artifacts and accessing the desired ones
in a specific use-case context creates a significant cognitive load on the user. Of late, there has been
some research effort to facilitate semantic access in large heritage collections. Several research groups
[Hunter 2003; Hammiche 2004; Tsinaraki 2005; Petridis 2005] have proposed use of an ontology in the
semantic interpretation of multimedia data in collaborative multimedia and digital library projects.
While an ontology is a useful tool for modeling a conceptual domain, it has not been designed to model multimedia data, which is perceptual in nature. The ontology and the metadata schemes are tightly
coupled in these approaches, which necessitates creation of a central metadata scheme for the entire collection and prevents integration of data from heritage collections developed in a decentralized
manner.
1 The Natya Shastra is an ancient Indian treatise on the performing arts, encompassing theatre, dance and music. Natya Shastra is the foundation of the fine arts in India (http://en.wikipedia.org/wiki/Natya_Shastra).



The significance of a heritage artifact is implied in its contextual information. Thus, the scope of digital preservation extends to preservation of the knowledge that puts such artifacts in proper perspective.
Establishing such contexts enables relating different heritage artifacts to create interactive tools to
explore a digital heritage collection. A major activity in any digital heritage project is to create a domain ontology [Stasinopoulou et al. 2007; Hernendez 2007; Aliaga et al. 2011]. However, much of the
digital heritage artifacts are recorded in multimedia format, and traditional ontology representation
schemes need multimedia records to be annotated for semantic processing. Such annotation is a labor-intensive process and a major bottleneck in creating a digital heritage collection. In this context, we
present a different approach wherein the domain ontology, enriched with multimedia data and carrying probabilistic associations between concepts, provides a means to curate a heritage collection by
generating semi-automated annotations for the digital artifacts.
The key contribution of the work presented in this article is to utilize computing methods to preserve
the living heritage of a cultural domain like Indian classical dance. This is made possible by providing
ways to encode the highly specialized background knowledge of the domain in an ontology, and further
providing methods to correlate this knowledge to access the audio-visual recordings and other digital
artifacts of ICD in a seamless fashion. Our ontology-based framework provides a conceptual linkage between the heritage resources at the knowledge level and multimedia data at the feature level. One of
the key ingredients in our architecture is a cultural heritage ontology [Mallik and Chaudhury 2009]
encoded in a novel multimedia ontology representation [Ghosh et al. 2007]. The ontology includes descriptions of domain concepts in terms of expected audio-visual features in multimedia manifestations,
making it especially suitable for semantic interpretation of multimedia data. We have experimented
with a heritage collection of ICD videos. Starting with a hand-crafted basic ontology for the ICD domain, we create a multimedia-enriched ontology by using a training set of video segments labelled by domain experts. The heritage-specific knowledge of the domain is revalidated by applying machine-learning techniques to fine-tune the ontology parameters. Once a multimedia-enriched ontology is available, it can be used to interpret the media features extracted from a larger
collection of videos, to classify the video segments in different semantic groups, and to generate semantic annotations for them. The annotations enable creation of a semantic navigation environment
in the cultural heritage repository. Our knowledge-based system of preserving ICD heritage is called
the Nrityakosha.2 It offers information technology-based means to safeguard the ancient tradition of a
heritage domain like ICD and make it accessible to future generations.
The rest of the article is organized as follows. Section 2 gives an overview of our ontology-based
framework. Section 3 gives a brief introduction to the domain of Indian classical dance. Section 4
explains the requirement of a multimedia ontology for preserving digital heritage artifacts and the
advantage of using MOWL to build the multimedia ontology of the ICD domain. In Section 5, the annotation generation framework is detailed along with its main module: concept-recognition. Section 6
gives a brief idea of the ontology-based conceptual video browsing application. Section 7 concludes the
article with a summary of our findings and directions for future work.
2. ONTOLOGY-BASED FRAMEWORK

Digital heritage resources include digital replicas of the intangible heritage exhibits, such as videos or
still images of dance forms, as well as the contextual knowledge relating to the exhibits contributed
by domain experts. Our architectural framework is motivated by the need for relating the digital objects with contextual knowledge, to make the former more usable. The goal behind building such a
framework is to give a novel multimedia experience to the user seeking to retrieve digital resources
2 The Sanskrit word Nritya means dance and Kosha means treasure, so Nrityakosha means a treasure of dance.


Fig. 1. Architecture for ontology-based management of a digital heritage collection.

belonging to a heritage collection. The heritage collection may be composed of independently built collections of text documents, image documents, video files, music data, and scanned images pertaining to
different concepts in the heritage domain. Our framework with its unique concept-recognition faculty
has the capability of correlating the digital resources in different collections, which pertain to the same
concept in the ontology. The user, given an interface to browse the high-level concepts of the heritage collection via an ontology, selects a concept, say a poem from the musical epic Geeta Govinda.3 Performing a cross-modal retrieval with the help of the domain ontology, the retrieval system can provide
the following:
- lyrics of a poem from the epic: a text document;
- hand gestures and body postures corresponding to the words in the poem: images;
- a dance performed on the poem: video;
- a sculpture from the walls of a temple depicting the poem: image;
- a song composed using the lyrics of the poem: music.
Thus, by correlating and presenting different heritage artifacts and providing access to the desired ones, we shift the cognitive load from the user to our system, in the process enhancing the user's
multimedia experience. Figure 1 shows the architecture of our ontology-based framework for managing
a digital collection of intangible heritage artifacts. There are two major tasks in the framework.
(1) Ontology Creation. To begin with, a basic ontology for the domain is hand-crafted using an
ontology-editing tool like Protégé by a group of domain experts. This includes the domain concepts,
3 Jayadeva's Geetagovinda, composed in the twelfth century, is an Indian classic and a part of world cultural heritage.
http://www.geetagovinda.org/Geetagovinda.html.


their properties, and their relations. In general, the concepts follow a hierarchy. At the lowest level,
some concepts can be detected using specific media detectors, for example, specific Bhangimas (body
postures) in Indian classical dance. We call them media nodes in the ontology. At the higher level, there
are abstract concepts that have domain-specific relations with each other and associate with the media
nodes. Taking an example from the domain of Indian classical dance, an Odissi dance is generally accompanied by Odissi music, which has distinctive audio characteristics. Mangalacharan is a kind of Odissi dance,
and thus its concept node inherits the audio-visual properties of the Odissi dance, including the audio
properties of Odissi Music (Figure 4).
The domain experts also provide conceptual annotations, using a manual annotation tool, for a training set of multimedia data. Low-level media features are extracted by feature extractors and are used for
training the media feature classifier for the media nodes in the ontology. The basic ontology considers
all relations to be crisp. However, different instances of an abstract concept manifest in different sets
of media nodes. Moreover, different instances of media nodes have different media characteristics. This
motivates us to model the ontology with probabilistic associations [Ding et al. 2004].
In the Ontology Learning module, we compute the joint probability distributions of the concept and
the media nodes and create the probabilistic associations. The multimedia ontology with concepts
that have media-based properties and probabilistic associations needs a more evolved representation
scheme. We use the MOWL language (detailed in Section 4) to represent the ontological concepts and
the uncertainties inherent in their media-specific relations. MOWL encoding of the ontology allows
us to construct a Bayesian network (BN) equivalent to it. The multimedia ontology thus created encodes the experts' perspective and needs adjustments to attune it to the real-world data. Conceptual
annotations help build the case data used for applying a machine-learning technique, called the Full
Bayesian Network (FBN) learning [Su and Zhang 2006], to refine the ontology.
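As an illustration of how these probabilistic associations can be estimated, the following minimal Python sketch derives the pair P(M | C) and P(M | ¬C) for one concept/media-pattern pair by counting co-occurrences in conceptually labeled segments; the label names and data are hypothetical, and the actual system computes fuller joint distributions.

```python
def causal_strength(segments, concept, pattern):
    """Estimate P(M|C) and P(M|~C) by counting how often a media-pattern
    label co-occurs with a concept label over labeled training segments."""
    with_c = [s for s in segments if concept in s]
    without_c = [s for s in segments if concept not in s]
    p_m_given_c = sum(pattern in s for s in with_c) / max(len(with_c), 1)
    p_m_given_not_c = sum(pattern in s for s in without_c) / max(len(without_c), 1)
    return p_m_given_c, p_m_given_not_c

# Toy training data: each set holds the labels attached to one segment.
segments = [
    {"Odissi", "OdissiAudio", "ChawkPosture"},
    {"Odissi", "ChawkPosture"},
    {"Bharatnatyam", "CarnaticAudio"},
]
print(causal_strength(segments, "Odissi", "ChawkPosture"))  # (1.0, 0.0)
```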
Thus, ontology learning is done by learning the structure and parameters of the BN derived from the
given ontology. While the concept nodes remain unchanged, some of the hand-crafted relations between
the concepts may be found statistically insignificant and get deleted. This is described in more detail
in Mallik et al. [2008]. Some new significant relations may be found and get added in the ontology. The
technique is applied periodically as new labelled multimedia data instances are added to the collection
and the ontology is updated. This semi-automated maintenance of the ontology saves significant effort on the part of knowledge engineers.
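The flavor of this refinement can be sketched with a generic score-based structure search over the case data, here via the pgmpy library; this is a stand-in for, not a reproduction of, the FBN learning algorithm of Su and Zhang [2006], and the node names and case data are invented.

```python
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

# Case data: one row per annotated segment, one binary column per
# ontology node (concept or media pattern); the values are made up.
data = pd.DataFrame(
    [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1],
     [1, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 0]],
    columns=["Odissi", "OdissiMusic", "Chawk", "Bharatnatyam"],
)

# Edges supported by the data survive the search; hand-crafted relations
# absent from the learned structure are candidates for deletion, and
# newly discovered edges are candidates for addition to the ontology.
model = HillClimbSearch(data).estimate(scoring_method=BicScore(data))
print(sorted(model.edges()))
```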
(2) Annotation Generation and Conceptual Browsing. The multimedia ontology is used to automatically annotate new instances of media artifacts. The media features of the new videos are extracted
and a set of media detectors are used to detect the media nodes in the ontology. The MOWL ontology
can then be used to recognize the abstract domain concepts using a probabilistic reasoning framework,
which is detailed in Section 5.1. The concepts so recognized are used to annotate the multimedia artifacts. Automatic annotation of digital heritage artifacts enables quick ingestion of such artifacts into a
digital heritage collection. Moreover, these annotations are used to create semantic hyperlinks in the
video collection and to provide an effective video browsing interface to the user, which is detailed in
Section 6.
3. INDIAN CLASSICAL DANCE

We introduce the domain of Indian classical dance, which we have selected as a test bed for our heritage preservation scheme, at this stage of the article because it contains special terms describing its concepts, and references to these in different parts of the article cannot be understood otherwise. Dance
is a perceptual domain where both audio and visual components play an important part. Hence video
as a medium is best suited to capture its knowledge. Any classical dance is that subclass of dance that

Fig. 2. Hand gestures, face expressions, and dance posture images from the ICD domain. With permission of Madhumita Raut (source: Odissi: What, Why and How - Evolution, Revival and Technique). Vanshika Chawla (dancer).

contains highly stylized body postures, hand gestures, and sequences;4 for ICD, some are shown in
Figure 2.
It is important to note here that hand gestures, body movements, and facial expressions of the dancer
always express a natural language keyword or phrase in Indian classical dance. Recognition of a hand
gesture or facial expression in an image or a video can lead to the discovery of the semantic concept
defined by the keyword/s expressed (e.g., in Figure 2, the first hand gesture denotes a blooming lotus
flower; the second gesture denotes the face of a deer; the first facial expression is of a sad mood,
while the second one expresses happiness or fun). Thus, a spatio-temporal sequence of hand gestures
and body postures can lead to the discovery of high-level concepts like a mood, words of a poem, or a
portion of a mythological story. However, an ontology is needed to understand the correlations and to
build the spatio-temporal associations between the various dance steps (e.g., shown in Figure 6).
While studying the domain, we came across various facts and facets of ICD, which we mention here,
as they helped construct a domain ontology.
- There are at least 8 to 9 different Indian classical dance forms belonging to different parts of India, some of which are Bharatnatyam, Odissi, Kuchipudi, and Kathak. Each is different in choreography, posture, hand movements, costumes, colors, language, and music, yet all follow some common basic tenets, as laid out in ancient Indian literature.
- All the dance forms are performed to a classical music rendition, which can be described in terms of Raag (melody) and Taal (beat). The two main subclasses of Indian classical music are Carnatic music and Hindustani music.
- The uniqueness of Indian classical dance is that all the performances are mainly devotional in content. The dance performances are based on stories and poems from Indian epics like Mahabharat, Ramayan, and GeetaGovinda, and those from legends in Indian mythology. Thus the dancers are constantly realizing the roles of the deities described in the legends.
- Much of the knowledge about the rules and grammar of Indian classical dance and music comes from the Natya Shastra. Another piece of literature, called the Abhinaya Darpan, written by Nandikeshvar and first published in 1874, is known as the Indian classical dance manual. It contains a vast amount of textual material for the study of the technique and grammar of body movements in an Indian classical dance.
- The sculpture found in ancient Indian monuments and temples has preserved a complete record of the dance postures from ancient times. It reflects a deep inter-relationship with Indian dance-drama.
4 Basic to the Odissi dance are the postures known as chawk and tribhangi: chawk is a quadrangular posture of the body created
with the help of shoulders, hands, knees, and legs; tribhangi is a triangular pose created by the dancer's hips, waist, and head.
(from www.pagesofindia.com/culture/dances-in-india.htm).


Fig. 3. Perceptual model. With permission of Madhumita Raut (source: Odissi: What, Why and How - Evolution, Revival and Technique). Vanshika Chawla (dancer).

With this background, in the next section we explain the need for a multimedia ontology for preserving the knowledge of this domain along with its digital resources.
4. MULTIMEDIA ONTOLOGY FOR INDIAN CLASSICAL DANCE

Cultural heritage artifacts generally need to be digitized in audio-visual formats. Conventional ontology languages, such as OWL, use natural language constructs to represent concepts and their
relations, making it convenient to use them for processing textual information. An attempt to use the
conceptual model to interpret multimedia data gets severely impaired by the semantic gap that exists
between the two. Current implementations of semantic web technology for cultural heritage repositories rely on accompanying annotations, which are manually created, contextually derived (e.g., camera
parameters), or generated through media processing techniques. However, the concepts have their
roots in perceptual experience of human beings, and the apparent disconnect between the conceptual
and the perceptual worlds is rather artificial. The key to semantic processing of media data lies in
harmonizing the seemingly isolated conceptual and perceptual worlds.
Concepts are formed in human minds as a result of many perceptual observations of the real-world
objects and abstraction of the experience, and are labeled with linguistic constructs to facilitate communication in human society. When a natural language construct is used to specify a concept, it gives
rise to the expectation of some perceptible media properties. Observation of those media properties
forms the basis of concept recognition in the real-world as well as on multimedia recordings. For example, Figure 3 depicts the formation of the concept Pranam (a Sanskrit word meaning salutation)
and the abstracted visual patterns that are expected on an embodiment of the concept in a multimedia
artifact. Note that the different instances of the concept have significant variations, and the perceptual
model needs to cope with the uncertainty.
Thus, we can form a model of the world where the presence of a concept causes some media patterns to manifest in multimedia data instances with some definite probability. Detection of any such media pattern is evidence for the presence of the concept. A concept can be recognized on the

Fig. 4. Multimedia ontology snippet from the Indian classical dance domain. With permission of Madhumita Raut (source: Odissi: What, Why and How - Evolution, Revival and Technique). Vanshika Chawla (dancer).

basis of accumulated evidential value as a result of detection of a number of expected media patterns
with a closed-world assumption. Multimedia Web Ontology Language [Ghosh et al. 2007] is based on
this causal model of the world. This model, based on probabilistic cause-effect relations, motivates us
to use Bayesian network-based evidential reasoning for concept recognition. Associating media examples with concepts in an ontology for multimedia data processing is described in Bertini et al. [2009].
However, it does not include a scheme for reasoning with the media data.
Figure 4 depicts a small section of an ontology encoding media-based description of concepts. The ontology snippet is part of the ICD ontology, and shows some of the concepts related to the Odissi dance
form, which is a subclass of Indian classical dance. An Odissi dance performance is generally accompanied by an Odissi music score. This relation is shown by an isAccompaniedBy relation between the
two nodes. The Odissi music concept in the ontology has an audio media pattern attached to it, which
can be detected by an audio media detector. Similarly, visual properties of the folded-hand posture
describe the concept of Pranam. The Mangalacharan dance contains a sequence called BhumiPranam.
A unique posture of the dancer, called chawk, can be used to identify that the dance performance is of
the form Odissi dance. Visual properties of these media patterns can be used to describe these concepts
with some uncertainty. Individual media patterns are often connected by some spatial or temporal relationships, in the context of a concept or an event. Such spatio-temporal relations can be formally
specified in MOWL, following the scheme proposed in Wattamwar and Ghosh [2008].
4.1 MOWL Language Constructs
The MOWL language has been designed as an extension of OWL to ensure compatibility with the
W3C standards. It uses OWL constructs to define classes, individuals, and properties. In addition, it
proposes some language extensions for the following:
- Encoding media properties. We use these constructs to attach media observables, such as media examples and media features (MPEG-7-based features such as color or edge histogram, or composite media

Fig. 5. MOWL relations. With permission of Madhumita Raut (source: Odissi: What, Why and How - Evolution, Revival and Technique).

patterns like postures and human actions) to concepts in our ontology. Figure 5(a) shows how the concept
Pranam is associated with its media properties.
- Specifying property propagation rules that are unique to media-based description of concepts. For
hierarchy relations, media properties propagate from superclass to subclass and media examples
propagate from subclass to superclass. There are relations in an ontology, different from hierarchy
relations, which also permit the flow of media properties. For example, if an Odissi dance is accompanied by Odissi music, the former generally inherits the audio properties of the latter, although the relation isAccompaniedBy does not imply a concept hierarchy (see Figure 5(b)). To allow specifications of such extended semantics, MOWL declares several subclasses of owl:ObjectProperty.
- Specifying Spatio-Temporal Relations. Many concepts observed in videos are characterized by specific spatial or temporal arrangements of component entities. The occurrence of such a relation
is termed as an event, for example, a goal event in soccer, a dance step in ICD. A multimedia
ontology should be able to specify such concepts in terms of spatial/temporal relations between
the components. MOWL defines a new property, mowl:SpatioTemporalProperty, as a subclass of owl:ObjectProperty, to instantiate specific spatio-temporal relations.
Classical dance sequences are choreographic sequences of short dance steps, which are in turn choreographed as temporal sequences of classical dance postures performed by the dancer. We illustrate the spatio-temporal definition in MOWL of a dance step labeled PranamDanceStep in the ICD domain in Figure 6; a procedural sketch of such a followedBy chain appears after this list.
- Specification of probabilistic association between concepts and their media properties. The uncertainty in a domain can be characterized as a joint probability distribution over all the domain classes and individuals, which can then be factored into several component factors, each a local distribution involving fewer variables. The uncertainty of a variable can be specified as a conditional probability distribution over a set of variables which are its direct causes. Uncertainty is encoded by specifying a causal strength of association between concepts, or between a concept and a media property specification, as a pair of probability values P(M | C) and P(M | ¬C), as shown in Figure 7, where C

Fig. 6. Part (a) shows the construction of the observation graph for an event related to the Pranam concept from the ICD ontology. The smallest event unit is the temporal relation followedBy. Hence we make an event entity node Folding of hands followedBy (Closing of eyes followedBy Bowing of head). The set of observation nodes connected to this event entity node is virtual evidence for the observation node Folding of hands and the event entity node Closing of eyes followedBy Bowing of head. Likewise, for the second temporal relation, followedBy, we have the two observation nodes, Closing of eyes and Bowing of head, and the followedBy event. Part (b) shows the observation graph when the observable media objects (images of postures) are also specified in the ontology. The observation nodes are shown linked to the corresponding concept nodes. With permission of Vanshika Chawla (dancer).
Fig. 7. Concept recognition in an observation model.

is a concept and M represents an associated concept or an associated media pattern. Thus MOWL
supports probabilistic reasoning with Bayesian networks, in contrast to crisp description logic-based
reasoning with traditional ontology languages.
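To make the spatio-temporal constructs concrete, the following procedural sketch (in Python rather than MOWL's XML serialization) checks a followedBy chain like the PranamDanceStep of Figure 6 against timestamped detector output; the label strings and the max_gap tolerance are illustrative assumptions, not part of MOWL.

```python
from typing import NamedTuple

class Observation(NamedTuple):
    label: str
    start: float  # seconds into the video
    end: float

def followed_by(a, b, max_gap=2.0):
    """True if observation b starts within max_gap seconds of a ending."""
    return a.end <= b.start <= a.end + max_gap

def detect_pranam_dance_step(obs):
    """FoldingOfHands followedBy (ClosingOfEyes followedBy BowingOfHead)."""
    folds = [o for o in obs if o.label == "FoldingOfHands"]
    eyes = [o for o in obs if o.label == "ClosingOfEyes"]
    bows = [o for o in obs if o.label == "BowingOfHead"]
    return any(followed_by(f, e) and followed_by(e, b)
               for f in folds for e in eyes for b in bows)

timeline = [Observation("FoldingOfHands", 1.0, 2.0),
            Observation("ClosingOfEyes", 2.5, 3.0),
            Observation("BowingOfHead", 3.2, 4.0)]
print(detect_pranam_dance_step(timeline))  # True
```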
4.2 Reasoning in MOWL
There are two distinct stages of reasoning with MOWL for concept recognition. The first stage involves
identifying a set of media properties that are likely to manifest when a concept is present in a multimedia instance. In general, the set can include the media properties of the concept as well as those of
some related concepts in the domain. Referring to the ICD example mentioned earlier, an Odissi dance
is generally accompanied by Odissi music. Thus, the audio characteristics of Odissi music are likely
to manifest in a video recording of Odissi dance. This form of media property inheritance is quite
ACM Journal on Computing and Cultural Heritage, Vol. 4, No. 3, Article 11, Publication date: December 2011.

Nrityakosha: Preserving the Intangible Heritage of Indian Classical Dance

11:11

distinct from the property inheritance rules that are supported by existing ontology models. MOWL
constructs include typing of the relations between concepts and define media property inheritance rules.
Once the expected media properties for a concept are obtained, they, together with the concepts involved, are organized as a connected graph, called an observation model (OM) for the concept. The OM is, in effect, a specification for the concept in terms of searchable media patterns. It is modeled as
a Bayesian network. The joint probability distribution tables that signify causal strength (significance)
of the different media properties towards recognizing the concept are computed from the probabilistic
associations specified in the ontology. Figure 7 depicts a typical OM for the concept Mangalacharan,
which is an invocation dance in the Odissi dance form. The media nodes are shown connected to their
respective abstract concepts with a hasMF or hasMediaFeature relation (a MOWL relation) described
in Section 4. The OM derived from the associations in the ontology, part of which is shown in Figure 4,
shows how the concept node Mangalacharan is specified in terms of the following:
- PranamDanceStep: a composite concept (from the related concept BhumiPranam);
- PranamPosture: a folded-hands posture (from the related concept Pranam);
- chawkPosture (from the parent node Odissi dance); and
- OdissiMusic (from the parent node Odissi dance), and so on.
The PranamDanceStep concept in the OM, shown as an ellipse with a double boundary, is a composite
concept. It can be described in terms of other concepts along with the spatio-temporal relations between them. Figure 6 shows the construction of the observation graph for the spatio-temporal event,
PranamDanceStep, related to the Pranam concept.
Once an OM for a semantic concept is created, the presence of the expected media patterns can be
detected in a multimedia artifact using appropriate media-detector tools. Such observations lead to
instantiation of some of the media nodes, which in turn result in belief propagation in the Bayesian
network. The posterior probability of the concept node as a result of such belief propagation represents
the degree of belief in the concept.
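As a concrete, if toy, illustration of this evidential reasoning, the sketch below builds a two-node fragment of an OM with the pgmpy library and propagates belief from an instantiated media node to its concept node. The prior and the causal-strength pair are invented numbers, and pgmpy stands in for the actual MOWL reasoning machinery.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import BeliefPropagation

# Concept Odissi causes the media pattern ChawkPosture with the
# causal-strength pair (P(M|C), P(M|~C)) = (0.9, 0.1).
om = BayesianNetwork([("Odissi", "ChawkPosture")])
om.add_cpds(
    TabularCPD("Odissi", 2, [[0.5], [0.5]]),  # uninformative prior on C
    TabularCPD("ChawkPosture", 2,
               [[0.9, 0.1],    # P(M=0 | C=0), P(M=0 | C=1)
                [0.1, 0.9]],   # P(M=1 | C=0), P(M=1 | C=1)
               evidence=["Odissi"], evidence_card=[2]),
)

# A posture detector firing instantiates the media node; belief
# propagation yields the posterior degree of belief in the concept.
posterior = BeliefPropagation(om).query(["Odissi"],
                                        evidence={"ChawkPosture": 1})
print(posterior)  # P(Odissi = 1 | ChawkPosture observed) = 0.9
```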
4.3 Nrityakosha Compilation
The ICD heritage collection for our Nrityakosha system was compiled by gathering dance videos from
different sources. These include a highly specialized collection called Symphony Celestial Gold Edition purchased from Invis Multimedia,5 which contains videos of classical dance performances by
eminent Indian artists. Another set of high-quality dance performance videos was obtained from the
Doordarshan Archives of India.6 Many dance DVDs were donated for research purposes by well-known
artists of ICD. The videos contain dance and music performances, training and tutorials on different
dance forms, as well as many interviews and talks on ICD. We started work with a data set of approximately 20 hours of dance videos, consisting of dance performances of mainly two Indian classical
dance forms, Bharatnatyam and Odissi. The ICD ontology was constructed by encoding specialized
knowledge gathered from the domain experts, as well as from dance manuals like Natya Shastra and
Abhinaya Darpan. The ontology is written in MOWL. The experts provided conceptual labels for the content of about 30% of the videos, which were then used as a training set to fine-tune the ICD ontology. Our ICD ontology contains around 500 concepts related to Indian dance and music, of which about 260 have observable media patterns (features/examples) associated with them.
5 http://www.invismultimedia.com.
6 http://www.ddindia.gov.in/About DD/Programme Archives.


Fig. 8. A snippet from the MOWL ontology of Indian classical dance.

Figure 8 shows a snippet of the Indian classical dance ontology in MOWL, represented graphically. The root node represents an ICD concept, subclasses of which are shown as Music, DanceForm, Artist, and Composition. This snippet focuses on an important mythological figure, an Indian deity named Krishna. Stories about Krishna abound in folklore, and all the classical dances of India have performances dedicated to events in his life. One of the events depicted here is the enactment of a scene between
Krishna and his mother Yashoda through a performance in Bharatnatyam dance form. Linkages and
dependencies between a Story, a Role, a DancePerformance, a DanceForm, a Dancer, and various Body
Movements are encoded in the MOWL ontology.
The pink-colored leaf nodes in elliptical shape denote media nodes, which can be associated with various media classifiers such as posture recognizer, face recognizer, and so on. The hierarchical relations
in the ontology are denoted by black-colored edges labeled isa. All other relations, which permit media properties to propagate between nodes, are shown as blue-colored, dotted edges. Nodes that are
red-lined are individuals or instances of their respective class to which they are connected with a

Fig. 9. Annotation tool architecture.

red-colored io (instance-of) link. This graphical representation of the ICD ontology corresponds to a Bayesian network. The conditional probability values are not shown here in order to preserve the
visual clarity of the diagram.
To recognize concepts in a new video of ICD, evidence is gathered at the leaf nodes, as different
media features are recognized or classified in the video by the media classifiers. If evidence at a node
is above a threshold, the media feature node is instantiated. In this snippet, the media nodes that
have been instantiated are MisrachapuAudio, KrishnaDanceAction, and KrishnaFlutePosture. These
instantiations result in belief propagation in the Bayesian network, and the posterior probability at the
associated concept nodes is computed. The nodes in gray color that are linked directly to the instantiated media nodes are MisrachapuTaal, KrishnaDanceStep, and KrishnaPose. After belief propagation,
these nodes have high posterior probability. As they get instantiated, we find a high level of belief for
the existence of other high-level abstract nodes (in cyan color). These are CarnaticMusic, and hence
Bharatnatyam; NaughtyKrishna and hence KrishnaRole; KrishnaRole hence KrishnaYashodaStory,
and KrishnaStory; KrishnaRole and Bharatnatyam hence KrishnaYashodaDance. Thus we surmise
that the video is of a Bharatnatyam dance performed to the accompaniment of Carnatic music, on the
theme of a Krishna Yashoda story, in which the dancer performed a Naughty Krishna role.
5. SEMANTIC ANNOTATION OF THE HERITAGE ARTIFACTS

An important contribution of our framework is the attachment of conceptual annotations to multimedia data belonging to the heritage domain, thus preserving the background knowledge and enhancing the usability of this data through digital access. This annotation is like curating or authoring a domain-specific collection, where a curator selects possible domain concepts for annotating a media file
and the system helps by providing evidence for presence or absence of the concept/s and suggesting
related concepts based on the media content analysis and ontological relations. Figure 9 shows the
architecture of the annotation generation framework. It consists of five functional components. The
basis of this whole framework is the MOWL ontology created from domain knowledge, enriched with
multimedia data and then refined with learning from annotated examples of the domain.

The most important component of this module is the concept-recognizer. The task of this module is to
recognize the high-level semantic concepts in multimedia data with the help of low-level media-based
features. The belief-propagation in BN which leads to the recognition of high-level semantic concepts
is explained in Section 5.1. OMs for the high-level concepts selected by the curator of the collection are
generated from the MOWL ontology by the MOWL parser and given as input to this module. Low-level
media features (SIFT features, spatio-temporal interest points, MFCC features, etc., detailed later in
this section) are extracted from the digital artifacts which can be in different formats (image, audio,
video), and provided to the concept recognizer by the media feature extractor. Media pattern classifiers,
trained by feature vectors extracted from the training set of multimedia data, help detect the media
patterns (postures, actions, music, and so on) in the digital artifacts. Some of these classifiers are
detailed in Section 5.1.
In the initial stages of building the collection, data is labeled with the help of manual annotations provided by the domain experts. The XML-based annotation generator is responsible for generating the annotation files in XML. The inputs to this module are the conceptual annotations (manual, supervised, and automatic) and the features of the multimedia data (in MPEG-7 format); the output is an XML file conforming to the MPEG-7 standard (an example snippet is shown in Figure 13(b)).
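A much-simplified sketch of such a generator is given below; the element and attribute names are invented stand-ins for the MPEG-7 descriptors the actual tool emits (compare Figure 13(b)), and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET

def write_annotations(video_id, annotations, path):
    """Serialize <concept, location, probability> annotations to XML;
    the schema here is illustrative, not the MPEG-7 standard itself."""
    root = ET.Element("VideoAnnotation", id=video_id)
    for concept, (start, end), prob in annotations:
        seg = ET.SubElement(root, "Segment", start=str(start), end=str(end))
        ET.SubElement(seg, "Concept", probability=f"{prob:.2f}").text = concept
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

write_annotations("odissi_029.mpg",
                  [("Mangalacharan", (0.0, 95.0), 0.97),
                   ("Pranam", (12.0, 18.5), 0.95)],
                  "odissi_029_ann.xml")
```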
5.1 Concept Recognition
Preservation of heritage resources belonging to the performing arts in a digital form, to give a holistic view to the audience, requires a solution to the problem of recognizing elementary domain concepts like postures, actions, and audio in video sequences. We have tackled this problem with the help of
various media detectors, details of which are given below. We follow a common representation scheme
for the different media patterns that are to be classified in concept recognition. Although the low-level
media features differ for the various media detectors, the basic steps to build the feature vectors for
the classifiers are as follows.
- Collect labeled examples of media patterns (images or video snippets) from different dance videos.
- Extract descriptors for all the examples, and quantize the descriptors using the K-means clustering algorithm to obtain a discrete set of N_f local feature words.
- Represent an example media pattern E_i by an indicator vector iv(E_i), which is a histogram of its constituent descriptor words,

iv(E_i) = {n(E_i, f_1), ..., n(E_i, f_j), ..., n(E_i, f_{N_f})},    (1)

where n(E_i, f_j) is the number of local descriptors in image E_i quantized into feature word f_j through some similarity computation.
- Train a support vector machine (SVM) classifier [Burges 1998] with the indicator vectors to classify the media patterns (sketched below).
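The sketch below implements these steps with scikit-learn standing in for the Weka toolchain used in our experiments; the descriptor arrays are random placeholders for real SIFT or spatio-temporal descriptors, and the vocabulary size is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def indicator_vectors(descriptor_sets, n_f=200):
    """Quantize local descriptors into n_f feature words and build one
    histogram (indicator vector) per example, as in Eq. (1)."""
    kmeans = KMeans(n_clusters=n_f, n_init=10).fit(np.vstack(descriptor_sets))
    return np.array([np.bincount(kmeans.predict(d), minlength=n_f)
                     for d in descriptor_sets])

# One (n_i x 128) array of local descriptors per labeled example;
# random placeholders here in place of real SIFT descriptors.
rng = np.random.default_rng(0)
descriptor_sets = [rng.normal(size=(50, 128)) for _ in range(20)]
labels = [i % 2 for i in range(20)]

X = indicator_vectors(descriptor_sets, n_f=16)
clf = SVC(kernel="linear", C=2.0).fit(X, labels)  # cost factor 2.0
```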
5.1.1 Dance Action Categorization. We have used spatio-temporal interest points [Niebles et al.
2008] for detecting human action categories for ICD dance actions. Spatio-temporal interest points are
extracted for the frames of a video. The extracted spatio-temporal interest points are used in a bag-of-words approach to summarize the videos in the form of spatio-temporal words. The results of dance
action categorization experiments performed for some recognizable Odissi dance actions are shown in
Table I. Approximately 30 video shots of each action were submitted for training. We use the Weka
machine-learning platform (www.cs.waikato.ac.nz/ml) to train the SVM classifiers and then perform
10-fold cross-validation tests on 77 videos to test the classification of the various dance actions. The
accuracy of classification was found to be approximately 88.8% on average, over tests with cluster sizes of 50, 100, 200, and 500.
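An equivalent of that cross-validation protocol in scikit-learn might look like the following sketch; the feature matrix is randomly generated, and the 90-clip/9-class shape is chosen only so that 10-fold stratification divides evenly, so the printed accuracy is meaningless.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in indicator vectors: 90 clips over 50 feature words,
# 9 dance-action classes with 10 clips each.
rng = np.random.default_rng(1)
X = rng.poisson(3.0, size=(90, 50))
y = np.arange(90) % 9

scores = cross_val_score(SVC(kernel="linear", C=2.0), X, y, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```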

Fig. 10. Examples of Odissi dance actions (BhumiPranam, Ardhacheera, Manjira, Veena) with spatio-temporal features. With permission of Madhumita Raut (source: Odissi: What, Why and How - Evolution, Revival and Technique).

Table I. SVM Classification Results for Odissi Dance Actions

Classes            TPRate  FPRate  Precision  Recall  F Score  ROC Area
Ardhacheera        0.833   0.011   0.909      0.833   0.87     0.911
Bhasa              1       0.024   0.875      1       0.933    0.988
BhumiPranam        0.846   0.047   0.733      0.846   0.786    0.9
Goithi             1       0.011   0.917      1       0.957    0.994
JagranNritya       1       0       1          1       1        1
Manjira            0.846   0       1          0.846   0.917    0.923
TribhangiBhramari  0.75    0.011   0.857      0.75    0.8      0.87
Veena              0.917   0       1          0.917   0.957    0.958
VipritBhramari     0.75    0.022   0.75       0.75    0.75     0.864

5.1.2 Dance Posture Recognition. We have used the scale-invariant feature transform (SIFT) approach [Lowe 2003] to recognize dance postures in still images taken from dance videos. This approach
transforms image data into scale-invariant coordinates relative to local features. Using the representation scheme detailed above, an SVM classifier is trained with indicator vectors to classify the postures.
We extracted about 628 images of various Bharatnatyam dance postures (by 13 different dancers) and
about 288 frames depicting various Odissi dance postures (by 6 dancers) from our set of ICD videos.
These postures were labeled by the experts. The Bharatnatyam postures were classified into 16 classes
and Odissi postures into 12 classes. Some of the Odissi postures with their SIFT features are shown
in Figure 11. A 10-fold cross-validation using an SVM classifier (linear kernel, cost factor of 2.0) on the Weka machine learning framework yielded an accuracy of 92.78% (Table II).
5.1.3 Music Form Classification. An important component of the Indian classical dance is the music accompanying the dance performance. For our experiments on classification of music forms, we
selected 160 video clips of duration 45 seconds to 2 minutes from different dance forms, with five
different music categories. Using the concept of audio terms discovered in an audio file, each audio
file was represented by a feature vector composed of an aural feature vector, a Mel-frequency cepstral coefficient (MFCC) [Logan 2000] feature vector, and a combination of both. Many tests with 10-fold cross-validation using the SVM classifier (linear kernel, cost factor of 2.0) on the Weka machine learning framework were conducted. The results are shown in Figure 12 via a comparative study of the various
tests done using different numbers of clusters and a confusion matrix for one of the tests. The accuracy
of classification was high, averaging around 95%.
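As an indication of how such an audio feature vector can be computed, the sketch below uses the librosa library to summarize a clip by MFCC statistics; the mean/standard-deviation pooling and the file name are assumptions, and the aural features and audio-term discovery of the actual experiments are not reproduced.

```python
import numpy as np
import librosa

def audio_feature_vector(path, n_mfcc=13):
    """Summarize an audio clip by the per-coefficient mean and standard
    deviation of its MFCCs; a simplified stand-in for the combined
    feature vectors used in the music-form experiments."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# vec = audio_feature_vector("odissi_clip.wav")  # hypothetical clip
```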

Fig. 11. Odissi dance postures with SIFT features: Alasa (indolent, languid), Chawk (square stance), Darpani (looking into a mirror), Mardala (playing the 'mardala' instrument), and Manjira (playing 'manjira'). With permission of Madhumita Raut (source: Odissi: What, Why and How - Evolution, Revival and Technique).

Table II. SVM Classification Results for Odissi Dance Postures

Classes         TPRate  FPRate  Precision  Recall  F-Measure  ROC Area
Alasa           1       0.011   0.9        1       0.947      0.994
Ardhacheera     1       0       1          1       1          1
chawk           0.955   0.013   0.955      0.955   0.955      0.971
DarpaniLeft     1       0       1          1       1          1
DarpaniRight    0.833   0       1          0.833   0.909      0.917
Flute           0.5     0       1          0.5     0.667      0.75
Manjira         0.875   0.011   0.875      0.875   0.875      0.932
Mardala         0.833   0.011   0.833      0.833   0.833      0.833
Tribhangi       1       0       1          1       1          1
TribhangiRight  1       0.037   0.842      1       0.914      0.981

Confusion matrix of the combined vectors with 50 clusters (rows: actual class; columns: predicted class):

            Carnatic  Hindustani  Manipuri  Odissi  Satriya
Carnatic    1.00      0.00        0.00      0.00    0.00
Hindustani  0.00      1.00        0.00      0.00    0.00
Manipuri    0.00      0.00        1.00      0.00    0.00
Odissi      0.00      0.0625      0.00      0.9375  0.00
Satriya     0.0625    0.00        0.0625    0.00    0.8750

Fig. 12. Experimental results of music form classification: (a) classification accuracy for the audio features (chart omitted); (b) confusion matrix of the combined vectors with 50 clusters (reproduced above).

To illustrate concept-recognition using these media classifiers, we refer to the OM for the concept
Mangalacharan detailed in Figure 7. Here, the BN corresponding to the OM is shown after some
media patterns were detected in an Odissi dance video and corresponding media nodes were instantiated. The concept Odissi Dance is related to the root node Mangalacharan as its parent. Basic to the
Odissi dance are the two postures known as chawk and tribhangi. One of the famous Odissi dancers is

Madhumita Raut. Mangalacharan includes the dance action BhumiPranam, which contains the
Pranam (Figure 3) posture.
Media nodes are shown as ellipses in a light pink color and are attached to their associated concept nodes by pink-colored links called hasMF (hasMediaFeature). The pair of values at each link denotes the probabilities P(M | C) and P(M | ¬C), where C is a concept and M represents an associated
concept or media pattern. The OM is constructed from the ICD ontology refined with FBN learning,
so the probability values shown correspond to real-world data. The bracketed value next to the name of each
node denotes its posterior probability after media nodes have been instantiated and belief propagation
has taken place in the BN.
For a new video, media features are extracted and media patterns detected to initiate concept recognition. Concept recognition occurs with belief propagation in the BN representing the OM. In
this video, the media patterns detected are chawkPosture and PranamPosture, shown as dark pink
ellipses. The concept nodes Chawk and Pranam, highlighted in gray, are the low-level concepts which are recognized due to the presence of these media patterns in the data. Due to belief propagation in the BN,
higher-level concept nodes (in cyan) are recognized as present in the video. The presence of the Chawk
concept causes Odissi Dance to be recognized. The presence of Pranam and BhumiPranam leads to the recognition of the Mangalacharan concept, which is further confirmed by recognition of the Odissi
Dance concept in the video. Conceptual annotations are generated and attached to the video through
the annotation generation framework, detailed earlier in this section.
5.2 Video Annotation
Here we illustrate an application which is based on the annotation generation framework described
above. This is a video annotation tool that allows the curator of a heritage video collection to attach
conceptual annotations to the videos in the collection. The spatio-temporal characteristic of video data
means that an annotation can be associated with the video content at any level of spatio-temporal detail.
A comprehensive annotation scheme needs to take care of video analysis at multiple levels of granularity at which the entities may exist and interact. For example, annotations can be associated with a
spatial region within a frame, a complete frame, a spatio-temporal region spanning a set of frames, a
set of frames within a shot, a complete shot, a set of shots, or the complete video.
Events can also be specified at multiple levels of granularity. A spatial event can be associated with
a spatial region in a frame where two spatial entities happen to be in a specified spatial relation.
Temporal events occur when two temporal entities exhibit a specified temporal relation. Events (spatial
and temporal) can also correspond to the change of state (e.g., change of spatial relation or temporal
relation with another entity). An event which depicts a change of state can be associated with a spatial
region, that is, the spatial expanse of the aggregate of the spatial regions corresponding to the initial
and the final states, or a temporal range, that is, the duration over which a change of state occurs.
Some examples of the different kinds of events from the ICD domain are the following.
- Spatial events. Hand gestures, body postures, facial features, facial expressions.
- Temporal events. A choreographic sequence related to a dance; steps like making a circle or walking to the left side; a sequence of hand gestures and body postures that express the words of a song. Certain dance sequences in classical dance have a set pattern of steps following each other, and such a sequence is denoted by a name. For example, Bhumi Pranam (bowing to the earth) is a choreographic sequence where the dancer squats, bending her knees, and touches her forehead with both palms after touching the floor.
- Spatio-temporal events. These help recognize the different roles played by different objects in a video shot. In the case of a group dance there is a main dancer accompanied by other dancers. A sequence

Fig. 13. (a) Video annotation tool snapshot; (b) example XML snippet from a video annotation file. With permission from Vanshika Chawla, dancer.

depicting a dialogue between, say, a mother and son (Yashoda and Krishna) has two roles being enacted: one actor-dancer plays the mother, and the second plays the role of the son.
In our work, we specify an annotation as a triplet: <concept, location, probability>. The concept can be a high-level or low-level concept from the domain ontology. The parameter location specifies the spatio-temporal region to which the concept refers. The parameter probability provides

Fig. 14. (a) Conceptual video browser screen shot showing the different panes; (b) conceptual video browser screen shot showing selection of concepts through a textual representation of the ontology. With permission from Vanshika Chawla, dancer.

a graded measure of the likelihood that the concept is relevant to the specified location. This parameter is optional; if it is omitted, the concept is considered relevant (with probability 1) to the specified location.
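A minimal data structure mirroring this triplet and its default-probability convention could look like the sketch below; restricting location to a (start, end) time range is a simplification, since the tool also supports spatial and spatio-temporal regions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Annotation:
    """<concept, location, probability> triplet; probability defaults to
    1.0, i.e., the concept is taken as certainly relevant if omitted."""
    concept: str
    location: Tuple[float, float]  # (start, end) in seconds
    probability: float = 1.0

a = Annotation("BhumiPranam", (12.0, 18.5), 0.84)
b = Annotation("Pranam", (12.0, 14.0))  # probability defaults to 1.0
```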
Fig. 14. (a) Conceptual video browser screen shot showing the different panes. (b) Conceptual video browser screen shot showing selection of concepts through a textual representation of the ontology. With permission from Vanshika Chawla, dancer.

The ontology-derived Bayesian network can now be considered to model the dependencies among
concept hypotheses. A concept hypothesis is the pair <concept, location>, where the location for
every concept is hypothesized to correspond to some spatio-temporal region(s). The Bayesian network
then makes use of the evidential observations at those hypothesized locations and infers probabilities for the relevance of the concepts to their corresponding hypothesized locations.
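The inference can be pictured with a toy, naive-Bayes-style computation for a single concept hypothesis. The real system propagates beliefs through the full ontology-derived network, so the sketch below (with made-up numbers) only illustrates the flavor of the reasoning.

    # Posterior of one concept hypothesis given independent observations.
    def concept_posterior(prior, likelihoods, likelihoods_given_not):
        # P(C | o1..on), assuming observations are independent given C.
        p_c, p_not = prior, 1.0 - prior
        for l_c, l_not in zip(likelihoods, likelihoods_given_not):
            p_c *= l_c
            p_not *= l_not
        return p_c / (p_c + p_not)

    # E.g., a posture detector and a music classifier both respond:
    p = concept_posterior(prior=0.2,
                          likelihoods=[0.7, 0.6],
                          likelihoods_given_not=[0.2, 0.3])
    # p is about 0.64: the evidence raises the hypothesis well above its prior.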
6. BROWSING AND QUERYING VIDEO CONTENT
The video database in our Nrityakosha system consists of videos of Indian classical dance as well as
their annotation files. These XML files for storing the conceptual annotations, generated by the video
annotation application (Section 5.2), contain sufficient details to allow access to any section of the
video, right from a scene to a shot, to a set of frames depicting an event, to a single frame denoting
a concept, to an object in an audio-visual sequence. Each of these entities is labeled with one or more
concepts in the domain ontology, through manual, supervised, and automatic annotation generation, as
detailed in Section 5.
The video annotation files in XML, along with the actual videos and a MOWL ontology of the domain,
constitute the inputs to the video browsing application called the conceptual video browser (CVB). The
domain ontology enriched with multimedia videos of the domain is parsed to produce a textual or
graphical representation. Users can view the domain concepts along with their relations and select
domain concepts to access. This browsing interface provides the user with a concept-based navigation
of the video database, using which he or she can easily browse and navigate progressively towards
more relevant content. Using this interface, a user can retrieve video content from different videos,
view annotations attached to a video entity, and navigate to other similar video entities using hyperlinks: instances of the same concept entity or of related concept entities are hyperlinked for this purpose.
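One way to picture this mechanism is as an inverted index from concept labels to annotated segments, paired with a lookup of ontological neighbors for hyperlinking. The sketch below is our simplification; the relation-table entries are illustrative and are not the CVB's internals.

    from collections import defaultdict

    # Inverted index: concept label -> annotations referring to it.
    index = defaultdict(list)
    all_annotations = []  # in practice, parsed from the XML annotation files
    for a in all_annotations:
        index[a.concept].append(a)

    # A toy slice of the ontology's relations (illustrative entries only).
    related = {
        "BharatnatyamDance": ["CarnaticMusic", "TamilLanguage"],
        "CarnaticMusic": ["AdiTaal"],
    }

    def browse(concept):
        # Return matching segments plus hyperlinks to related concepts.
        hits = sorted(index[concept], key=lambda a: a.probability, reverse=True)
        return hits, related.get(concept, [])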
We illustrate how the CVB can be used to browse the ICD video collection, with a MOWL ontology of
the ICD domain as input to the browsing tool, using some example snapshots on the following pages.
Users, who would typically be ICD dancers or students of ICD, can use the CVB
- to learn about specific dance postures, dance steps, hand gestures, and so on, by querying about them and viewing the various video clips and images returned as search results;
- to watch performances related to a particular poem (e.g., Geeta Govinda) or story (e.g., Mahabharat from mythology);
- to watch portrayals of a mythological character, say Krishna or Yashoda, in different dance forms;
- to watch dance performances pertaining to a particular music form like Carnatic music; or
- to select a musical beat (called Taal in ICD) or melody (Raag), and so on.
The system is able to show videos that are related to the search results on the basis of various ontological relations, and thus provides a comprehensive browsing and navigational interface for the ICD
collection.
We invited 15 exponents and dancers in the ICD domain to use the conceptual video browser for
searching and browsing the heritage collection of ICD videos. They were happy to find an interactive, computer-based, comprehensive system that provides hands-on knowledge about the basic dance
postures, music forms, and dance steps in ICD. Concepts and relationships in the ontology were validated and approved by them, as they could see them in the context of the dance videos. While using the
CVB, they were able to search for dance performances by entering different queries related to a
dance, a dancer, a music form, a character portrayed, a geographic region, a composition, or even the
name of a posture or dance step. Sometimes, when they could not formulate what they were looking
for, the ontology-guided browsing interface helped them navigate to the relevant content. Thus, they
were able to view different aspects of dance and visit different sections of the graphically represented
ICD domain. The users gave a subjective satisfaction score on a scale of 1 to 5, in increasing order
of their satisfaction with the CVB. They were asked to score several parameters, such as the relevance
of the search results, the ease of finding a dance video, the correctness of the ontological relations,
the comprehensiveness of the dance ontology, and the user-friendliness of the GUI.
The mean opinion score, computed by averaging the subjective scores, was 4.
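Since the aggregation is a plain average over users and parameters, it amounts to the following two lines; the scores shown are made up for illustration and are not the study data.

    # One row per user: 1-5 scores on the five parameters listed above.
    scores = [[4, 5, 4, 3, 4], [5, 4, 4, 4, 5]]  # illustrative only
    mos = sum(sum(r) for r in scores) / sum(len(r) for r in scores)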
Fig. 15. Example I. The user wishes to view videos of a particular dance form, with the intention of viewing dance performances that pertain to a particular kind of music and beat. (a) CVB screen shot showing selection of the BharatnatyamDance concept. The user selects the major dance form BharatnatyamDance from the ontology and submits a conceptual query for search. Search results show thumbnails of BharatnatyamDance videos; the hyperlink pane shows related videos under two columns, labeled CarnaticMusic and TamilLanguage. The user plays a video from the search results by clicking on its thumbnail. (b) CVB screen shot showing the observation graph for BharatnatyamDance and its ontological relations with the concepts CarnaticMusic and TamilLanguage, displayed by pressing the View OM button. To navigate further, the user can select the music form CarnaticMusic or the language TamilLanguage and browse videos pertaining to the selected concept. With permission from Vanshika Chawla (dancer).

Fig. 16. Example I (continued). The user selects to view the OM for CarnaticMusic in order to navigate further. After viewing the ontological relations, he can select a musical beat (Taal), AdiTaal, which is related to CarnaticMusic. This CVB screen shot shows the observation graph for the CarnaticMusic concept and its relations with other concepts, including the AdiTaal concept. With permission from Vanshika Chawla (dancer).

Fig. 17. Example II. The user selects the role of the mythological figure Krishna to view portrayals of this character in different dance forms. The CVB screen shot shows selection of the KrishnaRole concept, along with the observation graph for KrishnaRole showing linkages with media features like KrishnaDanceStep and KrishnaFlutePosture (refer to Section 3 and Figure 8). With permission from Vanshika Chawla (dancer).

Fig. 18. Example III. The user wishes to view dance performances of a particular dancer, and then may wish to view performances of other dancers of the same dance form. (a) CVB screen shot showing selection of the VanshikaChawla concept and its observation graph: the user selects the dancer's name, VanshikaChawla, from the ontology and discovers, on viewing the OM for the dancer, that she is a Bharatnatyam dancer. (b) CVB screen shot showing the observation graph for BharatnatyamDancer: the user then selects the concept BharatnatyamDancer in order to search for the larger set of dance performances by all Bharatnatyam dancers. The BharatnatyamDancer concept has the two dancers VanshikaChawla and AnitaRatnam as its instances, and the media examples of these two dancers propagate to this concept due to the MOWL interpretation of such relations (refer to Section 4.1). Thus the user gets to view dance performances by all Bharatnatyam dancers. With permission from Vanshika Chawla (dancer).

7. CONCLUSION

This work is an attempt to help preserve the knowledge about Indian classical dance, which is an
ancient heritage. It makes use of technology to capture the grammar, the rules, and the ethics of
dance as laid down in ancient scripts and treatises. It also tries to capture the various interpretations
of this knowledge in actual performances by dancers and dance-gurus, both over time and in the
contemporary world. This has been made possible by using an ontology to encode the interdependencies
between various facets of dance and music and by building a knowledge base enriched with multimedia
examples of different aspects of dance. With the help of this multimedia-enriched ontology, we are able
to attach conceptual metadata to a larger collection of digital artifacts in the heritage domain and
provide a holistic, semantically integrated navigational access to them.
In this work, we have experimented with videos of Indian classical dance that belong to two main
classical dance forms: Bharatnatyam and Odissi. We plan to extend the knowledge base by adding videos of
other Indian dances. The media properties studied and analyzed are mainly the dance postures, short
dance actions, music forms, and some textual and face-recognition features. We would like to include
hand gestures that denote specific things or characters, and facial expressions that are used to express
emotions and moods, as they are an important component of Indian dance. Content-based analysis
tools that include hand-gesture recognition and facial-expression interpreters would greatly enhance
concept recognition and annotation generation in this domain. Another enhancement can come from
extending the browsing tool to personalize the retrieval of search results, and from adding contextual
and geographical connotations to the ontology.
ACKNOWLEDGMENTS

The ICD dance videos were contributed for research by the PadmaShri-awarded Odissi dancer Guru
Mayadhar Raut and his daughter and disciple, Ms. Madhumita Raut [Raut 2007], and by Ms. Vanshika
Chawla, a Bharatnatyam dancer based in New Delhi. The required third-party permissions were obtained for using the images from their performances in this article. A few images from the Wikimedia
Commons database [Wikimedia] have also been used in this article.
REFERENCES
ALIAGA, D. G., BERTINO, E., AND VALTOLINA, S. 2011. Decho: A framework for the digital exploration of cultural heritage objects. ACM J. Comput. Cult. Herit. 3, 12:1–12:26.
ASI-INDIA. Archaeological Survey of India home page. http://www.asi.nic.in/index.asp.
BERTINI, M., BIMBO, A. D., SERRA, G., TORNIAI, C., CUCCHIARA, R., GRANA, C., AND VEZZANI, R. 2009. Dynamic pictorially enriched ontologies for digital video libraries. IEEE Multimedia 16, 42–51.
BURGES, C. J. C. 1998. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2, 121–167.
DING, Z., PENG, Y., AND PAN, R. 2004. A Bayesian approach to uncertainty modeling in OWL ontology. In Proceedings of the International Conference on Advances in Intelligent Systems: Theory and Applications.
EUROPEANART. Web gallery of art, image collection, virtual museum, searchable database of European fine arts. http://www.wga.hu/.
FONI, A. E., PAPAGIANNAKIS, G., AND MAGNENAT-THALMANN, N. 2010. A taxonomy of visualization strategies for cultural heritage applications. J. Comput. Cult. Herit. 3, 1:1–1:21.
GHOSH, H., CHAUDHURY, S., KASHYAP, K., AND MAITI, B. 2007. Ontology Specification and Integration for Multimedia Applications. Springer, Berlin.
HAMMICHE, S. 2004. Semantic retrieval of multimedia data. In Proceedings of the 2nd ACM International Workshop on Multimedia Databases. ACM, New York, 36–44.
HERNANDEZ, F. 2007. Semantic web use cases and case studies: An ontology of Cantabria's cultural heritage. http://www.w3.org/2001/sw/sweo/public/UseCases/FoundationBotin/.
HUNTER, J. 2003. Enhancing the semantic interoperability of multimedia through a core ontology. IEEE Trans. Circuits Syst. Video Technol. 13, 49–58.
KALASAMPADA. Digital library: Resources of Indian cultural heritage. http://www.ignca.nic.in/dlrich.html.
LOGAN, B. 2000. Mel frequency cepstral coefficients for music modelling. Tech. rep., Cambridge Research Lab, Cambridge, MA.
LOUVRE MUSEUM. Official website. http://www.louvre.fr/llv/commun/home.jsp?bmLocale=en.
LOWE, D. 2003. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 20, 91–110.
MALLIK, A. AND CHAUDHURY, S. 2009. Using concept recognition to annotate a video collection. In Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence (PReMI'09). Springer, Berlin, 507–512.
MALLIK, A., PASUMARTHI, P., AND CHAUDHURY, S. 2008. Multimedia ontology learning for automatic annotation and video browsing. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (MIR'08). ACM, New York, 387–394.
NIEBLES, J. C., WANG, H., AND FEI-FEI, L. 2008. Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vision, 299–318.
PETRIDIS, K. 2005. Knowledge representation and semantic annotation for multimedia analysis and reasoning. In IEE Proceedings on Vision, Image and Signal Processing. 255–262.
RAUT, M. 2007. Odissi: What, Why and How - Evolution, Revival and Technique. B.R. Rhythms, Delhi.
STASINOPOULOU, T., BOUNTOURI, L., KAKALI, C., LOURDI, I., PAPATHEODOROU, C., DOERR, M., AND GERGATSOULIS, M. 2007. Ontology-based metadata integration in the cultural heritage domain. In Proceedings of the 10th International Conference on Asian Digital Libraries (ICADL'07). Springer, Berlin, 165–175.
SU, J. AND ZHANG, H. 2006. Full Bayesian network classifiers. In Proceedings of the 23rd International Conference on Machine Learning. 897–904.
TSINARAKI, C. 2005. Ontology-based semantic indexing for MPEG-7 and TV-Anytime audiovisual content. Multimedia Tools Appl. 26 (special issue on video segmentation, semantic annotation, and transcoding), 299–325.
UNESCO-HERITAGELIST. 2010. The list of intangible cultural heritage in need of urgent safeguarding. http://www.unesco.org/culture/ich/index.php?lg=en&pg=00011.
WATTAMWAR, S. S. AND GHOSH, H. 2008. Spatio-temporal query for multimedia databases. In Proceedings of the 2nd ACM Workshop on Multimedia Semantics (MS'08). ACM, New York, 48–55.
WIKIMEDIA. Wikimedia Commons database: A database of freely usable media files to which anyone can contribute. http://commons.wikimedia.org/wiki/Main_Page.

Received March 2011; accepted April 2011
