Music Understanding

See also Computer Accompaniment and Beat Tracking

Back to Bibliography by Subject

This page is divided into several categories:

• General Overviews, mostly conference presentations that describe a number of research projects I have worked on.
• Style Classification, using computers and machine learning to label different styles of music.
• Music Information Retrieval, especially systems that search using melodies as a query (query-by-humming).
• Structural Analysis, systems that analyze music to obtain a description such as AABA, or discover other patterns.
• Music Alignment, systems that time-align two performances or match symbolic representations to audio (note that computer accompaniment systems perform real-time alignment to follow a score, but this is such an important and distinctive application that I have put those papers in their own group).

General Overviews
These are all overviews of my work in this area. The IAKTA/LIST paper is the most up-to-date, although it is by now actually quite dated. Since then, I've worked on Query-by-Humming, polyphonic score alignment, music search by polyphonic alignment, music structure analysis, and beat tracking informed by music structure.

Dannenberg, ``Music Understanding,'' 1987/1988 Computer Science Research Review, Carnegie Mellon School of Computer Science, pp. 19-28.

[Postscript Version.] [Adobe Acrobat (PDF) Version.]

Dannenberg, ``Recent Work In Real-Time Music Understanding By Computer,'' Music, Language, Speech, and Brain, Wenner-Gren International Symposium Series, Sundberg, Nord, and Carlson, eds., Macmillan, 1991, pp. 194-202.

Postscript Version.
Dannenberg, ``Computerbegleitung und Musikverstehen'' [Computer Accompaniment and Music Understanding], in Neue Musiktechnologie, Bernd Enders, ed., Mainz: Schott, 1993, pp. 241-252.

Dannenberg, ``Recent Work in Music Understanding,'' in Proceedings of the 11th Annual Symposium on Small Computers in the Arts, Philadelphia: SCAN, (November 1991), pp. 9-14.

Postscript Version.

Dannenberg, ``Music Understanding and the Future of Computer Music,'' Contemporary Music Review, (to appear).

Dannenberg, ``Music Understanding by Computer,'' in IAKTA/LIST International Workshop on Knowledge Technology in the Arts Proceedings, International Association of Knowledge Technology in the Arts, Inc. in cooperation with Laboratories of Image Information Science and Technology, Osaka, Japan, pp. 41-56 (September 16, 1993).

ABSTRACT. Music Understanding refers to the recognition or identification of structure and pattern in musical information. Music understanding projects initiated by the author are discussed. In the first, Computer Accompaniment, the goal is to follow a performer in a score. Knowledge of the position in the score as a function of time can be used to synchronize an accompaniment to the live performer and automatically adjust to tempo variations. In the second project, it is shown that statistical methods can be used to recognize the location of an improviser in a cyclic chord progression such as the 12-bar blues. The third project, Beat Tracking, attempts to identify musical beats using note-onset times from a live performance. Parallel search techniques are used to consider several hypotheses simultaneously, and both timing and higher-level musical knowledge are integrated to evaluate the hypotheses. The fourth project, the Piano Tutor, identifies student performance errors and offers advice. The fifth project studies human tempo tracking with the goal of improving the naturalness of automated accompaniment systems.

[Postscript Version.] [Adobe Acrobat (PDF) Version.]
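
The beat-tracking project described above evaluates several (period, phase) hypotheses against observed note-onset times. As a rough illustration only, here is a minimal Python sketch that scores a grid of such hypotheses; the scoring function, tolerance, and exhaustive grid search are invented simplifications standing in for the parallel search and higher-level musical knowledge used in the actual system.

import numpy as np

def score_hypothesis(onsets, period, phase, tol=0.07):
    # Fraction of onsets within tol seconds of a beat predicted by (period, phase).
    residual = np.abs(((onsets - phase) + period / 2) % period - period / 2)
    return np.mean(residual < tol)

def track_beats(onsets, min_period=0.3, max_period=1.0):
    onsets = np.asarray(onsets, float)
    best_score, best_hyp = 0.0, None
    for period in np.linspace(min_period, max_period, 71):
        for phase in np.linspace(0.0, period, 20, endpoint=False):
            s = score_hypothesis(onsets, period, phase)
            if s > best_score:
                best_score, best_hyp = s, (period, phase)
    return best_score, best_hyp

# Toy example: onsets roughly every 0.5 seconds with small timing jitter.
rng = np.random.default_rng(0)
onsets = np.arange(0.0, 10.0, 0.5) + rng.normal(0.0, 0.01, 20)
print(track_beats(onsets))  # expect a winning period near 0.5

A real tracker must also revise hypotheses on-line as onsets arrive, which is where the parallel search pays off; the batch search above is only the scoring idea.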

Style Classification
Getting a computer music system to listen to a performance and determine aspects of style, such as:

• Improvisational style: frantic, lyrical, syncopated, ...
• Instrumentalist: à la Miles Davis, Louis Armstrong, ...
• Composer: Mozartian, Bach-like, ...
• Texture: homophonic, polyphonic, ...
• Emotion, ...
• (This list could go on and on.)

Dannenberg, Thom, and Watson, ``A Machine Learning Approach to Musical Style Recognition,'' in 1997 International Computer Music Conference, International Computer Music Association (September 1997), pp. 344-347.

ABSTRACT: Much of the work on perception and understanding of music by computers has focused on low-level perceptual features such as pitch and tempo. Our work demonstrates that machine learning can be used to build effective style classifiers for interactive performance systems. We also present an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed, and we describe a reliable real-time performance style classifier.

[Postscript Version.] [Adobe Acrobat (PDF) Version.]
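
To make the approach concrete: the idea is to compute simple low-level features over short windows of a performance and hand them to a standard classifier. The sketch below is a hypothetical illustration using scikit-learn; the features, window length, labels, and training rows are invented placeholders, not the feature set or learning method reported in the paper.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def window_features(notes, t0, t1):
    # notes: list of (onset_sec, midi_pitch, duration_sec); returns one feature row.
    w = [n for n in notes if t0 <= n[0] < t1]
    if not w:
        return [0.0, 0.0, 0.0, 0.0]
    pitches = [n[1] for n in w]
    durations = [n[2] for n in w]
    return [len(w) / (t1 - t0),          # note density (notes per second)
            float(np.mean(pitches)),     # average pitch
            float(np.std(pitches)),      # pitch spread
            float(np.mean(durations))]   # average note duration

# Invented training rows (same four features) labeled with a style per window.
X = np.array([[8.0, 70, 5.0, 0.10], [9.0, 72, 6.0, 0.12],   # "frantic" windows
              [2.0, 65, 2.0, 0.80], [1.5, 64, 1.5, 0.90]])  # "lyrical" windows
y = np.array(["frantic", "frantic", "lyrical", "lyrical"])

clf = GaussianNB().fit(X, y)
print(clf.predict([[7.5, 71, 5.5, 0.11]]))  # -> ['frantic']

Because the features are cheap to compute over a sliding window, the same pipeline can run in real time, which is the setting the paper addresses.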

Han, Rho, Dannenberg, and Hwang, ``SMERS: Music Emotion Recognition Using
Support Vector Regression'' in Proceedings of the 10th International Conference on
Music Information Retrieval (ISMIR 2009), (October 2009), pp. 651-656.

ABSTRACT: Music emotion plays an important role in music retrieval, mood detection
and other music-related applications. Many issues for music emotion recognition have
been addressed by different disciplines such as physiology, psychology, cognitive science
and musicology. We present a support vector regression (SVR) based music emotion
recognition system. The recognition process consists of three steps: (i) seven distinct
features are extracted from music; (ii) those features are mapped into eleven emotion
categories on Thayer's two-dimensional emotion model; (iii) two regression functions are
trained using SVR and then arousal and valence values are predicted. We have tested our
SVR-based emotion classifier in both Cartesian and polar coordinate systems empirically.
The results indicate that the SVR classifier in the polar representation produces satisfactory results, reaching 94.55% accuracy, superior to the SVR in Cartesian coordinates and to other machine learning classification algorithms such as SVM and GMM.

[Adobe Acrobat (PDF) Version.]
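
As a sketch of the three-step process in the abstract: feature vectors go into two SVR regressors, one for arousal and one for valence, and the predicted (valence, arousal) point can then be read in polar coordinates. Everything below is an invented placeholder around scikit-learn's SVR: the synthetic 7-dimensional features, the training targets, and the absence of a real lookup into Thayer-model emotion categories.

import numpy as np
from sklearn.svm import SVR

# Hypothetical 7-dimensional feature rows with known arousal/valence labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 7))
arousal = X[:, 0] * 0.8 + rng.normal(0.0, 0.1, 50)
valence = X[:, 1] * 0.8 + rng.normal(0.0, 0.1, 50)

svr_a = SVR().fit(X, arousal)  # one regressor per emotion dimension
svr_v = SVR().fit(X, valence)

def predict_emotion(features):
    a = svr_a.predict([features])[0]
    v = svr_v.predict([features])[0]
    r, theta = np.hypot(v, a), np.arctan2(a, v)  # polar form of (valence, arousal)
    return a, v, r, theta  # a category lookup on (r, theta) would go here

print(predict_emotion(X[0]))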

Music Information Retrieval

This work is mostly focused on retrieval from melodic databases using a sung or hummed query as the search key. This raises many issues relating to melodic similarity, music representation, and pitch recognition. Many of the melodic similarity techniques are related to earlier work in Computer Accompaniment.

Dannenberg, Foote, Tzanetakis, and Weare, ``Panel: New Directions in Music Information Retrieval,'' in Proceedings of the 2001 International Computer Music Conference, International Computer Music Association, (September 2001), pp. 52-59.

Mazzoni and Dannenberg, ``Melody Matching Directly from Audio,'' in ISMIR 2001 2nd Annual International Symposium on Music Information Retrieval, Bloomington: Indiana University, (2001), pp. 73-82.

ABSTRACT. In this paper we explore a technique for content-based music retrieval
using a continuous pitch contour derived from a recording of the audio query instead of a
quantization of the query into discrete notes. Our system determines the pitch for each
unit of time in the query and then uses a time-warping algorithm to match this string of
pitches against songs in a database of MIDI files. This technique, while much slower at matching, is usually far more accurate than techniques based on discrete notes. It would be an ideal technique for providing the final ranking of candidate results produced by a faster but less robust matching algorithm.

[Acrobat (PDF) Version]
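
A minimal sketch of the matching step described in the abstract: dynamic time warping over a frame-by-frame pitch contour, so that tempo variation in the sung query is absorbed by the warping path. The local cost, the path-length normalization, and the mean-subtraction used for crude transposition invariance are illustrative simplifications, not necessarily the paper's choices.

import numpy as np

def dtw_distance(query, target):
    # Classic dynamic time warping over two 1-D pitch contours (in semitones).
    n, m = len(query), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # normalize by a bound on path length

# Toy query: the target contour sung twice as slowly and slightly sharp.
target = np.array([60, 60, 62, 64, 64, 62, 60], float)
query = np.repeat(target, 2) + 0.3
print(dtw_distance(query - np.mean(query), target - np.mean(target)))
# -> near 0 despite the tempo change and pitch offset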

Birmingham, Dannenberg, Wakefield, Bartsch, Bykowski, Mazzoni, Meek, Mellody, and Rand, ``MUSART: Music Retrieval via Aural Queries,'' in ISMIR 2001 2nd Annual International Symposium on Music Information Retrieval, Bloomington: Indiana University, (2001), pp. 73-82.

Dannenberg, ``Music Information Retrieval as Music Understanding,'' in ISMIR 2001 2nd Annual International Symposium on Music Information Retrieval, Bloomington: Indiana University, (2001), pp. 139-142.

Hu and Dannenberg, ``A Comparison of Melodic Database Retrieval Techniques Using Sung Queries,'' in Joint Conference on Digital Libraries, New York: ACM Press, (2002), pp. 301-307.

ABSTRACT. Query-by-humming systems search a database of music for good matches
to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch
and tempo can cause substantial mismatch between queries and targets. Thus, algorithms
for measuring melodic similarity in query-by-humming systems should be robust. We
compare several variations of search algorithms in an effort to improve search precision.
In particular, we describe a new frame-based algorithm that significantly outperforms
note-by-note algorithms in tests using sung queries and a database of MIDI-encoded
music.

[Acrobat (PDF) Version]
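
The frame-based algorithm mentioned in the abstract represents a melody as pitch values sampled at a fixed frame rate rather than as a sequence of discrete notes, so note durations are encoded implicitly. Below is a hypothetical sketch of that conversion; the frame rate and the handling of rests are arbitrary choices here, and the resulting arrays can then be compared with a contour matcher such as the DTW sketch above.

import numpy as np

def notes_to_frames(notes, frame_sec=0.1):
    # notes: list of (onset_sec, duration_sec, midi_pitch) -> 1-D pitch array.
    end = max(onset + dur for onset, dur, _ in notes)
    frames = np.zeros(int(np.ceil(end / frame_sec)))
    for onset, dur, pitch in notes:
        i0 = int(onset / frame_sec)
        i1 = int((onset + dur) / frame_sec)
        frames[i0:max(i1, i0 + 1)] = pitch  # rests remain 0
    return frames

melody = [(0.0, 0.4, 60), (0.5, 0.4, 62), (1.0, 0.9, 64)]
print(notes_to_frames(melody))

A long note now contributes many frames, so the matcher naturally weights it more heavily, which is one intuition for why frame-based matching can beat note-by-note matching on noisy sung queries.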

Hu, Dannenberg, and Lewis, ``A Probabilistic Model of Melodic Similarity,'' in Proceedings of the 2002 International Computer Music Conference, San Francisco: International Computer Music Association, (2002), pp. 509-515.

ABSTRACT. Melodic similarity is an important concept for music databases,
musicological studies, and interactive music systems. Dynamic programming is
commonly used to compare melodies, often with a distance function based on pitch
differences measured in semitones. This approach computes an "edit distance" as a
measure of melodic dissimilarity. The problem can also be viewed in probabilistic terms:
What is the probability that a melody is a "mutation" of another melody, given a table of
mutation probabilities? We explain this approach and demonstrate how it can be used to
search a database of melodies. Our experiments show that the probabilistic model
performs better than a typical "edit distance" comparison.

[Acrobat (PDF) Version]
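
To illustrate the probabilistic framing: instead of summing edit costs, the dynamic program can maximize the log-probability that one melody is a mutation of the other. The sketch below uses made-up insertion, deletion, and substitution probabilities and a Viterbi-style max over alignments; the paper's actual mutation table and model details differ.

import numpy as np

LOG_INS = LOG_DEL = np.log(0.05)  # invented insertion/deletion probabilities

def log_sub(a, b):
    # Log-probability that pitch a mutates to pitch b (toy table: nearby
    # pitches are more likely mutations than distant ones).
    return np.log(0.9) if a == b else np.log(0.4 / (abs(a - b) + 1))

def melody_log_prob(source, observed):
    n, m = len(source), len(observed)
    D = np.full((n + 1, m + 1), -np.inf)
    D[0, 0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                D[i, j] = max(D[i, j], D[i - 1, j] + LOG_DEL)
            if j > 0:
                D[i, j] = max(D[i, j], D[i, j - 1] + LOG_INS)
            if i > 0 and j > 0:
                D[i, j] = max(D[i, j], D[i - 1, j - 1]
                              + log_sub(source[i - 1], observed[j - 1]))
    return D[n, m]  # log-prob of the best mutation path (Viterbi-style)

print(melody_log_prob([60, 62, 64], [60, 62, 65]))

The structure is the same dynamic program as edit distance; only the quantity being optimized changes, which is what lets the probabilistic model reuse existing search machinery.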

Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo, ``The MUSART Testbed for Query-by-Humming Evaluation,'' in ISMIR 2003: Proceedings of the Fourth International Conference on Music Information Retrieval, Baltimore: Johns Hopkins University, (2003), pp. 41-50.

A slightly expanded and revised version of this paper (not online) is published in Computer Music Journal:

Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo, ``The MUSART Testbed
for Query-By-Humming Evaluation,'' Computer Music Journal, 28(2) (Summer 2004),
pp. 34-48.

ABSTRACT. Evaluating music information retrieval systems is acknowledged to be a difficult problem. We have created a database and a software testbed for the systematic evaluation of various query-by-humming (QBH) search systems. As might be expected, different queries and different databases lead to wide variations in observed search precision. "Natural" queries from two sources led to lower performance than that typically reported in the QBH literature. These results point out the importance of careful measurement and objective comparisons to study retrieval algorithms. This study compares search algorithms based on note-interval matching with dynamic programming, fixed-frame melodic contour matching with dynamic time warping, and a hidden Markov model. An examination of scaling trends is encouraging: precision falls off very slowly as the database size increases. This trend is simple to compute and could be useful to predict performance on larger databases.
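
For a sense of what such a testbed automates, here is a hypothetical sketch of the core evaluation loop: each query is run against the database with a pluggable matcher, and results are summarized with mean reciprocal rank, a metric commonly used for query-by-humming evaluation. The interfaces and the metric choice here are illustrative, not the testbed's actual API.

def mean_reciprocal_rank(queries, database, match, correct):
    # match(q, d) -> similarity score; correct[i] is the target id of query i.
    total = 0.0
    for i, q in enumerate(queries):
        ranked = sorted(database, key=lambda d: match(q, d), reverse=True)
        rank = [d["id"] for d in ranked].index(correct[i]) + 1
        total += 1.0 / rank
    return total / len(queries)

# Toy usage with a trivial matcher on scalar "melodies".
db = [{"id": "a", "mel": 1.0}, {"id": "b", "mel": 5.0}]
qs = [0.9, 5.2]
print(mean_reciprocal_rank(qs, db, lambda q, d: -abs(q - d["mel"]), ["a", "b"]))
# -> 1.0 (both queries rank their targets first)

Because the matcher is a parameter, the same loop can compare string alignment, contour matching, HMMs, and so on under identical conditions, which is the point of a shared testbed.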

Birmingham, Dannenberg, and Pardo, ``Query by Humming With the VocalSearch System,'' Communications of the ACM, 49(8) (August 2006), pp. 49-52.

Dannenberg, Birmingham, Pardo, Hu, Meek, and Tzanetakis, ``A Comparative Evaluation of Search Techniques for Query-by-Humming Using the MUSART Testbed,'' Journal of the American Society for Information Science and Technology, 58(3) (February 2007), to appear.

ABSTRACT. Query-by-Humming systems offer content-based searching for melodies
and require no special musical training or knowledge. Many such systems have been
built, but there has not been much useful evaluation and comparison in the literature due
to the lack of shared databases and queries. The MUSART project testbed allows various
search algorithms to be compared using a shared framework that automatically runs
experiments and summarizes results. Using this testbed, we compared algorithms based
on string alignment, melodic contour matching, a hidden Markov model, n-grams, and
CubyHum. Retrieval performance is very sensitive to distance functions and the
representation of pitch and rhythm, which raises questions about some previously
published conclusions. Some algorithms are particularly sensitive to the quality of
queries. Our queries, which are taken from human subjects in a fairly realistic setting, are
quite difficult, especially for n-gram models. Finally, simulations of query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases.

[Acrobat (PDF) Version]

Structural Analysis
Guided by similarity and repetition, listeners can discover structure in music. This research aims to build music listening models that, starting with audio such as CD recordings, find patterns and generate explanations of the music. Explanations include analyses of structure, e.g., an "AABA" form, as well as other relationships.

Dannenberg, ``Listening to `Naima': An Automated Structural Analysis of Music from Recorded Audio,'' in Proceedings of the 2002 International Computer Music Conference, San Francisco: International Computer Music Association, (2002), pp. 28-34.

ABSTRACT. A model of music listening has been automated. A program takes digital
audio as input, for example from a compact disc, and outputs an explanation of the music
in terms of repeated sections and the implied structure. For example, when the program
constructs an analysis of John Coltrane's "Naima," it generates a description that relates
to the AABA form and notices that the initial AA is omitted the second time. The
algorithms are presented and results with two other input songs are also described. This
work suggests that music listening is based on the detection of relationships and that
relatively simple analyses can successfully recover interesting musical structure.

[Acrobat (PDF) Version]
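
One common way to mechanize this kind of analysis, sketched hypothetically below, is to build a self-similarity matrix over feature frames: a repeated section shows up as a run of high similarity along a diagonal whose offset is the lag between the two occurrences. The feature frames, threshold, and stripe search here are illustrative and far simpler than the algorithms in the paper.

import numpy as np

def self_similarity(frames):
    # frames: (n, d) feature matrix -> (n, n) cosine-similarity matrix.
    f = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-9)
    return f @ f.T

def repeated_pair(S, lag, min_len=3, thresh=0.95):
    # Find a run of >= min_len highly similar frames along the given diagonal.
    diag = np.diagonal(S, offset=lag)
    run = 0
    for i, v in enumerate(diag):
        run = run + 1 if v > thresh else 0
        if run >= min_len:
            start = i - run + 1
            return start, start + lag  # (section start, start of its repetition)
    return None

# Toy "AABA" feature sequence: section A (4 frames) repeats at lag 4.
A = np.eye(4)
B = np.roll(A, 1, axis=1)
frames = np.vstack([A, A, B, A])
print(repeated_pair(self_similarity(frames), lag=4))  # -> (0, 4)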

Dannenberg and Hu, ``Discovering Musical Structure in Audio Recordings,'' in Music and Artificial Intelligence: Second International Conference, C. Anagnostopoulou, M. Ferrand, and A. Smaill, eds., Lecture Notes in Computer Science, Vol. 2445: Lecture Notes in Artificial Intelligence, Berlin: Springer Verlag, (2002), pp. 43-57.

ABSTRACT. Music is often described in terms of the structure of repeated phrases. For
example, many songs have the form AABA, where each letter represents an instance of a
phrase. This research aims to construct descriptions or explanations of music in this form,
using only audio recordings as input. A system of programs is described that transcribes
the melody of a recording, identifies similar segments, clusters these segments to form
patterns, and then constructs an explanation of the music in terms of these patterns.
Additional work using spectral information rather than melodic transcription is also
described. Examples of successful machine “listening” and music analysis are presented.

[Acrobat (PDF) Version]

Dannenberg and Hu, ``Pattern Discovery Techniques for Music Audio,'' in ISMIR 2002
Conference Proceedings: Third International Conference on Music Information
Retrieval, M. Fingerhut, ed., Paris: IRCAM, (2002), pp. 63-70.

A slightly expanded and revised version of this paper (not online) is published in JNMR:

Dannenberg and Hu, ``Pattern Discovery Techniques for Music Audio,'' Journal of New
Music Research, (June 2003), pp. 153-164.

ABSTRACT. Human listeners are able to recognize structure in music through the
perception of repetition and other relationships within a piece of music. This work aims
to automate the task of music analysis. Music is “explained” in terms of embedded
relationships, especially repetition of segments or phrases. The steps in this process are
the transcription of audio into a representation with a similarity or distance metric, the
search for similar segments, forming clusters of similar segments, and explaining music
in terms of these clusters. Several transcription methods are considered: monophonic
pitch estimation, chroma (spectral) representation, and polyphonic transcription followed
by harmonic analysis. Also, several algorithms that search for similar segments are
described. These techniques can be used to perform an analysis of musical structure, as
illustrated by examples.

For completeness, here's the abstract from the JNMR version:

ABSTRACT. Human listeners are able to recognize structure in music through the
perception of repetition and other relationships within a piece of music. This work aims
to automate the task of music analysis. Music is “explained” in terms of embedded
relationships, especially repetition of segments or phrases. The steps in this process are
the transcription of audio into a representation with a similarity or distance metric, the
search for similar segments, forming clusters of similar segments, and explaining music
in terms of these clusters. Several pre-existing signal analysis methods have been used:
monophonic pitch estimation, chroma (spectral) representation, and polyphonic
transcription followed by harmonic analysis. Also, several algorithms that search for
similar segments are described. Experience with these various approaches suggests that
there are many ways to recover structure from music audio. Examples are offered using
classical, jazz, and rock music.

[Acrobat (PDF) Version]

Dannenberg and Goto, ``Music Structure Analysis from Acoustic Signals,'' in Handbook of Signal Processing in Acoustics, Vol. 1, Springer Verlag, 2009, pp. 305-331.

This book chapter attempts to summarize various techniques and approaches.

ABSTRACT. Music is full of structure, including sections, sequences of distinct musical textures, and the repetition of phrases or entire sections. The analysis of music audio relies upon feature vectors that convey information about music texture or pitch content.
relies upon feature vectors that convey information about music texture or pitch content.
Texture generally refers to the average spectral shape and statistical fluctuation, often
reflecting the set of sounding instruments, e.g. strings, vocal, or drums. Pitch content
reflects melody and harmony, which is often independent of texture. Structure is found in
several ways. Segment boundaries can be detected by observing marked changes in
locally averaged texture. Similar sections of music can be detected by clustering
segments with similar average textures. The repetition of a sequence of music often
marks a logical segment. Repeated phrases and hierarchical structures can be discovered
by finding similar sequences of feature vectors within a piece of music. Structure analysis
can be used to construct music summaries and to assist music browsing.

[Acrobat (PDF) Version]
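
The chapter's observation that segment boundaries can be detected "by observing marked changes in locally averaged texture" can be sketched directly: compare the average feature vector on each side of every frame and look for peaks in the resulting novelty curve. The window size and features below are arbitrary illustrative choices.

import numpy as np

def novelty(frames, half=4):
    # frames: (n, d) feature matrix -> one novelty score per frame.
    n = len(frames)
    scores = np.zeros(n)
    for i in range(half, n - half):
        left = frames[i - half:i].mean(axis=0)
        right = frames[i:i + half].mean(axis=0)
        scores[i] = np.linalg.norm(left - right)  # texture change at frame i
    return scores

# Toy signal: the texture changes abruptly at frame 20.
frames = np.vstack([np.tile([1.0, 0.0], (20, 1)),
                    np.tile([0.0, 1.0], (20, 1))])
print(int(np.argmax(novelty(frames))))  # -> 20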

Music Alignment
Music alignment is a capability that forms a bridge between signals and symbols. For
example, by aligning an audio recording with a MIDI file, you obtain a transcription of
the audio. By aligning two audio recordings, you can detect differences in tempo and
interpretation. Computer accompaniment also relies on alignment. The papers listed here
exploit some of the techniques introduced for Computer Accompaniment, but explore
other applications and the possibility of working with polyphonic music.

Hu, Dannenberg, and Tzanetakis. ``Polyphonic Audio Matching and Alignment for
Music Retrieval,'' in 2003 IEEE Workshop on Applications of Signal Processing to Audio
and Acoustics, New York: IEEE (2003), pp. 185-188.

ABSTRACT. We describe a method that aligns polyphonic audio recordings of music to symbolic score information in standard MIDI files without the difficult process of polyphonic transcription. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording.

We must have run out of space for a longer abstract. This paper covers two interesting
experiments. One compares different features for alignment and concludes that the
chromagram is better than multiple pitch estimation, spectra, and mel cepstra. The paper also includes an experiment where the quality of match is used to search for MIDI files that match audio. It works, but not very reliably.

[Acrobat (PDF) Version]
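
Since the experiment above singles out the chromagram, here is a hypothetical sketch of how one is computed: short-time spectra are folded onto the 12 pitch classes, discarding octave information and much of the timbre, which is what makes the feature robust when matching real audio against audio synthesized from MIDI. The sample rate, FFT size, and hop size are illustrative.

import numpy as np

def chromagram(signal, sr=22050, n_fft=4096, hop=2048):
    frames = []
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    for start in range(0, len(signal) - n_fft, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + n_fft] * window))
        chroma = np.zeros(12)
        for f, mag in zip(freqs[1:], spectrum[1:]):  # skip the DC bin
            pitch = 69 + 12 * np.log2(f / 440.0)     # MIDI pitch of this bin
            chroma[int(round(pitch)) % 12] += mag    # fold onto 12 pitch classes
        frames.append(chroma / (chroma.sum() + 1e-9))
    return np.array(frames)

# Toy check: a 440 Hz tone should put most energy in pitch class A (9).
t = np.arange(22050) / 22050.0
print(np.argmax(chromagram(np.sin(2 * np.pi * 440 * t)).mean(axis=0)))  # -> 9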

Dannenberg and Hu. ``Polyphonic Audio Matching for Score Following and Intelligent
Audio Editors,'' in Proceedings of the 2003 International Computer Music Conference,
San Francisco: International Computer Music Association, (2003), pp. 27-34.

This paper was actually submitted before the WASPAA paper, so it does not have some results on comparing different distance metrics. Instead, this paper stresses some different applications, one being the possibility of intelligent audio editors that align audio to symbolic notation or MIDI files to help with search, indexing, aligning multiple takes of live recordings, etc.

ABSTRACT. Getting computers to understand and process audio recordings in terms of
their musical content is a difficult challenge. We describe a method in which general,
polyphonic audio recordings of music can be aligned to symbolic score information in
standard MIDI files. Because of the difficulties of polyphonic transcription, we convert
MIDI to audio and perform matching directly on acoustic features. Polyphonic audio
matching can be used for polyphonic score following, building intelligent editors that
understand the content of recorded audio, and the analysis of expressive performance.

[Acrobat (PDF) Version]
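
To connect the pieces: once the MIDI file has been rendered to audio and both recordings reduced to feature sequences (e.g., chromagrams as above), dynamic time warping produces a path pairing frames of the two, and any MIDI event time can be mapped through that path to a time in the real recording. The sketch below is a generic DTW with backtracking plus a toy mapping; the rendering step is assumed to happen elsewhere, and all parameters are illustrative.

import numpy as np

def dtw_path(A, B):
    # A: (n, d), B: (m, d) feature sequences -> list of (i, j) frame pairs.
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if k == 0 else (i - 1, j) if k == 1 else (i, j - 1)
    return path[::-1]

def midi_time_to_audio_time(t, path, frame_sec=0.1):
    # Map a MIDI-side time to audio time via the nearest path pair.
    j = int(t / frame_sec)
    i = min((abs(pj - j), pi) for pi, pj in path)[1]
    return i * frame_sec

# Toy example: the "audio" plays the same features at half speed.
A = np.array([[0], [0], [1], [1], [2], [2]], float)  # audio frames
B = np.array([[0], [1], [2]], float)                 # rendered-MIDI frames
p = dtw_path(A, B)
print(midi_time_to_audio_time(0.1, p))  # -> 0.2 (MIDI frame 1 maps to audio frame 2)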

Dannenberg and Raphael, ``Music Score Alignment and Computer Accompaniment,'' Communications of the ACM, 49(8) (August 2006), pp. 38-43.

Dannenberg and Hu, ``Bootstrap Learning for Accurate Onset Detection,'' Machine Learning, 65(2-3) (December 2006), pp. 457-471.

See also:
Concatenative Synthesis Using Score-Aligned Transcriptions, a synthesis technique
where alignment is used to build a dictionary mapping time slices of MIDI files to units
of audio, which are selected and concatenated to "resynthesize" other MIDI files.

Remixing Stereo Music with Score-Informed Source Separation, where alignment is used
to help with source separation, with the goal of editing individual instruments within a
stereo audio mix.

Bootstrap Learning for Accurate Onset Detection, which uses alignment to find note
onsets, which are then used as training data for automatic onset detection.
