General Overviews
These are all overviews of my work in this area. The IAKTA/LIST paper is the most
up-to-date, although even it is now quite dated. Since then, I've worked on
Query-by-Humming, polyphonic score alignment, music search by polyphonic alignment,
music structure analysis, and beat tracking informed by music structure.
Dannenberg, ``Computerbegleitung und Musikverstehen'' (Computer Accompaniment and
Music Understanding), in Neue Musiktechnologie, Bernd Enders, ed., Mainz: Schott,
1993, pp. 241-252.
[Postscript Version]
Style Classification
Getting a computer music system to listen to a performance and determine aspects of
style, such as the style of an improvisation or the emotional character of a song:
Dannenberg, Thom, and Watson, ``A Machine Learning Approach to Musical Style
Recognition,'' in 1997 International Computer Music Conference, International Computer
Music Association (September 1997), pp. 344-347.
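The approach in the preceding paper is to summarize a performance as a small vector of
low-level features and train a classifier on hand-labeled examples. Here is a minimal
sketch of that recipe in Python, with scikit-learn standing in for the paper's own
classifiers; the features shown are illustrative choices, not the paper's exact set.

    # Sketch: classify performance style from simple note-level features.
    # Assumes each excerpt is a list of (onset_sec, duration_sec, midi_pitch).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def features(notes):
        onsets, durs, pitches = (np.array(v, dtype=float) for v in zip(*notes))
        span = onsets.max() - onsets.min() + 1e-9
        return [len(notes) / span,   # note density
                durs.mean(),         # average duration
                pitches.mean(),      # average pitch
                pitches.std()]       # pitch spread

    # With labeled excerpts (X = [features(e) for e in excerpts], y = labels):
    # clf = MLPClassifier(max_iter=2000).fit(X, y)
    # style = clf.predict([features(new_excerpt)])[0]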
Han, Rho, Dannenberg, and Hwang, ``SMERS: Music Emotion Recognition Using
Support Vector Regression'' in Proceedings of the 10th International Conference on
Music Information Retrieval (ISMIR 2009), (October 2009), pp. 651-656.
ABSTRACT: Music emotion plays an important role in music retrieval, mood detection
and other music-related applications. Many issues for music emotion recognition have
been addressed by different disciplines such as physiology, psychology, cognitive science
and musicology. We present a support vector regression (SVR) based music emotion
recognition system. The recognition process consists of three steps: (i) seven distinct
features are extracted from music; (ii) those features are mapped into eleven emotion
categories on Thayer's two-dimensional emotion model; (iii) two regression functions are
trained using SVR and then arousal and valence values are predicted. We have tested our
SVR-based emotion classifier in both Cartesian and polar coordinate systems empirically.
The results indicate that the SVR classifier in the polar representation produces
satisfactory results, reaching 94.55% accuracy, superior to the SVR in Cartesian
coordinates and to other machine learning classification algorithms such as SVM and GMM.
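Step (iii) is easy to reproduce in outline: train one regressor for arousal and one for
valence, then read an emotion category off the predicted point. A minimal sketch, using
scikit-learn's SVR as a stand-in and assuming the per-song feature vectors and
annotations are already available:

    # Sketch: predict (arousal, valence) with two support vector regressors.
    from sklearn.svm import SVR

    def train_emotion_regressors(X, arousal, valence):
        # X: one feature vector per training song (e.g., the paper's seven
        # features); arousal, valence: annotated Thayer-model coordinates.
        return SVR().fit(X, arousal), SVR().fit(X, valence)

    # a_model, v_model = train_emotion_regressors(X, arousal, valence)
    # point = (a_model.predict([x])[0], v_model.predict([x])[0])
    # The predicted point is then mapped to the nearest of the eleven
    # emotion categories on the two-dimensional model.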
Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo. ``The MUSART testbed for
query-by-humming evaluation,'' in ISMIR 2003: Proceedings of the Fourth International
Conference on Music Information Retrieval, Baltimore: Johns Hopkins University,
(2003), pp. 41-50.
A slightly expanded and revised version of this paper (not online) is published in
Computer Music Journal:
Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo, ``The MUSART Testbed
for Query-By-Humming Evaluation,'' Computer Music Journal, 28(2) (Summer 2004),
pp. 34-48.
Structural Analysis
Using similarity and repetition to guide them, listeners can discover structure in music.
This research aims to build music listening models that, starting with audio such as CD
recordings, find patterns and generate explanations of the music. Explanations include
analyses of structure, e.g., an "AABA" form, as well as other relationships.
Dannenberg, ``Listening to "Naima": An Automated Structural Analysis of Music from
Recorded Audio,'' in Proceedings of the 2002 International Computer Music Conference,
San Francisco: International Computer Music Association, (2002).
ABSTRACT. A model of music listening has been automated. A program takes digital
audio as input, for example from a compact disc, and outputs an explanation of the music
in terms of repeated sections and the implied structure. For example, when the program
constructs an analysis of John Coltrane's "Naima," it generates a description that relates
to the AABA form and notices that the initial AA is omitted the second time. The
algorithms are presented and results with two other input songs are also described. This
work suggests that music listening is based on the detection of relationships and that
relatively simple analyses can successfully recover interesting musical structure.
Dannenberg and Hu, ``Pattern Discovery Techniques for Music Audio,'' in ISMIR 2002
Conference Proceedings: Third International Conference on Music Information
Retrieval, M. Fingerhut, ed., Paris: IRCAM, (2002), pp. 63-70.
A slightly expanded and revised version of this paper (not online) is published in
JNMR:
Dannenberg and Hu, ``Pattern Discovery Techniques for Music Audio,'' Journal of New
Music Research, (June 2003), pp. 153-164.
ABSTRACT. Human listeners are able to recognize structure in music through the
perception of repetition and other relationships within a piece of music. This work aims
to automate the task of music analysis. Music is “explained” in terms of embedded
relationships, especially repetition of segments or phrases. The steps in this process are
the transcription of audio into a representation with a similarity or distance metric, the
search for similar segments, forming clusters of similar segments, and explaining music
in terms of these clusters. Several pre-existing signal analysis methods have been used:
monophonic pitch estimation, chroma (spectral) representation, and polyphonic
transcription followed by harmonic analysis. Also, several algorithms that search for
similar segments are described. Experience with these various approaches suggests that
there are many ways to recover structure from music audio. Examples are offered using
classical, jazz, and rock music.
[Acrobat (PDF) Version]
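The similarity step described in the abstract above can be sketched in a few lines:
reduce the audio to chroma vectors and compare every frame to every other frame, so
that repeated sections appear as diagonal stripes in a self-similarity matrix. A
minimal sketch assuming the librosa library; the later segment-search and clustering
stages are not shown, and "song.wav" is a hypothetical input.

    # Sketch: chroma self-similarity matrix for structure analysis.
    import librosa
    import numpy as np

    y, sr = librosa.load("song.wav")
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)   # shape (12, n_frames)

    # Normalize columns so a dot product of two frames is their cosine
    # similarity; S[i, j] is then the similarity of frame i to frame j.
    c = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)
    S = c.T @ c

    # A long run of high values parallel to the main diagonal (S[i, i + lag]
    # for many consecutive i) is evidence of a repetition at that lag.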
Dannenberg and Goto, ``Music Structure Analysis from Acoustic Signals,'' in Handbook
of Signal Processing in Acoustics, Vol. 1, Springer Verlag, 2009, pp. 305-331.
Music Alignment
Music alignment is a capability that forms a bridge between signals and symbols. For
example, by aligning an audio recording with a MIDI file, you obtain a transcription of
the audio. By aligning two audio recordings, you can detect differences in tempo and
interpretation. Computer accompaniment also relies on alignment. The papers listed here
exploit some of the techniques introduced for Computer Accompaniment, but explore
other applications and the possibility of working with polyphonic music.
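To make the idea concrete, here is a minimal sketch of audio-to-audio alignment,
assuming the librosa library: both recordings are reduced to chroma features, and
dynamic time warping finds the correspondence between them. This is a generic
illustration, not the specific algorithm of the papers below.

    # Sketch: align two recordings of the same piece with chroma + DTW.
    import librosa
    import numpy as np

    def align(path_a, path_b, hop=512):
        ya, sr = librosa.load(path_a)
        yb, _ = librosa.load(path_b, sr=sr)
        ca = librosa.feature.chroma_stft(y=ya, sr=sr, hop_length=hop)
        cb = librosa.feature.chroma_stft(y=yb, sr=sr, hop_length=hop)
        # DTW returns the cumulative cost matrix and the warping path, a
        # list of (frame in A, frame in B) pairs from end to start.
        D, wp = librosa.sequence.dtw(X=ca, Y=cb)
        # Convert frame indices to seconds, earliest pairs first.
        return np.asarray(wp)[::-1] * hop / sr

    # Each row of align("take1.wav", "take2.wav") maps a time in one recording
    # to the corresponding time in the other; the slope of this mapping
    # reveals local differences in tempo and interpretation.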
Hu, Dannenberg, and Tzanetakis. ``Polyphonic Audio Matching and Alignment for
Music Retrieval,'' in 2003 IEEE Workshop on Applications of Signal Processing to Audio
and Acoustics, New York: IEEE (2003), pp. 185-188.
We must have run out of space for a longer abstract. This paper covers two interesting
experiments. One compares different features for alignment and concludes that the
chromagram is better than multiple pitch estimation, spectra, and mel cepstra. The paper
also includes an experiment where the quality of match is used to search for MIDI files
that match audio. It works, but not very reliably.
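The retrieval experiment's idea can be sketched the same way: align the audio query
against each candidate MIDI file and rank candidates by alignment cost, normalized by
path length so that long and short candidates are comparable. A minimal sketch assuming
librosa and pretty_midi; frame rates are simplified here.

    # Sketch: rank MIDI candidates by how cheaply they align to a query.
    import librosa
    import pretty_midi

    def match_cost(query_chroma, midi_path, fs=10):
        # Chroma computed from the MIDI piano roll, fs frames per second.
        midi_chroma = pretty_midi.PrettyMIDI(midi_path).get_chroma(fs=fs)
        D, wp = librosa.sequence.dtw(X=query_chroma, Y=midi_chroma)
        return D[-1, -1] / len(wp)   # total alignment cost per path step

    # Best match first:
    # candidates.sort(key=lambda p: match_cost(query_chroma, p))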
Dannenberg and Hu. ``Polyphonic Audio Matching for Score Following and Intelligent
Audio Editors,'' in Proceedings of the 2003 International Computer Music Conference,
San Francisco: International Computer Music Association, (2003), pp. 27-34.
This paper was actually submitted before the WASPAA paper, so it does not have some
results on comparing different distance metrics. Instead, this paper stresses some different
applications, one being the possibility of intelligent audio editors that align audio to
symbolic notation or MIDI files to help with search, indexing, aligning multiple takes of
live recordings, etc.
Dannenberg and Hu, ``Bootstrap Learning for Accurate Onset Detection,'' Machine
Learning 65(2-3) (December 2006), pp. 457-471.
See also:
Concatenative Synthesis Using Score-Aligned Transcriptions, a synthesis technique
where alignment is used to build a dictionary mapping time slices of MIDI files to units
of audio, which are selected and concatenated to "resynthesize" other MIDI files.
Remixing Stereo Music with Score-Informed Source Separation, where alignment is used
to help with source separation, with the goal of editing individual instruments within a
stereo audio mix.
Bootstrap Learning for Accurate Onset Detection, which uses alignment to find note
onsets, which are then used as training data for automatic onset detection.