
COSIN: Content-Based Retrieval System for Cover Songs ∗

Yi Yu
Department of Information and Computer Sciences, Nara Women’s University, Japan
yuyi@ics.nara-wu.ac.jp

J. Stephen Downie
Graduate School of Library and Information Science, UIUC, USA
jdownie@uiuc.edu

Fabian Moerchen
Siemens Corporate Research, 755 College Road East, Princeton, USA
fabian.moerchen@siemens.com

Lei Chen
Department of Computer Science, Hong Kong University of Science and Technology
leichen@cs.ust.hk

Kazuki Joe
Department of Information and Computer Sciences, Nara Women’s University, Japan
joe@ics.nara-wu.ac.jp

Vincent Oria
Department of Computer Science, New Jersey Institute of Technology, USA
oria@njit.edu

ABSTRACT
We develop a content-based audio COver Song IdeNtification (COSIN) system to detect and group cover songs. COSIN takes music audio content as input and performs similarity searching to locate variants of the input (i.e., cover versions). Identified cover songs are returned in ranked order according to their similarity to the input. COSIN also incorporates a set of tools to evaluate retrieval performance (e.g., recall and precision) so that researchers can explore different retrieval schemes and parameters. COSIN uses a suite of techniques to detect cover songs, including Pitch + Dynamic Programming (DP), Chroma + DP, and Semantic Feature Summarization (SFS) + Hash-Based Approximate Matching (HBAM). The demonstration system shows that COSIN is a promising tool for music content retrieval. In recent experiments running several retrieval schemes on the COSIN platform, SFS + LSH variants demonstrate a well-balanced tradeoff between efficiency (search speed) and performance (search accuracy).

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval; H.5.5 [Information Systems]: Sound and Music Computing

General Terms
Algorithms, Performance, Experimentation

Keywords
Content-based audio retrieval, cover songs, musical audio sequences summarization, hash-based indexing

∗This is a postgraduate project at NWU. The second author was supported by the Andrew W. Mellon Foundation, and the sixth author was partially supported by a grant from DoD-ARL through the KIMCOE Center of Excellence.

Copyright is held by the author/owner(s).
MM’08, October 26–31, 2008, Vancouver, British Columbia, Canada.
ACM 978-1-60558-303-7/08/10.

1. INTRODUCTION
The open Internet provides a powerful platform for online entertainment, where users upload their own music recordings or live performances and share them with the world. Web sites such as www.yyfc.com and www.fancovers.com are very popular among young people. With large numbers of audio tracks emerging in such music social communities, content-based cover song (http://en.wikipedia.org/wiki/Cover_version) retrieval has become even more important. Similar audio tracks can be detected or grouped to improve the search experience.

Audio cover song detection was first introduced as a new Music Information Retrieval (MIR) task, and served as a contest section, in the Music Information Retrieval Evaluation eXchange (MIREX) 2006. We are in the process of designing a content-based COver Song IdeNtification (COSIN) system for visualizing and comparing the performance of retrieval schemes and evaluating the corresponding system parameters. Several techniques related to cover song detection and retrieval are currently implemented in COSIN. These include audio feature extraction (Chroma [1], Pitch [2], Semantic Feature Summarization (SFS)), similarity searching methods (Dynamic Programming (DP) [1], K-Nearest Neighbor (KNN), and Hash-Based Approximate Matching (HBAM), which consists of LSH [4] and the LSH variants SoftLSH and Exact Locality Sensitive Mapping (ELSM) [6]), and system performance metrics. Evaluation over a collection of real cover songs indicates that COSIN is a suitable tool for evaluating audio-content-based retrieval schemes. A COSIN demonstration video and some interesting examples of cover songs can be found at http://www.e.ics.nara-wu.ac.jp/~yuyi/.

2. RELATED TECHNIQUES
We now review the typical techniques for content-based cover song detection implemented in COSIN. The task is to identify or group cover versions of songs according to their relevance. To guarantee detection accuracy, popular detection methods use DP [1][2][3] or its variants [5] to perform audio sequence comparison. However, this is very time-consuming because the feature sequences (Pitch [2][3], Chroma [1], MFCC [3]) of musical audio are very high-dimensional. To speed up audio feature sequence comparison, SFS is extracted from the audio sequences as a concise representation, similar to [6]. With SFS as a fixed-length summarization of audio documents, DP is no longer necessary. In addition, index-based approximate techniques (such as [4]) can be applied to avoid exhaustive search. The main idea of the index-based approach is to extract an audio representation from the acoustic data that can be indexed and retrieved efficiently. Experimental evaluation demonstrates that SFS + LSH variants, which need not perform the traditional exhaustive sequence comparisons, exhibit a better tradeoff between efficiency and effectiveness.
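As a rough illustration of why pairwise DP comparison of feature sequences is costly, the following is a minimal sketch written as a textbook dynamic time warping (DTW) recurrence. It is not claimed to be the specific DP formulation used in [1][2][3] or in COSIN; the function name `dtw_distance` and the Euclidean frame distance are assumptions made for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Textbook DTW cost between two sequences of feature frames.

    Each sequence is a list of frames; each frame is a list of floats
    of the same dimension. Hypothetical helper, not the COSIN code.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two frames (assumed Euclidean)
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy usage: the second sequence repeats its first frame, so the alignment is free.
toy_cost = dtw_distance([[0.0, 0.0], [1.0, 1.0]],
                        [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
```

The table has one cell per pair of frames, so the work grows with the product of the two sequence lengths times the frame dimension. With the framing described in Section 3 (30-second clips, 23.2 ms step), each clip has roughly 1,290 frames, which means roughly 1.7 million cells for a single pair of tracks, repeated for every track in the collection.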

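The alternative outlined in Section 2, a fixed-length summary per track combined with hash-based lookup, can be sketched as follows. This is only an illustration under stated assumptions: `summarize` uses per-dimension mean and standard deviation as a stand-in for SFS (the actual SFS construction follows [6] and is not reproduced here), and `HyperplaneLSH` uses random-hyperplane hashing, a standard LSH family chosen for brevity, not the scheme of [4] or the SoftLSH/ELSM variants of [6]. All names are hypothetical.

```python
import math
import random
from collections import defaultdict


def summarize(frames):
    """Collapse a variable-length frame sequence into one fixed-length vector.

    Stand-in for SFS (assumption): per-dimension mean followed by
    per-dimension standard deviation.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dims)]
    return means + stds


class HyperplaneLSH:
    """Single-table random-hyperplane LSH index (illustrative only)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def key(self, vec):
        # Each bit records which side of a hyperplane the vector lies on
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0
                     for plane in self.planes)

    def add(self, track_id, vec):
        self.buckets[self.key(vec)].append((track_id, vec))

    def query(self, vec):
        # Only the matching bucket is scanned, not the whole collection
        candidates = self.buckets.get(self.key(vec), [])
        ranked = sorted(candidates, key=lambda item: math.dist(item[1], vec))
        return [track_id for track_id, _ in ranked]


# Toy usage: index two tracks, then query with a copy of the first one.
index = HyperplaneLSH(dim=4, n_bits=8, seed=0)
index.add("track_a", summarize([[1.0, 2.0], [3.0, 2.0]]))
index.add("track_b", summarize([[-5.0, 0.5], [-4.0, 0.7]]))
ranked_ids = index.query(summarize([[1.0, 2.0], [3.0, 2.0]]))
```

Because every track is reduced to one vector, a query costs one hash computation plus a scan of a single bucket, at the price of possibly missing relevant tracks that fall into other buckets. That is the efficiency versus accuracy tradeoff discussed above.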
3. COSIN SYSTEM
The objective of the COSIN system is to provide a similarity retrieval mechanism: it takes a musical audio track as input, performs similarity searching, and returns the audio tracks relevant to this query in a ranked list. The performance of each cover song detection method can be evaluated, and the corresponding results can be listed according to different evaluation metrics. COSIN consists of four main components: audio feature extraction, similarity searching, adjustable collection, and performance measurement. Figure 1 shows a flow diagram of audio cover song detection with COSIN.

Figure 1: COSIN system architecture. (Diagram components: Audio Query; Audio Feature Extractor (Chroma, Pitch, MFCC, Mel-Magnitude); Semantic Feature Summarization (SFS); Similarity Searching (K-Nearest Neighbor (KNN) exhaustive matching; Locality Sensitive Hash (LSH) or LSH variants for hash-based approximate matching; Dynamic Programming (DP) audio sequence comparisons); Adjustable Collection; Ranked list; Results of System Performance Measures.)

Audio feature extraction. Each track is a 30-second-long, single-channel clip in wave format with 16 bits per sample, re-sampled to 22.05 kHz. The audio data is normalized and then divided into overlapping frames. Each frame contains 1024 samples (46.4 ms), and adjacent frames overlap by 50% (the step length is 23.2 ms). Each frame is weighted by a Hamming window, padded with 1024 zeros to fit the length of a 2048-point FFT, and then used to calculate the STFT. Chroma is calculated from the instantaneous frequencies of the STFT. Pitch, MFCC, and Mel-magnitude are computed from the amplitude spectrum. Details of the feature extraction methods can be found in [1][2][3]. Based on the Chroma, Pitch, MFCC, and Mel-magnitude feature sets, a combined SFS feature can be extracted.

Similarity searching. To facilitate comparison among the detection methods, three independent matching/searching schemes are included in this component. DP is a typical method for calculating the pairwise sequence distance [2][3]; it offers high accuracy but takes a long time for matching. KNN is an exhaustive searching method that does not compare sequences. Hash-based approximate techniques (LSH and LSH variants) [4][6] map audio features to integer values by heuristics and use hashing to reduce exhaustive search.

Adjustable collection. Our cover song collection consists of two non-overlapping datasets with 5275 audio tracks in total. The query collection (Covers79) includes 79 songs with a total of 1072 audio tracks (on average about 13.6 covers per song). The database collection is composed of 4203 single-cover songs in addition to the 1072 queries. The collection size can be adjusted to test cover song detection, and it is easy to add audio tracks to or remove them from the collections.

Performance measurement. To evaluate and compare detection methods in a fair environment, we use well-known evaluation metrics (such as recall and precision) in this demonstration. It is convenient to judge the relevance of the audio tracks in the ranked list.

System function summary. The COSIN system provides the following functions to assist researchers in assessing different retrieval schemes for cover song detection:
1) Offer a GUI for content-based audio retrieval.
2) Support flexible management of the cover song collection.
3) Implement both common schemes and some new methods.
4) Present the retrieved results ranked by relevance.
5) Evaluate the performance of retrieval schemes.
6) Display the retrieval results of different detection methods.

4. FUTURE WORK
We are working to extend COSIN functionality to broader applications. For example, many more potentially useful audio features can be added as candidates for similarity searching to improve retrieval effectiveness. We can also add machine learning methods to train the semantic audio features. Moreover, other index-based approximate techniques can be added for further comparison with hash-based retrieval. We also plan to explore compatibility with other content-based audio retrieval systems, such as query-by-humming/singing (www.midomi.com) and query-by-mood.

5. REFERENCES
[1] D. Ellis and G. Poliner. Identifying cover songs with chroma features and dynamic programming beat tracking. IEEE ICASSP’07, Vol. 4, pp. 1429–1432, 2007.
[2] W. H. Tsai, H. M. Yu, and H. M. Wang. A Query-by-Example Technique for Retrieving Cover Versions of Popular Songs with Similar Melodies. ISMIR’05, pp. 183–190, 2005.
[3] Y. Yu, J. S. Downie, and K. Joe. An Evaluation of Feature Extraction for Query-by-Content Audio Information Retrieval. IEEE ISMW’07, pp. 297–302, 2007.
[4] P. Indyk and R. Motwani. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. ACM STOC’98, pp. 604–613, 1998.
[5] Y. Yu, K. Joe, and J. S. Downie. Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison. IEICE Trans. Info. and Sys., Vol. E91-D, No. 6, pp. 1730–1739, 2008.
[6] Y. Yu. Scalable Content-Based Music Retrieval on Acoustic Datasets via Hashing. PhD Forum, GHC’08, 2008.
