Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

Journal of New Music Research


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/nnmr20

Perceptual Evaluation of Inter-song Similarity in Western Popular Music

Alberto Novello (a, b), Martin M.F. McKinney (c) & Armin Kohlrausch (a, b)
a: Technische Universiteit Eindhoven, The Netherlands
b: Philips Research Laboratories, The Netherlands
c: Starkey Laboratories, USA

Available online: 31 Mar 2011

To cite this article: Alberto Novello, Martin M.F. McKinney & Armin Kohlrausch (2011): Perceptual Evaluation of Inter-song
Similarity in Western Popular Music, Journal of New Music Research, 40:1, 1-26

To link to this article: http://dx.doi.org/10.1080/09298215.2010.523470

Journal of New Music Research
2011, Vol. 40, No. 1, pp. 1–26


Abstract

We describe and test the methodological setup for a web-based listening experiment that assesses the perception of inter-song similarity while optimizing the trade-off between stimulus coverage and experimental time. The experiment used a relatively large set of stimuli of Western popular music: 78 song excerpts selected from 13 genres, involving 78 participants. The experiment used triadic comparisons of song excerpts to present the participants with a low-complexity task, and a partially balanced incomplete block design (PBIBD) to reduce the number of stimulus comparisons, with the consequent possibility of extending the stimulus set. The three control variables used in the excerpt selection (genre, tempo, and timbre) showed statistically significant saliency and a hierarchical degree of impact on participants’ pair rankings (genre > tempo > timbre). We investigated the participants’ perceptual space using a combination of numerical and analytical methods that help to reduce and represent the dimensionality of the data. We used a combination of scaling and discriminant functions to gain insight into the important factors underlying the organization of the participants’ perceptual space. In the perceptual space calculated through multidimensional scaling, we used quadratic discriminant analysis to search for axes that maximized the separation of the excerpt classes. We identified three axes that were a posteriori labelled as ‘slow–fast’, ‘vocal–non-vocal’, and ‘synthetic–acoustic’. We found a high correlation between the excerpt tempo in beats per minute and the excerpt projections on the slow–fast axis. A final analysis showed that the relevance of the factors responsible for the grouping of excerpt subsets is context dependent.

1. Introduction

It is a common phenomenon for music listeners to detect similarity between and within pieces of music. Within a piece of music, listeners spontaneously identify musical segments with similar functions (e.g. choruses, verses, bridges), deriving a structure for the summarization or description of the song. Similarity between pieces is used by listeners for comparison of one piece to another, for categorization into styles and genres, and for organization of songs into collections and play-lists.

Music similarity is an ill-defined concept in the cognitive and perceptual domains because it is context-dependent (Cambouropoulos, 2009) and because there is no definition of which musical dimensions influence listeners’ perception or of how music proximity can be objectively measured (Orpen & Huron, 1992). Nevertheless, in perceptual experiments, participants can easily decide without a formal definition how similar two music pieces are (Chupchik, Rickert, & Mendelson, 1982; McAdams, Vieillard, Houix, & Reynolds, 2004), and they can do it consistently (Logan & Salomon, 2001; Pampalk, 2006). This fact suggests that although listeners’ perception of music similarity depends on various complex phenomena, such as timbre, rhythm, culture, social context, and personal history, listeners can, and do, intuitively interpret the meaning of similarity consistently. When asked to describe the motivation for their perceived music similarity, listeners often refer to surface features, i.e. prominent music elements of a piece of music such as dynamics, texture, loudness, tempo, and timbre (Lamont & Dibben, 2001; McAdams et al., 2004).

Correspondence: Alberto Novello, Philips Research, DSP Group, High Tech Campus 36, Eindhoven, 5656 AE Netherlands.
E-mail: jestern@libero.it

DOI: 10.1080/09298215.2010.523470 © 2011 Taylor & Francis


These findings support the theoretical model proposed by Deliège (2001) to explain the perception of music similarity across song excerpts: during the listening process, music listeners extract musical cues between music-segment boundaries, unconsciously building a mental description of each song segment; the musical surface features are used to compare different segments of the song, and, based on cue proximity, music similarity is evaluated. It is possible to extend the validity of Deliège’s conclusions to the case of across-piece similarity, comparing the surface features of two different pieces of music. Deliège’s model can then be applied to predict within- and across-piece similarity.

Two studies support the extension of Deliège’s hypothesis on the influence of musical surface features to the case of across-piece similarity (Chupchik et al., 1982; Eerola, Jarvinen, Louhivuori, & Toiviainen, 2001). Eerola et al. (2001) found both rhythm and pitch to be important factors in perceived similarity between folk melodies synthesized from MIDI representation. Chupchik et al. (1982) performed two experiments to investigate across-song music similarity between audio excerpts and found that tempo, dominant instrument, and articulation were the main musical features used by participants for their ratings of similarity among jazz improvisations. In comparisons between Classical, Jazz, and Pop-Rock excerpts, the most relevant dimension found was ‘Classical’ versus ‘Contemporary’.

Despite the consensus of several perceptual experiments on the influence of music surface features on perception of music similarity as hypothesized by Deliège, there is no agreement across studies on which music features are relevant in the case of music similarity (Chupchik et al., 1982; Eerola et al., 2001; Lamont & Dibben, 2001; McAdams et al., 2004). From the comparison of the experimental results of a few studies (Chupchik et al., 1982; Lamont & Dibben, 2001; McAdams et al., 2004) we hypothesize that this lack of agreement can be due to the context of stimuli used, i.e. each individual stimulus subset could be perceptually organized by a specific set of control variables. Because of the limited number of genres or songs used in the perceptual experiments, the experimental results can be effectively used to verify the impact of individual musical dimensions on a very particular music context, but have limited representation of the perceptual space of the listener in the context of other genres. This limitation is important for the development of most algorithmic applications based on music similarity.

Because of the lack of a commonly agreed database of music similarity for a standard evaluation of algorithm performance (Logan, Ellis, & Berenzweig, 2003), several authors based the training and testing of their applications on different sources (Aucoutourier & Pachet, 2002; Berenzweig, Logan, Ellis, & Whitman, 2003; Logan et al., 2003) such as metadata annotations and web-texts, or relied on the delicate assumption that two songs are similar if they belong to the same artist, album, or play-list. The use of different test data makes it difficult to compare performance between algorithms. Moreover, because none of these sources were collected by explicitly asking listeners to rate acoustical similarity in a controlled experiment, they might not reliably represent the actual perceived music similarity. Several studies have run listening experiments to evaluate algorithms for music similarity, comparing participant results and computer predictions (Aucoutourier & Pachet, 2002; Herre, Allamanche, & Ertel, 2003; Mirex, 2006). The reported listening experiments are rather time-consuming and, using only a few participants, have limited validity as perceptual data. Overall, relatively little attention is paid in the computational domain to the accurate modelling of perceptual music similarity. In a recent article, Pampalk, Flexer, and Widmer (2005) suggest the possibility that embedding a human perceptual and cognitive model into the algorithms could help overcome the performance ceiling observed recently for the pure feature-based algorithms (Berenzweig et al., 2003; Logan et al., 2003; Aucoutourier & Pachet, 2004; Pampalk, 2004). Perceptual data on music similarity collected for a large set of Western popular music would be beneficial for the verification of theoretical models, as reference for perceptual experiments, and as training/testing material for algorithmic applications. Collecting such a large database can be a time-consuming and fatiguing operation for the participants because of the large number of stimulus comparisons required.

To collect such perceptual music-similarity data, we developed an experimental method optimizing stimulus coverage, experimental time, and simplicity of the task for the participant. In this article, we present our methodology and the results of a large-scale perceptual experiment using 78 song excerpts selected from 13 genres of Western popular music. A major problem in collecting such data is related to the trade-off between the number of stimuli and experimental time: even with a small set of stimuli, the number of necessary comparisons can require long experiments. In the literature three methods have been used in perceptual experiments to assess similarity among auditory objects: pair-rating (Lamont & Dibben, 2001), pair-ranking (Levelt, van de Geer, & Plomp, 1966; MacRae, Howgate, & Geelhoed, 1990), and object-grouping (McAdams et al., 2004). In pair-rating, the participant chooses a value of similarity for a pair of song excerpts on a numerical rating scale. Pair-ranking is an ordinal procedure that asks participants to rank pairs of objects depending on similarity. In an object-grouping task, the participant is presented with a number of stimuli and has to group them depending on similarity. Two studies have shown the difficulty for the participants and the possible data bias related to the widely used
pair-rating paradigm and demonstrated the ease and robustness of an ordinal task such as pair-ranking (Burton & Nerlove, 1976; MacRae et al., 1990). Although simple and solid in its conception, the grouping paradigm can be applied only when the number of stimuli is small, due to the memory demands on the participant.

An advantage of both pair-ranking and pair-rating is that under particular conditions (i.e. assuming the symmetrical and transitive properties of stimulus similarity) some similarity measures can be inferred from previous comparisons. This fact allows the experimenter to deduce the organization of the perceptual space from a subset of the whole possible set of comparisons (Levelt et al., 1966). As often noted in psychology and cognition (Tversky, 1977; Tversky & Gati, 1982), similarity does not in general follow a symmetric property (e.g. R.E.M. can be similar to the Beatles more than the Beatles are similar to R.E.M.), nor the transitive property (e.g. a pair of scissors can be similar to a knife, and a knife similar to a fork, but the perceptual similarity distance between a fork and a pair of scissors is likely greater than the sum of these two similarity distances). Thus, a thorough examination of similarity would include pairwise comparisons for all pairs of stimuli in an experimental setup. However, due to the practical limitations of collecting data for all possible pairs for a large set of stimuli, it can still be useful to disregard asymmetry and non-transitivity and work with partial but balanced block designs in studying music similarity (Levelt et al., 1966; Grey, 1977; MacRae et al., 1990).

Because no previous theoretical model advanced hypotheses on listener concordance in judging music similarity, and previous experiments have investigated only across-participant concordance with user-tests involving a small number of participants and stimuli (Logan & Salomon, 2001; Pampalk, 2006), we conceived the experimental method to measure the within- and across-participant concordance with an extended experiment. The within-participant measure of concordance evaluates the stability of the perception of music similarity of each participant; the across-participant measure of concordance evaluates the common perception of similarity of a group of listeners. A high degree of within-participant concordance is a necessary property to reveal whether the perception of music similarity is a stable phenomenon. A high degree of across-participant concordance would support the possibility for the development of a global perceptual model of music similarity.

Our goal is to identify the most relevant factors underlying the organization of the complex experimental data. Such an organized structure can be searched for if there is evidence of its existence from the analysis of participant concordance. The choice of using a large stimulus set implies having an experiment with numerous inter-song comparisons and results in a relatively large set of data with no direct interpretation. The difficulty of the analytical task resides in several factors:

• the complexity of representation: data are visualized using multidimensional plots commonly interpreted as the participants’ perceptual space, whose dimensions are often not easily defined,
• the complex interaction of the relevant music dimensions: the relevant structural factors controlling the multidimensional data organization have simultaneous global and contextual influence,
• the definition and measurement of similarity: several approaches can be followed, such as the Euclidean distance between two items in the multidimensional perceptual space, or a binary value representing the membership of an excerpt to a specific spatial region (e.g. a cluster). In this latter case, however, the additional problem arises of how to define the cluster-discriminant function.

Several methods have been described in the literature to visualize and analyse the data collected in a similarity experiment. The most common is multidimensional scaling (MDS), which operates on similarity ratings of song pairs (Chupchik et al., 1982; Eerola et al., 2001; Lamont & Dibben, 2001; Bella & Peretz, 2005). MDS helps to reduce the dimensionality of the original data, converts non-metrical (ordinal) measures into metrical distances, and calculates the relevant dimensions of the participants’ perceptual space (Nosofsky, 1992). When forcing high-dimensional data to be fitted onto a lower-dimensional space, the MDS calculation introduces a distortion, and the distances between items are altered. The distortion is represented by the RSQ value, which is the squared correlation between the original distances and the distances in the fitted configuration. Furthermore, when a solution is found, the interpretation of the axes is not always clear, as the rotation of the axes is automatically chosen to maximize data variance: it is up to the experimenter, by observation of the data distribution, to determine the final rotation of the axes and their contextual interpretation (Kruskall & Wish, 1978; Young & Lewyckyj, 1996). Another method to observe stimulus grouping in the perceptual space is hierarchical clustering (McAdams et al., 2004): from the relative distance of each item, groupings are calculated hierarchically, creating a tree of possible clusters. This method has the advantage of displaying, in one plot, the whole complexity of the multidimensional space without distortion, but leaves to the experimenter the choice of the threshold that determines the number of clusters and finalizes the grouping.

In the experiment presented in this paper, we measured similarity in a constrained environment: using triadic comparisons of short music excerpts with limited vocals in order to narrow the focus and the dimensions
along which similarity can be judged. In this way we attempt to limit the asymmetry and non-transitivity of pairwise similarity rankings in our database. In the experimental design we use a partially balanced incomplete block design to reduce the number of required comparisons over a relatively large stimulus set. In the analysis we use the MDS method, which has been used successfully in the past for other perceptual similarity tasks, such as timbre (Grey, 1977), music intervals (Levelt et al., 1966), and scent (MacRae et al., 1990), in an attempt to reveal the number of dimensions in the perceptual space. In the data analysis presented here, we use a combination of scaling and discriminant functions to gain more insight into the relevant dimensions underlying participants’ rankings of music similarity. In this way, we obtain a quantitative measure of the relevance of control variables in similarity rankings to distil the data and discover the important factors in similarity rankings.

2. The experimental setup

We selected a set of 78 song excerpts in order to verify and assess the influence of three control variables (genre, tempo, and primary instrument) on music similarity rankings. Because of the large set of stimuli, we applied a high level of comparison reduction in the experimental design. We used a partially balanced incomplete block design (PBIBD) consisting of two nested balanced incomplete block designs (BIBDs) to reduce the number of comparisons without introducing distortions and to allow for cross-checking of data (Levelt et al., 1966; Burton & Nerlove, 1976; MacRae et al., 1990). To reach a wide variety of participants, the experiment was conducted on the Internet via a web interface.

2.1 The block design

A complete block design (CBD) of n stimuli and k items per trial (with k < n) consists of all possible sets of k items selected out of the n total stimuli while avoiding the possible within-trial permutations (e.g. ABC, ACB, BAC, etc.). In the case of k = 3, the number b of trials in a CBD is given by the formula:

$$ b = \frac{n(n-1)(n-2)}{6}. \tag{1} $$

In a BIBD, all possible pair-wise comparisons of stimuli occur λ times (Burton & Nerlove, 1976) and every pair of stimuli is presented equally often in the whole design, providing an equal amount of information from which to compute a similarity measure for each stimulus pair. Thus, if k is the number of stimuli per trial, and n the total number of stimuli, the total number of trials, b, in a BIBD is:

$$ b = \frac{\lambda\, n(n-1)}{k(k-1)}. \tag{2} $$

For k = 3 (only) the BIBD reduces the number of comparisons of the complete design by a factor (n − 2)/λ. The BIBD method and the choice of the appropriate reduction factor λ have been tested in previous perceptual experiments. Levelt et al. (1966) used triadic comparisons and a BIBD for the comparison of musical intervals and found reliable results with λ = 2.

Previous papers (Burton & Nerlove, 1976; MacRae et al., 1990) tested the reliability of the BIBD data in comparison to the complete design case for different λ values. Both studies found that a value of λ ≥ 2 generally leads to reliable results, while the use of λ = 1 leads to distortion of the data.

In our experiment, we generated a PBIBD by nesting two BIBDs: one to create an incomplete but overlapping set of genres for each participant, and another to create a set of triadic comparisons of songs within each genre set. The outside (genre) BIBD divided the 13 genres into sets of four for each participant, and the inside (excerpt) BIBD divided the 24 excerpts (6 per genre) into sets of triads. The outside BIBD had n = 13 genres in total, k = 4 (quadratic comparisons, each participant examined four genres) and λ = 3 (three participants examined each genre pair): Equation 2 gives b = 39. This is the number of participants required to complete the whole experiment once; we ran the complete PBIBD twice, doubling the number of participants needed, to collect two sets of independent data for every triad in the experimental design. The inside BIBD had n = 24 songs per participant, λ = 2 (every song pair was presented twice in the participant design) and k = 3 (triadic comparisons): Equation 2 gives 184 triadic comparisons per participant. Both BIBDs were computed using R software with the AlgDesign package (R-project, 2008).

We finally added 10 identical triads to each participant’s experimental design to evaluate across-participant concordance, and repeated them to assess within-participant concordance. Thus, the total number of triadic comparisons per participant was 204. For every participant the design was randomized on triadic comparisons and on song order inside each triadic comparison to reduce potential order effects. With 204 triads, normal listening experimental time was estimated to be about 2.3 h per participant.

2.2 Stimuli

Based on genre classification of the All Music website (All Music, 2008), we chose 13 genres to cover a large range of Western music, primarily from popular genres: Afro-Pop,
Blues, Classical, Country, Folk, Heavy Metal, Hip-Hop, Jazz, Latin, Pop, Reggae, R&B, and Rock. We selected six song excerpts from each genre to systematically vary tempo and primary instrument within each genre.

We chose genre as a selection criterion to obtain a variable degree of similarity between song excerpts: we intended to observe if songs within the same genre were perceived to be more similar to each other than songs from two different genres, i.e. to test if genre was a stronger attractor than the other two control variables: tempo and primary instrument. Tempo and timbre (in the form of primary instrument) were used in the selection because they were relevant in the perception of similarity in the context of previous experimental studies (Chupchik et al., 1982; McAdams & Matzin, 2001). While tempo is an objective and quantitatively measurable property, timbre is a perceptual phenomenon involving several musical dimensions. In the song selection of the experiment, we represented the timbre of each song by the primary instrument in the excerpt; inside each genre, we selected two excerpts from each of three primary-instrument categories (vocal, piano, and guitar). For each primary-instrument category we selected one excerpt from each of two tempo categories: slow (< 100 beats per minute for a quarter note) and fast (> 140 beats per minute for a quarter note). Inside each category, we aimed at selecting prototypical and popular examples, to produce a collection of songs as common and varied as possible. Regarding timbre, we chose songs with the presence of drums, consistent with the majority of Western popular music. We selected only studio recordings to minimize concert-related uncontrolled factors, such as audience noise and recording quality. The songs satisfying the previous criteria were selected from an expert-composed database or purchased from the ‘iTunes Store’ (iTunes, 2008). If the selected criteria were still satisfied, the excerpts were extracted from the beginning of the chorus, as it is typically considered to be the most representative part of the song. All excerpts were 15 s long, monophonic, normalized in loudness and faded in and out over one second. The stimuli were converted to the MPEG-1 Audio Layer 3 (320 kb s⁻¹) file format and were completely downloaded onto the participants’ computer before being played. All selected excerpts are listed in Appendix A.¹

¹ The excerpt database (converted to 192 kbps MP3 format) of the experiment can be obtained from the author, Alberto Novello, by sending a request mail to: jestern77@yahoo.it

2.3 Participants

We conducted the large-scale experiment on the Internet because of the large number of participants required. The experiment was run via connection to a server computer hosted in the Philips Research Laboratories in Eindhoven (The Netherlands) between December 2006 and March 2007. We recruited the participants by sending an invitation with an explanation of our experiment through mailing-lists, on-line forums and universities in different countries to have variability among the participants. Each participant was explicitly instructed to test the computer and audio settings to adjust the listening intensity to a comfortable level. We asked participants to fill out a questionnaire to record their musical background, musical training, and listening conditions. The participants were 59 males and 19 females. Their age ranged from 18 to 72 years and their average age was 28 years. Most participants reported having had a few years of music practice as obligatory training in the lower or middle part of the school system of their respective countries. Assuming that these early years of practice were not enough to consider a participant as a musician, we defined participants having had fewer than four years of musical training as non-musicians, and participants having had more than six years of musical training as musicians. According to these boundaries, 37 participants were considered musicians and 41 were considered non-musicians. Loudspeakers were used by 53 participants, while the other 25 used headphones. The average actual experimental time for the participants was about 2 h.

2.4 Procedure

The experiment was conducted via a graphical web interface available in four languages: Dutch, French, English, and Italian. The first page of the web interface introduced participants to the purpose and procedure of the experiment. After acceptance of participation, each participant was given a brief questionnaire with some general questions: name, gender, age, years of musical training, the listening setting during the experiment (headphones or loudspeakers), and favourite musical genre. The participants were then presented with the experiment page and given the following instructions: ‘Please listen to three songs by pushing the buttons next to the letters, then push MOST for the Most similar pair and LEAST for the Least similar pair; push the NEXT button when you are ready to go to the next triadic comparison’. The three song excerpts of each triad were presented on the graphical user interface on the corners of an equilateral triangle to reduce positioning bias. The participant had to listen to each of them, one at a time, before ranking them. This procedure was repeated until the experiment was completed (204 triadic comparisons). The participants could stop at any time and continue the experiment at a later time, completing it in as many sessions as desired. Each participant was asked not to discuss the experiment with anybody until the experiment was completed to guarantee independence of data. The participants received 15 Euros for their participation. No participant reported any problems during the experimental procedure. To assess similarity holistically, we let the participants spontaneously choose the perceptual criterion for their rankings and we explicitly did not provide the participants with any definition of ‘similar’. Ten participants explicitly asked which dimensions they had to consider while ranking pair similarity. No indication was provided. At the end of the experiment, each participant was asked to report which music properties were used when ranking similarity.
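
As a cross-check on the design parameters reported in Sections 2.1 and 2.2, the trial counts can be recomputed from Equations 1 and 2. The following Python sketch is illustrative only and is not the AlgDesign procedure used in the experiment: it reproduces the arithmetic of the design, not the construction of the blocks. All function and variable names are hypothetical, and the reading that the 10 common triads were each presented twice (184 + 2 × 10 = 204) is an assumption drawn from the text.

```python
from itertools import product
from math import comb


def cbd_trials(n: int, k: int = 3) -> int:
    """Trials in a complete block design: all k-subsets of n stimuli (Eq. 1 for k = 3)."""
    return comb(n, k)


def bibd_trials(n: int, k: int, lam: int) -> int:
    """Trials in a BIBD in which every stimulus pair occurs lam times (Eq. 2)."""
    numerator = lam * n * (n - 1)
    denominator = k * (k - 1)
    if numerator % denominator:
        raise ValueError("Equation 2 does not give an integer number of trials")
    return numerator // denominator


# Outside (genre) BIBD: 13 genres, 4 per participant, each genre pair seen 3 times.
outer_blocks = bibd_trials(n=13, k=4, lam=3)
participants = 2 * outer_blocks          # the complete PBIBD was run twice

# Inside (excerpt) BIBD: 24 excerpts per participant, triads, each pair shown twice.
inner_triads = bibd_trials(n=24, k=3, lam=2)
total_triads = inner_triads + 2 * 10     # assumption: 10 common triads, each shown twice

# Reduction relative to the complete design; for k = 3 this equals (n - 2) / lam.
complete_triads = cbd_trials(24)
reduction = complete_triads / inner_triads

# Stimulus grid: one excerpt per genre x primary instrument x tempo class.
GENRES = ["Afro-Pop", "Blues", "Classical", "Country", "Folk", "Heavy Metal",
          "Hip-Hop", "Jazz", "Latin", "Pop", "Reggae", "R&B", "Rock"]
INSTRUMENTS = ["vocal", "piano", "guitar"]
TEMPI = ["slow", "fast"]
stimulus_cells = list(product(GENRES, INSTRUMENTS, TEMPI))

print(outer_blocks, participants, inner_triads, total_triads)
print(complete_triads, reduction, len(stimulus_cells))
```

Under these assumptions the script gives 39 genre blocks (78 participants over two runs), 184 triads per participant, 204 triads in total, and 78 stimulus cells; the inner BIBD is 11 times smaller than the 2024 triads of the complete design, in agreement with the factor (n − 2)/λ.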

3. Preliminary results and control for methodological robustness

3.1 Participant concordance

We measured participant consistency in two ways, as within- and across-participant concordance. The first assesses the reliability of each participant, by measuring the stability of their perception over time, while the second assesses the common perception of similarity of a group of listeners. To measure both the within- and across-participant concordance, we used Kendall’s coefficient of concordance (Kendall, 1975):

$$ W = \frac{12\,S}{m^{2}(n^{3} - n)}, \tag{3} $$

where m is the number of participants, n = 3 the number of song excerpts in a triad, and S is the sum of squared deviations of the rank sums per excerpt:

$$ S = \sum_{j=1}^{n} (R_j - \bar{R})^{2}, \tag{4} $$

where R_j is the sum of participants’ rankings for the jth pair of song excerpts, and R̄ is the expected value if the rankings by each participant were unrelated, i.e. the null hypothesis (equal to ½ m(n + 1)). In the case of within-participant concordance, for each participant we calculated 10 concordance values using the duplicate rankings (m = 2) on the 10 repeated triads (n = 3). In the case of across-participant concordance, we calculated one concordance value for each of the 102 triads (n = 2) using the rankings of all 36 participants (m = 36) across each triad.

3.1.1 Within-participant concordance

We assessed within-participant concordance by comparing the rankings on the ten repeated triads common to every participant’s design (the list of songs is given in Appendix B). The results are shown in Figure 1. Out of 78 participants, 70 showed a significant level of within-participant concordance. Among the other participants, one had very low concordance values and three were very close to significance.

Fig. 1. Within-participant concordance. For each repeated triad the value of the Kendall’s coefficient of concordance is displayed. The horizontal line is the significance level of concordance at p = 0.05. The mean concordance value for all participants is 0.82. Appendix B indicates the excerpt-composition for each triad.

We calculated the mean within-participant concordance per triad. On all 10 repeated triads participants were significantly concordant. The two triads that showed lowest concordance are numbers 2 and 6. Excluding these two triads from the calculation, five of the participants that scored low on within-participant concordance reached the significance level.² Because participants 16, 20, and 74 were inconsistent even on triads on which other participants were highly coherent, we removed their data from the rest of the analysis. We calculated a two-sample t-test between distributions of concordance for the musicians and the non-musicians for each of the 10 repeated triads. We found no statistically significant difference. The calculated p values ranged from 0.08 to 0.67.

3.1.2 Across-participant concordance

The across-participant concordance was significant on all ten common triads (Figure 2). The significance level was calculated using a Chi-squared

² The estimation of the 0.05 significance level for n W values is computed assuming a normal distribution of the W values with an expected value of 0.5 and a standard deviation of $[1/(8n)]^{1/2}$ (ranking three items gives four possible outcomes of W). In the case of the 10 repeated triads per participant n = 10 and the standard deviation is 0.11; in the case of the within-participant concordance per triad n = 78 and the standard deviation is 0.04.
approximation.³ The two triads which showed the lowest concordance values are numbers 2 and 6, the same that scored low in the within-concordance analysis. We compared across-participant concordance for musicians and non-musicians on the ten repeated triads using a sign test. We found no statistically significant difference (p = 0.11).

³ The measure of W is related to the Friedman test statistic, which can be approximated with a Chi-squared distribution in the case of more than two judges. The significance level is then calculated from a Chi-squared distribution with two degrees of freedom, normalized by the number of participants and degrees of freedom (Kendall, 1975).

Fig. 2. Across-participant concordance on 10 repeated triads. The crosses mark the Kendall’s concordance value of all subjects on each of the 10 added triads. The circles mark the values of concordance for the second presentation of the 10 added triads. The horizontal bar is the significance level of concordance at p = 0.05. The mean across-concordance value across the 10 triads is 0.51 ± 0.06. Appendix B indicates the excerpt-composition for each triad.

3.2 Saliency of control variables

We investigated the salience of each control variable used to select the stimuli (genre, tempo, and primary instrument) on the participant rankings. We used two approaches: an absolute measure, observing only one control variable at a time, and a relative measure, comparing two control variables. In the first case (the absolute measure), we selected all triads in the PBIBD with exactly two stimuli belonging to the same class of a given control variable (e.g. fast–fast–slow for tempo or piano–piano–vocal for primary instrument) and examined the number of occurrences in which that pair of stimuli was selected as the most similar, least similar, or intermediate. Figure 3 shows the occurrences of each of the three rankings for each control variable normalized by the total number of selected triads in each case: with a fast-tempo pair (panel (a)), with a slow-tempo pair (panel (b)), with a same-genre pair (panel (c)), and with a same-primary instrument pair (panel (d)). In all triad topologies examined, the pair with stimuli belonging to the same class of the selected control variable is chosen more often to be the most similar than to be least similar or intermediate (p < 0.01). Statistical significance was derived using a bootstrap method in which the complete collection of similarity rankings was resampled on a triad basis (Efron & Tibshirani, 1993).

In the second case (the relative measure), in order to rank the relative effect of each control variable, we compared control variable saliency inside crossed triads, i.e. triads that contained two pairs of excerpts belonging to different control variables (e.g. two stimuli belonging to fast tempo while one of these belongs to the same genre as the third stimulus, see template in Table 1). We selected crossed triads for each combination of two control variables and counted the rankings for each pair. The three panels at the bottom of Figure 3 display the occurrence of ‘most similar’ rankings for each of the two pairs belonging to the two control-variable categories in the crossed triads.

When analysing genre and primary-instrument crossed triads (panel (e)), the pair with stimuli belonging to the same genre is chosen more often as most similar. In the case of tempo and primary-instrument crossed triads (panel (f)), the tempo pair is chosen more often, but the influence of the two control variables is more similar than for the conditions presented in panels (e) and (g). In the case of genre and tempo crossed triads (panel (g)), the pair with same genre is chosen more often as most similar. All these differences have statistical significance (p < 0.01) calculated in a two-sample t-test of the bootstrapped distributions.

3.3 Solidity of the experimental design

To cross-check the nesting-design concordance we examined the rankings of the pairs for participants who had the same experimental design. As described before, we doubled the number of participants in the whole experiment to have two participants ranking the same triads. Observing each one of the 39 participant pairs, we found just one pair whose participants (numbers 35 and 74) were not significantly concordant. We attribute the low concordance of the pair to participant 74, who scored very low in the within-participant concordance and was removed from the rest of the analysis.

3.3.1 Control experiment 1: triadic comparison versus grouping paradigm

To evaluate the methodological solidity of our experimental design, we compared the experimental results of
two side experiments: the first using a triadic comparison paradigm and the second using a grouping paradigm (MacRae et al., 1990). For both we selected a new stimulus set with 18 song excerpts (information on the selected excerpts is listed in Appendix D). The stimuli were 10-second excerpts from 18 songs covering nine genres of Western music, primarily from popular genres: Blues, Classic, Country, Funk, Heavy Metal, Hip-Hop, Jazz, Pop, and Rock. For each genre, we chose two songs, one with a slow tempo (<100 beats per minute for a quarter note) and one with a fast tempo (>140 beats per minute for a quarter note).

Fig. 3. Analysis of the saliency of the three control variables. Panels a to d show the mean number of occurrences of each of the three possible rankings for each pair in the whole set of triads: (a) for the fast pair in the two-fast-song triads, (b) for the slow pair in the two-slow-song triads, (c) for the same primary instrument in the two-primary-instrument-song triads, and (d) for the same genre pair in the two-genre-song triads. The three plots at the bottom show the comparison of saliency for each control variable on participants’ rankings. For the two pairs belonging to the two control variables in the crossed triad, we calculated the mean number of occurrences of most-similar rankings. This operation was performed in the case of (e) genre and primary-instrument crossed triad, (f) tempo and primary-instrument crossed triad, and (g) genre and tempo crossed triad. In all seven plots we omitted the standard error of the mean, which ranged between 0.1% and 2%.

Table 1. Scheme of ‘crossed’ control-variables triads.

                            Stimulus 1    Stimulus 2    Stimulus 3
Control variable 1 class    A             A             a
Control variable 2 class    b             B             B

A total of 36 listeners participated in the triadic comparisons experiment, of which 18 were considered musicians, having more than seven years of practical music training, and 18 were non-musicians, having less than one year of music training. We used these limits in the participant selection to have a clear separation between the two groups of participants based on music practice. Of the participants, 25 were male and 11 were female, and the average age was 28 years. The experiment followed the typical triadic comparison procedure described above.
In the grouping paradigm 18 new participants took part, nine of whom were musicians. The average age was 24 years. The participants were presented with a graphical user interface that showed all song excerpts in the form of sound icons, randomly placed on the computer screen. Each participant was asked to rearrange the items into groups depending on similar characteristics. The participants were free to choose the number of necessary groups and were asked to perform the task twice (with at least one week break between the two sessions).

When the number of items is relatively small and if the two-dimensional space used to display the objects is not too different from the perceptual space of the participant, the grouping method is considered more efficient, user-friendly and intuitive compared to other methodologies, such as pair rating and pair ranking (Goldstone, 1994). We ran this experiment to check if grouping and ranking provided similar results and to see if the stimulus-presentation order (imposed in the case of triad comparisons, and free in the case of grouping) influenced the final results.

We then compared the items’ distance derived from all participants’ data, collected both in the triadic comparisons and the grouping paradigm. From the triadic comparisons, we built an 18 × 18 dissimilarity matrix, assigning two points to every pair ranked as least similar by participants, one to the intermediate, and zero to the most similar pair (Levelt et al., 1966). Because we used a non-parametric multi-dimensional scaling method (ALSCAL), the specific values assigned to the three similarity types are not crucial as long as they are monotonic and rank order is preserved. We also built an 18 × 18 dissimilarity matrix from the grouping paradigm using the two sets of experimental data for each participant, assigning, for each session, one point to all song pairs that were not grouped together and a zero value for song pairs grouped together (MacRae et al., 1990). The values of the two grouping sessions were added. We compared distances of the same pairs of song excerpts in the triadic comparisons and in the grouping paradigm, and found a statistically significant correlation (r = 0.67, df = 152, p < 0.01). In line with the finding of MacRae et al. (1990), this result supports the hypothesis that pair ordering and stimulus grouping are paradigms that provide correlated results and that can be used for a common perceptual understanding of music similarity. Although the grouping procedure is easier for a small set of stimuli, pair ordering may be preferable with a large number of stimuli.

3.3.2 Control experiment 2: complete block design versus balanced incomplete block design

We ran a second control experiment to estimate the reliability of the reduction of comparisons introduced by a BIBD with λ = 2. To evaluate the influence of triad reduction, 12 new participants performed a triadic comparison experiment with a complete block design (CBD) using nine song excerpts (see Appendix C for the selected excerpts). The CBD was presented in three laboratory sessions. Ten triads were repeated in each design to evaluate participant concordance.

All participants showed significant within-participant concordance on all repeated triads. From the complete set of participants’ rankings (84 triads), we extracted the rankings of the triads belonging to the BIBD (24 triads). We built two 9 × 9 similarity matrices adding the rankings of all participants: one for the CBD and one for the BIBD, and calculated the correlation between the values of the two matrices. A highly significant average correlation was found between the two matrices (r = 0.92, df = 36, p < 0.01). We also computed the correlation between the CBD and BIBD matrices built using the rankings of each participant. In this case we found again a highly significant mean correlation between the two matrices across participants (r = 0.91 ± 0.02, df = 36, p < 0.01). These results suggest the reliability of applying a BIBD in substitution of a CBD in this perceptual context.

4. Exploration of the perceptual space

By combining the similarity-based pair-rankings for each triad from all participants, we constructed a grand 78 × 78 dissimilarity matrix, in which each cell represents the similarity between two independent excerpts. We assigned two points to each song pair that was chosen to be most similar, zero for the least similar and one for the intermediate pair and added each participant’s results to the matrix. In this respect, we tried several combinations of values and found no relevant difference in the subsequent analysis. Because our goal is to preserve the ranking, changing these values (e.g. proportionally) would not appreciably alter the perceptual space. Finally, not all matrix cells were judged the same number of times; we normalized each cell value by its number of presentations in the perceptual experiment (which ranged from a minimum of 12 to a maximum of 48).

In the data analysis presented here, the experimental data in the form of a 78 × 78 matrix is used through non-metrical multidimensional scaling (MDS) to estimate the ‘participant perceptual space’, which is a topological representation of the perceived similarity between songs for visualization and analysis, with the main objective of quantitatively investigating the influence of aggregators such as tempo and primary instrument on the organization of the participant perceptual space.

MDS has been used successfully in the past for other perceptual similarity tasks, such as timbre (Grey, 1977), music intervals (Levelt et al., 1966), and scent (MacRae
et al., 1990). To narrow the focus and the dimensions along which similarity can be judged, we measured similarity in a very constrained environment (triadic comparisons, short excerpts, limited vocals, limited control variables: genre-tempo and primary instrument). Our intention is to test from different perspectives our hypotheses that specific attributes of our stimuli are used by listeners in similarity rankings. Using several techniques, we search for a clear ordering of our attributes along any arbitrary dimension in the MDS space, to provide support for our hypothesis.

4.1 Analytical methodology

To assess the influence of each control variable on the distribution of song excerpts in the perceptual space, it is first necessary to have a representation of the topology of such a space, i.e. compute the coordinates for each song excerpt. MDS is a set of statistical techniques widely used in the literature for visualization of complex data to reduce data dimensionality (Shepard, 1962a, 1962b; Chupchik et al., 1982; Eerola et al., 2001; Lamont & Dibben, 2001). MDS takes as input a matrix of pairwise item similarity scores (such as the 78 × 78 matrix of perceptual data) and assigns a location to each item in a lower-dimensional space. MDS can transform metrical or non-metrical data, such as rankings, into coordinates. We used the ALSCAL algorithm (Young & Lewyckyj, 1996) for the non-metrical MDS analysis. The data distortion introduced by an MDS algorithm when attempting to reduce the number of dimensions is represented by the Pearson’s correlation coefficient between the dissimilarity matrix (input) and the distances between coordinates in the MDS-calculated space (output). Every violation of the ordering of the data in the dissimilarity matrix is a distortion of the original data and decreases the Pearson’s correlation coefficient value: the higher the Pearson’s correlation coefficient, the more reliable the representation of the original data. One important parameter that the experimenter has to carefully define in the MDS computation is thus the number of dimensions, chosen to keep a low space distortion while retaining ease of visualization of the experimental data.

We first investigated the variation of the Pearson’s correlation coefficient value with the number of dimensions (Shepard, 1962a, 1962b) to decide on the optimal number of final dimensions in the MDS output. We then extracted the coordinates of each stimulus in the multi-dimensional space that we interpret as the participants’ perceptual space. One difficulty of the MDS method resides in the fact that the new coordinates are referred to a set of axes chosen by the algorithm to maximize the data variance but may have no explicit connection to physical properties of the stimuli; it is up to the user, by observation and extra analysis of the data distribution, to determine the final rotation of the axes and their contextual interpretation (Shepard, 1986). Because of this fact, in our analysis we used non-metrical MDS to scan the resulting space for any potential dimension correlates, not assuming any particular dimension returned by the MDS algorithm.

If the perceptual space requires high dimensionality for display, genre topology and hierarchical clustering are complementary perspectives for a preliminary simplified representation of the perceptual space: the first represents the spatial distribution of clusters and the second represents the distance among individual excerpts. A combination of both techniques provided in our case an efficient preliminary approach for a qualitative investigation of the features of the perceptual space.

To quantitatively verify the existence of features observed in the previous analysis stages and measure the influence of the control variables, tempo and primary instrument, used for the stimuli selection on the organization of the perceptual space, we use quadratic discriminant analysis (QDA). QDA is a classification method used to separate two or more classes of objects through the use of quadratic functions of the objects’ features (Duda, Hart, & Stork, 2001). In the present article, we employ QDA in a novel fashion using the control-variable assigned to each song excerpt (genre, tempo (slow and fast), and primary instrument (guitar, piano, vocal)) as classification classes and the MDS coordinates for each song excerpt as features. In this way, we obtain a quantitative measure of the influence of each specific control variable, expressed as a measure of spatial separation of stimulus classes, and the axis interpretation, as the control variable correlated with the separation. Based on our observation that the stimulus classes in the MDS space do not appear to be of equivalent size, we chose to use QDA, rather than a linear approach where it is assumed that all classes have equivalent variance.

4.2 Dimension calculation

As a first step we generated a 2D MDS estimate of the perceptual similarity space. The results show a low degree of correlation between MDS representation and the data points (Pearson’s correlation coefficient of 0.63). The resulting plot of excerpts in this 2D space (Figure 4) shows the typical circular shape of a data set whose dimensionality is under-represented. It is clear from this analysis that the similarity data requires more than two dimensions to be modelled accurately.

We calculated the plot of Pearson’s correlation coefficient value versus the number of dimensions (Figure 5), to estimate the number of dimensions required to accurately model the participants’ perceptual space. Because the Pearson correlation coefficient saturates at
six dimensions with a value around 0.81, we chose six dimensions to represent the participants’ data. Applying the ALSCAL algorithm on the 78 × 78 dissimilarity matrix, we obtained the coordinates for each song excerpt in this six-dimensional space.

Fig. 4. Two-dimensional representation of the perceptual space of the 78 song excerpts. The points represent the position of songs in the perceptual space computed in the MDS calculation (as performed by the ALSCAL algorithm (Young & Lewyckyj, 1996)). The 13 different markers indicate membership of the 13 genres used in the stimulus selection. The number next to each song refers to the song properties reported in Appendix A. In the MDS calculation of this plot, the Pearson correlation coefficient has a value of 0.6.

Fig. 5. Pearson’s correlation coefficient value versus the number of dimensions: the points represent the Pearson correlation coefficient value corresponding to the final number of dimensions chosen for the MDS calculation. The Pearson correlation coefficient saturates at a relatively high value around 0.81.

4.3 Genre topology

When the perceptual space requires high dimensionality (i.e. more than three dimensions), its visualization might become complex; in our case, we additionally reduced the dimensionality of the representation in a controlled manner to achieve a preliminary visualization of the perceptual space by computing a two-dimensional display of the ‘genre topology’. We first computed the 13 genre centroids by averaging the coordinates of the six song excerpts within each genre. We then constructed a 13 × 13 genre dissimilarity matrix from the Euclidean distances between genre centroids.

Fig. 6. Greyscale image of the Euclidean distances between genre centroids. On the x- and y-axis the labels refer to the 13 genres of the stimuli. Each square represents how close two genre centroids are in the six-dimensional perceptual space. The darker the square, the closer the centroids. The legend on the side gives a magnitude reference for the relative Euclidean distances in the six-dimensional space.

Figure 6 displays the dissimilarity matrix between the 13 genre centroids as a greyscale image. Complete similarity is represented as black, complete dissimilarity as white. We observe a dark diagonal, which is expected as songs belonging to the same genre tend to be perceived as similar to each other. Some genres, however, present a less than perfect intra-genre similarity, e.g. Latin, whose songs are perceived as similar to other genres. The plot shows some musicologically-expected results, such as the proximity of Country and Folk, and of Electro and Hip-Hop, and the relatively large distance between Classical and Rock, R&B, and Reggae. However, with this plot we cannot deduce the distribution of the genres in the perceptual space.
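The centroid and distance computation described in this section can be sketched in a few lines of Python. The toy coordinates, genre labels, and function names below are hypothetical stand-ins for the six-dimensional ALSCAL output, which is not reproduced here; only the two steps themselves (averaging per genre, then Euclidean distances between centroids) come from the text.

```python
import math
from collections import defaultdict


def genre_centroids(coords, genres):
    """Average the coordinates of all excerpts that share a genre label."""
    grouped = defaultdict(list)
    for point, genre in zip(coords, genres):
        grouped[genre].append(point)
    return {
        genre: tuple(sum(axis) / len(points) for axis in zip(*points))
        for genre, points in grouped.items()
    }


def centroid_distance_matrix(centroids):
    """Symmetric matrix of Euclidean distances between genre centroids."""
    names = sorted(centroids)
    matrix = [[math.dist(centroids[a], centroids[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical toy data: 3 genres x 2 excerpts in a 2-D space
# (the study uses 13 genres x 6 excerpts in a 6-D space).
coords = [(0.0, 0.0), (2.0, 0.0), (4.0, 3.0), (6.0, 3.0), (1.0, 4.0), (1.0, 6.0)]
genres = ["Folk", "Folk", "Rock", "Rock", "Jazz", "Jazz"]

centroids = genre_centroids(coords, genres)
names, matrix = centroid_distance_matrix(centroids)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

The resulting matrix has zeros on its diagonal and is symmetric, so it can be passed directly to an MDS routine or rendered as a greyscale image.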
To compute the organization of the genres in the perceptual space we use the distances between genre centroids as input for the non-parametric MDS algorithm (ALSCAL): two dimensions provided an acceptable compromise between the Pearson correlation coefficient value (0.86) and ease of display. Figure 7 displays the topology of the 13 genre centroids.

Figure 7 confirms the results found in Figure 6 in terms of distance and proximity of genre centroids, and adds information on their relative position. In the plot Reggae is positioned half-way between Afro-Pop and Rock, which makes sense musicologically. We drew an axis in the figure labelled ‘Synthetic vs. acoustic’ to illustrate the distribution along the axis of genres using predominantly acoustic instruments and those using predominantly synthesized sounds: electronic drums, distorted guitars, synthesizers and effects. Classical, Latin, Jazz, Country, and Folk are positioned toward the acoustic side of the axis, while Electro, Hip-Hop, and Rock are positioned toward the synthetic side. This topology suggests that the acoustic-synthetic timbral quality is a relevant factor in listeners’ judgments of music. The genre topology has the advantage of reducing the complexity of the output data by displaying genre centroids in two dimensions with a relatively high correlation coefficient (0.86) at the price of losing information on the excerpt-spread within each genre.

Fig. 7. Genre topology: two-dimensional MDS plot, calculated from the 13 × 13 dissimilarity matrix constructed from the relative distances of the 13 genre centroids (Pearson correlation coefficient = 0.86). The arrows drawn represent the separation between prevalently acoustic and synthetic genres. The values on the axes refer to the point distances in the two-dimensional MDS-space.

4.4 Clustering interpretation

To observe the grouping of song excerpts in the participants’ perceptual space without losing the information of each individual song, we calculated the k-means hierarchical-clustering from the six-dimensional coordinates of all song excerpts. In hierarchical clustering, a series of partitions of the data takes place, depending on the distances between items. The partitioning runs hierarchically, creating a tree of possible clusters. The limit of this approach is the positioning of song excerpts in the space: the relative distance between stimuli is visualized but not their spatial distribution. Figure 8 displays the dendrogram calculated from the hierarchical clustering of the distances in the six-dimensional space between the song excerpts.

Our first observation from the dendrogram is the spread of each genre’s excerpt set. In the case of Folk, for example, the excerpts are scattered in the two farthest clusters (at the top and bottom of Figure 8): a relatively large distance is thus needed to group them all together. On the other hand, in the case of Classical, we can use a smaller distance to group all excerpts. The genre-topology plot and the hierarchical-clustering dendrogram provide complementary information and together help distinguish different distribution types: Folk and Country centroids are near each other in the genre topology plot, but their song-position spread in the dendrogram is quite large, while the Classical centroid is isolated from other genres in the topology plot, and the songs are tightly clustered in the dendrogram. The genre topology and the clustering dendrogram might be integrated in one analytical method for the visualization of the perceptual space. In such an approach, one could normalize the Euclidean distance between two genres (from the genre topology) by an estimate of the metrical spread of the two genres (from the dendrogram) to display the overlap of two genre clusters while keeping a relatively low dimensional space.

Another observation from the dendrogram in Figure 8 concerns the characteristic of clusters. At the highest hierarchical level (right-most in the figure), the clusters are split into two groups differentiated by the tempo and rhythm of the song excerpts they contain; the bottom-most cluster contains mainly excerpts belonging to the slow-tempo class, while the top-most cluster contains excerpts belonging to the fast-tempo class or slow songs belonging to rhythmically complex genres, such as Afro-Pop, Hip-Hop and Reggae. As a measure of cluster separation, we calculated the discriminability, or d′, of the distribution of BPM values of the song excerpts for the two clusters (see Appendix A for relative values):

$$d' = \frac{\text{difference between means}}{\text{spread}} = \frac{\mu_1 - \mu_2}{(\sigma_1 \sigma_2)^{1/2}}, \qquad (5)$$
where $\mu_1$ = 150 BPM and $\mu_2$ = 86 BPM, the means of the two tempo distributions and $\sigma_1$ = 58 BPM and $\sigma_2$ = 28 BPM, their standard deviations. This results in a d′ = 1.5, which is greater than the value (1.0) typically used as the threshold for perceptual discriminability (Green & Swets, 1966/1988). A double sample t-test confirmed the statistically significant separation of the two distributions (p < 0.01).

Fig. 8. Dendrogram of the hierarchical clustering of song excerpts. On the y-axis the individual songs are represented by: the genre, followed by two subscripts for the fast- (F) and slow- (S) tempo class, and for the guitar- (G), piano- (P), and vocal- (V) primary-instrument class; the excerpt reference number corresponds to Appendix B. The length of the horizontal lines indicates the distance between two clusters in the six-dimensional space derived through MDS. At the highest hierarchical level (right-most in the figure) the clusters are split into two groups: the fast-rhythmically complex group (top of the figure), the slow group (bottom of the figure). The vertical line is the threshold chosen to separate the ten sub-clusters, six of which are labelled in the figure.

From the analysis of the composition of the two main clusters, we set a threshold to distinguish ten sub-clusters. The excerpt characteristics inside each sub-cluster suggests a preliminary labelling for six of them (from top to bottom in Figure 8): Instrumental fast, Vocal fast, Vocal rhythmically complex (Afro-Pop, Reggae, Hip-Hop), Classical, Instrumental slow, Vocal slow. Although still unverified at this stage, this labelling suggests the possible interpretation of some of the MDS axes: we see in fact excerpt clustering depending on tempo (slow–fast), genre (Classical, rhythmically complex genres), and primary instrument (vocal versus non-vocal music).
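The threshold-based split of a dendrogram into sub-clusters can be illustrated with a small agglomerative clustering sketch. The linkage criterion is not specified in the text, so average linkage is assumed here; the two-dimensional points, the threshold values, and the function names are likewise hypothetical and stand in for the six-dimensional MDS coordinates.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, threshold):
    """Merge the two closest clusters until they are farther apart than `threshold`.

    Returns the remaining clusters (lists of point indices) and the merge
    history as (distance, members) tuples, i.e. the dendrogram in merge order.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        d, x, y = min(
            (average_linkage(clusters[x], clusters[y], points), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break  # cutting the dendrogram at this height
        merged = sorted(clusters[x] + clusters[y])
        merges.append((d, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters), merges


# Hypothetical toy coordinates: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
print(agglomerate(points, threshold=3.0)[0])   # [[0, 1], [2, 3], [4]]
print(agglomerate(points, threshold=10.0)[0])  # [[0, 1, 2, 3], [4]]
```

Raising the threshold merges more excerpts into fewer clusters, which mirrors the observation that a scattered genre needs a larger distance than a compact one before all of its excerpts fall into a single cluster.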
4.5 Identification of dimensions

We employed quadratic discriminant analysis (QDA) to quantitatively verify the interpretations drawn from hierarchical clustering and genre topology mapping, and to find a new set of axes in the perceptual space that could maximize the variance of each control variable, expressed as spatial separation between stimuli stemming from different classes of a control variable. In our QDA computation, we employed the tempo and primary-instrument classes assigned to each song excerpt as classification classes, and the six-dimensional coordinates of each song excerpt as features. The QDA training process computed the best quadratic discriminant function to separate the stimuli of each class. In addition, our QDA implementation provides an estimate of classification performance as percentage correct, and a weighting matrix that can be used to estimate the relative importance of each dimension (feature) separating the excerpts into each class (control variable). Thus, QDA was used to determine which MDS axes were most relevant in maximizing the separation between the two classes of a given control variable: for tempo, the separation of fast and slow pieces; for primary instrument, the separation of vocal and non-vocal (guitar and piano excerpts were grouped together); and for the separation of synthetic and acoustic excerpts. In the general case, more than one MDS axis is relevant in the maximization of separation of stimuli classes for a given control variable: a linear combination of the relevant MDS axes provides then a new axis candidate that alone best separates the classes.

In the QDA analysis, to verify the separation of synthetic versus acoustic genres hypothesized in the observation of the genre topology, we added a new post-hoc control variable called ‘Synthetic-acoustic’ manually labelling the songs one by one: song excerpts containing primarily timbres of distorted or deeply effected guitars, electronic drums and synthesizer pads were assigned to the synthetic class, while song excerpts using acoustic instruments or unprocessed amplified instruments (clean) were assigned to the acoustic class. In Appendix A the reader finds the assigned class for each song excerpt.

distributions of the two classes of song positions on the new axis. In the case of the slow–fast axis, we found d′ = 2.0. This value is close to the discriminability we calculated using the BPM value of each excerpt belonging to the two main clusters in the hierarchical clustering.

To further investigate the slow–fast axis, we calculated the correlation between the annotated BPM values of each song excerpt and the excerpt position on the slow–fast axis. We found a significant correlation (r = 0.66, df = 77, p < 0.01) between the logarithm of the BPM of each excerpt and the position on the selected axis, indicating the presence of a BPM-based excerpt-sorting in the perceptual similarity space of the participants.

In the case of the primary-instrument control variable, we first analysed the separation between stimulus position and the three primary-instrument classes: piano, guitar and vocal. In this case three out of the six MDS coordinates were necessary to create the best stimulus separation, with a correct classification of 75.2 ± 7.9%. The highest confusion occurs between piano and guitar classes with an average misclassification of 23.2%. Grouping piano and guitar classes in one new class named ‘non-vocal’, and comparing vocal and non-vocal classes with QDA improved classification performance to 95.9 ± 2.4%. In this new case a combination of just two MDS coordinates is sufficient to achieve optimal classifier performance.

These results suggest that in the presence of vocal stimuli, the timbre difference between excerpts with guitar and piano as primary instrument is not a strong factor in determining music similarity perception. We determined the axis that best separates vocal and non-vocal classes as a linear combination of the two MDS coordinates sufficient to achieve optimal classifier performance. In the following part of the article, we will refer to this new axis as the ‘vocal–non-vocal axis’. We calculated the discriminability of the two distributions of song-excerpt positions on the vocal–non-vocal axis and found a d′ = 2.3.

In the case of synthetic–acoustic timbre classes, a combination of two MDS coordinates was necessary and sufficient to achieve the top performance of the classifier: 97.2 ± 1.3%. One of these two dimensions was the
In the case of the slow and fast classes, the MDS tempo axis, used previously for the separation of slow
output was found to be already optimal in the QDA and fast classes. We defined a new axis, the ‘synthetic–
analysis: one of the six MDS coordinates could alone acoustic axis’, as a linear combination of these two MDS
produce the highest separation between stimuli classes coordinates. Because of this definition, the synthetic–
with a correct classification mean of 82.8 + 5.7%. In the acoustic axis is not orthogonal with respect to the slow–
following part of this article, we will refer to the axis fast axis, with a 368 separation between the fast and the
identified by this coordinate as the ‘slow–fast axis’ synthetic direction and between the slow and the acoustic
because of its salience in separating the tempo classes. directions. We calculated the discriminability of the two
Because the calculation of the QDA performance distributions of song-excerpt positions on the synthetic–
relies only on the number of stimuli correctly classified, acoustic axis and found d 0 ¼ 2.4.
the d 0 computation adds complementary information Figure 9 displays the projection of the MDS positions
because it estimates the classes discriminability using the of the 78 song excerpts on the newly defined axes: the
Perceptual evaluation of inter-song similarity in Western popular music 15

slow–fast, the vocal–non-vocal and synthetic–acoustic axes. Different symbols represent excerpts belonging to different control variable classes. All three plots outline the clear clustering of songs on each of the three calculated axes. While Figures 9(a) and (c) show a homogeneous distribution of songs, Figure 9(b) shows the song positions distributed in a more diagonal fashion. This distribution reflects the non-orthogonality of the synthetic–acoustic axis and the slow–fast axis, the consequence of the definition of the synthetic–acoustic axis as a linear combination of the slow–fast axis and one of the axes resulting from the MDS calculation.

From our stimulus selection, we are not in a position of hypothesizing if the distribution of songs along the diagonal of Figure 9(b) could be a general behaviour: one cause could be that the majority of songs using synthetic timbres have fast tempi, and the majority of slow songs use acoustic timbres; however, the diagonal distribution in Figure 9(b) might also be caused by the unequal numbers of stimuli in the synthetic and acoustic classes, which cannot be balanced because of the a-posteriori definition of the control variable: there are fewer slow-synthetic excerpts (11 stimuli) than fast-synthetic excerpts (18 stimuli) and there are fewer fast-acoustic excerpts (21 stimuli) than slow-acoustic excerpts (29 stimuli). Further research is needed to test the presence of such an effect connecting timbre and tempo in a larger set of stimuli.

Fig. 9. Projection of the six-dimensional coordinates of the 78 song excerpts on the three axes selected from the QDA analysis. The three panels show two-dimensional representations for each pair of axes: a) Primary instrument versus tempo axes, b) Synthetic-acoustic versus tempo axes, c) Synthetic-acoustic versus primary-instrument axes. In the three plots, to outline the clustering, song excerpts belonging to different control variable classes are represented with different symbols explained in the legends.

4.6 Contextual dependency of similarity factors

Although the QDA analysis showed the global influence of tempo and timbre (the latter in the form of two axes: vocal–non-vocal and synthetic–acoustic), it is reasonable to suppose the magnitude of the influence on perceived music similarity of each possible musical dimension to be context dependent (Cambouropoulos, 2009). To analyse this effect, we iteratively selected three genres (18 excerpts); from all experimental triads, we selected those that contained only stimuli belonging to the defined subset, and used their participant rankings (assigning two points to the least similar pair, one to the intermediate and zero to the most similar) to build an 18 × 18 dissimilarity matrix representing the relative distances between all stimuli in the subset. We then used MDS to compute the two-dimensional representation of the perceptual space of the specific stimulus subset.

In Figure 10(a), we show the MDS plots of data from 18 excerpts selected from three rather different genres: Afro-Pop, Classical, and Rock. We observe three tight
genre-dependent clusters. Substituting Afro-Pop with Latin, in Figure 10(b), we observe again a genre-based clustering and the emergence of a tempo separation within genres. Figure 10(c) displays the MDS results for the excerpts from Afro-Pop, Country, and Latin genres. In this case, the genre separation is less evident, especially for the Latin and Afro-Pop excerpts; on the x axis, we observe a global effect of tempo separation with slow excerpts towards the negative values of the x axis and fast excerpts towards the positive values; on the y axis, we interpret the genre distribution to be influenced by rhythmic complexity: relatively simple and steady rhythmic structures (prevalent in Country) towards positive values of the y axis and more rhythmically complex structures (prevalent in Latin and Afro-Pop) towards negative values of the y axis. The musicological proximity of Latin and Afro-pop supports this interpretation.

The results shown in Figure 10 support previous experimental hypotheses (Lamont & Dibben, 2001; Eerola & Bregman, 2007) suggesting that factors responsible for the grouping of excerpt-subsets are likely to be context dependent. In particular it seems reasonable to hypothesize that in the presence of excerpts selected from three musicologically distant genres, such as Rock, Classical and Afro-Pop, the genre clustering might mask the effect of other less salient musical dimensions such as tempo or timbre. On the other hand, in the case of excerpts selected from less musicologically distant genres, e.g. Folk and Country, or Latin and Afro-Pop, the effect of genre might be less strong and the effect of tempo, rhythm, and timbre can have a stronger influence on the perceived similarity. The effect of context can be linked to two different causes. Considering the lower triad level, changing one song can alter the perception of the triad: if all three songs have slow tempo, for example, tempo does not play a strong role; if we have one fast song and two slow songs, then the effect of tempo on similarity is different. However, this perceptual ‘switch’ can be due to the listeners not making use of tempo in a specific context, or because there is not enough variability of tempo inside the stimuli. In conclusion, from the previous analysis it is not clear whether the observed variation of the relevance of the control variables with the stimulus subset reflects a context-dependent perception of the participants, or that in a subpart of the perceptual space some control variables have less variation causing one dimension to emerge as dominant.
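
The subset analysis described in this section turns triad rankings into a dissimilarity matrix before MDS is applied. The following Python sketch illustrates that scoring step only; it is an illustration under stated assumptions, not the authors' implementation. The function names, the integer excerpt labels and the encoding of each judgment as three pairs ordered from most to least similar are hypothetical, and points are simply summed over all judgments because the text does not describe any normalization.

```python
def triads_within(subset, ranked_triads):
    """Keep only the judgments whose three excerpts all belong to `subset`."""
    keep = set(subset)
    return [
        ranked_pairs
        for ranked_pairs in ranked_triads
        if all(i in keep and j in keep for i, j in ranked_pairs)
    ]


def build_dissimilarity(items, ranked_triads):
    """Accumulate triad points into a symmetric dissimilarity matrix.

    Each judgment lists the three pairs of a triad from most to least
    similar; they receive 0, 1 and 2 points respectively.
    """
    index = {item: k for k, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in items]
    for ranked_pairs in ranked_triads:
        if len(ranked_pairs) != 3:
            raise ValueError("each judgment must rank exactly three pairs")
        for points, (i, j) in enumerate(ranked_pairs):
            matrix[index[i]][index[j]] += points
            matrix[index[j]][index[i]] += points
    return matrix


# Made-up judgments over excerpts 0..3 (most similar pair listed first).
judgments = [
    ((0, 1), (0, 2), (1, 2)),
    ((0, 1), (1, 3), (0, 3)),
    ((2, 3), (0, 2), (0, 3)),
]
subset = [0, 1, 2]
selected = triads_within(subset, judgments)
matrix = build_dissimilarity(subset, selected)
for row in matrix:
    print(row)
```

In the setting described above, `subset` would hold the 18 excerpts of the three chosen genres, and the resulting 18 × 18 matrix would be passed to an MDS routine; that step is omitted here.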

Fig. 10. Example of different context-dependent factors in the interpretation of excerpt-subset grouping: a) Classical/Latin/Rock,
excerpts are grouped by genres; b) Afro-Pop/Classical/Rock, we observe the influence of tempo in the distribution (along the drawn
axis); c) Afro-Pop/Country/Latin, we interpret the x axis as ‘Tempo’, and y axis as ‘Rhythmic complexity’. Each marker identifies
excerpt positions of the genres in the legend. The letters identify the control variable classes: ‘F’ for fast, ‘S’ for slow, ‘V’ for vocal, ‘G’
for guitar, and ‘P’ for piano.
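
The axis checks reported earlier in this section rest on two simple statistics: the discriminability d′ of two class distributions along an axis, and the correlation between axis position and the logarithm of BPM. The sketch below illustrates both in plain Python. It assumes the common form d′ = |μ1 − μ2| / sqrt((σ1² + σ2²)/2), which may differ from the exact definition used in the study. The function names and the axis positions are invented for illustration; only the six BPM values are taken from the Afro-Pop rows of Appendix A.

```python
import math
import statistics


def d_prime(class_a, class_b):
    """Discriminability of two samples along one axis.

    Assumed form: absolute mean difference divided by the square root
    of the averaged sample variances.
    """
    mean_gap = abs(statistics.mean(class_a) - statistics.mean(class_b))
    pooled_sd = math.sqrt(
        (statistics.variance(class_a) + statistics.variance(class_b)) / 2
    )
    return mean_gap / pooled_sd


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical positions of six excerpts on a slow-fast axis.
positions = [0.4, 0.9, 1.3, -0.6, -0.2, -0.8]
# BPM values of the six Afro-Pop excerpts listed in Appendix A.
bpm = [112, 130, 170, 89, 88, 82]

fast, slow = positions[:3], positions[3:]
print("d' =", round(d_prime(fast, slow), 2))
print("r  =", round(pearson_r(positions, [math.log(b) for b in bpm]), 2))
```

Unlike percentage correct, d′ uses the spread of each class as well as the distance between class means, which is why it complements the classifier score.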

5. Discussion

Despite the explicit choice of not providing the participants with a definition of similarity, to control the perceptual dimensions used by the participant in the ranking task, the experimental results show high participant concordance, suggesting that listeners do interpret similarity consistently and can perform a similarity task. When asked which music properties were used in ranking similarity, participants often referred indirectly to the timbre properties.

We described an experimental method that assesses similarity perception of Western popular music using triadic comparisons of song excerpts to present the participants with a low-complexity task, and partially balanced incomplete block design (PBIBD) to reduce the number of stimulus comparisons with the consequent possibility of extending the stimulus set. The listening experiment was conducted through the Internet and used 78 music excerpts and involved 78 participants with various music backgrounds.

The initial analysis of the experimental results was conducted to assess the existence of commonly-perceived inter-song music similarity and test the solidity of the methodology used. We found significant within-participant concordance for 90% of participants and significant across-participant concordance for 100% of the tested triads. The high within-participant concordance confirmed the existence of a clear and stable interpretation of music similarity for most listeners. The high across-participant concordance suggests the existence of an underlying common perception of music similarity across listeners, which could be formalized into a theoretical model able to represent and predict the perception of music similarity. Despite the different experimental contexts, these results support the findings on across-participant concordance of the two user tests run by Logan and Salomon (2001) with 20 participants and Pampalk (2006) with 25 participants. Our study extends their conclusions on the existence of a common perception of music similarity to a large set of songs and genres of Western popular music, with a larger number of participants and with several cross-checks. In the large-scale experiment we did not find any statistically significant difference between musicians and non-musicians in within- and across-participant concordance. This result is in line with previous studies (on different song databases) that found little or no statistically significant difference between musicians and non-musicians in the dimensions used for their judgments of music similarity (Chupchik et al., 1982; Lamont & Dibben, 2001; McAdams et al., 2004).

We tested the robustness of the experimental method with two control experiments. The high correlation found comparing the similarity rankings of a complete block design against the rankings of a balanced incomplete block design (BIBD) supports the reliability of reducing the number of triads with λ = 2 in a perceptual experiment as suggested by previous literature (Levelt et al., 1966; MacRae et al., 1990). Results obtained using a grouping paradigm and triadic comparisons in a BIBD showed high correlation, confirming the solidity of our methodology and the interchangeability of both paradigms as suggested by MacRae et al. (1990).

The efficiency of our experimental design and low complexity of the task for the participants is indicated by the actual average experimental time per participant: about 2 h (based on informal feedback from a number of participants). The high within- and across-participant concordance of the results confirm the inner coherence and robustness of the experimental method.

An exploration of the control variables used in the selection of the stimulus material showed significant saliency and a hierarchical degree of impact on participants’ rankings in our stimulus context: genre > tempo > primary instrument. This hierarchy is in agreement with the findings of previous studies that used different experimental methodologies and stimulus contexts (Chupchik et al., 1982; Lamont & Dibben, 2001; McAdams & Matzin, 2001), and extends the validity of the previous results to other genres of Western popular music. Chupchik et al. (1982) found that tempo, primary instrument, and articulation were the main musical features used by participants for similarity pair-rating of Jazz improvisations. In a second experiment using pair-rating on excerpts selected from three different genres, the most relevant dimensions found were genre and tempo. McAdams and Matzin (2001) used a grouping paradigm and excerpts of contemporary musical material and found that tempo and timbre influenced participants’ judgments.

The predominance of genre in influencing participants’ judgments is not surprising: genre is a collection of attributes encompassing several musical dimensions. Genre is in fact not necessarily orthogonal to primary instrument and tempo: e.g. Heavy Metal can be identified by the sound of distorted guitars, and Heavy-Metal songs tend to have high tempi. Given the complexity of genre, it seems necessary to extend the validity of our conclusions by exploring each genre in detail with a specific experiment aimed at determining relevant dimensions inside each genre: the six songs per genre that we selected are not sufficient in our opinion to cover the musical variety of a musical genre. Furthermore it seems reasonable to expect that the relevance of the control variables might vary for different genres.

Between the two other control variables, tempo can be objectively and quantitatively defined by the BPM value on the quarter note. Timbre, on the other hand, is a perceptual phenomenon, and can be described using several musical dimensions, such as frequency spectrum, music instruments in the piece, and recording techniques. Our choice of representing timbre by the predominant instrument is limited and would need further investigation, for which the same experimental design could be used.

To quantitatively explore the perceptual space of participants, we used a combination of numerical and analytical methods. Through multidimensional scaling (MDS), we found that six dimensions were optimal to represent the participants’ perceptual space obtained from their similarity rankings in our experimental context. By using QDA to model the excerpt positions in the participants’ perceptual space, we selected and labelled two axes that maximized the discriminability (and the classifier performance) between the two excerpt classes of each control variable: ‘slow–fast’ (d′ = 2.0) and ‘vocal–non-vocal’ (d′ = 2.3). Interpretations based on the hierarchical clustering and genre topology analyses suggested the influence of synthetic versus acoustic timbres on the perceived similarity. We defined a posteriori a third control variable manually assigned based on excerpt timbre, that provided a third relevant axis: ‘synthetic–acoustic’ (d′ = 2.4). The high correlation (r = 0.66, df = 77, p < 0.01) between projected positions of excerpts on the slow–fast axis and the logarithm of each excerpt BPM value adds further evidence that the selected slow–fast axis indeed relates to tempo. Perhaps because of the complexity of timbre perception, we could not find a unique objective variable to perform a similar measure of axis-correlation in the case of the vocal–non-vocal, and the synthetic–acoustic axes.

It can be argued that the synthetic–acoustic property represents just a form of genre similarity, because of the presence of genres such as electronica or classical that have the majority of songs in one of the categories. However our stimulus selection also comprises genres in which the songs are split into the two classes, such as R&B and Jazz. Further research is needed to verify the influence of this ad hoc dimension.

Plots obtained projecting the MDS excerpt positions on the three selected axes show clean clustering, confirming the relevance of the chosen control variable on participants’ rankings. These results are in line with previous experimental studies that found a qualitative influence of tempo, timbre and genre on participants’ judgments using different experimental contexts (Chupchik et al., 1982; McAdams et al., 2004). Our analyses extend their validity to a larger set of stimuli and genres of Western popular music, with quantitative verification through different analytical tools. The results are consistent with the hypothesis proposed by Deliège’s ‘cue abstraction’ model: participants determine excerpt similarity by extracting relevant music cues during the listening process, and extend the theoretical framework selecting the most salient cues: tempo, and primary instrument in the form of vocal–non-vocal and acoustic-synthetic.

The final analysis showed that factors responsible for the grouping of excerpt subsets are likely to be context dependent. However, our experimental design does not allow us to determine whether the varying relevance of the different music dimensions is due to the different cognitive weight that they assume in the mind of the listener, or if it is due to the limited variability of a specific dimension within the stimuli in a region of the perceptual space. In the song selection of our experimental design, we used prototypical songs from several genres to extend the context of our results and find common dimensions that can be used in the majority of applicative scenarios, i.e. a database with different genres of Western popular music. However, because music similarity is context-based, these results may not be easily extended to the case of specific music databases: for example, within a collection of jazz music, the perceptual dimensions defining similarity may have completely different weights from the ones found in this study.

Despite all verification used in the experimental and analytical methodology, because of the subjectivity of our song selection, it is not certain also that the perceptual space we found will be similar in another experiment using the same number of genres and songs but selected from a different database by another experimenter. Future research using a similar paradigm but different song sets is needed to verify how much the present results can be generalized to predict inter-song similarity using several genres of Western popular music.

This observation about context raises the problem of how to formally integrate the global and contextual influence of each control variable in a theoretical model or algorithmic application. Although it seems difficult to integrate the various contextual factors in a generally valid set of similarity descriptors, these results suggest that algorithms should take into account contextual information in order to provide results able to simulate the judgment of human listeners. Future research might test the validity of the similarity distance proposed by Krumhansl (1978), which considers the perceptual distance to be dependent on the local contextual density of objects.

Our analyses leave some unanswered questions, show the experimental limitations of the methodological approach, and provide possible suggestions for future experiments. While we identified and labelled three axes in the six-dimensional space (slow–fast, vocal–non-vocal, and synthetic–acoustic), we found no interpretation for the three remaining dimensions. Genre and timbre, being multidimensional concepts, could involve other music aspects, such as texture, dynamics, that might explain the interpretation of the three axes. The choice of control variables, across which we distribute stimuli, is a pre-experimental limiting factor. A follow up experiment selecting a larger number of well-chosen control variables (e.g. rhythm complexity, timbre brightness, harmonic
stability, etc.), could investigate the interpretation of the unexplored dimensions using similar experimental and analytical methodology.

In conclusion, the combination of experimental and analytical methods presented here provided a quantitative estimation of the saliency for three musical dimensions that have a statistically significant influence on similarity perception on a large set of songs of Western popular music: song tempo, presence/absence of vocal parts, and use of synthetic/acoustic timbres. Together with the previously reported influence of genre, these results can be used to estimate the optimal weight of the different music dimensions obtained through music descriptors such as metadata labels or audio features to finally derive a formal model for music similarity incorporating perceptual information in its predictions.

Acknowledgements

This work is performed as part of a Marie Curie Early Stage Training grant (MEST-CT-2004–8201). We would like to thank J. Engel for the statistical support provided during the analysis of the data, and Dr H. Honing for the analysis suggestions and constructive criticism of the experimental method, and Janto Skowronek for the advice on quadratic discriminant analysis.

References

All Music. (2008). Retrieved from http://www.allmusic.com/
Aucoutourier, J.J., & Pachet, F. (2002). Finding songs that sound the same. In Proceedings of the IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), Leuven, Belgium.
Aucoutourier, J.J., & Pachet, F. (2004). Improving timbre similarity: How high’s the sky? Journal of Negative Research Results in Speech and Audio Sciences, 1(1), 1–13.
Bella, S.D., & Peretz, I. (2005). Differentiation of classical music requires little learning but rhythm. Cognition, 96B, 65–78.
Berenzweig, A., Logan, B., Ellis, D.P.W., & Whitman, B. (2003). A large-scale evaluation of acoustic and subjective music similarity measures. Computer Music Journal, 28, 63–76.
Burton, M.L., & Nerlove, S.B. (1976). Balanced designs for triads tests: Two examples from English. Social Science Research, 5, 247–267.
Cambouropoulos, E. (2009). How similar is similar? Musicae Scientiae – Discussion Forum, 4B, 7–24.
Chupchik, G.C., Rickert, M., & Mendelson, J. (1982). Similarity and preference judgements of musical stimuli. Scandinavian Journal of Psychology, 23, 273–282.
Deliège, I. (2001). Similarity perception – categorization – cue abstraction. Music Perception, 18, 233–243.
Duda, R.O., Hart, P.E., & Stork, D.G. (2001). Pattern classification. New York: Wiley.
Eerola, T., & Bregman, M. (2007). Melodic and contextual similarity of folk song phrases. Musicae Scientiae – Discussion Forum, 4A, 211–233.
Eerola, T., Jarvinen, T., Louhivuori, J., & Toiviainen, P. (2001). Statistical features and perceived similarity of folk melodies. Music Perception, 18, 275–296.
Efron, B., & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall.
Goldstone, R.L. (1994). An efficient method for obtaining similarity data. Behavior Research Methods, Instruments, & Computers, 26(4), 381–386.
Green, D.M., & Swets, J.A. (1966/1988). Signal Detection Theory and Psychophysics. Los Altos, CA: Peninsula Publishing.
Grey, J. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270–1277.
Herre, J., Allamanche, E., & Ertel, C. (2003). How similar do songs sound? Towards modeling human perception of musical similarity. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA.
iTunes. (2008). Retrieved from http://www.apple.com/itunes/
Kendall, M.G. (1975). Rank Correlation Methods. London: Charles Griffin.
Krumhansl, C.L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445–463.
Kruskall, J.B., & Wish, M. (1978). Multidimensional Scaling, Quantitative Applications in the Social Sciences. London: Sage.
Lamont, A., & Dibben, N. (2001). Motivic structure and the perception of similarity. Music Perception, 18, 245–274.
Levelt, W.J.M., van de Geer, J.P., & Plomp, R. (1966). Triadic comparisons of musical intervals. The British Journal of Mathematical and Statistical Psychology, 19, 163–179.
Logan, B., Ellis, D.P.W., & Berenzweig, A. (2003). Toward evaluation techniques for music similarity. In Proceedings of the 3rd International Symposium on Music Information Retrieval (ISMIR 2002), Paris, France.
Logan, B., & Salomon, A. (2001). A music similarity function based on signal analysis. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2001), Tokyo, Japan.
MacRae, A.W., Howgate, P., & Geelhoed, E. (1990). Assessing the similarity of odours by sorting and by triadic comparisons. Chemical Senses, 15, 691–699.
McAdams, S., & Matzin, D. (2001). Similarity, invariance and musical variation. Annals of the New York Academy of Sciences, 930, 62–76.
McAdams, S., Vieillard, S., Houix, O., & Reynolds, R. (2004). Perception of musical similarity among contemporary thematic materials in two instrumentations. Music Perception, 22, 207–237.
Mirex. (2006). Mirex website. Retrieved from http://www.music-ir.org/mirex2006
Nosofsky, R.M. (1992). Similarity scaling and cognitive process models. Annual Review of Psychology, 43, 25–54.
Orpen, K.S., & Huron, D. (1992). Measurements of similarity in music: A quantitative approach for non parametric representations. Computers in Music Research, 4, 1–44.
Pampalk, E. (2004). A Matlab toolbox to compute music similarity from audio. In Proceedings of the 5th International Symposium on Music Information Retrieval (ISMIR 2004), Barcelona, Spain.
Pampalk, E. (2006). Computational models of music similarity and their application in music information retrieval (PhD thesis). Technische Universität Wien, Austria.
Pampalk, E., Flexer, A., & Widmer, G. (2005). Improvements of audio-based music similarity and genre classification. In Proceedings of the 6th International Symposium on Music Information Retrieval (ISMIR 2005), London, UK.
R-project. (2008). Retrieved from http://www.r-project.org/
Shepard, R.N. (1962a). The analysis of proximities: Multidimensional scaling with unknown distance function I. Psychometrika, 27(2), 125–140.
Shepard, R.N. (1962b). The analysis of proximities: Multidimensional scaling with unknown distance function II. Psychometrika, 27(3), 219–246.
Shepard, R.N. (1986). Discrimination and generalization in identification and classification: Comment on Nosofsky. Journal of Experimental Psychology, 115(1), 58–61.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123–154.
Young, F.W., & Lewyckyj, R. (1996). ALSCAL User’s Guide (5th ed.). Chapel Hill: Psychometric Laboratory, University of North Carolina.

Appendix A. Excerpt selection.

Synthetic/
Number Genre Tempo BPM Timbre Acoustic Author Title Album Year Publisher Begin/End

1 Afro-Pop fast 112 guitar acoustic Salif Keita Tekere Folon 1995 Mango 0:10–0:25
2 Afro-Pop fast 130 piano acoustic Don Pullen & African Yebino spring Live . . . again 1995 Blue Note 7:30–7:45
Brazilian Records
Connection
3 Afro-Pop fast 170 vocal acoustic ARC Music Tongoyo Traditional songs and 2003 ARC Music 0:35–0:50
Productions dance from Africa Productions
4 Afro-Pop slow 89 guitar acoustic Chief Ebenezer Obey Operation feed the Ju ju jubilation 1998 EMI Records 2:30–2:45
nation Ltd
5 Afro-Pop slow 88 piano acoustic Abdullah Ibrahim Hajj (the journey) The journey 1978 Chiaroscuro 1:12–1:27
Records
6 Afro-Pop slow 82 vocal acoustic Youssou N’Dour Macoy The lion 1989 Virgin Records 1:40–1:55
7 Blues fast 220 guitar synthetic Little Charlie & The Percolatin’ Crucial guitar blues 2003 Alligator 1:22–1:37
Nightcats Records
8 Blues fast 170 piano acoustic Roosevelt Skies Hot pants Music is my busyness 2001 Corazong 0:53–1:08
Records
9 Blues fast 220 vocal acoustic S.R. Vaughan Give me back my wig Martin Scorsese presents 1986 Montreux 1:20–1:35
Stevie Ray Vaughan Sounds S.A.
10 Blues slow 63 guitar synthetic Johnny Winter I smell trouble Crucial guitar blues 2003 Alligator 0:02–0:17
Records
11 Blues slow 60 piano acoustic Big Mama Thorton I feel the way I feel Big Mama Thornton & 2004 Arhoolie 0:20–0:35
Muddy Waters Blues Records
Band - 1966
12 Blues slow 61 vocal acoustic Johnny B. Moore Back door friend Live at Blue Chicago 1996 Delmark 1:35–1:50
13 Classical fast 162 guitar acoustic J.S. Bach/Eduardo Prelude suite no.4 Johann Sebastian Bach: 2004 Oehms Classics 1:25–1:40
Fernandez 4 Suites for Lute
14 Classical fast 158 piano acoustic F. Chopin Piano concerto 2 Chopin: Piano 1999 EMI Classics 1:15–1:30
Op.21 III Concertos nos. 1 & 2,
Dutoit, Argerich
15 Classical slow 80 vocal acoustic C. Bartoli & G. Paisiello /Chi vuol la Se tu m’ami - Arie 1992 Decca Music 1:10–1:20
Fischer zingarella antiche Group Ltd
16 Classical slow 80 guitar acoustic J. Rodrigo Adagio Concierto de Aranjuez 1994 Philips 9:25–9:40
Fantası́a para un
Gentilhombre
17 Classical slow 80 piano acoustic E. Satie Gymnopédies Gymnopédies 2000 EMI Classics 0:10–0:25
Perceptual evaluation of inter-song similarity in Western popular music

18 Classical slow 80 vocal acoustic W.A. Mozart Requiem - Introitus W.A. Mozart/ 1999 Virgin Records 2:25–2:40
Norrington, London
Classical Players, et
al.
19 Country fast 120 guitar acoustic Chet Atkins Yakety axe The essential Chet 1972 BMG 1:20–1:35
Atkins Entertainment
20 Country fast 150 piano acoustic Floyd Cramer On the rebound 20 greatest hits 207 Gusto Records 0:35–0:50

21 Country fast 282 vocal acoustic Statler Brothers Flowers on the wall Pulp Fiction - collector’s edition 1994 MCA Records 0:07–0:23
22 Country slow 105 guitar acoustic Willie Nelson Blue eyes crying in the rain All the songs I’ve loved before 2002 Special Marketing 1:10–1:25
23 Country slow 67 piano acoustic Hargus Robbins I’m hurting Belly up to the bar: classic country and western 2005 Time Records 0:20–0:35
24 Country slow 87 vocal acoustic George Jones The grand tour George Jones: The definitive country collection 2003 Epic 0:24–0:39
25 Electronica fast 112 guitar synthetic Daft Punk Robot rock Human after all 2005 Virgin Records Ltd 0:25–0:40
26 Electronica fast 160 piano synthetic Goldie Crystal clear Saturnz return 1998 Ffrr Records Ltd 4:01–4:16
27 Electronica fast 144 vocal synthetic Prodigy No good (start the dance) Music for the jilted generation 1995 Mute 5:40–5:55
28 Electronica slow 50 guitar acoustic Chemical Brothers Where do I begin Dig your own hole 1997 Astralwerks 0:15–0:30
29 Electronica slow 70 piano synthetic Robert Miles Children (dream version) Dreamland 1996 Arista 0:54–1:09
30 Electronica slow 90 vocal synthetic Kraftwerk The man-machine The man-machine 1978 Capitol 1:26–1:41
31 Folk fast 122 guitar acoustic Sarah Bolen Fantasy Naked on the inside 2001 Sarah Bolen 3:15–3:30
32 Folk fast 122 piano acoustic Jeff Little Grassy creek Piano man from blue ridge 2003 Jeff Little 1:20–1:35
33 Folk fast 126 vocal acoustic The Seeger Family Muskrat Animal folk songs for children and other people! 1992 Rounder 0:38–0:53
34 Folk slow 82 guitar acoustic Cheryl Wheeler But the days and nights are long Sylvia hotel 1999 Philo/Umgd 1:52–2:07
35 Folk slow 78 piano acoustic Gary Remal Malkin Appalachian sunrise The music of the great smoky mountains 1996 Real Music 1:18–1:33
36 Folk slow 66 vocal acoustic P. Seeger My name is Liza Kalvelage Waist deep in the big muddy and other love songs 1967 Sony 1:35–1:50
37 Hip-Hop fast 100 guitar synthetic Public Enemy New whirl odor New whirl odor 2005 Slam Jamz Records 9:30–9:45
38 Hip-Hop fast 96 piano synthetic Outkast Ms. Jackson Stankonia 2000 La Face 3:57–4:13
39 Hip-Hop fast 136 vocal synthetic Busta Rhymes Gimme some more The best of Busta Rhymes 2001 Elektra/Wea 1:00–1:15
40 Hip-Hop slow 85 guitar synthetic Cypress Hill Amplified Stoned raiders 2001 Sony 0:08–0:23

41 Hip-Hop slow 80 piano synthetic Beastie Boys Ricky’s theme The in sound from way out! 1996 Capitol 0:08–0:23
42 Hip-Hop slow 80 vocal synthetic Arrested Development People everyday 3 Years 5 months & 2 days in the Life of 1992 Capitol 0:27–0:43
43 Jazz fast 260 guitar synthetic Mike Stern Good question Who let the cats out? 2006 Heads Up 1:30–1:45
44 Jazz fast 270 piano acoustic Herbie Hancock One finger snap Empyrean isles 1964 Blue Note 2:43–2:58
45 Jazz fast 220 vocal acoustic E. Fitzgerald and L. Armstrong I’ve got my love to keep me warm Verve jazz masters 1957 Polygram Records 0:57–1:07
46 Jazz slow 102 guitar acoustic Bireli Lagrene Insensatez Bireli Lagrene: Standards 1992 Blue Note Records 1:12–1:27
47 Jazz slow 66 piano acoustic Frank Morgan Mood indigo Mood indigo 1990 Polygram Records 4:04–4:19
48 Jazz slow 64 vocal acoustic Billie Holiday Good morning heartache The Billie Holiday songbook 1952 Polygram Records 0:20–0:35
49 Latin fast 220 guitar acoustic Vieja Trova Santiaguera Cuida eso La manigua 1998 Virgin Records 0:00–0:15
50 Latin fast 270 piano acoustic Jesus Alemany Tumbao de coqueta Cubanismo! 1996 Hannibal 2:37–2:52
51 Latin fast 167 vocal synthetic Manu Chao La marea Proxima estación: esperanza 2001 Virgin 0:00–0:15
52 Latin slow 101 guitar acoustic Compay Segundo Es mejor vivir así Lo mejor de la vida 1998 Nonesuch 0:02–0:17
53 Latin slow 105 piano acoustic Buena Vista Social Club Buena Vista Social Club Buena Vista Social Club 1997 Nonesuch 0:28–0:43
54 Latin slow 81 vocal acoustic Ibrahim Ferrer Mil congojas Buenos hermanos 2003 Nonesuch 1:15–1:30
55 Pop fast 207 guitar synthetic The Bangles Walk like an Egyptian Greatest hits 1990 Sony 1:31–1:46
56 Pop fast 135 piano acoustic Tori Amos Cornflake girl Under the pink 1994 Atlantic/WEA 4:05–4:20
57 Pop fast 190 vocal acoustic The Housemartins Happy hour Now that’s what I call quite good 1988 Go! Disc Ltd 0:33–0:48
58 Pop slow 79 guitar acoustic Eric Clapton Tears in heaven The best of Eric Clapton 1999 Reprise/Wea 2:32–2:47
59 Pop slow 67 piano acoustic Lionel Richie Easy The definitive collection 2003 Motown 0:00–0:15
60 Pop slow 68 vocal acoustic Elton John Rocket man Rocket man: The definitive hits 2007 Mercury Records 1:30–1:40
61 R&B fast 135 guitar synthetic L. Buksbaum and S. P. Schreer Can’t catch me Sports highlights vol. 2 2006 Freeplaymusic, BMI 0:30–0:45
62 R&B fast 134 piano synthetic James Taylor’s Quartet Splat Message from the godfather 2001 Ubiquity Recordings 0:35–0:50
63 R&B fast 133 vocal acoustic Oliver Morgan Roll call (New Orleans funk and soul) Saturday night fish fry 2001 Soul Jazz 0:40–0:55

64 R&B slow 85 guitar synthetic P. Calandra & S.P. Schreer Blusey Funk Blues vol. 3 2006 Pecamusic/BMI/Freeplaymusic 0:00–0:15
65 R&B slow 84 piano synthetic Nils Landgren Calvados Fonk da world 2001 ACT 1:12–1:27
66 R&B slow 70 vocal acoustic Aretha Franklin Something he can feel Respect: The very best of Aretha Franklin 2002 Warner/BMG 0:47–1:02
67 Reggae fast 140 guitar synthetic Skavoovie & The Epitones Nut monkey Fat footin’ 1996 Moon Ska/Caroline 1:43–1:58
68 Reggae fast 160 piano synthetic Pannonia allstars Balkan fever Budapest ska mood 2002 Megalith Records 0:50–1:05
69 Reggae fast 155 vocal synthetic Gentleman Face off Confidence 2004 Four Music Productions 1:14–1:29
70 Reggae slow 80 guitar acoustic Ernest Ranglin Below the bassline Below the bassline 1996 Island 1:18–1:33
71 Reggae slow 80 piano synthetic Bob Marley No woman no cry Legend - The best of Bob Marley and The Wailers 2002 Def Jam 0:05–0:20
72 Reggae slow 70 vocal acoustic Peter Tosh Legalize it Legalize it 1999 Sony 0:30–0:45
73 Rock fast 200 guitar synthetic AC/DC Whole lotta rosie AC/DC box set 2006 Sony Bmg 2:13–2:28
74 Rock fast 185 piano synthetic Supertramp School The very best of Supertramp 2001 A&M 3:12–3:27
75 Rock fast 162 vocal synthetic Black Sabbath Paranoid Paranoid 1990 Warner Bros./WEA 0:36–0:51
76 Rock slow 68 guitar synthetic Jimi Hendrix Little wing Experience Hendrix - The best of Jimi Hendrix 2000 Experience Hendrix 1:58–2:13
77 Rock slow 85 piano synthetic Faith No More Epic The real thing 1989 Reprise/WEA 4:05–4:20
78 Rock slow 80 vocal synthetic Alanis Morissette Right through you Jagged little pill 1995 Maverick 0:36–0:51

Appendix B. Excerpt composition per repeated triad.

Triad number First excerpt Second excerpt Third excerpt

triad 1 55 - Pop - Fast - Guitar 56 - Pop - Fast - Piano 57 - Pop - Fast - Vocal
triad 2 52 - Latin - Slow - Guitar 53 - Latin - Slow - Piano 54 - Latin - Slow - Vocal
triad 3 43 - Jazz - Fast - Guitar 44 - Jazz - Fast - Piano 48 - Jazz - Slow - Vocal
triad 4 64 - R&B - Slow - Guitar 65 - R&B - Slow - Piano 66 - R&B - Slow - Vocal
triad 5 31 - Folk - Fast - Guitar 32 - Folk - Fast - Piano 39 - Hip-Hop - Fast - Vocal
triad 6 26 - Electro - Fast - Piano 29 - Electro - Slow - Piano 23 - Country - Slow - Piano
triad 7 7 - Blues - Fast - Guitar 10 - Blues - Slow - Guitar 13 - Classical - Fast - Guitar
triad 8 73 - Rock - Fast - Guitar 68 - Reggae - Fast - Piano 63 - R&B - Fast - Vocal
triad 9 48 - Jazz - Slow - Vocal 54 - Latin - Slow - Vocal 57 - Pop - Fast - Vocal
triad 10 27 - Electro - Fast - Vocal 36 - Folk - Slow - Vocal 42 - Hip-Hop - Slow - Vocal

Appendix C. CBD versus BIBD control experiment: excerpt selection (see Appendix D for excerpt detail).

Number Number as in Appendix D Genre Tempo Author Title

1 3 Classical slow C. Bartoli & G. Fischer Paisiello/Chi vuol la zingarella


2 4 Classical fast L. V. Beethoven Symphony no. 9, Op. 125, 4th movement
3 5 Rock slow Jimi Hendrix Foxey lady
4 6 Rock fast Queen Headlong
5 10 Heavy Metal slow Paradise Lost True belief
6 13 Jazz slow F. Sinatra & E. Ellington I like the sunrise
7 14 Jazz fast E. Fitzgerald & L. Armstrong I’ve got my love to keep me warm
8 17 Country slow J. Reeves He’ll have to go
9 18 Country fast Cumberland Highlanders Cumberland mountain home

Appendix D. Control experiment I: triadic comparison vs. grouping paradigm, the excerpt selection.

Number Genre Tempo BPM Author Title Album Year Publisher Begin/End

1 Blues fast 220 S.R. Vaughan Give me back my wig Martin Scorsese presents Stevie Ray Vaughan 1986 Montreux Sounds S.A. 1:20–1:30
2 Blues slow 75 B.B. King I need you so Reflections 2003 Geffen Records 0:15–0:25
3 Classical fast 140 L. V. Beethoven/London Classical Players & Schutz choir of Norrington 4th Movement Prestissimo Symphony no.9 Op. 125 2005 EMI Records 1:20–1:30
4 Classical slow 80 C. Bartoli & G. Fischer Paisiello/Chi vuol la zingarella Se tu m’ami - Arie antiche 1992 Decca Music Group Ltd 1:10–1:20
5 Country fast 135 Cumberland Highlanders Cumberland mountain home Cumberland mountain home 2000 Rural Rhythm 0:40–0:50
6 Country slow 85 J. Reeves He’ll have to go Greatest hits 1972 BMG Entertainment 0:30–0:40
7 Funk fast 115 Don Covay Overtime man So soulful 70’s 1999 Kent 0:10–0:20
8 Funk slow 90 B. Collins Hollywood squares Back in the day: The best of Bootsy Collins 1976 Warner Bros. 1:10–1:20
9 Heavy Metal fast 168 Metallica The prince Garage inc. 1998 Elektra/Wea 1:30–1:40
10 Heavy Metal slow 60 Paradise Lost True belief Reflection 1998 Pid 0:50–1:00
11 Hip-Hop fast 134 Dr. Dre and Eminem Forgot about Dre 2001 1999 Aftermath Ent. Interscope Records 1:19–1:29
12 Hip-Hop slow 80 Coolio Gangsta’s paradise Gangsta’s paradise 1995 Warner Strategic Market 0:51–1:01
13 Jazz fast 220 E. Fitzgerald & L. Armstrong I’ve got my love to keep me warm Verve jazz masters 1957 Polygram Records 0:57–1:07
14 Jazz slow 67 F. Sinatra & E. Ellington I like the sunrise Francis A. Sinatra and Edward K. Ellington 1967 Warner Bros./Wea 0:41–0:51
15 Pop fast 190 The Housemartins Happy hour Now that’s what I call quite good 1988 Go! Disc Ltd 0:33–0:43
16 Pop slow 68 Elton John Rocket man Rocket man: The definitive hits 2007 Mercury Records 1:30–1:40
17 Rock fast 147 Queen Headlong Innuendo 1991 Queen Prods. 0:30–0:40
18 Rock slow 100 Jimi Hendrix Foxey lady Are you experienced? 1979 Experience Hendrix 0:56–1:06
