This article was downloaded by: [41.235.86.174] On: 10 July 2011, At: 12:24
Publisher: Routledge
To cite this article: Alberto Novello, Martin M.F. McKinney & Armin Kohlrausch (2011): Perceptual Evaluation of Inter-song
Similarity in Western Popular Music, Journal of New Music Research, 40:1, 1-26
Journal of New Music Research
2011, Vol. 40, No. 1, pp. 1–26
Abstract

We describe and test the methodological set-up for a web-based listening experiment that assesses the perception of inter-song similarity, optimizing the trade-off between stimulus coverage and experimental time. The experiment used a relatively large set of stimuli of Western popular music: 78 song excerpts selected from 13 genres, involving 78 participants. The experiment used triadic comparisons of song excerpts to present the participants with a low-complexity task, and a partially balanced incomplete block design (PBIBD) to reduce the number of stimulus comparisons, with the consequent possibility of extending the stimulus set. The three control variables used in the excerpt selection, genre, tempo and timbre, showed statistically significant saliency and a hierarchical degree of impact on participants' pair rankings (genre > tempo > timbre). We investigated the participants' perceptual space using a combination of numerical and analytical methods that help to reduce and represent the dimensionality of the data. We used a combination of scaling and discriminant functions to gain insight into the important factors underlying the organization of the participants' perceptual space. In the perceptual space calculated through multidimensional scaling, we used quadratic discriminant analysis to search for axes that maximized the separation of the excerpt classes. We identified three axes that were a posteriori labelled as 'slow–fast', 'vocal–non-vocal', and 'synthetic–acoustic'. We found a high correlation between the excerpt tempo in beats per minute and the excerpt projections on the slow–fast axis. A final analysis showed that the relevance of the factors responsible for the grouping of excerpt subsets is context dependent.

1. Introduction

It is a common phenomenon for music listeners to detect similarity between and within pieces of music. Within a piece of music, listeners spontaneously identify musical segments with similar functions (e.g. choruses, verses, bridges), deriving a structure for the summarization or description of the song. Similarity between pieces is used by listeners for comparison of one piece to another, for categorization into styles and genres, and for organization of songs into collections and play-lists.

Music similarity is an ill-defined concept in the cognitive and perceptual domains because it is context-dependent (Cambouropoulos, 2009), and there is no definition of which musical dimensions influence listeners' perception or of how music proximity can be objectively measured (Orpen & Huron, 1992). Nevertheless, in perceptual experiments, participants can easily decide without a formal definition how similar two music pieces are (Chupchik, Rickert, & Mendelson, 1982; McAdams, Vieillard, Houix, & Reynolds, 2004), and they can do it consistently (Logan & Salomon, 2001; Pampalk, 2006). This fact suggests that although listeners' perception of music similarity depends on various complex phenomena, such as timbre, rhythm, culture, social context, and personal history, listeners can, and do, intuitively interpret the meaning of similarity consistently. When asked to describe the motivation for their perceived music similarity, listeners often refer to surface features, e.g. prominent music elements of a piece of music such as dynamics, texture, loudness, tempo, and timbre (Lamont & Dibben, 2001; McAdams et al., 2004).
Correspondence: Alberto Novello, Philips Research, DSP Group, High Tech Campus 36, Eindhoven, 5656 AE Netherlands.
E-mail: jestern@libero.it
These findings support the theoretical model proposed by Deliège (2001) to explain the perception of music similarity across song excerpts: during the listening process, music listeners extract musical cues between music-segment boundaries, unconsciously building a mental description of each song segment; the musical surface features are utilized to compare different segments of the song, and, based on cue proximity, music similarity is evaluated. It is possible to extend the validity of Deliège's conclusions to the case of across-piece similarity, comparing the surface features of two different pieces of music. Deliège's model can then be applied to predict within- and across-piece similarity.

Two studies support the extension of Deliège's hypothesis on the influence of musical surface features to the case of across-piece similarity (Chupchik et al., 1982; Eerola, Jarvinen, Louhivuori, & Toiviainen, 2001). Eerola et al. (2001) found rhythm and pitch to both be important factors in perceived similarity between folk melodies synthesized from MIDI representations. Chupchik et al. (1982) performed two experiments to investigate across-song music similarity between audio excerpts and found that tempo, dominant instrument, and articulation were the main musical features used by participants for their ratings of similarity among jazz improvisations. In comparisons between Classical, Jazz, and Pop-Rock excerpts, the most relevant dimension found was 'Classical' versus 'Contemporary'.

Despite the consensus of several perceptual experiments on the influence of music surface features on the perception of music similarity as hypothesized by Deliège, there is no agreement across studies on which music features are relevant in the case of music similarity (Chupchik et al., 1982; Eerola et al., 2001; Lamont & Dibben, 2001; McAdams et al., 2004). From the comparison of the experimental results of a few studies (Chupchik et al., 1982; Lamont & Dibben, 2001; McAdams et al., 2004) we hypothesize that this lack of agreement can be due to the context of the stimuli used, i.e. each individual stimulus subset could be perceptually organized by a specific set of control variables. Because of the limited number of genres or songs used in the perceptual experiments, the experimental results can be effectively used to verify the impact of individual musical dimensions in a very particular music context, but offer a limited representation of the perceptual space of the listener in the context of other genres. This fact is essential for the development of most algorithmic applications based on music similarity.

Because of the lack of a commonly agreed database of music similarity for a standard evaluation of algorithm performance (Logan, Ellis, & Berenzweig, 2003), several authors based the training and testing of their applications on different sources (Aucoutourier & Pachet, 2002; Berenzweig, Logan, Ellis, & Whitman, 2003; Logan et al., 2003), such as metadata annotations and web-texts, or relied on the delicate assumption that two songs are similar if they belong to the same artist, album, or play-list. The use of different test data makes it difficult to compare performance between algorithms. Moreover, because all previous sources were not collected by explicitly asking listeners to rate acoustical similarity in a controlled experiment, they might not reliably represent the actual perceived music similarity. Several studies have run listening experiments to evaluate algorithms for music similarity, comparing participant results and computer predictions (Aucoutourier & Pachet, 2002; Herre, Allamanche, & Ertel, 2003; Mirex, 2006). The reported listening experiments are rather time-consuming and, using only a few participants, have limited validity as perceptual data. Overall, relatively little attention has been paid in the computational domain to the accurate modelling of perceptual music similarity. In a recent article, Pampalk, Flexer, and Widmer (2005) suggest the possibility that embedding a human perceptual and cognitive model into the algorithms could help overcome the performance ceiling observed recently for the pure feature-based algorithms (Berenzweig et al., 2003; Logan et al., 2003; Aucoutourier & Pachet, 2004; Pampalk, 2004). Perceptual data on music similarity collected for a large set of Western popular music would be beneficial for the verification of theoretical models, as reference for perceptual experiments, and as training/testing material for algorithmic applications. Collecting such a large database can be a time-consuming and fatiguing operation for the participants because of the large number of stimulus comparisons required.

To collect such perceptual music-similarity data, we developed an experimental method optimizing stimulus coverage, experimental time, and simplicity of the task for the participant. In this article, we present our methodology and the results of a large-scale perceptual experiment using 78 song excerpts selected from 13 genres of Western popular music. A major problem in collecting such data is related to the trade-off between the number of stimuli and experimental time: even with a small set of stimuli, the number of necessary comparisons can require long experiments. In the literature, three methods have been used in perceptual experiments to assess similarity among auditory objects: pair-rating (Lamont & Dibben, 2001), pair-ranking (Levelt, van de Geer, & Plomp, 1966; MacRae, Howgate, & Geelhoed, 1990), and object-grouping (McAdams et al., 2004). In pair-rating, the participant chooses a value of similarity for a pair of song excerpts on a numerical rating scale. Pair-ranking is an ordinal procedure that asks participants to rank pairs of objects depending on similarity. In an object-grouping task, the participant is presented with a number of stimuli and has to group them depending on similarity. Two studies have shown the difficulty for the participants and the possible data bias related to the widely used pair-rating paradigm, and proved the easiness and robustness of an ordinal task such as pair-ranking (Burton & Nerlove, 1976; MacRae et al., 1990). Although simple and solid in its conception, the grouping paradigm can be applied only when the number of stimuli is small, due to the memory demands on the participant.

An advantage of both pair-ranking and pair-rating is that under particular conditions (i.e. assuming the symmetrical and transitive properties of stimulus similarity) some similarity measures can be inferred from previous comparisons. This fact allows the experimenter to deduce the organization of the perceptual space from a subset of the whole possible set of comparisons (Levelt et al., 1966). As often noted in psychology and cognition (Tversky, 1977; Tversky & Gati, 1982), similarity does not in general follow the symmetric property (e.g. R.E.M. can be more similar to the Beatles than the Beatles are to R.E.M.), nor the transitive property: e.g. a pair of […] with no direct interpretation. The difficulty of the analytical task resides in several factors:

- the complexity of representation: data are visualized using multidimensional plots commonly interpreted as the participants' perceptual space, whose dimensions are often not easily defined;
- the complex interaction of the relevant music dimensions: the relevant structural factors controlling the multidimensional data organization have simultaneous global and contextual influence;
- the definition and measurement of similarity: several approaches can be followed, such as the Euclidean distance between two items in the multidimensional perceptual space, or a binary value representing the membership of an excerpt in a specific spatial region (e.g. a cluster). In this latter case, however, the additional problem arises of how to define the cluster […]
[…] along which similarity can be judged. In this way we attempt to limit the asymmetry and non-transitivity of pairwise similarity rankings in our database. In the experimental design we use a partially balanced incomplete block design to reduce the number of required comparisons over a relatively large stimulus set. In the analysis we use the MDS method, which has been used successfully in the past for other perceptual similarity tasks, such as timbre (Grey, 1977), music intervals (Levelt et al., 1966), and scent (MacRae et al., 1990), in an attempt to reveal the number of dimensions in the perceptual space. In the data analysis presented here, we use a combination of scaling and discriminant functions to gain more insight into the relevant dimensions underlying participants' rankings of music similarity. In this way, we obtain a quantitative measure of the relevance of control variables in similarity rankings to distil the data and discover the important factors in […]

With n the total number of stimuli, the total number of trials, b, in a BIBD is:

    b = λn(n − 1) / (k(k − 1))    (2)

For k = 3 (only), the BIBD reduces the number of comparisons of the complete design by a factor (n − 2)/λ. The BIBD method and the choice of the appropriate reduction factor λ have been tested in previous perceptual experiments. Levelt et al. (1966) used triadic comparisons and a BIBD for the comparison of musical intervals and found reliable results with λ = 2. Previous papers (Burton & Nerlove, 1976; MacRae et al., 1990) tested the reliability of BIBD data in comparison to the complete-design case for different λ values. Both studies found that a value of λ ≥ 2 generally leads to reliable results, while the use of λ = 1 leads to […]
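Equation (2) can be checked numerically. The sketch below (plain Python, not code from the paper) computes the BIBD trial count and the reduction relative to a complete triadic design; for a nine-excerpt design with λ = 2, as in the control experiment reported later, it reproduces 24 BIBD triads against 84 for the complete design.

```python
from math import comb

def bibd_trials(n: int, k: int = 3, lam: int = 2) -> float:
    """Number of trials b in a BIBD: b = lam * n * (n - 1) / (k * (k - 1))."""
    return lam * n * (n - 1) / (k * (k - 1))

def complete_triads(n: int) -> int:
    """Number of trials in a complete triadic design: all C(n, 3) triads."""
    return comb(n, 3)

# Example: 9 excerpts, triads (k = 3), reduction factor lam = 2.
b = bibd_trials(9)            # 24.0 trials in the BIBD
full = complete_triads(9)     # 84 trials in the complete design
# For k = 3 the reduction factor is (n - 2) / lam:
assert full / b == (9 - 2) / 2    # 3.5
```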
[…] Blues, Classical, Country, Folk, Heavy Metal, Hip-Hop, Jazz, Latin, Pop, Reggae, R&B, and Rock. We selected six song excerpts from each genre to systematically vary tempo and primary instrument within each genre.

We chose genre as a selection criterion to obtain a variable degree of similarity between song excerpts: we intended to observe if songs within the same genre were perceived to be more similar to each other than songs from two different genres, i.e. to test if genre was a stronger attractor than the other two control variables: tempo and primary instrument. Tempo and timbre (in the form of primary instrument) were used in the selection because they were relevant in the perception of similarity in the context of previous experimental studies (Chupchik et al., 1982; McAdams & Matzin, 2001). While tempo is an objective and quantitatively measurable property, timbre is a perceptual phenomenon involving several musical dimensions. In the song selection of the experiment, we represented the timbre of each song by the primary instrument in the excerpt; inside each genre, we selected two excerpts from each of three primary-instrument categories (vocal, piano, and guitar). For each primary-instrument category we selected one excerpt from each of two tempo categories: slow (<100 beats per minute for a quarter note) and fast (>140 beats per minute for a quarter note). Inside each category, we aimed at selecting prototypical and popular examples, to produce a collection of songs as common and varied as possible. Regarding timbre, we chose songs with the presence of drums, consistent with the majority of Western popular music. We selected only studio recordings to minimize concert-related uncontrolled factors, such as audience noise and recording quality. The songs satisfying the previous criteria were selected from an expert-composed database or purchased from the 'iTunes Store' (iTunes, 2008). If the selection criteria were still satisfied, the excerpts were extracted from the beginning of the chorus, as it is typically considered to be the most representative part of the song. All excerpts were 15 s long, monophonic, normalized in loudness, and faded in and out over one second. The stimuli were converted to the MPEG-1 Audio Layer 3 (320 kb s⁻¹) file format and were completely downloaded onto the participants' computers before being played. All selected excerpts are listed in Appendix A.¹

2.3 Participants

We conducted the large-scale experiment on the Internet because of the large number of participants required. The experiment was run via connection to a server computer hosted in the Philips Research Laboratories in Eindhoven (The Netherlands) between December 2006 and March 2007. We recruited the participants by sending an invitation with an explanation of our experiment through mailing-lists, on-line forums, and universities in different countries to have variability among the participants. Each participant was explicitly instructed to test the computer and audio settings and to adjust the listening intensity to a comfortable level. We asked participants to fill out a questionnaire to record their musical background, musical training, and listening conditions. The participants were 59 males and 19 females. Their ages ranged from 18 to 72 years and their average age was 28 years. Most participants declared having had a few years of music practice as obligatory training in the lower or middle part of the school system of their respective countries. Assuming that these early years of practice were not enough to consider a participant a musician, we defined participants who had had fewer than four years of musical training as non-musicians, and participants who had had more than six years of musical training as musicians. According to these boundaries, 37 participants were considered musicians and 41 were considered non-musicians. Loudspeakers were used by 53 participants, while the other 25 used headphones. The average actual experimental time for the participants was about 2 h.

2.4 Procedure

The experiment was conducted via a graphical web interface available in four languages: Dutch, French, English, and Italian. The first page of the web interface introduced participants to the purpose and procedure of the experiment. After acceptance of participation, each participant was given a brief questionnaire with some general questions: name, gender, age, years of musical training, the listening setting during the experiment (headphones or loudspeakers), and favourite musical genre. The participants were then presented with the experiment page and given the following instructions: 'Please listen to three songs by pushing the buttons next to the letters, then push MOST for the Most similar pair and LEAST for the Least similar pair; push the NEXT button when you are ready to go to the next triadic comparison'. The three song excerpts of each triad were presented on the graphical user interface at the corners of an equilateral triangle to reduce positioning bias. The participant had to listen to each of them, one at a time, before ranking them. This procedure was repeated until the experiment was completed (204 triadic comparisons). The participants could stop at any time and continue the experiment at a later time, completing it in as many sessions as desired. Each participant was asked not to discuss the experiment with anybody until the experiment was completed, to guarantee independence of the data. The participants received 15 Euros for their participation. No […]

¹ The excerpt database (converted to 192 kbps MP3 format) of the experiment can be obtained from the author, Alberto Novello, by sending a request mail to: jestern77@yahoo.it
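The factorial structure of the stimulus set described above (13 genres × 3 primary-instrument categories × 2 tempo categories = 78 excerpts) can be enumerated directly. The sketch below uses placeholder genre labels rather than the actual genre list:

```python
from itertools import product

genres = [f"genre_{i}" for i in range(1, 14)]   # 13 genres (placeholder labels)
instruments = ["vocal", "piano", "guitar"]       # primary-instrument categories
tempi = ["slow", "fast"]                         # <100 BPM and >140 BPM

# One excerpt per (genre, instrument, tempo) cell of the design.
stimuli = list(product(genres, instruments, tempi))
assert len(stimuli) == 78                        # 13 * 3 * 2 excerpts
```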
[…] within- and across-participant concordance. The first assesses the reliability of each participant, by measuring the stability of their perception over time, while the second assesses the common perception of similarity of a group of listeners. To measure both the within- and across-participant concordance, we used Kendall's coefficient of concordance (Kendall, 1975):

    W = 12S / (m²(n³ − n))    (3)

where m is the number of participants, n = 3 the number of song excerpts in a triad, and S is the variance of the sum of ranks per excerpt:

    S = Σ_{j=1}^{n} (R_j − R̄)²    (4)

where R_j is the sum of participants' rankings for the jth pair of song excerpts, and R̄ is the expected value if the rankings by each participant were unrelated, i.e. the null hypothesis (equal to ½ m(n + 1)). In the case of within-participant concordance, for each participant we calculated 10 concordance values using the duplicate rankings (m = 2) on the 10 repeated triads (n = 3). In the case of across-participant concordance, we calculated one concordance value for each of the 102 triads (n = 2) using the rankings of all 36 participants (m = 36) across each triad.

Fig. 1. […] the value of the Kendall's coefficient of concordance is displayed. The horizontal line is the significance level of concordance at p = 0.05. The mean concordance value for all participants is 0.82. Appendix B indicates the excerpt-composition for each triad.

3.1.1 Within-participant concordance

[…] one had very low concordance values and three were very close to significance.

We calculated the mean within-participant concordance per triad. On all 10 repeated triads participants were significantly concordant. The two triads that showed the lowest concordance are numbers 2 and 6. Excluding these two triads from the calculation, five of the participants that scored low on within-participant concordance reached the significance level.² Because participants 16, 20, and 74 were inconsistent even on triads on which other participants were highly coherent, we removed their data from the rest of the analysis. We calculated a two-sample t-test between the distributions of concordance for the musicians and the non-musicians for each of the 10 repeated triads. We found no statistically significant difference. The calculated p values ranged from 0.08 to 0.67.

3.1.2 Across-participant concordance

The across-participant concordance was significant on all ten common triads (Figure 2). The significance level was calculated using a Chi-squared approximation.³ The two triads which showed the lowest concordance values are numbers 2 and 6, the same that scored low in the within-concordance analysis. We compared the across-participant concordance for musicians and non-musicians on the ten repeated triads using a sign test. We found no statistically significant difference (p = 0.11).

Fig. 2. The crosses mark the Kendall's concordance value of all subjects on each of the 10 added triads. The circles mark the values of concordance for the second presentation of the 10 added triads. The horizontal bar is the significance level of concordance at p = 0.05. The mean across-concordance value across the 10 triads is 0.51 ± 0.06. Appendix B indicates the excerpt-composition for each triad.

3.2 Saliency of control variables

We investigated the salience of each control variable used to select the stimuli (genre, tempo, and primary instrument) on the participant rankings. We used two approaches: an absolute measure, observing only one control variable at a time, and a relative measure, comparing two control variables. In the first case (the absolute measure), we selected all triads in the PBIBD with exactly two stimuli belonging to the same class of a given control variable (e.g. fast–fast–slow for tempo or piano–piano–vocal for primary instrument) and examined the number of occurrences in which that pair of stimuli was selected as the most similar, least similar, or intermediate. Figure 3 shows the occurrences of each of the three rankings for each control variable, normalized […] selected crossed triads for each combination of two control variables and counted the rankings for each pair. The three panels at the bottom of Figure 3 display the occurrence of 'most similar' rankings for each of the two pairs belonging to the two control-variable categories in the crossed triads.

When analysing genre and primary-instrument crossed triads (panel (e)), the pair with stimuli belonging to the same genre is chosen more often as most similar. In the case of tempo and primary-instrument crossed triads (panel (f)), the tempo pair is chosen more often, but the influence of the two control variables is more similar than for the conditions presented in panels (e) and (g). In the case of genre and tempo crossed triads (panel (g)), the pair with the same genre is chosen more often as most similar. All these differences are statistically significant (p < 0.01), calculated with a two-sample t-test on the bootstrapped distributions.

3.3 Solidity of the experimental design

To cross-check the nesting-design concordance, we examined the rankings of the pairs for participants who had the same experimental design. As described before, we doubled the number of participants in the whole experiment to have two participants ranking the same triads. Observing each one of the 39 participant pairs, we found just one pair whose participants (numbers 35 and 74) were not significantly concordant. We attribute the low concordance of the pair to participant 74, who performed very low in the within-participant concordance and was removed from the rest of the analysis.

³ The measure of W is related to the Friedman test statistic, which can be approximated with a Chi-squared distribution in the case of more than two judges. The significance level is then calculated from a Chi-squared distribution with two degrees of freedom, normalized by the number of participants and degrees of freedom (Kendall, 1975).

3.3.1 Control experiment 1: triadic comparison versus grouping paradigm

To evaluate the methodological solidity of our experimental design, we compared the experimental results of
Fig. 3. Analysis of the saliency of the three control variables. Panels a to d show the mean number of occurrences of each of the three
possible rankings for each pair in the whole set of triads: a—for the fast pair in the two-fast-song triads, b—for the slow pair in the
two-slow-song triads, c—for the same primary instrument in the two-primary-instrument-song triads, and d—for the same genre pair
in the two-genre-song triads. The three plots at the bottom show the comparison of saliency for each control variable on participants’
rankings. For the two pairs belonging to the two control variables in the crossed triad, we calculated the mean number of occurrences
of most-similar rankings. This operation was performed in the case of e—genre and primary-instrument crossed triad, f—tempo and
primary-instrument crossed triad, and g—genre and tempo crossed triad. In all seven plots we omit the standard error of the mean, which ranged between 0.1% and 2%.
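Kendall's coefficient of concordance W, as defined in Equations (3) and (4), can be computed directly from a judges-by-items matrix of ranks. A minimal sketch (not the authors' implementation, and omitting tie correction):

```python
def kendalls_w(ranks):
    """Kendall's W for a list of rankings, one list of ranks 1..n per judge.

    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the rank sums R_j from their expectation m * (n + 1) / 2.
    """
    m = len(ranks)             # number of judges
    n = len(ranks[0])          # number of ranked objects (3 for a triad)
    r_sums = [sum(judge[j] for judge in ranks) for j in range(n)]
    r_bar = m * (n + 1) / 2    # expected rank sum under the null hypothesis
    s = sum((r - r_bar) ** 2 for r in r_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Two perfectly concordant judges ranking a triad -> W = 1.0
print(kendalls_w([[1, 2, 3], [1, 2, 3]]))   # 1.0
```

With two exactly opposed rankings the rank sums all equal their expectation, so W drops to 0.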
two side experiments: the first using a triadic comparison paradigm and the second using a grouping paradigm (MacRae et al., 1990). For both we selected a new stimulus set with 18 song excerpts (information on the selected excerpts is listed in Appendix D). The stimuli were 10-second excerpts from 18 songs covering nine genres of Western music, primarily from popular genres: Blues, Classic, Country, Funk, Heavy Metal, Hip-Hop, Jazz, Pop, and Rock. For each genre, we chose two songs, one with a slow tempo (<100 beats per minute for a quarter note) and one with a fast tempo (>140 beats per minute for a quarter note).

A total of 36 listeners participated in the triadic comparison experiment, of which 18 were considered musicians, having more than seven years of practical music training, and 18 were non-musicians, having less than one year of music training. We used these limits in the participant selection to have a neat separation between the two groups of participants based on music practice. Of the participants, 25 were male and 11 were female, and the average age was 28 years. The experiment followed the typical triadic comparison procedure described above.

Table 1. Scheme of 'crossed' control-variable triads.

                           Stimulus 1   Stimulus 2   Stimulus 3
Control variable 1 class   A            A            a
Control variable 2 class   b            B            B
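The crossed-triad counts behind panels (e)–(g) of Figure 3 amount to tallying, for each combination of two control variables, how often each variable's same-class pair is ranked most similar. A schematic sketch with hypothetical data (not the experimental records):

```python
from collections import Counter

# Each crossed triad pits a pair sharing control variable 1 (stimuli 1-2)
# against a pair sharing control variable 2 (stimuli 2-3), as in Table 1.
# Each record is one participant ranking: which pair was "most similar".
rankings = [
    {"crossing": ("genre", "tempo"), "most_similar": "genre"},
    {"crossing": ("genre", "tempo"), "most_similar": "genre"},
    {"crossing": ("genre", "tempo"), "most_similar": "tempo"},
    {"crossing": ("tempo", "instrument"), "most_similar": "tempo"},
]

def relative_saliency(rankings, var_a, var_b):
    """Count 'most similar' choices for each variable in crossed triads."""
    crossing = (var_a, var_b)
    return Counter(r["most_similar"] for r in rankings
                   if r["crossing"] == crossing)

print(relative_saliency(rankings, "genre", "tempo"))
# Counter({'genre': 2, 'tempo': 1})
```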
In the grouping paradigm 18 new participants took part, nine of whom were musicians. The average age was 24 years. The participants were presented with a graphical user interface that showed all song excerpts in the form of sound icons, randomly placed on the computer screen. Each participant was asked to rearrange the items into groups depending on similar characteristics. The participants were free to choose the number of necessary groups and were asked to perform the task twice (with at least one week's break between the two sessions).

When the number of items is relatively small, and if the two-dimensional space used to display the objects is not too different from the perceptual space of the participant, the grouping method is considered more efficient, user-friendly, and intuitive compared to other methodologies, such as pair rating and pair ranking (Goldstone, 1994). We ran this experiment to check if grouping and ranking provided similar results and to see if the stimulus-presentation order (imposed in the case of triadic comparisons, and free in the case of grouping) influenced the final results.

We then compared the items' distances derived from all participants' data, collected both in the triadic comparisons and the grouping paradigm. From the triadic comparisons, we built an 18 × 18 dissimilarity matrix, assigning two points to every pair ranked as least similar by participants, one to the intermediate, and zero to the most similar pair (Levelt et al., 1966). Because we used a non-parametric multidimensional scaling method (ALSCAL), the specific values assigned to the three similarity types are not crucial as long as they are monotonic and rank order is preserved. We also built an 18 × 18 dissimilarity matrix from the grouping paradigm using the two sets of experimental data for each participant, assigning, for each session, one point to all song pairs that were not grouped together and a zero value for song pairs grouped together (MacRae et al., 1990). The values of the two grouping sessions were added. We compared the distances of the same pairs of song excerpts in the triadic comparisons and in the grouping paradigm, and found a statistically significant correlation (r = 0.67, df = 152, p < 0.01). In line with the finding of MacRae et al. (1990), this result supports the hypothesis that pair ordering and stimulus grouping are paradigms that provide correlated results that can be used for a common perceptual understanding of music similarity. Although the grouping procedure is easier for a small set of stimuli, pair ordering may be preferable with a […]

[…] a BIBD with λ = 2. To evaluate the influence of triad reduction, 12 new participants performed a triadic comparison experiment with a complete block design (CBD) using nine song excerpts (see Appendix C for the selected excerpts). The CBD was presented in three laboratory sessions. Ten triads were repeated in each design to evaluate participant concordance.

All participants showed significant within-participant concordance on all repeated triads. From the complete set of participants' rankings (84 triads), we extracted the rankings of the triads belonging to the BIBD (24 triads). We built two 9 × 9 similarity matrices adding the rankings of all participants: one for the CBD and one for the BIBD, and calculated the correlation between the values of the two matrices. A highly significant average correlation was found between the two matrices (r = 0.92, df = 36, p < 0.01). We also computed the correlation between the CBD and BIBD matrices built using the rankings of each participant. In this case we again found a highly significant mean correlation between the two matrices across participants (r = 0.91 ± 0.02, df = 36, p < 0.01). These results suggest the reliability of applying a BIBD in substitution of a CBD in this perceptual context.

4. Exploration of the perceptual space

By combining the similarity-based pair-rankings for each triad from all participants, we constructed a grand 78 × 78 dissimilarity matrix, in which each cell represents the similarity between two independent excerpts. We assigned two points to each song pair that was chosen as most similar, zero for the least similar, and one for the intermediate pair, and added each participant's results to the matrix. In this respect, we tried several combinations of values and found no relevant difference in the successive analysis. Because our goal is to preserve the ranking, changing these values (e.g. proportionally) would not sensibly alter the perceptual space. Finally, not all matrix cells were judged the same number of times; we normalized each cell value by its number of presentations in the perceptual experiment (which ranged from a minimum of 12 to a maximum of 48).

In the data analysis presented here, the experimental data in the form of a 78 × 78 matrix is used through non-metrical multidimensional scaling (MDS) to estimate the 'participant perceptual space', which is a topological representation of the perceived similarity between songs
large number of stimuli. for visualization and analysis, with the main objective of
quantitatively investigating the influence of aggregators
such as tempo and primary instrument on the organiza-
3.3.2 Control experiment 2: complete block design versus
tion of the participant perceptual space.
balanced incomplete block design
MDS has been used successfully in the past for other
We ran a second control experiment to estimate the perceptual similarity tasks, such as timbre (Grey, 1977),
reliability of the reduction of comparisons introduced by music intervals (Levelt et al., 1966), and scent (MacRae
10 Alberto Novello et al.
et al., 1990). To narrow the focus and the dimensions contextual interpretation (Shepard, 1986). Because of
along which similarity can be judged, we measured this fact, in our analysis we used non-metrical MDS to
similarity in a very constrained environment (triadic scan the resulting space for any potential dimension
comparisons, short excerpts, limited vocals, limited correlates, not assuming any particular dimension
control variables: genre-tempo and primary instrument). returned by the MDS algorithm.
Our intention is to test from different perspectives our If the perceptual space requires high dimensionality
hypotheses that specific attributes of our stimuli are used for display, genre topology and hierarchical clustering
by listeners in similarity rankings. Using several techni- are complementary perspectives for a preliminary sim-
ques, we search for a clear ordering of our attributes plified representation of the perceptual space: the first
along any arbitrary dimension in the MDS space, to represents the spatial distribution of clusters and the
provide support for our hypothesis. second represents the distance among individual ex-
cerpts. A combination of both techniques provided
in our case an efficient preliminary approach for a
4.1 Analytical methodology
qualitative investigation of the features of the perceptual
To assess the influence of each control variable on the space.
distribution of song excerpts in the perceptual space, it is To quantitatively verify the existence of features
first necessary to have a representation of the topology of observed in the previous analysis stages and measure
such a space, i.e. compute the coordinates for each song the influence of the control variables, tempo and primary
Downloaded by [41.235.86.174] at 12:24 10 July 2011
excerpt. MDS is a set of statistical techniques largely instrument, used for the stimuli selection on the
used in literature for visualization of complex data to organization of the perceptual space, we use quadratic
reduce data dimensionality (Shepard, 1962a, 1962b; discriminant analysis (QDA). QDA is a classification
Chupchik et al., 1982; Eerola et al., 2001; Lamont & method used to separate two or more classes of objects
Dibben, 2001). MDS takes as input a matrix of pairwise through the use of quadratic functions of the objects’
item similarity scores (such as the 78 6 78 matrix of features (Duda, Hart, & Stork, 2001). In the present
perceptual data) and assigns a location of each item in a article, we employ QDA in a novel fashion using the
lower-dimensional space. MDS can transform metrical control-variable assigned to each song excerpt (genre,
or non-metrical data, such as as rankings, into coordi- tempo (slow and fast), and primary instrument (guitar,
nates. We used the ALSCAL algorithm (Young & piano, vocal)) as classification classes and the MDS
Lewyckyj, 1996) for the non-metrical MDS analysis. coordinates for each song excerpt as features. In this
The data distortion introduced by an MDS algorithm way, we obtain a quantitative measure of the influence of
when attempting to reduce the number of dimensions is each specific control variable, expressed as a measure of
represented by the Pearson’s correlation coefficient spatial separation of stimulus classes, and the axis
between the dissimilarity matrix (input) and the distances interpretation, as the control variable correlated with
between coordinates in the MDS-calculated space (out- the separation. Based on our observation that the
put). Every violation of the ordering of the data in the stimulus classes in the MDS space do not appear to be
dissimilarity matrix is a distortion of the original data of equivalent size, we chose to use QDA, rather than a
and decreases the Pearson’s correlation coefficient value: linear approach where it is assumed that all classes have
the higher the Pearson’s correlation coefficient, the more equivalent variance.
reliable the representation of the original data. One
important parameter that the experimenter has to
4.2 Dimension calculation
carefully define in the MDS computation, is thus the
number of dimensions to keep a low space distortion and As a first step we generated a 2D MDS estimate of
easiness of visualization of the experimental data. the perceptual similarity space. The results show a low
We first investigated the variation of the Pearson’s degree of correlation between MDS representation and
correlation coefficient value with the number of dimen- the data points (Pearson’s correlation coefficient of
sions (Shepard, 1962a, 1962b) to decide on the optimal 0.63). The resulting plot of excerpts in this 2D space
number of final dimensions in the MDS output. We then (Figure 4) shows the typical circular shape of a data
extracted the coordinates of each stimulus in the multi- set whose dimensionality is under-represented. It is
dimensional space that we interpret as the participants’ clear from this analysis that the similarity data
perceptual space. One difficulty of the MDS method requires more than two dimensions to be modelled
resides in the fact that the new coordinates are referred to accurately.
a set of axes chosen by the algorithm to maximize the We calculated the plot of Pearson’s correlation
data variance but may have no explicit connection to coefficient value versus the number of dimensions
physical properties of the stimuli; it is up to the user, by (Figure 5), to estimate the number of dimensions required
observation and extra analysis of the data distribution, to to accurately model the participants’ perceptual space.
determine the final rotation of the axes and their Because the Pearson correlation coefficient saturates at
Perceptual evaluation of inter-song similarity in Western popular music 11
songs in the perceptual space computed in the MDS calculation of dimensions: the points represent the Pearson correlation
(as performed by the ALSCAL algorithm (Young & Lewyckyj, coefficient value corresponding to the final number of dimen-
1996)). The 13 different markers represent the belonging to the sions chosen for the MDS calculation. The Pearson correlation
13 genres used in the stimulus selection. The number next to coefficient saturates at a relatively high value around 0.81.
each song refers to the songs properties reported in the
Appendix A. In the MDS calculation of this plot, the Pearson
correlation coefficient has a value of 0.6.
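The triad-scoring and normalization scheme described above (two points for the pair ranked least similar, one for the intermediate, zero for the most similar, and each cell divided by its number of presentations) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the input format (each triad given as its three pairs ordered from most to least similar) are assumptions.

```python
import numpy as np

def triad_dissimilarity(n_items, triad_rankings):
    """Accumulate a normalized dissimilarity matrix from triadic comparisons.

    triad_rankings: iterable of triads, each a sequence of the three item
    pairs ordered from most to least similar, e.g. [(a, b), (a, c), (b, c)].
    Scoring follows the scheme in the text: 0 points for the pair judged
    most similar, 1 for the intermediate, 2 for the least similar.
    """
    dissim = np.zeros((n_items, n_items))
    counts = np.zeros((n_items, n_items))  # presentations per pair
    for ranked_pairs in triad_rankings:
        for score, (i, j) in enumerate(ranked_pairs):  # score = 0, 1, 2
            dissim[i, j] += score
            dissim[j, i] += score
            counts[i, j] += 1
            counts[j, i] += 1
    # normalize each cell by its number of presentations, as in the text
    return np.divide(dissim, counts,
                     out=np.zeros_like(dissim), where=counts > 0)

# one participant judging one triad over items 0, 1, 2:
# (0, 1) most similar, (0, 2) intermediate, (1, 2) least similar
m = triad_dissimilarity(3, [[(0, 1), (0, 2), (1, 2)]])
```

Because only the rank order of the cell values matters for the non-metrical MDS that follows, any monotonic scoring (e.g. 0/1/2 scaled proportionally) would yield the same solution.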
[…] acoustic side of the axis, while Electro, Hip-Hop, and Rock are positioned toward the synthetic side. This topology suggests that the acoustic–synthetic timbral quality is a relevant factor in listeners' judgments of music. The genre topology has the advantage of reducing the complexity of the output data by displaying the genre centroids in two dimensions with a relatively high correlation coefficient (0.86), at the price of losing information about the excerpt spread within each genre.

Fig. 7. Genre topology: two-dimensional MDS plot, calculated from the 13 × 13 dissimilarity matrix constructed from the relative distances of the 13 genre centroids (Pearson correlation coefficient = 0.86). The arrows drawn represent the separation between prevalently acoustic and synthetic genres. The values on the axes refer to the point distances in the two-dimensional MDS space.

[…] example, the excerpts are scattered in the two farthest clusters (at the top and bottom of Figure 8): a relatively large distance is thus needed to group them all together. On the other hand, in the case of Classical, we can use a smaller distance to group all excerpts. The genre-topology plot and the hierarchical-clustering dendrogram provide complementary information and together help distinguish different distribution types: the Folk and Country centroids are near each other in the genre topology plot, but their song-position spread in the dendrogram is quite large, while the Classical centroid is isolated from the other genres in the topology plot, and its songs are tightly clustered in the dendrogram. The genre topology and the clustering dendrogram might be integrated into one analytical method for the visualization of the perceptual space. In such an approach, one could normalize the Euclidean distance between two genres (from the genre topology) by an estimate of the metrical spread of the two genres (from the dendrogram) to display the overlap of two genre clusters while keeping a relatively low-dimensional space.

Another observation from the dendrogram in Figure 8 concerns the characteristics of the clusters. At the highest hierarchical level (right-most in the figure), the clusters are split into two groups differentiated by the tempo and rhythm of the song excerpts they contain; the bottom-most cluster contains mainly excerpts belonging to the slow-tempo class, while the top-most cluster contains excerpts belonging to the fast-tempo class or slow songs belonging to rhythmically complex genres, such as Afro-Pop, Hip-Hop and Reggae. As a measure of cluster separation, we calculated the discriminability, or d′, of the distributions of the BPM values of the song excerpts for the two clusters (see Appendix A for the relative values):

d′ = difference between means / spread = (m1 − m2) / ((s1 + s2)/2),   (5)

where m1 = 150 BPM and m2 = 86 BPM are the means of the two tempo distributions, and s1 = 58 BPM and s2 = 28 BPM their standard deviations. This results in d′ = 1.5, which is greater than the value (1.0) typically used as the threshold for perceptual discriminability (Green & Swets, 1966/1988). A double-sample t-test confirmed the statistically significant separation of the two distributions (p < 0.01).

Fig. 8. Dendrogram of the hierarchical clustering of song excerpts. On the y-axis the individual songs are represented by the genre, followed by two subscripts for the fast- (F) and slow- (S) tempo class, and for the guitar- (G), piano- (P), and vocal- (V) primary-instrument class; the excerpt reference number corresponds to Appendix B. The length of the horizontal lines indicates the distance between two clusters in the six-dimensional space derived through MDS. At the highest hierarchical level (right-most in the figure) the clusters are split into two groups: the fast/rhythmically complex group (top of the figure) and the slow group (bottom of the figure). The vertical line is the threshold chosen to separate the ten sub-clusters, six of which are labelled in the figure.

From the analysis of the composition of the two main clusters, we set a threshold to distinguish ten sub-clusters. The excerpt characteristics inside each sub-cluster suggest a preliminary labelling for six of them (from top to bottom in Figure 8): Instrumental fast, Vocal fast, Vocal rhythmically complex (Afro-Pop, Reggae, Hip-Hop), Classical, Instrumental slow, Vocal slow. Although still unverified at this stage, this labelling suggests a possible interpretation of some of the MDS axes: we see in fact excerpt clustering depending on tempo (slow–fast), genre (Classical, rhythmically complex genres), and primary instrument (vocal versus non-vocal music).
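Equation (5) can be checked directly with the cluster statistics reported in the text; this is a minimal arithmetic sketch, with all values taken from the paragraph following the equation.

```python
# discriminability of the two tempo clusters, following Equation (5):
# d' = (difference between means) / (average spread)
m1, m2 = 150.0, 86.0  # mean BPM of the fast and slow clusters (from the text)
s1, s2 = 58.0, 28.0   # their standard deviations (from the text)
d_prime = (m1 - m2) / ((s1 + s2) / 2)
# d_prime = 64 / 43, which rounds to the d' = 1.5 reported in the text
```

Note that this "average spread" denominator is what reproduces the reported value; the more common psychophysical form with a root-mean-square of the standard deviations would give roughly 1.4 instead.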
[…] importance of each dimension (feature) in separating the excerpts into each class (control variable). Thus, QDA was used to determine which MDS axes were most relevant in maximizing the separation between the two classes of a given control variable: for tempo, the separation of fast and slow pieces; for primary instrument, the separation of vocal and non-vocal (guitar and piano excerpts were grouped together); and for the separation of synthetic and acoustic excerpts. In the general case, more than one MDS axis is relevant in maximizing the separation of the stimulus classes for a given control variable: a linear combination of the relevant MDS axes then provides a new candidate axis that alone best separates the classes.

In the QDA analysis, to verify the separation of synthetic versus acoustic genres hypothesized from the observation of the genre topology, we added a new post-hoc control variable called 'Synthetic–acoustic', labelling the songs manually one by one: song excerpts containing primarily the timbres of distorted or deeply effected guitars, electronic drums and synthesizer pads were assigned to the synthetic class, while song excerpts using acoustic instruments or unprocessed (clean) amplified instruments were assigned to the acoustic class. In Appendix A the reader finds the assigned class for each song excerpt.

In the case of the slow and fast classes, the MDS output was found to be already optimal in the QDA analysis: one of the six MDS coordinates alone could produce the highest separation between the stimulus classes, with a correct-classification mean of 82.8 ± 5.7%. In the following part of this article, we will refer to the axis identified by this coordinate as the 'slow–fast axis' because of its salience in separating the tempo classes. Because the calculation of the QDA performance relies only on the number of stimuli correctly classified, the d′ computation adds complementary information, because it estimates the class discriminability using the […]

[…] coordinates were necessary to create the best stimulus separation, with a correct classification of 75.2 ± 7.9%. The highest confusion occurs between the piano and guitar classes, with an average misclassification of 23.2%. Grouping the piano and guitar classes into one new class named 'non-vocal', and comparing the vocal and non-vocal classes with QDA, improved the classification performance to 95.9 ± 2.4%. In this new case a combination of just two MDS coordinates is sufficient to achieve optimal classifier performance.

These results suggest that in the presence of vocal stimuli, the timbre difference between excerpts with guitar and piano as primary instrument is not a strong factor in determining music-similarity perception. We determined the axis that best separates the vocal and non-vocal classes as the linear combination of the two MDS coordinates sufficient to achieve optimal classifier performance. In the following part of the article, we will refer to this new axis as the 'vocal–non-vocal axis'. We calculated the discriminability of the two distributions of song-excerpt positions on the vocal–non-vocal axis and found d′ = 2.3.

In the case of the synthetic–acoustic timbre classes, a combination of two MDS coordinates was necessary and sufficient to achieve the top performance of the classifier: 97.2 ± 1.3%. One of these two dimensions was the tempo axis, used previously for the separation of the slow and fast classes. We defined a new axis, the 'synthetic–acoustic axis', as a linear combination of these two MDS coordinates. Because of this definition, the synthetic–acoustic axis is not orthogonal to the slow–fast axis, with a 36° separation between the fast and synthetic directions and between the slow and acoustic directions. We calculated the discriminability of the two distributions of song-excerpt positions on the synthetic–acoustic axis and found d′ = 2.4.

Figure 9 displays the projection of the MDS positions of the 78 song excerpts on the newly defined axes: the
slow–fast, the vocal–non-vocal and the synthetic–acoustic axes. Different symbols represent excerpts belonging to different control-variable classes. All three plots outline the clear clustering of songs on each of the three calculated axes. While Figures 9(a) and (c) show a homogeneous distribution of songs, Figure 9(b) shows the song positions distributed in a more diagonal fashion. This distribution reflects the non-orthogonality of the synthetic–acoustic and slow–fast axes, a consequence of the definition of the synthetic–acoustic axis as a linear combination of the slow–fast axis and one of the axes resulting from the MDS calculation.

From our stimulus selection, we are not in a position to hypothesize whether the distribution of songs along the diagonal of Figure 9(b) is a general behaviour: one cause could be that the majority of songs using synthetic timbres have fast tempi, and the majority of slow songs use acoustic timbres; however, the diagonal distribution in Figure 9(b) might also be caused by the unequal numbers of stimuli in the synthetic and acoustic classes, which cannot be balanced because of the a-posteriori definition of the control variable: there are fewer slow-synthetic excerpts (11 stimuli) than fast-synthetic excerpts (18 stimuli), and fewer fast-acoustic excerpts (21 stimuli) than slow-acoustic excerpts (29 stimuli). Further research is needed to test the presence of such an effect connecting timbre and tempo in a larger set of stimuli.

Fig. 9. Projection of the six-dimensional coordinates of the 78 song excerpts on the three axes selected from the QDA analysis. The three panels show two-dimensional representations for each pair of axes: a) Primary-instrument versus tempo axes, b) Synthetic–acoustic versus tempo axes, c) Synthetic–acoustic versus primary-instrument axes. In the three plots, to outline the clustering, song excerpts belonging to different control-variable classes are represented with different symbols, explained in the legends.

4.6 Contextual dependency of similarity factors

Although the QDA analysis showed the global influence of tempo and timbre (the latter in the form of two axes: vocal–non-vocal and synthetic–acoustic), it is reasonable to suppose that the magnitude of the influence of each possible musical dimension on perceived music similarity is context dependent (Cambouropoulos, 2009). To analyse this effect, we iteratively selected three genres (18 excerpts); from all experimental triads, we selected those that contained only stimuli belonging to the defined subset, and used their participant rankings (assigning two points to the least similar pair, one to the intermediate and zero to the most similar) to build an 18 × 18 dissimilarity matrix representing the relative distances between all stimuli in the subset. We then used MDS to compute the two-dimensional representation of the perceptual space of the specific stimulus subset.

In Figure 10(a), we show the MDS plot of the data from 18 excerpts selected from three rather different genres: Afro-Pop, Classical, and Rock. We observe three tight genre-dependent clusters. Substituting Afro-Pop with Latin, in Figure 10(b), we observe again a genre-based clustering and the emergence of a tempo separation within genres. Figure 10(c) displays the MDS results for the excerpts from the Afro-Pop, Country, and Latin genres. In this case, the genre separation is less evident, especially for the Latin and Afro-Pop excerpts; on the x axis, we observe a global effect of tempo separation, with slow excerpts towards the negative values of the x axis and fast excerpts towards the positive values; on the y axis, we interpret the genre distribution to be influenced by rhythmic complexity: relatively simple and steady rhythmic structures (prevalent in Country) towards positive values of the y axis, and more rhythmically complex structures (prevalent in Latin and Afro-Pop) towards negative values of the y axis. The musicological proximity of Latin and Afro-Pop supports this interpretation.

The results shown in Figure 10 support previous experimental hypotheses (Lamont & Dibben, 2001; Eerola & Bregman, 2007) suggesting that the factors responsible for the grouping of excerpt subsets are likely to be context dependent. In particular, it seems reasonable to hypothesize that in the presence of excerpts selected from three musicologically distant genres, such as Rock, Classical and Afro-Pop, the genre clustering might mask the effect of other, less salient musical dimensions such as tempo or timbre. On the other hand, in the case of excerpts selected from less musicologically distant genres, e.g. Folk and Country, or Latin and Afro-Pop, the effect of genre might be weaker, and the effects of tempo, rhythm, and timbre can have a stronger influence on the perceived similarity. The effect of context can be linked to two different causes. Considering the lower triad level, changing one song can alter the perception of the triad: if all three songs have a slow tempo, for example, tempo does not play a strong role; if we have one fast song and two slow songs, then the effect of tempo on similarity is different. However, this perceptual 'switch' can be due to the listeners not making use of tempo in a specific context, or to there not being enough variability of tempo within the stimuli. In conclusion, from the previous analysis it is not clear whether the observed variation of the relevance of the control variables with the stimulus subset reflects a context-dependent perception of the participants, or whether in a subpart of the perceptual space some control variables have less variation, causing one dimension to emerge as dominant.

Fig. 10. Example of different context-dependent factors in the interpretation of excerpt-subset grouping: a) Afro-Pop/Classical/Rock, excerpts are grouped by genre; b) Classical/Latin/Rock, we observe the influence of tempo in the distribution (along the drawn axis); c) Afro-Pop/Country/Latin, we interpret the x axis as 'Tempo' and the y axis as 'Rhythmic complexity'. Each marker identifies the excerpt positions of the genres in the legend. The letters identify the control-variable classes: 'F' for fast, 'S' for slow, 'V' for vocal, 'G' for guitar, and 'P' for piano.
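The QDA step used throughout Section 4 (classifying control-variable classes from the MDS coordinates, with a separate covariance per class) can be sketched as follows. This is a minimal illustration under stated assumptions: the data below are synthetic stand-ins for the 78 × 6 matrix of MDS coordinates (which are not reproduced here), and the class labels mimic a two-class control variable such as slow versus fast.

```python
import numpy as np

class SimpleQDA:
    """Minimal quadratic discriminant analysis: each class is modelled by a
    Gaussian with its own mean and covariance; no shared-variance assumption,
    which is why QDA was preferred over a linear approach in the text."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            self.params_.append((mu,
                                 np.linalg.inv(cov),
                                 np.linalg.slogdet(cov)[1],
                                 np.log(len(Xc) / len(X))))
        return self

    def predict(self, X):
        scores = []
        for mu, inv_cov, logdet, logprior in self.params_:
            d = X - mu
            # quadratic discriminant function of the features
            scores.append(-0.5 * np.einsum('ij,jk,ik->i', d, inv_cov, d)
                          - 0.5 * logdet + logprior)
        return self.classes_[np.argmax(scores, axis=0)]

# synthetic stand-in for the 78 excerpts in a 6-dimensional MDS space
rng = np.random.default_rng(0)
X = rng.normal(size=(78, 6))
y = np.array([0] * 39 + [1] * 39)  # hypothetical slow (0) vs. fast (1) labels
X[y == 1, 0] += 3.0                # one tempo-like separating dimension

acc = (SimpleQDA().fit(X, y).predict(X) == y).mean()
```

Inspecting which coordinates drive the separation (here, the shifted first dimension) corresponds to the article's identification of the slow–fast, vocal–non-vocal and synthetic–acoustic axes.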
[…] experiment was conducted through the Internet, used 78 music excerpts, and involved 78 participants with various music backgrounds.

The initial analysis of the experimental results was conducted to assess the existence of commonly perceived inter-song music similarity and to test the solidity of the methodology used. We found significant within-participant concordance for 90% of participants and significant across-participant concordance for 100% of the tested triads. The high within-participant concordance confirmed the existence of a clear and stable interpretation of music similarity for most listeners. The high across-participant concordance suggests the existence of an underlying common perception of music similarity across listeners, which could be formalized into a theoretical model able to represent and predict the perception of music similarity. Despite the different experimental contexts, these results support the findings on across-participant concordance of the two user tests run by Logan and Salomon (2001) with 20 participants and Pampalk (2006) with 25 participants. Our study extends their conclusions on the existence of a common perception of music similarity to a large set of songs and genres of Western popular music, with a larger number of participants and with several cross-checks. In the large-scale experiment we did not find any statistically significant difference between musicians and non-musicians in within- and across-participant concordance. This result is in line with previous studies (on different song databases) that found little or no statistically significant difference between musicians and non-musicians in the dimensions used for their judgments of music similarity (Chupchik et al., 1982; Lamont & Dibben, 2001; McAdams et al., 2004).

We tested the robustness of the experimental method with two control experiments. The high correlation found comparing the similarity rankings of a complete block design against the rankings of a balanced incomplete block design […] participants' rankings in our stimulus context: genre > tempo > primary instrument. This hierarchy is in agreement with the findings of previous studies that used different experimental methodologies and stimulus contexts (Chupchik et al., 1982; Lamont & Dibben, 2001; McAdams & Matzin, 2001), and extends the validity of the previous results to other genres of Western popular music. Chupchik et al. (1982) found that tempo, primary instrument, and articulation were the main musical features used by participants for similarity pair-rating of Jazz improvisations. In a second experiment using pair-rating on excerpts selected from three different genres, the most relevant dimensions found were genre and tempo. McAdams and Matzin (2001) used a grouping paradigm and excerpts of contemporary musical material and found that tempo and timbre influenced participants' judgments.

The predominance of genre in influencing participants' judgments is not surprising: genre is a collection of attributes encompassing several musical dimensions. Genre is in fact not necessarily orthogonal to primary instrument and tempo: e.g. Heavy Metal can be identified by the sound of distorted guitars, and Heavy-Metal songs tend to have high tempi. Given the complexity of genre, it seems necessary to extend the validity of our conclusions by exploring each genre in detail with a specific experiment aimed at determining the relevant dimensions inside each genre: the six songs per genre that we selected are not, in our opinion, sufficient to cover the musical variety of a genre. Furthermore, it seems reasonable to expect that the relevance of the control variables might vary for different genres.

Between the two other control variables, tempo can be objectively and quantitatively defined by the BPM value on the quarter note. Timbre, on the other hand, is a perceptual phenomenon and can be described using several musical dimensions, such as the frequency spectrum, the music instruments in the piece, and recording techniques. Our choice of representing timbre by the predominant instrument is limited and would need further investigation, for which the same experimental design could be used.

To quantitatively explore the perceptual space of participants, we used a combination of numerical and analytical methods. Through multidimensional scaling (MDS), we found that six dimensions were optimal to represent the participants' perceptual space obtained from their similarity rankings in our experimental context. By using QDA to model the excerpt positions in the participants' perceptual space, we selected and labelled two axes that maximized the discriminability (and the classifier performance) between the two excerpt classes of each control variable: 'slow–fast' (d′ = 2.0) and 'vocal–non-vocal' (d′ = 2.3). Interpretations based on the hierarchical clustering and genre topology analyses suggested the influence of synthetic versus acoustic timbres on the perceived similarity. We defined a posteriori a third control variable, manually assigned based on excerpt timbre, that provided a third relevant axis: 'synthetic–acoustic' (d′ = 2.4). The high correlation (r = 0.66, df = 77, p < 0.01) between the projected positions of the excerpts on the slow–fast axis and the logarithm of each excerpt's BPM value adds further evidence that the selected slow–fast axis indeed relates to tempo. Perhaps because of the complexity of timbre perception, we could not find a unique objective variable to perform a similar measure of axis correlation in the case of the vocal–non-vocal and synthetic–acoustic axes.

It can be argued that the synthetic–acoustic property represents just a form of genre similarity, because of the presence of genres, such as Electronica or Classical, that have the majority of their songs in one of the two categories. However, our stimulus selection also comprises genres in which the songs are split between the two classes, such as R&B and Jazz. Further research is needed to verify the influence of this ad hoc dimension.

Plots obtained by projecting the MDS excerpt positions on the three selected axes show clean clustering, confirming the relevance of the chosen control variables to participants' rankings. These results are in line with previous experimental studies that found a qualitative influence of tempo, timbre and genre on participants' judgments using different experimental contexts (Chupchik et al., 1982; McAdams et al., 2004). Our analyses extend their validity to a larger set of stimuli and genres of Western popular music, with quantitative verification through different analytical tools. The results are consistent with the hypothesis proposed by Deliège's 'cue abstraction' model: participants determine excerpt […]

The final analysis showed that the factors responsible for the grouping of excerpt subsets are likely to be context dependent. However, our experimental design does not allow us to determine whether the varying relevance of the different music dimensions is due to the different cognitive weight that they assume in the mind of the listener, or to the limited variability of a specific dimension within the stimuli in a region of the perceptual space. In the song selection of our experimental design, we used prototypical songs from several genres to extend the context of our results and find common dimensions that can be used in the majority of applicative scenarios, i.e. a database with different genres of Western popular music. However, because music similarity is context-based, these results may not be easily extended to the case of specific music databases: for example, within a collection of jazz music, the perceptual dimensions defining similarity may have completely different weights from the ones found in this study.

Despite all the verification used in the experimental and analytical methodology, because of the subjectivity of our song selection, it is also not certain that the perceptual space we found will be similar in another experiment using the same number of genres and songs but selected from a different database by another experimenter. Future research using a similar paradigm but different song sets is needed to verify how far the present results can be generalized to predict inter-song similarity across several genres of Western popular music.

This observation about context raises the problem of how to formally integrate the global and contextual influence of each control variable in a theoretical model or algorithmic application. Although it seems difficult to integrate the various contextual factors into a generally valid set of similarity descriptors, these results suggest that algorithms should take contextual information into account in order to provide results able to simulate the judgment of human listeners. Future research might test the validity of the similarity distance proposed by Krumhansl (1978), which considers the perceptual distance to be dependent on the local contextual density of objects.

Our analyses leave some unanswered questions, show the experimental limitations of the methodological approach, and provide possible suggestions for future experiments. While we identified and labelled three axes in the six-dimensional space (slow–fast, vocal–non-vocal, and synthetic–acoustic), we found no interpretation for the three remaining dimensions. Genre and timbre, being multidimensional concepts, could involve other music aspects, such as texture and dynamics, that might explain the
similarity by extracting relevant music cues during the interpretation of the three axes. The choice of control
listening process, and extend the theoretical framework variables, across which we distribute stimuli, is a pre-
selecting the most salient cues: tempo, and primary experimental limiting factor. A follow up experiment
instrument in the form of vocal–non-vocal and acoustic- selecting a larger number of well-chosen control variables
synthetic. (e.g. rhythm complexity, timbre brightness, harmonic
Perceptual evaluation of inter-song similarity in Western popular music 19
stability, etc.), could investigate the interpretation of the Duda, R.O., Hart, P.E., & Stork, D.G. (2001). Pattern
unexplored dimensions using similar experimental and classification. New York: Wiley.
analytical methodology. Eerola, T., & Bregman, M. (2007). Melodic and contextual
In conclusion, the combination of experimental and similarity of folk song phrases. Musicae Scientiae –
analytical methods presented here provided a quantita- Discussion Forum, 4A, 211–233.
tive estimation of the saliency for three musical Eerola, T., Jarvinen, T., Louhivuori, J., & Toiviainen, P.
dimensions that have a statistically significant influence (2001). Statistical features and perceived similarity of folk
on similarity perception on a large set of songs of melodies. Music Perception, 18, 275–296.
Western popular music: song tempo, presence/absence Efron, B., & Tibshirani, R.J. (1993). An Introduction to the
Bootstrap. Boca Raton, FL: Chapman & Hall.
of vocal parts, and use of synthetic/acoustic timbres.
Goldstone, R.L. (1994). An efficient method for obtaining
Together with the previously reported influence of
similarity data. Behavior Research Methods, Instruments,
genre, these results can be used to estimate the optimal
& Computers, 26(4), 381–386.
weight of the different music dimensions obtained
Green, D.M., & Swets, J.A. (1966/1988). Signal Detec-
through music descriptors such as metadata labels or tion Theory and Psychophysics. Los Altos, CA:
audio features to finally derive a formal model for Peninsula Publishing.
music similarity incorporating perceptual information in Grey, J. (1977). Multidimensional perceptual scaling of
its predictions. musical timbres. Journal of the Acoustical Society of
America, 61, 1270–1277.
Downloaded by [41.235.86.174] at 12:24 10 July 2011
Acknowledgements Herre, J., Allamanche, E., & Ertel, C. (2003). How similar
do songs sound? Towards modeling human perception
This work is performed as part of a Marie Curie Early of musical similarity. In Proceedings of the IEEE
Stage Training grant (MEST-CT-2004–8201). We would Workshop on Applications of Signal Processing to Audio
like to thank J. Engel for the statistical support provided and Acoustics (WASPAA), New Paltz, NY, USA.
during the analysis of the data, and Dr H. Honing for iTunes. (2008). Retrieved from http://www.apple.com/
the analysis suggestions and constructive criticism of the itunes/
experimental method, and Janto Skowronek for the Kendall, M.G. (1975). Rank Correlation Methods. London:
advice on quadratic discriminant analysis. Charles Griffin.
Krumhansl, C.L. (1978). Concerning the applicability of
geometric models to similarity data: The interrelationship
References between similarity and spatial density. Psychological
Review, 85, 445–463.
All Music. (2008). Retrieved from http://www.allmusic.com/ Kruskall, J.B., & Wish, M. (1978). Multidimensional Scaling,
Aucoutourier, J.J., & Pachet, F. (2002). Finding songs that Quantitative Applications in the Social Sciences. London:
sound the same. In Proceedings of the IEEE Benelux Sage.
Workshop on Model based Processing and Coding of Lamont, A., & Dibben, N. (2001). Motivic structure
Audio (MPCA-2002), Leuven, Belgium. and the perception of similarity. Music Perception, 18,
Aucoutourier, J.J., & Pachet, F. (2004). Improving timbre 245–274.
similarity: How high’s the sky? Journal of Negative Levelt, W.J.M., van de Geer, J.P., & Plomp, R. (1966).
Research Results in Speech and Audio Sciences, 1(1), Triadic comparisons of musical intervals. The British
1–13. Journal of Mathematical and Statistical Psychology, 19,
Bella, S.D., & Peretz, I. (2005). Differentiation of classical 163–179.
music requires little learning but rhythm. Cognition, 96B, Logan, B., Ellis, D.P.W., & Berenzweig, A. (2003).
65–78. Toward evaluation techniques for music similarity. In
Berenzweig, A., Logan, B., Ellis, D.P.W., & Whitman, B. Proceedings of the 3rd International Symposium on
(2003). A large-scale evaluation of acoustic and sub- Music Information Retrieval (ISMIR 2002), Paris,
jective music similarity measures. Computer Music France.
Journal, 28, 63–76. Logan, B., & Salomon, A. (2001). A music similarity
Burton, M.L., & Nerlove, S.B. (1976). Balanced designs for function based on signal analysis. In Proceedings of the
triads tests: Two examples from English. Social Science IEEE International Conference on Multimedia and Expo
Research, 5, 247–267. (ICME 2001), Tokyo, Japan.
Cambouropoulos, E. (2009). How similar is similar? MacRae, A.W., Howgate, P., & Geelhoed, E. (1990).
Musicae Scientiae – Discussion Forum, 4B, 7–24. Assessing the similarity of odours by sorting and
Chupchik, G.C., Rickert, M., & Mendelson, J. (1982). by triadic comparisons. Chemical Senses, 15, 691–
Similarity and preference judgements of musical stimuli. 699.
Scandinavian Journal of Psychology, 23, 273–282. McAdams, S., & Matzin, D. (2001). Similarity, invariance
Deliège, I. (2001). Similarity perception – categorization – and musical variation. Annals of the New York Academy
cue abstraction. Music Perception, 18, 233–243. of Sciences, 930, 62–76.
20 Alberto Novello et al.
McAdams, S., Vieillard, S., Houix, O., & Reynolds, R. Pampalk, E., Flexer, A., & Widmer, G. (2005). Improvements
(2004). Perception of musical similarity among contem- of audio-based music similarity and genre classification. In
porary thematic materials in two instrumentations. Proceedings of the 6th International Symposium on Music
Music Perception, 22, 207–237. Information Retrieval (ISMIR 2005), London, UK.
Mirex. (2006). Mirex website. Retrieved from http:// R-project. (2008). Retrieved from http://www.r-project.org/
www.music-ir.org/mirex2006 Shepard, R.N. (1962a). The analysis of proximities: Multi-
Nosofsky, R.M. (1992). Similarity scaling and cognitive dimensional scaling with unknown distance function I.
process models. Annual Review of Psychology, 43, Psychometrika, 27(2), 125–140.
25–54. Shepard, R.N. (1962b). The analysis of proximities: Multi-
Orpen, K.S., & Huron, D. (1992). Measurements of dimensional scaling with unknown distance function II.
similarity in music: A quantitative approach for Psychometrika, 27(3), 219–246.
non parametric representations. Computers in Music Shepard, R.N. (1986). Discrimination and generalization
Research, 4, 1–44. in identification and classification: Comment on Nosofs-
Pampalk, E. (2004). A Matlab toolbox to compute ky. Journal of Experimental Psychology, 115(1), 58–61.
music similarity from audio. In Proceedings of the 5th Tversky, A. (1977). Features of similarity. Psychological
International Symposium on Music Information Retrieval Review, 84, 327–352.
(ISMIR 2004), Barcelona, Spain. Tversky, A., & Gati, I. (1982). Similarity, separability, and the
Pampalk, E. (2006). Computational models of music triangle inequality. Physchological Review, 89, 123–154.
similarity and their application in music information Young, F.W., & Lewyckyj, R. (1996). ALSCAL User’s
Downloaded by [41.235.86.174] at 12:24 10 July 2011
retrieval (PhD thesis). Technische Universität Wien, Guide (5th ed.). Chapel Hill: Psychometric Laboratory,
Austria. University of North Carolina.
Appendix A.

Number | Genre | Tempo | BPM | Timbre | Synthetic/Acoustic | Author | Title | Album | Year | Publisher | Begin/End
1 | Afro-Pop | fast | 112 | guitar | acoustic | Salif Keita | Tekere | Folon | 1995 | Mango | 0:10–0:25
2 | Afro-Pop | fast | 130 | piano | acoustic | Don Pullen & African Brazilian Connection | Yebino spring | Live . . . again | 1995 | Blue Note Records | 7:30–7:45
3 | Afro-Pop | fast | 170 | vocal | acoustic | ARC Music Productions | Tongoyo | Traditional songs and dance from Africa | 2003 | ARC Music Productions | 0:35–0:50
4 | Afro-Pop | slow | 89 | guitar | acoustic | Chief Ebenezer Obey | Operation feed the nation | Ju ju jubilation | 1998 | EMI Records Ltd | 2:30–2:45
5 | Afro-Pop | slow | 88 | piano | acoustic | Abdullah Ibrahim | Hajj (the journey) | The journey | 1978 | Chiaroscuro Records | 1:12–1:27
6 | Afro-Pop | slow | 82 | vocal | acoustic | Youssou N'Dour | Macoy | The lion | 1989 | Virgin Records | 1:40–1:55
7 | Blues | fast | 220 | guitar | synthetic | Little Charlie & The Nightcats | Percolatin' | Crucial guitar blues | 2003 | Alligator Records | 1:22–1:37
8 | Blues | fast | 170 | piano | acoustic | Roosevelt Skies | Hot pants | Music is my busyness | 2001 | Corazong Records | 0:53–1:08
9 | Blues | fast | 220 | vocal | acoustic | S.R. Vaughan | Give me back my wig | Martin Scorsese presents Stevie Ray Vaughan | 1986 | Montreux Sounds S.A. | 1:20–1:35
10 | Blues | slow | 63 | guitar | synthetic | Johnny Winter | I smell trouble | Crucial guitar blues | 2003 | Alligator Records | 0:02–0:17
11 | Blues | slow | 60 | piano | acoustic | Big Mama Thornton | I feel the way I feel | Big Mama Thornton & Muddy Waters Blues Band - 1966 | 2004 | Arhoolie Records | 0:20–0:35
12 | Blues | slow | 61 | vocal | acoustic | Johnny B. Moore | Back door friend | Live at Blue Chicago | 1996 | Delmark | 1:35–1:50
13 | Classical | fast | 162 | guitar | acoustic | J.S. Bach/Eduardo Fernandez | Prelude suite no.4 | Johann Sebastian Bach: 4 Suites for Lute | 2004 | Oehms Classics | 1:25–1:40
14 | Classical | fast | 158 | piano | acoustic | F. Chopin | Piano concerto 2 Op.21 III | Chopin: Piano Concertos nos. 1 & 2, Dutoit, Argerich | 1999 | EMI Classics | 1:15–1:30
15 | Classical | slow | 80 | vocal | acoustic | C. Bartoli & G. Fischer | Paisiello/Chi vuol la zingarella | Se tu m'ami - Arie antiche | 1992 | Decca Music Group Ltd | 1:10–1:20
16 | Classical | slow | 80 | guitar | acoustic | J. Rodrigo | Adagio | Concierto de Aranjuez, Fantasía para un Gentilhombre | 1994 | Philips | 9:25–9:40
17 | Classical | slow | 80 | piano | acoustic | E. Satie | Gymnopédies | Gymnopédies | 2000 | EMI Classics | 0:10–0:25
18 | Classical | slow | 80 | vocal | acoustic | W.A. Mozart | Requiem - Introitus | W.A. Mozart/Norrington, London Classical Players, et al. | 1999 | Virgin Records | 2:25–2:40
19 | Country | fast | 120 | guitar | acoustic | Chet Atkins | Yakety axe | The essential Chet Atkins | 1972 | BMG Entertainment | 1:20–1:35
20 | Country | fast | 150 | piano | acoustic | Floyd Cramer | On the rebound | 20 greatest hits | 207 | Gusto Records | 0:35–0:50
21 | Country | fast | 282 | vocal | acoustic | Statler Brothers | Flowers on the wall | Pulp Fiction - collector's edition | 1994 | MCA Records | 0:07–0:23
22 | Country | slow | 105 | guitar | acoustic | Willie Nelson | Blue eyes crying in the rain | All the songs I've loved before | 2002 | Special Marketing | 1:10–1:25
23 | Country | slow | 67 | piano | acoustic | Hargus Robbins | I'm hurting | Belly up to the bar: classic country and western | 2005 | Time Records | 0:20–0:35
24 | Country | slow | 87 | vocal | acoustic | George Jones | The grand tour | George Jones: The definitive country collection | 2003 | Epic | 0:24–0:39
25 | Electronica | fast | 112 | guitar | synthetic | Daft Punk | Robot rock | Human after all | 2005 | Virgin Records Ltd | 0:25–0:40
26 | Electronica | fast | 160 | piano | synthetic | Goldie | Crystal clear | Saturnz return | 1998 | Ffrr Records Ltd | 4:01–4:16
27 | Electronica | fast | 144 | vocal | synthetic | Prodigy | No good (start the dance) | Music for the jilted generation | 1995 | Mute | 5:40–5:55
28 | Electronica | slow | 50 | guitar | acoustic | Chemical Brothers | Where do I begin | Dig your own hole | 1997 | Astralwerks | 0:15–0:30
29 | Electronica | slow | 70 | piano | synthetic | Robert Miles | Children (dream version) | Dreamland | 1996 | Arista | 0:54–1:09
30 | Electronica | slow | 90 | vocal | synthetic | Kraftwerk | The man-machine | The man-machine | 1978 | Capitol | 1:26–1:41
31 | Folk | fast | 122 | guitar | acoustic | Sarah Bolen | Fantasy | Naked on the inside | 2001 | Sarah Bolen | 3:15–3:30
32 | Folk | fast | 122 | piano | acoustic | Jeff Little | Grassy creek | Piano man from blue ridge | 2003 | Jeff Little | 1:20–1:35
33 | Folk | fast | 126 | vocal | acoustic | The Seeger Family | Muskrat | Animal folk songs for children and other people! | 1992 | Rounder | 0:38–0:53
34 | Folk | slow | 82 | guitar | acoustic | Cheryl Wheeler | But the days and nights are long | Sylvia hotel | 1999 | Philo/Umgd | 1:52–2:07
35 | Folk | slow | 78 | piano | acoustic | Gary Remal Malkin | Appalachian sunrise | The music of the great smoky mountains | 1996 | Real Music | 1:18–1:33
36 | Folk | slow | 66 | vocal | acoustic | P. Seeger | My name is Liza Kalvelage | Waist deep in the big muddy and other love songs | 1967 | Sony | 1:35–1:50
37 | Hip-Hop | fast | 100 | guitar | synthetic | Public Enemy | New whirl odor | New whirl odor | 2005 | Slam Jamz Records | 9:30–9:45
38 | Hip-Hop | fast | 96 | piano | synthetic | Outkast | Mrs Jackson | Stankonia | 2000 | La Face | 3:57–4:13
39 | Hip-Hop | fast | 136 | vocal | synthetic | Busta Rhymes | Gimme some more | The best of Busta Rhymes | 2001 | Elektra/Wea | 1:00–1:15
40 | Hip-Hop | slow | 85 | guitar | synthetic | Cypress Hill | Amplified | Stoned raiders | 2001 | Sony | 0:08–0:23
41 | Hip-Hop | slow | 80 | piano | synthetic | Beastie Boys | Ricky's theme | The in sound from way out! | 1996 | Capitol | 0:08–0:23
42 | Hip-Hop | slow | 80 | vocal | synthetic | Arrested Development | People everyday | 3 Years 5 months & 2 days in the Life of | 1992 | Capitol | 0:27–0:43
43 | Jazz | fast | 260 | guitar | synthetic | Mike Stern | Good question | Who let the cats out? | 2006 | Heads Up | 1:30–1:45
44 | Jazz | fast | 270 | piano | acoustic | Herbie Hancock | One finger snap | Empyrean isles | 1964 | Blue Note | 2:43–2:58
45 | Jazz | fast | 220 | vocal | acoustic | E. Fitzgerald and L. Armstrong | I've got my love to keep me warm | Verve jazz masters | 1957 | Polygram Records | 0:57–1:07
46 | Jazz | slow | 102 | guitar | acoustic | Bireli Lagrene | Insensatez | Bireli Lagrene: Standards | 1992 | Blue Note Records | 1:12–1:27
47 | Jazz | slow | 66 | piano | acoustic | Frank Morgan | Mood indigo | Mood indigo | 1990 | Polygram Records | 4:04–4:19
48 | Jazz | slow | 64 | vocal | acoustic | Billie Holiday | Good morning heartache | The Billie Holiday songbook | 1952 | Polygram Records | 0:20–0:35
49 | Latin | fast | 220 | guitar | acoustic | Vieja Trova Santiaguera | Cuida eso | La manigua | 1998 | Virgin Records | 0:00–0:15
50 | Latin | fast | 270 | piano | acoustic | Jesus Alemany | Tumbao de coqueta | Cubanismo! | 1996 | Hannibal | 2:37–2:52
51 | Latin | fast | 167 | vocal | synthetic | Manu Chao | La marea | Proxima estación: esperanza | 2001 | Virgin | 0:00–0:15
52 | Latin | slow | 101 | guitar | acoustic | Compay Secundo | Es mejor vivir así | Lo mejor de la vida | 1998 | Nonesuch | 0:02–0:17
53 | Latin | slow | 105 | piano | acoustic | Buena Vista Social Club | Buena Vista Social Club | Buena Vista Social Club | 1997 | Nonesuch | 0:28–0:43
54 | Latin | slow | 81 | vocal | acoustic | Ibrahim Ferrer | Mil congojas | Buenos hermanos | 2003 | Nonesuch | 1:15–1:30
55 | Pop | fast | 207 | guitar | synthetic | The Bangles | Walk like an Egyptian | Greatest hits | 1990 | Sony | 1:31–1:46
56 | Pop | fast | 135 | piano | acoustic | Tori Amos | Cornflake girl | Under the pink | 1994 | Atlantic/WEA | 4:05–4:20
57 | Pop | fast | 190 | vocal | acoustic | The Housemartins | Happy hour | Now that's what I call quite good | 1988 | Go! Disc Ltd | 0:33–0:48
58 | Pop | slow | 79 | guitar | acoustic | Eric Clapton | Tears in heaven | The best of Eric Clapton | 1999 | Reprise/Wea | 2:32–2:47
59 | Pop | slow | 67 | piano | acoustic | Lionel Ritchie | Easy | The definitive collection | 2003 | Motown | 0:00–0:15
60 | Pop | slow | 68 | vocal | acoustic | Elton John | Rocket man | Rocket man: The definitive hits | 2007 | Mercury Records | 1:30–1:40
61 | R&B | fast | 135 | guitar | synthetic | L. Buksbaum and S. P. Schreer | Can't catch me | Sports highlights vol. 2 | 2006 | Freeplaymusic, BMI | 0:30–0:45
62 | R&B | fast | 134 | piano | synthetic | James Taylor's Quartet | Splat | Message from the godfather | 2001 | Ubiquity Recordings | 0:35–0:50
63 | R&B | fast | 133 | vocal | acoustic | Oliver Morgan | Roll call (New Orleans funk and soul) | Saturday night fish fry | 2001 | Soul Jazz | 0:40–0:55
64 | R&B | slow | 85 | guitar | synthetic | P. Calandra & S.P. Schreer | Blusey | Funk Blues vol. 3 | 2006 | Pecamusic/BMI/Freeplaymusic | 0:00–0:15
65 | R&B | slow | 84 | piano | synthetic | Niels Landgren | Calvados | Fonk da world | 2001 | ACT | 1:12–1:27
66 | R&B | slow | 70 | vocal | acoustic | Aretha Franklin | Something he can feel | Respect: The very best of Aretha Franklin | 2002 | Warner/BMG | 0:47–1:02
67 | Reggae | fast | 140 | guitar | synthetic | Skavoovie & The Epitones | Nut monkey | Fat footin' | 1996 | Moon Ska/Caroline | 1:43–1:58
68 | Reggae | fast | 160 | piano | synthetic | Pannonia allstars | Balkan fever | Budapest ska mood | 2002 | Megalith Records | 0:50–1:05
69 | Reggae | fast | 155 | vocal | synthetic | Gentleman | Face off | Confidence | 2004 | Four Music Productions | 1:14–1:29
70 | Reggae | slow | 80 | guitar | acoustic | Ernest Ranglin | Below the bassline | Below the bassline | 1996 | Island | 1:18–1:33
71 | Reggae | slow | 80 | piano | synthetic | Bob Marley | No woman no cry | Legend - The best of Bob Marley and The Wailers | 2002 | Def Jam | 0:05–0:20
72 | Reggae | slow | 70 | vocal | acoustic | Peter Tosh | Legalize it | Legalize it | 1999 | Sony | 0:30–0:45
73 | Rock | fast | 200 | guitar | synthetic | AC/DC | Whole lotta rosie | AC/DC box set | 2006 | Sony Bmg | 2:13–2:28
74 | Rock | fast | 185 | piano | synthetic | Supertramp | School | The very best of Supertramp | 2001 | A&M | 3:12–3:27
75 | Rock | fast | 162 | vocal | synthetic | Black Sabbath | Paranoid | Paranoid | 1990 | Warner Bros./WEA | 0:36–0:51
76 | Rock | slow | 68 | guitar | synthetic | Jimi Hendrix | Little wing | Experience Hendrix - The best of Jimi Hendrix | 2000 | Experience Hendrix | 1:58–2:13
77 | Rock | slow | 85 | piano | synthetic | Faith No More | Epic | The real thing | 1989 | Reprise/WEA | 4:05–4:20
78 | Rock | slow | 80 | vocal | synthetic | Alanis Morisette | Right through you | Jagged little pill | 1995 | Maverick | 0:36–0:51
triad 1: 55 - Pop - Fast - Guitar | 56 - Pop - Fast - Piano | 57 - Pop - Fast - Vocal
triad 2: 52 - Latin - Slow - Guitar | 53 - Latin - Slow - Piano | 54 - Latin - Slow - Vocal
triad 3: 43 - Jazz - Fast - Guitar | 44 - Jazz - Fast - Piano | 48 - Jazz - Slow - Vocal
triad 4: 64 - R&B - Slow - Guitar | 65 - R&B - Slow - Piano | 66 - R&B - Slow - Vocal
triad 5: 31 - Folk - Fast - Guitar | 32 - Folk - Fast - Piano | 39 - Hip-Hop - Fast - Vocal
triad 6: 26 - Electro - Fast - Piano | 29 - Electro - Slow - Piano | 23 - Country - Slow - Piano
triad 7: 7 - Blues - Fast - Guitar | 10 - Blues - Slow - Guitar | 13 - Classical - Fast - Guitar
triad 8: 73 - Rock - Fast - Guitar | 68 - Reggae - Fast - Piano | 63 - R&B - Fast - Vocal
triad 9: 48 - Jazz - Slow - Vocal | 54 - Latin - Slow - Vocal | 57 - Pop - Fast - Vocal
triad 10: 27 - Electro - Fast - Vocal | 36 - Folk - Slow - Vocal | 42 - Hip-Hop - Slow - Vocal
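For reference, responses to triadic comparisons such as those listed above are typically accumulated into pairwise dissimilarity counts before multidimensional scaling: in each triad, the two pairs involving the excerpt judged least similar ('odd one out') each receive a dissimilarity increment. This is a minimal sketch, not the study's analysis code; the triads and judgements below are hypothetical placeholders.

```python
from itertools import combinations


def dissimilarity_counts(triads, odd_one_out):
    """Accumulate pairwise dissimilarity from triadic comparisons.

    triads: list of 3-tuples of excerpt ids presented together.
    odd_one_out: for each triad, the id judged least similar to the
    other two. The pair not containing the odd one out gains no
    dissimilarity; both pairs containing it gain one count.
    """
    d = {}
    for triad, odd in zip(triads, odd_one_out):
        for a, b in combinations(sorted(triad), 2):
            if odd in (a, b):  # pair involving the odd one out
                d[(a, b)] = d.get((a, b), 0) + 1
    return d


# Hypothetical responses for two triads (ids follow Appendix A numbering).
triads = [(55, 56, 57), (43, 44, 48)]
odd = [57, 48]  # placeholder judgements
print(dissimilarity_counts(triads, odd))
```

Summed over participants and a balanced triad design (Burton & Nerlove, 1976), these counts form the dissimilarity matrix that non-metric MDS takes as input.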
Appendix C. CBD versus BIBD control experiment: excerpt selection (see Appendix D for excerpt detail).

Number | Number as in Appendix D | Genre | Tempo | Author | Title

Appendix D. Control experiment I: triadic comparison vs. grouping paradigm, the excerpt selection.
Number | Genre | Tempo | BPM | Author | Title | Album | Year | Publisher | Begin/End
1 | Blues | fast | 220 | S.R. Vaughan | Give me back my wig | Martin Scorsese presents Stevie Ray Vaughan | 1986 | Montreux Sounds S.A. | 1:20–1:30
2 | Blues | slow | 75 | B.B. King | I need you so | Reflections | 2003 | Geffen Records | 0:15–0:25
3 | Classical | fast | 140 | L. V. Beethoven/London Classical Players & Schutz choir of Norrington | 4th Mouvement Prestissimo | Symphony no.9 Op. 125 | 2005 | EMI Records | 1:20–1:30
4 | Classical | slow | 80 | C. Bartoli & G. Fischer | Paisiello/Chi vuol la zingarella | Se tu m'ami - Arie antiche | 1992 | Decca Music Group Ltd | 1:10–1:20
5 | Country | fast | 135 | Cumberland Highlanders | Cumberland mountain home | Cumberland mountain home | 2000 | Rural Rhythm | 0:40–0:50
6 | Country | slow | 85 | J. Reeves | He'll have to go | Greatest hits | 1972 | BMG Entertainment | 0:30–0:40
7 | Funk | fast | 115 | Don Covay | Overtime man | So soulful 70's | 1999 | Kent | 0:10–0:20
8 | Funk | slow | 90 | B. Collins | Hollywood squares | Back in the day: The best of Bootsy Collins | 1976 | Warner Bros. | 1:10–1:20
9 | Heavy Metal | fast | 168 | Metallica | The prince | Garage inc. | 1998 | Elektra/Wea | 1:30–1:40
10 | Heavy Metal | slow | 60 | Paradise Lost | True belief | Reflection | 1998 | Pid | 0:50–1:00
11 | Hip-Hop | fast | 134 | Dr. Dre and Eminem | Forgot about Dre | 2001 | 1999 | Aftermath Ent. Interscope Records | 1:19–1:29
12 | Hip-Hop | slow | 80 | Coolio | Gangsta's paradise | Gangsta's paradise | 1995 | Warner Strategic Market | 0:51–1:01
13 | Jazz | fast | 220 | E. Fitzgerald & L. Armstrong | I've got my love to keep me warm | Verve jazz masters | 1957 | Polygram Records | 0:57–1:07
14 | Jazz | slow | 67 | F. Sinatra & E. Ellington | I like the sunrise | Francis A. Sinatra and Edward K. Ellington | 1967 | Warner Bros./Wea | 0:41–0:51
15 | Pop | fast | 190 | The Housemartins | Happy hour | Now that's what I call quite good | 1988 | Go! Disc Ltd | 0:33–0:43
16 | Pop | slow | 68 | Elton John | Rocket man | Rocket man: The definitive hits | 2007 | Mercury Records | 1:30–1:40
17 | Rock | fast | 147 | Queen | Headlong | Innuendo | 1991 | Queen Prods. | 0:30–0:40
18 | Rock | slow | 100 | Jimi Hendrix | Foxey lady | Are you experienced? | 1979 | Experience Hendrix | 0:56–1:06