Professional Documents
Culture Documents
We have all heard numerous melodies, whether they in content-based retrieval methods for large music
come from commercial jingles, jazz ballads, operatic databases such as query by humming (QBH) (Ghias
arias, or any of a variety of different sources. How a et al. 1995; Mo, Han, and Kim 1999) but also in
human detects similarities in melodies has been other diverse applications such as helping prove
studied extensively (Martinez 2001; Hofmann-Engl music copyright infringement (Cronin 1998). Previ-
2002; Mllensiefen and Frieler 2004). There has ous formal mathematical approaches to rhythmic
also been some effort in modeling melodies so that and melodic similarity, such as the one taken in this
similarities can be detected algorithmically. Some article, are based on methods like one-dimensional
results in this fascinating study of musical percep- edit-distance computations (Toussaint 2004), ap-
tion and computation can be found in a collection proximate string-matching algorithms (Bainbridge
edited by Hewlett and Selfridge-Field (1998). et al. 1999; Lemstrm 2000), hierarchical correla-
Similarity measures for melodies find application tion functions (Lu, You, and Zhang 2001), two-
dimensional augmented suffix trees (Chen et al.
Computer Music Journal, 30:3, pp. 6776, Fall 2006 2000), transportation distances (Typke et al. 2003;
2006 G. Aloupis, T. Fevens, S. Langerman, T. Matsui, A. Mesa, Lubiw and Tanur 2004), and maximum segment
Y. Nuez, D. Rappaport, and G. Toussaint. overlap (Ukkonen, Lemstrm, and Mkinen 2003).
Aloupis et al. 67
Figure 1. The first two Figure 2. Two orthogonal
measures of a well-known periodic melodies.
melody are shown below
our representation using
an orthogonal polygonal
chain.
sented in Aloupis et al. (2003). We describe two al- lines and two horizontal edges (one from each
gorithms to find the minimum area between two melody). Note that k is at most n + m. The area be-
given orthogonal melodies, Ma and Mb, of size n and tween Ma and Mb is the sum of the areas of all Ci. If
m, respectively (n > m). Here, n (or m) is the number Mb starts completely below Ma and moves in the
of non-vertical edges in the polygonal chain repre- positive z direction, then for any given Ci the lower
sentation. The algorithms can be used for cyclic horizontal edge (from Mb) will approach the upper
melodies as well as in the context of retrieving fixed horizontal edge, while the area of Ci decreases
short patterns from a database (open planar orthogo- linearly. This happens until the horizontal edges are
nal chains). Apart from minor details, there is no coincident and the area of Ci is zero. Then the upper
difference between the cyclic and open cases. We horizontal edge (now from Mb) moves away from
have chosen to describe the algorithms for the case the lower fixed horizontal edge, while the area of Ci
where the melodies are cyclic. The first algorithm increases linearly.
assumes that the Q direction is fixed, and it runs in We will consider the vertical position of Mb to be
O(n) time. The second algorithm finds the mini- the z-coordinate of its first edge. We define z = 0 to
mum area when both the z and Q relative positions be the position where this edge overlaps the first
can be varied. We prove that it runs in O(nmlogn) edge of Ma. Let Ai(z) denote the area of Ci as a func-
time. In each case, we assume that the edges defin- tion of z. Define zi to be the coordinate at which
ing Ma and Mb are given in the order in which they Ai = 0. These k positions of Mb where some Ai be-
appear in the melodies. Finally, we discuss natural comes zero are called z-events. The slope of Ai(z) is
extensions, both for the polygonal description of determined by the length of the horizontal edges of
melodies and for different types of queries. Ci. The total area between Ma and Mb is given by
A(z) = i =1 Ai(z).
k
Minimization with Respect to z Direction Note that because A(z) is the sum of piecewise-
linear convex functions, it too is piecewise-linear
In the first algorithm, we will assume that both and convex. Furthermore, its minimum must occur
melodies are fixed in the Q direction. Without loss of at a z-event.
generality, we will assume that melody Ma is fixed The function A(z) is given by
in both directions, so all motions are relative to Mb.
A(z) = w i | zbi zai | ,
To see how the area between the two melodies
changes as Mb moves in the z direction, consider a where zbi is the vertical coordinate of Mb in Ci,
set of lines defined by all vertical edges of the zai corresponds to Ma, and wi is the weight (width) of
melodies as shown in Figure 3. This set of lines par- Ci, as shown in Figure 3. Let ai denote the vertical
titions the area between the melodies into rectan- offset of each horizontal edge in Mb from zb1. Thus
gles Ci (i = 1, . . . , k), each defined by two vertical we have zbi = zb1 + ai, and A(z) is now given by
Aloupis et al. 69
A(z) = w i | zb1 (zai !i)| . all leaves in T) and D (the number of growing leaves
minus the number of shrinking leaves in T). The
Finally, notice that the term zai ai is equal to zi. weighted median of all zi can be calculated by tra-
Thus, we obtain versing the tree from root to leaf, always choosing
A(z) = w i | zi zb1 | . the path that balances the total weight on both
sides of the path. The time for this is O(logk).
This is a weighted sum of distances from zb1 to all Suppose that we shift Mb by some offset DQ,
the z-events. The minimum is the weighted uni- which is small enough such that no vertical edges
variate median of all zi and can be found in O(k) overlap during the shift. Each wi belonging to a
time (Reiser 1978). This median is the vertical coor- growing leaf must be increased by DQ, and each wi
dinate that zb1 must have so that A(z) is minimized. belonging to a shrinking leaf must be decreased by
Once this is accomplished, it is straightforward to this amount. Instead of actually updating all our in-
compute the sum of areas O(k) in time. Recall that puts, we just maintain a global variable DQ, repre-
k is at most n + m; therefore, a minimum of A(z) can senting the total offset in the Q direction. The total
be computed in O(n) time. weight of a subtree T is now WT + DDQ.
When we shift to a position where two vertical
edges share the same Q coordinate, we potentially
Minimization with Respect to z and Q Directions eliminate some Ci, create a new Ci, or change type of
Ci. The number of such changes is constant for each
If no vertical edges among Ma and Mb share the same pair of collinear vertical edges. The weight given to
Q coordinate, then Mb may be shifted in at least one a created leaf must equal DQ. Each of these changes
of the two directions Q so that the sum of areas involves O(logk) work to update the information
does not increase. This means that to find the global stored in the ancestors of a newly inserted, deleted,
minimum, the only Q coordinates that need to be or altered leaf. There are O(nm)such instances
considered are those where two vertical edges coin- where this must be done and where the median
cide. Thus, our first algorithm may be applied O(nm) must be recomputed, so the total time to compute
times to find the global minimum in a total of all candidate positions of Mb is O(nmlogn).
O(n2m) time. We now propose a different approach At every Q coordinate where we recalculate the
to improve this time complexity. median, we also need to calculate the integral of
As described in the previous section, for a given Q, area between the two melodies. For a given median
the area minimization resembles the computation z*, the area summation for those Ci for which z* > zi
of a weighted univariate median. When we shift Mb has the form wi(z* zi). This can be calculated in
by DQ, we are essentially changing the input weights O(logk) time if we know the value of this summa-
to this median. Some Ci grow in width, some be- tion for every subtree. To do this, we store some ad-
come narrower, and some stay the same width. As ditional information at every subtree T. Specifically,
we keep shifting, at Q coordinates where vertical the area is given by
edges coincide, we have the destruction of a Ci and
z(WT + D"#) (w i zi) "# Izi ,
creation of another Ci. An important observation is
that all Ci grow (or shrink) at the same rate. where in the second summation, I takes the values
Let us store the z-events and their weights in the (+1, 0, 1) for growing, unchanged, and shrinking
leaves of a balanced binary search tree. Each leaf leaves, respectively. These two summations are the
represents one Ci. The leaves are ordered by the additional parameters that need to be stored, and
value zi. Each leaf also has a label to distinguish be- they can be updated in O(logk) time at every critical
tween the three types of Ci: those that are growing, Q coordinate. We must also perform a similar O(logk)
shrinking, or unaffected when Mb is shifted infini- time calculation of wi(zi z*) for all zi > z*. No ad-
tesimally in the positive direction. At every node ditional parameters are needed for this.
with subtree T, we store WT (the sum of weights of Thus, at every critical Q position, we can calculate
Aloupis et al. 71
Figure 4. Two monotonic
chains and their strips.
A A
A(Ci )
Z Q1 Q2 Z
Figure 5 Figure 7
A A
Z Q2 Q Z
Q1 1
Figure 6 Figure 8
between two consecutive inflection points. This functions in R. If not at the global mini-
would be the only region between two successive in- mum, assume without loss of generality that
flection points where the function is not monotonic. the minimum is to the left of Q1.
To compute the minimum of the aggregate func- 4. For the subset of functions in R whose min-
tion, we give the following algorithm: ima are to the right of Q1, compute the me-
dian Q2 of their left inflection points. Q2
1. Let R be the set of individual area functions. splits the subset into the left group and the
Let F be a single quadratic term, initialized right group.
to zero. 5. If Q2 Q1, as shown in Figure 7, replace all
2. Compute Q1, the median of the x-coordinates functions in the right group with a single lin-
of the minima of all functions in R, as shown ear term, which is a summation of all indi-
in Figure 6. vidual left-hand linear terms. Update F by
3. Compute the value and gradient of the total adding this term to it. Remove the right
area function at Q1 by querying F and all group from R.
Aloupis et al. 73
6. Otherwise, if Q2 < Q1, as shown in Figure 8, points in O(nmlogn) and sweep along each height
compute the gradient of the total function at difference horizontally, updating the area function
Q2. If the global minimum is to the left of in O(1) time per critical point (i.e., O(nm) per height
Q2, follow the instructions of step 5 on the difference). As a result, the time complexity is dom-
right group. Otherwise, if the minimum is inated by the sorting step. Even in the simplest
between Q2 and Q1, replace all functions in case, where we just wish to compute the minimum
the left group with a single quadratic term, area while keeping z fixed, we do not know how to
which is a summation of all individual quad- avoid sorting all critical positions.
ratic terms. Then update F and remove the If all weights are equal (i.e., we have evenly spaced
left group from R. sampling of melodies), then each median computa-
7. Go to step 2. tion takes O(m) time, and there are O(n) critical po-
sitions. Thus, a brute force approach takes O(nm)
The algorithm performs O(|R|)work during each time. A direct implementation of our tree algorithm
iteration, and a constant fraction of R is removed would take O(nmlogn) time, because at each of the
each time. The total time is O(n), by a simple geo- O(n) critical positions we would have to update all
metric series summation, as given in Cormen et al. O(m) leaves of our tree. It is possible that this can
(2001). Thus, in linear time we can compute the be greatly improved.
minimum area between two chains monotonic in x,
found over all vertical translations.
Updating the aggregate function as we shift one of
the chains along the x-axis appears to be non-trivial.
Conclusion
It is no longer true that the optimal position must
We have given efficient algorithms for computing
occur when vertices from each chain are aligned
the minimum area between two polygonal chains,
vertically. Also, when we make a small shift along
which is a known method of comparing melodies.
the x-axis, not only do the two linear parts of each
Other sweep-line algorithms for melodic similarity
individual function change slopes, but the center of
exist (e.g., Ukkonen, Lemstrm, and Mkinen 2003;
symmetry of each function may also shift. (Recall
Lubiw and Tanur 2004); however, ours is designed
that these are functions of the z-coordinate.) These
to handle a continuous spectrum of pitch and time.
changes depend on the slopes of our chains within
We do not assume a fixed set of allowed pitches or
each strip and are not difficult to compute on an in-
time differences. On the other hand, we do assume
dividual basis. However, understanding their aggre-
that the input melodies are monophonic. Extending
gate effect is a different matter. To rephrase, each
these methods to polyphonic music and arbitrarily
strip now has three z-events instead of one: the
complex pitch functions are interesting challenges
two boundaries between linear and quadratic forms,
for future study.
plus the center of symmetry. To make things worse,
the z-events change position as a chain is shifted
along Q. Thus, if a tree is used to maintain the me-
dian, it will be necessary not only to insert/delete Acknowledgments
leaves but also to rearrange the order of leaves (to
say the least). We wish to thank all participants of the Second
Cuban Workshop on Algorithms and Data Struc-
tures, held at the University of Havana, 1319 April
Integer Weights/Heights 2003. We also thank Remco Veltkamp and Rainer
Typke for bringing Arkin et al. (1991) to our atten-
Now, we discuss the cases where only certain pitches tion. We appreciate the constructive comments by
(heights) and/or weights are allowed. If there are O(1) two anonymous referees and by two of the editors,
height differences allowed, we can sort all critical Douglas Keislar and George Tzanetakis.
Aloupis et al. 75
Maidn, D. S. 1998. A Geometrical Algorithm for Typke, R., et al. 2003. Using Transportation Distances
Melodic Difference. Computing in Musicology for Measuring Melodic Similarity. Proceedings of the
11:6572. 4th International Symposium on Music Information
Ornstein, R. 1971. The Five-tone Gamelan Anklung of Retrieval. Baltimore, Maryland: Johns Hopkins Univer-
North Bali. Ethnomusicology 15(1):7180. sity, pp. 107114.
Reiser, A., 1978. A Linear Selection Algorithm for Sets Ukkonen, E., K. Lemstrm, and V. Mkinen. 2003. Geo-
of Elements with Weights. Information Processing metric Algorithms for Transposition Invariant Con-
Letters 7:159162. tent-Based Music Retrieval. Proceedings of the 4th
Toussaint, G. T. 2004. A Comparison of Rhythmic Simi- International Conference on Music Information Re-
larity Measures. Proceedings of the 5th International trieval. Baltimore, Maryland: Johns Hopkins Univer-
Symposium on Music Information Retrieval. Barcelona, sity, pp. 193199.
Spain: Universitat Pompeu Fabra, pp. 242245.