Feature Set Patterns in Music
Department of Computing
City University London
United Kingdom
conklin@city.ac.uk
Pattern discovery is an important part of computational music-processing systems. The discovery of patterns repeated within a single piece is an important step to segmentation according to thematic structures (Ruwet 1966). Patterns found within a few works may be signatures that can be instantiated for style emulation of novel musical material (Cope 1991; Rowe 1993) and can reveal a deep similarity in musical material. Patterns that are conserved across many pieces in a large corpus can represent structural building blocks and can be used for comparative style analysis and music genre recognition (Huron 2001; Conklin and Anagnostopoulou 2001; Lin et al. 2004).

Pattern discovery methods can be discussed according to the expressiveness of patterns: in particular, the levels of abstraction permitted by pattern components. Many approaches are restricted to a representation in which every pattern component is described using the same musical attribute: pitch, duration, interval, or fixed combinations of these (e.g., linked interval/duration). In these approaches, an event has only one possible representation, and therefore patterns can be efficiently found using general string algorithms (Gusfield 1997) after transforming the corpus to strings of attribute values.

Recent methods have considered whether this restriction can be relaxed by allowing patterns with heterogeneous components and subsumption relations among possible pattern components (Lartillot 2004; Cambouropoulos et al. 2005; Conklin and Bergeron 2007). The need for such patterns can be motivated with a few melodic fragments (see Figure 1) from the music of the famous twentieth-century French singer and songwriter Georges Brassens (1921–1981). In both pairs of fragments, the description of events by melodic interval or melodic contour alone is inadequate. Although the fragments within each pair have a common duration pattern, there is no melodic interval pattern that spans the complete fragments, even though some events do have conserved melodic intervals.

This article describes a new approach to pattern representation and discovery in music, where pattern components can contain any number of features and can be as general or as specific as required by the data. An important concept in this work is subsumption, which provides a natural way to explore the pattern search space in a general-to-specific manner, pruning entire branches when patterns become infrequent. The space of patterns that may be discovered is very rich, and patterns have highly flexible levels of abstraction, much more than is possible with single-attribute patterns.

A well-known fact of knowledge representation is that increased expressiveness usually leads to increased time complexity of reasoning (Brachman and Levesque 2004). For pattern discovery in particular, allowing pattern components to contain a flexible and varying number of attributes substantially increases the size of the pattern search space. Though the size of this space is substantially reduced by the restriction to frequent patterns, when there are too many frequent patterns in the search space, heuristic algorithms must be employed. This article presents two algorithms to solve these pattern discovery problems: a complete algorithm and a heuristic probabilistic algorithm.

A large number of patterns can be revealed in a single piece or corpus, and these patterns must somehow be filtered and ranked for presentation. In this article, a statistical measure is used to order patterns according to the deviation of their observed frequency from their expected frequency in the corpus.

The methods developed here are applied to the music of Georges Brassens. The style of Brassens, described by Keefe (2002) as "sophisticated simplicity," is well suited for pattern discovery studies. It did not change substantially throughout his career, and his output was prolific: he set over 170 texts to music, in addition to over a dozen texts of other well-known French poets.

Computer Music Journal, 32:1, pp. 60–70, Spring 2008
© 2008 Massachusetts Institute of Technology.
[Figure 1. Melodic fragments from the music of Georges Brassens, panels (a) to (d).]
closed feature set: no subsumee feature set has the same total count in the corpus
feature set pattern: a sequence of feature sets occurring in the corpus
Cp(): the number of pieces having at least one occurrence of the specified feature set or pattern
Ct(): the number of non-overlapping occurrences of a feature set or a pattern in the corpus
Et(): pattern expected total count
I(): pattern interest measure
frequent: occurring in the corpus with at least the specified piece and total count thresholds
infrequent: occurring in the corpus below the specified piece count or total count threshold
maximal frequent pattern: a more specific frequent pattern cannot be found in the corpus

Events and Viewpoints

An event is a music object within a sequence. Music objects can be notes, or structured objects such as simultaneities (Conklin 2002) and sequences (Conklin and Anagnostopoulou 2005). In this article, we focus on melody, and we only consider note objects. A viewpoint is a function that computes values for events in a sequence (Conklin and Witten 1995; Conklin 2006). For example, for the melodic interval viewpoint, the values in the range of the viewpoint are integers representing the difference in semitones between an event and its predecessor.

Table 2 provides a catalog of viewpoints used in this study. At the top of the table are the primitive viewpoints, namely, those used to select the basic attributes of notes. Following this are derived viewpoints that compute values using the primitive viewpoints. They can be further grouped into unary (computable from a single event, e.g., pitch class) and binary (requiring one preceding event to compute a value, e.g., melodic interval or melodic contour). A potentially unbounded set of derived viewpoints is available through the use of constructors, which
Primitive viewpoints
pitch 63 66 67 63 66 67 63 66 67 65
key 3 3 3 3 3 3 3 3 3 3
onset 0 192 288 384 576 672 768 960 1152 1344
duration 192 96 96 192 96 96 192 192 192 192
ml1 t t
ml2 t t t t
Derived viewpoints (unary)
pc 3 6 7 3 6 7 3 6 7 5
ref 3 3 3 3 3 3 3 3 3 3
intref 0 3 4 0 3 4 0 3 4 2
Derived viewpoints (binary)
int +3 +1 -4 +3 +1 -4 +3 +1 -2
intu 3 1 4 3 1 4 3 1 2
intpc 3 1 8 3 1 8 3 1 10
intupc 3 1 4 3 1 4 3 1 2
intscs 1s 2n 1s 2n 1s 1n
intuscs 1s 2n 1s 2n 1s 1n
repeat
up t t t t t t
down t t t
step t t t t
leap t t t t t
ioi 192 96 96 192 96 96 192 192 192
dc3 < = > < = > = = =
dr 1/2 1/1 2/1 1/2 1/1 2/1 1/1 1/1 1/1
The onset time of the fragment has been shifted to time 0.
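
As an illustration of how derived viewpoints follow from primitive ones, the sketch below recomputes several rows of the table above from the pitch, onset, and duration rows. It is a minimal sketch, not the authors' implementation: the helper names (`unary`, `binary`, `lift`) are hypothetical, and the definitions of `intupc` (unordered pitch-class interval), `step` (one or two semitones), and `ref` (tonic pitch class) are assumptions chosen because they agree with the tabulated values.

```python
from fractions import Fraction

# Primitive viewpoint values copied from the melody fragment table.
pitch = [63, 66, 67, 63, 66, 67, 63, 66, 67, 65]
onset = [0, 192, 288, 384, 576, 672, 768, 960, 1152, 1344]
duration = [192, 96, 96, 192, 96, 96, 192, 192, 192, 192]
ref = 3  # value of the "ref" row (assumed to be the tonic pitch class)


def unary(fn, values):
    """Apply a single-event viewpoint to every event."""
    return [fn(v) for v in values]


def binary(fn, values):
    """Apply a viewpoint that needs the preceding event; None at the first event."""
    return [None] + [fn(prev, cur) for prev, cur in zip(values, values[1:])]


def lift(fn, column):
    """Map fn over an already derived column, keeping undefined values undefined."""
    return [None if v is None else fn(v) for v in column]


# Unary derived viewpoints.
pc = unary(lambda p: p % 12, pitch)
intref = unary(lambda p: (p - ref) % 12, pitch)

# Binary derived viewpoints.
interval = binary(lambda a, b: b - a, pitch)              # the "int" row
intu = lift(abs, interval)
intpc = lift(lambda i: i % 12, interval)
intupc = lift(lambda i: min(i % 12, -i % 12), interval)   # assumed definition
up = lift(lambda i: i > 0, interval)
down = lift(lambda i: i < 0, interval)
step = lift(lambda i: 1 <= abs(i) <= 2, interval)         # assumed definition
ioi = binary(lambda a, b: b - a, onset)
dc3 = binary(lambda a, b: "<" if b < a else ">" if b > a else "=", duration)
dr = binary(lambda a, b: Fraction(b, a), duration)

print(interval)  # [None, 3, 1, -4, 3, 1, -4, 3, 1, -2]
```

Binary viewpoints are left undefined (`None`) for the first event, which is why the binary rows of the table hold one value fewer than the unary rows.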
set {} is instantiated by any event. For example, the feature set {pitch:63, int:-4} is instantiated by the events at positions 4 and 7 of the melody fragment in Table 3. A feature set can be specialized by adding one or more features to the set. The more general feature set is said to subsume the specialized feature set: all instances of the specialized feature set are also instances of the more general feature set. For example, the feature set {pitch:63} subsumes the feature set {pitch:63, int:-4}, which in turn subsumes {pitch:63, int:-4, ml1:t}.

The subsumption relation between feature sets can be visualized as a taxonomy in which nodes of the taxonomy are feature sets, directed links represent subsumption, transitive subsumption links are omitted, and a node is placed between its most specific subsumers and its most general subsumees. Figure 2 shows a small feature set taxonomy using the five viewpoints step, down, repeat, leap, and up. These viewpoints are used to represent various levels of abstraction of melodic contour classes (Conklin and Bergeron 2007). Note that this taxonomy is not presented as a full lattice of subsets, because some feature sets (e.g., the set {step:t, repeat:t}) cannot be instantiated by any event and are therefore contradictory.

A pattern is a sequence of feature sets. A pattern
[Figure 3. Subsumption taxonomy of closed feature sets. Each node lists its features and, below them, the total count and relative frequency of the feature set in the corpus; the most general node has total count 11273 and relative frequency 1.00.]
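
The notions of instantiation, subsumption, total count, and closed feature sets can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the article: events are described by only three viewpoints (pitch, int, and a contour class), the names `describe`, `feature_set`, `instances`, `closure`, and `is_closed` are hypothetical, and the total count is taken simply as the number of instantiating events in a single ten-note fragment.

```python
from functools import reduce

# Pitches of the ten-note melody fragment used in the text's examples.
pitch = [63, 66, 67, 63, 66, 67, 63, 66, 67, 65]


def describe(i):
    """Hypothetical event description: the (viewpoint, value) features of event i."""
    features = {("pitch", pitch[i])}
    if i > 0:  # binary viewpoints are undefined for the first event
        interval = pitch[i] - pitch[i - 1]
        features.add(("int", interval))
        contour = "up" if interval > 0 else "down" if interval < 0 else "repeat"
        features.add((contour, "t"))
    return frozenset(features)


events = [describe(i) for i in range(len(pitch))]


def feature_set(features):
    """Build a feature set from a dict such as {"pitch": 63, "int": -4}."""
    return frozenset(features.items())


def instances(fs):
    """Events instantiating fs, i.e. events that carry every feature in fs."""
    return [e for e in events if fs <= e]


def subsumes(general, specific):
    """A feature set subsumes any feature set that contains all of its features."""
    return general <= specific


def total_count(fs):
    return len(instances(fs))


def closure(fs):
    """Most specific feature set shared by all instances of fs."""
    matching = instances(fs)
    return reduce(frozenset.intersection, matching) if matching else fs


def is_closed(fs):
    """Closed: no more specific feature set has the same total count."""
    return closure(fs) == fs


a = feature_set({"pitch": 63, "int": -4})
print([i + 1 for i, e in enumerate(events) if a <= e])  # [4, 7]
print(is_closed(feature_set({"int": -4})))              # False
```

In this fragment {int:-4} is not closed, because every event with that interval also has pitch 63 and a downward contour, so the more specific set has the same count of 2. By contrast, {pitch:63} is closed: the first event instantiates it without any interval feature.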
in Table 3). The pattern discovery algorithm then proceeds in two phases. In the first phase, all frequent feature sets are found and configured into a taxonomy using a description logic classification algorithm (Brachman and Levesque 2004), which places each feature set between its most specific subsumers and its most general subsumees. Because a frequent pattern cannot have an infrequent feature set as a component, the potential feature-set space may be restricted to those feature sets that actually occur frequently in the corpus. Furthermore, it suffices to consider only closed feature sets, namely, those that do not subsume any other feature set with the same total count. This is because all components of a maximal pattern must by implication be closed feature sets. The set of all closed feature sets that are frequent in their total count can be efficiently computed using the method of Uno et al. (2003). This set is further pruned to those having at least the minimum specified piece count.

Figure 3 displays the subsumption taxonomy of closed feature sets from a corpus of 132 pieces comprising 11,273 events (the Results section describes this corpus in more detail), using the viewpoints provided in Table 2 and restricted to feature sets occurring in all 132 pieces. At the bottom of each node of the taxonomy are the total count of the feature set and the relative frequency of the feature set in the corpus. For example, 48% of all events have the feature ml2:t, and 19% of all events are further specialized to include the feature up:t.

The second phase of the discovery algorithm explores the specialization space of frequent feature set patterns in the search for maximal frequent pat-