
Comparing Human and Computational Models of Music Prediction

Author(s): Ian H. Witten, Leonard C. Manzara and Darrell Conklin


Source: Computer Music Journal, Vol. 18, No. 1 (Spring, 1994), pp. 70-80
Published by: The MIT Press
Stable URL: http://www.jstor.org/stable/3680523
Accessed: 15-11-2017 03:13 UTC



Ian H. Witten,* Leonard C. Manzara,† and Darrell Conklin‡

Comparing Human and Computational Models of Music Prediction

*Department of Computer Science, University of Waikato, Hamilton, New Zealand
ihw@waikato.ac.nz
†Department of Computer Science, University of Calgary, Calgary, Canada
manzara@cpsc.ucalgary.ca
‡Department of Computer and Information Science, Queen's University, Kingston, Canada
conklin@qucis.queensu.ca

Computer Music Journal, 18:1, pp. 70-80, Spring 1994. © 1994 Massachusetts Institute of Technology.

The information content of each successive note in a piece of music is not an intrinsic musical property but instead depends on the listener's own model of a genre of music. Two modeling techniques explored in this article yield remarkably similar values for the information content of the Bach chorale melodies. We quantify the information content in terms of entropy, a measure of the amount of order, redundancy, or predictability of a system. When applied to any communications situation, entropy measures the amount of information contained in a message; it is small when there is little information and large when there is a lot. As messages become more disordered, that is, less predictable, more symbols (or bits; see below) become necessary to represent them because there is no redundancy that can be exploited to reduce their size. Disordered messages require more symbols in their representation, whereas ordered ones can be coded with fewer symbols. The worst case is a message generated entirely at random; here, there is no redundancy and hence high entropy. This may seem paradoxical at first because we are used to ignoring random noise as meaningless. However, from a communications point of view, a particular sample of random noise is indeed very demanding to communicate if it must be reproduced in exact detail by the receiver, just as a particular sequence of random notes would be very demanding to sight-read.

Perceptive musicians have an instinctive feel for the amount of information communicated in the music they play and hear. For example, most would agree that the music of Arnold Schönberg or Elliott Carter is much more information-laden than that of Philip Glass or Joseph Haydn. Within a composer's oeuvre, similar distinctions can be made. The late string quartets of Ludwig van Beethoven, for example, are considerably denser, in the informational sense, than the early ones. Meaningful stylistic comparisons between individual pieces, between composers, and between genres can be undertaken by objectively measuring perceptual complexity in terms of entropy.

At a finer level of detail, the amount of information conveyed in a particular piece of music varies as the piece progresses. This is instinctively felt, for example, in the development section of a sonata or in the stretto section of a fugue. Descending further, the information content varies markedly on a note-by-note basis, some notes being quite predictable, whereas others are quite unpredictable, that is, "surprising." This variation is reflected in the entropy profile of the work, which measures the information flow as the piece progresses. Entropy profiles give the music theorist a valuable new tool with which to characterize individual works.

This article assesses and compares the entropy of musical style with respect to two different models of music: human and computational. The former can be elicited by having people guess successive notes in a piece and having them assign probabilities to their guesses by gambling (Manzara et al. 1992). The latter can be constructed by developing a structural framework for prediction and by "training" the modeler by having it assimilate a corpus of music and adjust its internal probability estimates accordingly (Conklin and Witten 1991). In this latter case, we can examine in detail how the model views particular musical events by inspecting its inner workings at each stage of computation. With a human model, direct examination of the mind's expectations is not possible, and we must adopt a more indirect strategy to measure the entropy with respect to the model.

For the purpose of musical analysis, it would be attractive to distinguish stylistic entropy, or the uncertainty inherent in the musical style, from perceptual entropy, or that which is relative to the listener's model and which fluctuates from one listener to another. One might further want to distinguish designed entropy in order to account for the composer's intentional deviations from the stylistic norm (see Meyer 1957 for a full treatment of this idea). Unfortunately, it is not possible to distinguish between these measures in practice, and it is a matter of some debate whether the distinctions hold even in principle. The present work measures entropy only with respect to a model. In the case of human listeners, this is their perceptual model. The fact that there is no single human model complicates the task, and in the experiments outlined below we increase the reliability of the entropy estimate by basing it on the performance of several human subjects.

The two modeling techniques have been described and evaluated separately on the Bach chorales (Conklin and Witten 1991; Manzara et al. 1992). They turn out to yield surprisingly similar values for the entropy of their melodies. Previous research has focused on the overall information content of whole pieces or large sections of music (Hiller and Bean 1966; Hiller and Fuller 1967). In contrast, the present study evaluates and compares the two kinds of model in fine detail in terms of the individual predictions they make. Their predictions for two chorale melodies are examined on a note-by-note basis, from a musical point of view. In addition, we compare the overall information profiles of the chorales according to the models. Apart from the intrinsic interest of comparing human with computational models of music, several conclusions are drawn for the improvement of computational models.

Models of Prediction

The entropy of any sequence of events is measured with respect to a model of potential continuations of the sequence. Models can take many forms, including informal models that develop naturally in our minds over many years of training, formal models that are synthetically constructed from a theory of music, and computational models that use adaptive techniques of learning. All share the ability to predict upcoming events based on what has happened previously.

Models predict continuations by assigning a probability to each possible next event. For example, if the events x, y, and z comprise the possible continuations, a model might say that event x is 50 percent likely to occur, event y 10 percent likely, and event z 40 percent likely. Such a model is never categorically "wrong" unless it assigns a probability of 0 percent to an event that actually occurs.

An entropy profile for a sequence of events can be derived by examining the probabilities that the model assigns to events that actually occur in the sequence. If an event is given a high probability, it contributes only a little to the total entropy of the sequence. If it is given a low probability, then it contributes a great deal. The profile thus consists of an event-by-event plot of entropy against time.

Entropy is usually expressed in terms of bits per event. A bit is a binary digit, with the value of either 0 or 1. If an event has an entropy of 1 bit, this implies that the model assigns a probability of 50 percent to that event. If its entropy is 2 bits, then the model assigns it a probability of 25 percent. In general, the probability p of an event with entropy e bits can be found from the formula p = 1/2^e (or, equivalently, e = -log2 p) (Shannon and Weaver 1949).

Given different models of music, it is natural to ask which model is better or, more operationally, which performs the best. Such a determination can be made by examining the average entropy produced by the model as it predicts each event of the sequence. The lower the value, the more successfully the model has "understood" the sequence of events under scrutiny. Conversely, the higher the average entropy, the less successful the model is at prediction. We thus equate the quality of the model with its predictive ability, measured in terms of entropy.

By comparing entropy profiles, the particular weaknesses of one model with respect to another can

be spotted. If one model has difficulty predicting certain events in a sequence (indicated by peaks in its entropy profile that are not present in the other model's profile), then the model can perhaps be modified to handle such events better. For a computational system, this may mean altering the particular algorithms it uses. In the case of humans, the model might be improved by providing instruction.

Human and Computational Music Prediction

We became interested in human models of music prediction through our research into the creation of a powerful, machine-based predictive model of music using machine learning (Conklin and Witten 1991). There were two reasons why we needed more information about human performance. First was the compelling question of determining just how good the model was. Although its performance is easily measured in the abstract, these measurements hold little meaning if we do not know how well people perform at the same task. The second reason was more technical. As described below, the system uses several musical dimensions, or viewpoints, to guide its adaptive learning abilities. A major challenge has been to decide which viewpoints are most important in developing an effective model and in what manner they should be linked. At first, we thought that these choices could be guided by standard music theory alone, but it soon became clear that a precise, more objective method was needed to tune the model. These two questions prompted the experiments described below.

Eliciting Predictions from People

Around 1950, Claude Shannon, the father of information theory, sought entropy estimates for ordinary people's models of plain written English text (Shannon 1951). Other researchers have since applied his methods to a great variety of other natural languages (Witten and Bell 1990). Shannon conducted his experiments by eliciting predictions from people in the form of a guessing game. The procedure was as follows.

1. Show a passage of text to a subject, but only up to a certain point.
2. Ask the subject to guess the next letter.
3. If the guess is wrong, have the subject guess again.
4. Repeat until the guess is correct.
5. Have the subject guess the following letter in the same manner, continuing until the entire passage has been completed.

The number of guesses taken by the subject for each letter of the text is recorded. From this information, an estimate of the entropy of English with respect to the subject's model can be derived.

The drawback of this technique is that it can only yield upper and lower bounds to the entropy estimate, and these turn out to be quite far apart. The guessing methodology was refined by Cover and King (1978), who added "gambling" in order to elicit subjective probabilities for the next letter. At each guess, the subject is required to bet a proportion of their capital on the choice. If the prediction turns out to be correct, they are paid fair odds. If it is incorrect, their capital decreases by the amount bet. The total capital accrued by the subject is used to derive an entropy estimate. The results from several subjects can be combined using a "committee gambling estimate," which is often lower than the best individual subject's estimate.

We adapted this gambling methodology to derive an entropy estimate of the pitch of selected Bach chorale melodies, as described more fully in Manzara et al. (1992). A computer program called Chorale Casino was written to facilitate these experiments. The program is fully interactive and automates the play of the game by displaying the music, recording bets, and calculating winnings. Each player's actions are fully logged, and from this record we can derive entropy profiles for each chorale examined. The profiles for Chorales 151 and 61 are analyzed below.

Multiple-Viewpoint Model of Music

Can a machine generate acceptable specimens of a musical style? Our research on computational models of music has taken two rather unusual approaches to the question (Conklin and Witten 1991). First, we use predictive power as an objective yardstick of generative ability. This is measured by the compression that our system gives to a set of test cases and is determined by calculating the average number of bits needed to represent them. Second, rather than basing the model on rules and constraints that are elicited from experts or musical treatises on the style, we use machine learning techniques to automate its construction. A large corpus of training cases is shown to the machine, which incrementally specializes its model by incorporating them into a knowledge base.

The main features of any computational model of music are a representation for musical knowledge and inference methods that operate on that representation. We chose to view the musical surface from many different perspectives, or viewpoints: for example, as sequences of melodic intervals, pitch classes, or contour indicators. Each viewpoint is an independent knowledge source that makes inferences about some feature of the musical surface.

The multiple-viewpoint scheme is defined formally in Conklin and Witten (1991); a brief informal description follows. The musical surface of a chorale is represented as a sequence of discrete events. Each has a start time, a duration, a pitch, and a fermata indicator. These constituents are called basic types. A viewpoint derives new types from sequences of events and views the musical surface as a string of these derived types.

A contour viewpoint, for example, regards the surface as a sequence of elements in the set {-1, 0, 1}, according to whether the pitch falls, remains unchanged, or rises from one note to the next. The elements of an interval from referent viewpoint are descriptors in the set {0, ..., 12}, giving the interval in semitones of the note above a reference pitch. Viewpoints can be linked together to form new ones. For example, a viewpoint linking the contour and interval from referent viewpoints has elements in the Cartesian product set {-1, 0, 1} × {0, ..., 12}. Linked viewpoints model correlations between descriptors.

A machine learning system for music, including about 20 viewpoints, has been implemented in Common LISP. During the learning phase, each viewpoint divides the music into a set of fixed-length traces and incorporates these into the knowledge base. A trace is simply a sequence of contiguous viewpoint elements. The knowledge base is initially empty; each incorporation specializes it slightly to the musical style. Statistics are kept on all traces and their subtraces, and they are indexed for efficient retrieval in the future. The incorporation algorithm is designed to avoid overspecializing the theory to the examples. After processing a large sample of traces, each viewpoint will have developed a constrained probabilistic model for its musical perspective.

A viewpoint makes inferences about an incomplete piece by taking as context the events that have occurred recently, converting these into a context of viewpoint elements, and retrieving from the knowledge base all traces that match it. Each trace is then converted back to a sequence of events, the final element of which is interpreted as a prediction of the upcoming event. It is arranged that each viewpoint predicts every possible next event with some nonzero probability. A multiple-viewpoint system derives its power by having many viewpoints working simultaneously on the same prediction problem.

Two distinct factors contribute to the prediction of the next event in a piece of music. One is long-term knowledge about the style and the general constraints that it implies. The other is short-term transitory constraints introduced within a particular piece. Ignoring short-term effects would yield poor predictions because the system would be oblivious to repetition and patterns within the present piece. To model this problem of the "particular and the general" (Brown and Dempster 1989) we introduce, for every viewpoint, a short-term knowledge base that is discarded after each piece is predicted. A viewpoint makes a prediction by combining the independent inferences from its short- and long-term knowledge bases.

For the experiments described below, the machine used a multiple-viewpoint knowledge base that it had learned from a training sample of 95 chorales. A set of five chorales was used to measure the entropy of the style. This test set was disjoint from the training set, and chorales 151 and 61 used in the human experiments were deliberately omitted from the

[Figure 1. Chorale 151: (a) instantaneous entropy profile; (b) windowed-average entropy profile; (c) melody.]

training set. For each of the 220 events in the test chorales, an incomplete piece was shown to the machine, and a probability distribution for all possible continuations was produced. The computational model is general enough to predict all basic types in an event: start time, duration, pitch, and fermatas. In conformance with the human experiments, the system was instructed to predict only pitch.

Experimental Results and Their Musical Interpretation

Chorales 151 and 61 were analyzed in terms of both human and computational models. Average entropies are given, along with the melodies, shown in Figures 1 and 2. Both "instantaneous" and "windowed-average" entropies are shown. The former (Figures 1a and 2a) is the note-by-note entropy discussed above, averaged over several subjects in the case of the human model. The latter (Figures 1b and 2b) has been smoothed by averaging over time as well and is useful for discerning medium-term trends in the information content of a piece of music. The profiles shown were derived by averaging over seven events of the instantaneous entropy data, using a triangular sliding window. This is tantamount to smoothing the data with a linear-phase FIR low-pass filter (Kaiser and Reed 1977).

For Chorale 151, the entropy of pitch is estimated to be 1.97 bits/event using the human model, whereas the computational one averages 2.09 bits/event. For Chorale 61, the pitch entropy is 1.53 bits/event for the human model and 1.77 bits/event for the computational one. These results indicate (reassuringly!) that people perform better than the machine at predicting pitch. However, the machine could undoubtedly be tuned to improve its performance, and indeed the instantaneous entropy plots help to indicate just where and how such improvements could be made.

Chorale 151: Human Model

Instantaneous entropy profiles are most useful for the detailed note-by-note analysis of the music. Consider first the profile in Figure 1a produced by the human model. The chorale exhibits a wavelike entropy, with valleys more or less corresponding to cadence points, and peaks occurring in the middle of phrases. The troughs occur exactly at the cadence points for the first, second, and fourth phrases (notes 7, 15, and 29). The only exception is note 22, where the trough falls at the penultimate note. This perhaps reflects a lack of closure associated with the chorale's only half cadence.
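The profile computations described above reduce to a few lines of code. The sketch below is in Python rather than the authors' Common LISP, and the function names and probabilities are invented for illustration; it derives an instantaneous profile via e = -log2 p and smooths it with a seven-event triangular sliding window, renormalizing the window weights at the edges of the piece:

```python
import math

def instantaneous_entropy(probs):
    """Entropy in bits contributed by each event, given the
    probability the model assigned to the note that occurred."""
    return [-math.log2(p) for p in probs]

def windowed_average(profile, width=7):
    """Smooth an entropy profile with a triangular sliding window
    (a linear-phase FIR low-pass filter). Edge windows are
    renormalized over the events actually available."""
    half = width // 2
    # Triangular weights, e.g. [1, 2, 3, 4, 3, 2, 1] for width 7
    weights = [half + 1 - abs(i - half) for i in range(width)]
    smoothed = []
    for i in range(len(profile)):
        acc = norm = 0.0
        for j, w in enumerate(weights):
            k = i + j - half
            if 0 <= k < len(profile):
                acc += w * profile[k]
                norm += w
        smoothed.append(acc / norm)
    return smoothed

# Invented probabilities for six successive notes
profile = instantaneous_entropy([0.5, 0.25, 0.8, 0.1, 0.6, 0.25])
smooth = windowed_average(profile)
```

A probability of 0.5 contributes 1 bit and 0.1 about 3.3 bits, so surprising notes stand out as sharp peaks before smoothing.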

[Figure 2. Chorale 61: (a) instantaneous entropy profile; (b) windowed-average entropy profile; (c) melody.]

The peak of the wave for phrase 1 occurs at note 4. This is due to the leap of a third, which breaks the repetition of the chorale's initial note. Another peak occurs at note 10. Here we expect duplication of the chorale's initial phrase, but the third repetition of the B does not occur and is replaced by a stepwise downward motion. The peak at notes 18 and 19 is also due to a variation on the first phrase. At note 18 there is some uncertainty as to whether the pattern of the first or second phrase will be repeated. Although the second phrase seems to be confirmed, our expectations are thwarted when a new pattern is created at note 19 with the repetition of the A, causing an increase in entropy.

The biggest peak of the chorale occurs at note 23. An upward leap of a minor seventh is extremely unusual in this style of music and is doubly unexpected at this point because the leading tone at note 22 would normally resolve to a G. This leap does make musical sense, though, since the descending stepwise motion to the final tonic is merely displaced by an octave to keep the melody within the range of a soprano. However, when considered on a note-to-note basis, the leap is unexpected and results in the largest increase in entropy.

Chorale 151: Computational Model

The instantaneous entropy produced by the computational model for Chorale 151 is also shown in Figure 1a. The two profiles often track each other closely, but at several points they diverge substantially. At note 6, for example, the computational model does poorly; it predicts a B with high probability and the actual note C with only 2 percent. Obviously, it favors a stepwise resolution of the seventh of the implied dominant harmony to the mediant of the tonic chord (C to B). Because of the model's construction, it cannot look ahead to the coming cadence at note 7, where the likely note is the third of the tonic triad. At note 7, however, the computational model does better than the human model; it predicts the B with 98.6 percent probability. The descending half-step resolution of the implied seventh is highly favored by the long-term knowledge base at this point, and the choice is reinforced by the short-term knowledge base, which has been conditioned to expect the mediant. This illustrates the fact that the computational model is a very aggressive predictor. People also do well at this point, but they are more cautious and temper their predictions somewhat.

At note 8 the computational model predicts A with 35 percent probability, C with 29 percent, and B (the actual note) with 19 percent. At note 9, A is predicted with 59 percent, and the actual note B with a smaller probability. Again, the model favors stepwise motion and has not recognized that this phrase begins in the same manner as the initial one. At note 12, B is predicted with 49 percent and C (the actual note) with 25 percent. At this point, the model seems finally to have realized that repetition of the note B is a statistically significant element of the chorale; however, the note that actually occurs is the model's second choice.

At notes 13 and 14 the computational model outperforms the human one. It predicts the resolution of the seventh of the implied dominant harmony with 79 percent certainty and predicts the stepwise motion from the B to the A with 91 percent. The computational model tends to be an enthusiastic predictor of stepwise motion. At note 15 it does much poorer than humans. It predicts a B with 53 percent certainty and G (the actual note) with only 20 percent. Obviously, people are not surprised to see a cadence on the tonic in the middle of the chorale. However, the computational model does not favor this straightforward choice but prefers an imperfect cadence. This is because the model has been conditioned by the short-term knowledge base to favor the note B after so many repetitions.

At notes 16 and 17 the computational model again fails to recognize the tendency of phrases to begin with repeated B's. Instead, it favors stepwise motion, predicting at note 16 an A with 33 percent, G with 20 percent, and B (the actual note) with 16 percent; at note 17 it predicts a C with 43 percent and B (the actual note) with 31 percent probability. At note 21 this tendency to predict stepwise motion manifests itself as another peak in the entropy. Pitch A is predicted with 37 percent, B with 21 percent, and the actual note G with 20 percent. The computational model cannot look ahead, so it does not know about the high probability of a half cadence at note 22 and thus cannot discern the likelihood of a repeated G to prepare for this event.

Although humans did poorly at predicting note 23, the computational model did somewhat worse. Its top three predictions were G at 40 percent, E (one octave below the actual note) at 33 percent, and F-sharp at 17 percent. All are reasonable choices, although it is apparent that the model cannot explicitly recognize the need for octave displacement of the descending melodic line to avoid exceeding the soprano's range. At note 26 the computational model predicted the B with 96 percent certainty. It seemed very certain that the seventh of the implied dominant harmony would resolve to the mediant of the tonic, although people were less sure. The peak in entropy at note 28 again points out a major weakness of the computational model: it does not adjust its prediction by looking ahead to the upcoming cadence. At this note the model predicted a B with 35 percent probability, a G with 32 percent, and an A (the actual note) with 27 percent. This failure to recognize temporal context also manifests itself in the model's prediction of the final note. Here, the G is predicted at only 69 percent, a level of confidence that is significantly lower than that of the human participants. It is clear that, unlike most musicians, the model does not realize that virtually all melodies end on the tonic.

Chorale 151: Windowed-Average Entropy Profile

We turn now to the windowed-average entropy profiles shown in Figure 1b, which can be used to discern medium-term trends in the chorale's information content. The profile for the human subjects shows a simple wavelike structure whose troughs correspond roughly to the cadence points at notes 7, 15, and 29. The cadence at note 22, which occurs just after the high point of the second wave, is an exception. It is interesting to note that the largest excursions in the entropy profile occur in the second half of the piece.

The windowed-average entropy profile produced by the computational model is quite similar in shape to that of the human model, particularly from note 10 onward. Up to this point, the computational model does rather poorly compared to the human one; it is less able to provide good predictions with the limited context that naturally occurs at the beginning of any piece. Note also that even though the shape of the two profiles is similar, the computational model does roughly half a bit per event poorer than the human subjects from note 10 to the end.
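The analyses above repeatedly turn on the interplay between the long-term knowledge base (stylistic statistics) and the short-term one (conditioned by repetition within the current piece). The article does not give the rule by which their inferences are combined, so the Python sketch below shows just one plausible scheme, a renormalized weighted mixture; the pitch names, probabilities, and equal weighting are all invented for illustration:

```python
def combine_predictions(long_term, short_term, w_long=0.5):
    """Mix two predictive distributions over candidate next pitches.
    The equal weighting is an assumption; the actual combination rule
    of the multiple-viewpoint system is not specified here."""
    pitches = sorted(set(long_term) | set(short_term))
    mix = {p: w_long * long_term.get(p, 0.0)
              + (1.0 - w_long) * short_term.get(p, 0.0)
           for p in pitches}
    z = sum(mix.values())  # renormalize so the mixture sums to 1
    return {p: v / z for p, v in mix.items()}

# Invented example: long-term statistics favor stepwise motion to B,
# and the short-term model, conditioned by repetition, also favors B.
long_term = {"A": 0.2, "B": 0.6, "C": 0.2}
short_term = {"A": 0.1, "B": 0.8, "C": 0.1}
pred = combine_predictions(long_term, short_term)
```

Here the mixture sharpens the shared preference for B, mirroring the way short-term conditioning reinforced the long-term prediction at note 7 of Chorale 151.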

Chorale 61: Human Model

Chorale 61 can be analyzed in a manner similar to Chorale 151. Its instantaneous entropy profiles are shown in Figure 2a. An examination of the human-model profile reveals wavelike motion, with valleys at each cadence. This is particularly clear at notes 7, 14, 21, 27, 42, 49, and 57. The cadence at note 34 is the only exception, probably because the expected resolution is E-flat, whereas a G is heard.

This chorale differs from the other in that we see not one but two peaks of unequal sizes in the middle of each phrase. The larger peak comes first in phrases 2, 3, 5, 6, 7, and 8, on either the first or the second note following the cadence. This reflects the fact that it is difficult to predict which direction a new phrase will take. In phrase 7 this uncertainty is heightened by the unexpected downward leap of a third at note 44. In phrase 4 the smaller peak comes first, indicating that the cadence's resolution is quite clear, being leading tone to tonic. The larger peak comes at note 25 with the repetition of the G. Up to this point, there has only been one repeated note, so there is a high expectation that all movement will be by step. The minor peak in phrase 6 (note 40) is also due to an unexpected repeated note, and the small peaks at notes 31 and 53 (phrases 5 and 8) can be attributed to the fact that rising motion by step is discontinued in favor of descending stepwise motion.

Chorale 61: Computational Model

Figure 2a also shows the instantaneous entropy profile as produced by the computational model. As with Chorale 151, the profiles track each other fairly closely, although they diverge at several points. The computational model consistently beats the human one when it can successfully predict downward stepwise motion. This can be seen at notes 1-3, 8-12, 18-19, 30-33, 48-49, and 52-55. The computational model's advantage at these events is small, however: usually less than half a bit.

This aggressive tendency of the computational model to predict downward stepwise motion works against it. At notes 13, 28, 36, 40, and 56 it consistently predicts such a motion, but it is repeated notes that actually occur. Because the predictions are wrong, the computational model shows entropy peaks at these points. The result is disastrous at note 28; the model predicts an F with 99.7 percent certainty, leaving very little probability space for the G that actually occurs.

The model's propensity to predict downward stepwise motion also creates entropy peaks at notes 15-16 and 44-45. Here, intervals larger than a second occur. Note that people also had difficulty predicting these notes but still performed about 2 or 3 bits better than the computational model. At notes 6 and 26 we see motion from G to A-flat, which to the machine is relatively surprising, again partly because of the model's propensity to predict downward stepwise motion.

At note 38 the chorale modulates to the dominant with the introduction of the A-natural. To people, this is not at all surprising, as indicated by the slight decrease in entropy at this point. To the computational model, however, a peak of entropy occurs; an F is predicted with 44 percent certainty, and the A-natural with only 20 percent. At note 47 the chorale modulates back to the tonic with the reintroduction of the A-flat. This is somewhat surprising to the human subjects, as shown by the small peak in entropy, but the computational model has even more difficulty; it predicts A-natural with 57 percent certainty and the A-flat with only 26 percent. The humans had no difficulty in recognizing that the modulation back to the tonic was permanent, whereas the computational model predicted another A-natural at note 51 with a confidence of 31 percent, F with 25 percent, and A-flat (the note that actually occurs) with only 23 percent.

The inability of the computational model to look ahead for cadence points accounts for the sharp peak in entropy at notes 20-22. Here, the half cadence is not at all surprising to humans but presents difficulty for the model. The lack of goal orientation and the inability to recognize temporal position work against it at this point and also at the penultimate note of the chorale, where people had no difficulty recognizing the need for a repeated note to prepare
against it at several other points. For example, at for the perfect authentic cadence.
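
The entropy figures quoted in this analysis follow directly from the predicted probabilities: a note that a model predicts with probability p carries -log2 p bits of information, which is why a near-certain misprediction is so costly. The following sketch illustrates the arithmetic; the function name and the distribution are hypothetical stand-ins, not the actual model's output.

```python
import math

def surprise_bits(distribution, actual_note):
    """Information content, in bits, of the note that actually occurs,
    given a model's predictive probability distribution."""
    return -math.log2(distribution[actual_note])

# Hypothetical distribution echoing the note-28 situation: the model
# puts 99.7 percent of its probability on F, but a G is heard.
prediction = {"F": 0.997, "G": 0.002, "E-flat": 0.001}
print(round(surprise_bits(prediction, "F"), 2))  # confident hit: 0.0 bits
print(round(surprise_bits(prediction, "G"), 2))  # surprise: 8.97 bits
```

A correct, confident prediction costs almost nothing, while the 0.2 percent outcome costs nearly 9 bits, exactly the kind of sharp peak visible in the computational profile.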

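
A windowed-average profile of the kind discussed here is, in essence, the instantaneous profile smoothed over neighboring notes so that medium-term trends stand out. A minimal sketch, using a plain centered moving average (the smoothing actually applied to the published profiles may differ):

```python
def windowed_average(entropies, window=4):
    """Smooth a per-note entropy profile (in bits) with a centered
    moving average, shrinking the window at the ends of the piece."""
    half = window // 2
    smoothed = []
    for i in range(len(entropies)):
        lo, hi = max(0, i - half), min(len(entropies), i + half + 1)
        smoothed.append(sum(entropies[lo:hi]) / (hi - lo))
    return smoothed

# Toy instantaneous profile with a low-entropy trough at a "cadence".
raw = [2.0, 3.5, 1.0, 0.5, 1.0, 3.0, 2.5]
print([round(x, 2) for x in windowed_average(raw)])
# → [2.17, 1.75, 1.6, 1.8, 1.6, 1.75, 2.17]
```

The trough survives the smoothing while note-to-note spikes are damped, which is what makes such profiles useful for characterizing medium-term information trends.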
Chorale 61: Windowed-Average Entropy Profile

Figure 2b shows the windowed-average entropy profile for Chorale 61, as generated by the human and computational models. Recall that the profile produced by the human model shows medium-term information trends. Like Chorale 151, there is a tendency for the troughs of the waves to fall more or less at cadence points. This is true for the cadences at notes 7, 14, 21, 34, 42, 49, and 57, but not at note 27. It is interesting that, as in Chorale 151, the largest excursions in entropy occur in the second half of the piece. We can perhaps speculate that to avoid monotony there is a need to inject some uncertainty toward the end, just before the final resolution.

The profile produced by the computational model tracks the human one quite closely, and their shapes are remarkably similar, although the former does somewhat worse at most events. This is particularly striking at notes 16, 27, and 44, and is explained by the effect of the model's poor predictions at these points on the windowed average. It is encouraging, however, that the computational model can, with some tuning, closely approximate the human model's predictive ability and thus produce accurate windowed-average entropy profiles. This is important because these profiles can be used to characterize medium-term information trends in the music.

Improving the Computational Model

It seems possible that a machine will eventually outperform people at the chorale guessing game. This section outlines some potential improvements to the multiple-viewpoint model that may achieve this goal. The improvements are divided into two classes: (1) slight parameter adjustments to the existing model, and (2) structural changes to the modeling and representation framework.

One very simple adjustment that could be made is to increase the corpus of example chorales used to create the model. Experiments have indicated that performance improves as more training chorales are seen, yet only 95 of the many existing Bach chorales were used. A second area for improvement is to reconsider the methods used for selecting and combining viewpoints. As we have seen, the system is at times an overly aggressive predictor. Its predictions could be tempered by using a different technique for combining the predictions from individual viewpoints. Furthermore, it should be possible to develop a more principled way of selecting individual viewpoints from the huge number of possibilities available. Finally, it would be profitable to explore different ways of combining predictions from the short- and long-term viewpoint models. For example, the relative weighting of the short-term model could be increased as a chorale progresses. The wide discrepancy between human and machine performance in the opening notes of Chorale 151 (Figure 1b) highlights the weakness of a fixed weighting.

Several structural changes to the framework should be investigated. Our computational model has some significant shortcomings. It does not incorporate a priori musical knowledge (except that implicit in the selection of viewpoints). It does not plan, nor does it learn from failure. It uses no form of hierarchical representation or control over its predictions. Here are some ways that such features might be introduced.

First, some kind of constraint mechanism that codifies inviolable rules of chorale construction could be incorporated. The current prediction system was designed to be nonexclusive, predicting all possible events with some nonzero probability. Some form of "hard" constraint could be used to reduce viewpoint prediction sets, opening up probability space for unconstrained events. This would extend the system from a pure learning system to a hybrid constraint-based learning system.

A second area for improvement is rhythm and look-ahead. The system's current structure prohibits it from looking ahead to inspect the rhythmic skeleton; it takes into account only the past context, not the future. People are free to scan the complete rhythmic surface, gaining useful a priori constraints on pitches. This is quite striking at cadences, and in particular at the final cadence.

Third, it is likely that different viewpoints change their relative importance as a piece progresses. To accommodate this, a metamodel could predict which
viewpoints should be invoked at various stages. Such changes in perspective could perhaps be learned by inspecting and reasoning about the entropy profile. This kind of introspection about the performance of the model could be called "metareasoning." As an example, the modulation in Chorale 61 indicates that people come to expect excitement or variety to be encountered as a piece progresses.

Learning systems often avoid overgeneralization by using negative examples. The utility of negative examples here is not clear, because our learning algorithm is probabilistic and implicitly searches a specialization hierarchy rather than the more usual generalization hierarchy. However, negative examples could be used to adjust the statistics given to various traces. Alternatively, a "negative" viewpoint could be used to bias predictions.

Conclusions

This is the first detailed, quantitative comparison of the performance of a computational model of music with the human listener. The basis for comparison is predictive ability, measured in terms of entropy. Investigating and comparing both computational and human models required the use of novel techniques. For the former, the development of the multiple-viewpoint scheme and the use of machine learning are both significant innovations that account, in large part, for the success achieved. To elicit human models, the methodology of the guessing game was instrumental in gaining quantitative access to people's predictions.

Probably the most striking feature of the experiments was the similarity in performance of the two systems. Although we have naturally tended to focus on and discuss the differences, even cursory examination of Figures 1 and 2 indicates remarkable similarities, in both instantaneous entropy profile and windowed-average profile, between the computational and human models. By and large, both encountered difficulties at precisely the same points in the chorales; the differences are in the amount of difficulty experienced rather than where it occurs. This indicates that it is fundamental musical characteristics that are being measured, rather than some artifact of the measuring instruments.

Finally, people still outperform the computational model in predicting chorales. However, there is a great deal of room for improvement in the latter, and these experiments indicate the areas in which further research is needed. It does not seem unlikely that machines will eventually outclass people in predicting a restricted and well-defined genre of music.

Acknowledgments

Special thanks go to Mark James, who programmed most of the Chorale Casino game. Thanks also go to all the subjects who took part in the original experimental trials. This work is supported by the Natural Sciences and Engineering Research Council of Canada.

