(B) (Klingner, 2010) Measuring Cognitive Load During Visual Tasks

MEASURING COGNITIVE LOAD DURING VISUAL TASKS BY
COMBINING PUPILLOMETRY AND EYE TRACKING
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Jeff Klingner
May 2010
Abstract
Visualizations and visual interfaces can provide the means to analyze and communi-
cate complex information, but such interfaces often overwhelm or confuse their users.
Evaluating an interface's propensity to overload users requires the ability to assess
cognitive load.
Changes in cognitive load cause very small dilations of the pupils. In controlled
settings, high-precision pupil measurements can be used to detect small differences in
cognitive load at time scales shorter than one second. However, cognitive pupillometry
has been generally limited to experiments using auditory stimuli and a blank visual
field, because the pupil's responsiveness to changes in brightness and other visual
details interferes with load-induced pupil dilations.
In this dissertation, I present several improvements in methods for measuring cog-
nitive load using pupillary dilations. First, I extend the set of eye tracking equipment
validated for cognitive pupillometry, by determining the pupillometric precision of a
remote-camera eye tracker and using remote camera equipment to replicate classic
cognitive pupillometry experiments performed originally using head-mounted cam-
eras. Second, I extend the applicability of cognitive pupillometry in visual tasks by
developing fixation-aligned averaging methods to handle the unpredictability of visual
attention, and by demonstrating the measurement of cognitive load during visual
search and map reading. I describe the methods used to accomplish these results,
including experimental protocols and data processing methods to control or correct
for various non-cognitive pupillary reflexes and methods for combining pupillometry
with eye tracking. I present and discuss a new finding of a cognitive load advantage
to visual presentation of simple arithmetic and memorization tasks.
Acknowledgements
Advisors and committee. I am grateful to my advisors Pat Hanrahan and Barbara
Tversky. Together, they made it possible for me to stand with a foot in computer
science and a foot in cognitive psychology without slipping. Both have wide expertise
and their perspectives have been an invaluable guide to my research. Pat has stuck by
me through several thesis topic changes and funding droughts, and Barbara has pro-
tected me from sloppy psychological thinking and overbroad experimental ambitions.
Together with Pat and Barbara, Scott Klemmer provided valuable feedback on this
dissertation, and with Jeff Heer and Roy Pea, helped me during my oral defense to see
the broader implications of this work. I am grateful to Manu Kumar, who worked
hard to get the eye tracker for our lab and turned me on to its rich experimental uses.
I am also grateful to the rest of the Stanford Graphics Lab faculty and the profes-
sors in the department with whom I have taught, for showing me how to do all of the
things that professors do. I am grateful to Kathi DiTommaso, Meredith Hutchin, and
the rest of the department staff who helped me navigate the paperwork of graduate
school and find so many opportunities to teach.
Family and friends. Above all, I am grateful to my wife Sophie, for her ceaseless
love, support, and encouragement. My parents and my brother have also been won-
derfully encouraging. Dave Akers enriched our office and kept me going through the
toughest times of grad school. I am deeply thankful to him and my other close friends
and fellow students at Stanford, who made my years at Stanford some of the hap-
piest of my life: Augusto Roman, Doantam Phan, Daniel Horn, Kayvon Fatahalian,
Dan Morris, and all the g-slackers, gates-poker folks, and Christmas decoration over-
achievers.
Funding. This work was funded by the Stanford Regional Visual Analytics Center,
through the U.S. Department of Energy's Pacific Northwest National Laboratory.
Portions of this research were supported by NSF grants HHC 0905417, IIS-0725223,
IIS-0855995, and REC 0440103. Our eye tracker was funded by the Stanford MediaX
project and the Stanford School of Engineering. My graduate studies were also funded
by a National Science Foundation Graduate Research Fellowship and by the John and
Kate Wakerly Stanford Graduate Fellowship.
Contents
Abstract
Acknowledgements
1 Introduction
2 Background
2.1 Cognitive load defined
2.2 Past cognitive pupillometry research
2.2.1 Cognitive psychology
2.2.2 Human-computer interaction
2.3 Cognitive pupillometry
2.3.1 Infrared video pupillometry
2.3.2 Cognitive pupillometry uses eye trackers
2.3.3 Types of video eye trackers used for pupillometry
2.3.4 Advantages of remote imaging
2.3.5 Relative scales and the need for trial aggregation
3 Remote eye tracker performance
3.1 Motivation
3.2 Study description
3.2.1 Evaluated instrument
3.2.2 Reference instrument
3.2.3 Procedure
3.3 Metrology
3.3.1 Pupil diameter metrology
3.3.2 Pupil dilation metrology
3.4 Conclusion
4 Replication of classic pupillometry results
4.1 Digit span memory
4.1.1 Background
4.1.2 Study description
4.1.3 Results
4.2 Mental multiplication
4.2.2 Results
4.3 Vigilance
4.3.2 Results
4.4 Conclusion
5 From auditory to visual
5.1 The need to use visual stimuli
5.2 Controlling for non-cognitive pupillary motions
5.2.1 Pupillary light reflex
5.2.2 Luminance changes caused by shifting gaze
5.2.3 Other visual causes of pupil changes
5.2.4 Pupillary blink response
5.3 Visual replication of classic auditory studies
5.3.1 Procedure
5.3.2 Stimuli
5.3.3 Digit sequence memory
5.3.4 Mental multiplication
5.3.5 Vigilance
5.3.6 Discussion
6 Combining gaze data with pupillometry
6.1 The usefulness of gaze data
6.2 Fixation-aligned pupillary response averaging
6.2.1 Identifying subtask epochs using patterns in gaze data
6.2.2 Aligning pupil data from selected epochs
6.2.3 Averaging
6.3 Example applications
6.3.1 Visual search
6.3.2 Map legend reference
6.4 Conclusions
7 Unsolved problems
7.1 Current limitations
7.1.1 Simple, short tasks
7.1.2 Restrictions on task display
7.1.3 Restrictions on interaction
7.2 Future research
7.2.1 Disentangling various pupillary influences
7.2.2 Combining pupillometry with other psychophysiological measurements
7.2.3 Modeling and compensating for the pupillary light reflex
7.2.4 Expanding proof-of-concept studies
A Experimental Methods
A.1 Participants
A.2 Apparatus
A.3 Physical setup
A.3.1 Room illumination
A.4 Data processing
A.4.1 Smoothing
A.4.2 Perspective distortion
A.4.3 Data processing for statistical evaluation of differences in dilation magnitude
A.4.4 Significance tests
A.4.5 Baseline subtraction
A.4.6 Averaging
List of Tables
3.1 Breakdown of the lighting and task conditions used to induce pupil states and movements between double measurements
3.2 Breakdown of the diameter precision results for the eye tracker by study participant and eye
3.3 Summary of the Tobii 1750's pupillometric performance
List of Figures
1.1 Scan path on the Stanford parking map
2.1 Tobii 1750 eye tracker and highlighted pupil image
2.2 Chin-rest and head-mounted style eye trackers
2.3 Two off-the-shelf remote eye tracking systems
2.4 Sources of variation in measurements of pupil diameter
3.1 Instruments used in the metrology study
3.2 Arrangement of equipment in the metrology study
3.3 Scatterplot of simultaneous measurements taken during the metrology study
4.1 Participant fields of view during auditory experiments
4.2 Comparison to classic auditory digit span result
4.3 Comparison to classic auditory mental multiplication result
4.4 Auditory vigilance pupil trace
5.1 Pupillary blink response for blinks of length 0.1 sec
5.2 Pupillary reaction to auditory vs. visual presentation of the digit span task
5.3 Pupillary reaction to auditory vs. visual presentation of the mental multiplication task
5.4 Pupillary reaction to sequential vs. simultaneous visual presentation of the mental multiplication task
5.5 Pupillary reaction to mental multiplication problems of varying difficulty
5.6 Pupillary reaction to auditory vs. visual presentation of the vigilance task
5.7 Pupillary reaction to vigilance moments that require reactions
5.8 Performance on simple tasks by auditory vs. visual task presentation
6.1 Illustration of epoch alignment via temporal translation followed by averaging
6.2 Illustration of piecewise linear warping applied to a single epoch of pupil diameter data defined by four gaze events
6.3 Illustration of epoch alignment via piecewise linear warping followed by averaging
6.4 A fragment of a search field used in my visual search study
6.5 Pupillary response to visual search fixations on targets vs. non-targets
6.6 Pupillary response to visual search discovery vs. revisit fixations
6.7 Pupillary responses to visual search target discovery order
6.8 Gaze trace illustrating a legend reference in the map reading task
6.9 Pupillary response to legend references
A.1 Arrangement of experimental equipment
A.2 Illustration of data cleaning steps
A.3 Left-right correlation of pupil size by frequency component
Chapter 1
Introduction
Every active intellectual process, every psychical effort, every exertion
of attention, every active mental image, regardless of content, particularly
every affect just as truly produces pupil enlargement as does every sensory
stimulus.
Oswald Bumke, 1911 [21]
Visualizations and visual interfaces work well when they make good use of people's
perceptual and cognitive abilities. When patterns in data are mapped to visual forms
in which those patterns are easily explored and apprehended, visualizations expand
human capabilities. But unfortunately, many visualizations and visual interfaces
overwhelm us, overloading our perceptual and cognitive resources instead of using
them efficiently. Figure 1.1 illustrates an example of the confusion that can be caused
by an overwhelming amount of detail.
The experience of being overloaded or overwhelmed is captured in part by the
psychological concept of cognitive load. That people have limited and measurable
cognitive capacities has been known and studied for decades [79], and experimental
psychologists have developed many experimental methods to measure the cognitive
load imposed by various tasks.
Figure 1.1: Scan path of somebody looking for visitor parking near the Clark center on
the Stanford campus. Circles along the path show fixations, and their area indicates
how long each fixation lasted. The person solving this problem spent a lot of time
looking at places in the map and on the legend which were irrelevant to the task.
The legend is so long that this person needed to refer to it several times. The map,
being designed for general-purpose way-finding, contains so much detail that the
information relevant to one particular task is hard to find.
Assessing the cognitive load imposed by visual tasks is important to the design
of cognitively efficient visual interfaces. Most interfaces are visual, and many require
people to shift attention between a variety of tasks with varying loads on perception,
attention, memory, and information processing. The psychophysiological study of
cognitive load in this context requires a physiological proxy which responds to load
quickly and reliably reflects small differences in load. One such proxy is the tendency
of the pupils to dilate slightly in response to cognitive loads.
The use of pupillometry for studying the cognitive load of visual tasks is complicated
by the pupil's responsiveness to brightness and other features of visual stimuli.
Nevertheless, pupillometry has many advantages. First, the quick reactivity of pupils
(about 100 ms) enables study of the detailed, moment-by-moment timecourse of cog-
nitive load during visual tasks. Second, pupil diameter can now be measured using
high-speed remote infrared cameras, without chin rests, bite bars, or head-mounted
equipment [63], making it the least invasive of all psychophysiological proxies for
cognitive load. Finally, pupil measurements are recorded as a side effect by most eye-
trackers, so it is convenient to collect pupil diameter data, and such data are usually
synchronized to gaze direction measurements, enabling the study of how cognitive
load is related to the locus of attention.
Measuring cognitive load can add depth to our understanding of visualization
performance in a way that goes beyond time and errors [88]. Two people may display
equal completion times or error rates on a task while devoting different levels of mental
effort. An interface that allows people to achieve the same task performance, including
completion time or error rate, with less effort than another is superior, because it frees
the user to devote more attention to higher level tasks such as hypothesis formulation
and pattern finding.
Current cognitive pupillometry methods, however, have several limitations that
make them difficult to apply to visualizations and visual interfaces. These
limitations include:
1. Cognitive pupillometry requires a camera fixed to the head, which is inconvenient
and can interfere with task performance.
2. Pupil dilations and contractions caused by the visual field, especially the pupillary
light reflex, interfere with measurement of task-evoked pupil dilations.
3. Visual tasks are complicated, with many overlapping subtasks that occur with
unpredictable timing, precluding the time alignment of data from multiple task
instances required to detect task-evoked pupil dilations.
4. Pupil motions caused by motor activity confound task-evoked dilations, limiting
the use of interaction in tasks studied with cognitive pupillometry.
This dissertation expands the scope of cognitive pupillometry by addressing the
first three of these limitations, enabling the use of trial-averaged pupillometry to
measure cognitive load during simple visual tasks. Specifically, I
1. establish the viability of cognitive pupillometry using remote cameras, by measuring
the pupillometric precision and accuracy of a remote video eye tracker
(chapter 3) and by replicating classic cognitive pupillometry results using a
remote eye tracker (chapter 4); and
2. extend the applicability of cognitive pupillometry in visual tasks, by repeating
standard auditory cognitive load experiments using visual stimuli (chapter 5),
by developing fixation-aligned averaging methods to handle the unpredictability
of visual attention (chapter 6), and by demonstrating the measurement
of cognitive load in several visual tasks (section 6.3).
In chapter 2, I give the background context for these contributions, including
a definition of cognitive load (section 2.1), a survey of past cognitive pupillometry
research (section 2.2), and a summary of the current state of the art in cognitive
pupillometry (section 2.3).
Most of the content of this dissertation is also published in conference proceedings
and journals [61, 62, 63, 64]. I am the primary or sole author of all these papers.
My advisors Pat Hanrahan and Barbara Tversky provided guidance on experimental
design throughout my research, and Barbara helped me in particular with the presen-
tation and context of the results comparing aural and visual presentation of simple
tasks (chapter 5). Rakshit Kumar implemented the frequency-space analysis of the
correlation in dilations of the left and right pupils (subsection A.4.1).
Chapter 2
Background
Summary
I discuss the operational definition of the term "cognitive load" and various physiological
proxies that have been used to measure it. I briefly review the history of
cognitive pupillometry in psychology and human-computer interaction research. I
describe how cognitive pupil dilations are measured using infrared imaging and task-
aligned averaging of pupil diameter measurements.
2.1 Cognitive load defined

This dissertation is about new methods for measuring cognitive load. It is therefore
appropriate to begin with a careful definition of that term.
Psychologists have long used the physical analogies of "cognitive load" [88],
"processing load" [12], or "effort" [56] to describe mental states during problem solving.
Such physical metaphors are justified by findings that people have a limited capacity
for cognitive tasks [79] and the fact that engaging in one mental task interferes with
one's ability to engage in others [56]. As with other vague psychological concepts,
cognitive load gains definition through the experimental methods that are used to
measure it. Dual-task experiments are the most commonly used operationalization
of cognitive load, but for contexts where task interference causes problems, it is also
possible to use a variety of physiological proxies.

Electroencephalography (EEG) and magnetoencephalography (MEG) measure
changes in electric potentials and magnetic fields, respectively, at the scalp, caused
by changing electrical currents in brain neurons. The main strength of these
techniques is their millisecond-level time precision [37]. Brain imaging techniques
based on the brain's consumption of glucose
(via PET scanning) or oxygen (via functional MRI) provide a more delayed response
to cognition but enable 3D localization of brain activity with millimeter-level spatial
precision, which has led to their widespread use in functional neurology and neu-
roanatomy [7, 45].
Because increased cortical activity causes a brief, small autonomic nervous re-
sponse, techniques measuring non-neural secondary effects of this response are also
used as a proxy for cognitive load. Such physiological effects include electrodermal
activity [2], small variations in heart rate [103, 38], blood glucose [29, 107], peripheral
arterial tone [48], electrical activity in facial muscles [16], the details of eye movements
[76, 117, 84], and small dilations of the pupil, the focus of this dissertation.
There is neurological justification for using these physiological proxies as measures
of cognitive load, but in psychology, their operational justification comes from a
body of findings associating them with differences in task difficulty and differences in
individual task performance. Kahneman favored pupil dilations in his effort theory of
attention because they exhibit sensitivity to three variables which should be expected
of a proxy for effort: differences in difficulty grades within a single task, differences
in difficulty between different kinds of task, and differences in individual ability [12].
Because this operational approach defines cognitive load as measurements of
physiological proxies, based on desirable experimental properties of the proxies, the
question of what the vague term "cognitive load" means does not arise. Pupil dilations
elicited by tasks are cognitive load.
2.2 Past cognitive pupillometry research

For full reviews of cognitive pupillometry research, see Goldwater [33], Beatty [12],
Beatty and Lucero-Wagoner [13], Andreassi [3, ch. 12].
2.2.1 Cognitive psychology
The earliest references to cognitive pupil dilations I am aware of are in the late
19th-century German neurology literature [104, 40, cited by Beatty and Lucero-Wagoner 13],
though no work appeared in English until the publication in Science of two articles by
Eckhard Hess [43, 44]. In a series of experiments, Hess found strong pupillary dilations
in response to emotions such as interest [43], disgust [42], and sexual arousal [42].
Among the earliest tasks used to validate pupillary dilations as an index of cognitive
load was mental arithmetic, especially mental multiplication. Simple multiplication
tasks were part of Hess and Polt's early experiments [44], and the task has since
been used to show that pupil dilations also reflect individual differences in mental
multiplication skill [1, also see section 4.2]. Performance on mental multiplication is
believed to depend strongly on working memory [4], and similar results have been
found for a broad set of short-term recall tasks. Kahneman found that the size of
pupillary dilations directly reflects the current load on working memory in simple
tasks requiring short term retention of a sequence of digits [57, 58], a result that
has since been replicated and extended to other short-term recall tasks [e.g. 32, also
see section 4.1]. Beatty and Kahneman [9] also observed dilations in response to
long-term memory retrieval tasks, interpreting them as a reflection of information
being retrieved from long-term memory and placed in short-term memory in preparation for
response.
Pupillary dilations have also been shown to be a reliable indicator of cognitive
load in tasks that do not depend on working memory, such as vigilance [11, 77] and
perceptual tasks [96]. Pupillometric studies of pitch discrimination [54, 106] and visual
threshold flash detection [36] were used to show that these processes are essentially
data-limited rather than resource-limited. With careful brightness controls, the effect
has been used to study how performance on line length discrimination is related to
general intelligence [119]. In the area of reading and language comprehension, pupil
dilations have provided insight into many levels of processing from low-level character
recognition [10] up to complex sentence comprehension [53] and language translation
[47].
2.2.2 Human-computer interaction

Recently, several human-computer interaction research groups have begun to use the
pupillometric capability of head-mounted video eye trackers to measure cognitive
load. Marshall [75] applied a wavelet decomposition to the pupil size signal in order
to estimate the average number of abrupt discontinuities in pupil size per second, and
used this measure as a general index of cognitive activity. Pomplun and Sunkara [95]
described how to correct bias in observed pupil size based on gaze direction. Moloney
et al. [81] used differences in pupil responses to distinguish older, visually impaired
subjects from younger, visually healthy control groups performing a drag-and-drop
task. Iqbal et al. [51] applied eye tracker pupillometry to show that mental workload
drops at task boundaries in a multi-step task and can be used as an indicator of
interruptibility.
2.3 Cognitive pupillometry
2.3.1 Infrared video pupillometry

The most popular method for measuring pupil diameter is with video cameras under
infrared illumination. When infrared light shines into the eye near the optical axis of
the camera, it reflects off the retina much more efficiently than the cornea, causing
the pupil to light up brightly relative to the iris [28] (see Figure 2.1). This is the same
effect responsible for red-eye artifacts in flash photography.
In images captured under this illumination, the bright oval of the pupil is easy to
segment and measure. The number of pixels spanned by this oval is converted into
a measurement of pupil diameter via a foreshortening division that accounts for the
distance between the pupil and the camera.
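The foreshortening division can be sketched in a few lines. This is only an illustration under a simple pinhole-camera assumption, not the computation used by any particular eye tracker; the function name and the parameters `focal_length_mm` and `pixel_pitch_mm` are hypothetical and do not come from the text.

```python
def pupil_diameter_mm(pupil_span_px, distance_mm, focal_length_mm, pixel_pitch_mm):
    """Estimate physical pupil diameter from its span in the camera image.

    Assumption (not from the source): a pinhole camera, in which an object
    at distance Z is imaged at a scale of f / (Z * pixel_pitch) pixels per
    millimeter. Dividing the pixel span by that scale undoes the
    foreshortening caused by distance.
    """
    if distance_mm <= 0 or focal_length_mm <= 0 or pixel_pitch_mm <= 0:
        raise ValueError("distance, focal length, and pixel pitch must be positive")
    pixels_per_mm = focal_length_mm / (distance_mm * pixel_pitch_mm)
    return pupil_span_px / pixels_per_mm


# Hypothetical numbers: a 40-pixel pupil image seen from 600 mm through a
# 50 mm lens onto a sensor with 6-micron pixels.
near = pupil_diameter_mm(40, 600.0, 50.0, 0.006)
# The same pupil seen from twice the distance spans half as many pixels,
# and the division recovers the same physical diameter.
far = pupil_diameter_mm(20, 1200.0, 50.0, 0.006)
```

With these illustrative values both calls give a diameter of about 2.88 mm, which shows why an error in the assumed camera-pupil distance translates directly into an error in measured diameter.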
2.3.2 Cognitive pupillometry uses eye trackers

In most modern studies, pupil measurements are made using equipment designed
primarily for eye tracking, the measurement of the direction of a person's gaze. Eye
tracking requires high resolution imaging of the eye and often involves infrared
illumination to aid in locating the center of the pupil. Extending such systems to measure
pupil diameter is relatively easy, and off-the-shelf eye tracking systems today compute
pupil diameter routinely. Because gaze tracking involves locating the center of the
pupil, high precision eye trackers tend also to be high precision pupillometers.
Figure 2.1: (a) Tobii 1750, with its infrared lights and camera labeled; (b) pupil image.
The eye tracker illuminates the participant's eyes using several infrared
LEDs which surround the screen. The infrared light reflects efficiently off the
participant's retinas, causing their pupils to appear very bright in the image recorded by
the infrared camera mounted at the bottom of the screen. The image on the right is
an illustration of this effect based on a visible-light photograph of an eye [110].
2.3.3 Types of video eye trackers used for pupillometry

Head-fixed camera pupillometry
High precision measurements of pupil diameter depend on a setup in which the pupil
spans many pixels in the camera image. This is most easily achieved by placing the
camera close to the eye and fixing its position relative to the head, giving a large
pupil image and avoiding any foreshortening errors caused by head motion after the
initial calibration. There are two types: head-mounted cameras, and large table-top
systems with chin rests or bite bars used to immobilize the head (Figure 2.2).
Head-mounted systems are the most popular kind of eye tracker and are used commonly
for pupillometry. Table-top systems with bite bars or other means of preventing head
motion are the most precise video eye trackers available and so also provide the most
precise pupil measurements.
Figure 2.2: Typical eye trackers used for cognitive pupillometry. The eye tracker on
the left is an SMI iView X chin-rest style instrument, used primarily for reading and
other high-precision applications [108]. The eye tracker on the right is the Polhemus
VisionTrak Standard Head Mounted Eye Tracking System [93], used for mobile
applications, especially driving and piloting.
Remote camera pupillometry
The alternative to configurations with a fixed camera-pupil distance involves a
remote camera, not fixed with respect to the participant's head. Cameras are typically
located on the desktop or mounted at the bottom of the field of vision, because the
view of the eyes from below is occluded by eyelids less often than the view from
above. Figure 2.3 shows two off-the-shelf remote eye tracking systems. Because the
camera is usually located 50-100 cm from the eye rather than the 10 cm or less used
in chin-rest and head-mounted systems, these systems measure the pupil with lower
precision. In addition, the freedom of head motion requires these systems to estimate
the camera-pupil distance for each frame separately in order to implement the
foreshortening division. They do this by tracking the 3D position of both eyes with
respect to the camera, based on the positions of specular highlights on the surface of
the eye caused by the same infrared LEDs used to illuminate the retina.
Figure 2.3: Two off-the-shelf remote eye tracking systems. (a) MangoldVision Eye
Tracker, set up for use with a standard desktop computer [73]. (b) Interactive Minds
binocular Eyegaze Analysis System [49], with two cameras mounted at the bottom of a
computer display on motor-controlled gimbals to actively point directly at
participants' eyes.
Calibration errors in eye trackers can lead them to provide biased measurements
of absolute pupil size. This problem is worse in remote camera systems, but because
this bias is stable over time, measurements of relative pupil size are unbiased (see
chapter 3). This better performance for relative pupil size is what matters for cog-
nitive pupillometry, where the measurement of interest is usually changes in pupil
diameter relative to their diameter at the end of an accommodation period preced-
ing each trial [13]. Such dilation magnitudes have been found to be independent
of baseline pupil diameter and commensurate across multiple labs and experimental
procedures [12, 20, 19].
2.3.4 Advantages of remote imaging

All trial-aggregated cognitive pupillometry research I am aware of has been done using
head-fixed cameras. I believe that researchers have made this choice for the better
precision of head-fixed configuration. Equipment of this type is known to work and
broadly available, so there has been no incentive to validate alternative measurement
equipment. Although remote cameras are not as precise at measuring pupils, they
offer some important advantages.
Some applications require remote imaging
There are some applications which require remote, free-head eye tracking or pupil-
lometry, such as studies with infants [22] or investigations of small changes in anxiety,
distraction, or mental effort [97], where head-mounted equipment can interfere with
the effects being measured. Marshall reported that some of her experimental sub-
jects were bothered by wearing a head-mounted eye tracker, and that this may have
distorted some of her results [75].
With remote eye tracking, the lack of head-mounted equipment and obvious
screen-mounted cameras makes using an instrumented computer almost indistinguish-
able from normal desktop computer use. It is very easy for users to fall into their
usual habits and behave normally.
Remote imaging is becoming ubiquitous and cheap
The eye tracking industry is still small, with most eye trackers costing more than
$10,000 and marketed for research or disability applications. However, many manu-
facturers plan to move into mass-market eye tracking as soon as camera technology
with sufficient resolution for eye tracking becomes cheap enough. Many laptop mod-
els currently integrate screen-mounted cameras. All that is needed to implement
mass-market remote eye tracking is higher imaging resolution and perhaps an infra-
red light source, technologies which will become cheaper with time. To serve this
future of low-overhead eye tracking, many researchers are developing calibration-free
or minimal-calibration eye tracking methods [87, 39].
Mass-market gaze tracking will enable many new interactive and data collecting
applications. For cognitive pupillometry to ride this wave of deployment, it will need
to work with remote imaging.
[Figure 2.4 diagram labels: cognitive load; autonomic nervous response; pupil diameter; data; cognition, emotion; brightness, contrast, etc.; measurement noise.]
Figure 2.4: Sources of variation in measurements of pupil diameter.
2.3.5 Relative scales and the need for trial aggregation
The magnitude of workload-related pupil dilations is usually less than 0.5 mm, smaller
than the magnitude of other simultaneously ongoing pupil changes caused by light
reflexes, emotions, and other brain activity, which collectively cause a constant vari-
ation in pupil size over a range of a few millimeters (see Figure 2.4). This difference
in magnitude between dilations related to cognitive workload and the background
pupil variability makes it impossible to distinguish the pupillary response to any
one instance of increased cognitive load from the background noise of other pupil
changes. In order to measure task-induced pupil dilations it is necessary to combine
measurements from several repetitions of the task.
One way to address this measurement challenge is to record pupil diameter during
a long period of time that includes many task instances or repetitions, then either
average pupil size over that long period [95], find consistent short-timescale changes
via wavelet transforms [75], or apply frequency-domain analysis [84, 81] to assess
aggregate cognitive load during that long task.
An alternative to this aggregation technique allows the measurement of differences
in cognitive load at a time scale of fractions of a second rather than minutes. This
precision is achieved by measuring pupil size during many repetitions of the same
short task, then aligning windows of pupil measurements temporally at the moment
of task onset and averaging them [13]. The averaging operation will preserve any
component of the pupil size signal which is correlated in time with the onset of the
task (the task-evoked response), while measurement noise and other variation in pupil
size not correlated in time with the stimulus will tend to average to zero. As more
trials are conducted and included in the average, the ratio achieved between the level
of the signal (pupillary responses caused by the task) and the noise (all other pupillary
motions) becomes larger, and the time resolution of the average signal improves.
All pupil dilation measurements reported in this dissertation are based on this
trial-averaging method. The full details are described in section A.4.
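The averaging step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of section A.4: the function name `trial_average`, the representation of onsets as sample indices, and the use of the mean of the pre-onset samples as the baseline are all choices made here for illustration.

```python
from statistics import mean


def trial_average(pupil, onsets, pre, post):
    """Average windows of a pupil-diameter signal aligned at task onsets.

    pupil  -- list of pupil diameters (mm), one per sample
    onsets -- sample indices at which each task repetition begins
    pre    -- samples kept before each onset (assumed baseline window)
    post   -- samples kept from each onset onward
    """
    if pre < 1:
        raise ValueError("need at least one pre-onset sample for the baseline")
    windows = []
    for t in onsets:
        if t - pre < 0 or t + post > len(pupil):
            continue  # skip trials whose window falls outside the recording
        window = pupil[t - pre:t + post]
        baseline = mean(window[:pre])
        # Express each sample as a change relative to the pre-onset baseline.
        windows.append([d - baseline for d in window])
    if not windows:
        raise ValueError("no complete trials to average")
    # Average across trials at each time offset relative to task onset.
    return [mean(samples) for samples in zip(*windows)]
```

Components of the signal that are time-locked to the onsets survive the final averaging step, while variation uncorrelated with the onsets tends to cancel as more trials are included.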
Chapter 3
The pupillometric precision of a remote eye tracker
Summary
This chapter describes a metrological study in which I determined the pupillometric
precision of the Tobii 1750 remote eye tracker and a set of experiments in which I
replicated classic cognitive pupillometry experiments performed originally on fixed-
head equipment, to demonstrate that a remote-imaging eye tracker can successfully
be used for cognitive pupillometry. Most of the content of this chapter was published
at the 2010 Symposium on Eye Tracking Research & Applications [62].
3.1 Motivation
Most eye trackers used in cognitive pupillometry use head-mounted cameras or chin
rests, because a fixed camera-pupil distance enables high pupillometric precision. In
contrast, remote eye trackers use cameras placed further from and not fixed to the
subject's head. As a result, remote eye trackers devote fewer pixels to each pupil
and must correct for variations in the camera-pupil distance and therefore exhibit
worse pupillometric precision. However, because of experimental advantages of remote
imaging, and because remote imaging is becoming ubiquitous and cheap, there is a
need to know whether and how well cognitive pupillometry can be done using remote
imaging.
The pupillometric performance of remote eye trackers is not well known, because
this equipment is generally only used for eye tracking and not for pupillometry. Man-
ufacturers do not currently optimize designs for pupillometric performance, and rarely
document pupillometric performance in eye tracker specifications.
Quantifying the precision of remote pupillometry is important, to establish the
measurement feasibility of the equipment and to guide equipment choices and deter-
mine the number of participants and trials required to measure a given magnitude
pupillary response using a remote eye tracker. In order to determine the pupillometric
performance of the eye tracker I used in my research, I conducted a formal metrolog-
ical study with respect to a calibrated reference instrument, a medical pupillometer.
3.2 Study description
3.2.1 Evaluated instrument

I evaluated the pupillometric performance of the Tobii 1750 remote video eye tracker
[114], shown in Subfigure 3.1(a). This is the eye tracker I used for all the experiments
described in this dissertation.
The Tobii 1750 measures the size of a pupil by fitting an ellipse to the image of
that pupil under infrared light, then converting the width of the major axis of that
ellipse from pixels to millimeters based on the measured distance from the camera to
the pupil. According to Tobii, errors in this measurement of camera-pupil distance
cause measurements of pupil diameter to have errors of up to 5% for fixed-size pupils
[113].
This 5% figure is a good start, but for guiding experimental design, we need
to extend it by a) distinguishing bias and precision components of the error, and
b) determining the average-case, rather than worst-case performance, because it is
usually the averages of many repeated pupil measurements which are used to quantify
task-evoked pupillary responses [13].
3.2.2 Reference instrument

The reference instrument is a Neuroptics VIP-200 ophthalmology pupillometer, shown
in Subfigure 3.1(b). The Neuroptics VIP-200 records two seconds of video of the pupil,
then reports the mean and standard deviation of the pupil's diameter over those two
seconds.
The manual for the Neuroptics VIP-200 reports its accuracy as 0.1 mm or 3%,
whichever is larger [85]. I asked Neuroptics for clarification and learned:
The accuracy reported in the manual is the maximum bound of the

error and it refers to the possible error of each single frame during the two
seconds measurement. The mean reported by the device is evaluated over
all frames; in the hypothetical case that the pupil does not fluctuate, yes,
this should result in a better accuracy. However, the pupil is always char-
acterized by a level of neurophisiological [sic] unrest and the two seconds
mean serves to eliminate the effect of this unrest in the determination of
the pupil size [86].
If measurement errors are normally distributed and we make the conservative
assumptions that a) "maximum bound" means two standard deviations and b) errors
within a two-second window are perfectly correlated, giving us no reduction in error
from the averaging, we get a reference instrument precision of about 0.05 mm. Since
this measurement error is caused in part by imaging noise which is independent for
each frame, the pupillometer's true precision is probably better. Neuroptics calibrated
the VIP-200 to zero bias when it was manufactured.
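As a worked version of this estimate, the following assumes that the 0.1 mm figure from the manual is the bound being halved (the 3% alternative is not used in this sketch), and writes $\sigma[\epsilon_{pm}]$ for the standard deviation of the pupillometer's measurement error:

```latex
\[
2\,\sigma[\epsilon_{pm}] \approx 0.1\ \text{mm}
\quad\Longrightarrow\quad
\sigma[\epsilon_{pm}] \approx \frac{0.1\ \text{mm}}{2} = 0.05\ \text{mm}
\]
```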
3.2.3 Procedure
Three volunteers participated in the metrology study, which took place in an eye
clinic exam room. After a pilot study of 56 measurements to refine the measurement
and data recording procedure, I conducted a main study of 336 double measurements
in which I measured participants' pupils using the eye tracker and the pupillometer
simultaneously. Because the pupillometer covers the eye it measures, I could not
conduct simultaneous measurements of the same eye using both instruments, so for
each double measurement, the pupillometer measured one of the participant's pupils
while the eye tracker measured the other (Figure 3.2). The metrological validity of
this study is therefore based on the strong correlation between the diameters of the
left and right pupils [68]. Measurements taken with the eye tracker were averages
over the 100 camera frames gathered in the same two-second measurement window
used by the reference pupillometer.
Figure 3.1: The eye tracker and reference pupillometer. (a) Tobii 1750. (b) Neuroptics VIP-200.
The measurements were conducted under various lighting conditions so that they
would span a variety of pupil states: half under normal room lighting and half under
dim lighting, where a third of the time I switched the lights on or off during the
few seconds between successive double measurements (see Table 3.1). In all trials,
subjects looked at a small fixation target at the center of the eye tracker's screen,
which was otherwise filled with 64 cd/m² medium gray. I excluded 120 measurements
in which I did not get a clean reading with the pupillometer and 10 measurements in
which I did not get a clean reading with the eye tracker, leaving 206 successful double
measurements, analyzed below.
Figure 3.2: Metrology study arrangement. An investigator is measuring the participant's
left pupil using the reference pupillometer while the eye tracker simultaneously
measures his right pupil.
3.3 Metrology
I present two different metrological analyses of these double measurements: the first,
based on pupil diameters, is simpler and can use all of the data but is limited by strong
assumptions. The second analysis, based on dilations, uses weaker assumptions but
is restricted to a subset of the available data.
3.3.1 Pupil diameter metrology
For both instruments, I model the measurement error as being additive and normally
distributed:
\[ \hat{\pi} = \pi + \epsilon, \qquad \epsilon \sim N(\mu, \sigma), \]
where $\pi$ is the diameter of the pupil, $\hat{\pi}$ is the measurement of that diameter, and $\epsilon$ is
the measurement error. $\hat{\pi}$ and $\epsilon$ are random variables that take on new values for each
measurement. Each instrument's bias is the fixed component of the measurement
error $\mu$, its accuracy is the magnitude of the bias $|\mu|$, and its precision is the standard
deviation of the measurement error $\sigma$. For the reference pupillometer (pm), $\mu[\epsilon_{pm}] = 0$ mm
and $\sigma[\epsilon_{pm}] = 0.05$ mm, according to information provided by its manufacturer.
For the eye tracker (et), the parameters of the measurement error distribution $\mu[\epsilon_{et}]$
(bias) and $\sigma[\epsilon_{et}]$ (precision) are what we are trying to determine.
We can estimate these parameters by analyzing the differences between simultaneous
measurements made with the eye tracker and the pupillometer:
\begin{align*}
\hat{\pi}_{et} - \hat{\pi}_{pm} &= (\pi_{et} + \epsilon_{et}) - (\pi_{pm} + \epsilon_{pm}) \\
&= \pi_{et} - \pi_{pm} + \epsilon_{et} - \epsilon_{pm}
\end{align*}
Trial type                          Pupil reaction       Measurement timing                                      Num. successful double measurements
bright static lighting              stable wide          one double measurement                                  75
dim static lighting                 stable narrow        one double measurement                                  41
lights turned off during the trial  reflex dilation      double measurements before and after lighting change    36
lights turned on during the trial   reflex constriction  double measurements before and after lighting change    28
digit span memory task              cognitive dilation   double measurements before memorization and during      26
                                                         the retention pause (maximum load on memory)
Table 3.1: Breakdown of the lighting and task conditions used to induce pupil states and movements between
double measurements.
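This is an equation of random variables. Considering the variance of each side:
\begin{align*}
\sigma^2[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \sigma^2[\pi_{et} - \pi_{pm} + \epsilon_{et} - \epsilon_{pm}] \\
\sigma^2[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \sigma^2[\pi_{et} - \pi_{pm}] + \sigma^2[\epsilon_{et}] + \sigma^2[\epsilon_{pm}] \\
\sigma^2[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \underbrace{\sigma^2[\pi_{et} - \pi_{pm}]}_{0} + \sigma^2[\epsilon_{et}] + \sigma^2[\epsilon_{pm}] \tag{3.1} \\
\sigma^2[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \sigma^2[\epsilon_{et}] + \sigma^2[\epsilon_{pm}] \\
\sigma[\epsilon_{et}] &= \sqrt{\sigma^2[\hat{\pi}_{et} - \hat{\pi}_{pm}] - \sigma^2[\epsilon_{pm}]} \tag{3.2}
\end{align*}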
The relationship in Equation 3.2 gives us a way to estimate the precision of the
eye tracker based on the known precision of the reference pupillometer $\sigma[\epsilon_{pm}]$ and the
variance in the differences between the simultaneous measurements $\sigma^2[\hat{\pi}_{et} - \hat{\pi}_{pm}]$.
Similarly, we can compute the bias of the eye tracker based on the mean of those
differences:
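\begin{align*}
\mu[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \mu[\pi_{et} - \pi_{pm} + \epsilon_{et} - \epsilon_{pm}] \\
\mu[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \mu[\pi_{et} - \pi_{pm}] + \mu[\epsilon_{et}] - \mu[\epsilon_{pm}] \\
\mu[\hat{\pi}_{et} - \hat{\pi}_{pm}] &= \underbrace{\mu[\pi_{et} - \pi_{pm}]}_{0} + \mu[\epsilon_{et}] - \underbrace{\mu[\epsilon_{pm}]}_{0} \tag{3.3} \\
\mu[\epsilon_{et}] &= \mu[\hat{\pi}_{et} - \hat{\pi}_{pm}] \tag{3.4}
\end{align*}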
Substituting the mean and variance of the actually observed differences $\hat{\pi}_{et} - \hat{\pi}_{pm}$
in Equations 3.4 and 3.2, the eye tracker's pupillometric bias is 0.11 mm, and its
precision is 0.38 mm.
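The two estimators can be written out as a short sketch. The function name `bias_and_precision` and its argument names are illustrative assumptions, as is the choice of the population variance (`pvariance`) rather than the sample variance; the text does not say which was used.

```python
from math import sqrt
from statistics import mean, pvariance


def bias_and_precision(et, pm, sigma_pm=0.05):
    """Estimate eye tracker bias (Equation 3.4) and precision (Equation 3.2).

    et, pm   -- simultaneous eye tracker and pupillometer readings (mm)
    sigma_pm -- precision of the reference pupillometer (mm)
    """
    diffs = [e - p for e, p in zip(et, pm)]
    bias = mean(diffs)  # Equation 3.4: mean of the differences
    # Equation 3.2: subtract the reference instrument's error variance.
    excess = pvariance(diffs) - sigma_pm ** 2
    if excess < 0:
        raise ValueError("difference variance is below the reference variance")
    return bias, sqrt(excess)
```

The explicit check on `excess` guards against small samples in which the observed variance of the differences happens to fall below the assumed reference variance, where Equation 3.2 has no real solution.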
These figures are misleading, however, because the bias and precision of the eye
tracker varied substantially between the three participants and between the two eyes
of each participant. Figure 3.3 shows the results of all 206 successful simultaneous
measurements and illustrates this inter-subject variation. For each eye individually,
the measurement error has a much narrower spread, but the average is wrong by as
much as 0.67 mm. The accuracy and precision vary from eye to eye because the eye
tracker's pupil measurements depend on its estimate of the camera-pupil distance,
which is affected by errors in the eye tracker's calibration to each eye's corneal shape.
[Figure 3.3 plots. Left: eye tracker measurement (mm) against pupillometer measurement (mm), with both axes running from 2.5 to 5.0 mm. Right: difference in measurements (mm) for all data and for the left and right eye of each of Participants 1, 2, and 3.]
Figure 3.3: The left graph shows the raw data of the metrology study, with each point
representing a double measurement $(\hat{\pi}_{pm}, \hat{\pi}_{et})$. Data from each participant and each
eye are plotted in a different color. The right chart shows the differences between
the eye tracker and pupillometer measurements, $\hat{\pi}_{et} - \hat{\pi}_{pm}$, broken down by study
participant and eye, showing how the eye tracker's pupillometric bias varies for each
eye.
Table 3.2 shows the result of applying Equations 3.4 and 3.2 to the data from each
eye separately. Across all six eyes, I found an average bias of 0.34 mm (worse than the
overall 0.11 mm) and an average precision of 0.12 mm (better than the overall 0.38
mm). Because it is differences in measurements for the same eye (dilations) that form
the basis of most experimental use of pupillometry [13], and because pupillometric
experiments are usually conducted with several participants, these per-eye results for
the eye tracker's bias and precision are the most relevant and are the ones summarized
in Table 3.3.
In Equation 3.3 of the derivation for accuracy, the term $\mu[\pi_{et} - \pi_{pm}]$ was assumed to
be zero. I ensured this zero mean left-right difference in pupil size by counterbalancing
which of the two eyes was measured with which instrument within the trials for each
participant.
                          eye tracker accuracy (mm)   eye tracker precision (mm)
Participant 1 left eye    0.61                        0.15
Participant 1 right eye   0.34                        0.11
mean                      0.34                        0.12
Table 3.2: Breakdown of the diameter precision results for the eye tracker by study
participant and eye.
Similarly, the cancellation in Equation 3.1 of the derivation for precision assumes
that the term $\sigma^2[\pi_{et} - \pi_{pm}]$ is zero. This assumption, that the difference in size between
participants' left and right pupils is constant throughout the study, is much stronger.
Judging from pupil data I have recorded in a variety of studies, the assumption holds
over short periods of time (a few minutes), but the left-right difference in pupil size
can sometimes drift over the 15-20 minutes it takes to make the measurements of
each participant. Violations of this assumption would lead to an underestimate of
the average error of the eye tracker. A more conservative analysis, based on differences
in short-term dilations measured by each instrument, provides an alternative estimate
of the eye tracker's precision.
3.3.2 Pupil dilation metrology
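We can determine the pupillometric precision of the eye tracker using differences in
measurements of dilations rather than absolute pupil diameters. Using $\delta = \pi_2 - \pi_1$
to denote the dilation of the pupil from time 1 to time 2 and $\hat{\delta} = \hat{\pi}_2 - \hat{\pi}_1$ to denote
the measurement of that dilation,
\begin{align*}
\hat{\delta}_{et} - \hat{\delta}_{pm} &= (\hat{\pi}_{et2} - \hat{\pi}_{et1}) - (\hat{\pi}_{pm2} - \hat{\pi}_{pm1}) \\
&= [(\pi_{et2} + \epsilon_{et2}) - (\pi_{et1} + \epsilon_{et1})] - [(\pi_{pm2} + \epsilon_{pm2}) - (\pi_{pm1} + \epsilon_{pm1})] \\
&= (\pi_{et2} - \pi_{et1}) - (\pi_{pm2} - \pi_{pm1}) + \epsilon_{et2} - \epsilon_{et1} + \epsilon_{pm1} - \epsilon_{pm2} \\
&= (\delta_{et} - \delta_{pm}) + \epsilon_{et2} - \epsilon_{et1} + \epsilon_{pm1} - \epsilon_{pm2} \tag{3.5}
\end{align*}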
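As before, now considering the variance of the random variables on each side of
Equation 3.5:
\begin{align*}
\sigma^2[\hat{\delta}_{et} - \hat{\delta}_{pm}] &= \sigma^2[(\delta_{et} - \delta_{pm}) + \epsilon_{et2} - \epsilon_{et1} + \epsilon_{pm1} - \epsilon_{pm2}] \\
&= \underbrace{\sigma^2[\delta_{et} - \delta_{pm}]}_{0} + \sigma^2[\epsilon_{et2}] + \sigma^2[\epsilon_{et1}] + \sigma^2[\epsilon_{pm1}] + \sigma^2[\epsilon_{pm2}] \tag{3.6} \\
&= \sigma^2[\epsilon_{et2}] + \sigma^2[\epsilon_{et1}] + \sigma^2[\epsilon_{pm1}] + \sigma^2[\epsilon_{pm2}] \\
&= 2\sigma^2[\epsilon_{et}] + 2\sigma^2[\epsilon_{pm}] \tag{3.7} \\
\sigma^2[\epsilon_{et}] &= \tfrac{1}{2}\sigma^2[\hat{\delta}_{et} - \hat{\delta}_{pm}] - \sigma^2[\epsilon_{pm}] \\
\sigma[\epsilon_{et}] &= \sqrt{\tfrac{1}{2}\sigma^2[\hat{\delta}_{et} - \hat{\delta}_{pm}] - \sigma^2[\epsilon_{pm}]} \tag{3.8}
\end{align*}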
The cancellation in Equation 3.6 is based on the assumption that the difference
between the left eye's dilation and the right eye's dilation is constant over a short
period of time. I observed this fact in an earlier study conducted on lateralized
pupillary responses, in which I tried several stimulus-based ways of inducing different
dilations in subjects' two pupils but never succeeded in causing any significant
left-right differences. I abandoned the effort after learning that the neuroanatomy of
pupil size regulation renders such differences extremely unlikely [68]. Step 3.7 relies
on the assumption that the bias of the measurement error is stable over time for both
instruments ($\mu[\epsilon_{pm1}] = \mu[\epsilon_{pm2}]$ and $\mu[\epsilon_{et1}] = \mu[\epsilon_{et2}]$).
Among the 206 successful double measurements, there are 84 pairs of double mea-
surements that took place within 30 seconds of each other. That is, there were 84
dilations with duration less than 30 seconds with starting diameters and ending diameters
that were both measured with the two instruments simultaneously. Substituting
the observed $\hat{\delta}_{et} - \hat{\delta}_{pm}$ in Equation 3.8 gives the pupillometric precision of the eye tracker
as 0.15 mm, slightly worse than the diameter-based precision of 0.12 mm.
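The dilation-based estimate can be sketched in the same way. The function name `dilation_precision`, the representation of each dilation as a (start, end) pair of diameters, and the use of the population variance are assumptions made for this illustration.

```python
from math import sqrt
from statistics import pvariance


def dilation_precision(et_pairs, pm_pairs, sigma_pm=0.05):
    """Estimate eye tracker precision from dilations (Equation 3.8).

    et_pairs, pm_pairs -- lists of (start, end) diameters in mm, one pair per
                          dilation, measured simultaneously by each instrument
    sigma_pm           -- precision of the reference pupillometer (mm)
    """
    et_dilations = [end - start for start, end in et_pairs]
    pm_dilations = [end - start for start, end in pm_pairs]
    diffs = [e - p for e, p in zip(et_dilations, pm_dilations)]
    # Each dilation combines two diameter measurements per instrument, so
    # half the variance of the differences is attributed to one measurement.
    excess = 0.5 * pvariance(diffs) - sigma_pm ** 2
    if excess < 0:
        raise ValueError("difference variance is below the reference variance")
    return sqrt(excess)
```

Because only the change between two nearby measurements enters the calculation, any stable per-eye offset in the eye tracker's diameter readings drops out before the variance is taken.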
3.4 Conclusion
This chapter presents two analyses of measurements made simultaneously using the
Tobii 1750 remote eye tracker and a medical pupillometer. The first analysis, diameter-based
metrology (subsection 3.3.1), provided an estimate of the eye tracker's pupillometric
accuracy and, via a relatively strong assumption, a lower bound on the
eye tracker's pupillometric precision. The second analysis, dilation-based metrology
(subsection 3.3.2), provided an alternative estimate of precision relying on fewer
assumptions but also with less applicable data. The results of both analyses are summarized
in Table 3.3, together with the resultant derived precision for binocular and
dilation measurements.
The Tobii 1750, which is typical of recent research-targeted remote eye trackers,
has a binocular pupillometric precision of 0.15 mm. While this performance is worse than
the precision offered by head-mounted (0.02 mm) or chin-rest systems (0.01 mm), it
is good enough for task-averaged cognitive pupillometry. The background variation
in pupil size which requires averaging over many task repetitions is on the order of 1.0
mm, so all of these systems have sufficient resolution to detect task-induced dilations.
Increasing the resolution of a remote eye tracker's camera would of course improve
its pupillometric precision. Another means of improving the performance of remote
camera systems is to use an active aiming system to point the camera directly at a
user's eyes to allow a narrow, eyes-only field of view even during head motion [49].
A similar improvement can also be gained by using a programmable CCD in which
faster sampling rates and more image processing are applied to the region of the
camera image containing the eyes, wherever they appear in the field of view.
                                                  eye tracker   eye tracker precision (mm)
                                                  diameter      pupil diameter         dilation magnitude
                                                  accuracy,                binocular              binocular
assumption                      data              per-eye (mm)  monocular  mean        monocular  mean
The difference in size between  206 double        0.34          0.12       0.08        0.17       0.12
the left and right pupils is    measurements
constant over the study.
The difference between the      84 pairs of       NA            0.15       0.10        0.21       0.15
left eye's dilation and the     double
right eye's dilation is         measurements
constant over 30 sec.
Table 3.3: Summary of the Tobii 1750's pupillometric performance. Figures for monocular diameter accuracy and
precision are the results of the metrological analysis above. Other figures in the table were then derived from
these primary results. Dilation measurement precision is larger (worse) by a factor of √2, because it is based on
the difference of two diameter measurements. When both eyes are measured and averaged, the precision in the
estimate of their mean dilation (or diameter) improves by a factor of √2 over the monocular case.
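The derived figures in Table 3.3 follow from the two √2 factors named in the caption. The sketch below shows that arithmetic; the function name `derived_precisions` and the dictionary keys are illustrative assumptions.

```python
from math import sqrt


def derived_precisions(monocular_diameter):
    """Derive the other precision figures (mm) from a monocular diameter precision."""
    return {
        "diameter, monocular": monocular_diameter,
        # Averaging two eyes improves precision by a factor of sqrt(2).
        "diameter, binocular mean": monocular_diameter / sqrt(2),
        # A dilation is the difference of two diameters: worse by sqrt(2).
        "dilation, monocular": monocular_diameter * sqrt(2),
        # Both factors apply and cancel for the binocular dilation.
        "dilation, binocular mean": monocular_diameter * sqrt(2) / sqrt(2),
    }
```

For the diameter-based figure of 0.12 mm this gives about 0.08, 0.17, and 0.12 mm, in agreement with the first row of the table. Applying it to the rounded 0.15 mm of the second row may differ from the table in the last digit, presumably because the table was derived from unrounded values.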
Chapter 4
Replication of classic cognitive pupillometry results on a remote eye tracker
Summary
The previous chapter demonstrated that remote eye trackers have a binocular pupillometric
precision of 0.15 mm, which should be enough for cognitive pupillometry applications.
However, because remote eye trackers have not yet been used for trial-aggregated
pupillometry, I conducted several basic experiments to see if they work.
In this section, I report three of them. In choosing tasks, I sought to (a) span diverse
types of cognitive load, (b) replicate well-studied tasks to enable comparisons to prior
results, and (c) use simple stimuli that are easy to match between aural and visual
presentation (see section 5.3). I chose mental multiplication, digit-span memory, and
vigilance. The first two replicate classic cognitive pupillometry studies, to determine
whether I could observe expected well-established pupil dilation patterns. The third
experiment was original. Most of the content of this chapter was published at the
2008 Symposium on Eye Tracking Research & Applications [63] and in the journal
Psychophysiology [64].
4.1 Digit span memory
4.1.1 Background
Short-term recall of a paced sequence of digits (also known as the digit span task) is the
most popular experimental task in cognitive pupillometry. First used by Kahneman
and Beatty [57], the task was also used to investigate the related processes of long-term
recall [9], grouping [55], and rehearsal [58]. Peavler [91] showed that the pupil reaches
a plateau diameter of about 0.5 mm around the presentation of the seventh digit.
Granholm et al. [34] replicated this finding, confirming that pupil dilation averaging
can be used to estimate both the momentary load and the maximum capacity of
working memory. My experiment replicated the original Kahneman and Beatty [57]
study.
4.1.2 Study description

I ran 98 trials of this task with seven participants. Details regarding study par-
ticipants, equipment, and procedures not specific to this task are described in Ap-
pendix A.
I began each trial with a two-second pre-stimulus accommodation period, during
which participants rested their eyes on a fixation target in the center of the screen in
order to stabilize their pupils (see Subfigure 4.1(a)). I then presented a sequence of
digits at the rate of one per second, spoken aloud over a speaker placed behind the
eye tracker's screen. After a brief retention pause, participants then reported back
the sequence. In a departure from Kahneman and Beatty's procedure, rather than
speaking the sequence, participants typed it into an on-screen keypad using the mouse
(Subfigure 4.1(b)). I randomly varied the length of the digit sequence for each trial
between 6 and 8 digits.
For this task, and for mental multiplication, where the tasks required numerical
responses, I asked participants to type their responses into a low-contrast on-screen
keypad. I did this to automate data collection and to avoid pupillary reflexes to
varying brightness caused by looking away from the screen. Because button-press
responses themselves induce pupillary responses [99], and I could not avoid such
interference by using spoken responses [17, 55], I limited my analysis to pre-response
periods.
Figure 4.1: Participant fields of view during auditory experiments. (a) Display during
auditory stimulus presentation, with fixation target at the center. (b) Display with
keypad used for gathering subject responses in the Digit Span Memory (section 4.1)
and Mental Multiplication (section 4.2) tasks.
4.1.3 Results
My findings matched those of Kahneman and Beatty [57]. In both experiments,

pupil diameter increased as the digits to be memorized were heard and encoded,
peaked during the pause while they were retained, and declined as the subjects re-
ported them back (see Figure 4.2). The magnitude of the response increased mono-
tonically with the length of the memorized sequence. In the 1966 study, subjects
repeated the sequence aloud, one digit per second, while in my study, the response
was entered using an on-screen numeric keypad (Subfigure 4.1(b)). This enabled a
faster response and resulted in the observed steeper decline in pupil diameter in my
results.
[Figure 4.2 plots. Top: average pupil diameter (mm) against time (seconds) for sequence lengths of 5, 6, and 7 digits. Bottom: change in pupil diameter (mm) against time (seconds) for sequence lengths of 6, 7, and 8 digits.]
Figure 4.2: Pupillary response during the digit span short-term memory task. The
top graph shows the results reported by Kahneman and Beatty [57], and the bottom
graph shows my results. The two graphs are aligned and plotted at the same scale.
4.2 Mental multiplication

Mental multiplication is one of the oldest tasks studied with cognitive pupillometry,
with the first experiments conducted in the 19th century [Heinrich 40, cited by Beatty
and Lucero-Wagoner 13]. Hess and Polt [44] triggered broad interest in cognitive
pupillometry when they reported that solving mental multiplication problems caused
pupil dilations and that harder problems evoked larger dilations. Their results were
replicated by Bradshaw [18] for mental division with remainders; Boersma et al. [15]
for mental addition in a study of mental retardation; and Ahern and Beatty [1] in
a study of the effect of individual differences in ability as measured by SAT scores.
Recently, Marshall [75] used a mental arithmetic task to validate a wavelet-based
method of analyzing pupil measurements.

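4.2.1 Study description
I ran 65 trials of this task with seven participants. Details regarding study participants,
equipment, and procedures not specific to this task are described in Appendix A.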
As in the study of the digit span task, I began each trial with a two-second pre-
stimulus pupil accommodation period. I then presented the participant with two
numbers, the multiplicand and multiplier, separated by two seconds. Five seconds
after I presented the multiplier, participants were prompted for the two numbers'
product. As I did for the digit span task, I departed from the original experiment's
procedure by using an on-screen keypad to record the participant's response (subsection 4.1.2).
For each trial, I randomly selected a difficulty level of easy, medium, or hard, then
chose the multiplier and multiplicand randomly according to Ahern and Beatty's definition
of these difficulty levels: easy problems took the form {6, 7, 8, 9} × {12, 13, 14}
(e.g. 7 × 13), medium were {6, 7, 8, 9} × {16, 17, 18, 19}, and hard {11, 12, 13, 14} ×
{16, 17, 18, 19}. I instructed participants not to provide a response in cases when they
forgot one of the two numbers or gave up on computing their product.
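The problem selection described above can be sketched as follows. The names `LEVELS` and `make_problem` are illustrative, and the sketch does not say which operand served as multiplicand and which as multiplier, since the text does not specify the order.

```python
import random

# Operand sets for each difficulty level, following the definitions above.
LEVELS = {
    "easy": ([6, 7, 8, 9], [12, 13, 14]),
    "medium": ([6, 7, 8, 9], [16, 17, 18, 19]),
    "hard": ([11, 12, 13, 14], [16, 17, 18, 19]),
}


def make_problem(rng=random):
    """Pick a difficulty level at random, then one operand from each set."""
    level = rng.choice(sorted(LEVELS))
    first_set, second_set = LEVELS[level]
    a = rng.choice(first_set)
    b = rng.choice(second_set)
    return level, a, b, a * b
```

Passing a seeded `random.Random` instance as `rng` makes a sequence of trials reproducible.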
4.2.2 Results
There was a small (0.1 mm) increase in pupil size as the multiplicand was committed
to short term memory and a larger, longer-lasting increase after the subjects heard
the multiplier and began computing the product (see Figure 4.3). Although I gave
problems at all three difficulties, the easy level was the only one for which I col-
lected sufficient correct responses for analysis. The pupillary response I observed for
these easy problems resembles the prior result for medium and difficult problems. I
speculate that students in 1979 had more practice with mental arithmetic.
4.3 Vigilance
The mental multiplication and digit span tasks are both strongly dependent on work-
ing memory. I designed my third experiment to investigate pupil dilations evoked by
less memory-dependent processes, using a task that requires intermittent vigilance,
stimulus discrimination, and speeded motor responses.

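4.3.1 Study description
I ran 94 trials of this task with eight participants. Details regarding study participants,
equipment, and procedures not specific to this task are described in Appendix A.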
In each trial, I presented an ascending sequence of numbers from 1 through 20. I
told participants that the sequence might progress normally or might contain errors
at the numbers 6, 12, and/or 18. When they noticed an error (a target), they were
to push a button as quickly as possible. For example, part of the sequence might be
. . . 10, 11, 12, 13, . . . , in which case I instructed the participants to do nothing,
or it might be . . . 10, 11, 7, 13, . . . , in which case I told them to push the button
as soon as possible after noticing the 7. I inserted sequence errors (targets) at the
4.3. VIGILANCE 37
[Figure 4.3 plot. Top panel: curves labeled EASY, MEDIUM, and DIFFICULT, with markers for multiplicand presented and multiplier presented. Bottom panel: curve labeled EASY, with markers for multiplicand spoken and multiplier spoken. Horizontal axis: Time (seconds).]
Figure 4.3: Pupillary response during the mental multiplication task. The top graph
shows the results reported by Ahern and Beatty [1]. The bottom graph shows the
results from my replication of their experiment. The two graphs are aligned and
plotted at the same scale.
[Figure 4.4 plot. Three markers labeled possible target. Horizontal axis: Time (seconds).]
Figure 4.4: Pupillary response to an aural vigilance task. The grey bars mark mo-
ments when subjects needed to listen carefully and react quickly to mistakes in a
spoken sequence of numbers.
three possible positions independently and randomly with probability one half. Thus
any trial could contain 0, 1, 2, or 3 targets, and participants knew exactly when the
targets might appear. The number 6 was never replaced by 16, nor 18 by 8, so that errors
were apparent from the start of each spoken target stimulus.
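The trial structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `make_trial`, `pick_replacement`, and `EXCLUDED` are hypothetical, and choosing the erroneous number uniformly from the remaining values between 1 and 20 is an assumption, since the text only specifies which replacements were ruled out.

```python
import random

TARGET_POSITIONS = (6, 12, 18)
# Replacements ruled out in the text: a spoken 16 begins like 6, and a
# spoken 8 begins like 18, so the error would not be apparent immediately.
EXCLUDED = {6: {16}, 18: {8}}

def pick_replacement(position, rng):
    """Pick a wrong number for a target position (uniform choice is assumed)."""
    candidates = [
        n for n in range(1, 21)
        if n != position and n not in EXCLUDED.get(position, set())
    ]
    return rng.choice(candidates)

def make_trial(rng):
    """Return (sequence, target_positions) for one counting trial."""
    sequence = list(range(1, 21))
    targets = []
    for position in TARGET_POSITIONS:
        if rng.random() < 0.5:  # each position is an independent coin flip
            sequence[position - 1] = pick_replacement(position, rng)
            targets.append(position)
    return sequence, targets
```

Each call yields a sequence of 20 numbers containing between zero and three targets, always at the sixth, twelfth, or eighteenth position.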
Unlike my experiments with digit span memory and mental multiplication, this
experiment did not replicate a past study, though it incorporated aspects of prior
experiments. Beatty [11] found pupil dilations evoked by target tones in an auditory
vigilance task, though in that experiment target locations were randomized, so that
participants could not anticipate them, and continuous rather than intermittent vig-
ilance was required. The anticipated increase in vigilance required by this task was
studied by Richer et al. [100].
4.3.2 Results
I observed sharp spikes in pupil diameter with consistent magnitude, onset timing,
duration, and shape following all three mistake points (see Figure 4.4).
4.4 Conclusion
For all three tasks, I observed patterns of pupil dilation with timing matched to the
details of the task. In the digit span tasks, the dilation profile tracked the number of
digits held in memory over time. For mental multiplication, a small dilation followed
the presentation of the multiplicand and a larger, longer dilation followed presentation
of the multiplier. For counting vigilance, dilations occurred at each of the possible
mistake points. For the two tasks which replicated classic studies, my results matched
the standard findings. These findings confirm that the Tobii 1750 remote eye tracker
has sufficient precision to measure task-evoked pupillary dilations.
Chapter 5
From auditory to visual
Summary
Pupil dilation magnitude has been shown to be a valid and reliable measure of cog-
nitive load for auditory tasks. Because the pupil dilates for reasons other than cogni-
tive load, especially changes in brightness, assessing cognitive load in visual tasks has
been problematic. I review the pupillary light reflex and other non-cognitive sources
of pupil motions and how they can be controlled experimentally, including a novel
method for compensating for pupillary blink reflexes. I describe a repetition of the
three studies described in chapter 4, in which visual stimuli are used instead of audi-
tory. These studies found that remote cognitive pupillometry works well for the visual
versions of digit span memory, mental multiplication, and vigilance, and that visual
versions of the tasks all evoke smaller pupil dilations than the auditory versions. Most
of the content of this chapter was published in the journal Psychophysiology [64].
5.1 The need to use visual stimuli

Developments in graphics have brought interfaces, newspapers, textbooks, and in-
structions which increasingly present changing visual information. Viewers need to
attend to, search through, and evaluate this information in order to integrate it. Are
visual interfaces the best way to present this information or might cognitive load be
lessened with auditory presentation? Are the parameters of cognitive load similar for
visual and auditory presentation?
Kahneman used pupillary dilations extensively [e.g. 59, 55, 58], and they are the
primary empirical foundation for his attention theory of effort [56]. He identified
three criteria that a physiological proxy for effort should satisfy, all of which
he observed in pupillary dilations: differences in the magnitude of averaged pupillary
dilations reliably reflect (a) different difficulty levels of a single task, (b) differences in
difficulty across qualitatively different tasks, and (c) individual differences in ability.
In a review nine years later, Beatty [12] reaffirmed that the experimental evidence
then available showed that pupillary dilations fulfill all three of Kahneman's criteria.
To my knowledge, nobody has examined the effect of aural vs. visual presentation
mode itself on the magnitude of pupillary dilations. This lack of data confounds
the use of dilations for comparing cognitive loads between visual and aural tasks,
because it cannot be known how much of the difference is caused by the difference in
presentation modalities and how much is caused by differences in post-perception task
demands. In other words, it is still not known whether Kahneman's second criterion,
inter-task comparability, is fulfilled by pupil dilations when used to study visual as
well as auditory tasks.
The following section reviews the pupillary light reflex and other non-cognitive
sources of pupil dilations and how they can be controlled experimentally, including a
novel method for compensating for pupillary blink reflexes. The rest of the chapter
describes a replication of the three auditory tasks described in chapter 4, this time
using visual instead of auditory stimuli, in order to see the difference caused by
presenting tasks visually.
5.2 Controlling for non-cognitive pupillary motions
5.2.1 Pupillary light reflex

The largest potentially confounding pupillary motion is the pupillary light reflex,
which is much larger in magnitude than cognition-induced pupil changes [68].
I followed standard practice [e.g. 118, 82] and dealt with this problem by avoiding
it. In all of my studies, I maintained constant visual field luminance across experimental
conditions. Additionally, I used isoluminant pre-stimulus masks to avoid
luminance changes at stimulus onset (see subsection 5.3.2). A few researchers have
attempted to adjust pupil diameter data to compensate for the overall luminance
of stimuli [94, 83], but these approaches only model constant luminance, so they are
not yet applicable to trial-averaged cognitive pupillometry.
5.2.2 Luminance changes caused by shifting gaze

Experiments in which participants shift their gaze to look at many parts of a visual
stimulus, including studies of reactions to photographs [67, 26], visual search [5, 97],
and visual scanning [95, 117] are subject to pupillary light reflexes when participants
fixate on local areas of the stimulus with varying luminance even though the overall
luminance of the stimulus does not change. Reading studies, in which textual stimuli
have relatively uniform local luminance and consistent fixation sequences, are not as
vulnerable to this problem and have successfully measured small task-evoked pupillary
responses amidst active eye movements [e.g. 53]. I controlled for saccade-induced
luminance changes by presenting all stimuli at a fixed location within an area small
enough to fall within the fovea, and by helping participants to keep their gaze fixed by
presenting a fixation target at all times and keeping trial durations under 20 seconds.
5.2.3 Other visual causes of pupil changes

In addition to the light reflex and the cognitive load response, the pupil also exhibits
small dilations or contractions in response to changes in accommodation distance [69],
contrast [116], spatial structure [25] and the onset of coherent motion [101]. Kohn
and Clynes [65] showed that simply changing the color content of a visual stimulus,
without changing either local or global luminance, can cause the pupils to either dilate
or contract, depending on the nature of the color change. Many of these effects have
been explained as special cases of the pupillary light reflex caused by local neighbor
inhibition on the retina [60]. I controlled for all of these influences on pupil size by
using achromatic, fixed-distance, non-moving, constant-contrast stimuli.
Fatigue and habituation
Over a long experiment, the baseline diameter of the pupil gradually declines [42], an
instance of the general effect of fatigue on pupil diameter [71, 91]. In addition, over
many replications of the same stimulus the magnitude of the resultant pupil dilations
gradually decreases [72, 66, cited in Tryon [115, p. 91]]. These effects make pupillometry
suitable for the measurement of operator fatigue, but when focusing on tasks
rather than people, it is important to control for these effects. In my experiments,
I limited the duration of experimental sessions to one hour, including 30 minutes of
actual measurements, and limited any one trial type to 50 repetitions per participant.
All trial repetitions were initiated by participants, and I told them that they could
take a break whenever they wanted; only a few participants ever did so.
5.2.4 Pupillary blink response

Blinks cause the pupils to very briefly contract and then recover to their pre-blink
diameter [27, cited by 115, p. 91]. Normally, this reaction is controlled in experiments
by instructing participants not to blink during trials. For standard pupillometry
studies in which the tasks are short (less than ten seconds) to enable trial averaging,
suppressing blinks is not difficult. But in order to extend cognitive pupillometry to
visual tasks, with less controlled structure and longer duration, a method is needed to
compensate for pupillary blink responses. To the extent that blinks occur randomly,
pupillary blink responses add noise to averaged pupil diameter measurements, and to
the extent that blinks are correlated with stimuli, pupillary blink responses add bias
to averaged pupil diameter measurements.
I pooled data from twenty thousand binocular blinks that occurred during several
of my eye tracking studies, grouped the blinks by duration, and averaged them to
determine 3-second-long blink response correction signals. I observed blink responses
consisting of a very brief dilation of about 0.04 mm, followed by a contraction of
about 0.1 mm and then a gradual recovery to pre-blink diameter over the next two
seconds. The timing and magnitude of these changes depend on the duration of the
blink. In data processing of each study, I then removed the pupillary blink responses
by altering the data following each blink by subtracting the blink response correction
signal corresponding to the length of that blink. Figure 5.1 shows the blink correction
signal for blinks that lasted five samples (100 ms).
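The grouping and averaging step can be sketched as follows. This is a simplified illustration, not the processing code used in the studies: it assumes a 50 Hz trace (five samples per 100 ms, as stated above) in which samples lost during a blink are stored as `None`, it takes the last sample before the blink as the pre-blink diameter, and it skips blinks whose 3-second window contains another gap. All function and variable names are hypothetical.

```python
from collections import defaultdict

SAMPLE_RATE_HZ = 50                   # 5 samples = 100 ms
SIGNAL_SAMPLES = 3 * SAMPLE_RATE_HZ   # 3-second correction signals

def find_blinks(trace):
    """Return (start, length) for each run of missing samples (None)."""
    blinks, start = [], None
    for i, value in enumerate(trace):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            blinks.append((start, i - start))
            start = None
    return blinks  # a blink still open at the end of the trace is ignored

def blink_correction_signals(trace):
    """Average post-blink deviation from pre-blink diameter, per blink length."""
    segments = defaultdict(list)
    for start, length in find_blinks(trace):
        end = start + length
        window = trace[end:end + SIGNAL_SAMPLES]
        # Skip blinks without a baseline sample or a complete, gap-free window.
        if start == 0 or len(window) < SIGNAL_SAMPLES or None in window:
            continue
        baseline = trace[start - 1]  # assumption: last sample before the blink
        segments[length].append([v - baseline for v in window])
    return {
        length: [sum(column) / len(column) for column in zip(*segs)]
        for length, segs in segments.items()
    }
```

The result maps each blink duration (in samples) to one averaged signal, so a blink lasting five samples is later corrected with the signal stored under key 5.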
For stimulus-correlated blinks, the general effect of this correction is to decrease
the magnitude of pupillary responses measured in the first second following a blink by
about 0.03 mm and increase the magnitude of pupillary responses measured in the sec-
ond second following a blink by about 0.05 mm. For stimulus-uncorrelated blinks, the
general effect of this correction is to remove measurement noise and thereby decrease
the standard errors of the mean in stimulus-locked averages of dilation magnitude.
This correction applies to data gathered after each blink. For the missing data
points that fall during the blink itself, I followed the standard practice of filling the
gaps with linear interpolation.
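The two processing steps described above (subtracting a duration-matched correction signal after each blink, then filling the blink gap by linear interpolation) might look like the sketch below. It is an illustration under assumptions: missing samples are represented as `None`, the correction signals are supplied as a dictionary keyed by blink length in samples, and the correction is applied before interpolation so that the interpolated values join corrected data. The text does not state the order of these steps, and all names are hypothetical.

```python
def subtract_blink_response(trace, blinks, signals):
    """Subtract the duration-matched correction signal after each blink.

    trace: samples with None during blinks; blinks: (start, length) pairs;
    signals: dict mapping blink length -> correction signal (list of floats).
    """
    corrected = list(trace)
    for start, length in blinks:
        signal = signals.get(length)
        if signal is None:
            continue  # no correction signal available for this duration
        end = start + length
        for offset, delta in enumerate(signal):
            index = end + offset
            if index >= len(corrected):
                break
            if corrected[index] is not None:
                corrected[index] -= delta
    return corrected

def interpolate_gaps(trace):
    """Fill runs of None by linear interpolation between neighbouring samples."""
    filled = list(trace)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            j = i
            while j < len(filled) and filled[j] is None:
                j += 1
            if i > 0 and j < len(filled):  # valid samples on both sides
                left, right = filled[i - 1], filled[j]
                span = j - i + 1
                for k in range(i, j):
                    filled[k] = left + (right - left) * (k - i + 1) / span
            i = j
        else:
            i += 1
    return filled
```

Gaps at the very start or end of a trace have no neighbour on one side, so this sketch leaves them unfilled.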
Because this is a new data processing technique for pupil data, I re-ran the analysis
of auditory vs. visual stimuli without blink response correction and found that the
correction did not change the significance of any of my results and changed the effect
sizes by only 0.005–0.01 mm, suggesting that blinks were not well-correlated with
stimuli for the tasks I examined and contributed only noise to the stimulus-aligned
averages.
5.3 Visual replication of classic auditory studies

In my visual replication of the three auditory cognitive pupillometry studies described
in chapter 4, I took care to control for all known non-cognitive pupillary reflexes. The
visual conditions employ visual fields with matching brightness and contrast to the
original auditory studies; the difference is that in the aural conditions, the task-
relevant stimuli were heard, and in the visual conditions they were seen.
Because visual perception is generally believed to involve less effort, but the subse-
quent central processing demands were matched between the two presentation condi-
tions, I expected dilations evoked by visually presented tasks to start out smaller but
[Figure 5.1 plot: pupillary changes around blinks of length 5 (average of 1575 blinks). Curves: Observed blink reaction and Derived Blink Correction Signal, with the blink interval marked. Horizontal axis: Time (seconds).]
Figure 5.1: Pupillary blink response for blinks with a 5-sample (0.1 sec) duration.
Note that the vertical scale is much smaller than other pupil traces. This blink
response was subtracted from data gathered after every blink with this same duration.
Similar blink responses were computed and used for blinks with durations up to 25
samples (0.5 sec).
to eventually reach the same peak diameter as those evoked by the aurally presented
versions. I also expected this difference in effort to be reflected in lower error rates
and quicker responses in the visual conditions.
5.3.1 Procedure
For details of experimental procedure common to all studies, see Appendix A.
5.3.2 Stimuli
As in the auditory versions of these experiments, stimuli for all experiments were num-
bers between 1 and 20. Under the auditory condition, stimuli were 500 ms digitized
recordings of spoken numbers played over a computer speaker placed directly behind
the screen. Under the visual condition described here, I displayed these numbers at
the center of the eye tracker's integrated 17-inch 1280 × 1024 LCD screen. I used a
28-point font size so that the digits spanned 0.73° (about a third of the foveal span)
when viewed from participants' initial seating distance of 60 cm. These numerals
were black, and the rest of the screen was always filled with a uniform background of
64 cd/m² medium gray.
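The relationship between on-screen size, viewing distance, and visual angle follows from basic trigonometry. The short Python sketch below is an illustrative aid with hypothetical function names; it converts between the two quantities for an object centered on the line of sight.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def size_for_angle_cm(angle_deg, distance_cm):
    """Physical size that subtends a given visual angle at a given distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)
```

For example, `size_for_angle_cm(0.73, 60)` is about 0.76, so a stimulus spanning 0.73 degrees at 60 cm is roughly 7.6 mm across on the screen.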
The onset timing and duration of visual number presentation were matched to
the timing used in my auditory study. During periods of time with no stimulus
(between trials, during the pre-stimulus pupil accommodation period, and in between
presentation of numbers during the task), where the auditory experiment used silence,
I masked the stimulus by displaying an X at the center of the screen in place of a
number, in order to remove contrast and brightness changes caused by the appearance
or disappearance of the numerals. The absence of clear constrictions following the
time of visual stimulus change in the visual waveforms provides evidence that these
stimulus changes per se had little effect on the pupil in my experiments.
5.3.3 Digit sequence memory
All prior investigations of pupil dilations evoked by the digit span recall task (see
section 4.1) presented the digit sequence aurally. This study is another replication
of the original Kahneman and Beatty [57] study, with visual rather than auditory
presentation of the numbers. I ran 607 repetitions of this task with 17 experimental
participants.
Unlike the auditory study, where I used sequences of length 6, 7, or 8 digits, I
randomly varied the length of the presented sequence for each trial independently
between 3 and 8 digits. I used the first two seconds of the retention pause following
presentation of the digit sequence as the response window for pupil diameter averaging
and significance testing, because this is the moment when Kahneman and Beatty [57]
observed maximum dilations.
Results
Averaged pupil traces from both auditory and visual versions of this experiment are
compared in Figure 5.2. Under both auditory and visual presentation, changes in
pupil diameter followed the same qualitative pattern observed in Kahneman and
Beatty's [57] auditory study: participants' pupils gradually dilated as the digits were
memorized, reached a peak two seconds after the final digit, during the pause while
the sequence was retained in memory, then gradually contracted as the participants
reported the digits back. I observed a faster post-retention constriction than Kahneman
and Beatty [57], probably because they used paced recall, whereas my participants
typed their responses into an on-screen keyboard, usually faster than the one digit
per second rate used by Kahneman and Beatty.
Dilation magnitude by presentation mode
Aural presentation caused significantly larger pupil dilations during the retention
pause than visual presentation
(M = 0.44 mm, SD = 0.22 mm vs. M = 0.24 mm, SD = 0.17 mm; F(1, 20) = 5.9,
p = .02).
[Figure 5.2 plot. Left panel: Aural, with sequence lengths of 8 digits (35 trials), 7 digits (16 trials), and 6 digits (15 trials). Right panel: Visual, with sequence lengths of 8 digits (41 trials), 7 digits (64 trials), 6 digits (75 trials), 5 digits (122 trials), 4 digits (63 trials), and 3 digits (73 trials). Both panels mark the end of the sequence and the response prompt. Horizontal axis: Time (seconds).]
Figure 5.2: Pupil dilation evoked by a digit-span memory task presented aurally (left)
and visually (right). The two charts are aligned and plotted at the same vertical
scale. The numbered circles on each line show the times at which each digit was
spoken (auditory presentation) or displayed (visual presentation). The curves are
each shifted horizontally so they are aligned at end of the stimulus sequence. Thus the
longest sequence (8 digits) starts the furthest to the left. Aural presentation caused
larger dilations than visual, and under both presentation modes, longer memorization
sequences elicited larger pupil dilations.
Dilation magnitude by task difficulty
I found a significant effect of sequence length on the magnitude of pupil dilations
during the retention pause (F(3, 60) = 3.73,
p = .02, ε = .96; see Figure 5.2). The magnitude of the dilation increased monotonically
with the length of the memorized sequence.
Task performance
Considering sequences of all lengths, participants made significantly more recall
errors under auditory (30%) than visual (24%) presentation,
χ²(1, N = 1232) = 3.94, p = .02, though this result is reversed if only the longest
(length 7 and 8) sequences are considered. Average digit span was 6.0 digits for auditory
presentation and 5.6 digits for visual. Error rates for all tasks are shown in
Figure 5.8.
Discussion
This experiment compared cognitive load under short-term memorization of aurally
and visually presented digit sequences. As with the mental arithmetic task, the
qualitative shape of average pupil dilations was similar in both presentation modes,
but the magnitude of dilations was smaller under visual presentation.
Although visual presentation led to significantly better overall performance, the
difference was not large, and rates of recall for the longer sequences and average
digit span scores suggest a small performance advantage for auditory presentation. A
general advantage to serial recall under auditory presentation, especially for items late
in the sequence, is well documented [92; 35, p. 22; but see 8]. My findings on recall
performance are mixed, but the larger dilations I observed in the auditory condition
suggest that task performance in this mode comes with the cost of higher cognitive
load.
5.3.4 Mental multiplication

I similarly repeated my study of the standard mental multiplication task (see sec-
tion 4.2), again replacing spoken numbers with numbers displayed on the screen. I
ran 431 repetitions of this task with 12 experimental participants. As with the digit
span task, I ran the study with a visual condition with timing matched to the auditory
study, but I also added a second timing variant.
In the sequential treatment, which replicates Ahern and Beatty [1], the multipli-
cand and multiplier were presented one after the other with timing matched to the
auditory study. In the simultaneous treatment, both numbers were shown on the
screen together for the full eight seconds between the pre-stimulus accommodation
period and the response prompt. This simultaneous and continuous presentation was
intended to remove the requirement that subjects quickly read and remember the
short-lived stimuli and thereby isolate the cognitive load imposed by mental multi-
plication from that caused by remembering the numbers.
As in the auditory study, I instructed participants not to provide a response
in cases when they forgot one of the two numbers or gave up on computing their
product. This occurred in 10% (65/632) of the trials, mostly for hard problems.
Since these trials did not involve mental multiplication, I excluded them from analysis.
Nine participants had memorized the multiplication table through 12 × 12 and the rest
through 10 × 10, so all but a few of the easiest problems required mental computation
beyond simple recall.
Results
Dilation magnitude by presentation mode
Presentation mode affected the overall magnitude of pupil dilations but not their
qualitative shape. The onset timing,
duration, and overall shape of pupil dilations caused by mental multiplication were the
same for both auditory and visual presentation. The size of participants' dilations,
however, was significantly larger in the auditory condition (M = 0.35 mm, SD = 0.11
mm vs. M = 0.16 mm, SD = 0.13 mm; F(1, 22) = 12.1, p = .002). This difference in
magnitude is clear in Figure 5.3, which shows the pupil dilation evoked by the mental
multiplication task, averaged across all trials and participants and broken down by
task presentation mode.
Dilation shape and magnitude by visual presentation timing
The pupil dilation evoked by problems with both components visible simultaneously for eight
[Figure 5.3 plot. Curves: aural (37 trials) and visual (165 trials). Markers: multiplicand presented, multiplier presented, response prompted. Horizontal axis: Time (seconds).]
Figure 5.3: Average pupil dilation evoked by visually and aurally presented mental
multiplication problems. The two presentation modes elicited dilations with similar
timing, duration, and shape, but different magnitude. Vertical lines show the times
during which the two numbers were spoken or displayed and the time during which
the participants responded. In this and other figures, the shaded region enclosing
each curve shows the standard errors of the mean for the average pupil diameter
represented by that curve.
[Figure 5.4 plot. Left panel: Sequential (165 trials), with markers for multiplicand presented, multiplier presented, and response prompted. Right panel: Simultaneous (365 trials), with markers for multiplier and multiplicand presented and response prompted. Horizontal axis ticks run from 0 to 10.]
Figure 5.4: Comparison of pupil dilations evoked by sequentially and simultaneously
presented mental multiplication problems. The left panel shows the average dilation
in trials where the multiplier and multiplicand were shown briefly and one after the
other; these show the same data as the blue/dashed (visual) curve in Figure 5.3. The
right panel shows the average dilation in trials where the two numbers were shown
together and continuously for eight seconds.
seconds had a different pattern, shown in comparison to the visual sequential treat-
ment in Figure 5.4: a single long dilation and contraction, rather than the two peaks I
observed in the sequential case. In addition, the mean pupil dilation was smaller in the
simultaneous case (M = 0.13 mm, SD = 0.11 mm vs. M = 0.30 mm, SD = 0.13 mm;
F(1, 22) = 10.3, p = .004). This result is not surprising, because the
simultaneous-presentation trials lack a second stimulus event to cause a second peak, and these
trials were easier to solve, because they did not require participants to remember the
two presented numbers.
Dilation magnitude by task difficulty
Consistent with prior investigations of mental arithmetic, I found a clear difficulty
effect on dilation magnitude (see Figure 5.5).
Easy multiplication problems caused the smallest pupil dilations (M = 0.17
mm, SD = 0.19 mm), hard problems the largest (M = 0.27 mm, SD = 0.16 mm),
with dilations to medium problems in between (M = 0.21 mm, SD = 0.15 mm).
[Figure 5.5 plot. Curves: HARD (83 trials), MEDIUM (153 trials), EASY (129 trials). Markers: multiplier and multiplicand presented, response prompted. Horizontal axis: Time (seconds).]
Figure 5.5: Difficulty effect on pupil dilation evoked by mental multiplication of two
numbers displayed together for eight seconds. The data shown are the same as those
in the right panel of Figure 5.4, here separated by difficulty.
These differences were significant (F(2, 30) = 13.1, p = .0008, ε = .67).
Task performance by presentation mode
Participants made significantly more errors on aurally presented problems (40%)
than visually presented problems (25%),
χ²(1, N = 632) = 3.39, p = .03. Error rates for all tasks are shown in Figure 5.8.
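The caption of Figure 5.8 describes these comparisons as one-tailed tests for equality of proportions with Yates' continuity correction. The Python sketch below shows how such a statistic and its p-value can be computed from raw error counts. It is a generic illustration with hypothetical function names; the per-condition counts are not given in this section, so none are reproduced here.

```python
import math

def yates_chi_square(errors_a, total_a, errors_b, total_b):
    """Chi-square statistic (1 df) for a 2x2 table with Yates' correction."""
    a, b = errors_a, total_a - errors_a   # condition A: errors, correct
    c, d = errors_b, total_b - errors_b   # condition B: errors, correct
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    adjusted = max(0.0, abs(a * d - b * c) - n / 2)
    return n * adjusted ** 2 / denominator

def one_tailed_p(chi_square):
    """One-tailed p-value for a 1-df chi-square statistic.

    Halving the two-tailed value is appropriate only when the observed
    difference lies in the hypothesized direction.
    """
    two_tailed = math.erfc(math.sqrt(chi_square / 2))
    return two_tailed / 2
```

As a consistency check, `one_tailed_p(3.39)` is approximately .033, which agrees with the value reported for mental multiplication in the caption of Figure 5.8.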
Discussion
This experiment compared cognitive load under auditory and visual presentation
of mental arithmetic problems. The overall pattern of task-evoked pupil dilations
was similar in both conditions and replicated previous auditory work. Intriguingly,
both the better performance under visual presentation and greater cognitive load
under auditory presentation suggest an advantage for visual presentation of mental
arithmetic. This may be because post-stimulus visual persistence alleviates some load
on working memory.
5.3.5 Vigilance
I also repeated my pupillometric study of a counting sequence vigilance task (section 4.3),
again replacing auditory stimuli with visual while controlling the brightness
and contrast of the visual field. I ran 231 repetitions of this task with 17 experimental
participants.
Results
Dilation magnitude by presentation mode
Figure 5.6 shows the average dilation evoked by the vigilance task, comparing
aurally and visually presented trials.
Both conditions elicited strong dilation peaks beginning about one second before and
peaking 500–1000 ms after each moment when participants were alert for mistakes
in the counting sequence. The one-second anticipatory dilation is consistent with
measurements of the readiness potential made using scalp electrodes by Becker et al.
[14], who found evidence of motor preparation beginning a bit more than one second
before action, and is shorter than the 1.5 second lead observed by Richer et al. [100]
before the presentation of an action-determining stimulus.
For significance testing, I used a wide response window, starting three seconds
before each moment when a target could occur and ending three seconds after, en-
compassing both the pre-stimulus anticipatory dilation and the post-stimulus motor-
response peak. The mean dilation in the auditory presentation condition (M = 0.096
mm, SD = 0.048 mm) was significantly larger than for visual presentation (M = 0.057
mm, SD = 0.046 mm); F(1, 23) = 7.93, p = .01.
[Figure 5.6 plot. Curves: aural (78 trials) and visual (180 trials). Three markers labeled possible target. Horizontal axis: Time (seconds).]
Figure 5.6: Average pupil dilations evoked by a vigilance task presented aurally and
visually. The vertical grey bars show the moments at which participants were vigilant
for mistakes in a counting sequence (targets). The aurally presented task led to
larger dilations, but the two presentation modes elicited dilation profiles with similar
shape and timing.
Dilation onset and peak latency by presentation mode
In contrast to the digit span and mental multiplication studies, the three task
repetitions in each of this study's trials effectively tripled the number of trials
available for analysis. This provided enough data to pinpoint the peak dilation
precisely in time and revealed
a minor timing difference between the dilations for auditory and visual vigilance.
Whether the target was present or absent, the dilation began and peaked slightly
later under auditory presentation (see Figure 5.7). This slightly later dilation
was probably due to the time taken for the spoken stimulus to be
presented rather than to slower perception, because hearing is generally believed
to have lower latency than vision
[120, 80]. This interpretation is consistent with the difference in mean reaction time I
observed: 410 ms (SD = 111 ms) for visual presentation and 713 ms (SD = 140 ms)
for auditory.
Dilation magnitude and timing by target presence
At every potential mistake point, whether or not a target is present, this task required heightened vigilance,
motor response preparation, and comparison of the presented number with the ex-
pected correct sequence number. I therefore expected dilations in both cases to be
similar, perhaps with slightly larger or longer dilations in cases where targets actually
appeared, caused by error recognition and/or the additional requirement of carrying
out the motor response. I checked this hypothesis by grouping all time segments
surrounding moments when the targets were present and averaging them separately
from those when the targets were absent. The resultant pupil dilation averages are
shown in Figure 5.7.
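The grouping described above amounts to event-aligned averaging with the events split by condition. A minimal Python sketch follows; it assumes a gap-free pupil trace stored as a list of diameters and a list of (sample index, target present) pairs, and it omits baseline subtraction. The function names and the window parameters are hypothetical.

```python
def mean_segment(trace, event_indices, before, after):
    """Average fixed-length segments of trace aligned on the given samples."""
    segments = [
        trace[i - before:i + after]
        for i in event_indices
        if i - before >= 0 and i + after <= len(trace)  # drop clipped windows
    ]
    if not segments:
        return []
    return [sum(column) / len(column) for column in zip(*segments)]

def average_by_target_presence(trace, moments, before, after):
    """Average separately around moments with and without a target.

    moments: list of (sample_index, target_present) pairs.
    """
    present = [i for i, has_target in moments if has_target]
    absent = [i for i, has_target in moments if not has_target]
    return {
        "present": mean_segment(trace, present, before, after),
        "absent": mean_segment(trace, absent, before, after),
    }
```

Each returned list has `before + after` samples, with the potential target moment located at offset `before`.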
Pupil dilations evoked by targets were larger and longer than those measured
during moments when targets were possible but did not appear (M = 0.10 mm,
SD = 0.046 mm vs. M = 0.037 mm, SD = 0.047 mm; F(1, 23) = 22.8, p < .0001).
The averaged pupil diameter trace for cases with a target (right side of Figure 5.7)
showed a secondary peak about 1.5 seconds after the target appeared. Because mean
response time was 515 ms (SD = 188 ms), the latency between response and this
secondary peak was about one second. Because Richer and Beatty [99] observed
similar dilation-response latencies in a non-reactive button pushing task, and because
[Figure 5.7 plot. Left panel: Target Absent, with curves for aural (101 trials) and visual (285 trials). Right panel: Target Present, with curves for aural (133 trials) and visual (255 trials). Each panel marks the moment when a target might be presented. Horizontal axis ticks run from -4 to 2.]
Figure 5.7: Target effect on pupil dilations evoked by heightened vigilance. The data
shown are the same as those in Figure 5.6. Each trial had three moments at which
I told participants to expect possible targets, which occurred independently at each
moment with probability one half. The chart on the left shows the mean dilation
in moments in which a target did not occur, and the chart on the right shows the
mean dilation in moments when a target did occur. Targets elicited longer and larger
pupil dilations, with a secondary peak about 1.5 seconds after target presentation.
This secondary peak corresponds to the motor activity of responding to the target's
presence. Whether a target was present or absent, dilations were larger in the auditory
condition, and the peak dilation under auditory presentation occurred about half a
second later than with visual presentation.
this secondary peak was only present when motor response was required, I interpreted
the secondary peak as an artifact of that motor response.
The interaction of stimulus mode and target presence was not significant (F(1, 23) = 0.351,
p = .6). The larger dilations evoked by auditory task presentation persist whether a
target is present or absent (see Figure 5.7).
Task performance
Participants made more errors in the counting vigilance task
when it was presented aurally (8.5%) than visually (6.1%), but this difference was
not significant: χ²(1, N = 774) = 7.80, p = .14. Error rates for all tasks are shown
in Figure 5.8.
Discussion
This experiment compared the cognitive load under aurally and visually presented
intermittent vigilance tasks. As with the other two tasks I studied, the two presen-
tation modes elicited pupil dilations with very similar timing and overall shape, and
although I did not observe a significant performance difference, visual presentation
caused lower cognitive load.
In addition to the presentation mode effect, I also observed that the presence of
targets was associated with larger pupil dilations. This difference is consistent with
the additional cognitive demand of pushing the button in cases when the target is
present.
5.3.6 Discussion
Summary of experiments
In my first experiment, participants memorized sequences of digits either spoken

aloud or displayed on a computer screen. My second experiment examined mental
multiplication, again presented both aurally and visually, and my third experiment
considered a speeded-reaction vigilance task which did not rely heavily on working
memory. In all tasks, I controlled the stimulus timing between the two modes, as well
[Figure 5.8 plot: bar chart of error rate (0% to 60%) under aural and visual presentation for the Mental Multiplication, Sequence Memory, and Vigilance tasks.]
Figure 5.8: Error rates for all three tasks. Whiskers on error rate bars show 95% $\chi^2$
confidence intervals. The differences in error rate on the mental multiplication and
sequence memory tasks are significant (p = .033 and p = .0026, respectively, under
one-tailed tests for equality of proportions with Yates' continuity correction). The
difference in error rates for the vigilance task was not significant (p = .14).
as controlling all aspects of the visual field (brightness, contrast, and participant
fixation) in order to minimize non-cognitive pupillary reactions.
Summary of findings
I found that the pupil dilations evoked by all three tasks were qualitatively similar
under auditory and visual presentation, but that auditory presentation led to larger
pupillary dilations.
Qualitative match In all three of my experiments, I observed that pupil dilations

in both modes had about the same onset timing, duration, and overall shape (See Fig-
ures 5.3, 5.2, and 5.6). Additionally, in the two tasks which replicated classic pupillary
response studies, mental multiplication [44] and digit span [57], I also found a qual-
itative match between the dilations I observed and the auditory-only classic results.
Both of these qualitative correspondences (visual to auditory in my experiments, and
visual to classic auditory findings) suggest that the pupil dilations I observed in response to
visually presented tasks reflect the cognitive demands of the tasks and were generally
free of distortion caused by non-cognitive pupillary reactions to brightness or contrast
changes.
Quantitative difference In all three of my experiments, I observed significantly

larger pupillary dilations when I presented tasks aurally than when I presented them
visually. The differences were 0.19 mm (0.35 mm vs. 0.16 mm) for mental multipli-
cation, 0.18 mm (0.43 mm vs. 0.25 mm) for digit span memory, and 0.08 mm (0.23
mm vs. 0.15 mm) for vigilance.
Implications
Because I was careful to control for non-cognitive pupillary responses caused by

brightness, contrast, etc., and because of my finding of a qualitative match in dilation
trajectories between conditions, I believe that the difference in magnitude between
the two conditions was a result of differences in cognitive load. I therefore interpret
this result as evidence that visual task presentation leads to lower cognitive load than
auditory presentation across all three of the tasks I studied.
This finding contradicted my hypothesis that similar task demands would lead to
similar magnitude dilations in the two cases, perhaps with an initially smaller dilation
under visual presentation caused by the lesser difficulty of seeing vs. hearing numbers.
Instead, I found that auditory task presentation led to larger pupil dilation not only
during initial stimulus comprehension but also throughout task completion.
Taken together with the better performance I observed in the visual conditions,
this finding indicates that visual presentation facilitates processing for all three tasks.
That is, comprehending and remembering numbers is easier when they are seen than
when they are heard.
Relation to prior digit span findings In the case of digit span, my finding of an
advantage for visual presentation seemed to contradict prior studies which found bet-
ter performance under auditory task presentation. Improved recall of heard numbers
relative to seen numbers is very well established [92; 35, p. 22; but see 8]. Indeed, in
my measurements of error rates, I found that although visual presentation led to sig-
nificantly greater overall performance, the difference was not large, and rates of recall
for the longer sequences and average digit span scores suggest a small performance
advantage for auditory presentation, as was found in the cited investigations.
This apparent contradiction between lower cognitive load under visual presen-
tation and superior recall of heard numbers can perhaps be resolved by drawing a
distinction between levels of effort and levels of performance [cf. 88]. Although
performance was better for heard numbers, my pupillary data suggest that this greater
performance may have come with the cost of greater effort and cognitive load.
Relation to prior mental arithmetic findings Prior investigations of mental

arithmetic have not often addressed the effect of stimulus mode. In a study of the rela-
tive importance of different components of working memory in serial mental addition,
Logie et al. [70] observed that visual problem presentation led to better performance
and less degradation in the context of a variety of interfering tasks. My finding of
better performance in the visual case matches theirs. They concluded that the central
executive, the visuo-spatial store, and subvocal rehearsal are all involved in mental
arithmetic. Taken together with this data, my finding of lower cognitive load in the
visual case suggests that visual presentation facilitates mental arithmetic performance
by aiding the recruitment of all three of these components of working memory. This
possibility is supported by recent fMRI data collected by Fehr et al. [30], who found
that presentation mode can significantly impact which regional neuronal networks are
employed in the calculation process for mental arithmetic.
Conclusion
It is well known that visual presentation can lead to higher performance on com-
plicated tasks such as schema learning [24] and finding patterns in data [23]. Such
advantages are typically attributed to the benefits of a persistent external represen-
tation that reduces load on working memory. My finding of a visual advantage even
for simple tasks and even though I controlled presentation duration, displaying the
digits exactly as long as they took to speak, suggests that something besides visual
persistence underlies this visual advantage.
One account for superior performance under visual rather than auditory presenta-
tion rests on the role of dual codes in working memory [e.g. 6]. Visual presentation is
likely to encourage dual coding of the stimuli [89]. Extensive research has shown that
having two mental representations for something, notably, both visual and verbal, is
better for memory than having one. If one internal representation is lost or corrupted,
the other can compensate. People tend to spontaneously name visual stimuli but they
do not spontaneously generate visual images to verbal stimuli, so that visual presen-
tation is more likely to generate two codes than verbal presentation. The existence
of two codes could facilitate information processing in addition to augmenting mem-
ory. Mental operations like arithmetic are regarded as performed by the articulatory
loop. If memory for the stimuli is retained in the visuospatial sketchpad, then the
articulatory loop, relieved of memory load, has more capacity for information pro-
cessing. These findings, if replicated and extended, have broad-ranging implications
for education as well as interface design.
Alternatively, it is possible that the greater effort required by aural presentation

is due only to differences in the difficulty of perception and not because of any subse-
quent processing differences, such as visual persistence or differential recruitment of
working memory components. Future work could resolve this question by adjusting
stimulus discriminability to equalize perception difficulty between the two modes and
then check to see whether the effort differences remain.
Further research to determine the true cause of mode-related differences in pupil
dilations will help to determine whether such dilations can fulfill Kahneman's second
criterion for an effort proxy, inter-task comparability, and thus be useful for compar-
isons of cognitive load between the auditory and visual domains.
Chapter 6
Combining gaze data with pupillometry
Summary
I describe a new way of analyzing pupil measurements made in conjunction with eye
tracking: fixation-aligned pupillary response averaging, in which short windows of
continuous pupil measurements are selected based on patterns in eye tracking data,
temporally aligned, and averaged together. Such short pupil data epochs can be
selected based on fixations on a particular spot or a scan path. The windows of pupil
data thus selected are aligned by temporal translation and linear warping to place
corresponding parts of the gaze patterns at corresponding times and then averaged
together. This approach enables the measurement of quick changes in cognitive load
during visual tasks, in which task components occur at unpredictable times but are
identifiable via gaze data. I illustrate the method through example analyses of visual
search and map reading. I conclude with a discussion of the scope and limitations
of this new method. Most of the content of this chapter was published at the 2010
Symposium on Eye-Tracking Research & Applications [61].
6.1 The usefulness of gaze data

In preceding chapters, I have described how eye trackers, used as pupillometers, can
be used to measure instantaneous cognitive load. But the primary purpose of these
machines has always been to measure gaze direction. The datum of where somebody
is looking is extremely rich. It can be exploited for gaze-based interfaces, which are
especially useful to disabled people. It is an almost perfect proxy for attention, so it
has been applied broadly to investigations of cognitive psychology, perception, and
psychophysics.
The fact that a single device simultaneously measures pupil and gaze opens the
possibility for rich experiments that investigate cognitive load in tandem with atten-
tion. There have been two main obstacles to such research:
1. Investigation of visual attention requires visual stimuli, and such stimuli can
cause pupillary reflexes that interfere with the measurement of cognitive
pupillary dilations.
2. Visual attention changes quickly and unpredictably. This precludes the time
alignment of experimental trials based on stimulus presentation, which is
necessary to study cognitive dilations on short timescales, as described in
subsection A.4.6.
The first problem can be addressed through careful control of experimental stimuli,
as described in section 5.2. But even when this problem is solved, studies are still
limited to short, simple tasks, in which the cognition of interest occurs at a consistent
time soon after an experimenter-controlled stimulus.
In visual tasks such as map reading, chart reading, visual search, and scene com-
prehension, people shift their attention rapidly and unpredictably, with scan paths
being planned on the fly in response to what has been seen so far. It would be useful
to measure the dynamics of cognitive load during such tasks, and pupillary response
averaging provides good time resolution. But the unpredictability of visual problem
solving violates the requirement of signal averaging that the cognitive process being
studied happen with predictable timing, which is necessary for aligning the pupillary
responses from multiple trials.

This chapter describes a solution to the second obstacle, enabling the assessment
of cognitive load in such tasks: fixation-aligned pupillary response averaging, in which
eye fixations are used instead of stimulus or response events to temporally align win-
dows of pupil measurements before averaging. This method enables the detection
of quick changes in cognitive load in the midst of long, unstructured tasks, espe-
cially visual tasks where fixations on certain points or sequences of points are reliable
indicators of the timing of certain task components.
For example, there are certain subtasks that are usually required to read a bar
chart, but these subtasks occur at different times from trial to trial and from person
to person: e.g. reading the title and axis labels, judging the relative heights of bars,
estimating bar alignment with axis ticks, and looking up bar colors in the legend. If
we conduct many repetitions of such a chart reading task, changing the details, we can
later use scan path analysis to identify all times when somebody compared the height
of two bars. We can then align pupil signals from those epochs at the moments of
key fixations in the comparison, then average them to determine the average changes
in cognitive load during comparison of the height of two bars in a bar chart.
I describe the details of this new averaging method in section 6.2, then illustrate
its application in two example analyses of cognitive load: one of target discovery
in visual search (subsection 6.3.1) and one of references to the legend during map
reading (subsection 6.3.2). I conclude the chapter with a brief discussion of the
method's applicability and limitations.
6.2 Fixation-aligned pupillary response averaging

Fixation-aligned pupillary response averaging can be broken down into three steps:
1. the identification of subtask epochs, short spans of time in which the task com-
ponent occurs,
2. the temporal alignment of all such subtask epochs, and
3. averaging the aligned epochs.

6.2.1 Identifying subtask epochs using patterns in gaze data
I use the term epoch to refer to short windows of time in which a consistent task
component (and therefore a consistent pupillary response) occurs, as well as to the
pupil diameter measurements collected during that window of time. Epochs are
typically two to ten seconds long. An epoch is characterized by one or more gaze
events, experimenter-defined fixations or saccades.
Single fixations The simplest gaze event is fixation on a particular spot identified
by the experimenter. Epochs defined by single fixations encompass a brief window
of time a few seconds long and centered on the fixation. For example, in a visual
search task requiring discrimination between targets and distractors, each fixation
on a search item determines an epoch containing a visual discrimination subtask.
Fixations on targets determine epochs of target recognition (see Section 6.3.1). In a
flight simulation study, each fixation on the altimeter could define an epoch.
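As a rough illustration of the single-fixation case, the selection of gaze events and their surrounding windows might be sketched in Python as follows. This is not the implementation used in these studies: the `Fixation` record, the helper names, the 1-degree radius, and the window extents (2 s before and 3 s after fixation onset) are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fixation:
    """One detected fixation (hypothetical record layout)."""
    start: float  # onset time in seconds
    x: float      # horizontal gaze position, degrees of visual angle
    y: float      # vertical gaze position, degrees of visual angle


def fixations_on_spot(fixations, spot, radius):
    """Return the fixations whose position lies within `radius` of `spot`."""
    sx, sy = spot
    return [f for f in fixations if hypot(f.x - sx, f.y - sy) <= radius]


def epoch_bounds(fixation, before=2.0, after=3.0):
    """Start and end time of the epoch surrounding a fixation onset."""
    return (fixation.start - before, fixation.start + after)


# Three fixations; the last two land on the spot of interest at (5, 5).
fixes = [Fixation(1.0, 0.0, 0.0), Fixation(1.4, 5.0, 5.0), Fixation(2.1, 5.3, 4.9)]
events = fixations_on_spot(fixes, spot=(5.0, 5.0), radius=1.0)
windows = [epoch_bounds(f) for f in events]
```

Each selected fixation yields one epoch, so a spot that is fixated repeatedly contributes several epochs to the eventual average.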
Scan paths Gaze events can also be sequences of fixations (scan paths) or saccades
from one location to another. For example, fixation on the axis of a bar chart before
looking at any of the bars indicates general orientation to the chart, while fixation
on the axis immediately after fixating the edge of one of the bars indicates an epoch
of axis reading. Comparison of two bars is signaled by several consecutive fixations
alternating between them. Epochs defined by scan paths are usually composed of
more than one gaze event. For example, in map reading, when somebody looks at
a symbol in the map, then saccades to the legend to look up the meaning of the
symbol, then saccades back to the symbol, these three gaze events comprise a legend
reference epoch (see subsection 6.3.2).
In each of these cases, the experimenter defines a sequence of fixations that they
expect to reliably occur together with the task component or cognitive process under
investigation.
Other gaze data attributes
There are other attributes of gaze data that might be used to identify subtask epochs,
including fixation duration, fixation frequency, saccade velocity, and blink rate. This
dissertation only addresses the use of fixations and scan paths.
Identifying epochs from non-gaze timing signals
Although I do not explore it in these studies, it would also be possible to identify

subtask epochs using timing signals from other measurable events besides the gaze.
When a task can be modeled in advance and subtask boundaries detected through the
state of the interface, such subtask boundaries can serve as timing signals for pupil-
lometry [50], and could potentially be used for trial-averaged cognitive pupillometry if
they occur frequently enough without being contaminated by motor-evoked dilations
(see section 5.2 and section 7.1).
6.2.2 Aligning pupil data from selected epochs

After all the epochs containing the subtask of interest are identified, they need to be
aligned based on the timing of gaze events that make them up.
Temporal translation
For epochs defined by a single gaze event, like fixation on a particular spot, temporal
alignment simply requires translation of each epoch so that their gaze events coincide.
Formally, if $P_i(t)$ is pupil diameter as a function of time during epoch $i$, and $g_i$ is the
time of epoch $i$'s gaze event, then $T[P_i(t)] = P_i(t + g_i)$ is the temporal translation of $P_i(t)$
which places its gaze event at $t = 0$. Such alignment is done for all epochs that will
be averaged. Alignment via temporal translation is illustrated in Figure 6.1.
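The translation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used in the dissertation: the recording is assumed to be uniformly sampled starting at time zero, the function name `translate_epoch` is hypothetical, and nearest-sample lookup is used in place of any interpolation.

```python
def translate_epoch(samples, rate_hz, event_time, rel_times):
    """Evaluate the translated epoch P_i(t + g_i) at each relative time t.

    samples:    pupil diameters; sample k was taken at time k / rate_hz
    event_time: g_i, the time of the epoch's gaze event in seconds
    rel_times:  times relative to the gaze event (t = 0 is the event)
    Uses the nearest sample and returns None where no data was recorded.
    """
    aligned = []
    for t in rel_times:
        k = round((t + event_time) * rate_hz)
        aligned.append(samples[k] if 0 <= k < len(samples) else None)
    return aligned


rate = 50                      # samples per second (assumed)
rec_a = [0.0] * 500
rec_a[200] = 1.0               # marker at 4.0 s, where this epoch's gaze event occurs
rec_b = [0.0] * 500
rec_b[300] = 1.0               # marker at 6.0 s, where this epoch's gaze event occurs
grid = [-0.02, 0.0, 0.02]      # common time axis relative to the gaze event
aligned = [
    translate_epoch(rec_a, rate, 4.0, grid),
    translate_epoch(rec_b, rate, 6.0, grid),
]
```

After translation, both epochs place their marker at the same position on the shared time axis, which is what makes pointwise averaging meaningful.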
Warping
Sometimes, epochs of interest are characterized by multiple gaze events. For example,
referencing the legend during a map reading task involves a saccade from the map
[Figure 6.1 plot: change in pupil diameter over time. Top half: the underlying pattern, trials 1 to 4, and their mean. Bottom half: epochs 1, 2a, 2b, 3, and 4 after alignment, the fixation-aligned mean, and the plus-minus average.]
Figure 6.1: Illustration of epoch alignment via temporal translation followed by av-
eraging. The top half of the figure shows four simulated trials with gaze events
(fixations) occurring at various times. The simulated pupil diameter data for these
trials is the sum of random walks (simulating typical background pupil motions) and
a dilation response occurring at a fixed delay following the fixation (illustrated at the
top of the figure in grey). Because the fixations in these four trials are not aligned,
neither are the pupillary responses, and averaging without translation fails to recover
the underlying pattern.
Epochs aligned by translation are shown in the bottom half of the figure. Because
these epochs are aligned on their gaze events, the pupillary responses are aligned too,
and averaging the five signals reveals the underlying pupillary response pattern. The
final line in the figure is the ±-average of the four aligned signals (see section A.4.6),
which shows the level of noise present in the mean above it.
In this example, the magnitude of the signal relative to the background pupil
noise is exaggerated; in real pupil dilation data, many dozens (sometimes hundreds)
of epochs must be averaged before the noise level in the average is low enough to
distinguish the pupillary response.
to the legend, a period of time spent on the legend, and a saccade back to the map.
Translation could align the map-to-legend saccades in all these epochs, but because
people do not always dwell on the legend for the same amount of time, the returning
legend-to-map saccades will not line up. If the cognition of interest occurs relative
to both points, then signal averaging will not reinforce it.
Porter et al. [97] faced a similar problem in their analysis of pupil data from tasks
of various lengths, in which they needed the start and end times of each task to align.
They solved it by setting a fixed task duration and then stretching or compressing
the data from each trial to fit in that window. A similar warping operation to the one
described in this section is used by Slaney et al. [109] in their method for morphing
one sound into another. They first decompose the sound into pitch and spectral
signals, then they align each of these one-dimensional components between the first
and second sounds using a piecewise linear warp similar to the one described in this
section in order to align corresponding parts of the two sounds before cross-fading.
In the context of task epochs defined by several gaze events, I apply a linear
time-stretching operation to the span of time between each pair of consecutive gaze
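events. Formally, in an average of $n$ epochs (indexed by $i$), each of which is defined
by $m$ gaze events (indexed by $j$), the piecewise linear warping of $P_i(t)$ is defined as

$$
W[P_i(t)] =
\begin{cases}
P_i\left(g_{i,1} + (g_{i,2} - g_{i,1})\,\dfrac{t - \bar{g}_1}{\bar{g}_2 - \bar{g}_1}\right) & \text{for } \bar{g}_1 \le t < \bar{g}_2 \\[1ex]
P_i\left(g_{i,2} + (g_{i,3} - g_{i,2})\,\dfrac{t - \bar{g}_2}{\bar{g}_3 - \bar{g}_2}\right) & \text{for } \bar{g}_2 \le t < \bar{g}_3 \\
\quad\vdots & \quad\vdots \\
P_i\left(g_{i,m-1} + (g_{i,m} - g_{i,m-1})\,\dfrac{t - \bar{g}_{m-1}}{\bar{g}_m - \bar{g}_{m-1}}\right) & \text{for } \bar{g}_{m-1} \le t \le \bar{g}_m,
\end{cases}
$$

where $g_{i,j}$ is the time of the $j$th gaze event in the $i$th epoch, and $\bar{g}_1, \bar{g}_2, \ldots, \bar{g}_m$ are the
gaze event reference times, the mean times of occurrence for each gaze event across
all the epochs being aligned: $\bar{g}_j = \frac{1}{n}\sum_{i=1}^{n} g_{i,j}$. Epoch alignment via piecewise linear
warping is illustrated in detail in Figure 6.2 and applied to averaging several signals
in Figure 6.3. This alignment technique is applied to the analysis of legend references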
[Figure 6.2 plot: change in pupil diameter over time for a single epoch, shown unwarped and warped.]
Figure 6.2: Illustration of piecewise linear warping applied to a single epoch of pupil
diameter data defined by four gaze events. In the original unwarped pupil diameter
data, shapes mark the times at which the gaze events occurred. The epoch is divided
into segments at the gaze events, and each segment is linearly transformed in time
so that the gaze events that bound it are moved into their reference positions in
time. These reference positions are determined by averaging the time of occurrence
of each gaze event across all epochs (see section 6.2.2). Figure 6.3 shows this warping
operation applied to several epochs at once before averaging them to reveal pupillary
responses that occur with consistent timing with respect to the gaze events.
in subsection 6.3.2.
It is important to note that epoch warping is a selective focusing operation. When
pupillary responses take place with respect to more than one gaze event, it can reveal
them, but at the same time it will obscure any pupillary responses that do not follow
that pattern.
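The warping operation can be illustrated with a short Python sketch. It is an illustration under assumptions, not the code used for the analyses in this chapter: the function names are hypothetical, recordings are assumed to be uniformly sampled from time zero, gaze event times are assumed to be strictly increasing within each epoch, and linear interpolation between samples is an added choice.

```python
def reference_times(event_times):
    """Mean time of each gaze event across epochs (one row of times per epoch)."""
    n = len(event_times)
    return [sum(column) / n for column in zip(*event_times)]


def warp_time(t, own, ref):
    """Map a time t on the reference timeline onto the epoch's own timeline.

    Inside segment j (ref[j] <= t <= ref[j + 1]) the result is
    own[j] + (own[j + 1] - own[j]) * (t - ref[j]) / (ref[j + 1] - ref[j]).
    """
    if not ref[0] <= t <= ref[-1]:
        raise ValueError("t lies outside the span of the gaze events")
    for j in range(len(ref) - 1):
        if t <= ref[j + 1]:
            frac = (t - ref[j]) / (ref[j + 1] - ref[j])
            return own[j] + (own[j + 1] - own[j]) * frac


def sample_at(samples, rate_hz, time):
    """Linearly interpolate a uniformly sampled signal at `time` seconds."""
    pos = max(0.0, time * rate_hz)
    k = int(pos)
    if k >= len(samples) - 1:
        return samples[-1]
    return samples[k] + (samples[k + 1] - samples[k]) * (pos - k)


def warp_epoch(samples, rate_hz, own, ref, grid):
    """Evaluate the warped epoch at each reference-timeline time in `grid`."""
    return [sample_at(samples, rate_hz, warp_time(t, own, ref)) for t in grid]


# Two epochs, each defined by three gaze events (times in seconds).
events_a = [1.0, 2.0, 4.0]
events_b = [1.0, 4.0, 6.0]
refs = reference_times([events_a, events_b])

ramp = [float(k) for k in range(100)]   # test signal at 10 Hz: value = 10 * time
warped_a = warp_epoch(ramp, 10, events_a, refs, [1.0, 2.0, 3.0, 4.0, 5.0])
```

In this example the first epoch's middle gaze event, which occurred at 2.0 s, is moved to the reference time of 3.0 s, so the samples before it are stretched and the samples after it are left at their original rate.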
6.2.3 Averaging
Once epochs have been aligned, they can then be averaged using the standard trial
averaging procedures used in traditional cognitive pupillometry, described in
subsection A.4.6. Epochs can be averaged using a simple mean: $\bar{P}(t) = \frac{1}{n}\sum_{i=1}^{n} B[T[P_i(t)]]$,
[Figure 6.3 plot: change in pupil diameter over time. Top half: the underlying pattern, trials 1 to 4, and their mean. Bottom half: warped epochs 1 to 4, the fixation-aligned mean, and the plus-minus average.]
Figure 6.3: Illustration of epoch alignment via piecewise linear warping followed by
averaging. The top half of the figure shows four simulated trials, each with four gaze
events (fixations or saccades) occurring at various times. As in Figure 6.1, simulated
pupil diameter data are the sum of random walks and the indicated pupillary response
relative to the four gaze events.
The bottom half of the figure shows the result of aligning each epoch via piecewise
linear warping. The average of the aligned signals reveals the underlying pupillary
responses, because they occurred with consistent timing relative to the gaze events.
The final line in the figure is the ±-average of the four warped epochs (see
section A.4.6), which indicates the level of noise present in the mean above it. As
in Figure 6.1, the magnitude of the signal relative to the background pupil noise is
exaggerated.
or $\frac{1}{n}\sum_{i=1}^{n} B[W[P_i(t)]]$, depending on whether translation or warping is used for
alignment. The ±-average can also be used in place of the standard average at this stage,
to evaluate the residual noise in the averaged signal (see section A.4.6).
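A minimal Python sketch of the averaging step follows. It assumes the epochs have already been aligned and baseline-subtracted and have equal length. The ±-average is defined in section A.4.6, which is not reproduced here; the version below assumes the common formulation in which the sign of every second epoch is flipped before averaging, so that a response shared by all epochs cancels and only noise remains. Function names are hypothetical.

```python
def mean_signal(epochs):
    """Pointwise mean of equal-length, already aligned epochs."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]


def plus_minus_average(epochs):
    """Pointwise mean with the sign of every second epoch flipped.

    Assumed definition: a response common to all epochs cancels, leaving an
    estimate of the residual noise. Intended for an even number of epochs.
    """
    n = len(epochs)
    signed = (
        [value if i % 2 == 0 else -value for value in epoch]
        for i, epoch in enumerate(epochs)
    )
    return [sum(column) / n for column in zip(*signed)]


# Four aligned epochs: a shared response [0, 1, 0] plus a per-epoch offset.
epochs = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 1.5, 0.5],
    [-0.5, 0.5, -0.5],
]
average = mean_signal(epochs)
noise = plus_minus_average(epochs)
```

Here the mean recovers the shared response, while the ±-average is flat, containing only what is left of the per-epoch offsets.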
6.3 Example applications
While eye trackers have been successfully used for stimulus-locked cognitive pupillom-
etry, it is not obvious that fixation-aligned signal averaging will work. It is possible
that the timing of cognitive processes is not consistent enough with respect to eye
movements or that the very act of looking around suppresses or interferes with the
task-related pupillary response. A successful application of fixation-aligned averaging
requires an averaged pupillary response which differs from its corresponding ±-average
and any relevant control conditions, and which is consistent with known patterns of
cognition for the studied subtask.
In the following two example analyses, I apply fixation-locked averaging to two
well-studied tasks, in order to illustrate its use and to demonstrate its validity. In
both tasks, I defined epochs using gaze events I expected to be strongly correlated
with shifts in cognitive load, in order to find out whether fixation-aligned averaging
revealed the expected shifts in excess of background pupillary noise.
6.3.1 Visual search
Visual search has been studied extensively with eye tracking [e.g. 31], and occasionally
with pupillometry [e.g. 97, 5], though signal averaging has only ever been applied with
respect to full task onset and completion. This section summarizes an application
of fixation-aligned pupillary response averaging to investigate shifts in cognitive load
that occur around the moments of search target discovery.
The averaging used in this example uses the single gaze events and translation
alignment described in Section 6.2.2 and illustrated in Figure 6.1.
Figure 6.4: A fragment of a search field used in my visual search study. Participants
searched for Ls (targets) in a field of Ts (distractors). Each character spanned about
0.73°. The scan path from one trial is shown in blue, with circle area proportional to
fixation duration and numbers giving the order of fixations. Fixation 17 is a target
fixation; all other fixations are non-target fixations.
Task description
I designed an exhaustive visual search task in which study participants counted the
number of Ls (targets) in a field of Ts (distractors) (See Figure 6.4). The search field
contained a variable number of targets, and each trial continued until the participant
found them all. Targets were often fixated more than once during a search, with later
fixations performed to confirm previously discovered targets' locations and avoid
over-counting.
Before the start of the task, a field of Xs was shown; when the search task started,
the Xs changed to Ts and Ls, so that task onset would not correspond to a change
in the brightness or contrast of the visual field, both of which could have caused pupil
reflexes.
Participants I recruited seventeen undergraduate participants, according to the

standard criteria described in section A.1.
Fixation identification I segmented scan paths into fixations using the dispersion
threshold technique (described by Widdel [121]; see also Salvucci and Goldberg [102]
for alternatives), with a minimum fixation duration of 160 ms, and a dispersion
threshold of 2°.
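For readers unfamiliar with dispersion-threshold segmentation, the following Python sketch shows one common formulation of it. It is not the code used in this study. The dispersion measure (horizontal range plus vertical range of the window), the 50 Hz sample rate in the example (inferred from the 20-sample, 0.4-second baseline mentioned later), and all function names are assumptions; only the 160 ms minimum duration and 2° threshold come from the text.

```python
def dispersion(points):
    """Dispersion of a window of gaze points: x range plus y range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(gaze, rate_hz, min_duration=0.160, max_dispersion=2.0):
    """Segment gaze samples into fixations by dispersion threshold.

    gaze: list of (x, y) positions in degrees, sampled at rate_hz.
    Returns (start_index, end_index_exclusive, centroid_x, centroid_y) tuples.
    """
    min_len = max(1, round(min_duration * rate_hz))
    fixations = []
    start = 0
    while start + min_len <= len(gaze):
        end = start + min_len
        if dispersion(gaze[start:end]) <= max_dispersion:
            # Grow the window until one more sample would exceed the threshold.
            while end < len(gaze) and dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            window = gaze[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end, cx, cy))
            start = end
        else:
            # No fixation starts here; slide the window forward by one sample.
            start += 1
    return fixations


# Ten samples at the origin, one in-flight saccade sample, ten samples at (10, 10).
gaze = [(0.0, 0.0)] * 10 + [(5.0, 5.0)] + [(10.0, 10.0)] * 10
found = detect_fixations(gaze, rate_hz=50)
```

The saccade sample between the two clusters belongs to neither fixation, because any minimum-length window containing it exceeds the dispersion threshold.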
Consecutive sequences of fixations that all fell within 1.25° of targets were grouped
into dwells, within which the fixation that fell closest to the target was labeled as
a target fixation. Fixations located within the search field but at least 5° from any
target, and excluding the first five and last five fixations of the trial, were labeled
as control fixations, included in the analysis to check for any consistent pupillary
response to fixation itself. Both target fixations and control fixations were used as
gaze events for selecting pupil data epochs for averaging.
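The labeling rules above might be sketched as follows. This is an illustrative reading of those rules rather than the code used in the study: it assumes a dwell consists of consecutive fixations near the same target, that every input fixation already lies within the search field, and that fixations are given as (x, y) positions in degrees. The function names are hypothetical.

```python
from itertools import groupby
from math import hypot


def nearest_target(fix, targets):
    """Return (distance, index) of the target closest to fixation (x, y)."""
    return min(
        (hypot(fix[0] - tx, fix[1] - ty), k) for k, (tx, ty) in enumerate(targets)
    )


def label_fixations(fixations, targets, near=1.25, far=5.0, trim=5):
    """Label each fixation 'target', 'control', or None.

    Consecutive fixations within `near` degrees of the same target form a
    dwell; only the fixation closest to the target in each dwell is labeled
    'target'. Fixations at least `far` degrees from every target are labeled
    'control', except the first and last `trim` fixations of the trial.
    """
    nearest = [nearest_target(f, targets) for f in fixations]
    labels = [None] * len(fixations)

    def dwell_key(item):
        dist, target_idx = item[1]
        return target_idx if dist <= near else None

    for target_idx, run in groupby(enumerate(nearest), key=dwell_key):
        run = list(run)
        if target_idx is not None:
            best = min(run, key=lambda item: item[1][0])
            labels[best[0]] = "target"

    for i, (dist, _) in enumerate(nearest):
        if dist >= far and trim <= i < len(fixations) - trim:
            labels[i] = "control"
    return labels


targets = [(0.0, 0.0)]
trial = (
    [(10.0, 10.0)] * 5                                         # trimmed start of trial
    + [(10.0, 10.0), (1.0, 0.0), (0.5, 0.0), (8.0, 8.0)]
    + [(10.0, 10.0)] * 5                                       # trimmed end of trial
)
labels = label_fixations(trial, targets)
```

In the example, the two fixations near the target form one dwell, and only the closer of the two becomes a target fixation.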
Results
Target fixations vs. control fixations Figure 6.5 shows the average pupillary re-
sponse to target and control (non-target) fixations, aligned to the start of the fixation,
and showing a few seconds of pre- and post-fixation context. For baseline subtraction,
I used a baseline interval 0.4 seconds (20 samples) long, starting 1.75 sec before the
fixations. I found a clear difference in pupillary responses to the two different types
of event. Fixations far from targets had no consistent pupillary response and so av-
eraged to an approximately flat line, while fixations on targets resulted in a dilation
of about 0.06 mm.
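The baseline subtraction described above can be sketched in Python as follows. The interval length (0.4 s) and its start (1.75 s before the fixation) are taken from the text; the 50 Hz rate follows from 20 samples spanning 0.4 s. The function name and the use of a plain mean over the baseline interval are assumptions, not a description of the actual analysis code.

```python
def subtract_baseline(epoch, rate_hz, event_index, lead=1.75, length=0.4):
    """Subtract the mean pupil diameter of a pre-event baseline interval.

    epoch:       pupil diameters sampled at rate_hz
    event_index: index of the sample at which the fixation starts
    lead:        how long before the event the baseline interval starts (s)
    length:      duration of the baseline interval (s)
    """
    first = event_index - round(lead * rate_hz)
    count = round(length * rate_hz)
    if first < 0:
        raise ValueError("epoch does not reach back far enough for the baseline")
    baseline = sum(epoch[first:first + count]) / count
    return [value - baseline for value in epoch]


# Pupil diameter of 3.0 mm before the fixation and 3.5 mm after it.
raw = [3.0] * 100 + [3.5] * 100
relative = subtract_baseline(raw, rate_hz=50, event_index=100)
```

After subtraction, each epoch expresses change in pupil diameter relative to its own pre-fixation level, which is the quantity that is then averaged across epochs.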
Surprisingly, the averaged dilation begins about one second before fixation on the
target. A further breakdown of the data by difficulty and fixation sequence shows the
cause:
Target discoveries vs. target revisits Figure 6.6 shows the same target fixations,
but grouped in two averages, one for all the first fixations on each target (discoveries)
and one for fixations on targets that have already been fixated during the search
[Figure 6.5 plot: "Pupillary response to fixation on targets vs. non-targets". Series: far from targets (1511 fixations) and on targets (770 fixations). Horizontal axis: time in seconds, from -2 to 3, with the start of fixation marked; vertical axis: -0.02 to 0.06.]
Figure 6.5: Fixations on targets vs. fixations on non-targets. Each line in the chart
represents the average of many fixations of each type. The shaded regions surrounding
each line indicate the standard errors for those averages. Fixations far from targets
had no consistent pupillary response and so averaged to an approximately flat line,
while fixations on targets resulted in a dilation of about 0.06 mm.
(revisits). The dilation response to revisits begins at least one second before the
target fixation, perhaps reflecting recall of the previously identified target or saccade
planning in order to re-confirm its location.
Target discovery sequence Finally, Figure 6.7 shows the average pupillary re-
sponse to discovering the 1st, 2nd, and 3rd targets during the search. The magnitude
of the average dilation is larger in response to 3rd target discoveries than to 1st and
2nd discoveries. The first two discoveries are signs of task progress, but finding a
third target means that exhaustive search is no longer needed and the task is complete.
Discussion
As a check on the validity of fixation-aligned pupillary response averaging, this case

study was a success. I observed averaged dilations well above the background noise
level and which differed substantially between fixations on targets and fixations on
non-targets. In addition, the time resolution of the averaged signals turned out to be fine
enough to suggest differences in memory dynamics surrounding target discoveries vs.
revisits.
A more thorough analysis of visual search would use more complicated attributes
of the scan path, like the fraction of search area that has been covered, to identify
additional subtasks, or explore how pupillary responses vary over the course of the
search. Additionally, the differences in timing and magnitude of pupillary reactions
could be analyzed between subjects, or with respect to task performance.
[Figure 6.6 plot: "Pupillary response to first vs. later fixations on targets". Series: Target Discovery (449 fixations), Target Revisit (321 fixations), and Control (fixations on non-targets) (1511 fixations). Horizontal axis: time in seconds, from -2 to 3, with the start of fixation marked; vertical axis: -0.04 to 0.08.]
Figure 6.6: Average pupillary responses to first (discovery) fixations on targets vs.
later (revisit) fixations on targets. The dilation response begins about one second
before the fixation for revisits.
[Figure 6.7 plot: Series: Discovery of 1st Target (359 fixations), Discovery of 2nd Target (223 fixations), Discovery of 3rd Target (89 fixations), and Control (fixations on non-targets) (2557 fixations). Horizontal axis: time in seconds, from -2 to 3, with the start of fixation marked; vertical axis: -0.05 to 0.20.]
Figure 6.7: Average pupillary responses to first fixations on targets (discoveries),
grouped by how many targets had previously been discovered. The magnitude of the
average dilation is larger in response to 3rd target discoveries than to 1st and 2nd
discoveries.
6.3.2 Map legend reference
The second example application of fixation-aligned pupillary response averaging uses
more complicated epochs, defined using multiple gaze events and aligned with warping.
Task description
In a study of map reading, participants examined a fictitious map showing the loca-
tions of frog and toad habitats. The map uses abstract symbols to show places where
frogs and toads live, with each symbol standing for a different species. The symbols
are identified in a legend providing the species name and classification as frog or toad
(Figure 6.8). In reading the map, participants must look up the abstract symbols in
the legend to learn which of them correspond to frogs and which correspond to toads.
It is these legend references which I analyze here.
Participants & apparatus. This study included fifteen undergraduates, distinct
from those in the first study but subject to the same selection criteria. The eye tracker
was the same.
Identifying legend reference epochs
Epochs of pupil measurements encompassing legend references were identified using
scan paths. I defined a legend reference as a fixation on a cluster of map symbols,
followed by a fixation on the legend, followed by a return saccade to that same cluster
of map symbols. An example epoch is shown in Figure 6.8. I used the saccades to
and from the legend as gaze events on which to align epochs via piecewise warping
before averaging (see section 6.2.2). Baseline intervals used for baseline pupil diameter
subtraction were 0.4 sec (20 samples) long, starting 1.5 sec before the saccade from
the map to the legend. I calculated fixations using dispersion threshold clustering as
described in section 6.3.1.
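The scan-path rule for legend references can be sketched in code. The following Python fragment is a minimal illustration and not the analysis code used in the study. It assumes that each fixation has already been labeled with the region it landed on (a hypothetical cluster name, the string "legend", or None for anything else), and that a legend dwell may span several consecutive fixations. The function name find_legend_references is likewise an assumption.

```python
from typing import List, Optional, Tuple

LEGEND = "legend"  # hypothetical label for fixations that land on the legend


def find_legend_references(regions: List[Optional[str]]) -> List[Tuple[int, int]]:
    """Return (out, back) fixation-index pairs for each legend reference.

    `out` is the last map fixation before a legend dwell and `back` is the
    first fixation after it. A dwell counts as a legend reference only when
    both fixations fall on the same (non-None) cluster of map symbols.
    """
    epochs = []
    i = 1
    while i < len(regions):
        if regions[i] == LEGEND and regions[i - 1] != LEGEND:
            out = i - 1
            j = i
            # Skip over every consecutive fixation on the legend.
            while j < len(regions) and regions[j] == LEGEND:
                j += 1
            same_cluster = (
                j < len(regions)
                and regions[out] is not None
                and regions[j] == regions[out]
            )
            if same_cluster:
                epochs.append((out, j))
            i = j
        else:
            i += 1
    return epochs
```

For a returned pair (out, back), the saccade leaving fixation out and the saccade arriving at fixation back are the two gaze events on which an epoch would be aligned.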
Figure 6.8: A fragment of a task map and corresponding legend. A sample scan
path is shown in blue, with circle area proportional to fixation duration and numbers
indicating the order of fixations. A legend reference epoch begins at the time of
fixation 7 and ends at the time of fixation 13. The gaze events used for alignment of
this epoch are the 8→9 and 12→13 saccades.
Results
I expected a momentary dilation preceding each legend reference, caused by the need
to store the symbol or symbols being looked up in visual working memory during the
saccade to the legend and the search there for the matching symbol(s). This pattern
did emerge in the averaged pupil response, along with other changes also correlated
with the legend reference saccades (Figure 6.9). On average, participants' pupils
contracted while looking at the legend before recovering nearly to their pre-saccade
diameter. Although the changes were small (on the order of 0.05 mm), I collected
enough epochs (925 legend references) that these changes stood out above the noise
level indicated by the ±-average.
Discussion
Like the visual search study, this application of fixation-aligned pupillary response
averaging succeeded. I found a pupillary response evoked by the subtask of referencing
the map legend which substantially differed from its corresponding ±-average, though
in this case several hundred trials were required to reveal the response.
The changes in pupil diameter which I observed are intriguing; in addition to the
simple pre-reference dilation I expected, I also observed a pupil constriction during
the dwell on the legend. This pattern is consistent with the loading of visual working
memory with the symbol to be looked up, the release of that memory once the symbol
has been located in the legend, and a final increase in load as the participant's running
count of frogs is updated depending on the symbol's classification.
Unfortunately, the legend reference task is complicated enough that it is difficult
to associate the patterns in pupil dilations with specific cognitive activities like the
use of memory. Without more careful experiments that control the context of the
averaged gaze epochs, speculative interpretation of the sort given in the previous
paragraph is largely unjustified just-so-story telling. The difficulty of using a one-
dimensional physiological proxy that is affected by many kinds of mental activity to
understand complicated tasks is one of the basic limitations of cognitive pupillometry
(see section 7.1).
[Figure 6.9 plot. Series: Mean Pupil Response, Plus-Minus Average. Horizontal axis in three segments: Time (seconds before legend saccade) while looking at map symbols, -1.0 to 0.0; Time (fraction of legend dwell) while looking at the legend, 0% to 100%; Time (seconds after return to map) while looking at map symbols, 0 to 1.0. Vertical axis ticks: -0.04 to 0.02.]
Figure 6.9: Average pupillary response to 925 legend references in a map reading
task. Black circles indicate reference gaze event times. The semi-transparent regions
bounding each curve show the standard errors of the mean at each time for the
plotted average. The average dwell on the legend lasted about 1.2 seconds. The
average pupillary response includes a 0.02 mm dilation prior to saccade to the legend,
a 0.06 mm constriction and recovery while looking at the legend, and a return to the
map at a slightly higher pupil diameter.
6.4 Conclusions
This chapter described fixation-aligned pupillary response averaging, a new method
for combining synchronized measurements of gaze direction and pupil size in order to
assess short-term changes in cognitive load during unstructured visual tasks. Com-
ponents of the visual tasks with consistent demands but variable timing are located
by analyzing scan paths. Pupil measurements made during many instances of each
task component can then be aligned in time with respect to fixations and averaged,
revealing any consistent pupillary response to that task component.
This new mode of analysis expands the scope of tasks that can be studied using
cognitive pupillometry. With existing stimulus-locked averaging methods, only shifts
in cognitive load that occur relative to experimenter-controlled stimuli are measurable,
but with fixation-aligned averaging, pupillary responses can also be used to study any
shifts in cognitive load that occur consistently with respect to patterns of attention
detectable in gaze direction data.
In the example study of visual search described in subsection 6.3.1, the timing
differences in pupillary responses to target discoveries and revisits, which show the
recall of previously-visited targets, are only detectable through fixation-aligned av-
eraging. Similarly, the shifts in cognitive load surrounding subject-initiated legend
references described in subsection 6.3.2 could only be detected by determining the
timing of those legend references using gaze direction data and then using that tim-
ing information to align and average pupil diameter measurements.
There are many other tasks that could be studied using this method. Reading,
for example, which has been studied using eye tracking [98] and pupillometry [53]
separately, has many gaze-signaled cognitive events, such as back-tracking, which
could be studied with fixation-aligned averaging of pupil measurements.
Chapter 7
Unsolved problems
7.1 Current limitations
7.1.1 Simple, short tasks
Cognitive pupillometry can be employed when tasks can be reliably constrained by
experimental design, or, in the case of fixation-aligned averaging, when epochs can
be reliably identified from gaze direction data and when changes in cognitive load
during these epochs occur with consistent timing relative to the gaze events that
define the epochs. In practice, this limits both the specificity and duration of tasks
that can be studied. For specificity, either the general task must be constrained
enough or the gaze events defined specifically enough that cognitive processes are
consistent across epochs. For example, in the map-reading task (subsection 6.3.2), I
only considered legend references which began and ended at the same point in the
map, to avoid including fixations on the legend that were simple orientation to the
display, or otherwise did not involve looking up a particular symbol.
Even when epochs containing consistent cognitive processes are identified, the
requirement that the timing of those processes be consistent with respect to the
fixations and saccades which define the epochs is generally only satisfied for short
epochs, in practice usually 2–5 seconds long.
7.1.2 Restrictions on task display

Because pupils respond reflexively to the brightness [68] and contrast level [116] of
whatever is currently fixated, the task display must be designed with spatially uni-
form brightness and contrast. In addition, epochs occurring soon after changes to
the display can be contaminated by reflex dilations to motion cues [101] (see also
section 5.2).
In all the studies presented here, I used static, low-contrast stimuli with spatially-
uniform brightness (e.g. Figures 6.4 and 6.8). In addition, I left a large margin
between the edge of the stimulus and the boundary of the display, because when
subjects fixate near that boundary, their field of view includes the display bezel and
the wall behind, both of which are difficult to match in brightness to the stimulus
itself.
7.1.3 Restrictions on interaction

Any user interaction, such as mouse or keyboard use, needs to be well separated
in time from the subtask epochs studied, because the preparation and execution of
motor actions also causes momentary pupil dilations [99]. All studies described in this
dissertation were designed without any interaction during the task. Analysis must
generally exclude data gathered within two seconds of the button pushes that
participants use to initiate or complete each trial.
7.2 Future research
7.2.1 Disentangling various pupillary influences

The same principle that inspires research in cognitive pupillometry also limits its
use in the study of internal cognitive processes. Recall Bumke's 1911 observation [21],
first quoted at the start of this dissertation:
Every active intellectual process, every psychical effort, every exertion
of attention, every active mental image, regardless of content, particularly
every affect just as truly produces pupil enlargement as does every sensory
stimulus.
In section 6.2.2, I described a technique for warping pupil data before averaging
which enables high-precision measurements of pupillary responses to subtasks even
when those subtasks occur with variable timing. I showed how this technique could
be used to analyze references to legends in a map reading task (see subsection 6.3.2).
However, as the scope of cognitive pupillometry is pushed toward subtasks of
greater complexity, it becomes more and more difficult to call the pupillary responses
thus measured "cognitive load." In tasks like digit sequence memorization (section 4.1),
the cognitive process of solving the task is well understood. We know
that short-term verbal memory is being used to store the digits, so it is easy to call
the matching pattern of pupil dilations observed during that task a proxy for load on
working memory.
But when referencing the legend, there are many possible cognitive processes that
could be contributing to the pupillary response, including holding the symbol being
looked up in short-term visual or verbal memory, searching the legend key for the
desired symbol, maintaining a count of the number of frog symbols found so far,
and finding one's way to and from the legend itself. All of these processes have the
potential to cause pupillary responses.
More experiments can be done to help disentangle the effects of these different
task components, by varying the task demands to eliminate some of them. Even so,
as cognitive pupillometry is pushed toward more complex tasks, this entanglement
of all the different sources of pupil motions will limit its applicability. To go further,
it will be necessary to combine pupillometry with other kinds of measurements that
can isolate types of mental effort.
7.2.2 Combining pupillometry with other psychophysiological measurements
Some researchers have combined pupillometry with eye movement parameters like fix-
ation frequency [e.g. 5] or with EEG [74] to make composite measurements of mental
effort. Any technology that measures the spatial distribution of brain activity, like
EEG [37] or fMRI [45], has the potential to help disentangle the different influences
on a monolithic quantity like cognitive load.
7.2.3 Modeling and compensating for the pupillary light reflex
In all my experiments, I carefully controlled the visual field of participants in order
to minimize interference from the pupil's sensitivity to variations in brightness. Light
reflexes are so large and neurologically dominant that they have the potential to
overwhelm task-induced pupil dilations. In future experiments, I plan to do even more
to control these reflexes, by matching the luminance of the screen to the luminance
of the wall behind it, and by covering the bezel of the eye tracker with masking tape
or some other light-colored mask, to match it with the back wall as well and remove
variations in the luminance of the visual field caused by looking around.
However, for cognitive pupillometry to be applied to more realistic visualization
tasks, it will be necessary to lift these limitations on the visual field. One avenue for
doing so may be to model the pupillary light reflex, use eye tracking to estimate the
luminance of the visual field, and then use the model to estimate the contribution of
the light reflex to the pupil's size. A few researchers have attempted to adjust
pupil diameter data to compensate for the overall luminance of stimuli [94, 83], but
these approaches only model constant luminance, so are not yet applicable to
trial-averaged cognitive pupillometry.
The neurology of the pupillary light reflex is very well understood [68], and pupil
size is determined by two simply-shaped opposing muscles. I believe that this neuro-
physical system is simple enough to be successfully modeled, so that the light reflex
caused by any given visual field could be predicted. I do not know whether the joint
influence of the light reflex and cognitive pupillary responses could be modeled. This
is an important area of research for extending the scope of cognitive pupillometry.
The recent success of Palinko et al. [90] in measuring cognitive load via remote
pupillometry in a driving simulator is encouraging in this regard, because it shows
that task-evoked dilations are measurable even in a complicated visual environment
with frequent shifts in gaze, at least when the task being studied is presented aurally
with timing not correlated to the subject's locus of attention.
7.2.4 Expanding proof-of-concept studies

The contributions of this dissertation are mainly methodological. The experiments
described in chapter 4, section 5.3, and section 6.3 were designed to validate the
methods rather than explore the cognitive psychology of the tasks. They demonstrate
that cognitive pupillometry with a remote camera eye tracker can be used to study
cognitive load in a variety of tasks, but they do not tell us much new about how
people think about and complete those tasks. The few conclusions I was able to draw,
regarding differences caused by aural vs. visual presentation of tasks (section 5.3.6),
are very exciting, and it will be fascinating to explore them further, along with other
applications of this new measurement method.
Appendix A
Experimental Methods
The following details of my experimental methods apply to all the studies described
in this dissertation.
A.1 Participants
Participants in all my studies were college students recruited from the Computer
Science and Communications departments at Stanford University. All were in a pool
of students in introductory HCI, design and communications classes who were required
to participate in experiments on campus for course credit.
Besides awarding this course credit, I also compensated participants with
Amazon.com gift certificates. The value of each participant's gift certificate depended
on his or her task performance and varied from about $15 for the lowest scores to
about $35 for the highest. Such monetary incentive was shown by Heitz et al. [41] to
increase the magnitude of task-evoked pupillary dilations.
I screened all participants for normal or corrected-to-normal vision. I excluded
participants with contact lenses or eyeglasses providing an astigmatism correction or
a refractive correction greater than ten diopters, because such corrective lenses can
interfere with accurate pupil diameter tracking.
A.2 Apparatus
I used a Tobii 1750 remote eye tracker [114]. This device is designed primarily to
track people's gaze direction, but its method of gaze tracking also enables high-speed
pupillometry [63]. The eye tracker is based on a standard LCD computer display,
with infrared lights and a high resolution infrared camera mounted at the edges of
the screen. This remote-camera setup requires neither a chin rest nor a head-mounted
camera, enabling pupil measurements without encumbrance or distraction. Measurements
are corrected for changes in apparent pupil size due to head motion toward or
away from the camera. Accurate pupil tracking with this equipment requires a head
motion speed of less than 10 cm/sec within a head box of about 30 × 15 × 20 cm at
our initial seating distance of 60 cm from the screen.
Under infrared illumination, participants' pupils appear as bright ovals in the
eye tracker's camera image. The Tobii 1750 measures the size of a participant's
pupil by fitting an ellipse to the pupil image, then converting the width of the major
axis of that ellipse from pixels to millimeters based on the measured distance from
the camera to the pupil. Due to inaccuracy in this measurement of camera-to-pupil
distance, measurements of absolute pupil size may have errors of up to 5%, but
sample-to-sample changes in pupil diameter are much more accurate [113]. This
better accuracy for relative measures makes eye trackers well suited for cognitive
pupillometry, where the measurement of interest is usually the change in pupil diameter
relative to its diameter at the end of an accommodation period preceding each trial
[13]. This measure has been found to be independent of baseline pupil diameter and
commensurate across multiple labs and experimental procedures [12, 20, 19].
The Tobii 1750 samples pupil size at 50 Hz with each sample measuring both eyes
simultaneously. For gaze direction, the Tobii 1750 has a resolution of 0.25° and an
accuracy of 0.5°.
Figure A.1: Physical arrangement of equipment, experimenter, and participant used
for all studies.
A.3 Physical setup
I placed the eye tracker on a desk with the top of the screen approximately 140 cm
from the floor. Participants sat in a chair adjusted so that their eyes were at this
same height. Participants initiated trials and gave task responses using a two-button
computer mouse on the desk between them and the eye tracker. The physical setup
is shown in Figure A.1.
A.3.1 Room illumination
The size of the pupil is controlled by the relative tone of two opposing smooth muscles
in the iris: the parasympathetically innervated, stronger sphincter pupillae and
the sympathetically innervated, weaker dilator pupillae. The task-evoked pupillary
response involves tone changes in both muscles [68]. First, parasympathetic inhibition
caused by cortical activity or motor response preparation causes the sphincter pupillae
to relax, and then sympathetic excitation causes the dilator pupillae to contract,
further expanding the pupil [111, pp. 197–200]. Because the sphincter muscles are
more active in bright surroundings, the first of these effects is larger under brighter
ambient lighting. Thus, the level of ambient lighting can affect dilation onset latency
and latency to peak, but not the total change in pupil diameter as measured in
millimeters [112].
In order to get the most accurate measurements of the timing of changes in pupil
size, I used relatively bright ambient illumination. I blacked out all windows and
used standard overhead diffused fluorescent lighting, leading to 27 cd/m² of luminance
from the surrounding walls at eye level and 32 lx incident at participants' eyes.
Because bright-environment pupillometry is more common than dark, this choice also
facilitated comparison to other studies.
Procedure
Before each task, I explained the task to participants then allowed them to practice
until they were familiar and comfortable with the task presentation and providing
their responses.
All trials were initiated by participants, who first fixated a small target at the
center of the screen before starting the trial by clicking a mouse button. Participants'
gaze thus remained at the center of the screen for the duration of each trial and during
most of the short intervals between trials. A run of trials for a single task generally
took about five minutes. I told participants that they could take breaks at any point
between trials to rest their eyes; two did so.
A.4 Data processing

Because the left and right eyes exhibit matching pupillary responses, I used the average
of the two eyes' pupil diameters to reduce measurement noise. During moments
when an eyelid, eyelash, or eyeglasses frame blocked the camera's view of one pupil,
I used the other pupil alone. I performed standard baseline subtraction in each trial
based on the average pupil diameter measured over 20 samples (400 ms) at the end
of a pre-stimulus accommodation period. After filling blinks via linear interpolation,
I smoothed the raw pupil signals with a 10 Hz low-pass digital filter. The effect of
these data cleaning steps is illustrated in Figure A.2.
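Two of the cleaning steps described here, combining the two eyes and filling blinks, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the code used for the studies: missing samples are represented as None, the function names combine_eyes and fill_blinks are hypothetical, and a sample from a single eye is used unscaled. The 10 Hz low-pass filter is omitted because its design is not specified in the text.

```python
from typing import List, Optional, Sequence

Sample = Optional[float]  # pupil diameter in mm, or None when the eye was lost


def combine_eyes(left: Sequence[Sample], right: Sequence[Sample]) -> List[Sample]:
    """Average the two eyes, falling back to one eye when the other is missing."""
    combined: List[Sample] = []
    for l, r in zip(left, right):
        if l is not None and r is not None:
            combined.append((l + r) / 2)
        elif l is not None:
            combined.append(l)
        else:
            combined.append(r)  # still None when both eyes are lost (a blink)
    return combined


def fill_blinks(samples: Sequence[Sample]) -> List[Sample]:
    """Fill interior runs of None by linear interpolation between valid neighbors.

    Gaps at the very start or end of the signal have only one valid neighbor
    and are left as None.
    """
    filled = list(samples)
    valid = [i for i, v in enumerate(filled) if v is not None]
    for a, b in zip(valid, valid[1:]):
        for i in range(a + 1, b):
            fraction = (i - a) / (b - a)
            filled[i] = filled[a] + fraction * (filled[b] - filled[a])
    return filled
```

A cleaned signal would then be obtained with fill_blinks(combine_eyes(left, right)), after which baseline subtraction and smoothing could be applied.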
A.4.1 Smoothing
The pupil measurements made by eye trackers are rather noisy, and remote eye trackers
are especially bad, because the freedom of head motion poses two additional problems.
First, unless the eye tracker's camera is actively pointed at the eyes, it must
maintain a wider field of view, in which fewer pixels can be devoted to observing the
pupil. Second, pupil measurements must be corrected for foreshortening by dividing
the raw pixel-based pupil size by the distance from the camera to each eye. Drift,
tremors, and non-spherical eye shape introduce noise into this distance measure [28],
which in turn causes a noisy pupil size signal.
Because this instrument noise is high-frequency, and pupils are known to dilate
and constrict at low frequencies [77], I smoothed the pupil size signal with a low-
pass filter. To determine the appropriate cutoff frequency for this filter, I analyzed
the correlation between the pupil size signal from the left and right eyes at different
frequencies. Since the instrument noise is independent for the two eyes, I expected
the noisy frequency components of the pupil signal to be uncorrelated. In contrast,
the frequency components containing the true pupil size signal should be correlated,
because they are driven by general cognitive activation, which affects both eyes [13].
To find the boundary between these two parts of the signal, I applied a band-pass
filter with a bandwidth of 0.5 Hz and varying central frequencies to the pupil
size signal of each eye separately to isolate their individual frequency components
and computed the correlation between the left and right eyes at different frequencies
(Figure A.3).
[Figure A.2 plot. Series: Raw Data (Left Eye), Raw Data (Right Eye), Cleaned Data (Left Eye), Cleaned Data (Right Eye). Horizontal axis: Time (seconds), 13.5 to 15.5. Vertical axis: Pupil Diameter (mm), 3.6 to 4.4.]
Figure A.2: Illustration of data cleaning steps applied to a two-second period of pupil
measurements gathered during a single trial. The gap in left eye data at t = 13.7
has been filled in with scaled right-eye data. The blink at t = 14.2 has been filled by
linear interpolation, and the outlier near its end has been removed. Data from both
eyes are smoothed with a 10 Hz low-pass filter.
[Figure A.3 plot. Series: Correlation between pupil size of the left and right eyes, with the 95% confidence interval for each correlation coefficient. Horizontal axis: frequency component (Hz), 0 to 25. Vertical axis: Correlation Coefficient, 0 to 1.]
Figure A.3: Correlation between the measured pupil size of the left and right eyes,
by frequency component. Pupil size frequency components above about 10 Hz are
uncorrelated. I therefore considered this part of the pupil signal to be noise and
removed it with a low-pass filter.
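The band-by-band comparison of the two eyes can be illustrated with a short Python sketch. It is not the analysis code behind Figure A.3, and it makes several assumptions: the band-pass filter is taken to be an ideal (brick-wall) filter, in which case the correlation of the two filtered signals can be computed directly from their discrete Fourier coefficients inside the band; the function names dft_bins and band_correlation are hypothetical; and the demonstration signals are synthetic.

```python
import cmath
import math
import random
from typing import List, Sequence

SAMPLE_RATE_HZ = 50.0  # pupil sampling rate of the eye tracker described in A.2


def dft_bins(signal: Sequence[float], bins: Sequence[int]) -> List[complex]:
    """Return the DFT coefficients of `signal` at the requested bin indices."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * j / n) for j, x in enumerate(signal))
        for k in bins
    ]


def band_correlation(left: Sequence[float], right: Sequence[float],
                     center_hz: float, bandwidth_hz: float = 0.5,
                     rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Correlate two signals after an ideal band-pass around `center_hz`.

    By Parseval's relation, the Pearson correlation of the two band-limited
    signals equals the normalized real part of their cross-spectrum summed
    over the bins in the band (DC and Nyquist bins are excluded).
    """
    n = len(left)
    low, high = center_hz - bandwidth_hz / 2, center_hz + bandwidth_hz / 2
    bins = [k for k in range(1, n // 2) if low <= k * rate_hz / n <= high]
    if not bins:
        raise ValueError("no frequency bins fall inside the requested band")
    xs, ys = dft_bins(left, bins), dft_bins(right, bins)
    cross = sum((x * y.conjugate()).real for x, y in zip(xs, ys))
    power_x = sum(abs(x) ** 2 for x in xs)
    power_y = sum(abs(y) ** 2 for y in ys)
    return cross / math.sqrt(power_x * power_y)


# Synthetic demonstration: a shared 1 Hz component plus independent noise per eye.
rng = random.Random(0)
shared = [0.1 * math.sin(2 * math.pi * 1.0 * j / SAMPLE_RATE_HZ) for j in range(500)]
left_eye = [s + rng.gauss(0, 0.02) for s in shared]
right_eye = [s + rng.gauss(0, 0.02) for s in shared]
low_band = band_correlation(left_eye, right_eye, center_hz=1.0)    # shared signal dominates
high_band = band_correlation(left_eye, right_eye, center_hz=20.0)  # independent noise only
```

With only ten seconds of synthetic data, the correlation in a noise-only band fluctuates considerably around zero, which is one reason confidence intervals of the kind shown in Figure A.3 are needed to locate the cutoff.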
A.4.2 Perspective distortion
Pomplun and Sunkara [95] reported a systematic dependence of pupil size on gaze
direction. I replicated the ascending numeral visual search task they used to check for
this bias but did not find it in our pupil measurements. I believe this is because the
system used by Pomplun and Sunkara measured pupil size as the number of pixels
encompassed by the pupil image, and optical perspective causes this size to vary with
gaze direction. The Tobii 1750 I used instead measures pupil size as the length of
the major axis of an ellipse fitted to the pupil image. This method is not affected by
perspective distortion, though it is still subject to small errors caused by non-circular
pupil shape. I recommend either using the ellipse-fitting method or calibrating for
the bias as per Pomplun and Sunkara.
A.4.3 Data processing for statistical evaluation of differences in dilation magnitude
I quantified dilation magnitudes with the mean amplitude method [37, p. 38; 13, p.
148]. This method involves first measuring a baseline pupil size for each trial by
averaging pupil size during a pre-stimulus accommodation period, then computing
the average pupil size relative to this baseline during a response window defined for
each task. I chose the mean dilation quantification method over the also-common
peak dilation, because the latter is more sensitive to noise. I quantified each trial
separately, enabling statistical evaluation of effect size and significance.
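As a rough illustration of the difference between the two quantification methods, the following Python sketch computes both for a single trial. The function names and the index-based window arguments are assumptions made for this example; the actual baseline and response windows were defined separately for each task.

```python
from typing import Sequence, Tuple

Window = Tuple[int, int]  # half-open range of sample indices: [start, stop)


def _baseline(trial: Sequence[float], window: Window) -> float:
    start, stop = window
    return sum(trial[start:stop]) / (stop - start)


def mean_dilation(trial: Sequence[float], baseline_window: Window,
                  response_window: Window) -> float:
    """Mean amplitude: average pupil size in the response window minus baseline."""
    start, stop = response_window
    mean_response = sum(trial[start:stop]) / (stop - start)
    return mean_response - _baseline(trial, baseline_window)


def peak_dilation(trial: Sequence[float], baseline_window: Window,
                  response_window: Window) -> float:
    """Peak amplitude: largest pupil size in the response window minus baseline."""
    start, stop = response_window
    return max(trial[start:stop]) - _baseline(trial, baseline_window)
```

Because the peak measure depends on a single extreme sample, one noisy measurement inside the response window shifts it directly, whereas the mean measure spreads that error over every sample in the window.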
A.4.4 Significance tests

I used an alpha level of .05 for all statistical tests. Tests of differences in mean dilation
magnitude were all based on partitions of variance (ANOVA). Following the policy
of Jennings [52], I applied the Huynh-Feldt [46] correction to degrees of freedom for
within-subjects factors with more than two levels. In such cases, I report the
Huynh-Feldt non-sphericity correction parameter ε, the uncorrected degrees of freedom, and
the corrected p-value. I evaluated the significance of differences in error rates through
one-tailed tests for equality of proportions with Yates' continuity correction [78].
A.4.5 Baseline subtraction

In cognitive pupillometry, the physical quantity of interest is the change in pupil
diameter relative to its diameter shortly before the mental activity being studied
[13]. That is, what matters is dilation (or constriction), not absolute pupil size.
The magnitude of dilation responses to simple tasks is independent of baseline pupil
diameter and commensurate across multiple labs and experimental procedures [12,
20, 19].
This means that the pupil data we are averaging needs to be transformed from
absolute pupil diameter measurements to dilation measurements. This transformation
is accomplished by first determining the baseline pupil size for each epoch by averaging
the pupil diameter measurements during the epoch (or during a short window of time
at the start of the epoch or surrounding its gaze event), then subtracting that baseline
diameter from all pupil diameter measurements made in the epoch. Formally, if the
time interval chosen for the baseline extends from $t = b_1$ to $t = b_2$, subtracting
the mean pupil diameter during that interval from the full signal gives
$B[P_i(t)] = P_i(t) - \frac{\Delta t}{b_2 - b_1} \sum_{\tau = b_1}^{b_2} P_i(\tau)$,
where $\Delta t$ is the sampling interval of the eye tracker.
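The baseline subtraction just defined can be written as a short Python function. This is a sketch under stated assumptions: the epoch is a list of diameters in millimeters sampled at a fixed interval, the baseline window is treated as the half-open interval from b1 to b2 so that it contains (b2 - b1) / Δt samples, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def subtract_baseline(diameters_mm: Sequence[float], b1_s: float, b2_s: float,
                      sample_interval_s: float = 0.02) -> List[float]:
    """Convert absolute pupil diameters to dilations relative to a baseline.

    The baseline is the mean diameter over the samples whose times fall in
    [b1_s, b2_s), with time measured from the first sample of the epoch.
    The default interval of 0.02 s corresponds to 50 Hz sampling.
    """
    first = round(b1_s / sample_interval_s)
    last = round(b2_s / sample_interval_s)
    window = diameters_mm[first:last]
    if not window:
        raise ValueError("baseline window contains no samples")
    baseline = sum(window) / len(window)
    return [d - baseline for d in diameters_mm]
```

With 50 Hz sampling, a 0.4 second baseline window corresponds to 20 samples.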
This transformation from diameters to dilations has an important implication for
the precision of pupil measurements. For cognitive pupillometry applications, an eye
tracker's accuracy in measuring changes in pupil diameter is much more important
than its accuracy in measuring absolute pupil size.
A.4.6 Averaging
Trials can be averaged using a simple mean: $\bar{P}(t) = \frac{1}{n} \sum_{i=1}^{n} P_i(t)$. If the data are
messy, it may be better to use a trimmed mean or the median instead. The averaged
pupillary response $\bar{P}(t)$ is the main object of analysis in cognitive pupillometry.
Averaging epochs containing consistent pupillary responses preserves the pupillary
responses while decreasing the magnitude of the noise in which they are embedded.
Because the noise component of the signal is random with respect to the gaze events,
the magnitude of the noise average (its standard deviation) decreases in inverse proportion
to the square root of the number of epochs included in the average. Cutting the noise
by a factor of two requires quadrupling the number of epochs. The actual number of
epochs required for a specific experiment depends on the level of measurement noise in
the pupillometer and the level of background pupil noise in study participants. In my
studies using a remote video eye tracker and tightly controlled visual field brightness
(see Section 7.1), I have found that it takes at least 50 epochs to see large pupillary
responses (0.2–0.5 mm) cleanly, and hundreds of epochs to reveal pupillary responses
smaller than 0.1 mm.
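The square-root relationship between noise and the number of epochs can be made explicit with a short derivation. As a sketch, assume that the noise term in each epoch, written here as $\varepsilon_i(t)$, is independent across epochs with a common variance $\sigma^2$; both symbols are introduced only for this illustration.

```latex
\operatorname{Var}\left[\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(t)\right]
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}\left[\varepsilon_i(t)\right]
  = \frac{\sigma^2}{n},
\qquad
\operatorname{SD}\left[\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(t)\right]
  = \frac{\sigma}{\sqrt{n}}
```

Replacing $n$ with $4n$ halves the standard deviation, which is why cutting the noise by a factor of two requires four times as many epochs.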
The ±-average
The purpose of averaging aligned pupil dilation data is to preserve the signal of interest
(the task-evoked pupillary response) while decreasing the power of signal components
not correlated in time with gaze events (the noise). However, the magnitude of the
pupillary response being investigated is usually not known a priori, so in practice it
is difficult to tell whether a feature of the averaged signal $\bar{P}(t)$ is noise or not.
This problem also arises in the analysis of averaged EEG data, where a procedure
called the ±-average is used to estimate the magnitude of the noise by itself ([122],
originally described as the ±-reference by Schimmel [105]). Instead of simply adding all
the epochs and dividing by n, the epochs are alternately added and subtracted from
the running total: $\bar{P}_{\pm}(t) = \frac{1}{n} \sum_{i=1}^{n} P_i(t)\,(-1)^i$ (only defined for even n). This way,
any time-correlated signal will be positive half the time and negative half the time
and thus cancel exactly to zero, while any other components of the average, which
were already as likely to be positive as negative, will be unchanged and approach
zero as a function of n as in the normal average. The average magnitude of $\bar{P}_{\pm}(t)$ is
usually a good estimate of the noise power in the standard average. If no pupillary
response stands out above this level, then either there is no pupillary response to see,
or more trials are required to drive the noise power even lower.
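The two averages can be compared side by side in a few lines of Python. The sketch below is an illustration only: it assumes that epochs have already been aligned and baseline-subtracted, that every epoch has the same number of samples, and the function names are hypothetical.

```python
from typing import List, Sequence


def mean_average(epochs: Sequence[Sequence[float]]) -> List[float]:
    """Standard average: the mean across epochs at each time index."""
    n = len(epochs)
    return [sum(epoch[t] for epoch in epochs) / n for t in range(len(epochs[0]))]


def plus_minus_average(epochs: Sequence[Sequence[float]]) -> List[float]:
    """Alternately subtract and add epochs, then divide by the epoch count.

    Epoch i (counting from 1) is weighted by (-1)**i, so any component that is
    identical in every epoch cancels and only the noise estimate remains.
    """
    n = len(epochs)
    if n == 0 or n % 2 != 0:
        raise ValueError("the plus-minus average needs an even, non-zero number of epochs")
    result = []
    for t in range(len(epochs[0])):
        total = 0.0
        for i, epoch in enumerate(epochs, start=1):
            sign = -1.0 if i % 2 else 1.0  # equals (-1)**i for i = 1..n
            total += sign * epoch[t]
        result.append(total / n)
    return result
```

A response in mean_average(epochs) would be considered visible only where it clearly exceeds the typical magnitude of plus_minus_average(epochs) computed from the same epochs.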
Bibliography
[1] S Ahern and J Beatty. Pupillary responses during information processing vary
with scholastic aptitude test scores. Science, 205(4412):1289–1292, September
1979. doi: 10.1126/science.472746.
[2] John L. Andreassi. Electrodermal Activity and Behavior, pages 259–288.
Routledge, 5th edition, 2006. ISBN 0805849513, 9780805849516.
[3] John L. Andreassi. Pupillary response and behavior. In Psychophysiology:
Human Behavior and Physiological Response, chapter 12, pages 289–307.
Routledge, 5th edition, 2006. ISBN 0805849513, 9780805849516.
[4] M H Ashcraft. Cognitive arithmetic: a review of data and theory. Cognition,
44(1-2):75–106, August 1992. ISSN 0010-0277. PMID: 1511587.
[5] Richard W. Backs and Larry C. Walrath. Eye movement and pupillary response
indices of mental workload during visual search of symbolic displays. Applied
Ergonomics, 23(4):243–254, August 1992.
[6] Alan D. Baddeley. Working memory, thought, and action. Oxford University
Press, 2007. ISBN 0198528019.
[7] Dale L. Bailey, David W. Townsend, Peter E. Valk, and Michael N. Maisey.
Positron Emission Tomography: Basic Sciences. Springer, 1st edition, April
2005. ISBN 1852337982.
[8] C. P. Beaman. Inverting the modality effect in serial recall. The Quarterly
Journal of Experimental Psychology Section A, 55(2):371–389, 2002.
[9] J. Beatty and D. Kahneman. Pupillary changes in two memory tasks.
Psychonomic Science, 5(10):371-372, 1966.
[10] J Beatty and BL Wagoner. Pupillometric signs of brain activation vary with
level of cognitive processing. Science, 199(4334):1216-1218, March 1978. doi:
10.1126/science.628837.
[11] Jackson Beatty. Phasic not tonic pupillary responses vary with auditory
vigilance performance. Psychophysiology, 19(2):167-172, 1982. doi:
10.1111/j.1469-8986.1982.tb02540.x.
[12] Jackson Beatty. Task-evoked pupillary responses, processing load, and the
structure of processing resources. Psychological Bulletin, 91(2):276-292, 1982.
ISSN 0033-2909 (Print); 1939-1455 (Electronic). doi: 10.1037/0033-2909.91.2.276.
[13] Jackson Beatty and Brennis Lucero-Wagoner. The pupillary system. In John T.
Cacioppo, Louis G. Tassinary, and Gary Berntson, editors, Handbook of
Psychophysiology, pages 142-162. Cambridge University Press, 2nd edition, 2000.
ISBN 052162634X.
[14] W. Becker, K. Iwase, R. Jürgens, and H. H. Kornhuber. Brain potentials
preceding slow and rapid hand movements. In W. C. McCallum and J. R. Knott,
editors, The Responsive Brain, pages 99-102. Wright, Bristol, 1976.
[15] Frederic Boersma, Keri Wilton, Richard Barham, and Walter Muir. Effects
of arithmetic problem difficulty on pupillary dilation in normals and educable
retardates. Journal of Experimental Child Psychology, 9(2):142-155, April 1970.
ISSN 0022-0965. doi: 10.1016/0022-0965(70)90079-2.
[16] A. Boxtel and M. Jessurun. Amplitude and bilateral coherency of facial and
jaw-elevator EMG activity as an index of effort during a two-choice serial
reaction task. Psychophysiology, 30(6):589-604, 1993. doi:
10.1111/j.1469-8986.1993.tb02085.x.
[17] J. Bradshaw. Pupil size as a measure of arousal during information processing.
Nature, 216:515-516, November 1967.
[18] J. L. Bradshaw. Pupil size and problem solving. Q J Exp Psychol, 20(2):116-22,
1968.
[19] J. L. Bradshaw. Pupil size and drug state in a reaction time task. Psychonomic
Science, 18(2):112-113, 1970.
[20] John L. Bradshaw. Background light intensity and the pupillary response in a
reaction time task. Psychonomic Science, 14(6):271-272, 1969. ISSN 0033-3131
(Print).
[21] Oswald Bumke. Translated by Eckhard Hess in The Tell Tale Eye, pp. 23-24.
New York: Van Nostrand. 1975, 1911.
[22] Christopher H. Chatham, Michael J. Frank, and Yuko Munakata. Pupillometric
and behavioral markers of a developmental shift in the temporal dynamics of
cognitive control. Proceedings of the National Academy of Sciences, 106(14):
5529-5533, 2009. doi: 10.1073/pnas.0810002106.
[23] Chaomei Chen. Information Visualization. 2004. ISBN 1852337893,
9781852337896.
[24] James Clark and Allan Paivio. Dual coding theory and education. Educational
Psychology Review, 3(3):149-210, 1991. doi: 10.1007/BF01320076.
[25] Kenneth D. Cocker. Development of pupillary responses to grating stimuli.
Ophthalmic and Physiological Optics, 16(1):64-67, 1996. doi:
10.1046/j.1475-1313.1996.9500016x.x.
[26] J. M. Dabbs and R. Milun. Pupil dilation when viewing strangers: Can
testosterone moderate prejudice? Social Behavior and Personality, 27(3):297-301,
1999.
[27] J. DeLaunay. A note on the photo-pupil reflex. Journal of the Optical Society
of America, 39:364-367, 1949.
[28] Andrew T. Duchowski. Eye Tracking Methodology: Theory and Practice.
Springer, 1st edition, 2003. ISBN 1852336668.
[29] Stephen H Fairclough and Kim Houston. A metabolic measure of mental effort.
Biological Psychology, 66(2):177-90, April 2004. ISSN 0301-0511. doi:
10.1016/j.biopsycho.2003.10.001. PMID: 15041139.
[30] Thorsten Fehr, Chris Code, and Manfred Herrmann. Auditory task presentation
reveals predominantly right hemispheric fMRI activation patterns during
mental calculation. Neuroscience Letters, 431(1):39-44, 2008. ISSN 0304-3940.
doi: 10.1016/j.neulet.2007.11.016.
[31] J.M. Findlay and Larry R. Squire. Saccades and visual search. In Encyclopedia
of Neuroscience, pages 429-436. Academic Press, Oxford, 2009. ISBN
978-0-08-045046-9.
[32] R M Gardner, J S Beltramo, and R Krinsky. Pupillary changes during encoding,
storage, and retrieval of information. Perceptual and Motor Skills, 41(3):951-5,
December 1975. ISSN 0031-5125. PMID: 1215138.
[33] B C Goldwater. Psychological significance of pupillary movements.
Psychological Bulletin, 77(5):340-55, May 1972. ISSN 0033-2909. PMID: 5021049.
[34] Eric Granholm, Robert F. Asarnow, Andrew J. Sarkin, and Karen L. Dykes.
Pupillary responses index cognitive resource limitations. Psychophysiology, 33
(4):457-461, 1996. doi: 10.1111/j.1469-8986.1996.tb01071.x.
[35] Robert Leo Greene. Human memory. Lawrence Erlbaum Associates, 1992.
ISBN 080580997X, 9780805809978.
[36] Gad Hakerem and Samuel Sutton. Pupillary response at visual threshold.
Nature, 212(5061):485-486, October 1966. doi: 10.1038/212485a0.
[37] Todd C. Handy. Event-related Potentials: A Methods Handbook. The MIT
Press, 1st edition, October 2004. ISBN 0262083337.
[38] T C Hankins and G F Wilson. A comparison of heart rate, eye activity, EEG
and subjective measures of pilot mental workload during flight. Aviation, Space,
and Environmental Medicine, 69(4):360-7, April 1998. ISSN 0095-6562. PMID:
9561283.
[39] Dan Witzner Hansen and Arthur E.C. Pece. Eye tracking in the wild. Computer
Vision and Image Understanding, 98(1):155-181, 2005. ISSN 1077-3142. doi:
10.1016/j.cviu.2004.07.013. Special Issue on Eye Detection and Tracking.
[40] W. Heinrich. Die aufmerksamkeit und die funktion der sinnesorgane. Zeitschrift
für Psychologie und Physiologie der Sinnesorgane, 9:342-388, 1896.
[41] Richard P. Heitz, Josef C. Schrock, Tabitha W. Payne, and Randall W. Engle.
Effects of incentive on working memory capacity: Behavioral and pupillometric
data. Psychophysiology, 45(1):119-129, 2008. doi:
10.1111/j.1469-8986.2007.00605.x.
[42] Eckhard H. Hess. Pupillometrics. In N. S. Greenfield and R. A. Sternbach,
editors, Handbook of Psychophysiology, pages 491-531. Holt, Rinehart and Winston
(New York), 1972.
[43] Eckhard H. Hess and James M. Polt. Pupil size as related to interest value of
visual stimuli. Science, 132(3423):349-350, August 1960. doi:
10.1126/science.132.3423.349.
[44] Eckhard H. Hess and James M. Polt. Pupil size in relation to mental activity
during simple problem-solving. Science, 143(3611):1190-1192, March 1964.
doi: 10.1126/science.143.3611.1190.
[45] Scott A Huettel, Allen W Song, and Gregory McCarthy. Functional Magnetic
Resonance Imaging. Sinauer Associates; Palgrave, Sunderland, Mass.;
Basingstoke, 2004. ISBN 9780878932887.
[46] H. Huynh and L. S Feldt. Performance of traditional F tests in repeated measures
designs under covariance heterogeneity. Communications in Statistics - Theory
and Methods, 9(1):61-74, 1980.
[47] Jukka Hyönä, Jorma Tommola, and Anna-Mari Alaja. Pupil dilation as
a measure of processing load in simultaneous interpretation and other
language tasks. The Quarterly Journal of Experimental Psychology Section A:
Human Experimental Psychology, 48(3):598, 1995. ISSN 0272-4987. doi:
10.1080/14640749508401407.
[48] Cristina Iani, Daniel Gopher, and Peretz Lavie. Effects of task difficulty and
invested mental effort on peripheral vasoconstriction. Psychophysiology, 41(5):
789-798, 2004. doi: 10.1111/j.1469-8986.2004.00200.x.
[49] Interactive Minds. Binocular eyegaze analysis system.
http://www.interactive-minds.com/en/eye-tracker/eyegaze-analysis-system,
March 2010.
[50] Shamsi T. Iqbal, Xianjun Sam Zheng, and Brian P. Bailey. Task-evoked
pupillary response to mental workload in human-computer interaction. In CHI '04
extended abstracts on Human factors in computing systems, pages 1477-1480,
Vienna, Austria, 2004. ACM. ISBN 1-58113-703-6.
[51] Shamsi T. Iqbal, Piotr D. Adamczyk, Xianjun Sam Zheng, and Brian P. Bailey.
Towards an index of opportunity: understanding changes in mental workload
during task execution. In Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 311-320, Portland, Oregon, USA, 2005.
ACM. ISBN 1-58113-998-5.
[52] J. Richard Jennings. Editorial policy on analyses of variance with repeated
measures. Psychophysiology, 24(4):474-475, 1987. doi:
10.1111/j.1469-8986.1987.tb00320.x.
[53] Marcel A. Just and Patricia A. Carpenter. The intensity dimension of thought:
Pupillometric indices of sentence processing. Canadian Journal of Experimental
Psychology/Revue canadienne de psychologie expérimentale, 47(2):310-339,
1993. ISSN 1196-1961 (Print). doi: 10.1037/h0078820.
[54] D. Kahneman and J. Beatty. Pupillary responses in a pitch-discrimination task.
Perception & Psychophysics, 2:101-105, 1967.
[55] D Kahneman, L Onuska, and R E Wolman. Effects of grouping on the pupillary
response in a short-term memory task. The Quarterly Journal of Experimental
Psychology, 20(3):309-11, August 1968. ISSN 0033-555X. PMID: 5683772.
[56] Daniel Kahneman. Attention and Effort. Prentice Hall, September 1973. ISBN
0130505188.
[57] Daniel Kahneman and Jackson Beatty. Pupil diameter and load on memory.
Science, 154(3756):1583-1585, December 1966. doi:
10.1126/science.154.3756.1583.
[58] Daniel Kahneman and Patricia Wright. Changes of pupil size and rehearsal
strategies in a short-term memory task. The Quarterly Journal of Experimental
Psychology, 23(2):187, 1971. ISSN 1747-0218. doi: 10.1080/14640747108400239.
[59] Daniel Kahneman, Jackson Beatty, and Irwin Pollack. Perceptual deficit during
a mental task. Science, 157(3785):218-219, July 1967. doi:
10.1126/science.157.3785.218.
[60] R Kardon. Pupillary light reflex. Current Opinion in Ophthalmology, 6(6):20-6,
December 1995. ISSN 1040-8738. PMID: 10160414.
[61] Jeff Klingner. Fixation-aligned pupillary response averaging. In ETRA '10:
Proceedings of the 2010 Symposium on Eye-Tracking Research and Applications,
pages 275-282, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-994-7. doi:
10.1145/1743666.1743732.
[62] Jeff Klingner. The pupillometric precision of a remote video eye tracker. In
ETRA '10: Proceedings of the 2010 Symposium on Eye-Tracking Research and
Applications, pages 259-262, New York, NY, USA, 2010. ACM. ISBN
978-1-60558-994-7. doi: 10.1145/1743666.1743727.
[63] Jeff Klingner, Rakshit Kumar, and Pat Hanrahan. Measuring the task-evoked
pupillary response with a remote eye tracker. In Proceedings of the 2008
symposium on Eye tracking research and applications, pages 69-72, Savannah,
Georgia, 2008. ACM. ISBN 978-1-59593-982-1. doi: 10.1145/1344471.1344489.
[64] Jeff Klingner, Barbara Tversky, and Pat Hanrahan. Effects of visual and
verbal presentation on cognitive load in vigilance, memory and arithmetic tasks.
Psychophysiology, TBD:TBD, 2010.
[65] Michael Kohn and Manfred Clynes. Color dynamics of the pupil. Annals of
the New York Academy of Sciences, 156(Rein Control or Unidirectional Rate
Sensitivity a Fundamental Dynamic and Organizing Function in Biology):931-950,
1969. doi: 10.1111/j.1749-6632.1969.tb14024.x.
[66] D. J. Lehr and B. O. Bergum. Note on pupillary adaptation. Perceptual and
Motor Skills, 23:917-918, 1966.
[67] William L. Libby, Beatrice C. Lacey, and John I. Lacey. Pupillary and cardiac
activity during visual attention. Psychophysiology, 10(3):270-294, 1973. doi:
10.1111/j.1469-8986.1973.tb00526.x.
[68] Irene Loewenfeld. The Pupil: Anatomy, Physiology, and Clinical Applications,
volume 1. Butterworth-Heinemann, Oxford, UK, 2nd edition, 1999. ISBN
0-7506-7143-2.
[69] A. D. Loewy. Autonomic control of the eye, pages 268-285. Oxford University
Press, New York, A. D. Loewy & K. M. Spyer edition, 1990.
[70] R H Logie, K J Gilhooly, and V Wynn. Counting on working memory in
arithmetic problem solving. Memory & Cognition, 22(4):395-410, July 1994.
ISSN 0090-502X. PMID: 7934946.
[71] O. Lowenstein and Irene Loewenfeld. The sleep-waking cycle and pupillary
activity. Annals of the New York Academy of Sciences, 117:142-156, 1964.
[72] O. Lowenstein and Irene E. Loewenfeld. Disintegration of central autonomic
regulation during fatigue and its reintegration by psychosensory controlling
mechanisms: I. disintegration. pupillographic studies. Journal of Nervous and
Mental Disease, 115:1-21, 1952.
[73] Mangold International. Mangoldvision eye tracker product page.
http://www.mangold-international.com/products/eye-tracker-solutions/stationary.html,
March 2010.
[74] Sandra P. Marshall, C. W. Pleydell-Pearce, and B. T. Dickson. Integrating
psychophysiological measures of cognitive workload and eye movements to
detect strategy shifts. In Proceedings of the 36th Annual Hawaii International
Conference on System Sciences (HICSS'03) - Track 5 - Volume 5, page 130.2.
IEEE Computer Society, 2003. ISBN 0-7695-1874-5.
[75] S.P. Marshall. The index of cognitive activity: measuring cognitive workload.
In Human Factors and Power Plants, 2002. Proceedings of the 2002 IEEE 7th
Conference on, pages 75-79, 2002.
[76] James G. May, Robert S. Kennedy, Mary C. Williams, William P. Dunlap, and
Julie R. Brannan. Eye movement indices of mental workload. Acta Psychologica,
75(1):75-89, October 1990.
[77] JW McLaren, JC Erie, and RF Brubaker. Computerized analysis of
pupillograms in studies of alertness. Invest. Ophthalmol. Vis. Sci., 33(3):671-676,
March 1992.
[78] O Miettinen and M Nurminen. Comparative analysis of two rates. Statistics in
Medicine, 4(2):213-226, June 1985. ISSN 0277-6715. PMID: 4023479.
[79] G. A. Miller. The magical number seven, plus or minus two. Psychological
Review, 63:81-97, 1956.
[80] Karl E. Misulis and Toufic Fakhoury. Spehlmann's Evoked Potential Primer.
Butterworth-Heinemann, 3rd edition, May 2001. ISBN 0750673338.
[81] Kevin P. Moloney, Julie A. Jacko, Brani Vidakovic, Francois Sainfort,
V. Kathlene Leonard, and Bin Shi. Leveraging data complexity: Pupillary behavior
of older adults with visual impairment during HCI. ACM Transactions on
Computer-Human Interaction, 13(3):376-402, 2006.
[82] Sofie Moresi, Jos J. Adam, Jons Rijcken, Pascal W.M. Van Gerven, Harm
Kuipers, and Jelle Jolles. Pupil dilation in response preparation. International
Journal of Psychophysiology, 67(2):124-130, February 2008. ISSN 0167-8760.
doi: 10.1016/j.ijpsycho.2007.10.011.
[83] M. Nakayama, I. Yasuike, and Y. Shimizu. Pupil size changing by pattern
brightness and pattern contents. The Journal of the Institute of Television
Engineers of Japan, 44:288-293, 1990.
[84] Minoru Nakayama and Yasutaka Shimizu. Frequency analysis of task evoked
pupillary response and eye-movement. In Proceedings of the 2004 symposium
on eye tracking research & applications, pages 71-76, San Antonio, Texas, 2004.
ACM. ISBN 1-58113-825-3. doi: 10.1145/968363.968381.
[85] Neuroptics, Inc. Instruction manual, VIP-200 pupillometer, revision A, 2008.
[86] Neuroptics, Inc. personal communication, August 2009.
[87] Takehiko Ohno and Naoki Mukawa. A free-head, simple calibration, gaze
tracking system that enables gaze-based interaction. In ETRA '04: Proceedings of
the 2004 symposium on Eye tracking research & applications, pages 115-122, New
York, NY, USA, 2004. ACM. ISBN 1-58113-825-3. doi: 10.1145/968363.968387.
[88] Fred Paas and Jeroen Van Merrienboer. Instructional control of cognitive load
in the training of complex cognitive tasks. Educational Psychology Review, 6
(4):351-371, December 1994. doi: 10.1007/BF02213420.
[89] Allan Paivio. Mental Representations: A Dual Coding Approach. Oxford
University Press US, 1990. ISBN 0195066669, 9780195066661.
[90] Oskar Palinko, Andrew L. Kun, Alexander Shyrokov, and Peter Heeman.
Estimating cognitive load using remote eye tracking in a driving simulator. In
ETRA '10: Proceedings of the 2010 Symposium on Eye-Tracking Research &
Applications, pages 141-144, New York, NY, USA, 2010. ACM. ISBN
978-1-60558-994-7. doi: 10.1145/1743666.1743701.
[91] W. Scott Peavler. Pupil size, information overload, and performance differences.
Psychophysiology, 11(5):559-566, 1974. doi: 10.1111/j.1469-8986.1974.tb01114.x.
[92] C. G. Penney. Modality effects and the structure of short-term verbal memory.
Memory & Cognition, 17(4):398-422, 1989.
[93] Polhemus, Inc. Visiontrak product web page.
http://www.polhemus.com/?page=Eye_VisionTrak, March 2010.
[94] M. Pomplun, S. Sunkara, A. V. Fairley, and M. Xiao. Using pupil size
as a measure of cognitive workload in video-based eye-tracking studies.
Unreviewed manuscript available at
http://www.cs.umb.edu/~marc/pubs/pomplun_sunkara_fairley_xiao_draft.pdf, 2009.
[95] Marc Pomplun and Sindhura Sunkara. Pupil dilation as an indicator of cognitive
workload in human-computer interaction. In Proceedings of the International
Conference on HCI, 2003.
[96] G Porter, T Troscianko, and I D Gilchrist. Pupil size as a measure of task
difficulty in vision. In Perception 31 ECVP Abstract Supplement, 2002.
[97] Gillian Porter, Tom Troscianko, and Iain D. Gilchrist. Effort during
visual search and counting: Insights from pupillometry. The Quarterly
Journal of Experimental Psychology, 60(2):211, 2007. ISSN 1747-0218. doi:
10.1080/17470210600673818.
[98] K Rayner. Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124(3):372-422, November 1998. ISSN 0033-2909.
PMID: 9849112.
[99] Francois Richer and Jackson Beatty. Pupillary dilations in movement
preparation and execution. Psychophysiology, 22(2):204-207, 1985. doi:
10.1111/j.1469-8986.1985.tb01587.x.
[100] Francois Richer, Clifford Silverman, and Jackson Beatty. Response selection and
initiation in speeded reactions: A pupillometric analysis. Journal of
Experimental Psychology: Human Perception and Performance, 9(3):360-370, 1983.
ISSN 0096-1523 (Print); 1939-1277 (Electronic). doi: 10.1037/0096-1523.9.3.360.
[101] Arash Sahraie and John L. Barbur. Pupil response triggered by the onset of
coherent motion. Graefe's Archive for Clinical and Experimental Ophthalmology,
235(8):494-500, 1997. doi: 10.1007/BF00947006.
[102] Dario D. Salvucci and Joseph H. Goldberg. Identifying fixations and saccades in
eye-tracking protocols. In Proceedings of the 2000 Symposium on Eye Tracking
Research & Applications, pages 71-78, New York, NY, 2000. ACM. ISBN
1-58113-280-8. doi: 10.1145/355017.355028.
[103] Hal Scher, John J. Furedy, and Ronald J. Heslegrave. Phasic T-wave
amplitude and heart rate changes as indices of mental effort and task incentive.
Psychophysiology, 21(3):326-333, 1984. doi: 10.1111/j.1469-8986.1984.tb02942.x.
[104] J. M. Schiff and F. Foa. La pupille considérée comme esthésiomètre (translated
by R. G. de Choisity). Marseille Medical, 2:736-741, 1874.
[105] Herbert Schimmel. The (±) reference: Accuracy of estimated mean components
in average response studies. Science, 157(3784):92-94, July 1967. doi:
10.1126/science.157.3784.92.
[106] Kathrin B. Schlemmer, Franziska Kulke, Lars Kuchinke, and Elke Van Der
Meer. Absolute pitch and pupillary response: Effects of timbre and key color.
Psychophysiology, 42(4):465-472, 2005. doi: 10.1111/j.1469-8986.2005.00306.x.
[107] Andrew Scholey, Philippa Jackson, and David Kennedy. Mental effort, blood
glucose and performance. Appetite, 47(2):277, September 2006. ISSN 0195-6663.
doi: 10.1016/j.appet.2006.07.066.
[108] SensoMotoric Instruments. iview x hi speed product page.
http://www.smivision.com/en/eye-gaze-tracking-systems/products/iview-x-hi-speed.html,
March 2010.
[109] Malcolm Slaney, Michelle Covell, and Bud Lassiter. Automatic audio morphing.
In Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on
Conference Proceedings., 1996 IEEE International Conference - Volume 02, pages
1001-1004. IEEE Computer Society, 1996. ISBN 0-7803-3192-3.
[110] Benjamin Smith. 023/365: Eye see you! Flickr, under a Creative Commons
Attribution-Noncommercial-Share Alike 2.0 Generic license, July 2008. URL
http://www.flickr.com/photos/dotbenjamin/2636942186/.
[111] Stuart R. Steinhauer. Pupillary responses, cognitive psychophysiology, and
psychopathology, 2002.
[112] Stuart R. Steinhauer, Greg J. Siegle, Ruth Condray, and Misha Pless.
Sympathetic and parasympathetic innervation of pupillary dilation during sustained
processing. International Journal of Psychophysiology, 52(1):77-86, March
2004. ISSN 0167-8760. doi: 10.1016/j.ijpsycho.2003.12.005.
[113] Tobii Technologies, Inc. personal communication, 2007.
[114] Tobii Technologies, Inc. Tobii 1750, 2007. URL http://www.tobii.com.
[115] Warren W. Tryon. Pupillometry: A survey of sources of variation.
Psychophysiology, 12(1):90-93, 1975.
[116] Kazuhiko Ukai. Spatial pattern as a stimulus to the pupillary system. Journal
of the Optical Society of America A, 2(7):1094-1100, July 1985. doi:
10.1364/JOSAA.2.001094.
[117] Karl F Van Orden, Wendy Limbert, Scott Makeig, and Tzyy-Ping Jung. Eye
activity correlates of workload during a visuospatial memory task. Human
Factors: The Journal of the Human Factors and Ergonomics Society, 43(1):
111-121, 2001.
[118] Steven P. Verney, Eric Granholm, and Daphne P. Dionisio. Pupillary responses
and processing resources on the visual backward masking task.
Psychophysiology, 38(1):76-83, 2001. ISSN 0048-5772 (Print); 1469-8986
(Electronic). doi: 10.1017/S0048577201990195.
[119] Steven P. Verney, Eric Granholm, Sandra P. Marshall, Vanessa L. Malcarne, and
Dennis P. Saccuzzo. Culture-fair cognitive ability assessment: Information
processing and psychophysiological approaches. Assessment, 12(3):303-319, 2005.
ISSN 1073-1911.
[120] W. T. Welford. Reaction Times. Academic Pr, November 1980. ISBN
0127428801.
[121] H. Widdel. Operational problems in analysing eye movements. In A. G. Gale
and F. Johnson, editors, Theoretical and Applied Aspects of Eye Movement
Research, pages 21-29. Elsevier, New York, 1984.
[122] P K Wong and R G Bickford. Brain stem auditory evoked potentials: the use of
noise estimate. Electroencephalography and Clinical Neurophysiology, 50(1-2):
25-34, 1980. ISSN 0013-4694. PMID: 6159189.

