

Narrative Literature Review: Eye Tracking in Multimedia Learning Applications

Shannon Tucker

Towson University

Fall 2017

Narrative Literature Review: Eye Tracking in Multimedia Learning Applications

Designing effective multimedia instruction requires careful attention to the design of

learning objectives and the cognitive processes involved in knowledge acquisition, including the

integration of visual and auditory elements that optimize a learner’s working memory. Defined

as the presentation of words and pictures designed to promote learning, multimedia instruction

encompasses an unlimited range of formats designed for learning (e.g. online lectures, mobile

games, tutorials, learning management systems) with divergent approaches to development

(Mayer, 2005, pp. 1-24). The Cognitive Theory of Multimedia Learning (CTML) provides a

format agnostic framework linking the effective design of multimedia to cognitive processing.

Synthesizing prior research, CTML establishes three assumptions governing multimedia

learning: 1) information processing occurs in separate auditory and visual channels (Baddeley,

1986; Paivio, 1986), 2) each channel has a limited capacity within working memory (Baddeley,

1992; Chandler & Sweller, 1991), and 3) learning is an active process requiring attention to

relevant information that the learner can then organize into a coherent model for integration

with existing knowledge in long-term memory (Mayer, 1999). Together, these assumptions

provide a theoretical framework for the development of practical design principles aimed at

universally improving the performance of multimedia instruction. By recognizing the

boundaries of learner cognitive information processing and working memory, designers can

reduce the extraneous cognitive load that makes the achievement of learning objectives more difficult.


Empirical research in multimedia learning leaves questions regarding cognitive

processing inherent to the active learning process and how design principles aimed at reducing

extraneous cognitive load work in practice across a range of learners in a variety of applications.

Since learning performance is of primary concern in multimedia instruction, research

methodology has largely focused on performance measures as a validation of design principles.

However, this creates an assumed cause-and-effect relationship where researchers presume

stimuli changes yield positive or negative results without direct insight into the attentional

selection of relevant information during learner-multimedia interaction. Without exposing this

process, researchers cannot define the practical boundaries of CTML informed design principles

across contexts and learner demographics (i.e. knowledge, age, neurological development, etc.)

for universal application in multimedia applications. Eye tracking provides a method for

researchers to bridge this gap by objectively measuring the cognitive processes associated with

learning attention through eye movement measures.

The Cognitive Theory of Multimedia Learning

Figure 1: Cognitive Theory of Multimedia Learning (Mayer, 2003).

By synthesizing prior research in cognitive processing, CTML (figure 1) provides researchers and designers with a theoretical framework that defines the learning interaction between learners and multimedia without requiring lengthy discussion of the

individual contributing theories. Depicting the processing of words (visual and auditory) and

pictures in two channels through the memory system to integrate into schemas for prior

knowledge in long-term memory, Mayer provides a framework to discuss the design of effective

multimedia learning. By first discussing CTML’s three assumptions, 1) the dual channel assumption,

2) the limited capacity assumption, and 3) the active processing assumption, a shorthand is developed for

theories of cognitive processing related to multimedia instruction and learning. Subsequent

discussion of multimedia design principles to reduce cognitive load and the triarchic cognitive

load theory provides insight on the cognitive limits of working memory and how multimedia

design can effectively reduce cognitive load. Lastly, perceptual load in sensory memory and its

effect on attention is used to discuss the relationship of intrinsic cognitive load on multimedia

design and distraction filtering.

CTML Assumptions

Coupled with Mayer’s schematic diagram detailing the cognitive processing of

multimedia presentations into sensory, working, and long-term memory, CTML’s assumptions

define the boundaries of effective multimedia learning.

Dual channel assumption. The dual channel assumption proposes that human

information processing is conducted in “an auditory/verbal channel and a visual/pictorial

channel” separately (figure 1) (Mayer, 2003, p.33). When presented with a multimedia

presentation, learners begin processing words (written or verbal) and pictures (images/video) in

sensory memory according to their sensory input (auditory or visual) (Mayer, 2003, p.43).

Limited capacity assumption. With links to Chandler and Sweller’s work on cognitive load

theory (1991), the limited capacity assumption states that each channel within the dual channel

assumption has a limited capacity for information that can be held in working memory (Mayer,

2003, p. 35). This cognitive limitation necessitates learners make decisions on what

“information to pay attention to” and to what degree information should be integrated into

existing knowledge systems (Mayer, 2003, p. 36).


Active processing assumption. When processing information humans use active

cognitive processes to build mental models of the presented material and assimilate this into their

existing knowledge structure (Mayer, 2003, p. 36). When presented with multimedia instruction,

a learner must actively use the cognitive processes of selecting (paying attention), organizing,

and integrating information in order to bring information into the “working memory component

of the cognitive system” (Mayer, 2003, pp. 36-37).

Design Principles to Reduce Cognitive Load

Moreno and Park’s triarchic model of cognitive

load theory (figure 2) illustrates the relationship between

instructional design and the capacity of a learner’s

working memory providing a framework for multimedia

principles to support learning with multimedia (Moreno

& Park, 2010). By associating the capacity of working

memory with design elements in instructional design,

Moreno and Park provide a visual context to describe the relationship between the instructional design of multimedia and the capacity of working memory.

Figure 2: Triarchic model of cognitive load theory (Moreno & Park, 2010, p.17)

Since the intrinsic complexity of content, learning objectives, and learning tasks cannot

be reduced, the importance of limiting the total cognitive load through instructional and

multimedia design is critical to ensure the total cognitive load of an activity is within a learner’s

working memory capacity (Moreno & Park, 2010). Mayer’s multimedia design principles for

reducing extraneous processing, managing essential processing, and fostering generative

processing fit within the cognitive load theory’s extraneous load framework.

Principles for reducing extraneous processing. Supporting the limited capacity

assumption, Mayer proposes that reducing the appearance of material “that does not serve the

instructional goal” will eliminate extraneous cognitive processing (Mayer, 2009, p. 85). By

deleting “extraneous words, sounds or graphics” (coherence), “highlighting essential words or

graphics” (signaling), deleting “redundant captions” (redundancy), placing essential words

close to graphics (spatial contiguity), and simultaneously presenting “corresponding words and

pictures” (temporal contiguity), the learning outcomes of multimedia instruction can be

improved (Mayer, 2009, p. 87).

Principles for managing essential processing. Limiting the cognitive processing needed

for essential material allows learners to engage in generative processing (the organization and

integrating of information into coherent knowledge structures) that supports learning transfer

(Mayer, 2009, pp.171, 221). By presenting lessons in user-paced segments (segmenting),

introducing vocabulary and key concepts before the lesson (pre-training), and using graphics and

narration rather than printed subtitles (modality), the overall cognitive load of the lesson is

reduced (Mayer, 2009, p.171).

Principles for fostering generative processing in multimedia. To encourage learner

motivation to generatively process information, multimedia instruction must be engaging

(Mayer, 2009, p.221). Presenting words and pictures together (multimedia) and using

conversational speech delivered by a human narrator (personalization, voice) provide added

engagement without compromising other multimedia principles (Mayer, 2009, p.221).

Attention and Perceptual Load

While Mayer’s multimedia design principles provide a framework to optimize working

memory, there is no prescriptive implementation approach. Even as principle boundaries


continue to be defined by further research, designers are left to decide how to implement specific

features based on their content, goals, and learning outcomes. The varying complexity of

resulting multimedia is a significant factor when considering the effectiveness of a

design. The relationship between sensory and working memory as seen in figure 1 shows the

perceptual integration of visual and auditory content before content enters working memory.

Like the capacity of working memory, learners have a perceptual load capacity that filters

attention. As perceptual load capacity is exceeded in complex tasks, learners become able to

selectively filter visual distractors (Lavie & Dalton, 2014, p.61). This focus is amplified when

working memory capacity is reached as learners are able to control internal distractions (e.g.

mind-wandering) to focus attention only on relevant stimuli (Lavie & Dalton, 2014, pp.59-60;

Lavie, Hirst, De Fockert, & Viding, 2004). When considering the effectiveness of multimedia

design principles, the relationship between content/task complexity and attention must be

considered in conjunction with learning performance. Eye tracking provides an opportunity to

expose the selective attention of a learner allowing researchers insight into the relationship

between attention and the cognitive processes in sensory memory and working memory.

Eye Tracking as a Research Methodology

Eye tracking methodology provides an objective measure of visual attention allowing the

correlation of visual attention with cognitive processing (Just & Carpenter, 1984).

Contemporary remote and head-mounted eye tracking equipment measures visual attention

associated with the use of screen-based interfaces and interactions in real-world environments.

The ability to measure eye movement in both laboratory and real-world environments has

encouraged the adoption of eye tracking methodology in a wide range of disciplines including

education, engineering, sports science, neurophysiology, psychology, and user experience (UX)

(Holmqvist et al, 2011, p.1). Matching the appropriate use of eye tracking methodology to a

research program requires an understanding of the physiological function of eye movement and available eye movement measures. From this context, researchers are able to appropriately apply eye tracking methodology accounting for the practical limitations of eye tracking technology.

Figure 3: Foveal Vision (labels: 2° Foveal Vision; 180° Horizontal Field of View). Adapted from Eye Tracking the User Experience: A Practical Guide to Research, by Aga Bojko, (2013).

Eye Movement Physiology

Eye tracking technology uses infrared light reflection to correlate the pupil center with the

corneal reflection and estimate gaze position against a given stimulus

(Holmqvist et al, 2011, pp.24-28). This measurement is focused in the 2 degrees of foveal vision

in visual focus (figure 3), excluding parafoveal and peripheral vision comprising the remaining

horizontal field of view (Duchowski, 2007, pp.4-13). This provides researchers insight on

where vision is focused, what is seen, and how attention is directed (Duchowski, 2007, pp.4-13).

However, reaching a conclusion on attention requires understanding the common eye movement

measures: fixations, saccades, and smooth pursuits.
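To give a sense of scale for the 2 degrees of foveal vision, the width it covers on a screen can be estimated with basic trigonometry. The following is a minimal sketch only: the function name and the 60 cm viewing distance are illustrative assumptions and are not taken from the cited sources.

```python
import math


def foveal_width_cm(viewing_distance_cm, visual_angle_deg=2.0):
    """Width on a flat screen subtended by a visual angle centred on the gaze point."""
    half_angle = math.radians(visual_angle_deg / 2.0)
    return 2.0 * viewing_distance_cm * math.tan(half_angle)


# Hypothetical viewing distance of 60 cm (an assumption for illustration)
print(round(foveal_width_cm(60.0), 2))
```

Under these assumptions the foveal region spans roughly 2 cm of screen, which suggests why an area of interest smaller than this may be difficult to separate from its neighbors.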

Fixations. Eye fixations are characterized as a temporary stabilization of eye movement

where attention is focused in foveal vision (Holmqvist et al, 2011, pp.21-22). With typical

research measurements ranging from 200-300 milliseconds, this movement includes three

miniature eye movements: drift (slow movement of the eye away from the point of fixation),

microsaccades (a quick return to the central point of fixation), and tremors (eye movement with

an unknown physiological purpose) as a single simplified measurement (Holmqvist et al, 2011,

p.22; Duchowski, 2007, pp. 46-47).

Saccade. A saccade is defined as the rapid movement between two fixations with typical

durations ranging between 30-80 milliseconds (Holmqvist et al, 2011, p.23). During a saccade

the eye is effectively blind providing no visual feedback (Duchowski, 2007, pp.42-43). Without

a visual feedback loop, viewers also cannot change the direction of saccades once they have

begun. This ultimately may result in more frequent fixations resulting in saccade path patterns

that are longer than necessary to reach the final point of fixation (Duchowski, 2007, pp. 42-43).

Smooth pursuit. A smooth pursuit is a slower eye movement where a viewer

intentionally tracks a moving object, matching the movement velocity of a target with eye

movement (Duchowski, 2007, pp.45-46). The visual feedback loop inherent in smooth pursuit

allows viewers to change eye movement direction in response to changes in visual stimuli.

Generally, smooth pursuit eye movements cannot be invoked without the presence of a moving

object (Duchowski, 2007, pp. 45-46).

Classification and Use of Eye Movement Measures

Since the availability of eye movement measures is dependent on the manufacturer and

type of equipment, establishing a general understanding of the relationship between the typical

measurements of anatomical eye movement and measurement goals aids researchers in selecting

research appropriate measurements. Using a simplified classification of 1) attention and 2)

performance measurements provides a context that ensures that measurements are used

appropriately. The dual use of measures like fixation duration to measure either attention or

performance makes the context appropriate selection of eye movement measures critically

important (Poole & Ball, 2005).


Attention. Attention related measures in multimedia research provide researchers with

insight on the visual focus of learners in global, local (within a bounded area of interest (AOI)),

or targeted measures associated with a specific research question. The combined use of global,

local, and targeted measures provides researchers with metrics that can be specifically

customized to research aims and hypotheses. While fixation, saccade, and scanpath measures

can be applied in a variety of contexts, in a multimedia design, these are primarily linked to

multimedia principles to reduce extraneous processing, including signaling, redundancy, and

spatial and temporal contiguity (Mayer, 2010).

Performance. With applicability to multimedia principles to manage essential

processing, performance measures provide broad measures associated with cognitive processing

in working memory and task difficulty. Due to limitations in equipment related measurement,

only average fixation duration can be commonly used as a measure for cognitive processing

across a range of equipment. Additional measures such as pupil diameter and blink rate provide

quantitative insight on the cognitive functioning and physiological arousal of learners.

Typical Measurements. Based in part on the anatomical eye movement they represent,

eye movement measurements are grouped into six measurement groups: 1) fixation related

measures, 2) saccades, 3) scanpaths, 4) heat maps, 5) pupil diameter, and 6) blink rate.

Fixation Related Measures. Associated with the anatomical movement, fixations are

the most commonly applied eye movement measure used in research. While this is primarily a

measure of what individuals have looked at, fixation data is used to evaluate both attention and performance.


Table 1

Fixation Eye Movement Measures

Measurement | Interpretation | Classification
Number of Fixations Overall | Increased fixations typically indicate a less efficient search [a] | Attention, Performance
Number of Fixations per AOI | Increased fixations on a single AOI indicate greater attention/attractiveness or increased difficulty processing content [b]. | Attention, Performance
Fixations per AOI adjusted for text length | To separate text-based AOIs with a greater number of words from those that are more difficult to comprehend, dividing total fixations by the number of words in a text-based AOI establishes a comparable | Attention, Performance
Fixation Duration | Longer periods of fixation indicate greater attention/attraction or increased processing | Attention, Performance
Fixation Spatial Density | Diffuse fixations over a wide geographic region indicate a less efficient search than fixations concentrated in a small geographic | Performance
Time to First Fixation (AOI) | The time (in seconds) between the display of a stimulus and the first fixation in an AOI [d]. Used in conjunction with the percentage of participants who fixated on an AOI, this indicates the noticeability or attractiveness of an area [d]. | Attention
Percentage of Participants Fixating on an AOI | A context measure to establish how many participants looked at an AOI. When used in conjunction with Time to First Fixation, this measure helps place that measure in the context of the total participant population [d]. | Attention
Dwell Time on an AOI | Related to fixation duration, dwell time is a sum of all eye movement measures that fall within an AOI area. This measure is inclusive of fixations, saccades, blinks, and fixations that do not meet the study defined minimum fixation criteria [e]. | Attention

[a] Poole & Ball (2005), [b] Bojko (2013, p.128), [c] Goldberg & Kotval (1999), [d] Bojko (2013, p.126), [e] Holmqvist et al (2011, pp.386-387)
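Several of the measures in Table 1 are simple counts and ratios over a list of detected fixations, which a short sketch may make concrete. The example below is illustrative only: the `Fixation` record, the AOI labels, and the sample values are hypothetical and do not correspond to any manufacturer's export format. Dwell time is omitted because it also requires saccade and blink samples.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    participant: str
    aoi: str           # hypothetical AOI label; "" if outside every AOI
    start_s: float     # onset in seconds after stimulus display
    duration_s: float


def fixations_per_word(fixations, aoi, word_count):
    """Fixation count in a text-based AOI divided by its number of words."""
    count = sum(1 for f in fixations if f.aoi == aoi)
    return count / word_count


def time_to_first_fixation(fixations, participant, aoi):
    """Seconds from stimulus display to the first fixation in the AOI, or None."""
    starts = [f.start_s for f in fixations
              if f.participant == participant and f.aoi == aoi]
    return min(starts) if starts else None


def percent_participants_fixating(fixations, aoi, all_participants):
    """Share of participants (in percent) with at least one fixation in the AOI."""
    seen = {f.participant for f in fixations if f.aoi == aoi}
    return 100.0 * len(seen) / len(all_participants)


# Invented sample data for two participants
data = [
    Fixation("p1", "caption", 0.40, 0.25),
    Fixation("p1", "diagram", 0.70, 0.30),
    Fixation("p1", "caption", 1.10, 0.20),
    Fixation("p2", "diagram", 0.30, 0.28),
]

print(fixations_per_word(data, "caption", 10))
print(time_to_first_fixation(data, "p1", "caption"))
print(percent_participants_fixating(data, "caption", ["p1", "p2"]))
```

Reporting time to first fixation alongside the percentage of participants fixating, as the table recommends, guards against a fast average that reflects only the few participants who noticed the AOI at all.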

Saccades. The lack of visual feedback during a saccade limits its use as an attention

measure. Associated primarily with mental workload, fatigue and inefficient search, saccades

provide an objective measure of task performance and cognitive load.

Table 2

Saccade Movement Measures

Measurement | Interpretation | Classification
Saccadic Rate | Calculated as the number of saccades per second. A decrease in saccade rate can indicate an increase in mental workload or fatigue [a][b] or an inefficient search [c]. However, use of visual imagery that requires horizontal or vertical movement or visual disorders can produce higher saccadic rates. | Performance
Saccadic Amplitude | A measure of the distance traveled from saccade start to finish. Decreased saccadic amplitude can be an indicator of search task difficulty or increased cognitive load. However, brain injury, high-frequency visual information, reading ability, and age can affect amplitude measures [d]. | Performance

[a] Nakayama, Takahashi, & Shimizu (2002), [b] Holmqvist et al (2011, pp. 404-405), [c] Goldberg & Kotval (1999), [d] Holmqvist et al (2011, pp. 313-315)
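The two saccade measures in Table 2 reduce to short calculations. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, and amplitude is computed as a straight-line distance in screen pixels, whereas eye tracking software may report it in degrees of visual angle.

```python
import math


def saccadic_rate(saccade_count, recording_duration_s):
    """Number of saccades per second of recording."""
    return saccade_count / recording_duration_s


def saccadic_amplitude_px(start_xy, end_xy):
    """Straight-line distance (pixels) from saccade start to saccade end."""
    return math.hypot(end_xy[0] - start_xy[0], end_xy[1] - start_xy[1])


# Invented values: 12 saccades in a 4 second recording, one 3-4-5 saccade
print(saccadic_rate(12, 4.0))
print(saccadic_amplitude_px((0, 0), (3, 4)))
```

Because both values shift with stimulus layout and participant characteristics, they are more informative as comparisons between conditions than as absolute thresholds.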

Gaze plots/Scanpaths. A gaze plot/scanpath provides a visual indication of where a participant looked and in what order (figure 4). Using a sequential path of fixations with dots ranging in size based on fixation duration, gaze plots/scanpaths represent an individual measure of attention and performance (Bojko, 2013, pp.124-139). This can be used to assess reading path, attention order, or the search strategies used to locate relevant information. The length and duration of the gaze plot/scanpath is an indication of search efficiency (Goldberg & Kotval, 1999).

Figure 4: Gaze plot. (Bojko, 2013, p.264)

Figure 5: Heatmap (Bojko, 2013, p.126)

Heat maps. Using a color scale, heat maps (figure 5) provide a visualization of attention over a static image showing the frequency of fixations and gaze for individual or aggregate participant groups. While heat maps provide a quick visual indication of attention, the use of aggregate heat maps can result in the inappropriate interpretation of eye movement data if used without care.

Inappropriate interpretation can result from the aggregation of fixation or gaze data without clear

information on the weight of fixations relative to the number of participants with measured

attention in an area. Differences in participant stimulus exposure and resulting fixation count or

absolute gaze duration can skew visualizations resulting in a visualization weighted in favor of

participants with higher fixations or longer gaze durations. While the use of relative gaze

durations in heat maps may moderate the effect of unequal session times, the varied weights of

colors in the heat map spectrum can lead readers to misinterpret the weight of a heat map across

visualizations. Yet, when used in conjunction with other quantitative and qualitative findings,

heat maps can be a powerful tool to illustrate research findings when appropriately constructed to

represent arguments central to overall findings (Bojko, 2013, pp. 226-239).
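The weighting problem described above can be illustrated with a small sketch comparing absolute and relative gaze durations. The region names, participant labels, and durations are invented for illustration, and the per-participant normalization shown is one possible approach rather than the method used by any particular analysis package.

```python
def absolute_totals(durations):
    """Sum raw gaze seconds per region across participants."""
    totals = {}
    for regions in durations.values():
        for region, seconds in regions.items():
            totals[region] = totals.get(region, 0.0) + seconds
    return totals


def relative_shares(durations):
    """Convert each participant's gaze seconds to a share of their own total."""
    shares = {}
    for participant, regions in durations.items():
        total = sum(regions.values())
        shares[participant] = {
            region: (seconds / total if total else 0.0)
            for region, seconds in regions.items()
        }
    return shares


def mean_share(shares):
    """Average the per-participant shares so each participant counts equally."""
    regions = {r for per in shares.values() for r in per}
    n = len(shares)
    return {r: sum(per.get(r, 0.0) for per in shares.values()) / n for r in regions}


# p1 viewed the stimulus for 10 s, p2 for only 2 s (invented values)
durations = {
    "p1": {"A": 8.0, "B": 2.0},
    "p2": {"A": 0.5, "B": 1.5},
}

print(absolute_totals(durations))
print(mean_share(relative_shares(durations)))
```

With absolute durations, region A appears to dominate because the participant with the longer session contributes most of the gaze time. After normalization the two regions are nearly balanced, which shows how the choice of weighting can change the story an aggregate heat map tells.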

Pupil Diameter. With a central role in the regulation of physiological arousal and

cognitive functioning, the noradrenergic locus coeruleus modulates pupil dilation providing a

measure of cognitive effort and emotional response (Eckstein, Guerra-Carrillo, Singley, &

Bunge, 2017). In a research context, pupil diameter can be used to quantify differences in task

difficulty between learners where greater pupil diameter indicates greater intrinsic load.

Specifically, pupil diameter has been successfully validated to define task difficulty based on the

cognitive characteristics of learners where learners with greater expertise exhibit smaller

pupillary response to a given task (Eckstein, Guerra-Carrillo, Singley, & Bunge, 2017).

However, the use of pupil diameter must be carefully considered as other factors such as

age, pain, drug use, and fatigue affect pupil size (Bojko, 2013, pp. 129-133; Holmqvist et al, 2011, pp.

393-394). The use of pupil size in elderly populations is particularly problematic as pupil size

and the rate of pupil constriction decrease linearly with age (Birren, Casperson & Botwinick,

1950; Winn, Whitaker, Elliott, & Phillips, 1994). While researchers could increase the internal

validity of this measure by using age as an independent variable, the increased need for vision

correction in this population coupled with the precision needed to correct vision at smaller pupil

sizes increases the likelihood for data loss due to calibration issues (Winn, Whitaker, Elliott, &

Phillips, 1994).

Blink Duration and Rate. Blinking serves two roles within eye tracking: 1) it is a

measure of cognitive function and 2) it is a critical part of event detection for other measures

(e.g. fixation) in the analysis of eye movement (Fogarty & Stern, 1989). Both blink rate and

duration are measures of drowsiness and fatigue (Benedetto et al, 2011; Stern, Boyer, &

Schroeder, 1994) and can be used to provide context to performance measures.

Limitations of Eye Tracking

Despite the potential for eye tracking to broadly answer questions regarding the selective

attention of learners and cognitive load, the use of eye tracking in multimedia research is limited

by 1) equipment and 2) vision-related measurement issues.

Equipment. Equipment cost is the overarching limitation that influences eye tracking

research across a spectrum of research applications. With the cost of eye tracking equipment and

software ranging between $15,000 and $25,000 (US), access to equipment for field use is largely

limited to commercial and university research groups with appropriate funding and a focused

interest in eye movement research.

Figure 6: Tobii Glasses Wearable Eye Tracker. Reprinted from Tobii Glasses 2, In Tobii Pro, 2015, Retrieved from pro-glasses-2/. Copyright 2017 Tobii AB.

Hardware form-factor. However, even for research groups with appropriate funding, manufacturer and equipment form factor define the use of eye tracking in research applications. The two commonly found equipment form factors used by researchers are remote (mounted in a fixed location in front of a research participant) and wearable (modified eye wear worn by the research

participant) (Holmqvist et al, 2011, pp.50-63). Remote mounted equipment limits researchers to a

physical fixed location during research, but offers the flexibility to measure identical interfaces

across participants allowing direct comparison and shorter analysis timelines. In contrast,

wearable equipment allows total freedom of movement and measurement in real-world

applications, but requires individually tailored analysis based on each participant’s recorded


Measurement data and accuracy. While most equipment manufacturers provide

common measurements for fixations, saccades, and smooth pursuits, there are variable

measurements available. Some equipment models may not provide measurement options for

pupil diameter or blink rate or analysis options for heat maps or gaze plots/scanpaths. Similar to

form factor, the speed of data collection influences the reliability of data for specific research

applications. Data collection speeds in eye tracking equipment range from 60 to 300 Hz, with

equipment cost increasing exponentially as data collection speeds increase. Slower equipment

increases the gaps between recorded eye movement, making the measurement of data less

precise. As a consequence, researchers must balance their research goals with equipment form,

function, and cost.
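The effect of data collection speed on precision can be made concrete with simple arithmetic. The sketch below uses the 60 and 300 Hz rates mentioned above and the 30-80 millisecond saccade durations cited earlier; the function names are illustrative assumptions.

```python
def sample_interval_ms(rate_hz):
    """Time between consecutive gaze samples, in milliseconds."""
    return 1000.0 / rate_hz


def full_intervals_in_event(rate_hz, event_ms):
    """Number of complete sampling intervals that fit within an event."""
    return (event_ms * rate_hz) // 1000


for rate in (60, 300):
    print(rate, "Hz:",
          round(sample_interval_ms(rate), 2), "ms between samples;",
          full_intervals_in_event(rate, 30), "to",
          full_intervals_in_event(rate, 80),
          "intervals per 30-80 ms saccade")
```

At 60 Hz a short saccade spans little more than one sampling interval, whereas at 300 Hz the same movement spans about nine, which is why slower equipment yields less precise onset and duration estimates for brief eye movements.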

Vision-Related Measurement Issues. Eye tracking equipment and associated data

analysis algorithms assume normal or corrected-to-normal vision. Blindness or other visual

disorders (e.g. amblyopia [lazy eye], cataract, nystagmus, ptosis [droopy eye lid], and

strabismus) interfere with the basic operation of equipment, resulting in either an inability to

calibrate equipment or a lack of useful data because of the disruption in foveal vision (figure 3).

Similarly, the use of bi-focal, tri-focal, progressive, and

contact lenses or long eye lashes create a potential for

physical interference disrupting measurement by blocking

sensor measurement.

Figure 5: Tobii Remote Eye Tracking Configuration. Reprinted from Position in front of the Tobii Eye Tracker, In Tobii Eye Tracking, 2017, Retrieved December 12, 2017, from us/articles/210250305-Position-in-front-of-the-Tobii-Eye-Tracker. Copyright Tobii AB.

Remote equipment. Measurement issues are further compounded with the use of remote-based eye tracking equipment, where participants must remain on average between 45-100 cm from the screen for optimal measurement (figure 5). Head movement outside of this

region results in data loss as the research participant moves beyond the measurable field of

sensors. While the use of a chin stand compensates for this movement producing a consistent

stable measurement (particularly in children), it increases awareness of the observational nature

of the experiment increasing the potential for non-normative behavior. As a result, researchers

must balance the rigor of their research protocol with the benefit of naturalistic conditions

finding a middle ground between internal and external validity.

Visual Perception. Even with successful measurement, eye tracking data cannot

guarantee researchers a complete measurement of vision. The measurement of foveal vision

(figure 3) excludes perceivable peripheral vision showing a limited view of what is seen and

perceived. While perceptual load allows individuals to selectively filter attention, the presence

of internal distraction (e.g. mind wandering) may require additional measures to determine if

content was visually perceived (Moreno & Park, 2010).


Despite the significant issues associated with eye tracking in empirical research, eye

tracking methodology reveals behaviors that cannot be reliably captured via other means. By

factoring research goals into the selection of equipment and the creation of a rigorous research

protocol, researchers can mitigate limitations to gain insights on attention and distraction.

Together with other quantitative (e.g. performance, biofeedback) and/or qualitative data sources

(e.g. interviews, retrospective talk-aloud protocols), research with eye tracking can provide a

holistic view of learning and multimedia interactions that together can answer questions of

attention and the cognitive retrieval processes (Anderson, Bothell & Douglass, 2004).

Conceptual Issues in Eye Tracking in Multimedia Learning

As advocacy for flipped classroom content grows within K-12 and higher education

(Bogost, 2013) and the number of distance education courses continues to grow, the use of

multimedia instruction becomes a critically important part of educational delivery. As barriers to

the production of multimedia decrease through the broad availability of lecture-capture and low-cost

video production software, there is greater need to evaluate principles of multimedia design

for application by practitioners using off-the-shelf products. While current research aims to

establish boundary conditions to determine the effectiveness of multimedia design principles in

practical application, this research does not account for templates, instructional design

recommendations, or accessibility requirements provided to teacher-designers producing

multimedia instruction. While growth in the use of eye tracking methodology has provided a

mechanism to evaluate the intersection of task difficulty, expertise, and design on attention and

cognitive processing, more research is needed to establish actionable recommendations for


To establish practical guidelines for adoption outside the research community, a review of the current

scope of multimedia research and the application of eye tracking methodology is needed.

Reviewing the scope of research, methods and measures, and companion measures provides

guidance for future research and methodology improvement.

Methods of Research Retrieval

The goal of this literature review was to evaluate peer-reviewed literature associated with

multimedia design with an application of eye tracking methodology. A particular focus was on

empirical research discussing cognitive load and attention or distraction. To evaluate the literature,

a search of peer-reviewed articles and dissertations was conducted in ScienceDirect, EBSCO,

ProQuest, and Google Scholar databases from October-November 2017. This was followed by a

manual search of literature reference lists and meta-analyses to find relevant literature meeting

inclusion criteria not included in initial search results.

To identify studies relevant to the use of eye tracking in multimedia learning research,

search terminology included eye tracking (e.g. eye track* or eye movement), multimedia

learning, cognitive load, and selection-related phrases (e.g. attention, distraction, or

concentration). No limits were placed on participant demographics to provide a complete profile

of research coverage. While no specific time limit was used to bound search results, results were

limited to 2007 or later due to the availability of full-text peer-reviewed literature available in


Articles included discussed studies directly assessing the effectiveness of multimedia

learning. This resulted in the exclusion of literature focused on simulation and work

environments (e.g. virtual reality, surgical procedures, control room operators) and reading

behavior (i.e. reading miscue analysis) due to the lack of specific research aims in multimedia

learning. Articles were reviewed by title, abstract, and keywords to test against inclusion and

exclusion criteria. Full-text articles meeting inclusion criteria were retrieved and assessed to

ensure research criteria were met. A total of 63 unduplicated articles were included. However,

only 32 articles are included here due to time constraints in analysis.


An initial review of results raises questions regarding search terminology and the loose definition

of multimedia. While cognitive load was a commonly discussed theory, the use of CTML and

multimedia design principles were limited to testing of traditional presentations (e.g. animations

and PowerPoint). This made results difficult to classify using principles of multimedia design,

raising questions regarding the overly broad inclusion criteria used. Despite this limitation, the

scope of the review revealed common methodological approaches that are beneficial to future

research in multimedia.

Equipment. Explicit discussion of equipment manufacturer, hardware format and data

collection speed, and software is a common expectation among the research community. The

screen-based design of multimedia has resulted in the common use of remote eye tracking

software for most experiments evaluating design elements. However, wearable eye trackers do

appear in multimedia research focused on multimedia in real-world, uncontrolled environments

where environmental conditions influence attention (Bucher & Niemann, 2012), reinforcing the

importance of equipment selection in research design. Currently, Tobii remote eye tracking

equipment is the most popular among multimedia researchers (n=11) with SMI (n=7), EyeLink

(n=4), ASL (n=4), FaceLab (n=3), and Mangold International (n=1) providing alternate options

for collecting eye movement data.


Measurements. Fixation duration and count (Andrà et al., 2015; Boucheix, Lowe, &

Bugaiska, 2015; Boucheix & Lowe, 2010; Chen, Hsiao, & She, 2015; Dogusoy-Taylan &

Cagiltay, 2014; Glaser & Schwan, 2015; Huang & Chen, 2016) and dwell time (Bucher &

Niemann, 2012; Koć-Januchta et al., 2017; van Wermeskerken & van Gog, 2017) appear to be

the most common measures used regardless of research orientation. Use of saccade measures

appears to be extremely limited (Chen, Hsiao, & She, 2015; Huang & Chen, 2016; Yang et al.,

2013). While saccade measures have been used with fixation density to measure the effect of

prior knowledge (Yang et al., 2013), the use of saccade data appears to be unclearly defined


Gaze plots showing repeated attention (regression) (Chen & Yang, 2014), re-reading

(Chen, Hsiao, & She, 2015), common patterns of attention (gaze synchrony) (Glaser &

Schwan, 2015), and order of attention (Andrà et al., 2015; Bucher & Niemann, 2012; Cook,

Wiebe, & Carter, 2008) are used to assess the relationship between interpretation and integration

of disparate text and graphics (split attention) and knowledge construction. While this falls

outside of the principles for multimedia design (i.e. reducing extraneous processing, managing

essential processing, and fostering generative processing), this provides context on the synthesis

of math and science concepts. The least common measure used in studies was the heatmap. Used

as a part of qualitative analysis of participant behavior, heatmaps have provided researchers an

additional mechanism to inspect and assess differences in gaze patterns between participants

(Hsu, Hwang, & Chang, 2014; Huang & Chen, 2016; Koć-Januchta et al., 2017). It is important

to note that researchers using heatmaps have been careful to contextualize results and have

avoided the use of aggregate maps to avoid a mischaracterization of objective data.


Companion Measures. Additional analysis is needed to assess the type and scope of

companion measures across multimedia research. Interviews and retrospective think-aloud

sessions found in user research are utilized in association with research on spatial problem-solving

tasks (Cook, Wiebe, & Carter, 2008; Bucher & Niemann, 2012; Chen & Yang, 2014) to

explain the mental processing logic of learners during their tasks. However, the inclusion of the

mini-mental state examination (MMSE) to provide cognitive context to age-related differences is

compelling when considering the expansion of research in elderly populations (Boucheix, Lowe,

& Bugaiska, 2015).

Limitations. The use of the search term “cognitive load” may have limited the scope of

search results. Since multimedia design principles include concepts to manage essential

processing and foster generative processing, related research may not include discussion of

cognitive load. Therefore, it is premature to state that multimedia-focused eye tracking research

is focused primarily on principles to reduce extraneous processing. Additionally, the lack of

detailed analysis of research questions and results chronologically makes it difficult to determine

how research protocol has evolved, identifying questions that are no longer important to


Implications for Future Research

The common use of undergraduate research participants limits the transferability of

behavioral findings to the broader population. Purposeful expansion of age demographics

provides an opportunity to evaluate attention across the full spectrum of neurological

development, providing insight on multimedia design and the neurological development of

children and cognitive decline in elderly populations. An expanded research program also

requires transparent screening regarding learning disabilities and neuroatypical learners.


Early research on the impact of induced emotions on multimedia learning

(Knörzer, Brünken, & Park, 2016) provides insight on the relationship between cognitive and

affective processes in learning. This suggests an opportunity to leverage eye tracking for the

measurement of emotional arousal. However, limited use of pupillary measurements in current

research, age-based differences in pupil size (Birren, Casperson & Botwinick, 1950; Winn,

Whitaker, Elliott, & Phillips, 1994), and lack of pupillary and blink-rate measures in low-speed

(Hz) eye trackers raise questions regarding the practical measurement of emotion. An analysis

of the practical measurement of emotion is needed to determine when pupillary measures are

appropriate versus other companion measures like skin conductance (GSR), heart rate (ECG),

and facial sentiment analysis (Rosas, Mihalcea, & Morency, 2013).

Researching the use of eye tracking in other disciplines to review companion measures

may provide insight on measures that could increase the rigor of research in multimedia learning.

A cursory review has shown the influence of usability related research in the incorporation of

retrospective think-aloud processes and medical research in the use of MMSE. This expanded

scope provides an opportunity to reinforce performance and attention findings to achieve results

that are generally transferable.

Overall, eye tracking methodology provides an accurate measure of attention and

provides an objective measure of workload in sighted individuals. Despite limitations in

equipment availability, format, and participant factors including age, eye disorders, and

questions regarding the overuse of convenience populations (i.e. undergraduate college students),

eye tracking provides the best measure to establish how sensory processing and attention impact

multimedia. Further research in diverse populations provides greater insight on the intersection

of design, learning interactions, and performance to further refine principles of multimedia

design into actionable recommendations for teacher-designers.

References



Anderson, J. R., Bothell, D., & Douglass, S. (2004). Eye movements do not reflect retrieval
processes: Limits of the eye-mind hypothesis. Psychological Science, 15(4), 225-231.

Andrà, C., Lindström, P., Arzarello, F., Holmqvist, K., Robutti, O., & Sabena, C. (2015).
Reading Mathematics Representations: an Eye-Tracking Study. International Journal of
Science and Mathematics Education, 13(January 2012), 237–259.

Benedetto, S., Pedrotti, M., Minin, L., Baccino, T., Re, A., & Montanari, R. (2011). Driver
workload and eye blink duration. Transportation research part F: traffic psychology and
behaviour, 14(3), 199-208.

Birren, J. E., Casperson, R. C., & Botwinick, J. (1950). Age changes in pupil size. Journal of
Gerontology, 5(3), 216-221.

Bogost, I. (2013). The condensed classroom. The Atlantic, 8.

Bojko, A. (2013). Eye tracking the user experience: A practical guide to research. Brooklyn,
New York: Rosenfeld Media LLC.

Boucheix, J. M., & Lowe, R. K. (2010). An eye tracking comparison of external pointing cues
and internal continuous cues in learning with complex animations. Learning and Instruction,
20(2), 123–135.

Boucheix, J. M., Lowe, R. K., Putri, D. K., & Groff, J. (2013). Cueing animations: Dynamic
signaling aids information extraction and comprehension. Learning and Instruction, 25, 71–

Boucheix, J.-M., Lowe, R., & Bugaiska, A. (2015). Age differences in learning from
instructional animations. Applied Cognitive Psychology, 29(April), 524–535.

Brasel, S. A., & Gips, J. (2014). Enhancing television advertising: Same-language subtitles can
improve brand recall, verbal memory, and behavioral intent. Journal of the Academy of
Marketing Science, 42(3), 322–336.

Bucher, H.-J., & Niemann, P. (2012). Visualizing science: the reception of powerpoint
presentations. Visual Communication, 11(3), 283–306.

Chen, S. C., Hsiao, M. S., & She, H. C. (2015). The effects of static versus dynamic 3D
representations on 10th grade students’ atomic orbital mental model construction: Evidence
from eye movement behaviors. Computers in Human Behavior, 53, 169–180.

Chen, Y. C., & Yang, F. Y. (2014). Probing the Relationship Between Process of Spatial
Problems Solving and Science Learning: an Eye Tracking Approach. International Journal
of Science and Mathematics Education, 12(3), 579–603.

Cook, M., Wiebe, E. N., & Carter, G. (2008). The influence of prior knowledge on viewing and
interpreting graphics with macroscopic and molecular representations. Science Education,
92(5), 848–867.

Dogusoy-Taylan, B., & Cagiltay, K. (2014). Cognitive analysis of experts’ and novices’ concept
mapping processes: An eye tracking study. Computers in Human Behavior, 36, 82–93.

Duchowski, A. T. (2007). Eye tracking methodology: Theory and practice. London: Springer.

Eckstein, M. K., Guerra-Carrillo, B., Singley, A. T. M., & Bunge, S. A. (2017). Beyond eye
gaze: What else can eyetracking reveal about cognition and cognitive
development? Developmental Cognitive Neuroscience, 25, 69-91.

Fogarty, C., & Stern, J. A. (1989). Eye movements and blinks: their relationship to higher
cognitive processes. International Journal of Psychophysiology, 8(1), 35-42.

Glaser, M., & Schwan, S. (2015). Explaining Pictures: How Verbal Cues Influence
Processing of Pictorial Learning Material. Journal of Educational Psychology, 107(4),

Goldberg, J. H., & Kotval, X. P. (1999). Computer interface evaluation using eye movements:
methods and constructs. International Journal of Industrial Ergonomics, 24(6), 631-645.

Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J.
(2011). Eye tracking: A comprehensive guide to methods and measures. New York: Oxford
University Press.

Hsu, C. K., Hwang, G. J., & Chang, C. K. (2014). An automatic caption filtering and partial
hiding approach to improving the english listening comprehension of EFL students.
Educational Technology and Society, 17(2), 270–283.

Huang, P. S., & Chen, H. C. (2016). Gender Differences in Eye Movements in Solving Text-and-
Diagram Science Problems. International Journal of Science and Mathematics Education,
14(162), 327–346.

Jamet, E. (2014). An eye-tracking study of cueing effects in multimedia learning. Computers in
Human Behavior, 32, 47–53.

Jarodzka, H., Janssen, N., Kirschner, P. A., & Erkens, G. (2015). Avoiding split attention in
computer-based testing: Is neglecting additional information facilitative? British Journal of
Educational Technology, 46(4), 803–817.

Jian, Y. C., & Ko, H. W. (2017). Influences of text difficulty and reading ability on learning
illustrated science texts for children: An eye movement study. Computers and Education,
113, 263–279.

Jian, Y. C., & Wu, C. J. (2015). Using Eye Tracking to Investigate Semantic and Spatial
Representations of Scientific Diagrams During Text-Diagram Integration. Journal of Science
Education and Technology, 24(1), 43–55.

Just, M.A., & Carpenter, P.A. (1984). Using eye fixations to study reading comprehension. In
D.E. Kieras & M.A. Just (Eds.), New methods in reading comprehension research (pp. 151–
182). Hillsdale, New Jersey: Erlbaum.

Knörzer, L., Brünken, R., & Park, B. (2016). Facilitators or suppressors: Effects of
experimentally induced emotions on multimedia learning. Learning and Instruction, 44, 97–

Koć-Januchta, M., Höffler, T., Thoma, G. B., Prechtl, H., & Leutner, D. (2017). Visualizers
versus verbalizers: Effects of cognitive style on learning with texts and pictures – An eye-
tracking study. Computers in Human Behavior, 68, 170–179.

Korbach, A., Brünken, R., & Park, B. (2016). Learner characteristics and information processing
in multimedia learning: A moderated mediation of the seductive details effect. Learning and
Individual Differences, 51, 59–68.

Kruger, J. L., & Steyn, F. (2014). Subtitles and eye tracking: Reading and performance. Reading
Research Quarterly, 49(1), 105–120.

Lavie, N., Hirst, A., De Fockert, J. W., & Viding, E. (2004). Load theory of selective attention
and cognitive control. Journal of Experimental Psychology: General, 133(3), 339.

Liu, H. C., Lai, M. L., & Chuang, H. H. (2011). Using eye-tracking technology to investigate the
redundant effect of multimedia web pages on viewers’ cognitive processes. Computers in
Human Behavior, 27(6), 2410–2417.

Lowe, R. K., & Boucheix, J. M. (2016). Principled animation design improves comprehension of
complex dynamics. Learning and Instruction, 45, 72–84.

Lowe, R., & Boucheix, J. M. (2011). Cueing complex animations: Does direction of attention
foster learning processes? Learning and Instruction, 21(5), 650–663.

Mayer, R. E. (Ed.). (2005). The Cambridge handbook of multimedia learning. New York, New
York: Cambridge University Press.

Mayer, R. E. (2010). Unique contributions of eye-tracking research to the study of learning with
graphics. Learning and Instruction, 20(2), 167-171.

Moreno, R., & Park, B. (2010). Cognitive Load Theory: Historical Development and Relation to
Other Theories. In J. Plass, R. Moreno, & R. Brünken (Eds.), Cognitive Load Theory (pp. 9-
28). New York, New York: Cambridge University Press.

Nakayama, M., Takahashi, K., & Shimizu, Y. (2002, March). The act of task difficulty and eye-
movement frequency for the 'Oculo-motor indices'. In Proceedings of the 2002 symposium on
Eye tracking research & applications (pp. 37-42). ACM.

Park, B., Plass, J. L., & Brünken, R. (2014). Cognitive and affective processes in multimedia
learning. Learning and Instruction, 29, 125-127.

Poole, A. & Ball, L. J. (2006). Eye Tracking in Human-Computer Interaction and Usability
Research: Current Status and Future Prospects. In C. Ghaoui (Ed.). Encyclopedia of Human
Computer Interaction. (pp. 211-219), Hershey, Pennsylvania: Idea Group Inc.

Rosas, V. P., Mihalcea, R., & Morency, L. P. (2013). Multimodal sentiment analysis of Spanish
online videos. IEEE Intelligent Systems, 28(3), 38-45.

Stern, J. A., Boyer, D., & Schroeder, D. (1994). Blink rate: a possible measure of fatigue.
Human Factors, 36(2), 285-297.

Sweller, J. (2010). Cognitive Load Theory: Recent Theoretical Advances. In J. Plass, R. Moreno,
& R. Brünken (Eds.), Cognitive Load Theory (pp. 29-47). Cambridge: Cambridge University
Press. doi:10.1017/CBO9780511844744.004

Tobii AB. (2015, June 25). Tobii Pro Glasses 2 wearable eye tracker. Retrieved December 09,
2017, from

van Amelsvoort, M., van der Meij, J., Anjewierden, A., & van der Meij, H. (2013). The
importance of design in learning from node-link diagrams. Instructional Science, 41(5), 833–

van Gog, T., Jarodzka, H., Scheiter, K., Gerjets, P., & Paas, F. (2009). Attention guidance during
example study via the model’s eye movements. Computers in Human Behavior, 25(3), 785–

van Gog, T., Kester, L., Nievelstein, F., Giesbers, B., & Paas, F. (2009). Uncovering cognitive
processes: Different techniques that can contribute to cognitive load research and instruction.
Computers in Human Behavior, 25(2), 325–331.

van Gog, T., & Scheiter, K. (2010). Eye tracking as a tool to study and enhance multimedia
learning. Learning and Instruction, 20(2), 95–99.

van Marlen, T., van Wermeskerken, M., Jarodzka, H., & van Gog, T. (2016). Showing a model’s
eye movements in examples does not improve learning of problem-solving tasks. Computers
in Human Behavior, 65, 448–459.

Van Meeuwen, L. W., Jarodzka, H., Brand-Gruwel, S., Kirschner, P. A., de Bock, J. J. P. R., &
van Merriënboer, J. J. G. (2014). Identification of effective visual problem solving strategies
in a complex visual domain. Learning and Instruction, 32, 10–21.

van Wermeskerken, M., & van Gog, T. (2017). Seeing the instructor’s face and gaze in
demonstration video examples affects attention allocation but not learning. Computers and
Education, 113, 98–107.

Wang, C. Y., Tsai, M. J., & Tsai, C. C. (2016). Multimedia recipe reading: Predicting learning
outcomes and diagnosing cooking interest using eye-tracking measures. Computers in
Human Behavior, 62, 9–18.

Waniek, J., & Ewald, K. (2008). Cognitive Costs of Navigation Aids in Hypermedia Learning.
Journal of Educational Computing Research, 39(2), 185–204.

Wiebe, E. N., Slykhuis, D. A., & Annetta, L. A. (2007). Evaluating the effectiveness of scientific
visualization in two powerpoint delivery strategies on science learning for preservice science
teachers. International Journal of Science and Mathematics Education, 5(April 2006), 329–

Wiebe, E. N., & Annetta, L. A. (2008). Influences on Visual Attentional Distribution in
Multimedia Instruction. Journal of Educational Multimedia and Hypermedia, 17(2), 259–

Winn, B., Whitaker, D., Elliott, D. B., & Phillips, N. J. (1994). Factors affecting light-adapted
pupil size in normal human subjects. Investigative Ophthalmology & Visual Science, 35(3),

Yang, F. Y., Chang, C. Y., Chien, W. R., Chien, Y. T., & Tseng, Y. H. (2013). Tracking
learners’ visual attention during a multimedia presentation in a real classroom. Computers
and Education, 62.
