By
Seth M. Spain,
University of Illinois, Urbana-Champaign
Andrew G. Miner,
Target Corp.
Pieter M. Kroonenberg,
Leiden University
Fritz Drasgow,
University of Illinois, Urbana-Champaign
Author Notes.
Portions of this research were presented at the 2007 Annual Meeting of the Academy of
Management, Philadelphia, PA. This manuscript is based on the first author’s master’s thesis.
Correspondence and requests for reprints should be addressed to Seth Spain, Department
The authors would like to thank Chuck Hulin, Sungjin Hong, and Theresa Glomb for
Questions about the dynamic processes that drive behavior at work have been the focus of
increasing attention in recent years. Models describing behavior at work and research on
momentary behavior indicate that substantial variation exists within individuals. This paper
examines the rationale behind this body of work and explores a method of analyzing momentary
work behavior using experience sampling methods. The paper also examines a previously unused
set of methods for analyzing data produced by experience sampling. These methods are known
analysis, the Parafac and the Tucker3 models, are used to analyze data from Miner, Glomb, and
Hulin’s (in press) experience sampling study of work behavior. The efficacy of these techniques
for analyzing experience sampling data is discussed, as are the substantive multimode
Keywords: Three-way principal components analysis, Parafac model, Tucker3 model, experience
Historically, measures of job performance have been of great interest for organizational
scientists attempting to establish the validity of selection systems (Austin & Villanova, 1992).
For instance, if a city uses a cognitive ability measure to select its police officers, applied
psychologists want to validate those cognitive ability scores against a criterion such as the new
officers’ arrest records, where arrest records serve as a measure of job performance. Validity for
predicting a criterion is commonly evaluated via the correlation between the predictor measure
and criterion. For the police example, this would be the correlation between the cognitive ability
conceptually and operationally (e.g., Austin & Villanova, 1992; Borman, 1991). Most validation
research treats job performance as a monolithic and static construct. There is considerable
empirical evidence that job performance is multidimensional (e.g., Campbell, 1991; Campbell,
1994; Campbell, McHenry, & Wise, 1990; cf., Borman & Motowidlo, 1997). Furthermore, it is
possible that job performance is not stable over time (Hulin, Henry, & Noon, 1990; Keil &
Cortina, 2001). In fact, job performance data can usually be classified by three modes: the
individuals assessed, the variables measured, and the times of measurement (e.g., Cattell, 1952;
cf., Smith, 1976). Ghiselli (1956) postulated three systematic sources of variance in job
performance data, i.e. all three modes show multidimensionality. Dalal and Hulin (2008) have
called for job performance studies which include changes over time, referring to this approach as
multivariate dynamic. Even this perspective ignores qualitative differences in performance and
thus potential dimensionality in the individuals mode. This paper will deal with
multidimensionality of job performance in all its modes and takes an individual differences
To measure validly the frequency and patterning of mental processes in
everyday-life situations, procedures are needed that capture variations in self-reports of those
processes. To this end, experience sampling methodology has been developed, in which a
participant reports, at random or specific times, on his or her mental state or the activities
in which he or she is involved at that moment. To capture these reports, participants
are supplied with beepers or, more recently, with palmtop computers. With the help of these
devices, several brief surveys each day are administered to participants (Larson &
information to be gathered from individuals about several variables over time, the procedure
provides data to study Ghiselli’s three sources of job performance variance. Several methods for
analyzing such experience sampling data have been used in the literature, in particular spectral
analysis and multilevel modeling. This paper will demonstrate techniques that have not been
frequently used in the organizational literature, and explore their usefulness for understanding
The techniques presented in this paper are methods of multiway component analysis (see
Kroonenberg, 2008, p. 16 for the distinction between mode and way). We focus on the Tucker3
model (Tucker, 1966) and the Parallel factor analysis (Parafac) model (Harshman & Lundy,
1984a, 1984b). These three-mode models can be used to explore the individual, static, and dynamic
structures of work experience data. The results show multidimensionality in all three modes. The
meaning of these dimensions and implications for understanding job performance are discussed.
There are three sources of meaningful variance in job performance scores (Ghiselli,
1956): (1) static dimensionality, the ordinary factorial representation of performance; (2)
dynamic dimensionality, temporal factors influencing the performance domain; and (3)
individual dimensionality, variability in the type of performance across persons in the same job.
We will consider each of these sources in turn, beginning with individual dimensionality. For
similar distinctions in a more general sense, see Cattell (1952; refer to Figure 1 below).
Ghiselli described individual dimensionality as the way that individuals assigned to the
same job within an organization perform that specific job in qualitatively different ways, not
employees. For example, two salespersons may provide the same economic benefit to the
organization, but one contributes by directly making sales, while the other contributes by
creating goodwill, encouraging customers to make purchases throughout the store (Ghiselli,
variety of situations. If the organization derives the same economic benefit from different
employees in different ways, these differences should be reflected in selection and reward
systems.
Static dimensionality refers to the latent structure of the variables measuring job
performance. Historically, the study of job performance was characterized by a search for the
“ultimate criterion”, a comprehensive index of performance. It has been pointed out that this is
an inappropriate way to conceptualize performance (e.g., Ghiselli, 1956; Dunnette, 1963). There
is evidence across many jobs that overall job performance can consist of as many as eight
dimensions (Campbell, 1994; Campbell, McHenry, & Wise, 1990). At minimum, job
performance researchers should consider both task performance, the technical core of the job,
and contextual performance, the social and non-technical contributions an individual makes at
work (Borman & Motowidlo, 1997). These dimensions have been found to independently
& Van Scotter, 1994). For instance, consider two salespeople who have equal sales. One is
known to be a loner, while the other gives advice and assistance to coworkers. The latter will be
viewed as the superior performer, due to the social contributions this salesperson makes.
It may be necessary to have a single measure of job performance, for example when
establishing the predictive validity of a selection battery (Hulin, 1982). One might then construct
some composite of scores on various performance dimensions. The weighting scheme chosen to
combine different dimensions of job performance has been found to greatly impact those validity
estimates in Monte Carlo simulations (Murphy & Shiarella, 1997). These authors found that 34%
of the variance in validity estimates over simulated samples was accounted for by the
Finally, the dynamic nature of performance criteria is important to consider for employee
selection. There are well-documented changes in the predictive relationships between selection
instruments and performance measures (Alvarez & Hulin, 1972). In an academic example, SAT
scores are most predictive of first semester college grade point average, and the correlations
between SAT and GPA decrease after that (e.g., Humphreys, 1976). These changes in predictive
validity have been found to be virtually ubiquitous (Hulin et al., 1990). The fact that predictive
validity changes over time is usually called validity degradation and is the primary approach to
the study of dynamic criteria, though there are other approaches (e.g., Austin, Humphreys, &
Hulin, 1989; Barrett, Caldwell, & Alexander, 1985). This work is summarized in meta-analyses
by Hulin et al. (1990) and Keil and Cortina (2001), which show that validity degradation is
If performance changes over time, it would be useful to find predictors of the change itself.
Personality measures have been used in addition to cognitive ability measures to predict
individual growth curves for performance criteria (e.g., Zyphur, Bradley, Landis, & Thoresen,
2008). The results of this study indicate that both cognitive ability and conscientiousness predict
initial academic performance, but only conscientiousness predicts performance trajectories. This
may happen because early performance is a transition phase of skill acquisition and later
performance is a maintenance phase (Murphy, 1989). In a study testing that proposition, both
agreeableness and openness to experience predicted both performance differences and trends in
the transition phase (Thoresen, Bradley, Bliese, & Thoresen, 2004). These attempts remain
univariate in their approach to criterion measurement, however. Theoretical models should guide
One such model is provided by Affective Events Theory (Weiss & Cropanzano, 1996).
These authors suggest that workplace behavior comes in two basic kinds: affect-driven and
judgment-driven. Workplace events cause affective reactions, and these affective reactions
directly influence affect-driven behavior. But these affective reactions also influence job attitudes
affect should thus show stronger relationships with momentary behaviors such as work
withdrawal (e.g., taking long coffee breaks or surfing the web) and that job attitudes should have
stronger relationships with more considered behaviors such as job withdrawal (e.g., job search,
personality will moderate both the link between events and predict the affective reactions
themselves.
In complement to Affective Events Theory, the Episodic Process Model (Beal, Weiss,
Barros, & MacDermid, 2005) suggests that there will be important momentary fluctuations in the
affective and regulatory resources available for employees to apply to performance behaviors.
This model articulates reasons why performance behaviors should meaningfully vary within
persons over short time periods. For example, if my supervisor yells at me, and I then need to
interact with a client, I may have to regulate my emotional display to appear positive to the
client. This act of emotion regulation uses up some of my regulatory resources, and may
therefore make it more difficult for me to focus my attention on a report I need to write later in
the day. Obtaining evidence to test such a model requires research designs that are capable of
Experience sampling methods are ideally suited to explore dynamic models of work
behavior because measurements may be taken throughout the work day on several variables. For
example, Weiss, Nicholas, and Daus (1999) tested propositions drawn from Affective Events Theory
about the influence of job beliefs and moods on overall job satisfaction. Dalal, Lam, Weiss,
Welch, and Hulin (2009) investigated the dimensionality of organizational citizenship and
counterproductive behaviors, and their relationships to state affect and performance. Miner,
Glomb, and Hulin (2005; in press) found general support for the idea that individuals modify their work
behaviors in response to mood changes. These studies demonstrate that a substantial portion of
longitudinal studies.
Experience sampling data are three-mode (persons × variables × occasions) and are
frequently analyzed with multilevel models, with the occasions mode nested within persons. This
paper addresses whether such data may be examined using multiway analysis, as illustrated in
the next section. Situations where these analyses would be useful include examination of whether
the structures of counterproductive and citizenship behaviors change over time, or, even more
fitting for multiway methods, research on whether different groups of employees display
different patterns of change in these structures. Multiway models are a natural way of addressing
such questions, but may prove difficult to implement with experience sampling data. For
instance, in the data presented here, the measurement occasions occurred randomly within two-
hour windows. Therefore, Participant 1’s time 1 is not strictly simultaneous with Participant 2’s,
but should be basically equivalent. One goal of this paper is to serve as a methodological
Multiway Analyses
When viewed in the way specified by Ghiselli (1956), the performance domain is defined
by a three-way data relation box (Cattell, 1952). This box is defined by its three modes:
individuals, variables, and occasions. This box is referred to as an array, a generalization of the
matrix concept (see Figure 1). In Figure 1, subjects, variables, and occasions are all both modes
and ways. The front side of the array in Figure 1, the subjects by variables “slice” is the matrix
experience sampling study would have such a matrix. Stacking these slices as in Figure 1 leads
to a three-mode array. We next consider two broad classes of multiway component analysis that
will be considered in this paper, the Parafac model and the Tucker3 model. These are both
components analysis models; however, both were initially referred to as factor analysis (e.g.,
Tucker, 1966). In this paper, we refer to components, except when discussing the historical
Cattell (1944) presented a solution to the problem of rotational freedom in standard factor
analysis, which he called the principle of parallel proportional profiles. Cattell’s idea was to
measure the same individuals on the same variables across two systematically different
occasions, say before and after an experimental manipulation. If one recovers the same
factors at both occasions, then this common orientation of the factors or these parallel proportional
profiles could, according to Cattell, only occur if the factors represent real psychological
constructs.
The Parafac model (Harshman, 1970) is essentially both a model formulation and
proportional profile principle could be extended to the n-way case. The Parafac model is
(Carroll & Chang, 1970). In this paper, we will refer only to the former name. The Parafac model
$$\mathbf{X}_k = \mathbf{A}\mathbf{D}_k\mathbf{B}' + \mathbf{E}_k \qquad (1)$$
where $\mathbf{X}_k$ is the kth frontal slice of the data array $\underline{\mathbf{X}}$ (matrices are boldfaced, and multi-way arrays
boldfaced and underlined). $\mathbf{A}$ is the coefficient matrix for the first mode, $\mathbf{B}'$ is the transpose of the
coefficient matrix of the second mode, $\mathbf{D}_k$ is a diagonal matrix containing the kth row of the third
mode coefficient matrix along its diagonal, i.e. $d_{ssk} = c_{ks}$. $\mathbf{E}_k$ is the kth frontal slice of residual array
which shows that for the kth slice of the data array the $d_{ssk}$-coefficients provide weights for the
coefficients of the other two modes. They are the “proportionality” constants that capture the
systematic variation between slices of the data array (cf. Harshman & Lundy, 1984a). A $d_{ssk}$ ($c_{ks}$)
coefficient defines the relative importance of component s to occasion k. It should be noted that
system as a whole and object variation which pertains to variation of specific (groups of) objects,
i.e. part of the system but not the system as a whole. For the Parafac model to be applicable
sufficient system variation should be present in the multiway data. System variation involves a
conceptualization of components that lie “in the system” under examination such that the same
instances of the components affect all of the objects; there should be parallel proportional
instance of the component “in” each object under study, such that the components are properties
of (groups of) individuals under study. The system model implies synchronous variation in
component influences across levels of the third mode (e.g., occasions), while the object model
does not imply such synchronous variation (cf. Harshman & Lundy, 1984a, pp. 130-133). The
basic Parafac model has no provision for modeling object variation. Therefore, if substantial
object variation is present, a more complicated model such as the Tucker3 should be used.
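The slice-wise Parafac structure described above can be made concrete with a small numerical sketch. The code below is only an illustration in plain Python: the function name `parafac_slice` and all coefficient values are hypothetical and are not taken from the study. It builds the model part of each frontal slice as A D_k B', where the diagonal of D_k is the kth row of the third-mode matrix C.

```python
# Minimal pure-Python sketch of the Parafac structure X_k = A D_k B'.
# All sizes and coefficient values are made up for illustration only.

def parafac_slice(A, B, c_k):
    """Return the model part of the k-th frontal slice, A D_k B'.

    A   : I x S list of rows (first-mode coefficients)
    B   : J x S list of rows (second-mode coefficients)
    c_k : length-S row of the third-mode matrix C (the diagonal of D_k)
    """
    S = len(c_k)
    return [[sum(A[i][s] * c_k[s] * B[j][s] for s in range(S))
             for j in range(len(B))]
            for i in range(len(A))]

A = [[1.0, 0.0], [0.5, 2.0], [1.5, 1.0]]  # 3 individuals, 2 components
B = [[1.0, 1.0], [2.0, -1.0]]             # 2 variables, 2 components
C = [[1.0, 1.0], [2.0, 0.5]]              # 2 occasions, 2 components

# One I x J matrix per occasion; only the weights c_k differ between slices.
slices = [parafac_slice(A, B, c_k) for c_k in C]
for k, X_k in enumerate(slices):
    print("occasion", k, X_k)
```

In this toy example the first component counts twice as heavily at the second occasion as at the first, and the second component half as heavily, while A and B stay fixed. That fixed pairing of components across modes is the proportionality that the Parafac model assumes.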
Tucker (1964, 1966) formally introduced the three-mode factor analytic model for three-
way data as an extension to the well-known two-mode model for two-way data. This model was
later named after him by Kroonenberg and de Leeuw (1980). The Tucker family of models can
account for both system and object variability. Of these models the Tucker3 is the most complex
and it can also be shown to be more complex than the Parafac model. The sum notation for the
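model is:
$$x_{ijk} = \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R} a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} + e_{ijk} \qquad (3)$$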
Here, $a_{ip}$ represents the coefficient of the ith individual on the pth A-mode factor, while $b_{jq}$
represents the coefficient of the jth variable on the qth variable factor, and $c_{kr}$ represents the
coefficient of the kth time point on the rth time factor. Finally, $g_{pqr}$ represents the element of the core
array which links the pth, qth, and rth factors in the A-, B-, and C-modes, respectively, with each
other. It can also be interpreted as a representation of the interaction between the factors of the
three modes (for further details see Kroonenberg, 2008, Section 4.8).
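As a rough illustration of how the core array links the components of the three modes, the following plain-Python sketch computes one model-implied data point as a sum over all combinations of A-, B-, and C-mode components, each weighted by a core element. The helper names (`tucker3_element`, `superdiagonal_core`) and all numbers are hypothetical. Under the assumption of equal numbers of components in each mode, a core with ones on its superdiagonal and zeros elsewhere reduces the sum to the Parafac form.

```python
# Sketch of the Tucker3 sum: x_ijk is a core-weighted sum over (p, q, r).
# Names and values are illustrative assumptions, not the authors' software.

def tucker3_element(A, B, C, G, i, j, k):
    """Model-implied value for individual i, variable j, time point k."""
    P, Q, R = len(G), len(G[0]), len(G[0][0])
    return sum(A[i][p] * B[j][q] * C[k][r] * G[p][q][r]
               for p in range(P) for q in range(Q) for r in range(R))

def superdiagonal_core(S):
    """S x S x S core with ones where p == q == r and zeros elsewhere."""
    return [[[1.0 if p == q == r else 0.0 for r in range(S)]
             for q in range(S)]
            for p in range(S)]

A = [[1.0, 0.0], [0.5, 2.0], [1.5, 1.0]]  # individuals x A-mode components
B = [[1.0, 1.0], [2.0, -1.0]]             # variables x B-mode components
C = [[1.0, 1.0], [2.0, 0.5]]              # time points x C-mode components

# With a superdiagonal core, only matching components interact (Parafac-like).
G = superdiagonal_core(2)
x_parafac_like = tucker3_element(A, B, C, G, 1, 0, 1)

# A non-zero off-diagonal core element lets A-component 1 combine with
# B-component 2 and C-component 1, which the Parafac form cannot express.
G[0][1][0] = 1.0
x_with_interaction = tucker3_element(A, B, C, G, 1, 0, 1)

print(x_parafac_like, x_with_interaction)
```

The only difference between the two values is the extra core element, which is one way to see why the core is read as a set of weights for combinations of components.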
Thus the columns of the coefficient matrices A, B, and C are the components which may
be represented as vectors in graphs displaying the component spaces. For convenience, we often
speak of component scores for subjects, component loadings for variables and component scores
or weights for time points. Only the variable loadings can however be understood as in the two-
way case, i.e. as variable-component correlations, if the variables have been centered and
normalized in a specific way (see below). In most cases, the interpretation of the component
scores proceeds in the same way as component scores in two-way component analyses where the
components are in normalized coordinates, i.e. have unit sums of squares. Another way of
thinking about the component matrices is in terms of “idealized” instances of the real objects in
that mode; i.e. ideal individuals, ideal (or latent) variables, and ideal measurement occasions or
The core array is what allows Tucker models to account for object variability where the
Component 1 in the variables and time modes in a Parafac analysis, the same does not apply to a
Tucker3 model. The core allows three-way interactions between the components in different
modes and for different numbers of components in each mode. A number of ways of interpreting
the core are possible. One may view the core array as “idealized data”, such that each element
represents the score on the ideal variable by the ideal person at the ideal measurement occasion
corresponding to that core element (cf. Tucker, 1966). As the scalar notation makes clear
(Equation 3), one can also think of the core elements as regression weights for predicting the
original data using the three-way interaction of the components (Kroonenberg, 2008, pp. 225-228).
Thus, core elements give the importance of a combination of mode components for
Research Questions
Inn, Hulin, and Tucker (1972) offer perhaps the only multivariate dynamic criterion study
using multiway analysis. They fit a Tucker3 model to data from a sample of 184 airline
reservation agents measured five times on 11 performance variables. These authors found a
three sources conjecture (1956). Given the dearth of previous research, this study is largely
exploratory and we propose few a priori hypotheses. The major expectation of this study is that
multiple components will be extracted in each mode of the data, consistent with Ghiselli (1956)
and Inn et al. (1972). Given previous work on affect cycles in experience sampling designs, it is
likely that at least one of the time components will show cyclic trends in loadings (Weiss et al.,
1999). These loadings are analogically similar to variable loadings on growth parameters in
latent growth curve analysis (e.g., Chan, 1998), though they are both exploratory in character and
expected to display cyclical rather than curvilinear patterns. Further, we expect that we should be
able to recover three B mode components for the objective, behavioral, and self-rated
performance variables discussed in the methods section below. We have no hypotheses regarding
individuals-mode components
fit well-behaved multiway models to experience sampling data. We expect to be able to do so,
and the only remaining question is whether the Parafac model will be sufficient. The Parafac
model is a constrained Tucker3 model, which itself can be seen as a constrained two-way principal
components analysis (PCA) of the wide combination-mode matricization (e.g., I×JK) of the data
array X. Therefore, the sums of squares for each model (SSmod) must be:
$$SS_{mod}(\text{Parafac}) \le SS_{mod}(\text{Tucker3}) \le SS_{mod}(\text{PCA})$$
provided all models have equivalent numbers of components (Kiers, 1991). Therefore, Tucker3
and PCA models will provide similar though better fits to the data than an equivalent Parafac
model. However, if the Parafac model is appropriate, the additional parameters of these models
will either model systematic variation redundantly or simply model noise (Smilde, Bro, &
Geladi, 2004, pp. 154-155). If a degenerate Parafac solution is obtained, it usually indicates the
METHOD
The experience sampling data analyzed here were originally collected by Miner et al. (in
press), who sampled one hundred individuals from a pool of about 300 incumbents at a Fortune
500 technological sector call center. These individuals served in either customer service or
participants had sufficient criterion data for analyses. Those 55 participants were mostly white
(94 %) and non-students (85 %). Fifty-one percent were male, 78 % had at least some college
education, and 66 % worked 9-hour shifts. The mean age of participants was 34 years (sd = 11
years) and their mean tenure with the organization was 2.4 years (sd = 2.3 years).
Measures and procedures
Participants’ palmtop computers signaled them by beeping either four or five times a day,
depending on whether they worked a five-day, 9-hour shift or a four-day, 11-hour shift, respectively.
Participants completed surveys each workday for three weeks, resulting in a total of sixty
possible measurement opportunities for each respondent. Participants were excluded from
analyses if they answered one-half (30) or fewer of the surveys. Participants tended to miss most
solved the customer’s problem, made three ratings of their own performance, and responded to
eight items sampling behaviors related to task performance, organizational citizenship behavior,
and work withdrawal. Participants were also asked if they were engaged in several work
behaviors, which were scored dichotomously. The behavioral items were aggregated into unit
weighted composites for focal performance (solving the problem, doing work tasks), being on
the job (at workstation, away but on the job), negative work behaviors (withdrawal, doing
personal tasks), positive citizenship behaviors (helping, doing organizational citizenship), and
neutral behaviors (at lunch, on break). In addition to the self-reported measures, performance
was also indicated by objective measures of average call handle time, average call wait time,
and average call hold time in the thirty-minute window in which the signal occurred.
differences measures in a post-study survey. These measures can be used to explore the meaning
of the individuals-mode components. These measures were the Trait Meta-Mood Scale (Salovey,
Mayer, Goldman, Turvey & Palfai, 1995) and the International Personality Item Pool
Extraversion and Neuroticism (IPIP; Goldberg, 2001). These measures are linked to dispositional
to affective events as well as to directly influence affective reactions to events. Trait meta-mood
regulate their feelings. IPIP extraversion is primarily an index of sociability and gregariousness,
while neuroticism tends to focus on anxiety/stress reactance. Participants also completed the Job
Descriptive Index, a measure of job satisfaction (Smith, Kendall, & Hulin, 1969). Affective
Events Theory predicts that job satisfaction should be more related to judgment-driven behaviors
than to impulsive behaviors. Additionally, job satisfaction has an empirical association with job
job satisfaction. Finally, self-reports of average work and job withdrawal and citizenship
Participants tended to miss many surveys in the last week, resulting in substantial missing
data after the fortieth measurement occasion. Therefore, the data we analyzed consisted of 55
participants by 11 variables by 40 measurement occasions (24200 data points). About 17295 data
points contained valid entries, and about 6905 did not (28.5% missing). Missing entries were
handled by imputing model-implied values. Missing values were initialized with two-way
MANOVA estimates. The information in a three-way array is quite rich, and with an array of
sufficient size, a substantial portion of missing values can be handled relatively reliably but often
requires special care in defining starting values for the missing-data locations (Smilde, Bro, &
Geladi, 2004). Though this dataset is not large enough to allow for cross-validation, the stability
RESULTS
Both Parafac and Tucker3 models were fit to the data. Analyses were performed using the
3WayPack suite of programs (Kroonenberg, 1994; 2004). Multiway models were compared
based on variance accounted for (VAF) measures. The VAF of the model was divided by the number
of factors estimated, in order to compare the model’s explanatory power to its parsimony
(Timmerman & Kiers, 2000). Similar indices were discussed in Kroonenberg and Oort (2003;
Kroonenberg, 2008, pp. 179ff.). These are not formal fit indices, but aid the researcher’s
Component models assume that variables are ratio-scaled (Harshman & Lundy,
1984a). However, psychological measurements mostly have interval properties. Centering will
change the data into ratio-scale deviations from the mean so that they can be handled in a
component analysis. Harshman and Lundy (1984b) and Kroonenberg (2008, Chapter 6) discuss
appropriate types of centering for three-way data, and centering across subjects per variable-time
Our data were accordingly centered across the subject mode, such that the mean for each
variable at each measurement occasion was subtracted out of each measurement. This is the most
appropriate centering strategy because the starting point of the study is arbitrary. An
accompanying advantage of this centering is that the matrix of variable-occasion means can be
seen as the score of the average person so that the deviation scores can be interpreted with
respect to this average person. Additionally, the three objective measures (call handle time, call
wait time, and call hold time) were corrected for non-normality by taking their natural
logarithms. Finally, the variables were size-normalized over all subjects-time point combinations
so that the values of the variables were comparable as standard scores across variables and
remained comparable per variable across all occasions. To be precise, the scores analyzed with
where a dot replaces the index over which the sum was taken. A final advantage of this form of
standardization is that given proper scaling of the output (see below) the component loadings for
the variables can be interpreted as variable-component correlations (Harshman & Lundy, 1984a).
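The two preprocessing steps described above, centering across subjects for each variable-occasion combination and size-normalizing each variable over all subject-occasion combinations, might be sketched as follows. This is an illustration in plain Python under stated assumptions: the array is complete (no missing entries), the log transformation of the call-time measures is omitted, no variable is constant, and each variable is scaled so that its mean square equals one. The exact scaling constant used in the original analysis is not recoverable from the text, and the function name `center_and_normalize` is hypothetical.

```python
import math

def center_and_normalize(X):
    """Preprocess X[i][j][k] (subjects x variables x occasions).

    Step 1: subtract, for each variable-occasion pair (j, k), the mean
            taken over subjects i.
    Step 2: divide each variable j by its root mean square taken over
            all subject-occasion combinations (assumed scaling constant).
    Returns a new nested list; X itself is left unchanged.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    means = [[sum(X[i][j][k] for i in range(I)) / I for k in range(K)]
             for j in range(J)]
    Z = [[[X[i][j][k] - means[j][k] for k in range(K)]
          for j in range(J)]
         for i in range(I)]
    for j in range(J):
        mean_square = sum(Z[i][j][k] ** 2
                          for i in range(I) for k in range(K)) / (I * K)
        scale = math.sqrt(mean_square)  # assumes the variable is not constant
        for i in range(I):
            for k in range(K):
                Z[i][j][k] /= scale
    return Z

# Toy array: 2 subjects x 2 variables x 2 occasions (made-up numbers).
X = [[[1.0, 2.0], [10.0, 30.0]],
     [[3.0, 6.0], [20.0, 50.0]]]
Z = center_and_normalize(X)
print(Z)
```

After these steps every variable-occasion column has mean zero across subjects, and the variables share a common scale while differences in spread between occasions within a variable are preserved.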
Descriptive statistics
Presenting even descriptive statistics for the entire data array would be cumbersome.
Thus descriptive statistics for the A-mode matricization are presented in Table 1. Matricization is
the unfolding of the three-mode data array into a two-way data matrix, extended along one of the
modes of data classification. Given an i×j×k data array, the A-mode matricization is of order ik×j
(cf. Kiers, 2000), where the subject mode (i) is the slower-running mode such that the rows of
the matrix consist of the first participant’s 40 measurement occasions followed by the second
participant’s 40 measurement occasions, and so on. Table 1 presents the means, standard
deviations, and correlations for the variables in the A-mode matricization. This correlation matrix
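The unfolding described above can be sketched in a few lines of plain Python. The function name `matricize_subject_slow` is hypothetical, and the toy array is built so that each entry encodes its own indices, which makes the row order easy to check: all occasions of the first participant come first, followed by all occasions of the second participant, with variables as columns.

```python
def matricize_subject_slow(X):
    """Unfold X[i][j][k] (subjects x variables x occasions) into a matrix
    with I*K rows and J columns, with the subject index running slowest."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[X[i][j][k] for j in range(J)]
            for i in range(I) for k in range(K)]

# Toy array: 2 subjects, 3 variables, 2 occasions; value = 100*i + 10*j + k.
X = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)]
     for i in range(2)]

M = matricize_subject_slow(X)
for row in M:
    print(row)
```

Column statistics of such a matrix (means, standard deviations, correlations) pool over persons and occasions, which is what a table of descriptives for the unfolded array summarizes.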
Prior to three-way analysis, we explored the mean time trends for the variables, i.e. trends
for means which were removed before the three-way analyses. Figures 2 through 4 present the
time trends for seemingly similar variables. Figure 2 displays the self-rated variables: average
handle time, average quality of service, and overall service. These variables display very similar
time trends. Figure 3 shows the trends for the behavioral variables: the focal performance
composite, withdrawal behaviors, citizenship behaviors, neutral behaviors, and location. Again,
these display highly similar trends. The objective measures, average handle time, average wait
time, and average call duration, do not show such a clear pattern.
analysis of the time series for these variables averaged over subjects. Two components were
recovered: the self-rated variables loaded strongly on one, with location, focal performance,
withdrawal and neutral behaviors loading strongly on the other. Log-average handle time and
citizenship performance also showed reasonable loadings on this component. The call duration
and call wait time variables did not load strongly on either component.
Three-way analysis
Based on several initial solutions, we decided to eliminate one participant from the three-
way analyses. This participant had much too high an influence on the solutions. After this, the
preferred models are: 2×2×1, 3×2×2, 4×2×2, and 4×2×4 Tucker models and a two-component
Parafac model. The two-component Parafac model was a rotation of the 2×2×2 Tucker model
Parafac analyses. We fit a two-component Parafac model using multiple random starts.
Three of these starts produced absolute values of Tucker’s congruence coefficients greater than .85.
The components in the two-component solutions are highly negatively congruent, with an
average congruence coefficient of -0.84 and an average core consistency of -25.9. These results
indicate that the higher component Parafac solutions are degenerate. There are several options
for dealing with degenerate solutions. One is to impose orthogonality on the factors in one mode.
Another option is to attempt to fit a more general three-way model such as a Tucker3.
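Tucker's congruence coefficient, used above to compare components, is the cosine between two coefficient vectors computed without centering. The sketch below is a plain-Python illustration with made-up vectors; the function name `congruence` is hypothetical, and the code is not a description of how 3WayPack computes its diagnostics. A value near -1 between two components of the same solution is the pattern the text describes as highly negatively congruent.

```python
import math

def congruence(x, y):
    """Tucker's congruence coefficient: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator

# Two made-up component columns that point in nearly opposite directions.
component_1 = [1.0, 2.0, 3.0]
component_2 = [-1.1, -1.9, -3.2]

phi = congruence(component_1, component_2)
print(round(phi, 3))
```

Because the coefficient ignores the overall size of the vectors, two components can be almost perfectly negatively congruent even when one is much larger than the other, which is typical of degenerate solutions in which components nearly cancel each other.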
Tucker3 analyses. Figure 5 displays the time components from the 3×2×2 Tucker model
solution with loess curves fitted to them with 70% of the data used for the local regressions
(Cleveland & Devlin, 1988). The first component has been rotated to optimal constancy. The
second component is orthogonal to the first one. There is no clear periodicity (i.e., performance
cycles) in these time components. However, to uncover such periodicity would require that the
measurement occasions were spaced evenly and truly taken at the same time for all participants.
We constructed joint biplots to investigate the relationships between individuals and variables for
the two time components (e.g., Kroonenberg, 2008, Appendix B). Variables are displayed as
vectors in the two-dimensional space, and individuals projecting highly on a variable vector
scored highly on it; individuals plotted near each other have similar profiles.
coordinates presented in Table 4). Since time component 1 is largely constant, this biplot
represents individual differences in how the participants carried out their job duties over time.
For instance, we can see how participants 19, 20, 23, 46, and 50 had high scores on call wait
time, but low scores on call duration and handle time. On the other hand, participants 25, 40, and
48 scored highly on call duration and handle time but low on call wait time. Similarly,
participants 20, 23, and 27 had high scores on citizenship while participants 3 and 9 had very low
scores on citizenship. The objective call time measures and citizenship are the most important
Figure 7 displays the biplot for the second time component (variable coordinates are also
presented in Table 4). This component showed much more pronounced fluctuations than
component 1 and an increasing trend towards the end of the study. This suggests that the
relationships displayed in Figure 7 became more important over the course of
the study. This increase in importance is especially due to handle time and citizenship, but also
to withdrawal, focal performance, focal location, call duration, and quality of service. Participants
with positive scores, such as 30, show increasing trends over time. Those with negative scores,
components with individual difference measures (e.g., Van Mechelen & Kiers, 1999). The first
component correlated positively with meta-mood clarity and negatively with overall job
withdrawal (r = .30, p = .04 and r = -.32, p = .03, respectively). The second component
correlated positively with meta-mood repair (r = .29, p = .05), marginally positively with
overall citizenship performance (r = .27, p = .07), and marginally negatively with overall work
withdrawal (r = -.25, p = .10). The third component correlated positively with overall citizenship
performance (r = .38, p = .01) and marginally negatively with satisfaction with supervisor (r = -.23, p
= .10). The first component has perhaps the clearest interpretation, as these individuals report
great awareness of their moods and low overall job withdrawal: these employees are likely very
in tune with their emotions and unlikely to quit. They are probably having a more positive
work experience than the average employee. The second component also has a reasonably
clear interpretation. Given that individuals are able to intentionally repair their moods and report
high levels of citizenship performance and low work withdrawal, these are likely individuals
who can deal with negative emotions effectively and remain engaged with their work. The third
component may again reflect some form of engagement. Employees scoring high on this
The variables-mode component loadings are displayed in Table 6. The two components
cleanly separate the objective and behavioral measures, and the self-reported measures do not
load substantially on either component. Among the objective measures, call duration and handle
time have the opposite sign from call wait time. Citizenship performance has the highest loading among
the behavioral measures, with focal performance, location, and withdrawal all loading between
.31 and .34. Interestingly, all of these loadings, even withdrawal, are positive. It is also interesting
that the more direct self-reported measures of performance do not systematically load on these
components.
The Tucker core is shown in Table 7. The combination of the first components in all
modes has the largest core element and accounts for the most variance. The most substantial
elements after this are the combinations of the second subject component and the behavioral variables
component with each of the time components (g221 and g222, respectively), accounting for 3% and 2%
DISCUSSION
The Tucker3 model provides a parsimonious explanation of the observed data, even
though the overall variance explained for all of the models is rather low. Experience sampling
data prove difficult to analyze for a number of reasons. The procedures are intrusive, which can
lead to large amounts of missing data when participants are too busy or simply choose not to
answer questions. Furthermore, individuals were studied over a relatively short time and were
measured at points very close together in time. Perhaps if measurement occasions were sampled
at fully equivalent time-points across individuals, or if the study took place over organizationally
meaningful epochs, such as a new product rollout or layoff announcement, the Parafac model
This theme bears greater consideration. Often in longitudinal studies, and in experience
sampling studies in particular, researchers wade into the stream of events with no real care as to
when they do so. Put simply, time one often is not really time one, but an arbitrary starting point
for the study. Likewise, the studies often end at an equally arbitrary point in time. Multiway
thinking can help encourage researchers to treat time as an important facet in the design of
This study attempts to fit a dynamic structural model to data derived from an experience
sampling design. ESM studies often come in one of two types: event-triggered and
signal-triggered. In event-triggered studies, participants opt to answer surveys when a particular event
or type of event occurs. In signal-triggered studies, individuals respond to surveys when signaled
by their beeper or palmtop computer. When the ESM study is event-triggered, measurement
occasions can only be viewed as nested within individuals. In signal-triggered designs, signals
will be random within time windows. Our results indicate that these measurement occasions may
This paper demonstrates that dynamic and structural investigations may be conducted
with signal-triggered experience sampling data by using methods such as multiway analysis.
Tools such as multilevel modeling are also useful, but might sacrifice information that could
be examined using techniques like multiway analysis. Furthermore, had the data been collected
with the intent of examining structure and dynamics, the performance variables would likely
have been more systematically sampled from the repertoire of performance behaviors. This
would probably allow for stronger substantive conclusions regarding the multivariate dynamic
criterion space. As it stands, it is very interesting that the self-reported performance variables did
not load substantially on either of the variables-mode components. Given that the components
reflect behavioral activities and objective measures of performance, this suggests that the self-
reported measures may not accurately represent what a worker is actually doing.
There are three major limitations to this study. The first is that the data were not collected
for the purpose of a structural analysis. Therefore, the amount of shared variance among the
indicator variables may not adequately cover the latent job-performance variables. While this
does not preclude the analyses described in this article, data collected with such analyses in mind
would undoubtedly provide cleaner results. Second, there was a substantial amount of missing
data. However, only participant 16 appeared to show undue influence on the model solutions, and
this participant was removed from the analyses reported here. Third, the sample involved only
two groups of employees from one organization, limiting the external validity of the results.
The job performance domain and its three sources of variance should be further
examined across a sample from the population of jobs. This requires collecting experience
sampling and other longitudinal data in many organizations with many different types of
employees. One possibility is applying these techniques to large samples of jobs that are
appraised on multiple performance indicators at fixed intervals, such as in the military. Such
formal performance evaluations at multiple points in time would address questions about the
However, without examining other studies of performance and its determinants, the
underlying dynamics in studies with longer measurement epochs will never be fully understood
(cf. Beal et al., 2005). Data to answer these sorts of questions should be collected using multiple
methods and examined using multiple analytic procedures. Such a strategy will allow for a
scientific understanding of the dynamic interaction of individual and workplace attributes in the
Conclusion
The major performance theories in applied psychology indicate that the criterion space is
a multidimensional one. However, these theories are static; they are largely agnostic as to the
dynamic nature of performance and its determinants. It is well-established that the relationship
degrades over time. This situation suggests the possibility of dynamism on both sides of the
performance prediction model. In order to understand the dynamic relationship between two
variables, we must first understand the processes underlying each of the variables.
The results of the current study provide further evidence that organizational scientists
must attend to dynamic processes. Further, these results echo the exhortations of Ghiselli (1956)
and Dalal and Hulin (2008) to more broadly consider the criterion space when designing
personnel interventions. It is important that these between- and within-person sources of lawful
REFERENCES
Alvarez, K.M., & Hulin, C.L. (1972). Two explanations of temporal changes in ability-skill
relationships: A literature review and theoretical analysis. Human Factors, 14, 295-308.
Austin, J.T., Humphreys, L.G., & Hulin, C.L. (1989). Another view of dynamic criteria: A
critical reanalysis of Barrett, Caldwell, and Alexander. Personnel Psychology, 42, 583-
596.
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917-1992. Journal of Applied
Barrett, G.V., Caldwell, M.S., & Alexander, R.A. (1985). The concept of dynamic criteria: A
Beal, D.J., Weiss, H.M., Barros, E., & MacDermid, S.M. (2005). An episodic process model of
Borman, W.C. (1991). Job behavior, performance, and effectiveness. In M. Dunnette & L. Hough
Borman, W. C., & Motowidlo, S. J. (1997). Task performance and contextual performance: The
Campbell, J. P. (1994). Alternative models of job performance and their implications for
Campbell, J. P., McHenry, J.J., & Wise, L. L. (1990). Modeling job performance in a population
Carroll, J.D., & Chang, J.-J. (1970). Analysis of individual differences in multidimensional
35(3), 283-319.
Cattell, R. B. (1944). “Parallel proportional profiles” and other principles for determining the
Cattell, R. B. (1952). The three basic factor analytic research designs: Their interrelations and
Chan, D. (1998). The conceptualization and analysis of change over time: An integrative approach
multiple indicator latent growth model (MLGM). Organizational Research Methods, 14,
421-483.
Cleveland, W.S., & Devlin, S.J. (1988). Locally weighted regression: an approach to regression
analysis by local fitting. Journal of the American Statistical Association, 83, 596-610.
Csikszentmihalyi, M., & Larson, R. W. (1987). Validity and reliability of the experience sampling
Dalal, R. S., & Hulin, C. L. (2008). Motivation in organizations: Criteria and dynamics. In R.
Kanfer, G. Chen, and R. Pritchard (Eds.), Work Motivation: Past, Present, and Future.
Dalal, R. S., Lam, H., Weiss, H.M., Welch, E.R., & Hulin, C.L. (2009). A within-person
counterproductivity associations, and dynamic relationships with affect and overall job
Dunnette, M. D. (1963). A note on the criterion. Journal of Applied Psychology, 47, 251-254.
Ghiselli, E. E. (1956). Dimensional problems of criteria. Journal of Applied Psychology, 40, 1-4.
Goldberg, L. R. (2001, July 24). International Personality Item Pool: A scientific collaboratory
Harshman, R. A. (1970). Foundations of the Parafac procedure: Models and conditions for an
“explanatory” multimodal factor analysis. UCLA Working Papers in Linguistics, 16, 1-84.
Harshman, R. A., & Lundy, M. E. (1984a). The PARAFAC model for three-way factor analysis
(Eds.) Research methods for multimode analysis (pp. 122-215). New York: Praeger
Publishers.
Harshman, R. A., & Lundy, M. E. (1984b). Data preprocessing and the extended PARAFAC
methods for multimode analysis (pp. 216-284). New York: Praeger Publishers.
Hulin, C. L., Henry, R. A., & Noon, S. L. (1990). Adding a dimension: Time as a factor in the
Humphreys, L.G. (1976). The phenomena are ubiquitous – but the investigator must look.
Inn, A., Hulin, C. L., & Tucker, L. R. (1972). Three sources of criterion variance: Static
Keil, C. T., & Cortina, J. M. (2001). Degradation of validity over time: A test and extension of
449-470.
Kroonenberg, P.M. (1994). The TUCKALS line: A suite of programs for the analysis of three-
Kroonenberg, P.M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley.
Kroonenberg, P.M., & De Leeuw, J. (1980). Principal component analysis of three-mode data by
Kroonenberg, P.M., & Oort, F.J. (2003). Three-mode analysis of multimode covariance matrices.
Larson, R., & Csikszentmihalyi, M. (1983). The experience sampling method. H.T. Reis (Ed.)
Miner, A. G., Glomb, T., & Hulin, C.L. (2005). Experience sampling mood and its correlates at
Miner, A. G., Glomb, T., & Hulin, C.L. (In press). Experience sampling events, moods,
Decision Processes.
Multivariate Dynamic Criteria 3
Motowidlo, S.J., & Van Scotter, J.R. (1994). Evidence that task performance should be
Murphy, K.R. (1989). Is the relationship between cognitive ability and job performance stable
Murphy, K.R., & Shiarella, A.H. (1997). Implications of the multidimensional nature of job
performance for the validity of selection tests: Multivariate frameworks for studying test
Salovey, P., Mayer, J.D., Goldman, S.L., Turvey, C., & Palfai, T.P. (1995). Emotional attention,
clarity, and repair: Exploring emotional intelligence using the Trait Meta-Mood Scale. In
J.W. Pennebaker (Ed.), Emotion, disclosure, and health (pp. 125-154). Washington, DC:
APA.
Smilde, A., Bro, R., & Geladi, P. (2004). Multiway analysis: Applications in the chemical
Smith, P. C., Kendall, L., & Hulin, C. L. (1969). The measurement of satisfaction in work and
Thoresen, C.J., Bradley, J.C., Bliese, P. D., & Thoresen, J.D. (2004). The big five personality
traits and individual job performance growth trajectories in maintenance and transitional
Timmerman, M.E., & Kiers, H.A.L. (2000). Three-mode principal components analysis:
Choosing the number of components and sensitivity to local optima. British Journal of
31, 279-311.
Weiss, H. M., & Cropanzano, R. (1996). Affective events theory: A theoretical discussion of the
Weiss, H. M., Nicholas, J. P., & Daus, C. S. (1999). An examination of the joint effects of
affective experiences and job satisfaction and variations in affective experiences over
Van Mechelen, I., & Kiers, H.A.L. (1999). Individual differences in anxiety response to stressful
409-428.
Zyphur, M.J., Bradley, J.C., Landis, R.S., & Thoresen, C.J. (2008). The effects of cognitive
ability and conscientiousness on performance over time: A censored latent growth model.
Table 1. Descriptive statistics for the A-mode Matricization of ESM Data Array
Variable                       Mean     SD      1      2      3      4      5      7      8      9     11     10      6
1  Rated Average Handle Time   1.91   0.74   1.00
2  Rated Quality of Service    2.05   0.54   0.39   1.00
3  Rated Overall Service       1.99   0.57   0.50   0.62   1.00
4  Focal Performance           0.52   0.39   0.05   0.04   0.07   1.00
5  Focal Location              0.72   0.45   0.01  -0.01   0.00   0.69   1.00
7  Withdrawal                  0.51   0.30   0.02   0.01   0.00   0.53   0.72   1.00
8  Neutral Behavior            0.13   0.34  -0.01   0.01   0.00  -0.35  -0.64  -0.23   1.00
9  Call Duration               6.26   0.89  -0.02   0.06  -0.05   0.09   0.02  -0.04   0.02   1.00
11 Call Handle Time            4.85   1.30  -0.10   0.06  -0.04   0.06  -0.03  -0.01   0.07   0.40   1.00
10 Call Wait Time              2.36   2.02  -0.01  -0.11  -0.07   0.07   0.09   0.08  -0.10  -0.26  -0.05   1.00
6  Citizenship                 0.18   0.31  -0.05   0.01  -0.01   0.07   0.12   0.04  -0.12   0.01   0.11   0.11   1.00
Component
2 1
Rated Overall Service 0.865
Rated Average Quality 0.846
Rated Avg Handle Time 0.638
Table 7. Tucker core and variance explained for each 3-way combination of components.
Time Component 1
          Core Element             Variance Explained
          Objective   Behavioral   Objective   Behavioral
Subj 1       1.437       0.216        0.188       0.004
Subj 2      -0.002       0.589        0.000       0.031
Subj 3      -0.122       0.391        0.001       0.014
Time Component 2
          Core Element             Variance Explained
          Objective   Behavioral   Objective   Behavioral
Subj 1       0.137      -0.173        0.002       0.003
Subj 2      -0.347       0.442        0.011       0.018
Subj 3       0.286      -0.297        0.007       0.008
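The link between the core elements and the variance-explained columns in Table 7 can be checked numerically. The sketch below assumes the usual Tucker3 convention that, with orthonormal component matrices, each squared core element divided by the total sum of squares gives the proportion of variance for that combination of components. The total sum of squares is not reported in the text, so it is backed out here from the largest core element; that step is an assumption, and the array layout is chosen only for this illustration.

```python
import numpy as np

# Core elements from Table 7, indexed [subject comp, variable comp, time comp];
# variable comp 0 = objective, 1 = behavioral.
core = np.array([
    [[1.437, 0.137], [0.216, -0.173]],
    [[-0.002, -0.347], [0.589, 0.442]],
    [[-0.122, 0.286], [0.391, -0.297]],
])

# Proportions of variance explained as printed in Table 7, same layout.
reported = np.array([
    [[0.188, 0.002], [0.004, 0.003]],
    [[0.000, 0.011], [0.031, 0.018]],
    [[0.001, 0.007], [0.014, 0.008]],
])

# Assumed: back out the total sum of squares from the largest element.
total_ss = core[0, 0, 0] ** 2 / reported[0, 0, 0]

# Implied proportion of variance for every combination of components.
implied = core ** 2 / total_ss
max_gap = float(np.max(np.abs(implied - reported)))

# Sum of the tabled proportions across all twelve combinations.
total_explained = float(reported.sum())
print(round(total_ss, 2), round(max_gap, 4), round(total_explained, 3))
```

Under this assumption the implied and tabled proportions agree to within rounding, and the elements g221 and g222 correspond to `core[1, 1, 0]` and `core[1, 1, 1]`.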
Figure 2. Centered average time series for self-rated customer service variables. Correlations among time series: 0.3-0.6.
[Line plot. Series: self-rated handle time, self-rated quality of service, self-rated overall performance. Vertical axis: Centered Average Rating; horizontal axis: Participants.]
Figure 3. Centered average time series for behavioral variables. Correlations between .6 and .9, except with Citizenship which range from .2 - .4.
[Line plot. Series: focal performance, focal location, withdrawal, citizenship, neutral behavior. Vertical axis: Centered behavior scores; horizontal axis: Participants.]
Figure 4. Uncentered average time series for logged call time variables (objective performance). Correlations between -.0 - .3.
[Line plot. Series: call duration, handle time, call wait time. Vertical axis: Log averages; horizontal axis: Participants.]
Note. The trends for these variables are rather dissimilar. Therefore they are displayed uncentered so as to show their unique trends more clearly.
Figure 5. Time component loading of the 3×2×2 Tucker3-solution after rotation of the first component to optimal constantness.
[Line plot with fitted curves. Legend: component 1, component 2. Vertical axis: Component loadings; horizontal axis: Measurement times.]
Figure 6. Joint biplot for participants and variables for the first time component.
[Joint biplot. Horizontal axis: First Component; vertical axis: Second Component. Labeled variable vectors: Citizenship, Call Handle Time, Call Wait Time, Call Duration, Neutral; labeled groups: Behavioral Variables, Self-ratings. Participants are plotted as numbered points.]
Figure 7. Joint biplot for participants and variables for the second time
component.
0.2
Second Component