Journal of Management Information Systems / Summer 2004, Vol. 21, No. 1, pp. 227–262.
Planning, and others. His current research interests include the impact of information technology, measuring e-commerce success, computer self-efficacy, and information systems security. He holds a Ph.D. in Operations Research from the University of Lancaster, England, and is a member of the Institute for Operations Research and the Management Sciences, the Association for Information Systems, and the Decision Sciences Institute.
KEY WORDS AND PHRASES: confirmatory factor analysis, end-user computing satisfac-
tion, factorial invariance, instrument validation, research methods, user satisfaction.
These instruments use different items and measure different aspects of user satisfac-
tion, implying that the meaning and measurement of user satisfaction varies between
population subgroups.
Zmud et al. [90] question the robustness of user satisfaction in general and the end-
user computing satisfaction (EUCS) instrument [28] in particular. They argue that
user satisfaction is context sensitive. That is, its scaling or meaning may be influ-
enced by situational factors (e.g., conditions of measurement or population subgroups).
Other contextual uses of user satisfaction in the literature include small organizations
[74], user developed applications [77], computer simulation [65], CASE tool soft-
ware [53], and decision support systems [66]. In a review and critique of user satis-
faction instruments, Klenke [55] specifically calls for multigroup invariance studies
of EUCS to assess its measurement equivalence.
Originally developed by Doll and Torkzadeh [28], as shown in Figure 1, the EUCS
construct is defined as a second-order latent factor consisting of five first-order latent
factors (i.e., information content, format, accuracy, ease of use, and timeliness). The
five first-order latent factors and their structural weights define the meaning of this
second-order EUCS construct. The EUCS instrument [28] has been widely used [21,
27, 33, 34, 37, 39, 47, 48, 64, 65] and cross-validated [31, 38, 46, 66, 67, 85] to
measure a user’s satisfaction with a specific application. Gelderman [38] finds that
EUCS is a good predictor of an application’s impact on organizational performance
and, thus, a useful surrogate for system success.
Issues related to the meaning and measurement of user satisfaction have important
implications for both researchers and practitioners. When designing studies involving user satisfaction, researchers need to know whether they should use a separate satisfaction instrument for each population subgroup or a standardized instrument that permits comparisons across the various population subgroups present
in their research. When applying the instrument, practitioners need to know whether
they can compare user satisfaction scores across diverse subpopulations. These are
questions about measurement equivalence (robustness) of an instrument across popu-
lation subgroups.
Despite the wide use of the EUCS instrument, its measurement equivalence across
different population subgroups has not been tested. This paper uses multigroup in-
variance analysis to answer two research questions: (1) Do the items used to measure
the five first-order factors of EUCS have equivalent item-factor loadings across popu-
lation subgroups? (2) Are the structural weights of the five first-order factors on the
second-order EUCS factor equivalent across population subgroups? Based on respon-
dent positions, application types, hardware platforms, and development modes, four
categories of subgroups are defined. This paper tests measurement equivalence of the
EUCS instrument across these four categories of subgroups.
Measurement equivalence is essential for constructs such as EUCS that are designed to evaluate system success
across a variety of contexts and population subgroups. The EUCS instrument (see
Figure 1) was originally designed to be generally applicable to a variety of respondent
positions, application types, hardware platforms, and development modes [28]. These
dimensions define the instrument’s originally intended universe of applicability [75].
Figure 1 depicts EUCS as a single second-order latent construct with five first-
order latent factors (i.e., content, accuracy, format, timeliness, and ease of use). Con-
firmatory studies have repeatedly validated this hypothesized second-order
measurement model [29, 31, 53, 67]. Several studies of the instrument’s test–retest
reliability have reported good stability and reliability, as indicated by Cronbach’s
alpha values above 0.90 [46, 66, 85]. Since this second-order measurement model
has been supported and recommended by previous studies that developed and tested
the EUCS instrument, we have chosen the second-order measurement model to test
the invariance of the instrument across population subgroups [18, 19].
In Figure 1, the five arrows leading from end-user computing satisfaction to the five
first-order latent factors depict the structural weights. Structural weights can be viewed
as regression coefficients in the regression of the first-order factors on the higher-
order factor. These weights matter for our understanding of the nature of the user satisfaction construct itself: they indicate the centrality or importance assigned to each component (content, accuracy, format, ease of use, or timeliness) in scaling the second-order EUCS factor [62]. The structural weights can be used to derive the overall second-order EUCS score as a weighted average of the first-order factor scores. In different contexts
or subpopulations, the first-order factors might be weighted differently, suggesting
that end-user computing satisfaction has different meanings across subgroups.
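To make this scaling role concrete, the short sketch below computes an overall EUCS score as the weighted average of first-order factor scores. The weights and factor scores here are hypothetical illustrations, not estimates from this study.

```python
import numpy as np

# Hypothetical completely standardized structural weights (gammas) for the five
# first-order factors of Figure 1; illustrative values, not the paper's estimates.
weights = np.array([0.94, 0.80, 0.93, 0.70, 0.87])   # content, accuracy, format,
                                                     # ease of use, timeliness

# Hypothetical standardized first-order factor scores for one respondent.
factor_scores = np.array([0.5, -0.2, 0.8, 0.1, 0.4])

# Overall EUCS as the gamma-weighted average of the first-order scores:
# factors with larger structural weights are more central and contribute more.
eucs = weights @ factor_scores / weights.sum()
print(f"Overall EUCS score: {eucs:.3f}")
```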
In Figure 1, the 12 arrows leading from the first-order latent factors to the measure-
ment items are the item-factor loadings. Item-factor loadings can be viewed as regres-
sion coefficients in the regression of observed variables on their corresponding latent
factor. These item-factor loadings indicate the extent to which the item captures the
trait of its latent factor. In different contexts or subpopulations, these item-factor load-
ings may be different, suggesting that the first-order factors (content, accuracy, for-
mat, timeliness, and ease of use) may have different meanings across subgroups. The
descriptions of the 12 measurement items are depicted at the bottom of Figure 1.
Assessing the robustness or measurement equivalence of second-order measure-
ment models like the EUCS instrument requires two conditions [18, 19]. First, the
items that measure the first-order factors must have equivalent item-factor loadings
across subgroups or conditions of measurement. For example, the factor loadings of
the four items measuring the first-order factor “Content” should be equivalent across
subgroups in order for the “Content” scores across the subgroups to be comparable.
Second, the structural weights of the first-order factors must also be equivalent across
population subgroups. In this paper, we test the equivalence of the item-factor loadings and the structural weights of the first-order factors across subgroups based on respondent positions, application types, hardware platforms, and development modes.
H1-POS: The 12 items of the EUCS instrument have equivalent item-factor load-
ings on their corresponding first-order latent factors across respondent position
categories (i.e., managerial, professional, operating).
H2-POS: The five first-order latent factors have equivalent structural weights on
the second-order EUCS factor across respondent position categories (i.e., mana-
gerial, professional, and operating).
H2-PLAT: The five first-order latent factors have equivalent structural weights
on the second-order EUCS factor across hardware platforms (i.e., mainframe/
mini or personal computer applications).
These product dimensions suggest the need to test the following null hypotheses concerning the effect of development modes on the meaning and measurement of user satisfaction:
H1-MODE: The 12 items of the EUCS instrument have equivalent item-factor loadings on their corresponding first-order latent factors across modes of development (i.e., analyst developed, personally developed, and other end-user developed applications).
H2-MODE: The five first-order latent factors have equivalent structural weights on the second-order EUCS factor across modes of development (i.e., analyst developed, personally developed, and other end-user developed applications).
Research Methods
CONFIRMATORY FACTOR ANALYSIS (CFA) PERMITS more rigorous tests of the equality
or invariance of measurement parameters (e.g., item-factor loadings or structural
weights) across groups than are possible with exploratory factor analysis [19, 51, 58].
In testing the measurement equivalence of the EUCS instrument across population
subgroups, two sets of parameters are of special interest. First, we are interested in the
equivalence of item-factor loadings for the 12 items. Second, we are interested in the
equivalence of the structural weights of the five first-order factors on the second-
order EUCS factor. Testing for invariance (i.e., equivalence) is a particularly demand-
ing test of robustness. In some cases, minor differences may not be critical to the
interpretation of research results [20].
Invariance analysis enables one to explicitly test the structure of a second-order
measurement model or its individual parameters for equivalence across subgroups or
conditions [16, 51]. Our invariance analysis is conducted using LISREL VIII [52].
We are following an established modeling approach that has been developed and
used by a number of studies in various disciplines (psychology and education) to test
invariance of second-order measurement models. Examples of instruments with sec-
ond-order measurement models whose factorial invariance hypotheses have been tested
using LISREL methods include: the masculinity/femininity instrument [59], a de-
pression instrument [18], and a self-concept instrument [62].
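As an illustration in a modern open-source setting (the analyses here were run in LISREL VIII), the hypothesized second-order model can be specified with lavaan-style syntax in Python's semopy package. This is a minimal sketch, assuming a data frame with one column per item named as in Table 2 (C1–C4, A1–A2, F1–F2, E1–E2, T1–T2).

```python
import pandas as pd
import semopy  # open-source SEM package; a stand-in for LISREL VIII here

# Second-order EUCS measurement model of Figure 1: five first-order factors
# measured by 12 items, loading in turn on a single second-order EUCS factor.
EUCS_MODEL = """
Content    =~ C1 + C2 + C3 + C4
Accuracy   =~ A1 + A2
Format     =~ F1 + F2
EaseOfUse  =~ E1 + E2
Timeliness =~ T1 + T2
EUCS       =~ Content + Accuracy + Format + EaseOfUse + Timeliness
"""

def fit_eucs_model(responses: pd.DataFrame) -> semopy.Model:
    """Fit the second-order EUCS model to item responses (one column per item)."""
    model = semopy.Model(EUCS_MODEL)
    model.fit(responses)
    print(semopy.calc_stats(model))  # chi-square, df, CFI, TLI (NNFI), RMSEA, ...
    print(model.inspect())           # item-factor loadings and structural weights
    return model
```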
The Measures
As the purpose of this study is to confirm the EUCS instrument and assess its measurement equivalence across population subgroups, we use the identical items, scales, and
sampling methods used by Doll and Torkzadeh [28] in their development of the EUCS
instrument. The 12 items are illustrated in the legend of Figure 1. The order of the 12
items is randomized in the questionnaire. A five-point scale is used: 1 = almost never;
2 = some of the time; 3 = about half of the time; 4 = most of the time; and 5 = almost
always. We also use the identical demographics questions regarding positions of the
respondents, types of application, modes of development, and hardware platforms.
Because the identical instrument and data collection methods are used, we do not
conduct separate pretests and pilot tests for this study.
The respondents are asked to identify their position within the overall organization
by checking only one of the following responses: top level management, middle level
management, first level supervisor, professional employee without supervisory re-
sponsibility, and other (e.g., operating personnel). Since theory suggests three catego-
ries of users with different needs for evaluating information [91], these five categories
are recoded into three categories—managerial, professional, and operating.
Two yes/no questions are asked to determine the type of application [1] used by the respondent. If the respondent checks "yes" to—"Does this application provide data analysis capabilities (spreadsheet, modeling, simulation, optimization, or statistical routines) to support managerial decision making?"—the application is categorized as
decision support. If the respondent checks “no” to this question, but checks “yes”
to—“Does this application provide a database with flexible inquiry capabilities (e.g.,
managers can design and change their own monitoring and exception reports)?”—the
application is categorized as database. If the respondent checks “no” to both ques-
tions, the application is categorized as transaction processing.
A yes/no question is used to categorize the hardware platform—“Is this a personal
computer (micro) application?” “No” responses are categorized as mainframe/mini.
To determine the mode of development, a nested set of two questions is asked. The respondent is first asked a yes/no question—"Was this application primarily developed by an end user?" If the answer is "yes," the respondent is then asked a second (nested) question—"Did you personally develop this application?" Applications with a "no" response to the first question are categorized as developed by a professional systems analyst. If the respondent checks "yes" to both questions, the application is categorized as a personally developed application. If the respondent checks "yes" to the first question but "no" to the second, the application's mode of development is categorized as "other end user."
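Restated in code, these recoding rules reduce to a few nested conditionals. The sketch below mirrors the questionnaire logic; the function names and return labels are our own labels for the categories named in the text.

```python
def application_type(data_analysis: bool, flexible_inquiry: bool) -> str:
    """Recode the two yes/no application questions described above."""
    if data_analysis:       # spreadsheet, modeling, simulation, optimization, statistics
        return "decision support"
    if flexible_inquiry:    # database with flexible inquiry capabilities
        return "database"
    return "transaction processing"

def hardware_platform(is_pc_application: bool) -> str:
    """Recode the single yes/no hardware-platform question."""
    return "personal computer" if is_pc_application else "mainframe/mini"

def development_mode(end_user_developed: bool, personally_developed: bool) -> str:
    """Recode the nested pair of development-mode questions."""
    if not end_user_developed:
        return "systems analyst"
    return "personally developed" if personally_developed else "other end user"
```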
The Sample
The data used to test the hypotheses are collected through surveys of end users in over 60 firms, about half of the firms we initially contacted. Typically, the directors of MIS are asked to identify their major applications and the users who directly interact with
each application. Major applications are defined as those that are of operational or
strategic importance to the firm. In small firms, the director could easily identify the
users to be included in the sample. In large firms, the director would ask the managers
responsible for systems development and maintenance to identify the major users.
Although some individuals use several applications, they are only asked to respond
with respect to one application specified by the director.
Using a sampling plan developed by this method, questionnaires were distributed
to end users through interoffice mail with a cover letter describing the survey as a
university-based research project. A total of 1,386 responses were obtained. About half of the responses came from manufacturing firms, with the remainder about equally distributed among retail firms, government agencies, utilities, hospitals, and educational institutions.
The sample represents over 300 different applications including accounts payable,
accounts receivable, budgeting, CAD/CAM, customer service, service dispatching,
engineering analysis, process control, work order control, general ledger, manpower
planning, financial planning, inventory, order entry, payroll, personnel, production
planning, purchasing, quality, sales analysis and forecasting, student data, and profit
planning. This was a convenience sample, but the large number of organizations and
the variety of applications support the generalizability of the findings.
To obtain a common data set, responses with any unanswered question are eliminated. This yields a sample of 1,166 usable responses. Table 1 reports sample sizes for each subgroup. Harris and Schaubroeck [45] suggest 100 as a minimum subgroup sample size and recommend at least 200. All subgroups except database (n = 197) and personally developed (n = 146) have sample sizes above 200; no subgroup falls below the 100-response minimum.
The Tucker-Lewis index [86] was among the earliest fit indices that involved comparing a model's fit relative to other nested models. Tucker and Lewis's
original purpose for developing their index was to quantify the degree to which a
particular exploratory factor model is an improvement over a zero factor model when
assessed by maximum likelihood. Bentler and Bonett’s [14] NNFI is a generalization
of Tucker and Lewis’s definition to all types of covariance structured models under
various estimation methods. The NNFI [14] not only measures the relative improve-
ment in fit obtained by a proposed model compared to the null model but also cor-
rects for the number of parameters in the model.
The RMSEA, CFI, and NNFI indices are used because they are generally unaf-
fected by sample size. Medsker et al. [70] recommended the CFI as being the best
approximation of the population value for a single model. Marsh et al. [63] reported
that the NNFI is also generally unaffected by sample size; it is useful for situations
where a parsimony-type index is needed to account for the number of estimated pa-
rameters in a model [70]. Good-fitting models generally yield CFI or NNFI fit indices
of at least 0.90; that is, only a relatively small amount of variance remains unex-
plained by the model [13, 14, 16]. The degree of cross-validation that is expected for
a model on additional samples (ECVI [17]) is also a measure of model fit. A model is
preferred if it minimizes the value of ECVI relative to other models. We use ECVI for
assessing sequential modifications to models rather than assessing the model-data fit
for a single subgroup or for the entire sample.
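For reference, these indices can be computed from the chi-square statistics of the proposed model and the null (independence) model. The sketch below uses the textbook formulas; it is illustrative and not necessarily the exact computation performed by LISREL VIII.

```python
import math

def fit_indices(chi2_m: float, df_m: int, chi2_0: float, df_0: int, n: int):
    """NNFI/TLI, CFI, and RMSEA from model (m) and null-model (0) chi-squares.

    n is the sample size; the null model is the zero-factor baseline that the
    relative fit indices compare against.
    """
    nnfi = (chi2_0 / df_0 - chi2_m / df_m) / (chi2_0 / df_0 - 1)
    # The small epsilon guards against division by zero for a perfectly fitting model.
    cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_0 - df_0, chi2_m - df_m, 1e-12)
    rmsea = math.sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))
    return nnfi, cfi, rmsea
```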
The first step assesses model-data fit separately in each subgroup [12]. Poor fit in a subgroup suggests that the instrument may not measure the
phenomena adequately in this subgroup—a new instrument or measurement model
may have to be developed for this particular subgroup.
Next, the equivalence of factor loadings across groups in each dimension is tested
(i.e., tau-equivalency). In evaluating measurement models, the primary concern is
usually about whether each item is a good measure of its latent construct. Factor
loadings are examined first because the equivalence of factor loadings is the minimal
condition for “factorial invariance.” Bollen [16] noted that the equality of factor load-
ings is generally of a higher priority than the equality of other parameters. Bentler
[11] suggested testing first for invariance of factor loadings because, without such
invariance, it would be difficult to argue that the factors are the same. If the factors are
not the same, it may be meaningless to test for the invariance of other parameters. To
test for equal factor loadings, an equal item-factor loading constraint is added to the
baseline model, creating a nested or more restrictive model that is a subset of the
baseline model. Thus, the significance of chi-square differences between these two
nested models provides a test of the hypotheses of equal item-factor loadings.
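In code, this test is a one-line tail probability on the chi-square difference. A minimal sketch, using the H1-MODE figures from Table 4D as a worked example:

```python
from scipy.stats import chi2

def nested_model_p(chi2_restricted: float, df_restricted: int,
                   chi2_baseline: float, df_baseline: int) -> float:
    """p-value for the chi-square difference between two nested models."""
    return chi2.sf(chi2_restricted - chi2_baseline, df_restricted - df_baseline)

# H1-MODE from Table 4D: Model 2 (463 on 161 df) versus Model 1 (448 on 147 df),
# a difference of 15 on 14 df.
print(f"p = {nested_model_p(463, 161, 448, 147):.4f}")  # ~0.38: not rejected
```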
Next, if the hypothesis of equal item-factor loadings is not rejected, we move on to
test for the equality of the structural weights (gammas) across subgroups. This nested
or more restrictive model is a subset of the model specifying equal item-factor load-
ing. Thus, the chi-square differences between these models provide a test of whether
all five structural weights are equivalent across each of the subgroups. A p-value
greater than 0.05 indicates that the null hypothesis (i.e., no differences in structural
weights between subgroups) is not rejected. If the null hypothesis is rejected, we
examine the structural weights, generate alternative hypotheses, and test these alter-
native hypotheses to identify what structural weights are equivalent across subgroups.
Results
IN THE ENTIRE SAMPLE OF 1,166 RESPONDENTS, the hypothesized measurement model
(see Figure 1) has a chi-square of 377 for 49 degrees of freedom, RMSEA of 0.076,
NNFI of 0.96, and CFI of 0.97 (see the first line of Table 1). Based on the subjective
fit indices, the model-data fit for the overall sample was judged to be adequate. Parameter estimates for the full sample (n = 1,166) are reported in Table 2. The completely standardized item-factor loadings and structural weights are all above 0.60. These results indicate that the measurement model is appropriately specified, a proper solution is obtained, and the solution adequately fits the entire sample.
Chi-square statistics and subjective goodness-of-fit indices for each of the 11 sub-
groups are reported in Table 1. A proper solution is obtained for each subgroup. The
subjective fit indices for each of the 11 subgroups suggest adequate model-data fit. In
all 11 subgroups, NNFI and CFI scores are above 0.92 and 0.94, respectively. RMSEA is below 0.08 for all subgroups except database (0.097) and operating (0.092), and below 0.10 for all 11. The item-factor loadings in the 11 subgroups indicate that the items are generally good measures of their corresponding first-order latent factors (see Table 3). Of the 132 item-factor loadings in Table 3, the
only loading below 0.60 is that for item C3 in the personally developed subgroup (0.52). Thus, we conclude that, in general, the measurement model illustrated in Figure 1 adequately fits the data in each subgroup.
Table 4D. Invariance Analysis by Modes of Development (MODE)
Three groups: systems analysts, personally developed, and other end user—1,166 cases

Model  Model description                        χ²   df   NNFI  CFI   RMSEA  ECVI  Hypothesis  Nested models  Δχ²  Δdf  p-level
1      Equal pattern                            448  147  0.95  0.97  0.073  0.54
2      Factor loadings invariant                463  161  0.96  0.97  0.070  0.52  H1-MODE     2 vs. 1        15   14   0.3781
3      Factor loadings and structural
       weights invariant                        510  171  0.96  0.96  0.071  0.55  H2-MODE     3 vs. 2        47   10   0.0000
Model 3 adds the constraint of invariant structural weights in each dimension, including modes of development (Table 4D, Model 3). The differences in chi-square between
nested models (Model 3 versus Model 2) are 32, 22, 11, and 47, respectively, for
positions, types of application, hardware platforms, and modes of development. These
chi-square differences are significant at 0.0004, 0.0151, 0.0514, and 0.0000, respec-
tively. Thus, the hypothesis of invariant structural weights of first-order factors is
rejected at the p < 0.05 level for subgroups representing respondent positions, types
of application, and modes of development. The hypothesis of equivalent structural
weights is not rejected for hardware platforms. However, it should be noted that the p-value for hardware platforms (0.0514) is only marginally above the 0.05 threshold. In each case, Model 3 versus Model 2 shows little change in the subjective fit indices.
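These p-values can be reproduced (to rounding) from the reported chi-square differences. In the sketch below, the Δdf of 10 for modes of development comes from Table 4D; the other Δdf values are inferred as five structural weights times (number of groups − 1).

```python
from scipy.stats import chi2

# (delta chi-square, delta df) for Model 3 versus Model 2 in each dimension.
tests = {
    "positions (3 groups)":            (32, 10),
    "types of application (3 groups)": (22, 10),
    "hardware platforms (2 groups)":   (11, 5),
    "modes of development (3 groups)": (47, 10),
}
for dimension, (d_chi2, d_df) in tests.items():
    print(f"{dimension}: p = {chi2.sf(d_chi2, d_df):.4f}")
# Prints approximately 0.0004, 0.0151, 0.0514, and 0.0000, matching the text.
```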
The differences in structural weights imply that the first-order factors are not equally
important or central to the second-order EUCS factor across population subgroups in
three dimensions—positions, types of application, and modes of development. How-
ever, Table 5 does not indicate the source and nature of these differences—that is,
which first-order factors have different weights and whether they are lower or higher
than the other groups. Table 5 displays the structural weights of the first-order factors for the "item-factor loadings invariant" models (Model 2 from Tables 4A, 4B, 4C, and 4D). Table 5 helps us identify where the structural weights differ.
An examination of Table 5 reveals that the structural weight (1.02) for the first-order
factor “accuracy” in the operating subgroup is substantially higher than those for the
managerial (0.76) and the professional (0.72) subgroups. The structural weights of
the “accuracy” factor are similar for the managerial and the professional subgroups.
To test whether the lack of invariance in the structural weights of the first-order
factors is limited to the “accuracy” factor in the operating subgroup, we test two
additional hypotheses. First, we test whether, with the structural weight for the “accu-
racy” factor set free, the other four first-order factors have equivalent weights across
the three subgroups (see Model 4, H3-POS in Table 4A). Using a test of nested mod-
els (Model 4 versus Model 2), the hypothesis of equivalent structural weights for the
four first-order factors is not rejected (p = 0.2640). Second, we test the hypothesis
(H4-POS) that the structural weights for all five first-order factors are invariant across
the managerial and professional subgroups (see the two-group analysis, Model 7 in
Table 4A). H4-POS is not rejected (p = 0.7000), supporting the contention that all
five first-order factors have equivalent weights across the two subgroups.
An examination of Table 5 reveals that the structural weight (0.95) for the “accuracy”
factor in the database subgroup is substantially higher than that for the decision sup-
port subgroup (0.74). The structural weights of the “accuracy” factor are similar for
the decision support (0.74) and the transaction processing (0.81) subgroups.
Table 5. Structural Weights of the First-Order Factors for the "Item-Factor Loadings Invariant" Models (Model 2)

                          Content  Accuracy  Format  Ease of use  Timeliness
Positions of respondent
  Managerial                0.91     0.76     0.89      0.77         0.87
  Professional              0.94     0.72     0.93      0.72         0.85
  Operating                 0.94     1.02     1.03      0.66         0.89
Types of application
  Decision support          0.87     0.74     0.87      0.70         0.83
  Database                  0.88     0.95     0.90      0.61         0.96
  Transaction processing    1.01     0.81     1.02      0.76         0.87
Hardware platforms
  Personal computers        0.89     0.73     0.94      0.64         0.84
  Mainframe/mini            0.95     0.84     0.93      0.77         0.89
Modes of development
  Systems analyst           0.98     0.83     0.96      0.76         0.90
  Personally developed      0.61     0.55     0.58      0.32         0.63
  Other end user            0.89     0.80     0.99      0.71         0.87
A similar two-group test is not rejected (p = 0.7000), supporting the contention that the five first-order factors have equivalent structural weights across two modes of development: analyst developed and other end-user developed.
Finally, we examine the sensitivity of these results to the exclusion of either the database or the operating subgroup, the two subgroups with RMSEA scores above 0.08. First, the 197 database responses are excluded from the common data set and the hypotheses are reexamined for the remaining 969 responses. This process is then repeated with the 247 operating responses excluded. The results indicate that the findings are not sensitive to the exclusion of either subgroup.
The results suggest, for example, that researchers can pool responses across analyst developed applications and applications developed by other end users. However, one should be cautious in pooling responses from operating respondents, database applications, and personally developed applications with other subgroups in their respective dimensions. User satisfaction seems to have a slightly different meaning in these three population subgroups.
Conclusions
THIS RESEARCH HAS ILLUSTRATED that the establishment of the construct validity of
user satisfaction within a population does not necessarily assure that it has the same
meaning in each subgroup. The common experiences or frames of reference of de-
mographic categories of respondents shape the meaning of user satisfaction. For the
most part, these are differences of degree (weighting variation among subgroups),
and our results do not suggest the need for different factors. A notable exception is
personally developed applications. While content, accuracy, format, ease of use, and
timeliness are related to user satisfaction, the consistently lower structural weights
suggest that additional factors may be required to adequately capture the meaning of
user satisfaction in this personally developed context.
The results also indicate that the structure of the EUCS instrument—that is, its five
first-order factors—holds in all subgroups. Content, accuracy, format, ease of use,
and timeliness are robust or tau-equivalent across all subgroups tested. They can be
used in research designs without concerns for the accuracy or comparability (i.e.,
scale differences) of scores. We recommend using these five key satisfaction attributes
to make comparisons in diverse samples.
Improvements in measurement are the cause, not the consequence, of progress in
information systems research. This research has illustrated that MIS researchers need
to pay greater attention to issues of scaling. There are potential problems with the
accuracy or comparability of overall user satisfaction scores when either the 12-item
summed EUCS scale or the overall EUCS scores are used to make comparisons across
the variety of population subgroups present in organizations. Where an overall mea-
sure of user satisfaction is essential to exploring a particular research question, the
choice of scaling method should receive adequate attention in crafting a research
design.
REFERENCES
1. Alloway, R., and Quillard, J. User managers’ systems needs. MIS Quarterly, 7, 2 (June
1983), 27–41.
2. Alter, S. How effective managers use information systems. Harvard Business Review,
54, 6 (1976), 97–106.
3. Alter, S. A taxonomy of decision support systems. Sloan Management Review, 19, 1
(1977), 39–56.
4. Alter, S. Development patterns for decision support systems. MIS Quarterly, 2, 3 (Sep-
tember 1978), 33–42.
5. Alwin, D.F., and Jackson, D.J. Applications of simultaneous factor analysis to issues of
factorial invariance. In D.J. Jackson and E.P. Borgatta (eds.), Factor Analysis and Measure-
ment in Sociological Research: A Multidimensional Perspective. Beverly Hills, CA: Sage, 1981,
pp. 249–279.
6. Bagozzi, R.P., and Yi, Y. On the evaluation of structural equation models. Journal of
Academy of Marketing Science, 16, 1 (1988), 74–94.
7. Bailey, J.E., and Pearson, S.W. Development of a tool for measuring and analyzing
computer user satisfaction. Management Science, 29, 5 (May 1983), 530–545.
8. Bailey, R. Human Performance Engineering: A Guide for System Designers. Englewood
Cliffs, NJ: Prentice Hall, 1982.
9. Baroudi, J.J., and Orlikowski, W.J. A short-form measure of user information satisfaction: A psychometric evaluation and notes on use. Journal of Management Information Systems, 4, 4 (Spring 1988), 44–59.
10. Bejar, I. Biased assessment of program impact due to psychometric artifacts. Psychologi-
cal Bulletin, 87, 3 (May 1980), 513–524.
11. Bentler, P.M. Theory and Implementation of EQS: A Structural Equations Program. Los
Angeles: BMDP Statistical Software, 1988.
12. Bentler, P.M. Comparative fit indexes in structural models. Psychological Bulletin, 107,
2 (1990), 238–246.
13. Bentler, P.M. On the fit of models to covariances and methodology to the bulletin. Psy-
chological Bulletin, 112, 3 (1992), 400–404.
14. Bentler, P.M., and Bonett, D.G. Significance tests and goodness-of-fit in the analysis of
covariance structure. Psychological Bulletin, 88, 3 (1980), 588–606.
15. Bhattacherjee, A. Understanding information systems continuance: An expectation-con-
firmation model. MIS Quarterly, 25, 3 (September 2001), 351–370.
16. Bollen, K.A. Structural Equations with Latent Variables. New York: Wiley, 1989.
17. Browne, M.W., and Cudeck, R. Alternative ways of assessing model fit. In K.A. Bollen and J.S. Long (eds.), Testing Structural Equation Models. Newbury Park, CA: Sage, 1993, pp. 136–162.
18. Byrne, B.M. Strategies in testing for an invariant second-order factor structure: A com-
parison of EQS and LISREL. Structural Equation Modeling, 2, 1 (1995), 53–72.
19. Byrne, B.M. Structural Equation Modeling with LISREL, PRELIS, and SIMPLIS: Basic
Concepts, Applications, and Programming. Mahwah, NJ: Lawrence Erlbaum, 1998.
20. Byrne, B.M., and Shavelson, R.J. Adolescent self-concept: Testing the assumption of
equivalent structure across gender. American Educational Research Journal, 24, 3 (Fall 1987),
365–385.
21. Chin, W., and Newsted, P. The importance of specification in causal modeling: the case
of end-user computing satisfaction. Information Systems Research, 6, 1 (March 1995), 73–81.
22. Conrath, D., and Mignen, O. What is being done to measure user satisfaction with EDP/
MIS. Information & Management, 19, 1 (1990), 7–19.
23. Davis, F.; Bagozzi, R.; and Warshaw, P. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35, 8 (1989), 982–1003.
24. DeLone, W.H., and McLean, E.R. Information system success: The quest for the depen-
dent variable. Information Systems Research, 3, 1 (March 1992), 60–95.
25. DeLone, W.H., and McLean, E.R. The DeLone and McLean model of information sys-
tems success: A ten-year update. Journal of Management Information Systems, 19, 4 (Spring
2003), 9–30.
26. Dix, A.; Finlay, J.; Abowd, G.; and Beale, R. Human Computer Interaction. Englewood
Cliffs, NJ: Prentice Hall, 1993.
27. Doll, W.J., and Torkzadeh, G. A discrepancy model of end-user computing involvement.
Management Science, 35, 10 (October 1989), 1151–1171.
28. Doll, W.J., and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS
Quarterly, 12, 2 (June 1988), 259–274.
29. Doll, W.J., and Xia, W. Confirmatory factor analysis of the end-user computing satisfac-
tion instrument: A replication. Journal of End User Computing, 9, 2 (Spring 1997), 24–31.
30. Doll, W.J.; Hendrickson, A.; and Deng, X. Using Davis’s perceived usefulness and ease
of use instruments in decision making: A multigroup invariance analysis. Decision Sciences,
29, 4 (December 1998), 839–870.
31. Doll, W.J.; Xia, W.; and Torkzadeh, G. A confirmatory factor analysis of the EUCS
instrument. MIS Quarterly, 18, 4 (December 1994), 453–461.
32. Drasgow, F., and Kanfer, R. Equivalence of psychological measurement in heteroge-
neous populations. Journal of Applied Psychology, 70, 4 (1985), 662–680.
33. Essex, P.A., and Magal, S.R. Determinants of information center success. Journal of
Management Information Systems, 15, 2 (Fall 1998), 95–117.
34. Etezadi-Amoli, J., and Farhoomand, A. A structural model of end user computing satis-
faction and user performance. Information & Management, 30, 2 (1996), 65–73.
35. Feltham, G.A. The value of information. Accounting Review, 43, 4 (October 1968), 684–696.
36. Gallagher, C. Perceptions of the value of a management information system. Academy of
Management Journal, 17, 1 (March 1974), 46–55.
37. Gatian, A.W. Is user satisfaction a valid measure of system effectiveness? Information &
Management, 26, 3 (1994), 119–131.
38. Gelderman, M. The relation between user satisfaction, usage of information systems,
and performance. Information & Management, 34, 1 (1998), 11–18.
39. Gelderman, M. Task difficulty, task variability and satisfaction with management sup-
port systems. Information & Management, 39, 7 (July 2002), 593–603.
40. Goodhue, D.L. Supporting users of corporate data: The effect of I/S policy choices.
Ph.D. dissertation, MIT, Cambridge, MA, 1988.
41. Goodhue, D.L., and Thompson, R.L. Task-technology fit and individual performance.
MIS Quarterly, 19, 2 (1995), 213–236.
42. Gorry, G., and Scott-Morton, M.S. A framework for management information systems.
Sloan Management Review, 13, 1 (Fall 1971), 55–70.
43. Gorsuch, R.L. A comparison of biquartimin, maxplane, promax, and varimax. Educa-
tional and Psychological Measurement, 30, 4 (Winter 1970), 861–872.
44. Hardgrave, B.C., and Wilson, R.L. Toward a contingency model for selecting an infor-
mation system prototyping strategy. Journal of Management Information Systems, 16, 2 (Fall
1999), 113–136.
69. McKinney, V.; Yoon, K.; and Zahedi, F.M. The measurement of Web-customer satisfac-
tion: An expectation and disconfirmation approach. Information Systems Research, 13, 3 (Sep-
tember 2002), 296–315.
70. Medsker, G.J.; Williams, L.J.; and Holahan, P.J. A review of current practices for evalu-
ating causal models in organizational behavior and human resources management research.
Journal of Management, 20, 2 (1994), 439–464.
71. Melone, N.P. A theoretical assessment of the user-satisfaction construct in information
systems research. Management Science, 36, 1 (January 1990), 76–91.
72. Miller, H. The multiple dimensions of information quality. Information Systems Man-
agement, 13, 2 (1996), 79–82.
73. Rai, A.; Lang, S.S.; and Welker, R.B. Assessing the validity of IS success models: An
empirical test and theoretical analysis. Information Systems Research, 13, 1 (March 2002),
50–69.
74. Raymond, L. Validating and applying user satisfaction as a measure of MIS success in
small organizations. Information & Management, 12, 4 (April 1987), 173–179.
75. Rentz, J.O. Generalizability theory: A comprehensive method for assessing and improv-
ing the dependability of marketing measures. Journal of Marketing Research, 24, 1 (1987),
19–28.
76. Rivard, S., and Huff, S.L. Factors of success for end-user computing. Communications
of the ACM, 31, 5 (1988), 552–561.
77. Rivard, S.; Poirier, G.; Raymond, L.; and Bergeron, F. Development of a measure to
assess the quality of user-developed applications. Database for Advances in Information Sys-
tems, 28, 3 (1997), 44–58.
78. Sanders, L.G. MIS/DSS success measure. Systems, Objectives, Solutions, 4, 1 (1984),
29–34.
79. Sanders, L.G. Issues and instruments for measuring system success. Working Paper,
Jacobs Management Center, SUNY–Buffalo, September 1990.
80. Sanders, L.G., and Courtney, J.F. A field study of organizational factors influencing DSS
success. MIS Quarterly, 9, 1 (March 1985), 77–92.
81. Smith, P.C.; Kendall, L.; and Hulin, C.L. The Measurement of Satisfaction in Work and
Retirement. Chicago: Rand McNally, 1969.
82. Steiger, J.H. Ez-Path: A Supplementary Module for SYSTAT and SYGRAPH. Evanston,
IL: SYSTAT, 1989.
83. Subramanian, A., and Nilakanta, S. Measurement: A blue print for theory building in
MIS. Information & Management, 26, 1 (1994), 13–20.
84. Swanson, E.B. Management information systems: Appreciation and involvement. Man-
agement Science, 21, 2 (October 1974), 178–188.
85. Torkzadeh, G., and Doll, W.J. Test-retest reliability of the end-user computing satisfac-
tion instrument. Decision Sciences, 22, 1 (1991), 26–37.
86. Tucker, L., and Lewis, C. A reliability coefficient for maximum likelihood factor analy-
sis. Psychometrika, 38, 1 (March 1973), 1–10.
87. Wheaton, B. Assessment of fit in overidentified models with latent variables. Sociologi-
cal Methods and Research, 16, 1 (August 1987), 118–154.
88. Wilken, P.H., and Blalock, H.M., Jr. The generalizability of indirect measures to com-
plex situations. In G. Bohrnstedt and E. Borgatta (eds.), Social Measurement: Current Issues.
Beverly Hills, CA: Sage, 1981, pp. 39–62.
89. Zmud, R.W., and Boynton, A.C. Survey measures and instruments in MIS: Inventory
and appraisal. In K.L. Kraemer and J.I. Cash, Jr. (eds.), The Information Systems Research
Challenge: Survey Research Methods. Boston: Harvard Business School Press, 1991, pp.
149–180.
90. Zmud, R.W.; Sampson, J.P.; Reardon, R.C.; Lenz, J.G.; and Byrd, T.A. Confounding
effects of construct overlap: An example from IS user satisfaction theory. Information Technol-
ogy and People, 7, 2 (1994), 29–45.
91. Zwass, V. Foundations of Information Systems. Boston: Irwin/McGraw-Hill, 1998.