

The Meaning and Measurement of User


Satisfaction: A Multigroup Invariance
Analysis of the End-User Computing
Satisfaction Instrument
WILLIAM J. DOLL, XIAODONG DENG, T.S. RAGHUNATHAN,
GHOLAMREZA TORKZADEH, AND WEIDONG XIA

WILLIAM J. DOLL is a Professor of MIS and Strategic Management at the University


of Toledo. Dr. Doll holds a Ph.D. in Business Administration from Kent State Univer-
sity and has published extensively on information systems and manufacturing issues
in academic and professional journals including Management Science, Communica-
tions of the ACM, MIS Quarterly, Academy of Management Journal, Decision Sci-
ences, Journal of Operations Management, Information Systems Research, Omega,
Information & Management, Datamation, and Datapro. Dr. Doll has published ex-
tensively on a variety of topics including computer integrated manufacturing, execu-
tive steering committees, top management involvement in MIS development, strategic
information systems, information systems downsizing, and end-user computing. He
has developed instruments to measure a variety of constructs including user involve-
ment in systems development, time-based manufacturing practices, system use, im-
pact of technology on work, and end-user computing satisfaction.

XIAODONG DENG is an Assistant Professor of Management Information Systems at


Oakland University. He received his Ph.D. in Manufacturing Management from the
University of Toledo. His research has appeared in Decision Sciences, Information
Resources Management Journal, and Journal of Intelligent Manufacturing. His re-
search interests are in postimplementation information technology learning, informa-
tion systems benchmarking, and information technology acceptance and diffusion.

T.S. RAGHUNATHAN is Professor of MIS at the University of Toledo. He is involved in


research programs in the areas of information systems, decision support systems,
planning, and implementation of technology. He is the author of many articles in
academic and professional journals including Journal of Management Information
Systems, Information Systems Research, Decision Sciences, and Omega. Dr.
Raghunathan has a Ph.D. in Information Systems from the University of Pittsburgh
and has extensive executive experience.

GHOLAMREZA TORKZADEH is Professor and Chair of MIS at the University of Ne-


vada, Las Vegas. He has published on management information systems issues in
academic and professional journals including Management Science, Information Sys-
tems Research, Journal of Management Information Systems, MIS Quarterly, Com-
munications of the ACM, Decision Sciences, Omega, Journal of Operational Research,
Information & Management, Journal of Knowledge Engineering, Educational and
Psychological Measurement, Behaviour & Information Technology, Long Range

Journal of Management Information Systems / Summer 2004, Vol. 21, No. 1, pp. 227–262.
© 2004 M.E. Sharpe, Inc.
0742–1222 / 2004 $9.50 + 0.00.

Planning, and others. His current research interests include the impact of informa-
tion technology, measuring e-commerce success, computer self-efficacy, and infor-
mation systems security. He holds a Ph.D. in Operations Research from the University
of Lancaster, England, and is a member of the Institute for Operations Research and
the Management Sciences, Association for Information Systems, and Decision Sci-
ences Institute.

WEIDONG XIA is an Assistant Professor of Information and Decision Sciences in the


Carlson School of Management at the University of Minnesota. He holds a Ph.D. in
Information Systems from the University of Pittsburgh. His current research relates
to the assessment of the capabilities and organizational impact of information tech-
nology infrastructure, end-user adoption, and usage of information technology. His
research articles have been published in journals including MIS Quarterly and Deci-
sion Sciences.

ABSTRACT: Although user satisfaction is widely used by researchers and practitioners


to evaluate information system success, important issues related to its meaning and
measurement across population subgroups have not been adequately resolved. To be
most useful in decision-making, instruments like end-user computing satisfaction
(EUCS), which are designed to evaluate system success, should be robust. That is,
they should enable comparisons by providing equivalent measurement across diverse
samples that represent the variety of conditions or population subgroups present in
organizations.
Using a sample of 1,166 responses, the EUCS instrument is tested for measurement
invariance across four dimensions—respondent positions, types of application, hard-
ware platforms, and modes of development. While the results suggest that the mean-
ing of user satisfaction is context sensitive and differs across population subgroups,
the 12 measurement items are invariant across all four dimensions. The 12-item
summed scale enables researchers or practitioners to compare EUCS scores across
the instrument’s originally intended universe of applicability.

KEY WORDS AND PHRASES: confirmatory factor analysis, end-user computing satisfac-
tion, factorial invariance, instrument validation, research methods, user satisfaction.

USER SATISFACTION HAS BECOME A PERVASIVE MEASURE of the success or effective-


ness of information systems for both practitioners and researchers [9, 15, 22, 24, 38,
44, 49, 54, 56, 68, 69, 71, 73, 83]. DeLone and McLean’s [25] updated model of
information system success continues to consider user satisfaction as one of the key
measures of system success. User satisfaction research is populated with what Zmud
and Boynton [89] describe as “apparently redundant scales” that have a narrow range
of applicability. Specific instruments have been developed to measure user satisfac-
tion for a number of population subgroups. User satisfaction instruments have been
developed for: a traditional data processing environment [7, 50], an end-user comput-
ing environment [28], a mainframe-based corporate database environment [40, 41],
decision support applications [78, 80], and an end-user development context [76].

These instruments use different items and measure different aspects of user satisfac-
tion, implying that the meaning and measurement of user satisfaction varies between
population subgroups.
Zmud et al. [90] question the robustness of user satisfaction in general and the end-
user computing satisfaction (EUCS) instrument [28] in particular. They argue that
user satisfaction is context sensitive. That is, its scaling or meaning may be influ-
enced by situational factors (e.g., conditions of measurement or population subgroups).
Other contextual uses of user satisfaction in the literature include small organizations
[74], user developed applications [77], computer simulation [65], CASE tool soft-
ware [53], and decision support systems [66]. In a review and critique of user satis-
faction instruments, Klenke [55] specifically calls for multigroup invariance studies
of EUCS to assess its measurement equivalence.
Originally developed by Doll and Torkzadeh [28], as shown in Figure 1, the EUCS
construct is defined as a second-order latent factor consisting of five first-order latent
factors (i.e., information content, format, accuracy, ease of use, and timeliness). The
five first-order latent factors and their structural weights define the meaning of this
second-order EUCS construct. The EUCS instrument [28] has been widely used [21,
27, 33, 34, 37, 39, 47, 48, 64, 65] and cross-validated [31, 38, 46, 66, 67, 85] to
measure a user’s satisfaction with a specific application. Gelderman [38] finds that
EUCS is a good predictor of an application’s impact on organizational performance
and, thus, a useful surrogate for system success.
Issues related to the meaning and measurement of user satisfaction have important
implications for both researchers and practitioners. When designing studies involv-
ing user satisfaction, researchers need to know whether they should use specific user
satisfaction instruments for each population subgroup, or they should use a standard-
ized instrument to make comparisons across the various population subgroups present
in their research. When applying the instrument, practitioners need to know whether
they can compare user satisfaction scores across diverse subpopulations. These are
questions about measurement equivalence (robustness) of an instrument across popu-
lation subgroups.
Despite the wide use of the EUCS instrument, its measurement equivalence across
different population subgroups has not been tested. This paper uses multigroup in-
variance analysis to answer two research questions: (1) Do the items used to measure
the five first-order factors of EUCS have equivalent item-factor loadings across popu-
lation subgroups? (2) Are the structural weights of the five first-order factors on the
second-order EUCS factor equivalent across population subgroups? Based on respon-
dent positions, application types, hardware platforms, and development modes, four
categories of subgroups are defined. This paper tests measurement equivalence of the
EUCS instrument across these four categories of subgroups.

The Meaning and Measurement of User Satisfaction


IF A SCALE IS ROBUST, ITS VALUE for practical decision-making and research is greatly
enhanced [10, 43]. Wilken and Blalock [88] suggest that robustness is absolutely
essential for constructs such as EUCS that are designed to evaluate system success
across a variety of contexts and population subgroups. The EUCS instrument (see
Figure 1) was originally designed to be generally applicable to a variety of respondent
positions, application types, hardware platforms, and development modes [28]. These
dimensions define the instrument’s originally intended universe of applicability [75].

[Figure 1. A Second-Order Measurement Model of the End-User Computing Satisfaction Instrument]

Figure 1 depicts EUCS as a single second-order latent construct with five first-
order latent factors (i.e., content, accuracy, format, timeliness, and ease of use). Con-
firmatory studies have repeatedly validated this hypothesized second-order
measurement model [29, 31, 53, 67]. Several studies of the instrument’s test–retest
reliability have reported good stability and reliability, as indicated by Cronbach’s
alpha values above 0.90 [46, 66, 85]. Since this second-order measurement model
has been supported and recommended by previous studies that developed and tested
the EUCS instrument, we have chosen the second-order measurement model to test
the invariance of the instrument across population subgroups [18, 19].
In Figure 1, the five arrows leading from end-user computing satisfaction to the five
first-order latent factors depict the structural weights. Structural weights can be viewed
as regression coefficients in the regression of the first-order factors on the higher-
order factor. These structural weights are significant for our understanding of the
nature of the user satisfaction construct itself, as well as the centrality or importance
of each component (content, accuracy, format, ease of use, or timeliness) to overall
user satisfaction. These structural weights indicate the centrality or importance as-
signed to the first-order factors in scaling the second-order factor [62]. In other words,
the weights indicate how central each first-order factor is to the second-order EUCS
factor. The structural weights can be used to derive the overall second-order EUCS
score from the weighted average of the first-order factor scores. In different contexts
or subpopulations, the first-order factors might be weighted differently, suggesting
that end-user computing satisfaction has different meanings across subgroups.
In Figure 1, the 12 arrows leading from the first-order latent factors to the measure-
ment items are the item-factor loadings. Item-factor loadings can be viewed as regres-
sion coefficients in the regression of observed variables on their corresponding latent
factor. These item-factor loadings indicate the extent to which the item captures the
trait of its latent factor. In different contexts or subpopulations, these item-factor load-
ings may be different, suggesting that the first-order factors (content, accuracy, for-
mat, timeliness, and ease of use) may have different meanings across subgroups. The
descriptions of the 12 measurement items are depicted at the bottom of Figure 1.
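
For readers who prefer equations, the structure in Figure 1 can be restated in standard second-order factor-analytic (LISREL-style) notation. This is our compact summary of the model described above, not a formula reproduced from the paper:

\[
\mathbf{y} = \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}, \qquad
\boldsymbol{\eta} = \boldsymbol{\Gamma}\,\xi + \boldsymbol{\zeta},
\]

where y is the 12 x 1 vector of observed items, η the 5 x 1 vector of first-order factors (content, accuracy, format, ease of use, and timeliness), ξ the second-order EUCS factor, Λ_y the matrix of item-factor loadings, Γ the vector of five structural weights (the gammas reported later in Table 2), and ε and ζ the measurement and structural residuals.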
Assessing the robustness or measurement equivalence of second-order measure-
ment models like the EUCS instrument requires two conditions [18, 19]. First, the
items that measure the first-order factors must have equivalent item-factor loadings
across subgroups or conditions of measurement. For example, the factor loadings of
the four items measuring the first-order factor “Content” should be equivalent across
subgroups in order for the “Content” scores across the subgroups to be comparable.
Second, the structural weights of the first-order factors must also be equivalent across
population subgroups. In this paper, we test the equivalence of the item-factor load-
ings and the structural weights of the first-order factors across subgroups based on
positions of respondent, types of application, hardware platforms, and modes of development.

The Importance of Measurement Invariance


Standardized instruments such as EUCS must provide equivalent (invariant) mea-
surement across subgroups if comparative statements are to have substantive import.
Without equivalent measurement, observed scores from different groups are in differ-
ent scales and, therefore, are not directly comparable [32]. If a scale is not robust,
comparing scores can result in poor managerial decisions or incorrect statistical in-
ferences. These ideas are particularly clear in physical measurement, where it is obvi-
ous that weight in pounds cannot be directly compared to weight in kilograms.
Similarly, when using instruments in user satisfaction research, observed scores can-
not be compared unless they are in the same scale.
Zmud et al. [90] argue that all instruments can be viewed as located on a continuum
reflecting the extent to which the construct is linked with an experientially based
context. The stronger a construct’s linkage to an experientially based context, the
greater the concern that the construct and the context may interact. If significant con-
struct–context interaction is present, item-factor loadings or structural weights (in
second-order measurement models) can be expected to vary between subgroups or
contexts. Thus, the context may, in part, influence the construct’s meaning or how it is
scaled.
Smith et al. [81] suggest that the frame of reference an individual brings to bear in
evaluating an application may be shaped by the unique life experiences of demo-
graphic categories of respondents. Measurement nonequivalence may result from
these differing frames of reference. For example, using multigroup invariance analy-
sis, Doll et al. [30] find that gender affects the item-factor loadings (true scores) for
Davis et al.’s [23] perceived ease-of-use instrument. Experientially based differ-
ences in organizational positions, types of application being used, hardware plat-
forms, or the user’s role in the development of the application may cause differing
frames of reference.
Constructs such as user satisfaction are the language through which theoretical ideas
and research findings are communicated among researchers and practitioners. A user
satisfaction instrument with broad applicability increases the extent to which the re-
sults of studies can be generalized to other subgroups. Thus, it enables researchers to
interpret results as informing the full body of theory.

Scaling EUCS: Implications for Accuracy and Comparability


The methods used to scale EUCS have implications for the accuracy and comparabil-
ity of the scores. The most common method for scaling EUCS is a simple aggregation
(or average) of the scores for the 12 items. This method is simple and easy, but the
weights assigned to the first-order factors (content, format, ease of use, accuracy, and
timeliness) depend upon the number of items measuring each factor and are, thus,
arbitrary. Substantial differences in item-factor loadings between groups would indicate that EUCS may have to be scaled differently (different items or different item-
factor loadings) and comparisons across groups would not be possible. If item-factor
loadings are invariant across population subgroups, EUCS scores obtained by sum-
ming the 12 items are in the same scale and therefore directly comparable across
subgroups.
A more complicated, but potentially more accurate method for scaling EUCS is to
use the second-order factor score. The second-order factor score is computed by
multiplying each first-order factor score by its corresponding structural weight and
summing the weighted factor scores across the five factors. This EUCS factor score
does not use arbitrary weights. The weights reflect the centrality or importance of the
first-order factors to the second-order EUCS factor. If the structural weights differ
between subgroups, EUCS factor scores may provide a more accurate measure of
satisfaction for each subgroup, but the scores will not be comparable across sub-
groups. Multigroup invariance analysis enables us to test the equivalence of both
item-factor loadings and structural weights across subgroups and, thus, assess the
implications of the alternative scaling methods for accuracy and comparability.
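
To make the two scaling methods concrete, the sketch below computes both scores for a single respondent. The item responses are invented, the within-factor averages stand in for model-based factor scores, and the weights are the structure coefficients reported later in Table 2; the final rescaling step is our own choice to keep the result on the 1-5 response metric.

```python
import numpy as np

# Hypothetical responses of one user to the 12 EUCS items (1-5 scale),
# ordered C1-C4, A1-A2, F1-F2, E1-E2, T1-T2. Values are made up.
items = np.array([4, 5, 4, 4, 5, 5, 4, 3, 4, 4, 3, 4], dtype=float)

# Method 1: simple aggregation of the 12 items. The implicit weight of each
# first-order factor is proportional to its number of items (4, 2, 2, 2, 2),
# which the paper describes as arbitrary.
summed_score = items.mean()

# Method 2: weighted second-order factor score. Each first-order "factor
# score" is approximated here by the mean of its items (a model-based factor
# score would come from the estimated CFA), and the weights are the structure
# coefficients (gammas) from Table 2, used purely for illustration.
slices = {"content": slice(0, 4), "accuracy": slice(4, 6), "format": slice(6, 8),
          "ease_of_use": slice(8, 10), "timeliness": slice(10, 12)}
gammas = {"content": 0.78, "accuracy": 0.59, "format": 0.68,
          "ease_of_use": 0.75, "timeliness": 0.69}

factor_scores = {name: items[s].mean() for name, s in slices.items()}
weighted_sum = sum(gammas[name] * factor_scores[name] for name in factor_scores)
# Dividing by the sum of the weights (our choice) puts the score back on the
# 1-5 response metric so the two methods can be compared side by side.
eucs_factor_score = weighted_sum / sum(gammas.values())

print(f"12-item summed scale (average): {summed_score:.2f}")
print(f"Weighted second-order factor score: {eucs_factor_score:.2f}")
```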

Measurement Equivalence by Positions of Respondent


Feltman [35], in his early work on the value of information, considers the person who
receives the information and makes decisions based on that information as the key
element in studying information attributes such as relevancy, timeliness, and accu-
racy. Gallagher [36] finds that upper-level managers value information more highly
than individuals in other positions. Miller [72] argues that what is meant by informa-
tion quality depends on the intended user of the information and should be evaluated
with respect to this “customer’s needs.” Swanson [84] argues that the responsibilities
associated with varying organizational positions may influence both a priori involve-
ment and management information systems (MIS) appreciation.
Zwass [91] identifies three categories of users with different needs and, thus, differ-
ent perspectives for evaluating information—operating personnel, managers, and pro-
fessionals. These three groups utilize information systems for different
purposes—operational support, management support, and knowledge work, respec-
tively. Since user satisfaction measures whether user information needs are being
met, it seems plausible that perceptions of information attributes (content, accuracy,
format, ease of use, or timeliness) or overall end-user satisfaction may vary across
these three categories of respondents.
It has long been recognized that the nature of information required to support op-
erational-level activities is quite different from what is needed to support managerial
or professional personnel [42]. Operational tasks, such as the manufacture of a spe-
cific part, require highly accurate, current, and detailed information. In contrast, man-
agers and professional staff who plan and control such activities often deal with trends
or projections. This planning and control work is not sensitive to the accuracy of
individual transactions. This suggests the following null hypotheses:

H1-POS: The 12 items of the EUCS instrument have equivalent item-factor load-
ings on their corresponding first-order latent factors across respondent position
categories (i.e., managerial, professional, operating).

H2-POS: The five first-order latent factors have equivalent structural weights on
the second-order EUCS factor across respondent position categories (i.e., mana-
gerial, professional, and operating).

Measurement Equivalence by Types of Application


Differences in user needs [1] have led to the development of four basic types of appli-
cation designs—monitoring, exception reporting, inquiry, and analysis. Monitoring
and exception reporting systems are referred to as transaction processing systems.
Managerial support systems include inquiry and analysis applications. Database ap-
plications have the ability to respond to ad hoc requests or queries. Analysis applica-
tions generally have access to a database, but also provide powerful data analysis
capabilities such as modeling, simulation, optimization, or statistical routines to sup-
port decision-making.
An application’s design (i.e., type of application) can be viewed as independent of
the organizational position of the user. While transaction processing applications are
predominantly used by operating personnel, managers and professionals also use these
applications to monitor progress or identify exceptions that merit their attention [3].
While inquiry and decision support systems support managerial and professional de-
cision-making, these applications are also used by operating personnel to respond to
requests for information or as an aid in decision-making [2, 4].
Nevertheless, many researchers suggest that measuring user satisfaction with deci-
sion support systems requires an instrument designed specifically for decision analy-
sis. Sanders [78] proposes a measure of “satisfaction with decision-making.” He argues
that satisfaction with decision support is a different factor from satisfaction with trans-
action processing [79]. EUCS items can capture decision support satisfaction only
indirectly through information content items. A more direct measure of satisfaction
with decision-making may be necessary [78, 80].
In effect, Sanders [79] assumes that user satisfaction within the decision support
context has a different meaning, is measured by different factors (i.e., decision-mak-
ing satisfaction and overall satisfaction), and requires different measures from instru-
ments designed to measure user satisfaction with transaction processing applications.
Some recent empirical evidence challenges this assumption. For example, McHaney
and Cronan [64] find that the EUCS instrument can be used to evaluate the success of
computer simulation applications.
Other researchers have focused on the unique requirements of a corporate database
environment. The task-technology fit instrument developed by Goodhue and Thomp-
son [41] is an instrument for measuring user satisfaction with database applications.
This instrument is developed specifically for the context of a mainframe-based cor-
porate database environment [40]. The instrument reflects this corporate database
context by measuring satisfaction with ease-of-use factors such as locatability (i.e.,
the meaning of data is easy to find), authorization for access to data, and data compat-
ibility. It provides another example of the development of a user satisfaction instru-
ment for a particular type of application (i.e., corporate database) and, thus, it further
motivates tests of measurement equivalence across application types.
Logic suggests that accuracy may be more important or more central to user satis-
faction for database users, and that ease of use may be less important. Data integrity
issues have always been important to those who use database applications. Databases
are far more difficult to use than other means of storing and summarizing informa-
tion—for example, spreadsheets. If users have chosen a more difficult-to-use tool,
their task requirements may demand the greater capabilities provided by the data-
base. Given the users’ data management requirements, ease of use is probably a sec-
ondary consideration. This suggests the following null hypotheses concerning
measurement equivalence by type of application:
H1-APPL: The 12 items of the EUCS instrument have equivalent item-factor
loadings on their corresponding first-order latent factors across types of appli-
cation (i.e., decision support, database, and transaction processing applications).
H2-APPL: The five first-order latent factors have equivalent structural weights
on the second-order EUCS factor across types of application (i.e., decision sup-
port, database, and transaction processing applications).

Measurement Equivalence by Hardware Platforms


The user information satisfaction (UIS) instrument [7, 50] has been developed within
the context of a traditional mainframe data processing environment. Mainframe sys-
tems are not easy to use. In this environment, systems analysts are important middle-
men who create reports for managers. UIS factors include satisfaction with user
involvement in development, user–analyst relationships, and information product.
Ease of use items are omitted; it is assumed that users do not directly interact with
application software. The UIS instrument is an example of a user satisfaction instru-
ment developed specifically for a traditional mainframe environment.
An important current development in organizational computing is downsizing—
that is, moving from platforms based on mainframes and minicomputers to a micro-
computer environment. Increasingly, information system architectures are designed
on the client-server model to take advantage of the opportunities offered by networks
organized from micros. In client-server computing, the processing is split up among
a number of clients—serving individual users—and one or more servers.
The objective of the client, usually a personal computer, is to provide a graphical
user interface (GUI) to the user [91]. Research on computer interfaces suggests that
the graphical user interfaces characteristic of personal computers make them easier to
use for nonexperts than the command-line interfaces typical of mainframe and mini-
computers [8, 26].

To the extent that an application is perceived as easy to use, perceptions of an
application’s information content or format may also be positively affected. For
example, Davis et al. [23] report that perceived ease of use can affect perceptions
of an application’s perceived usefulness. This possible influence of a hardware
platform’s ease of use on the measurement of user satisfaction suggests the follow-
ing null hypothesis:
H1-PLAT: The 12 items of the EUCS instrument have equivalent item-factor load-
ings on their corresponding first-order latent factors across hardware platforms
(i.e., mainframe/mini or personal computer applications).
User satisfaction measures whether user information needs are being met. To the
extent there are unmet needs, it seems plausible that ease of use may be more impor-
tant or more central to user satisfaction (i.e., higher structural weights of first-order
factors on the second-order EUCS factor) among mainframe users. This suggests the
need to test the following null hypothesis:

H2-PLAT: The five first-order latent factors have equivalent structural weights
on the second-order EUCS factor across hardware platforms (i.e., mainframe/
mini or personal computer applications).

Measurement Equivalence by Modes of Development


Mode of development refers to how an application is developed. Applications can be
developed in one of three modes. First, they may be developed in the traditional mode
where a professional systems analyst is primarily responsible for developing the ap-
plication for the user. This also includes software packages purchased from an exter-
nal vendor. Second, the application may be developed personally by end users for
their own use. Third, the application may be developed by an end user for another end
user. These modes of development represent different contexts that may affect the
meaning and measurement of user satisfaction.
For example, Rivard and Huff [76] present an instrument for measuring the satis-
faction of end users who develop their own applications. The population they studied
includes only individuals who are not data processing professionals and who de-
velop computer applications for themselves or others. In most cases, the end-user
developers also use the application, rather than turning it over to another person to
use. This context is quite different, because the user is the one who actually programs
the system.
The factors involved in measuring user satisfaction in this user-developed context
may be quite different from those found in other user satisfaction instruments. For
example, the information product dimensions found in the UIS instrument, the EUCS
instrument (i.e., content, format, accuracy, and timeliness), and the task-technology
fit instrument (i.e., information quality, production timeliness) are absent. A factor
unique to this end-user developer instrument is user satisfaction with independence
from data processing. This independence factor and the absence of information prod-

uct dimensions suggest the need to test the following null hypotheses concerning the
effect of development modes on the meaning and measurement of user satisfaction:
H1-MODE: The 12 items of the EUCS instrument have equivalent item-factor
loadings on their corresponding first-order latent factors across modes of devel-
opment (i.e., analyst developed, end-user developed, and other end-user devel-
oped applications).
H2-MODE: The five first-order latent factors have equivalent structural weights
on the second-order EUCS factor across modes of development (i.e., analyst
developed, end-user developed, and other end-user developed applications).

Research Methods
CONFIRMATORY FACTOR ANALYSIS (CFA) PERMITS more rigorous tests of the equality
or invariance of measurement parameters (e.g., item-factor loadings or structural
weights) across groups than are possible with exploratory factor analysis [19, 51, 58].
In testing the measurement equivalence of the EUCS instrument across population
subgroups, two sets of parameters are of special interest. First, we are interested in the
equivalence of item-factor loadings for the 12 items. Second, we are interested in the
equivalence of the structural weights of the five first-order factors on the second-
order EUCS factor. Testing for invariance (i.e., equivalence) is a particularly demand-
ing test of robustness. In some cases, minor differences may not be critical to the
interpretation of research results [20].
Invariance analysis enables one to explicitly test the structure of a second-order
measurement model or its individual parameters for equivalence across subgroups or
conditions [16, 51]. Our invariance analysis is conducted using LISREL VIII [52].
We are following an established modeling approach that has been developed and
used by a number of studies in various disciplines (psychology and education) to test
invariance of second-order measurement models. Examples of instruments with sec-
ond-order measurement models whose factorial invariance hypotheses have been tested
using LISREL methods include: the masculinity/femininity instrument [59], a de-
pression instrument [18], and a self-concept instrument [62].

The Measures
As the purpose of this study is to confirm the EUCS instrument and assess its measure-
ment equivalence across population subgroups, we use the identical items, scales, and
sampling methods used by Doll and Torkzadeh [28] in their development of the EUCS
instrument. The 12 items are illustrated in the legend of Figure 1. The order of the 12
items is randomized in the questionnaire. A five-point scale is used: 1 = almost never;
2 = some of the time; 3 = about half of the time; 4 = most of the time; and 5 = almost
always. We also use the identical demographic questions regarding positions of the
respondents, types of application, modes of development, and hardware platforms.

Because the identical instrument and data collection methods are used, we do not
conduct separate pretests and pilot tests for this study.
The respondents are asked to identify their position within the overall organization
by checking only one of the following responses: top level management, middle level
management, first level supervisor, professional employee without supervisory re-
sponsibility, and other (e.g., operating personnel). Since theory suggests three catego-
ries of users with different needs for evaluating information [91], these five categories
are recoded into three categories—managerial, professional, and operating.
Two yes/no questions are asked to determine the type of application [1] used by the
respondent. If the respondent checks “yes” to—“Does this application provide data
analysis capabilities (spreadsheet, modeling, simulation, optimization, or statistical
routines) to support managerial decision making?”—the application is categorized as
decision support. If the respondent checks “no” to this question, but checks “yes”
to—“Does this application provide a database with flexible inquiry capabilities (e.g.,
managers can design and change their own monitoring and exception reports)?”—the
application is categorized as database. If the respondent checks “no” to both ques-
tions, the application is categorized as transaction processing.
A yes/no question is used to categorize the hardware platform—“Is this a personal
computer (micro) application?” “No” responses are categorized as mainframe/mini.
To determine the mode of development, a nested set of two questions are asked. The
respondent is first asked a yes/no question—“Was this application primarily developed
by an end user?” If the answer was “yes,” the respondent is then asked a second (nested)
question—“Did you personally develop this application?” Applications with a “no”
response to the first question are categorized as being developed by a professional
systems analyst. If the respondent checked “yes” to the first question and “yes” to the
second question, the application is categorized as a personally developed application.
If the respondent checked “yes” to the first question but “no” to the second question,
the application’s mode of development is categorized as “other end user.”
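
The nested recoding rules described above can be expressed as a few classification functions. The function and argument names below are our own; only the branching logic follows the questionnaire.

```python
def classify_application(data_analysis: bool, flexible_inquiry: bool) -> str:
    """Type of application from the two nested yes/no questions."""
    if data_analysis:
        return "decision support"
    if flexible_inquiry:
        return "database"
    return "transaction processing"


def classify_platform(is_personal_computer: bool) -> str:
    """Hardware platform from the single yes/no question."""
    return "personal computer" if is_personal_computer else "mainframe/mini"


def classify_mode(end_user_developed: bool, personally_developed: bool) -> str:
    """Mode of development from the nested pair of yes/no questions."""
    if not end_user_developed:
        return "systems analyst"
    return "personally developed" if personally_developed else "other end user"


# Example: a PC application with flexible inquiry capabilities, developed by
# another end user.
print(classify_application(data_analysis=False, flexible_inquiry=True))   # database
print(classify_platform(is_personal_computer=True))                       # personal computer
print(classify_mode(end_user_developed=True, personally_developed=False)) # other end user
```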

The Sample
The data used to test the hypotheses are collected through surveys of end users in
over 60 firms, half of the firms we initially contacted. Typically, the MIS directors
are asked to identify their major applications and the users who directly interact with
each application. Major applications are defined as those that are of operational or
strategic importance to the firm. In small firms, the director could easily identify the
users to be included in the sample. In large firms, the director would ask the managers
responsible for systems development and maintenance to identify the major users.
Although some individuals use several applications, they are only asked to respond
with respect to one application specified by the director.
Using a sampling plan developed by this method, questionnaires were distributed
to end users through interoffice mail with a cover letter describing the survey as a
university-based research project. There were 1,386 responses obtained. About half
of the responses came from manufacturing firms, with the remainder being about

equally distributed between retail, government agencies, utilities, hospitals, and edu-
cational institutions.
The sample represents over 300 different applications including accounts payable,
accounts receivable, budgeting, CAD/CAM, customer service, service dispatching,
engineering analysis, process control, work order control, general ledger, manpower
planning, financial planning, inventory, order entry, payroll, personnel, production
planning, purchasing, quality, sales analysis and forecasting, student data, and profit
planning. This was a convenience sample, but the large number of organizations and
the variety of applications support the generalizability of the findings.
To obtain a common data set, responses are eliminated if any question is left unanswered.
This yields a sample of 1,166 usable responses. Table 1 reports sample sizes
for each subgroup. Harris and Schaubroeck [45] suggest 100 as a minimum sample
size for a subgroup, but they recommend at least 200. All subgroups except database
(n = 197) and personally developed (n = 146) had a sample size above 200. No sub-
groups were below the 100-response minimum sample size.

Criteria for Evaluating Models


Because no one statistic is universally accepted as an index of model adequacy, our
interpretation of results emphasizes substantive issues, practical considerations, and
several measures of fit. Although the chi-square statistic is a global test of a model’s
ability to reproduce the sample variance/covariance matrix, it is sensitive to sample
size and departures from multivariate normality [16]. Thus, the chi-square statistic
must be interpreted with caution [52]. Even if the discrepancy between the estimated
model and the data is very small, if the sample size is large enough, almost any model
will be rejected because the discrepancy is not statistically equal to zero. Because the
degrees of freedom for a LISREL problem do not reflect the sample size, the chi-
square/degree of freedom ratio is also as sensitive to sample size as the chi-square
statistic itself [62].
Because chi-square and the chi-square per degree of freedom statistics are sensitive
to sample size, we used subjective fit indexes that have been developed to assess the
degree of congruence between the model and data (i.e., whether the variance and
covariance in the data are accounted for by the model). Wheaton [87] has suggested
that it is prudent to report several fit measures. In this research, we use root mean
square error of approximation (RMSEA), comparative fit index (CFI), non-normed
fit index (NNFI), and expected cross-validation index (ECVI).
The RMSEA [82] is a measure of model discrepancy. It measures the amount of
model discrepancy per degree of freedom. When a model fits perfectly, RMSEA equals
zero. It has no upper bound. A value of 0.05 or less indicates a close fit of the model
in relation to the degrees of freedom. A value of 0.08 or less for the RMSEA indicates
a reasonable error of approximation. Brown and Cudeck [17] do not recommend
using a model with an RMSEA value greater than 0.1.
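
For reference, RMSEA is commonly computed from the model chi-square, its degrees of freedom, and the sample size (the Steiger-Lind formulation); the formula below is a standard definition rather than one given in the paper:

\[
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2 - df,\ 0)}{df\,(N - 1)}}.
\]

As a check, the full-sample solution reported in Table 1 (χ² = 377, df = 49, N = 1,166) gives √(328 / 57,085) ≈ 0.076, matching the tabled value.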
Table 1. Model-Data Fit Assessment for Entire Sample (1,166) and Subgroups

Dimension                            χ2    df    p-value    NNFI    CFI    RMSEA
Data set (n = 1,166)                 377   49    0.000      0.96    0.97   0.076
Positions of respondent
  Managerial (n = 539)               187   49    0.000      0.96    0.97   0.072
  Professional (n = 378)             160   49    0.000      0.95    0.97   0.077
  Operating (n = 249)                152   49    0.000      0.94    0.95   0.092
Types of application
  Decision support (n = 596)         170   49    0.000      0.96    0.97   0.064
  Database (n = 197)                 140   49    0.000      0.93    0.95   0.097
  Transaction processing (n = 373)   155   49    0.000      0.95    0.96   0.076
Hardware platforms
  Personal computer (n = 489)        191   49    0.000      0.95    0.96   0.077
  Mainframe/mini (n = 677)           238   49    0.000      0.96    0.97   0.076
Modes of development
  Systems analyst (n = 706)          253   49    0.000      0.95    0.97   0.077
  Other end user (n = 314)           109   49    0.000      0.97    0.97   0.062
  Personally developed (n = 146)      93   49    0.000      0.92    0.94   0.079

Note: Covariance matrices for the entire data set and the 11 subgroups are available from the first author upon request.

CFI is a normed relative noncentrality index [12]. It estimates each noncentrality
parameter by the difference between its t-statistic and the corresponding degrees of

freedom. The Tucker-Lewis index [86] was among the earliest fit indices that in-
volved comparing a model’s fit relative to other nested models. Tucker and Lewis’s
original purpose for developing their index was to quantify the degree to which a
particular exploratory factor model is an improvement over a zero factor model when
assessed by maximum likelihood. Bentler and Bonett’s [14] NNFI is a generalization
of Tucker and Lewis’s definition to all types of covariance structured models under
various estimation methods. The NNFI [14] not only measures the relative improve-
ment in fit obtained by a proposed model compared to the null model but also cor-
rects for the number of parameters in the model.
The RMSEA, CFI, and NNFI indices are used because they are generally unaf-
fected by sample size. Medsker et al. [70] recommended the CFI as being the best
approximation of the population value for a single model. Marsh et al. [63] reported
that the NNFI is also generally unaffected by sample size; it is useful for situations
where a parsimony-type index is needed to account for the number of estimated pa-
rameters in a model [70]. Good-fitting models generally yield CFI or NNFI fit indices
of at least 0.90; that is, only a relatively small amount of variance remains unex-
plained by the model [13, 14, 16]. The degree of cross-validation that is expected for
a model on additional samples (ECVI [17]) is also a measure of model fit. A model is
preferred if it minimizes the value of ECVI relative to other models. We use ECVI for
assessing sequential modifications to models rather than assessing the model-data fit
for a single subgroup or for the entire sample.
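
For completeness, the standard formulations of these indices are summarized below (subscript M denotes the hypothesized model, subscript 0 the null baseline model, N the sample size, and q the number of freely estimated parameters); these definitions are provided as a convenience and are not reproduced from the paper:

\[
\mathrm{NNFI} = \frac{\chi^2_0/df_0 - \chi^2_M/df_M}{\chi^2_0/df_0 - 1}, \qquad
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\ 0)}{\max(\chi^2_0 - df_0,\ \chi^2_M - df_M,\ 0)}, \qquad
\mathrm{ECVI} = \frac{\chi^2_M + 2q}{N - 1}.
\]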

Confirmatory Factor Analysis Methods


CFA was conducted using LISREL VIII [52]. LISREL is a statistical tool for analyz-
ing covariance matrices according to systems of structural equations. Doll and
Torkzadeh’s hypothesized second-order measurement model (see Figure 1) was tested
for model-data fit for the entire sample of 1,166 observations. Next, we assess model-
data fit and item-factor loadings for each of the 11 subgroups. The examination of
goodness-of-fit within subgroups is essential to assessing the issue of congeneric or
conceptual equivalence and identifying where, if anywhere, the model does not achieve
adequate fit. The failure of a subgroup to achieve adequate fit on subjective fit indices
may preclude tests of the invariance of specific parameters [16].
Item-factor loadings are estimates of the validity of the observed variables (items).
The larger the item-factor loadings—as compared with their standard errors and ex-
pressed by the corresponding t-values—the stronger is the evidence that the mea-
sured variables represent the underlying constructs [16]. Bagozzi and Yi [6] suggest
that item-factor loadings should exceed 0.60. Thus, in this research, items with load-
ing above 0.60 will be considered to have good construct validity. CFA also enables
us to estimate the reliability of individual items. The proportion of variance (R-square)
in the observed variables that is accounted for by the corresponding first-order latent
factor can be used to estimate the reliability of the observed variables (items). Items
with R-square value above 0.36 will be considered to have good reliability [6].
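
These two cutoffs are directly linked: in a completely standardized solution where each item loads on a single factor and measurement errors are uncorrelated, an item's reliability equals the square of its standardized loading,

\[
R^2_i = \lambda_i^2, \qquad 0.60^2 = 0.36.
\]

For example, item C1 in Table 2 has a completely standardized loading of 0.85 and a reported R-square of 0.72 ≈ 0.85².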

Invariance Analysis Methods


CFA models of factorial invariance [60, 61] enable one to test explicitly the structure
of a measurement model or its individual parameters for equivalence across sub-
groups or conditions. When parallel data exist for more than one group, CFA pro-
vides a particularly powerful test of the equivalence of solutions across the multiple
groups [5, 30, 62]. This chi-square test is like an F-test in that if any one of the 12
items had significantly different lambda for any of the three position subgroups, the
chi-square value would indicate that the hypothesis of invariance should be rejected. The researcher is able
to fit the data subject to the constraint that any one, any set, or all parameters are equal
in the multiple groups.
Tests of factorial invariance across multiple groups involve a hierarchical ordering
of nested models. Any two models are nested as long as the set of parameters esti-
mated in the more restrictive model is a subset of the parameters estimated in the less
restrictive model. Bentler [12, 14] noted the usefulness of testing a series of nested
models Mo, . . . Mi, . . . Mk, . . . , Ms, in which Mo is a suitably defined null model, Ms
is the saturated model with df = 0 and Mi and Mk are models with positive df of
intermediate complexity. When one model is a subset of another larger model, the
difference between two models can be tested by subtracting the two chi-square values
and testing this value against the critical value associated with the difference in de-
grees of freedom [14, 57].
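
The nested-model comparison amounts to a likelihood-ratio (chi-square difference) test, which is easy to reproduce. The sketch below is a generic helper (not the authors' code), applied as an example to the H1-POS comparison reported later in Table 4A.

```python
from scipy.stats import chi2


def chi_square_difference(chisq_restricted, df_restricted, chisq_free, df_free):
    """Chi-square difference test for two nested covariance structure models.

    The restricted model adds equality constraints to the freely estimated
    model, so it has the larger chi-square and the larger degrees of freedom.
    """
    delta_chisq = chisq_restricted - chisq_free
    delta_df = df_restricted - df_free
    p_value = chi2.sf(delta_chisq, delta_df)
    return delta_chisq, delta_df, p_value


# Example: Model 2 (factor loadings invariant) versus Model 1 (equal pattern)
# for positions of respondent, using the values reported in Table 4A.
d_chi, d_df, p = chi_square_difference(505, 161, 492, 147)
print(d_chi, d_df, round(p, 4))  # -> 13 14 ~0.526; invariance not rejected
```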
This chi-square test is powerful and, where the hypothesis of equal item-factor
loadings or equal structural weights between subgroups is not rejected, it provides
strong support that observed differences between subgroups are due to chance [60].
If the chi-square difference is significant, the subjective fit indices are examined to
see how much they decline as invariance constraints are imposed. Small decreases in
the subjective fit indices would suggest that the differences in factor loadings or struc-
tural weights are not substantial and are unlikely to affect the interpretation of re-
search results. To evaluate whether model-data fit declines substantially as invariance
parameters are imposed, the researchers will examine changes in NNFI, CFI, RMSEA,
and ECVI.

The Sequence of Invariance Analysis


The hierarchical sequence begins with the least restrictive model in which only the
form of the model—the pattern of fixed and nonfixed parameters—is invariant across
groups. This initial baseline model is “totally noninvariant” in the sense that there are
no between-group invariance constraints on estimated parameters (i.e., a “no invari-
ance” model).
This baseline model provides a basis of comparison for all subsequent models in
the invariance hierarchy. If it is not able to fit the data, then none of the more restric-
tive models in the hierarchy will be able to do so [61]. The failure of the baseline
model for a particular subgroup to achieve “good fit,” or at least adequate fit, on
subjective fit indices may preclude any further tests of invariance involving this sub-

group [12]. Poor fit in a subgroup suggests that the instrument may not measure the
phenomena adequately in this subgroup—a new instrument or measurement model
may have to be developed for this particular subgroup.
Next, the equivalence of factor loadings across groups in each dimension is tested
(i.e., tau-equivalency). In evaluating measurement models, the primary concern is
usually about whether each item is a good measure of its latent construct. Factor
loadings are examined first because the equivalence of factor loadings is the minimal
condition for “factorial invariance.” Bollen [16] noted that the equality of factor load-
ings is generally of a higher priority than the equality of other parameters. Bentler
[11] suggested testing first for invariance of factor loadings because, without such
invariance, it would be difficult to argue that the factors are the same. If the factors are
not the same, it may be meaningless to test for the invariance of other parameters. To
test for equal factor loadings, an equal item-factor loading constraint is added to the
baseline model, creating a nested or more restrictive model that is a subset of the
baseline model. Thus, the significance of chi-square differences between these two
nested models provides a test of the hypotheses of equal item-factor loadings.
Next, if the hypothesis of equal item-factor loadings is not rejected, we move on to
test for the equality of the structural weights (gammas) across subgroups. This nested
or more restrictive model is a subset of the model specifying equal item-factor load-
ing. Thus, the chi-square differences between these models provide a test of whether
all five structural weights are equivalent across each of the subgroups. A p-value
greater than 0.05 indicates that the null hypothesis (i.e., no differences in structural
weights between subgroups) is not rejected. If the null hypothesis is rejected, we
examine the structural weights, generate alternative hypotheses, and test these alter-
native hypotheses to identify what structural weights are equivalent across subgroups.
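
To make the hierarchy concrete, the loop below walks through the nested sequence that is reported later in Table 4A (positions of respondent), testing each model against the one before it. The chi-square values and degrees of freedom are taken from that table; everything else is an illustrative sketch.

```python
from scipy.stats import chi2

# (description, chi-square, df) for the nested sequence in Table 4A.
models = [
    ("Model 1: equal pattern (baseline)", 492, 147),
    ("Model 2: + invariant item-factor loadings", 505, 161),
    ("Model 3: + invariant structural weights", 537, 171),
]

# Each model is compared with the less restrictive model that precedes it.
for (free_name, free_chi, free_df), (res_name, res_chi, res_df) in zip(models, models[1:]):
    d_chi, d_df = res_chi - free_chi, res_df - free_df
    p = chi2.sf(d_chi, d_df)
    verdict = "not rejected" if p > 0.05 else "rejected"
    print(f"{res_name} vs. {free_name}: d_chi2={d_chi}, d_df={d_df}, p={p:.4f} ({verdict})")
```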

Results
IN THE ENTIRE SAMPLE OF 1,166 RESPONDENTS, the hypothesized measurement model
(see Figure 1) has a chi-square of 377 for 49 degrees of freedom, RMSEA of 0.076,
NNFI of 0.96, and CFI of 0.97 (see the first line of Table 1). Based on the subjective
fit indices, the model-data fit for the overall sample was judged to be adequate. Pa-
rameter estimates for the full sample of 1,166 are reported in Table 2. The completely
standardized item-factor loadings and structural weights are above 0.60. These re-
sults indicate that the measurement model is appropriately specified, a proper solu-
tion is obtained, and the solution adequately fits the entire sample.
Chi-square statistics and subjective goodness-of-fit indices for each of the 11 sub-
groups are reported in Table 1. A proper solution is obtained for each subgroup. The
subjective fit indices for each of the 11 subgroups suggest adequate model-data fit. In
all 11 subgroups, NNFI and CFI scores are above 0.92 and 0.94, respectively. RMSEA
is below 0.08 for all subgroups except for database (0.097) and operating (0.092). All
subgroups have RMSEA scores below 0.1. The item-factor loadings in the 11 sub-
groups indicate that the items are generally good measures of their corresponding
first-order latent factors (see Table 3). Of the 132 item-factor loadings in Table 3, the
only loading below 0.60 is the one for item C3 for the personally developed subgroup
(0.52). Thus, we concluded that, in general, the measurement model illustrated in
Figure 1 adequately fits the data for each subgroup.

Table 2. Parameter Estimates and t-values of EUCS Instrument for Entire Sample (n = 1,166)

Observed variables
Item   Factor loading   t-value   Completely standardized loading   R-square (reliability)
C1     1.00             –         0.85                              0.72
C2     0.98             38.19     0.87                              0.76
C3     1.00             30.29     0.75                              0.56
C4     0.92             35.03     0.83                              0.69
A1     1.00             –         0.90                              0.81
A2     1.03             34.98     0.90                              0.81
F1     1.00             –         0.82                              0.67
F2     1.06             31.31     0.85                              0.72
E1     1.00             –         0.87                              0.76
E2     0.93             30.04     0.88                              0.77
T1     1.00             –         0.81                              0.66
T2     0.93             26.54     0.79                              0.62

Latent variables
Factor        Structure coefficient (gamma)   t-value   Completely standardized coefficient   R-square (reliability)
Content       0.78                            31.40     0.92                                  0.85
Accuracy      0.59                            25.01     0.75                                  0.56
Format        0.68                            28.96     0.93                                  0.86
Ease of use   0.75                            22.84     0.72                                  0.52
Timeliness    0.69                            26.76     0.87                                  0.76

Table 3. Completely Standardized Item-Factor Loadings for Each Subgroup

                           Content                    Accuracy      Format        Ease of use   Timeliness
                           C1    C2    C3    C4       A1    A2      F1    F2      E1    E2      T1    T2
Positions of respondent
  Managerial               0.84  0.87  0.76  0.80     0.88  0.89    0.78  0.84    0.87  0.87    0.84  0.79
  Professional             0.86  0.84  0.75  0.80     0.89  0.90    0.84  0.85    0.83  0.93    0.75  0.82
  Operating                0.82  0.86  0.78  0.84     0.89  0.89    0.80  0.84    0.82  0.85    0.81  0.81
Types of application
  Decision support         0.83  0.83  0.74  0.79     0.89  0.87    0.79  0.84    0.84  0.90    0.82  0.81
  Database                 0.83  0.88  0.75  0.80     0.87  0.89    0.74  0.87    0.89  0.86    0.78  0.88
  Transaction processing   0.84  0.86  0.74  0.83     0.89  0.92    0.82  0.83    0.83  0.88    0.79  0.77
Hardware platforms
  Personal computers       0.83  0.83  0.76  0.82     0.87  0.85    0.82  0.81    0.83  0.87    0.78  0.77
  Mainframe/mini           0.85  0.87  0.75  0.81     0.89  0.92    0.79  0.86    0.86  0.89    0.82  0.83
Modes of development
  Systems analyst          0.84  0.85  0.76  0.81     0.90  0.91    0.79  0.85    0.86  0.88    0.79  0.79
  Personally developed     0.81  0.75  0.52  0.75     0.89  0.85    0.69  0.76    0.86  0.81    0.78  0.91
  Other end user           0.82  0.88  0.79  0.80     0.86  0.88    0.81  0.84    0.83  0.87    0.83  0.79

Results for Invariance of Item-Factor Loadings


Table 4 reports the chi-square and fit indices for the factor loading invariant model
(Model 2) for positions of respondent (Table 4A), types of application (Table 4B),
hardware platforms (Table 4C), and modes of development (Table 4D). The left side
of Table 4 reports the invariance models and their corresponding fit statistics. The
right side shows the hypotheses being tested (for example, H1-POS in Table 4A), the
nested models being compared, the change in chi-square, the change in degrees of
freedom between the nested models, and the significance level of the chi-square test.
Table 4A shows the results for H1-POS. With a significance level of 0.5260, the
hypothesis of equivalent item-factor loadings between managerial, professional,
and operating respondents is not rejected. Testing for invariance (i.e., equivalence)
is a particularly demanding test of an instrument’s robustness [5, 30, 62]. There-
fore, we continued with subsequent tests of the equivalence of structural weights
(H2-POS).
Table 4B shows the results for H1-APPL. With a significance level of 0.1016, the
hypothesis of equivalent item-factor loadings across decision support, database, and
transaction processing applications is not rejected. Table 4C shows the results for H1-
PLAT. The hypothesis of equivalent item-factor loadings across mainframe/mini and
microcomputer subgroups is not rejected (p = 0.1884). Table 4D depicts the results
for H1-MODE. Again, the hypothesis of equivalent item-factor loadings for the 12-
item instrument across the three modes of development is not rejected (p = 0.3781).
These results for H1-POS, H1-APPL, H1-PLAT, and H1-MODE suggest that the
12-item EUCS instrument is tau-equivalent. The instrument provides equivalent mea-
surement of content, accuracy, format, ease of use, and timeliness across respondent
positions, application types, hardware platforms, and modes of development. Thus,
the 12-item summed EUCS scale and the five first-order factors provide comparable
scores across the instrument’s originally intended universe of applicability.
The equivalence of item-factor loadings means that the first-order factors have the
same meaning across population subgroups tested. However, this does not imply that
each first-order factor is equally important or central to the second-order EUCS fac-
tor (i.e., user satisfaction) across population subgroups. For EUCS factor scores to be
in the same scale and, thus, comparable across subgroups, both item-factor loadings
and structural weights of the five first-order factors must be equivalent.

Results for Invariance of Structural Weights


Table 4 also reports the chi-square and fit indices for the “item-factor loadings and
structural weights invariant” model for positions of respondent (Table 4A, Model 3),
types of application (Table 4B, Model 3), hardware platforms (Table 4C, Model 3),
and modes of development (Table 4D, Model 3).
Table 4A. Invariance Analysis by Positions of Respondent (POS)

Model  Model description                                  χ2   df   NNFI  CFI   RMSEA  ECVI  Hypothesis  Nested models  ∆χ2  ∆df  Significance level

Three groups: managerial, professional, and operating (1,166 cases)
1      Equal pattern                                      492  147  0.95  0.96  0.078  0.57
2      Factor loading invariant                           505  161  0.96  0.96  0.074  0.56  H1-POS      2-1            13   14   0.5260
3      Factor loadings and structural weights invariant   537  171  0.96  0.96  0.074  0.57  H2-POS      3-2            32   10   0.0004
4      Same as Model 3 except structural weight
       of accuracy free                                   515  169  0.96  0.96  0.073  0.56  H3-POS      4-2            10   8    0.2640

Two groups: managerial and professional (918 cases)
5      Equal pattern                                      341  98   0.96  0.97  0.074  0.51
6      Factor loading invariant                           346  105  0.96  0.97  0.072  0.49              6-5            5    7    0.6600
7      Factor loadings and structural weights invariant   349  110  0.96  0.97  0.070  0.49  H4-POS      7-6            3    5    0.7000
Table 4B. Invariance Analysis by Types of Application (APPL)

Model  Model description                                          χ2   df   NNFI  CFI   RMSEA  ECVI  Hypothesis  Nested models  ∆χ2  ∆df  Significance level

Three groups: decision support, database, and transaction processing—1,166 cases
1      Equal pattern                                               463  147  0.95  0.97  0.074  0.55
2      Factor loadings invariant                                   484  161  0.96  0.96  0.072  0.54  H1-APPL     2-1            21   14   0.1016
3      Factor loadings and structural weights invariant            506  171  0.96  0.96  0.071  0.54  H2-APPL     3-2            22   10   0.0151
4      Same as Model 3 except structural weight of accuracy free   498  169  0.96  0.96  0.071  0.54  H3-APPL     4-2            14    7   0.0512

Two groups: decision support and transaction processing—859 cases
5      Equal pattern                                               303   98  0.96  0.97  0.069  0.49
6      Factor loadings invariant                                   312  105  0.96  0.97  0.067  0.48              6-5             9    7   0.2526
7      Factor loadings and structural weights invariant            320  110  0.96  0.97  0.066  0.47  H4-APPL     7-6             8    5   0.1562

Table 4C. Invariance Analysis by Hardware Platforms (PLAT)

Model  Model description                                          χ2   df   NNFI  CFI   RMSEA  ECVI  Hypothesis  Nested models  ∆χ2  ∆df  Significance level

Two groups: personal computer and mainframe/mini—1,166 cases
1      Equal pattern                                               421   98  0.95  0.97  0.075  0.46
2      Factor loadings invariant                                   431  105  0.96  0.97  0.073  0.46  H1-PLAT     2-1            10    7   0.1884
3      Factor loadings and structural weights invariant            442  110  0.96  0.97  0.072  0.46  H2-PLAT     3-2            11    5   0.0514
Table 4D. Invariance Analysis by Modes of Development (MODE)

Model  Model description                                          χ2   df   NNFI  CFI   RMSEA  ECVI  Hypothesis  Nested models  ∆χ2  ∆df  Significance level

Three groups: system analysts, personally developed, and other end user—1,166 cases
1      Equal pattern                                               448  147  0.95  0.97  0.073  0.54
2      Factor loadings invariant                                   463  161  0.96  0.97  0.070  0.52  H1-MODE     2-1            15   14   0.3781
3      Factor loadings and structural weights invariant            510  171  0.96  0.96  0.071  0.55  H2-MODE     3-2            47   10   0.0000

Two groups: system analysts and other end user—1,021 cases
4      Equal pattern                                               353   98  0.96  0.97  0.073  0.47
5      Factor loadings invariant                                   362  105  0.96  0.97  0.070  0.46              5-4             9    7   0.2526
6      Factor loadings and structural weights invariant            365  110  0.96  0.97  0.069  0.46  H3-MODE     6-5             3    5   0.7000

The differences in chi-square between nested models (Model 3 versus Model 2) are
32, 22, 11, and 47, respectively, for positions, types of application, hardware
platforms, and modes of development. The corresponding p-values are 0.0004, 0.0151,
0.0514, and 0.0000. Thus, the hypothesis of invariant structural weights of the
first-order factors is rejected at the p < 0.05 level for the subgroups representing
respondent positions, types of application, and modes of development. The hypothesis
of equivalent structural weights is not rejected for hardware platforms, although the
p-value (0.0514) only barely exceeds the 0.05 threshold. In each case, Model 3 versus
Model 2 shows little change in the subjective fit indices.
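Each of these significance levels can be reproduced from the tabled ∆χ2 and ∆df values alone. The short Python sketch below (an illustration, not the authors' LISREL workflow) computes the right-tail chi-square probabilities for the four Model 3 versus Model 2 comparisons; the ∆df values are consistent with constraining the five structural weights (gammas) to equality in each additional group (5 × 2 = 10 for three groups, 5 × 1 = 5 for two).

```python
from scipy.stats import chi2

# Model 3 vs. Model 2 comparisons from Tables 4A-4D:
# (delta chi-square, delta df) for each dimension of the sample.
tests = {
    "H2-POS (positions)":           (32, 10),
    "H2-APPL (application types)":  (22, 10),
    "H2-PLAT (hardware platforms)": (11, 5),
    "H2-MODE (development modes)":  (47, 10),
}

for name, (d_chi2, d_df) in tests.items():
    # Right-tail probability of the chi-square difference.
    print(f"{name}: p = {chi2.sf(d_chi2, d_df):.4f}")
# Prints approximately 0.0004, 0.0151, 0.0514, and 0.0000,
# matching the significance levels reported in Table 4.
```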
The differences in structural weights imply that the first-order factors are not equally
important or central to the second-order EUCS factor across population subgroups in
three dimensions—positions, types of application, and modes of development. How-
ever, these chi-square difference tests do not indicate the source and nature of the
differences—that is, which first-order factors have different weights and whether they
are lower or higher than those of the other groups. Table 5 displays the structural
weights of the first-order factors for the “item-factor loadings invariant” models
(Model 2 from Tables 4A, 4B, 4C, and 4D) and helps us identify where the structural
weights differ.

Analysis of Structural Weight Differences by Respondent Positions

An examination of Table 5 reveals that the structural weight (1.02) for the first-order
factor “accuracy” in the operating subgroup is substantially higher than those for the
managerial (0.76) and the professional (0.72) subgroups. The structural weights of
the “accuracy” factor are similar for the managerial and the professional subgroups.
To test whether the lack of invariance in the structural weights of the first-order
factors is limited to the “accuracy” factor in the operating subgroup, we test two
additional hypotheses. First, we test whether, with the structural weight for the “accu-
racy” factor set free, the other four first-order factors have equivalent weights across
the three subgroups (see Model 4, H3-POS in Table 4A). Using a test of nested mod-
els (Model 4 versus Model 2), the hypothesis of equivalent structural weights for the
four first-order factors is not rejected (p = 0.2640). Second, we test the hypothesis
(H4-POS) that the structural weights for all five first-order factors are invariant across
the managerial and professional subgroups (see the two-group analysis, Model 7 in
Table 4A). H4-POS is not rejected (p = 0.7000), supporting the contention that all
five first-order factors have equivalent weights across the two subgroups.

Analysis of Differences by Types of Application

An examination of Table 5 reveals that the structural weight (0.95) for the “accuracy”
factor in the database subgroup is substantially higher than that for the decision sup-
port subgroup (0.74). The structural weights of the “accuracy” factor are similar for
the decision support (0.74) and the transaction processing (0.81) subgroups.
Table 5. Structural Weights for Subgroups (Common Metric Completely Standardized Solutions)

                           Structural weights (gammas)
                           Content  Accuracy  Format  Ease of use  Timeliness
Positions of respondent
  Managerial               0.91     0.76      0.89    0.77         0.87
  Professional             0.94     0.72      0.93    0.72         0.85
  Operating                0.94     1.02      1.03    0.66         0.89
Types of application
  Decision support         0.87     0.74      0.87    0.70         0.83
  Database                 0.88     0.95      0.90    0.61         0.96
  Transaction processing   1.01     0.81      1.02    0.76         0.87
Hardware platforms
  Personal computers       0.89     0.73      0.94    0.64         0.84
  Mainframe/mini           0.95     0.84      0.93    0.77         0.89
Modes of development
  Systems analyst          0.98     0.83      0.96    0.76         0.90
  Personally developed     0.61     0.55      0.58    0.32         0.63
  Other end user           0.89     0.80      0.99    0.71         0.87

To identify which factors or subgroups have statistically significant differences in the
structural weights across application subgroups, we test two additional hypotheses (see
H3-APPL and H4-APPL in Table 4B). H3-APPL tests whether, with the structural
weight for the “accuracy” factor set free, the other four factors have equivalent struc-
tural weights across the three application subgroups. The hypothesis of equivalent struc-
tural weights for the four first-order factors across the three subgroups is not rejected (p
= 0.0512), indicating that, if the “accuracy” factor is excluded, differences in the struc-
tural weights of the other four first-order factors may be due to chance. However, it
should be noted that the p-value is barely above the acceptable level of 0.05.
We then test the hypothesis (H4-APPL) that the structural weights for all five first-
order factors are invariant across the decision support and transaction processing sub-
groups (see the two-group analysis, Model 7 in Table 4B). H4-APPL is not rejected (p
= 0.1562), supporting the contention that all five first-order factor weights are equiva-
lent across decision support and transaction processing applications.

Analysis of Differences by Modes of Development


An examination of the structural weights of the first-order factors by mode of devel-
opment (see Table 5) suggests that applications developed by systems analysts or other
end users have similar weights for all five factors. In contrast, applications that are
personally developed by the end user have substantially lower structural weights for
all five first-order factors.
We test the hypothesis (H3-MODE) that the structural weights for all five first-
order factors are equivalent across the systems analyst and the other end-user sub-
groups (see Model 6 in the two-group section of Table 4D). H3-MODE is not rejected
(p = 0.7000), supporting the contention that the five first-order factors have equiva-
lent structural weights across two modes of development—analyst developed and
other end user developed.
Finally, we examine the sensitivity of these results to the exclusion of either the
database or the operating subgroups. These two subgroups have RMSEA scores above
0.08. First, the 197 database responses are excluded from the common data set, and the
hypotheses are reexamined for the remaining 969 responses. This process is then repeated
with the 247 operating responses excluded. The results indicate that the findings are
not sensitive to the exclusion of either subgroup.

Discussion and Implications


THE STUDY HAS SOME LIMITATIONS or underlying assumptions that provide a context
for discussing the results. First, the dimensions are selected to represent Doll and
Torkzadeh’s [28] originally specified universe of applicability. Population subgroups
based on other demographics such as gender or general level of experience with com-
puting have not been tested. Conclusions related to measurement equivalence are
group specific. An instrument that is invariant across one set of groups may not be
equivalent across other dimensions or subgroups. Second, this study assumes that
EUCS is a single second-order factor with five first-order factors. This is the mea-
surement model recommended by the instrument’s confirmatory study [31]. Third,
this study assumes that this second-order EUCS factor measures user satisfaction.
This multigroup invariance analysis has addressed some important, yet previously
unresolved, issues concerning differences in the measurement and meaning of user
satisfaction across population subgroups. Table 6 summarizes the hypotheses, the
statistical findings, and the implications of these findings.

The Measurement of User Satisfaction Across Population Subgroups


The first issue is whether the EUCS instrument provides equivalent measurement of
the five key satisfaction attributes (i.e., first-order factors) across population subgroups.
With regard to measuring content, accuracy, format, ease of use, and timeliness, the
EUCS instrument is remarkably robust. These key satisfaction attributes have the
same meaning across all subgroups. Scores for these factors (i.e., either summed
scales or factor scores) are comparable across all dimensions tested.
A second issue is whether the EUCS instrument’s 12-item summed scale provides
equivalent measurement across the dimensions and subgroups of the instrument’s
intended universe of applicability. Since all 12 items have invariant item-factor load-
ings, the 12-item summed scale provides comparable scores (i.e., in the same scale)
across all population subgroups tested. Managers or researchers who wish to evaluate
system success can use the 12-item summed scale to make comparisons across the
subgroups tested in this research. This comparability of scores has important research
design advantages; for example, it enables designs that require comparisons across a
wide variety of population subgroups. Also, it enhances the additivity of research by
improving confidence that findings from different studies are comparable.

Table 6. Summary of Research Results

H1-POS: Item-factor loadings are equivalent across managerial, professional, and operating subgroups.
  Statistical findings: Not rejected. The 12-item EUCS instrument is tau-equivalent across managerial, professional, and operating respondents.
  Implications: The 12-item summed EUCS scale and the five first-order factors are comparable (i.e., in the same scale) across managerial, professional, and operating respondents.

H2-POS: Structural weights are equivalent across managerial, professional, and operating subgroups.
  Statistical findings: Rejected. The operating group has a significantly higher structural weight for accuracy than the managerial or professional groups. The hypothesis (H3-POS) that the other four structural weights (content, format, ease of use, and timeliness) are equivalent across all three subgroups was not rejected. With the operating group omitted, the hypothesis (H4-POS) that all five structural weights are equivalent across two groups (managerial and professional) was not rejected.
  Implications: Accuracy is more important or central to user satisfaction for operating personnel than it is for managerial or professional respondents. Content, format, ease of use, and timeliness are equally important or central to user satisfaction across managerial, professional, and operating subgroups. EUCS factor scores for managerial and professional groups are comparable (i.e., in the same scale); EUCS factor scores for operating respondents are not comparable with managerial or professional respondents.

H1-APPL: Item-factor loadings are equivalent across transaction processing, database, and decision support subgroups.
  Statistical findings: Not rejected. The 12-item EUCS instrument is tau-equivalent across decision support, database, and transaction processing applications.
  Implications: The 12-item summed EUCS scale and the five first-order factors are comparable (i.e., in the same scale) across decision support, database, and transaction processing applications.

H2-APPL: Structural weights are equivalent across transaction processing, database, and decision support subgroups.
  Statistical findings: Rejected. The database group has a significantly higher structural weight for accuracy than the transaction processing or decision support groups. The hypothesis (H3-APPL) that the structural weights for content, format, ease of use, and timeliness (accuracy exempted) are equivalent across all three subgroups was not rejected. With the database group omitted, the hypothesis (H4-APPL) that all five structural weights are equivalent across two groups (decision support and transaction processing) was not rejected.
  Implications: Accuracy is more important or central to user satisfaction for database applications than for decision support or transaction processing applications. Content, format, ease of use, and timeliness are equally important or central to user satisfaction across decision support, database, and transaction processing applications. EUCS factor scores are comparable (i.e., in the same scale) across decision support and transaction processing applications; EUCS factor scores for database respondents are not comparable with decision support or transaction processing applications.

H1-PLAT: Item-factor loadings are equivalent across mainframe/mini and microcomputer platforms.
  Statistical findings: Not rejected. The 12-item EUCS instrument is tau-equivalent across mainframe/mini and microcomputer applications.
  Implications: The 12-item summed EUCS scale and the five first-order factors are comparable (i.e., in the same scale) across mainframe/mini and microcomputer platforms.

H2-PLAT: Structural weights are equivalent across mainframe/mini and microcomputer platforms.
  Statistical findings: Not rejected. Ease of use has a higher structural weight for mainframe/mini computers, but this difference was not statistically significant.
  Implications: EUCS factor scores are comparable across mainframe/mini and microcomputer platforms.

H1-MODE: Item-factor loadings are equivalent across analyst, personal, and other end-user modes of development.
  Statistical findings: Not rejected. The 12-item EUCS instrument is tau-equivalent across analyst, personal, and other end-user modes of development.
  Implications: The 12-item summed EUCS scale and the five first-order factors are comparable (i.e., in the same scale) across analyst, personal, and other end-user modes of development.

H2-MODE: Structural weights are equivalent across analyst, personal, and other end-user modes of development.
  Statistical findings: Rejected. For all five first-order factors, personally developed applications have substantially lower structural weights. With the personally developed group omitted, the hypothesis (H3-MODE) that all five structural weights are equivalent across two groups (analyst and other end-user developed) was not rejected.
  Implications: EUCS factor scores are comparable (i.e., in the same scale) across analyst and other end-user modes of development; EUCS factor scores for personally developed applications are not comparable with the other modes.

This advantage may be specious if the structural weights of the first-order factors
vary across subgroups. In that case, use of the 12-item summed scale will result in
comparable, but somewhat inaccurate, measurement. EUCS factor scores may be more accurate be-
cause they capture differences in the meaning of user satisfaction across subgroups.
However, if two subgroups have different structural weights for the five first-order
factors, the EUCS factor scores for the two subgroups are in different scales and are
not comparable. If incorrect structural weights are used, the overall EUCS scores are
not necessarily more accurate than the 12-item summed scale.
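To illustrate the tradeoff numerically, the sketch below uses hypothetical item responses and a simplified gamma-weighted average rather than the model-based factor scores produced by LISREL; it contrasts the 12-item summed scale with composites built from the subgroup structural weights in Table 5. Because the professional and operating groups weight accuracy differently, identical responses yield composites that are not on a common scale, whereas the summed scale is unaffected.

```python
import numpy as np

# Hypothetical responses (1-5 scale) to the 12 EUCS items, grouped by
# first-order factor: content (4 items), accuracy (2), format (2),
# ease of use (2), and timeliness (2).
items = {
    "content":     np.array([4, 4, 5, 4]),
    "accuracy":    np.array([3, 4]),
    "format":      np.array([4, 5]),
    "ease_of_use": np.array([5, 4]),
    "timeliness":  np.array([3, 3]),
}

# 12-item summed scale: on the same metric for every subgroup because
# the item-factor loadings were found to be invariant.
summed_scale = sum(scores.sum() for scores in items.values())

# Structural weights (gammas) for two subgroups, taken from Table 5.
gammas_professional = {"content": 0.94, "accuracy": 0.72, "format": 0.93,
                       "ease_of_use": 0.72, "timeliness": 0.85}
gammas_operating = {"content": 0.94, "accuracy": 1.02, "format": 1.03,
                    "ease_of_use": 0.66, "timeliness": 0.89}

def weighted_composite(item_scores, gammas):
    """Illustrative composite: gamma-weighted average of the five factor means."""
    total = sum(gammas[f] * item_scores[f].mean() for f in item_scores)
    return total / sum(gammas.values())

print("12-item summed scale:", summed_scale)  # 48 for any subgroup
print("Professional weights:", round(weighted_composite(items, gammas_professional), 3))
print("Operating weights:   ", round(weighted_composite(items, gammas_operating), 3))
# The two composites differ even though the responses are identical,
# which is why scores built with subgroup-specific weights are not
# directly comparable across subgroups.
```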
There are also operational problems that compromise accurate measurement when EUCS
factor scores are used. A typical research project seldom has enough re-
spondents in each of the subgroups suggested by the demographics of the sample to
adequately test for the invariance of structural weights. Except for invariance studies
like this one, researchers can expect to have little prior knowledge of the differences
in structural weights for population subgroups in their studies.
In deciding which scaling method to use, managers and researchers may have to
consider the tradeoff between comparability and accuracy for their situation. Most
managers may find the 12-item summed scale preferable because of its comparability
and ease of use. Few practitioners would want to bother with the complexities of
calculating EUCS factor scores. Normally, managers wish to compare user satisfac-
tion ratings for their portfolio of applications, identifying the worst and the best ap-
plications. The 12-item summed scale’s accuracy is more than adequate for this
purpose.
The selection of a scaling technique for researchers will depend upon their research
question. If the question focuses on the determinants of user satisfaction within a
specific population subgroup, factor scores would provide greater assurance of accu-
racy. If the research design includes respondents from heterogeneous subgroups and
the comparability and accuracy of satisfaction scores are absolutely essential, we
recommend using the five key satisfaction attributes (i.e., content, accuracy, format,
ease of use, and timeliness). If the 12-item summed scale is used for an overall mea-
sure of user satisfaction, the researcher should be aware that this scale is comparable,
but it may not reflect changes in the meaning of user satisfaction across subgroups. If
the researcher is using the overall EUCS scores with heterogeneous subgroups, each
subgroup should be analyzed separately and the similarity of structural weights of the
first-order factors across subgroups should be examined before combining samples.
When data from different groups of respondents are combined to study the rela-
tionship between various variables, an underlying assumption is that all variables
have the same meaning across subgroups. The results of this study will enhance re-
searchers’ confidence in the pooling of responses from subgroups that have been
shown to have equivalent structural weights. For example, responses can be pooled
across managerial and professional respondents, across decision support and transac-
tion processing applications, across personal computer and mainframe/mini, and across
analyst developed applications and applications developed by other end users. How-
ever, one should be cautious in pooling responses from operating respondents, data-
base applications, and personally developed applications with other subgroups in their
respective dimensions. User satisfaction seems to have a slightly different meaning in
these three population subgroups.

The Meaning of User Satisfaction Across Population Subgroups


As suggested by theory and implied by research developing user satisfaction instru-
ments for specific population subgroups, we find that the structural weights for some
first-order factors (e.g., accuracy) are not equivalent across subgroups. Since these
structural weights define the meaning of user satisfaction for each subgroup, we
conclude that user satisfaction is, as suggested by Zmud et al. [90], a context-sensi-
tive measure of system success. The meaning of user satisfaction may vary across
population subgroups; it is affected by the context or task requirements of the re-
spondents.
Except for applications personally developed by the end user, the differences in
structural weights appear to be confined to the accuracy factor. Accuracy is more
important to operating personnel and database users than to the others. System de-
signers need to be aware that accuracy is a key satisfaction factor in the design of
applications for these two population subgroups. Questionnaire responses only indi-
cate how satisfied users are with the accuracy of their applications; they do not hint at
the heightened importance of this attribute for user satisfaction among operating re-
spondents and database users.
Existing research on specialized user satisfaction instruments for decision support,
corporate database, and end-user development has only partially mirrored the significant
differences in the meaning of user satisfaction between population subgroups that this
research identifies. Researcher intuition that a specific subgroup is a unique con-
text, and requires the development of a specialized user satisfaction instrument, has
not always been justified.
The finding that the structural weights for database applications are not equivalent
with those for decision support or transaction processing applications supports
Goodhue’s [40] development of a specific instrument for database applications.
Goodhue’s instrument includes factors that may have diagnostic value, providing fur-
ther justification for the use of an instrument designed specifically for a corporate
database environment.
In contrast, among those who study decision support systems, intuition concerning
the need for a measure of user satisfaction in a decision support context versus a
transaction processing context is not supported by our research results. User satisfac-
tion appears to have the same meaning and measurement for both decision support
and transaction processing applications.
Also, previous instrument development efforts have not focused on the operating
respondent subgroup, where there is a clear need for differential measurement. Dif-
ferences in structural weights between managerial/professional and operating respondents
appear to reflect differences in their job responsibilities. As suggested by theory
[42], the nature of information required to support operational-level activities (i.e.,
greater detail) causes end users in this population subgroup to place greater weight on
accuracy as a component of user satisfaction.
Finally, personally developed applications have substantially lower structural weights
than those developed by analysts or other end users for all five first-order factors.
These consistently lower structural weights on all first-order factors imply that user
satisfaction has a substantially different meaning in this personally developed con-
text. Thus, this research supports Rivard and Huff’s [76] work on developing a user
satisfaction instrument specifically designed for the end-user development context.
This research also raises interesting and important theoretical questions about the
meaning of user satisfaction in the context of personally developed applications. Rivard
and Huff [76] do not include information product or ease-of-use dimensions in their
instrument for measuring the satisfaction of end-user developers. Instead, they focus on
factors such as satisfaction with independence from data processing and satisfaction
with support from data processing. Perhaps when the end users develop the informa-
tion product themselves, they see information product attributes as representing their
own work and, thus, as being less central to their satisfaction with the application itself.
Should other factors such as satisfaction with independence from data processing,
satisfaction with support, pride in ownership, employee empowerment, or self-effi-
cacy be included as essential components of user satisfaction in personally developed
applications? Pride in authorship or ownership may be inherent components of the
satisfaction users associate with personally developed applications. Personally devel-
oped applications are tools for doing work that is under the control of the individual
end user. They enhance employee empowerment. Employee empowerment may be
part of the satisfaction end users realize when using a personally developed applica-
tion. Further research is needed to improve our understanding of user satisfaction in
the context of personally developed applications.

Conclusions
THIS RESEARCH HAS ILLUSTRATED that the establishment of the construct validity of
user satisfaction within a population does not necessarily assure that it has the same
meaning in each subgroup. The common experiences or frames of reference of de-
mographic categories of respondents shape the meaning of user satisfaction. For the
most part, these are differences of degree (weighting variation among subgroups),
and our results do not suggest the need for different factors. A notable exception is
personally developed applications. While content, accuracy, format, ease of use, and
timeliness are related to user satisfaction, the consistently lower structural weights
suggest that additional factors may be required to adequately capture the meaning of
user satisfaction in this personally developed context.
The results also indicate that the structure of the EUCS instrument—that is, its five
first-order factors—holds in all subgroups. Content, accuracy, format, ease of use,
and timeliness are robust or tau-equivalent across all subgroups tested. They can be
used in research designs without concerns for the accuracy or comparability (i.e.,
scale differences) of scores. We recommend using these five key satisfaction attributes
to make comparisons in diverse samples.
Improvements in measurement are the cause, not the consequence, of progress in
information systems research. This research has illustrated that MIS researchers need
to pay greater attention to issues of scaling. There are potential problems with the
accuracy or comparability of overall user satisfaction scores when either the 12-item
summed EUCS scale or the overall EUCS scores are used to make comparisons across
the variety of population subgroups present in organizations. Where an overall mea-
sure of user satisfaction is essential to exploring a particular research question, the
choice of scaling method should receive adequate attention in crafting a research
design.

REFERENCES
1. Alloway, R., and Quillard, J. User managers’ systems needs. MIS Quarterly, 7, 2 (June
1983), 27–41.
2. Alter, S. How effective managers use information systems. Harvard Business Review,
54, 6 (1976), 97–106.
3. Alter, S. A taxonomy of decision support systems. Sloan Management Review, 19, 1
(1977), 39–56.
4. Alter, S. Development patterns for decision support systems. MIS Quarterly, 2, 3 (Sep-
tember 1978), 33–42.
5. Alwin, D.F., and Jackson, D.J. Applications of simultaneous factor analysis to issues of
factorial invariance. In D.J. Jackson and E.P. Borgatta (eds.), Factor Analysis and Measure-
ment in Sociological Research: A Multidimensional Perspective. Beverly Hills, CA: Sage, 1981,
pp. 249–279.
6. Bagozzi, R.P., and Yi, Y. On the evaluation of structural equation models. Journal of
Academy of Marketing Science, 16, 1 (1988), 74–94.
7. Bailey, J.E., and Pearson, S.W. Development of a tool for measuring and analyzing
computer user satisfaction. Management Science, 29, 5 (May 1983), 530–545.
8. Bailey, R. Human Performance Engineering: A Guide for System Designers. Englewood
Cliffs, NJ: Prentice Hall, 1982.
9. Baroudi, J.J., and Orlikowski, W.A. A short-form measure of user information satisfac-
tion: A psychometric evaluation and notes on use. Journal of Management Information Sys-
tems, 4, 4 (Spring 1988), 44–59.
10. Bejar, I. Biased assessment of program impact due to psychometric artifacts. Psychologi-
cal Bulletin, 87, 3 (May 1980), 513–524.
11. Bentler, P.M. Theory and Implementation of EQS: A Structural Equations Program. Los
Angeles: BMDP Statistical Software, 1988.
12. Bentler, P.M. Comparative fit indexes in structural models. Psychological Bulletin, 107,
2 (1990), 238–246.
13. Bentler, P.M. On the fit of models to covariances and methodology to the bulletin. Psy-
chological Bulletin, 112, 3 (1992), 400–404.
14. Bentler, P.M., and Bonett, D.G. Significance tests and goodness-of-fit in the analysis of
covariance structure. Psychological Bulletin, 88, 3 (1980), 588–606.
15. Bhattacherjee, A. Understanding information systems continuance: An expectation-con-
firmation model. MIS Quarterly, 25, 3 (September 2001), 351–370.
16. Bollen, K.A. Structural Equations with Latent Variables. New York: Wiley, 1989.
17. Brown, M.W., and Cudeck, R. Alternative ways of assessing model fit. In K.A. Bollen
and J.S. Long (eds.), Testing Structural Equation Models. Newbury Park, CA: Sage, 1993, pp.
136–162.
18. Byrne, B.M. Strategies in testing for an invariant second-order factor structure: A com-
parison of EQS and LISREL. Structural Equation Modeling, 2, 1 (1995), 53–72.
19. Byrne, B.M. Structural Equation Modeling with LISREL, PRELIS, and SIMPLIS: Basic
Concepts, Applications, and Programming. Mahwah, NJ: Lawrence Erlbaum, 1998.
20. Byrne, B.M., and Shavelson, R.J. Adolescent self-concept: Testing the assumption of
equivalent structure across gender. American Educational Research Journal, 24, 3 (Fall 1987),
365–385.
21. Chin, W., and Newsted, P. The importance of specification in causal modeling: the case
of end-user computing satisfaction. Information Systems Research, 6, 1 (March 1995), 73–81.
22. Conrath, D., and Mignen, O. What is being done to measure user satisfaction with EDP/
MIS. Information & Management, 19, 1 (1990), 7–19.
23. Davis, F.; Bagozzi, R.; and Warshaw, P. User acceptance of computer technology: A com-
parison of two theoretical models. Management Science, 35, 8 (1989), 982–1003.
24. DeLone, W.H., and McLean, E.R. Information system success: The quest for the depen-
dent variable. Information Systems Research, 3, 1 (March 1992), 60–95.
25. DeLone, W.H., and McLean, E.R. The DeLone and McLean model of information sys-
tems success: A ten-year update. Journal of Management Information Systems, 19, 4 (Spring
2003), 9–30.
26. Dix, A.; Finlay, J.; Abowd, G.; and Beale, R. Human Computer Interaction. Englewood
Cliffs, NJ: Prentice Hall, 1993.
27. Doll, W.J., and Torkzadeh, G. A discrepancy model of end-user computing involvement.
Management Science, 35, 10 (October 1989), 1151–1171.
28. Doll, W.J., and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS
Quarterly, 12, 2 (June 1988), 259–274.
29. Doll, W.J., and Xia, W. Confirmatory factor analysis of the end-user computing satisfac-
tion instrument: A replication. Journal of End User Computing, 9, 2 (Spring 1997), 24–31.
30. Doll, W.J.; Hendrickson, A.; and Deng, X. Using Davis’s perceived usefulness and ease
of use instruments in decision making: A multigroup invariance analysis. Decision Sciences,
29, 4 (December 1998), 839–870.
31. Doll, W.J.; Xia, W.; and Torkzadeh, G. A confirmatory factor analysis of the EUCS
instrument. MIS Quarterly, 18, 4 (December 1994), 453–461.
32. Drasgow, F., and Kanfer, R. Equivalence of psychological measurement in heteroge-
neous populations. Journal of Applied Psychology, 70, 4 (1985), 662–680.
33. Essex, P.A., and Magal, S.R. Determinants of information center success. Journal of
Management Information Systems, 15, 2 (Fall 1998), 95–117.
34. Etezadi-Amoli, J., and Farhoomand, A. A structural model of end user computing satis-
faction and user performance. Information & Management, 30, 2 (1996), 65–73.
35. Feltham, G. The value of information. Accounting Review, 43, 4 (October 1968), 684–696.
36. Gallagher, C. Perceptions of the value of a management information system. Academy of
Management Journal, 17, 1 (March 1974), 46–55.
37. Gatian, A.W. Is user satisfaction a valid measure of system effectiveness? Information &
Management, 26, 3 (1994), 119–131.
38. Gelderman, M. The relation between user satisfaction, usage of information systems,
and performance. Information & Management, 34, 1 (1998), 11–18.
39. Gelderman, M. Task difficulty, task variability and satisfaction with management sup-
port systems. Information & Management, 39, 7 (July 2002), 593–603.
40. Goodhue, D.L. Supporting users of corporate data: The effect of I/S policy choices.
Ph.D. dissertation, MIT, Cambridge, MA, 1988.
41. Goodhue, D.L., and Thompson, R.L. Task-technology fit and individual performance.
MIS Quarterly, 19, 2 (1995), 213–236.
42. Gorry, G., and Scott-Morton, M.S. A framework for management information systems.
Sloan Management Review, 13, 1 (Fall 1971), 55–70.
43. Gorsuch, R.L. A comparison of biquartimin, maxplane, promax, and varimax. Educa-
tional and Psychological Measurement, 30, 4 (Winter 1970), 861–872.
44. Hardgrave, B.C., and Wilson, R.L. Toward a contingency model for selecting an infor-
mation system prototyping strategy. Journal of Management Information Systems, 16, 2 (Fall
1999), 113–136.
45. Harris, M., and Schaubroeck, J. Confirmatory modeling in organizational behavior/human
resource management: Issues and applications. Journal of Management, 16, 2 (1990), 337–360.
46. Hendrickson, A.R., and Glorfeld, K. On the repeated test-retest reliability of the end-user
computing satisfaction instrument. Decision Sciences, 25, 4 (July–August 1994), 655–667.
47. Igbaria, M. End-user computing effectiveness: A structural equation model. Omega, 18,
6 (1990), 637–652.
48. Igbaria, M., and Zviran, M. End-user effectiveness: A cross-cultural examination. Omega,
19, 5 (1991), 369–379.
49. Ives, B., and Olson, M. User involvement and MIS success: A review of research. Man-
agement Science, 30, 5 (May 1984), 586–603.
50. Ives, B.; Olson, M.; and Baroudi, J.J. The measurement of user information satisfaction.
Communications of the ACM, 26, 10 (October 1983), 785–793.
51. Joreskog, K. Simultaneous factor analysis in several populations. In K. Joreskog and D.
Sorbom (eds.), Advances in Factor Analysis and Structural Equation Modeling. Cambridge,
MA: Abt Books, 1979, pp. 189–206.
52. Joreskog, K., and Sorbom, D. LISREL VIII User’s Guide. Mooresville, IN: Scientific
Software, 1993.
53. Kim, S., and McHaney, R. Validation of the end-user computing satisfaction instrument
in case tool environments. Journal of Computer Information Systems, 41, 1 (Fall 2000), 49–55.
54. Klein, G.; Jiang, J.J.; and Sobol, M.G. A new view of IS personnel performance evalua-
tion. Communications of the ACM, 44, 6 (June 2001), 95–101.
55. Klenke, K. Construct measurement in management information systems: A review and
critique of user satisfaction and user involvement instruments. INFOR, 30, 4 (November 1992),
325–348.
56. Lin, W.T., and Shao, B.B.M. The relationship between user participation and system
success: A simultaneous contingency approach. Information & Management, 37, 6 (Septem-
ber 2000), 283–295.
57. Long, J.S. Confirmatory Factor Analysis: A Preface to LISREL. Beverly Hills, CA: Sage,
1983.
58. Marcoulides, G.A., and Hershberger, S.L. Multivariate Statistical Methods: A First Course.
Mahwah, NJ: Lawrence Erlbaum, 1997.
59. Marsh, H.W. The structure of masculinity/femininity: An application of confirmatory
factor analysis to higher-order factor structures and factorial invariance. Multivariate Behav-
ioral Research, 20, 4 (October 1985), 427–449.
60. Marsh, H.W. The factorial invariance of responses by males and females to a multidi-
mensional self-concept instrument: Substantive and methodological issues. Multivariate Be-
havioral Research, 22, 4 (1987), 457–480.
61. Marsh, H.W. Confirmatory factor analysis models of factorial invariance: A multifaceted
approach. Structural Equation Modeling, 1, 1 (1994), 5–34.
62. Marsh, H.W., and Hocevar, D. Application of confirmatory factor analysis to the study of
self-concept: First and higher order factor models and their invariance across groups. Psycho-
logical Bulletin, 97, 3 (1985), 562–582.
63. Marsh, H.W.; Balla, J.R.; and McDonald, R.P. Goodness-of-fit indexes in confirmatory
factor analysis: The effects of sample size. Psychological Bulletin, 103, 3 (May 1988), 391–410.
64. McHaney, R., and Cronan, T.P. Computer simulation success: On the use of the end-user
computing satisfaction instrument: A comment. Decision Sciences, 29, 2 (Spring 1998),
525–536.
65. McHaney, R., and Cronan, T.P. Toward an empirical understanding of computer simula-
tion implementation success. Information & Management, 37, 3 (April 2000), 135–151.
66. McHaney, R., and Hightower, R. EUCS test-retest reliability in representational model
decision support systems. Information & Management, 36, 2 (August 1999), 109–119.
67. McHaney, R.; Hightower, R.; and Pearson, J. A validation of the end-user computing
satisfaction instrument in Taiwan. Information & Management, 39, 6 (May 2002), 503–601.
68. McKeen, J.D., and Guimaraes, T. Successful strategies for user participation in systems
development. Journal of Management Information Systems, 14, 2 (Fall 1997), 133–150.
69. McKinney, V.; Yoon, K.; and Zahedi, F.M. The measurement of Web-customer satisfac-
tion: An expectation and disconfirmation approach. Information Systems Research, 13, 3 (Sep-
tember 2002), 296–315.
70. Medsker, G.J.; Williams, L.J.; and Holahan, P.J. A review of current practices for evalu-
ating causal models in organizational behavior and human resources management research.
Journal of Management, 20, 2 (1994), 439–464.
71. Melone, N.P. A theoretical assessment of the user-satisfaction construct in information
systems research. Management Science, 36, 1 (January 1990), 76–91.
72. Miller, H. The multiple dimensions of information quality. Information Systems Man-
agement, 13, 2 (1996), 79–82.
73. Rai, A.; Lang, S.S.; and Welker, R.B. Assessing the validity of IS success models: An
empirical test and theoretical analysis. Information Systems Research, 13, 1 (March 2002),
50–69.
74. Raymond, L. Validating and applying user satisfaction as a measure of MIS success in
small organizations. Information & Management, 12, 4 (April 1987), 173–179.
75. Rentz, J.O. Generalizability theory: A comprehensive method for assessing and improv-
ing the dependability of marketing measures. Journal of Marketing Research, 24, 1 (1987),
19–28.
76. Rivard, S., and Huff, S.L. Factors of success for end-user computing. Communications
of the ACM, 31, 5 (1988), 552–561.
77. Rivard, S.; Poirier, G.; Raymond, L.; and Bergeron, F. Development of a measure to
assess the quality of user-developed applications. Database for Advances in Information Sys-
tems, 28, 3 (1997), 44–58.
78. Sanders, L.G. MIS/DSS success measure. Systems, Objectives, Solutions, 4, 1 (1984),
29–34.
79. Sanders, L.G. Issues and instruments for measuring system success. Working Paper,
Jacobs Management Center, SUNY–Buffalo, September 1990.
80. Sanders, L.G., and Courtney, J.F. A field study of organizational factors influencing DSS
success. MIS Quarterly, 9, 1 (March 1985), 77–92.
81. Smith, P.C.; Kendall, L.; and Hulin, C.L. The Measurement of Satisfaction in Work and
Retirement. Chicago: Rand McNally, 1969.
82. Steiger, J.H. Ez-Path: A Supplementary Module for SYSTAT and SYGRAPH. Evanston,
IL: SYSTAT, 1989.
83. Subramanian, A., and Nilakanta, S. Measurement: A blue print for theory building in
MIS. Information & Management, 26, 1 (1994), 13–20.
84. Swanson, E.B. Management information systems: Appreciation and involvement. Man-
agement Science, 21, 2 (October 1974), 178–188.
85. Torkzadeh, G., and Doll, W.J. Test-retest reliability of the end-user computing satisfac-
tion instrument. Decision Sciences, 22, 1 (1991), 26–37.
86. Tucker, L., and Lewis, C. A reliability coefficient for maximum likelihood factor analy-
sis. Psychometrika, 38, 1 (March 1973), 1–10.
87. Wheaton, B. Assessment of fit in overidentified models with latent variables. Sociologi-
cal Methods and Research, 16, 1 (August 1987), 118–154.
88. Wilken, P.H., and Blalock, H.M., Jr. The generalizability of indirect measures to com-
plex situations. In G. Bohrnstedt and E. Borgatta (eds.), Social Measurement: Current Issues.
Beverly Hills, CA: Sage, 1981, pp. 39–62.
89. Zmud, R.W., and Boynton, A.C. Survey measures and instruments in MIS: Inventory
and appraisal. In K.L. Kraemer and J.I. Cash, Jr. (eds.), The Information Systems Research
Challenge: Survey Research Methods. Boston: Harvard Business School Press, 1991, pp.
149–180.
90. Zmud, R.W.; Sampson, J.P.; Reardon, R.C.; Lenz, J.G.; and Byrd, T.A. Confounding
effects of construct overlap: An example from IS user satisfaction theory. Information Technol-
ogy and People, 7, 2 (1994), 29–45.
91. Zwass, V. Foundations of Information Systems. Boston: Irwin/McGraw-Hill, 1998.