
Measurement and Scaling Concepts

At-a-glance

I. WHAT IS TO BE MEASURED?
A. Concepts
B. Operational Definitions

II. RULES OF MEASUREMENT

III. TYPES OF SCALES


A. Nominal scale
B. Ordinal scale
C. Interval scale
D. Ratio scale
E. Mathematical and statistical analysis of scales

IV. INDEX MEASURES

V. THREE CRITERIA FOR GOOD MEASUREMENT


A. Reliability
B. Validity
1. Face or content validity
2. Criterion validity
3. Construct validity
4. Convergent validity
5. Discriminant validity
C. Reliability versus validity
D. Sensitivity

Lecture Outline

I. WHAT IS TO BE MEASURED?

Any researcher has the opportunity to select a measuring system. Unfortunately, many measurement scales
used in business research are not directly comparable. The first question the researcher must ask is “What is
to be measured?” This question is not as simple as it first seems. A precise definition of the concept may
require a description of how it will be measured, and there is frequently more than one way of measuring a
concept. Further, true measurement of concepts requires a process of precisely assigning scores or numbers to
the attributes of people or objects. To have precise measurement in business research requires a careful
conceptual definition, an operational definition, and a system of consistent rules for assigning numbers or
scales.

A. Concepts: Before the measurement process can occur, the researcher has to identify and define the
concepts relevant to the problem. A concept (or construct) is a generalized idea about a class of
objects, attributes, occurrences, or processes. Concepts such as brand loyalty, personality, and so on,
present great problems in terms of definition and measurement.

B. Operational definitions: Concepts must be made operational in order to be measured. An operational definition gives meaning to a concept by specifying the activities or operations necessary to measure it. It specifies what the investigator must do to measure the concept under investigation. An operational definition tells the investigator “do such-and-such in so-and-so manner.” Exhibit 13.2 presents some operational definitions and measures of job challenge from a study on the quality of life.

116 Part I Chapter Materials

II. RULES OF MEASUREMENT

A rule is a guide instructing us what to do. An example of a measurement rule might be “assign the numerals
1 through 7 to individuals according to how brand loyal they are. If the individual is an extremely brand loyal
individual, assign a 1. If the individual is a total brand switcher with no brand loyalty, assign a 7.”
Operational definitions help the researcher specify the rules for assigning numbers.

III. TYPES OF SCALES

A scale may be defined as any series of items that are arranged progressively according to value or magnitude, into which an item can be placed according to its quantification. In other words, a scale is a continuous spectrum or series of categories. The purpose of scaling is to represent, usually quantitatively, an item’s, a person’s, or an event’s place in that spectrum.

The four types of scale in business research are as follows:

A. Nominal scale: The simplest type of scale. The numbers or letters assigned to objects serve as labels for identification or classification. The first drawing in Exhibit 13.3 depicts the number 7 on a horse’s colors. This is merely a label for bettors and horse racing enthusiasts.

B. Ordinal scale: This scale arranges objects or alternatives according to their magnitude. In our race horse example, we assign a 1 to the win position, a 2 to the place position, and a 3 to the show position. A typical ordinal scale in business asks respondents to rate brands, companies, and so on as “excellent,” “good,” “fair,” or “poor.” We know that “excellent” is better than “good,” but we don’t know by how much.

C. Interval scale: Exhibit 13.3 depicts a horse race in which the win horse was two lengths ahead of the
place horse. Not only is the order of the finish known, but the distance between the horses is known.
Interval scales not only indicate order, they measure order (or distance) in units of equal intervals.
The location of the zero point is arbitrary. The classic example of an interval scale is the Fahrenheit temperature scale. If the temperature is 80°F, it cannot be said that it is twice as hot as 40°F.

D. Ratio scale: Ratio scales have absolute rather than relative quantities. For example, both money and weight are ratio scales because they possess an absolute zero and interval properties. The absolute zero represents a point on the scale where there is an absence of the given attribute. However, for most behavioral business research, interval scales are typically the best measurements.

E. Mathematical and statistical analysis of scales: The type of scale utilized in business research will
determine the form of the statistical analysis. Exhibit 13.4 shows the appropriate descriptive statistics
for each type of scale.
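For instructors who want to demonstrate the rule behind Exhibit 13.4 in class, here is a minimal Python sketch (the data, function name, and numeric coding are illustrative, not from the text): it selects the customary measure of central tendency for each scale type — mode for nominal, median for ordinal, and mean for interval or ratio data.

```python
from statistics import mean, median, mode

def central_tendency(values, scale_type):
    """Return the descriptive statistic conventionally permitted
    for a given scale type."""
    if scale_type == "nominal":
        return mode(values)        # only counting/classification is meaningful
    if scale_type == "ordinal":
        return median(values)      # order is meaningful, distances are not
    if scale_type in ("interval", "ratio"):
        return mean(values)        # equal intervals make the mean meaningful
    raise ValueError("unknown scale type: " + scale_type)

# ratings coded 1-4 ("poor" ... "excellent") are ordinal, so report the median
print(central_tendency([1, 3, 3, 4, 2], "ordinal"))   # -> 3
```

Applying a mean to nominal codes would be meaningless, which is exactly why the scale type must be identified before the statistical analysis is chosen.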

IV. INDEX MEASURES

This chapter has thus far focused on measuring a concept with a single question or a single observation.
However, measuring more complex concepts may require more than one question because the concept has
several attributes. An attribute is a single characteristic or fundamental feature pertaining to an object,
person, situation, or issue.
Chapter 13 Measurement and Scaling Concepts 117

Multi-item instruments for measuring a single concept with several attributes are called index measures, or composite measures. For example, an index of social class may be based on a weighted average of three variables: residence, occupation, and education. Asking different questions in order to measure the same concept provides a more accurate cumulative measure than does a single-item measure.
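A weighted composite of this kind is easy to demonstrate. In the sketch below, the weights and component scores are hypothetical, chosen only for illustration:

```python
def composite_index(scores, weights):
    """Combine several attribute scores into one index as a weighted average."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same attributes")
    total_weight = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total_weight

# hypothetical 1-7 component scores and illustrative weights
weights = {"residence": 0.3, "occupation": 0.4, "education": 0.3}
scores = {"residence": 5, "occupation": 6, "education": 4}
print(round(composite_index(scores, weights), 2))  # -> 5.1
```

Combining several attribute scores into one number this way is what distinguishes an index measure from a single-item measure.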

V. THREE CRITERIA FOR GOOD MEASUREMENT

There are three major criteria for evaluating measurements:

A. Reliability: Reliability applies to a measure when similar results are obtained over time and across
situations. It is the degree to which measures are free from random error and, therefore, yield
consistent results.

There are two dimensions of reliability: repeatability and internal consistency. Assessing the repeatability of a measure is the first aspect of reliability. The test-retest method involves administering the same scale or measurement to the same respondents at two separate points in time to test for stability. If the measure is stable over time, the repeated test administered under similar conditions should obtain similar results. High stability correlation, or consistency between the two measures at time one and time two, indicates a high degree of reliability. There are two problems with measures of test-retest reliability: first, the first measure may sensitize the respondents to their participation in a research project and subsequently influence the results of the second measure. Second, if the duration of the time period between measures is long, there may be attitude change, or some other form of maturation, of the subjects that will affect the responses.
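Test-retest stability is usually summarized as the correlation between the two administrations. Here is a minimal sketch with hypothetical brand-loyalty scores for five respondents (the data and helper function are invented for illustration):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; used here as a test-retest stability coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical brand-loyalty scores for five respondents, measured twice
time1 = [7, 5, 6, 2, 4]
time2 = [6, 5, 7, 3, 4]
print(round(pearson_r(time1, time2), 2))  # -> 0.9, a high stability correlation
```

A coefficient near 1 suggests a stable measure; a low coefficient may reflect random error, sensitization, or maturation between the two administrations.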

The second underlying dimension of reliability concerns the homogeneity of the measure. An attempt to measure an attitude may require asking several questions or a battery of scale items. To measure the internal consistency of a multiple-item measure, scores on subsets of items within the scale must be correlated. The split-half method, in which a researcher compares the results from one half of the scale items with the results from the other half, is the most basic method for checking internal consistency.

The equivalent-form method is utilized when two alternative instruments are designed to be as
equivalent as possible. If there is a high correlation between the two scales, the researcher can
conclude that the scales are reliable. However, if there is a low correspondence, the researcher will be
unsure as to whether the measure has intrinsically low reliability, or whether the particular
equivalent-form has failed to be similar to the other form.

Both of the above methods assume that the concept is unidimensional; they measure homogeneity rather than stability over time.

B. Validity: The purpose of measurement is to measure what we intend to measure. For example, in measuring intention to buy, there could be a systematic bias to identify brands “I wish I could afford” rather than the brand usually purchased. Validity addresses the problem of whether or not a measure does indeed measure what it purports to measure; if it does not, conclusions drawn from that measure will be flawed.

Researchers attempt to provide some evidence of a measure’s degree of validity. There are three basic
approaches to dealing with the issue of validity:

1. Face or content validity: This refers to the subjective agreement of professionals that a scale
logically appears to be accurately reflecting what it purports to measure.

2. Criterion validity: Criterion validity is an attempt by researchers to answer the question “Does
my measure correlate with other measures of the same construct?” Consider the physical concept
of length. If a new measure of length were developed, finding that the new measure correlated

with other measures of length would provide some assurance that the measure was valid.
Criterion validity may be classified as either concurrent validity (when the measure is taken at
the same time as the criterion measure) or predictive validity (when the measure predicts a
future event).

3. Construct validity: Construct validity is established by the degree to which the measure
confirms a network of related hypotheses generated from a theory based on the concept. In its
simplest form, if the measure behaves the way it is supposed to in a pattern of intercorrelation
with a variety of other variables, then there is evidence for construct validity. This is a complex
method of establishing validity and of less concern to the applied researcher than to the basic
researcher.

4. Convergent and discriminant validity: Convergent validity is similar to criterion validity in that a new measure is expected to predict or converge with similar measures. A measure has discriminant validity when it has a low correlation with measures of dissimilar concepts.

C. Reliability versus validity: The concepts of reliability and validity should be compared. Reliability,
although necessary for validity, is not in itself sufficient. The differences between reliability and
validity can be illustrated using the rifle target in Exhibit 13.5.

D. Sensitivity: The sensitivity of a scale is important, particularly when changes in attitude, or other hypothetical constructs, are under investigation. Sensitivity refers to the ability of an instrument to accurately measure variability in stimuli or responses. The sensitivity of a scale that is based on a single question or a single item can be increased by adding questions or items. In other words, because index measures allow for a greater range of possible scores, they are more sensitive than single-item scales.

Questions/Answers

1. What are the appropriate descriptive statistics allowable with nominal, ordinal, and interval scales?

To calculate a mean and standard deviation, interval or ratio data are required. Only nonparametric statistics are permissible with nominal and ordinal data. The detailed answer to this question is outlined in Exhibit 13.4, entitled Descriptive Statistics for Types of Scales.

2. Discuss the difference between validity and reliability.

Students often confuse these two key measurement concepts. Reliability refers to the ability of a measure to obtain similar results over time and across situations; it is the degree to which measures are free from random error and, therefore, yield consistent results. Validity refers to the measure’s ability to measure what we intend to measure.

3. What is the difference between a conceptual definition and an operational definition?

A conceptual definition is a verbal explanation of the meaning of a concept. It defines the domain of the concept and may explain what the concept is not. An operational definition gives meaning to a concept by specifying the activities and operations necessary to measure it. The operational definition is at a more concrete level than the abstract conceptual definition.

Here are two examples of conceptual definitions:

Job satisfaction may be defined as one’s affective reaction to one’s total job.

Job involvement may be defined as the extent to which one psychologically identifies with one’s work.

4. Why might a researcher wish to utilize more than one question to measure satisfaction with a job?

The answer to this question requires recognizing that there are several dimensions to job satisfaction. Like many aspects of our lives, a job may be a source of both disappointment and satisfaction at the same time. It may be appropriate to have a battery of measures to make sure that the scale measures what it is supposed to measure. Because job satisfaction is a complex, multidimensional concept, several questions may be required to design scales that are valid and reliable.

5. Comment on the validity and reliability of the following:

A) A respondent’s reporting of an intention to subscribe to Consumer Reports is highly reliable. A researcher believes that this self-report constitutes a valid measurement of dissatisfaction with the economic system and alienation from big business.

There is a problem with validity in this instance. There may be reasons for reading Consumer Reports other than alienation from big business. It has been said that a bent ruler may consistently provide the same results, but this does not necessarily indicate accuracy of measurement. That is probably the case when a researcher uses readership of this magazine as an indicator of alienation.

B) A general-interest magazine advertised that the magazine was a better advertising medium
than television programs with similar content. Research for a soft drink and other test
products indicated recall scores were higher for magazine ads than for 30-second
commercials.

This question deals with advertising effectiveness. It indirectly asks, “What is advertising effectiveness?” Recall—consumers’ ability to remember commercials—is a standard form of measuring advertising effectiveness. However, it has been argued that the persuasive power of television is substantially greater than that of magazines; television has the ability to involve the prospect. Recall may not be a valid measure of advertising effectiveness if the goal is to measure persuasiveness rather than the ability to remember ads.

C) A respondent’s report of frequency of magazine reading consistently indicates that she regularly reads Good Housekeeping and Gourmet and never reads Cosmopolitan.

This question implies a longitudinal study reporting magazine readership at several points in time. Because the answers are consistent on all occasions, the results are reliable. However, the measure may not be valid. Suppose, for example, a respondent’s longitudinal report of magazine readership gave the following responses over a one-year period: period one, never reads Cosmopolitan; period two, subscribes to Cosmopolitan; period three, occasionally reads Cosmopolitan. Such results would show a lack of reliability. While it is possible that the subject has changed her behavior radically over the course of the year, it is more likely that response bias is inherent in the question concerning the reading of Cosmopolitan. Even with consistent answers, a response bias could occur, and hence a lack of validity.

6. Indicate whether the following measures are nominal, ordinal, interval, or ratio scales.

A) Prices on the stock market are ratio scales. They have an absolute zero point.

B) Marital status, when it is classified as married or never married, is a nominal scale because it
indicates a category of marital status.

C) Whether or not a respondent has ever been unemployed is a nominal scale. The two categories are “unemployed at least once” and “never unemployed.”

D) Professional rank: assistant professor, associate professor, or professor is an ordinal scale because it indicates an ordered rank according to hierarchical status. However, depending on the context of the research, it may be considered to be a nominal scale.

E) Grades: A, B, C, D, or F.

As students know, grades show order but not necessarily equal intervals between them, so they form an ordinal scale.

8. Define the following concepts, then operationally define each concept:

A) A good bowler

Conceptually a good bowler is someone who regularly bowls and scores above average.
Operationally a good bowler might be defined as someone who bowls in a league and has a 185 average.

B) A workaholic

Conceptual definition: Most students will have a feel for this concept and, by analogy with an alcoholic, will conceptually define a workaholic as someone who works all the time, who is addicted to work, or who works to excess.

Operational definition: The answer first given is often someone who works more than x hours per week, perhaps 70 hours per week. This will not, however, tap the cognitive domain of someone who loves to work and cannot wait for Monday mornings. Nor may it tap someone’s self-perception of being a workaholic.

Some operational definitions might be answers to “How many hours per week do you work?” or “How many hours did you work this week?” or a simple self-report question, such as “Are you a workaholic?”

A series of attitude statements might also be used as an operational definition. For example:

“I love to go to work on Monday mornings.”

strongly agree, agree, disagree, strongly disagree

“Work is the most important thing in my life.”

C) Purchasing intention for a new palm-sized computer.

Purchasing intention is an individual’s plan to buy a palm-sized computer. Operationally, an individual who indicates “definitely will” or “probably will” may be classified as intending to purchase on a scale that reads: Do you plan to purchase a new palm-sized computer in the next six months?

Definitely Will
Probably Will
Uncertain
Probably Will Not
Definitely Will Not

D) A mentor

The word mentor comes from the name of Mentor, the loyal advisor of Odysseus, who was entrusted with the care and education of Telemachus during Odysseus’s adventures. In business today a conceptual definition might be a person who adopts a newcomer and provides tips on how to navigate the corporate hierarchy.

An operational definition might be a behavioral report similar to the conceptual definition: Did any one individual “take you under his/her wing” and show you how to be successful in this corporation?

E) Media skepticism

Media skepticism has been defined as the degree to which individuals are skeptical toward the reality
presented in the mass media. It varies among individuals from those who are mildly skeptical and accept
most of what they see and hear in newspapers, magazines, radio, and television, to those who completely
discount and disbelieve the facts, values, and portrayal of reality in the media.

An operational definition of this concept is as follows: Please tell me how true each statement is about the
media. Is it very true, not very true, or not at all true?

The program was not very accurate in its portrayal of the problem.

Most of the story was “staged” for entertainment purposes.

The presentation was slanted and unfair.

I think the story was fair and unbiased.

I think important facts were purposely left out of the story.

Based on M.D. Cozzens and N.S. Contractor, “The Effect of Conflicting Information on Media Skepticism,”
Communications Research (August 1987): 437-451.

F) American Dream

This is a term often used but rarely conceptually defined. Ask your students what they think the American Dream means. The answers will range from having a steady job to becoming wealthy; expect answers such as owning a home or sending one’s children to college. The professor can also point out that questions such as “Have you achieved the American Dream?” are frequently asked in political polls.

G) Alternative music

This is not in the list in the textbook, but you should have some fun with it in class.

9. Education is often used as an indicator of a person’s socioeconomic status. Historically, the number of
years of schooling completed has been recorded in the Census of Population as a measure of education.
Critics say that this measure is no longer accurate as a measure of education. Comment.

In the past the Census categorized people with 12 years of schooling as “high school graduates” and people with four years of college as “college graduates.” However, with the granting of “certificates of completion” after 12 years and “certificates of general educational development” (GED) as popular alternatives to graduation, years of schooling has become a less valid measure of education. A Census Bureau study showed that only 91 percent of those who indicated four years of college were college graduates. Thus, the problem also exists in post-secondary education. The relationship between years of schooling and degrees is weakest in graduate education, where in 1980, among people aged 25 and over who said they had attended graduate school, only 58 percent reported that they had received a graduate degree. (Based on M. F. Riche, “Making the Grade,” American Demographics, May 1987, p. 8.)

10. The number of mixed-race marriages in the United States has increased more than 100 percent since

1980. In the 2000 Census, Americans for the first time were able to identify themselves as belonging to more
than one race. Respondents could choose from these categories: white, black (African American or Negro),
American Indian or Alaska Native, Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Native
Hawaiian, Guamanian or Chamorro, Samoan, or Other. Multiracial respondents could check all categories
that apply. What are the measurement issues of the “multiracial” category?

The conceptual definitions and operational definitions of racial categories have changed because the number of mixed-race marriages has increased dramatically. Since 2000, multiracial respondents who check all categories that apply can be identified as a separate group called multiracial. Thus, the “values” or “categories” on the measure of race have changed: there is a new conceptual definition for multiracial and new operational definitions for all race categories. With multiracial as a category, results for categories such as black are no longer comparable with past censuses. Previously, someone who checked black and another race was classified as black; this is no longer the case.

11. Many Internet surveys want to know demographic characteristics of their respondents and how
technologically sophisticated they are. Create a conceptual definition of “technographics” and operationalize
it.

Demographics are characteristics of people in terms of variables such as age, income, education, etc. A simple definition of technographics might be the characteristics of people in terms of their use of technology and their attitudes toward technology. Here are some sample questions that might be used for the usage measure:

We’d like to learn a little about your computer habits at home and at work. Do you use a computer at...?
Please check all that apply.
Work
Home
Other location

Do you connect to the Internet from...? Please check all that apply.
Work
Home
Other location

Which of these types of devices do you use to connect to the Internet from home? Please check all that apply.

Palm Pilot, WinCE device, or other electronic organizer
Cell phone
Desk or laptop computer
WebTV
Internet email appliance like iOpener, iPhone, or MailStation
Other device

Here are some questions that might be used with an agreement scale to measure attitudes:

I enjoy impressing other people with the new technology products I have.
New technology products have improved my life.
I try to learn as much as possible about new technology products.
I avoid buying new technology products with the latest product innovations.

The responses are:

Strongly disagree
Slightly disagree
Slightly agree
Strongly agree

12. Two academic researchers create a psychographic scale to measure attitudes toward
downsizing/rightsizing. They do not evaluate the reliability or validity of the measuring instrument. They
submit the article about their research to a scholarly publication for review. Is this ethical?

Situations similar to this arise often in the academic world. Because of a lack of money for research, publish-or-perish time pressure, and other reasons, many marketing researchers do not follow the “textbook” process for determining the truth. Some journals will not accept this type of article. Others, however, may publish articles of this type if other factors seem favorable. Often, academics will consider articles like this to be exploratory research. However, under no circumstances should the researchers attempt to hide the fact that there were no measures of reliability or validity.
