
International Journal of Health Care Quality Assurance

Using data mining to segment healthcare markets from patients' preference perspectives
Sandra S. Liu and Jie Chen

Downloaded by Nirma University of Science and Technology At 11:21 21 January 2017 (PT)

Article information:
To cite this document:
Sandra S. Liu and Jie Chen (2009), "Using data mining to segment healthcare markets from patients'
preference perspectives", International Journal of Health Care Quality Assurance, Vol. 22 Iss 2 pp. 117-134
Permanent link to this document:
http://dx.doi.org/10.1108/09526860910944610
Downloaded on: 21 January 2017, At: 11:21 (PT)
References: this document contains references to 34 other documents.
To copy this document: permissions@emeraldinsight.com
The fulltext of this document has been downloaded 1773 times since 2009*

*Related content and download information correct at time of download.

The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0952-6862.htm

Using data mining to segment healthcare markets from patients' preference perspectives

Sandra S. Liu and Jie Chen


Department of Consumer Sciences and Retailing, Purdue University,
West Lafayette, Indiana, USA

Received 11 July 2007
Revised 19 October 2007
Accepted 6 November 2007

Abstract
Purpose: This paper aims to provide an example of how to use data mining techniques to identify
patient segments regarding preferences for healthcare attributes and their demographic characteristics.
Design/methodology/approach: Data were derived from a number of individuals who received
in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with
average linkage and Pearson correlation procedures are employed and compared to show how each
procedure best determines segmentation variables.
Findings: Data mining tools identified three differentiable segments by means of cluster analysis.
These three clusters have significantly different demographic profiles.
Practical implications: The study reveals that, compared with traditional statistical methods,
data mining provides an efficient and effective tool for market segmentation. When there are
numerous cluster variables involved, researchers and practitioners need to incorporate factor analysis
to reduce the variables so that clusters can be understood clearly and meaningfully.
Originality/value: Interest in and applications of data mining are increasing in many businesses.
However, this technology is seldom applied to healthcare customer experience management. The
paper shows that efficient and effective application of data mining methods can aid the understanding
of patient healthcare preferences.
Keywords: Data analysis, Market segmentation, Patients, United States of America
Paper type: Research paper

Introduction
Interest in and application of data mining are increasing because it enables businesses to
extract hidden information from large amounts of data so that they can better
understand their customers (Chopoorian et al., 2001). In healthcare, data mining has
also been used to improve diagnosis and treatment or to better understand patients'
waiting behaviors (Milley, 2000). However, this technology is seldom applied to
customer management or marketing within healthcare organizations (Milley, 2000).
Therefore, this study is designed to provide an example of how to use data mining
techniques for:
(1) conducting market segmentation with respect to patient preferences for
healthcare attributes; and
(2) exploring the demographic characteristics of the patient segments.
The authors acknowledge the Editors guidance and the research grant from Ascension Health,
St Louis, Missouri, USA.

International Journal of Health Care Quality Assurance
Vol. 22 No. 2, 2009
pp. 117-134
© Emerald Group Publishing Limited
0952-6862
DOI 10.1108/09526860910944610

IJHCQA
22,2

Segments are subgroups with similar patient preferences in the whole healthcare market. Successfully identifying demographically well-defined consumer segments can help hospital managers develop long-term business strategies and offer an optimal mix of products and services that meet customer needs and preferences (Woodside et al., 1988; Ross et al., 1993).
This study's contribution is to demonstrate the efficiency and effectiveness of data mining methods, specifically cluster analysis, for understanding patient healthcare preferences. Clustering and segmentation are fundamental data mining analysis tasks (Peacock, 1998). Although many statistical tools can perform cluster analysis, data mining provides numerous and irreplaceable benefits. First, the software is easy to use and, unlike traditional statistical methods and tools, data mining does not require sophisticated statistical knowledge and data preprocessing (Chen and Sakaguchi, 2000). Thus, advanced statistics training is not necessary for managers or administrators. Second, data mining tools automatically provide more informative tables and visual charts, which present data analysis results from various perspectives to facilitate understanding. Hence, data mining techniques can better support decision-making processes. Third, data mining's most attractive function is that it can be used for knowledge discovery in databases (KDD), which means that data mining is a process of discovering hidden patterns in large consolidated databases. Nowadays, businesses often face the challenge that too much data is collected, but too little information is extracted. In general, traditional statistical models can handle only limited data and test existing hypotheses (Chopoorian et al., 2001). Thus, the KDD function in data mining can be used to meet this challenge and increase businesses' competitive advantages. On the one hand, data mining can easily handle large datasets. On the other, it facilitates both uncovering relationships hidden in complex data and identifying unknown problems and opportunities.
Because data mining has such powerful data analysis functions, it has been used in a variety of businesses, among which healthcare and pharmaceuticals account for only 4 percent of all users (Calderon et al., 2003). Moreover, among all data mining applications, marketing accounts for only 5 percent (Calderon et al., 2003). In healthcare organizations, data mining is mainly used in diagnostic fields (SAS Institute Inc., 2002). Only two published studies used data mining in marketing (Rafalski, 2002; Cheng et al., 2005). Rafalski (2002) vertically integrated multiple databases through data mining tools to identify healthcare organizations' key trends and marketing opportunities. His study focused primarily on building and employing a data warehouse, not on model building. Cheng et al. (2005) applied data mining to conduct a cluster analysis. However, their cluster analyses used simple variables like length of stay. Therefore, our study's contribution is to employ data mining techniques to conduct healthcare market segmentation using complicated psychographic variables and to reveal the benefits of data mining for understanding customers' psychological needs in order to improve healthcare services. Another contribution of our study is that cluster analysis is conducted using two different processes. First, cluster analysis is performed on all healthcare attributes using general statistical software. Second, factor analysis of the survey instrument is conducted first and cluster analysis is then based on the factor scores. This second cluster analysis is performed with the popular data mining software, Enterprise Miner. The results from these two processes are compared, which provides insights for practitioners about how to choose appropriate procedures for conducting an accurate cluster analysis. Previous authors studied patient preference from one of three aspects:
(1) individual patients' involvement in medical decision making (Thompson et al., 1993);
(2) patients' benefit-seeking preferences (Woodside et al., 1988); and
(3) preference for healthcare attributes (Fletcher et al., 1983; Ross et al., 1993; Morrison et al., 2003; Finn and Lamb, 1986; Gabbott and Hogg, 1994).
Our study focuses on healthcare attribute preferences. There are several differences between our study and previous studies. First, our instrument was developed from an extensive literature search, in-depth exploratory interviews and focus groups, while instruments in previous studies were adopted from existing instruments or were based on limited literature. Ross et al.'s (1993) six-dimensional healthcare service attributes were adapted from Ware et al.'s (1983) Patient Satisfaction Questionnaire. Finn and Lamb (1986) identified their items from the Altschul (1983) and Marquis (1983) studies. Gabbott and Hogg (1994) used the SERVQUAL model to develop their scales. Although Morrison et al.'s (2003) instrument was developed both from a literature review and an exploratory study, their goal was to understand a general practitioner service while ours concentrates on inpatient services. Additionally, Morrison et al.'s sample was chosen from all residents in one region whereas ours is from several hospitals. Most studies mentioned so far conducted market segmentation. However, we are the first to use a commercially available data mining tool to illustrate how these approaches benefit healthcare market segmentation.
Literature review
Previous literature tends to define data mining narrowly, as the automated discovery of interesting but non-obvious patterns hidden in databases. Data mining can also be viewed from a much broader scope, in which relationships are confirmed or tested as well as discovered (Peacock, 1998). The KDD process is viewed from the broadest scope, which incorporates wide-ranging activities:
• data collection;
• data cleaning;
• data analysis;
• scoring the database;
• decision-making support; and
• model refinement (Peacock, 1998).
We discuss data mining from the broader scope, which has some important applications in customer relationship management or marketing:
• comprehending customers' needs and predicting their responses to new products/services and communication programs;
• identifying customers' loyalty levels and making marketing strategies to retain vulnerable customers;
• recognizing unprofitable customers; and
• market segmentation and products/services differentiation (Peacock, 1998; Cheng et al., 2005).

These applications can be realized by specific data mining functions and techniques.
Table I summarizes functions, tools and software commonly used in data mining
(Peacock, 1998; Cheng et al., 2005; SAS Institute Inc., 2002; Thelen et al., 2004). In this
table there are five fundamental functions identified:
(1) summarizing;
(2) classification;
(3) prediction;
(4) segmentation; and
(5) link analysis.
Under each function several analytical tools/techniques can be used. Moreover, a great number of data mining products have been developed by independent vendors. Among these products, Intelligent Miner, Darwin and Enterprise Miner are the most frequently used (Calderon et al., 2003). We chose Enterprise Miner.
Since data mining is synonymous with knowledge discovery processes, procedures should not be restricted to model building. Peacock (1998) noted that KDD processes include ten steps:
(1) data funneling;
(2) preprocessing;
(3) exploratory data analysis;
(4) recording and transformation;
(5) data mining discovery;
(6) data mining confirmation;
(7) model validation;
(8) model scoring;
(9) reporting results; and
(10) recalibrating models.
Among these, the first two are used for cleaning and preparing data for further analysis. Exploratory data analysis offers a glance at data patterns, trends, dispersion, etc. The remaining steps are mainly used to build, assess and refine the model, and to report the final results. Our study, on the other hand, adopts the following five steps to perform the data mining process:
(1) objective setting;
(2) problem translation;
(3) data cleaning;
(4) model building; and
(5) model assessment (Figure 1).
Our objective is to understand whether and how the healthcare market is heterogeneous in terms of patient preferences. Cluster analysis is then chosen to segment the market. The last three steps are discussed later.

Table I. Functions and techniques in data mining

Functions | Techniques
Summarising: summarised descriptions of variables and their relationships | Query tools; simple cross-tabs; visualization techniques, etc.
Classification: classifying a new subject into existing groups according to a set of predictors | Discriminant analysis; logistic regression; association rules; neural network, etc.
Prediction: predicting unknown dependent variables using one or more independent variables | OLS regression; logistic regression; discriminant analysis; association rules; decision trees; neural network; genetic algorithms
Segmentation: grouping subjects in terms of a set of variables relevant to research objectives | Cluster analysis; decision trees; neural networks; genetic algorithms
Link analysis: identifying the correlated purchase patterns using the implicit information | Association rules

Software: Structured query language (SQL); Enterprise Miner; Darwin; Clementine; Intelligent Miner; Business Miner, etc.
Patient preferences and demographic characteristics
Patient preferences or priorities can be defined as the patients' perceived importance of healthcare service aspects (Wensing et al., 1998). Previous studies segmented the healthcare market by patient preferences for healthcare attributes, and some examined the demographic characteristics of different segments. However, none applied data mining techniques. Morrison et al. (2003) conducted market segmentation based on participants' preferences for their general practitioners' (GP) attributes. They used cluster analysis to segment the market and their cluster analysis employed factor scores generated from 18 Likert-type questions. Four segments were developed and each segment's demographic characteristics were described. Specifically, people in the first segment had a lower social status and focused on atmospheric elements and convenience. The second valued high-quality services. Those in the third segment had few unique preferences while those in the fourth only preferred communications with the general practitioners.
Ross et al. (1993) used three patient satisfaction dimensions to measure patient preference and then conducted cluster analysis. Four clusters were derived:
(1) interpersonal care;
(2) access/quality;
(3) access; and
(4) quality.

Figure 1. Data mining process flowchart

The authors concluded that patients valuing an interpersonal care aspect were more
likely to be black, have a lower education and income, and were less likely to be
unemployed than those who prioritized access/quality (i.e. the group assigning little
importance to interpersonal care) or only quality. Finn and Lamb (1986) identified 15
hospital attributes found to be important in previous patient research and conducted a
factor analysis using all 15. Five factors were derived:
(1) physical comfort;
(2) me/them;
(3) quiet;
(4) mobility/accessibility; and
(5) personal attention.
They found four segments from the factor scores through cluster analysis:
(1) physical comfort;
(2) quiet environment;
(3) personal attention; and
(4) cognitive attributes.

They found no significant demographic differences among these segments. Gabbott and Hogg (1994) derived 24 primary healthcare practice attributes using SERVQUAL before conducting factor analysis and cluster analysis. Three clusters were found:
(1) healthier and emphasizing situational elements;
(2) which constituted more middle empathy and responsiveness; and
(3) upper-income men having no special preferences.
In our study, 24 healthcare service attributes are used to measure patient preferences. They are
summarized from 23 publications and cover four patient satisfaction aspects:
(1) physiological care;
(2) psychological care;
(3) physical environment; and
(4) spiritual care.
Physiological care (competence/convenience) comprises the functional attributes of clinical care provided by physicians, nurses and other staff (Otani and Kurz, 2004; Powers and Bendall, 2004; Ware and Snyder, 1975). Psychological care covers the non-functional attributes of clinical care, including empathy, respect, communication and attention (Andaleeb, 1998; Brown et al., 1999; Tomes and Ng, 1995). The physical environment includes specific attributes, such as hospital conditions (Lam, 1997; Tomes and Ng, 1995; Andaleeb, 1998; Swan et al., 2003). Finally, spiritual care addresses hospital attendance in response to patients' spiritual needs (Reed, 1992). We aimed to identify patient preference segments according to the 24 attributes and to examine the demographic characteristics of these segments. Based on the previous findings discussed above, we expect that the healthcare market is heterogeneous in terms of patient preferences for healthcare attributes and that demographic characteristics vary among different market segments.
Method and data
Our data were derived from inpatients of a US non-profit healthcare group in 2006. Our questionnaire was administered through telephone interviews with 2,000 subjects. Almost 17,000 usable questionnaires were obtained, which were then evaluated using data cleaning techniques:
• screening using two criteria: hospitalized in the past 12 months and older than 18 years; and
• removing interview subjects giving systematically biased answers, such as all 7s, and those with missing values, leaving 1,561 questionnaires for analysis.
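The two cleaning rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names (`age`, `hospitalized_past_12_months`, `ratings`) and the rule that any constant answer pattern counts as systematically biased are hypothetical choices, not the authors' actual procedure.

```python
# Minimal sketch of the screening and cleaning rules described in the text.
# The record layout and field names are hypothetical.

def eligible(respondent):
    """Screening criteria: hospitalized in the past 12 months and older than 18."""
    return respondent["hospitalized_past_12_months"] and respondent["age"] > 18

def is_usable(ratings, n_items=24):
    """Reject responses with missing values or identical answers on every item."""
    if len(ratings) != n_items or any(r is None for r in ratings):
        return False
    # Assumption: the same rating on all items (e.g. all 7s) is treated as biased.
    return len(set(ratings)) > 1

def clean(respondents):
    """Keep only respondents who pass both the screening and the quality checks."""
    return [r for r in respondents if eligible(r) and is_usable(r["ratings"])]

sample = [
    {"id": 1, "age": 54, "hospitalized_past_12_months": True, "ratings": [7] * 24},
    {"id": 2, "age": 61, "hospitalized_past_12_months": True, "ratings": [5, 7, 6] * 8},
    {"id": 3, "age": 17, "hospitalized_past_12_months": True, "ratings": [4, 6, 7] * 8},
    {"id": 4, "age": 40, "hospitalized_past_12_months": True, "ratings": [6, None, 7] * 8},
]
kept_ids = [r["id"] for r in clean(sample)]
print(kept_ids)
```

In this toy sample only respondent 2 survives: respondent 1 answers 7 throughout, respondent 3 fails the age criterion and respondent 4 has missing values.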
Measures
Our survey questions fall into two categories:
(1) demographic questions; and
(2) statements that measure healthcare service preferences.
Demographic questions include education, race or ethnicity, insurance type and occupation. The 24 statements about healthcare service attributes were developed from previous literature. These 24 statements cover all four aspects discussed previously (physiological care, psychological care, physical environment and spiritual care). Interview respondents were asked to assess the attributes' importance. A seven-point Likert scale from "not critical" to "absolutely critical" was used to rate the attributes.
Data analysis and results
Our purpose was to demonstrate the application of data mining to healthcare market segmentation. We used two data analysis procedures. In the first procedure:
(1) the segmentation variables were the patients' evaluations of the 24 healthcare attributes; and
(2) the statistical tool R was used for the analysis.
In the second procedure, the 24 attributes were first reduced to five dimensions using exploratory factor analysis, followed by cluster analysis on the factor scores. We then profiled the segments' demographics and tested demographic differences between the clusters using a contingency table.
Data analysis procedure I
Cluster analysis was conducted on the 24 attributes. Data normalization was first performed within each attribute so that each has mean 0 and variance 1. Our clustering method was hierarchical cluster analysis. Pearson correlation and average linkage were employed to measure the similarity between two cases and the distance between two groups, respectively (Gordon, 1999). The optimal number of clusters was six, as determined by the Gap statistic (Tibshirani et al., 2001). The hierarchical clustering results are shown in Table II, which lists the attributes rated most and least important by the members of each cluster. For example, patients in cluster 3 consider discharge time information an important attribute, but their involvement in decision making less so. These clusters are compared with those from procedure II later.
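The ingredients of procedure I (standardizing each attribute, measuring similarity between cases with Pearson correlation, and merging groups by average linkage) can be sketched as follows. This is a small pure-Python illustration, not the authors' R code: the toy ratings are hypothetical, the dissimilarity is assumed to be 1 minus the correlation, and the number of clusters is passed in directly rather than chosen with the Gap statistic.

```python
from math import sqrt

def zscore_columns(rows):
    """Standardize each attribute (column) to mean 0 and variance 1."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]

def pearson(a, b):
    """Pearson correlation between two cases across their attribute scores."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def average_linkage(rows, n_clusters):
    """Agglomerative clustering: dissimilarity 1 - r, average linkage between groups."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise dissimilarity between members of the two groups.
                total = sum(1.0 - pearson(rows[i], rows[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical ratings: six respondents, four attributes, two opposite preference profiles.
ratings = [
    [7, 6, 2, 1],
    [6, 7, 1, 2],
    [7, 7, 2, 2],
    [2, 1, 7, 6],
    [1, 2, 6, 7],
    [2, 2, 7, 7],
]
z = zscore_columns(ratings)
clusters = average_linkage(z, n_clusters=2)
print(clusters)
```

Correlation-based similarity groups respondents by the shape of their preference profile rather than by how high they rate everything, which is one plausible reason for preferring it to Euclidean distance on importance ratings.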
Data analysis procedure II
Exploratory factor analysis was conducted first (Table III). The criterion we used to
determine the number of factors was eigenvalue > 1 so that construct definition is
theory and data grounded (Thompson and Daniel, 1996). In practice, factor interpretability
also needs to be considered when determining the number of factors. In our study, five were identified:
(1) communication and empowerment;
(2) compassionate and respectful care;
(3) clinical reputation;
(4) care responsiveness; and
(5) efficiency.
Factor scores were saved for cluster analysis.
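As a quick check on the factor-retention rule, the eigenvalues reported in Table III can be run against the eigenvalue > 1 criterion. The sketch below is illustrative only; the share-of-variance calculation assumes that the analysis was run on 24 standardized attributes, so that total variance equals 24, which the text does not state explicitly.

```python
# Eigenvalues as reported in Table III.
eigenvalues = {
    "Communication and empowerment": 10.705,
    "Compassionate and respectful care": 1.741,
    "Clinical reputation": 1.040,
    "Care responsiveness": 0.943,
    "Efficiency": 0.889,
}

# Assumption: 24 standardized attributes, so total variance is 24.
N_ATTRIBUTES = 24

# Factors that pass the eigenvalue > 1 rule on its own.
kaiser_retained = [name for name, ev in eigenvalues.items() if ev > 1.0]

# Share of total variance attributed to each factor under the assumption above.
share = {name: ev / N_ATTRIBUTES for name, ev in eigenvalues.items()}
cumulative_share = sum(share.values())

print(kaiser_retained)
print(round(cumulative_share, 3))
```

Under the eigenvalue rule alone only the first three factors would be kept; retaining care responsiveness and efficiency as well reflects the interpretability consideration mentioned above.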
Table II. Cluster analysis results

Columns: Clusters; Most important attributes; Least important attributes
Attribute statements in the table, in source order:
• Someone is always available to you and your family to talk about your fears and concerns
• Staff always make you feel that they care about you
• Staff are good at making you feel that you can trust them and depend on them
• Doctors and nurses are very good at communicating with each other about your needs and treatment
• Staff or doctors always tell you when you can expect to go home
• Hospital has the latest and greatest treatments and equipment
• Hospital is the cleanest, best maintained and most comfortable
• Doctors and nurses are very good at communicating with each other about your needs and treatment
• You and your family are always involved in decisions about your care
• You are given all the information you need to make decisions about your care
• Your spiritual needs and preferences are always addressed
• You are able to leave the hospital as quickly as possible on the day of your discharge
• Your spiritual needs and preferences are always addressed
• Hospital is easy to get around and has helpful signs
• Registration, scheduling, and billing services are handled most efficiently
• Staff or doctors always tell you when you can expect to go home
• You and your family are always involved in decisions about your care
• Staff respect your privacy
• Staff are most polite, introduce themselves and know your name

Table III. Factor analysis results

Factors | Eigenvalues
Communication and empowerment | 10.705
Compassionate and respectful care | 1.741
Clinical reputation | 1.040
Care responsiveness | 0.943
Efficiency | 0.889

Attributes, in source order:
• Staff are best at letting you know what is wrong with you and telling you about your medical care
• Staff do the best job of listening to you
• Staff are the best at letting you and your family know what is going on, how long things will take, and why there are waiting times
• You are given all the information you need to make decisions about your care
• Staff are the best at making you feel that you can trust and depend on them
• You and your family are always involved in decisions about your care
• Doctors and nurses are very good at communicating with each other about your needs and treatment
• Staff are most polite, introduce themselves and know your name
• Your spiritual needs and preferences are always addressed
• Staff always make you feel that they care about you
• Nurses and other staff appear happy and have the most positive attitude
• Hospital is easy to get around and has helpful signs
• Staff respect your privacy
• Doctors and the hospital have the best reputation for your condition
• Hospital has the latest and greatest treatments and equipment
• You never have to wait unnecessarily
• Nurses are responsive and prompt when you need something
• You are able to leave the hospital as quickly as possible on the day of your discharge
• Staff or doctors always tell you when you can expect to go home

Cluster analysis was conducted on the five factors. Our study combines hierarchical and nonhierarchical clustering methods, a combination commonly used to attain the best cluster analysis solution (Sharma, 1996). Hierarchical clustering determines the most appropriate number of clusters while nonhierarchical clustering allocates each case to the best-fitting cluster (Sharma, 1996). Since our sample size is too large to conduct a hierarchical cluster analysis (Garson, 2002), 156 questionnaires were randomly drawn from the 1,561 respondents. We used Enterprise Miner to generate the random sample (Figure 2). In hierarchical clustering it is necessary to decide how similarity is measured, how clusters are aggregated (or divided) and how many clusters are needed. In this procedure, Pearson correlation and average linkage are again used to measure the similarity between two cases and the distance between two groups, respectively. The criterion used to determine the best number of clusters is whether there is a big change in the average distance between clusters, which is plotted in the distance plot (Figure 3) (Sharma, 1996). Three clusters emerged.
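The sampling step and the "big change in average distance" rule can be sketched as below. This is an illustration only: the respondent identifiers, the random seed and the distance values are hypothetical, and the heuristic of picking the solution just before the largest jump is one common reading of a distance plot, not necessarily the exact rule applied to Figure 3.

```python
import random

def draw_subsample(ids, fraction=0.1, seed=0):
    """Draw a simple random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(ids, int(len(ids) * fraction))

def pick_cluster_count(distance_by_k):
    """Return the number of clusters just before the largest jump in merge distance.

    distance_by_k maps a number of clusters k to the average between-cluster
    distance at which the k-cluster solution is formed.
    """
    ks = sorted(distance_by_k, reverse=True)  # e.g. 6, 5, 4, 3, 2, 1
    best_k, best_jump = None, float("-inf")
    for k, k_next in zip(ks, ks[1:]):
        jump = distance_by_k[k_next] - distance_by_k[k]
        if jump > best_jump:
            best_k, best_jump = k, jump
    return best_k

# Hypothetical respondent identifiers for the 1,561 cleaned questionnaires.
respondent_ids = list(range(1, 1562))
subsample = draw_subsample(respondent_ids)

# Hypothetical average distances read off a distance plot.
distances = {6: 0.21, 5: 0.25, 4: 0.30, 3: 0.36, 2: 0.82, 1: 0.95}
chosen_k = pick_cluster_count(distances)
print(len(subsample), chosen_k)
```

With these made-up distances the largest jump occurs when three clusters are merged into two, so a three-cluster solution would be retained.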
Figure 2. Diagram of cluster analysis in Enterprise Miner

Figure 3. Average distance plot cluster numbers

Based on our hierarchical clustering results, nonhierarchical clustering refined the allocation of subjects between clusters using the K-means method found in Enterprise Miner. An Enterprise Miner diagram (Figure 2) shows the five nodes used in our data analysis. Each node has a specific function. The Work cluster node is used to identify input data and to set up variable information. Sampling is used for drawing random or nonrandom samples from the database. Clustering builds the cluster model. The Insight node has a model assessment function and the Reporter provides the results. After running these nodes, Enterprise Miner provides a variety of tables and charts to demonstrate the results. Table IV provides the cluster median on each factor and the frequency of each cluster. The pie chart in Figure 4 summarizes three statistics:
(1) cluster frequency;
(2) standard deviation; and
(3) greatest distance from cluster seed (SAS Institute Inc., 2002).
The height of the slice indicates the frequency of each cluster; cluster 2 has the greatest number of cases. The slice width is the standard deviation between cluster cases; cluster 3 has the largest standard deviation. Shading represents the distance between the furthest cluster member and the cluster seed; cluster 1 contains the member that has the greatest distance from its cluster seed. Table V illustrates the relative importance of the five factors in differentiating the clusters, ranging from 0 to 1 (SAS Institute Inc., 2002). The factor with the higher number contributes more to segment differentiation. In our study, efficiency is the characteristic that differs most among the three clusters, followed by compassionate and respectful care.
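The refinement step, in which K-means reallocates cases starting from the clusters suggested by the hierarchical solution, can be illustrated with a plain implementation. This is a generic sketch of the K-means idea, not Enterprise Miner's algorithm: the two-dimensional factor scores and the three seeds are hypothetical.

```python
def kmeans(points, seeds, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid, then update centroids."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical factor scores on two factors for six respondents.
scores = [(-1.0, -0.9), (-0.8, -1.1), (0.9, 1.0), (1.1, 0.8), (1.0, -1.0), (0.8, -0.9)]
# Hypothetical seeds, standing in for centroids taken from the hierarchical solution.
seeds = [(-1.0, -1.0), (1.0, 1.0), (1.0, -1.0)]

labels, centroids = kmeans(scores, seeds)
frequency = [labels.count(c) for c in range(len(seeds))]
print(labels, frequency)
```

Seeding K-means with the hierarchical centroids fixes the number of clusters in advance and lets all cases, not only the subsample, be allocated to the nearest cluster.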
Finally, we used a contingency table to test whether patients' demographic characteristics varied among the three segments derived from our cluster analysis. First, we summarized each cluster's demographic characteristics (Table VI). A visual chart can be obtained from the Insight node to show each cluster's demographic profile (Figure 5). Then chi-square tests are used to test whether demographic variables differ among the three clusters. The chi-square statistics (Table VI) reveal that these three clusters have significantly different demographic profiles (education, ethnicity, insurance type, and occupation) at the 0.01 level. The implications of these results are discussed later.
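The contingency-table test can be sketched as a Pearson chi-square computation. The counts below are illustrative and are not the study's data, so the statistic will not match Table VI; the critical value 13.277 is the standard chi-square table value for 4 degrees of freedom at the 0.01 level.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Illustrative counts only (rows: education level; columns: the three clusters).
counts = [
    [35, 92, 13],     # below high school
    [218, 477, 134],  # high school
    [164, 210, 152],  # college and above
]
stat, dof = chi_square(counts)

# Standard table value: chi-square critical value for df = 4 at the 0.01 level.
CRITICAL_0_01_DF4 = 13.277
significant = stat > CRITICAL_0_01_DF4
print(dof, significant)
```

A statistic above the critical value means the hypothesis that education is distributed identically across the clusters is rejected at the 0.01 level.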
Discussion
Conducting cluster analysis on the five factors (procedure II) instead of the original 24 variables (procedure I) produced results that provide clearer and more reliable explanations of preferences (Tables III and IV). Previous studies made similar arguments. It is not practicable to keep and compare all attributes in a cluster analysis; instead, factor analysis should be adopted to reduce a large number of variables for modeling purposes (Garson, 2002; Morrison et al., 2003). Our factor analysis extractions explain most of the total variation in the original variables, while the arbitrary selection of the cut-off values (in this case 0.35 and -0.5) in procedure I may leave out important information (Garson, 2002). For example, factor analysis results clearly show that all three attributes identified in cluster 2 (determined by procedure I) measure the same dimension, communication and empowerment, which indicates that cluster 2 is focused upon communication and empowerment. On the other hand, procedure I does not generate correlations between attributes and hence cannot by itself establish a preference for cluster 2. As such, the following discusses the results generated by procedure II.
Data mining in healthcare market segmentation
Data mining is an effective tool for market segmentation (procedure II). Compared with
traditional statistical methods, data mining can manipulate large datasets quickly and
easily. It also efficiently accomplishes the entire data analysis process. Through a
series of nodes shown in Figure 3, all tasks from data preparation and cleaning to
model assessment can be accomplished automatically in sequence simply by
dragging and clicking. Managers can apply these techniques within six to eight
weeks of training (Thelen et al., 2004). With declining data mining software costs,
establishing a data mining system is an efficient and effective way for healthcare
organizations to manage their large databases and to support their managerial
decisions. In our study, Enterprise Miner provided tables and visual charts to help us
understand different market segments accurately and intuitively (Tables IV-VI and
Figures 4 and 5). Consistent with literature findings, data mining results reveal that the
healthcare market is heterogeneous and that the demographic characteristics of these
segments are significantly different (Morrison et al., 2003; Ross et al., 1993; Gabbott and
Hogg, 1994). Specifically, three segments are identified from our cluster analysis
(Table IV). These segments are designated as: reputation-driven (cluster 1),
performance-driven (cluster 2) and empowerment-driven (cluster 3).

Table IV. Cluster medians

Cluster                  Frequency
1. Reputation-driven     431 (27.6)
2. Performance-driven    793 (50.8)
3. Empowerment-driven    337 (21.6)

Factors: Communication and empowerment; Compassionate and respectful care;
Clinical reputation; Care responsiveness; Efficiency
Median values (in source order): -0.410, 0.59, 0.203, 0.444, 0.057, -0.041, 0.551,
-0.987, -0.516, 0.127, -0.356, 0.377, -0.321, 0.589, -0.543
Note: Figures in parentheses are percentages

Figure 4. Cluster pie chart

Table V. Cluster analysis factors relative importance

Factors: Communication and empowerment; Compassionate and respectful care;
Clinical reputation; Care responsiveness; Efficiency
Importance (values in source order): 0.63, 0.84, 0.75, 0.42

Table VI. Each cluster's demographic characteristics

Demographic characteristics             Reputation-   Performance-  Empowerment-  χ² (p)
                                        driven (%)    driven (%)    driven (%)
Education                                                                         111.7 (0.000)
  Below high school                     8.1           11.6          3.9
  High school                           50.6          60.2          39.7
  College and above                     38            26.5          45.1
Race or ethnicity                                                                 28.76 (0.001)
  White or Caucasian                    84            76.7          86.4
  African American or Black             7.9           14.5          6.2
  Hispanic                              5.6           4.9           4.7
  Asian                                 0.7           1.5           0.9
Insurance type                                                                    28.2 (0.005)
  Medicaid                              2.8           4.9           3.9
  Medicare                              40.8          41.1          30.9
  Private/HMO/PPO                       45.5          43.3          54.9
  None                                  2.6           2.8           1.8
Occupation                                                                        37.04 (0.001)
  Professional/management/officials/
    business owners/financial           17.9          14.4          22.8
  Sales/office occupations              5.1           4.9           6.5
  Laborer/service worker/tradesman      7.7           10.1          8.6
  Homemaker                             16.7          13.7          16.3
  Retired                               43.4          49.9          35.3
  Student                               0.9           0.8           2.4

Figure 5. Each segment's demographic profile
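As a minimal sketch of how a difference in one demographic characteristic across segments can be tested, the Pearson chi-square statistic for a contingency table may be computed as shown here. The function names and the counts are illustrative assumptions; they are not the study's raw data, and this is not the Enterprise Miner procedure used in the study.

```python
from typing import List


def chi_square_statistic(observed: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(observed: List[List[float]]) -> int:
    """Degrees of freedom: (rows - 1) * (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical counts (assumption): rows are education levels,
# columns are the three segments.
counts = [
    [35, 92, 13],     # below high school
    [218, 477, 134],  # high school
    [164, 210, 152],  # college and above
]

stat = chi_square_statistic(counts)
dof = degrees_of_freedom(counts)
# 9.488 is the 5 percent critical value of chi-square with 4 degrees of freedom.
print(f"chi-square = {stat:.1f}, df = {dof}, exceeds 9.488: {stat > 9.488}")
```

A statistic above the critical value for the table's degrees of freedom suggests that the characteristic is distributed differently across segments.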
Half of our respondents fall into the performance-driven segment; these respondents
are interested in efficiency and in compassionate and responsive care (Table IV).
Compared with the other two segments (Figure 5 and Table VI), this segment has a greater
percentage of Black individuals and of Medicaid or Medicare users. The
performance-driven group typically has a lower education level and includes more retired
people. Since performance-driven patients form the major inpatient care group, a
better response to this segment is important. Healthcare organizations may devise
ways to provide efficient services for this group, for example by employing polite,
respectful and caring staff.
Accounting for 27.6 percent of the total respondents, the reputation-driven segment
considers clinical reputation the most important attribute (Table IV). This group's
demographic profile falls between those of the other two segments (Figure 5 and
Table VI). Compared with the performance-driven segment, the reputation-driven group
contains a higher proportion of patients with college degrees, private/HMO (Health
Maintenance Organization)/PPO (Preferred Provider Organization) insurance and
professional or management positions, but these proportions are lower than in the
empowerment-driven segment. The reputation-driven group also includes more
Hispanics. Advertisements indicating high-quality hospital staff and
advanced equipment may attract this patient group.
The empowerment-driven group accounts for 21.6 percent of the total
respondents. In this group, patients consider communication and empowerment to be
important attributes. They desire more information from hospital staff and want to
become more involved in the medical care decision-making process (Table IV).
Respondents in this group are the most likely to have college degrees and
private/HMO/PPO insurance, and to hold a professional or management position
(Figure 5 and Table VI). Despite its small size, the empowerment-driven segment is a
profitable market for hospital managers. It is formed by patients who have better
education and occupation status, implying that they may have higher purchasing power.
To target this market, medical staff need to communicate effectively with these patients
by providing more information directly related to their treatments and by more actively
involving them and their families in decision-making processes.
In summary, data mining tools allow efficient respondent classification into measurable
clusters. Adopting these tools in the healthcare sector helps administrators to understand
segments in the market and their service preferences. We found efficiency and
compassionate and respectful care to be the most important factors in determining
the clusters. Healthcare organization staff, therefore, may wish to differentiate their
services according to the different segments' preferences.
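The profiling step summarized above (segment sizes and per-factor medians for each cluster) can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `profile_clusters` and `dominant_factor` and the factor scores are hypothetical, and only the five factor names are taken from the study.

```python
from collections import defaultdict
from statistics import median

FACTORS = [
    "Communication and empowerment",
    "Compassionate and respectful care",
    "Clinical reputation",
    "Care responsiveness",
    "Efficiency",
]


def profile_clusters(labels, scores, factor_names):
    """Summarize each cluster: size, percentage share and factor medians."""
    grouped = defaultdict(list)
    for label, row in zip(labels, scores):
        grouped[label].append(row)
    total = len(labels)
    profile = {}
    for label, rows in sorted(grouped.items()):
        columns = list(zip(*rows))  # one tuple of scores per factor
        profile[label] = {
            "n": len(rows),
            "share": round(100 * len(rows) / total, 1),
            "medians": {
                name: median(col) for name, col in zip(factor_names, columns)
            },
        }
    return profile


def dominant_factor(medians):
    """Name of the factor with the highest median score in a cluster."""
    return max(medians, key=medians.get)


# Hypothetical factor scores (assumption): one row per respondent.
labels = [1, 1, 2, 2, 2, 3]
scores = [
    [0.1, -0.2, 0.9, 0.0, -0.1],
    [0.0, -0.4, 0.7, 0.1, 0.0],
    [-0.3, 0.5, -0.2, 0.4, 0.6],
    [-0.1, 0.6, -0.1, 0.3, 0.8],
    [-0.2, 0.4, 0.0, 0.5, 0.7],
    [0.9, 0.0, -0.3, -0.2, -0.4],
]

for cluster, summary in profile_clusters(labels, scores, FACTORS).items():
    print(cluster, summary["n"], summary["share"],
          dominant_factor(summary["medians"]))
```

Naming a segment after its highest-median factor is one simple convention; it is not necessarily how the segment labels in this study were chosen.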
Limitations and future research
One data mining advantage is that the total sample can be divided into a training set
and a validation set. The latter can be used to monitor and adjust the model to improve
generalization to the whole population. A further evaluation using discriminant
analysis can also be valuable for assessing the cluster model. Sufficiently large sample
sizes, however, are critical for conducting such analyses. Most healthcare networks
possess large patient datasets from various sources. Therefore, data mining tools can
be adopted for understanding patient profiles, their preferences and possibly financial
characteristics related to the segments. We approached patient preferences from an
experience perspective rather than an economic one. Attributes related to financial
accessibility, therefore, might also concern some patient groups. Given current
contracting and pay-for-performance debates, understanding patient preferences from
financial perspectives, as well as target segment characteristics, is imperative for
achieving higher patient satisfaction ratings.
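The training and validation split mentioned at the start of this section can be sketched as follows. This is an illustrative assumption only: it uses a plain k-means routine on synthetic two-dimensional scores and compares the mean distance to the nearest centroid on both sets, which is one simple generalization check and not the Enterprise Miner workflow used in the study. All function names are hypothetical.

```python
import random
from math import dist


def split_sample(rows, validation_fraction=0.3, seed=0):
    """Shuffle the rows and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_valid
    return shuffled[:cut], shuffled[cut:]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


def k_means(rows, k, iterations=20, seed=0):
    """Plain k-means; an empty cluster keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for row in rows:
            groups[nearest(row, centroids)].append(row)
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def mean_distance(rows, centroids):
    """Average distance from each row to its nearest centroid."""
    return sum(dist(r, centroids[nearest(r, centroids)]) for r in rows) / len(rows)


# Synthetic scores (assumption): three groups of 50 respondents each.
rng = random.Random(1)
centers = [(-1.0, 0.0), (0.0, 1.0), (1.0, -1.0)]
data = [
    (cx + rng.gauss(0, 0.2), cy + rng.gauss(0, 0.2))
    for cx, cy in centers
    for _ in range(50)
]

train, valid = split_sample(data)
centroids = k_means(train, k=3)
print("training:", round(mean_distance(train, centroids), 3))
print("validation:", round(mean_distance(valid, centroids), 3))
```

If the validation figure is much larger than the training figure, the cluster solution may not generalize well and the number of clusters or the input factors may need adjusting.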

References
Altschul, A.T. (1983), The consumer's voice: nursing implications, Journal of Advanced
Nursing, Vol. 8 No. 3, pp. 175-83.
Andaleeb, S.S. (1998), Determinants of customer satisfaction with hospitals: a managerial
model, International Journal of Health Care Quality Assurance, Vol. 11 No. 6, pp. 181-7.
Brown, J.B.M., Mullooly, J. and Levinson, W. (1999), Effects of clinician communication skills
training on patient satisfaction: a randomized, controlled trial, Annals of Internal
Medicine, Vol. 131 No. 11, pp. 822-9.
Calderon, T.G., Cheh, J.J. and Kim, I. (2003), How large corporations use data mining to create
value, Management Accounting Quarterly, Vol. 4 No. 2, pp. 1-11.
Chen, L. and Sakaguchi, T. (2000), Data mining methods, applications, and tools, Information
Systems Management, Vol. 17 No. 1, p. 65.
Cheng, B., Chang, C. and Liu, I. (2005), Enhancing care services quality of nursing homes using
data mining, Total Quality Management & Business Excellence, Vol. 16 No. 5, pp. 575-96.
Chopoorian, J.A., Witherell, R., Khalil, O.E.M. and Ahmed, M. (2001), Mind your business by
mining your data, Advanced Management Journal, Vol. 66 No. 2, p. 45.
Finn, D.W. and Lamb, C.W. Jr (1986), Hospital benefit segmentation, Journal of Health Care
Marketing, Vol. 6 No. 4, pp. 26-33.
Fletcher, R.H., O'Malley, M., Earp, J.A., Littleton, T.A., Fletcher, S.W., Greganti, M.A., Davidson,
R.A. and Taylor, J. (1983), Patient priorities for medical care, Medical Care, Vol. 21 No. 2,
pp. 234-42.
Gabbott, M. and Hogg, G. (1994), Uninformed choice, Journal of Health Care Marketing, Vol. 14
No. 3, pp. 28-33.
Garson, G.D. (2002), Guide to Writing Empirical Papers, Theses and Dissertations, Marcel
Dekker, New York, NY.
Gordon, A.D. (1999), Classification, Chapman and Hall/CRC, London.
Lam, S.S.K. (1997), SERVQUAL: a tool for measuring patients' opinions of hospital service
quality in Hong Kong, Total Quality Management, Vol. 8 No. 4, pp. 145-52.
Marquis, M.S. (1983), Patient satisfaction and change in medical care provider: a longitudinal
study, Medical Care, Vol. 21 No. 8, pp. 821-9.
Milley, A. (2000), Healthcare and data mining, Health Management Technology, Vol. 21 No. 8,
p. 44.
Morrison, M., Murphy, T. and Nalder, C. (2003), Consumer preferences for general practitioner
services, Health Marketing Quarterly, Vol. 20 No. 3, pp. 3-19.
Otani, K. and Kurz, R.S. (2004), The impact of nursing care and other healthcare attributes on
hospitalized patient satisfaction and behavior intentions, Journal of Healthcare
Management, Vol. 49 No. 3, pp. 181-96.
Peacock, P.R. (1998), Data mining in marketing: part 1, Marketing Management, Vol. 6 No. 4,
pp. 8-18.
Powers, T.L. and Bendall, D. (2004), The influence of time on changes in health status and
patient satisfaction, Health Care Management Review, Vol. 29 No. 3, pp. 240-8.
Rafalski, E. (2002), Using data mining/data repository methods to identify marketing
opportunities in health care, Journal of Consumer Marketing, Vol. 19 No. 7, pp. 607-13.
Reed, P.G. (1992), An emerging paradigm for the investigation of spirituality in nursing,
Research in Nursing & Health, Vol. 15 No. 5, pp. 349-57.
Ross, C.K., Steward, C.A. and Sinacore, J.M. (1993), The importance of patient preferences in the
measurement of health care satisfaction, Medical Care, Vol. 31 No. 12, pp. 1138-49.
SAS Institute Inc. (2002), Applying Data Mining Techniques Using Enterprise Miner,
SAS Institute Inc., Cary, NC.
Sharma, S. (1996), Applied Multivariate Techniques, John Wiley & Sons, New York, NY.
Swan, J.E., Richardson, L.D. and Hutton, J.D. (2003), Do appealing hospital rooms increase
patient evaluations of physicians, nurses, and hospital service?, Health Care Management
Review, Vol. 28 No. 3, pp. 254-64.
Thelen, S., Mottner, S. and Berman, B. (2004), Data mining: on the trail to marketing gold,
Business Horizons, Vol. 47 No. 6, pp. 25-32.
Thompson, B. and Daniel, L.G. (1996), Factor analytic evidence for the construct validity of
scores: a historical overview and some guidelines, Educational and Psychological
Measurement, Vol. 56 No. 2, pp. 197-208.
Thompson, S.C., Pitts, J.S. and Schwankovsky, L. (1993), Preferences for involvement in medical
decision-making: situational and demographic influence, Patient Education and
Counseling, Vol. 22 No. 3, pp. 133-40.
Tibshirani, R., Walther, G. and Hastie, T. (2001), Estimating the number of clusters in a data
set via the gap statistic, Journal of the Royal Statistical Society: Series B, Vol. 63 No. 2,
pp. 411-23.
Tomes, A.E. and Ng, S.C.P. (1995), Service quality in hospital care: the development of an
in-patient questionnaire, International Journal of Health Care Quality Assurance, Vol. 8
No. 3, pp. 25-33.
Ware, J. and Snyder, M. (1975), Dimensions of patient attitudes regarding doctors and medical
care services, Medical Care, Vol. 13 No. 8, pp. 669-82.
Ware, J.E. Jr, Snyder, M.K., Wright, W.R. and Davies, A.R. (1983), Refining and measuring
patient satisfaction with medical care, Evaluation and Program Planning, Vol. 6 Nos 3-4,
pp. 247-63.
Wensing, M., Jung, H.P., Mainz, J., Olesen, F. and Grol, R. (1998), A systematic review of the
literature on patient priorities for general practice care. Part 1: description of the research
domain, Social Science & Medicine, Vol. 47 No. 10, pp. 1573-88.
Woodside, A.G., Nielsen, R.L., Walters, F. and Muller, G.D. (1988), Preference segmentation of
health care services: the old-fashioneds, value conscious, affluents, and professional
want-it-alls, Journal of Health Care Marketing, Vol. 8 No. 2, pp. 14-24.
Corresponding author
Sandra S. Liu can be contacted at: liuss@purdue.edu


This article has been cited by:

1. Eric R. Swenson, Nathaniel D. Bastian, Harriet B. Nembhard. 2016. Data analytics in health promotion:
Health market segmentation and classification of total joint replacement surgery patients. Expert Systems
with Applications 60, 118-129.
2. Brian W. Powers, Amol S. Navathe, Khin-Kyemon Aung, Sachin H. Jain. 2013. Patients as customers:
Applying service industry lessons to health care. Healthcare 1:3-4, 59-60.
3. J Ryan, C Lewis, B Doster, S Daily. Analyzing Block Scheduling Heuristics for Perioperative Scheduling
Flexibility: A Case Study Perspective, 1-10.
4. Anastasius Moumtzoglou. E-Health as the Realm of Healthcare Quality, 291-310.
