
DATA MINING:

WHAT IS IT AND HOW IS IT USED?


By Barry Keating
Data mining is a way to gain market
intelligence from a huge amount of
data ... the problem today is not the
lack of data, but how to learn from
it ... in data mining, the data tell
the story, but it is up to you how to
use that information.
Data mining is used to search for
valuable information from the
mounds of data collected over
time, which could be used in decision
making. The information may be certain
patterns and/or relationships that exist.
With data mining, a retail store may
find that certain products are sold more
in one channel of distribution than in the
others; certain products are sold together;
certain products are sold more in one
geographical location than in others; and
certain products are sold when a certain
event occurs. Wal-Mart, for example, has
found that the sales of beer increase when
a hurricane is imminent. This means that
they have to hold more than the usual
supply of beer when a hurricane is expected.
With data mining, a financial analyst
would like to know the characteristics of
a company becoming insolvent; human
resource managers would like to know the
characteristics of a successful prospective
employee; credit card departments would
like to know which potential customers are
more likely to pay back the debt and when
a credit card is swiped, which transaction
is fraudulent and which one is legitimate;
direct marketers would like to know
which customers purchase which types of
products; booksellers like Amazon would
like to know which customers purchase
which types of books (fiction, detective
stories, or any other kind); and so on.
With this type of information available,
decision makers will make better choices.
Human resource people will hire the right
individuals. Credit departments will target
those prospective customers that are less
prone to become delinquent and/or less
likely to engage in fraudulent activities.
Direct marketers will target those
customers that are more likely to purchase
their products. With the insight gained
from data mining, businesses may wish to
re-configure their product offering and/or
emphasize specific features of a product.
BARRY KEATING
Dr. Keating is the Jesse H. Jones
Professor of Business Economics at
the University of Notre Dame. He
specializes in understanding how
not-for-profit organizations function;
more specifically, how they respond to
incentives, changes in revenue and cost
conditions, and changes in regulatory
mechanisms. He is widely published and
is the co-author of the book Business
Forecasting, published by McGraw-Hill.
He is a Heritage Foundation Fellow
(1992-1996) and a Heartland Institute
Research Fellow, and serves on the
Board of Advisors of both the Indiana
Policy Review Group and the Institute
of Business Forecasting.
These are not the only uses of data
mining. Police use this tool to determine
when and where a crime is likely to occur,
and what would be the nature of that
crime. Organized stock exchanges detect
fraudulent activities with data mining.
Pharmaceutical companies mine data to
predict the efficacy of compounds as well
as to uncover new chemical entities that
may be useful for a particular disease. The
airline industry uses it to predict which
flights are likely to be delayed (well before
the flight is scheduled to depart). Weather
analysts determine weather patterns with
data mining to predict when there will
be rain, sunshine, a hurricane, or snow.
Nonprofit companies use data mining
to predict the likelihood of individuals
making a donation for a certain cause. The
uses of data mining are far reaching and
its benefits may be quite significant.
DATA MINING IN
HISTORICAL PERSPECTIVE
The job of a data miner is to extract
valuable information from the data
available. The approach to finding
information is to find patterns and
relationships present in the data, which,
of course, is not new. Indeed, man has
looked for patterns in almost every
endeavor undertaken by mankind. Early
man looked for patterns in the sky at night,
in the movement of stars and planets, and
in the weather. Modern man still hunts for
patterns in early election returns, in global
temperature changes, and in the sales data
of new and matured products.
THE JOURNAL OF BUSINESS FORECASTING, FALL 2008
Over the last 25 years or so, there
has been a gradual evolution from data
processing to data mining. In the 1960s,
businesses routinely collected data and
processed it using database management
techniques that allowed an orderly listing
and tabulation of the data as well as
some query activity. On-line Transaction
Processing (OLTP) became routine,
data retrieval from stored data became
faster and more efficient because of the
availability of new and better storage
devices, and data processing became
quicker and more efficient because of
advancements in computer technology.
Database management advanced rapidly
to include highly sophisticated query
systems, and became popular not only in
business applications but also in scientific
inquiries. Databases began to grow at
previously unheard-of rates. The amount
of data in all of the world's databases is
now estimated to double in less than
two years. Businesses currently deploy
what we call data warehouses and data
marts. A "data warehouse" is a firm's
repository of historic data, containing
information of every relevant activity that
occurred in the past. A "data mart," on the
other hand, is a subset of a data warehouse.
It holds some special information or
information that has been grouped to help
businesses in making better decisions.
Data used here are usually derived from
a data warehouse. The first organized use
of such large databases started with Online Analytical Processing (OLAP). Data
mining tools use and analyze the data that
exist in databases, data marts and data
warehouses.
Researchers have been doing data
mining for a long time, though they called
it by different names. Some called it
Exploratory Data Analysis; others called
it Business Intelligence, Data Driven
Discovery, Deductive Learning, Discovery
Science, and Knowledge Discovery in
Databases (KDD).
DATA MINING VS.
FORECASTING MODELS
A decade ago, one of the most pressing
problems of a forecaster was lack of data.
But today we are overwhelmed with data.
Data are collected whenever you swipe
a credit card at the grocery store or retail
store, or whenever you make a purchase
or click for some information. The amount
of data being collected is exploding with
no end in sight. The presence of large
cheap storage devices makes it possible
to store every piece of information or data
produced. The pressing problem now is
not the lack of data, but how to learn from
it. Data mining is the set of tools that helps
to accomplish that.
Data mining is quite different from the
statistical modeling used in forecasting.
In the traditional statistical forecasting,
forecasters first determine the patterns that
exist in a dataset; that is, whether it has
a trend, seasonality, cyclicality, etc., and
then they search for a model that captures
those patterns. Remember, each model
captures a certain pattern. If we know the
pattern, then we know which model to use.
The model may be regression, exponential
smoothing, or any other model. In the case
of new product forecasting, for example,
forecasters may assume that new products
will "roll out" with a life cycle that looks
like an "s-curve." In that sense, they
impose their belief about the appropriate
model on the data.
By doing that, forecasters impose the
pattern on the data because they believe
that pattern describes the data. But with
data mining, the tables are turned. Forecasters
do not presume what pattern or family of
patterns may fit a particular set of data.
Many times they don't even know what
kind of pattern they are going to find. This
may sound strange, because data mining
is not a method of attacking the data; on
the contrary, it is a way of learning from
the data and then using that information.
For that reason, we need a new mindset in
data mining. We must be open to finding
relationships and patterns that we never
imagined existed. We let data tell us the
story rather than impose a model on the
data that we feel will replicate the actual
patterns.
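To make the contrast concrete, the short Python sketch below shows what imposing a model looks like in the new-product case. It assumes a logistic form for the s-curve; the function name and the parameter values (market size, growth rate, midpoint) are hypothetical choices made for illustration and do not come from any real dataset.

```python
import math


def logistic_s_curve(t, market_size, growth_rate, midpoint):
    """Cumulative sales at time t under an assumed logistic life cycle."""
    return market_size / (1.0 + math.exp(-growth_rate * (t - midpoint)))


# Hypothetical parameters chosen by the forecaster, not learned from data.
MARKET_SIZE = 10_000  # eventual cumulative unit sales
GROWTH_RATE = 0.8     # steepness of the roll-out
MIDPOINT = 6          # period in which half of the market is reached

# Cumulative sales for periods 0..12, then the sales implied for each period.
cumulative = [
    logistic_s_curve(t, MARKET_SIZE, GROWTH_RATE, MIDPOINT) for t in range(13)
]
per_period = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

for period, units in enumerate(per_period, start=1):
    print(f"period {period:2d}: {units:8.1f} units")
```

Every number in this forecast follows from the shape the forecaster chose in advance. In data mining the direction is reversed: the shape, if there is one, has to be found in the data.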
Perhaps the most common misconception about data mining is that it will
automatically extract all the valuable
information embedded in a database
without any intervention on the part of the
researcher. In fact, every large database
contains numerous sets of patterns, which
may very well be as many in number as the
number of items in the database itself. But
most of the patterns could be irrelevant to
the researcher's task. So, the researcher,
before he or she starts the data mining
process, sets goals and research parameters.
This way he or she will eliminate many
patterns that are irrelevant to the task and
concentrate on the ones that are important
and pertinent. As in traditional statistical
forecasting, the researcher remains an
important part of the analysis.
Data mining usually uses very large
datasets, oftentimes far larger than the
datasets used in business forecasting. The
tools are also somewhat different. You may
well be familiar with many of the statistical
tools available to us, but the tools used in
data mining, and the way they are used,
differ from those of traditional business
forecasting. They are discussed in the
next section.
The premise of data mining is that there
is a great deal of information locked up
in a database; it's up to the researcher to
unlock it. Data mining tools help to unlock
that information.
TOOLS OF DATA MINING
There are four categories of data mining
tools:
1. Prediction
2. Classification
3. Clustering Analysis
4. Association Rules Discovery
Prediction Tools: They are the methods
derived from traditional statistical forecasting for predicting a variable's value.
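As one illustration of a prediction tool, the sketch below fits a straight line by ordinary least squares and uses it to predict a variable's value. Regression is only one of many prediction methods; the variable names and the figures for advertising spend and unit sales are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value of the target variable for a new x."""
    return intercept + slope * x


# Hypothetical history: advertising spend (thousands of dollars) and unit sales.
# The points were constructed to lie on the line sales = 50 + 7 * spend.
ad_spend = [10, 20, 30, 40, 50]
unit_sales = [120, 190, 260, 330, 400]

slope, intercept = fit_line(ad_spend, unit_sales)
forecast = predict(60, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast at 60: {forecast:.1f}")
```

With real data the points would scatter around the line, and the diagnostic statistics reported by data mining software would indicate how much trust the prediction deserves.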
Classification Tools: Most commonly used
in data mining, classification tools attempt
to distinguish different classes of objects
or actions. For instance, a particular credit
card transaction may be either normal or
fraudulent. These tools could classify it as
one or the other, thereby saving the credit
card company a considerable amount of
money. In another instance, an advertiser
may want to know which aspect of its
promotion is most appealing to consumers.
Is it price, quality, and/or reliability of
a product? Maybe it is a special feature
that is missing on competitive products.
The classification tools help give such
information on all the products, making it
possible to use the advertising budget in the
most effective manner.
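The credit card case can be sketched with one of the simplest classifiers, a nearest-centroid rule: average the known examples of each class, then assign a new transaction to the class whose average it is closest to. This is an illustrative stand-in, not the method any card issuer is known to use, and the two features and all of the labeled transactions are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Average the feature vectors of each class in the labeled examples."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(features, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical labeled history. Features are
# (amount in hundreds of dollars, distance from home in hundreds of miles).
history = [
    ((0.2, 0.1), "normal"),
    ((0.5, 0.0), "normal"),
    ((0.8, 0.2), "normal"),
    ((0.3, 0.1), "normal"),
    ((9.0, 12.0), "fraudulent"),
    ((7.5, 20.0), "fraudulent"),
    ((12.0, 8.0), "fraudulent"),
]

centroids = train_centroids(history)
for transaction in [(0.6, 0.1), (8.0, 15.0)]:
    print(transaction, "->", classify(transaction, centroids))
```

Unlike clustering, classification starts from examples whose class is already known; the tool learns to reproduce those labels on new cases.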
Clustering Analysis Tools: These are very
powerful tools for clustering products into
groups that naturally fall together. These
groups are identified by the program and
not by the researchers. Most of the clusters
discovered may not be useful in business
decisions. However, researchers may find one
or two that are extremely important, the
ones the company can take advantage
of. The most common use for clustering
tools is probably in what economists refer
to as "market segmentation." In market
segmentation, a company divides the
customer base into segments dependent
upon characteristics such as income,
wealth, geographic location, lifestyle, and
so on. Each segment is then treated with a
different marketing approach, one suited
precisely to that particular segment.
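A minimal sketch of market segmentation by clustering follows, using the k-means procedure as one common choice. The customer records are hypothetical, and the way the starting centroids are picked (evenly spaced points from the sorted data) is a simplifying assumption that keeps the example deterministic; production software typically uses randomized starts and rescales the features first.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters; returns (centroids, clusters)."""
    if k < 2 or k > len(points):
        raise ValueError("k must be between 2 and the number of points")
    ordered = sorted(points)
    # Simplifying assumption: seed with evenly spaced points from the sorted data.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the procedure has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (income, annual spend), both in thousands of dollars.
customers = [(30, 2), (110, 12), (35, 3), (120, 15), (32, 2.5), (115, 13)]

centroids, segments = k_means(customers, k=2)
for centroid, members in zip(centroids, segments):
    print(f"segment around {centroid}: {sorted(members)}")
```

No segment labels are supplied here. The program groups the customers on its own, and it is then up to the researcher to decide whether a segment is worth a marketing approach of its own.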
Association Rules Discovery: Here the
data mining tools discover associations;
e.g., what kinds of books certain groups of
people read, what products certain groups
of people purchase, what movies certain
groups of people watch, etc. Businesses use
this information in targeting their markets.
Netflix, for example, recommends movies
based on movies people have watched and
rated in the past. Amazon does much the
the same thing in recommending books.
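Associations of this kind are usually scored with three measures: support (how often the items appear together), confidence (how often the second item appears when the first does), and lift (confidence relative to the second item's overall frequency). The sketch below computes them by brute force over item pairs. It is not the algorithm used by Netflix or Amazon, and the market baskets are invented for illustration.

```python
from itertools import permutations


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_both = sum(1 for t in transactions if (a | c) <= t)
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search over single-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        support, confidence, lift = rule_stats(transactions, {a}, {b})
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence, lift))
    return rules


# Hypothetical market baskets, one set of items per purchase.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "salsa"},
    {"milk", "bread"},
    {"beer", "chips", "bread"},
    {"milk", "chips"},
    {"bread", "butter"},
]

for a, b, support, confidence, lift in mine_pair_rules(baskets, 0.25, 0.7):
    print(f"{a} -> {b}: support={support:.3f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items occur together more often than their individual popularity would explain, which is what makes a rule interesting for targeting.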
SOFTWARE USED IN
DATA MINING
The two major pieces of software
used at the moment for data mining are
SPSS Clementine and SAS Enterprise
Miner. Both packages include an array of
data mining techniques that encompass
all four of the data mining categories
mentioned above. Current users of SAS
products are probably inclined to use
SAS Enterprise Miner when selecting a
data mining program and current users
of SPSS products are likely to choose
SPSS Clementine; however, both data
mining programs are actually stand-alone
packages that do not require the user to
be a customer of the other offerings from
a particular company. Both software
packages can import and process data in
virtually any format. In addition, there are
several smaller software packages that are
only used for data mining. These packages
do not have a full set of statistical tools
like SAS and SPSS.
Newcomers to data mining can use an
Excel add-in called XLMiner™, which is
available from Resampling Stats, Inc. This
Excel add-in lets potential data miners not
only examine the usefulness of such a
program but also get familiar with some
of the data mining techniques. Although
Excel is quite limited in the number of
observations it can handle, it certainly can
give a flavor of how valuable data mining
can be to a company.
Software for data mining also gives
a full range of diagnostic statistics that
can be used in assessing the information
obtained from these techniques. Once the
user recognizes the value of data mining,
he or she can switch to more advanced
software that can handle large datasets.
Data mining is here to stay, and
forecasters would find it a useful extension
to their toolkit. Although these tools
are new, their use is similar to statistical
techniques currently used in forecasting.
As such, they will serve as an important
addition to the forecasting toolkit.
(keating@nd.edu)