
Predictive Analytics World

Washington, DC
October 2010

2010 Data Miner Survey Highlights


… The Views of 735 Data Miners

Karl Rexer, PhD


President
Rexer Analytics

www.RexerAnalytics.com
2010 Data Miner Survey: Overview

•  Fourth annual survey
•  47 questions
•  10,000+ invitations emailed, plus newsgroups, vendors, and snowball referrals
•  Respondents: 735 data miners from 60 countries

Respondents by type: Vendors, Corporate, NGO / Gov’t (5%), Academics, Consultants
[Pie chart; the remaining slice values are 19%, 33%, 12% and 31%, pairing with labels not recoverable]

Respondents by region:
North America 45%
•  USA 40%
•  Canada 4%
Europe 36%
•  Germany 7%
•  UK 5%
•  France 4%
•  Poland 4%
Asia Pacific 12%
•  India 4%
•  Australia 3%
•  China 2%
Central & South America 4%
•  Colombia 2%
•  Brazil 1%
Middle East & Africa 3%
•  Israel 1%
•  Turkey 1%

Note: Data from tool vendors was excluded from many analyses
© 2010 Rexer Analytics 2
Fields Applying Data Mining
•  CRM / Marketing, Financial and Academic are the most commonly
reported fields. This has been consistent since the 2007 survey.
–  Many data miners work in several fields.

CRM / Marketing 41%
Financial 29%
Academic 25%
Insurance 15%
Telecommunications 15%
Retail 14%
Pharmaceutical 13%
Technology 13%
Medical 11%
Manufacturing 10%
Internet-based 10%
Government 10%

Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)
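Because the question is select-all-that-apply, each percentage is a share of respondents rather than a share of answers, so the figures can sum to well over 100%. A minimal sketch of that kind of tally follows. The responses are made up and the function name `field_shares` is an illustrative assumption, not the survey's data or documented method.

```python
from collections import Counter


def field_shares(responses):
    """Percent of respondents selecting each option in a multi-select question."""
    counts = Counter()
    for selected in responses:
        # set() guards against a respondent listing the same field twice.
        counts.update(set(selected))
    total = len(responses)
    return {field: round(100 * n / total) for field, n in counts.most_common()}


# Hypothetical answers from four respondents (not survey data).
responses = [
    ["CRM / Marketing", "Financial"],
    ["CRM / Marketing"],
    ["Academic", "Financial", "Medical"],
    ["CRM / Marketing", "Retail"],
]

shares = field_shares(responses)
print(shares)
# The total exceeds 100 because respondents may pick several fields.
print(sum(shares.values()))
```

The denominator is the number of respondents, not the number of selections, which is why a field chosen by three of four respondents reads as 75% even though other fields were chosen as well.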


Data Mining Algorithms
•  Decision trees, regression, and cluster analysis continue to form a triad of core
algorithms for most data miners. This is very consistent, year to year.
•  However, a wide variety of algorithms are being used.
Decision Trees 69%
Regression 68%
Cluster Analysis 60%
Time Series 32%
Neural Nets 31%
Factor Analysis 27%
Text Mining 26%
Association Rules 25%
Ensemble Models 22%
Support Vector Machines 21%
Bayesian 21%
Anomaly Detection 16%
Survival Analysis 14%
Rule Induction 13%
Social Network Analysis 12%
Genetic Algorithms 11%
Link Analysis 9%
Uplift Modeling 9%
MARS 8%

Ensemble Models by segment: Corporate 21%, Consultants 27%, Academic 20%, NGO / Gov’t 18%
Uplift Modeling by segment: Corporate 10%, Consultants 12%, Academic 4%, NGO / Gov’t 5%

Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)
Text Mining
•  About a third of data miners currently incorporate text mining into their analyses, and another third plan to.

[Pie chart: Text Miners, Plan to Start Text Mining, No Plans to Conduct Text Mining; slice values 30%, 36% and 34%, pairing with labels not recoverable]

Software Used
STATISTICA Text Miner 19%
IBM SPSS Modeler 17%
SAS Text Miner 9%
IBM SPSS Text Analytics 7%
Rapid Miner 6%
Provalis Wordstat 2%
GATE 2%
KXEN 2%
Oracle Text or ODM 1%
Megaputer Text Analyst 1%
Autonomy 1%
Other 35%

The focus of our text mining is to extract key themes (sentiment analysis) 59%
We use text fields as inputs / predictors in a larger model 55%
We use text mining as part of social network analyses 21%



Computing Environments
•  A lot of data mining happens on desktop and laptop computers.
•  Frequently the data and processing are local (not on servers, mainframe or cloud).
•  Only a small minority of data mining is on the cloud.

Computing environment: Overall; then five segment values
Cloud Computing: 7%; 5%, 10%, 7%, 3%, 14%
Centralized Mainframe/Server: 18%; 20%, 16%, 14%, 32%, 26%
Local Server: 26%; 28%, 30%, 19%, 29%, 45%
Desktop PC/Workstation (with data & processing on server, mainframe or cloud): 39%; 48%, 36%, 25%, 47%, 39%
Desktop PC/Workstation (with data & processing locally): 49%; 43%, 49%, 58%, 58%, 35%
Laptop PC (with data & processing on server, mainframe or cloud): 24%; 29%, 24%, 15%, 32%, 37%
Laptop PC (with data & processing locally): 35%; 28%, 36%, 46%, 42%, 44%

[Segment columns: Corporate, Consultant, Academic, NGO / Gov’t, Vendor; the left-to-right column order is not recoverable from the extracted slide]

Question: What are the computing environments/platforms on which data mining/analytics occurs at your company/organization? (Check all that apply)
Analytic Capability & Data Quality
•  Analytic capability:
–  There’s room to improve if we’re going to “Compete on Analytics”.

[Bar chart: distribution of analytic capability ratings; labeled segments 20%, 30%, 35%, 13%]

•  Data quality:
–  48% rate it “strong” or “very strong” (same as last year)
–  16% rate it “poor” or “very poor” (13% last year)

[Bar chart: distribution of data quality ratings; labeled segments 13%, 35%, 40%, 8%]

Analytic Capability Question: How do you rate the analytic capabilities of your company/organization?
Data Quality Question: How do you rate the quality of data available for analysis at your company/organization?
Overcoming Challenges: Best Practices

•  Top challenges facing data miners:


–  Dirty data: #1 challenge every year, 2007-2010
–  Explaining data mining to others: always in the top 4 challenges,
2007-2010
–  Difficult access to data: always in the top 3 challenges, 2007-2010

•  This year, survey respondents provided “Best Practices” for overcoming these challenges.
–  E.g., Dirty Data: Use anomaly detection to flag records to put before
subject matter experts.
–  E.g., Dirty Data: All projects begin with low-level data reports showing
counts of records, verification of keys (uniqueness, widows/orphans), and
distributions of field contents. These reports are echoed back to the data
content experts.
–  See the list of Best Practices at www.RexerAnalytics.com in early
November.
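The two dirty-data practices above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the respondents' actual procedure. "Orphans" is read here as child records whose key has no parent, and "widows" as parent records with no children. The anomaly check is a swapped-in, simple median/MAD (modified z-score) rule. All function names are hypothetical.

```python
from collections import Counter
from statistics import median


def key_report(parents, children, key, foreign_key):
    """Count records and check keys for duplicates, orphans, and widows."""
    key_counts = Counter(row[key] for row in parents)
    parent_keys = set(key_counts)
    child_keys = {row[foreign_key] for row in children}
    return {
        "parent_records": len(parents),
        "child_records": len(children),
        # Keys that appear more than once in the parent table.
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        # Assumed meaning: child rows whose foreign key has no parent.
        "orphans": sorted(child_keys - parent_keys),
        # Assumed meaning: parent rows that no child row refers to.
        "widows": sorted(parent_keys - child_keys),
    }


def field_distribution(rows, field):
    """Frequency of each value in one field, most common first."""
    return Counter(row[field] for row in rows).most_common()


def flag_anomalies(values, threshold=3.5):
    """Indexes whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

A report built from `key_report` and `field_distribution` can be echoed back to the data content experts. The indexes returned by `flag_anomalies` identify the records to put before subject matter experts rather than records to delete automatically.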
Data Mining Software

Survey Questions:
•  What Data mining/analytic tools did you use in 2009? (rate each as “never”, “occasionally”, or “frequently”)
•  What one Data Mining software package do you use most frequently?

•  The average data miner reports using 4.6 software tools.


•  R is used by the most data miners (43%).
•  STATISTICA is the primary data mining tool chosen most often (18%).
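The bullets above report different measures: the average number of tools per respondent, the share of respondents who use a tool at all, and the share who name it as their primary tool. A rough sketch of all three tallies follows. It assumes a tool counts as "used" when it is rated "occasionally" or "frequently". That rule, the record layout, and the toy data are assumptions for illustration, not the survey's documented method.

```python
from collections import Counter

USED = {"occasionally", "frequently"}  # assumed definition of "uses the tool"


def summarize(respondents):
    """Return (usage share %, primary share %, mean tools used per respondent)."""
    n = len(respondents)
    usage = Counter()
    primary = Counter()
    tools_used = 0
    for r in respondents:
        used = [tool for tool, rating in r["ratings"].items() if rating in USED]
        usage.update(used)
        tools_used += len(used)
        primary[r["primary"]] += 1
    usage_pct = {t: 100 * c / n for t, c in usage.items()}
    primary_pct = {t: 100 * c / n for t, c in primary.items()}
    return usage_pct, primary_pct, tools_used / n


# Hypothetical respondents and tool names (not survey data).
respondents = [
    {"ratings": {"Tool A": "frequently", "Tool B": "occasionally"},
     "primary": "Tool A"},
    {"ratings": {"Tool A": "never", "Tool B": "frequently", "Tool C": "occasionally"},
     "primary": "Tool B"},
    {"ratings": {"Tool B": "occasionally", "Tool C": "frequently"},
     "primary": "Tool C"},
    {"ratings": {"Tool A": "frequently", "Tool B": "occasionally"},
     "primary": "Tool A"},
]

usage_pct, primary_pct, mean_tools = summarize(respondents)
print(usage_pct, primary_pct, mean_tools)
```

In the toy data, Tool B is used by every respondent while Tool A is named as the primary tool most often. The slide reports the same kind of split between R and STATISTICA.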
[Chart panels: Overall, Corporate, Consultants, Academics, NGO / Gov’t]



Satisfaction with Data Mining Tools
•  STATISTICA received the highest satisfaction ratings. Consistent with
the 2009 findings, R and SPSS Modeler users are also quite satisfied.
–  About 80% of STATISTICA and R users also report that they are extremely likely to
stay with these primary tools over the next 3 years. This is reported by only 42-45%
of SAS, SPSS Statistics, and SAS-EM users; and only 18% of Weka users.

[Bar chart: satisfaction ratings by tool, 2010 vs. 2009; tools with sample size < 20 are flagged]

Question: Please rate your overall satisfaction with your primary Data Mining software package.
Continued Use question (not graphed): What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?
Data Mining and the Economy
There is a strong market for data mining:
•  73% of data miners foresee increases in the number of data mining projects.
•  Offshoring of data mining is also increasing: 14% of data miners report it this year, up from 8% last year.

[Chart: Number of Data Mining Projects in 2010]

Question: How will the number of data mining projects your organization conducts in 2010 compare to what has been typical in the past few years?
Offshoring Question (not graphed): Has your company moved any data mining or other analytics to another country to take advantage of lower wages in the destination country?
Future Trends in Data Mining

“What do you envision as the primary future trends in data mining?” (open-ended survey question)

Number of respondents:
Growth in Data Mining Adoption 50
Text Mining 32
Social Network Analysis 32
Automation 26
Cloud Computing 15
Data Visualization 15
Tools Get Easier to Use 12
Scaling to Bigger Data 11
How to Get More Information

•  Questions? – Talk with me at PAW


–  Call or email me if you don’t see me in the hallways

•  Copy of these slides – Available now

•  2010 Data Miner Survey Summary Report (Free)
–  Available in early November
–  Available at PAW website or email me

•  Best Practices for overcoming data mining challenges
–  Available in early November at www.RexerAnalytics.com

Karl Rexer, PhD
krexer@RexerAnalytics.com
www.RexerAnalytics.com
617-233-8185