
April 25, 2019 Data Mining: Concepts and Techniques 1

Why Data Mining?

 The Explosive Growth of Data: from terabytes to petabytes


 Data collection and data availability
 Automated data collection tools, database systems, Web,
computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets



Why Mine Data? Commercial Viewpoint

 Lots of data is being collected and warehoused
 Web data, e-commerce
 Purchases at department/grocery stores
 Bank/credit card transactions

 Computers have become cheaper and more powerful


 Competitive Pressure is Strong
 Provide better, customized services for an edge (e.g. in
Customer Relationship Management)
Why Mine Data? Scientific Viewpoint

 Data collected and stored at enormous speeds (GB/hour)
 remote sensors on a satellite
 telescopes scanning the skies
 microarrays generating gene expression data
 scientific simulations generating terabytes of data
 Traditional techniques infeasible for raw data
 Data mining may help scientists
 in classifying and segmenting data
 in hypothesis formation
Examples: What is (not) Data Mining?

 What is not Data Mining?
– Look up phone number in phone directory
 What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
Database Processing vs. Data Mining Processing

 Database Processing
– Query: well defined, SQL
– Data: operational data
– Output: precise, subset of database
 Data Mining Processing
– Query: poorly defined, no precise query language
– Data: not operational data
– Output: fuzzy, not a subset of database
Query Examples
 Database
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.
 Data Mining
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)
– Find all items which are frequently purchased with milk. (association rules)
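
The contrast between the two milk examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `transactions` data, the function names, and the 0.6 confidence threshold are all hypothetical and do not come from the slides. The first function answers a precise database-style query; the second looks for items that co-occur with milk often enough to be interesting.

```python
from collections import Counter

# Hypothetical toy data: customer id -> set of purchased items.
transactions = {
    "c1": {"milk", "bread", "butter"},
    "c2": {"milk", "bread"},
    "c3": {"beer", "chips"},
    "c4": {"milk", "cereal", "bread"},
    "c5": {"bread", "butter"},
}


def customers_who_bought(item):
    """Database-style query: a precise question with an exact answer."""
    return sorted(c for c, basket in transactions.items() if item in basket)


def frequently_bought_with(item, min_confidence=0.6):
    """Mining-style question: which items co-occur with `item` often enough?

    Confidence of (item -> other) is the number of baskets holding both,
    divided by the number of baskets holding `item`.
    The threshold is an assumed value chosen for this toy example.
    """
    baskets = [b for b in transactions.values() if item in b]
    if not baskets:
        return {}
    counts = Counter(other for b in baskets for other in b if other != item)
    return {
        other: n / len(baskets)
        for other, n in counts.items()
        if n / len(baskets) >= min_confidence
    }


milk_buyers = customers_who_bought("milk")
milk_rules = frequently_bought_with("milk")
print(milk_buyers)
print(milk_rules)
```

The query returns a subset of the stored data, while the mining function returns a derived pattern (an item with its confidence) that is not a row in the database.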
Evolution of Database Technology
 1960s:
 Data collection, file processing system, database creation, IMS and
network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 Advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web
databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems
What Is Data Mining?

 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

 Data mining: a misnomer?

 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.



What Is Data Mining?

 Process of semi-automatically analyzing large databases to find patterns that are:
 valid: generalize to the future
 novel: what we don't know
 useful: be able to take some action
 understandable: humans should be able to interpret the pattern

Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. (Wikipedia)



Knowledge Discovery in Databases (KDD) Process
 Data mining: core of the knowledge discovery process
[Figure: KDD pipeline, bottom to top: Databases and flat files; Data Cleaning and Data Integration; Data Warehouse; Selection; Task-relevant Data; Data Mining; Pattern Evaluation]



KDD Process

1) Understand application domain
-Prior knowledge, user goals
2) Create target dataset
-Select data, focus on subsets
3) Data cleaning and transformation
-Remove noise, outliers, missing values
-Select features, reduce dimensions
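
A minimal sketch of the cleaning and transformation step, assuming a tiny hypothetical table of customer records. The column layout, the median-absolute-deviation outlier rule with `k=5.0`, and min-max scaling are illustrative choices, not methods prescribed by the slides.

```python
from statistics import median

# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value.
raw = [
    ("c1", 25, 120.0),
    ("c2", None, 80.0),     # missing age
    ("c3", 41, 95.0),
    ("c4", 38, 10_000.0),   # suspicious outlier in spend
    ("c5", 52, 150.0),
    ("c6", 30, 110.0),
]


def drop_missing(rows):
    """Remove rows that contain any missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def drop_outliers(rows, col, k=5.0):
    """Remove rows whose value in `col` is more than k median absolute
    deviations away from the median (k is an assumed threshold)."""
    values = [r[col] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(rows)
    return [r for r in rows if abs(r[col] - med) <= k * mad]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = drop_outliers(drop_missing(raw), col=2)
kept_ids = [r[0] for r in cleaned]
# Feature selection: keep only the spend column for the later mining step.
scaled_spend = min_max_scale([r[2] for r in cleaned])
print(kept_ids)
print(scaled_spend)
```

Dropping rows is the simplest way to handle missing values and outliers; imputing or capping them are common alternatives when data is scarce.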



KDD Process

4) Apply data mining algorithm
-Associations, sequences, classification, clustering, etc.
5) Interpret, evaluate and visualize patterns
-What's new and interesting?
-Iterate if needed
-Manage discovered knowledge
6) Close the loop
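
The "apply an algorithm, then evaluate" steps can be sketched with a tiny one-dimensional k-means clustering. Everything here is an assumption made for illustration: the `monthly_spend` values are invented, the initial centers are spread evenly between the minimum and maximum so the run is deterministic, and the within-cluster sum of squares is just one possible evaluation measure.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Tiny 1-D k-means (requires k >= 2).

    Initial centers are spread evenly between min and max, an assumption
    made here so that the result is deterministic.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments will not change any more
        centers = new_centers
    return centers, clusters


def within_cluster_ss(centers, clusters):
    """Evaluation: sum of squared distances of members to their center."""
    return sum(
        (v - c) ** 2 for c, members in zip(centers, clusters) for v in members
    )


# Hypothetical monthly spend per customer.
monthly_spend = [20, 22, 25, 24, 90, 95, 100, 21]

centers, clusters = kmeans_1d(monthly_spend, k=2)
overall_mean = sum(monthly_spend) / len(monthly_spend)
baseline_ss = within_cluster_ss([overall_mean], [monthly_spend])
clustered_ss = within_cluster_ss(centers, clusters)
print(centers, clusters)
print(baseline_ss, clustered_ss)
```

Comparing `clustered_ss` with `baseline_ss` is a simple check of whether the discovered grouping explains the data better than treating all customers as one group. If it does not, the process iterates with different features or a different `k`.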



Data Mining and Business Intelligence

[Figure: pyramid of layers; the potential to support business decisions increases from bottom to top. Layers, top to bottom:]
 Decision Making
 Data Presentation: Visualization Techniques
 Data Mining: Information Discovery
 Data Exploration: Statistical Summary, Querying, and Reporting
 Data Preprocessing/Integration, Data Warehouses
 Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles shown beside the layers, top to bottom: End User, Business Analyst, Data Analyst, DBA
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by contributing fields: Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, Other Disciplines]



Applications

 Banking:
 predict good customers based on past transactions
 Customer relationship management:
 identify customers who are likely to purchase the same kinds of items
 Medicine:
 analyze patient disease history to find relationships between diseases
 Web site/store design and promotion:
 find which pages interest visitors and modify the layout accordingly
Data Mining Challenges
 Though data mining is very powerful, it faces many challenges during its implementation.
 The challenges can relate to data, performance, methods, users, etc.
 Tremendous amount of data
 Data must be highly accurate to make the correct decision
 High-dimensionality of data
 Micro-array may have tens of thousands of dimensions
 High complexity of data
 Multimedia data
 Data streams and sensor data
 Time-series data, temporal data, sequence data
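
One reason high dimensionality is hard can be sketched numerically: as the number of dimensions grows, the nearest and farthest neighbors of a point tend to end up at almost the same distance, which weakens distance-based methods. The sketch below assumes uniformly random points in the unit cube; the point count, the dimensions compared, and the `relative_contrast` measure are illustrative choices and are not taken from the slides.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(farthest - nearest) / nearest distance from the first point to all
    other points, for uniform random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)


low_dim = relative_contrast(200, dim=2)
high_dim = relative_contrast(200, dim=1000)
print(low_dim, high_dim)
```

With these settings the contrast in 1000 dimensions comes out far smaller than in 2 dimensions, so "nearest" carries much less information. Real data such as microarrays is not uniformly random, so this is only a rough intuition.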



Data Mining Challenges (contd…)
 Distributed Data
 Data mining demands the development of tools and algorithms that enable mining of distributed data.
 Performance
 If the algorithms and techniques are poorly designed, the performance of the data mining process suffers.
 Incorporation of Background Knowledge
 If background knowledge can be incorporated, more reliable and accurate data mining solutions can be found.
 Data Visualization
 Because the input data and output information are complex, effective data visualization techniques are needed for the process to succeed.
 Data Privacy and Security
 Data mining can raise serious data security and privacy issues.
 For example, when a retailer analyzes purchase details, it reveals information about customers' buying habits and preferences without their permission.
 I am an 8-letter word: the first 4 letters are a question, letters 2, 3 and 4 protect our head, and letters 5, 6 and 7 are a liquid in a tree. Letters 7 and 8 are the same. Who am I?



Whatsapp – cross platform mobile messenger
