By
M. Pranay Teja
Id no. 180030368.
Sec 25.
Data 3 1
What is Data Mining?
• Data mining is the capability to recognise
previously unknown but potentially useful
relationships within large databases or
data warehouses.
Data Mining Tools
• Data mining tools use statistical or rules-based
methods to identify patterns and create predictive
models.
• Tools look for patterns using a variety of models
– Statistical methods e.g. correlation
– Decision trees
– Case based reasoning
– Neural computing
– Intelligent agents
– Genetic algorithms
Text Mining
• Text Mining – Analyze text documents.
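As a minimal sketch of what analysing text documents can mean in practice, the snippet below counts how often each word appears across a small set of documents. The function name `term_frequencies` and the sample sentences are hypothetical, chosen only for illustration; real text mining tools go well beyond word counts.

```python
import re
from collections import Counter


def term_frequencies(documents):
    """Count how often each lowercase word occurs across all documents."""
    counts = Counter()
    for doc in documents:
        # Split on anything that is not a letter; a deliberately crude tokenizer.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


# Hypothetical customer comments.
docs = ["The tent leaked in the rain.", "Great tent, great backpack."]
freq = term_frequencies(docs)
print(freq.most_common(3))
```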
Process of Data Mining/ Knowledge Discovery
Stages of the process, in order:
• Databases
• Data Cleaning and Data Integration
• Task-relevant Data
• Data Mining
• Pattern Evaluation
What does it let you do?
• Data mining automates the process of
sifting through historical data in order to
discover new information.
• Data mining techniques enable users to
identify patterns and correlations within a
set of data.
• These can then be used as predictive
models that anticipate behaviour or events
based on trends in the data.
Correlation versus Causation
• Correlation
– A statistical relation between two or more
variables such that changes in the value of one
variable are accompanied by changes in the value
of the other
• Causation
– Changes in one variable cause changes in another.
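A short sketch of how a correlation can be measured, using the Pearson correlation coefficient. The data below is hypothetical. A coefficient near 1 or -1 shows only that the two variables move together; on its own it says nothing about whether one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: daily temperature and ice cream sales.
temperature = [18, 21, 24, 27, 30]
ice_cream_sales = [20, 26, 32, 38, 44]
r = pearson(temperature, ice_cream_sales)
print(round(r, 3))
```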
What do you need for Data Mining?
Five Basic Operations
• Clustering
– Identifies groups of items that share a particular characteristic
• Classification
– Infers the defining characteristics of a certain group
• Association
– Identifies relationships between events that occur at the same
time
• Sequencing
– Identifies relationships between events over time
• Forecasting
– Estimates future values based on patterns within large sets of
data
Clustering
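One possible illustration of clustering is k-means, sketched here in one dimension. The slides do not name a specific algorithm, so k-means, the function name `kmeans_1d`, the income figures and the starting centres are all assumptions made for this example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest centre, then move each centre to its group's mean."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical customer incomes (in thousands) and two starting guesses.
incomes = [20, 22, 25, 60, 62, 65]
centers, clusters = kmeans_1d(incomes, centers=[20, 65])
print(clusters)
```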
Classification
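As a rough sketch of classification, the snippet below learns a single threshold rule (a one-level decision tree, often called a decision stump) that separates two groups. The choice of a stump, the name `fit_stump` and the data are assumptions for illustration only.

```python
def fit_stump(values, labels):
    """Find the threshold t with the fewest errors for the rule: value >= t means True."""
    best_t, best_errors = None, len(values) + 1
    for t in sorted(set(values)):
        errors = sum((v >= t) != label for v, label in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical training data: income (in thousands) and whether the customer
# bought the premium product.
income = [15, 22, 30, 48, 55, 70]
bought_premium = [False, False, False, True, True, True]
threshold = fit_stump(income, bought_premium)

# Classify a new, unseen customer with the learned rule.
new_customer_buys = 60 >= threshold
print(threshold, new_customer_buys)
```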
Association
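Association between items bought together is commonly summarised with two measures, support and confidence. The sketch below computes both for a hypothetical set of shopping baskets; the basket contents and function names are illustrative assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction also containing `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, one set of items per transaction.
baskets = [
    {"tent", "backpack"},
    {"tent", "stove"},
    {"tent", "backpack", "stove"},
    {"boots"},
]
s = support(baskets, {"tent", "backpack"})
c = confidence(baskets, {"tent"}, {"backpack"})
print(s, round(c, 3))
```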
Sequential Analysis
• A form of association used to track
relationships over time.
– E.g. health insurance claims.
– E.g. 10% of customers who bought a tent bought a
backpack within one month.
– Weather patterns e.g. tidal wave in Hawaii follows
hurricane in N. Atlantic x% of the time.
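The tent and backpack example can be sketched as a simple calculation: of the customers who bought the first item, what fraction bought the second within a given time window afterwards? The purchase histories, customer ids and the 30-day window below are hypothetical.

```python
from datetime import date, timedelta


def followed_within(purchases, first, second, window):
    """Fraction of buyers of `first` who bought `second` within `window` of their first purchase."""
    buyers = 0
    followers = 0
    for history in purchases.values():
        first_dates = [d for item, d in history if item == first]
        if not first_dates:
            continue
        buyers += 1
        start = min(first_dates)
        if any(item == second and start <= d <= start + window for item, d in history):
            followers += 1
    return followers / buyers if buyers else 0.0


# Hypothetical purchase histories keyed by customer id.
purchases = {
    "c1": [("tent", date(2023, 5, 1)), ("backpack", date(2023, 5, 20))],
    "c2": [("tent", date(2023, 5, 3))],
    "c3": [("tent", date(2023, 6, 1)), ("backpack", date(2023, 8, 15))],
    "c4": [("backpack", date(2023, 5, 2))],
}
rate = followed_within(purchases, "tent", "backpack", timedelta(days=30))
print(round(rate, 3))
```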
Forecasting
• Concerns the prediction of continuous variables
e.g. sales, share values, stock market levels, oil
prices etc.
• Often done with regression functions: statistical
methods for examining the relationship between
variables in order to predict a future value.
• Two types:
– Forecasting a single continuous value based on
unordered examples, e.g. predicting income based on
personal details.
– Predicting one or more values based on a sequential
pattern, i.e. time series forecasting.
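A minimal sketch of time series forecasting with a regression function: fit a straight line to past values by ordinary least squares and extend it one step ahead. The monthly sales figures are hypothetical, and a straight line is only one of many possible regression models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]
slope, intercept = fit_line(months, sales)

# Forecast the value for month 6 by extending the fitted line.
forecast = slope * 6 + intercept
print(forecast)
```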
Data Mining Tools in more detail
• Case-based Reasoning
– Use historical cases to identify patterns.
• Neural Computing
– Examine historical data for pattern recognition e.g.
identify potential customers for a new product.
• Intelligent agents
– Retrieve information from large databases.
Some Key Application Areas
• Data mining is used in many different areas
• Two big areas are:
– Market analysis and management
• Initial data is gathered from:
credit card transactions, loyalty cards, discount coupons,
customer complaint calls, lifestyle studies, focus groups
Examples
• Target marketing
– Find clusters of “model” customers who share the same characteristics, e.g.
interests, income
• Determining customer purchasing patterns over time
• Cross-market analysis
– Uses associations/correlations between product sales and makes
predictions based on the association information
• Customer profiling
– What types of customers buy what products
• Identifying customer requirements
– Identify the best products for different customers; use prediction to find
what factors will attract new customers
Fraud detection and management
Text Mining
Some more example applications by area
• Airlines: capturing data on where customers are
flying and the final destination of those who change
carriers mid-journey.
Conclusion
• Data mining is an attractive-sounding
technology which is still evolving.
• The key is that the algorithms discover useful
relationships.
– This is unlike standard research, where researchers
hypothesise correlations and then search for
them.
• There are ethical issues:
– E.g. Criminal profiling.