By Saurabh Jain
General Concept of Data Mining
• Most organizations have accumulated a great
deal of data, but what they really want is
information.
• Data mining is the process of finding
correlations or patterns among dozens of
fields in large relational databases.
General Concept of Data Mining
(cont’d)
Algorithms
• Associations
• Classifications
• Sequential discovery
• Clustering
Technologies
• Neural networks
• Decision trees
• Rule induction
• Data visualization
Algorithms
1. Associations
- This is used to identify items that occur
together in a given event or record.
- This technique is often used for market
analysis, to find rules hidden between the
attributes.
Ex. “When people buy a hammer they also buy
nails 50% of the time.”
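
As a minimal sketch of how a figure like "50% of the time" could be computed, the snippet below counts how often one item appears in records that contain another. The purchase records and the function name `rule_confidence` are made up for illustration and are not taken from the slides.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


# Hypothetical purchase records, one set of items per transaction.
transactions = [
    {"hammer", "nails"},
    {"hammer", "nails", "saw"},
    {"hammer", "gloves"},
    {"hammer"},
    {"nails", "paint"},
]

confidence = rule_confidence(transactions, "hammer", "nails")
print(f"hammer -> nails: {confidence:.0%}")
```

With this toy data, two of the four hammer purchases also include nails, which is where a statement of the form "X buyers also buy Y some percentage of the time" comes from.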
Algorithms (cont’d)
2. Classifications
This is used to classify database records
into a number of predefined classes based
on certain criteria.
Ex. “Customers with excellent credit history
have a debt/equity ratio of less than 10%”
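
One way to picture classification is a function that assigns each record to a predefined class based on a criterion. The sketch below is only illustrative: the 10% cutoff echoes the example above, while the other cutoff, the class names "fair" and "poor", and the customer records are assumptions.

```python
def classify_customer(debt_equity_ratio):
    """Assign one of three predefined classes from a single criterion."""
    if debt_equity_ratio < 0.10:   # cutoff borrowed from the slide's example
        return "excellent"
    if debt_equity_ratio < 0.50:   # hypothetical cutoff, for illustration only
        return "fair"
    return "poor"


# Hypothetical database records.
customers = [
    {"name": "A", "debt_equity_ratio": 0.05},
    {"name": "B", "debt_equity_ratio": 0.30},
    {"name": "C", "debt_equity_ratio": 0.80},
]

for customer in customers:
    label = classify_customer(customer["debt_equity_ratio"])
    print(customer["name"], label)
```

In practice the criteria would be learned from records whose class is already known rather than written by hand as they are here.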
Algorithms (cont’d)
3. Sequential Discovery
This helps identify patterns in time series.
4. Clustering
This is used to segment the database into
different clusters, based on a set of
attributes.
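
The slides do not name a particular clustering method. As a rough sketch, the code below uses k-means, chosen here only because it is a familiar example, to segment records on a single numeric attribute. The function `kmeans_1d`, the ages, and the starting centers are all assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means on one numeric attribute; returns (centers, clusters)."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # stop once the centers no longer move
            break
        centers = new_centers
    return centers, clusters


# Hypothetical customer ages and starting centers.
ages = [22, 25, 27, 61, 64, 68]
centers, clusters = kmeans_1d(ages, centers=[20, 70])
print(clusters)
```

Unlike classification, no classes are defined in advance; the two segments emerge from the data itself.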
2. Decision Trees (cont’d)
- Some algorithms (e.g. ID3) cannot use
continuous data directly; it must be
discretized first
- Used for model understanding rather than
prediction
Technologies (cont’d)
3. Rule induction
All possible patterns in the database are
systematically pulled out and then the
accuracy and the coverage are calculated.
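
The slides do not define accuracy and coverage. The sketch below assumes that accuracy is the share of records matching the IF part that also match the THEN part, and that coverage is the share of all records matching the IF part. It scores a single rule rather than enumerating every possible pattern, and the baskets and the name `rule_stats` are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Return (accuracy, coverage) for the rule IF antecedent THEN consequent."""
    total = len(records)
    # Records where every item of the IF part is present.
    matches_if = [r for r in records if antecedent <= r]
    # Of those, records where the THEN part is present as well.
    matches_both = [r for r in matches_if if consequent <= r]
    coverage = len(matches_if) / total if total else 0.0
    accuracy = len(matches_both) / len(matches_if) if matches_if else 0.0
    return accuracy, coverage


# Hypothetical shopping baskets.
baskets = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"cereal", "milk", "eggs"},
    {"cereal", "juice"},
    {"bread", "milk"},
    {"eggs"},
    {"juice", "bread"},
    {"milk"},
]

accuracy, coverage = rule_stats(baskets, {"cereal"}, {"milk"})
print(f"IF cereal, then milk: accuracy {accuracy:.0%}, coverage {coverage:.0%}")
```

A full rule induction system would repeat this calculation for every candidate rule and keep those whose accuracy and coverage are high enough to be useful.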
Ex.
IF breakfast cereals, then milk: accuracy 90%, coverage 15%
IF Friday and male and diapers, then beer: