DATA MINING AND WAREHOUSING

By,
T.KARTHI,
S.KARTHIKEYAN.
CONTENT
• Process of KDD,
• Architecture of a data mining system,
• Different levels of knowledge,
• Functionalities of data mining,
• Multiple disciplines of data mining,
• Major issues in data mining,
• Application.
INTRODUCTION
• Data mining is the discovery of interesting
patterns in large amounts of data.
• It is the process of finding correlations or patterns
among dozens of fields in large relational
databases.
• A KDD process includes data cleaning, data
integration, data selection, transformation,
data mining, pattern evaluation and knowledge
presentation.
TO KNOW
Data Warehousing:
• Process of centralized data management and
retrieval.
• A data warehouse is typically built on a relational
database management system (RDBMS) and is designed
for query and analysis rather than for transaction
processing.
Data mining:
• Provides a way to get at the information buried in
the data.
KDD Process
KDD – Knowledge Discovery in Databases.
• Extraction of interesting information or
patterns from data in large databases.
• Finding the right method to do data mining.
• It’s of interest to researchers in machine
learning, pattern recognition, databases,
statistics, artificial intelligence and data
visualization.
STEPS
• Developing an understanding of the goals of the end
user.
• Selecting a data set, or focusing on a subset of
variables or data samples.
• Data cleaning and preprocessing.
  - Removal of noise.
  - Strategies for handling missing data fields.
• Data reduction and projection.
  - Dimensionality reduction or transformation methods.
• Choosing the data mining task.
• Choosing the data mining algorithm(s).
  1) Searching for patterns.
  2) Deciding which models and
     parameters may be appropriate.
  3) Matching a particular data mining method
     to the overall goals of the KDD process.
• Data mining.
• Interpreting mined patterns.
• Consolidating discovered knowledge.
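The steps above can be sketched as a small pipeline. This is only a minimal illustration in plain Python, not a reference implementation: the record fields (`age`, `income`, `bought`), the sample values and the helper names `select`, `clean`, `mine` and `run_kdd` are all hypothetical. The "mining" step is deliberately trivial (a per-class mean).

```python
from statistics import mean

# Hypothetical raw records; None marks a missing data field.
RAW = [
    {"id": 1, "age": 25, "income": 30000, "bought": "no"},
    {"id": 2, "age": None, "income": 52000, "bought": "yes"},
    {"id": 3, "age": 47, "income": 58000, "bought": "yes"},
    {"id": 4, "age": 35, "income": None, "bought": "no"},
    {"id": 5, "age": 52, "income": 61000, "bought": "yes"},
]

def select(records, fields):
    """Selection / projection: keep only the variables relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records, numeric_fields):
    """Cleaning: fill each missing numeric field with the mean of the known values."""
    cleaned = [dict(r) for r in records]
    for f in numeric_fields:
        known = [r[f] for r in cleaned if r[f] is not None]
        fill = mean(known)
        for r in cleaned:
            if r[f] is None:
                r[f] = fill
    return cleaned

def mine(records, field, target):
    """Data mining (toy): mean of `field` for each class of `target`."""
    groups = {}
    for r in records:
        groups.setdefault(r[target], []).append(r[field])
    return {label: mean(values) for label, values in groups.items()}

def run_kdd(raw):
    """Chain the steps: select -> clean -> mine."""
    selected = select(raw, ["age", "income", "bought"])
    cleaned = clean(selected, ["age", "income"])
    return mine(cleaned, "income", "bought")

pattern = run_kdd(RAW)
print(pattern)
```

Interpreting the result (here, buyers in the toy data have a higher mean income than non-buyers) and deciding whether it is worth keeping correspond to the last two steps of the list.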
Schematic representation of KDD process
Data mining functionalities (1)
Association rule mining:
Finding items or attribute values that frequently
occur together.
Classification and prediction:
Finding models that describe and
distinguish classes.
-e.g., classify countries based on climate.
-Presentation: decision tree, classification rule,
neural network.
-Prediction: predict some unknown or missing
numerical values.
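As a rough illustration of classification, the sketch below learns a one-level decision tree (a "decision stump") from labeled examples and uses it to label new ones. The training data (mean annual temperature with a climate class) is made up for the example and is not real measurement data; `fit_stump` and `predict` are hypothetical helper names.

```python
# Hypothetical training data: (mean annual temperature in deg C, climate class).
TRAIN = [
    (27.0, "tropical"), (26.0, "tropical"), (25.0, "tropical"),
    (12.0, "temperate"), (10.0, "temperate"), (14.0, "temperate"),
]

def fit_stump(samples):
    """Pick the threshold and label pair that misclassify the fewest samples."""
    values = sorted({x for x, _ in samples})
    labels = sorted({y for _, y in samples})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below in labels:
            for above in labels:
                if below == above:
                    continue
                errors = sum(
                    1 for x, y in samples
                    if (below if x <= threshold else above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above

def predict(stump, x):
    """Apply the learned rule: 'if x <= threshold then below else above'."""
    threshold, below, above = stump
    return below if x <= threshold else above

stump = fit_stump(TRAIN)
print(stump)
print(predict(stump, 30.0), predict(stump, 5.0))
```

The learned stump is itself a classification rule of the form "if temperature <= threshold then class A else class B", which is the simplest case of the decision-tree presentation mentioned above.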
Data mining functionalities (2)
Cluster analysis:
Class label is unknown.
Clustering based on the principle: maximize
intra-class similarity and minimize inter-class
similarity.
Methods:
Partitioning: k-means, k-medoids, CLARANS
Hierarchical: BIRCH, CURE
Density-based: DBSCAN, OPTICS, DENCLUE, CLIQUE
Grid-based: STING, WaveCluster
Model-based: AutoClass, COBWEB
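To make the partitioning approach concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It assumes 2-D points, Euclidean distance, and the first k points as initial centroids, which keeps the run deterministic but is not how production implementations usually initialize. The sample points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, k, iterations=10):
    """Group points into k clusters; no class labels are used."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic initialization
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data with two visually obvious groups.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(POINTS, k=2)
print(centroids)
print(clusters)
```

Each iteration pulls points that are close together into the same cluster, which is one practical way of pursuing high similarity within a cluster and low similarity between clusters.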
Data mining functionalities (3)
Outlier analysis
Outlier: a data object that does not comply with the
general behavior of the data.
It can be considered as noise but is quite useful
in fraud detection and rare event analysis.
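A simple way to flag such objects in one numeric attribute is a robust score based on the median. The sketch below uses the modified z-score with the commonly quoted cutoff of 3.5. The slides do not name a specific method, so this rule is an assumption chosen for illustration, and the transaction amounts are hypothetical.

```python
from statistics import median

def find_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds the cutoff."""
    med = median(values)
    # Median absolute deviation (MAD): a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    return [v for v in values if abs(0.6745 * (v - med) / mad) > cutoff]

# Hypothetical transaction amounts; one of them looks suspicious.
AMOUNTS = [20, 22, 19, 21, 23, 20, 500, 18]
suspicious = find_outliers(AMOUNTS)
print(suspicious)
```

In a fraud-detection setting the flagged values would be passed on for review rather than discarded as noise.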
Trend and evolution analysis
Trend and deviation: regression analysis.
Sequential pattern mining, periodicity analysis.
Similarity-based analysis.
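For trend analysis, the simplest regression model is a straight line fitted by least squares. The following sketch assumes a hypothetical monthly sales series; `fit_line` is an illustrative helper, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical series: month number and units sold.
MONTHS = [1, 2, 3, 4, 5]
SALES = [10, 12, 15, 15, 18]

slope, intercept = fit_line(MONTHS, SALES)
forecast_month_6 = intercept + slope * 6
print(slope, intercept, forecast_month_6)
```

A positive slope indicates an upward trend; the gap between an observed value and the fitted line is its deviation from the trend.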
Major issues in data mining

• Mining methodology and user interaction
• Performance and scalability
• Issues relating to the diversity of data types
• Issues relating to applications and social impacts
Applications
Database analysis and decision support
1)Market analysis and management,
2)Risk analysis and management,
3)Fraud detection and management.
Other applications
1)Text mining and web analysis,
2)Intelligent query answering,
3)Sports,
4)Astronomy.
Queries?
