
DATA MINING AND ITS TECHNIQUES

Data mining is the analysis step of the knowledge discovery in databases (KDD) process and a relatively young, interdisciplinary field of computer science. It is the process of discovering new patterns from large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics and database systems.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records, unusual records and dependencies. This usually involves using database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data and used in further analysis or, for example, in machine learning and predictive analytics. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step, but they do belong to the overall KDD process as additional steps.

Extract, transform, and load transaction data onto the data warehouse system. Store and manage the data in a multidimensional database system. Provide data access to business analysts and information technology professionals. Analyze the data by application software. Present the data in a useful format, such as a graph or table.

Data Mining Techniques

Descriptive

Predictive

Clustering

Classification

Association

Decision Tree

Sequential Analysis

Rule Induction

Neural Networks

Nearest Neighbor Classification

Regression

Automatic Cluster Detection is useful to find better behaved clusters of data within a larger dataset; seeing the forest without getting lost in the trees. When used for directed data mining

In marketing, clusters are referred to as segments. Customer segmentation is a popular application of clustering.
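As a rough illustration of customer segmentation, the sketch below uses k-means, one common cluster detection algorithm. The choice of k-means, the function name `kmeans`, and the customer data (age, yearly spend in thousands) are all assumptions made for this example and do not come from the slides.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers as (age, yearly spend in thousands).
customers = [(22, 5), (25, 6), (24, 4), (55, 30), (60, 32), (58, 28)]
centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
```

Each entry of `segments` is one customer segment. The starting centroids are picked by hand here; real tools usually choose them randomly or with a seeding heuristic.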

HOW DOES A DECISION TREE WORK?


Suppose an automobile company uses data mining software to discover the purchasing patterns of its customers.

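Decision tree (figure):
Age < 30: test Car Type
    Minivan: YES
    Sports, Truck: NO
Age >= 30: NO

Second panel (figure): the same YES/NO regions (Minivan YES; Sports, Truck NO; NO) drawn against an Age axis with ticks at 30 and 60.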
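The tree in the figure can be read as nested tests applied from the root downward. A minimal sketch follows; the function name `classify` is hypothetical, and the meaning of the YES/NO label is left exactly as the figure gives it.

```python
def classify(age, car_type):
    """Walk the tree from the root: test Age first, then Car Type."""
    if age < 30:
        # Left branch: the outcome depends on the car type.
        if car_type == "Minivan":
            return "YES"
        if car_type in ("Sports", "Truck"):
            return "NO"
        raise ValueError(f"car type not covered by the tree: {car_type!r}")
    # Right branch (Age >= 30): the tree answers NO without looking at the car type.
    return "NO"

print(classify(25, "Minivan"))
print(classify(25, "Truck"))
print(classify(45, "Minivan"))
```

Car types that the figure does not list raise an error instead of being guessed.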

In memory-based reasoning, data mining or problem solving is done by reusing past experiences.
Experts often find it easier to relate stories about past cases than to formulate rules.

Some biologists suggest that elephants succeed in harsh environments because of their memory. A herd of elephants retains a collective memory of problems and their solutions. Example: from past experience, they remember where water can usually be found during a drought.
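Reusing past experience can be sketched as a nearest-neighbour lookup: find the stored case most similar to the new one and reuse its outcome. The Euclidean distance measure, the function names, and the past cases (age, income in thousands, outcome) are assumptions made for this illustration.

```python
import math

def most_similar_case(past_cases, new_case):
    """Return the stored (features, outcome) pair closest to new_case."""
    return min(past_cases, key=lambda case: math.dist(case[0], new_case))

def solve_by_memory(past_cases, new_case):
    """Reuse the outcome of the most similar past case."""
    _features, outcome = most_similar_case(past_cases, new_case)
    return outcome

# Hypothetical memory of past cases: ((age, income in thousands), outcome).
past_cases = [
    ((24, 35), "YES"),
    ((27, 40), "YES"),
    ((45, 80), "NO"),
    ((52, 95), "NO"),
]
print(solve_by_memory(past_cases, (26, 38)))
```

No rules are written down anywhere; the stored cases themselves are the model.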

Link analysis is very useful for finding patterns from relationships. For example, telephone calls connect people and establish relationships, airlines link cities together, and so on. This technique also mines relationships and discovers knowledge. For example, if you look at the supermarket sales transactions for one day, why are skim milk and brown bread found in the same transaction about 80% of the time? Is there a strong relationship between the two products in the supermarket basket? If so, can these products be promoted together? Are there more such combinations? How can we find such links?

Depending upon the types of knowledge discovery, link analysis techniques have three types of applications:
a. Associations discovery
b. Sequential pattern discovery
c. Similar time sequence discovery

Associations are affinities between items. These algorithms find combinations where the presence of one or more items suggests the presence of another. For example, when you apply these algorithms to the shopping transactions at a supermarket, they will uncover links among products that are likely to be purchased together. Two measures, the support factor and the confidence factor, indicate the strength of the association. Rules with high support and confidence factor values are more valid, relevant and useful.
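Support and confidence can be computed directly from a list of baskets. The sketch below uses the usual definitions (support: share of transactions containing all the items; confidence: support of the whole rule divided by support of its left-hand side). The basket data and the function names are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """How often the consequent appears in baskets that contain the antecedent.

    Assumes the antecedent occurs in at least one basket.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical one-day sample of supermarket baskets.
baskets = [
    {"skim milk", "brown bread"},
    {"skim milk", "brown bread"},
    {"skim milk", "brown bread", "eggs"},
    {"skim milk", "brown bread"},
    {"skim milk", "eggs"},
]
rule_support = support(baskets, {"skim milk", "brown bread"})
rule_confidence = confidence(baskets, {"skim milk"}, {"brown bread"})
```

In this toy sample, four of the five baskets contain both products, so the rule "skim milk implies brown bread" has a support of 0.8.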

Sequential pattern discovery algorithms find patterns where one set of items follows another specific set. Time plays a role in these patterns. When you select records for analysis, you must have date and time as data items to enable discovery of sequential patterns. Typical discoveries include associations of the following types:
Purchase of a digital camera is followed by purchase of a color printer 60% of the time.
Purchase of window curtains is followed by purchase of living room furniture 50% of the time.
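A "followed by" percentage like the ones above can be estimated from dated purchase records. This is only a sketch: the record layout (customer, date, item), the function name `followed_by_rate`, and the sample data are assumptions.

```python
from datetime import date

def followed_by_rate(purchases, first, then):
    """Share of customers who bought `first` and later bought `then`."""
    by_customer = {}
    for customer, when, item in purchases:
        by_customer.setdefault(customer, []).append((when, item))
    bought_first = 0
    followed = 0
    for history in by_customer.values():
        first_dates = [when for when, item in history if item == first]
        if not first_dates:
            continue
        bought_first += 1
        earliest = min(first_dates)
        # Count the customer only if `then` was bought strictly after `first`.
        if any(item == then and when > earliest for when, item in history):
            followed += 1
    return followed / bought_first if bought_first else 0.0

# Hypothetical purchase log.
purchases = [
    ("ann", date(2024, 1, 5), "digital camera"),
    ("ann", date(2024, 2, 1), "color printer"),
    ("bob", date(2024, 1, 9), "digital camera"),
    ("bob", date(2024, 3, 2), "color printer"),
    ("cat", date(2024, 1, 20), "digital camera"),
    ("cat", date(2024, 1, 25), "color printer"),
    ("dan", date(2024, 2, 3), "digital camera"),
    ("eve", date(2024, 1, 2), "color printer"),
    ("eve", date(2024, 2, 10), "digital camera"),
]
rate = followed_by_rate(purchases, "digital camera", "color printer")
```

The date comparison is what separates this from plain association discovery: a printer bought before the camera does not count.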

Similar time sequence discovery depends on the availability of time sequences. In the previous technique, the results indicate sequential events over time. This technique, however, finds a sequence and then comes up with other similar sequences of events. For example, in retail department stores, this data mining technique comes up with a second department that has a sales stream similar to the first. Finding similar sequential price movements is another application of this technique.
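One simple way to look for a department with a similar sales stream is to compare series by correlation. Pearson correlation is an assumed similarity measure here (the slides do not name one), and the department names and weekly figures are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def most_similar_stream(streams, reference):
    """Name of the stream whose shape best matches the reference stream."""
    others = [name for name in streams if name != reference]
    return max(others, key=lambda name: pearson(streams[reference], streams[name]))

# Hypothetical weekly sales per department.
weekly_sales = {
    "toys": [10, 12, 15, 20, 30],
    "games": [21, 25, 29, 41, 59],
    "garden": [30, 25, 20, 12, 8],
}
print(most_similar_stream(weekly_sales, "toys"))
```

Correlation compares the shape of the streams, not their absolute level, so a department selling twice as much can still count as similar.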

Neural networks mimic the human brain by learning from a training dataset and applying that learning to generalize patterns for classification and prediction. The basic unit of a neural network is modeled after the neurons in the brain and is known as a node. The other structure is the link, which corresponds to the connection between neurons in the brain.
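A single node with its incoming links can be sketched as a weighted sum passed through an activation function. The sigmoid activation, the function name `node_output`, and the sample weights are assumptions for illustration; training (adjusting the weights from data) is not shown.

```python
import math

def node_output(inputs, weights, bias):
    """One node: each link carries an input scaled by its weight."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid squashes the weighted sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))

# Two incoming links with hypothetical weights.
print(node_output([1.0, 1.0], weights=[2.0, 2.0], bias=0.0))
```

Learning in a network amounts to adjusting these link weights until the outputs match the training data.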

Genetic algorithms have a basis in biology. It is said that evolution and natural selection promote the survival of the fittest. A genetic algorithm uses a highly iterative process of selection, crossover and mutation operators to evolve successive generations of models.

Figure: cycle of Population A, Population B, Population C and Population D, connected by the Evaluation, Selection, Crossover and Mutation steps.

Population: possible solutions of the problem. Traditionally represented as bit-strings (e.g. each bit associated with a feature, indicating whether it is selected or not). Each bit of an individual is called a gene.

Evaluation: giving a goodness value to each individual in the population.

Selection: process that rewards good individuals. Good individuals will survive and get more than one copy in the next population. Bad individuals will disappear.
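The evaluation, selection, crossover and mutation steps can be put together in a small loop over bit-string individuals. This is a minimal sketch under stated assumptions: the goodness value (number of 1-bits), tournament selection, single-point crossover, and all parameter values are illustrative choices, not taken from the slides.

```python
import random

def evaluate(individual):
    """Goodness value: number of 1-bits (a stand-in fitness function)."""
    return sum(individual)

def select(population, rng):
    """Tournament selection: good individuals may be copied more than once."""
    return [max(rng.choice(population), rng.choice(population), key=evaluate)
            for _ in population]

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two parents."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(individual, rate, rng):
    """Flip each gene (bit) independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]

def evolve(pop_size=20, genes=12, generations=30, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    best = max(population, key=evaluate)
    for _ in range(generations):
        parents = select(population, rng)
        children = []
        for i in range(0, pop_size - 1, 2):
            children.extend(crossover(parents[i], parents[i + 1], rng))
        if len(children) < pop_size:
            children.append(parents[-1])  # odd population size: carry one parent over
        population = [mutate(child, mutation_rate, rng) for child in children]
        # Remember the best individual seen so far across all generations.
        best = max(population + [best], key=evaluate)
    return best

best = evolve()
print(best, evaluate(best))
```

The fixed seed only makes runs repeatable. In a feature-selection setting, `evaluate` would instead score how well a model performs with the features whose bits are set.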

THANK YOU
