
Data Mining Tools

Done by Muthurathna G (10AD19)

Association Analysis

Association analysis uncovers hidden patterns, correlations, or causal structures among a set of items or objects.

For example, association analysis enables you to understand what products and services customers tend to purchase at the same time.
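As a rough illustration of "purchased at the same time", the sketch below counts how often item pairs occur together. Support and confidence are the usual measures for this, although the slides do not name them, and the baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets containing `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
print(support[("bread", "milk")])                 # share of baskets with both items
print(confidence(transactions, "bread", "milk"))  # share of bread baskets that also hold milk
```

A pair with high support and high confidence is a candidate "bought together" rule; real tools add thresholds and handle larger item sets.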

Data Mining > expand Association Analysis > right-click Association Analysis > Create Model (in SAP Easy Access)

Choose the Model Name and Description

The screen shows the list of fields; each field can be selected for the model or excluded from it.

A few of the analyses that can be done:

- Sequence analysis
- Link analysis
- Unique data analysis

Test for Association using SPSS


Example: we want to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books). We therefore have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).

Procedure in SPSS: click Analyze > Descriptive Statistics > Crosstabs... on the top menu.
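An association between two nominal variables like these is commonly tested with Pearson's chi-square statistic on the cross-tabulated counts. The sketch below computes that statistic by hand in Python for a 2x2 table. The counts are invented for illustration, and no continuity correction is applied.

```python
# Hypothetical counts, invented for illustration (not from any real survey).
# Rows: gender (male, female); columns: preferred medium (online, books).
observed = [
    [30, 20],
    [20, 30],
]

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic

print(chi_square(observed))
```

For a 2x2 table (one degree of freedom), a statistic above 3.841 indicates an association at the 5% significance level.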

K-Means

A simple learning algorithm for cluster analysis.
The goal of the K-Means algorithm is to reduce the total distance between each group's members and the group's centroid.

Steps involved in the K-Means clustering algorithm


1. The given sample set is first randomly distributed among the k clusters.
2. The distance from each sample to its cluster centroid is calculated.
3. The shortest distance from each sample to a cluster centroid is recorded.
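The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance, and it repeats the steps until no sample changes cluster, which the slides do not spell out. The function names, the fixed seed, and the handling of empty clusters are also assumptions.

```python
import math
import random

def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def k_means(samples, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly distribute the samples among the k clusters.
    labels = [rng.randrange(k) for _ in samples]
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random sample.
            centroids.append(centroid(members) if members else rng.choice(samples))
        # Steps 2-3: measure the distance from each sample to every centroid
        # and record the cluster with the shortest distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:  # no sample moved, so the clustering is stable
            break
        labels = new_labels
    return labels, centroids

# Hypothetical 2-D samples forming two well-separated groups.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(samples, k=2)
print(labels)
```

Which group receives label 0 or 1 depends on the random start, so only the grouping itself is meaningful.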

Distance Types
Euclidean distance: Takes into account the difference between two samples directly, based on the magnitude of changes in the sample levels.

Manhattan distance: Also known as city-block distance, this distance measurement is especially relevant for discrete data sets. While the Euclidean distance corresponds to the length of the shortest path between two samples (i.e. as the crow flies), the Manhattan distance is the sum of the distances along each dimension (i.e. walking round the block).

Pearson Correlation distance: Based on the Pearson correlation coefficient r, which is calculated from the sample values and their standard deviations.

Absolute Pearson Correlation distance: The corresponding distance lies between 0 and 1, just like the absolute value of the correlation coefficient. The equation for the Absolute Pearson distance da is: da = 1 - |r|

Un-centered Correlation distance: The un-centered correlation coefficient lies between -1 and +1; hence the distance lies between 0 and 2.

Kendall's τ (tau) distance: This non-parametric distance measurement is more useful in identifying samples with a huge deviation in a given data set.
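Most of these distance types are short enough to write out directly. The Python sketch below does so, with some assumptions. The Pearson and un-centered distances are taken as one minus the respective coefficient, which is the usual convention but is not stated in the slides. The Absolute Pearson distance uses the absolute value of r so that it stays between 0 and 1. Kendall's tau is left out for brevity.

```python
import math

def euclidean(x, y):
    """Straight-line ("as the crow flies") distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block distance: sum of the absolute differences per dimension."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def uncentered_r(x, y):
    """Un-centered correlation: like Pearson's r, but the means are not subtracted."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def pearson_distance(x, y):
    return 1.0 - pearson_r(x, y)       # assumed convention; ranges from 0 to 2

def absolute_pearson_distance(x, y):
    return 1.0 - abs(pearson_r(x, y))  # ranges from 0 to 1

def uncentered_distance(x, y):
    return 1.0 - uncentered_r(x, y)    # ranges from 0 to 2

# Hypothetical samples that are perfectly anti-correlated.
x, y = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
print(euclidean(x, y), manhattan(x, y))
print(pearson_distance(x, y), absolute_pearson_distance(x, y))
```

For the anti-correlated pair, the Pearson distance is at its maximum while the Absolute Pearson distance is close to zero, which shows why the choice of distance changes which samples count as "close".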

Neural Networks

Introduction
- An artificial neural network is a mathematical or computational model based on biological neural networks.
- It consists of an interconnected group of artificial neurons.
- Advantages include a low error rate and the continuing advance and optimization of network training algorithms, network pruning algorithms, and rule-extraction algorithms.

NEURAL NETWORK METHOD IN DATA MINING

The neural network method is used for classification, clustering, feature mining, prediction, and pattern recognition.

Neural network models can be broadly divided into:
(1) Feed-forward networks
(2) Feedback networks
(3) Self-organization networks
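To make the first category concrete, here is a minimal sketch of a feed-forward pass in Python, where signals flow from the inputs through each layer to the outputs with no loops. The sigmoid activation, the layer sizes, and the hand-picked weights are assumptions for illustration. A real network would learn its weights from data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Pass the inputs through every layer in order, with no feedback."""
    activation = inputs
    for weights, biases in layers:
        activation = layer(activation, weights, biases)
    return activation

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer weights and biases
    ([[1.0, -1.0]], [0.0]),                    # output layer weights and bias
]
output = feed_forward([1.0, 0.0], network)
print(output)
```

A feedback network would differ by routing some outputs back in as inputs on the next step.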

DATA MINING PROCESS BASED ON NEURAL NETWORK


The process is composed of 3 phases:
1. Data preparation
2. Rules extracting
3. Rules assessment

Data preparation includes four processes:
1) Data cleaning
2) Data option (selection)
3) Data preprocessing
4) Data expression

Testing
- Black-box testing, which compares test results to historical results, is the primary approach for verifying that inputs produce the appropriate outputs.
- Error terms can be used to compare results against known benchmark methods.
- The network is generally not expected to perform perfectly.
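One simple error term for such a comparison is the mean squared error between the outputs and the historical results. The Python sketch below computes it for a network and for a benchmark method. All values are invented for illustration, and the choice of mean squared error is an assumption, since the slides do not name a specific error term.

```python
def mean_squared_error(predicted, expected):
    """Average squared difference between predictions and known results."""
    if len(predicted) != len(expected):
        raise ValueError("both sequences must have the same length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)

# Hypothetical values, invented for illustration only.
historical = [1.0, 0.0, 1.0, 1.0]         # known past outcomes
network_outputs = [0.9, 0.2, 0.8, 0.6]    # what the trained network produced
benchmark_outputs = [0.7, 0.4, 0.6, 0.5]  # what a benchmark method produced

network_error = mean_squared_error(network_outputs, historical)
benchmark_error = mean_squared_error(benchmark_outputs, historical)
print(network_error, benchmark_error)
```

The network does not need a zero error to pass; in this sketch it only needs a lower error than the benchmark on the same historical cases.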

Rules Assessment
The rules can be assessed in accordance with the following objectives:

- Find the optimal sequence for extracting rules, so that it obtains the best results on the given data set.
- Test the accuracy of the extracted rules.
- Detect how much knowledge in the neural network has not been extracted.
- Detect inconsistencies between the extracted rules and the trained neural network.
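Two of these objectives, the accuracy of the rules and their consistency with the trained network, reduce to counting agreements. The Python sketch below shows one way to measure them. The labels and predictions are invented, and "fidelity" is used here as a name for agreement between rules and network, which the slides do not use.

```python
def agreement(a, b):
    """Fraction of positions at which two equal-length label lists match."""
    if len(a) != len(b):
        raise ValueError("both sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical class labels for five samples, invented for illustration.
true_labels = [1, 0, 1, 1, 0]          # known correct classes
network_predictions = [1, 0, 1, 0, 0]  # output of the trained network
rule_predictions = [1, 0, 0, 0, 0]     # output of the extracted rules

accuracy = agreement(rule_predictions, true_labels)          # rules vs. truth
fidelity = agreement(rule_predictions, network_predictions)  # rules vs. network
inconsistency = 1.0 - fidelity
print(accuracy, fidelity, inconsistency)
```

A large gap between the network's own accuracy and the rules' accuracy hints at knowledge in the network that the rules have not captured.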
