Data Mining Tools: Association Analysis, K-Means Clustering, Neural Networks

Data Mining Tools
Done by Muthurathna G (10AD19)
Association Analysis
Association analysis uncovers hidden patterns, correlations or causal structures among a set of items or objects.
For example, association analysis enables you to understand what products and services customers tend to purchase at the same time.
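The idea of finding products that are purchased together can be sketched in a few lines of Python. This is only an illustration, not the SAP procedure: the baskets are invented, `pair_counts` is a hypothetical helper name, and "support" (the share of transactions containing both items) is the usual textbook measure rather than something defined in these slides.

```python
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count how often each unordered pair of items occurs in the same transaction."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) drops duplicates and gives every pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical baskets, invented for illustration only
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

counts = pair_counts(transactions)
for pair, n in counts.most_common():
    # Support: share of all transactions that contain both items of the pair
    print(pair, n, n / len(transactions))
```

Pairs with high support are candidates for "bought together" rules; a real tool would also filter by thresholds and evaluate larger item sets.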
In SAP Easy Access: Data Mining > expand Association Analysis > right-click Association Analysis > Create Model
Choose the Model Name and Description
The screen shows the list of fields, which can be selected or excluded.
A few analyses performed

Sequence analysis
Link analysis
Unique data analysis
Test for Association using SPSS

Example: test whether gender (male/female) is associated with the preferred type of learning medium (online vs. books). We therefore have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).
Procedure in SPSS: Click Analyze > Descriptive Statistics > Crosstabs... on the top menu.
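For readers without SPSS, the test behind a crosstab of two nominal variables can be sketched by hand. The code below computes the uncorrected Pearson chi-square statistic for a contingency table. The counts are hypothetical (not taken from any study), `chi_square` is an illustrative helper name, and SPSS output contains more than this single number.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = gender (male, female), columns = medium (online, books)
table = [[30, 20],
         [20, 30]]

stat = chi_square(table)  # 4.0 for these counts
# 3.841 is the 5% critical value of chi-square with 1 degree of freedom (2x2 table)
print(stat, "association at the 5% level:", stat > 3.841)
```

No continuity correction is applied here, so the value corresponds to the plain Pearson statistic only.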
K-Means
A simple learning algorithm for cluster analysis. The goal of the K-Means algorithm is to minimize the total distance between each group's members and its corresponding centroid.
Steps involved in the K-Means clustering algorithm

1. The given sample set is first randomly distributed among the k clusters.
2. The distance from each sample to each cluster centroid is calculated.
3. The shortest distance from each sample to a cluster centroid is recorded, and the sample is assigned to that cluster.
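The steps above can be sketched in plain Python. Some parts are assumptions beyond the slide: the steps are repeated until assignments stop changing (the usual K-Means loop), Euclidean distance is used, an empty cluster is re-seeded with a random sample, and the names `k_means`, `centroid` and `euclidean` as well as the sample points are illustrative.

```python
import random

def euclidean(a, b):
    """Straight-line distance between two samples of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(points):
    """Component-wise mean of a non-empty list of samples."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def k_means(samples, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: distribute the samples randomly among the k clusters
    labels = [i % k for i in range(len(samples))]
    rng.shuffle(labels)
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random sample
            centroids.append(centroid(members) if members else rng.choice(samples))
        # Steps 2 and 3: measure the distance to every centroid, keep the shortest
        new_labels = [
            min(range(k), key=lambda c: euclidean(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels, centroids

# Hypothetical 2-D samples forming two well-separated groups
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(samples, k=2)
print(labels, centroids)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.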
Distance Types
Euclidean distance: It takes into account the difference between two samples directly, based on the magnitude of changes in the sample levels.
Manhattan distance: Also known as city-block distance, this distance measurement is especially relevant for discrete data sets. While the Euclidean distance corresponds to the length of the shortest path between two samples (i.e. as the crow flies), the Manhattan distance refers to the sum of distances along each dimension (i.e. walking round the block).
Pearson Correlation distance: This distance is based on the Pearson correlation coefficient r, which is calculated from the sample values and their standard deviations.
Absolute Pearson Correlation distance: The corresponding distance lies between 0 and 1, just like the absolute value of the correlation coefficient. The equation for the Absolute Pearson distance da is: da = 1 - |r|
Un-centered Correlation distance: The un-centered correlation coefficient lies between -1 and +1; hence the distance lies between 0 and 2.
Kendall's tau distance: This non-parametric distance measurement is more useful in identifying samples with a huge deviation in a given data set.
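A minimal sketch of most of these measures in plain Python follows. It assumes the common convention that a correlation-based distance is 1 minus the coefficient (or 1 minus its absolute value for the absolute variant); the slides do not spell out every formula, so treat these definitions as assumptions. Kendall's tau distance is left out, and the function names and sample vectors are illustrative.

```python
import math

def euclidean(a, b):
    """Length of the straight line between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_r(a, b):
    """Pearson correlation coefficient, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def uncentered_r(a, b):
    """Correlation computed without subtracting the means, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Assumed distance definitions: 1 minus the respective coefficient
def pearson_distance(a, b):
    return 1 - pearson_r(a, b)            # range 0..2

def abs_pearson_distance(a, b):
    return 1 - abs(pearson_r(a, b))       # range 0..1

def uncentered_distance(a, b):
    return 1 - uncentered_r(a, b)         # range 0..2

# Hypothetical samples with a perfectly opposite trend
a, b = [1, 2, 3], [3, 2, 1]
print(euclidean(a, b), manhattan(a, b))
print(pearson_distance(a, b), abs_pearson_distance(a, b), uncentered_distance(a, b))
```

The example shows how the choice matters: the two samples are as far apart as possible under the Pearson distance, yet identical under the absolute Pearson distance, which ignores the sign of the correlation.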
Neural Networks
Introduction
An artificial neural network is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons.
Factors in favor of the method: a low error rate, and the continuing advancement and optimization of network training algorithms, network pruning algorithms and rule-extraction algorithms.
NEURAL NETWORK METHOD IN DATA MINING
The neural network method is used for classification, clustering, feature mining, prediction and pattern recognition.
Neural network models can be broadly divided into:
(1) Feed-forward networks
(2) Feedback networks
(3) Self-organization networks
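As an illustration of the first category, a feed-forward network passes its inputs through successive layers of neurons with no loops. The sketch below assumes sigmoid activations and uses arbitrary, untrained weights; the layer sizes, weights and function names are hypothetical and serve only to show the data flow.

```python
import math

def sigmoid(x):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron applies sigmoid(weighted sum + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Pass the inputs through every layer in order; information never flows back."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output (arbitrary weights)
network = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]

output = feed_forward([1.0, 0.0], network)
print(output)
```

Training, which adjusts the weights to reduce the error, is not shown here.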
DATA MINING PROCESS BASED ON NEURAL NETWORK

The process is composed of three phases:
1. Data preparation
2. Rules extracting
3. Rules assessment
Data preparation includes four processes:
1) Data cleaning
2) Data option
3) Data preprocessing
4) Data expression
Testing
Black-box testing, i.e. comparing test results to historical results, is the primary approach for verifying that inputs produce the appropriate outputs. Error terms can be used to compare results against known benchmark methods. The network is generally not expected to perform perfectly.
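A small sketch of such a comparison: the network is treated as a black box and only its outputs are compared with known historical results through error terms. Mean squared error and mean absolute error are common choices but are an assumption here, since the slides do not name a specific error term. All numbers are invented.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and known results."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between predictions and known results."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical network outputs and the historical results they should reproduce
network_outputs = [0.9, 0.2, 0.8, 0.4]
historical = [1.0, 0.0, 1.0, 0.0]

mse = mean_squared_error(network_outputs, historical)
mae = mean_absolute_error(network_outputs, historical)
print(mse, mae)
```

The same two functions can be applied to a benchmark method's outputs, so that the network and the benchmark are judged on identical terms.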
Rules Assessment
The rules can be assessed in accordance with the following objectives:
1. Find the optimal sequence for extracting rules, so that it obtains the best results on the given data set.
2. Test the accuracy of the extracted rules.
3. Detect how much knowledge in the neural network has not been extracted.
4. Detect inconsistencies between the extracted rules and the trained neural network.
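Two of these objectives can be expressed as simple agreement rates. In the sketch below, "accuracy" compares the rules' predictions with the true labels, and "fidelity" compares them with the trained network's predictions. The term fidelity comes from the rule-extraction literature, not from these slides, and all labels are invented for illustration.

```python
def agreement(predictions, reference):
    """Fraction of cases in which two label sequences agree."""
    matches = sum(1 for p, r in zip(predictions, reference) if p == r)
    return matches / len(reference)

# Hypothetical labels for five test cases
true_labels = [1, 0, 1, 1, 0]          # known correct classes
network_predictions = [1, 0, 1, 0, 0]  # output of the trained network
rule_predictions = [1, 0, 0, 0, 0]     # output of the extracted rules

accuracy = agreement(rule_predictions, true_labels)          # rules vs. truth
fidelity = agreement(rule_predictions, network_predictions)  # rules vs. network
print(accuracy, fidelity)
```

A fidelity below 1 points to cases where the extracted rules and the trained network disagree, which is the inconsistency the last objective asks to detect.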
