
Data Analytics Using

SAS
Group 9 Varun Jain (00380/49) Varun Jaiswal (0381/49) Sovan (0384/49) Vinod Chaudhary (0394/49) Vinoth (0395/49)

SAS

Analytics

Analytics not only solves problems, but also creates opportunities.

Reporting

Reporting is part of a seamless process for creating & sharing intelligence.


Getting the right information to the right person at the right time enhances decisions through fact-based decision making.

Capture any abnormal behavior as close to the time of the transaction as possible to combat fraud.
Reduce risk by uncovering operational gaps, vulnerabilities & threats that may otherwise go unnoticed until it is too late to intervene.

Predictive Analysis
Predictive analytics encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.
Predictive Analysis

Association Rule Mining

Regression

Classification

Clustering

Scoring

Cross Selling

Risk Management

Customer Retention

Market Segmentation

Credit Worthiness

SAS Enterprise Miner

Introduction - Getting Data - Explore - Modify - Model - Assess - Scoring

Scenario
Database
Orion Star maintains a database of all donor information, including the personal and demographic details of the donors, the donation amounts by period, the history of any previous promotional offers, etc.

We use SAS Enterprise Miner client 12.1 to perform an in-depth analysis of the available data in order to identify relationships that have until now been ignored while targeting promotions.

Getting Started

Start SAS Enterprise Miner and create a new project.

Creating New Library

Create a data library that will contain the data to be worked on.

Creating new Data Source

Create a new data source for the dataset and modify the metadata as per requirements.

Creating Data Source : Summary

DMDB Run

Result

Exploring Data

Plotting Charts

Data options on Pie Chart

Adding Where Clause

Plotting Histogram

Histogram Based on Age

Partition
Training: 40%
Test: 40%
Validation: 20%
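A partition like this can be sketched in a few lines of Python. This is only an illustration, not what the Data Partition step in Enterprise Miner runs internally. It assumes the percentages map to training 40%, test 40%, and validation 20%; the function name `partition` and the fixed seed are hypothetical choices made for the example.

```python
import random

def partition(rows, fractions=(0.4, 0.4, 0.2), seed=0):
    """Shuffle rows and split them into training, test, and validation sets.

    The fractions are an assumed reading of the slide (40% / 40% / 20%).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation

train, test, validation = partition(range(100))
print(len(train), len(test), len(validation))
```

Every row lands in exactly one of the three sets, so the model is never assessed on rows it was trained on.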

Regression determines the relationship between multiple variables by calculating an equation that fits the observed data.
Linear Regression, Non-Linear Regression
Logistic Regression, Decision Tree, ANN, Ensemble
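As a minimal sketch of "calculating an equation" from observed data, the simplest case is a straight line fitted by ordinary least squares with one input variable. The function name `fit_line` and the sample points are made up for illustration; they are not from the Orion Star data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```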

Model

Model 1: Error

Decision Tree

In a classification process, a decision tree visualizes the steps taken to arrive at a classification. A decision tree begins with a root node, which is the "parent" of every other node in the tree. Each node evaluates an attribute of the data and determines the path to follow. The decision is based on comparing the value of the attribute against a prespecified constant. To classify with a decision tree, we start at the root node and traverse until we arrive at a leaf node.
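The root-to-leaf routing described above can be sketched as follows. The tree itself is hypothetical: the attribute names (`age`, `past_donations`), thresholds, and labels are invented for the example and are not taken from the model built in Enterprise Miner.

```python
# Hypothetical tree: attributes, thresholds, and labels are made up.
tree = {
    "attribute": "age",
    "threshold": 40,
    "left": {"label": "no_response"},       # taken when age <= 40
    "right": {                               # taken when age > 40
        "attribute": "past_donations",
        "threshold": 2,
        "left": {"label": "no_response"},   # past_donations <= 2
        "right": {"label": "response"},     # past_donations > 2
    },
}

def classify(node, record):
    """Route a record from the root to a leaf and return the leaf's label."""
    while "label" not in node:
        # Compare the record's attribute value against the node's constant
        if record[node["attribute"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

print(classify(tree, {"age": 55, "past_donations": 3}))
```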

Decision Tree
The decision tree is drawn so that the purity of a node can be gauged from its color, and the thickness of an edge shows the percentage of tuples moving to that node at the next level.

Decision Tree
The iteration plot for the tree shows that the average squared error is lowest at 6 leaves.
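The idea behind reading the iteration plot can be sketched as: compute an error for each candidate tree size and keep the size with the lowest error. The sketch below uses the plain mean of squared differences; the exact formula the tool uses may differ. The error values per leaf count are invented placeholders, not the numbers from this analysis.

```python
def average_squared_error(targets, predictions):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def best_leaf_count(error_by_leaves):
    """Return the number of leaves whose error is lowest."""
    return min(error_by_leaves, key=error_by_leaves.get)

# Placeholder errors per tree size (made up for illustration only)
error_by_leaves = {2: 0.25, 4: 0.22, 6: 0.20, 8: 0.21}
print(best_leaf_count(error_by_leaves))
```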

Artificial Neural Networks


Artificial Neural Networks (ANN) are non-linear mapping functions modeled on the functioning of the human brain. They can identify correlated patterns between input variables and the corresponding target variables, which makes them very helpful in predictive modeling.
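To make "non-linear mapping function" concrete, here is a minimal sketch of a forward pass through a network with one hidden layer. The architecture (tanh hidden units, logistic output) and all weights are assumptions chosen for illustration; they do not describe the network trained in Enterprise Miner.

```python
import math

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Map inputs to a probability through one tanh hidden layer."""
    # Each hidden unit: weighted sum of inputs, then a non-linear squashing
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Output unit: weighted sum of hidden activations, then logistic function
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights for a network with 2 inputs and 2 hidden units
probability = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.8, -0.3], [0.1, 0.6]],
    hidden_biases=[0.0, 0.2],
    output_weights=[1.5, -2.0],
    output_bias=0.1,
)
print(probability)
```

Training consists of adjusting the weights so that the outputs match the known targets; that step is not shown here.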

Artificial Neural Networks

Ensemble
Ensemble methods combine multiple models to provide better predictive performance than the constituent models. An ensemble is a supervised learning algorithm because it can be trained and then used to make predictions.
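One common way to combine models is to average their predicted probabilities for each observation. Whether the ensemble in this analysis used averaging, voting, or another rule is not stated, so the sketch below is an assumption, and the probabilities are made up.

```python
def ensemble_average(probabilities_per_model):
    """Average the predicted probabilities of several models, row by row."""
    n_models = len(probabilities_per_model)
    return [sum(row) / n_models for row in zip(*probabilities_per_model)]

# Made-up predictions from three models for the same two observations
regression_probs = [0.2, 0.9]
tree_probs = [0.4, 0.7]
network_probs = [0.6, 0.8]
combined = ensemble_average([regression_probs, tree_probs, network_probs])
print(combined)
```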

Ensemble

Assess
Assessment is the identification of the best model based on certain criteria.

The selected model for the partition was Ensemble in our analysis.

ROC curve: the area under the curve is largest for the Ensemble model.

Cumulative lift is highest for the Ensemble model.
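The two criteria can be sketched directly from their definitions. The area under the ROC curve equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one. Cumulative lift at a given depth is the response rate in the top-scored fraction divided by the overall response rate. The labels and scores below are made up, and the function names are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def cumulative_lift(labels, scores, fraction):
    """Response rate in the top-scored fraction relative to the overall rate."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    top = ranked[:max(1, round(len(ranked) * fraction))]
    return (sum(top) / len(top)) / (sum(labels) / len(labels))

# Made-up labels (1 = responded) and model scores
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(labels, scores), cumulative_lift(labels, scores, 0.25))
```

Comparing these numbers across candidate models on the same held-out partition is what selects the winner.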

Score
A model created from historical data can be applied to new data to predict unseen behavior.

This process of using a model to predict behavior which is yet to happen is called scoring.

Decisions are set in the initial data set.

The Ensemble model is best: it is where the average profit is maximized.

The best model is selected. Expected profit is maximized for the given decision weights. The selected model is used to predict whether to send or not send for the entire data set.
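The send / don't send choice can be sketched as picking the decision with the highest expected profit for each scored record. The decision weights below are hypothetical placeholders; the actual weights used in the analysis are not given in the slides.

```python
# Hypothetical decision weights: profit of each decision given the true outcome.
PROFIT = {
    "send": {"responds": 10.0, "does_not_respond": -1.0},
    "dont_send": {"responds": 0.0, "does_not_respond": 0.0},
}

def best_decision(p_respond, profit=PROFIT):
    """Return the decision with the highest expected profit, and that profit."""
    expected = {
        decision: p_respond * weights["responds"]
        + (1.0 - p_respond) * weights["does_not_respond"]
        for decision, weights in profit.items()
    }
    decision = max(expected, key=expected.get)
    return decision, expected[decision]

# p_respond would come from the selected model's score for a new record
print(best_decision(0.2))
print(best_decision(0.05))
```

With these placeholder weights, a record is worth a mailing only when its predicted response probability is high enough to cover the cost of sending.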

Clustering

To understand the pattern of sales in stores (Dungaree data)
- Stores 6 and 4 provide maximum revenue
- In Segment 4, Leisure performs well, Original performs badly

Association Mining

Bank & different accounts: understand correlation and probable causation between accounts
- Rules at the top right are best, provided lift conditions are satisfied
- Horizontal line for support cutoff, vertical line for confidence cutoff

Association Mining

Link graph: size of node = support; thickness of line = confidence
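Support, confidence, and lift for a rule "A implies B" can be computed directly from their standard definitions, as in this minimal sketch. The account names and customer baskets are invented for illustration and are not the bank data used in the analysis.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))            # contain A
    n_c = sum(1 for t in transactions if c <= set(t))            # contain B
    n_both = sum(1 for t in transactions if (a | c) <= set(t))   # contain A and B
    support = n_both / n
    confidence = n_both / n_a
    lift = confidence / (n_c / n)   # > 1 means A and B co-occur more than by chance
    return support, confidence, lift

# Made-up account holdings, one set per customer
customers = [
    {"checking", "savings"},
    {"checking", "savings", "credit_card"},
    {"checking"},
    {"savings"},
]
support, confidence, lift = rule_metrics(customers, ["checking"], ["savings"])
print(support, confidence, lift)
```

Rules are kept only when support and confidence clear their cutoffs, and lift above 1 indicates the accounts occur together more often than independence would predict.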

SAS OLAP Cube Studio

Scenario
Sports goods manufacturer
- Various products
- Different geographies
- Types of customers
- Sold at different times

Identify patterns which can help the company increase revenue/profits

Metadata creation

Variables & Hierarchies

Dimensions & Star Schema

Drill Down

Drill Down

- Maximum costs for Outdoor category; revenue not highest. Could reduce cost and increase revenues
- Sports doing well
- Scope for improvement in Children category

Slice & Dice

- Maximum revenue from United States
- Sports accounting for maximum revenue in US

Slice & Dice

- Overall rise in sales in 2005
- Members create a high percentage of sales
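Drill down and slice & dice both amount to aggregating a measure over chosen dimensions, optionally after filtering. The sketch below illustrates the idea in plain Python; it is not how SAS OLAP Cube Studio is used. The fact rows, dimension names, and revenue figures are all made up.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, category, year, revenue). Values are made up.
FACTS = [
    ("United States", "Sports", 2005, 120.0),
    ("United States", "Outdoor", 2005, 80.0),
    ("Germany", "Sports", 2005, 60.0),
    ("Germany", "Children", 2004, 30.0),
]
DIMENSIONS = ("country", "category", "year")

def aggregate(facts, group_by, where=None):
    """Sum revenue per combination of group_by dimensions, after filtering."""
    where = where or {}
    totals = defaultdict(float)
    for *dims, revenue in facts:
        row = dict(zip(DIMENSIONS, dims))
        if all(row[name] == value for name, value in where.items()):
            totals[tuple(row[name] for name in group_by)] += revenue
    return dict(totals)

by_country = aggregate(FACTS, ["country"])                        # top level
drill_down = aggregate(FACTS, ["country", "category"])            # one level deeper
us_slice = aggregate(FACTS, ["category"], where={"country": "United States"})
print(by_country, us_slice)
```

Adding a dimension to `group_by` drills down, while fixing a dimension in `where` takes a slice of the cube.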
