Learning Outcomes: Learned about machine learning and predictive analysis using various models.
Declaration:
I declare that this Assignment is my individual work. I have not copied it from any other student’s work or
from any other source except where due acknowledgement is made explicitly in the text, nor has any part
been written for me by any other person.
Student's Signature:
Marks Obtained:
Max. Marks
Introduction
In this study, a tree classifier machine learning algorithm is applied to predict whether a client
subscribes to a bank term deposit. The data comes from the Bank Marketing dataset and includes
45211 instances (rows) on 17 attributes (columns), based on Data Mining for Bank Direct Marketing.
Dataset(s)
The data for the project was accessed from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/bankmarketing). The data was extracted by Paulo Cortez and
Sérgio Moro for Data Mining for Bank Direct Marketing, October 2011. The data set includes
45211 observations on 17 attributes. The data relates to the direct marketing campaigns of a
Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more
than one contact with the same client was required in order to assess whether the product (a bank
term deposit) would be subscribed or not. The target variable is predicted from a set of different
variables, such as age, job type (admin, services, blue-collar, etc.) and marital status (single,
married or divorced). The 17 attributes, including the class target, are a mix of nominal, numeric
and binary variables.
Methods
A supervised machine learning approach, the tree classifier, is used for the study. The tree
classifier is chosen for two reasons. First, since the outcome (target) variable is binary,
classification algorithms are better suited than regression algorithms. Because the target takes
only the values 0 and 1, regression algorithms perform worse: there is little variation in the
target variable for them to fit.
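One concrete way to see why regression fits a 0/1 target poorly is sketched below. This is only an illustration under stated assumptions: the data are simulated (not the bank data), the names `x`, `y`, `lin_fit` and `log_fit` are hypothetical, and logistic regression from base R stands in for a classifier here, whereas the study itself uses a tree classifier.

```r
# Simulated example (assumption: not the bank data) comparing a linear
# regression and a classifier on a binary 0/1 target.
set.seed(1)                                        # assumed seed, for reproducibility only
x <- seq(-4, 4, length.out = 200)                  # hypothetical numeric predictor
y <- rbinom(200, size = 1, prob = plogis(2 * x))   # simulated 0/1 target

lin_fit <- lm(y ~ x)                               # linear regression on the 0/1 target
log_fit <- glm(y ~ x, family = binomial)           # logistic regression (a classifier)

lin_pred <- predict(lin_fit)                       # fitted values, not bounded
log_pred <- predict(log_fit, type = "response")    # fitted probabilities

range(lin_pred)   # can fall below 0 or above 1
range(log_pred)   # always stays within [0, 1]
```

The linear fit can return values that are not valid probabilities, while the classifier's output can be read directly as the chance of class 1.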
#classification
A library is defined as a collection of stored packages of R functions, with which the data can be
processed in a consistent format.
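library(tree)
library(ISLR)
library(party)   # classes for representing decision trees and corresponding accessor functions
attach(bank)
View(bank)
range(V1)        # [1] 18 95
# Randomly assign each row to the training (70%) or test (30%) set
ind <- sample(2, nrow(bank), replace = TRUE, prob = c(0.7, 0.3))
train <- bank[ind == 1, ]
test <- bank[ind == 2, ]
# `model` is the tree fitted on the training data (the fitting call is not shown)
plot(model)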
As we can see, when analyzing the bank marketing dataset R picked the variables V4, V3 and V7 as
the most important factors.
From the tree model above we can interpret that the decision is mostly affected by the V4
categories primary, tertiary, secondary and unknown, on which the tree splits further.
Within the V4 categories (primary, secondary, unknown) there are different classes of people,
single, married and divorced, which are represented by V3.
Single: p < 0.001
For people who are divorced or married, the tree creates four branches under V3.
Of these four, most people are married, which corresponds to Node 13 (n = 591) when compared
with the other attributes.
If the answer is "yes", the tree moves on to the other V4 categories, secondary and unknown.
Finally, the most important category is tertiary under V4, which strongly affects the decision
and is split further on V7. These people are generally divorced or married, and Node 21
(n = 3067) covers the most observations.
#error in train data
tab <- table(predict(model), train$V2)   # predicted vs. actual classes
print(tab)
1 - sum(diag(tab)) / sum(tab)            # misclassification rate
#error in test data
testpred <- predict(model, newdata = test)
tab2 <- table(testpred, test$V2)
print(tab2)
1 - sum(diag(tab2)) / sum(tab2)
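The error computed above is one minus the share of observations on the diagonal of the confusion table. The sketch below works through that arithmetic on ten made-up labels (the vectors `actual` and `predicted` are hypothetical and are not taken from the bank data), assuming both vectors are factors with the same levels so that the table is square.

```r
# Hypothetical labels, only to illustrate the error-rate formula
lv        <- c("no", "yes")
actual    <- factor(c("yes","no","no","yes","no","no","yes","no","no","no"), levels = lv)
predicted <- factor(c("yes","no","no","no","no","yes","yes","no","no","no"), levels = lv)

tab <- table(predicted, actual)        # rows: predicted class, columns: actual class
print(tab)

# Correct predictions sit on the diagonal; everything else is an error
error_rate <- 1 - sum(diag(tab)) / sum(tab)
error_rate

# Cross-check: share of positions where prediction and truth differ
mean(predicted != actual)
```

Fixing the factor levels matters: if the predictions never contain one of the classes, a table built from plain character vectors is no longer square and `diag()` no longer lines up with the correct predictions.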
# Regression
bank <- read.csv("C:/Users/ABC/Desktop/banking marketing.csv")
# Randomly assign each row to the training (80%) or validation (20%) set
ind <- sample(2, nrow(bank), replace = TRUE, prob = c(0.8, 0.2))
tdata <- bank[ind == 1, ]
vdata <- bank[ind == 2, ]
head(tdata)
head(vdata)
# Simple linear regression of V1 on V6, fitted on the training data
results <- lm(V1 ~ V6, data = tdata)
# Predict V1 for the validation rows
prediction <- predict(results, vdata)
head(prediction)
summary(results)
> results
Call:
lm(formula = V1 ~ V6, data = tdata)
Coefficients:
(Intercept)           V6
  4.047e+01    3.343e-04
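Read as an equation, the coefficients above give the fitted line V1 = 40.47 + 0.0003343 × V6. The sketch below plugs a few values of V6 into that line. `predict_v1` is a hypothetical helper, and because it uses the rounded coefficients as printed, its results may differ slightly from `predict(results, ...)`.

```r
# Coefficients copied from the printed lm() output (rounded as shown there)
intercept <- 4.047e+01
slope_v6  <- 3.343e-04

# Hypothetical helper: predicted V1 for a given value of V6
predict_v1 <- function(v6) intercept + slope_v6 * v6

predict_v1(c(0, 1000, 10000))
# 40.4700 40.8043 43.8130
```

The slope is small: a change of 1000 in V6 shifts the predicted V1 by only about 0.33.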
#Clustering
Clustering is an unsupervised learning technique. It is the task of grouping together a set of
objects in a way that objects in the same cluster are more similar to each other than to objects in
other clusters.
A cluster plot could not be made from the Bank Marketing dataset, so instead of that dataset I
used the built-in dataset "USArrests", which ships with R.
The commands below cluster the "USArrests" data and plot the clusters, showing the most affected
areas in terms of Murder, Assault, Rape, etc.
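#Full script
library(factoextra)   # provides fviz_cluster()
attach(USArrests)
View(USArrests)
df <- USArrests
df <- na.omit(df)     # drop rows with missing values
df <- scale(df)       # standardise each variable
k2 <- kmeans(df, centers = 2, nstart = 20)   # k-means with 2 clusters, 20 random starts
fviz_cluster(k2, data = df)                  # plot the clusters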
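The clusters can also be inspected without a plot. The following is a minimal sketch using base R only. The seed and the name `cluster_means` are assumptions added for illustration and are not part of the script above.

```r
# Base-R sketch: k-means on the built-in USArrests data, inspected numerically
set.seed(123)                        # assumed seed, only so the run is repeatable
df <- scale(na.omit(USArrests))      # standardise: each variable gets mean 0, sd 1
k2 <- kmeans(df, centers = 2, nstart = 20)

k2$size                              # number of states in each cluster
k2$centers                           # cluster centres in standardised units

# Average of the original (unscaled) variables within each cluster
cluster_means <- aggregate(USArrests, by = list(cluster = k2$cluster), FUN = mean)
print(cluster_means)
```

Comparing the rows of `cluster_means` shows which cluster has the higher average Murder, Assault and Rape figures. The cluster numbers themselves are arbitrary labels and can swap between runs.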