Basic ML with Python, R and MATLAB

Table of Contents
Introduction

Linear Regression

Logistic Regression

Likelihood ratio tests using logistic regression

Introduction
We will cover a couple of basic machine learning methods, with an example of
how to use each of them in Python, R and MATLAB.

Linear Regression
Linear regression is one of the most basic and widely used machine learning
methods. A linear model consists of an explanatory variable X and a response
variable Y. It can be represented as:

$$Y = \beta_0 + \beta_1 X$$

Here, $\beta_0$ is the mean effect and $\beta_1$ is the effect of the explanatory
variable X on the response variable Y.
Let's see how this works.

Python
from sklearn import datasets
from sklearn import linear_model
import numpy as np
# Load the Boston housing dataset
# (load_boston was removed in scikit-learn 1.2, so this needs an older release)
boston = datasets.load_boston()
X = boston.data
X = np.insert(X, 0, 1, axis=1)  # column of ones for the mean effect
Y = boston.target
# Fit the linear regression model.
# If we had not included the constant column of ones in X,
# we would set fit_intercept=True instead.
model = linear_model.LinearRegression(fit_intercept=False)
modelfit = model.fit(X, Y)
beta = modelfit.coef_

Here the length of $\beta$ is equal to the number of explanatory variables
(columns in $X$) + 1 (for the mean term, $\beta_0$).
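As a quick check of that coefficient count, here is a minimal sketch on synthetic data (the data, the seed and the names `X_raw`, `true_beta` and `beta_alt` are assumptions for illustration, not part of the original example). It fits the model once with an explicit column of ones and once letting scikit-learn fit the intercept, and the two should agree.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data (illustrative only): 200 samples, 3 explanatory variables
rng = np.random.default_rng(0)
n_samples, n_features = 200, 3
X_raw = rng.normal(size=(n_samples, n_features))
true_beta = np.array([2.0, 1.5, -0.5, 3.0])  # mean term first, then one effect per feature

# Add the constant column of ones, as in the example above
X = np.insert(X_raw, 0, 1, axis=1)
Y = X @ true_beta + rng.normal(scale=0.01, size=n_samples)

# Option 1: intercept handled through the column of ones
beta = linear_model.LinearRegression(fit_intercept=False).fit(X, Y).coef_

# Option 2: no column of ones, scikit-learn fits the intercept itself
alt = linear_model.LinearRegression(fit_intercept=True).fit(X_raw, Y)
beta_alt = np.concatenate(([alt.intercept_], alt.coef_))

print(len(beta))                    # number of features + 1
print(np.allclose(beta, beta_alt))  # both options give the same coefficients
```

Either option is fine; the point is only that the intercept must be counted exactly once.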

R
require(MASS)
data(Boston)
target = as.numeric(Boston$medv) # median house prices in Boston
X = Boston[,c(1:13)] # features that can help predict house prices in Boston
Y = target
model = glm(Y~1+., data = X, family = gaussian(link = identity))
beta = model$coefficients # the first entry in beta is the intercept (mean effect)

MATLAB
% load car dataset
load carsmall

X = [Weight Horsepower Cylinders Model_Year]; % features that can help to predict mileage
Y = MPG; % target variable - mileage
modelfit = fitlm(X,Y); % automatically fits intercept
beta = modelfit.Coefficients;

The three scripts above do the same thing: each one learns the effect of each
feature in X on the response variable Y.
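To make "the effect of each feature" concrete, the following sketch uses made-up, noiseless data and NumPy's least-squares solver rather than any of the libraries above (all names and numbers here are assumptions for illustration). It shows that raising one feature by one unit, with the others held fixed, changes the prediction by exactly that feature's coefficient.

```python
import numpy as np

# Made-up, noiseless data: Y = 4 + 3*x1 - 2*x2
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
Y = 4.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1]

# Least-squares fit with an explicit column of ones for the mean term
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# Two inputs that differ only in the first feature, by one unit
x_a = np.array([1.0, 0.5, 0.5])
x_b = np.array([1.0, 1.5, 0.5])
effect = x_b @ beta - x_a @ beta

print(beta)    # approximately [4, 3, -2]
print(effect)  # equals beta[1], the effect of the first feature
```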

Logistic Regression
Logistic regression is a classification method used when the response/observed
variable is categorical. Here we will only consider the case where $Y$ is a binary
response variable, i.e. it takes value 0 or 1. The explanatory variables $X$ that
help us predict $Y$ can be discrete or continuous. Our model is the same linear model:

$$Y = \beta_0 + \beta_1 X$$

How do we model the response such that our predictions are binary?
It turns out that we can do this using a logistic function. We model the
probability $P(Y = 1 \mid X)$. This is given by:

$$P(Y = 1 \mid X) = \sigma(\beta_0 + \beta_1 X)$$

Here $\sigma$ is the logistic function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Using the logistic function, we can get the probability of $Y$ being 0 or 1.
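A minimal sketch of this idea in NumPy follows. The coefficient values and the names `beta_0`, `beta_1` and `logistic` are assumptions chosen for illustration; the linear predictor is written as `beta_0 + beta_1 * x`, and a cutoff of 0.5 turns the probabilities into binary predictions.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration
beta_0, beta_1 = -1.0, 2.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

p_one = logistic(beta_0 + beta_1 * x)  # probability that Y is 1
p_zero = 1.0 - p_one                   # probability that Y is 0

# Binary prediction: 1 when the probability of Y = 1 exceeds 0.5
y_hat = (p_one > 0.5).astype(int)

print(p_one)
print(y_hat)
```

The third input sits exactly on the decision boundary (linear predictor 0, probability 0.5), so with a strict `>` it is predicted as 0.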

Let's see how to do this using Python, R and MATLAB


Please download example data from:
https://raw.githubusercontent.com/princyparsana/cs438/master/data_files/binary.csv

Python

from sklearn import linear_model
import numpy as np

# Read the first line of the file: the column names
with open('data_files/binary.csv', 'r') as fh:
    col_names = fh.readline().strip().split(',')

# Read the data file using the genfromtxt function in NumPy, skipping the header line
dat_ex = np.genfromtxt('data_files/binary.csv', delimiter=',', skip_header=1)

# Dimensions of dat_ex: (number of rows, number of columns)
dat_ex.shape

# The first column of this file is 'admit', a binary variable where
# 0 represents that the student was not admitted to the program and
# 1 represents students who were offered admission.
# So here the first column is the response variable Y.
# We use 300 samples for training and 100 for testing.
Y = dat_ex[0:300, 0]   # first 300 rows of column 0 (Python indexing begins at 0)
X = dat_ex[0:300, 1:]  # remaining columns; we do not add constant ones, so the model fits the intercept

# scikit-learn's implementation of logistic regression is a penalized version,
# so we set C (the inverse of the penalty parameter) to a large number.
# This leads to very low to no regularization.
model = linear_model.LogisticRegression(C=1e86)
modelfit = model.fit(X, Y)
beta = modelfit.coef_
intercept = modelfit.intercept_

testX = dat_ex[300:, 1:]
testY = dat_ex[300:, 0]

# Use the fitted model to predict Y for the test data testX
predicted_Y = modelfit.predict(testX)
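Once predictions are available, a natural next step is to compare them with the held-out labels. The sketch below does this on synthetic data standing in for `binary.csv` (the data, the seed, the value `C=1e6` and the names `prob_one`, `thresholded` and `accuracy` are assumptions for illustration). It also shows that `predict` agrees with thresholding the class-1 probability from `predict_proba` at 0.5.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for the example file: 400 rows, 3 explanatory variables
rng = np.random.default_rng(0)
n = 400
X_all = rng.normal(size=(n, 3))
logits = 0.5 + X_all @ np.array([1.0, -2.0, 0.5])
Y_all = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Same split as above: 300 samples for training, 100 for testing
X, Y = X_all[:300], Y_all[:300]
testX, testY = X_all[300:], Y_all[300:]

# Large C means very weak regularization
modelfit = linear_model.LogisticRegression(C=1e6).fit(X, Y)

predicted_Y = modelfit.predict(testX)

# Column 1 of predict_proba holds the probability of class 1
prob_one = modelfit.predict_proba(testX)[:, 1]
thresholded = (prob_one > 0.5).astype(int)

# Fraction of test samples predicted correctly
accuracy = np.mean(predicted_Y == testY)

print(np.array_equal(predicted_Y, thresholded))
print(accuracy)
```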

R
#Read example dataset
dat.ex <- read.csv('data_files/binary.csv',header=T)
#Dimensions of dataset
dim(dat.ex)
# We use 300 samples for training
Y <- dat.ex[1:300,1]
X <- dat.ex[1:300,c(2,3,4)]

modelfit = glm(Y~1+.,family = binomial(),data = X) # 1 is for the intercept

# Test using 100 samples. In R, we can get the response in the form of probabilities
testX <- dat.ex[301:400,c(2,3,4)]
testY <- dat.ex[301:400,1]
predict.prob<-predict(modelfit, newdata = testX, type = "response")
predicted.Y<-ifelse(predict.prob>0.5,1,0)

MATLAB
% Load data
dat_ex = dataset('File','data_files/binary.csv','Delimiter',',');
X = dat_ex(1:300,[2,3,4]);
Y = dat_ex(1:300,1);
testX = dat_ex(301:400,[2,3,4]);
testY = dat_ex(301:400,1);
modelfit = fitglm(double(X),double(Y),'Distribution','binomial')
predY_prob = predict(modelfit,double(testX));
% Threshold the predicted probabilities at 0.5
predY = NaN(100,1);
predY(predY_prob <= 0.5) = 0;
predY(predY_prob > 0.5) = 1;

Likelihood ratio tests using logistic regression
