Table of Contents
Introduction
Linear Regression
Logistic Regression
Introduction
We will cover a couple of basic machine learning methods, with examples of
how to use them in Python, R and MATLAB.
Linear Regression
Linear regression is one of the most basic and widely used machine learning
methods. A linear model relates an explanatory variable X to a response
variable Y. It can be represented as:
$$Y = \beta_0 + \beta_1 X + \epsilon$$
Here, $\beta_0$ is the mean effect, $\beta_1$ is the effect of the explanatory variable X on the
response variable Y, and $\epsilon$ is the random error.
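For a single explanatory variable, the least-squares estimates have a closed form: $\hat\beta_1 = \mathrm{cov}(X, Y)/\mathrm{var}(X)$ and $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$. The plain-Python sketch below computes them directly to show what is being estimated; the helper name `fit_simple_linear` is hypothetical and the toy data are made up, generated without noise from $Y = 2 + 3X$.

```python
def fit_simple_linear(x, y):
    """Hypothetical helper: least-squares (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations of (x, y) and sum of squared deviations of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta1 = sxy / sxx                # effect of X on Y
    beta0 = y_mean - beta1 * x_mean  # mean effect (intercept)
    return beta0, beta1

# Made-up, noise-free toy data from y = 2 + 3x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]
beta0, beta1 = fit_simple_linear(x, y)
print(beta0, beta1)  # prints 2.0 3.0
```

With real, noisy data the estimates will not match the generating values exactly; the library calls in the scripts generalize this to several explanatory variables at once.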
Let's see how this works
Python
from sklearn import datasets
from sklearn import linear_model
import numpy as np

# Load the Boston housing dataset.
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version.
boston = datasets.load_boston()
X = boston.data
X = np.insert(X, 0, 1, axis=1)  # prepend a column of ones for the mean effect
Y = boston.target  # median house prices

# Fit the linear regression model. Because X already contains the constant
# column of ones, fit_intercept=False; without that column, use fit_intercept=True.
model = linear_model.LinearRegression(fit_intercept=False)
modelfit = model.fit(X, Y)
beta = modelfit.coef_  # beta[0] is the mean effect, beta[1:] the feature effects
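Prepending a column of ones, as `np.insert(X, 0, 1, axis=1)` does above, turns the mean effect into just another coefficient, which is why `fit_intercept=False` is used. The plain-Python sketch below solves the ordinary least-squares normal equations $X^\top X \beta = X^\top Y$ on such an augmented design matrix, only to show what is being estimated; the helper names (`add_constant`, `solve`, `ols`) and the noise-free toy data from $Y = 1 + 2X_1 - 3X_2$ are hypothetical. Library solvers generally avoid forming $X^\top X$ explicitly, for numerical stability.

```python
def add_constant(rows):
    """Prepend a 1 to every row: the constant column for the mean effect."""
    return [[1.0] + list(r) for r in rows]

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations X^T X beta = X^T y."""
    X = add_constant(rows)
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up, noise-free toy data from y = 1 + 2*x1 - 3*x2
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * a - 3 * b for a, b in rows]
beta = ols(rows, y)  # beta[0] is the mean effect, beta[1:] the feature effects
print([round(b, 6) for b in beta])
```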
R
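require(MASS)
data(Boston)
target <- as.numeric(Boston$medv) # median house prices in Boston
features <- Boston[, names(Boston) != "medv"] # the other 13 columns
modelfit <- lm(target ~ ., data = features) # lm includes the intercept (mean effect) by default
beta <- coef(modelfit)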
MATLAB
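load carsmall % load car dataset
X = [Weight, Horsepower]; Y = MPG; % predictors chosen for illustration only
modelfit = fitlm(X, Y); % fitlm includes the intercept (mean effect) by default
beta = modelfit.Coefficients.Estimate;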
The scripts above do the same thing: each learns the effect of each
feature in X on the response variable Y.
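Once the coefficients are learned, a prediction is $\hat{Y} = \beta_0 + \sum_j \beta_j X_j$, so $\beta_j$ is the change in the predicted response when feature $j$ increases by one unit with the other features held fixed. A small plain-Python check of that reading, using made-up coefficient values that are not fitted to any dataset:

```python
def predict(beta, features):
    """Linear prediction: beta[0] (mean effect) plus the sum of beta[j] * x_j."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], features))

beta = [1.5, 0.25, -2.0]  # hypothetical coefficients, for illustration only
base = [4.0, 1.0]
bumped = [5.0, 1.0]       # first feature increased by one unit, second unchanged
change = predict(beta, bumped) - predict(beta, base)
print(change)  # prints 0.25, i.e. beta[1]
```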
Logistic Regression
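Logistic regression is a classification method used when the response (observed)
variable is categorical. Here we only consider the case where Y is a binary
response variable, i.e. it takes the value 0 or 1. The explanatory variable X that helps
us predict Y can be discrete or continuous. Our model is the same linear model:
$$Y = \beta_0 + \beta_1 X + \epsilon$$
How do we model the response so that our predictions are binary?
It turns out that we can do this using a logistic function. We model the
probability that Y = 1 given X. This is given by:
$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
Here, the logistic function maps the linear predictor $\beta_0 + \beta_1 X$, which can take any real value, to a probability between 0 and 1; we predict Y = 1 when this probability exceeds 0.5.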
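A minimal plain-Python sketch of the logistic model: the linear predictor $\beta_0 + \beta_1 X$ is passed through the logistic function to give $P(Y = 1 \mid X)$, and the prediction is 1 when that probability exceeds 0.5, the same cut-off used in the R and MATLAB code. The coefficient values are made up for illustration.

```python
import math

def logistic(z):
    """Logistic function 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use exp(z) / (1 + exp(z)) so exp never overflows
    e = math.exp(z)
    return e / (1.0 + e)

def predict_prob(beta0, beta1, x):
    """P(Y = 1 | X = x) under the logistic model."""
    return logistic(beta0 + beta1 * x)

def predict_class(beta0, beta1, x, threshold=0.5):
    """Binary prediction: 1 if the probability exceeds the threshold, else 0."""
    return 1 if predict_prob(beta0, beta1, x) > threshold else 0

beta0, beta1 = -1.0, 0.5  # hypothetical coefficients
for x in [0.0, 2.0, 4.0]:
    # At x = 2.0 the linear predictor is 0, so the probability is exactly 0.5 -> class 0
    print(x, round(predict_prob(beta0, beta1, x), 3), predict_class(beta0, beta1, x))
```

A probability of exactly 0.5 is assigned to class 0 here, matching `ifelse(predict.prob>0.5,1,0)` in R.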
Python
import numpy as np

# Read the data file using NumPy's genfromtxt function, skipping the first (header) row
dat_ex = np.genfromtxt('data_files/binary.csv', delimiter=',', skip_header=1)
# Dimensions of dat_ex
print(dat_ex.shape)
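To show what fitting a logistic regression involves, the plain-Python sketch below estimates $\beta_0, \beta_1$ by maximizing the log-likelihood with simple batch gradient ascent. This is a swapped-in technique chosen for transparency: R's `glm` reaches maximum-likelihood estimates by iteratively reweighted least squares, and in Python one would more typically call a library such as scikit-learn's `LogisticRegression`. The helper name `fit_logistic`, the learning rate, the iteration count and the toy data are all hypothetical.

```python
import math

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, lr=0.1, iters=2000):
    """Hypothetical helper: maximize the average log-likelihood of
    P(Y=1|X) = logistic(beta0 + beta1*x) by batch gradient ascent."""
    beta0, beta1 = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        # Gradient of the log-likelihood: sums of (y - p) and (y - p) * x
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            r = yi - logistic(beta0 + beta1 * xi)
            g0 += r
            g1 += r * xi
        beta0 += lr * g0 / n
        beta1 += lr * g1 / n
    return beta0, beta1

# Made-up toy data: larger x tends to give y = 1, with some overlap
x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 0, 1, 1]
beta0, beta1 = fit_logistic(x, y)
preds = [1 if logistic(beta0 + beta1 * xi) > 0.5 else 0 for xi in x]
print(round(beta0, 4), round(beta1, 4), preds)
```

Because the classes overlap, the maximum-likelihood estimates are finite; with perfectly separable data, plain gradient ascent would keep increasing $\beta_1$ without converging.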
R
#Read example dataset
dat.ex <- read.csv('data_files/binary.csv',header=TRUE)
#Dimensions of dataset
dim(dat.ex)
# We use 300 samples for training
Y <- dat.ex[1:300,1]
X <- dat.ex[1:300,c(2,3,4)]
modelfit <- glm(Y ~ ., data = X, family = binomial) # fit the logistic regression model
# Test using 100 samples. In R, we can get the response in the form of probabilities
testX <- dat.ex[301:400,c(2,3,4)]
testY <- dat.ex[301:400,1]
predict.prob<-predict(modelfit, newdata = testX, type = "response")
predicted.Y<-ifelse(predict.prob>0.5,1,0)
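R indexing is 1-based and inclusive, so `dat.ex[1:300,1]` selects rows 1 to 300 of the first column and `c(2,3,4)` selects columns 2 to 4. Python indexing is 0-based with end-exclusive slices, so on the NumPy array read earlier the same split would be `dat_ex[:300, 0]` and `dat_ex[:300, 1:4]`. The sketch below shows the same indexing on a plain list of rows; the 400-row table is made up and only stands in for the CSV file, with the response in the first column as in the R code.

```python
# Hypothetical stand-in for the 400-row dataset: column 0 is the binary
# response, columns 1-3 are the explanatory variables (values are made up).
data = [[i % 2, float(i), 2.0 * i, 3.0 * i] for i in range(400)]

# R: Y <- dat.ex[1:300,1];  X <- dat.ex[1:300,c(2,3,4)]
Y = [row[0] for row in data[:300]]
X = [row[1:4] for row in data[:300]]

# R: testY <- dat.ex[301:400,1];  testX <- dat.ex[301:400,c(2,3,4)]
testY = [row[0] for row in data[300:400]]
testX = [row[1:4] for row in data[300:400]]

print(len(X), len(testX))  # prints 300 100
```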
MATLAB
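% Load data
dat_ex = dataset('File','data_files/binary.csv','Delimiter',',');
% Use the first 300 rows for training and the next 100 for testing
X = dat_ex(1:300,[2,3,4]);
Y = dat_ex(1:300,1);
testX = dat_ex(301:400,[2,3,4]);
testY = dat_ex(301:400,1);
% Fit the logistic regression model (binomial GLM)
modelfit = fitglm(double(X),double(Y),'Distribution','binomial')
% Predicted probabilities for the test set
predY_prob = predict(modelfit,double(testX));
% Threshold the probabilities at 0.5 to get binary predictions
predY = NaN(100,1);
predY(predY_prob <= 0.5) = 0;
predY(predY_prob > 0.5) = 1;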