Partial Least Square Discriminant Analysis

Partial Least Squares Discriminant Analysis Classifier to Treat Unbalanced Data:
A Case Study on Bankruptcy of Malaysian SMEs
Sallehuddin Hussin
2013597357
Problem Statement
Problems with bankruptcy studies on SMEs:
Unbalanced dataset: Bankruptcy is a very rare event. The number of bankrupt firms is very small compared to non-bankrupt firms, so the dataset is highly imbalanced.
A minority class of less than 5% is known as a rare event (Au et al. 2010).
A model is not meaningful when it lacks information to learn from the rare event (Bee Wah Yap et al. 2014).
Problem Statement
Multicollinearity: Financial ratios are ratios of two financial items (such as total assets, total liabilities and current assets). The same item might be used to make up different ratios, which can cause multicollinearity.
Multicollinearity is a major problem when building models based on financial data (Serrano et al. 2013).
The advantage of PLS lies in its capability to deal with multicollinearity and its robustness to missing data and skewed distributions (Cassel et al. 1999).
Research Question
How efficient are the re-sampling techniques SMOTE and One-Sided Selection (OSS) in separating bankrupt SMEs from non-bankrupt ones?
How efficient is the PLSDA classifier in separating bankrupt SMEs from non-bankrupt ones?
Objectives
To compare the performance of the re-sampling techniques SMOTE and One-Sided Selection (OSS) on the bankruptcy dataset.
To determine the efficiency of the PLSDA classifier in classifying bankrupt and non-bankrupt Malaysian SMEs.
Framework
Model construction: Resampling (SMOTE, One-Sided Selection (OSS))
Classifier: PLSDA (Original PLSDA, PLSDA + exponential, PLSDA + Nearest Neighbors)
Model evaluation: AUC, Accuracy, Sensitivity, Specificity
Resampling-SMOTE
Synthetic Minority Over-sampling Technique
An over-sampling technique proposed by Chawla et al. (2002).
The idea is to form new minority examples by interpolating between examples of the same class.
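The interpolation idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `smote_sketch`, its arguments `n_new` and `k`, and the toy data are all hypothetical.

```r
# Minimal SMOTE-style sketch in base R (illustrative only).
# smote_sketch, n_new and k are hypothetical names chosen for this example.
smote_sketch <- function(X_min, n_new, k = 5) {
  X_min <- as.matrix(X_min)
  n <- nrow(X_min)
  k <- min(k, n - 1)                      # cannot use more neighbours than other points
  D <- as.matrix(dist(X_min))             # pairwise Euclidean distances
  diag(D) <- Inf                          # a point is not its own neighbour
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(X_min))
  for (i in seq_len(n_new)) {
    base <- sample.int(n, 1)              # pick a minority example at random
    nn <- order(D[base, ])[seq_len(k)]    # its k nearest minority neighbours
    mate <- nn[sample.int(k, 1)]          # pick one of those neighbours at random
    gap <- runif(1)                       # interpolation weight between 0 and 1
    synthetic[i, ] <- X_min[base, ] + gap * (X_min[mate, ] - X_min[base, ])
  }
  synthetic
}

set.seed(1)
minority <- matrix(rnorm(20), ncol = 2)   # 10 toy minority examples, 2 variables
new_points <- smote_sketch(minority, n_new = 30, k = 3)
```

Each synthetic row lies on the line segment between a minority example and one of its nearest minority neighbours, so the new points stay inside the region already occupied by the minority class.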
Resampling-OSS
One-Sided Selection
An under-sampling technique proposed by Kubat et al. (1997).
The idea is to reduce the majority class by keeping only the majority observations that matter near the class border, together with all observations in the minority group.
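A rough base R sketch of one-sided selection is given below. It assumes the commonly described three steps (keep all minority examples plus one random majority example, add the majority examples that a 1-nearest-neighbour rule on that subset misclassifies, then remove majority examples that form Tomek links). The helper names `nearest_in` and `oss_sketch`, the labels "min" and "maj", and the toy data are hypothetical, and details may differ from the implementation used in the study.

```r
# One-sided selection sketch in base R (illustrative assumptions throughout).
# y is expected to hold the labels "min" (minority) and "maj" (majority).
nearest_in <- function(X, ref_idx, i) {
  # row index, among ref_idx, of the point nearest to row i (excluding i itself)
  cand <- setdiff(ref_idx, i)
  d <- colSums((t(X[cand, , drop = FALSE]) - X[i, ])^2)
  cand[which.min(d)]
}

oss_sketch <- function(X, y) {
  X <- as.matrix(X)
  min_idx <- which(y == "min")
  maj_idx <- which(y == "maj")
  # Step 1: all minority examples plus one randomly chosen majority example
  seed_set <- c(min_idx, maj_idx[sample.int(length(maj_idx), 1)])
  keep <- seed_set
  # Step 2: add majority examples that 1-NN on the seed set labels as minority
  for (i in setdiff(maj_idx, seed_set)) {
    if (y[nearest_in(X, seed_set, i)] == "min") keep <- c(keep, i)
  }
  # Step 3: drop majority examples in Tomek links, i.e. pairs of opposite-class
  # points that are each other's nearest neighbour within the retained subset
  drop <- integer(0)
  for (i in keep[y[keep] == "maj"]) {
    j <- nearest_in(X, keep, i)
    if (y[j] == "min" && nearest_in(X, keep, j) == i) drop <- c(drop, i)
  }
  sort(setdiff(keep, drop))
}

set.seed(2)
X <- rbind(matrix(rnorm(200), ncol = 2),             # 100 toy majority examples
           matrix(rnorm(10, mean = 1.5), ncol = 2))  # 5 toy minority examples
y <- c(rep("maj", 100), rep("min", 5))
kept <- oss_sketch(X, y)
```

The returned vector holds the row indices retained after under-sampling. Every minority row is kept, and only the majority class can shrink.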
Classifier: Partial Least Squares
PLS was not originally designed for discrimination, but empirical studies have shown that it performs well for classification.
PLS transforms the original variables into orthogonal components (latent variables) by taking into account both the independent and dependent variables.
It is a dimension reduction method that extracts the score vectors as new independent variables.
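To make the idea of orthogonal score vectors concrete, here is a small base R sketch of PLS with a single response, using NIPALS-style deflation. It is a simplified illustration, not the mixOmics implementation. The function name `pls1_scores` and the simulated data are hypothetical.

```r
# PLS1 sketch via NIPALS-style deflation in base R (single response y).
# pls1_scores is a hypothetical helper for illustration only.
pls1_scores <- function(X, y, ncomp = 2) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # centre the predictors
  y <- y - mean(y)                                        # centre the response
  scores <- matrix(NA_real_, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(X, y)                 # weights: covariance of each column with y
    w <- w / sqrt(sum(w^2))              # normalise the weights to unit length
    t_a <- X %*% w                       # score vector (latent component)
    p <- crossprod(X, t_a) / sum(t_a^2)  # X loadings
    q <- sum(y * t_a) / sum(t_a^2)       # y loading
    X <- X - t_a %*% t(p)                # deflate X
    y <- y - as.numeric(t_a) * q         # deflate y
    scores[, a] <- t_a
  }
  scores
}

set.seed(3)
x1 <- rnorm(50)
X <- cbind(x1, x1 + rnorm(50, sd = 0.01), rnorm(50))  # first two columns nearly collinear
y <- as.numeric(x1 + rnorm(50, sd = 0.5) > 0)         # 0/1 dummy response, as in PLS-DA
T_scores <- pls1_scores(X, y, ncomp = 2)
```

Although the first two predictors are almost collinear, the two extracted score vectors are orthogonal, which is why the scores can stand in for the original variables as new independent variables.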
Classifier: Partial Least Squares
PLS was introduced by Wold (1966), who created the Non-linear Iterative Partial Least Squares (NIPALS) algorithm.
PLS was then improved by de Jong (1993) with SIMPLS, which simplifies the algorithm.
Barker (2003) formally discussed PLS for discrimination by relating the PLS algorithm to Linear Discriminant Analysis (LDA). Partial Least Squares Discriminant Analysis (PLS-DA) is a variant used when the dependent variable is categorical.
Model Performance Measures
Accuracy = (TP + TN) / (TP + FN + FP + TN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

                     Prediction
Actual         Positive   Negative
Positive       TP         FN
Negative       FP         TN

Area under the ROC curve (AUC): a ROC graph is a two-dimensional graph in which Sensitivity is plotted on the Y axis and 1-Specificity is plotted on the X axis.
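As a worked example, the measures can be computed in base R from the four confusion-matrix counts. The counts used here (TP = 7, FN = 17, FP = 520, TN = 3456) are the ones consistent with the totals in the preliminary PLS-DA result of this study. The function names `classification_metrics` and `auc_rank` are hypothetical, and the AUC example uses made-up toy scores.

```r
# Performance measures from confusion-matrix counts (base R sketch).
# classification_metrics and auc_rank are hypothetical helper names.
classification_metrics <- function(TP, FN, FP, TN) {
  c(accuracy    = (TP + TN) / (TP + FN + FP + TN),
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP))
}

# AUC via the rank (Mann-Whitney) formulation: the probability that a randomly
# chosen positive receives a higher score than a randomly chosen negative.
auc_rank <- function(score, is_positive) {
  r <- rank(score)                       # average ranks are used for ties
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

m <- classification_metrics(TP = 7, FN = 17, FP = 520, TN = 3456)
round(m, 6)   # accuracy 0.865750, sensitivity 0.291667, specificity 0.869215

# Toy scores: 5 of the 6 positive-negative pairs are ranked correctly
auc_toy <- auc_rank(c(0.9, 0.8, 0.4, 0.3, 0.2),
                    c(TRUE, FALSE, TRUE, FALSE, FALSE))
```

With so few bankrupt firms in the test set, accuracy is dominated by the non-bankrupt class, which is why sensitivity and AUC are reported alongside it.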
Methodology-Data
This research uses secondary data (financial statements) obtained from Suruhanjaya Syarikat Malaysia (SSM).
This study focuses only on firms categorised under the TRANSPORTATION & STORAGE service sector of Malaysian SMEs.
Methodology-Data
                                        2006   2007   2008   2009   2010   Total
One year before firm goes bankrupt         9      9      7      6      9
Two years before firm goes bankrupt        5      6      1      7      1
Three years before firm goes bankrupt      5      4      7      1
Bankrupt                                  19     19     15     14     10   77 (0.8%)
NonBankrupt                             1548   1706   1883   1925   2051   9113 (99.2%)
Total                                   1567   1725   1898   1939   2061   9190
The table shows the number of bankrupt and non-bankrupt firms based on year-end.
For bankrupt firms, this study uses the financial statements from the three years before bankruptcy as the bankruptcy data.
All data for firms still existing during the study period are selected as non-bankruptcy data.
The total number of observations from 2006 to 2010 is 9190, of which 0.8% are bankruptcy cases and the rest are non-bankruptcy cases. This shows that the dataset is highly unbalanced.
Methodology-Data
               2006   2007   2008   2009   2010   Total
Bankrupt         19     19     15     14     10   77 (0.8%)
NonBankrupt    1548   1706   1883   1925   2051   9113 (99.2%)
Total          1567   1725   1898   1939   2061   9190
Training (60%): 2006 to 2008
Testing (40%): 2009 to 2010
For the classification model, the data is split into a model construction (training) set and a model evaluation (testing) set.
For this study, the training data covers the periods 2006 to 2008 inclusive, while the testing data covers 2009 to 2010. The ratio is approximately 60:40.
Methodology-Variables
The output variable is the status of the company, which is either bankrupt or non-bankrupt.
The input variables are the financial ratios of the company.
The ratios are adopted from Hossari & Rahman (2006), who ranked 48 financial ratios based on their popularity.
This study used 22 of the 48 financial ratios from the list.
According to Ivo (2011), a ratio whose denominator can be zero or negative does not make sense. The suggestion is to winsorise* the denominator to a small positive value.
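A minimal base R sketch of flooring a denominator before computing a ratio is shown below. The function name `safe_ratio`, the argument `floor_value` and its default of 1, and the toy figures are assumptions for illustration. The source does not state which threshold was actually used.

```r
# Sketch: floor a ratio's denominator at a small positive value before dividing.
# safe_ratio and floor_value are hypothetical; the study's threshold is not stated.
safe_ratio <- function(numerator, denominator, floor_value = 1) {
  # denominators below floor_value (including zero and negative ones)
  # are replaced by floor_value
  numerator / pmax(denominator, floor_value)
}

total_liabilities <- c(500, 800, 300)
total_equity <- c(250, 0, -40)   # zero and negative equity would break TL/TE
tl_te <- safe_ratio(total_liabilities, total_equity)
```

Without the floor, the second firm would produce an infinite ratio and the third a negative one that no longer reflects leverage in the usual sense.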
Methodology-Variables
No   Variable   Detail
1    NI/TA      Net Income/Total Assets
2    CA/CL*     Current Assets/Current Liabilities
3    TL/TA      Total Liabilities/Total Assets
4    WC/TA      Working Capital/Total Assets
5    TL/TE*     Total Liabilities/Total Equity
6    S/TA       Sales/Total Assets
7    CA/S       Current Assets/Sales
8    CA/TA      Current Assets/Total Assets
9    NI/S       Net Income/Sales
10   NI/TE*     Net Income/Total Equity
11   TE/TA      Total Equity/Total Assets
12   WC/S       Working Capital/Sales
13   S/FA*      Sales/Fixed Assets
14   TE/TL      Total Equity/Total Liabilities
15   FA/TA      Fixed Assets/Total Assets
16   FA/TE*     Fixed Assets/Total Equity
17   LTL/TA     Long-Term Liabilities/Total Assets
18   CL/TA      Current Liabilities/Total Assets
19   CL/TE*     Current Liabilities/Total Equity
20   EBT/TA     Earnings Before Taxes/Total Assets
21   LTL/TE*    Long-Term Liabilities/Total Equity
22   S/TE*      Sales/Total Equity
23   TE/LTL*    Total Equity/Long-Term Liabilities
Preliminary Study-Descriptive
       Skew                               Kurtosis                           se
Var    Bankrupt  Nonbankrupt       All    Bankrupt  Nonbankrupt       All    Bankrupt  Nonbankrupt       All
F1        -1.21        67.37     67.66        4.68      5009.88   5052.06        0.01         0.03      0.03
F2         8.44        57.2      57.42       70.07      3781      3811.32       32.53        18.39     18.24
F3         1.74        56.62     56.86        3.7       3334.42   3362.54        0.05         0.12      0.12
F4        -1.31       -56.97    -57.21        2.88      3364.21   3392.57        0.05         0.12      0.12
F5         5.79        73.49     73.62       39.36      6221.63   6252.71     1024.38       212.61    211.01
F6         1.88        95.43     95.83        5.37      9105.96   9182.96        0.18        54.83     54.37
F7         2.75        18.55     18.55        7.42       588.06    589.54        0.06         0.01      0.01
F8        -0.6         -0.82     -0.81       -0.86         1.35      1.33        0.03         0         0
F9        -2.93         0.02      0.02       13.62       868.96    873.54        0.01         0         0
F10       -7.08       -21.96    -21.79       54.69       870.26    859.66      136.17        11.15     11.11
F11       -1.54       -56.61    -56.85        2.54      3334.13   3362.24        0.05         0.12      0.12
F12       -6.68        -4.81     -4.82       50.52       374.18    373.74        0.09         0.01      0.01
F13        3.55         4.7       4.7        11.75        27.99     27.92       18.56         1.87      1.87
F14        6.99        35.87     36.02       53.81      1534.99   1547.93        0.14         0.36      0.36
F15        0.76         0.16      0.17       -0.55         8.85      8.78        0.03         0         0
F16        5.34        50.59     50.78       31.38      3078.55   3102.74      190.71        92.64     91.88
F17        1.47        51.46     51.62        2.39      3213.93   3236.38        0.02         0.01      0.01
F18        1.62        57        57.24        3.44      3366.66   3395.04        0.05         0.12      0.12
F19        5.96        20.87     20.39       41          664.74    636.28      943.81        47.19     47.48
F20       -0.73        81.95     82.29        4.53      7185.99   7246.62        0.02         0.05      0.05
F21        3.23        80.25     80.58        9.78      7049.44   7108.66      116.52       184.58    183.04
F22        1.67         3.62      3.59        1.81        14.77     14.48      563.38        28.14     28.36
F23       -0.07        59.87     60.12        7.54      4664.15   4703.16       39.46        57.01     56.54
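Statistics of this kind can be computed per group in base R along the following lines. This is a sketch under assumptions: moment-based estimators are used, kurtosis is taken as excess kurtosis (the negative kurtosis values in the table suggest this convention), and "se" is read as the standard error of the mean. The source does not say which estimators or package produced the table, and the function name `describe_sketch` and the simulated data are hypothetical.

```r
# Sketch: skewness, excess kurtosis and standard error of the mean by group.
# Moment-based estimators are an assumption; the source does not say which were used.
describe_sketch <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)                        # second central moment
  c(skew     = mean(d^3) / m2^1.5,
    kurtosis = mean(d^4) / m2^2 - 3,     # excess kurtosis (0 for a normal distribution)
    se       = sd(x) / sqrt(n))          # standard error of the mean
}

set.seed(4)
toy <- data.frame(
  ratio  = c(rexp(200), rnorm(20)),      # simulated values, not the study's data
  status = rep(c("Nonbankrupt", "Bankrupt"), c(200, 20))
)
by_group <- sapply(split(toy$ratio, toy$status), describe_sketch)
overall  <- describe_sketch(toy$ratio)
small    <- describe_sketch(c(1, 2, 3, 4, 5))
```

`by_group` has one column per status and one row per statistic, mirroring the Bankrupt and Nonbankrupt columns of the table, while `overall` corresponds to the All column.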
Preliminary Study-PLSDA
The mixOmics package in R was used to produce the Original PLSDA result.

                        Prediction
Actual           Bankruptcy   Nonbankruptcy   Total
Bankruptcy                7              17      24
Nonbankruptcy           520            3456    3976
Total                   527            3473    4000

Accuracy = 0.86575
Specificity = 0.869215
Sensitivity = 0.291667
AUC = 0.58
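R package coding
## Load mixOmics, which provides plsda() and its predict() method
library(mixOmics)
## Set training and testing samples by year, then drop the Year column
train <- subset(mydata, Year < 2009, select = -Year)
test <- subset(mydata, Year >= 2009, select = -Year)
## Columns 1 to 23 hold the financial ratios; Status is the class label
X <- train[, 1:23]
Y <- as.factor(train$Status)
X.test <- test[, 1:23]
Y.test <- test$Status
## Fit PLS-DA with two components
plsda.train <- plsda(X, Y, ncomp = 2)
## Predict the test set with the Mahalanobis distance rule.
## This call follows the older mixOmics interface (method = ...); recent
## releases appear to name the argument dist and return class labels directly.
test.predict <- predict(plsda.train, X.test, method = "mahalanobis.dist")
## With class indices returned, look the labels up in levels(Y)
Prediction <- levels(Y)[test.predict$class$mahalanobis.dist[, 2]]
## Actual and predicted status side by side
cbind(Y = as.character(Y.test), Prediction)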
Thank you
