Sallehuddin Hussin
2013597357
Problem Statement
Problems with bankruptcy studies on SMEs:
Unbalanced dataset- Bankruptcy is a very rare event. The number of bankrupt firms is very small compared to non-bankrupt firms, so the dataset is highly imbalanced.
A minority class of less than 5% is known as a rare event (Au et al. 2010).
The model is not meaningful because it lacks information to learn from the rare event (Bee Wah Yap et al. 2014).
Problem Statement
Multicollinearity- Financial ratios are ratios of
Research Question
How efficient are the re-sampling techniques
Objectives
To compare the performance of re-sampling
Framework
Resampling: Original, SMOTE, One-Sided Selection (OSS)
Model construction (Classifier: PLSDA): PLSDA, PLSDA + exponential, PLSDA + Nearest Neighbors
Model evaluation: AUC, Accuracy, Sensitivity, Specificity
Resampling-SMOTE
Synthetic Minority Over-sampling Technique
Oversampling technique proposed by Chawla et al. (2002).
The idea is to form new minority examples by interpolating between examples of the same class.
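The interpolation idea can be sketched in base R. This is a simplified illustration, not the implementation used in the study: the function name `smote_sketch`, its arguments, and the toy data are hypothetical, and the base example is drawn at random rather than cycling through every minority case as in the original algorithm.

```r
# Simplified SMOTE-style oversampling (illustrative sketch only).
# X_min: numeric matrix of minority-class rows; n_new: synthetic rows wanted.
smote_sketch <- function(X_min, n_new, k = 5, seed = 1) {
  set.seed(seed)
  X_min <- as.matrix(X_min)
  n <- nrow(X_min)
  k <- min(k, n - 1)                      # cannot have more neighbours than n - 1
  D <- as.matrix(dist(X_min))             # pairwise Euclidean distances
  diag(D) <- Inf                          # a point is not its own neighbour
  synth <- matrix(NA_real_, n_new, ncol(X_min))
  for (j in seq_len(n_new)) {
    i    <- sample.int(n, 1)              # random minority example
    nbrs <- order(D[i, ])[seq_len(k)]     # its k nearest minority neighbours
    nb   <- nbrs[sample.int(k, 1)]        # pick one neighbour at random
    gap  <- runif(1)                      # position along the connecting segment
    synth[j, ] <- X_min[i, ] + gap * (X_min[nb, ] - X_min[i, ])
  }
  synth
}

# Hypothetical toy minority class with two ratios per firm
X_min <- matrix(c(0, 0,  1, 0,  0, 1,  1, 1,  0.5, 0.5,  0.2, 0.8),
                ncol = 2, byrow = TRUE)
new_pts <- smote_sketch(X_min, n_new = 10, k = 3)
```

Because every synthetic row lies on a segment between two existing minority rows, the new points stay inside the region already occupied by the minority class.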
Resampling-OSS
One-Sided Selection
Under-sampling technique proposed by Kubat et al. (1997).
The idea is to reduce the majority class by keeping only its important observations near the class border, while retaining the whole minority group.
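A minimal base-R sketch of one-sided selection follows, assuming the usual description of the method: start from all minority cases plus one random majority case, add every case that a 1-nearest-neighbour rule misclassifies, then drop majority cases that form Tomek links. The function name, arguments, and toy data are hypothetical and are not taken from the study.

```r
# One-sided selection (illustrative sketch). Returns the row indices to keep.
one_sided_selection <- function(X, y, minority, seed = 1) {
  set.seed(seed)
  X <- as.matrix(X)
  y <- as.character(y)
  D <- as.matrix(dist(X))                  # pairwise Euclidean distances
  min_idx <- which(y == minority)
  maj_idx <- which(y != minority)

  # Step 1: all minority cases plus one randomly chosen majority case
  keep <- c(min_idx, maj_idx[sample.int(length(maj_idx), 1)])

  # Step 2: classify every case by its nearest kept case (1-NN);
  # misclassified cases are added to the kept set
  nearest <- keep[apply(D[, keep, drop = FALSE], 1, which.min)]
  keep <- sort(union(keep, which(y != y[nearest])))

  # Step 3: a Tomek link is a pair of mutual nearest neighbours with
  # different labels; remove the majority member of each link
  Dk <- D[keep, keep, drop = FALSE]
  diag(Dk) <- Inf
  nn <- apply(Dk, 1, which.min)
  yk <- y[keep]
  in_tomek <- nn[nn] == seq_along(nn) & yk != yk[nn]
  keep[!(in_tomek & yk != minority)]
}

# Hypothetical toy data: 40 majority and 5 minority cases
set.seed(42)
X <- rbind(matrix(rnorm(80), ncol = 2), matrix(rnorm(10, mean = 2), ncol = 2))
y <- c(rep("nonbankrupt", 40), rep("bankrupt", 5))
kept <- one_sided_selection(X, y, minority = "bankrupt")
table(y[kept])
```

Only majority rows are ever discarded, which is what makes the selection one-sided.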
                 Prediction
Actual       Positive    Negative
Positive     TP          FN
Negative     FP          TN
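For reference, the evaluation measures named in the framework are conventionally defined from these four cells as follows. These are the standard textbook definitions; the slides do not spell them out.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```

Sensitivity is the share of actual positives (here, bankrupt firms) that are detected, and specificity is the share of actual negatives that are correctly left unflagged.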
Methodology-Data
This research uses secondary data (financial
Methodology-Data
Year    One year before   Two years before   Three years before   Bankrupt    NonBankrupt     Total
        bankruptcy        bankruptcy         bankruptcy
2006    9                 5                  5                    19          1548            1567
2007    9                 6                  4                    19          1706            1725
2008    7                 1                  7                    15          1883            1898
2009    6                 7                  1                    14          1925            1939
2010    9                 1                                       10          2051            2061
Total                                                             77 (0.8%)   9113 (99.2%)    9190
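The effect of this imbalance on accuracy can be checked with simple arithmetic on the totals above (77 bankrupt and 9113 non-bankrupt firm-years). The do-nothing classifier below is a hypothetical baseline for illustration, not a model from the study.

```r
# Totals taken from the data table
n_bankrupt    <- 77
n_nonbankrupt <- 9113
n_total       <- n_bankrupt + n_nonbankrupt

# Share of the minority (bankrupt) class
minority_share <- n_bankrupt / n_total

# Hypothetical baseline that labels every firm as non-bankrupt
baseline_accuracy    <- n_nonbankrupt / n_total   # all non-bankrupt firms are "correct"
baseline_sensitivity <- 0 / n_bankrupt            # no bankrupt firm is ever detected

round(100 * c(minority_share, baseline_accuracy, baseline_sensitivity), 1)
```

A classifier that never predicts bankruptcy reaches about 99.2% accuracy while detecting none of the bankrupt firms, which is why sensitivity, specificity, and AUC are reported alongside accuracy.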
Methodology-Data
The data are split into training (60%) and testing (40%) sets.
Methodology-Variables
The output variable is the status of the company, which is either bankrupt or non-bankrupt.
The input variables are the financial ratios of the company.
Adopted from Hossari & Rahman (2006), who ranked 48
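Methodology-Variables
No   Variable   Detail
1    NI/TA      Net Income/Total Assets
2    CA/CL*     Current Assets/Current Liabilities
3    TL/TA      Total Liabilities/Total Assets
4    WC/TA      Working Capital/Total Assets
5    TL/TE*     Total Liabilities/Total Equity
6    S/TA       Sales/Total Assets
7    CA/S       Current Assets/Sales
8    CA/TA      Current Assets/Total Assets
9    NI/S       Net Income/Sales
10   NI/TE*     Net Income/Total Equity
11   TE/TA      Total Equity/Total Assets
12   WC/S       Working Capital/Sales
13   S/FA*      Sales/Fixed Assets
14   TE/TL      Total Equity/Total Liabilities
15   FA/TA      Fixed Assets/Total Assets
16   FA/TE*     Fixed Assets/Total Equity
17   LTL/TA     Long-Term Liabilities/Total Assets
18   CL/TA      Current Liabilities/Total Assets
19   CL/TE*     Current Liabilities/Total Equity
20   EBT/TA     Earnings Before Taxes/Total Assets
21   LTL/TE*    Long-Term Liabilities/Total Equity
22   S/TE*      Sales/Total Equity
23   TE/LTL*    Total Equity/Long-Term Liabilities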
Preliminary Study-Descriptive
       Skew                              Kurtosis                          se
       Bankrupt  Nonbankrupt  All        Bankrupt  Nonbankrupt  All        Bankrupt  Nonbankrupt  All
F1     -1.21     67.37        67.66      4.68      5009.88      5052.06    0.01      0.03         0.03
F2     8.44      57.2         57.42      70.07     3781         3811.32    32.53     18.39        18.24
F3     1.74      56.62        56.86      3.7       3334.42      3362.54    0.05      0.12         0.12
F4     -1.31     -56.97       -57.21     2.88      3364.21      3392.57    0.05      0.12         0.12
F5     5.79      73.49        73.62      39.36     6221.63      6252.71    1024.38   212.61       211.01
F6     1.88      95.43        95.83      5.37      9105.96      9182.96    0.18      54.83        54.37
F7     2.75      18.55        18.55      7.42      588.06       589.54     0.06      0.01         0.01
F8     -0.6      -0.82        -0.81      -0.86     1.35         1.33       0.03      0            0
F9     -2.93     0.02         0.02       13.62     868.96       873.54     0.01      0            0
F10    -7.08     -21.96       -21.79     54.69     870.26       859.66     136.17    11.15        11.11
F11    -1.54     -56.61       -56.85     2.54      3334.13      3362.24    0.05      0.12         0.12
F12    -6.68     -4.81        -4.82      50.52     374.18       373.74     0.09      0.01         0.01
F13    3.55      4.7          4.7        11.75     27.99        27.92      18.56     1.87         1.87
F14    6.99      35.87        36.02      53.81     1534.99      1547.93    0.14      0.36         0.36
F15    0.76      0.16         0.17       -0.55     8.85         8.78       0.03      0            0
F16    5.34      50.59        50.78      31.38     3078.55      3102.74    190.71    92.64        91.88
F17    1.47      51.46        51.62      2.39      3213.93      3236.38    0.02      0.01         0.01
F18    1.62      57           57.24      3.44      3366.66      3395.04    0.05      0.12         0.12
F19    5.96      20.87        20.39      41        664.74       636.28     943.81    47.19        47.48
F20    -0.73     81.95        82.29      4.53      7185.99      7246.62    0.02      0.05         0.05
F21    3.23      80.25        80.58      9.78      7049.44      7108.66    116.52    184.58       183.04
F22    1.67      3.62         3.59       1.81      14.77        14.48      563.38    28.14        28.36
F23    -0.07     59.87        60.12      7.54      4664.15      4703.16    39.46     57.01        56.54
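Statistics of this kind can be computed per group with a few lines of base R. The sketch below assumes that "se" denotes the standard error of the mean and uses simple moment-based estimators; the slides do not say which estimator or package produced the figures, so the numbers it gives may differ slightly from the author's. The function names and toy data are hypothetical.

```r
# Moment-based skewness, excess kurtosis and standard error of the mean
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3      # 0 for a normal distribution
}
std_error <- function(x) sd(x) / sqrt(length(x))

describe_ratio <- function(x) {
  c(skew = skewness(x), kurtosis = excess_kurtosis(x), se = std_error(x))
}

# Hypothetical toy values of one financial ratio with firm status
ratio  <- c(0.10, 0.12, 0.11, 0.95, 0.09, 0.13, 0.10, 0.12)
status <- c("Bankrupt", "Bankrupt", "Bankrupt", "Bankrupt",
            "Nonbankrupt", "Nonbankrupt", "Nonbankrupt", "Nonbankrupt")

by_group <- sapply(split(ratio, status), describe_ratio)  # one column per group
overall  <- describe_ratio(ratio)                         # the "All" column
```

Skewness and kurtosis values far from zero, as seen for most ratios in the table, indicate heavily non-normal distributions dominated by a few extreme firms.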
Preliminary Study-PLSDA
The mixOmics package in R was used to come out
                     Prediction
Actual           Bankruptcy   Nonbankruptcy   Total
Bankruptcy       7            17              24
Nonbankruptcy    520          3456            3976
Total            527          3473            4000
Accuracy= 0.86575
Specificity= 0.869215
Sensitivity= 0.291667
AUC= 0.58
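The reported figures can be reproduced from the test-set counts. In this sketch the number of correctly predicted bankrupt firms is taken as 7, the value implied by the row total of 24 and the reported sensitivity. The final line is an observation rather than the author's stated method: the average of sensitivity and specificity, which is the area under a single-threshold ROC curve, happens to round to the reported AUC.

```r
# Test-set counts (rows = actual status, columns = predicted status)
cm <- matrix(c(  7,   17,
               520, 3456),
             nrow = 2, byrow = TRUE,
             dimnames = list(Actual    = c("Bankruptcy", "Nonbankruptcy"),
                             Predicted = c("Bankruptcy", "Nonbankruptcy")))

accuracy    <- sum(diag(cm)) / sum(cm)          # (7 + 3456) / 4000
sensitivity <- cm["Bankruptcy", "Bankruptcy"] / sum(cm["Bankruptcy", ])
specificity <- cm["Nonbankruptcy", "Nonbankruptcy"] / sum(cm["Nonbankruptcy", ])

# Average of the two rates (assumption: possibly how the AUC was obtained)
balanced <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced = balanced), 6)
```

The high accuracy is driven almost entirely by the non-bankrupt class; fewer than one in three bankrupt firms is detected on the original, unresampled data.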
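R package coding
library(mixOmics)  # provides plsda() and its predict() method
## Set training and testing samples by year, then drop the Year column
train <- subset(mydata, Year < 2009, select = -Year)
test <- subset(mydata, Year >= 2009, select = -Year)
X <- train[, 1:23]               # the 23 financial ratios
Y <- as.factor(train$Status)     # bankrupt / non-bankrupt status
X.test <- test[, 1:23]
Y.test <- as.factor(test$Status)
## Fit PLS-DA with two components and classify the test firms
plsda.train <- plsda(X, Y, ncomp = 2)
## Older mixOmics releases name this argument `method` instead of `dist`
test.predict <- predict(plsda.train, X.test, dist = "mahalanobis.dist")
## Depending on the mixOmics version, the class is returned as an index or as a label
pred <- test.predict$class$mahalanobis.dist[, 2]
Prediction <- if (is.numeric(pred)) levels(Y)[pred] else as.character(pred)
## Actual versus predicted status, side by side
cbind(Y = as.character(Y.test), Prediction)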
Thank you