
NUS-RMI

FE5218 Credit Risk


Lecture 5
Dr. Keshab Shrestha
rmikms@nus.edu.sg
2014
Contents

1 Poisson Intensity Model
2 Artificial Neural Network
3 Maximum Likelihood Estimation Using R
1. Poisson Intensity Model
The Poisson intensity model is based on a doubly stochastic process commonly known as the Cox process. It is a Poisson process whose intensity is a function of other stochastic variables known as covariates. We use the first jump of the Poisson process to represent the default time. Here we discuss the intensity model based on the model suggested by Duffie et al. (2007).[1]
The intensity is assumed to be a function of the covariates as follows:[2]

\lambda(x_t; \beta) = e^{\beta_0 + \beta_1 x_{1t} + \cdots + \beta_k x_{kt}} = e^{\beta^\top x_t}    (1)
where \beta_0, \beta_1, \ldots, \beta_k are the parameters and x_{1t}, \ldots, x_{kt} are the covariates at time t; in vector form, \beta = (\beta_0, \beta_1, \ldots, \beta_k)^\top and x_t = (1, x_{1t}, \ldots, x_{kt})^\top. The covariates may include firm-specific and macroeconomic variables. Due to the properties of the Poisson process, the probability of surviving a small time interval \Delta t (from time t to t + \Delta t) is given by
1 - P(x_t; \beta) = e^{-\lambda(x_t; \beta)\,\Delta t}    (2)
Thus, the probability of default in the same interval is given by
P(x_t; \beta) = 1 - e^{-\lambda(x_t; \beta)\,\Delta t} \approx \lambda(x_t; \beta)\,\Delta t    (3)
The survival probability over a longer time period can be viewed as surviving many little time intervals. For example, suppose that we have data for T periods, t = 1, \ldots, T, where the length of each period is \Delta t. Then the probability of survival over the whole sample period (assuming conditional independence) is given by[3]
e^{-\int_0^{T\Delta t} \lambda(x_s; \beta)\,ds} \approx e^{-\sum_{i=1}^{T} \lambda\left(x_{(i-1)\Delta t};\, \beta\right)\Delta t} = \prod_{i=1}^{T} \left[1 - P\left(x_{(i-1)\Delta t};\, \beta\right)\right]    (4)
[1] Duffie, D., L. Saita and K. Wang (2007), "Multi-period corporate default prediction with stochastic covariates," Journal of Financial Economics 83, pp. 635-665.
[2] A positive \beta_i means that the higher the value of x_i, the higher the probability of default.
[3] See equation (3) of the Lecture 4 notes, which has a similar formula in a different context, where the probability of default is a logistic function; here the survival and default probabilities depend on the intensity. Also note that t = 1 (the first period) refers to time 0 to \Delta t, and the t-th period runs from time (t-1)\Delta t to t\Delta t.
Similarly, the probability of default in period (t+1) (from t\Delta t to (t+1)\Delta t) is given by

P\left(x_{t\Delta t};\, \beta\right) \prod_{i=1}^{t} \left[1 - P\left(x_{(i-1)\Delta t};\, \beta\right)\right]    (5)
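To make the mechanics concrete, the following R sketch computes the per-period default probabilities and the period-(T+1) default probability from a simulated covariate path, following equations (1)-(5); the parameter values and covariates below are hypothetical.

set.seed(1)                      # reproducible illustration
beta <- c(-6.0, 0.8, 1.2)        # assumed beta_0, beta_1, beta_2
nT <- 4                          # number of observed periods
dt <- 1                          # period length (Delta t)
x <- cbind(1, matrix(rnorm(2 * nT), ncol = 2))  # rows are x_{(i-1)dt}, i = 1, ..., nT
lambda <- exp(x %*% beta)        # intensities, equation (1)
P <- 1 - exp(-lambda * dt)       # per-period default probabilities, equation (3)
surv <- prod(1 - P)              # survival over the nT periods, equation (4)
x.next <- c(1, rnorm(2))         # covariates at time nT * dt
P.next <- (1 - exp(-exp(sum(beta * x.next)) * dt)) * surv   # equation (5)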
[Figure 1: Time line. The period boundaries are 0, \Delta t, 2\Delta t, \ldots, (t-1)\Delta t, t\Delta t, (t+1)\Delta t; the 1st, 2nd, \ldots, t-th and (t+1)-th periods lie between them, with default probability P(x_{(t-1)\Delta t}; \beta) in period t and P(x_{t\Delta t}; \beta) in period t+1.]
So far we have referred to only one firm. If we have a sample of firms, we identify each firm with the subscript j. For example, the probability of default of the j-th firm in period t is

P\left(x_{(t-1)\Delta t,\, j};\, \beta\right)
Then the likelihood for the whole sample and all n firms is given by

L = \prod_{j=1}^{n} \prod_{t=1}^{T} P\left(x_{(t-1)\Delta t,\, j};\, \beta\right)^{y_{tj}} \left[1 - P\left(x_{(t-1)\Delta t,\, j};\, \beta\right)\right]^{1 - y_{tj}}    (6)

where y_{tj} = 1 if firm j defaults in period t and y_{tj} = 0 otherwise.
Strictly speaking, the above expression is not correct, because it includes periods after a firm has defaulted, for which the required information is not available. Therefore, once a firm defaults in any period, it is no longer included in the likelihood function. We can then maximize the log-likelihood function, in which the product terms are converted to summations.
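Writing out the conversion, the log-likelihood corresponding to equation (6) is

\ln L = \sum_{j=1}^{n} \sum_{t=1}^{T_j} \left\{ y_{tj} \ln P\left(x_{(t-1)\Delta t,\, j};\, \beta\right) + (1 - y_{tj}) \ln\left[1 - P\left(x_{(t-1)\Delta t,\, j};\, \beta\right)\right] \right\}

where T_j (notation introduced here) is the last period in which firm j is observed, so that periods after a default drop out as required.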
Duffie et al. also consider a second exit, which we do not discuss here.
2. Artificial Neural Network

An artificial neural network (ANN) is a mathematical modeling tool that mimics the way the human brain is thought to process information. An ANN structure can range
from simple to highly complex and computationally intensive. The use of ANNs has recently become popular due to the increasing power and decreasing cost of current-generation computers. Since this trend is expected to continue, we expect to see increasing use of artificial neural networks.
ANNs were originally designed for pattern recognition and classification. However, they can also be used for prediction. It is therefore natural that attempts have been made to use ANNs to forecast bankruptcy (see, for example, Odom and Sharda (1990), Wilson and Sharda (1994), and Lacher et al. (1995)).[4]
An ANN is typically composed of several layers of many computing elements called nodes. Each node receives input signals from external inputs or from other nodes and processes them through a transfer function, producing a transformed signal as the node's output. The output signal from a node is then used as input to other nodes or as the final result. ANNs are characterized by their network architecture, which consists of a number of layers, each consisting of a number of nodes, together with a specification of how the nodes are connected to one another and to the input and output nodes.

ANN architectures take a large number of forms. Here we discuss a simple one that has been used for bankruptcy prediction.
A popular form of ANN is the multi-layer perceptron (MLP), where all nodes and layers are arranged in a feed-forward manner, resulting in a feed-forward architecture. The input layer constitutes the first, or lowest, layer of the MLP; it is the layer connected to the external information. In other words, the MLP receives the external information, or input, through this input layer. The last, or highest, layer is called the output layer, where the ANN produces the output visible outside the network. In between these two layers there may exist one or more layers known as hidden layers.

There is an almost unlimited variety of network architectures representing MLPs, depending on the number of hidden layers and the interconnections of the nodes. Here we discuss one specific architecture used for bankruptcy prediction: a three-layer MLP network with one hidden layer. Since bankruptcy classification is a two-group classification problem, a three-layer architecture is considered sufficient. The three-layer MLP architecture with a single-node hidden layer is shown in Figure 2 below.
[4] Please see GCR for references. The discussion here is based on Zhang et al. (1999).
The lowest layer, the input layer, consists of the k different inputs, which represent the explanatory variables, or the firm characteristics in the case of bankruptcy models. At the hidden layer, the input values, or the activation values of the input nodes, are linearly combined as follows:

\gamma_0 + \gamma_1 X_1 + \cdots + \gamma_k X_k    (7)
In linear regression, the coefficient \gamma_0 is known as the intercept; in neural network terminology, such constants are called bias parameters. The linear combination is then transformed by a transfer function into the hidden layer's activation value. In Figure 2, the transfer function for the hidden layer is taken to be the logistic function. Therefore, the activation value of the hidden layer, H_1, is given by
H_1 = \frac{1}{1 + e^{-(\gamma_0 + \gamma_1 X_1 + \cdots + \gamma_k X_k)}} = \left[1 + e^{-(\gamma_0 + \gamma_1 X_1 + \cdots + \gamma_k X_k)}\right]^{-1}    (8)
The output of the hidden layer is then used as the input to the single node at the output layer (or to another hidden layer, if one exists). Again, since we are dealing with two-group classification, a single output node is all we need. At this node, a linear combination of the input, B_0 + B_1 H_1, is transformed using another activation function, which is also taken (in Figure 2) to be the logistic function. Therefore, the activation value of the output node, Y, is given by
Y = \frac{1}{1 + e^{-(B_0 + B_1 H_1)}} = \left[1 + e^{-(B_0 + B_1 H_1)}\right]^{-1}    (9)
The activation value, or output, of the output layer becomes the output of the network. It is important to note that, due to the logistic activation function used at the output node, the activation value lies between 0 and 1. However, we are using the network for classification. Therefore, we need to convert the value of Y, which lies between 0 and 1, into either 0 (non-bankrupt group) or 1 (bankrupt group). One common way to do this is the following classification rule:
y = \begin{cases} 0 & \text{if } Y < 0.5 \\ 1 & \text{otherwise} \end{cases} \qquad \text{or equivalently} \qquad y = \begin{cases} 1 & \text{if } Y \ge 0.5 \\ 0 & \text{otherwise} \end{cases}    (10)
We have just completed the description of a simple three-layer MLP architecture with the following two sets of unknown parameters:

\gamma_0, \gamma_1, \ldots, \gamma_k \quad \text{and} \quad B_0, B_1
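As a sketch, the forward pass of equations (7)-(10) takes only a few lines of R; the parameter values below are hypothetical placeholders, not trained values.

gamma <- c(0.5, -1.2, 0.8, 0.3)   # hypothetical gamma_0, ..., gamma_k with k = 3
B <- c(-0.4, 2.0)                 # hypothetical B_0, B_1
X <- c(0.1, -0.5, 1.3)            # one firm's k = 3 input values
H1 <- 1 / (1 + exp(-(gamma[1] + sum(gamma[-1] * X))))   # equation (8)
Y <- 1 / (1 + exp(-(B[1] + B[2] * H1)))                 # equation (9)
y <- ifelse(Y >= 0.5, 1, 0)       # classification rule (10)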
Now the question is how we decide what values these parameters should take. This is done by a process called training the network, which involves choosing the values of these parameters so that some measure of error is minimized. One such popular error measure is the mean squared error (MSE), defined as

MSE = \frac{1}{N} \sum_{i=1}^{N} (a_i - y_i)^2    (11)
where a_i represents the i-th target value and y_i represents the network output for the i-th training observation. Finally, N represents the number of training sets of input values, i.e., the size of the training sample.
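In R, with a vector of targets a and network outputs y (both hypothetical here), equation (11) is a one-liner:

a <- c(1, 0, 1, 1, 0)     # hypothetical target values
y <- c(1, 0, 0, 1, 0)     # hypothetical network outputs
mse <- mean((a - y)^2)    # equation (11); here 1/5 = 0.2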
For example, suppose we use the same set of five ratios for 266 firms, of which 134 firms are in the bankrupt group; we then have one set of five ratios for each firm. When the set of ratios for firm i is used as input to the ANN, the output of the network, y_i, for this input would be either 0 (representing non-bankrupt) or 1 (representing bankrupt). The actual bankruptcy status of the firm is represented by a_i. Therefore, the sample of 266 firms constitutes a training sample that the network uses to find the parameter values that minimize the MSE.
From the discussion above, it is clear that training the network is an unconstrained nonlinear minimization problem. One of the most popular algorithms used for training the network is the well-known backpropagation, which is a variation of the gradient-based steepest descent method. There are other methods of training the network (see Zhang et al. (1999)).
R has a package called neuralnet that can be used to estimate the neural network
parameters.
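A minimal sketch of such an estimation is given below, assuming the altman_new.txt data set of Section 3 with a default indicator default and ratios X1-X5; the settings hidden = 1 with logistic activation mirror the three-layer architecture above.

library(neuralnet)
ddat <- read.table("altman_new.txt", header = TRUE)
nn <- neuralnet(default ~ X1 + X2 + X3 + X4 + X5, data = ddat,
                hidden = 1, act.fct = "logistic", linear.output = FALSE,
                err.fct = "sse")   # one logistic hidden node, SSE criterion
Y <- nn$net.result[[1]]            # fitted output activations in (0, 1)
y <- ifelse(Y >= 0.5, 1, 0)        # classification rule (10)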
[Figure: a three-layer perceptron with input nodes X_1, \ldots, X_k, a single hidden node H_1 (activation function: logistic), and an output node Y = \left[1 + e^{-(B_0 + B_1 H_1)}\right]^{-1}.]
Figure 2: Three-Layer MLP Architecture
3. Maximum Likelihood Estimation Using R
# Note: change the working directory using "File\Change directory" and then
# select the folder where "altman_new.txt" is located
ddat <- read.table("altman_new.txt", header = TRUE)
default.logit <- ddat$default   # to be used by the logistic model
# logistic regression: glm() is part of the "stats" package
glm.out <- glm(default.logit ~ X1 + X2 + X3 + X4 + X5,
               family = binomial(logit), data = ddat)
summary(glm.out)
#____________________________________________________________________________
# use maximum likelihood to estimate the same model
library(maxLik)
N <- nrow(ddat)
d <- as.matrix(ddat[, 1:5])     # convert the data.frame to a matrix
d1 <- cbind(1, d)               # column-bind: prepend a column of 1s
                                # representing the intercept
loglik1 <- function(param) {    # alternative but equivalent way of defining
                                # the log-likelihood
  beta <- param
  loglik1 <- 0.0
  lambda <- d1 %*% param        # linear index beta'x_i for each firm
  for (i in 1:N) {              # per-firm contribution:
    loglik1 <- loglik1 - log(1 + exp(-lambda[i])) - lambda[i] +
      default.logit[i] * lambda[i]   # y_i*lambda_i - lambda_i - log(1 + e^{-lambda_i})
  }
  loglik1
}
beta <- rep(0.1, 6)             # starting values for the six parameters
loglik1(beta)                   # sanity check: evaluate at the starting values
O.withAltmanVar <- maxLik(loglik1, start = beta)
summary(O.withAltmanVar)
# first derivative of the log-likelihood for one firm i: (y_i - p_i) times x_i
# see equation (18)
gradlik <- function(param) {
  beta <- param
  BetaXt <- d1 %*% param                            # linear index beta'x_i
  dd <- default.logit - (1 / (1 + exp(-BetaXt)))    # (y_i - p_i)
  gradlik <- colSums(d1 * matrix(rep(dd, 6), nrow = N, byrow = FALSE))
  gradlik
}
O <- maxLik(loglik1, gradlik, hess = NULL, start = beta)
summary(O)
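The maximum likelihood estimates should agree with the glm() fit above; a quick check (assuming both objects are still in the session):

cbind(maxLik = coef(O), glm = coef(glm.out))   # the two columns should match closely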