Predictive Modeling on Probability of Default – Logit Model Using Microsoft Excel and VBA


Kevin Lan from QuantMinds.NET

2009.5
Table of Contents

Abstract
1. Introduction
2. General Discussion on Key Factors in the Analysis of Obligor’s Risk Assessment
3. Data
4. The Model
5. Visual Basic Application (VBA) for Estimation on Logistic Regression
6. Results
Appendix:
Reference:

Abstract

Logistic regression is often used to estimate the probability of default (PD) of different types of
borrowers in commercial lending (the obligor’s PD model). Microsoft Excel is probably the most widely
used software in day-to-day banking work. However, Excel has no built-in logistic regression function.
In this case study, I developed a Visual Basic for Applications (VBA) add-in in the Excel macro
environment that fits the logistic function by maximum likelihood in order to predict probabilities of
default in commercial lending. Using a dataset with the five financial ratios of the widely known
Z-score model developed by Altman (1968), the general drivers of credit risk as well as the model
estimation procedure are discussed. The VBA application can be extended to retail banking (e.g., credit
cards, auto loans and home mortgages) and to SME lending in small or medium-sized commercial banks.

Key words: Excel, VBA, Logistic Regression, Risk Management

1. Introduction
Credit risk is the risk to a financial or credit institution that a borrower defaults on payments of
interest and principal because of the borrower’s unwillingness or inability to service the debt. The
higher the credit risk an institution is exposed to, the greater its potential losses. For banks and most
other credit institutions, credit risk is considered to be the form of risk that can most significantly
diminish earnings and financial strength.

Logistic regression is often recommended for predicting the probability of default. In statistics,
logistic regression is used to predict the probability of occurrence of an event by fitting data to a
logistic curve. It is appropriate when the dependent variable is binary, which fits loan default data very
well (the obligor’s PD model). Microsoft Excel is probably the most popular software for day-to-day
statistical work in business. However, Excel has no built-in logistic regression function. In this case
study, I developed a Visual Basic for Applications (VBA) program in the Excel macro environment to
estimate the logistic model and predict probabilities of default in commercial banking. Using a dataset
with the five financial ratios of the widely known Z-score model developed by Altman (1968), the general
drivers of credit risk as well as the model estimation procedure are discussed. The VBA application can
be extended to retail banking (e.g., credit cards, auto loans and home mortgages) and to SME lending in
small or medium-sized commercial banks.

2. General Discussion on Key Factors in the Analysis of Obligor’s Risk Assessment


Credit risk models fall into two primary categories: the obligor’s PD model and the facility rating
model. The latter comes into play when loss given default (LGD) is estimated. In other words, the LGD
model (LGD = 1 - recovery rate) is typically applied to a specific loan product, while the PD model
focuses on the quality of the borrower. For facility rating models, the logistic model is not a natural
choice because the dependent variable is a recovery rate rather than a 0-1 dummy variable. The
complicated data sources (e.g., different types of collateral, uncertainty) also limit the applicability
of logistic regression in LGD modeling.
Several types of models have been used historically to predict the probability of default. Although
their functional forms differ, the underlying data used to estimate them are quite similar. According to
Moody’s KMV model, the key building blocks of a predictive model for PD estimation consist of five
parts:

1. Historical Ratio Assessment
2. Balance Sheet Factors
3. Industry / Market
4. Company
5. Management

Essentially, the early-warning mechanism of a risk rating system should focus on the financial strength
of a firm as well as on the business cycle and trends of its specific industry. First, corporate earnings
must be reasonable relative to payment obligations. If this is not the case, liquidity will be weakened.
Without satisfactory earnings, it will also be difficult for an enterprise to raise other types of capital,
such as loan capital and new equity. A shortage of liquidity is often the factor that triggers bankruptcy.
One or more variables that can explain the level of and changes in the enterprise’s liquidity should
therefore be included in a credit risk model. An enterprise’s ability to withstand losses is often assessed
on the basis of its financial strength measured by its equity ratio. With a high equity ratio, the enterprise
is better equipped to cope with difficult periods, partly because it will be easier to raise capital through
the sale of assets without encumbrances and also obtain new loans because better collateral can be
offered. Generally, a high equity ratio also implies lower current expenses for interest and principal.

3. Data

Following the previous guideline, one can incorporate many financial ratios into the linear logistic
function. Among the dozens of financial ratios available, 30 measurements that are most relevant to the
investing process have been chosen and organized into six main categories, as listed in Table 1.¹

Not every financial ratio can be used in model estimation because of data access limitations, and
perfectly accurate prediction is impossible in practice. This paper aims to develop an Excel VBA
application rather than the best possible model. Therefore, only five variables are selected for default
prediction: Working Capital (WC), Retained Earnings (RE), Earnings before interest and taxes (EBIT) and
Sales (S), each divided by Total Assets (TA); and Market Value of Equity (ME) divided by Total
Liabilities (TL).
Except for the market value, all of these items are found in the balance sheet and income statement of
the company. The market value is given by the number of shares outstanding multiplied by the stock
price. The five ratios are those from the widely known Z-score developed by Altman (1968). WC/TA
captures the short-term liquidity of a firm, RE/TA and EBIT/TA measure historic and current
profitability, respectively. S/TA further proxies for the competitive situation of the company and ME/TL
is a market-based measure of leverage. Of course, one could consider other variables as well; to
mention only a few, these could be: cash flows over debt service, sales or total assets (as a proxy for
size), earnings volatility, and stock price volatility. Also, there are often several ways of capturing one
underlying factor. Current profits, for instance, can be measured using EBIT, EBITDA (=EBIT plus
depreciation and amortization) or net income.
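
The five ratios can be computed directly in VBA before they are passed to the estimation routine. The function below is only a minimal sketch: the name AltmanRatios and its argument names are hypothetical (they are not part of the add-in described in this paper), and the output order WC/TA, RE/TA, EBIT/TA, ME/TL, S/TA is an assumption chosen to match the column order of the results table.

```vba
Option Explicit

' Illustrative helper (hypothetical, not part of the paper's add-in).
' Returns a 1 x 5 row: WC/TA, RE/TA, EBIT/TA, ME/TL, S/TA.
' Note: "Me" is a reserved word in VBA, so market equity is named mktEquity.
Function AltmanRatios(workCap As Double, retEarn As Double, ebit As Double, _
                      sales As Double, totAssets As Double, _
                      mktEquity As Double, totLiab As Double) As Variant
    Dim r(1 To 1, 1 To 5) As Double

    ' Guard against division by zero; return #DIV/0! to the worksheet
    If totAssets = 0 Or totLiab = 0 Then
        AltmanRatios = CVErr(xlErrDiv0)
        Exit Function
    End If

    r(1, 1) = workCap / totAssets     ' WC/TA: short-term liquidity
    r(1, 2) = retEarn / totAssets     ' RE/TA: historic profitability
    r(1, 3) = ebit / totAssets        ' EBIT/TA: current profitability
    r(1, 4) = mktEquity / totLiab     ' ME/TL: market-based leverage
    r(1, 5) = sales / totAssets       ' S/TA: competitive situation

    AltmanRatios = r
End Function
```

On a worksheet, one would select five adjacent cells in a row, type =AltmanRatios(B2, C2, D2, E2, F2, G2, H2) and confirm with Ctrl+Shift+Enter; the cell references are placeholders for wherever the raw balance sheet items are stored.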

¹ The list is quoted from www.investopedia.com.

Table 1 List of Financial Ratios

1) Liquidity Measurement Ratios
 - Current Ratio
 - Quick Ratio
 - Cash Ratio
 - Cash Conversion Cycle

2) Profitability Indicator Ratios
 - Profit Margin Analysis
 - Effective Tax Rate
 - Return On Assets
 - Return On Equity
 - Return On Capital Employed

3) Debt Ratios
 - Overview Of Debt
 - Debt Ratio
 - Debt-Equity Ratio
 - Capitalization Ratio
 - Interest Coverage Ratio
 - Cash Flow To Debt Ratio

4) Operating Performance Ratios
 - Fixed-Asset Turnover
 - Sales/Revenue Per Employee
 - Operating Cycle

5) Cash Flow Indicator Ratios
 - Operating Cash Flow/Sales Ratio
 - Free Cash Flow/Operating Cash Ratio
 - Cash Flow Coverage Ratio
 - Dividend Payout Ratio

6) Investment Valuation Ratios
 - Per Share Data
 - Price/Book Value Ratio
 - Price/Cash Flow Ratio
 - Price/Earnings Ratio
 - Price/Earnings To Growth Ratio
 - Price/Sales Ratio
 - Dividend Yield

4. The Model
A score summarizes the information contained in factors that affect default probability. Standard scoring
models take the most straightforward approach by linearly combining those factors. Let x denote the
factors (their number is K; N denotes the number of observations, as in the Appendix code) and b the
weights (or coefficients) attached to them; we can represent the score that we get in scoring instance i
as:

$$\text{Score}_i = b_1 x_{i1} + b_2 x_{i2} + \dots + b_K x_{iK} = b' x_i \tag{1}$$

The scoring model should predict a high default probability for those observations that defaulted and a
low default probability for those that did not. In order to choose the appropriate weights b, we first need
to link scores to default probabilities. This can be done by representing default probabilities as a
function F of scores:

$$\text{Prob}(\text{Default}_i) = F(\text{Score}_i) \tag{2}$$

Like default probabilities, the function F should be constrained to the interval from 0 to 1; it should also
yield a default probability for each possible score. These requirements can be fulfilled by a cumulative
probability distribution function. A distribution often considered for this purpose is the logistic
distribution. The logistic distribution function Λ(z) is defined as:

$$\Lambda(z) = \frac{\exp(z)}{1 + \exp(z)} = \frac{1}{1 + \exp(-z)} \tag{3}$$

Applied to (2) we get:

$$\text{Prob}(\text{Default}_i) = \Lambda(\text{Score}_i) = \frac{1}{1 + \exp(-b' x_i)} \tag{4}$$

5. Visual Basic Application (VBA) for Estimation on Logistic Regression


The VBA code (Appendix) applies the maximum likelihood procedure to estimate the logit model. Maximum
likelihood is a way of inferring parameter values from sample data: parameters are chosen such that they
maximize the probability (= likelihood) of drawing the sample that was actually observed.
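
The quantities that the Appendix code accumulates in its Newton loop can be written out as follows. This is a restatement of what the code computes rather than an additional result. Here $y_i$ is the 0/1 default indicator, $x_i$ the vector of factors for observation $i$ (with a leading 1 when a constant is included), $N$ the number of observations, and $\Lambda_i$ the predicted default probability (the array lambda in the code).

```latex
\begin{align*}
\Lambda_i &= \frac{1}{1 + \exp(-b' x_i)} \\
\ln L(b) &= \sum_{i=1}^{N} \left[ y_i \ln \Lambda_i + (1 - y_i) \ln (1 - \Lambda_i) \right] \\
g(b) = \frac{\partial \ln L}{\partial b} &= \sum_{i=1}^{N} (y_i - \Lambda_i)\, x_i \\
H(b) = \frac{\partial^2 \ln L}{\partial b \, \partial b'} &= - \sum_{i=1}^{N} \Lambda_i (1 - \Lambda_i)\, x_i x_i' \\
b^{\text{new}} &= b^{\text{old}} - H(b^{\text{old}})^{-1} \, g(b^{\text{old}})
\end{align*}
```

In the code, the constant is initialized at $\ln(\bar{y} / (1 - \bar{y}))$, where $\bar{y}$ is the sample default rate, and all other coefficients start at zero. The update is repeated until the change in $\ln L$ between two iterations falls below $10^{-11}$ or 50 iterations are reached. Standard errors are taken as the square roots of the diagonal elements of $-H^{-1}$.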

6. Results
Table 2 Logit Model Results using VBA

Model 1
                        CONST    WC/TA    RE/TA  EBIT/TA    ME/TL     S/TA
b                      -2.543    0.414   -1.454   -7.999   -1.594    0.620
SE(b)                   0.266    0.572    0.229    2.702    0.323    0.349
t                       -9.56     0.72    -6.34    -2.96    -4.93     1.77
p-value                 0.000    0.469    0.000    0.003    0.000    0.076
Pseudo R² / # iter      0.222       12
LR-test / p-value       160.1    0.000
lnL / lnL0             -280.5   -360.6
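
To illustrate how the estimated coefficients translate into a default probability, the sketch below applies the logistic function to a linear score. The function name LogitPD and the firm's ratio values are hypothetical and chosen only for illustration; the coefficients are those reported in the results table above.

```vba
Option Explicit

' Hypothetical helper (not part of the paper's add-in): default probability
' from coefficients b and factors x, i.e. 1 / (1 + exp(-b'x)).
' b and x are one-dimensional arrays of equal length; the first element of x
' should be 1 when the model includes a constant.
Function LogitPD(b As Variant, x As Variant) As Double
    Dim j As Long, score As Double

    If UBound(b) - LBound(b) <> UBound(x) - LBound(x) Then
        Err.Raise 5, "LogitPD", "b and x must have the same length"
    End If

    ' Linear score b'x
    For j = LBound(b) To UBound(b)
        score = score + b(j) * x(j - LBound(b) + LBound(x))
    Next j

    LogitPD = 1 / (1 + Exp(-score))
End Function

Sub DemoLogitPD()
    Dim b As Variant, x As Variant
    ' Coefficients: CONST, WC/TA, RE/TA, EBIT/TA, ME/TL, S/TA
    b = Array(-2.543, 0.414, -1.454, -7.999, -1.594, 0.62)
    ' Made-up firm: 1 for the constant, then the five ratios
    x = Array(1, 0.2, 0.3, 0.05, 1, 1)
    Debug.Print LogitPD(b, x)
End Sub
```

For this made-up firm the score is about -4.27, which corresponds to a default probability of roughly 1.4%. Raising ME/TL or EBIT/TA lowers the probability further, in line with the negative signs of those coefficients.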

Appendix:
Option Explicit

Function logit(y As Range, xraw As Range, Optional constant, Optional stats)

    On Error GoTo ErrHandler

    'Defaults: include a constant, return coefficients only
    If IsMissing(constant) Then constant = 1
    If IsMissing(stats) Then stats = 0

    'Count variables
    Dim i As Long, j As Long, jj As Long

    'Read data dimensions: N observations, K coefficients
    Dim K As Long, N As Long
    N = y.Rows.Count
    K = xraw.Columns.Count + constant

    'Some error checking
    If xraw.Rows.Count <> N Then
        MsgBox "Error: y and x must have the same number of rows."
        GoTo myend
    End If

    'Adding a vector of ones to the x matrix if constant=1, name xraw=x from now on
    Dim x() As Double
    ReDim x(1 To N, 1 To K)
    For i = 1 To N
        x(i, 1) = 1
        For j = 1 + constant To K
            x(i, j) = xraw(i, j - constant)
        Next j
    Next i

    'Initializing the coefficient vector (b) and the score (bx)
    Dim b() As Double, bx() As Double, ybar As Double
    ReDim b(1 To K)
    ReDim bx(1 To N)

    ybar = Application.WorksheetFunction.Average(y)
    If constant = 1 Then b(1) = Log(ybar / (1 - ybar))
    For i = 1 To N
        bx(i) = b(1)
    Next i

    'Defining the variables used in the Newton procedure
    Dim sens As Double, maxiter As Long, iter As Long, change As Double
    Dim lambda() As Double, lnL() As Double, dlnL() As Double, hesse() As Double, hinv(), hinvg()
    ReDim lambda(1 To N)

    sens = 1 * 10 ^ (-11): maxiter = 50
    ReDim lnL(1 To maxiter)
    change = sens + 1: iter = 1: lnL(1) = 0

    'Loop for Newton iteration
    Do While Abs(change) > sens And iter < maxiter
        iter = iter + 1

        'Reset derivative of log likelihood and Hessian
        Erase dlnL, hesse
        ReDim dlnL(1 To K): ReDim hesse(1 To K, 1 To K)

        'Compute prediction lambda, gradient dlnL, Hessian hesse, and log likelihood lnL
        For i = 1 To N
            lambda(i) = 1 / (1 + Exp(-bx(i)))
            For j = 1 To K
                dlnL(j) = dlnL(j) + (y(i) - lambda(i)) * x(i, j)
                For jj = 1 To K
                    hesse(jj, j) = hesse(jj, j) - lambda(i) * (1 - lambda(i)) * x(i, jj) * x(i, j)
                Next jj
            Next j
            lnL(iter) = lnL(iter) + y(i) * Log(lambda(i)) + (1 - y(i)) * Log(1 - lambda(i))
        Next i

        'Compute inverse Hessian (=hinv) and multiply hinv with gradient dlnL
        hinv = Application.WorksheetFunction.MInverse(hesse)
        hinvg = Application.WorksheetFunction.MMult(dlnL, hinv)

        change = lnL(iter) - lnL(iter - 1)

        'If convergence achieved, exit now and keep the b corresponding with the estimated Hessian
        If Abs(change) <= sens Then Exit Do

        'Apply Newton's scheme for updating coefficients b
        For j = 1 To K
            b(j) = b(j) - hinvg(j)
        Next j

        'Compute new score (bx)
        For i = 1 To N
            bx(i) = 0
            For j = 1 To K
                bx(i) = bx(i) + b(j) * x(i, j)
            Next j
        Next i
    Loop

    'Some error handling: the loop ends without convergence only when maxiter is reached,
    'so test the last change (iter itself never exceeds maxiter)
    If Abs(change) > sens Then
        MsgBox "Maximum Number of Iteration exceeded. No convergence achieved. Exiting. Sorry."
        GoTo myend
    End If

    'Output: one row of coefficients, or seven rows if statistics are requested
    Dim relogit()
    ReDim relogit(1 To 1, 1 To K)
    If stats = 1 Then ReDim relogit(1 To 7, 1 To K)

    'Coefficients
    For j = 1 To K
        relogit(1, j) = b(j)
    Next j

    'Additional statistics if requested
    If stats = 1 Then
        For j = 1 To K
            relogit(2, j) = Sqr(-hinv(j, j))                    'Standard error
            relogit(3, j) = relogit(1, j) / relogit(2, j)      't statistic
            relogit(4, j) = (1 - Application.WorksheetFunction.NormSDist(Abs(relogit(3, j)))) * 2 'p-value
            relogit(5, j) = "#N/A"
            relogit(6, j) = "#N/A"
            relogit(7, j) = "#N/A"
        Next j

        'ln Likelihood of model with just a constant (lnL0)
        Dim lnL0 As Double
        lnL0 = N * (ybar * Log(ybar) + (1 - ybar) * Log(1 - ybar))

        relogit(5, 1) = 1 - lnL(iter) / lnL0 'McFadden R2
        relogit(5, 2) = iter - 1 'Number of iterations
        relogit(6, 1) = 2 * (lnL(iter) - lnL0) 'LR test
        relogit(6, 2) = Application.WorksheetFunction.ChiDist(relogit(6, 1), K - 1) 'p-value for LR
        relogit(7, 1) = lnL(iter)
        relogit(7, 2) = lnL0
    End If

    logit = relogit
    GoTo myend

'Error Handler
ErrHandler:
    MsgBox ("Fatal Error. Reasons might be: y not {0,1}, not the same number of N for y and x's...or anything else")
myend:
End Function
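
The following macro sketches one way to call the function from VBA and write its output to a sheet. The worksheet layout (default flags in A2:A101, the five ratios in B2:F101, output starting at H2) is an assumption made for illustration and should be adapted to the actual data; the macro relies on the logit function listed above being in the same project.

```vba
Sub RunLogitExample()
    ' Assumed layout (hypothetical): 0/1 default flags in A2:A101 and
    ' the five ratios in B2:F101 of the active sheet.
    Dim res As Variant
    res = logit(Range("A2:A101"), Range("B2:F101"), 1, 1)

    ' logit returns nothing (Empty) when it has reported a problem
    If IsEmpty(res) Then Exit Sub

    ' With constant = 1 and stats = 1 the result has 7 rows and 6 columns
    Range("H2").Resize(UBound(res, 1), UBound(res, 2)).Value = res
End Sub
```

The same result can be obtained on the worksheet itself by selecting a block of seven rows by six columns, typing =logit(A2:A101, B2:F101, 1, 1) and confirming with Ctrl+Shift+Enter.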

Reference:
1. Loeffler, G. and Posch, P. N. (2007). Credit Risk Modeling Using Excel and VBA (The Wiley Finance
Series). John Wiley & Sons.
