
Journal of Econometrics 40 (1989) 3-14. North-Holland

AN ECONOMETRIC ANALYSIS OF THE BANK CREDIT SCORING PROBLEM*

William J. BOYES, Dennis L. HOFFMAN and Stuart A. LOW

Arizona State University, Tempe, AZ 85287, USA

Most credit assessment models used in practice are based on simple credit scoring functions estimated by discriminant analysis. These functions are designed to distinguish whether or not applicants belong to the population of would-be defaulters. We suggest that the traditional view that emphasizes default probability is too narrow. Our model of credit assessment focuses on expected earnings. We demonstrate how maximum likelihood estimates of default probabilities can be obtained from a bivariate censored probit framework using a choice-based sample originally intended for discriminant analysis. The paper concludes with recommendations for combining these default probability estimates with other parameters of the loan earnings process to obtain a more meaningful model of credit assessment.

1. Introduction

Most credit assessment models used in practice are based on simple credit scoring functions designed to distinguish applicants who would repay from those who would default [see Altman et al. (1981) for a summary of recent examples]. These functions are typically based on discriminant analysis. Models are then judged on their ability to generate indices of applicant attributes that take on values above or below a critical cut-off level depending on whether or not the applicant belongs to the population of would-be defaulters.

We suggest that this view of the credit assessment problem is too narrow. Ultimately the bank is interested in profit maximization - not simply a ranking essentially based on a measure of default probability. We develop a simple model of credit card lending that demonstrates how expected earnings on revolving credit loans depend both on maintained balances and probability of default. The choice-based estimator of Manski and Lerman (1977) may then be combined with the notion of partial observability to estimate default probabilities from non-random samples of consumer lending behavior originally intended for use in discriminant analysis. The paper concludes with suggestions for combining these default probability estimates with the other parameters of the loan earnings process to obtain a more meaningful credit assessment model.

*This paper was presented at the Issues in Econometric Forecasting Conference at Arizona State University in March 1987. We benefited from discussions with Marie Connolly, Mike Ormiston and Peter Schmidt. Support from the Center for Financial System Research at Arizona State University is gratefully acknowledged.

0304-4076/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)

W.J. Boyes et al., Econometric analysis of bank scoring problems

2. Credit card lending

Assume that a credit card loan is granted (or denied) and is repaid or defaulted within a single period. Each loan then yields two possible outcomes. The probability distribution of these outcomes can be described by a Bernoulli trial:

$$x = \begin{cases} r & \text{with repayment probability } 1 - p, \\ -w & \text{with default probability } p, \end{cases}$$

where $r$ is the earnings on a repaid loan and $w$ denotes losses that must be written off when a loan is defaulted. Specifically, $r$ is the product of the nominal yield on credit card loans, $i$, and balances maintained on repaid accounts, $\mathrm{bal}|r$, while $w$ is the product of the write-off rate, $q$, and balances that accrue on defaulted accounts, $\mathrm{bal}|w$.

If the bank knows all the parameters of the trial, it establishes a credit approval requirement for each applicant. Approved loans must have expected returns that exceed the opportunity cost of bank funds - for example, $E(x) \geq E(t)$, where $E(t)$ could be the earnings that would be obtained by investing the funds in government securities at interest rate $i_t$. In this case the credit granting requirement is

$$(1 - p)(\mathrm{bal}|r)(i - i_t) - p(\mathrm{bal}|w)(q + i_t) > 0.$$

Hence, credit is granted only to those applicants with default probabilities less than $(\mathrm{bal}|r)(i - i_t)/[(\mathrm{bal}|r)(i - i_t) + (\mathrm{bal}|w)(q + i_t)]$.

Two aspects of the above scenario are apparent. First, precise estimates of default probabilities are necessary to ensure accurate credit assessment. Historically banks have devoted significant resources to developing credit scores that are essentially proxies for default probabilities. Second, if lenders apply a uniform critical credit score to all applicants - as is customary with some banks - they are implicitly assuming that all applicants maintain the same revolving credit balances. Recent efforts in the area of behavioral scoring have attempted to identify those applicants that might maintain higher balances and yield greater returns, but these are in the earliest stages of development. Our paper illustrates how estimates of default probabilities might be obtained from data typically compiled by banks. We then briefly discuss how one might build a scoring model that accounts for variable balance behavior.
¹ Processing costs are presumed to be paid from the merchant's contribution on each revolving credit transaction or annual fees assessed to cardholders.
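The single-period credit granting requirement of this section can be checked numerically. The sketch below is only an illustration: the function names `default_cutoff` and `grant_credit` and all input values are hypothetical and do not come from the bank's data.

```python
def default_cutoff(bal_r, bal_w, i, i_t, q):
    """Default probability below which expected earnings are positive.

    Solves (1 - p)*bal_r*(i - i_t) - p*bal_w*(q + i_t) > 0 for p.
    """
    gain = bal_r * (i - i_t)   # net earnings when the loan is repaid
    loss = bal_w * (q + i_t)   # write-off plus forgone interest on default
    return gain / (gain + loss)


def grant_credit(p_default, bal_r, bal_w, i, i_t, q):
    """Apply the credit granting requirement to a single applicant."""
    net = (1 - p_default) * bal_r * (i - i_t) - p_default * bal_w * (q + i_t)
    return net > 0


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula.
    params = dict(bal_w=1000.0, i=0.20, i_t=0.10, q=0.50)
    low = default_cutoff(bal_r=1000.0, **params)
    high = default_cutoff(bal_r=2000.0, **params)
    print(f"cutoff with bal_r=1000: {low:.4f}")
    print(f"cutoff with bal_r=2000: {high:.4f}")
```

Doubling the repaid balance raises the admissible default probability, which is the sense in which a uniform critical score implicitly assumes identical balances across applicants.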


3. Econometric considerations

3.1. Sample stratification and sample selection

Most samples used to estimate credit assessment functions are not randomly drawn from the applicant populations. In preparation for discriminant analysis, banks often segment samples into groups that repaid in a timely fashion, defaulted (or were chronic late payers) or were denied credit. The relative sizes of these groups may bear little relation to the proportions that are observed in a random sample from the applicant population. Also, the parameters of the assessment process must be estimated from a truncated or censored sample since not all applicants receive credit and, thus, there is no way to observe the subsequent behavior of the excluded group. Whether this warrants serious consideration depends on the nature of the sample censoring.

We deal with the non-random choice-based stratification issue by applying the weighted exogenous sample maximum likelihood estimator (WESML) designed by Manski and Lerman (1977). The WESML is obtained by maximizing a weighted log likelihood function with weights determined by comparing sample proportions with corresponding population frequencies. For our sample these weights were determined after discussions with bank officials.

Samples used to estimate repayment probabilities are censored since only applicants that receive credit are observed to default or repay. Heckman (1979) has shown that censored samples can lead to biased estimates if, in our example, the sample selection rule is correlated with the errors in the repayment probability equation. Thus the impact of sample censoring depends on the nature of the sample selection rule. If lenders rely strictly on quantitative credit scores, sample selection is deterministically governed by applicant attributes and the sample selection rule does not lead to biased estimates. However, most lenders maintain that credit scoring is only one aspect of the credit assessment process and that loan officers also allow subjective assessments to enter the loan granting decision. Presuming that these assessments are not simply a different deterministic function of observed attributes, they add an element of randomness to the loan granting process and ultimately the sample selection rule. If these subjective assessments are correlated with default equation disturbances, censoring may lead to biased estimates of credit assessment probabilities.

In our case the structure exposed to potential sample selection bias has a qualitative dependent variable so that the standard Heckman procedure is not applicable. Technically, the loan granting model and the default model together constitute a bivariate qualitative dependent variable model that exhibits a form of partial observability first discussed by Poirier (1980) and applied by Farber (1983). Meng and Schmidt (1985) summarize this model along with several related structures.


To illustrate, assume that we have empirical credit granting and default equations with binary dependent variables

$$y_1 = \begin{cases} 1 & \text{if loan granted,} \\ 0 & \text{if loan not granted,} \end{cases} \qquad y_1 = Z\alpha_1 + \varepsilon_1,$$

and

$$y_2 = \begin{cases} 1 & \text{if loan defaulted,} \\ 0 & \text{if loan not defaulted,} \end{cases} \qquad y_2 = Z\alpha_2 + \varepsilon_2.$$

Recognizing that $y_2$ is observed in this censored probit model only if $y_1 = 1$, the log likelihood function for a sample of $T$ applicants, as specified in Meng and Schmidt (1985, eq. 6), is

$$\ln L(\alpha_1, \alpha_2, \rho) = \sum_{t=1}^{T} \Big\{ y_{t1} y_{t2} \ln F(Z_t'\alpha_1, Z_t'\alpha_2; \rho) + y_{t1}(1 - y_{t2}) \ln\big[\Phi(Z_t'\alpha_1) - F(Z_t'\alpha_1, Z_t'\alpha_2; \rho)\big] + (1 - y_{t1}) \ln\big[1 - \Phi(Z_t'\alpha_1)\big] \Big\},$$

where $F(\cdot)$ and $\Phi(\cdot)$ denote the bivariate standard normal c.d.f. and univariate standard normal c.d.f., respectively. Estimates of the parameters are obtained by maximizing $\ln L$. These estimates offer efficiency gains over those obtained in the separate estimation of the two equations. More importantly, the joint approach accounts for potential correlation between the two equations, $\rho$, and thereby corrects for potential sample selection bias that could be incurred in the separate estimation of the default equation.

Our sample requires that we estimate this censored probit model from a choice-based sample. We found this to be a straightforward application of the Manski-Lerman WESML estimator. The weighted likelihood function and asymptotic variance-covariance matrix associated with these censored probit WESML estimates are described in the appendix.
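To make the three branches of the censored probit log likelihood concrete, the following sketch evaluates it for a small sample. It is only an illustration under stated assumptions: the bivariate normal c.d.f. is approximated by Simpson's rule using the standard library (an applied estimation would use a numerical library routine and an optimizer), and the names `bvn_cdf` and `censored_probit_loglik` are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(a, b, rho, steps=400):
    """P(X <= a, Y <= b) for a standard bivariate normal, |rho| < 1.

    Integrates phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) over x <= a
    with Simpson's rule on a truncated range (steps must be even).
    """
    lo, hi = -8.0, min(a, 8.0)
    if hi <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * norm_pdf(x) * norm_cdf((b - rho * x) / scale)
    return total * h / 3.0


def censored_probit_loglik(alpha1, alpha2, rho, Z, y1, y2):
    """Log likelihood with default (y2) observed only when credit is granted.

    y2[t] is ignored (may be None) whenever y1[t] == 0.
    """
    tiny = 1e-300  # guards against log(0)
    loglik = 0.0
    for z, granted, defaulted in zip(Z, y1, y2):
        a = sum(zi * ai for zi, ai in zip(z, alpha1))  # grant index
        b = sum(zi * ai for zi, ai in zip(z, alpha2))  # default index
        p_grant = norm_cdf(a)
        p_grant_default = bvn_cdf(a, b, rho)
        if granted == 1 and defaulted == 1:
            loglik += math.log(max(p_grant_default, tiny))
        elif granted == 1:
            loglik += math.log(max(p_grant - p_grant_default, tiny))
        else:
            loglik += math.log(max(1.0 - p_grant, tiny))
    return loglik
```

Denied applicants contribute only through the grant equation, which is how the joint likelihood uses the censored part of the sample.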

3.2. Prediction in bank credit scoring models

In practical applications, lenders gauge expected profits using estimates of the parameters of the earnings distribution. To illustrate, suppose that all yields, write-off rates, and balances are known a priori (as in an installment loan) and default probabilities are estimated as outlined above to obtain $\hat p_t$ for each applicant. In this case, the expected return on the $t$th account is

$$E(x_t) = E_{\hat p}\{E(x_t \mid \hat p_t, r_t, w_t)\} = E_{\hat p}\{(1 - \hat p_t) r_t - \hat p_t w_t\}.$$
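As a rough numerical illustration of this expression, the sketch below compares a naive expected return, which plugs in the estimated default probability directly, with one that allows for estimation error in the probit index, the issue taken up in the remainder of this section. The adjustment uses the standard identity $E[\Phi(V)] = \Phi(\mu/\sqrt{1+\sigma^2})$ for $V \sim N(\mu, \sigma^2)$; treating the estimated index as normally distributed, and all function names and numbers, are assumptions of the sketch.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_return(p_default, r, w):
    """(1 - p) * r - p * w for a single account."""
    return (1.0 - p_default) * r - p_default * w


def naive_default_prob(index):
    """Plug-in default probability Phi(index)."""
    return norm_cdf(index)


def adjusted_default_prob(index, index_var):
    """Mean of Phi(V) when V ~ N(index, index_var)."""
    return norm_cdf(index / math.sqrt(1.0 + index_var))


if __name__ == "__main__":
    # Hypothetical index, variance, earnings and loss figures.
    index, index_var, r, w = -1.5, 0.04, 135.0, 1365.0
    p_naive = naive_default_prob(index)
    p_adj = adjusted_default_prob(index, index_var)
    print(f"naive p = {p_naive:.4f}, adjusted p = {p_adj:.4f}")
    print(f"naive E(x) = {expected_return(p_naive, r, w):.2f}")
    print(f"adjusted E(x) = {expected_return(p_adj, r, w):.2f}")
```

With a negative index the adjusted probability exceeds the plug-in value, so the naive expected return is the larger of the two.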


Noting that $\hat p_t$ is a non-linear function of random variables, we apply a result from McFadden and Reid (1975):

$$E_{\hat\alpha}(\hat p_t) = \Phi\!\left( \frac{Z_t'\hat\alpha_2}{\sqrt{1 + \sigma^2_{Z\hat\alpha_2}}} \right),$$

where $\sigma^2_{Z\hat\alpha_2} = Z_t' V(\hat\alpha_2) Z_t$ and $V(\hat\alpha_2)$ is the asymptotic covariance matrix of the censored probit default probability estimates. Since $Z_t'\hat\alpha_2 < 0$ for all applicants in our sample, $E_{\hat\alpha}(\hat p_t) > \hat p_t$, and naive measures of expected returns - based on $\hat p_t$ rather than $E_{\hat\alpha}(\hat p_t)$ - are biased upward. We estimate the potential significance of this bias in the empirical section.

² In practice balances would also be replaced with estimated counterparts. However, these estimates would presumably be linear and conceivably uncorrelated with the default probability estimates.

4. Empirical results

4.1. Data

The data employed in this paper were obtained from a single large financial institution that monitored a non-random sample of its credit card applicants between 1977 and 1980 as well as the performance through 1984 of those granted credit. The sample contains 4,632 credit card applicants with complete information. Of these, 3,711 (80.1%) were granted credit and 921 (19.9%) were denied credit. Of those granted credit, 1,938 (41.8% of the sample and 52.2% of those granted credit) were classified as good by the institution and 1,773 (38.3% of the sample and 47.8% of those granted credit) were classified as bad. The WESML technique was applied to a bivariate probit model to adjust the sample to the true (as evaluated by bank officials) proportion of 51% granted credit and a 5% probability of default for those granted credit. This implies population proportions of 48.4% good, 2.6% bad and 49.0% deny. All estimates reported below incorporate adjustments for sample non-randomness induced by choice-based samples and partial observability with covariance matrix as described in the appendix.

Each record contains information on personal characteristics, economic and financial variables, credit recipient status and repayment performance. Full variable descriptions are contained in table 1.

Table 1
Variable definitions.

Personal characteristics
AGE                            Age of the applicant (in years).
AGE-NA                         Dummy variable equalling one if missing age.
MS-MAR                         Dummy variable equalling one if married, spouse present.
DEP                            Number of dependents.
MOADD                          Months at the current address.
MOADD-NA                       Dummy variable equalling one if missing months at the current address.
MOJOB                          Months on the current job.
MOJOB-NA                       Dummy variable equalling one if missing months on the current job.
ED<12, ED12, ED13-15, ED16+    Dummy variables equalling one if education level is less than high school, a high school degree, some college, or a college degree (or more), respectively. Missing education is the excluded group.

Economic variables
OWN, RENTOTH                   Dummy variables equalling one if home owner or renter/other, respectively. Missing living arrangement is the excluded group.
EXP/INC                        Ratio of regular monthly expenditures to regular monthly income.
INC-NA, EXP1-NA, EXP2-NA       Dummy variables equalling one if income, living arrangement expenditures or other expenditures are missing, respectively.
PRO, MGR, SAL, CLER, SERV, LABOR, CRAFT, OP, RETIRE, MILIT
                               Dummy variables equalling one if occupation is as a professional, manager, sales person, clerical worker, service worker, laborer, crafts person, operative, retired or military, respectively. Missing occupation is the excluded group.

Financial variables
MAJORCC, OILCC, FINREF, DEPTCC Dummy variables equalling one if the applicant lists a major credit card, oil credit card, financial institution reference or department/speciality store credit card, respectively.
CKREF, SVREF, LNREF            Dummy variables equalling one if the applicant lists a checking account, savings account or loan reference, respectively.
#INQ, #TNTR, #UNRATE, #SAT, #SLOW, #MINOR, #MAJOR
                               From credit bureau reports. The number of inquiries, too new to rate, unrateables, satisfactory, slow, minor derogatory and major derogatory accounts, respectively.

4.2. Estimates

The bivariate censored probit estimates and asymptotic t-statistics for the loan granting decision (column 1) and default decision (column 2) are presented in table 2.

Table 2
Probability of grant and probability of default ($p$) equations.

Variable        P(Grant)     |t|    P(Default)     |t|

Personal characteristics
AGE              0.0263    11.10     -0.0140      4.23
AGE-NA           1.1011     5.56     -0.3994      0.66
MS-MAR           0.0579     1.31     -0.0634      2.21
DEP             -0.1131     2.34      0.0968      2.81
MOADD            0.0017     4.50     -0.0012      0.80
MOADD-NA         0.0172     0.15      0.1001      0.25
MOJOB            0.0030     6.27      0.0021      1.75
MOJOB-NA        -0.0868     0.86     -0.0204      1.47
ED<12            0.2883     2.23      0.0060      0.30
ED12             0.2047     2.51     -0.1131      1.42
ED13-15          0.1384     1.75      0.0294     14.10
ED16+            0.1546     1.77     -0.2063      1.72

Economic variables
OWN              0.3370     3.89     -0.0889      3.16
RENTOTH          0.2183     2.97      0.1060      2.45
EXP/INC         -0.5074    14.40      0.0876      6.19
INC-NA          -0.2043     1.67      0.1990      0.50
EXP1-NA         -0.2227     4.38      0.0919      2.10
EXP2-NA         -0.0169     1.13      0.0372      0.44
PRO              0.0585     0.87     -0.3999      1.49
MGR              0.1871     2.46      0.0368      1.08
SAL              0.0432     0.48      0.1015      0.34
CLER            -0.1229     1.70     -0.1974      0.88
SERV            -0.3011     2.47     -0.2001      0.80
LABOR           -0.1106     0.92     -0.0698      1.97
CRAFT            0.0613     0.48     -0.0497      0.57
OP               0.1934     1.95     -0.1981      1.23
RETIRE          -0.1761     1.23     -0.5998      1.07
MILIT           -0.4997     2.52     -0.4008      0.56

Financial variables
MAJORCC          0.0528     5.94     -0.1198      2.54
OILCC            0.2611     1.45     -0.2007      0.94
FINREF          -0.1138     4.95      0.2017      2.28
DEPTCC           0.1705     2.21      0.1952      3.45
CKREF           -0.0279     4.02     -0.4025      2.90
SVREF            0.0092     0.90     -0.1940      2.82
LNREF            0.1759     1.14      0.0152      1.36
#INQ            -0.0027     2.82      0.1812      7.84
#TNTR            0.0205     0.49      0.0271      1.57
#UNRATE         -0.0887     0.54      0.0872      3.62
#SAT             0.0078     2.58     -0.0626     11.21
#SLOW           -0.5278     5.09      0.2897      1.87
#MINOR          -0.7152     6.07      0.1925      5.87
#MAJOR          -0.8604     7.64      0.2075      7.42

rho              0.353     12.03

As we argue in section 2, the loan granting decision is made with an eye toward expected profits and hence reflects an assessment of both default probability and balance behavior. When the parameter estimates associated with a particular variable listed in table 2 are opposite in sign, the variable affects granting positively and default negatively, or conversely. If both estimates are statistically significant, the lender has presumably used the variable in a manner that is consistent with a strategy designed to minimize default. Alternatively, when parameter estimates associated with a variable listed in table 2 carry the same sign, the granting decision runs counter to a default minimization lending policy. One explanation for this behavior would be a lending policy designed to seek out accounts that might carry substantially higher balances - despite higher default risk.

Variables that are significant (5% level) in both equations and carry opposite signs include age, number of dependents, post-baccalaureate education, home ownership, expenditures to income ratio, finance company reference and several credit bureau variables. The signs maintained by these variables are all consistent with what one might expect. Though some of these variables (e.g., number of major derogatories, expenditures to income ratio) might predict accounts that would maintain high balances, the lending institution that generated our sample evidently had overriding concern over the correlation between these variables and default probability.

Another set of variables is significant in both equations and shares the same sign. These include number of months on the current job, between 13 and 15 years of education, home rental and department store credit card reference. While it is difficult to explain some of these findings, it is possible that the bank looked favorably on department store cardholders because it expected those individuals to hold higher balances.
If this interpretation is correct, we find that, with respect to this variable, concern over attracting higher balance accounts takes precedence over the higher default risk.

Several other variables in the model were significant in at least one of the equations. Financial institutions typically place considerable weight on stability in their granting decisions. Yet, months at the current address was not strongly associated with repayment performance. Similarly, occupational category appears to be important in the granting decision but to have little to do with default frequency. It is of course possible that occupation is associated with accounts that maintain higher balances, though it is impossible to verify this from our limited data set.

The estimate of rho that maximizes the bivariate probit likelihood is 0.353 - suggesting that unexplained tendencies to extend credit are actually associated with higher frequencies of default in our sample. This is again consistent with a lending policy that attempts to seek out accounts that, although they have high default probabilities, offer substantially higher expected earnings due to high expected balances. An alternative explanation for the positive rho estimate is summarized in Boyes, Hoffman and Low (1986). This bank may have aggressively pursued minority accounts during this post-ECOA sample period in an effort to reduce the probability of class action discrimination suits. This might have led to a riskier loan portfolio and is consistent with the observed positive correlation between unexplained tendencies to grant credit and observed defaults.

4.3. A simple simulation

In an effort to illustrate the potential of the model described above, we can quantify some of the parameters in the earnings process to establish a hypothetical, yet realistic, credit granting criterion. Over our sample period, credit card loan rates averaged 21% per annum, our average good credit recipient maintained an account for about three years, and the opportunity cost of funds (one-year T-bill) averaged 12%. Although we do not have precise data on balances maintained by individual applicants, bank officials suggest that the average good account maintained about $500 in outstanding debt, while the average default resulted in a one-time loss of $1,500. We were also told that a write-off rate of 55% is in line with industry norms.

Substituting these figures into the credit granting criterion established in section 2, we find that only those applicants with default probabilities below 9.0% have positive expected earnings.³ Using this level as a classification rule we find that 94% of the good accounts, 61.4% of the bad accounts and 69.2% of the deny accounts would be granted credit based on our estimated probability of default equation and simulated granting criterion. The assumptions about balance behavior and write-off amounts used in our study generate classifications that are quite different from those achieved by the bank in our sample.
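The 9.0% cutoff can be approximated with a multi-period variant of the section 2 criterion. The sketch below rests on one possible reading of the inputs, which is an assumption rather than something the text spells out: net interest accrues on the $500 balance for three years, while a default costs the 55% write-off plus three years of forgone opportunity cost on the $1,500 balance. The function names and the applicant probabilities in `share_granted` are hypothetical.

```python
def multi_period_cutoff(bal_r, bal_w, i, i_t, q, years):
    """Cutoff default probability under the assumed multi-year reading."""
    gain = years * bal_r * (i - i_t)   # net interest earned over the loan's life
    loss = bal_w * (q + years * i_t)   # one-time write-off plus forgone interest
    return gain / (gain + loss)


def share_granted(default_probs, cutoff):
    """Fraction of applicants whose estimated default probability is below cutoff."""
    granted = sum(1 for p in default_probs if p < cutoff)
    return granted / len(default_probs)


if __name__ == "__main__":
    cutoff = multi_period_cutoff(bal_r=500.0, bal_w=1500.0,
                                 i=0.21, i_t=0.12, q=0.55, years=3)
    print(f"cutoff = {cutoff:.3f}")
    # Hypothetical estimated default probabilities for four applicants.
    print(share_granted([0.02, 0.08, 0.10, 0.30], cutoff))
```

Under this reading the gain term is 135 and the loss term is 1,365, so the cutoff is 135/1,500; other readings of the timing would give a different figure.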
To explore the sensitivity of our results to alternative assumptions, we altered maintained balances so that a default probability of 5% or less was required to generate positive expected returns. In this case the ratio of correctly classified good accounts falls from 94% to 83%, the percentage of bad accounts awarded credit falls from 61.4% to 42.2% and the percentage of creditworthy denys falls from 69.2% to 51.1%. Interestingly, we continue to find that denied applicants appear to be as creditworthy as those who subsequently defaulted - suggesting that there may have been a number of profitable accounts that the bank failed to acquire over this period. This might occur if lenders refused to rely exclusively on quantitative credit scores from a discriminant function. Upon further investigation we learned that, for an unspecified length of time, the bank in our sample simply awarded credit to all individuals with income levels that satisfied a particular critical level. Also, after calculating credit scores, branch managers based ultimate decisions on subjective assessments. As a result we find that credit was awarded to numerous applicants with characteristics very similar to many of the denys. Though our classification results differ substantially from those of the bank in our sample, it is impossible to verify that the bank could actually have profited from more liberal credit granting policies due to the speculative nature of the balance assumptions used in this paper.

The effect of accounting for estimation error in the estimates of default probabilities can be measured by replacing $\hat p$ with $E_{\hat\alpha}(\hat p)$ and again applying the classification rule. Using the initial 9.0% cutoff, we now find that 92%, 57.6% and 65.4% of goods, bads and denys, respectively, would be granted credit using $E_{\hat\alpha}(\hat p)$ and our simulated granting criterion. In each case, as expected, less credit is granted when accounting for estimation error. Specifically, our simulation predicts that 134 more individuals (2.9% of the sample) would be denied credit after accounting for estimation error in estimates of the default probability. While these estimates are only suggestive, they do indicate that estimation error may play an important role in credit granting decisions.

³ This calculation is based on a three-year loan with annual interest rates and balances given in the immediately preceding discussion. The write-off rate represents a one-time loss of 55% of the average default balance of $1,500.

5. Summary and conclusions

Most applied credit scoring models are designed to minimize misclassifications between good and bad loan accounts. Since the overall motive of a lender is profit maximization, this narrow view of the problem may be misleading. The goal of credit assessment should be to provide accurate estimates of each applicant's probability of default and the pay-offs that will be realized in the event of default or repayment. From estimates of these parameters, loan officers can define a loan granting criterion that maximizes expected earnings.

A crucial parameter in the probability distribution of earnings is an individual applicant's probability of default. We apply the Manski-Lerman WESML technique to estimate default probabilities from a bivariate censored probit model. Also we recognize, following McFadden and Reid (1975), that failure to account for the non-linearity of default probability estimates leads to biased estimates of expected earnings. Results are based on a sample compiled by a large commercial bank. Classification rules are illustrated by combining the default probability estimates with hypothetical balance behavior.

Data limitations placed binding constraints on the analysis conducted in this paper. With improved data on individual applicant balances we could build a complete behavioral credit scoring model. This model would contain estimates of maintained balances along with estimates of default probability. Second, with precise balance data we could begin to concentrate on the second moment of the earnings distribution. Certainly, variance in the earnings of a prospective loan is an important consideration in deciding how much credit a bank will allocate to a specific portion of its portfolio. An accurate measure of the variance of a prospective credit card loan will require calculation of the variance of the earnings process with known parameters plus the additional uncertainty attributable to error in estimation. Finally, the complete model of credit assessment would account for time-to-default by applying split population survival time models of borrower behavior. Improved estimates of time-to-default will undoubtedly increase our ability to measure expected earnings.
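For the known-parameter part of the variance mentioned above, the two-outcome earnings process of section 2 gives a closed form: with earnings $r$ on repayment and a loss $w$ on default, the variance is $p(1-p)(r+w)^2$. The sketch below covers only that component and leaves out the additional uncertainty from estimation error; the function name and the inputs are hypothetical.

```python
def earnings_mean_and_variance(p_default, r, w):
    """Mean and variance of x, where x = r w.p. 1 - p and x = -w w.p. p."""
    mean = (1.0 - p_default) * r - p_default * w
    variance = p_default * (1.0 - p_default) * (r + w) ** 2
    return mean, variance


if __name__ == "__main__":
    # Hypothetical account: 5% default probability, earnings 135, loss 1365.
    mean, variance = earnings_mean_and_variance(0.05, 135.0, 1365.0)
    print(f"mean = {mean:.2f}, std dev = {variance ** 0.5:.2f}")
```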
Appendix A

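Asymptotic variance-covariance matrix for WESML estimates in a censored probit model obtained from a choice-based sample

The weighted log likelihood function for the censored probit model is

$$\ln L_w = \sum_{t=1}^{T} \Big\{ w_1 y_{t1} y_{t2} \ln F(Z_t'\alpha_1, Z_t'\alpha_2; \rho) + w_2 y_{t1}(1 - y_{t2}) \ln\big[\Phi(Z_t'\alpha_1) - F(Z_t'\alpha_1, Z_t'\alpha_2; \rho)\big] + w_3 (1 - y_{t1}) \ln\big[1 - \Phi(Z_t'\alpha_1)\big] \Big\},$$

where

$w_1 = Q_1/H_1$, with $Q_1$ = population proportion of $y_1 = 1$ and $y_2 = 1$, $H_1$ = sample proportion of $y_1 = 1$ and $y_2 = 1$,

$w_2 = Q_2/H_2$, with $Q_2$ = population proportion of $y_1 = 1$ and $y_2 = 0$, $H_2$ = sample proportion of $y_1 = 1$ and $y_2 = 0$,

$w_3 = Q_3/H_3$, with $Q_3$ = population proportion of $y_1 = 0$, $H_3$ = sample proportion of $y_1 = 0$.

The numerical values for these weights are described in the empirical section. Estimates are obtained by maximizing this likelihood function with respect to $\theta = (\alpha_1', \alpha_2', \rho)$. Following Manski-Lerman, the asymptotic variance-covariance matrix for the WESML estimates $\hat\theta$ is

$$V(\hat\theta) = \Delta^{-1} A \Delta^{-1},$$

where

$$\Delta = -E\left\{ \frac{\partial^2 \ln L_w}{\partial\theta\,\partial\theta'} \right\}$$

and

$$A = E\left\{ \left( \frac{\partial \ln L_w}{\partial\theta} \right) \left( \frac{\partial \ln L_w}{\partial\theta'} \right) \right\}.$$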
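The weights $w_j = Q_j/H_j$ can be computed from the group counts and population proportions reported in section 4.1. The sketch below does so and applies the weights to per-observation log likelihood contributions; the function names and the group labels "bad", "good" and "deny" are illustrative choices, and the population shares are built from the 51% grant rate and 5% default rate quoted in the text.

```python
import math


def wesml_weights(pop_shares, sample_counts):
    """Ratio of population share to sample share for each outcome group."""
    total = sum(sample_counts.values())
    return {g: pop_shares[g] / (sample_counts[g] / total) for g in pop_shares}


def weighted_loglik(groups, probs, weights):
    """Sum of w_g * ln(prob), where probs[t] is the model probability of
    the outcome actually observed for applicant t and groups[t] its group."""
    return sum(weights[g] * math.log(p) for g, p in zip(groups, probs))


# Population shares implied by a 51% grant rate and 5% default rate.
P_GRANT, P_DEFAULT = 0.51, 0.05
POP_SHARES = {
    "bad": P_GRANT * P_DEFAULT,           # y1 = 1, y2 = 1
    "good": P_GRANT * (1.0 - P_DEFAULT),  # y1 = 1, y2 = 0
    "deny": 1.0 - P_GRANT,                # y1 = 0
}
SAMPLE_COUNTS = {"bad": 1773, "good": 1938, "deny": 921}

if __name__ == "__main__":
    for group, weight in wesml_weights(POP_SHARES, SAMPLE_COUNTS).items():
        print(f"{group}: {weight:.4f}")
```

Because defaulted accounts are heavily over-represented in the sample, their weight is well below one, while denied applicants are weighted up.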

References
Altman, E.I., R.B. Avery, R.A. Eisenbeis and J.F. Sinkey, 1981, Application of classification techniques in business, banking and finance (JAI Press, Greenwich, CT).
Boyes, W.J., D.L. Hoffman and S.A. Low, 1986, Lender reactions to information restrictions: The case of banks and the ECOA, Journal of Money, Credit and Banking 18, 211-219.
Farber, H.S., 1983, Worker preference for union representation, Research in Labor Economics 2, 171-205.
Heckman, J.J., 1979, Sample selection bias as a specification error, Econometrica 47, 153-162.
Manski, C.F. and S.R. Lerman, 1977, The estimation of choice probabilities from choice-based samples, Econometrica 45, 1977-1988.
McFadden, D. and F. Reid, 1975, Aggregate travel demand forecasting from disaggregated behavior models, Record no. 534 (Transportation Research Board, Washington, DC).
Meng, C.L. and P. Schmidt, 1985, On the cost of partial observability in the bivariate probit model, International Economic Review 26, 71-85.
Poirier, D.J., 1980, Partial observability in bivariate probit models, Journal of Econometrics 12, 210-217.
