1. Introduction
Recent issues of trade publications in the credit and banking area have published a
number of articles heralding the role of artificial intelligence (AI) techniques in helping
bankers make loans, develop markets, assess creditworthiness, and detect fraud. For
example, HNC Inc., considered a leader in neural-network technology, offers (among
other things) products for detection of credit-card fraud (Falcon), automated
mortgage underwriting (Colleague), and automated property valuation. Clients for
HNC's Falcon software include AT&T Universal Card, Household Credit Services,
Colonial National Bank, First USA Bank, First Data Resources, First Chicago Corp.,
Wells Fargo & Co., and Visa International (American Banker 1993c,d, 1994a,b).
According to Allen Jost (1993: p. 32), the director of Decision Systems for HNC Inc.,
'Traditional techniques cannot match the fine resolution across the entire range of
account profiles that a neural network produces. Fine resolution is essential when
only one in ten thousand transactions are frauds'. Other software companies
marketing AI products in this area include Cybertek-Cogensys and Nestor Inc.
Both techniques take a data-driven approach, i.e. a prespecification of the model is not required. For example, an MLP 'learns' the relationships inherent in the data presented to it, and a GA provides a nonlinear classification function using a search procedure borrowed from natural phenomena. These approaches seem particularly attractive in solving the problem at hand, because, as Allen Jost (1993: p. 30) says, 'Traditional statistical model development includes time-consuming manual data review activities such as searching for non-linear relationships and detecting interactions among predictor variables'.
Desai et al. (1995) use the same data sets with a binary classification scheme to
compare neural-network models with linear discriminant analysis and logistic
regression, and report that, in terms of correctly classifying good and bad loans,
the neural-network models outperform linear discriminant analysis, but are only
marginally better than logistic regression. However, in terms of correctly classifying
bad loans, the neural-network models outperform both conventional techniques.

Let f_k(x) denote the probability density of x in group k, and let \Omega_h denote the region of x-space allocated to group h. The probability that a member of group k is allocated to group h is then

\int_{\Omega_h} f_k(x)\,dx.    (1)
Let p_k be the prior probability, in the population, that a case is a member of group k, and let c_{kh} be the cost of misclassifying a member of group k into group h. Then the expected loss is

L = \sum_k p_k \sum_{h \ne k} c_{kh} \int_{\Omega_h} f_k(x)\,dx.    (2)
The aim is to choose an allocation rule which minimizes L. Suppose that the cost of misclassification c_{kh} is always equal to 1. Then it can be shown (Choi 1986) that the solution is for each \Omega_h to satisfy

\Omega_h = \{x : p_h f_h(x) \ge p_k f_k(x) \text{ for all } k\}.    (3)
Suppose that we have only two groups. Let the set of x values in groups 0 and 1 each be multivariate normally distributed with means \mu_0 and \mu_1 respectively, and with a common covariance matrix \Sigma.
Using these definitions, it can be shown (e.g. Lachenbruch 1975; Boyle et al. 1992) that it would be optimal to classify x into \Omega_0 if

b^T x > c.    (4)
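For completeness, a minimal derivation sketch under the stated assumptions; the explicit forms of b and c below are the standard ones (see e.g. Lachenbruch 1975) and are supplied here by us rather than recovered from the text:

```latex
% Allocate x to Omega_0 when p_0 f_0(x) > p_1 f_1(x). Taking logs of the
% two normal densities with common covariance Sigma cancels the quadratic
% terms and leaves the linear rule b^T x > c, with
\[
  b = \Sigma^{-1}(\mu_0 - \mu_1), \qquad
  c = \tfrac{1}{2}(\mu_0 + \mu_1)^T \Sigma^{-1}(\mu_0 - \mu_1)
      + \ln(p_1/p_0).
\]
```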
This is a version of the linear discriminant function derived by Fisher (1936), but he did so using a different argument. Since this paper deals with more than two groups, it uses a second method, which is based on Fisher's original approach. Fisher argued that the greatest difference between the groups occurs when the ratio of the between-groups to within-groups sums of squares is largest. For a linear combination with weight vector w, this ratio can be shown to be

\lambda = \frac{w^T B w}{w^T D w},    (5)

where B and D are the between-groups and within-groups sums-of-squares matrices. Maximizing \lambda leads to the generalized eigenvalue problem

(B - \lambda D)w = 0,    (6)

so the elements of the first eigenvector of D^{-1}B are the weights for the linear combination which gives the largest value of \lambda, say \lambda_1, and the elements of the second eigenvector are the weights for the linear combination which gives the second largest value, say \lambda_2. Similar interpretations apply to other eigenvector-eigenvalue pairs.
Each such linear combination of the p characteristics is called a canonical discriminant function. We may present these as a single matrix-vector equation

z = W^T x,    (7)

where W is a p × m matrix of weights, each column being a separate eigenvector w_i (there being m such eigenvectors), z is an m-vector of variables, and x is a p-vector of characteristics.
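A brief sketch of the computation just described, in Python with NumPy; the function and variable names are ours, not the authors', and their implementation may well differ:

```python
import numpy as np

def canonical_weights(X, y, m):
    """Canonical discriminant weights: the m leading eigenvectors of
    D^{-1}B, where B and D are the between-groups and within-groups
    sums-of-squares matrices (a sketch, not the authors' code)."""
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    B = np.zeros((p, p))           # between-groups sums of squares
    D = np.zeros((p, p))           # within-groups sums of squares
    for g in np.unique(y):
        Xg = X[y == g]
        diff = (Xg.mean(axis=0) - overall_mean)[:, None]
        B += len(Xg) * diff @ diff.T
        centred = Xg - Xg.mean(axis=0)
        D += centred.T @ centred
    # Eigenvectors of D^{-1}B, ordered by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(D, B))
    order = np.argsort(eigvals.real)[::-1][:m]
    return eigvecs.real[:, order]  # the p x m weight matrix W

# Scores z = W^T x for every case, as in equation (7):
# Z = X @ canonical_weights(X, y, m)
```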
In this paper, each case was classified into the group for which the posterior probability of membership, given its value of x, was largest. This posterior probability was calculated using Bayes' rule as

P(k \mid x) = \frac{p_k f_k(x)}{\sum_h p_h f_h(x)}.    (8)
When some of the predictor variables are binary rather than continuous variables, the linear discriminant function may not be optimal. In this case, special procedures for binary variables are available (Dillon & Goldstein 1984). However, in the case of binary variables, most evidence suggests that the linear discriminant function performs reasonably well (Gilbert 1968; Moore 1973; Krzanowski 1977). Furthermore, the literature suggests that, while quadratic discriminant analysis is appropriate when the assumption of normality holds but that of equal covariances does not, the results of a classificatory quadratic rule are more sensitive to violations of normality than the results of a linear rule. A linear rule seems to work satisfactorily unless the violation of equal covariance matrices is drastic (Stevens 1992; Boyle et al. 1992).
where \beta_k is a column vector of coefficients for group k and x_i is a column vector of values for each variable for case i. To make the parameter estimates identifiable, it is usual to normalize the coefficients for one group. Thus it is assumed that \beta_0 = 0. The conditional probabilities then become

P(y_i = k \mid x_i) = \frac{\exp(\beta_k^T x_i)}{\sum_h \exp(\beta_h^T x_i)}.
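A short sketch of these conditional probabilities in Python; the names and example values are ours:

```python
import numpy as np

def logit_probabilities(x, betas):
    """Multinomial logit conditional probabilities with the group-0
    coefficients normalized to zero (an illustrative sketch)."""
    scores = np.concatenate(([0.0], [b @ x for b in betas]))  # beta_0 = 0
    exp_scores = np.exp(scores - scores.max())   # stabilized softmax
    return exp_scores / exp_scores.sum()

# Example with three groups (good, poor, bad) and two predictors:
probs = logit_probabilities(np.array([1.0, 0.5]),
                            [np.array([0.2, -0.1]), np.array([0.4, 0.3])])
```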
The main requirements to be satisfied by the activation functions F(·) and G(·) are that they be nonlinear and differentiable. Typical functions used in the hidden layer are the sigmoid, hyperbolic tangent, and sine functions, i.e.

F(a) = \frac{1}{1 + e^{-a}}, \qquad F(a) = \tanh a, \qquad F(a) = \sin a.    (14)
The weights in the neural network can be adjusted to minimize the relative entropy criterion, given as

E = -\sum_i t_i \ln y_i,    (15)

where t_i denotes a target output and y_i the corresponding network output.
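A one-line illustration of the criterion in (15); the epsilon guard and all names are our additions:

```python
import numpy as np

def relative_entropy(targets, outputs, eps=1e-12):
    """Relative entropy criterion E = -sum_i t_i ln y_i from (15);
    eps guards against log(0) (our addition, for numerical safety)."""
    return -np.sum(targets * np.log(outputs + eps))

# For a 'bad' loan coded t = (0, 0, 1) and network outputs y:
E = relative_entropy(np.array([0, 0, 1]), np.array([0.2, 0.3, 0.5]))
```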
Several approaches for eliminating the excess neurons have been explored; one that seems to be readily applicable in our case is given below. The hypothesis that '... the simplest most robust network which accounts for a data set will, on average, lead to the best generalization to the population from which the training set has been drawn' was made by Rumelhart (reported in Hanson & Pratt 1988). One of the simplest implementations of this hypothesis is to check the network at periodic intervals and eliminate nodes in the hidden layers, up to a certain maximum number, if the elimination does not lead to a significant deterioration in performance. We have implemented this approach in the current paper; the details are available from the authors.
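Since the authors' exact procedure is not given here, the following is only a plausible sketch of such periodic pruning: tentatively zero out a hidden unit and keep the removal if a validation score does not fall by more than some tolerance. The names, the callback, and the tolerance rule are all our assumptions.

```python
import numpy as np

def prune_hidden_units(W1, W2, validate, max_remove, tol):
    """Rumelhart-style pruning sketch: W1 (input-to-hidden) and W2
    (hidden-to-output) are NumPy weight matrices; validate(W1, W2) is
    an assumed callback returning a performance score. Hidden unit j
    is removed by zeroing column j of W1 and row j of W2."""
    removed = 0
    baseline = validate(W1, W2)
    for j in range(W1.shape[1]):
        if removed >= max_remove:
            break
        saved = (W1[:, j].copy(), W2[j, :].copy())
        W1[:, j], W2[j, :] = 0.0, 0.0       # tentatively remove unit j
        score = validate(W1, W2)
        if baseline - score <= tol:         # no significant deterioration
            removed += 1
            baseline = score
        else:
            W1[:, j], W2[j, :] = saved      # restore unit j
    return W1, W2

# Example: prune up to 10 of 18 hidden units, as in the models below:
# W1, W2 = prune_hidden_units(W1, W2, my_validate, max_remove=10, tol=0.01)
```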
This avoids problems which might be associated with using modified objectives. The mathematical justification is given below.
X11 | X12 × Y11 | Y12,
X21 | X22 × Y21 | Y22,

where X11 = 1111100, X12 = 10001, X21 = 0101011, X22 = 10111, Y11 = 100, Y12 = 111101101, Y21 = 010, Y22 = 010010111. The new strings are:
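The list of offspring strings is missing from this copy of the text. Under the usual reading of the display above (one-point crossover with independent cut points, heads kept and tails exchanged), they would be X11|Y12, Y11|X12, X21|Y22, and Y21|X22. The sketch below implements that assumed operator; it is our illustration, not the authors' code.

```python
import random

def crossover(parent_x, parent_y, cut_x=None, cut_y=None):
    """One-point crossover with independent cut points (our assumed
    reading of the example): heads are kept, tails are exchanged, so
    offspring lengths may differ from the parents'."""
    cut_x = cut_x if cut_x is not None else random.randrange(1, len(parent_x))
    cut_y = cut_y if cut_y is not None else random.randrange(1, len(parent_y))
    return (parent_x[:cut_x] + parent_y[cut_y:],
            parent_y[:cut_y] + parent_x[cut_x:])

# First pair from the text: X = X11|X12 (cut at 7), Y = Y11|Y12 (cut at 3)
child1, child2 = crossover('111110010001', '100111101101', cut_x=7, cut_y=3)
# child1 = X11|Y12 = '1111100111101101'; child2 = Y11|X12 = '10010001'
```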
of three credit unions in the Southeastern United States for the period 1988-91. Credit
union L is predominantly made up of teachers, and credit union N is predominantly
made up of telephone-company employees, whereas credit union M represents a more
diverse state-wide sample. The narrowness of membership is somewhat mitigated by
the inclusion of family members in all three credit unions. Only credit union M had
added select employee groups to diversify its membership.
Predictor variables commonly used in credit-scoring studies include various debt ratios and other cash-flow-oriented surrogates, employment time, home ownership, major credit-card ownership, and representations of past payment history (e.g. Overstreet et al. 1992). Additional variables that can be added to the model include detailed credit-bureau reports (e.g. Overstreet & Bradley 1994). In selecting predictor variables, care must be taken to comply with Regulation B of the Equal Credit Opportunity Act, so as to avoid non-intuitive variables. Based on all the considerations mentioned above, eighteen variables were selected for the present study (Table 1).
TABLE 1
List of predictor variables
TABLE 2
Descriptive statistics for credit union L
The numbers of good, poor, and bad loans equal 287, 125, and 93 respectively. The first, second, and third numbers in each cell refer to the good, poor, and bad loans respectively
The final samples comprised 505 observations for credit union L with 56.83% good, 24.75% poor, and 18.42% bad; 762 observations for credit union M with 51.18% good, 22.70% poor, and 26.12% bad; and 695 observations for credit union N with 56.77% good, 22.05% poor, and 21.18% bad.
5.1.1 Descriptive data analysis. Tables 2, 3, and 4 give the descriptive statistics for
the three credit unions. Furthermore, the descriptive statistics are broken down by
loan type. It is interesting to note that the median values for a number of variables
are identical for two out of three loan types, or in some instances for all loan types.
For example, the median number of dependents (Depends) is equal for all three loan types for all three credit unions, and the median number of delinquent accounts in the past 12 months (Delnq) is equal for all three loan types for credit unions M and N, and is equal for poor and bad loans for credit union L. It is also interesting to
note that the median values are identical for good and poor loans for some variables
in some credit unions, and identical for poor and bad loans for other credit unions,
e.g. the number of inquiries in past 7 months (Numinq), number of open accounts
on credit-bureau reports (Acctopen), and the number of active accounts on credit-
bureau reports (Actaccts).
5.1.2 Training and holdout data. There exist several approaches for validating statistical
models (e.g. Dillon & Goldstein 1984; Hair et al. 1992). The simplest approach,
TABLE 3
Descriptive statistics for credit union M
The numbers of good, poor, and bad loans equal 390, 173, and 199 respectively. The first, second, and third numbers in each cell refer to the good, poor, and bad loans respectively
referred to as the cross-validation method, involves dividing the data into two subsets,
one for training (analysis sample) and a second one for testing (holdout sample).
More sophisticated approaches include the U method and the jackknife method.
Both these methods are based on the 'leave-one-out' principle, where the statistical
model is fitted to repeatedly drawn samples of the original sample. Dillon & Goldstein
(1984: p. 393) suggest that, in the case of discriminant analysis, a large standard
deviation in the estimator for misclassification probabilities can overwhelm the bias
reduction achieved by the U method, and, if multivariate normality is violated, it is
questionable whether jackknifed coefficients actually represent an improvement in
general. Also, these methods can be computationally expensive. An intermediate
approach, and perhaps the most frequently used approach, is to divide the original
sample randomly into analysis and holdout samples several times. Given the substantial amount of data and the fact that we investigated six models, we decided to use an intermediate approach. The data sample was divided into two parts, with two thirds of the observations being used for training and the remaining one third for testing. Observations were randomly assigned to the training or testing data set, and ten such pairs of data sets were created. A popular approach is to use stratified sampling in order to keep the proportion of good loans and bad loans identical across all data sets. Since the percentage of bad loans is different for the three credit unions, and since claims by practitioners imply that the performance of neural networks in comparison to the conventional methods would depend upon the proportion of bad loans in the data set, we decided not to use stratified sampling, and we let the percentage of bad loans vary across the ten data sets so that our results would not depend upon the particular composition of the data sample at hand. As Section 6 indicates, when the results were compared, we accounted for this variation by performing paired t tests.
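A small sketch of this sampling design (our code, with illustrative names):

```python
import numpy as np

def make_splits(n_cases, n_splits=10, train_frac=2/3, seed=0):
    """Ten independent random partitions, two thirds for training and
    one third for testing, without stratification, so the proportion
    of bad loans varies across splits (a sketch of the design
    described above, not the authors' code)."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_splits):
        idx = rng.permutation(n_cases)
        cut = int(train_frac * n_cases)
        splits.append((idx[:cut], idx[cut:]))   # (train, test) indices
    return splits
```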
TABLE 4
Descriptive statistics for credit union N
The numbers of good, poor, and bad loans equal 394, 153, and 148 respectively. The first, second, and third numbers in each cell refer to the good, poor, and bad loans respectively
Variable | Mean | Median | Trunc. mean | Min | Max
Majcard | 0.800, 0.575, 0.449 | 1, 1, 0 | 0.833, 0.584, 0.444 | 0, 0, 0 | 1, 1, 1
Ownbuy | 0.812, 0.693, 0.544 | 1, 1, 1 | 0.846, 0.715, 0.549 | 0, 0, 0 | 1, 1, 1
Income | 2850, 2441, 2410 | 2603, 2203, 1950 | 2771, 2346, 2157 | 584, 728, 600 | 7499, 8500, 22000
Goodcr | 0.848, 0.706, 0.687 | 1, 1, 1 | 0.887, 0.730, 0.707 | 0, 0, 0 | 1, 1, 1
Jobtime | 13.48, 8.44, 7.42 | 13, 6, 5 | 13.14, 7.52, 6.85 | 0, 0, 0 | 40, 40, 33
Depends | 1.2, 1.1, 1.2 | 1, 1, 1 | 1.1, 1.2, 1.1 | 0, 0, 0 | 6, 4, 4
The number of neurons eliminated by pruning ranged from 0 to 10 out of the 18 neurons in the hidden layer for the MLP models.
For the MLP models, the initial values of the learning parameter η were set at 0.3 for the hidden layer and 0.15 for the output layer, and the momentum parameter θ was set at 0.4 for all layers. These parameters were allowed to decay by reducing their values by half after 10,000 iterations, and again by half after 30,000 iterations.
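Read as a schedule, this gives the following sketch (whether the halving also applies to the momentum parameter is our interpretation):

```python
def decayed_parameters(iteration):
    """Decay schedule described above: initial values eta = 0.3
    (hidden layer), eta = 0.15 (output layer), momentum theta = 0.4,
    each halved after 10,000 and again after 30,000 iterations
    (that theta also decays is our assumption)."""
    params = {'eta_hidden': 0.3, 'eta_output': 0.15, 'theta': 0.4}
    halvings = sum(iteration >= t for t in (10_000, 30_000))
    return {name: value / 2 ** halvings for name, value in params.items()}
```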
In all, the six methods tested were as follows:
lda linear discriminant analysis
lr logistic regression
ga genetic algorithm
mlp multilayer perceptron
mlp-m combination of multilayer perceptrons and a majority rule
mlp-b combination of multilayer perceptrons and a best-neuron rule
6. Comparison of results
Table 5 gives the results for the traditional techniques, Table 6 does the same for
neural networks and genetic algorithms, and Table 7 for combinations of neural
networks. For each method, the first column gives the total (i.e. the good plus the
poor plus the bad) percentage correctly classified, the second column gives the
percentage of poor loans correctly classified, and the third column gives the
percentage of bad loans correctly classified. Since the cost of giving a loan to a defaulter is far greater than the cost of rejecting a good or poor loan, the percentage of bad
loans correctly identified is important. Also, poor loans and good loans differ in their
profitability. While reading the results of Tables 5-7, one must keep in mind that,
in the experiments reported in the present study, we did not explicitly include
misclassification costs; this is because one of the methods, namely logistic regression,
does not allow that feature. Also note that, given the information in §5.1, the
percentage of good loans correctly classified can be easily obtained from the data
given in these tables.
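As a reading aid, on our assumption that the second and third columns of each table give the fractions f_p and f_b of poor and bad loans in the holdout sample, the percentage of good loans correctly classified follows from the weighted-average identity:

```latex
% total% = f_g * good% + f_p * poor% + f_b * bad%, with f_g = 1 - f_p - f_b,
% so the good-loan percentage is recovered as
\[
  \text{good\%} = \frac{\text{total\%} - f_p\,\text{poor\%} - f_b\,\text{bad\%}}
                       {1 - f_p - f_b}.
\]
```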
TABLE 5
Results for linear discriminant analysis and logistic regression

Credit union L
Data set | Poor (%) | Bad (%) | lda: total, poor, bad | lr: total, poor, bad
1 | 24.40 | 18.45 | 66.07, 56.10, 41.94 | 68.45, 41.46, 54.84
2 | 29.76 | 17.28 | 66.07, 41.46, 48.28 | 66.67, 40.00, 51.72
3 | 23.81 | 20.24 | 67.26, 47.50, 50.00 | 67.26, 40.00, 55.88
4 | 21.43 | 22.62 | 69.05, 44.44, 44.74 | 69.05, 36.11, 50.00
5 | 29.76 | 18.45 | 66.07, 55.00, 48.39 | 65.48, 52.50, 45.16
6 | 25.60 | 19.64 | 67.86, 57.58, 54.55 | 67.26, 39.39, 60.61
7 | 22.02 | 17.86 | 67.86, 54.05, 40.00 | 70.84, 45.95, 50.00
p value* 0.003
* The p values are for a one-tailed paired t test comparing the lr results with the other five methods.
Of the eighteen variables used, seven have median values that are identical for all three loan types, four have median values that are identical for poor and bad loans, and two have median values that are identical for good and poor loans. Similar behaviour persists under more detailed comparisons.

In comparing the three loan types, one sees that the poor loans were most difficult to classify, followed by bad loans. For example, for credit union L, logistic regression
TABLE 6
Results for multilayer perceptrons and genetic algorithms

Credit union L
Data set | Poor (%) | Bad (%) | mlp: total, poor, bad | ga: total, poor, bad
1 | 24.40 | 18.45 | 66.07, 48.78, 51.61 | 63.10, 26.83, 74.19
2 | 29.76 | 17.28 | 70.83, 60.00, 62.07 | 64.29, 35.00, 65.52
3 | 23.81 | 20.24 | 70.24, 77.50, 38.24 | 61.90, 30.00, 64.71
4 | 21.43 | 22.62 | 69.43, 80.56, 15.79 | 70.24, 41.67, 71.05
5 | 29.76 | 18.45 | 65.48, 28.00, 58.06 | 67.86, 32.00, 70.97
6 | 25.60 | 19.64 | 60.71, 27.91, 24.24 | 64.29, 25.58, 63.64
7 | 22.02 | 17.86 | 66.07, 45.95, 43.33 | 66.67, 37.84, 60.00
* The p values are for a one-tailed paired t test comparing the lr results with the other five methods.
misclassified, on average, 57.16% of poor loans as good loans or bad loans, and
30.76% of bad loans as poor loans or good loans, whereas only 14.49% of the good
loans were misclassified as poor or bad loans. These results are consistent with the
descriptive statistics reported in Tables 2-4, which show that the median values of the explanatory variables for the poor loans often coincide with those for good or bad loans. These results are also consistent with previous studies (e.g. Overstreet & Bradley 1995).
TABLE 7
Results for combinations of multilayer perceptrons

Credit union L
Data set | Poor (%) | Bad (%) | mlp-m: total, poor, bad | mlp-b: total, poor, bad
1 | 24.40 | 18.45 | 60.71, 29.27, 24.39 | 65.48, 41.46, 41.94
2 | 29.76 | 17.28 | 67.86, 60.00, 55.17 | 67.86, 50.00, 55.17
3 | 23.81 | 20.24 | 66.67, 41.50, 44.12 | 67.86, 40.00, 50.00
4 | 21.43 | 22.62 | 66.07, 55.56, 13.16 | 69.64, 61.11, 23.68
5 | 29.76 | 18.45 | 64.88, 40.00, 48.39 | 64.29, 38.00, 51.61
6 | 25.60 | 19.64 | 61.90, 48.83, 21.21 | 62.50, 46.51, 27.27
7 | 22.02 | 17.86 | 66.07, 51.35, 43.33 | 64.88, 43.24, 43.33
* The p values are for a one-tailed paired t test comparing the lr results with the other five methods.
6.2 Comparing modelling techniques
As Tables 5-7 indicate, logistic regression identifies 67.30% of the loans correctly, a higher percentage than any of the other models. This superiority is confirmed by a more
formal comparison using a paired t test. Since the proportions of poor and bad loans differ across the ten data sets, we accounted for this difference by using the paired t test. As the p values indicate, logistic regression
is clearly better than linear discriminant analysis and genetic algorithms, and the
difference between logistic regression and multilayer perceptrons is significant at the
0.05 significance level, but not at the 0.01 significance level. The fact that logistic
regression is better than discriminant analysis is consistent with results reported
elsewhere (e.g. Harrell & Lee 1985), and is perhaps due to the presence of categorical
variables, which violates the assumption of multivariate normality required for linear
discriminant analysis.
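A sketch of such a comparison in Python with SciPy, using the seven per-data-set totals visible in Table 5 (the column assignment there is our reconstruction, so these numbers are illustrative only):

```python
from scipy import stats

# One-tailed paired t test on per-data-set total percentages correct.
lr_totals  = [68.45, 66.67, 67.26, 69.05, 65.48, 67.26, 70.84]
lda_totals = [66.07, 66.07, 67.26, 69.05, 66.07, 67.86, 67.86]
t_stat, p_two_sided = stats.ttest_rel(lr_totals, lda_totals)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
```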
The situation is quite different when it comes to identifying poor and bad loans correctly. The combination of neural networks with the best-neuron rule (model mlp-b) has the best performance when it comes to correctly identifying poor loans, and the genetic-algorithm models outperformed the rest when it came to correctly identifying bad loans.
Acknowledgements
Financial support from the McIntire Associates Program is gratefully acknowledged by the first author. This paper was presented at the fourth Credit Scoring and Credit Control Conference held at the University of Edinburgh in 1995, and the authors are grateful for the comments of participants at this conference.
REFERENCES
American Banker, 1993a (29 March), p. 15A; 1993b (25 June), p. 3; 1993c (14 July), p. 3; 1993d
(27 August), p. 14; 1993e (5 October), p. 14; 1994a (2 March), p. 15; 1994b (22 April), p. 17.
BOYLE, M., CROOK, J. N., HAMILTON, R., & THOMAS, L. C., 1992. Methods for credit scoring applied to slow payers. Credit scoring and credit control (L. C. Thomas, J. N. Crook, & D. E. Edelman, Eds). Oxford University Press, Oxford. Pp. 75-90.
BRENNAN, P. J., 1993a. Promise of artificial intelligence remains elusive in banking today. Bank
Management (July), pp. 49-53.
BRENNAN, P. J., 1993b. Profitability scoring comes of age. Bank Management (September),
pp. 58-62.
BRYSON, A. E., & HO, Y. C., 1969. Applied optimal control. Hemisphere Publishing, New York.
CONWAY, D. G., VENKATARAMANAN, M. A., & CABOT, A. V., (forthcoming) A genetic
JOST, A., 1993. Neural networks: a logical progression in credit and marketing decision
systems. Credit World (March/April), pp. 26-33.
KNOKE, J. D., 1982. Discriminant analysis with discrete and continuous variables. Biometrics
38, 191-200.
KOEHLER, G. J., 1991. Linear discriminant function determined by genetic search. ORSA Journal on Computing 3, 345-57.
KRZANOWSKI, W. J., 1977. The performance of Fisher's linear discriminant function under non-optimal conditions. Technometrics 19, 191-200.
LACHENBRUCH, P. A., 1975. Discriminant analysis. Hafner, New York.
LAPEDES, A., & FARBER, R., 1987. Non-linear signal processing using neural networks:
prediction and system modeling. Los Alamos National Laboratory report LA-UR-87-
2662.
LUENBERGER, D. G., 1969. Optimization by vector space methods. Wiley, New York.
MANGASARIAN, O., 1965. Linear and nonlinear separation of patterns by linear programming.
Operations Research 13, 444-52.
MOORE, D. M., 1973. Evaluation of five discrimination procedures for binary variables. Journal