
Fourth International Conference on Natural Computation

Hybrid Neural Network Based On GA-BP for Personal Credit Scoring


Shulin Wang, Shuang Yin, Minghui Jiang
School of Management, Harbin Institute of Technology, Harbin 150001, China
wsl@hit.edu.cn

Abstract
Aiming at the insufficiencies of the BP neural network, this paper establishes a hybrid neural network based on the combination of the GA and BP algorithms. The hybrid algorithm makes full use of GA's global searching ability, combined with BP, to improve the learning ability of the neural network. The model is applied to personal credit scoring in commercial banks. Compared with a single BP neural network, the training results of the hybrid neural network indicate that the hybrid algorithm can improve the learning ability of the neural network so as to achieve the training goal, and the classification accuracy of the hybrid neural network on testing samples is higher than that of the single BP neural network.

1. Introduction

With the rapid development of the consumer credit market in China, the role of personal credit scoring is becoming more and more important, since it helps commercial banks keep away from credit risks. In the West, personal credit scoring methods have developed over a long period and are relatively advanced, and many useful methods, including neural networks, have been applied in this area[1]. In the practice of neural networks, the BP network based on the back-propagation algorithm has been applied widely in many fields, such as function approximation and pattern recognition. However, the BP algorithm has some insufficiencies, such as plunging into local minima and slow convergence around the training goal. Aiming at these insufficiencies, we establish a hybrid neural network through the combination of the genetic algorithm (GA) and the BP algorithm, use this model in personal credit scoring, and, in the end, compare the applied results of the hybrid network with those of a single BP network.

2. Two Algorithms and Hybrid Neural Network

2.1. Principles of BP Algorithm

Since Rumelhart introduced the BP algorithm in 1986, the BP neural network has been widely used in practice and is regarded as the core of forward neural networks[2]. The performance function used in a BP network is the MSE (mean square error). The standard BP algorithm is a gradient descent algorithm, in which the network weights are moved along the negative of the gradient of the performance function. The weight-update formula of the standard BP algorithm can be written as:

w_{ji}^{l}(k+1) = w_{ji}^{l}(k) - \eta \left. \frac{\partial MSE(w)}{\partial w_{ji}^{l}} \right|_{w(k)}    (1)

where w_{ji}^{l} is the weight connecting neuron i in layer l-1 with neuron j in layer l, and \eta is a positive number named the learning rate, which controls the learning step and is usually given a very small value. The BP algorithm is an effective method in practice; however, it has some insufficiencies[3]: (1) it uses gradient descent to minimize the MSE, which often makes it plunge into a local minimum; (2) its performance is very sensitive to the proper setting of the learning rate: if the learning rate is set too high, the algorithm may oscillate and become unstable, and if it is too small, the algorithm takes a long time to converge; (3) the initial weights and biases of the network, which are important to the convergence of the BP network, are generated randomly, which sometimes even prevents the network from achieving the training goal. Moreover, the structure of a BP network is often constructed from the experience of the decision-maker, since there are no uniform rules to lean on.


In view of these insufficiencies, some methods have been introduced to improve the performance of the BP algorithm, including gradient descent with momentum and gradient descent with an adaptive learning rate. The first method is introduced to solve the problem of BP plunging into local minima, that is:

w(k+1) = w(k) - \eta \left. \frac{\partial MSE(w)}{\partial w} \right|_{w(k)} + \alpha \Delta w(k)    (2)

where \alpha is named the momentum factor. Momentum allows a network to respond not only to the local gradient but also to recent trends in the error surface. Acting like a low-pass filter, momentum allows the network to ignore small features in the error surface: without momentum a network may get stuck in a shallow local minimum, while with momentum it can slide through such a minimum, and we can use a larger learning rate to hasten convergence. The second method is based on the following consideration: if the value of the MSE falls after a weight update, the chosen learning rate \eta is small and we should choose a larger \eta; otherwise, we should choose a smaller \eta. These two improved BP algorithms are effective in solving the problems of plunging into local minima and of determining the learning rate. However, the problems of randomly generated initial weights and biases and of slow convergence around the training goal are still not solved effectively.
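As a concrete illustration of the first improvement, the sketch below (illustrative Python/NumPy, not the authors' MATLAB program) performs one gradient descent step with momentum in the spirit of formula (2); the learning rate, momentum factor, and toy error surface are assumptions made for the example.

```python
import numpy as np

def momentum_step(w, prev_delta, grad_mse, lr=0.1, momentum=0.9):
    """One weight update with momentum, as in formula (2).

    prev_delta is the previous weight change delta_w(k); combining it with
    the negative gradient lets the update slide through shallow minima.
    """
    delta_w = -lr * grad_mse(w) + momentum * prev_delta
    return w + delta_w, delta_w

# Toy error surface MSE(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
delta = np.zeros_like(w)
for _ in range(100):
    w, delta = momentum_step(w, delta, grad_mse=lambda w: 2.0 * w)
print(w)  # approaches the minimum at the origin
```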

2.2. Genetic Algorithm

GA was brought into practice in the 1970s by Professor Holland. It is a kind of stochastic global search and optimization method that mimics natural biological evolution[4]. GA operates on a population of potential solutions, applying the principle of survival of the fittest to produce successively better approximations to a solution. At each generation of a GA, a new set of approximations is created by selecting individuals according to their level of fitness in the problem domain and reproducing them using operators borrowed from natural genetics. This process leads to the evolution of populations of individuals that are better suited to their environment than the individuals from which they were created, just as in natural adaptation. There are three genetic operators in GA: selection, crossover, and mutation. The selection operator allows strings with higher fitness to appear with higher probability in the next generation. Crossover is performed between two selected individuals, called parents, by exchanging parts of their strings, starting from a randomly chosen crossover point; this operator tends to enable the evolutionary process to move toward promising regions of the search space. Mutation is used to search further regions of the problem space and to avoid local convergence of GA.

2.3. Hybrid Neural Network Based on GA-BP

Aiming at the insufficiencies of the BP algorithm, we establish a hybrid neural network based on the combination of the GA and BP algorithms, using GA to search for the initial weights and biases of the network and to hasten convergence when the BP algorithm becomes slow around the training goal. The basic considerations and the procedural algorithm of the model are as follows (see the sketch after this list):
Step 1: Use GA to search for the best chromosome, decoded as the initial weights and biases of the network.
Step 2: Use the BP algorithm to train the network with these weights and biases.
Step 3: Judge whether the performance shows a distinct decline. If yes, go to Step 5; if not, go to Step 4.
Step 4: Encode the weights and biases of the network as the initial population of GA, use GA to search for the best chromosome starting from the weights and biases trained by the BP algorithm, and turn to Step 2.
Step 5: Judge whether the termination condition is satisfied. If yes, terminate the training process; otherwise turn to Step 2.
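The five steps can be condensed into a control-flow skeleton. The sketch below is a minimal illustration, not the authors' implementation; run_ga, train_bp, and encode are hypothetical callables standing in for the GA search, the BP training cycle, and the chromosome encoding described above.

```python
def train_hybrid(run_ga, train_bp, encode, init_population,
                 goal=1e-5, min_decline=1e-6, max_cycles=1000):
    """Control-flow sketch of Steps 1-5.

    run_ga(population) returns the decoded weights/biases of the fittest
    chromosome; train_bp(weights) runs one BP training cycle and returns
    (new_weights, mse); encode(weights) builds a GA population around the
    current weights. All three are hypothetical stand-ins.
    """
    weights = run_ga(init_population)            # Step 1: GA picks initial weights/biases
    prev_mse = float("inf")
    for _ in range(max_cycles):
        weights, mse = train_bp(weights)         # Step 2: BP training cycle
        if mse <= goal:                          # Step 5: termination condition met
            break
        if prev_mse - mse < min_decline:         # Step 3: no distinct decline of performance
            weights = run_ga(encode(weights))    # Step 4: GA re-optimizes, back to Step 2
        prev_mse = mse
    return weights
```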

3. Sample Data and Pretreatment


3.1. Variables and Samples
In this paper, we use consumer credit data from a commercial bank in Shenzhen. There are ten input variables, denoted x_i, and one output variable y, listed in Table 1. We re-select the data to adjust the proportion of records with output y = 0 and y = 1. As the population contains many samples, which are complex and varied, we use stratified sampling here: first we divide the population by the value of y and extract about 500 samples from each stratum, so that the proportion of y = 0 to y = 1 is nearly 1:1. Finally, we get 978 samples for the model, with 492 training samples and 486 testing samples. The training set contains 224 default samples and 268 non-default ones; the testing set contains 232 default samples and 254 non-default ones.
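A minimal sketch of this stratified draw, assuming for illustration that the records sit in a pandas DataFrame with a label column y:

```python
import pandas as pd

def stratified_balance(df, label="y", n_per_class=500, seed=1):
    """Draw about n_per_class records from each value of y so that the
    y = 0 and y = 1 proportions come out near 1:1, then shuffle."""
    parts = [group.sample(min(n_per_class, len(group)), random_state=seed)
             for _, group in df.groupby(label)]
    return pd.concat(parts).sample(frac=1, random_state=seed)
```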

3.2. Data Pretreatment


In order to hasten the convergence process and reduce the influence of unbalanced data on the model's classification accuracy, we pretreat the sample data to scale the variables into the range [0, 1]. Here we divide the variables into a discrete group and a sequential group.

Table 1. Inputs and Outputs

variable  index                   value
x1        Education Level         elementary-1, middling-2, advanced-3
x2        Monthly Income          actual value
x3        Organization Character  national department-1; scientific, educational, cultural and health department-2; trade and business-3; post and communication-4; finance and insurance-5; social service-6; supply of water, electricity and gas-7; industry-8; real estate-9; other-10
x4        Career                  manager-1, technique-2, officer-3, jobless-4, other-5
x5        Spouse                  yes-1, no-2
x6        Loan Amount             actual value
x7        Time Limit              actual value
x8        Return Mode             ...-1, corpus-2
x9        Surety                  pledge-1, impawn-2, other-3
x10       Age                     actual value
y         Default or Not          yes-0, no-1

For the variables in the discrete group, which contains x1, x3, x4, x5, x7, x8, x9, we use formula (3), that is:

Y = \frac{X - X_{min}}{X_{max} - X_{min}}    (3)

where Y \in [0, 1] represents the outcome of the data processing, and X_{min} and X_{max} represent the minimum and maximum values of the variable X respectively. The variables in the sequential group, which contains x2, x6, x10, are observed to obey normal distributions, that is, x_i ~ N(\mu, \sigma^2), so we adopt formula (4), which can be written as:

Y = \Phi\left(\frac{X - \mu}{\sigma}\right)    (4)

where \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt represents the cumulative probability of the standard normal distribution.
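As a concrete illustration of the two rules, the sketch below (illustrative Python, not the authors' MATLAB program) applies formula (3) to a discrete variable and formula (4) to a sequential one; the toy arrays are invented for the example, SciPy's norm.cdf supplies \Phi, and \mu and \sigma are estimated from the data itself.

```python
import numpy as np
from scipy.stats import norm

def minmax_scale(x):
    """Formula (3): map a discrete variable into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def normal_cdf_scale(x):
    """Formula (4): map an approximately normal variable into [0, 1]
    via the standard normal CDF of its z-score."""
    return norm.cdf((x - x.mean()) / x.std())

# Toy example: x1 (education level) is discrete, x2 (monthly income) sequential.
x1 = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
x2 = np.array([3200.0, 4100.0, 2800.0, 5600.0, 3900.0])
print(minmax_scale(x1))
print(normal_cdf_scale(x2))
```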

4. Model Establishment and Application


4.1. Flowchart of Personal Credit Scoring
According to the procedural algorithm of the model as well as the practice of personal credit scoring, the operation flow of the hybrid neural network designed in this paper is shown in Figure 1.

Figure 1. Flowchart of the hybrid neural network for personal credit scoring

4.2. Structure of Neural Network

Two aspects need to be considered in the network's structure: the number of hidden layers and the number of neurons in the hidden layers. Scholars have proved that networks with one hidden layer can be trained to approximate any function (with a finite number of discontinuities) arbitrarily well[5]. On the number of neurons selected in each layer there are no rules to lean on, only the problem at hand and some empirical rules. Here we choose a forward three-layer network with one hidden layer, and for the number of neurons in the hidden layer we refer to the empirical formula (5)[6]:

L_k = \sqrt{P(O + 3)} + 1    (5)

where P and O represent the numbers of nodes in the input and output layers, and L_k is the maximum number of hidden neurons. In this paper, we have 10 input variables and 1 output variable, so we choose 10 nodes in the input layer and 1 node in the output layer, together with 7 nodes in the hidden layer.
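Under the square-root reading of formula (5) adopted above (the radical sign appears to have been lost in extraction; the value below matches the paper's choice of 7 hidden nodes), a two-line check reproduces the structure:

```python
import math

def max_hidden_neurons(p_inputs, o_outputs):
    """Empirical upper bound on hidden-layer size, formula (5)."""
    return int(math.sqrt(p_inputs * (o_outputs + 3)) + 1)

print(max_hidden_neurons(10, 1))  # sqrt(10 * 4) + 1 = 7.32..., so at most 7 hidden neurons
```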

4.3. Parameters of GA

According to the network's structure, the weights connecting the input layer with the hidden layer form a matrix W1 of size 10 by 7 containing 70 elements, and the biases of the hidden layer form a matrix B1 of size 7 by 1 containing 7 elements; the weights connecting the hidden layer with the output layer form a matrix W2 of size 7 by 1 containing 7 elements, and the bias of the output layer is a matrix B2 containing 1 element. So the number of parameters to be optimized by GA is 85. Considering the number of parameters and the required precision, we choose real-valued encoding. The locations of the parameters in the chromosome string are listed in Table 2. GA operates simultaneously on a number of potential solutions, called a population, consisting of encodings of the parameter set. Typically, a population is composed of between 20 and 100 individuals; here we set the population size to 100.

Table 2. Locations of Each Parameter in Chromosome

W1: w_{1,1} ... w_{10,7}  |  W2: w_1 ... w_7  |  B1: b_1 ... b_7  |  B2: b

GA selects individuals according to their level of fitness. Highly fit individuals, relative to the whole population, have a high probability of being selected for mating, whereas less fit individuals have a correspondingly low probability of being selected. In this paper, we select the fitness function as:

f = \frac{M}{MSE} = \frac{M}{\frac{1}{N} \sum_{i=1}^{N} (t_i - a_i)^2}    (6)

where N is the number of training samples; t_i and a_i represent the target output and the output of the network respectively; and M is a big positive number used to make the fitness more distinctive, here set to 100.

The genetic operators include selection, crossover, and mutation. For selection, we employ a roulette wheel mechanism to probabilistically select individuals based on their fitness levels. This method is called the proportional model, that is:

p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}    (7)

where f_i is the fitness level of individual i, p_i is the expected selection probability of individual i, and n is the population size.

For the crossover operator, we choose arithmetic crossover, that is:

X_A^{t+1} = \alpha X_B^{t} + (1 - \alpha) X_A^{t}
X_B^{t+1} = \alpha X_A^{t} + (1 - \alpha) X_B^{t}    (8)

where \alpha is a parameter that can be selected as a constant or as a variable determined by the evolution generation; here we choose \alpha = 0.9.

For mutation, we choose the non-uniform mutation method, which can be described as follows: when making a mutation operation from chromosome X = x_1 x_2 ... x_k ... x_l to X' = x_1 x_2 ... x_k' ... x_l, if the range of the gene x_k at the mutation point is [U_{min}^{k}, U_{max}^{k}], then the new gene x_k' is determined by:

x_k' = x_k + \Delta(t, U_{max}^{k} - x_k)    if random(0,1) = 0
x_k' = x_k - \Delta(t, x_k - U_{min}^{k})    if random(0,1) = 1    (9)

where \Delta(t, y) (y stands for U_{max}^{k} - x_k or x_k - U_{min}^{k}) represents a random number obeying a non-uniform distribution, and the probability of \Delta(t, y) approaching 0 increases as the evolution generation grows. Mutation is usually applied with low probability, typically between 0.0001 and 0.1; here we choose 0.08 as the mutation probability. Besides, we set the number of evolution generations to 1000 as the termination condition of GA.
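As a sketch of these settings, the fragment below (illustrative Python, not the authors' implementation) implements the fitness of formula (6) and the three operators of formulas (7) to (9); the explicit form of \Delta(t, y) is an assumption in Michalewicz's common form, since the paper only states its qualitative behavior.

```python
import random

def fitness(mse, m=100.0):
    """Formula (6): fitness as M/MSE, with M = 100 per the paper."""
    return m / mse

def roulette_select(population, fitnesses):
    """Formula (7): proportional (roulette wheel) selection."""
    r, acc = random.uniform(0, sum(fitnesses)), 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return individual
    return population[-1]

def arithmetic_crossover(parent_a, parent_b, alpha=0.9):
    """Formula (8): convex combination of two real-valued parents."""
    child_a = [alpha * b + (1 - alpha) * a for a, b in zip(parent_a, parent_b)]
    child_b = [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b

def nonuniform_mutate(chromosome, k, t, t_max, u_min, u_max, b=2.0):
    """Formula (9): non-uniform mutation of gene k at generation t.
    delta(t, y) is assumed in Michalewicz's form, shrinking toward 0
    as t approaches t_max; the paper does not give delta explicitly."""
    def delta(y):
        return y * (1.0 - random.random() ** ((1.0 - t / t_max) ** b))
    x = list(chromosome)
    if random.randint(0, 1) == 0:
        x[k] += delta(u_max - x[k])
    else:
        x[k] -= delta(x[k] - u_min)
    return x
```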

4.4. Parameters of BP

The parameters of the BP algorithm are set as follows. As the training function, we choose traingdx, which combines the improved BP algorithms of the adaptive learning rate and momentum training, with the momentum factor set to 0.9 and the learning rate adapted according to formula (10):

\eta_{k+1} = 1.05 \eta_k    if MSE_k < MSE_{k-1}
\eta_{k+1} = 0.7 \eta_k     if MSE_k > 1.04 MSE_{k-1}    (10)
\eta_{k+1} = \eta_k         otherwise

The number of training epochs in each cyclic iteration is 100, the performance function is the MSE, and the training goal is set to 0.
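A minimal sketch of the learning-rate rule in formula (10), which mirrors the adaptive part of MATLAB's traingdx (the constants 1.05, 0.7, and 1.04 are as given in the paper):

```python
def next_learning_rate(lr, mse_k, mse_prev):
    """Adaptive learning-rate update of formula (10)."""
    if mse_k < mse_prev:
        return 1.05 * lr   # error fell: enlarge the learning rate
    if mse_k > 1.04 * mse_prev:
        return 0.7 * lr    # error grew noticeably: shrink the learning rate
    return lr              # otherwise keep the current rate
```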

4.5. Model Application

We feed the pretreated data into a program based on MATLAB and set the number of cyclic iterations to 1000. The criterion for a distinct decline of the performance function is set as follows: compared with the result of the last circulation, if the extent of the decline is less than 0.000001, the BP algorithm is considered unable to change the MSE distinctly. At this point, we encode the weights and biases trained by the BP algorithm in the last circulation as the initial population and use GA to optimize them. Then we decode the best chromosome found by GA as the weights and biases of the neural network and train the network using the BP algorithm again. MSE <= 0.00001 is set as the termination condition. Using GA to optimize the initial weights and biases, we get the fitness curve shown in Figure 2. The change curve of the MSE over the whole training process is shown in Figure 3. In order to contrast the training and testing results of the hybrid neural network based on GA-BP, we also use the single BP algorithm to train the neural network with the same training and testing sample data. The parameters of the single BP algorithm are set as follows: the structure and the training function, with its momentum factor and learning rate, are the same as in the hybrid network; the number of training epochs is set to 10000; the training goal is set to 0.00001. The MSE curve of the training sample data using the single BP algorithm is shown in Figure 4.
Figure 2. The fitness curve of the initial weights and biases (final fitness = 61.455825)

Figure 3. The MSE curve of the hybrid neural network (final MSE = 6.2796e-006)


Figure 4. The MSE curve of the single BP algorithm (performance 0.0019152, goal 1e-005, 10000 epochs)

5. Result Analysis

5.1. Comparison on Training Result

Comparing Figure 3 with Figure 4, we can see that the hybrid neural network converges to an MSE of 6.2796e-006 and achieves the training goal, whereas the single BP neural network does not achieve the training goal within the maximum number of training epochs. In Figure 3, we can see that when the BP algorithm converges to about 0.002 and the decline of the MSE is less than 0.000001, the program uses GA to optimize the network's weights and biases; with the optimization of GA, the MSE declines sharply and achieves the training goal in the end. In Figure 4, by contrast, the MSE shows almost no distinct change around 0.002, and the single BP network does not achieve the training goal during the process. This indicates that with the combination of the GA and BP algorithms, the learning ability of the neural network is improved: using GA to optimize the weights and biases solves the problem of the BP algorithm's slow convergence around the training goal.

5.2. Comparison on Classification Results

Our final objective is to use the trained model to evaluate new credit applicants and decide whether or not to accept their applications. So we use another group of sample data to test the trained hybrid neural network and the single BP network. For further understanding, we take 0.5 as the critical value: applicants with an output larger than 0.5 are regarded as good, while those with a smaller output are regarded as bad. The classification results are listed in Table 3.

Table 3. Comparison of Precision of Different Models

Model           Actual value   Forecast 0.00   Forecast 1.00   Precision %   Error %
BP network      0.00           219             13              94.40         5.60
                1.00           16              238             93.70         6.30
                Subtotal       235             251             94.03         5.97
Hybrid network  0.00           225             7               96.98         3.02
                1.00           16              238             93.70         6.30
                Subtotal       241             245             95.27         4.73

From Table 3 we can see that, on total classification accuracy, the hybrid neural network, at 95.27%, achieves a higher result than the single BP neural network. There are two types of error: a type I error mistakes a client with a good credit condition for a bad one and refuses the loan, while a type II error works the opposite way. From Table 3, the GA-BP hybrid neural network and the single BP neural network have an equal error rate on type I, but on type II the hybrid neural network, at only 3.02%, has a lower error rate than the single BP neural network at 5.60%. So, from the aspects of total classification accuracy and the type II error rate, we can say the GA-BP hybrid neural network is better than the single BP neural network. The reason lies mainly in the fact that the hybrid neural network achieves the training goal and learns the characteristics of the samples fully, whereas the single BP neural network does not achieve the training goal, so its classification accuracy is lower than that of the hybrid network.
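The precision and error percentages in Table 3 follow directly from the row counts; a quick arithmetic check (the counts are taken from the table, where the cell matching the actual class is the correct forecast):

```python
def precision_and_error(correct, wrong):
    """Row-wise precision and error percentages, as reported in Table 3."""
    total = correct + wrong
    return 100.0 * correct / total, 100.0 * wrong / total

print(precision_and_error(225, 7))    # hybrid network, actual 0.00 -> (96.98..., 3.01...)
print(precision_and_error(219, 13))   # BP network, actual 0.00 -> (94.39..., 5.60...)
```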

6. Conclusions
In this paper, we establish a hybrid neural network based on the combination of the GA and BP algorithms and apply the model to personal credit scoring in commercial banks. The training results of the model indicate that the hybrid algorithm can improve the learning ability of the neural network effectively and can overcome the defect of the BP algorithm of slow convergence around the training goal. The classification results on the testing sample data indicate that the hybrid neural network achieves a higher classification accuracy than the single BP network does.

References

[1] Lyn C. Thomas, "A Survey of Credit and Behavioral Scoring: Forecasting Financial Risk of Lending to Consumers", International Journal of Forecasting, 2000, No. 16, pp. 149-172.
[2] Dong Xu, Zheng Wu, The System Analysis and Design Based on MATLAB 6.x-Neural Network (2nd edition), Xidian University Press, Xi'an, 2002.
[3] Hao Pan, Xiaoyong Wang, Qiong Chen, et al., "Application of BP neural network based on genetic algorithm", 2005, Vol. 25, No. 12, pp. 2777-2779.
[4] J. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
[5] Lippmann, R. P., "An introduction to computing with neural nets", IEEE ASSP Magazine, April 1987, pp. 4-22.
[6] Xiaofeng Hui, Yunquan Hu, Fei Hu, "Study on the Application of BP Neural Network in Forecasting the Exchange Rate Based on GA", Quantitative & Technical Economics, 2002, No. 2, pp. 80-83.
