Predictive analytics for credit scoring


Jay B.Simha

Abiba Systems, Bangalore, India

jay.b.simha@abibasystems.com

ABSTRACT

Credit risk modeling is a well-researched area in both the statistical and AI communities. Several models cited in the research literature are built using the whole data set. In this study, a hybrid predictive model framework based on fuzzy clustering and statistical/machine learning classifiers is proposed for credit risk analysis. This hybrid approach enables building rules/functions for different groups of borrowers separately. In the first stage, customers are segmented into clusters that are characterized by similar features; in the second step, classifiers are built for each group to obtain scoring rules/functions that provide a risk level for each customer. Multiple classifiers are evaluated on each segment, and the best classifier for each segment is selected for final scoring. The main advantage of integrating the two techniques is that the resulting models may predict the risk of granting credit to each client better than either method used separately. The results are compared with those of a classifier built on the whole data set, according to classification performance and the business objective. The results support the hypothesis that a hybrid model based framework provides better results than a global model.

Key words: Hybrid models, Fuzzy C-means, Classifiers, Credit Risk

1. INTRODUCTION

One of the key decisions financial institutions have to make is whether or not to grant a loan to a customer. This decision basically boils down to a binary classification problem which aims at distinguishing good payers from bad payers. Numerous methods have been proposed in the literature to develop credit scoring models. These models include traditional statistical methods (e.g. logistic regression [2]), nonparametric statistical models (e.g. k-nearest neighbor [5] and classification trees [8]), clustering [3], fuzzy logic [7] and neural network models [1,9]. Most of these studies focus primarily on developing classification models with high predictive accuracy. However, all these approaches build a global model. It can be argued that the potential savings from predicting risks in certain segments can outweigh overall classification accuracy across all the segments.

Zakrzewska [10] developed a model based on clustering and decision trees. Since that concept used a single classifier (decision tree) for scoring, it may not be applicable across different data sets. In addition, a soft clustering method like fuzzy clustering is superior to hard k-means clustering as it provides better cluster quality. Hence, in this research these two concepts, i.e. the use of soft clustering to identify the segments and the use of the best classifier for each segment, have been investigated, with the hypothesis that the resulting classifier system will provide better control over scoring.

In this paper we present a framework using fuzzy clustering and different classifiers for building credit scoring models using local patterns.

2. SYSTEM ARCHITECTURE

The proposed system, which is expected to support the evaluation of credit risks by building classifiers, is composed of three main modules.

Figure 1. System Architecture

The first module is a segmentation module, where the data set is split into clusters with homogeneous behavior. We use the fuzzy C-means algorithm for clustering, as discussed in the previous section. The second module is the classifier learning module, which builds a model for each classifier on each of the clusters obtained by the previous module. In the third module, the best classifier for each segment is selected based on the configured criteria. In this research we have selected two criteria for evaluation, namely classification accuracy and true positive rate.
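The selection step of the third module can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `select_best`, the nested-dictionary layout, the segment and classifier names, and all the numbers are hypothetical.

```python
from typing import Dict

# Hypothetical evaluation results: segment -> classifier -> metric -> value.
# The numbers are made up for illustration; they are not results from the study.
Scores = Dict[str, Dict[str, Dict[str, float]]]

def select_best(scores: Scores, criterion: str) -> Dict[str, str]:
    """For each segment, return the classifier with the highest `criterion`."""
    best = {}
    for segment, by_classifier in scores.items():
        best[segment] = max(
            by_classifier, key=lambda name: by_classifier[name][criterion]
        )
    return best

example_scores = {
    "segment_1": {
        "naive_bayes": {"accuracy": 0.80, "tpr": 0.70},
        "decision_tree": {"accuracy": 0.85, "tpr": 0.60},
    },
    "segment_2": {
        "naive_bayes": {"accuracy": 0.75, "tpr": 0.65},
        "decision_tree": {"accuracy": 0.72, "tpr": 0.68},
    },
}

# The configured criterion decides which classifier scores each segment.
best_by_accuracy = select_best(example_scores, "accuracy")
best_by_tpr = select_best(example_scores, "tpr")
```

In this toy input the two criteria pick different classifiers for the same segment, which is why the criterion is treated as a configuration choice.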

3. FUZZY C-MEANS CLUSTERING

Fuzzy C-means clustering (FCM) is a clustering technique that differs from hard k-means, which employs hard partitioning. FCM employs fuzzy partitioning, such that a data point can belong to all groups with different membership grades between 0 and 1.

FCM is an iterative algorithm. The aim of FCM is to find cluster centers (centroids) that minimize a dissimilarity function. A brief summary of the considerations and major steps is given below.

The algorithm first posits a given number 'c' of clusters and an initial membership value (from zero to one) for each point (a customer's attribute vector) in each of the 'c' clusters. The pseudo-partition cluster membership values for each point are chosen so that they add to one, with the membership values not all equal at first. The algorithm then successively adjusts the membership values of each point in each of the various clusters, based on the point's distance from the cluster's center compared to its distances from the other cluster centers. The algorithm then uses the new membership degrees to iteratively move the cluster center points toward mutually better locations. The Euclidean distance based "center" of each cluster is calculated from all the customers' attribute vectors, weighted by their membership degrees in the cluster. The weighting is also recomputed based on the membership values. The algorithm stops when the pseudo-partition memberships collectively change by less than a predetermined amount on successive iterations. The mathematical treatment of the algorithm can be found in [4]. The algorithm used in the research is given in Fig 2.

Fig 2. Fuzzy clustering algorithm
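The iteration described above can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than the algorithm of Fig 2: the fuzzifier `m = 2`, the tolerance `tol`, the random initialisation and all names are illustrative choices that the text does not specify, and the six sample points are made up.

```python
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fuzzy_c_means(points, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Return (centers, memberships); m, tol and seed are assumed defaults."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Initial memberships: random, unequal, and summing to one for each point.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Each center is the mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Memberships depend on the distance to a center relative to the others.
        max_change = 0.0
        for i in range(n):
            dist = [max(euclidean(points[i], centers[j]), 1e-12) for j in range(c)]
            for j in range(c):
                new = 1.0 / sum(
                    (dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c)
                )
                max_change = max(max_change, abs(new - u[i][j]))
                u[i][j] = new
        # Stop once memberships change by less than the chosen tolerance.
        if max_change < tol:
            break
    return centers, u

# Made-up data: two visibly separated groups of attribute vectors.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, memberships = fuzzy_c_means(points, c=2)
# Hard assignment for scoring: the cluster with the highest membership grade.
labels = [row.index(max(row)) for row in memberships]
```

Taking the highest membership grade as the segment label is one simple way to hand each customer to a single per-segment classifier; the soft grades remain available if a different assignment rule is preferred.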

4. CLASSIFIERS

A classifier is a statistical/machine learning function which maps the independent attributes to the dependent attribute with some confidence. There are different types of classifiers [6]. In this work, five classifiers are used, namely naïve Bayes, logistic regression, decision trees, artificial neural networks and support vector machines. A brief overview of these techniques is given below:

4.1 Naïve Bayes classifier

The probability model for a classifier is a conditional model over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. This conditional model can be extended using Bayes' theorem as

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$

However, the above equation allows for interdependence among the feature variables. When this model is relaxed with the assumption of independence, the conditional distribution over the class variable C can be expressed as:

$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

where Z is a scaling factor dependent only on F1, F2, ..., Fn, i.e., a constant if the values of the feature variables are known.

Fig 3. Naïve Bayesian classifier

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). This is the naïve Bayes classifier, which has shown surprisingly good performance on real-life data sets.
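As an illustration of how the class prior p(C) and the per-feature distributions p(Fi|C) combine, here is a minimal counting-based sketch for categorical features. The feature values, labels and function names are hypothetical, and no smoothing is applied, so a feature value never seen with any class would make the normalising constant Z zero.

```python
from collections import Counter

def train_naive_bayes(rows, labels):
    """Estimate p(C) and p(F_i | C) by relative frequencies (no smoothing)."""
    class_counts = Counter(labels)
    n = len(labels)
    prior = {c: class_counts[c] / n for c in class_counts}
    # cond_counts[(i, value, c)] = rows of class c whose feature i equals value
    cond_counts = Counter()
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(i, value, c)] += 1

    def likelihood(i, value, c):
        return cond_counts[(i, value, c)] / class_counts[c]

    return prior, likelihood

def posterior(prior, likelihood, row):
    """Return p(C | F_1..F_n) = p(C) * prod_i p(F_i | C) / Z for every class."""
    unnormalised = {}
    for c in prior:
        p = prior[c]
        for i, value in enumerate(row):
            p *= likelihood(i, value, c)
        unnormalised[c] = p
    z = sum(unnormalised.values())  # the scaling factor Z
    return {c: p / z for c, p in unnormalised.items()}

# Made-up training data: (housing status, income band) -> payer class.
rows = [("owner", "high"), ("owner", "low"), ("renter", "low"),
        ("renter", "low"), ("renter", "high"), ("owner", "high")]
labels = ["good", "good", "bad", "bad", "good", "good"]

prior, likelihood = train_naive_bayes(rows, labels)
post = posterior(prior, likelihood, ("renter", "low"))
```

For the applicant ("renter", "low") the unnormalised scores are 4/6 · 1/4 · 1/4 for "good" and 2/6 · 1 · 1 for "bad", so after dividing by Z the posterior for "bad" is 8/9.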

4.2 Logistic Regression

Logistic regression is the most widely used classifier in credit risk modeling. Logistic regression can predict the probability (P) that an example X belongs to one of two predefined classes. Suppose example X = (x1, x2, x3, ..., xn). As in linear regression, logistic regression gives each xi a coefficient wi which measures the contribution of xi to variations in P. First, a logistic transformation of P is defined as

$$\operatorname{logit}(P) = \ln\frac{P}{1 - P}$$

where P can only range from 0 to 1, while logit(P) ranges from -∞ to +∞. Logit(P) is then matched by a linear function of the feature variables

$$\operatorname{logit}(P) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
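The relationship between P, logit(P) and the linear function of the features can be checked numerically. In this small sketch the weights, intercept and feature values are made-up numbers, and the function names are illustrative.

```python
import math

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def predict_probability(weights, intercept, x):
    """Invert the logit: P = 1 / (1 + exp(-(w0 + sum_i w_i * x_i)))."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and one hypothetical example.
weights, intercept = [0.8, -0.5], 0.1
x = [1.0, 2.0]

p = predict_probability(weights, intercept, x)
linear_score = intercept + sum(w * xi for w, xi in zip(weights, x))
```

Applying `logit` to the predicted probability recovers the linear score, which is the sense in which logit(P) is "matched by" a linear function of the features.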

4.3 Decision trees

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables. There are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

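A tree can be "learned" by splitting the source set into subsets based on an attribute value test. Splitting can be based on different criteria. Two of the most widely used measures are information gain and the Gini index. In their standard forms, for a set S in which a proportion p_k of the examples belongs to class k, and an attribute A that splits S into subsets S_v:

Information gain:

$$IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{k} p_k \log_2 p_k$$

Gini index:

$$\operatorname{Gini}(S) = 1 - \sum_{k} p_k^2$$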

Fig 4. Decision tree classifier

This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when all the examples in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions.
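The two splitting measures named above can be computed directly from class counts. The sketch below uses the standard entropy-based information gain and the standard Gini impurity; the attribute values and labels are made up, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the whole set minus the weighted entropy of each subset."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up example: housing status separates the two classes perfectly.
housing = ["owner", "owner", "renter", "renter"]
payer = ["good", "good", "bad", "bad"]

gain = information_gain(housing, payer)
impurity_before = gini(payer)
impurity_owner = gini(["good", "good"])
```

A split that yields pure subsets, as here, has the maximum possible gain for a balanced two-class set (1 bit) and drives the Gini impurity of each child to zero, which is also the first stopping condition of recursive partitioning.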

4.4 Artificial Neural Networks

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Learning in neural networks is accomplished by adjusting the connection weights iteratively until convergence.

Fig 5. Artificial Neural Networks

The output of each feed-forward connection is computed using an activation function. Typically, feedback of the delta computations is used to minimize the errors during learning. In credit risk modeling, neural networks are second only to logistic regression in usage.
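A single neuron is enough to illustrate the feed-forward computation and the delta-based weight adjustment. The text does not state which activation function was used, so the sigmoid here is an assumption, as are the learning rate, the squared-error objective, and the input and target values.

```python
import math

def sigmoid(z):
    """Assumed activation function; the source does not specify one."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, inputs):
    """Feed-forward output of one neuron: activation of the weighted sum."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

def delta_update(weights, bias, inputs, target, lr=0.5):
    """One delta-rule step for squared error through a sigmoid neuron."""
    out = forward(weights, bias, inputs)
    delta = (target - out) * out * (1.0 - out)
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias + lr * delta

# Made-up training example.
weights, bias = [0.0, 0.0], 0.0
inputs, target = [1.0, 0.5], 1.0

error_before = abs(target - forward(weights, bias, inputs))
for _ in range(50):
    weights, bias = delta_update(weights, bias, inputs, target)
error_after = abs(target - forward(weights, bias, inputs))
```

Repeating the update shrinks the output error, which is the iterative weight adjustment the section describes; a full network applies the same idea layer by layer.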

4.5 Support Vector Machines (SVM)

A Support Vector Machine is a supervised learner for classification. An SVM views the input data as two sets of vectors in an n-dimensional space and constructs a separating hyperplane in that space, one which maximizes the margin between the two data sets.

Fig 6. Support vector machines

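In order to calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, which are "pushed up against" the two data sets. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes, since in general the larger the margin the lower the generalization error of the classifier. In formal terms, for training vectors x_i with class labels y_i in {-1, +1}, an SVM can be written as (in its dual form):

Maximize (in α_i)

$$\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^{T} \mathbf{x}_j$$

subject to (for any i = 1, ..., n)

$$\alpha_i \ge 0$$

and

$$\sum_{i=1}^{n} \alpha_i y_i = 0$$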

It has been found that support vector machines work well for credit risk modeling.
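The link between the dual form and the margin can be seen on a two-point toy problem. The sketch below evaluates the standard hard-margin dual objective (the sum of the multipliers minus half the label-weighted sum of pairwise dot products) and recovers the hyperplane normal w. The data points are made up, the multipliers `best_alphas` were worked out by hand for this toy case, and the function names are illustrative; no general solver is implemented.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dual_objective(alphas, xs, ys):
    """sum_i a_i - 0.5 * sum_ij a_i a_j y_i y_j (x_i . x_j)."""
    n = len(xs)
    quadratic = sum(
        alphas[i] * alphas[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
        for i in range(n) for j in range(n)
    )
    return sum(alphas) - 0.5 * quadratic

def weight_vector(alphas, xs, ys):
    """Hyperplane normal recovered from the multipliers: w = sum_i a_i y_i x_i."""
    dim = len(xs[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, ys, xs)) for d in range(dim)]

# Toy data: one point per class, with labels +1 and -1.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]

# Hand-derived maximiser for this toy case; it satisfies sum_i a_i y_i = 0.
best_alphas = [0.25, 0.25]
w = weight_vector(best_alphas, xs, ys)
margin = 2.0 / dot(w, w) ** 0.5
```

Here the margin equals the distance between the two points, as expected when each class contributes a single support vector, and nearby feasible multipliers such as 0.2 or 0.3 give a lower objective value.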

5. EXPERIMENTAL RESULTS AND DISCUSSIONS

Experiments were done on a real-life credit risk data set collected for an Indian bank. The experiments consist of evaluating and comparing the quality of the results obtained by the best classifier for each segment against a similar classifier developed using the whole data. In the whole data set mode of learning the classifier, ten-fold cross validation is adopted to test the model. Since the segment sizes are small, a leave-one-out approach is adopted for validation of the classification models built on the segments.
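Leave-one-out validation and the two evaluation criteria can be sketched as follows. The one-nearest-neighbour rule is only a stand-in classifier so that the example is self-contained; it is not one of the classifiers used in the study. The data are made up, and treating "bad" as the positive class is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def true_positive_rate(y_true, y_pred, positive):
    """Fraction of actual positives that were predicted as positive."""
    actual = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / actual if actual else float("nan")

def leave_one_out(rows, labels, fit_predict):
    """Predict each example from a model fitted on all the other examples."""
    predictions = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predictions.append(fit_predict(train_rows, train_labels, rows[i]))
    return predictions

def nearest_neighbour(train_rows, train_labels, row):
    """Stand-in classifier: the label of the closest training example."""
    closest = min(
        range(len(train_rows)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_rows[j], row)),
    )
    return train_labels[closest]

# Made-up segment with one attribute per customer.
rows = [(1.0,), (1.1,), (0.9,), (5.0,), (5.2,), (3.0,)]
labels = ["good", "good", "good", "bad", "bad", "bad"]

predictions = leave_one_out(rows, labels, nearest_neighbour)
acc = accuracy(labels, predictions)
tpr = true_positive_rate(labels, predictions, positive="bad")
```

In this toy segment only the customer at 3.0 is misclassified, giving an accuracy of 5/6 and a true positive rate of 2/3. The two measures can rank classifiers differently, which is why both are reported.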

Table 1 shows the classification accuracy of the different classification algorithms. It can be seen that all the algorithms perform well on the validation set. One of them (decision tree) has built-in feature selection; another (logistic regression) is used with forward selection. The other two classifiers were built using the full data set and all the attributes. Since a similar approach is used in learning the classifiers on segmented data, further pruning was not carried out on the algorithms.

Table 2 shows the true positive rates for the different classification algorithms. It can be observed that all the classifiers perform similarly when all the data is used for modeling. This suggests that the classification boundaries learned by each of the classifiers are optimal for the given data. Further data transformation and tuning of the classifier learning parameters may improve the classification accuracy. However, our intention was to compare the performance of classifiers on segments with the same parameter settings. It can be seen that no classifier is superior in all the segments on all of the performance measures. This has motivated us to develop our approach of selecting the best classifier for each segment. It is clear from the tables that the best classifier for each segment provides superior performance.

Table 1. Classification accuracy

Table 2. True positive rates

6. CONCLUSION

In this paper, a framework for connecting unsupervised (fuzzy clustering) and supervised (classification algorithms) techniques for credit risk evaluation is investigated. The presented technique allows for building different classifiers for different groups of customers, each providing the best results for its segment. In the proposed approach, each credit applicant is assigned to the most similar group of clients from the training data set, and credit risk is evaluated by applying the classifier appropriate for this group.

Results obtained on the real credit risk data set showed higher precision and greater simplicity for the models obtained for each cluster than for the model developed with the whole data set.

Future research will focus on further investigations using Self-Organizing Maps and Expectation Maximization clustering for segmentation, with multiple classification techniques for supervised learning and additional performance measures like the area under the ROC curve.

REFERENCES

[1] B. Baesens, R. Setiono, Ch. Mues, J. Vanthienen. Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation. Management Science, 49(3), 2003, 312-329.

[2] M. Bensic, N. Sarlija, M. Zekic-Susac. Modelling Small-Business Credit Scoring by Using Logistic Regression, Neural Networks and Decision Trees. Intelligent Systems in Accounting, Finance and Management, 13, 2005, 133-150.

[3] G. Chi, J. Hao, Ch. Xiu, Z. Zhu. Cluster Analysis for Weight of Credit Risk Evaluation Index. Systems Engineering-Theory Methodology, Applications, 10(1), 2001, 64-67.

[4] J.C. Dunn. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Journal of Cybernetics, 3, 1973, 32-57.

[5] W.E. Henley, D.E. Hand. Construction of a k-nearest neighbor credit-scoring system. IMA Journal of Management Mathematics, 8, 1997, 305-321.

[6] Ian H. Witten, Eibe Frank. Data Mining: Practical machine learning tools and techniques, 2nd Edition. Morgan Kaufmann, San Francisco, 2005.

[7] Y.-Z. Luo, S.-L. Pang, S.-S. Qiu. Fuzzy Cluster in Credit Scoring. Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, 2-5 November 2003, 2731-2736.

[8] S.S. Satchidananda, Jay B. Simha. Comparing decision trees with logistic regression for credit risk analysis. SAS APAUGC 2006, Mumbai.

[9] D. West. Neural network credit scoring models. Computers & Operations Research, 27, 2000, 1131-1152.

[10] D. Zakrzewska. On integrating unsupervised and supervised classification for credit risk evaluation. Information Technology and Control, 2007, Vol. 36, No. 1A.
