Vivien Brunel¹,²
¹ Léonard de Vinci Pôle Universitaire, Finance Lab, France
² Société Générale
February 8, 2018
Abstract
In machine learning applications, and in credit risk modeling in particular, model performance is usually measured with CAP and ROC curves. The purpose of this paper is to use the statistics of the CAP curve to provide a new method for credit PD curve calibration that does not rest on arbitrary choices like the ones used in the industry. We map CAP curves to a ball-box problem and use statistical physics techniques to compute the statistics of the CAP curve, from which we derive the shape of PD curves. This approach leads to a type of PD curve shape that has not been considered in the literature yet, namely the Fermi-Dirac function, a two-parameter function of the target default rate of the portfolio and the target accuracy ratio of the scoring model. We show that this type of PD curve is likely to outperform the logistic PD curve that practitioners often use, and we suggest that practitioners adopt the Fermi-Dirac function to improve the accuracy of their credit risk measurement.
Keywords: Scoring, machine learning, credit risk, PD calibration, logistic function, Fermi-Dirac.
The random variable Xi represents a ball in box number i: if Xi equals 1, there is a ball in box i; if
Xi = 0, there is no ball in box i. We refer to Hand (1997) and Tasche (2010) for a description of the
theory of discriminatory power. The Cumulative Accuracy Profile (CAP) curve plots the proportion
of defaults that we capture by observing the first i lowest ranked obligors. The accuracy ratio is a
global performance measure of the scoring model and is a simple function of the area under the CAP
curve:
AR = (2 · area − 1) / (1 − p)    (2)
We illustrate the formula for the area under the CAP curve in Fig. 1 in the case N = 4 and B = 10. In Fig. 1, the horizontal axis represents the rank of obligor number i expressed as a percentage of the total portfolio, namely x = i/B. The vertical axis is the CAP function expressed as a percentage of the total number of defaults, namely CAP(x) = (1/N) Σ_{j ≤ Bx} X_j. From Fig. 1, we see that the area under the CAP curve is the sum of the areas of the horizontal rectangles, each rectangle having an area equal to X_i (1 − i/B)/N. We get the total area under the CAP curve:
area = (1/N) Σ_{i=1}^{B} X_i (1/B)(B − i + 1/2) = 1 − (1/(NB)) Σ_{i=1}^{B} (i − 1/2) X_i    (3)
We have added a 1/2 term in Eq.(3) so that the area is exactly equal to 1/2 when the (X_i)_{i=1···B} are distributed uniformly on the unit segment. When both N and B go to infinity, this correction vanishes, but it is necessary as long as the number of obligors is finite.
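Equations (2) and (3) are straightforward to implement. The sketch below (helper names are ours) computes the area under the CAP curve and the accuracy ratio for a default configuration (X_i); for the perfect model, which puts all N defaults in the N lowest-ranked boxes, the 1/2 correction makes AR equal to 1 exactly.

```python
def cap_area(X, N, B):
    """Area under the CAP curve, Eq.(3): each default in box i contributes
    a rectangle of area (B - i + 1/2) / (N * B), including the 1/2 correction."""
    return sum(Xi * (B - i + 0.5) for i, Xi in enumerate(X, start=1)) / (N * B)

def accuracy_ratio(X, N, B, p):
    """Accuracy ratio, Eq.(2): AR = (2*area - 1) / (1 - p)."""
    return (2.0 * cap_area(X, N, B) - 1.0) / (1.0 - p)

# Perfect model: all N defaults in the N lowest-ranked (riskiest) boxes.
B, N = 10, 4
X_perfect = [1] * N + [0] * (B - N)
print(cap_area(X_perfect, N, B))                          # 0.8
print(round(accuracy_ratio(X_perfect, N, B, N / B), 12))  # 1.0
```

With N = 4 and B = 10 the perfect model gives area = 1 − p/2 = 0.8 and hence AR = 1, which is exactly the normalization the 1/2 term is designed to enforce.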
We now suppose that the accuracy ratio is given and equal to AR; this is another constraint on the sequence (X_i)_{i=1···B}. We see from Eq.(3) that the number of configurations of the CAP curve corresponding to a given accuracy ratio is equal to the number of ways we can decompose the integer E = Σ_i i X_i as a sum of N distinct integers, each being at most B. The relationship between E
having the sequence (X_i)_{i=1···B} and the sequence (Y_i)_{i=1···B} are equal, and consequently, the average value of the accuracy ratio in the random model is equal to 0. It is often considered that a random model corresponds to AR = 0, but this is true only on average: any value of AR is attainable. The probability of a given value of the accuracy ratio AR in the random model (or a given value of E) is equal to (Zeinstra et al. 2017):
q(E, N, B) / C(B, N)    (6)
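For small portfolios, the counting function q(E, N, B) entering Eq.(6) can be computed exactly by dynamic programming (the function name below is ours). The sketch counts the N-subsets of {1, ..., B} with sum E and checks that summing over all admissible E recovers the total number of configurations C(B, N).

```python
from math import comb

def q_exact(E, N, B):
    """Number of ways to write E as a sum of N distinct integers in 1..B,
    i.e. the number of configurations (X_i) with sum_i X_i = N and sum_i i*X_i = E."""
    # ways[n][e]: number of n-subsets of the boxes considered so far with sum e
    ways = [[0] * (E + 1) for _ in range(N + 1)]
    ways[0][0] = 1
    for k in range(1, B + 1):                # consider box k
        for n in range(min(N, k), 0, -1):    # descend so each k is used at most once
            for e in range(E, k - 1, -1):
                ways[n][e] += ways[n - 1][e - k]
    return ways[N][E]

N, B = 4, 10
print(q_exact(10, N, B))  # 1  (only 1+2+3+4 reaches the minimal sum)
E_min, E_max = N * (N + 1) // 2, N * B - N * (N - 1) // 2
print(sum(q_exact(E, N, B) for E in range(E_min, E_max + 1)) == comb(B, N))  # True
```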
In the random model, all the configurations are equi-probable, and a fortiori, all the configurations
corresponding to the same value of AR are equi-probable. If the model is not random, Eqs.(1, 4) remain
valid, but the configurations are no longer equi-probable. Indeed, the model will allocate different
probabilities of having a ball in different boxes. In section 3, we will make the assumption that
all the configurations corresponding to the same value of AR are equi-probable. This assumption is
supported by the fact that the huge majority of the configurations associated with a given value of AR
are very close to the average one, as soon as the parameters N and B are large enough. The realized
configuration is then likely to be close to the average one. This result is well known in statistical
mechanics since Boltzmann and still holds here: in physics, boxes are equidistant energy levels, balls
are particles (they are fermions because we have only one ball per box at maximum), and E is the
total energy of the system.
So, by inverting this relationship (see Tran et al. (2004) and Rovenchak (2016)), we have:
q(E, N, B) = (1/(2iπ)²) ∮∮ dx dz Z(x, z) / (z^{N+1} x^{E+1})    (9)
We set z = e−α and x = e−β . The partition function for balls distributed according to the Fermi-Dirac
statistics is well known (Tran et al. 2004):
Z(α, β) = Π_{k=1}^{B} (1 + e^{−α} e^{−βk})    (10)
The integrals of Eq.(9) are taken over closed contours round the origin of the complex x and z
plane. Statistical physics techniques are based on the saddle-point approximation method to compute
this type of integral as in Tran et al. (2004) or Rovenchak (2016) for instance. The saddle-point
approximation leads to:
q(E, N, B) ∼ (1/(2π√D)) exp[α* N + β* E + ln Z(α*, β*)]    (11)
where α∗ and β ∗ are the coordinates of the saddle-point that maximizes the function S(α, β) =
αN + βE + ln Z(α, β) and:
D = (∂²S/∂β²)(∂²S/∂α²) − (∂²S/∂β∂α)²    (12)
The second order derivatives are taken at the saddle-point (α*, β*). When the partition function is the one of Eq.(10), the function S(α, β) reads:

S(α, β) = αN + βE + Σ_{k=0}^{B} ln(1 + e^{−α} e^{−βk}) − ln(1 + e^{−α})    (13)
3 PD curve calibration
In credit scoring models, the score function is calibrated from observed defaults of obligors or loans.
The PD curve is the probability of default assigned to each loan of a portfolio depending on its score
value or rank in the portfolio. PD curve calibration has generated less literature than scoring model calibration, but some papers have been devoted to this problem. Van Der Burgt (2008) fits the CAP
curve with a parametric function and then computes the PD curve from this fit. This method is
designed to cope with Low Default Portfolios (LDP). The PD curve obtained by Van Der Burgt is an
exponential function of the rank. Changing the shape of the CAP curve fit generates different shapes
for the PD curve.
Another approach is the Quasi Moment Matching approach (Tasche 2010), which fits a two-parameter PD function from the target average default rate of the portfolio and the target accuracy ratio of the scoring model. Because the accuracy ratio is not equal to the variance of default events among risk classes, the calibration method is not exactly a moment approach but a quasi-moment approach.
In the same paper, Tasche proposes to fit the ROC curve instead of the CAP curve. Using the
binormal fit, Tasche shows that the resulting PD curve is the robust logit function (Tasche 2010).
All these methods require choosing a parametric function to perform the fit. Under the assumption that all the configurations corresponding to the same value of AR are equi-probable, we compute a PD curve corresponding to a target probability of default p and a target accuracy ratio AR without any additional parametric assumption. This PD curve has a Fermi-Dirac shape.
Let’s come back to the ball-box representation of obligor defaults. The probability of default of
the company ranked q by the scoring model is equal to the probability of having a ball in box number
q; this probability depends on the accuracy ratio AR (or on E). For example, it is uniformly equal
to p in the random model case, and depends on q when the model is not random. In the saddle-point approximation of section 2, we can compute the probability of any configuration defined by the sequence (X_i)_{i=1···B}. Indeed, from Eq.(8), the partition function can be rewritten as a sum over all the configurations:

Z(α*, β*) = Σ_{(X_i)_{i=1···B}} exp(−α* Σ_i X_i − β* Σ_i i X_i)    (20)
Then, the probability of a given configuration can be expressed in terms of the sequence (X_i)_{i=1···B}:

P(X_1, ···, X_B) = (1/Z(α*, β*)) exp(−α* (X_1 + ··· + X_B) − β* (X_1 + 2X_2 + ··· + B X_B))    (21)
Then, the expected value of X_q for q ∈ {1, ···, B}, under the set of conditions of Eq.(7), is equal to:

PD(q) = E[X_q | Σ_{i=1}^{B} X_i = N, Σ_{i=1}^{B} i X_i = E] = Σ_{X_1,···,X_B} X_q P(X_1, ···, X_B) = (1/Z(α*, β*)) ∂Z/∂(−β* q)(α*, β*)    (22)
Indeed, the partial derivative of Z(α*, β*) with respect to q brings down a factor −β*. The PD curve is then the Fermi-Dirac function:

PD(q) ∼ 1 / (1 + e^{α* + β* q})    (23)
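The chain from Eq.(20) to Eq.(23) can be checked by brute force on a tiny portfolio. Under the (unconstrained) weights of Eq.(21), the boxes are independent and the marginal expectation of X_q is exactly the Fermi-Dirac function; the sketch below (parameter values are arbitrary) verifies both the product form of Z and this marginal. Note that it checks the unconstrained ensemble that the saddle-point approximation substitutes for the conditioned expectation of Eq.(22).

```python
from itertools import product
from math import exp, prod

alpha, beta, B = 0.5, 0.3, 6  # arbitrary small example

# Partition function two ways: product formula of Eq.(10) vs configuration sum of Eq.(20)
Z_product = prod(1.0 + exp(-alpha - beta * k) for k in range(1, B + 1))
Z_sum = sum(
    exp(-alpha * sum(X) - beta * sum(i * x for i, x in enumerate(X, 1)))
    for X in product((0, 1), repeat=B)
)
print(abs(Z_product - Z_sum) < 1e-12)  # True

# Marginal E[X_q] under the weights of Eq.(21) is the Fermi-Dirac function of Eq.(23)
q = 3
num = sum(
    X[q - 1] * exp(-alpha * sum(X) - beta * sum(i * x for i, x in enumerate(X, 1)))
    for X in product((0, 1), repeat=B)
)
print(abs(num / Z_sum - 1.0 / (1.0 + exp(alpha + beta * q))) < 1e-12)  # True
```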
The values of α* and β* are the solutions of Eq.(15) and Eq.(16) respectively. We have solved them numerically and compared the resulting PD curves in Fig. 2 for p = 0.5% and p = 4%, and AR = 70% and AR = 80% respectively. In all these cases, B = 10 000. The saddle-point parameters are summarized in Table 1.

Table 1: Saddle-point coordinates for p = 0.5% and p = 4%, and AR = 70% and AR = 80%. B = 10 000 in all these cases.

    p      AR     α*          β*
    0.5%   70%    3.37748     0.0006712752
    0.5%   80%    2.978242    0.0009925998
    4%     70%    1.199637    0.0006584164
    4%     80%    0.7746198   0.0009475957
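Eqs.(15) and (16) are not reproduced in this excerpt, so as a hedged sketch we solve the two underlying constraints directly: Σ_q PD(q) = N = pB and Σ_q q·PD(q) = E, with E deduced from the target AR through Eqs.(2)–(3). The nested-bisection scheme below is our own, not necessarily the authors'; it lands close to the first row of Table 1, with small deviations expected because the paper uses asymptotic saddle-point formulas.

```python
import numpy as np

def solve_saddle(p, AR, B):
    """Find (alpha*, beta*) so that the Fermi-Dirac curve of Eq.(23) matches
    the target default count N = p*B and the target E implied by AR via Eqs.(2)-(3)."""
    N = p * B
    area = (AR * (1.0 - p) + 1.0) / 2.0          # invert Eq.(2)
    E = N * B * (1.0 - area) + N / 2.0           # invert Eq.(3)
    q = np.arange(1, B + 1)

    def pd_sum(alpha, beta):
        return np.sum(1.0 / (1.0 + np.exp(alpha + beta * q)))

    def alpha_for(beta):
        # sum_q PD(q) is decreasing in alpha: bisect on alpha
        lo, hi = -20.0, 30.0
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if pd_sum(mid, beta) > N else (lo, mid)
        return 0.5 * (lo + hi)

    # at matched N, sum_q q*PD(q) is decreasing in beta: bisect on beta
    lo, hi = 1e-6, 1e-2
    for _ in range(80):
        beta = 0.5 * (lo + hi)
        alpha = alpha_for(beta)
        e_val = np.sum(q / (1.0 + np.exp(alpha + beta * q)))
        lo, hi = (beta, hi) if e_val > E else (lo, beta)
    beta = 0.5 * (lo + hi)
    return alpha_for(beta), beta

alpha, beta = solve_saddle(p=0.005, AR=0.70, B=10_000)
print(alpha, beta)  # roughly alpha* ~ 3.38, beta* ~ 0.00067 (cf. Table 1)
```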
We plot the resulting PD curves in Fig. 2. We see, as expected, that a higher discriminatory power
in the scoring model results in less dispersed PDs over the portfolio.
In our approach, the saddle-point PD curve is a logistic function of the rank and not of the score.
This function differs from all the PD curve functions usually used in the literature. In particular,
it doesn’t depend on the score distribution of the portfolio. In the next section, we compare the
performance of the saddle-point PD curves obtained here with the logistic PD curves which are often
used in credit risk modeling.
This shape is used in credit risk because it often results from a logistic credit score. However, other machine learning models are becoming more and more popular, and the logistic score may be seen as a suboptimal choice. We show that the logistic PD curve is suboptimal compared to the saddle-point PD curve obtained in section 3 for conditionally normal continuous scores.
Let's consider that the score distributions conditional on default and survival are normal:

s_D ∼ N(μ_D, σ_D²) and s_N ∼ N(μ_N, σ_N²)    (25)

and the exact PD curve is an explicit function of the conditional score densities:

p_E(s) = p f_D(s) / (p f_D(s) + (1 − p) f_N(s))    (27)
The shape of the function p_E(s) is definitely different from the shape of the function p_L(s), as well as from the saddle-point PD curve of Eq.(23). When the conditional score standard deviations σ_N and σ_D are equal to each other, the shape of the function p_E(s) is logistic, but it is equal to the one of Eq.(24) only in the special case AR = 2N(1/√2) − 1 ≈ 52.05% (N denoting here the standard normal cumulative distribution function). In the general case, a numerical study is necessary to check the performance of Eq.(23) compared to the logistic function.
To illustrate this, we consider two cases:
• Case 1: the example provided in section 4.2 of Tasche (2010), i.e. μ_D = 6.8, σ_D = 1.96, μ_N = 8.5 and σ_N = 2. We choose PD = 4% and the number of obligors B = 10 000. The values of the parameters for the conditional distributions correspond to AR = 45.6%.
• Case 2: μ_D = 0, σ_D = 1, μ_N = 2.10 and σ_N = 1.3. We choose PD = 3% and the number of obligors B = 10 000. The values of the parameters for the conditional distributions correspond to AR = 80%.
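The exact PD curve of Eq.(27) is easy to evaluate for these parameters, and the quoted accuracy ratios can be checked. Eq.(26) is not shown in this excerpt; we assume it is the standard binormal relation AR = 2Φ((μ_N − μ_D)/√(σ_D² + σ_N²)) − 1, with Φ the standard normal cdf, which indeed reproduces AR ≈ 45.6% and 80% for the two cases.

```python
from math import erf, exp, pi, sqrt

def phi(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def pd_exact(s, p, mu_D, sig_D, mu_N, sig_N):
    """Exact PD curve of Eq.(27): Bayes' rule on the conditional score densities."""
    fD, fN = phi(s, mu_D, sig_D), phi(s, mu_N, sig_N)
    return p * fD / (p * fD + (1.0 - p) * fN)

def ar_binormal(mu_D, sig_D, mu_N, sig_N):
    """Assumed binormal accuracy ratio: AR = 2*Phi(dmu / sqrt(sD^2 + sN^2)) - 1."""
    Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
    return 2.0 * Phi((mu_N - mu_D) / sqrt(sig_D ** 2 + sig_N ** 2)) - 1.0

print(round(ar_binormal(6.8, 1.96, 8.5, 2.0), 3))  # 0.456  (case 1)
print(round(ar_binormal(0.0, 1.0, 2.10, 1.3), 2))  # 0.8    (case 2)
# Low scores are riskier than high scores in case 1:
print(pd_exact(4.0, 0.04, 6.8, 1.96, 8.5, 2.0)
      > pd_exact(10.0, 0.04, 6.8, 1.96, 8.5, 2.0))  # True
```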
The choice of the logistic PD curve p_L(s) is often made for simplicity and because it guarantees that the PD curve is a smooth decreasing function of the score. The PD curve we propose, obtained in the saddle-point approach, is decreasing as well, and it outperforms the logistic PD curve, as we shall see below.
We use the L1 distance between PD curves to assess the relative performance of the saddle-point method compared to the logistic one. We define the L1 distance between two functions f and g by ||f − g||₁ = ∫_{−∞}^{+∞} |f(x) − g(x)| dx. The saddle-point PD curve is closer than the logistic curve to the exact PD curve for the L1 distance in both cases. All the parameters and numerical results are summarized in Table 2.
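The L1 distance is computed numerically; a minimal sketch (helper name is ours), with a Gaussian-density sanity check whose L1 norm against zero must be 1:

```python
from math import exp, pi, sqrt

def l1_distance(f, g, lo=-50.0, hi=50.0, n=200_000):
    """||f - g||_1 by trapezoidal integration on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0   # trapezoid endpoint weights
        total += w * abs(f(x) - g(x))
    return total * h

gauss = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)
print(round(l1_distance(gauss, lambda x: 0.0), 6))  # 1.0
```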
Figure 3: PD curves for case 1 (left) and case 2 (right).
Table 2: Parameter values for cases 1 and 2.

          Case 1     Case 2
    α*    1.73       1.10
    β*    0.000408   0.0009577
    γ     3.67       −2.24
As we can see in Table 3, the L1 distance between the logistic and the exact curves is 3 times higher than the distance between the saddle-point and the exact curves in case 1. This ratio drops to 1.56 in case 2 but remains significantly higher than 1. We see in the graphs of Fig. 3 that the saddle-point approach is very accurate for high scores in all cases. For very low scores (high-risk loans), the saddle-point curve starts, for small values of the rank, at relatively low probabilities of default compared to the other curves. In reality, very high default probabilities are quite unlikely because the loans have already passed a selection process (the granting process): they were acceptable to the bank at the date of origination. From this perspective, the saddle-point curve is more realistic than the logistic one for low scores.
We run a Monte-Carlo simulation to explore a wider range of cases. We choose the probability of default p uniformly in the range 0 to 5%, together with the parameters of the conditional score distributions. As these parameters are tied together, we set μ_D = 0 and σ_D = 1, and we select μ_N − μ_D uniformly in the range 1.2 to 2.4 and σ_N in the range 0.8 to 1.2. From Eq.(26), the accuracy ratio then ranges in the interval 60% to 90%. We plot the L1 distances in Fig. 4; the blue line separates the two regions where the saddle-point curve outperforms (resp. underperforms) the logistic curve. As the number of spots is much higher under the blue line, we conclude that the saddle-point PD curve often outperforms the logistic one in this range of parameters. We leave a more detailed study of these simulations to another paper.

Table 3: L1 distance between the exact PD curve on one hand and the saddle-point and logistic curves on the other hand.

                                 Case 1     Case 2
    Distance saddle / exact      0.00888    0.00606
    Distance logistic / exact    0.0268     0.0098
Figure 4: Plot of the L1 distances between the exact PD curve on one side and the logistic and saddle-point curves respectively on the other side. The straight line is the locus where both distances are equal.
Conclusion
We assumed in this paper that all the configurations associated with a given accuracy ratio are equi-probable. This assumption is supported in the large portfolio limit by the fact that most of the configurations are very close to the average configuration; this fact is well known in statistical physics. By using statistical physics tools, we have been able to derive the resulting PD curve, and we have shown that it often outperforms the logistic PD curve when conditional score distributions are normal. This result is new because it provides an argument for using PD curves with a Fermi-Dirac shape (a logistic function of the rank). This PD curve can be easily calibrated from the portfolio's target default rate and the model's target accuracy ratio, and it provides a better fit of the PD curve because it has two parameters. The logistic curve should no longer be used by practitioners because it is based on arbitrary assumptions and is a one-parameter function only. The dependence on the score is often considered more accurate than a rank dependence. However, we claim that scores carry more model risk than ranks, which is why PD curves based on ranks instead of scores may be more robust.
Appendix
In this appendix, we derive the asymptotic form of the function q(E, N, B) when N → ∞, B → ∞, and E, N and B are tied together with Eq.(7). The second order derivatives of the S(α, β) function are, in the asymptotic limit:

∂²S/∂β² ∼ (2/β*³) ∫₀^∞ ln(1 + e^{−α*} e^{−x}) dx = (2/β*³) u v²
∂²S/∂α∂β ∼ −(1/β*²) (d/dα) ∫₀^∞ ln(1 + e^{−α*} e^{−x}) dx = v/β*²
∂²S/∂α² ∼ (1 − e^{−v(p)}) / β*    (29)
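The symbols u and v are defined in a part of the paper not reproduced here; from the middle line of Eq.(29), −(d/dα) ∫₀^∞ ln(1 + e^{−α} e^{−x}) dx = ln(1 + e^{−α}), so a natural reading (our assumption) is v = ln(1 + e^{−α*}). The sketch below checks the cross-derivative asymptotics ∂²S/∂α∂β ≈ v/β*² against a central finite difference on the discrete sum; the linear terms αN + βE of S drop out of second derivatives, so differencing ln Z from Eq.(10) is enough.

```python
from math import exp, log

def lnZ(alpha, beta, B):
    """ln Z from Eq.(10): sum over boxes of ln(1 + e^{-alpha} e^{-beta k})."""
    return sum(log(1.0 + exp(-alpha - beta * k)) for k in range(1, B + 1))

alpha, beta, B = 1.0, 1e-3, 50_000  # beta*B large, so the truncated tail is negligible

# Central finite-difference estimate of d^2 lnZ / (d alpha d beta)
ha, hb = 1e-4, 1e-7
fd = (lnZ(alpha + ha, beta + hb, B) - lnZ(alpha + ha, beta - hb, B)
      - lnZ(alpha - ha, beta + hb, B) + lnZ(alpha - ha, beta - hb, B)) / (4 * ha * hb)

# Asymptotic value v / beta^2 with the assumed v = ln(1 + e^{-alpha})
asym = log(1.0 + exp(-alpha)) / beta ** 2
print(abs(fd / asym - 1.0) < 0.01)  # True
```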
References
Andrews, G.E. (1976). The Theory of Partitions. Addison-Wesley Publishing Co.
Canfield, R. (1997). From Recursions to Asymptotics: On Szekeres’ Formula for the Number of
Partitions. The Electronic Journal of Combinatorics, 4 (no. 2).
Hand, D.J. (1997). Construction and Assessment of Classification Rules. John Wiley & Sons, Chichester.
Rovenchak, A. (2016). Statistical mechanics approach in the counting of integer partitions.
arXiv:1603.01049v1 [math-ph].
Szekeres, G. (1987). Asymptotic Distribution of the Number and Size of Parts in Unequal Partitions. Bull. Austral. Math. Soc., Vol. 36, 89-97.
Tasche, D. (2010). Estimating Discriminatory Power and PD Curves when the Number of Defaults is Small. arXiv:0905.3928.
Tran, M.N., M. V. N. Murthy, and R. J. Bhaduri (2004). On the quantum density of states and
partitioning an integer. Ann. Phys. 311, 204-219.
Van Der Burgt, M. (2008). Calibrating low default portfolios, using the cumulative accuracy profile.
Journal of Model Risk Validation, 1(4): 17-33.
Zeinstra, C., R. Veldhuis and L. Spreeuwers (2017). How Random is a Classifier given its Area under Curve? Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, A. Brömme, C. Busch, A. Dantcheva, C. Rathgeb and A. Uhl (Eds.): BIOSIG 2017, Bonn, 259-266.