
School of Risk and Actuarial Studies

ACTL5106: Insurance Risk Models


Module 3. Individual Claim Size Modelling1

Vincent Tu

School of Risk & Actuarial Studies


UNSW Business School

July 28, 2018

References: MW 3 / (A 2) / (E)
Plan

1 Introduction
Introduction
Components to fitting loss models
Insurance data
2 Data analysis and descriptive statistics
3 Selected parametric claims size distributions
Introduction
Parametric models for Y
4 Model selection
Graphical approaches
Hypothesis tests
5 Calculating within layers for claim sizes
Usual policy transformations
Reinsurance
6 Case studies
Illustrative datasets
Data set A
Data set B

Introduction

How to fit loss models to insurance data?


Peculiar characteristics of insurance data:
complete vs incomplete set of observations
left-truncated observations
right-censored observations
Parametric distribution models
model parameter estimation
judging quality of fit
model selection criteria (graphical, score-based approaches)
Some datasets analysed


Components to fitting loss models

1 select from a set of candidate distributions


Pareto, Log-Normal, Inverse Gaussian, Gamma, etc.
2 estimate the model parameters
method of moments
maximum likelihood (nice properties)
3 evaluate the quality of a given model
graphical procedures (q-q, p-p plots, empirical cdf's)
score-based approaches (Kolmogorov-Smirnov tests, A-D tests,
chi-square goodness-of-fit tests, SBC)
4 determine which model fits best
5 (monitor results)


Complete vs incomplete data

complete, individual data


you observe the exact value of the loss
incomplete data
exact data may not be available
in loss/claims data, incompleteness arises in the following situations:
1 observations may be grouped: only the range of values in which the data fall is observed
2 presence of censoring and/or truncation, typically due to common insurance and reinsurance arrangements such as deductibles and limits


Left-truncation and right-censoring

left-truncated observation (e.g. excess / deductible)

an observation is left-truncated at c if it is NOT recorded when it is below c; when it is above c, it is recorded at its exact value.

right-censored observation (e.g. policy limit)

an observation is right-censored at d if, when it is above d, it is recorded as being equal to d; when it is below d, it is recorded at its observed value.

of course, observations can be both left-truncated and right-censored


Zero claims

Datasets often contain a significant proportion of zero claims, for a number of reasons:
Data is policy-based, not claims-based;
Claims not exceeding the deductible;
Mandatory reporting of accidents; etc.
This complicates the fitting (parametric distributions often don't have a flexible mass at 0, if at all)
Several possible solutions:
1 Adjust X by mixing a point mass at 0 with a parametric distribution
2 Adjust N by reducing the frequency of claims accordingly (hence ignoring zero claims)


The first way is mathematically consistent, but contradicts the


model assumption G (0) = 0.
The second way fits in the compound Poisson modeling
framework due to the joint decomposition theorem, and is
often easier, but we may lose some important information by
dropping the zero claims.

Data analysis and descriptive statistics

Summarising the data

For any type of data analysis, the first step is to summarise the data.
summary statistics: mean, median, standard deviation, coefficient of variation, skewness, quantiles, min, max, etc.
these give a preliminary understanding of the data
Do some graphs:
histogram (possibly of a transformation, e.g. log)
density plots
q-q plot (normal, for the transformed data)

Selected parametric claims size distributions

Some additional useful properties

Loss size index function:

$$I(G(y)) = \frac{\int_0^y z \, dG(z)}{\int_0^\infty z \, dG(z)}$$

Empirical loss size index function:

$$I_n(\alpha) = \frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} Y_{(i)}}{\sum_{i=1}^{n} Y_i}, \qquad \alpha \in [0, 1].$$


Loss size index function, I(G (y )), evaluates the relative
contribution of [0, y ] to the overall mean. (see also Pareto
principle)
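A minimal sketch of the empirical loss size index in Python (function and variable names are illustrative):

```python
def empirical_loss_index(y, alpha):
    """Empirical loss size index I_n(alpha): share of the total loss
    accounted for by the floor(n * alpha) smallest observations."""
    y_sorted = sorted(y)
    n = len(y_sorted)
    k = int(n * alpha)  # floor(n * alpha) for alpha in [0, 1]
    return sum(y_sorted[:k]) / sum(y_sorted)

# The two smallest of the losses 1, 2, 3, 4 carry 30% of the total.
print(empirical_loss_index([4, 1, 3, 2], 0.5))  # -> 0.3
```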

The mean excess function

$$e(u) = E[Y_i - u \mid Y_i > u]$$

The empirical mean excess function:

$$\hat{e}_n(u) = \frac{\sum_{i=1}^{n} (Y_i - u) \mathbf{1}_{\{Y_i > u\}}}{\sum_{i=1}^{n} \mathbf{1}_{\{Y_i > u\}}}$$

This is useful for the analysis of large claims, and for the analysis of reinsurance.
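The empirical version is a one-liner; a sketch in Python (names are illustrative):

```python
def empirical_mean_excess(y, u):
    """Empirical mean excess e_n(u): average exceedance over u among
    observations strictly greater than u."""
    exceedances = [yi - u for yi in y if yi > u]
    return sum(exceedances) / len(exceedances)

print(empirical_mean_excess([1, 2, 3, 4], 2))  # -> 1.5
```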


Topics outside the scope of this course

Extreme value theory:


log-log plot
regular variation at infinity and tail index
Hill plot
Enrol in ACTL5301 if you want to learn about those!

Parametric models for Y

Gamma Distribution

We shall write Y ∼ Gamma(α, β) if the density has the form

$$g(y) = \frac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y}, \quad \text{for } y > 0; \ \alpha, \beta > 0.$$

Mean: E(Y) = α/β
Variance: Var(Y) = α/β²
Skewness: ς_Y = 2/√α [positively skewed distribution]
Mgf: M_Y(t) = (β/(β − t))^α, provided t < β.
Higher moments: E[Y^k] = Γ(α + k)/(Γ(α) β^k)
Special case: when α = 1, we have Y ∼ Exp(β)
For any constant ρ > 0, ρY ∼ Gamma(α, β/ρ), since M_{ρY}(t) = M_Y(ρt).
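A quick consistency check of these moment formulas in Python (the parameter values are arbitrary):

```python
import math

def gamma_moment(alpha, beta, k):
    """E[Y^k] = Gamma(alpha + k) / (Gamma(alpha) * beta**k) for Y ~ Gamma(alpha, beta)."""
    return math.gamma(alpha + k) / (math.gamma(alpha) * beta ** k)

alpha, beta = 2.0, 3.0
mean = gamma_moment(alpha, beta, 1)          # should equal alpha / beta
var = gamma_moment(alpha, beta, 2) - mean**2  # should equal alpha / beta**2
print(round(mean, 4), round(var, 4))  # -> 0.6667 0.2222
```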

Inverse Gaussian Distribution (useful, but complicated)

We shall write Y ∼ IG(α, β) if the density has the form

$$g(y) = \frac{\alpha y^{-3/2}}{\sqrt{2\pi\beta}} \exp\left[-\frac{(\alpha - \beta y)^2}{2\beta y}\right], \quad \text{for } y > 0; \ \alpha, \beta > 0.$$

Mean: E(Y) = α/β
Variance: Var(Y) = α/β²
Skewness: ς_Y = 3/√α [positively skewed distribution]
Mgf: M_Y(t) = exp{α(1 − √(1 − 2t/β))}, provided t < β/2.
The term "Inverse Gaussian" comes from the fact that there is an inverse relationship between its cgf and that of the Gaussian distribution, but NOT from the fact that the inverse is Gaussian!

Weibull

We shall write Y ∼ Weibull(τ, c) if the density has the form

$$g(y) = c\tau (cy)^{\tau - 1} \exp\{-(cy)^\tau\}, \quad \text{for } y \geq 0; \ \tau, c > 0.$$

Note G(y) = 1 − exp{−(cy)^τ}.
Mean: E(Y) = Γ(1 + 1/τ)/c
Variance: Var(Y) = Γ(1 + 2/τ)/c² − μ²_Y
Skewness: ς_Y = [Γ(1 + 3/τ)/c³ − 3μ_Y σ²_Y − μ³_Y] / σ³_Y
Mgf: does not exist for τ < 1 and t > 0.
Higher moments: E[Y^k] = Γ(1 + k/τ)/c^k
For any ρ > 0, ρY ∼ Weibull(τ, c/ρ)
Note: if Z ∼ Exp(1) then Z^{1/τ}/c ∼ Weibull(τ, c).

Log-normal

We shall write Y ∼ LN(μ, σ²), meaning Y = exp(normal), and we have

log Y ∼ N(μ, σ²).

Mean: E(Y) = exp{μ + σ²/2}
Variance: Var(Y) = exp{2μ + σ²}(exp{σ²} − 1)
Skewness: ς_Y = (exp{σ²} + 2)(exp{σ²} − 1)^{1/2}
Mgf: does not exist for t > 0 (the log-normal has a heavy tail).
For any ρ > 0, ρY ∼ LN(μ + log ρ, σ²).



Log-gamma

We shall write log Y ∼ Gamma(γ, c) and we have

$$g(y) = \frac{c^\gamma (\log y)^{\gamma - 1} y^{-(c+1)}}{\Gamma(\gamma)}, \quad \text{for } y > 1; \ \gamma, c > 0.$$

Mean: E(Y) = (c/(c − 1))^γ for c > 1
Variance: Var(Y) = (c/(c − 2))^γ − μ²_Y for c > 2
Skewness: ς_Y = [(c/(c − 3))^γ − 3μ_Y σ²_Y − μ³_Y] / σ³_Y for c > 3
Mgf: does not exist for t > 0.
Higher moments: E[Y^k] = (c/(c − k))^γ for c > k

Pareto Distribution

We shall write Y ∼ Pareto(θ, α) if the density has the form

$$g(y) = \frac{\alpha}{\theta} \left(\frac{y}{\theta}\right)^{-(\alpha + 1)}, \quad \text{for } y \geq \theta; \ \theta, \alpha > 0.$$

Mean: E(Y) = θ α/(α − 1), α > 1
Variance: Var(Y) = θ² α / ((α − 1)²(α − 2)), α > 2
Skewness: ς_Y = (2(1 + α)/(α − 3)) ((α − 2)/α)^{1/2}, α > 3
Mgf: does not exist for t > 0
It is a particularly useful candidate for modelling large losses.
Translated Pareto: the distribution of Y − β [see Yellow Book]

Model selection
Graphical approaches

After fitting a model and estimating its parameters by MLE, judge the quality of the model with graphical comparisons of empirical summaries (histogram, empirical cdf, sample quantiles) against their fitted parametric counterparts:

histogram vs. fitted parametric density function (good for the general shape)
empirical CDF vs. fitted parametric CDF
probability-probability (P-P) plot: theoretical vs. empirical cumulative probabilities
quantile-quantile (Q-Q) plot: theoretical vs. sample quantiles (better for checking the tail)

Let the fitted (theoretical) parametric distribution be denoted by G(x; θ̂).


P-P plot

To construct the P-P plot:

order the observed data from smallest to largest: x_(1), x_(2), ..., x_(n).
calculate the theoretical CDF at each of the observed data points: G(x_(i); θ̂).
for i = 1, 2, ..., n, plot the empirical probabilities (i − 0.5)/n (a continuity correction) against the theoretical probabilities G(x_(i); θ̂).


Q-Q plot

To construct the Q-Q plot:

order the observed data from smallest to largest: x_(1), x_(2), ..., x_(n).
for i = 1, 2, ..., n, calculate the theoretical quantiles: G^{-1}((i − 0.5)/n; θ̂).
for i = 1, 2, ..., n, plot the sample values x_(i) against the theoretical quantiles G^{-1}((i − 0.5)/n; θ̂).

The Q-Q plot is better for assessing the tails; the P-P plot is better for assessing the body of the distribution.
These constructions hold only for the case where you have no censoring/truncation.
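The two constructions can be sketched in Python; here an exponential model stands in for the fitted G(·; θ̂), and the rate `beta` and the sample are purely illustrative:

```python
import math

def pp_qq_points(data, cdf, quantile):
    """Return P-P pairs (empirical vs theoretical probabilities) and Q-Q pairs
    (theoretical vs sample quantiles), using the (i - 0.5)/n correction."""
    x = sorted(data)
    n = len(x)
    pp = [((i - 0.5) / n, cdf(x[i - 1])) for i in range(1, n + 1)]
    qq = [(quantile((i - 0.5) / n), x[i - 1]) for i in range(1, n + 1)]
    return pp, qq

beta = 0.001  # illustrative exponential rate
cdf = lambda y: 1.0 - math.exp(-beta * y)
quantile = lambda p: -math.log(1.0 - p) / beta
pp, qq = pp_qq_points([500, 1500, 3000], cdf, quantile)
```

With three points the empirical probabilities are 1/6, 3/6 and 5/6; a well-fitting model produces P-P and Q-Q points close to the 45-degree line.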


Hypothesis tests

Test the null H₀: the data came from a population with the specified model, against Hₐ: the data did not come from such a population.

Some commonly used tests:

Kolmogorov-Smirnov: $K.S. = \sup_y \left| \hat{G}(y) - G(y; \hat{\theta}) \right|$

Anderson-Darling: $A.D. = n \int \frac{[\hat{G}(y) - G(y; \hat{\theta})]^2}{G(y; \hat{\theta})[1 - G(y; \hat{\theta})]} \, g(y; \hat{\theta}) \, dy$

χ² goodness-of-fit: $\chi^2 = \sum_j \frac{(\text{observed}_j - \text{expected}_j)^2}{\text{expected}_j}$

Kolmogorov-Smirnov test

Kolmogorov-Smirnov test: $K.S. = \sup_x \left| \hat{G}(x) - G(x; \hat{\theta}) \right|$, where
Ĝ(y) is the empirical distribution
G(y; θ) is the assumed theoretical distribution in the null hypothesis
G(y; θ) is assumed to be (must be) continuous
θ̂ is the maximum likelihood estimate for θ under the null hypothesis.
The null is rejected when the maximum gap exceeds a critical value; there are tabulated tables for the critical values. Several variations of these tables in the literature use somewhat different scalings for the K-S test statistic and critical regions.
The statistic is distance-based and easy to calculate, but it depends only on the single largest gap between the two distribution functions, ignoring all the others, which can be misleading.
The test does not work for grouped data
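For a continuous model CDF, the supremum is attained at a data point, just before or after a jump of the empirical CDF; a sketch in Python (the Uniform(0,1) null below is illustrative):

```python
def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic sup_x |G_hat(x) - G(x; theta_hat)|,
    checking the empirical CDF just before and after each jump."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        g = cdf(xi)
        d = max(d, abs(i / n - g), abs(g - (i - 1) / n))
    return d

# Uniform(0,1) null on three points: the largest gap is 7/30.
print(ks_statistic([0.1, 0.5, 0.9], lambda x: x))  # -> 0.2333...
```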

Anderson-Darling test

Anderson-Darling test: $A.D. = n \int \frac{[\hat{G}(x) - G(x; \hat{\theta})]^2}{G(x; \hat{\theta})[1 - G(x; \hat{\theta})]} \, g(x; \hat{\theta}) \, dx$, where n is the sample size.

The statistic is a weighted average of the squared differences, taken across the whole distribution (at the cost of evaluating an integral).
The critical values for the Anderson-Darling test depend on the specific distribution being tested; there are tabulated values and formulas for a few specific distributions.
The theoretical distribution is assumed to be (must be) continuous.
The test does not work for grouped data.



χ² goodness-of-fit test

Break the whole range into k subintervals: c₀ < c₁ < · · · < c_k = ∞.

χ² goodness-of-fit test: $\chi^2 = \sum_{j=1}^{k} \frac{(E_j - O_j)^2}{E_j}$

Let p̂_j = G(c_j; θ̂) − G(c_{j−1}; θ̂). Then the expected number of observations in the interval (c_{j−1}, c_j], assuming the hypothesized model is true, is E_j = n p̂_j (here, n is the sample size).
Let p_j = Ĝ(c_j) − Ĝ(c_{j−1}). The observed number of observations in the interval (c_{j−1}, c_j] is O_j = n p_j.
On choosing the number of bins: common choices are an equal number of observations per bin, or equal-width bins; each bin should contain at least about 5 observations.
The statistic has a chi-square distribution with degrees of freedom equal to: k − 1 − number of parameters estimated.
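Once the bins are chosen the statistic itself is straightforward; a minimal sketch (the bin counts below are made up):

```python
def chi_square_stat(observed, expected):
    """Chi-square goodness-of-fit statistic: sum over bins of (E_j - O_j)^2 / E_j."""
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))

# n = 100 with two equiprobable bins: E = [50, 50], O = [30, 70].
print(chi_square_stat([30, 70], [50, 50]))  # -> 16.0
```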



How these tests are used

Besides testing whether the data came from the specified model or not, generally we would prefer models with:
lowest K-S test statistic
lowest A-D test statistic
lowest χ² goodness-of-fit test statistic (or equivalently highest p-value)
highest value of the likelihood function at the maximum
Perform a formal statistical test, or use these criteria as a 'horse race' across the candidate distributions.
Comparison of these tests

K-S and A-D tests are quite similar: both look at the difference between the empirical and model distribution functions, K-S in absolute value, A-D in squared difference.
But A-D is a weighted average, with more emphasis on good fit in the tails than in the middle; K-S puts no such emphasis.
For K-S and A-D tests, no adjustment is made for the number of parameters. Result: more complex models often fare better on these tests, so they are not well suited to comparing models with different numbers of parameters.
Nor is any adjustment made for sample size. Result: a large sample size increases the probability of rejecting all models.
The χ² test adjusts the degrees of freedom for increases in the number of parameters.

Information criteria

While K-S, A-D and χ² compare candidate fits, information criteria also help choose the number of parameters. Within an MLE framework:

Akaike Information Criterion (AIC)

$$AIC(i) = -2\ell_Y^{(i)} + 2 d^{(i)},$$

where d^{(i)} denotes the number of estimated parameters in g_i

Bayesian Information Criterion (BIC)

$$BIC(i) = -2\ell_Y^{(i)} + \log(n) \, d^{(i)}.$$

This is −2·SBC, where SBC is the Schwarz Bayesian Criterion (later in the slides)
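A minimal sketch of both criteria (the log-likelihood value and sample size are illustrative); lower values are preferred:

```python
import math

def aic(loglik, d):
    """AIC = -2 * loglik + 2 * d, with d the number of estimated parameters."""
    return -2.0 * loglik + 2.0 * d

def bic(loglik, d, n):
    """BIC = -2 * loglik + log(n) * d, with n the sample size."""
    return -2.0 * loglik + math.log(n) * d

print(aic(-100.0, 2), round(bic(-100.0, 2, 50), 3))  # -> 204.0 207.824
```

Note BIC penalises extra parameters more heavily than AIC as soon as n > e² ≈ 7.4.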

Calculating within layers for claim sizes

Deductible and Policy Limit

One way to control the cost (and variability) of individual claim losses is to introduce deductibles and policy limits.
Deductible d: the insurer starts paying claim amounts above the deductible d
Limit M: the insurer pays up to the limit M.
If we denote the damage random variable by D, then if a claim occurs the insurer is liable for

$$Y = \min\left[\max(D - d, 0), M\right].$$
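The layer transformation is a one-liner; a sketch (the deductible, limit, and damage values are illustrative):

```python
def insurer_payment(damage, deductible, limit):
    """Insurer's liability Y = min(max(D - d, 0), M) under deductible d and limit M."""
    return min(max(damage - deductible, 0.0), limit)

print(insurer_payment(50.0, 100.0, 500.0))    # -> 0.0   (below the deductible)
print(insurer_payment(300.0, 100.0, 500.0))   # -> 200.0
print(insurer_payment(1000.0, 100.0, 500.0))  # -> 500.0 (capped at the limit)
```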

Reinsurance

risk transfer from an insurer (the direct writer) to a reinsurer: a swap of something deterministic (the premium) against something random (the loss)
the risk that the insurer keeps is called the retention
motivations include reducing volatility, reducing overall exposure, and reducing required capital (a lower capital requirement means a higher return on equity)
There are different types of reinsurance:
proportional
quota share: the proportion is the same for all risks
surplus: the proportion can vary from risk to risk
nonproportional
(individual) excess of loss: on each individual loss X_i
stop loss: on the aggregate loss S (e.g. the total claims of an earthquake event)
cheap (reinsurance premium is the expected value), or non-cheap (reinsurance premium is loaded)
CAT bonds and the like... (ART)

Proportional reinsurance

The retained proportion α defines who pays what:


the insurer pays Y = αX
the reinsurer pays Z = (1 − α)X
This is nothing else but a change of scale and we have

$$\mu_Y = \alpha \mu_X, \quad \sigma_Y^2 = \alpha^2 \sigma_X^2, \quad \gamma_Y = \gamma_X.$$

In some cases it suffices to adapt the scale parameter. Example: if X is exponential with parameter β,

$$\Pr[Y \leq y] = \Pr[\alpha X \leq y] = \Pr[X \leq y/\alpha] = 1 - e^{-\beta y/\alpha},$$

and thus Y is exponential with parameter β/α.

Non-proportional reinsurance

Basic arrangements:
the reinsurer pays the excess over a retention (excess point) d
the insurer pays Y = min(X , d)
the reinsurer pays Z = (X − d)+
the reinsurer may limit its payments to an amount M (possibly with the layer above d + M passed to a second reinsurer). In that case
the insurer pays Y = min(X, d) + (X − d − M)₊
the reinsurer pays Z = min{(X − d)₊, M}

Example

Consider a life insurance company with 16,000 1-year term life


insurance policies. The associated insured amounts are:
Benefit (10,000's) | # policies
1 | 8,000
2 | 3,500
3 | 2,500
5 | 1,500
10 | 500
The probability of death (q) for each of the 16,000 lives is 0.02.
This company has an EoL reinsurance contract with retention limit
30,000 at a cost of 0.025 per dollar of coverage.
What is the approximate probability (using CLT) that the total
cost will exceed 8,250,000?

The portfolio of retained business is given by

k | retained benefit b_k (10,000's) | # policies n_k
1 | 1 | 8,000
2 | 2 | 3,500
3 | 3 | 4,500

Now

$$E[S] = \sum_{k=1}^{3} n_k E[X_k] = \sum_{k=1}^{3} n_k b_k q_k = 8000 \cdot 1 \cdot 0.02 + 3500 \cdot 2 \cdot 0.02 + 4500 \cdot 3 \cdot 0.02 = 570,$$

and

$$Var[S] = \sum_{k=1}^{3} n_k Var[X_k] = \sum_{k=1}^{3} n_k b_k^2 q_k (1 - q_k) = 8000 \cdot 1^2 \cdot 0.02 \cdot 0.98 + 3500 \cdot 2^2 \cdot 0.02 \cdot 0.98 + 4500 \cdot 3^2 \cdot 0.02 \cdot 0.98 = 1225.$$

The reinsurance cost is

[(5 − 3) · 1500 + (10 − 3) · 500] · 0.025 = 162.5.

Thus, the desired probability becomes

$$\Pr[S + 162.5 > 825] = \Pr\left[\frac{S - E[S]}{\sqrt{Var(S)}} > \frac{662.5 - E[S]}{\sqrt{Var(S)}}\right] \approx \Pr\left[Z > \frac{662.5 - 570}{\sqrt{1225}}\right] = \Pr[Z > 2.643] = 0.0041.$$

Without reinsurance, the mean/variance were 700/2587.20, so the probability ≈ Pr[Z > 2.458], which is higher even though it is not cheap reinsurance.
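The computation can be reproduced numerically; `0.5 * erfc(z / sqrt(2))` gives the standard normal tail probability Pr[Z > z]:

```python
import math

# Retained portfolio: benefit b_k (in 10,000s), number of policies n_k, q = 0.02.
portfolio = [(1, 8000), (2, 3500), (3, 4500)]
q = 0.02

ES = sum(n * b * q for b, n in portfolio)
VarS = sum(n * b ** 2 * q * (1 - q) for b, n in portfolio)

cost = ((5 - 3) * 1500 + (10 - 3) * 500) * 0.025  # reinsurance cost = 162.5
z = (825 - cost - ES) / math.sqrt(VarS)           # standardised threshold
prob = 0.5 * math.erfc(z / math.sqrt(2))          # Pr[Z > z], standard normal tail
print(round(ES, 4), round(VarS, 4), round(z, 3), round(prob, 4))  # -> 570.0 1225.0 2.643 0.0041
```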

A useful identity

Note that
min(X, c) = X − (X − c)₊
and thus
E[min(X, c)] = E[X] − E[(X − c)₊].
The amount E[(X − c)₊] = Pr[X > c] e(c)
is commonly called the "stop loss premium" with retention c;
it is identical to the expected payoff of a call option with strike price c, and thus results from financial mathematics can sometimes be directly used (and vice versa).


Stop loss premiums

Let
E [(X − d)+ ] = Pd . d = retension ratio

Then we have (for positive rv’s)


 R∞
Pd = P d [1 − FX (x)] dx if X is continuous

d [1 − FX (x)] if X is discrete = pdf


Example

Calculate Pd if X is Exponential with parameter β.
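A sketch checking the closed form P_d = e^{−βd}/β, which follows from the integral formula on the previous slide, against a direct numerical integration:

```python
import math

def stop_loss_exponential(beta, d):
    """Closed form: P_d = integral_d^inf e^{-beta*x} dx = e^{-beta*d} / beta."""
    return math.exp(-beta * d) / beta

def stop_loss_numeric(beta, d, steps=200000):
    """Trapezoidal approximation of integral_d^upper [1 - F_X(x)] dx,
    truncating the integral where the tail is negligible."""
    upper = d + 40.0 / beta
    h = (upper - d) / steps
    survival = lambda x: math.exp(-beta * x)  # 1 - F_X(x) for the exponential
    total = 0.5 * (survival(d) + survival(upper))
    total += sum(survival(d + i * h) for i in range(1, steps))
    return total * h

print(round(stop_loss_exponential(2.0, 1.0), 5))  # -> 0.06767 (= e^{-2}/2)
```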


Stop loss premiums - recursive formulas in the discrete case

First moment:
if d is an integer:

$$P_{d+1} = P_d - [1 - F_X(d)], \quad \text{with } P_0 = E[X]$$

if d is not an integer:

$$P_d = P_{\lfloor d \rfloor} - (d - \lfloor d \rfloor)[1 - F_X(\lfloor d \rfloor)].$$

Second moment $P_d^2 = E[(X - d)_+^2]$:

$$P_{d+1}^2 = P_d^2 - 2P_d + [1 - F_X(d)], \quad \text{with } P_0^2 = E[X^2].$$


Numerical example

For the distribution F₁₊₂₊₃ derived earlier in the lecture we have E[S] = 4 = 128/32 and E[S²] = 19.5 = 624/32, and thus

d | f₁₊₂₊₃(d) | F₁₊₂₊₃(d) | P_d | P²_d | Var((X − d)₊)
0 | 1/32 | 1/32 | 128/32 | 624/32 | 3.500
1 | 2/32 | 3/32 | 97/32 | 399/32 | 3.280
2 | 4/32 | 7/32 | 68/32 | 234/32 | 2.797
3 | 6/32 | 13/32 | 43/32 | 123/32 | 2.038
4 | 6/32 | 19/32 | 24/32 | 56/32 | 1.188
5 | 6/32 | 25/32 | 11/32 | 21/32 | 0.538
6 | 4/32 | 29/32 | 4/32 | 6/32 | 0.172
7 | 2/32 | 31/32 | 1/32 | 1/32 | 0.030
8 | 1/32 | 32/32 | 0 | 0 | 0.000

P₂.₆ = P₂ − 0.6 · (1 − F₁₊₂₊₃(2)) = 53/32.
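The recursive formulas can be checked with exact fractions; a Python sketch for the distribution with pmf weights 1, 2, 4, 6, 6, 6, 4, 2, 1 (out of 32) on the values 0 to 8:

```python
import math
from fractions import Fraction as F

# pmf on values 0..8 with weights out of 32
weights = [1, 2, 4, 6, 6, 6, 4, 2, 1]
pmf = {d: F(w, 32) for d, w in enumerate(weights)}
cdf, acc = {}, F(0)
for d in range(9):
    acc += pmf[d]
    cdf[d] = acc

# Recursions: P_{d+1} = P_d - [1 - F(d)]; P2_{d+1} = P2_d - 2 P_d + [1 - F(d)].
P = {0: sum(d * p for d, p in pmf.items())}        # P_0 = E[X]
P2 = {0: sum(d * d * p for d, p in pmf.items())}   # P2_0 = E[X^2]
for d in range(8):
    P[d + 1] = P[d] - (1 - cdf[d])
    P2[d + 1] = P2[d] - 2 * P[d] + (1 - cdf[d])

def stop_loss(d):
    """Non-integer retention: P_d = P_floor(d) - (d - floor(d)) * [1 - F(floor(d))]."""
    fl = math.floor(d)
    return P[fl] - (d - fl) * (1 - cdf[fl])

print(P[1], P2[1], stop_loss(F('2.6')))  # -> 97/32 399/32 53/32
```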


Leverage effect of claims inflation

Choose a fixed deductible d > 0 and assume that the claim at time 0 is given by Y₀. Assume that there is a deterministic inflation index i > 0 such that the claim at time 1 can be represented by Y₁ = (1 + i)Y₀. We have

$$E[(Y_1 - d)_+] \geq (1 + i) E[(Y_0 - d)_+],$$

i.e. the expected excess claim grows by more than the inflation rate.
When tax brackets are not adapted, the analogous effect leads to 'cold progression' of taxes...
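A tiny numerical illustration of the inequality (all values are made up):

```python
# Illustrative discrete claim: Y0 = 50 or 150 with equal probability,
# deductible d = 100, inflation i = 10%.
claims = [(50.0, 0.5), (150.0, 0.5)]
d, i = 100.0, 0.10

def expected_excess(scale):
    """E[(scale * Y0 - d)+] for the discrete claim above."""
    return sum(p * max(scale * y - d, 0.0) for y, p in claims)

e0 = expected_excess(1.0)      # E[(Y0 - d)+]
e1 = expected_excess(1.0 + i)  # E[(Y1 - d)+]
print(round(e0, 6), round(e1, 6))  # -> 25.0 32.5, and 32.5 >= (1 + i) * 25 = 27.5
```

The claim of 50 moves closer to the deductible under inflation without generating an excess, while the excess of the large claim grows by more than 10%; this is the leverage effect.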

Case studies

Illustrative datasets

We analyse two different loss datasets:


Valdez-DataSetA - this dataset, with 1,000 observations,
consists of insurance claim amounts observed over a fixed
period of time. An example of data consisting of complete/full
information.
Klugman-DataSetB-modified - this dataset was analysed by Klugman and Rioux (2006) and consists of truncated/censored liability claims data. The data has 100 observations, each consisting of the excess/deductible, the observed claim amount, and an indicator of whether the policy limit has been reached.

Data set A

Summary statistics - Valdez-DataSetA

Count | 1,000
Mean | 1,244.32
Standard deviation | 650.32
Variance | 422,956.79
Minimum | 120
25th percentile | 783.75
Median | 1,114.50
75th percentile | 1,586.50
Maximum | 5,799
Skewness | 1.71
Kurtosis | 5.88


Figure 1: preliminary plots - histograms of claimⱼ and log(claimⱼ) with kernel-smoothed density overlays, and normal Q-Q plots of the raw and log-transformed claims. [Figure omitted; the log transformation gives a markedly better fit to normality.]

Revision: The (expected) Fisher information

The score (or gradient) vector consists of the first derivatives

$$S(\theta; x) = \left(\frac{\partial \ell(\theta; x)}{\partial \theta_1}, \ldots, \frac{\partial \ell(\theta; x)}{\partial \theta_m}\right)'$$

so that the MLE satisfies the first-order condition $S(\hat{\theta}; x) = 0 = (0, \ldots, 0)'$.

The m × m Hessian matrix for ℓ(θ; x) is defined by

$$H(\theta; x) = \frac{\partial^2 \ell(\theta; x)}{\partial \theta \, \partial \theta'} = \begin{pmatrix} \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_1^2} & \cdots & \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_1 \partial \theta_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_m \partial \theta_1} & \cdots & \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_m^2} \end{pmatrix}$$

- continued

This Hessian is used to estimate Var(θ̂).
Minus the expected value of this is called the (expected) Fisher information.
It is well-known that a consistent estimator for the covariance matrix Var(θ̂) is given by the inverse of the negative of this Hessian matrix:

$$Var(\hat{\theta}) \geq \widehat{Var}(\hat{\theta}) = \left[-E[H(\hat{\theta}; x)]\right]^{-1}.$$

The square roots of the diagonal elements of this covariance estimate give the standard errors of the MLE estimates.

Fitting the log-normal distribution - Valdez-DataSetA

Use method of moments for initial estimates, or sometimes


exploit properties of the distribution being fitted.
Parameter estimates with standard errors:

Parameter | Estimate (MLE) | Standard Error
μ | 7.0031 | 0.0158
σ | 0.5010 | 0.0112
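For complete data, the lognormal MLEs are simply the sample mean and (divide-by-n) standard deviation of the log-claims; a sketch with toy data (the three claim values are illustrative, not from the dataset):

```python
import math

def fit_lognormal_mle(y):
    """MLE for LN(mu, sigma^2) with complete data: mu_hat is the mean of the
    log-claims, sigma_hat^2 the biased (divide-by-n) variance of the logs."""
    logs = [math.log(v) for v in y]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    return mu, sigma

mu, sigma = fit_lognormal_mle([math.exp(1), math.exp(1), math.exp(3)])
print(round(mu, 4), round(sigma, 4))  # -> 1.6667 0.9428
```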


Figure 2: plots to assess quality - Valdez-DataSetA: histogram of claimⱼ with the fitted lognormal density, empirical CDF with the fitted CDF, Q-Q plot, and P-P plot. [Figure omitted.]

Comparing the various fitted models - Valdez-DataSetA

Two other distributions are fitted to the data: the Gamma and the Burr XII models.
The Pareto model was also fitted, but this did not produce reasonable estimation results, so it was taken out of the comparison.
Below is a summary of the various test statistics resulting from the three distribution models considered (r is the number of estimated parameters):

Model | MLE's | Loglikelihood | r | K-S | A-D | χ² (p-value) | SBC
Lognormal | μ̂ = 7.0031, σ̂ = 0.5010 | -7730.832 | 2 | 0.0224 | 1.0001 | 0.0806 | -7737.740
Gamma | α̂ = 4.2167, β̂ = 0.0034 | -7742.027 | 2 | 0.0393 | 1.0015 | <0.0001 | -7748.935
Burr XII | α̂ = 1.3879, γ̂ = 3.1533, θ̂ = 1284.7754 | -7737.277 | 3 | 0.0382 | 0.9996 | 0.0027 | -7747.638

Since K-S and A-D make no adjustment for the number of parameters, they are most directly comparable between the Lognormal and Gamma fits (both with r = 2).

Figure 3: fitting the gamma model - Valdez-DataSetA: histogram, empirical CDF, Q-Q plot, and P-P plot against the fitted gamma distribution. [Figure omitted.]

Figure 4: fitting the Burr XII model - Valdez-DataSetA: histogram, empirical CDF, Q-Q plot, and P-P plot against the fitted Burr XII distribution. [Figure omitted.]
Data set B

Summ. stats - Klugman-DataSetB-modified

statistic | not censored | censored
Count | 75 | 25
Mean | 1,276.47 | 2,084.00
Standard deviation | 823.67 | 1,174.09
Variance | 678,427.20 | 1,378,483.33
Minimum | 182.00 | 1,100.00
25th percentile | 739.50 | 1,100.00
Median | 1,131.00 | 1,500.00
75th percentile | 1,560.00 | 3,100.00
Maximum | 4,510.00 | 5,500.00
Skewness | 1.76 | 1.08
Kurtosis | 3.91 | 0.38


Representing the observed data

For our purposes, we shall represent our set of observations as

(t_j, x_j, δ_j)

where

t_j is the left truncation point;
x_j is the claim value that produced the data point; and
δ_j is an indicator of whether the limit has been reached.

For example:
(50, 250, 0): truncation point 50, recorded value 250, limit not reached
(100, 1100, 1): truncation point 100, recorded value 1100, with the policy limit reached

Maximum likelihood contributions

Because of the different form of the data, the likelihood is not in the usual format:

$$L(\theta; x) = \prod_j \left(\frac{f(x_j; \theta)}{1 - F(t_j; \theta)}\right)^{1 - \delta_j} \cdot \prod_j \left(\frac{1 - F(x_j; \theta)}{1 - F(t_j; \theta)}\right)^{\delta_j}.$$

The contribution to the likelihood function for a data point where the limit has not been reached is

$$\frac{f(x_j)}{1 - F(t_j)}.$$

The contribution to the likelihood function for a data point where the limit has been reached is

$$\frac{1 - F(x_j)}{1 - F(t_j)}.$$

Note here that the policy limit, if reached, would be equal to x_j − t_j.
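This likelihood is easy to evaluate numerically; a sketch using an exponential model F(y) = 1 − e^{−βy} purely for illustration (the data triples below are made up, not from the dataset):

```python
import math

def loglik_exponential(data, beta):
    """Log-likelihood for left-truncated, right-censored data under an
    exponential model. Each point is (t, x, delta): t = truncation point,
    x = recorded value, delta = 1 if the policy limit was reached."""
    ll = 0.0
    for t, x, delta in data:
        log_surv_t = -beta * t  # log(1 - F(t))
        if delta == 1:
            # limit reached: log[(1 - F(x)) / (1 - F(t))]
            ll += -beta * x - log_surv_t
        else:
            # exact value observed: log[f(x) / (1 - F(t))]
            ll += math.log(beta) - beta * x - log_surv_t
    return ll

# beta = 1: contributions are -2, -2, and -1 respectively.
print(loglik_exponential([(0, 2, 0), (0, 2, 1), (1, 2, 0)], 1.0))  # -> -5.0
```

In practice one maximises this over β (or over the parameters of the lognormal/gamma models of the next slide) with a numerical optimiser.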

Parameter estimates

Here we consider only two distribution models: the Lognormal and


the Gamma distributions.
Model | Parameter | Estimate | Standard Error | Loglikelihood | SBC
Lognormal | μ | 7.1628 | 0.0994 | -626.2581 | -630.8633
Lognormal | σ | 0.8589 | 0.0898 | |
Gamma | α | 1.4398 | 0.3499 | -627.3484 | -631.9535
Gamma | β | 0.0009 | 0.0002 | |


Figure 5: fitting the lognormal model: histogram, Kaplan-Meier CDF, Q-Q plot, and P-P plot against the fitted lognormal distribution. [Figure omitted.]

Figure 6: fitting the gamma model: histogram, Kaplan-Meier CDF, Q-Q plot, and P-P plot against the fitted gamma distribution. [Figure omitted.]

Test results for various models

Table 4 of Klugman/Rioux (2006) paper:

Model | Loglikelihood | r | K-S | A-D | χ² | SBC
Exponential | -628.23 | 1 | 0.9399 | 1.2041 | 0.1571 | -630.49
Lognormal | -626.26 | 2 | 0.9048 | 0.5587 | 0.0685 | -630.79
Gamma | -627.35 | 2 | 0.9993 | 0.8189 | 0.2275 | -631.88
Lognormal/exp | -623.77 | 4 | 0.5631 | 0.2463 | 0.5576 | -632.83
Gamma/exp | -623.64 | 4 | 0.5577 | 0.2645 | 0.5470 | -632.71
Lognormal/exp/exp | -623.39 | 6 | 0.4345 | 0.1411 | 0.3035 | -636.98
Gamma/exp/exp | -623.26 | 6 | 0.4497 | 0.1275 | 0.3122 | -636.86


Comments on Klugman-DataSetB - test results for various models

A person who favours parsimony will pick either the exponential or the lognormal model (they are favoured by SBC).
The two mixtures with two exponential distributions may also be chosen because they clearly maximise the likelihood.
The lognormal mixture has the better K-S statistic, while the gamma mixture has the best A-D statistic; this indicates that the latter may do better in the tails.

Selecting a model - summary

Klugman and Rioux (2006) suggest the following procedure for selecting a model for (insurance) loss data:
Do a preliminary investigation of the data (e.g. summary statistics, histogram).
Construct the empirical distribution (use Kaplan-Meier for truncated/censored data).
Construct pictures such as q-q and p-p plots.
Conduct hypothesis tests: the Kolmogorov-Smirnov test, the Anderson-Darling test, and the χ² goodness-of-fit test.
Calculate the SBC criterion for each model being considered.
Some other considerations when choosing a model:
Keep it simple if at all possible.
Restrict the universe of potential models.
