
School of Risk and Actuarial Studies

ACTL5106: Insurance Risk Models


Module 3. Individual Claim Size Modelling1

Vincent Tu

School of Risk & Actuarial Studies


UNSW Business School

July 28, 2018

References: MW 3 / (A 2) / (E)
Plan

1 Introduction
Introduction
Components to fitting loss models
Insurance data
2 Data analysis and descriptive statistics
3 Selected parametric claims size distributions
Introduction
Parametric models for Y
4 Model selection
Graphical approaches
Hypothesis tests
5 Calculating within layers for claim sizes
Usual policy transformations
Reinsurance
6 Case studies
Illustrative datasets
Data set A
Data set B

Introduction

How to fit loss models to insurance data?


Peculiar characteristics of insurance data:
complete vs incomplete set of observations
left-truncated observations
right-censored observations
Parametric distribution models
model parameter estimation
judging quality of fit
model selection criteria (graphical, score-based approaches)
Some datasets analysed


Components to fitting loss models

1 select from a set of candidate distributions


Pareto, Log-Normal, Inverse Gaussian, Gamma, etc.
2 estimate the model parameters
method of moments
maximum likelihood (nice properties)
3 evaluate the quality of a given model
graphical procedures (q-q, p-p plots, empirical cdf's)
score-based approaches (Kolmogorov-Smirnov tests, A-D tests,
chi-square goodness-of-fit tests, SBC)
4 determine which model fits best
5 (monitor results)


Complete vs incomplete data

complete, individual data


you observe the exact value of the loss
incomplete data
exact data may not be available
in loss/claims data, incompleteness arises in the following situations:
1 observations may be grouped: only the range of values in which the data fall is observed
2 presence of censoring and/or truncation, typically due to common insurance and reinsurance arrangements such as deductibles and limits


Left-truncation and right-censoring

left-truncated observation (e.g. excess / deductible)

an observation is left-truncated at c if it is NOT recorded when it is below c; when it is above c, it is recorded at its exact value.

right-censored observation (e.g. policy limit)

an observation is right-censored at d if, when it is above d, it is recorded as being equal to d; when it is below d, it is recorded at its observed value.

of course, observations can be both left-truncated and right-censored


Zero claims

Datasets often contain a significant proportion of zero claims, for a number of reasons:
Data is policy-based, not claims-based;
Claims not exceeding the deductible;
Mandatory reporting of accidents; etc.
This complicates the fitting (parametric distributions often don't have a flexible mass at 0, if at all)
Several possible solutions:
1 Adjust X by mixing a point mass at 0 with a parametric distribution
2 Adjust N by reducing the frequency of claims accordingly (hence ignoring zero claims)


The first way is mathematically consistent, but contradicts the


model assumption G (0) = 0.
The second way fits in the compound Poisson modeling
framework due to the joint decomposition theorem, and is
often easier, but we may lose some important information by
dropping the zero claims.

Data analysis and descriptive statistics

Summarising the data

For any type of data analysis, the first step is to summarise the data.
summary statistics: mean, median, standard deviation, coefficient of variation, skewness, quantiles, min, max, etc.
these give a preliminary understanding of the data
Do some graphs:
histogram (possibly of a transformation, e.g. log)
density plots
q-q plot (normal, for the transformed data)

Selected parametric claims size distributions

Some additional useful properties

Loss size index function:

$$I(G(y)) = \frac{\int_0^y z \, dG(z)}{\int_0^\infty z \, dG(z)}$$

Empirical loss size index function:

$$I_n(\alpha) = \frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} Y_{(i)}}{\sum_{i=1}^{n} Y_i}, \qquad \alpha \in [0, 1].$$


Loss size index function, I(G (y )), evaluates the relative
contribution of [0, y ] to the overall mean. (see also Pareto
principle)
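A minimal sketch of the empirical loss size index in Python (function and variable names are illustrative):

```python
def empirical_loss_index(y, alpha):
    """Empirical loss size index I_n(alpha): share of the total loss
    accounted for by the floor(n * alpha) smallest observations."""
    y_sorted = sorted(y)
    n = len(y_sorted)
    k = int(n * alpha)  # floor(n * alpha) for alpha in [0, 1]
    return sum(y_sorted[:k]) / sum(y_sorted)

# The two smallest of the losses 1, 2, 3, 4 carry 30% of the total.
print(empirical_loss_index([4, 1, 3, 2], 0.5))  # -> 0.3
```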

The mean excess function

$$e(u) = E[Y_i - u \mid Y_i > u]$$

The empirical mean excess function:

$$\hat{e}_n(u) = \frac{\sum_{i=1}^{n} (Y_i - u) \mathbf{1}_{\{Y_i > u\}}}{\sum_{i=1}^{n} \mathbf{1}_{\{Y_i > u\}}}$$

This is useful for the analysis of large claims, and for the analysis of reinsurance.
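The empirical version is a one-liner; a sketch in Python (names are illustrative):

```python
def empirical_mean_excess(y, u):
    """Empirical mean excess e_n(u): average exceedance over u among
    observations strictly greater than u."""
    exceedances = [yi - u for yi in y if yi > u]
    return sum(exceedances) / len(exceedances)

print(empirical_mean_excess([1, 2, 3, 4], 2))  # -> 1.5
```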


Topics outside the scope of this course

Extreme value theory:


log-log plot
regular variation at infinity and tail index
Hill plot
Enrol in ACTL5301 if you want to learn about those!

Parametric models for Y

Gamma Distribution

We shall write Y ∼ Gamma(α, β) if the density has the form

$$g(y) = \frac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y}, \quad \text{for } y > 0; \ \alpha, \beta > 0.$$

Mean: E(Y) = α/β
Variance: Var(Y) = α/β²
Skewness: ς_Y = 2/√α [positively skewed distribution]
Mgf: M_Y(t) = (β/(β − t))^α, provided t < β.
Higher moments: E[Y^k] = Γ(α + k)/(Γ(α) β^k)
Special case: when α = 1, we have Y ∼ Exp(β)
For any constant ρ > 0, ρY ∼ Gamma(α, β/ρ), since M_{ρY}(t) = M_Y(ρt).
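A quick consistency check of these moment formulas in Python (the parameter values are arbitrary):

```python
import math

def gamma_moment(alpha, beta, k):
    """E[Y^k] = Gamma(alpha + k) / (Gamma(alpha) * beta**k) for Y ~ Gamma(alpha, beta)."""
    return math.gamma(alpha + k) / (math.gamma(alpha) * beta ** k)

alpha, beta = 2.0, 3.0
mean = gamma_moment(alpha, beta, 1)          # should equal alpha / beta
var = gamma_moment(alpha, beta, 2) - mean**2  # should equal alpha / beta**2
print(round(mean, 4), round(var, 4))  # -> 0.6667 0.2222
```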

Inverse Gaussian Distribution (useful, but complicated)

We shall write Y ∼ IG(α, β) if the density has the form

$$g(y) = \frac{\alpha y^{-3/2}}{\sqrt{2\pi\beta}} \exp\left[-\frac{(\alpha - \beta y)^2}{2\beta y}\right], \quad \text{for } y > 0; \ \alpha, \beta > 0.$$

Mean: E(Y) = α/β
Variance: Var(Y) = α/β²
Skewness: ς_Y = 3/√α [positively skewed distribution]
Mgf: M_Y(t) = exp{α(1 − √(1 − 2t/β))}, provided t < β/2.
The term "Inverse Gaussian" comes from the fact that there is an inverse relationship between its cgf and that of the Gaussian distribution, but NOT from the fact that the inverse is Gaussian!

Weibull

We shall write Y ∼ Weibull(τ, c) if the density has the form

$$g(y) = c\tau (cy)^{\tau - 1} \exp\{-(cy)^\tau\}, \quad \text{for } y \geq 0; \ \tau, c > 0.$$

Note G(y) = 1 − exp{−(cy)^τ}.
Mean: E(Y) = Γ(1 + 1/τ)/c
Variance: Var(Y) = Γ(1 + 2/τ)/c² − μ²_Y
Skewness: ς_Y = [Γ(1 + 3/τ)/c³ − 3μ_Y σ²_Y − μ³_Y] / σ³_Y
Mgf: does not exist for τ < 1 and t > 0.
Higher moments: E[Y^k] = Γ(1 + k/τ)/c^k
For any ρ > 0, ρY ∼ Weibull(τ, c/ρ)
Note: if Z ∼ Exp(1) then Z^{1/τ}/c ∼ Weibull(τ, c).

Log-normal

We shall write Y ∼ LN(μ, σ²), meaning Y = exp(normal), and we have

log Y ∼ N(μ, σ²).

Mean: E(Y) = exp{μ + σ²/2}
Variance: Var(Y) = exp{2μ + σ²}(exp{σ²} − 1)
Skewness: ς_Y = (exp{σ²} + 2)(exp{σ²} − 1)^{1/2}
Mgf: does not exist for t > 0 (the log-normal has a heavy tail).
For any ρ > 0, ρY ∼ LN(μ + log ρ, σ²).



Log-gamma

We shall write log Y ∼ Gamma(γ, c) and we have

$$g(y) = \frac{c^\gamma (\log y)^{\gamma - 1} y^{-(c+1)}}{\Gamma(\gamma)}, \quad \text{for } y > 1; \ \gamma, c > 0.$$

Mean: E(Y) = (c/(c − 1))^γ for c > 1
Variance: Var(Y) = (c/(c − 2))^γ − μ²_Y for c > 2
Skewness: ς_Y = [(c/(c − 3))^γ − 3μ_Y σ²_Y − μ³_Y] / σ³_Y for c > 3
Mgf: does not exist for t > 0.
Higher moments: E[Y^k] = (c/(c − k))^γ for c > k

Pareto Distribution

We shall write Y ∼ Pareto(θ, α) if the density has the form

$$g(y) = \frac{\alpha}{\theta} \left(\frac{y}{\theta}\right)^{-(\alpha + 1)}, \quad \text{for } y \geq \theta; \ \theta, \alpha > 0.$$

Mean: E(Y) = θ α/(α − 1), α > 1
Variance: Var(Y) = θ² α / ((α − 1)²(α − 2)), α > 2
Skewness: ς_Y = (2(1 + α)/(α − 3)) ((α − 2)/α)^{1/2}, α > 3
Mgf: does not exist for t > 0
It is a particularly useful candidate for modelling large losses.
Translated Pareto: the distribution of Y − β [see Yellow Book]

Model selection
Graphical approaches

After fitting a model and estimating its parameters by MLE, judge the quality of the model with graphical comparisons of empirical summaries (histogram, empirical cdf, sample quantiles) against their fitted parametric counterparts:

histogram vs. fitted parametric density function (good for the general shape)
empirical CDF vs. fitted parametric CDF
probability-probability (P-P) plot: theoretical vs. empirical cumulative probabilities
quantile-quantile (Q-Q) plot: theoretical vs. sample quantiles (better for checking the tail)

Let the fitted (theoretical) parametric distribution be denoted by G(x; θ̂).


P-P plot

To construct the P-P plot:

order the observed data from smallest to largest: x_(1), x_(2), ..., x_(n).
calculate the theoretical CDF at each of the observed data points: G(x_(i); θ̂).
for i = 1, 2, ..., n, plot the empirical probabilities (i − 0.5)/n (a continuity correction) against the theoretical probabilities G(x_(i); θ̂).


Q-Q plot

To construct the Q-Q plot:

order the observed data from smallest to largest: x_(1), x_(2), ..., x_(n).
for i = 1, 2, ..., n, calculate the theoretical quantiles: G^{-1}((i − 0.5)/n; θ̂).
for i = 1, 2, ..., n, plot the sample values x_(i) against the theoretical quantiles G^{-1}((i − 0.5)/n; θ̂).

The Q-Q plot is better for assessing the tails; the P-P plot is better for assessing the body of the distribution.
These constructions hold only for the case where you have no censoring/truncation.
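The two constructions can be sketched in Python; here an exponential model stands in for the fitted G(·; θ̂), and the rate `beta` and the sample are purely illustrative:

```python
import math

def pp_qq_points(data, cdf, quantile):
    """Return P-P pairs (empirical vs theoretical probabilities) and Q-Q pairs
    (theoretical vs sample quantiles), using the (i - 0.5)/n correction."""
    x = sorted(data)
    n = len(x)
    pp = [((i - 0.5) / n, cdf(x[i - 1])) for i in range(1, n + 1)]
    qq = [(quantile((i - 0.5) / n), x[i - 1]) for i in range(1, n + 1)]
    return pp, qq

beta = 0.001  # illustrative exponential rate
cdf = lambda y: 1.0 - math.exp(-beta * y)
quantile = lambda p: -math.log(1.0 - p) / beta
pp, qq = pp_qq_points([500, 1500, 3000], cdf, quantile)
```

With three points the empirical probabilities are 1/6, 3/6 and 5/6; a well-fitting model produces P-P and Q-Q points close to the 45-degree line.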


Hypothesis tests

Test the null H₀: the data came from a population with the specified model, against Hₐ: the data did not come from such a population.

Some commonly used tests:

Kolmogorov-Smirnov: $K.S. = \sup_y \left| \hat{G}(y) - G(y; \hat{\theta}) \right|$

Anderson-Darling: $A.D. = n \int \frac{[\hat{G}(y) - G(y; \hat{\theta})]^2}{G(y; \hat{\theta})[1 - G(y; \hat{\theta})]} \, g(y; \hat{\theta}) \, dy$

χ² goodness-of-fit: $\chi^2 = \sum_j \frac{(\text{observed}_j - \text{expected}_j)^2}{\text{expected}_j}$

Kolmogorov-Smirnov test

Kolmogorov-Smirnov test: $K.S. = \sup_x \left| \hat{G}(x) - G(x; \hat{\theta}) \right|$, where
Ĝ(y) is the empirical distribution
G(y; θ) is the assumed theoretical distribution in the null hypothesis
G(y; θ) is assumed to be (must be) continuous
θ̂ is the maximum likelihood estimate for θ under the null hypothesis.
The null is rejected when the maximum gap exceeds a critical value; there are tabulated tables for the critical values. Several variations of these tables in the literature use somewhat different scalings for the K-S test statistic and critical regions.
The statistic is distance-based and easy to calculate, but it depends only on the single largest gap between the two distribution functions, ignoring all the others, which can be misleading.
The test does not work for grouped data
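For a continuous model CDF, the supremum is attained at a data point, just before or after a jump of the empirical CDF; a sketch in Python (the Uniform(0,1) null below is illustrative):

```python
def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic sup_x |G_hat(x) - G(x; theta_hat)|,
    checking the empirical CDF just before and after each jump."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        g = cdf(xi)
        d = max(d, abs(i / n - g), abs(g - (i - 1) / n))
    return d

# Uniform(0,1) null on three points: the largest gap is 7/30.
print(ks_statistic([0.1, 0.5, 0.9], lambda x: x))  # -> 0.2333...
```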

Anderson-Darling test

Anderson-Darling test: $A.D. = n \int \frac{[\hat{G}(x) - G(x; \hat{\theta})]^2}{G(x; \hat{\theta})[1 - G(x; \hat{\theta})]} \, g(x; \hat{\theta}) \, dx$, where n is the sample size.

The statistic is a weighted average of the squared differences, taken across the whole distribution (at the cost of evaluating an integral).
The critical values for the Anderson-Darling test depend on the specific distribution being tested; there are tabulated values and formulas for a few specific distributions.
The theoretical distribution is assumed to be (must be) continuous.
The test does not work for grouped data.



χ² goodness-of-fit test

Break the whole range into k subintervals: c₀ < c₁ < · · · < c_k = ∞.

χ² goodness-of-fit test: $\chi^2 = \sum_{j=1}^{k} \frac{(E_j - O_j)^2}{E_j}$

Let p̂_j = G(c_j; θ̂) − G(c_{j−1}; θ̂). Then the expected number of observations in the interval (c_{j−1}, c_j], assuming the hypothesized model is true, is E_j = n p̂_j (here, n is the sample size).
Let p_j = Ĝ(c_j) − Ĝ(c_{j−1}). The observed number of observations in the interval (c_{j−1}, c_j] is O_j = n p_j.
On choosing the number of bins: common choices are an equal number of observations per bin, or equal-width bins; each bin should contain at least about 5 observations.
The statistic has a chi-square distribution with degrees of freedom equal to: k − 1 − number of parameters estimated.
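Once the bins are chosen the statistic itself is straightforward; a minimal sketch (the bin counts below are made up):

```python
def chi_square_stat(observed, expected):
    """Chi-square goodness-of-fit statistic: sum over bins of (E_j - O_j)^2 / E_j."""
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))

# n = 100 with two equiprobable bins: E = [50, 50], O = [30, 70].
print(chi_square_stat([30, 70], [50, 50]))  # -> 16.0
```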



How these tests are used

Besides testing whether the data came from the specified model or not, generally we would prefer models with:
lowest K-S test statistic
lowest A-D test statistic
lowest χ² goodness-of-fit test statistic (or equivalently highest p-value)
highest value of the likelihood function at the maximum
Perform a formal statistical test, or use these criteria as a 'horse race' across the candidate distributions.
Comparison of these tests

K-S and A-D tests are quite similar: both look at the difference between the empirical and model distribution functions, K-S in absolute value, A-D in squared difference.
But A-D is a weighted average, with more emphasis on good fit in the tails than in the middle; K-S puts no such emphasis.
For K-S and A-D tests, no adjustment is made for the number of parameters. Result: more complex models often fare better on these tests, so they are not well suited to comparing models with different numbers of parameters.
Nor is any adjustment made for sample size. Result: a large sample size increases the probability of rejecting all models.
The χ² test adjusts the degrees of freedom for increases in the number of parameters.

Information criteria

While K-S, A-D and χ² compare candidate fits, information criteria also help choose the number of parameters. Within an MLE framework:

Akaike Information Criterion (AIC)

$$AIC(i) = -2\ell_Y^{(i)} + 2 d^{(i)},$$

where d^{(i)} denotes the number of estimated parameters in g_i

Bayesian Information Criterion (BIC)

$$BIC(i) = -2\ell_Y^{(i)} + \log(n) \, d^{(i)}.$$

This is −2·SBC, where SBC is the Schwarz Bayesian Criterion (later in the slides)
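A minimal sketch of both criteria (the log-likelihood value and sample size are illustrative); lower values are preferred:

```python
import math

def aic(loglik, d):
    """AIC = -2 * loglik + 2 * d, with d the number of estimated parameters."""
    return -2.0 * loglik + 2.0 * d

def bic(loglik, d, n):
    """BIC = -2 * loglik + log(n) * d, with n the sample size."""
    return -2.0 * loglik + math.log(n) * d

print(aic(-100.0, 2), round(bic(-100.0, 2, 50), 3))  # -> 204.0 207.824
```

Note BIC penalises extra parameters more heavily than AIC as soon as n > e² ≈ 7.4.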

Calculating within layers for claim sizes

Deductible and Policy Limit

One way to control the cost (and variability) of individual claim losses is to introduce deductibles and policy limits.
Deductible d: the insurer starts paying claim amounts above the deductible d
Limit M: the insurer pays up to the limit M.
If we denote the damage random variable by D, then if a claim occurs the insurer is liable for

$$Y = \min\left[\max(D - d, 0), M\right].$$
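The layer transformation is a one-liner; a sketch (the deductible, limit, and damage values are illustrative):

```python
def insurer_payment(damage, deductible, limit):
    """Insurer's liability Y = min(max(D - d, 0), M) under deductible d and limit M."""
    return min(max(damage - deductible, 0.0), limit)

print(insurer_payment(50.0, 100.0, 500.0))    # -> 0.0   (below the deductible)
print(insurer_payment(300.0, 100.0, 500.0))   # -> 200.0
print(insurer_payment(1000.0, 100.0, 500.0))  # -> 500.0 (capped at the limit)
```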

Reinsurance

risk transfer from an insurer (the direct writer) to a reinsurer: a swap of something deterministic (the premium) against something random (the loss)
the risk that the insurer keeps is called the retention
motivations include reducing volatility, reducing overall exposure, and reducing required capital (a lower capital requirement means a higher return on equity)
There are different types of reinsurance:
proportional
quota share: the proportion is the same for all risks
surplus: the proportion can vary from risk to risk
nonproportional
(individual) excess of loss: on each individual loss X_i
stop loss: on the aggregate loss S (e.g. the total claims of an earthquake event)
cheap (reinsurance premium is the expected value), or non-cheap (reinsurance premium is loaded)
CAT bonds and the like... (ART)

Proportional reinsurance

The retained proportion α defines who pays what:


the insurer pays Y = αX
the reinsurer pays Z = (1 − α)X
This is nothing else but a change of scale and we have

$$\mu_Y = \alpha \mu_X, \quad \sigma_Y^2 = \alpha^2 \sigma_X^2, \quad \gamma_Y = \gamma_X.$$

In some cases it suffices to adapt the scale parameter. Example: if X is exponential with parameter β,

$$\Pr[Y \leq y] = \Pr[\alpha X \leq y] = \Pr[X \leq y/\alpha] = 1 - e^{-\beta y/\alpha},$$

and thus Y is exponential with parameter β/α.

Non-proportional reinsurance

Basic arrangements:
the reinsurer pays the excess over a retention (excess point) d
the insurer pays Y = min(X , d)
the reinsurer pays Z = (X − d)+
the reinsurer may limit its payments to an amount M (possibly with the layer above d + M passed to a second reinsurer). In that case
the insurer pays Y = min(X, d) + (X − d − M)₊
the reinsurer pays Z = min{(X − d)₊, M}

Example

Consider a life insurance company with 16,000 1-year term life


insurance policies. The associated insured amounts are:
Benefit (10,000's) | # policies
1 | 8,000
2 | 3,500
3 | 2,500
5 | 1,500
10 | 500
The probability of death (q) for each of the 16,000 lives is 0.02.
This company has an EoL reinsurance contract with retention limit
30,000 at a cost of 0.025 per dollar of coverage.
What is the approximate probability (using CLT) that the total
cost will exceed 8,250,000?

The portfolio of retained business is given by

k | retained benefit b_k (10,000's) | # policies n_k
1 | 1 | 8,000
2 | 2 | 3,500
3 | 3 | 4,500

Now

$$E[S] = \sum_{k=1}^{3} n_k E[X_k] = \sum_{k=1}^{3} n_k b_k q_k = 8000 \cdot 1 \cdot 0.02 + 3500 \cdot 2 \cdot 0.02 + 4500 \cdot 3 \cdot 0.02 = 570,$$

and

$$Var[S] = \sum_{k=1}^{3} n_k Var[X_k] = \sum_{k=1}^{3} n_k b_k^2 q_k (1 - q_k) = 8000 \cdot 1^2 \cdot 0.02 \cdot 0.98 + 3500 \cdot 2^2 \cdot 0.02 \cdot 0.98 + 4500 \cdot 3^2 \cdot 0.02 \cdot 0.98 = 1225.$$

The reinsurance cost is

[(5 − 3) · 1500 + (10 − 3) · 500] · 0.025 = 162.5.

Thus, the desired probability becomes

$$\Pr[S + 162.5 > 825] = \Pr\left[\frac{S - E[S]}{\sqrt{Var(S)}} > \frac{662.5 - E[S]}{\sqrt{Var(S)}}\right] \approx \Pr\left[Z > \frac{662.5 - 570}{\sqrt{1225}}\right] = \Pr[Z > 2.643] = 0.0041.$$

Without reinsurance, the mean/variance were 700/2587.20, so the probability ≈ Pr[Z > 2.458], which is higher even though it is not cheap reinsurance.
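The computation can be reproduced numerically; `0.5 * erfc(z / sqrt(2))` gives the standard normal tail probability Pr[Z > z]:

```python
import math

# Retained portfolio: benefit b_k (in 10,000s), number of policies n_k, q = 0.02.
portfolio = [(1, 8000), (2, 3500), (3, 4500)]
q = 0.02

ES = sum(n * b * q for b, n in portfolio)
VarS = sum(n * b ** 2 * q * (1 - q) for b, n in portfolio)

cost = ((5 - 3) * 1500 + (10 - 3) * 500) * 0.025  # reinsurance cost = 162.5
z = (825 - cost - ES) / math.sqrt(VarS)           # standardised threshold
prob = 0.5 * math.erfc(z / math.sqrt(2))          # Pr[Z > z], standard normal tail
print(round(ES, 4), round(VarS, 4), round(z, 3), round(prob, 4))  # -> 570.0 1225.0 2.643 0.0041
```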

A useful identity

Note that
min(X, c) = X − (X − c)₊
and thus
E[min(X, c)] = E[X] − E[(X − c)₊].
The amount E[(X − c)₊] = Pr[X > c] e(c)
is commonly called the "stop loss premium" with retention c;
it is identical to the expected payoff of a call option with strike price c, and thus results from financial mathematics can sometimes be directly used (and vice versa).


Stop loss premiums

Let
E [(X − d)+ ] = Pd . d = retension ratio

Then we have (for positive rv’s)


 R∞
Pd = P d [1 − FX (x)] dx if X is continuous

d [1 − FX (x)] if X is discrete = pdf


Example

Calculate Pd if X is Exponential with parameter β.
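A sketch checking the closed form P_d = e^{−βd}/β, which follows from the integral formula on the previous slide, against a direct numerical integration:

```python
import math

def stop_loss_exponential(beta, d):
    """Closed form: P_d = integral_d^inf e^{-beta*x} dx = e^{-beta*d} / beta."""
    return math.exp(-beta * d) / beta

def stop_loss_numeric(beta, d, steps=200000):
    """Trapezoidal approximation of integral_d^upper [1 - F_X(x)] dx,
    truncating the integral where the tail is negligible."""
    upper = d + 40.0 / beta
    h = (upper - d) / steps
    survival = lambda x: math.exp(-beta * x)  # 1 - F_X(x) for the exponential
    total = 0.5 * (survival(d) + survival(upper))
    total += sum(survival(d + i * h) for i in range(1, steps))
    return total * h

print(round(stop_loss_exponential(2.0, 1.0), 5))  # -> 0.06767 (= e^{-2}/2)
```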


Stop loss premiums - recursive formulas in the discrete case

First moment:
if d is an integer:

$$P_{d+1} = P_d - [1 - F_X(d)], \quad \text{with } P_0 = E[X]$$

if d is not an integer:

$$P_d = P_{\lfloor d \rfloor} - (d - \lfloor d \rfloor)[1 - F_X(\lfloor d \rfloor)].$$

Second moment $P_d^2 = E[(X - d)_+^2]$:

$$P_{d+1}^2 = P_d^2 - 2P_d + [1 - F_X(d)], \quad \text{with } P_0^2 = E[X^2].$$


Numerical example

For the distribution F₁₊₂₊₃ derived earlier in the lecture we have E[S] = 4 = 128/32 and E[S²] = 19.5 = 624/32, and thus

d | f₁₊₂₊₃(d) | F₁₊₂₊₃(d) | P_d | P²_d | Var((X − d)₊)
0 | 1/32 | 1/32 | 128/32 | 624/32 | 3.500
1 | 2/32 | 3/32 | 97/32 | 399/32 | 3.280
2 | 4/32 | 7/32 | 68/32 | 234/32 | 2.797
3 | 6/32 | 13/32 | 43/32 | 123/32 | 2.038
4 | 6/32 | 19/32 | 24/32 | 56/32 | 1.188
5 | 6/32 | 25/32 | 11/32 | 21/32 | 0.538
6 | 4/32 | 29/32 | 4/32 | 6/32 | 0.172
7 | 2/32 | 31/32 | 1/32 | 1/32 | 0.030
8 | 1/32 | 32/32 | 0 | 0 | 0.000

P₂.₆ = P₂ − 0.6 · (1 − F₁₊₂₊₃(2)) = 53/32.
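The recursive formulas can be checked with exact fractions; a Python sketch for the distribution with pmf weights 1, 2, 4, 6, 6, 6, 4, 2, 1 (out of 32) on the values 0 to 8:

```python
import math
from fractions import Fraction as F

# pmf on values 0..8 with weights out of 32
weights = [1, 2, 4, 6, 6, 6, 4, 2, 1]
pmf = {d: F(w, 32) for d, w in enumerate(weights)}
cdf, acc = {}, F(0)
for d in range(9):
    acc += pmf[d]
    cdf[d] = acc

# Recursions: P_{d+1} = P_d - [1 - F(d)]; P2_{d+1} = P2_d - 2 P_d + [1 - F(d)].
P = {0: sum(d * p for d, p in pmf.items())}        # P_0 = E[X]
P2 = {0: sum(d * d * p for d, p in pmf.items())}   # P2_0 = E[X^2]
for d in range(8):
    P[d + 1] = P[d] - (1 - cdf[d])
    P2[d + 1] = P2[d] - 2 * P[d] + (1 - cdf[d])

def stop_loss(d):
    """Non-integer retention: P_d = P_floor(d) - (d - floor(d)) * [1 - F(floor(d))]."""
    fl = math.floor(d)
    return P[fl] - (d - fl) * (1 - cdf[fl])

print(P[1], P2[1], stop_loss(F('2.6')))  # -> 97/32 399/32 53/32
```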


Leverage effect of claims inflation

Choose a fixed deductible d > 0 and assume that the claim at time 0 is given by Y₀. Assume that there is a deterministic inflation index i > 0 such that the claim at time 1 can be represented by Y₁ = (1 + i)Y₀. We have

$$E[(Y_1 - d)_+] \geq (1 + i) E[(Y_0 - d)_+],$$

i.e. the expected excess claim grows by more than the inflation rate.
When tax brackets are not adapted, the analogous effect leads to 'cold progression' of taxes...
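A tiny numerical illustration of the inequality (all values are made up):

```python
# Illustrative discrete claim: Y0 = 50 or 150 with equal probability,
# deductible d = 100, inflation i = 10%.
claims = [(50.0, 0.5), (150.0, 0.5)]
d, i = 100.0, 0.10

def expected_excess(scale):
    """E[(scale * Y0 - d)+] for the discrete claim above."""
    return sum(p * max(scale * y - d, 0.0) for y, p in claims)

e0 = expected_excess(1.0)      # E[(Y0 - d)+]
e1 = expected_excess(1.0 + i)  # E[(Y1 - d)+]
print(round(e0, 6), round(e1, 6))  # -> 25.0 32.5, and 32.5 >= (1 + i) * 25 = 27.5
```

The claim of 50 moves closer to the deductible under inflation without generating an excess, while the excess of the large claim grows by more than 10%; this is the leverage effect.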

Case studies

Illustrative datasets

We analyse two different loss datasets:


Valdez-DataSetA - this dataset, with 1,000 observations,
consists of insurance claim amounts observed over a fixed
period of time. An example of data consisting of complete/full
information.
Klugman-DataSetB-modified - this dataset was analysed by Klugman and Rioux (2006) and consists of truncated/censored liability claims data. The data has 100 observations, each consisting of the excess/deductible, the observed claim amount, and an indicator of whether the policy limit has been reached.

Data set A

Summary statistics - Valdez-DataSetA

Count | 1,000
Mean | 1,244.32
Standard deviation | 650.32
Variance | 422,956.79
Minimum | 120
25th percentile | 783.75
Median | 1,114.50
75th percentile | 1,586.50
Maximum | 5,799
Skewness | 1.71
Kurtosis | 5.88


Figure 1: preliminary plots - histograms of claimⱼ and log(claimⱼ) with kernel-smoothed density overlays, and normal Q-Q plots of the raw and log-transformed claims. [Figure omitted; the log transformation gives a markedly better fit to normality.]

Revision: The (expected) Fisher information

The score (or gradient) vector consists of the first derivatives

$$S(\theta; x) = \left(\frac{\partial \ell(\theta; x)}{\partial \theta_1}, \ldots, \frac{\partial \ell(\theta; x)}{\partial \theta_m}\right)'$$

so that the MLE satisfies the first-order condition $S(\hat{\theta}; x) = 0 = (0, \ldots, 0)'$.

The m × m Hessian matrix for ℓ(θ; x) is defined by

$$H(\theta; x) = \frac{\partial^2 \ell(\theta; x)}{\partial \theta \, \partial \theta'} = \begin{pmatrix} \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_1^2} & \cdots & \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_1 \partial \theta_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_m \partial \theta_1} & \cdots & \dfrac{\partial^2 \ell(\theta; x)}{\partial \theta_m^2} \end{pmatrix}$$

- continued

This Hessian is used to estimate Var(θ̂).
Minus the expected value of this is called the (expected) Fisher information.
It is well-known that a consistent estimator for the covariance matrix Var(θ̂) is given by the inverse of the negative of this Hessian matrix:

$$Var(\hat{\theta}) \geq \widehat{Var}(\hat{\theta}) = \left[-E[H(\hat{\theta}; x)]\right]^{-1}.$$

The square roots of the diagonal elements of this covariance estimate give the standard errors of the MLE estimates.

Fitting the log-normal distribution - Valdez-DataSetA

Use method of moments for initial estimates, or sometimes


exploit properties of the distribution being fitted.
Parameter estimates with standard errors:

Parameter | Estimate (MLE) | Standard Error
μ | 7.0031 | 0.0158
σ | 0.5010 | 0.0112
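For complete data, the lognormal MLEs are simply the sample mean and (divide-by-n) standard deviation of the log-claims; a sketch with toy data (the three claim values are illustrative, not from the dataset):

```python
import math

def fit_lognormal_mle(y):
    """MLE for LN(mu, sigma^2) with complete data: mu_hat is the mean of the
    log-claims, sigma_hat^2 the biased (divide-by-n) variance of the logs."""
    logs = [math.log(v) for v in y]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    return mu, sigma

mu, sigma = fit_lognormal_mle([math.exp(1), math.exp(1), math.exp(3)])
print(round(mu, 4), round(sigma, 4))  # -> 1.6667 0.9428
```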


Figure 2: plots to assess quality - Valdez-DataSetA: histogram of claimⱼ with the fitted lognormal density, empirical CDF with the fitted CDF, Q-Q plot, and P-P plot. [Figure omitted.]

Comparing the various fitted models - Valdez-DataSetA

Two other distributions are fitted to the data: the Gamma and the Burr XII models.
The Pareto model was also fitted, but this did not produce reasonable estimation results, so it was taken out of the comparison.
Below is a summary of the various test statistics resulting from the three distribution models considered (r is the number of estimated parameters):

Model | MLE's | Loglikelihood | r | K-S | A-D | χ² (p-value) | SBC
Lognormal | μ̂ = 7.0031, σ̂ = 0.5010 | -7730.832 | 2 | 0.0224 | 1.0001 | 0.0806 | -7737.740
Gamma | α̂ = 4.2167, β̂ = 0.0034 | -7742.027 | 2 | 0.0393 | 1.0015 | <0.0001 | -7748.935
Burr XII | α̂ = 1.3879, γ̂ = 3.1533, θ̂ = 1284.7754 | -7737.277 | 3 | 0.0382 | 0.9996 | 0.0027 | -7747.638

Since K-S and A-D make no adjustment for the number of parameters, they are most directly comparable between the Lognormal and Gamma fits (both with r = 2).

Figure 3: fitting the gamma model - Valdez-DataSetA: histogram, empirical CDF, Q-Q plot, and P-P plot against the fitted gamma distribution. [Figure omitted.]

Figure 4: fitting the Burr XII model - Valdez-DataSetA: histogram, empirical CDF, Q-Q plot, and P-P plot against the fitted Burr XII distribution. [Figure omitted.]
Data set B

Summ. stats - Klugman-DataSetB-modified

statistic | not censored | censored
Count | 75 | 25
Mean | 1,276.47 | 2,084.00
Standard deviation | 823.67 | 1,174.09
Variance | 678,427.20 | 1,378,483.33
Minimum | 182.00 | 1,100.00
25th percentile | 739.50 | 1,100.00
Median | 1,131.00 | 1,500.00
75th percentile | 1,560.00 | 3,100.00
Maximum | 4,510.00 | 5,500.00
Skewness | 1.76 | 1.08
Kurtosis | 3.91 | 0.38


Representing the observed data

For our purposes, we shall represent our set of observations as

(t_j, x_j, δ_j)

where

t_j is the left truncation point;
x_j is the claim value that produced the data point; and
δ_j is an indicator of whether the limit has been reached.

For example:
(50, 250, 0): truncation point 50, recorded value 250, limit not reached
(100, 1100, 1): truncation point 100, recorded value 1100, with the policy limit reached

Maximum likelihood contributions

Because of the different form of the data, the likelihood is not in the usual format:

$$L(\theta; x) = \prod_j \left(\frac{f(x_j; \theta)}{1 - F(t_j; \theta)}\right)^{1 - \delta_j} \cdot \prod_j \left(\frac{1 - F(x_j; \theta)}{1 - F(t_j; \theta)}\right)^{\delta_j}.$$

The contribution to the likelihood function for a data point where the limit has not been reached is

$$\frac{f(x_j)}{1 - F(t_j)}.$$

The contribution to the likelihood function for a data point where the limit has been reached is

$$\frac{1 - F(x_j)}{1 - F(t_j)}.$$

Note here that the policy limit, if reached, would be equal to x_j − t_j.
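This likelihood is easy to evaluate numerically; a sketch using an exponential model F(y) = 1 − e^{−βy} purely for illustration (the data triples below are made up, not from the dataset):

```python
import math

def loglik_exponential(data, beta):
    """Log-likelihood for left-truncated, right-censored data under an
    exponential model. Each point is (t, x, delta): t = truncation point,
    x = recorded value, delta = 1 if the policy limit was reached."""
    ll = 0.0
    for t, x, delta in data:
        log_surv_t = -beta * t  # log(1 - F(t))
        if delta == 1:
            # limit reached: log[(1 - F(x)) / (1 - F(t))]
            ll += -beta * x - log_surv_t
        else:
            # exact value observed: log[f(x) / (1 - F(t))]
            ll += math.log(beta) - beta * x - log_surv_t
    return ll

# beta = 1: contributions are -2, -2, and -1 respectively.
print(loglik_exponential([(0, 2, 0), (0, 2, 1), (1, 2, 0)], 1.0))  # -> -5.0
```

In practice one maximises this over β (or over the parameters of the lognormal/gamma models of the next slide) with a numerical optimiser.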

Parameter estimates

Here we consider only two distribution models: the Lognormal and


the Gamma distributions.
Model | Parameter | Estimate | Standard Error | Loglikelihood | SBC
Lognormal | μ | 7.1628 | 0.0994 | -626.2581 | -630.8633
Lognormal | σ | 0.8589 | 0.0898 | |
Gamma | α | 1.4398 | 0.3499 | -627.3484 | -631.9535
Gamma | β | 0.0009 | 0.0002 | |


Figure 5: fitting the lognormal model: histogram, Kaplan-Meier CDF, Q-Q plot, and P-P plot against the fitted lognormal distribution. [Figure omitted.]

Figure 6: fitting the gamma model: histogram, Kaplan-Meier CDF, Q-Q plot, and P-P plot against the fitted gamma distribution. [Figure omitted.]

Test results for various models

Table 4 of Klugman/Rioux (2006) paper:

Model | Loglikelihood | r | K-S | A-D | χ² | SBC
Exponential | -628.23 | 1 | 0.9399 | 1.2041 | 0.1571 | -630.49
Lognormal | -626.26 | 2 | 0.9048 | 0.5587 | 0.0685 | -630.79
Gamma | -627.35 | 2 | 0.9993 | 0.8189 | 0.2275 | -631.88
Lognormal/exp | -623.77 | 4 | 0.5631 | 0.2463 | 0.5576 | -632.83
Gamma/exp | -623.64 | 4 | 0.5577 | 0.2645 | 0.5470 | -632.71
Lognormal/exp/exp | -623.39 | 6 | 0.4345 | 0.1411 | 0.3035 | -636.98
Gamma/exp/exp | -623.26 | 6 | 0.4497 | 0.1275 | 0.3122 | -636.86


Comments on Klugman-DataSetB - test results for various models

A person who favours parsimony will pick either the exponential or the lognormal model (they are favoured by SBC).
The two mixtures with two exponential distributions may also be chosen because they clearly maximise the likelihood.
The lognormal mixture has the better K-S statistic, while the gamma mixture has the best A-D statistic; this indicates that the latter may do better in the tails.

Selecting a model - summary

Klugman and Rioux (2006) suggest the following procedure for selecting a model for (insurance) loss data:
Do a preliminary investigation of the data (e.g. summary statistics, histogram).
Construct the empirical distribution (use Kaplan-Meier for truncated/censored data).
Construct pictures such as q-q and p-p plots.
Conduct hypothesis tests: the Kolmogorov-Smirnov test, the Anderson-Darling test, and the χ² goodness-of-fit test.
Calculate the SBC criterion for each model being considered.
Some other considerations when choosing a model:
Keep it simple if at all possible.
Restrict the universe of potential models.
