To cite this article: Xiaoqiang Cai, Limin Wen, Xianyi Wu & Xian Zhou (2015)
Credibility Estimation of Distribution Functions with Applications to Experience
Rating in General Insurance, North American Actuarial Journal, 19:4, 311-335, DOI:
10.1080/10920277.2015.1057649
North American Actuarial Journal, 19(4), 311-335, 2015
Copyright © Society of Actuaries
This article presents a new credibility estimation of the probability distributions of risks in a Bayesian setting within a completely nonparametric framework. Unlike Ferguson's Bayesian nonparametric method, it does not require a mathematical form for the prior distribution (such as a Dirichlet process). We then show how the method applies to premium pricing in general insurance. This procedure is commonly known as experience rating: it uses the insured's claim experience to calculate a proper premium under a given premium principle (also referred to as a risk measure). Because the method estimates the probability distributions of losses, and not just their means and variances, it provides a unified nonparametric framework for experience rating under arbitrary premium principles. It combines the advantages of the well-known approaches of Bühlmann and Ferguson while overcoming their drawbacks. We first establish a linear Bayes method and prove its strong consistency in nonparametric settings. These settings require only the first two moments of the loss distributions, considered as a stochastic process. We then develop an empirical Bayes method for the more general situation in which a portfolio of risks is observed but nothing is known or assumed about their loss and prior distributions, including their moments; this method is shown to be asymptotically optimal. We also compare our estimates with traditional methods through theoretical analysis and numerical studies, which show that our approach produces premium estimates close to the optima.
1. INTRODUCTION
Pricing or measuring risks is one of the central risk-management tasks of financial enterprises and regulators. Numerous risk measures have been proposed for this purpose, including value-at-risk, conditional value-at-risk, coherent risk measures, and distortion risk measures, to name just a few; see, for example, Dhaene et al. (2006), Natarajan et al. (2009), Szego (2002), Wu and Zhou (2006), and the references therein. In general insurance, which compensates insured losses in property, health, business, employment, etc., risks are priced by premium calculation principles (premium principles for short). These are the counterparts of risk measures in insurance markets. Excellent reviews of premium principles can be found in Kaas et al. (2001, chapter 5), Sundt (1999), and Young (2004), among others. In this article, the terms risk measure and premium principle are used interchangeably for a rule $H(X)$ that assigns a premium to a risk $X$ as a functional of its distribution function.
In practice, choosing an appropriate premium principle is only the first step in risk pricing. The distributions involved are generally unknown, so $H(X)$ can be estimated only from the insured's claim experience together with suitable statistical methods. These procedures form what is known as experience rating. We study this problem in a Bayesian setting. The decumulative distribution function $\Pr(X > x)$ (abbreviated as ddf hereafter) of a risk $X$ is identified by an unknown and unobservable parameter (vector) $\theta$, written formally as $S(x,\theta) = \Pr(X > x \mid \theta)$, and $\theta$ is a random variable. The distribution of $\theta$, denoted by $\pi(\theta)$ (specified or unspecified), is called a prior distribution in statistics and a structural function in actuarial science. Most of the literature has focused on the case where $S(x,\theta)$ is fully specified given $\theta$ and $\pi(\theta)$ is known (or assumed). The problem can then be solved with standard parametric Bayesian methodology through posterior updating. Relevant work in this area includes Heilmann (1989), Klugman (1992), Makov et al. (1996), Pai (1997), Schmidt (1998), and Gomez et al. (2000, 2006). Applications of the parametric Bayesian methodology to general insurance have been investigated extensively. In reality, however, too little is generally known about the mechanism underlying contingent losses to specify $\pi(\theta)$, which makes these applications impractical. One way to handle this is to allow some unknown parameters in $\pi(\theta)$ while keeping an assumed mathematical form for $\pi(\theta)$. Risk pricing can then be carried out by the empirical Bayes analysis introduced by Robbins (1955, 1964).

Address correspondence to Xianyi Wu, Department of Statistics and Actuarial Science, East China Normal University, 200241 Shanghai, China. E-mail: xywu@stat.ecnu.edu.cn
In many insurance applications, however, information is so scarce that not even a mathematical formula for $S(x,\theta)$ is available. In this case the analysis can be done only on a distribution-free, or nonparametric, basis. The best-known solution in the actuarial community so far is Bühlmann's approach to the net premium principle (known as credibility theory), whose optimality was demonstrated by Bühlmann (1967). Under the net premium principle, Bühlmann's credibility premium for a future claim $X_{n+1}$ is simply a weighted average of two means. The first is the empirical individual mean, which reflects the information in the risk's own historical data. The second is the collective mean, which reflects the aggregate information obtained from all possible insureds:
$$\hat S(x,\theta) = \frac{\alpha((x,\infty))}{\alpha(\mathbb{R}) + n} + \frac{n}{\alpha(\mathbb{R}) + n}\, S_n(x), \qquad (1.2)$$
where $S_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i > x)$ is the empirical ddf. Zehnwirth (1981) discusses the corresponding empirical Bayes versions, in which the prior Dirichlet process contains a few unknown hyperparameters. Experience rating can then be carried out by inserting the estimated ddf of $X$ into any premium principle.
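Formula (1.2) is a convex combination of Ferguson's prior guess $\alpha((x,\infty))/\alpha(\mathbb{R})$ and the empirical ddf $S_n(x)$, with weight $n/(\alpha(\mathbb{R})+n)$ on the data, so it is automatically a ddf. The following Python sketch evaluates it. The function names, the exponential prior guess, and the total mass $\alpha(\mathbb{R}) = 5$ are illustrative assumptions, not values from the article.

```python
import bisect
import math
from typing import Callable, Sequence


def empirical_ddf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical ddf S_n(x) = n^{-1} * #{i : X_i > x}."""
    xs = sorted(sample)
    n = len(xs)

    def s_n(x: float) -> float:
        # bisect_right counts the observations <= x; the remaining ones exceed x.
        return (n - bisect.bisect_right(xs, x)) / n

    return s_n


def dirichlet_posterior_ddf(
    sample: Sequence[float],
    total_mass: float,
    prior_guess: Callable[[float], float],
) -> Callable[[float], float]:
    """Equation (1.2) with alpha((x, inf)) = total_mass * prior_guess(x), i.e.
    alpha(R)/(alpha(R)+n) * prior_guess(x) + n/(alpha(R)+n) * S_n(x)."""
    n = len(sample)
    w = n / (total_mass + n)  # weight on the empirical ddf
    s_n = empirical_ddf(sample) if n > 0 else (lambda x: 0.0)
    return lambda x: (1.0 - w) * prior_guess(x) + w * s_n(x)


# Illustrative (assumed) inputs: exponential prior guess with mean 1, alpha(R) = 5.
prior_guess = lambda x: math.exp(-x) if x > 0 else 1.0
estimate = dirichlet_posterior_ddf([0.5, 1.2, 3.0], total_mass=5.0, prior_guess=prior_guess)
print(round(estimate(1.0), 6))  # 0.479925
```

With no data ($n = 0$) the estimate is the prior guess itself. As $n$ grows, the weight shifts toward $S_n(x)$.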
Bühlmann-type solutions have a limitation. As noted by Bühlmann (1970) and Gerber (1980), Bühlmann's method is not easy to transplant directly to other premium principles, so almost all contributions in this area have been limited to the net premium principle. The few exceptions are, chronologically:
- the variance premium principle (Bühlmann 1970, chapter 4);
- the Esscher premium principle and a Bühlmann-type credibility estimation of the variance premium principle (Gerber 1980, Goovaerts et al. 1990, and Pan et al. 2008);
- the general weighted loss function premium principle (due to Furman and Zitikis 2008), studied by Wen et al. (2009).

These exceptions show that the idea of Bühlmann (1967) cannot be applied directly to experience ratemaking under arbitrary premium principles. Ferguson-type solutions, on the other hand, provide a unified treatment of all premium principles, but they require a Dirichlet prior. This is an apparent drawback: like any other prior in Bayesian methodology, the Dirichlet prior is proposed mainly for mathematical convenience and is hard to justify in practice. Without the Dirichlet assumption, the estimator in (1.2) loses its justification. The risk premium $H(X_{n+1})$ obtained by Ferguson's approach would then not be credible when the true prior is far from a Dirichlet process.
In this article we introduce a new distribution-free approach to experience rating under arbitrary premium principles that does not require a precise specification of the prior distribution. It has two direct advantages:
(1) it is truly distribution-free, as in Bühlmann's credibility theory;
(2) it supports experience rating under arbitrary premium principles, as Ferguson's Bayesian nonparametric method does.

The main idea is first to derive an estimate $\hat S(x,\theta)$ of $S(x,\theta)$ by minimizing the $L_2$-distance. The estimated ddf is then embedded into the premium principle functional to obtain an estimate of the corresponding premium, $\hat H(X \mid \theta) = H(\hat S(\cdot,\theta))$. This differs from Jewell (1974), who estimated $S(x,\theta)$ separately at every fixed point $x$. His estimators are therefore not necessarily ddfs and may not yield empirical premiums when plugged into the mathematical formulae of premium principles. Our estimates, by contrast, work smoothly for empirical ratemaking because $S(x,\theta)$ is estimated by a ddf.
Specifically, we consider the following two models:
(1) Model I: The first two moments of $S(x,\theta)$ with respect to the prior can be specified. In this case the data $X_1, X_2, \ldots, X_n, \ldots$ are (conditionally) i.i.d. copies of $X$ given $\theta$. Studying Model I gives insight into, and motivates the estimation methods for, the more practically relevant Model II.
(2) Model II: This is a general model in which no moments of $S(x,\theta)$ need to be specified. Suppose that we have a portfolio of risks $X_i$, $i = 1, 2, \ldots, K$. Each $X_i$ is identified with a parameter (vector) $\theta_i$ and contributes a series of data $X_{i1}, X_{i2}, \ldots, X_{in_i}, \ldots$, which are i.i.d. copies of $X_i$ given $\theta_i$. The purpose is again to estimate the distributions and then assign proper premiums to the future claims $X_{i,n_i+1}$, $i = 1, 2, \ldots, K$, under a given premium principle.
This case belongs to the framework of empirical Bayes methods. These were first introduced by Robbins (1955, 1964) and later applied to credibility theory by Norberg (1980) under the net premium principle, and by Zehnwirth (1981) for any premium principle under Dirichlet priors.
Our work leads to a new, purely nonparametric method for estimating the ddf within the Bayesian framework. To remove the restriction to Dirichlet priors, we develop an approach quite different from the Bayesian nonparametrics initiated by Ferguson (1973). It turns out, however, that our estimate coincides with that of Ferguson (1973) when the prior distribution is Dirichlet; see Remark 2.2 in Section 2 for details. This surprising by-product shows that our results generalize those of Ferguson (1973).
The remainder of the article is organized as follows. Section 2 starts from the naive idea, considered by Jewell (1974), of applying Bühlmann's method pointwise. We show by a counterexample that Jewell's credibility estimate of a ddf is not necessarily a ddf itself. We then develop the criterion for deriving the linear Bayes estimator and discuss the optimal estimation of $S(x,\theta)$ under Models I and II, together with the asymptotic properties of the estimators. Section 3 treats experience rating under a number of well-known premium principles. Section 4 gives concluding remarks. To smooth the flow of presentation, most of the technical proofs and auxiliary materials are relegated to the appendices.
where $I_{\{X>x\}}$ is the indicator of the event $\{X > x\}$, and $E_\pi$ and $\mathrm{Var}_\pi$ indicate that the expectations are computed with respect to the distribution $\pi$ of $\theta$. We also write $\mu(\theta) = E[X \mid \theta]$, $\sigma^2 = E_\pi[\mathrm{Var}(X \mid \theta)]$, and $\tau^2 = \mathrm{Var}_\pi(E[X \mid \theta])$. The expectations in (2.1) obviously exist.
2.1. The Performance Measure to Be Optimized
Following Jewell (1974), because $S(x,\theta) = E[I(X > x) \mid \theta]$ for a fixed $x$, the idea of Bühlmann (1967) suggests a credibility estimate of $S(x,\theta)$ given by
and, inserting $Z(x)$ into (2.2), this estimate, denoted $\tilde S(x,\theta)$, can be rewritten more explicitly as
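$$\tilde S(x,\theta) = \sum_{i=1}^{n+1} \frac{(n-i+1)E_\pi[S^2(x,\theta)] - (n-i)S_0^2(x) - S_0(x)E_\pi[S^2(x,\theta)]}{(n-1)E_\pi[S^2(x,\theta)] - nS_0^2(x) + S_0(x)}\, I_{[X_{(i-1)},X_{(i)})}(x), \qquad (2.4)$$
where $-\infty = X_{(0)} < X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)} < X_{(n+1)} = \infty$ are the order statistics of the sample.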
However, $\tilde S(x,\theta)$ is not monotone in $x$ in general and is therefore not a proper ddf estimate. The following counterexample shows this explicitly.
Counterexample: Let $X$ be nonnegative with $E_\pi[S^2(x,\theta)] = q[(1-x)_+]^2$ and $S_0(x) = q(1-x)_+$ for all $x \ge 0$, where $q \in (0,1)$ is a known constant and $(a)_+ = \max(a, 0)$. For example, let $\theta$ take only the values 0 (with probability $1-q$) and 1 (with probability $q$). Let $X$ degenerate at 0 when $\theta = 0$, and let $S(x,1) = (1-x)_+$ for all $x \ge 0$. Then $\sigma_0^2(x) = qx(1-x)$ and $\tau_0^2(x) = q(1-q)(1-x)^2$ for all $x \in [0,1]$, and both are zero otherwise. For $x \in (X_{(n)}, 1)$, the credibility estimate of $S(x,\theta)$ is
$$\tilde S(x,\theta) = (1 - Z(x))S_0(x) = \frac{qx(1-x)}{n(1-q) - (n(1-q)-1)x}.$$
With $a = n(1-q)$, the numerator of $d\tilde S(x,\theta)/dx$ is proportional to $a - 2ax + (a-1)x^2$. This expression is positive at $x = 0$, negative at $x = 1$, and its only root in $(0,1)$ is $b = \sqrt{n(1-q)}/(\sqrt{n(1-q)} + 1)$. Hence $d\tilde S(x,\theta)/dx$ is strictly positive at every $x \in (0,b)$ and negative at every $x \in (b,1]$. Consequently, as long as $X_{(n)} < b$, $\tilde S(x,\theta)$ is increasing in $x \in (X_{(n)}, b)$ and decreasing in $x \in (b,1)$, so it is not monotone.
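The non-monotonicity can also be seen numerically. This Python sketch evaluates $(1-Z(x))S_0(x)$ on $(0,1)$ for the two-point model above and compares the location of its maximum with $b$. The choice $n = 10$, $q = 0.5$ is an illustrative assumption, and the helper names are hypothetical.

```python
import math


def naive_tail_estimate(x: float, n: int, q: float) -> float:
    """(1 - Z(x)) S_0(x) from the counterexample, valid for X_(n) < x < 1."""
    sigma0_sq = q * x * (1.0 - x)                 # sigma_0^2(x)
    tau0_sq = q * (1.0 - q) * (1.0 - x) ** 2      # tau_0^2(x)
    z = n * tau0_sq / (n * tau0_sq + sigma0_sq)   # x-dependent credibility factor Z(x)
    return (1.0 - z) * q * (1.0 - x)              # S_n(x) = 0 on this range


def turning_point(n: int, q: float) -> float:
    """b = sqrt(n(1-q)) / (sqrt(n(1-q)) + 1)."""
    r = math.sqrt(n * (1.0 - q))
    return r / (r + 1.0)


n, q = 10, 0.5                                    # illustrative values
grid = [k / 1000 for k in range(1, 1000)]         # interior points of (0, 1)
values = [naive_tail_estimate(x, n, q) for x in grid]
peak = grid[values.index(max(values))]
print(f"b = {turning_point(n, q):.4f}, grid maximiser = {peak:.3f}")
```

A genuine ddf would be nonincreasing on this grid. Here the values rise until about $b \approx 0.691$ and fall afterwards.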
Because the naive credibility estimate $\tilde S(x,\theta)$ is not monotone, it is difficult, for example, to give it an intuitive interpretation or to compute premiums under it.
This difficulty stems from the dependence of the credibility factor $Z(x)$ on $x$. A natural and tractable remedy has three parts:
- restrict the credibility factor $Z$ to a constant free of $x$;
- construct a convex set of ddfs containing the empirical distributions;
- find an optimal estimate $\hat S(x,\theta)$ in that convex set.

In this article, we obtain such an optimal estimate $\hat S(x,\theta)$ by seeking a function of the form
$$\alpha_0(x) + \sum_{i=1}^{n} \alpha_i I(X_i > x), \qquad (2.5)$$
where $\alpha_0(x)$ is a nonincreasing function of $x$ that does not depend on the claims history, and $\alpha_1, \ldots, \alpha_n$ are the real-valued decision variables of the optimization problem. At first sight it seems necessary to require the estimating function in (2.5) to be a ddf. This turns out to be unnecessary: the solution to (2.6) meets this condition automatically, with $\alpha_0(x)$ proportional to the marginal ddf $S_0(x)$ of $X$; see Theorem 2.1 below.
Remark 2.1. If $X$ has a finite mean, it is easy to check that the integral in (2.6), $\int \sigma_0^2(x)\,dx$, and $\int \tau_0^2(x)\,dx$ are all finite, so all the theory below is valid. If instead the integral in (2.6) is infinite, criterion (2.6) fails. An easy remedy is then to choose a probability distribution function $W(x)$ and replace $dx$ with $dW(x)$; all the results below remain valid under $W(x)$.
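Theorem 2.1. The credibility estimator of the ddf $S(x,\theta)$ that minimizes (2.6) is
$$\hat S(x,\theta) = Z S_n(x) + (1 - Z) S_0(x), \qquad (2.7)$$
where
$$Z = \frac{n\tau_0^2}{n\tau_0^2 + \sigma_0^2}, \quad \text{with } \sigma_0^2 = \int_{-\infty}^{+\infty} \sigma_0^2(x)\,dx \text{ and } \tau_0^2 = \int_{-\infty}^{+\infty} \tau_0^2(x)\,dx, \qquad (2.8)$$
which is also called the credibility factor. Moreover, the mean integrated squared error of the optimal estimate is
$$E\left[\int_{-\infty}^{+\infty} \big(\hat S(x,\theta) - S(x,\theta)\big)^2\,dx\right] = \frac{\sigma_0^2\tau_0^2}{\sigma_0^2 + n\tau_0^2}. \qquad (2.9)$$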
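Proof. Temporarily write $Y_i = I(X_i > x)$ and use the notation in (2.1). The integrand in (2.6) is
$$E\Big[S(x,\theta) - \alpha_0(x) - \sum_{i=1}^{n}\alpha_iY_i\Big]^2 = \mathrm{Var}\Big(S(x,\theta) - \sum_{i=1}^{n}\alpha_iY_i\Big) + \Big[\Big(1 - \sum_{i=1}^{n}\alpha_i\Big)S_0(x) - \alpha_0(x)\Big]^2$$
$$= \sum_{i=1}^{n}\alpha_i^2\sigma_0^2(x) + \Big(1 - \sum_{i=1}^{n}\alpha_i\Big)^2\tau_0^2(x) + \Big[\Big(1 - \sum_{i=1}^{n}\alpha_i\Big)S_0(x) - \alpha_0(x)\Big]^2.$$
The second equality writes $S(x,\theta) - \sum_i\alpha_iY_i = (1 - \sum_i\alpha_i)S(x,\theta) - \sum_i\alpha_i(Y_i - S(x,\theta))$ and uses the conditional independence of the $Y_i$ given $\theta$. The integrand is therefore minimized with respect to $\alpha_0(x)$ at $\alpha_0(x) = (1 - \sum_{i=1}^{n}\alpha_i)S_0(x)$, with minimum
$$\mathrm{Var}\Big(S(x,\theta) - \sum_{i=1}^{n}\alpha_iY_i\Big) = \sum_{i=1}^{n}\alpha_i^2\sigma_0^2(x) + \Big(1 - \sum_{i=1}^{n}\alpha_i\Big)^2\tau_0^2(x).$$
Integrate this with respect to $x$ and then minimize over the $\alpha_i$; for a fixed sum $\sum_i\alpha_i$, the term $\sum_i\alpha_i^2$ is smallest when all $\alpha_i$ are equal. This gives $\alpha_j = \tau_0^2/(n\tau_0^2 + \sigma_0^2)$, $j = 1, 2, \ldots, n$, and the final minimum $\sigma_0^2\tau_0^2/(\sigma_0^2 + n\tau_0^2)$, which completes the proof.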
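A small Monte Carlo experiment can illustrate Theorem 2.1. The Python sketch below uses the two-point model of the counterexample, for which $\sigma_0^2 = q/6$ and $\tau_0^2 = q(1-q)/3$. It builds the estimator (2.7)-(2.8) and compares its simulated mean integrated squared error with (2.9). The sample size, $q$, the number of replications, the integration grid, and the function names are all illustrative assumptions.

```python
import bisect
import random


def credibility_ddf(sample, s0, sigma0_sq, tau0_sq):
    """Equations (2.7)-(2.8): S_hat(x) = Z S_n(x) + (1 - Z) S_0(x)."""
    n = len(sample)
    z = n * tau0_sq / (n * tau0_sq + sigma0_sq)
    xs = sorted(sample)

    def s_hat(x):
        s_n = (n - bisect.bisect_right(xs, x)) / n
        return z * s_n + (1.0 - z) * s0(x)

    return s_hat, z


def simulate_mise(n=10, q=0.5, reps=2000, grid_size=200, seed=1):
    """Monte Carlo check of (2.9) in the two-point model of the counterexample:
    theta = 1 w.p. q (X ~ Uniform(0, 1)) and theta = 0 w.p. 1 - q (X = 0)."""
    rng = random.Random(seed)
    s0 = lambda x: 1.0 if x < 0 else q * max(1.0 - x, 0.0)
    sigma0_sq = q / 6.0             # integral over [0, 1] of q x (1 - x)
    tau0_sq = q * (1.0 - q) / 3.0   # integral over [0, 1] of q (1 - q) (1 - x)^2
    mids = [(k + 0.5) / grid_size for k in range(grid_size)]  # midpoint rule on [0, 1]
    total = 0.0
    for _ in range(reps):
        theta = 1 if rng.random() < q else 0
        sample = [rng.random() if theta else 0.0 for _ in range(n)]
        s_hat, _ = credibility_ddf(sample, s0, sigma0_sq, tau0_sq)
        # The squared error vanishes outside [0, 1], so integrate over [0, 1] only.
        total += sum((s_hat(x) - (1.0 - x if theta else 0.0)) ** 2 for x in mids) / grid_size
    theory = sigma0_sq * tau0_sq / (sigma0_sq + n * tau0_sq)  # right-hand side of (2.9)
    return total / reps, theory


mc, theory = simulate_mise()
print(f"Monte Carlo MISE = {mc:.5f}, formula (2.9) = {theory:.5f}")
```

Whatever the sample, the estimator is a weighted average of two ddfs, so it stays nonincreasing in $x$. This is exactly what the naive estimate (2.4) failed to guarantee.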
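$$\pi(s \mid x) = \frac{\Gamma(\alpha(\mathbb{R}))}{\Gamma(\alpha(x))\,\Gamma(\alpha(\mathbb{R}) - \alpha(x))}\, s^{\alpha(x)-1}(1-s)^{\alpha(\mathbb{R})-\alpha(x)-1},$$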
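Consequently, by (2.3),
Inserting these into (2.8) gives the credibility factor $Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2) = n/(\alpha(\mathbb{R}) + n)$. The reason is that $\sigma_0^2(x) = \alpha(\mathbb{R})\tau_0^2(x)$ for every $x$, since $S(x,\theta)$ has a beta distribution whose two parameters sum to $\alpha(\mathbb{R})$. Thus the estimate in (2.7) is the same as (1.2). In this respect, Theorem 2.1 acts as a linearized version of Ferguson's theory with weaker prior assumptions: it does not require the Dirichlet prior of Ferguson (1973).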
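The identity $Z = n/(\alpha(\mathbb{R})+n)$ can be checked numerically. Under a Dirichlet prior, $S(x,\theta)$ follows a beta distribution with parameters $\alpha((x,\infty))$ and $\alpha(\mathbb{R}) - \alpha((x,\infty))$, so $\sigma_0^2(x)/\tau_0^2(x) = \alpha(\mathbb{R})$ at every $x$. The sketch below assumes an exponential prior guess, $\alpha(\mathbb{R}) = 4$, and $n = 12$, purely for illustration. It computes $Z$ from (2.8) on a grid and checks the two beta moments by simulation with Python's `random.betavariate`.

```python
import math
import random


def beta_structure(p, total_mass):
    """Pointwise (sigma_0^2(x), tau_0^2(x)) when S(x, theta) ~ Beta(A p, A (1 - p)),
    with A = alpha(R) and p = alpha((x, inf)) / alpha(R)."""
    tau0_sq = p * (1.0 - p) / (total_mass + 1.0)   # Var(S)
    sigma0_sq = p - (tau0_sq + p * p)              # E[S] - E[S^2] = E[S(1 - S)]
    return sigma0_sq, tau0_sq


def credibility_factor(n, total_mass, prior_guess, grid):
    """Z from (2.8), with both integrals replaced by sums over a common grid."""
    sig = sum(beta_structure(prior_guess(x), total_mass)[0] for x in grid)
    tau = sum(beta_structure(prior_guess(x), total_mass)[1] for x in grid)
    return n * tau / (n * tau + sig)


def monte_carlo_structure(a, b, draws=100_000, seed=7):
    """Simulated (E[S(1 - S)], Var(S)) for S ~ Beta(a, b)."""
    rng = random.Random(seed)
    s = [rng.betavariate(a, b) for _ in range(draws)]
    mean = sum(s) / draws
    return sum(v * (1.0 - v) for v in s) / draws, sum((v - mean) ** 2 for v in s) / draws


# Assumed prior guess alpha((x, inf)) / alpha(R) = exp(-x) for x > 0, alpha(R) = 4, n = 12.
grid = [k / 50 for k in range(1, 500)]
z = credibility_factor(12, 4.0, lambda x: math.exp(-x), grid)
print(z, 12 / (12 + 4.0))
```

Because the ratio $\sigma_0^2(x)/\tau_0^2(x)$ is constant in $x$, the grid and the prior guess do not affect $Z$.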
Since $Z \in (0,1)$, the credibility estimator (2.7) is a weighted average of the empirical ddf $S_n(x)$ and the marginal ddf $S_0(x)$ of $X$ (the so-called collective ddf). It is therefore clearly a ddf. In addition, $Z \to 1$ as $n \to \infty$ and $Z = 0$ when $n = 0$, which allows the classical credibility interpretation. More data make the empirical ddf $S_n(x)$ more credible, while $n = 0$ is the extreme case where no sample is observed. Furthermore, the following theorem establishes the strong consistency of the estimator.
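Proof. This follows from the well-known Glivenko-Cantelli theorem on empirical distributions:
$$\sup_x \big|\hat S(x,\theta) - S(x,\theta)\big| \le Z\sup_x \big|S(x,\theta) - S_n(x)\big| + (1 - Z)\sup_x \big|S(x,\theta) - S_0(x)\big| \to 0 \quad \text{a.s. as } n \to \infty.$$
The first supremum tends to 0 almost surely, the second is bounded by 1, and $1 - Z \to 0$.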
Assumption 2.1
1. Conditional on $\theta_i$, the random variables $X_{ij}$ ($j = 1, 2, \ldots, n_i$) are i.i.d. copies of $X_i$ with a common unknown ddf and moments:
$$S(x,\theta_i) = \Pr(X_{ij} > x \mid \theta_i), \quad \sigma^2(x,\theta_i) = \mathrm{Var}\big(I_{(X_{ij} > x)} \mid \theta_i\big), \quad i = 1, 2, \ldots, K, \; j = 1, 2, \ldots, n_i.$$
2. The random variables $\theta_1, \ldots, \theta_K$ are i.i.d. with a common but unknown prior distribution $\pi(\theta)$, so that $S_0(x) = E_\pi[S(x,\theta)]$, $\tau_0^2(x) = \mathrm{Var}_\pi(S(x,\theta))$, and $\sigma_0^2(x) = E_\pi[\sigma^2(x,\theta)]$.
The individual ddfs are then estimated in a distribution-free setting in two steps. The first step is the homogeneous estimation of the ddfs; the second is the estimation of the structural parameters $\sigma_0^2$ and $\tau_0^2$.
$$\mathcal{L} = \left\{\sum_{s=1}^{K}\sum_{t=1}^{n_s} a_{st} I(X_{st} > x) : a_{st} \in \mathbb{R}, \ \sum_{s=1}^{K}\sum_{t=1}^{n_s} a_{st} = 1\right\} \qquad (2.10)$$
The solution is stated in the next theorem. The proof of the first part is similar to that of Theorem 2.1 and is omitted; the proof of the second part is given in Section A.1.1 of the appendices.
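Theorem 2.3. The homogeneous credibility estimator of $S(x,\theta_i)$, as the solution to (2.11), is
$$\hat S^{\mathrm{hom}}(x,\theta_i) = Z_iS_i(x) + (1 - Z_i)\tilde S_0(x), \qquad (2.12)$$
where
$$Z_i = \frac{n_i\tau_0^2}{\sigma_0^2 + n_i\tau_0^2}, \quad i = 1, 2, \ldots, K, \qquad \text{and} \qquad \tilde S_0(x) = \frac{1}{\sum_{r=1}^{K} Z_r}\sum_{r=1}^{K} Z_rS_r(x). \qquad (2.13)$$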
to derive an inhomogeneous estimator. The resulting estimator, however, is exactly the one in Equations (2.7) and (2.8). This is a typical phenomenon in credibility theory; see, for example, Bühlmann and Gisler (2005), section 3.1.4.
$$\bar S(x) = \frac{1}{N}\sum_{i=1}^{K} n_iS_i(x), \quad \text{where } N = \sum_{i=1}^{K} n_i \text{ depends on the portfolio size } K. \qquad (2.16)$$
Recall the general strong law of large numbers. If the summands are mutually independent, then $c_n^{-1}\sum_{j=1}^{n}(X_j - E[X_j]) \to 0$ almost surely for any nondecreasing sequence $\{c_n\}$ of positive numbers with $c_n \to \infty$, provided that $\sum_{k=1}^{\infty} c_k^{-2}\mathrm{Var}(X_k) < \infty$; see, e.g., theorem 3.1 of DasGupta (2008, p. 35). It is then easy to see that $\bar S(x)$ is strongly consistent. Take $c_K = N$ and note that $\mathrm{Var}[n_KS_K(x)] = \mathrm{Var}\big[\sum_{j=1}^{n_K} I(X_{Kj} > x)\big] \le 2n_K^2$. It follows that $\bar S(x) \to S_0(x)$ almost surely as $K \to \infty$ under the condition
$$\sum_{K=1}^{\infty}\frac{n_K^2}{N^2} < \infty. \qquad (2.17)$$
In view of the definitions of $\sigma_0^2(x)$ and $\tau_0^2(x)$, the parameters $\sigma_0^2$ and $\tau_0^2$ can be estimated using
$$SSE(x) = \sum_{i=1}^{K}\sum_{j=1}^{n_i}\big[I(X_{ij} > x) - S_i(x)\big]^2 \quad \text{and} \quad SSA(x) = \sum_{i=1}^{K} n_i\big[S_i(x) - \bar S(x)\big]^2, \qquad (2.18)$$
an idea borrowed from the analysis of variance (ANOVA). The estimators are formally defined as
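$$\hat\sigma_0^2 = \frac{1}{N-K}\int SSE(x)\,dx \qquad (2.19)$$
and
$$\hat\tau_0^2 = \frac{N}{N^2 - \sum_{i=1}^{K} n_i^2}\left[\int SSA(x)\,dx - \frac{K-1}{N-K}\int SSE(x)\,dx\right]. \qquad (2.20)$$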
Sort the claims of individual $i$ in increasing order as $X_{i(1)}, X_{i(2)}, \ldots, X_{i(n_i)}$, and all $N = n_1 + \cdots + n_K$ claims jointly as $R_1, R_2, \ldots, R_N$. Some algebra then gives
$$\int SSE(x)\,dx = \sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(\frac{2j-1}{n_i} - 1\right)X_{i(j)} \quad \text{and} \qquad (2.21)$$
$$\int SSA(x)\,dx = \sum_{i=1}^{K}\frac{1}{n_i}\sum_{j=1}^{n_i}(2n_i - 2j + 1)X_{i(j)} - \frac{1}{N}\sum_{j=1}^{N}(2N - 2j + 1)R_j. \qquad (2.22)$$
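Formulas (2.21) and (2.22) avoid numerical integration. The Python sketch below evaluates them from sorted claims and checks them against an exact integration of the step functions $SSE(x)$ and $SSA(x)$ in (2.18). It then forms $\hat\sigma_0^2$ and $\hat\tau_0^2$ by (2.19)-(2.20). The function names are hypothetical and the claim data are made up.

```python
def integrated_sse(groups):
    """Equation (2.21): sum_i sum_j ((2j - 1)/n_i - 1) X_i(j)."""
    total = 0.0
    for g in groups:
        xs, m = sorted(g), len(g)
        total += sum(((2 * j - 1) / m - 1.0) * x for j, x in enumerate(xs, start=1))
    return total


def integrated_ssa(groups):
    """Equation (2.22), using the pooled order statistics R_1 <= ... <= R_N."""
    total = 0.0
    for g in groups:
        xs, m = sorted(g), len(g)
        total += sum((2 * m - 2 * j + 1) * x for j, x in enumerate(xs, start=1)) / m
    pooled = sorted(x for g in groups for x in g)
    big_n = len(pooled)
    total -= sum((2 * big_n - 2 * j + 1) * r for j, r in enumerate(pooled, start=1)) / big_n
    return total


def integrate_directly(groups):
    """Exact integrals of the step functions SSE(x) and SSA(x) in (2.18); both are
    constant between consecutive distinct claims and vanish outside the data range."""
    knots = sorted({x for g in groups for x in g})
    big_n = sum(len(g) for g in groups)
    sse = ssa = 0.0
    for left, right in zip(knots, knots[1:]):
        s = [sum(v > left for v in g) / len(g) for g in groups]        # S_i(x) on [left, right)
        s_bar = sum(len(g) * si for g, si in zip(groups, s)) / big_n   # S-bar(x) from (2.16)
        sse += sum(len(g) * si * (1 - si) for g, si in zip(groups, s)) * (right - left)
        ssa += sum(len(g) * (si - s_bar) ** 2 for g, si in zip(groups, s)) * (right - left)
    return sse, ssa


def structure_estimates(groups):
    """sigma_0^2 and tau_0^2 estimates from (2.19) and (2.20)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    big_n = sum(sizes)
    sigma_sq = integrated_sse(groups) / (big_n - k)
    tau_sq = big_n * (integrated_ssa(groups) - (k - 1) * sigma_sq) / (
        big_n ** 2 - sum(m * m for m in sizes))
    return sigma_sq, tau_sq


claims = [[1.0, 3.0, 2.5], [0.5, 4.0], [2.0, 2.2, 6.1, 0.3]]  # made-up claim histories
print(integrated_sse(claims), integrated_ssa(claims), integrate_directly(claims))
print(structure_estimates(claims))
```

In small portfolios $\hat\tau_0^2$ from (2.20) can come out negative; the sketch returns it unchanged.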
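Theorem 2.4. The estimators $\hat\sigma_0^2$ and $\hat\tau_0^2$ have the following properties.
1. $\hat\sigma_0^2$ and $\hat\tau_0^2$ are unbiased estimators of $\sigma_0^2$ and $\tau_0^2$, respectively.
2. Under condition (2.17), $\hat\sigma_0^2 \to \sigma_0^2$ and $\hat\tau_0^2 \to \tau_0^2$ almost surely as $K \to \infty$.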
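Finally, inserting the estimates of the structure parameters $\sigma_0^2$ and $\tau_0^2$ into (2.12) and (2.13) gives the empirical Bayes estimators of the ddfs $S(x,\theta_i)$:
$$\hat S^{\mathrm{EB}}(x,\theta_i) = \hat Z_iS_i(x) + (1 - \hat Z_i)\hat S_0(x), \quad i = 1, 2, \ldots, K, \qquad (2.23)$$
where
$$\hat Z_i = \frac{n_i\hat\tau_0^2}{\hat\sigma_0^2 + n_i\hat\tau_0^2}, \quad i = 1, 2, \ldots, K, \qquad \text{and} \qquad \hat S_0(x) = \frac{1}{\sum_{r=1}^{K}\hat Z_r}\sum_{r=1}^{K}\hat Z_rS_r(x). \qquad (2.24)$$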
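Putting the pieces together, the empirical Bayes estimator (2.23)-(2.24) needs only the raw claims. The following Python sketch is one possible implementation. Two details are our own assumptions, not part of the article's method:
- a negative $\hat\tau_0^2$ is truncated at zero;
- when all $\hat Z_r$ vanish, the weights fall back to $n_r/N$, which is the limit of $\hat Z_r/\sum_s \hat Z_s$ as $\hat\tau_0^2 \to 0$.

The claim data are made up.

```python
import bisect


def structure_estimates(groups):
    """(2.19)-(2.20), with the integrals computed via (2.21)-(2.22)."""
    k, sizes = len(groups), [len(g) for g in groups]
    big_n = sum(sizes)
    sse = ssa = 0.0
    for g in groups:
        xs, m = sorted(g), len(g)
        sse += sum(((2 * j - 1) / m - 1.0) * x for j, x in enumerate(xs, 1))
        ssa += sum((2 * m - 2 * j + 1) * x for j, x in enumerate(xs, 1)) / m
    pooled = sorted(x for g in groups for x in g)
    ssa -= sum((2 * big_n - 2 * j + 1) * r for j, r in enumerate(pooled, 1)) / big_n
    sigma_sq = sse / (big_n - k)
    tau_sq = big_n * (ssa - (k - 1) * sigma_sq) / (big_n ** 2 - sum(m * m for m in sizes))
    return sigma_sq, tau_sq


def empirical_bayes_ddfs(groups):
    """Equations (2.23)-(2.24): one estimated ddf per policy, plus the factors Z_i-hat."""
    sigma_sq, tau_sq = structure_estimates(groups)
    tau_sq = max(tau_sq, 0.0)  # assumption: a negative tau_0^2 estimate is truncated at 0
    sizes = [len(g) for g in groups]
    zs = [m * tau_sq / (sigma_sq + m * tau_sq) if sigma_sq + m * tau_sq > 0 else 0.0
          for m in sizes]
    # Weights for S_0-hat: Z_r / sum(Z); if every Z_r is 0, use the limit n_r / N (assumption).
    weights = [z / sum(zs) for z in zs] if sum(zs) > 0 else [m / sum(sizes) for m in sizes]
    sorted_groups = [sorted(g) for g in groups]

    def s_i(i, x):  # individual empirical ddf S_i(x)
        xs = sorted_groups[i]
        return (len(xs) - bisect.bisect_right(xs, x)) / len(xs)

    def s0_hat(x):
        return sum(w * s_i(r, x) for r, w in enumerate(weights))

    ddfs = [lambda x, i=i: zs[i] * s_i(i, x) + (1.0 - zs[i]) * s0_hat(x)
            for i in range(len(groups))]
    return ddfs, zs


# Made-up claims for three policies with clearly different levels.
claims = [[1.0, 1.2, 0.9, 1.1], [5.0, 5.5, 4.8, 5.2], [2.0, 2.4, 1.9, 2.1]]
ddfs, zs = empirical_bayes_ddfs(claims)
print([round(z, 3) for z in zs], round(ddfs[0](1.0), 3))
```

Each returned function is a mixture of empirical ddfs with nonnegative weights, so it is itself a ddf.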
in which all the $n_i$ take the same value, for example, $n_i \equiv n$. The $Z_i$ are then identical over $i = 1, 2, \ldots, K$ (and so are the $\hat Z_i$), and consequently
$$\hat S_0(x) = \frac{1}{K}\sum_{r=1}^{K} S_r(x).$$
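The result is stated in the theorem below; its proof is given in Section A.1.3 of the appendices.
Theorem 2.5. Under conditions (2.17) and (2.25), the $\hat S^{\mathrm{EB}}(x,\theta_i)$ defined by (2.23) and (2.24) are asymptotically optimal in the sense that
$$\lim_{K\to\infty}\max_{1\le i\le K} E\left[\int_{-\infty}^{+\infty}\big(\hat S^{\mathrm{EB}}(x,\theta_i) - S(x,\theta_i)\big)^2dx - \int_{-\infty}^{+\infty}\big(\hat S(x,\theta_i) - S(x,\theta_i)\big)^2dx\right] = 0, \qquad (2.26)$$
where $\hat S(x,\theta_i)$ is the linear estimator defined by (2.7) and (2.8).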
Theorem 3.1. The estimator $\hat H(X \mid \theta)$ is strongly consistent for $H(X \mid \theta)$ if the premium principle $H$ is continuous with respect to the $L_\infty$-norm of ddfs. Here the $L_\infty$-norm of a function $f(x)$ on $\mathbb{R}$ is $\|f\|_\infty = \sup_{x\in\mathbb{R}}|f(x)|$. In other words, if the estimator $\hat S(x,\theta)$ satisfies $\sup_{x\in\mathbb{R}}|\hat S(x,\theta) - S(x,\theta)| \to 0$ almost surely, then $\hat H(X \mid \theta) \to H(X \mid \theta)$ almost surely.
By Theorem 3.1, $\hat H(X \mid \theta)$ is strongly consistent under certain regularity conditions. This is a significant improvement over earlier work. In Gerber (1980) the credibility estimator is not (strongly) consistent in general, and in Pan et al. (2008) and Wen et al. (2009) consistency must be proved separately in each case.

Quite a few premium principles $H$ can be written as continuous functions of expectations of certain functions of $X$. An example is Kamps' premium $H(X \mid \theta) = E[X(1 - e^{-\beta X}) \mid \theta]/E[(1 - e^{-\beta X}) \mid \theta]$. When these functions are bounded and continuous on the support of $X$, $H$ is well known to be continuous with respect to weak convergence of the distribution of $X$ (Portmanteau theorem; cf. DasGupta, 2008, Theorem 1.4). This is a stronger requirement than the continuity with respect to the $L_\infty$-norm in our Theorem 3.1. If the limiting ddf is continuous in $x$, however, the two conditions are equivalent; see theorem 1.3 of DasGupta (2008).
For many well-known and extensively discussed premium principles (cf. Young 2004), the strong consistency of experience ratemaking is easy to check, although not necessarily through Theorem 3.1. They include:
- the net premium $H(X \mid \theta) = E[X \mid \theta]$;
- the variance premium $H(X \mid \theta) = E[X \mid \theta] + \beta\mathrm{Var}(X \mid \theta)$;
- the modified variance premium $H(X \mid \theta) = E[X \mid \theta] + \beta\mathrm{Var}(X \mid \theta)/E[X \mid \theta]$;
- the standard deviation premium $H(X \mid \theta) = E[X \mid \theta] + \beta\sqrt{\mathrm{Var}(X \mid \theta)}$;
- the Esscher premium $H(X \mid \theta) = E[Xe^{hX} \mid \theta]/E[e^{hX} \mid \theta]$;
- Kamps' premium $H(X \mid \theta) = E[X(1 - e^{-\beta X}) \mid \theta]/E[(1 - e^{-\beta X}) \mid \theta]$;
- the conditional tail expectation premium $H(X \mid \theta) = E[X \mid X > \xi, \theta]$;
- the exponential premium $H(X \mid \theta) = \beta^{-1}\log[E(e^{\beta X} \mid \theta)]$.

The following are two exceptions in which consistency is not straightforward.
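1. Dutch premium principle: $H(X \mid \theta) = E[X \mid \theta] + \beta E[(X - \alpha E[X \mid \theta])_+ \mid \theta]$, where $0 < \beta \le 1$ and $\alpha \ge 1$. Observe that $E_{S_n}[(X - \alpha\mu(\theta))_+] = n^{-1}\sum_{i=1}^{n}(X_i - \alpha\mu(\theta))_+$, where $E_{S_n}[g(X)]$ denotes the expectation of $g(X)$ under the distribution generated by $S_n(x)$. The estimate of $H(X \mid \theta)$ is then
$$\hat H(X \mid \theta) = \hat\mu(\theta) + \beta\left[\frac{Z}{n}\sum_{i=1}^{n}\big(X_i - \alpha\hat\mu(\theta)\big)_+ + (1 - Z)\int\big(x - \alpha\hat\mu(\theta)\big)_+\,dF_0(x)\right], \qquad (3.1)$$
where $F_0(x) = 1 - S_0(x)$ is the marginal cumulative distribution function of $X$, $\hat\mu(\theta) = Z\bar X_n + (1 - Z)\mu_0$, and $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$ (notation used hereafter). Its consistency is proved in Section A.2.1 of the appendices.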
2. Distortion premium principle:
$$H(X \mid \theta) = \int_0^{\infty} g\big(S(x,\theta)\big)\,dx - \int_{-\infty}^{0}\big[1 - g\big(S(x,\theta)\big)\big]\,dx, \qquad (3.2)$$
where the credibility factor is given by (2.8). Suppose, for example, that $X$ has bounded support contained in $[-M, M]$ and that $g$ is Lipschitz on $[0,1]$, that is, $|g(x) - g(y)| \le C|x - y|$ for some constant $C$. Then $|\hat H(X \mid \theta) - H(X \mid \theta)| \le 2CM\sup_x|\hat S(x,\theta) - S(x,\theta)|$. By the Glivenko-Cantelli theorem, $\hat H(X \mid \theta) \to H(X \mid \theta)$ almost surely as $n \to \infty$. For unbounded support, consistency still holds if $g$ is a Lipschitz function on $[0,1]$; see Section A.2.2 of the appendices for details.
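Both estimates in this list are plug-in functionals of $\hat S(x,\theta)$. The Python sketch below illustrates them under assumptions that are not part of the article:
- For (3.1), the collective distribution is exponential with mean $\mu_0$, so that $\int (x-c)_+\,dF_0(x) = \mu_0 e^{-c/\mu_0}$.
- For (3.2), the risk is nonnegative with a uniform collective ddf on $[0,1]$, and the integral is approximated by the midpoint rule.

With $g(u) = u$ the distortion premium reduces to the net premium $Z\bar X_n + (1-Z)\mu_0$, here $0.6 \times 1.6/3 + 0.4 \times 0.5 = 0.52$. The function names and data are hypothetical.

```python
import bisect
import math


def credibility_ddf(sample, z, s0):
    """S_hat(x) = Z S_n(x) + (1 - Z) S_0(x), as in (2.7)."""
    xs, n = sorted(sample), len(sample)
    return lambda x: z * (n - bisect.bisect_right(xs, x)) / n + (1.0 - z) * s0(x)


def dutch_premium(sample, z, mu0, alpha=1.0, beta=0.5):
    """Equation (3.1) for a hypothetical exponential collective ddf S_0(x) = exp(-x / mu0)."""
    n = len(sample)
    mu_hat = z * sum(sample) / n + (1.0 - z) * mu0      # mean of S_hat
    c = alpha * mu_hat                                   # threshold alpha * mu_hat
    empirical_part = sum(max(x - c, 0.0) for x in sample) / n
    collective_part = mu0 * math.exp(-c / mu0)           # E_{F_0}[(X - c)_+], exponential F_0
    return mu_hat + beta * (z * empirical_part + (1.0 - z) * collective_part)


def distortion_premium(s_hat, g, upper, steps=20_000):
    """Equation (3.2) for a nonnegative risk on [0, upper]: midpoint rule for the
    integral of g(S_hat(x)); the integral over the negative half-line vanishes."""
    h = upper / steps
    return h * sum(g(s_hat((k + 0.5) * h)) for k in range(steps))


print(dutch_premium([1.0, 3.0], z=1.0, mu0=2.0))  # 2.25

# Assumed collective ddf for the distortion example: Uniform(0, 1), so mu_0 = 0.5.
s0 = lambda x: min(1.0, max(0.0, 1.0 - x))
s_hat = credibility_ddf([0.2, 0.5, 0.9], z=0.6, s0=s0)
net = distortion_premium(s_hat, lambda u: u, upper=1.0)                        # g(u) = u
loaded = distortion_premium(s_hat, lambda u: 1.0 - (1.0 - u) ** 2, upper=1.0)  # Lipschitz, C = 2
print(round(net, 4), round(loaded, 4))
```

The concave distortion $g(u) = 1 - (1-u)^2$ satisfies $g(u) \ge u$, so it produces a loaded premium above the net premium.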
$$\hat H(X \mid \theta) = Z\bar X_n + (1 - Z)\mu_0, \qquad (3.3)$$
which has the same form as Bühlmann's credibility premium $\hat\mu^c(\theta) = Z^c\bar X_n + (1 - Z^c)\mu_0$ but uses a different credibility factor. Here the superscript $c$ stands for "classical", $Z^c = n\tau^2/(n\tau^2 + \sigma^2)$ is the classical credibility factor, $\tau^2 = \mathrm{Var}_\pi(\mu(\theta))$, and
straightforward.
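$$E[\hat\mu(\theta) - \mu(\theta)]^2 = Z^2\frac{\sigma^2}{n} + (1 - Z)^2\tau^2 \quad \text{and} \quad E[\hat\mu^c(\theta) - \mu(\theta)]^2 = (Z^c)^2\frac{\sigma^2}{n} + (1 - Z^c)^2\tau^2. \qquad (3.4)$$
As a result,
$$\lim_{n\to\infty}\frac{E[\hat\mu(\theta) - \mu(\theta)]^2}{E[\hat\mu^c(\theta) - \mu(\theta)]^2} = 1. \qquad (3.5)$$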
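To see how (3.5) follows from (3.4), plug $Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2)$ and $Z^c = n\tau^2/(n\tau^2 + \sigma^2)$ into (3.4) to obtain
$$E[\hat\mu(\theta) - \mu(\theta)]^2 = \frac{n\tau_0^4\sigma^2 + \sigma_0^4\tau^2}{(n\tau_0^2 + \sigma_0^2)^2} \quad \text{and} \quad E[\hat\mu^c(\theta) - \mu(\theta)]^2 = \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}.$$
Hence
$$\lim_{n\to\infty}\frac{E[\hat\mu(\theta) - \mu(\theta)]^2}{E[\hat\mu^c(\theta) - \mu(\theta)]^2} = \lim_{n\to\infty}\frac{(\tau_0^4\sigma^2 + \sigma_0^4\tau^2/n)(\tau^2 + \sigma^2/n)}{(\tau_0^2 + \sigma_0^2/n)^2\sigma^2\tau^2} = \frac{\tau_0^4\sigma^2\tau^2}{(\tau_0^2)^2\sigma^2\tau^2} = 1.$$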
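The limit (3.5) can also be seen numerically from (3.4). The Python sketch below prints the MSE ratio for increasing $n$, using illustrative structure parameters $\sigma^2 = 1.25$, $\tau^2 = 0.25$, $\sigma_0^2 = 0.5$, $\tau_0^2 = 1/22$. These values are an assumption; they correspond to the exponential-gamma model of Example 3.1 with $\alpha = 6$ and $\beta = 5$. The ratio never drops below 1, because $Z^c$ minimizes $Z^2\sigma^2/n + (1-Z)^2\tau^2$ over all constant weights $Z$.

```python
def mse_pair(n, sigma_sq, tau_sq, sigma0_sq, tau0_sq):
    """Both mean squared errors in (3.4)."""
    z = n * tau0_sq / (n * tau0_sq + sigma0_sq)   # factor from (2.8)
    zc = n * tau_sq / (n * tau_sq + sigma_sq)      # classical credibility factor Z^c
    mse = z * z * sigma_sq / n + (1 - z) ** 2 * tau_sq
    mse_c = zc * zc * sigma_sq / n + (1 - zc) ** 2 * tau_sq
    return mse, mse_c


# Illustrative structure parameters (assumed).
for n in (1, 10, 100, 10_000):
    mse, mse_c = mse_pair(n, 1.25, 0.25, 0.5, 1 / 22)
    print(n, round(mse / mse_c, 5))
```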
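Example 3.1. Assume that, given $\theta$, $X_1, X_2, \ldots, X_n$ are i.i.d. with $S(x,\theta) = e^{-\theta x}$, $x > 0$. Let $\theta \sim \mathrm{Gamma}(\alpha, \beta)$ with density $\pi(\theta) = \beta^\alpha\theta^{\alpha-1}e^{-\beta\theta}/\Gamma(\alpha)$, $\theta > 0$, $\alpha > 2$, $\beta > 0$, where $\alpha$ and $\beta$ are known. Here $\sigma^2/\tau^2 = \alpha - 1$, whereas $\sigma_0^2/\tau_0^2 = 2\alpha - 1$. Some algebra shows that the expected squared errors of $\hat\mu(\theta)$, $\hat\mu^c(\theta)$, and $\mu_0$ satisfy
$$\frac{E[(\hat\mu(\theta) - \mu(\theta))^2]}{E[(\hat\mu^c(\theta) - \mu(\theta))^2]} = 1 + \frac{\alpha^2 n}{(\alpha - 1)(n + 2\alpha - 1)^2} \quad \text{and} \quad \frac{E[(\hat\mu^c(\theta) - \mu(\theta))^2]}{E[(\mu_0 - \mu(\theta))^2]} = \frac{\alpha - 1}{n + \alpha - 1}. \qquad (3.6)$$
The two equalities show that $\hat\mu(\theta)$ is slightly worse than $\hat\mu^c(\theta)$. Both have MSEs of order $n^{-1}$ relative to the MSE of the collective premium $\mu_0$.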
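A Monte Carlo simulation can corroborate (3.6). The Python sketch below assumes $n = 5$, $\alpha = 6$, and $\beta = 5$, chosen so that the fourth moment of $1/\theta$ is finite and the simulation is stable. It draws $\theta$ with `random.gammavariate`, whose second argument is a scale rather than a rate. It then compares the simulated MSE ratios with the closed forms, using $Z = n/(n + 2\alpha - 1)$ and $Z^c = n/(n + \alpha - 1)$ from the variance ratios noted above. The function name is hypothetical.

```python
import random


def example_3_1(n=5, alpha=6.0, beta=5.0, reps=200_000, seed=2015):
    """Monte Carlo check of (3.6): X | theta ~ Exp(rate theta), theta ~ Gamma(alpha, rate beta)."""
    rng = random.Random(seed)
    mu0 = beta / (alpha - 1)          # collective mean E[1/theta]
    z = n / (n + 2 * alpha - 1)       # Z from (2.8), since sigma_0^2 / tau_0^2 = 2 alpha - 1
    zc = n / (n + alpha - 1)          # Z^c, since sigma^2 / tau^2 = alpha - 1
    err = err_c = err_0 = 0.0
    for _ in range(reps):
        theta = rng.gammavariate(alpha, 1.0 / beta)  # gammavariate(shape, scale)
        mu = 1.0 / theta
        x_bar = sum(rng.expovariate(theta) for _ in range(n)) / n
        err += (z * x_bar + (1 - z) * mu0 - mu) ** 2
        err_c += (zc * x_bar + (1 - zc) * mu0 - mu) ** 2
        err_0 += (mu0 - mu) ** 2
    simulated = (err / err_c, err_c / err_0)
    exact = (1 + alpha ** 2 * n / ((alpha - 1) * (n + 2 * alpha - 1) ** 2),
             (alpha - 1) / (n + alpha - 1))
    return simulated, exact


simulated, exact = example_3_1()
print(simulated, exact)
```

With these settings the exact ratios are $1.140625$ and $0.5$.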
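where $D_n^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar X_n)^2$. Recall Bühlmann's credibility premium (Bühlmann 1970, chapter 4):
$$H_{Bul}(\mathbf{X}_n) = b\bar X_n + (1 - b)\mu_0 + \beta\big[cs_n^2 + (1 - c)\sigma^2 + (1 - b)\tau^2\big], \qquad (3.8)$$
where
$$s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X_n)^2, \qquad b = \frac{n\tau^2}{n\tau^2 + \sigma^2}, \qquad c = \frac{\mathrm{Var}_\pi(\sigma^2(\theta))}{\mathrm{Var}_\pi(\sigma^2(\theta)) + E_\pi[\mathrm{Var}(s_n^2 \mid \theta)]}.$$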
The quantity $\mathrm{Var}(s_n^2 \mid \theta)$ in the credibility factor $c$ involves the fourth moment of $X_i$ given $\theta$. Bühlmann therefore suggested approximating it by $2\sigma^4(\theta)/(n-1)$ under certain assumptions, so that $c$ is in fact approximated by $\mathrm{Var}_\pi(\sigma^2(\theta))/\{\mathrm{Var}_\pi(\sigma^2(\theta)) + E_\pi[2\sigma^4(\theta)/(n-1)]\}$. A direct advantage of (3.7) over Bühlmann's version is that it requires only the first two moments of the risk distribution.
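For comparison, Bühlmann's premium (3.8) with the approximated $c$ can be sketched as follows. The function name and argument names are hypothetical. The usage line takes the structure parameters of the Poisson-exponential model of Example 3.2 with $\lambda = 0.5$: $\mu_0 = \sigma^2 = 2$, $\tau^2 = \mathrm{Var}_\pi(\sigma^2(\theta)) = 4$, and $E_\pi[\sigma^4(\theta)] = 8$. The two-claim sample is made up.

```python
def buhlmann_variance_premium(sample, beta, mu0, sigma_sq, tau_sq, var_sigma_sq, e_sigma4):
    """Equation (3.8) with c approximated as in the text:
    E[Var(s_n^2 | theta)] is replaced by E[2 sigma^4(theta)] / (n - 1).

    var_sigma_sq = Var_pi(sigma^2(theta)); e_sigma4 = E_pi[sigma^4(theta)].
    """
    n = len(sample)
    x_bar = sum(sample) / n
    s_sq = sum((x - x_bar) ** 2 for x in sample) / (n - 1)   # s_n^2
    b = n * tau_sq / (n * tau_sq + sigma_sq)
    c = var_sigma_sq / (var_sigma_sq + 2.0 * e_sigma4 / (n - 1))
    return b * x_bar + (1 - b) * mu0 + beta * (c * s_sq + (1 - c) * sigma_sq + (1 - b) * tau_sq)


# Poisson-exponential structure with lambda = 0.5 and loading beta = 0.2 (assumed values).
print(buhlmann_variance_premium([1, 3], beta=0.2, mu0=2.0, sigma_sq=2.0, tau_sq=4.0,
                                var_sigma_sq=4.0, e_sigma4=8.0))
```

In this usage example $b = 0.8$ and $c = 0.2$, giving a premium of $2.56$.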
The following example illustrates numerically how close $\hat H(X \mid \theta)$ in (3.7) is to Bühlmann's credibility estimator $H_{Bul}$.
Example 3.2. Poisson-exponential model: Let $X$ be Poisson distributed given $\theta$, with $\Pr(X = k \mid \theta) = \theta^ke^{-\theta}/k!$, $k = 0, 1, \ldots$. Let $\theta$ have the exponential prior density $\pi(\theta) = \lambda e^{-\lambda\theta}$ ($\theta > 0$), so that $\mu(\theta) = \theta$, $\sigma^2(\theta) = \theta$, $\mu_0 = 1/\lambda$, $\tau^2 = 1/\lambda^2$, and $\sigma^2 = 1/\lambda$. The risk and collective premiums are $H(X \mid \theta) = \theta(1 + \beta)$ and $H_{col}(X) = 1/\lambda + \beta(\lambda + 1)/\lambda^2$, respectively. The posterior distribution of $\theta$ given $\mathbf{X}_n$ is $\mathrm{Gamma}(n\bar X_n + 1, n + \lambda)$, with (conditional) mean $(n\bar X_n + 1)/(n + \lambda)$ and variance $(n\bar X_n + 1)/(n + \lambda)^2$.

The estimators of $H(X \mid \theta)$ compared are:
- the Bayes premium
$$H_B(\mathbf{X}_n) = E[X_{n+1} \mid \mathbf{X}_n] + \beta\mathrm{Var}(X_{n+1} \mid \mathbf{X}_n) = \frac{(n + \lambda)(\beta + 1) + \beta}{(n + \lambda)^2}(n\bar X_n + 1);$$
- the premium $H_{cu}(\mathbf{X}_n)$ in (3.7), where here and henceforth the subscript $cu$ refers to the experience premium proposed in the current article;
- Bühlmann's credibility premium $H_{Bul}$ in (3.8);
- the collective premium $H_{col}(X)$.

We measure their mean squared errors by $V_\& = E[(H_\&(\mathbf{X}_n) - H(X \mid \theta))^2]$ for $\& = B, Bul, cu, col$. Two of these are exact: $V_{col} = (\beta + 1)^2/\lambda^2 + \beta^2/\lambda^4$ and $V_B = \lambda^{-2}[M^2n\lambda + (Mn - 1 - \beta)^2 + (M\lambda + Mn - 1 - \beta)^2]$, where $M = [(n + \lambda)(\beta + 1) + \beta]/(n + \lambda)^2$. $V_{Bul}$ and $V_{cu}$, by contrast, can only be approximated by Monte Carlo. We approximated them for fixed $\beta = 0.2$, a variety of $\lambda$ values, and sample sizes $n = 30$ and $n = 100$. We also computed their relative efficiencies $\mathrm{Eff}_\& = (V_{col} - V_\&)/(V_{col} - V_B)$. By construction $\mathrm{Eff}_{col} = 0$ and $\mathrm{Eff}_B = 1$, and a larger $\mathrm{Eff}_\&$ indicates a more efficient method $\&$.

The results are reported in Table 1, which shows that:
(1) the estimate $H_{cu}(\mathbf{X}_n)$ is better than the collective premium $H_{col}$;
(2) $V_{cu}$ is slightly larger than $V_{Bul}$, but the differences become negligible as the sample size increases;
(3) the estimates $H_{cu}(\mathbf{X}_n)$ and $H_{Bul}(\mathbf{X}_n)$ are both very close to the Bayes premium $H_B(\mathbf{X}_n)$.
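The exact columns of Table 1 can be reproduced from the closed forms above. The Python sketch below (a hypothetical helper) evaluates $V_{col}$ and $V_B$ for $\beta = 0.2$. For example, at $\lambda = 0.2$ and $n = 30$ it gives $V_{col} = 61$ and $V_B \approx 0.2406$, which agrees with the tabulated 0.2405 up to the last digit. $V_{cu}$ and $V_{Bul}$ require simulation and are not reproduced here.

```python
def poisson_exponential_mse(lam, n, beta):
    """Exact V_col and V_B of Example 3.2 (Poisson claims, exponential prior with rate lam)."""
    v_col = (beta + 1) ** 2 / lam ** 2 + beta ** 2 / lam ** 4
    m = ((n + lam) * (beta + 1) + beta) / (n + lam) ** 2   # M
    d = m * n - 1 - beta                                   # Mn - 1 - beta
    v_b = (m * m * n * lam + d * d + (m * lam + d) ** 2) / lam ** 2
    return v_col, v_b


for lam in (0.2, 0.5, 1.0):
    print(lam, [round(v, 4) for v in poisson_exponential_mse(lam, 30, 0.2)])
```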
TABLE 1
Numerical Results of V_& and Eff_& for n = 30 and n = 100 (β = 0.2)
        |------------------------ n = 30 ------------------------|------------------------ n = 100 -----------------------|
λ       V_B     V_cu    Eff_cu  V_Bul   Eff_Bul  V_col            V_B     V_cu    Eff_cu  V_Bul   Eff_Bul  V_col
0.2     0.2405  0.4050  0.9822  0.3937  0.9832   61.000           0.0720  0.0958  0.9996  0.1180  0.9992   61.000
0.3     0.1530  0.1582  0.9886  0.1538  0.9893   20.938           0.0479  0.0842  0.9982  0.0779  0.9985   20.938
0.4     0.1189  0.1365  0.9907  0.1546  0.9863   10.562           0.0359  0.0496  0.9986  0.0503  0.9986   10.562
0.5     0.0947  0.18    0.9885  0.0840  0.9939   6.4000           0.0270  0.0306  0.9994  0.0285  0.9997   6.4000
0.6     0.0786  0.0803  0.9886  0.0792  0.9919   4.3086           0.0238  0.0288  0.9988  0.0270  0.9992   4.3086
0.7     0.0671  0.0769  0.9906  0.0720  0.9929   3.1053           0.0180  0.0207  0.9991  0.0199  0.9993   3.1053
0.8     0.0585  0.0847  0.9829  0.0876  0.9812   2.3476           0.0160  0.0174  0.9993  0.0170  0.9995   2.3476
0.9     0.0518  0.0399  0.9916  0.0383  0.9927   1.8387           0.0158  0.0162  0.9998  0.0160  0.9999   1.8387
1.0     0.0465  0.0656  0.9795  0.0664  0.9789   1.4800           0.0142  0.0171  0.9980  0.0171  0.9980   1.4800
Note: V_col does not depend on the sample size and is therefore the same for n = 30 and n = 100.
$H_n = \sum_{i=1}^{n} X_ie^{hX_i}/\sum_{i=1}^{n} e^{hX_i}$ and $h_n(\theta) = E(H_n \mid \theta)$. Here $E_{\tilde\pi}$, $\mathrm{Var}_{\tilde\pi}$, and $\mathrm{Cov}_{\tilde\pi}$ denote, respectively, the expectation, variance, and covariance with respect to a fictitious distribution of $\theta$ with density $\tilde\pi(\theta) = \pi(\theta)m_h(\theta)/m_h$, where $m_h(\theta) = E(e^{hX} \mid \theta)$ and $m_h = E[m_h(\theta)] = E(e^{hX})$. See, for example, Pan et al. (2008) or Wen et al. (2009). In addition, $Z_h = Z\hat h_n/(Z\hat h_n + (1 - Z)h_0)$, where $\hat h_n = n^{-1}\sum_{i=1}^{n} e^{hX_i}$, $h_0 = E[e^{hX}]$, and $Z$ is given by (2.8) in Theorem 2.1. Note that the individual and collective premiums do not depend on $n$.
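The credibility form of the Esscher premium follows from evaluating $E[Xe^{hX}]/E[e^{hX}]$ under the mixture $\hat S = ZS_n + (1-Z)S_0$. The Python sketch below checks numerically that this equals $Z_hH_n + (1-Z_h)H_{col}$. For illustration it assumes an exponential collective distribution with rate $r = 1$, for which $h_0 = r/(r-h)$ and $H_{col} = 1/(r-h)$; the value of $Z$ and the sample are also assumptions.

```python
import math


def esscher_of_mixture(sample, z, h, rate0):
    """Esscher premium E[X e^{hX}] / E[e^{hX}] under S_hat = Z S_n + (1 - Z) S_0,
    with a hypothetical exponential collective S_0 of rate rate0 > h."""
    n = len(sample)
    emp_w = sum(math.exp(h * x) for x in sample) / n            # h_n-hat
    emp_xw = sum(x * math.exp(h * x) for x in sample) / n
    col_w = rate0 / (rate0 - h)                                   # h_0 = E_0[e^{hX}]
    col_xw = rate0 / (rate0 - h) ** 2                             # E_0[X e^{hX}]
    return (z * emp_xw + (1 - z) * col_xw) / (z * emp_w + (1 - z) * col_w)


def esscher_credibility_form(sample, z, h, rate0):
    """The same premium written as Z_h H_n + (1 - Z_h) H_col."""
    w = [math.exp(h * x) for x in sample]
    h_n = sum(x * wi for x, wi in zip(sample, w)) / sum(w)      # H_n
    h_hat = sum(w) / len(sample)                                  # h_n-hat
    h0 = rate0 / (rate0 - h)
    h_col = 1.0 / (rate0 - h)                                     # collective Esscher premium
    z_h = z * h_hat / (z * h_hat + (1 - z) * h0)
    return z_h * h_n + (1 - z_h) * h_col


sample = [0.5, 1.5, 2.0]  # made-up claims
print(esscher_of_mixture(sample, 0.4, 0.3, 1.0), esscher_credibility_form(sample, 0.4, 0.3, 1.0))
```

Note that $Z_h$ depends on the data through $\hat h_n$, unlike the fixed factor $Z$.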
Since the Esscher premium principle is obtained by minimizing the exponentially weighted quadratic error in (3.9), we measure the closeness of the experience premiums $H_\&(\mathbf{X}_n)$ by the weighted quadratic loss $E[L(H_\&(\mathbf{X}_n))] = E[(X_{n+1} - H_\&(\mathbf{X}_n))^2e^{hX_{n+1}}]$, for $\& = col, B, P, G$, and $cu$. This parallels what we did for Bühlmann's credibility formula with quadratic errors in Section 3.2.1. In terms of the risk premium $H(X \mid \theta)$, this loss can be written as
$$E[L(H_\&(\mathbf{X}_n))] = E\big[(X_{n+1} - H(X \mid \theta))^2e^{hX_{n+1}}\big] + E\big[(H(X \mid \theta) - H_\&(\mathbf{X}_n))^2e^{hX_{n+1}}\big]. \qquad (3.10)$$
The cross term vanishes because $E[(X_{n+1} - H(X \mid \theta))e^{hX_{n+1}} \mid \theta] = 0$ by the definition of the Esscher premium. The first term on the right-hand side does not depend on $\&$. Hence, comparing $E[L(H_\&(\mathbf{X}_n))]$ reduces to comparing
$$V_\& = E\big[(H(X \mid \theta) - H_\&(\mathbf{X}_n))^2e^{hX_{n+1}}\big] = E\big[(H(X \mid \theta) - H_\&(\mathbf{X}_n))^2m_h(\theta)\big]. \qquad (3.11)$$
Obviously $E[L(H_B(\mathbf{X}_n))] \le E[L(H_\&(\mathbf{X}_n))]$ for all $\&$, and $E[L(H_\&(\mathbf{X}_n))] \le E[L(H_{col}(\mathbf{X}_n))]$ for $\& = B, P$, and $G$. Thus $H_B(\mathbf{X}_n)$ is the best of the $H_\&(\mathbf{X}_n)$. We do not know in general whether $H_{cu}$ is better than $H_P$, $H_G$, and $H_{col}$. Interestingly, however, $H_{cu}(\mathbf{X}_n)$ is optimal under the Bernoulli-Uniform model, where it coincides with the Bayes premium $H_B(\mathbf{X}_n)$; this is stated in Example 3.3 below. Example 3.4 compares $V_\&$ for $\& = G, P$, and $cu$ under the Poisson-Gamma model.
Example 3.3 (Bernoulli-Uniform Model). Let $X$ be a Bernoulli variable given $\theta$, with $\Pr(X = 1 \mid \theta) = 1 - \Pr(X = 0 \mid \theta) = \theta$, and let $\theta$ be uniformly distributed over the interval $(0,1)$. Then $H_{cu}(\mathbf{X}_n) = H_B(\mathbf{X}_n)$; see Section A.3.2 of the appendices for a proof.
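A quick exact check of Example 3.3 is possible. Under the weighted loss above, $H_B$ is the Esscher premium of the posterior predictive distribution. Under the uniform prior, this distribution is Bernoulli with success probability $(s+1)/(n+2)$, where $s$ is the number of claims equal to 1. $H_{cu}$ is the Esscher premium of $\hat S$, which is Bernoulli with success probability $\hat S(0,\theta)$. The Python sketch below computes $Z$ from (2.8) for this model, using $\tau_0^2 = 1/12$ and $\sigma_0^2 = 1/6$ (our own calculation), and compares the two probabilities with exact rational arithmetic. The function names are hypothetical.

```python
import math
from fractions import Fraction


def cu_probability(s, n):
    """S_hat on [0, 1) in the Bernoulli-Uniform model, with s = number of ones among n claims.

    On [0, 1), S(x, theta) = theta, so by (2.8) tau_0^2 = Var(theta) = 1/12 and
    sigma_0^2 = E[theta (1 - theta)] = 1/6; elsewhere S(x, theta) is 0 or 1.
    """
    tau0_sq, sigma0_sq = Fraction(1, 12), Fraction(1, 6)
    z = n * tau0_sq / (n * tau0_sq + sigma0_sq)          # equals n / (n + 2)
    return z * Fraction(s, n) + (1 - z) * Fraction(1, 2)  # S_0 = 1/2 on [0, 1)


def bayes_probability(s, n):
    """Posterior predictive Pr(X_{n+1} = 1 | data) under the uniform prior."""
    return Fraction(s + 1, n + 2)


def esscher_bernoulli(p, h):
    """Esscher premium E[X e^{hX}] / E[e^{hX}] of a Bernoulli(p) risk."""
    p = float(p)
    return p * math.exp(h) / (p * math.exp(h) + 1.0 - p)


# Both premiums are Esscher premiums of a Bernoulli law, so they agree whenever the two
# success probabilities agree; check this exactly for every outcome with n <= 20.
print(all(cu_probability(s, n) == bayes_probability(s, n)
          for n in range(1, 21) for s in range(n + 1)))  # True
```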
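Example 3.4 (Poisson-Gamma Model). Given $\theta$, let $X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Poisson}(\theta)$, and let $\theta \sim \mathrm{Gamma}(\alpha, \beta)$ with density $\pi(\theta) = \beta^\alpha\theta^{\alpha-1}e^{-\beta\theta}/\Gamma(\alpha)$, $\theta > 0$, $\alpha > 2$, $\beta > 0$. Then $E(X \mid \theta) = \mathrm{Var}(X \mid \theta) = \theta$, and given $\mathbf{X}_n$ the posterior distribution of $\theta$ is $\mathrm{Gamma}(\alpha + n\bar X_n, \beta + n)$. The corresponding premiums are listed in Table 2, where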
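the term "approx" indicates that Pan's premium $H_P(\mathbf{X}_n)$ can be computed only by Monte Carlo approximation (an algorithm is given in Algorithm A.1).

TABLE 2
Experience Premiums under the Poisson-Gamma Model
Premium     Individual   Collective   Bayes       Pan         Gerber      Current
Notation    H(X|θ)       H_col(X)     H_B(X_n)    H_P(X_n)    H_G(X_n)    H_cu(X_n)

For $H_{cu}(\mathbf{X}_n)$, the credibility factor is $Z_h = Z\hat h_n/(Z\hat h_n + (1 - Z)h_0)$, with $Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2)$,
$$\tau_0^2 = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\min(i,j)}{i!\,j!}\left[\frac{\beta^\alpha\Gamma(\alpha + i + j)}{\Gamma(\alpha)(\beta + 2)^{\alpha+i+j}} - \frac{\beta^{2\alpha}\Gamma(\alpha + i)\Gamma(\alpha + j)}{\Gamma(\alpha)^2(\beta + 1)^{2\alpha+i+j}}\right] \qquad (3.12)$$
and
$$\sigma_0^2 = \frac{\alpha}{\beta} - \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\min(i,j)}{i!\,j!}\cdot\frac{\beta^\alpha\Gamma(\alpha + i + j)}{\Gamma(\alpha)(\beta + 2)^{\alpha+i+j}}; \qquad (3.13)$$
see Section A.3.4 of the appendices for proofs of both (3.12) and (3.13) and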
It is also interesting that
$$H_G(\mathbf{X}_n) = H_B(\mathbf{X}_n); \qquad (3.14)$$
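The series (3.12) and (3.13) are easy to evaluate. One way to read them (our derivation, offered as intuition) starts from the fact that $\int_0^\infty I(X > x)I(X' > x)\,dx = \min(X, X')$ for nonnegative integer claims. Then $\tau_0^2$ is $E[\min(X, X')]$ for two claims sharing the same $\theta$, minus the same quantity for claims with independent $\theta$s. Likewise, $\sigma_0^2 = E[X] - E[\min(X, X')]$ for a shared $\theta$. The Python sketch below truncates both double series and works with log-probabilities via `math.lgamma` to avoid overflow. The cutoff `kmax`, the prior parameters, $n$, and the function names are assumptions.

```python
import math


def log_pair_prob(i, j, alpha, beta):
    """log P(X = i, X' = j) when X, X' are Poisson(theta) given the same theta ~ Gamma(alpha, rate beta)."""
    return (alpha * math.log(beta) + math.lgamma(alpha + i + j) - math.lgamma(alpha)
            - (alpha + i + j) * math.log(beta + 2) - math.lgamma(i + 1) - math.lgamma(j + 1))


def log_marginal(i, alpha, beta):
    """log P(X = i) for the negative binomial marginal of a single claim count."""
    return (alpha * math.log(beta) + math.lgamma(alpha + i) - math.lgamma(alpha)
            - (alpha + i) * math.log(beta + 1) - math.lgamma(i + 1))


def structure_parameters(alpha, beta, kmax=200):
    """(tau_0^2, sigma_0^2) from (3.12)-(3.13), truncating both double series at kmax."""
    same_theta = independent = 0.0
    for i in range(1, kmax):
        for j in range(1, kmax):
            m = min(i, j)
            same_theta += m * math.exp(log_pair_prob(i, j, alpha, beta))
            independent += m * math.exp(log_marginal(i, alpha, beta) + log_marginal(j, alpha, beta))
    return same_theta - independent, alpha / beta - same_theta


tau0_sq, sigma0_sq = structure_parameters(alpha=3.0, beta=1.5)  # illustrative prior
n = 10
print(tau0_sq, sigma0_sq, n * tau0_sq / (n * tau0_sq + sigma0_sq))  # last value: Z from (2.8)
```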
We conclude this section with two remarks. First, there are cases where $H_{cu}$ is optimal. Second, even when $H_{cu}$ is not optimal, it is close to the optimum. The second remark is strongly supported by the numerical results in Table 3, where the lowest efficiency of $H_{cu}$ is 0.9217, at $\alpha = 7.5$ and $n = 10$ (a very small sample size).
TABLE 3
Numerical Results of $V_\star$ and $\mathrm{Eff}_\star$ for $n = 10$ and $n = 100$

| Parameter | V_B (n=10) | V_cu (n=10) | Eff_cu (n=10) | V_P (n=10) | Eff_P (n=10) | V_col (n=10) | V_B (n=100) | V_cu (n=100) | Eff_cu (n=100) | V_P (n=100) | Eff_P (n=100) | V_col (n=100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.0 | 0.1128 | 0.1740 | 0.9475 | 0.1521 | 0.9663 | 1.2776 | 0.0151 | 0.0368 | 0.9828 | 0.0369 | 0.9827 | 1.2776 |
| 2.5 | 0.1320 | 0.1847 | 0.9668 | 0.1697 | 0.9762 | 1.7191 | 0.0217 | 0.0398 | 0.9893 | 0.0391 | 0.9897 | 1.7191 |
| 3.0 | 0.2087 | 0.2574 | 0.9758 | 0.2423 | 0.9833 | 2.2207 | 0.0277 | 0.0517 | 0.9891 | 0.0515 | 0.9892 | 2.2207 |
| 3.5 | 0.2245 | 0.3002 | 0.9705 | 0.2829 | 0.9772 | 2.7889 | 0.0292 | 0.0651 | 0.9870 | 0.0654 | 0.9869 | 2.7889 |
| 4.0 | 0.3390 | 0.5018 | 0.9474 | 0.4518 | 0.9635 | 3.4311 | 0.0390 | 0.1177 | 0.9768 | 0.1121 | 0.9784 | 3.4311 |
| 4.5 | 0.3898 | 0.5987 | 0.9445 | 0.5144 | 0.9669 | 4.1551 | 0.0484 | 0.1423 | 0.9771 | 0.1453 | 0.9764 | 4.1551 |
| 5.0 | 0.4718 | 0.7885 | 0.9296 | 0.6411 | 0.9624 | 4.9698 | 0.0733 | 0.2139 | 0.9713 | 0.2166 | 0.9707 | 4.9698 |
| 5.5 | 0.4463 | 0.7589 | 0.9425 | 0.6646 | 0.9598 | 5.8848 | 0.0647 | 0.1809 | 0.9800 | 0.1678 | 0.9823 | 5.8848 |
| 6.0 | 0.5736 | 0.8912 | 0.9499 | 0.7999 | 0.9643 | 6.9106 | 0.1004 | 0.2412 | 0.9793 | 0.2403 | 0.9795 | 6.9106 |
| 6.5 | 0.6296 | 1.1399 | 0.9313 | 0.9642 | 0.9550 | 8.0590 | 0.1181 | 0.3274 | 0.9736 | 0.3052 | 0.9764 | 8.0590 |
| 7.0 | 0.8354 | 1.3253 | 0.9424 | 1.1425 | 0.9639 | 9.3425 | 0.1248 | 0.3946 | 0.9707 | 0.3773 | 0.9726 | 9.3425 |
| 7.5 | 1.0700 | 1.8302 | 0.9217 | 1.5353 | 0.9520 | 10.7752 | 0.1745 | 0.6393 | 0.9562 | 0.6018 | 0.9597 | 10.7752 |
| 8.0 | 1.3422 | 2.0515 | 0.9357 | 1.7321 | 0.9646 | 12.3724 | 0.1542 | 0.5338 | 0.9689 | 0.5107 | 0.9708 | 12.3724 |
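The efficiency columns of Table 3 can be recomputed from the risk columns. Our reading, inferred by analogy with (3.15) rather than from a definition stated in this section, is $\mathrm{Eff}_\star=(V_{col}-V_\star)/(V_{col}-V_B)$. The Python sketch below applies it to the $n = 10$ rows and locates the smallest and largest efficiencies of $H_{cu}$; because the inputs are rounded to four decimals, agreement is checked only to within a few units in the fourth decimal.

```python
# Table 3, n = 10: (parameter, V_B, V_cu, Eff_cu, V_P, Eff_P, V_col)
TABLE3_N10 = [
    (2.0, 0.1128, 0.1740, 0.9475, 0.1521, 0.9663, 1.2776),
    (2.5, 0.1320, 0.1847, 0.9668, 0.1697, 0.9762, 1.7191),
    (3.0, 0.2087, 0.2574, 0.9758, 0.2423, 0.9833, 2.2207),
    (3.5, 0.2245, 0.3002, 0.9705, 0.2829, 0.9772, 2.7889),
    (4.0, 0.3390, 0.5018, 0.9474, 0.4518, 0.9635, 3.4311),
    (4.5, 0.3898, 0.5987, 0.9445, 0.5144, 0.9669, 4.1551),
    (5.0, 0.4718, 0.7885, 0.9296, 0.6411, 0.9624, 4.9698),
    (5.5, 0.4463, 0.7589, 0.9425, 0.6646, 0.9598, 5.8848),
    (6.0, 0.5736, 0.8912, 0.9499, 0.7999, 0.9643, 6.9106),
    (6.5, 0.6296, 1.1399, 0.9313, 0.9642, 0.9550, 8.0590),
    (7.0, 0.8354, 1.3253, 0.9424, 1.1425, 0.9639, 9.3425),
    (7.5, 1.0700, 1.8302, 0.9217, 1.5353, 0.9520, 10.7752),
    (8.0, 1.3422, 2.0515, 0.9357, 1.7321, 0.9646, 12.3724),
]


def efficiency(v_star, v_bayes, v_col):
    """Fraction of the collective-to-Bayes risk gap closed by a premium with risk v_star."""
    return (v_col - v_star) / (v_col - v_bayes)


def recompute(rows):
    """Return (parameter, Eff_cu, Eff_P) recomputed from the V columns."""
    return [(param, efficiency(v_cu, v_b, v_col), efficiency(v_p, v_b, v_col))
            for param, v_b, v_cu, _, v_p, _, v_col in rows]


if __name__ == "__main__":
    results = recompute(TABLE3_N10)
    worst = min(results, key=lambda r: r[1])
    best = max(results, key=lambda r: r[1])
    print(f"Eff_cu ranges from {worst[1]:.4f} (at {worst[0]}) to {best[1]:.4f} (at {best[0]})")
```

Under this reading, the recomputed extremes of $\mathrm{Eff}_{cu}$ for $n = 10$ are 0.9217 at 7.5 and 0.9758 at 3.0, the 92.17% and 97.58% quoted in Section 4.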
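TABLE 4
Averages of 100 ISEs of the Estimates of the ddf

| Policy No. i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| ISE of homogeneous estimate of $S(x,\Theta_i)$ | 1.832 | 1.752 | 1.801 | 1.797 | 1.870 | 1.859 | 1.838 | 1.739 | 1.861 | 1.796 |
| ISE of inhomogeneous estimate of $S(x,\Theta_i)$ | 1.914 | 1.799 | 1.868 | 1.854 | 1.937 | 1.936 | 1.903 | 1.793 | 1.933 | 1.870 |
| ISE of empirical Bayes estimate of $S(x,\Theta_i)$ | 1.973 | 1.858 | 1.979 | 1.909 | 2.006 | 2.011 | 1.958 | 1.860 | 1.991 | 1.915 |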
were computed and, after multiplication by 100 to bring the values to a moderate scale, are listed in Table 4. In terms of the average ISE, the empirical Bayes estimation is slightly worse than the inhomogeneous estimation, and the latter is slightly worse than the homogeneous estimation. This loss of accuracy is attributable to the additional estimation of the unknown structure parameters $\sigma_0^2$, $\tau_0^2$, and $S_0(x)$.
This simulation also computed the averages of the squared errors (SEs) of the empirical premiums obtained by plugging in the empirical Bayes estimate of $S(x,\Theta_i)$. The squared differences between the empirical and theoretical premiums under the net, variance, modified variance, and standard deviation principles, multiplied by 10, are listed in Table 5.
To measure the efficiency of the empirical premium computed from the empirical Bayes estimate of $S(x,\Theta)$, we computed the quantity
$$\mathrm{Eff}=\frac{\mathrm{ASE(col)}-\mathrm{ASE(EB)}}{\mathrm{ASE(col)}-\mathrm{ASE(Bayes)}}, \qquad (3.15)$$
where ASE(·) is the average of the squared errors of the indicated premium under a given premium principle $H$: col denotes the collective premium $H(X)$, EB the empirical premium computed from the empirical Bayes estimate, and Bayes the Bayesian premium computed as follows. Under the distributional setting of the simulation, the predictive distribution of a future loss $X_{i,n_i+1}$ given $\{X_{ij}, j = 1, 2, \ldots, n_i\}$ is
TABLE 5
Averages of 10 SEs for Empirical Premiums

| Policy No. i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| SE of Net | 0.731 | 0.658 | 0.778 | 0.679 | 0.848 | 0.699 | 0.794 | 0.683 | 0.764 | 0.753 |
| SE of Var. | 5.310 | 5.246 | 40.51 | 5.794 | 13.30 | 4.771 | 7.171 | 8.373 | 10.28 | 11.93 |
| SE of ModVar | 1.579 | 1.522 | 1.842 | 1.460 | 1.771 | 1.562 | 1.695 | 1.548 | 1.704 | 1.691 |
| SE of StDev | 1.303 | 1.189 | 1.439 | 1.188 | 1.484 | 1.249 | 1.403 | 1.232 | 1.374 | 1.362 |
TABLE 6
Efficiency of Empirical Premiums with Respect to Collective Premium

| Policy No. i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Eff of Net | 0.988 | 0.974 | 1.000 | 0.991 | 0.971 | 0.982 | 0.978 | 0.973 | 0.956 | 0.957 |
| Eff of Var. | 0.868 | 0.614 | 0.793 | 0.917 | 0.834 | 0.851 | 0.906 | 0.949 | 0.889 | 0.801 |
| Eff of ModVar | 0.934 | 0.930 | 0.958 | 0.956 | 0.937 | 0.931 | 0.943 | 0.931 | 0.922 | 0.924 |
| Eff of StDev | 0.975 | 0.966 | 0.991 | 0.985 | 0.965 | 0.970 | 0.973 | 0.963 | 0.948 | 0.953 |
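the Pareto distribution with shape parameter $n_i+\alpha$ and scale parameter $\sum_{j=1}^{n_i}X_{ij}+\beta$, so that
$$E\big[X_{i,n_i+1}\mid X_{i1},\ldots,X_{in_i}\big]=\frac{\sum_{j=1}^{n_i}X_{ij}+\beta}{n_i+\alpha-1}\quad\text{and}\quad \mathrm{Var}\big(X_{i,n_i+1}\mid X_{i1},\ldots,X_{in_i}\big)=\frac{(n_i+\alpha)\big(\sum_{j=1}^{n_i}X_{ij}+\beta\big)^2}{(n_i+\alpha-1)^2(n_i+\alpha-2)}.$$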
The Bayes premium for $X_{i,n_i+1}$ was then computed by substituting this predictive distribution for the risk distribution $S(x,\Theta)$ in the net, variance, modified variance, and standard deviation premium principles. The resulting efficiencies under these four principles are listed in Table 6. The empirical Bayes premiums are highly efficient under the net, modified variance, and standard deviation principles (at least 0.922 for every policy), while their efficiency under the variance principle is lower and more variable (from 0.614 to 0.949).
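The Bayes premiums in this simulation depend only on the first two moments of the Pareto predictive distribution. The Python sketch below is our illustration: it computes the net, variance, modified variance, and standard deviation premiums from a mean and variance using the textbook forms $E[X]$, $E[X]+a\,\mathrm{Var}(X)$, $E[X]+a\,\mathrm{Var}(X)/E[X]$, and $E[X]+a\sqrt{\mathrm{Var}(X)}$, with a loading `a` that we chose arbitrarily (the loadings used in the paper are not given in this section). It also shows how the same moments follow from any survival function of a nonnegative loss via $E[X]=\int_0^\infty S(x)\,dx$ and $E[X^2]=\int_0^\infty 2xS(x)\,dx$, which is one way to evaluate a plug-in premium from an estimated survival function. The claims and prior parameters in the check are hypothetical.

```python
import numpy as np


def lomax_moments(shape, scale):
    """Mean and variance of the Pareto (Lomax) law with S(x) = (scale / (scale + x))**shape."""
    if shape <= 2:
        raise ValueError("the variance requires shape > 2")
    mean = scale / (shape - 1)
    var = shape * scale**2 / ((shape - 1) ** 2 * (shape - 2))
    return mean, var


def moments_from_survival(surv, upper, num=400_001):
    """E[X] = int S(x) dx and Var(X) from int 2x S(x) dx, trapezoid rule on [0, upper], X >= 0."""
    x = np.linspace(0.0, upper, num)
    h = x[1] - x[0]

    def trap(y):
        return h * (y.sum() - 0.5 * (y[0] + y[-1]))

    s = surv(x)
    m1 = trap(s)
    m2 = trap(2.0 * x * s)
    return m1, m2 - m1**2


def premiums(mean, var, a):
    """Net, variance, modified variance and standard deviation premiums with loading a."""
    return {
        "net": mean,
        "variance": mean + a * var,
        "modified_variance": mean + a * var / mean,
        "std_dev": mean + a * np.sqrt(var),
    }


if __name__ == "__main__":
    claims = [0.8, 1.5, 0.3, 2.2, 1.0]   # hypothetical past losses of one policy
    alpha, beta = 3.0, 2.0               # hypothetical prior parameters
    shape, scale = len(claims) + alpha, sum(claims) + beta
    print(premiums(*lomax_moments(shape, scale), a=0.1))
```

Both routes give the same moments for the Pareto predictive law, so either the closed form or the numerical integral can feed `premiums`; the numerical route is the one that carries over to an estimated survival function.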
4. CONCLUDING REMARKS
We have developed a completely nonparametric estimation method for loss distributions and established a unified distribution-free approach to experience rating for arbitrary premium principles. The method combines the advantages of Bühlmann's credibility theory and Ferguson's nonparametric Bayes premiums and thus provides a powerful tool for experience rating under the growing body of premium principles developed in general insurance. We have demonstrated under a number of premium principles that, although the new approach does not guarantee theoretical optimality, it produces premiums close to the optimal ones. In the examples examined in Section 3.2.3 for the Esscher premium principle, the efficiency relative to the optimal premium ranges from 92.17% to 97.58% even with a sample size as small as n = 10 (cf. Table 3).
This new approach can be applied broadly to almost all premium pricing problems in general insurance, including health care, income protection, property, financial products, and business. More broadly, our distribution-free approach to estimating distribution functions can be applied in many other areas, such as reserve evaluation (including incurred but not reported and reported but not settled claims) to predict outstanding claim losses, Bonus-Malus insurance systems (cf. Ferreira 1974; Lemaire 1995) that grant premium discounts to low risks based on the past year's experience, the optimal claim decision problem of policyholders (see, e.g., Haehling von Lanzenauer 1974; Braun et al. 2006), health care cost analysis (Bertsimas et al. 2008; Enthoven and Fuchs 2006; Stephens et al. 2005), and simulation of health insurance markets (Feldman and Dowd 1982). The approach is also useful in economics, finance, and other areas where past experience influences present and future risks.
The data structure we have used is, however, limited to the Bühlmann type (conditionally i.i.d.). The extension of the approach to the Bühlmann-Straub model (Bühlmann and Straub 1970) is not difficult. There remain, however, further interesting topics for future research, including problems where the data possess hierarchical structures, losses or risks with regression structures depending on covariates, and correlation structures such as panel data. It would be desirable and of practical importance to investigate whether results parallel to those found here can be derived under such data settings. On the other hand, our approach has been established through optimal estimation of the risk distributions under the $L^2$ distance, for which the optimization can be carried out by solving derivative (first-order) equations. It will be interesting to investigate whether distribution-free approaches of comparable performance can be developed under other distance measures. Another interesting topic is to identify theoretically the conditions under which the experience ratings obtained by inserting the estimated distribution agree with existing ones, such as Bühlmann's credibility premiums for the net and variance premium principles, and Gerber's and Pan's versions for Esscher premiums.
FUNDING
The authors acknowledge the support of GRF Grants No. 410211 and 410213 from the Research Grants Council of Hong
Kong, for X. Q. Cai, NSFC Grant No. 71361015, Jiangxi Provincial Natural Science Foundation Grant No. 20142BAB201013,
No. 2013M540534 from the China Postdoctoral Science Foundation, No. 2014T70615 from the China Postdoctoral Fund Special Project for L. M. Wen, and Shanghai Philosophy and Social Science Foundation Grant No. 2010BJB004, the 111 Project under Grant No. B14019, and NSFC Grant No. 71371074 for X. Y. Wu.
REFERENCES
Antoniak, C. E. 1974. Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. Annals of Statistics 2(6): 1152–1174.
Bertsimas, D., M. V. Bjarnadottir, M. A. Kane, J. C. Kryder, R. Pandey, S. Vempala, and G. Wang. 2008. Algorithmic Prediction of Health-Care Costs. Operations Research 56: 1382–1392.
Braun, M., P. S. Fader, E. T. Bradlow, and H. Kunreuther. 2006. Modeling the Pseudodeductible in Insurance Claims Decisions. Management Science 52(8): 1258–1272.
Bühlmann, H. 1967. Experience Rating and Credibility. ASTIN Bulletin 4: 199–207.
Bühlmann, H. 1970. Mathematical Methods in Risk Theory. Berlin: Springer-Verlag.
Bühlmann, H. 1980. An Economic Premium Principle. ASTIN Bulletin 11: 52–60.
Bühlmann, H., and A. Gisler. 2005. A Course in Credibility Theory and Its Applications. Amsterdam: Springer.
Bühlmann, H., and E. Straub. 1970. Glaubwürdigkeit für Schadensätze. Bulletin of the Swiss Association of Actuaries 70(1): 111–133.
DasGupta, A. 2008. Asymptotic Theory of Statistics and Probability. New York: Springer Science+Business Media.
Dhaene, J., S. Vanduffel, M. J. Goovaerts, R. Kaas, Q. Tang, and D. Vyncke. 2006. Risk Measures and Comonotonicity: A Review. Stochastic Models 22: 573–606.
Enthoven, A. C., and V. R. Fuchs. 2006. Employment-Based Health Insurance: Past, Present, and Future. Health Affairs 25: 1538–1547.
Feldman, R. D., and B. E. Dowd. 1982. Simulation of a Health Insurance Market with Adverse Selection. Operations Research 30: 1027–1042.
Ferguson, T. 1973. A Bayesian Analysis of Some Non-parametric Problems. Annals of Statistics 1(2): 209–230.
Ferreira, J. 1974. The Long-Term Effects of Merit-Rating Plans on Individual Motorists. Operations Research 22: 954–978.
Furman, E., and R. Zitikis. 2008. Weighted Premium Principles. Insurance: Mathematics and Economics 42(1): 459–465.
Gerber, H. U. 1980. Credibility for Esscher Premium. Mitteilungen der Vereinigung schweiz. Versicherungsmathematiker 3: 307–312.
Ghosh, J. K., and R. V. Ramamoorthi. 2003. Bayesian Nonparametrics. Springer Series in Statistics. New York: Springer-Verlag.
Gomez, E., A. Hernandez, and F. J. Vazquez-Polo. 2000. Robust Bayesian Premium Principles in Actuarial Science. Journal of the Royal Statistical Society, Series D 49(2): 241–252.
Gomez, E., A. Hernandez, and F. J. Vazquez-Polo. 2006. On the Use of Posterior Regret Γ-Minimax Actions to Obtain Credibility Premiums. Insurance: Mathematics and Economics 39(1): 115–121.
Goovaerts, M. J., R. Kaas, A. E. Van Heerwaarden, and T. Bauwelinckx. 1990. Effective Actuarial Methods. Amsterdam: North-Holland.
Hachemeister, C. A. 1975. Credibility for Regression Models with Application to Trend. In Credibility: Theory and Applications, Proceedings of the Berkeley Actuarial Research Conference on Credibility, pp. 129–163. New York: Academic.
Haehling von Lanzenauer, C. 1974. Optimal Claim Decisions by Policyholders in Automobile Insurance with Merit-Rating Structures. Operations Research 22: 979–990.
Heilmann, W. R. 1989. Decision Theoretic Foundations of Credibility Theory. Insurance: Mathematics and Economics 8: 77–95.
Jewell, W. S. 1974. The Credible Distribution. ASTIN Bulletin 7(3): 237–269.
Kaas, R., M. Goovaerts, J. Dhaene, and M. Denuit. 2001. Modern Actuarial Risk Theory. New York: Kluwer Academic.
Klugman, S. A. 1992. Bayesian Statistics in Actuarial Science: With Emphasis on Credibility. Boston: Kluwer.
Lau, J. W., T. K. Siu, and H. Yang. 2006. On Bayesian Mixture Credibility. ASTIN Bulletin 36(2): 573–588.
Lemaire, J. 1995. Bonus-Malus Systems in Automobile Insurance. New York: Kluwer Academic.
Makov, U. E., A. F. M. Smith, and Y. H. Liu. 1996. Bayesian Methods in Actuarial Science. Journal of the Royal Statistical Society, Series D 45(4): 503–515.
Mashayekhi, M. 2002. On Asymptotic Optimality in Empirical Bayes Credibility. Insurance: Mathematics and Economics 31: 285–295.
Natarajan, K., D. Pachamanova, and M. Sim. 2009. Constructing Risk Measures from Uncertainty Sets. Operations Research 57(5): 1129–1141.
Norberg, R. 1980. Empirical Bayes Credibility. Scandinavian Actuarial Journal 1980: 172–194.
Norberg, R. 2004. Credibility Theory. In Encyclopedia of Actuarial Science, edited by J. Teugels and B. Sundt. Chichester, UK: Wiley.
Pai, J. S. 1997. Bayesian Analysis of Compound Loss Distributions. Journal of Econometrics 79(1): 129–146.
Pan, M., R. Wang, and X. Wu. 2008. On the Consistency of Credibility Premiums Regarding Esscher Principle. Insurance: Mathematics and Economics 42: 119–126.
Pitselis, G. 2004. A Seemingly Unrelated Regression Model in a Credibility Framework. Insurance: Mathematics and Economics 34: 37–54.
Robbins, H. 1955. An Empirical Bayes Approach to Statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1: 157–164.
Robbins, H. 1964. The Empirical Bayes Approach to Statistical Decision Problems. Annals of Mathematical Statistics 35: 1–20.
Schmidt, K. D. 1991. Convergence of Bayes and Credibility Premiums. ASTIN Bulletin 20(2): 167–172.
Schmidt, K. D. 1998. Bayesian Models in Actuarial Mathematics. Mathematical Methods of Operations Research 48: 117–146.
Stephens, C. R., H. Waelbroeck, and S. Talley. 2005. Predicting Healthcare Costs Using GAs. In Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation, June 25–26, Washington, D.C., GECCO '05, pp. 159–163. New York: ACM. http://doi.acm.org/10.1145/1102256.1102291.
Sundt, B. 1999. An Introduction to Non-life Insurance Mathematics. 4th edition. Karlsruhe: Verlag Versicherungswirtschaft.
Szego, G. 2002. Measures of Risk. Journal of Banking and Finance 26: 1253–1272.
Wen, L., X. Wu, and X. Zhao. 2009. The Credibility Estimators under Generalized Weighted Loss Functions. Journal of Industrial and Management Optimization 5(4): 893–910.
Wu, X., and X. Zhou. 2006. A New Characterization of Distortion Premiums Via Countable Additivity for Comonotonic Risks. Insurance: Mathematics and Economics 38: 324–334.
Young, V. R. 2004. Premium Principles. In Encyclopedia of Actuarial Science, edited by J. Teugels and B. Sundt, pp. 1322–1331. New York: Wiley.
Zehnwirth, B. 1977. The Mean Credibility Formula is a Bayes Rule. Scandinavian Actuarial Journal 1977: 212–216.
Zehnwirth, B. 1979. Credibility and the Dirichlet Process. Scandinavian Actuarial Journal 1979: 13–23.
Zehnwirth, B. 1981. A Note on the Asymptotic Optimality of the Empirical Bayes Distribution Function. Annals of Statistics 9: 221–224.
Discussions on this article can be submitted until July 1, 2016. The authors reserve the right to reply to any discussion. Please see
the Instructions for Authors found online at http://www.tandfonline.com/uaaj for submission instructions.
APPENDICES
A.1. Proofs of Theorems in Section 2.3
A.1.1. Proof of Theorem 2.3
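Proof. Note that the mean squared error of $\hat S(x,\Theta_i)$ can be decomposed as
$$\begin{aligned}
E\int\big(\hat S(x,\Theta_i)-S(x,\Theta_i)\big)^2dx
&=E\int\big(\tilde S(x,\Theta_i)-S(x,\Theta_i)+\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2dx\\
&=E\int\big(\tilde S(x,\Theta_i)-S(x,\Theta_i)\big)^2dx+2E\int\big(\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)\big)\big(\tilde S(x,\Theta_i)-S(x,\Theta_i)\big)dx\\
&\quad+E\int\big(\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2dx. \qquad (A.1)
\end{aligned}$$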
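Second, it follows from the equalities $\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)=(1-Z_i)\sum_{r=1}^{K}Z_r\big(S_r(x)-S_0(x)\big)\big/\sum_{r=1}^{K}Z_r$ and $\mathrm{Cov}\big(\tilde S(x,\Theta_i)-S(x,\Theta_i),\,S_r(x)\big)=0$, $r=1,2,\ldots,K$, that
$$E\Big[\big(\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)\big)\big(\tilde S(x,\Theta_i)-S(x,\Theta_i)\big)\Big]=0. \qquad (A.3)$$
Third, as $\int\mathrm{Var}(S_r(x))\,dx=\sigma_0^2/n_r+\tau_0^2=\tau_0^2/Z_r$,
$$E\int\big(\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2dx=\int\mathrm{Var}\big(\hat S(x,\Theta_i)-\tilde S(x,\Theta_i)\big)dx=\frac{(1-Z_i)^2}{\big(\sum_{r=1}^{K}Z_r\big)^2}\sum_{r=1}^{K}Z_r^2\int\mathrm{Var}(S_r(x))\,dx=\frac{\tau_0^2(1-Z_i)^2}{\sum_{r=1}^{K}Z_r}. \qquad (A.4)$$
Inserting (A.2), (A.3), and (A.4) into (A.1) leads to the desired equality:
$$E\int\big(\hat S(x,\Theta_i)-S(x,\Theta_i)\big)^2dx=\frac{\tau_0^2(1-Z_i)^2}{\sum_{r=1}^{K}Z_r}+(1-Z_i)\tau_0^2=\frac{\sum_{r\neq i}Z_r+1}{\sum_{r\neq i}Z_r+Z_i}\cdot\frac{\sigma_0^2\tau_0^2}{\sigma_0^2+n_i\tau_0^2}.$$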
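The last simplification can be checked with exact rational arithmetic. In the Python sketch below (ours, with arbitrary illustrative values of $n_r$, $\sigma_0^2$, and $\tau_0^2$), `lhs` is (A.4) plus the term $(1-Z_i)\tau_0^2$ that (A.2) contributes, and `rhs` is the closed form on the right; the check also confirms that the result exceeds $(1-Z_i)\tau_0^2$, reflecting the extra error from estimating $S_0(x)$.

```python
from fractions import Fraction


def credibility_factors(n, sigma2, tau2):
    """Z_r = n_r tau0^2 / (n_r tau0^2 + sigma0^2) for each policy r."""
    return [Fraction(nr) * tau2 / (Fraction(nr) * tau2 + sigma2) for nr in n]


def homogeneous_mse(i, n, sigma2, tau2):
    """Both sides of the final equality for policy i, computed exactly."""
    z = credibility_factors(n, sigma2, tau2)
    total = sum(z)
    rest = total - z[i]  # sum of Z_r over r != i
    lhs = tau2 * (1 - z[i]) ** 2 / total + (1 - z[i]) * tau2
    rhs = (rest + 1) / (rest + z[i]) * sigma2 * tau2 / (sigma2 + n[i] * tau2)
    return lhs, rhs


if __name__ == "__main__":
    sizes = [3, 7, 10, 4]
    print(homogeneous_mse(0, sizes, Fraction(5, 2), Fraction(3, 4)))
```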
which implies the unbiasedness of $\hat\sigma_0^2$. Furthermore, write $\mathbf n=(n_1,n_2,\ldots,n_K)'$, $N=\mathrm{diag}(n_1,n_2,\ldots,n_K)$ (the diagonal matrix with diagonal elements $n_1,n_2,\ldots,n_K$), and $\mathbf S=(S_1(x),S_2(x),\ldots,S_K(x))'$. Then $E[\mathbf S]=S_0(x)\mathbf 1$ and $\mathrm{Var}(\mathbf S)=\sigma_0^2(x)N^{-1}+\tau_0^2(x)I$, because
$$\mathrm{Var}(S_i(x))=\mathrm{Var}\big(\mathbf 1'\mathbf Y_i/n_i\big)=n_i^{-2}\,\mathbf 1'\big(\sigma_0^2(x)I+\tau_0^2(x)\mathbf 1\mathbf 1'\big)\mathbf 1=\sigma_0^2(x)/n_i+\tau_0^2(x).$$
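Since
$$\begin{aligned}
E[T_i^2]&=E\Bigg[\bigg(\sum_{j=1}^{n_i}\frac{2j-n_i-1}{n_i}\,X_{i(j)}\bigg)^{2}\Bigg]
\le\frac{1}{n_i}\sum_{j=1}^{n_i}(2j-n_i-1)^2\;E\bigg[\frac{1}{n_i}\sum_{j=1}^{n_i}X_{i(j)}^2\bigg]\\
&=\frac{1}{n_i}\sum_{j=1}^{n_i}\big(4j^2-4j(n_i+1)+(n_i+1)^2\big)E\big[X_{i1}^2\big]=\frac{n_i^2-1}{3}\,E\big[X_{i1}^2\big],
\end{aligned}$$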
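we have
$$\sum_{K=1}^{\infty}\frac{\mathrm{Var}(T_K)}{\big(\sum_{i=1}^{K}(n_i-1)\big)^2}\le\sum_{K=1}^{\infty}\frac{E[T_K^2]}{\big(\sum_{i=1}^{K}(n_i-1)\big)^2}\le\frac{E[X_{ij}^2]}{3}\sum_{K=1}^{\infty}\frac{n_K^2-1}{\big(\sum_{i=1}^{K}(n_i-1)\big)^2}<\infty$$
due to condition (2.17). Thus the consistency of $\hat\sigma_0^2$ follows from Kolmogorov's strong law of large numbers for independent but not identically distributed random variables. To show the consistency of $\hat\tau_0^2$, note the expression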
$$\frac{1}{\sum_{i=1}^{K}n_i}\int SSA(x)\,dx=\frac{1}{\sum_{i=1}^{K}n_i}\sum_{i=1}^{K}n_i\int\big(S_i(x)-S_0(x)\big)^2dx-\int\big(S_0(x)-\bar S(x)\big)^2dx.$$
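First, as $\int|x|\,|d\bar S(x)|\to\int|x|\,|dS_0(x)|$ by the strong law of large numbers and $\max_x|S_0(x)-\bar S(x)|\to0$ (the Glivenko-Cantelli theorem), under condition (2.17) we have
$$\begin{aligned}
\int\big(S_0(x)-\bar S(x)\big)^2dx&=-\int x\big(S_0(x)-\bar S(x)\big)\,d\big(S_0(x)-\bar S(x)\big)\\
&=\int x\big(S_0(x)-\bar S(x)\big)\,d\bar S(x)-\int x\big(S_0(x)-\bar S(x)\big)\,dS_0(x)\\
&\le\max_x\big|S_0(x)-\bar S(x)\big|\Big(\int|x|\,|d\bar S(x)|+\int|x|\,|dS_0(x)|\Big)\to0.
\end{aligned}$$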
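We next treat $\sum_{i=1}^{K}n_i\int\big(S_i(x)-S_0(x)\big)^2dx\big/\sum_{i=1}^{K}n_i$. Note that
$$\begin{aligned}
\int\big(S_i(x)-S_0(x)\big)^2dx&=\int x\big(S_i(x)-S_0(x)\big)\,d\big(F_i(x)-F_0(x)\big)\\
&=\int_0^{\infty}x\big(S_i(x)-S_0(x)\big)\,d\big(F_i(x)-F_0(x)\big)+\int_{-\infty}^{0}x\big(F_0(x)-F_i(x)\big)\,d\big(F_i(x)-F_0(x)\big)\\
&=\int_0^{\infty}xS_i(x)\,dF_i(x)+\int_0^{\infty}xS_0(x)\,dF_0(x)-\int_{-\infty}^{0}xF_i(x)\,dF_i(x)-\int_{-\infty}^{0}xF_0(x)\,dF_0(x)\\
&\quad+\int_{-\infty}^{0}xF_0(x)\,dF_i(x)+\int_{-\infty}^{0}xF_i(x)\,dF_0(x)-\int_0^{\infty}xS_0(x)\,dF_i(x)-\int_0^{\infty}xS_i(x)\,dF_0(x).
\end{aligned}$$
Define $H_i(x)=S_i(x)I(x\ge0)+F_i(x)I(x<0)$ and $H_0(x)=S_0(x)I(x\ge0)+F_0(x)I(x<0)$. Then
$$\begin{aligned}
\int\big(S_i(x)-S_0(x)\big)^2dx&=\int|x|H_i(x)\,dF_i(x)+\int|x|H_0(x)\,dF_0(x)-\int|x|H_0(x)\,dF_i(x)-\int|x|H_i(x)\,dF_0(x)\\
&\le\int|x|H_i(x)\,dF_i(x)+\int|x|H_0(x)\,dF_0(x).
\end{aligned}$$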
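Thus
$$\begin{aligned}
\mathrm{Var}\Big(n_i\int\big(S_i(x)-S_0(x)\big)^2dx\Big)&\le n_i^2\,E\Big(\int\big(S_i(x)-S_0(x)\big)^2dx\Big)^2\le n_i^2\,E\Big(\int|x|H_i(x)\,dF_i(x)+\int|x|H_0(x)\,dF_0(x)\Big)^2\\
&\le2n_i^2\Big[E\Big(\int|x|H_i(x)\,dF_i(x)\Big)^2+E\Big(\int|x|H_0(x)\,dF_0(x)\Big)^2\Big]\\
&\le2n_i^2\,E\Big[\int x^2\,dF_i(x)+\int x^2\,dF_0(x)\Big]=4n_i^2\,E\big[X_{i1}^2\big].
\end{aligned}$$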
$$\frac{1}{\sum_{i=1}^{K}n_i}\sum_{i=1}^{K}n_i\Big[\int\big(S_i(x)-S_0(x)\big)^2dx-E\int\big(S_i(x)-S_0(x)\big)^2dx\Big]\to0$$
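Two elementary identities carry the argument above, and a third underlies the decomposition of $\int SSA(x)\,dx$. The Python sketch below checks them with exact fractions. It assumes $T_i=n_i^{-1}\sum_{j=1}^{n_i}(2j-n_i-1)X_{i(j)}$, the form consistent with the bound on $E[T_i^2]$, and $SSA(x)=\sum_i n_i\big(S_i(x)-\bar S(x)\big)^2$ with $\bar S=\sum_i n_iS_i/\sum_i n_i$; neither definition survives in this part of the text. Under the first assumption, $T_i$ equals $(2n_i)^{-1}\sum_{j,k}|X_{ij}-X_{ik}|$, whose conditional expectation is $(n_i-1)\int S(x,\Theta_i)\big(1-S(x,\Theta_i)\big)dx$, which would explain the normalization by $\sum_i(n_i-1)$.

```python
import random
from fractions import Fraction


def weight_square_sum(n):
    """sum_{j=1}^n (2j - n - 1)^2, used above as n (n^2 - 1) / 3."""
    return sum((2 * j - n - 1) ** 2 for j in range(1, n + 1))


def t_statistic(sample):
    """Assumed T = (1/n) sum_j (2j - n - 1) X_(j) over the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    return sum((2 * j - n - 1) * x for j, x in enumerate(xs, start=1)) / n


def pairwise_form(sample):
    """(1/(2n)) sum_{j,k} |X_j - X_k|, an equivalent expression for T."""
    n = len(sample)
    return sum(abs(a - b) for a in sample for b in sample) / (2 * n)


def anova_gap(s, n, s0):
    """sum n_i (S_i - S_bar)^2 minus [sum n_i (S_i - S_0)^2 - N (S_bar - S_0)^2]; zero by algebra."""
    total = sum(n)
    s_bar = sum(ni * si for ni, si in zip(n, s)) / total
    lhs = sum(ni * (si - s_bar) ** 2 for ni, si in zip(n, s))
    rhs = sum(ni * (si - s0) ** 2 for ni, si in zip(n, s)) - total * (s_bar - s0) ** 2
    return lhs - rhs


if __name__ == "__main__":
    rng = random.Random(7)
    sample = [Fraction(rng.randint(0, 100), 7) for _ in range(15)]
    print(t_statistic(sample), pairwise_form(sample))
```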
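$$\begin{aligned}
D&=E\int_{-\infty}^{+\infty}\big(\hat S^{\rm EB}(x,\Theta_i)+\tilde S(x,\Theta_i)-2S(x,\Theta_i)\big)\big(\hat S^{\rm EB}(x,\Theta_i)-\tilde S(x,\Theta_i)\big)dx\\
&=E\int_{-\infty}^{+\infty}\big(\hat S^{\rm EB}(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2dx+2E\int_{-\infty}^{+\infty}\big(\tilde S(x,\Theta_i)-S(x,\Theta_i)\big)\big(\hat S^{\rm EB}(x,\Theta_i)-\tilde S(x,\Theta_i)\big)dx.
\end{aligned}$$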
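Note $|\hat Z_i-Z_i|\le1$ and
$$|\hat Z_i-Z_i|=\left|\frac{n\hat\tau_0^2}{\hat\sigma_0^2+n\hat\tau_0^2}-\frac{n\tau_0^2}{\sigma_0^2+n\tau_0^2}\right|=\left|\frac{1}{1+\hat\sigma_0^2/(n\hat\tau_0^2)}-\frac{1}{1+\sigma_0^2/(n\tau_0^2)}\right|\le\frac1n\left|\frac{\hat\sigma_0^2}{\hat\tau_0^2}-\frac{\sigma_0^2}{\tau_0^2}\right|.$$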
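It follows that
$$\max_{1\le i\le K}|\hat Z_i-Z_i|\le\min\left\{\frac1n\left|\frac{\hat\sigma_0^2}{\hat\tau_0^2}-\frac{\sigma_0^2}{\tau_0^2}\right|,\;1\right\}=A\ \text{(say)}.$$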
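$$\begin{aligned}
\big(\hat S^{\rm EB}(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2&\le2(1-\hat Z_i)^2\Big(\frac1K\sum_{r=1}^{K}S_r(x)-S_0(x)\Big)^2+2(\hat Z_i-Z_i)^2\big(S_i(x)-S_0(x)\big)^2\\
&\le2\Big(\frac1K\sum_{r=1}^{K}\big(S_r(x)-S_0(x)\big)\Big)^2+2A^2\big(S_i(x)-S_0(x)\big)^2.
\end{aligned}$$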
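Consequently,
$$\begin{aligned}
\max_{1\le i\le K}E\int_{-\infty}^{+\infty}\big(\hat S^{\rm EB}(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2dx
&\le2E\int_{-\infty}^{+\infty}\Big(\frac1K\sum_{r=1}^{K}\big(S_r(x)-S_0(x)\big)\Big)^2dx+2\max_{1\le i\le K}E\int_{-\infty}^{+\infty}A^2\big(S_i(x)-S_0(x)\big)^2dx\\
&=\frac2K\Big(\frac{\sigma_0^2}{n}+\tau_0^2\Big)+2\max_{1\le i\le K}E\int_{-\infty}^{+\infty}A^2\big(S_i(x)-S_0(x)\big)^2dx.
\end{aligned}$$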
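Moreover, by Hölder's inequality with exponents $(2+\delta)/2$ and $(2+\delta)/\delta$, and since $|S_i(x)-S_0(x)|\le1$,
$$\begin{aligned}
E\int_{-\infty}^{+\infty}A^2\big(S_i(x)-S_0(x)\big)^2dx
&\le\big(E[A^{2+\delta}]\big)^{2/(2+\delta)}\int_{-\infty}^{+\infty}\Big(E\big|S_i(x)-S_0(x)\big|^{2(2+\delta)/\delta}\Big)^{\delta/(2+\delta)}dx\\
&\le\big(E[A^{2+\delta}]\big)^{2/(2+\delta)}\int_{-\infty}^{+\infty}\Big(E\big(S_i(x)-S_0(x)\big)^2\Big)^{\delta/(2+\delta)}dx\\
&=\big(E[A^{2+\delta}]\big)^{2/(2+\delta)}\int_{-\infty}^{+\infty}\Big(\frac{\sigma_0^2(x)}{n}+\tau_0^2(x)\Big)^{\delta/(2+\delta)}dx.
\end{aligned}$$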
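Hence
$$\max_{1\le i\le K}E\int_{-\infty}^{+\infty}\big(\hat S^{\rm EB}(x,\Theta_i)-\tilde S(x,\Theta_i)\big)^2dx\le\frac2K\Big(\frac{\sigma_0^2}{n}+\tau_0^2\Big)+2\big(E[A^{2+\delta}]\big)^{2/(2+\delta)}\int_{-\infty}^{+\infty}\Big(\frac{\sigma_0^2(x)}{n}+\tau_0^2(x)\Big)^{\delta/(2+\delta)}dx\to0\quad\text{as }K\to\infty.$$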
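This is equivalent to
$$\lim_{n\to\infty}Z\,\frac1n\sum_{i=1}^{n}\big(X_i-\hat\mu(\theta)\big)_+=\lim_{n\to\infty}Z\cdot\lim_{n\to\infty}\frac1n\sum_{i=1}^{n}\big(X_i-\hat\mu(\theta)\big)_+=E_\theta\big[X-\mu(\theta)\big]_+. \qquad (A.7)$$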
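It follows that
$$\begin{aligned}
\int\big(x-\hat\mu(\theta)\big)_+dF_0(x)&\le\int\big(x-\mu(\theta)\big)_+dF_0(x)+\int\Big|\big(x-\hat\mu(\theta)\big)_+-\big(x-\mu(\theta)\big)_+\Big|\,dF_0(x)\\
&\le\int\big(x-\mu(\theta)\big)_+dF_0(x)+\int\big|\hat\mu(\theta)-\mu(\theta)\big|\,dF_0(x)\\
&=\int\big(x-\mu(\theta)\big)_+dF_0(x)+\big|\hat\mu(\theta)-\mu(\theta)\big|.
\end{aligned}$$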
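Thus
$$(1-Z)\int\big(x-\hat\mu(\theta)\big)_+dF_0(x)\le(1-Z)\Big[\int\big(x-\mu(\theta)\big)_+dF_0(x)+\big|\hat\mu(\theta)-\mu(\theta)\big|\Big]\to0 \qquad (A.8)$$
converges to $E_\theta[X]+E_\theta\big[(X-\mu(\theta))_+\big]$ almost surely. This completes the proof.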
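$$\begin{aligned}
\hat H(X\mid\theta)-H(X\mid\theta)&=\int x\,g\big(ZS_n(x)+(1-Z)S_0(x)\big)\,d\big(ZF_n(x)+(1-Z)F_0(x)\big)-\int x\,g\big(S(x,\theta)\big)\,dF(x,\theta)\\
&=\int x\Big[g\big(ZS_n(x)+(1-Z)S_0(x)\big)-g\big(S(x,\theta)\big)\Big]\,d\big(ZF_n(x)+(1-Z)F_0(x)\big)\\
&\quad+\int x\,g\big(S(x,\theta)\big)\,d\big(ZF_n(x)+(1-Z)F_0(x)\big)-\int x\,g\big(S(x,\theta)\big)\,dF(x,\theta).
\end{aligned}$$
Therefore, the consistency follows from Theorem 2.2, the strong law of large numbers, and $Z\to1$ as $n\to\infty$:
$$\begin{aligned}
&\Big|\int x\Big[g\big(ZS_n(x)+(1-Z)S_0(x)\big)-g\big(S(x,\theta)\big)\Big]\,d\big(ZF_n(x)+(1-Z)F_0(x)\big)\Big|\\
&\quad\le\int|x|\,\Big|g\big(ZS_n(x)+(1-Z)S_0(x)\big)-g\big(S(x,\theta)\big)\Big|\,\big|d\big(ZF_n(x)+(1-Z)F_0(x)\big)\big|\\
&\quad\le C\max_x\big|ZS_n(x)+(1-Z)S_0(x)-S(x,\theta)\big|\int|x|\,\big|d\big(ZF_n(x)+(1-Z)F_0(x)\big)\big|\to0
\end{aligned}$$
and
$$\int x\,g\big(S(x,\theta)\big)\,d\big(ZF_n(x)+(1-Z)F_0(x)\big)-\int x\,g\big(S(x,\theta)\big)\,dF(x,\theta)\to0.$$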
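Proof. By the modeling assumptions it is easy to see that $\mu(\theta)=1/\theta$, $\mu_0=\beta/(\alpha-1)$, $\sigma^2=\beta^2/[(\alpha-1)(\alpha-2)]$, and $\tau^2=\beta^2/[(\alpha-1)^2(\alpha-2)]$. Hence $Z_c=n/(n+\alpha-1)$ and $\hat\mu_c(\theta)=(n\bar X_n+\beta)/(n+\alpha-1)$. On the other hand, as $S_0(x)=\big(\beta/(\beta+x)\big)^{\alpha}$ and $E\big[S(x,\Theta)^2\big]=\big(\beta/(\beta+2x)\big)^{\alpha}$, we have
$$\tau_0^2(x)=\Big(\frac{\beta}{\beta+2x}\Big)^{\alpha}-\Big(\frac{\beta}{\beta+x}\Big)^{2\alpha}\quad\text{and}\quad\sigma_0^2(x)=\Big(\frac{\beta}{\beta+x}\Big)^{\alpha}-\Big(\frac{\beta}{\beta+2x}\Big)^{\alpha},$$
implying $\tau_0^2=\beta/[2(2\alpha-1)(\alpha-1)]$ and $\sigma_0^2=\beta/[2(\alpha-1)]$. The credibility factor and the estimator of (3.3) are then given, respectively, by
$$Z=\frac{n}{n+2\alpha-1}\quad\text{and}\quad\hat\mu(\theta)=\frac{n}{n+2\alpha-1}\,\bar X_n+\frac{2\alpha-1}{n+2\alpha-1}\cdot\frac{\beta}{\alpha-1}.$$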
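The closed forms $\sigma_0^2=\beta/[2(\alpha-1)]$ and $\tau_0^2=\beta/[2(2\alpha-1)(\alpha-1)]$ can be confirmed by integrating $\sigma_0^2(x)$ and $\tau_0^2(x)$ numerically. The Python sketch below is our illustration, with arbitrary $\alpha$, $\beta$, and $n$. It uses the substitution $x=\beta t/(1-t)$, under which $\beta/(\beta+x)=1-t$, $\beta/(\beta+2x)=(1-t)/(1+t)$, and $dx=\beta(1-t)^{-2}dt$, so both integrands become smooth on $[0,1]$ when $\alpha>2$; it then compares the resulting credibility factor with Bühlmann's $Z_c=n/(n+\alpha-1)$.

```python
def structure_closed(alpha, beta):
    """sigma0^2 and tau0^2 for exponential losses with a Gamma(alpha, rate beta) prior."""
    sigma2 = beta / (2 * (alpha - 1))
    tau2 = beta / (2 * (2 * alpha - 1) * (alpha - 1))
    return sigma2, tau2


def structure_numeric(alpha, beta, num=200_000):
    """Trapezoid rule for the integrals of sigma0^2(x) and tau0^2(x) after x = beta t / (1 - t)."""
    h = 1.0 / num
    s_acc = t_acc = 0.0
    for k in range(num + 1):
        t = k * h
        u = 1.0 - t
        weight = 0.5 if k in (0, num) else 1.0
        f1 = u ** (alpha - 2)                          # (beta/(beta+x))^alpha times Jacobian, over beta
        f2 = u ** (alpha - 2) * (1.0 + t) ** (-alpha)  # (beta/(beta+2x))^alpha, same scaling
        f3 = u ** (2 * alpha - 2)                      # (beta/(beta+x))^(2 alpha), same scaling
        s_acc += weight * (f1 - f2)
        t_acc += weight * (f2 - f3)
    return beta * h * s_acc, beta * h * t_acc


def credibility_factors(alpha, beta, n):
    """Distribution-based Z = n tau0^2/(n tau0^2 + sigma0^2) and Buhlmann's Z_c = n/(n + alpha - 1)."""
    sigma2, tau2 = structure_closed(alpha, beta)
    return n * tau2 / (n * tau2 + sigma2), n / (n + alpha - 1)


if __name__ == "__main__":
    print(structure_closed(3.0, 2.0), structure_numeric(3.0, 2.0))
    print(credibility_factors(3.0, 2.0, 10))
```

For $\alpha=3$ and $n=10$, for example, $Z=10/15$ while $Z_c=10/12$: in this model the distribution-based factor gives less weight to the individual experience than Bühlmann's factor for the mean.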
A.3.3. The Monte Carlo Approximation of Pan's Credibility Premiums under the Esscher Principle
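$$E[S(x,\Theta)]=\sum_{k=x+1}^{\infty}\frac{\Gamma(\alpha+k)\,\beta^{\alpha}}{k!\,\Gamma(\alpha)\,(\beta+1)^{\alpha+k}}\quad\text{and}\quad E\big[S(x,\Theta)^2\big]=\sum_{i=x+1}^{\infty}\sum_{j=x+1}^{\infty}\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}.$$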
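Consequently,
$$\sum_{x=0}^{\infty}E[S(x,\Theta)]=\sum_{x=0}^{\infty}\sum_{k=x+1}^{\infty}\frac{\Gamma(\alpha+k)\,\beta^{\alpha}}{k!\,\Gamma(\alpha)\,(\beta+1)^{\alpha+k}}=\frac{\alpha}{\beta}\sum_{k=0}^{\infty}\frac{\Gamma(\alpha+1+k)}{k!\,\Gamma(\alpha+1)}\Big(\frac{\beta}{\beta+1}\Big)^{\alpha+1}\Big(\frac{1}{\beta+1}\Big)^{k}=\frac{\alpha}{\beta},$$
where the last equality holds because the summands are the probabilities of a negative binomial distribution. It follows that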
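$$\sigma_0^2=\sum_{x=0}^{\infty}\Big(E[S(x,\Theta)]-E\big[S(x,\Theta)^2\big]\Big)=\frac{\alpha}{\beta}-\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\min(i,j)\,\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}$$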
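and
$$\begin{aligned}
\tau_0^2&=\sum_{x=0}^{\infty}\mathrm{Var}\big(S(x,\Theta)\big)=\sum_{x=0}^{\infty}\Big(E\big[S(x,\Theta)^2\big]-\big\{E[S(x,\Theta)]\big\}^2\Big)\\
&=\sum_{x=0}^{\infty}\sum_{i=x+1}^{\infty}\sum_{j=x+1}^{\infty}\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}-\sum_{x=0}^{\infty}\bigg(\sum_{k=x+1}^{\infty}\frac{\Gamma(\alpha+k)\,\beta^{\alpha}}{k!\,\Gamma(\alpha)\,(\beta+1)^{\alpha+k}}\bigg)^2\\
&=\sum_{x=0}^{\infty}\sum_{i=x+1}^{\infty}\sum_{j=x+1}^{\infty}\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}-\sum_{x=0}^{\infty}\sum_{i=x+1}^{\infty}\sum_{j=x+1}^{\infty}\frac{\Gamma(\alpha+i)\,\Gamma(\alpha+j)\,\beta^{2\alpha}}{i!\,j!\,\Gamma(\alpha)^2\,(\beta+1)^{2\alpha+i+j}}\\
&=\sum_{x=0}^{\infty}\sum_{i=x+1}^{\infty}\sum_{j=x+1}^{\infty}\left[\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}-\frac{\Gamma(\alpha+i)\,\Gamma(\alpha+j)\,\beta^{2\alpha}}{i!\,j!\,\Gamma(\alpha)^2\,(\beta+1)^{2\alpha+i+j}}\right]\\
&=\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\sum_{x=0}^{\min(i,j)-1}\left[\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}-\frac{\Gamma(\alpha+i)\,\Gamma(\alpha+j)\,\beta^{2\alpha}}{i!\,j!\,\Gamma(\alpha)^2\,(\beta+1)^{2\alpha+i+j}}\right]\\
&=\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\min(i,j)\left[\frac{\Gamma(\alpha+i+j)\,\beta^{\alpha}}{i!\,j!\,\Gamma(\alpha)\,(\beta+2)^{\alpha+i+j}}-\frac{\Gamma(\alpha+i)\,\Gamma(\alpha+j)\,\beta^{2\alpha}}{i!\,j!\,\Gamma(\alpha)^2\,(\beta+1)^{2\alpha+i+j}}\right].
\end{aligned}$$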
$$H_B(X_n)=\frac{(\alpha+n\bar X_n)\,e^{h}}{\beta+n+1-e^{h}}. \qquad (A.10)$$
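Formula (A.10) coincides with the Esscher premium $E[Xe^{hX}]/E[e^{hX}]$ of the predictive distribution of a future claim under the Poisson-Gamma model. That distribution is negative binomial with shape $\alpha+n\bar X_n$ and success probability $(\beta+n)/(\beta+n+1)$, and differentiating its log moment generating function $(\alpha+n\bar X_n)\log\{(\beta+n)/(\beta+n+1-e^h)\}$ with respect to $h$ gives (A.10). The Python sketch below is our check, with hypothetical claims and parameters: it compares the closed form with a term-by-term sum over the predictive probabilities. The premium is finite only when $e^h<\beta+n+1$.

```python
import math


def esscher_bayes_closed(alpha, beta, claims, h):
    """(A.10): (alpha + n xbar) e^h / (beta + n + 1 - e^h)."""
    n, total = len(claims), sum(claims)
    eh = math.exp(h)
    if eh >= beta + n + 1:
        raise ValueError("need exp(h) < beta + n + 1 for a finite premium")
    return (alpha + total) * eh / (beta + n + 1 - eh)


def esscher_bayes_numeric(alpha, beta, claims, h, kmax=2000):
    """Esscher premium of the negative binomial predictive law, summed term by term."""
    r = alpha + sum(claims)  # posterior Gamma shape
    b = beta + len(claims)   # posterior Gamma rate
    num = den = 0.0
    for k in range(kmax + 1):
        log_pk = (math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
                  + r * math.log(b / (b + 1)) - k * math.log(b + 1))
        w = math.exp(log_pk + h * k)  # predictive probability of k times e^{hk}
        num += k * w
        den += w
    return num / den


if __name__ == "__main__":
    claims = [2, 0, 1, 3, 1]  # hypothetical claim counts
    print(esscher_bayes_closed(3.0, 1.0, claims, 0.2),
          esscher_bayes_numeric(3.0, 1.0, claims, 0.2))
```

At $h=0$ the formula reduces to the posterior mean $(\alpha+n\bar X_n)/(\beta+n)$, that is, the Bayes premium under the net principle.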