Alexey Lindo
ISBN 978-91-7597-326-5
Alexey Lindo, 2016.
Doktorsavhandlingar vid Chalmers tekniska högskola
Ny serie nr 4007
ISSN 0346-718X
Department of Mathematical Sciences
Chalmers University of Technology
and University of Gothenburg
412 96 Göteborg
Sweden
Phone: +46 (0)31-772 10 00
Abstract
Markov processes are among the most widely used stochastic models both in theory
and applications. Therefore, we are especially interested in obtaining explicit results,
either analytically or numerically.
In the first paper of the thesis we consider a class of Markov processes called Galton-Watson branching processes. Linear-fractional Galton-Watson processes are among the
few cases where many characteristics of the process can be computed explicitly. We
extend the two-parameter linear-fractional family to a much richer four-parameter family of reproduction laws. The corresponding process also allows explicit computations,
with the possibility of an infinite mean, or even an infinite number of offspring.
The linear-fractional family can be otherwise generalized by introducing types
for the individuals. Recently, Sagitov studied such processes with countably many types.
In our second paper, we study the linear-fractional processes with a general type space.
For this family of processes, we obtained transparent limit theorems for the subcritical,
critical and supercritical cases. Furthermore, for general linear-fractional processes we
proved an abstract Perron-Frobenius theorem, without any conditions on the σ-algebra of
the type space and without employing the Nummelin-Athreya-Ney regeneration technique.
In the third paper, we develop new non-parametric estimation methods for the
compound Poisson distribution. The estimation problem concerned arises in the inference of Lévy processes recorded at equidistant time intervals. The key estimator is
based on the series decomposition of functionals of a measure and relies on the steepest
descent technique recently developed in the variational analysis of measures. We provide a computer implementation of the developed method together with a broad range
of simulation results.
The fourth paper of the thesis concerns limit theorems involving two functionals
for a random matrix with independent and uniformly distributed elements drawn from
the finite cyclic group of integers. Applying the Chen-Stein method we derive bounds
on the total variation distance between a Poisson distribution and the distribution of the
functionals.
Preface
This thesis consists of the following papers.
(1) Sagitov, S., Lindo, A. (2016)
A special family of Galton-Watson processes with explosions.
In Branching Processes and Their Applications. Lect. Notes Stat.
Proc. (I.M. del Puerto et al., eds.) Springer (to appear).
(2) Lindo, A., Sagitov, S. (2015)
General linear-fractional branching process with discrete time.
Preprint.
(3) Lindo, A., Zuyev, S., Sagitov, S. (2015)
Nonparametric estimation of infinitely divisible distributions based
on variational analysis on measures.
Preprint.
(4) Lindo, A., Sagitov, S. (2015)
Asymptotic results for the number of Wagner's solutions to a generalised birthday problem.
Statistics and Probability Letters 107, 356-361.
Acknowledgements
First of all, I would like to thank my advisers Serik Sagitov and Sergey
Zuyev for guiding and supporting me over the years. I was always amazed and
inspired by the ability of Serik to discard all irrelevant information and get
straight to the point. I would like to thank Serik for the long discussions we
had that helped me to grasp all the details of my work. Serik's criticism was
always constructive and helped me to obtain a higher standard in my research.
I hope that one day I will become as good an advisor as Serik was to me.
I am grateful to Sergey for always being enthusiastic about our research
and for providing thought-provoking ideas.
I am indebted to Aila Särkkä for her ability to listen and help in the most
difficult situations.
I would like to offer my special thanks to Gerold Alsmeyer for inviting me
as a guest researcher to the University of Münster.
I am grateful to our faculty members and especially to Peter Jagers for
his insightful comments on Paper I. I would like to thank Patrik Albin, Peter
Hegarty, Olle Häggström, Torgny Lindvall, Holger Rootzén, Mats Rudemo and
Jeff Steif for the wonderful graduate courses. I also express my thanks to Petter Mostad for giving me an opportunity to lecture and directing me through
important elements of teaching.
I would like to thank the guests of our department Kostya Borovkov, Peter
Guttorp, Ildar Ibragimov and Günter Last for their time and valuable research
suggestions.
I wish to thank the administrative staff, and especially Lotta Fernström,
who knows the answer to every question and is always there to help.
I would like to thank my wonderful friends in the department. You have all
assisted me in my studies and finishing this work would have been impossible
without you.
I would like to thank my coaches Anders Holme from Göteborg Sim and
Evgeny Zubkov from CSKA Moscow for keeping my mind in a healthy body.
Contents
Abstract
Preface
Acknowledgements
Part I. INTRODUCTION
1. Random walks, Poisson processes and Markov jumps
2. Markov branching processes
3. Linear birth and death process
4. Theta-branching process
5. Multi-type linear-fractional branching process
6. Estimation of the jump size distribution
7. Generalized birthday problem
8. Contributions of Alexey Lindo to the joint papers
Bibliography
Part I
INTRODUCTION
are independent for any $t_1 < t_2 < \cdots < t_n$. The process $\{X(t)\}_{t \ge 0}$ has stationary
increments if the distribution of $X(t+s) - X(t)$ depends only on the length $s$ of
the time interval [38, Section 1.3]. A Poisson process $\{N(t)\}_{t \ge 0}$ with intensity $\lambda$
is a continuous time process with independent and stationary increments.
The size of an increment in any time interval of length $t$ follows the Poisson
distribution with mean $\lambda t$.
The process $\{N(t)\}_{t \ge 0}$ can be viewed as a counting process that represents
the number of events occurring up to time $t$. From the assumptions on the increments it follows that the waiting time between the arrivals of two consecutive events has an exponential distribution. Moreover, the whole sequence of
waiting times between arrivals is formed by independent and identically distributed exponential random variables.
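This construction of a Poisson process from i.i.d. exponential waiting times is easy to check by simulation. The sketch below is purely illustrative (the rate 2 and horizon 3 used in the check are arbitrary choices, not values from the text); for a count that is Poisson with mean $\lambda t$, the empirical mean and variance should both be close to $\lambda t$.

```python
import random

def poisson_count(lam, t, rng):
    """Number of arrivals in [0, t] of a Poisson process with intensity lam,
    built from i.i.d. exponential inter-arrival times."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

def empirical_mean_var(lam, t, reps=20000, seed=1):
    """Empirical mean and variance of the count over many replications."""
    rng = random.Random(seed)
    xs = [poisson_count(lam, t, rng) for _ in range(reps)]
    m = sum(xs) / reps
    v = sum((x - m) ** 2 for x in xs) / reps
    return m, v
```

Both returned values approximate $\lambda t$, reflecting that the increment over an interval of length $t$ is Poisson distributed with mean $\lambda t$.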
Combining the two discussed models, the general random walk and the Poisson process, we arrive at the compound Poisson process. The mathematical
definition was first given by F. Lundberg [60] in the same PhD thesis in which
the Poisson process was defined. Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of independent and identically distributed random variables.
$$\sum_{j=0}^{\infty} P_{ij}(t)\,s^j = \Big(\sum_{j=0}^{\infty} P_{1j}(t)\,s^j\Big)^{i}.$$
The term "branching processes" was introduced by Kolmogorov and Dmitriev in [46],
where branching processes were treated as Markov processes with a
countable number of states.
The branching processes are of special importance in biology, where they
are widely used to model the population size fluctuations [27], [31], [42]. They
also appear naturally in epidemic models, see e.g. a recent paper [52]. A
Markov branching process describes the size Z(t) at time t of a population of
independently evolving particles. Each particle lives for an exponential time
where $a := e^{-(b-d)t}$ and $c := b\,(1 - e^{-(b-d)t})/(b - d)$. For the linear-fractional process the $n$-th iteration $f_n$ of the offspring generating function can be calculated
explicitly,
$$\frac{1}{1 - f_n(s)} = \frac{a^n}{1 - s} + c\,(1 + a + \cdots + a^{n-1}),$$
and in this case the $n$-step transition probability can be obtained as a closed-form expression in terms of the parameters $a$ and $c$. Knowing the iterations explicitly,
we can find most of the limit distributions exactly. For example, we can find
Yaglom limits and transition functions of the Q-process [82], describe the Martin boundary [74] and obtain results for the coalescent process [51]. Starting
from the seminal paper of Kozlov [48], linear-fractional generating functions
play a very special role in the theory of branching processes in a random environment. Knowing the closed-form expression for the iterations, we can simply
calculate the answers to some technical problems. For example, recently it was
shown that under the quenched approach the critical branching process in a
random environment dies out slowly [98]; at the same time under the annealed
approach the sudden extinction of the population may occur [99].
4. Theta-branching process
Iterations of linear-fractional generating functions provide important insights into the limiting behaviour of branching processes. Therefore, it is important to understand the mechanism behind the iterations of generating functions from this family. In the spirit of [86], we consider the functional equation
approach to explicit iterations of probability generating functions. Let
$$V(s) := \frac{s}{1 - s},$$
where clearly $V$ is a generating function and it is easily invertible. The class of
linear-fractional generating functions can be viewed as a collection of functions
$f$ satisfying the functional equation
(4.1) $\qquad V(f(s)) = a\,V(s) + V(p_0),$

where $a = 1/f'(1)$ and $p_0 = f(0)$. In this case, the solutions form a two-parameter
class of probability generating functions uniquely determined by the pair $a \in (0, \infty)$ and $p_0 \in [0, 1)$. It can be easily checked that
$$V(p_0) = (1 - a)\,V(q).$$
It can be shown, for any generating function
$$V(s) = \sum_{k=1}^{\infty} v_k s^k, \qquad v_k \ge 0, \qquad V(s) < \infty, \quad s \in [0, 1),$$
that for a set of pairs $(a, p_0)$, relation (4.1) defines a probability generating function $f$ whose iterations $f_n$ have the form
$$V(f_n(s)) = a^n V(s) + V(f_n(0)), \qquad V(f_n(0)) = V(p_0)\,\frac{a^n - 1}{a - 1}.$$
Given that the inverse $V^{-1}$ has a closed-form expression, the iterations can be
computed explicitly as
$$f_n(s) = V^{-1}\Big(a^n V(s) + V(p_0)\,\frac{1 - a^n}{1 - a}\Big).$$
In [82] we produced a new family of probability generating functions with
explicit iterations. The corresponding functions $V$ are presented in the table
below.
$V(s)$                                        Parameters
$(A - s)^{-\theta} - A^{-\theta}$             $a \in (0,1)$, $q \in [0,1]$, $\theta \in (0,1]$, $A \ge 1$
$A^{|\theta|} - (A - s)^{|\theta|}$           $a \in (0,1)$, $q \in [0,1]$, $\theta \in [-1,0)$, $A \ge 1$
$\log A - \log(A - s)$                        $a \in (0,1)$, $q \in [0,1]$, $\theta = 0$, $A \ge 1$
The presented family covers almost all known probability generating functions with explicit iterations, and we call it the theta-family. We use the name
theta-process for a Galton-Watson process whose offspring generating function belongs to the theta-family.
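For $\theta = 1$ and $A = 1$ one has $V(s) = (1-s)^{-1} - 1 = s/(1-s)$, recovering the linear-fractional case, so the explicit-iteration recipe can be verified numerically. The sketch below is a toy check with arbitrarily chosen parameters $p_0$ and $p$ (not values from the text): it iterates a linear-fractional $f$ directly and compares the result with $V^{-1}(a^n V(s) + V(p_0)(1-a^n)/(1-a))$.

```python
def V(s):
    """V(s) = s / (1 - s): the theta = 1, A = 1 choice of V."""
    return s / (1.0 - s)

def V_inv(y):
    """Inverse of V: V_inv(y) = y / (1 + y)."""
    return y / (1.0 + y)

# Linear-fractional pgf with illustrative parameters p0 (no-offspring
# probability) and p: f(s) = p0 + (1 - p0) p s / (1 - (1 - p) s).
p0, p = 0.3, 0.5

def f(s):
    return p0 + (1 - p0) * p * s / (1 - (1 - p) * s)

a = p / (1 - p0)   # equals 1 / f'(1) for this parametrization

def f_iter(s, n):
    """n-fold iteration of f computed directly."""
    for _ in range(n):
        s = f(s)
    return s

def f_explicit(s, n):
    """Closed form f_n(s) = V_inv(a^n V(s) + V(p0) (1 - a^n) / (1 - a))."""
    return V_inv(a ** n * V(s) + V(p0) * (1 - a ** n) / (1 - a))
```

The two computations agree to floating-point accuracy for all $s \in [0, 1)$ and all $n$.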
To our knowledge, the only other example is due to Harris [28, Section
I.7.2], which falls into the framework of functional equations presented above.
For the Harris family of generating functions with explicit iterations,
the function $V$ is
$$V(s) = \frac{s^k}{1 - s^k}, \qquad a > 1.$$
The theta-family contains non-regular generating functions, $f(1) < 1$, related to improper probability distributions. We can interpret the quantity
$1 - f(1)$ as the probability of producing a mutant individual. The theta-branching process has two absorbing states, $0$ and $\infty$, reached at the times
$$T_0 := \inf\{n : Z_n = 0\}, \qquad T_1 := \inf\{n : Z_n = \infty\}.$$
where the summands are independent and identically distributed. This observation led de Finetti to another central notion of the theory, the notion of an
infinitely divisible distribution: the distribution of a random variable which, for every $n$, can be represented as the sum
of $n$ independent and identically distributed random variables. Infinite divisibility is actually a distributional property. For example, Normal, Poisson,
negative binomial, Gamma, Gumbel and compound Poisson distributions are
all infinitely divisible. De Finetti's wonderful insights were that increments of
Lévy processes are necessarily infinitely divisible and that characteristic functions can be used to prove it. Kolmogorov confirmed de Finetti's conjecture for
processes with finite second moments in [44], [45]. In full generality, the result
was obtained in the works of Lévy [54], [55] and Khintchine [43]. In particular,
they showed that the characteristic function of an infinitely divisible random
variable is of the form $E\,e^{i\theta W} = e^{\psi(\theta)}$ with
$$\psi(\theta) = ia\theta - \frac{\sigma^2 \theta^2}{2} + \int_{\mathbb{R}} \big(e^{i\theta x} - 1 - i\theta x\,\mathbf{1}_{\{|x| < 1\}}\big)\,\nu(dx),$$
where the real number $a$ is a drift term, $\sigma^2 \ge 0$ is the diffusion coefficient and the
measure $\nu$ is called the Lévy measure. The above representation is called the
canonical Lévy-Khintchine representation. Moreover, Lévy fully described the
structure of continuous time processes with independent and stationary increments. He showed not only that increments of a Lévy process are infinitely
divisible, but that the converse is also true: for a given infinitely divisible
distribution we can construct a Lévy process $\{W_t\}_{t \ge 0}$ with $W_1$ having that
distribution [84, Theorem 7.10]. This result can be viewed as an embeddability criterion for a random walk, namely the random walk can be embedded
into a Lévy process if and only if its increments are infinitely divisible. For a
brief historical exposition of the subject we refer to [62] and the Notes to Chapter
2 in [84].
In practice, we cannot observe the trajectory of the Lévy process $\{W(t)\}_{t \ge 0}$
continuously. In the inference of Lévy processes there exist two main approaches to data collection, called low frequency and high frequency sampling. Imagine that we observe the embedded random walk $\{W(k\Delta)\}_{k=0}^{\infty}$ up to
time $T$ at sampling rate $\Delta > 0$. In the low frequency approach, the data is collected by observing the process at the constant frequency $\Delta$. In other words,
observations are equidistant. The asymptotic results are obtained by letting
the size of the observation window $T$ go to infinity. Taking the high frequency
approach, we keep the size of the observation window $T$ fixed, but increase the rate
at which measurements are collected. The importance of choosing the proper
sampling rate is described in [19].
The embedded random walk $\{W(k\Delta)\}_{k=0}^{\infty}$ necessarily has infinitely divisible increments and consequently the statistical inference problem reduces to
the recovery of the Lévy triple $(a, \sigma^2, \nu)$. The first two elements of the triple,
$(a, \sigma^2)$, describe the diffusive part of the Lévy process $\{W(t)\}_{t \ge 0}$ and the
measure $\nu$ governs the pure-jump part of the process. The complexity of the
problem comes from its non-parametric nature and the intrinsic properties of
the space of measures. Note that the set of finite positive measures on the real
line is closed in the space of signed measures only under conic combinations;
moreover, it contains no interior points. Recently, a gradient descent method of
constrained optimization in the space of measures was presented in [68].
In [58], we used the low-frequency approach, assuming that the Lévy process $\{W_t\}_{t \ge 0}$ is observed at equidistant time points, and consequently we have
a sample from an infinitely divisible distribution. We proposed two methods, which we call ChF (Characteristic Function Fitting) and CoV (Convolution
Method).
The ChF method targets the general problem of Lévy triple estimation.
It is based on fitting the infinitely divisible characteristic function to the empirical characteristic function [23] constructed from the observations.
The second method is intended specifically for compound Poisson processes. It is based on the connection between compound Poisson
processes and associated Poisson point processes on the plane [39, Section 16.9]
and on Taylor-like expansions developed for expectations of functionals of
Poisson point processes [53], [67]. We note that every purely discontinuous Lévy
process can be approximated by a compound Poisson process to any degree
of accuracy [11].
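As a rough illustration of the characteristic-function-fitting idea behind ChF (this is not the implementation from [58]; the jump law, rate, frequency grid and sample size below are all arbitrary choices), one can fit the rate of a compound Poisson process with standard normal jumps by least squares between the model and empirical characteristic functions.

```python
import cmath, math, random

def cp_increment(lam, rng):
    """One unit-time increment of a compound Poisson process:
    Poisson(lam)-many standard normal jumps."""
    k, clock = 0, rng.expovariate(lam)
    while clock <= 1.0:
        k += 1
        clock += rng.expovariate(lam)
    return sum(rng.gauss(0.0, 1.0) for _ in range(k))

def ecf(sample, u):
    """Empirical characteristic function at frequency u."""
    return sum(cmath.exp(1j * u * x) for x in sample) / len(sample)

def model_cf(lam, u):
    """Characteristic function of a unit-time compound Poisson increment
    with N(0,1) jumps: exp(lam * (exp(-u^2/2) - 1))."""
    return cmath.exp(lam * (math.exp(-u * u / 2.0) - 1.0))

def fit_rate(sample, grid, freqs):
    """Pick the rate on the grid minimizing the squared distance between
    empirical and model characteristic functions over the frequencies."""
    targets = [ecf(sample, u) for u in freqs]
    def loss(lam):
        return sum(abs(t - model_cf(lam, u)) ** 2
                   for t, u in zip(targets, freqs))
    return min(grid, key=loss)
```

With a moderate sample size the fitted rate lands close to the true rate; in the authors' ChF method the whole Lévy triple, not just the rate, is fitted in this spirit.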
It is still an open question whether space and time are discrete or continuous [30]. Nevertheless, it is common to consider that the physical process
makes discrete changes on a microscopic scale and use the diffusion approximations for macroscopic behaviour, see for example [96]. In finance, we are
certain that characteristics like prices are discontinuous and therefore should
be modeled by jump processes [11]. Therefore, pure-jump processes form a
very important subclass of Lvy processes.
Further research topics. The low frequency sampling approach also plays
an important role in the statistical inference of continuous time Markov branching processes [41]. We believe that by using the obtained results and a connection between branching processes and compound Poisson processes, new nonparametric statistical methods can be obtained. We are planning to investigate
this possibility in the future. It is also interesting to investigate the statistical
properties of the ChF and CoV estimators.
7. Generalized birthday problem
Richard von Mises introduced the birthday problem [66] in 1939. He asked
how many people should gather before the probability of having two people
born on the same day of the year exceeds 50 percent. The surprising answer is
only 23. The total number of pairs of people that share the same birthday can
be approximated by the Poisson distribution, see for example [3, Section 5.3].
A related cryptographic attack, the birthday attack [85],
is justified by this probabilistic observation. Mathematically, a pair $x_1, x_2$ is
called a collision for a given function $f$ if $f(x_1) = f(x_2)$. The birthday attack
uses some random process to find a collision in cryptographic hash functions.
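Von Mises' answer is easy to reproduce from the exact product formula for the probability that all birthdays are distinct:

```python
def p_shared_birthday(n, days=365):
    """Exact probability that among n people at least two share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

def smallest_group(days=365, threshold=0.5):
    """Smallest n for which the sharing probability exceeds the threshold."""
    n = 1
    while p_shared_birthday(n, days) <= threshold:
        n += 1
    return n
```

Calling `smallest_group()` returns 23, matching the surprising answer quoted above.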
There are various generalizations of the birthday problem, with an arbitrary number of days in a year, with multiple matches, with multiple types of
individuals, see e.g. [24] and references therein. One of the generalizations,
particularly important for cryptography, was proposed by Wagner in [100]. In
the same article, Wagner presented an effective method of finding some of the
solutions of this generalized birthday problem. An important quantity of interest for any algorithm is its average time complexity [12]. In [57], we reformulated Wagners generalization of the birthday in terms of random matrices over
a finite cyclic group of integers. Then, the number of solutions to the problem
is the value of a certain functional of the random matrix. It turns out that the
Chen-Stein method can be used to bound the total variation distance between
the distribution of the considered functionals and a Poisson distribution. From
this and other results presented in our paper [57], the answer to the question of
average time complexity can be derived. We want to emphasize that the effectiveness of the Chen-Stein method for this problem follows from the fact that
Wagner's algorithm is built upon the full binary tree. It also connects our work
to the research on other tree-based algorithms like binary search [18], [29] and
quick sort [80].
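A simple numerical illustration of Poisson approximation in total variation (using Le Cam's inequality $d_{TV}(\mathrm{Bin}(n,p), \mathrm{Poi}(np)) \le np^2$ rather than the Chen-Stein machinery of [57]; the parameters below mimic the 253 pairs among 23 people, each matching with probability 1/365, treated as independent purely for illustration):

```python
import math

def tv_binom_poisson(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(np),
    computed directly from the two probability mass functions."""
    lam = n * p
    poi = math.exp(-lam)         # Poisson pmf at k = 0
    acc, poi_cum = 0.0, 0.0
    for k in range(n + 1):
        b = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        acc += abs(b - poi)
        poi_cum += poi
        poi *= lam / (k + 1)     # advance Poisson pmf to k + 1
    acc += max(0.0, 1.0 - poi_cum)   # Poisson mass beyond n
    return 0.5 * acc
```

The computed distance is strictly positive yet comfortably below the bound $np^2$, which is the flavour of guarantee that Chen-Stein arguments deliver for the dependent matching counts actually studied in [57].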
Further research topic. Wagner's algorithm can be seen as a coalescent
process on a binary tree. It is interesting to investigate the connection of our
results presented in [57] with the recent works [7], [14].
Bibliography
[1] Athreya, K.B., Ney, P.E. (1972) Branching processes. Springer.
[2] Athreya, K.B., Ney, P.E. (1978) A new approach to the limit theory of recurrent Markov
chains. Trans. Amer. Math. Soc. 245, 493-501.
[3] Barbour, A.D., Holst, L., Janson, S. (1992) Poisson approximation. Oxford Studies in Probability. Clarendon Press.
[4] Basharin, G.P., Langville, A.N., Naumov, V.A. (2004) The life and work of A.A. Markov.
Linear Algebra Appl. 386, 3-26.
[5] Bateman, H. (1910) Note on the probability distribution of α-particles. Phil. Mag. 20(6), 698-704.
[6] Baum, L.E., Petrie, T. (1966) Statistical inference for probabilistic functions of finite state
Markov chains. Ann. Math. Statist. 37(6), 1554-1563.
[7] Benjamini, I., Lima, Y. (2014) Annihilation and coalescence on binary trees. Stoch. Dyn. 14(3).
[8] Borovkov, A.A. (1998) Ergodicity and stability of stochastic processes. Wiley.
[9] Campbell, N.R. (1909) The study of discontinuous phenomena. Proc. Cam. Phil. Soc. 15, 117-136.
[10] Chung, K.L. (1960) Markov chains with stationary transition probabilities. Springer-Verlag.
[11] Cont, R., Tankov, P. (2003) Financial modeling with jump processes. Chapman and Hall/CRC.
[12] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. (2009) Introduction to Algorithms. 3rd ed., MIT Press and McGraw-Hill.
[13] Cover, T.M., Thomas, J.A. (2006) Elements of information theory. 2nd ed., Wiley-Interscience.
[14] Debs, P., Haberkorn, T. (2015) Diseases transmission in a z-ary tree. arXiv:1507.06483.
[15] de Finetti, B. (1929) Sulle funzioni a incremento aleatorio. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 10, 163-168.
[16] de Finetti, B. (1929) Sulla possibilità di valori eccezionali per una legge di incrementi aleatori. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 10, 325-329.
[17] de Finetti, B. (1929) Integrazione delle funzioni a incremento aleatorio. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 10, 548-553.
[18] Devroye, L. (2005) Applications of Stein's method in the analysis of random binary search trees. In Stein's Method and Applications. Institute for Mathematical Sciences Lecture Notes Series. World Scientific Press. 247-297.
[19] Duval, C. (2014) When is it no longer possible to estimate a compound Poisson process?
Electron. J. Statist. 8(1), 274-301.
[20] Erlang, A.K. (1909) The theory of probabilities and telephone conversations. Nyt. Tidsskr.
Mat. 20, 33-41.
[21] Feller, W. (1940) On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48, 488-515.
[22] Feller, W. (1968) An introduction to probability theory and its applications. vol. 1, John Wiley
& Sons.
[23] Feuerverger, A., Mureika, R.A. (1977) The empirical characteristic function and its applications. Ann. Statist. 5, 88-97.
[24] Gorroochurn, P. (2012) Classic problems of probability. Wiley-Blackwell.
[25] Goryainov, V.V. (1993) Fractional iteration of probability generating functions and imbedding discrete branching processes in continuous processes. Russian Acad. Sci. Sb. Math. 79(1), 47-61.
[26] Guttorp, P., Thorarinsdottir, T.L. (2012) What happened to discrete chaos, the Quenouille process, and the sharp Markov property? Some history of stochastic point processes. International Statistical Review 80(2), 253-268.
[27] Haccou, P., Jagers, P., Vatutin, V.V. (2005) Branching processes: variation, growth and extinction of populations. Cambridge University Press.
[28] Harris, T.E. (1963) The theory of branching processes. Springer, Berlin.
[29] Holmgren, C., Janson, S. (2015) Limit laws for functions of fringe trees for binary search
trees and random recursive trees. Electron. J. Probab. 20, 1-51.
[30] Hossenfelder, S. (2013) Minimal length scale scenarios for quantum gravity. Living Rev.
Relativity 16.
[31] Jagers, P. (1975) Branching processes with biological applications. John Wiley and Sons.
[32] Jagers, P. (1989) General branching processes as Markov fields. Stoch. Proc. Appl. 32(2), 183-212.
[33] Jagers, P., Nerman, O. (1996) The asymptotic composition of supercritical, multi-type branching populations. Séminaire de probabilités de Strasbourg 30, 40-54.
[34] Joffe, A., Letac, G. (2006) Multitype linear fractional branching process. J. Appl. Probab.
43(4), 1091-1106.
[35] Kaj, I. (2002) Stochastic modelling in broadband communication systems. Society for Industrial
and Applied Mathematics.
[36] Karlin, S., McGregor, J. (1968a) Embeddability of discrete time simple branching processes
into continuous time branching processes. TAMS 132, 115-136.
[37] Karlin, S., McGregor, J. (1968b) Embedding iterates of analytic functions with two fixed
points into continuous groups. TAMS 132, 137-145.
[38] Karlin, S., Taylor, H.M. (1975) A first course in stochastic processes. Academic Press.
[39] Karlin, S., Taylor, H.M. (1981) A second course in stochastic processes. Academic Press.
[40] Kerstan, J., Matthes, K., Mecke, J. (1978) Infinitely divisible point processes. Wiley.
[41] Keiding, N. (1975) Maximum likelihood estimation in the birth-and-death process. Ann. Stat. 3(2), 363-372.
[42] Kimmel, M., Axelrod, D.E. (2015) Branching processes in biology. 2nd ed., Springer.
[43] Khintchine, A.Ya. (1937) A new derivation of a formula by P. Lévy. Bulletin of the Moscow State University 1, 1-5 (in Russian).
[44] Kolmogorov, A.N. (1932) Sulla forma generale di un processo stocastico omogeneo (Un problema di Bruno de Finetti). Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 15, 805-808.
[45] Kolmogorov, A.N. (1932) Ancora sulla forma generale di un processo stocastico omogeneo. Rendiconti della Reale Accademia Nazionale dei Lincei. (Ser. VI) 15, 866-869.
[46] Kolmogorov, A.N., Dmitriev, N.A. (1947) Branching random processes. Dokl. Akad. Nauk.
SSSR 56, 7-10.
[47] Kolmogorov, A.N., Sevastianov, B.A. (1947) Calculation of final probabilities of branching
random processes. Dokl. Akad. Nauk. SSSR 56, 783-786.
[48] Kozlov, V.M. (1977) On the asymptotic behavior of the probability of non-extinction for
critical branching processes in a random environment. Theory Probab. Appl. 21(4), 791-804.
[49] Kuczma, M., Choczewski, B., Ger, R. (1990) Iterative functional equations. Encyclopedia of
mathematics and its applications. Cambridge University Press.
[50] Lambert, A. (2010) The contour of splitting trees is a Lévy process. Ann. Probab. 38(1), 348-395.
[51] Lambert, A., Popovic, L. (2013) The coalescent point process of branching trees. Ann. Appl.
Probab. 23(1), 99-144.
[52] Lambert, A., Trapman, P. (2013) Splitting trees stopped when the first clock rings and Vervaat's transformation. J. Appl. Probab. 50(1), 208-227.
[53] Last, G. (2014) Perturbation analysis of Poisson processes. Bernoulli 20(2), 486-513.
[54] Lévy, P. (1934) Sur les intégrales dont les éléments sont des variables aléatoires indépendantes. Annali della Regia Scuola Normale di Pisa. (Ser. II) 3, 337-366.
[55] Lévy, P. (1935) Observation sur un précédent mémoire de l'auteur. Annali della Regia Scuola Normale di Pisa. (Ser. II) 4, 217-218.
[56] Liemant, A., Matthes, K., Wakolbinger, A. (1988) Equilibrium distributions of branching processes. Springer.
[57] Lindo, A., Sagitov, S. Asymptotic results for the number of Wagner's solutions to a generalised birthday problem. Statistics and Probability Letters 107, 356-361.
[58] Lindo, A., Zuyev, S., Sagitov, S. Nonparametric estimation of infinitely divisible distributions based on variational analysis on measures. arXiv:1510.04968.
[59] Lindo, A., Sagitov, S. General linear-fractional branching processes with discrete time.
arXiv:1510.06859.
[60] Lundberg, F. I. Approximerad framställning af sannolikhetsfunktionen. II. Återförsäkring af kollektivrisken (PhD Dissertation, Uppsala University, Uppsala, Sweden). Almqvist & Wiksell.
[61] Lundberg, O. On random processes and their application to sickness and accident statistics (PhD Dissertation, Stockholm University, Stockholm, Sweden). Almqvist & Wiksell.
[62] Mainardi, F., Rogosin, S. (2006) The origin of infinitely-divisible distributions: from de Finetti's problem to the Lévy-Khintchine formula. Math. Methods Econ. Finance 1, 37-55.
[63] Markov, A.A. (1906) Extension of the law of large numbers to dependent quantities (in Russian). Izv. Fiz.-Matem. Obsch. Kazan. Univ. (2nd Ser.) 15, 135-156.
[64] Markov, A.A. (1907) An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains (in Russian). Izv. Akad. Nauk (VI Ser.) 7, 153-163.
[65] Meyn, S., Tweedie, R.L. (2009) Markov chains and stochastic stability. 2nd ed., Cambridge University Press.
[66] Mises, R. von. (1939) Über Aufteilungs- und Besetzungs-Wahrscheinlichkeiten. Revue de la Faculté des Sciences de l'Université d'Istanbul: Sciences naturelles 4, 145-163.
[67] Molchanov, I., Zuyev, S. Variational analysis of functionals of Poisson processes. Mathematics of Operations Research 25(3), 485-508.
[68] Molchanov, I., Zuyev, S. (2004) Optimisation in space of measures and optimal design.
ESAIM: Probability and Statistics 8, 12-24.
[69] Nagaev, S.V. (2015) The spectral method and the central limit theorem for general Markov
chains. Doklady Mathematics 91(1), 56-59.
[70] Nagaev, S.V., Wachtel, V. (2007) The critical Galton-Watson process without further power
moments. J. Appl. Probab. 44(3), 753-769.
[71] Nerman, O. (1984) The growth and composition of supercritical branching populations on general type spaces. Dept. of Mathematics, Chalmers Univ. Tech. and Gothenburg Univ. 1984-4.
[72] Nummelin, E. (1978) A splitting technique for Harris recurrent Markov chains. Z. Wahrsch.
Verw. Gebiete 43(4), 309-318.
[73] Nummelin, E. (1984) General irreducible Markov chains and non-negative operators. Cambridge
University Press.
[74] Overbeck, L. (1994) Martin boundaries of some branching processes. Ann. Inst. Henri Poincaré Probab. Stat. 30(2), 181-195.
[75] Page, L., Brin, S., Motwani, R., Winograd, T. (1998) The PageRank citation ranking: bringing
order to the web. Technical report. Stanford InfoLab.
[76] Pearson, K. (1905) The problem of the random walk. Nature 72, 294.
[77] Pollak, E. (1974) Survival and extinction times for some multitype branching processes.
Adv. Appl. Prob. 6, 446-462.
[78] Reichl, L.E. (2009) A modern course in statistical physics. 3rd ed., Wiley-VCH.
Part II
PAPERS
PAPER I
1. Introduction
Consider a Galton-Watson process $(Z_n)_{n \ge 0}$ with $Z_0 = 1$ and the offspring
number distribution
$$p_k = P(Z_1 = k), \qquad k \ge 0.$$
The properties of this branching process are studied in terms of the probability
generating function
$$f(s) = p_0 + p_1 s + p_2 s^2 + \dots,$$
where it is usual to assume that $f(1) = 1$; however, in this paper we allow for
$f(1) < 1$, so that a given particle may explode with probability $p = 1 - f(1)$. The
probability generating function $f_n(s) = E(s^{Z_n})$ of the size of the $n$-th generation
is given by the $n$-fold iteration of $f(s)$:
$$f_0(s) = s, \qquad f_n(s) = f(f_{n-1}(s)), \quad n \ge 1.$$
A linear-fractional reproduction law, with
$$p_k = (1 - p_0)\,p\,(1 - p)^{k-1}, \qquad k \ge 1,$$
is fully characterized by just two parameters: $p_0 \in [0, 1)$ and $p \in (0, 1]$. In Section 2
for each $\theta \in [-1, 1]$ we introduce a family $G_\theta$ of functions with explicit iterations
containing the linear-fractional family as a particular case. In Section 3 we
demonstrate that all $f \in G_\theta$ are probability generating functions with $f(1) \le 1$. A Galton-Watson process with a reproduction law whose probability
generating function belongs to $G_\theta$ will be called a theta-branching process.
The basic properties of the theta-branching processes are summarized in
Section 4, where it is shown that this family is wide enough to include the cases
of infinite variance, infinite mean, and even non-regular branching processes
with explosive particles.
Recall that the basic classification of the Galton-Watson processes refers to
the mean offspring number $m = EZ_1$. Let $q \in [0, 1]$ be the smallest non-negative
root of the equation $f(x) = x$ and denote by
$$T_0 = \inf\{n : Z_n = 0\}$$
the extinction time of the branching process. Then $q = P(T_0 < \infty)$ gives the
probability of ultimate extinction. For $m \le 1$ and $p_1 < 1$, the extinction probability is $q = 1$, while in the supercritical case $m > 1$ we have $q < 1$.
If $f(1) < 1$, then the Galton-Watson process is a Markov chain with two
absorbing states, $\{0\}$ and $\{\infty\}$. In this case the branching process either goes
extinct at time $T_0$ or explodes at the time
$$T_1 = \inf\{n : Z_n = \infty\},$$
with
$$P(T_1 \le n) = 1 - f_n(1), \qquad P(T_1 < \infty) = 1 - q,$$
where the latter equality is due to $f_n(1) \to q$. In Section 5, using explicit formulas for $f_n(s)$, we compute the distribution of the absorption time
$$T = T_0 \wedge T_1.$$
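The extinction probability $q$, the smallest non-negative root of $f(x) = x$, is the limit of the iterates $f_n(0)$, which suggests a simple numerical scheme. The offspring law below is an arbitrary supercritical example (probabilities $0.2, 0.2, 0.6$ for $0, 1, 2$ children, mean $1.4$), for which $q = 1/3$ by solving the quadratic fixed-point equation.

```python
def extinction_probability(f, tol=1e-12, max_iter=100000):
    """Smallest non-negative root of f(x) = x, obtained as the limit of
    the iterates f_n(0)."""
    q = 0.0
    for _ in range(max_iter):
        nxt = f(q)
        if abs(nxt - q) < tol:
            return nxt
        q = nxt
    return q

# arbitrary supercritical example: p0 = 0.2, p1 = 0.2, p2 = 0.6 (mean 1.4)
def f_example(s):
    return 0.2 + 0.2 * s + 0.6 * s * s
```

The iteration converges geometrically, at rate $f'(q) < 1$ in the supercritical case.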
$$a = \frac{p}{1 - p_0}, \qquad c = \frac{1 - p}{1 - p_0}.$$
This observation immediately implies that the $n$-fold iteration $f_n$ of the linear-fractional $f$ is also linear-fractional:
$$\frac{1}{1 - f_n(s)} = \frac{a^n}{1 - s} + c\,(1 + a + \dots + a^{n-1}).$$
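The closed-form iteration of a linear-fractional generating function can be confirmed numerically for any admissible parameters (the values of $p_0$, $p$, $s$ and $n$ used in the check below are illustrative only):

```python
def lf(s, p0, p):
    """Linear-fractional pgf f(s) = p0 + (1 - p0) p s / (1 - (1 - p) s)."""
    return p0 + (1 - p0) * p * s / (1 - (1 - p) * s)

def iteration_gap(p0, p, s, n):
    """|1/(1 - f_n(s)) - (a^n/(1 - s) + c (1 + a + ... + a^(n-1)))|,
    with a = p/(1 - p0) and c = (1 - p)/(1 - p0)."""
    a = p / (1 - p0)
    c = (1 - p) / (1 - p0)
    fs = s
    for _ in range(n):           # direct n-fold iteration of f
        fs = lf(fs, p0, p)
    lhs = 1.0 / (1.0 - fs)
    rhs = a ** n / (1.0 - s) + c * sum(a ** j for j in range(n))
    return abs(lhs - rhs)
```

The gap is zero up to floating-point error, both in the subcritical-style case $a < 1$ and in the case $a > 1$.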
$$(A - f(s))^{-\theta} = a\,(A - s)^{-\theta} + c, \qquad s \in [0, A),$$
with the help of two extra parameters $(A, \theta)$ which are invariant under iterations.
Definition 2. Let $\theta \in (-1, 0) \cup (0, 1]$. We say that a probability generating
function $f$ belongs to the family $G_\theta$ if
$$f(s) = A - [a(A - s)^{-\theta} + c]^{-1/\theta}, \qquad 0 \le s < A,$$
where one of the following conditions holds:
(i) $a \ge 1$, $c > 0$, $\theta \in (0, 1]$, $A = 1$,
(ii) $a \in (0, 1)$, $c = (1 - a)(1 - q)^{-\theta}$, $q \in [0, 1)$, $A = 1$,
(iii) $a \in (0, 1)$, $c = (1 - a)(A - q)^{-\theta}$, $q \in [0, 1]$, $A > 1$.
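In case (ii) the constant $c = (1 - a)(1 - q)^{-\theta}$ is chosen precisely so that $q$ is a fixed point of $f$, which is easy to confirm numerically (the parameter values below are arbitrary illustrations, not values from the text):

```python
def theta_f(s, a, c, theta, A=1.0):
    """f(s) = A - [a (A - s)^(-theta) + c]^(-1/theta), as in Definition 2."""
    return A - (a * (A - s) ** (-theta) + c) ** (-1.0 / theta)

# case (ii): A = 1, arbitrary a in (0,1), q in [0,1), theta in (0,1]
a, q, theta = 0.5, 0.3, 0.5
c = (1 - a) * (1 - q) ** (-theta)
```

Indeed, $a(1-q)^{-\theta} + c = (1-q)^{-\theta}$, so $f(q) = 1 - (1-q) = q$; the function is also increasing with $f(0) \in (0, 1)$, as a probability generating function should be.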
Definition 4. A Galton-Watson process with the reproduction law whose probability generating function f ∈ G_θ, θ ∈ [−1, 1], will be called a theta-branching process.

It is straightforward to see, cf. Section 4, that each of the families G_θ is invariant under iterations: if f ∈ G_θ, then f_n ∈ G_θ for all n ≥ 1. The fact that the functional families in Definitions 2 and 3 indeed consist of probability generating functions with f(1) ≤ 1 is verified in Section 3.
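The invariance under iterations can be checked numerically: iterating the relation (A − f(s))^θ = a(A − s)^θ + c gives (A − f_n(s))^θ = a^n(A − s)^θ + c(1 + a + … + a^{n−1}). The sketch below uses arbitrary parameters chosen to satisfy case (ii) of Definition 2.

```python
# Numerical check that (A - f_n(s))**theta stays affine in (A - s)**theta,
# i.e. f_n remains in the same family G_theta.
# theta, a, q are arbitrary; c follows case (ii) of Definition 2.
theta, a, q, A = 0.5, 0.6, 0.2, 1.0
c = (1 - a) * (1 - q)**theta

def f(s):
    return A - (a * (A - s)**theta + c)**(1.0 / theta)

s, n = 0.3, 5
fn = s
for _ in range(n):
    fn = f(fn)
lhs = (A - fn)**theta
rhs = a**n * (A - s)**theta + c * sum(a**k for k in range(n))
assert abs(lhs - rhs) < 1e-10
```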
Parts of the G_θ families were mentioned earlier in the literature as examples of probability generating functions with explicit iterations. Clearly, G_1 ∪ G_{−1} is the family of linear-fractional probability generating functions. Examples in [10] lead to the case A = 1 and θ ∈ [0, 1), which was later given among other examples in Ch. 1.8 of [8]. The case A = 1 and θ ∈ (0, 1) was later studied in [9].
A special pdf with θ = 1/2 has

p_1 = (A − q)^{1−a} a A^{a−1},  p_n = p_{n−1} (n − 1 − a)/(nA),  n ≥ 2.

Therefore, (p_n)_{n≥1} are monotonely decreasing with

p_n = a A^{a−1} (A − q)^{1−a} A^{1−n} ∏_{k=2}^{n} (1 − (1 + a)/k),  n ≥ 2,
and for n ≥ 2,

p_n = [(a + cA^θ)^{1/θ} / (a A^{n+1} n!)] ∑_{i=1}^{n−1} (1 + cA^θ/a)^{−i} B_{i,n},

where all B_{i,n} = B_{i,n}(θ) are non-negative and, for n ≥ 2, satisfy the recursion

B_{i,n} = (n − 2 − iθ)B_{i,n−1} + (1 + iθ)B_{i−1,n−1},  i = 1, …, n − 1,
(A − f(s))/(A − s) = [a + c(A − s)^{−θ}]^{1/θ} =: φ(s),

f′(s) = a φ(s)^{1−θ},

f″(s) = (1 − θ) a c (A − s)^{−1−θ} φ(s)^{1−2θ},

and, more generally, for n ≥ 2 the derivative f^{(n)}(s) is a linear combination over i = 1, …, n − 1 with the coefficients B_{i,n} defined in the statement. To finish the proof it remains to apply the equality p_n = f^{(n)}(0)/n!.
Proof. Put

g(s) = (s − 1)f(s) = −p_0 + ∑_{k=1}^{∞} (p_{k−1} − p_k) s^k.

From

g(s) = s − 1 + (1 − s)² [a + c(1 − s)^{−θ}]^{1/θ},

we obtain

c(1 + θ)/(a + c) > 0.

Furthermore,

G′(s) = c(1 + θ)(1 − f(s))^{−θ−1} f′(s),  k ≥ 2.
4. Basic properties of f ∈ G_θ

In this section we distinguish among nine cases inside the collection of families {G_θ}_{−1≤θ≤1} and summarize the following basic formulas: f_n(s), f(1), f′(1), f″(1). In all cases, except Case 1, we have a = f′(q). The following definition, cf. [3], explains an intimate relationship between the Cases 3-5 with A = 1 and the Cases 7-9 with A > 1.
Definition 7. Let A > 1 and a probability generating function f be such that f(A) ≤ A. We call

f̂(s) := f(sA)/A = ∑_{k=0}^{∞} p_k A^{k−1} s^k

the dual generating function for f and denote q̂ = qA^{−1}, so that f̂(q̂) = q̂. Clearly, f̂′(q̂) = f′(q).
d ∈ (0, ∞),  c ∈ (0, ∞).

The corresponding theta-branching process is critical with either finite or infinite variance. If |θ| ∈ (0, 1), then f″(1) = ∞, and for |θ| = 1 we have f″(1) = 2c. This is the only critical case in the whole family of theta-branching processes.

Case 3: −θ ∈ (0, 1], a ∈ (0, 1),

f_n(s) = 1 − [a^n(1 − s)^θ + (1 − a^n)(1 − q)^θ]^{1/θ},  q ∈ [0, 1).

In the non-regular counterpart with θ ∈ (0, 1] the same formula for f_n(s) holds, with

1 − f(1) = (1 − a)^{1/θ}(1 − q),  q ∈ [0, 1).

Case 6: θ = 1, a ∈ (0, 1), A > 1,

f_n(s) = a^n s + (1 − a^n)q,  q ∈ [0, 1].
We have f(A) = A, and the dual generating function has the form of the Case 3:

f̂(s) = 1 − [a(1 − s)^θ + (1 − a)(1 − q̂)^θ]^{1/θ},  q̂ ∈ [0, 1].

We have f(A) = A, and the dual generating function belongs to the Case 4:

f̂(s) = 1 − (1 − q̂)^{1−a}(1 − s)^a,  q̂ ∈ [0, 1].
For our special family of branching processes we compute explicitly the distribution functions of the times T_0, T_1, T.

Cases 1-4. In these regular cases we are interested only in the extinction time:

P(n < T_0 < ∞) = a^{n/θ}[1 + d − d a^{−n}]^{1/θ}  in Case 1,
P(n < T_0 < ∞) = (1 + cn)^{1/θ}  in Case 2,
P(n < T_0 < ∞) = (1 − q){[1 − a^n(1 − (1 − q)^{−θ})]^{1/θ} − 1}  in Case 3,
P(n < T_0 < ∞) = (1 − q)^{1−a^n} − (1 − q)  in Case 4.
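The Case 2 tail formula can be checked by direct iteration. The sketch below writes β = |θ| and uses the reading f(s) = 1 − [(1 − s)^{−β} + c]^{−1/β} of the critical Case 2 parametrisation; β and c are arbitrary illustrative values.

```python
# Sketch check of the critical Case 2 tail P(T0 > n) = (1 + c*n)**(-1/beta),
# where beta = |theta|; parametrisation is our reading of Case 2.
beta, c = 0.5, 0.3

def f(s):
    return 1.0 - ((1.0 - s)**(-beta) + c)**(-1.0 / beta)

n = 8
fn0 = 0.0
for _ in range(n):
    fn0 = f(fn0)          # f_n(0) = P(T0 <= n)
tail = 1.0 - fn0          # P(T0 > n); critical case, so extinction is certain
assert abs(tail - (1 + c * n)**(-1.0 / beta)) < 1e-12
```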
In the non-regular cases with A > 1 the absorption time satisfies

P(n < T < ∞) = (A − q){[1 − a^n(1 − A^θ(A − q)^{−θ})]^{1/θ} − [1 − a^n(1 − (A − 1)^θ(A − q)^{−θ})]^{1/θ}}.
Case 6. In this trivial case

P(n < T_0 < ∞) = a^n q.

Here and below

β = |θ| for r ∈ (0, ∞],  β = |θ|^{−1}(log A^{−1})^{−1} for r = 0,

w = 1 for r ∈ {0} ∪ {∞},  w = 1 − e^{−r} for r ∈ (0, ∞),

and ν = log w / log a, where γ denotes the Euler-Mascheroni constant.
Proof. In view of
P(T1 n|T1 < ) =
it suffices to verify that
h
i1/|| A 1
Aqh
,
1 an (1 (A 1)|| (A q)|| )
1q
1q
1 ay (1 (A 1)|| )
i1/||
e wa .
Finally, if r = 0, then
1 (A 1)|| ||/,
and therefore
h
1 ay (1 (A 1)|| )
i1/||
e a .
If r = 0 and A = 1 + e^{−1/ε}, ε > 0, then for any fixed a ∈ (0, 1) and q ∈ [0, 1), the convergence above holds uniformly in y ∈ (−∞, ∞).
6. The Q-process

As explained in Ch. I.14 of [1], for a regular Galton-Watson process with transition probabilities P_n(i, j), one can define another Markov chain with transition probabilities

Q_n(i, j) := j q^{j−i} P_n(i, j) / (i γ^n),  i ≥ 1, j ≥ 1,

where γ = f′(q). The new chain is called the Q-process, and from

∑_{j≥1} Q_n(i, j)s^j = (s / (i q^i γ^n)) (d/ds)(f_n^i(sq)) = s (f_n(sq)/q)^{i−1} f_n′(sq)/f_n′(q)

we see that the Q-process is a Galton-Watson process with the dual reproduction law f(sq)/q and an eternal particle generating a random number of ordinary particles with E(s^η) = f′(sq)/f′(q), see [3]. The Q-process in the regular case is interpreted in [1] as the original branching process "conditioned on not being extinct in the distant future and on being extinct in the even more distant future".

Exactly the same definition of the Q-process makes sense in the non-regular case, only now the last interpretation should be based on the absorption time T rather than on the extinction time T_0. Indeed, writing P_j(·) = P(·|Z_0 = j) we get for j ≥ 1,
conditional probabilities expressed through the ratios

(f_k^j(1) − f_k^j(0)) / (f_{n+k}^i(1) − f_{n+k}^i(0)),  i, j ≥ 1,

and therefore the associated generating function Q(s) satisfies

Q(f(s)) = γ Q(s),  Q(q) = 0.
In the critical case, as well as in the subcritical case with ∑_{k=2}^{∞} p_k k log k = ∞, the solution is trivial: Q(s) ≡ 0. Otherwise, Q(s) is uniquely defined by the above equation with an extra condition Q′(q) = 1, so that the Q-process has a stationary distribution given by

Q_n(i, j) → j q^{j−1} ν_j,

with

∑_{j≥1} j q^{j−1} ν_j s^j = sQ′(sq).

These facts concerning Q(s) remain valid even in the non-regular case. It is easy to see from (2.2) that for our family with θ ≠ 0 and A > q, the generating function can be taken as

Q(s) = (A − s)^θ − (A − q)^θ.

This leaves us with two cases when A = q = 1. In the critical Case 2 the answer is trivial: Q(s) ≡ 0. In the subcritical Case 1, we have γ = a^{1/θ} and

(1 − f(s))^θ + d = a[(1 − s)^θ + d],

which yields

Q(s) = [(1 − s)^θ + d]^{1/θ}.
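The functional equation for Q can be checked directly: with c = (1 − a)(A − q)^θ as in case (iii) of Definition 2, the function Q(s) = (A − s)^θ − (A − q)^θ satisfies Q(f(s)) = aQ(s) (here γ = f′(q) = a). The parameters below are arbitrary.

```python
# Sketch check that Q(s) = (A-s)**theta - (A-q)**theta solves Q(f(s)) = a*Q(s)
# for f(s) = A - (a*(A-s)**theta + c)**(1/theta), c = (1-a)*(A-q)**theta.
# All parameter values are arbitrary illustrative choices (case (iii)).
theta, a, q, A = 0.5, 0.7, 0.4, 1.5
c = (1 - a) * (A - q)**theta

def f(s):
    return A - (a * (A - s)**theta + c)**(1.0 / theta)

def Q(s):
    return (A - s)**theta - (A - q)**theta

for s in [0.0, 0.2, 0.9]:
    assert abs(Q(f(s)) - a * Q(s)) < 1e-12
```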
From these calculations it follows, in particular, that for our family of branching processes, in all subcritical cases, the classical x log x moment condition holds:

∑_{k=2}^{∞} p_k k log k < ∞.

Using these explicit formulas for Q(s) we can easily find the conditional probability distributions

lim_{n→∞} P(Z_n = j | T > n) = b_j,  j ≥ 1.
Turning to the Case 2, recall that for any critical Galton-Watson process there exists a limit probability distribution

lim_{n→∞} P(Z_n = j | T_0 = n + 1) = w_j,  j ≥ 1,

such that

∑_{j≥1} w_j s^j = lim_{n→∞} (f_n(sp_0) − f_n(0)) / (f_n(p_0) − f_n(0)).

Since f_n(s) = 1 − [(1 − s)^θ + cn]^{1/θ}, we obtain

∑_{j≥1} w_j s^j = ([1 − s(1 − (1 + c)^{1/θ})]^θ − 1)/c.
t ∈ [0, ∞), u ∈ [0, ∞)

(see [6] for a recent account of continuous time Markov branching processes). Our task for this section is, for each f ∈ G_θ, to find a pair (h, λ) such that f(s) = F_1(s). We will denote by μ = ∑_{k=2}^{∞} k h_k the corresponding offspring mean number and by q̃ the minimal non-negative root of the equation h(s) = s, which gives the extinction probability. Consider

h(s) = 1 − (1 − s) + (1 − s)^{1+θ}/(1 + θ).

Taking successive derivatives of h it is easy to see that it is a probability generating function with h′(0) = 0. Next we show that, using this h as the offspring probability generating function for the continuous time branching process, we can recover f(s) for the theta-branching processes as F_1(s) by choosing λ and θ adapted to Cases 1-3.
Case 1. For a given pair a ∈ (0, 1) and d ∈ (0, ∞), put

α = (1 + θ)d / ((1 + θ)d + 1),  λ = [(1 + θ^{−1})d + θ^{−1}] ln a.

Integrating λt = ∫_s^{F_t(s)} dx/(h(x) − x) then leads to

F_t(s) = 1 − [a^t(1 − s)^θ + (a^t − 1)d]^{1/θ},
which in the Case 3 can be rewritten as

h(s) = s + [(1 − s)^{1+θ} − (1 − q)^θ(1 − s)] / [(1 + θ) − (1 − q)^θ].

For a given pair a ∈ (0, 1) and q ∈ [0, 1), choosing

λ = [(1 + θ^{−1})(1 − q)^{−θ} − θ^{−1}] ln a^{−1},

it is easy to see that f(s) = F_1(s) covers the whole subfamily G_θ corresponding to the Cases 1-3.
Case 4. In this case

h(s) = s + (1 − s)(ln(1 − s) − ln(1 − q)) / (1 − ln(1 − q)),

which can be rewritten as

h(s) = h_0 + (1 − h_0) ∑_{k=2}^{∞} s^k/(k(k − 1)),  h_0 = −ln(1 − q)/(1 − ln(1 − q)).

In this form with h_0 = 0, the generating function h appeared in [4] as the reproduction law of an immortal branching process. Earlier in [8], this reproduction law was introduced as

h(s) = 1 − (1 − h_0)(1 − s)(1 − ln(1 − s)).

To see that the theta-branching process in the Case 4 is embeddable into the Markov branching process with the above mentioned reproduction law, use the first representation of h and apply (7.2). As a result we obtain for s ≠ q,
λt = ∫_s^{F_t(s)} dx/(h(x) − x) = (1 − ln(1 − q)) ∫_s^{F_t(s)} dx/[(1 − x)(ln(1 − x) − ln(1 − q))]

   = ln[ln(1 − s) − ln(1 − q)] − ln[ln(1 − F_t(s)) − ln(1 − q)].
Cases 7 and 9. Here

h(s) = s + [(A − s)^{1+θ} − (A − q)^θ(A − s)] / [(1 + θ)A^θ − (A − q)^θ],
λ = [(1 + θ^{−1})A^θ(A − q)^{−θ} − θ^{−1}] ln a^{−1}.

Turning to Definition 7 we see that this h in the Case 7 is dual to the h in the Case 3, and in the Case 9 it is dual to that of the Case 5.

Case 6. In this trivial case the corresponding continuous time branching process is a simple death-explosion process with h(s) = q and λ = ln a^{−1}.

Case 8. Similarly to the Case 4 we find that the pair

h(s) = s + (A − s)(ln(A − s) − ln(A − q)) / (1 + ln A − ln(A − q)),
λ = (1 + ln A − ln(A − q)) ln a^{−1},

leads to f(s) = F_1(s).
PAPER II
preprint.
1. Introduction
Multi-type branching processes with a general measurable space (E, ℰ) of possible types of individuals were addressed in the monographs [10, 11, 16]; see also the paper [2]. Notably, in [7], [8], and [12] the authors develop a full-fledged theory for the general supercritical branching processes with age dependence. These results rely upon generalisations of the Perron-Frobenius theorem for irreducible non-negative kernels [14] and Markov renewal theorems [1]. Therefore, a typical limit theorem for general branching processes involves technical conditions of irreducibility imposed on the reproduction law connecting the elements of the type space E.
This paper deals with the general Bienaymé-Galton-Watson processes, that is, branching processes of particles with non-overlapping generations. Denote by Z_n(A) the number of n-th generation particles whose types belong to A ∈ ℰ. The particles are assumed to reproduce independently according to a reproduction law allocating offspring types by a random algorithm regulated by the parental type. A key characteristic of the multi-type reproduction law is the expectation kernel

(1.1)  M(x, A) := E_x Z_1(A),  x ∈ E,  A ∈ ℰ,

which is indexed by the type x of the ancestral particle, the initial state being Z_0 = δ_x with δ_x(A) := 1_{{x∈A}}. Thus, the measure M(x, dy) gives the mean offspring numbers to a mother of type x. The measure-valued Markov chain {Z_n(dx)}_{n≥0} has the mean value kernels

M_n(x, A) = E_x Z_n(A),  x ∈ E,  A ∈ ℰ,  n ≥ 1.
Here and elsewhere the integrals are always taken over the type space E.
More specifically, we study what we call LF-processes: branching particle systems characterised by general linear-fractional distributions. At the expense of a restricted choice for the particle reproduction law, we are able to obtain explicit Perron-Frobenius asymptotic formulas using a straightforward argument, without directly referring to the general Markov chain theory. Our approach develops the ideas of [15] dealing with a countably infinite type space E.
An LF-process has a reproduction law parametrised by a triplet (K, π, m) consisting of a sub-stochastic kernel K(x, dy), a probability measure π(dy), and a number m ∈ (0, ∞). Given the ancestral particle type x, the total offspring number Z_1(E) is assumed to follow a linear-fractional distribution

E_x s^{Z_1(E)} = p_0(x) + (1 − p_0(x)) s/(1 + m − ms).

Here the probability of having no offspring, p_0(x) = P_x(Z_1(E) = 0), depends on the parental type, while the geometric number of offspring beyond the first one,

E_x(s^{Z_1(E)−1} | Z_1(E) > 0) = 1/(1 + m − ms),

has mean m independently of x. (Even though all the Z_1(E) = k offspring of the ancestral particle are produced instantaneously, we assume that they are somehow labeled from 1 to k.)

The assignment of types to the k = Z_1(E) offspring is done independently, using the conditional distribution K(x, dy)/K(x, E) for the first one and the distribution π(dy) for the other k − 1 offspring. These quite restrictive assumptions produce an important feature of the LF-process: its kernel (1.1) has the particular structure

(1.2)  M(x, A) = K(x, A) + m K(x, E) π(A),

where the first term, K(x, A), is the contribution of the first offspring and the second term is the joint contribution of the other offspring.
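The mean-kernel structure can be illustrated by Monte Carlo on a two-point type space. The sketch below uses our reading M(x, A) = K(x, A) + m K(x, E) π(A) of (1.2); the kernel K, the measure π, and m are arbitrary illustrative choices.

```python
import random
# Monte Carlo sketch of the LF reproduction law on the type space E = {0, 1},
# checking the mean-kernel structure M(x, A) = K(x, A) + m*K(x, E)*pi(A)
# (our reading of (1.2)); K, pi, m below are arbitrary illustrative choices.
random.seed(1)
K = [[0.3, 0.4], [0.2, 0.5]]      # sub-stochastic kernel K(x, dy)
pi = [0.6, 0.4]                   # type distribution for the later offspring
m = 1.5                           # mean of the geometric extra-offspring count

def reproduce(x):
    """One generation of offspring types from a mother of type x."""
    KxE = sum(K[x])
    if random.random() > KxE:              # no offspring: prob 1 - K(x, E)
        return []
    first = 0 if random.random() < K[x][0] / KxE else 1
    kids = [first]
    while random.random() < m / (1.0 + m): # geometric extras, mean m
        kids.append(0 if random.random() < pi[0] else 1)
    return kids

N = 200_000
mean_type0 = sum(kid == 0 for _ in range(N) for kid in reproduce(0)) / N
M00 = K[0][0] + m * sum(K[0]) * pi[0]      # about 0.93 here
assert abs(mean_type0 - M00) < 0.02
```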
P(Z(E) = k | Z(E) > 0) = m^{k−1}/(1 + m)^k,  k ≥ 1,

and for A ∈ ℰ,

P(X_i ∈ A) = π(A),  i ≥ 2.
E h^Z = E s_1^{Z(A_1)} ⋯ s_k^{Z(A_k)}.

For the distribution from Definition 10 the generating functional has the following linear-fractional form:

E h^Z = 1 − π(E) + ∫h(y)π(dy) / [1 + m − m∫h(y)π(dy)].
Theorem 11. Consider an LF-process {Z_n(dy)} with

(2.1)  E_x h^{Z_1} = 1 − K(x, E) + ∫h(y)K(x, dy) / [1 + m − m∫h(y)π(dy)].

Then for each n ≥ 1,

(2.2)  E_x h^{Z_n} = 1 − K_n(x, E) + ∫h(y)K_n(x, dy) / [1 + m_n − m_n∫h(y)π_n(dy)],

where the triplet (K_n, π_n, m_n) is uniquely specified by the triplet (K, π, m) via the relations

(2.3)  m_n = m ∑_{k=0}^{n−1} ∫ M_k(x, E) π(dx),

(2.4)  π_n(A) = m_n^{−1} m ∑_{k=0}^{n−1} ∫ M_k(x, A) π(dx),

(2.5)  K_n(x, A) = M_n(x, A) − [m_n/(1 + m_n)] M_n(x, E) π_n(A).
3. Perron-Frobenius theorem

According to Theorem 11, the asymptotic behaviour of the LF-process is determined by the asymptotic behaviour of M_n(x, A) as n → ∞, which is the subject of this section. Here we analyse the growth rate of M_n(x, A) in terms of the generating functions

M^{(s)}(x, A) = ∑_{n=0}^{∞} s^n M_n(x, A),  K^{(s)}(x, A) = ∑_{n=0}^{∞} s^n K_n(x, A),

with the limiting measure normalised by ν(E) = 1. The branching structure yields

M_n(x, A) = K_n(x, A) + m ∑_{i=0}^{n} K_i(x, E) ∫ M_{n−i}(y, A) π(dy) − m ∫ M_n(y, A) π(dy),

and, with the generating function b(s) = m(K^{(sR)}(x, E) − 1), a renewal-type argument gives

R^n (M_n(x, A) − K_n(x, A)) → u(x)ν(A)/(mRf′(R))  if f′(R) < ∞,
R^n (M_n(x, A) − K_n(x, A)) → 0  if f′(R) = ∞.
The next lemma is a Tauberian theorem taken from [5, Chapter XIII.10].

Lemma 14. Let a(s) = ∑_{n=0}^{∞} a_n s^n be a probability generating function and b(s) = ∑_{n=0}^{∞} b_n s^n a generating function for a non-negative sequence, so that a(1) = 1 while b(1) ∈ (0, ∞). Then the non-negative sequence (c_n) defined by ∑_{n=0}^{∞} c_n s^n = b(s)/(1 − a(s)) is such that c_n → b(1)/a′(1) as n → ∞.
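Lemma 14 can be illustrated on a concrete pair of generating functions: for a(s) = (s + s²)/2 and b(s) = 1 the identity (1 − a(s))∑c_n s^n = b(s) gives the recursion c_n = (c_{n−1} + c_{n−2})/2, and the lemma predicts c_n → b(1)/a′(1) = 2/3.

```python
# Numerical illustration of Lemma 14: for a(s) = (s + s**2)/2 and b(s) = 1,
# the coefficients c_n of b(s)/(1 - a(s)) converge to b(1)/a'(1) = 2/3.
# The recursion c_n = (c_{n-1} + c_{n-2})/2 follows from (1 - a(s))*c(s) = b(s).
c = [1.0, 0.5]                    # c_0 = 1, c_1 = 1/2
for n in range(2, 60):
    c.append((c[-1] + c[-2]) / 2.0)
assert abs(c[-1] - 2.0 / 3.0) < 1e-12
```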
All direct offspring particles in this lineage are treated as the originators of the offspring individuals to the ancestral one. As a result, the ancestral individual produces random numbers of offspring at times 1, …, L − 1. The corresponding litter sizes are mutually independent and have the same geometric distribution with mean m. The newborn individuals live independently according to the same life law as their mothers. The CMJ-process thus defined has population size at time n coinciding with the generation size Z_n(E) of the LF-process with parameters (K, π, m) starting from a particle whose type has distribution π(dx).

This single-type CMJ-process conceals the information on the types of the particles. To recover this information we introduce an additional labelling of individuals using the types of the underlying particles. The evolution of a labeled individual over the type space E can be described by a Markov chain whose state space Ē = E ∪ {∂} is the particle type space E augmented with a graveyard state ∂. The transition probabilities of such a chain are given by a stochastic kernel K̄ with K̄(x, {∂}) = 1 − K(x, E) for x ∈ E, and K̄(∂, {∂}) = 1. The Malthusian parameter α is defined by

∑_{n=1}^{∞} e^{−αn} a_n = 1,

where a_n stands for the mean number of offspring produced by the ancestral individual at time n. Since a_n = m d_n, we conclude that, provided the Malthusian parameter exists, we have

m f(e^{−α}) = 1,  so that  α = −ln R = ln ρ.
β := m ∑_{n=1}^{∞} n d_n e^{−αn} = mRf′(R),  R ∈ (0, ∞),  x ∈ E_R,  f′(R) < ∞.
These propositions are basic asymptotic results for the general LF-processes extending similar statements for the countably infinite E in [15, Section 6].

Proposition 16. Assume (6.1) and let ρ < 1. Then, as n → ∞,

(6.2)  P_x(Z_n(E) > 0) ∼ [(1 − mf(1))/(1 + m)] ρ^n u(x).

Furthermore, conditionally on the ancestral type x and the survival event {Z_n(E) > 0}, the distribution of Z_n converges to lf(π̃, m̃), where

m̃ = m(1 + f(1)) / (1 − mf(1)),  π̃(dy) = [1/(1 + f(1))] ∫ K^{(1)}(x, dy) π(dx).
m(1 + f (1))(A)
n u(x)
Kn (x, A)
(A)
,
1+m
yielding
Kn (x, A)
m
Kn (x, E)
1 mf (1)
Z
(R)
(x, A) K
(1)
(x, A) (dx).
π_n(A) = (m/m_n) ∑_{k=0}^{n−1} ∫ M_k(x, A) π(dx),

K_n(x, E) = M_n(x, E)/(1 + m_n) ∼ u(x)/(n(1 + m)).
Thus, (2.6) gives the stated asymptotics for the survival probability, and the weak convergence follows from the next corollary of (2.7):

E_x(e^{−n^{−1}∫w(y)Z_n(dy)} | Z_n(E) > 0) → 1/(1 + I_w),  where I_w = (1 + m)^{−1} ∫ w(y) ν(dy).
Proposition 18. Assume (6.1), let ρ > 1, and put c = (ρ − 1)/(1 + m). Then

P_x(Z_n(E) > 0) → c u(x).

For any measurable function w : E → (−∞, ∞) with ∫w(y)ν(dy) ∈ (0, ∞), and for any x ≥ 0, we have

P_x( ∫w(y)Z_n(dy) / (ρ^n ∫w(y)ν(dy)) > x | Z_n(E) > 0 ) → e^{−xc}.
∑_{n=1}^{∞} m_n s^{n−1} = m(1 + f(s)) / [(1 − mf(s))(1 − s)],

and the behaviour of m(1 + f(sR))/(1 − sR) entails the stated asymptotics.
PAPER III
preprint.
1. Introduction
The paper develops new methods of non-parametric estimation of the distribution of compound Poisson data. Such data naturally arise in the inference of a Lévy process, which is a stochastic process (W_t)_{t≥0} with W_0 = 0 and time homogeneous independent increments. Its characteristic function necessarily has the form E e^{iθW_t} = e^{tψ(θ)} with

(1.1)  ψ(θ) = iaθ − σ²θ²/2 + ∫(e^{iθx} − 1 − iθx 1I_{{|x|<ε}}) ν(dx),

where ε > 0 is a fixed positive number, a ∈ R is a drift parameter, σ² ∈ [0, ∞) is the variance of the Brownian motion component, and ν is the so-called Lévy measure satisfying

(1.2)  ν({0}) = 0,  ∫ min{1, x²} ν(dx) < ∞.
ψ(θ) = ∫(e^{iθx} − 1 − iθx 1I_{{|x|<ε}}) ν(dx),  a = ∫_{(−ε,ε)} x ν(dx).
In an even more restrictive case with a finite total mass ‖ν‖ := ν(R), the Lévy process becomes a compound Poisson process, with the times of jumps forming a Poisson process with intensity ‖ν‖ and the jump sizes being independent random variables with distribution ‖ν‖^{−1}ν(dx). Details can be found, for instance, in [22].
Suppose the Lévy process is observed at regularly spaced times, producing a random vector (W_0, W_h, W_{2h}, …, W_{nh}) for some time step h > 0. The consecutive increments X_i = W_{ih} − W_{(i−1)h} then form a vector (X_1, …, X_n) of independent random variables having a common infinitely divisible distribution with the characteristic function φ(θ) = e^{hψ(θ)}, and thus can be used to estimate the distributional triplet (a, σ, ν) of the process. Such an inference problem naturally arises in financial mathematics [8], queueing theory [1], insurance [17] and in many other situations where Lévy processes are used.
By the Lévy-Itô representation theorem [22], every Lévy process is a superposition of a Brownian motion with drift and a square integrable pure jump martingale. The latter can be further decomposed into a pure jump martingale with the jumps not exceeding in absolute value a positive constant ε, and a compound Poisson process with jumps of size ε or above. In practice, only a finite increment sample (X_1, …, X_n) is available, so there is no way to distinguish between the small jumps and the Brownian continuous part. Therefore one usually chooses a threshold level ε > 0 and attributes all the small jumps to the Brownian component, while the large jumps are attributed to the compound Poisson process component (see, e.g., [2] for an account of the subtleties involved). Provided an estimation of the continuous and the small jump part is done, it remains to estimate the part of the Lévy measure outside of the interval (−ε, ε). Since this corresponds to the compound Poisson case, estimation of such a measure is usually called decompounding, which is the main object of study in this paper.

Previously developed methods include the discrete decompounding approach based on the inversion of Panjer recursions, as proposed in [5]. [7], [10] and [12] studied the continuous decompounding problem when the measure ν is assumed to have a density. They apply Fourier inversion in combination with
F_n(x) = (1/n) ∑_{k=1}^{n} 1I_{{X_k ≤ x}},  φ_n(θ) = (1/n) ∑_{k=1}^{n} e^{iθX_k}.
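The two empirical objects above are straightforward to compute; a minimal sketch on a toy sample:

```python
import cmath
# The empirical distribution function and the empirical characteristic
# function of a sample, illustrated on toy data.
X = [0.0, 1.0, 1.0, 2.0, 4.0]

def F_n(x):
    return sum(xk <= x for xk in X) / len(X)

def phi_n(theta):
    return sum(cmath.exp(1j * theta * xk) for xk in X) / len(X)

assert F_n(1.0) == 0.6                    # three of five points are <= 1
assert abs(phi_n(0.0) - 1.0) < 1e-15      # phi_n(0) = 1 always
```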
To give an early impression of our approach, let us demonstrate the performance of the ChF method on the famous data by Ladislaus Bortkiewicz, who collected the numbers of Prussian soldiers killed by a horse kick in 10 cavalry corps over a 20 year period [4]. The counts 0, 1, 2, 3, and 4 were observed 109, 65, 22, 3 and 1 times, with 0.6100 deaths per year and cavalry unit. The author argues that the data are Poisson distributed, which corresponds to the measure ν concentrated at the point {1} (only jumps of size 1), the mass ‖ν‖ being the parameter of the Poisson distribution, which is then estimated by the sample mean 0.61. The left panel of Figure 2 presents the estimated Lévy measures for the cutoff values k = 1, 2, 3 when using the CoF method. For the values k = 1, 2, the result is a measure having many atoms. This is explained by the fact that the accuracy of the convolution approximation is not sufficient for this data, but k = 3 already results in a measure ν̂_3 essentially concentrated at {1}, thus supporting the Poisson model with parameter ‖ν̂_3‖ = 0.6098. In Section 6 we return to this example and explain why the choice of k = 3 is reasonable. We observed that the convergence of the ChF method depends critically on the choice of the initial measure, especially on its total mass. However, the proposed combination of CoF followed by ChF demonstrates (the right plot) that this two-step (faster) procedure results in an estimate ν̃_1 which is as good as ν̂_3.
The rest of the paper has the following structure. Section 2 introduces the theoretical basis of our approach, a constrained optimisation technique in the space of measures. In Section 3 we perform analytic calculations of the gradient of the loss functionals needed for the implementation of ChF. Section 4 develops the necessary ingredients for the CoF method and proves the main
[Figure 2. The Bortkiewicz horse kick data. Left panel: comparison of CoF estimates for k = 1, 2, 3. Right panel: comparison of the estimate by CoF with k = 3 and a combination of CoF with k = 1 followed by ChF. Legend values: ‖ν̂_1‖ = 0.4589, ‖ν̂_2‖ = 0.5881, ‖ν̂_3‖ = 0.6099, ‖ν̃_1‖ = 0.6109.]
2. Optimisation in the space of measures
In this section we briefly present the main ingredients of the constrained
optimisation of functionals of a measure. Further details can be found in [18]
and [19].
In this paper we are dealing with measures defined on the Borel subsets of R. Recall that any signed measure μ can be represented in terms of its Jordan decomposition: μ = μ⁺ − μ⁻, where μ⁺ and μ⁻ are orthogonal non-negative measures. The total variation norm is then defined to be ‖μ‖ = μ⁺(R) + μ⁻(R). Denote by M and M⁺ the class of signed, respectively, non-negative measures with a finite total variation. The set M then becomes a Banach space with sum and multiplication by real numbers defined set-wise: (μ_1 + μ_2)(B) := μ_1(B) + μ_2(B) and (tμ)(B) := tμ(B) for any Borel set B and any real t. The set M⁺ is a pointed cone in M, meaning that the zero measure is in M⁺ and that μ_1 + μ_2 ∈ M⁺ and tμ ∈ M⁺ as long as μ_1, μ_2, μ ∈ M⁺ and t ≥ 0.
(2.3)  L(μ) → min  subject to  μ ∈ M⁺,  H(μ) ∈ C,

where the last constraint singles out the set of Lévy measures, i.e. the measures satisfying (1.2). This corresponds to taking C = {0} × R, a cone in R², and

(2.4)  H(μ) = (μ({0}), ∫ min{1, x²} μ(dx)).

Theorem 19. Suppose L : M → R is strongly differentiable at a positive finite measure μ satisfying (1.2) and possesses a gradient function ∇L(x; μ). If such a μ provides a local minimum of L over M⁺ ∩ H^{−1}(C), then

∇L(x; μ) ≥ 0 for all x,  and  ∇L(x; μ) = 0 μ-a.e.
Indeed, assume that there exists ν ∈ T_A(μ) such that DG(μ)[ν] := δ < 0. Then there is a sequence of positive numbers t_n ↓ 0 and a sequence μ_n ∈ A such that ν = lim_n t_n^{−1}(μ_n − μ), implying μ_n → μ because ‖μ − μ_n‖ = t_n(1 + o(1))‖ν‖ → 0. Since any bounded linear operator is continuous, we also have

DG(μ)[ν] = DG(μ)[lim_n t_n^{−1}(μ_n − μ)] = lim_n t_n^{−1} DG(μ)[μ_n − μ] = δ.

Furthermore, by (2.1), G(μ_n) < G(μ) for all sufficiently small t_n. Thus in any ball around μ there is a μ_n ∈ A such that G(μ_n) < G(μ), so that μ is not a point of a local minimum of G over A.

The next step is to find a sufficiently rich class of measures belonging to the tangent cone to the set L := M⁺ ∩ H^{−1}(C) of all possible Lévy measures. For this, notice that for any μ ∈ L, the Dirac measure δ_x belongs to T_L(μ), since μ + tδ_x ∈ L for any t ≥ 0 as soon as x ≠ 0. Similarly, given any Borel B ⊆ R, the negative measure −μ|_B, where μ|_B = μ(· ∩ B) is the restriction of μ onto B, is also in the tangent cone T_L(μ), because for any 0 ≤ t ≤ 1 we have μ − tμ|_B ∈ L.

Since ∇G(x; μ) is a gradient function, the necessary condition (2.6) becomes

∫ ∇G(x; μ) ν(dx) ≥ 0  for all ν ∈ T_L(μ),

and substituting ν = δ_x above we immediately obtain the inequality in (2.5). Finally, taking ν = −μ|_B yields

∫_B ∇G(x; μ) μ(dx) ≤ 0.

Since this is true for any Borel B, then ∇G(x; μ) ≤ 0 μ-almost everywhere which, combined with the previous inequality, gives the second identity in (2.5).
A rich class of functions of a measure is represented by expectations of functionals of a Poisson process:

(2.7)  E_{μ+ν} G(Π) = E_μ G(Π) + ∑_{n=1}^{∞} (1/n!) ∫_{X^n} E_μ T_{z_1,…,z_n} G(Π) ν(dz_1) … ν(dz_n).

Generalisations of this formula to infinite and signed measures for square integrable functionals can be found in [16]. A finite order expansion formula can be obtained by representing the expectation above in the form E_{μ+ν}G = E_ν E_μ[G(Π + Π′)], where Π and Π′ are independent Poisson processes with intensity measures μ and ν, respectively, and then applying the moment expansion formula of [3, Theorem 3.1] to G(Π + Π′) viewed as a functional of Π′.
φ_1(θ; a, σ, μ) = e^{h{Q_1(θ,μ) − σ²θ²/2}} cos{hQ_2(θ, a, μ)},
φ_2(θ; a, σ, μ) = e^{h{Q_1(θ,μ) − σ²θ²/2}} sin{hQ_2(θ, a, μ)},

with the empirical counterparts

φ_{n,1}(θ) = (1/n) ∑_{j=1}^{n} cos(θX_j),  φ_{n,2}(θ) = (1/n) ∑_{j=1}^{n} sin(θX_j).

(2) The partial derivative of the loss functional with respect to σ is computed similarly.

(3) The expression for the gradient function corresponding to the Fréchet derivative with respect to the measure μ is obtained using the Chain rule (2.2):

∇L_{ChF}(x; μ) = 2 ∫ {φ_1(θ; a, σ, μ) − φ_{n,1}(θ)} ∇φ_1(θ)[x, μ] w(θ) dθ
             + 2 ∫ {φ_2(θ; a, σ, μ) − φ_{n,2}(θ)} ∇φ_2(θ)[x, μ] w(θ) dθ,

where the gradients of φ_i(θ) := φ_i(θ; a, σ, μ), i = 1, 2, with respect to the measure μ are given by

∇φ_1(θ)(x; μ) = h e^{h{Q_1(θ,μ) − σ²θ²/2}} {cos(hQ_2(θ, a, μ)) q_1(θ, x) − sin(hQ_2(θ, a, μ)) q_2(θ, x)},
∇φ_2(θ)(x; μ) = h e^{h{Q_1(θ,μ) − σ²θ²/2}} {sin(hQ_2(θ, a, μ)) q_1(θ, x) + cos(hQ_2(θ, a, μ)) q_2(θ, x)}.
U_{x_1,…,x_n} F(y) = ∑_{J⊆{1,2,…,n}} (−1)^{n−|J|} F(y − ∑_{j∈J} x_j).

The sum above is taken over all the subsets J of {1, 2, …, n}, including the empty set. In particular,

G_y(Π + δ_{(t,x)}) = 1I{∑_j x_j ≤ y − x} = G_{y−x}(Π),

so that

= F(y) + ∑_{n=1}^{∞} (1/n!) ∫_{(R₊×R)^n} U_{x_1,…,x_n} F(y) μ(dt_1 dx_1) … μ(dt_n dx_n)
= F(y) + ∑_{n=1}^{∞} (h^n/n!) ∫_{R^n} U_{x_1,…,x_n} F(y) ν(dx_1) … ν(dx_n).

The empirical convolution of a sample (X_1, …, X_n) is

(4.4)  F_n^{*2}(y) := C(n, 2)^{−1} ∑_{1≤i<j≤n} 1I{X_i + X_j ≤ y}.
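The empirical convolution (4.4) is a U-statistic over all unordered pairs of sample points; a minimal sketch on toy data:

```python
from itertools import combinations
# The empirical convolution (4.4): a U-statistic estimate of the distribution
# function of the sum of two independent increments, on a toy sample.
X = [0.0, 1.0, 2.0, 4.0]

def F_n2(y):
    pairs = list(combinations(X, 2))
    return sum(xi + xj <= y for xi, xj in pairs) / len(pairs)

# pair sums: 1, 2, 4, 3, 5, 6 -> four of six are <= 4
assert F_n2(4.0) == 4 / 6
```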
[h^{k+1}/(k + 1)!] |∫_{R^{k+1}} U_{x_1,…,x_{k+1}} F(y) ν(dx_1) … ν(dx_{k+1})| ≤ (1/2) ∑_{i=k+1}^{∞} e^{−2h‖ν‖} (2h‖ν‖)^i / i!.
Notice that the upper bound corresponds to a half of the distribution tail P{Z ≥ k + 1} of a Poisson random variable, say Z ∼ Po(2h‖ν‖). Thus, to have a good estimate with this method, one should either calibrate the time step h (if the data come from the discretisation of a Lévy process trajectory) or use a higher k to make the remainder term small enough, as in the horse-kick example.
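The size of this Poisson-tail bound is easy to tabulate. For the horse-kick data, taking h = 1 and ‖ν‖ ≈ 0.61 gives λ = 2h‖ν‖ ≈ 1.22; the sketch below shows the bound shrinking in k, which is consistent with the choice k = 3 discussed in the Introduction.

```python
import math
# Half Poisson tail 0.5*P(Z >= k+1), Z ~ Po(lam), bounding the truncation
# error of the k-term convolution expansion; lam = 2*h*||nu|| ~ 1.22
# corresponds to the horse-kick data with h = 1 and ||nu|| ~ 0.61.
def half_tail(lam, k):
    head = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return 0.5 * (1.0 - head)

lam = 1.22
bounds = [half_tail(lam, k) for k in (1, 2, 3)]
assert bounds[0] > bounds[1] > bounds[2]
assert bounds[2] < 0.02       # for k = 3 the bound is below 2 percent
```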
∇L_{CoF}(x; μ) = 2h ∫ ( F_n(y) + ∑_{i=1}^{k} (h^i/i!) ∫_{R^i} U_{x_1,…,x_i} F_n(y) μ(dx_1) … μ(dx_i) − F_n^{*2}(y) ) Φ(y, x, μ) w(y) dy,

where

Φ(y, x, μ) = ∑_{j=0}^{k−1} (h^j/j!) ∫_{R^j} U_{x,x_1,…,x_j} F_n(y) μ(dx_1) … μ(dx_j).

This formula follows from the Chain rule (2.2) and the equality

∇( ∑_{j=1}^{k} (h^j/j!) ∫_{R^j} (U_{x_1,…,x_j} F_n)(y) μ(dx_1) … μ(dx_j) )(x; μ)
   = h ∑_{j=0}^{k−1} (h^j/j!) ∫_{R^j} U_{x,x_1,…,x_j} F_n(y) μ(dx_1) … μ(dx_j).

To justify the last identity, it suffices to see that for any integrable symmetric function u(x_1, …, x_j) of j ≥ 1 variables,

∇( ∫_{R^j} u(x_1, …, x_j) μ(dx_1) … μ(dx_j) )(x; μ) = j ∫_{R^{j−1}} u(x, x_1, …, x_{j−1}) μ(dx_1) … μ(dx_{j−1}),
μ_i := μ([x_i, x_{i+1}))  for i = 2, …, l − 1,  μ_l := μ([x_l, ∞)).
Clearly, the larger l is and the finer the grid {x_1, …, x_l}, the better the approximation, however at a higher computational cost. Respectively, the discretised version of the gradient function ∇L(x; μ) is the vector

g = (g_1, …, g_l),  g_i := ∇L(x_i; μ),  i = 1, …, l.
For example, the cost function L = L^{(1)}_{CoF} with w(y) ≡ 1 has the gradient

∇L^{(1)}_{CoF}(x; μ) = 2h ∫ ( F_n(y) − F_n^{*2}(y) + h ∫ (F_n(y − z) − F_n(y)) μ(dz) ) (F_n(y − x) − F_n(y)) dy.

The discretised gradient for this example is the vector g with the components

(5.2)  g_i = 2h ∫ ( F_n(y) − F_n^{*2}(y) + h ∑_{j=1}^{l} (F_n(y − x_j) − F_n(y)) μ_j ) (F_n(y − x_i) − F_n(y)) dy,  i = 1, …, l.
Our main optimisation algorithm has the following structure. In the master algorithm description above, line 3 uses the necessary condition (2.5) as a test condition for the main cycle. In the computer realisations we usually
The MakeStep subroutine looks for a vector ε which minimises the linear form ∑_{i=1}^{l} g_i ε_i appearing in the Taylor expansion

L(μ + ε) − L(μ) = ∑_{i=1}^{l} g_i ε_i + o(|ε|),

subject to

|ε_i| ≤ δ,  ε_i ≥ −μ_i,  i = 1, …, l.

If i* ≤ i_g, then

ε_i := −μ_i for i ≤ i*,  ε_i := ∑_{j=1}^{i*} μ_j for i = i* + 1,  ε_i := 0 for i ≥ i* + 2,

and if i* > i_g, then

ε_i := −μ_i for i ≤ i_g,  ε_i := 0 for i_g < i < l,  ε_i := ∑_{j=1}^{i_g} μ_j for i = l.
The presented algorithm is realised in the statistical computation environment R (see [21]) in the form of a library mesop which is freely downloadable from one of the authors' webpages.1
1http://www.math.chalmers.se/~sergei/download.html
[Figure: simulation results. Legend values: ‖ν‖ = 1, ‖ν̂_1‖ = 0.6153, ‖ν̂_2‖ = 0.8605, ‖ν̂_3‖ = 0.927, ‖ν̃_1‖ = 0.9514.]
than with k = 1 and k = 3: two jumps of sizes +1 and −1 sometimes cancel each other, which is indistinguishable from the no-jumps case. Moreover, −1 and 2 added together is the same as having a single size-1 jump. The left panel confirms that going from k = 1 through k = 2 up to k = 3 improves the performance of CoF, although the computing time increases drastically. The corresponding total variation distances of ν̂_k to the theoretical distribution are 0.3669, 0.6268 and 0.1558. The combined method gives the distance 0.0975 and, according to the right plot, is again a clear winner in this case too. It is also much faster.

Unbounded compounding distribution. In Figure 5 we present the simulation results for a discrete measure ν having an infinite support N. For the computation, we limit the support range for the measures in question to the interval x ∈ [−2, 5]. As the left panel reveals, also in this case the CoF method with k = 3 gives a better approximation than k = 1 or k = 2 (the total variation distances to the theoretical distribution are 0.1150 compared to 0.3256 and 0.9235, respectively), and the combined faster method gives an even better estimate with d_{TV}(ν̃_1, ν) = 0.0386. Interestingly, k = 2 was the worst in terms of the total
[Figures: further simulation panels. Legend values per panel: (a) ‖ν‖ = 1, ‖ν̂_1‖ = 0.6122, ‖ν̂_2‖ = 0.9019, ‖ν̂_3‖ = 0.9041, ‖ν̃_1‖ = 1.038; (b) ‖ν̂_1‖ = 0.6406, ‖ν̂_2‖ = 1.2047, ‖ν̂_3‖ = 0.9539, ‖ν̃_1‖ = 1.0279; (c) ‖ν̂_1‖ = 0.6192, ‖ν̂_2‖ = 1.0362, ‖ν̂_3‖ = 0.9018, ‖ν̃_1‖ = 0.9047; (d) ‖ν̂_1‖ = 0.5847, ‖ν̂_2‖ = 0.8805, ‖ν̂_3‖ = 0.8439, ‖ν̃_1‖ = 0.8751.]
PAPER IV
1. Introduction

Let (N, M, L) be three natural numbers larger than or equal to 2. Assume that we have a random matrix

(1.1)  A = (a_{ij}),  1 ≤ i ≤ L,  1 ≤ j ≤ N,

with independent entries uniformly distributed over the set {0, 1, …, M − 1} of possible values for a_{ij}. For a vector of row indices i = (i_1, …, i_N) ∈ J put a_i = (a_{i_1,1}, …, a_{i_N,N}), and let V_b denote the number of such vectors with

a_{i_1,1} + … + a_{i_N,N} = b  (mod M).

Then ∑_{b=0}^{M−1} V_b = L^N, so that

λ := E(V_0) = L^N M^{−1}.

Wagner's algorithm has a binary tree structure, see Figure 1, starting from N leaves at level n and moving toward the top of the tree at level 0. For a given vector x = (x_1, …, x_{2^n}) with x_j ∈ D_m the algorithm searches for the value
(1.4)  x^{(n)},

obtained recursively in a way explained next (the special state ∂ indicates that the algorithm is terminated and a solution is not found). Put x_j^{(0)} ≡ x_j.

[Figure 1. The binary tree of Wagner's algorithm: the leaves x_1, x_2, …, x_{2^n} at the bottom level are merged pairwise into x_1^{(1)}, …, x_{2^{n−1}}^{(1)}, and so on up to the root x_1^{(n)}.]

At each level h the values are obtained through the pairwise sums

x_j^{(h)} = x_{2j−1}^{(h−1)} + x_{2j}^{(h−1)}  (mod M),

with the root required to satisfy x_1^{(n)} = b, and (1.5) introduces the key ratio R_{n,m}, i ∈ J,
where λ, defined by (1.2), is the mean total number of solutions. Clearly, R_{n,m} is the conditional probability of a given zero-sum random vector to be Wagner's solution.
There is a growing number of papers studying the properties of various tree based algorithms, with some of them, in particular [5], suggesting further developments of Wagner's approach. The main results of this paper are stated in the next section. Theorem 2.1 gives an integral recursion for calculating the limit of the key ratio (1.5). Theorem 2.2 gives an upper bound for the total variation distance between the Poisson distribution Po(λ) and L(V_0), the distribution of V_0, as well as a bound for the total variation distance between a Poisson distribution and L(W). Recall that the total variation distance between the distributions of Z₊-valued random variables X and Y, where Z₊ = {0, 1, 2, …}, is given by

d_{TV}(L(X), L(Y)) = sup_{A⊆Z₊} |P(X ∈ A) − P(Y ∈ A)|.

Among related results concerning the speed of convergence for functionals of random matrices over finite algebraic structures we can only name a recent paper [2].
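For Z₊-valued distributions the supremum in the definition of d_TV equals half the ℓ¹-distance between the probability mass functions, which makes it easy to compute numerically. A sketch comparing two nearby Poisson laws (the parameter values are arbitrary):

```python
import math
# Total variation distance between Z_+-valued distributions via the identity
# d_TV = (1/2) * sum_j |P(X = j) - P(Y = j)|; illustrated on Po(1) vs Po(1.1).
def pois_pmf(lam, j):
    return math.exp(-lam) * lam**j / math.factorial(j)

def d_tv(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

N = 60   # truncation point; the tails beyond N are negligible here
p = [pois_pmf(1.0, j) for j in range(N)]
q = [pois_pmf(1.1, j) for j in range(N)]
d = d_tv(p, q)
assert 0.0 < d < 0.1
```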
2. Main results

Define a sequence of polynomials {π_n(x)}_{n≥1} by

(2.1)  π_n(x) := ∫_0^x π_{n−1}(u) π_{n−1}(x − u) du + 2 ∫_x^{2^{−n}} π_{n−1}(u) π_{n−1}(u − x) du,

with π_1(x) ≡ 1.
[Table: numerical values for n = 2, 3, 4: 0.50000, 0.04818, 0.00023.]
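Assuming the recursion (2.1) with integration limits x and 2^(1−n) as given above, the polynomials can be tabulated numerically on the grid j·2^(1−m), with Riemann sums playing the role of the integrals. All names below are my own, and the grid layout follows that reading of (2.1).

```python
import numpy as np

def phi_tables(n_max, m=12):
    """Tabulate the polynomials Phi_n of (2.1) at the grid points
    j * 2**(1 - m): phi[n][j] ~ Phi_n(j * 2**(1 - m)) for
    0 <= j <= 2**(m - n - 1).  Both integrals are replaced by Riemann
    sums with step 2**(1 - m), which is exactly the discrete relation
    (4.3) without its remainder term.  (Numerical sketch under the
    assumptions stated in the lead-in.)"""
    h = 2.0 ** (1 - m)                        # grid step
    phi = {1: np.ones(2 ** (m - 2) + 1)}      # Phi_1 = 1 on [0, 1/2]
    for n in range(2, n_max + 1):
        prev = phi[n - 1]                     # known up to index 2**(m-n)
        K = 2 ** (m - n)                      # grid index of 2**(1 - n)
        top = 2 ** (m - n - 1)                # grid index of 2**(-n)
        cur = np.empty(top + 1)
        for j in range(top + 1):
            conv = np.dot(prev[: j + 1], prev[j::-1])         # k <= j part
            tail = np.dot(prev[j + 1 : K + 1], prev[1 : K - j + 1])
            cur[j] = h * (conv + 2.0 * tail)
        phi[n] = cur
    return phi

phi = phi_tables(3)
print(phi[2][0])   # should be close to Phi_2(0) = 1
```

Under this reading Φ_2(x) = 1 − x on its domain, which the grid values reproduce up to an O(2^{−m}) discretization error.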
Theorem 2.2. For a random matrix (1.1) consider the number V_0 of vectors
i ∈ J with a_{i_1,1} + . . . + a_{i_N,N} = 0 (mod M).

The quantities v_i(j) = v_i^{(m)}(j) satisfy the recursion

v_i(j) = Σ_{k=0}^{j} v_{i+1}(k) v_{i+1}(j−k) + 2 Σ_{k=j+1}^{2^i} v_{i+1}(k) v_{i+1}(k−j),

and in particular

v_i(0) = v_{i+1}^2(0) + 2 Σ_{k=1}^{2^i} v_{i+1}^2(k).
By the forthcoming Corollary 2.1, we can write p_{n,m} = v_{m−n}^{(m)}(0), so that

(3.2)  R_{n,m} = M v_{m−n}^{(m)}(0),  n = 1, . . . , m − 1,

and for 2 ≤ i ≤ m − 1 and 0 ≤ j ≤ 2^{m−i−1} we have p_{i,m}(−j) = p_{i,m}(j), with p_{i,m}(j)
satisfying the recursion

p_{i,m}(j) = Σ_{k=0}^{j} p_{i−1,m}(k) p_{i−1,m}(j−k) + 2 Σ_{k=j+1}^{2^{m−i}} p_{i−1,m}(k) p_{i−1,m}(k−j).
Proof. There are exactly M = 2^m + 1 different ordered pairs of numbers from the
set D_m that add modulo M up to a given j ∈ D_{m−1}. These pairs have the form,
for j = 0,

(−2^{m−1} + k, 2^{m−1} − k),  k = 0, . . . , 2^m,

and similarly for the other j ∈ D_{m−1}. Since these pairs appear with equal probability M^{−2}, the first claim follows.

On the other hand, for a given j ∈ D_{m−i} with i ≥ 2, there are only 2^{m−i+1} + 1 − |j|
different ordered pairs of numbers from the set D_{m−i+1} that add modulo M up
to j. These pairs have the form

(−2^{m−i} + k, 2^{m−i} + j − k),  k = j, . . . , 2^{m−i+1},  for j = 0, . . . , 2^{m−i−1},

and

(−2^{m−i} + j + k, 2^{m−i} − k),  k = |j|, . . . , 2^{m−i+1},  for j = −2^{m−i−1}, . . . , −1.

In particular, for j ≥ 0,

p_{i,m}(j) = Σ_{k=j}^{2^{m−i+1}} p_{i−1,m}(−2^{m−i} + k) p_{i−1,m}(2^{m−i} + j − k).
The stated symmetry property p_{i,m}(−j) = p_{i,m}(j) now follows recursively from
the assumption of uniform distribution. To finish the proof of the lemma, it
remains to observe that after replacing k − 2^{m−i} by l in the last relation for
p_{i,m}(j) we get

p_{i,m}(j) = Σ_{l=j−2^{m−i}}^{2^{m−i}} p_{i−1,m}(l) p_{i−1,m}(j−l)
  = Σ_{k=0}^{j} p_{i−1,m}(k) p_{i−1,m}(j−k) + Σ_{l=j+1}^{2^{m−i}} p_{i−1,m}(l) p_{i−1,m}(l−j) + Σ_{l=j−2^{m−i}}^{−1} p_{i−1,m}(l) p_{i−1,m}(j−l)
  = Σ_{k=0}^{j} p_{i−1,m}(k) p_{i−1,m}(j−k) + 2 Σ_{k=j+1}^{2^{m−i}} p_{i−1,m}(k) p_{i−1,m}(k−j),

where the symmetry property was used to rewrite the last two sums. This completes the proof.
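The recursion in the lemma can be checked by brute force for small parameters. The window and centering conventions below are my reading of the setup, and the helper names are hypothetical.

```python
from itertools import product

def center(v, M):
    """Centered residue of v modulo M, lying in {-(M-1)/2, ..., (M-1)/2}."""
    v %= M
    return v - M if v > M // 2 else v

def brute_p2(j, m):
    """Brute-force P(a 4-leaf subtree survives two merge rounds with root
    value j): leaves uniform on D_m, and each merged value must stay in
    the next window.  Used to check the recursion for p_{i,m} above."""
    M = 2 ** m + 1
    D = range(-2 ** (m - 1), 2 ** (m - 1) + 1)
    hits = 0
    for x1, x2, x3, x4 in product(D, repeat=4):
        s1, s2 = center(x1 + x2, M), center(x3 + x4, M)
        if max(abs(s1), abs(s2)) > 2 ** (m - 2):   # must stay in D_(m-1)
            continue
        s = center(s1 + s2, M)
        if abs(s) <= 2 ** (m - 3) and s == j:      # root value in D_(m-2)
            hits += 1
    return hits / M ** 4

# recursion with p_{1,m}(k) = 1/M gives p_{2,m}(j) = (2^(m-1) + 1 - j)/M^2:
m, M = 4, 17
for j in range(2 ** (m - 3) + 1):
    rec = (j + 1 + 2 * (2 ** (m - 2) - j)) / M ** 2
    print(j, brute_p2(j, m), rec)
```

For m = 4 the enumeration over all 17^4 leaf quadruples reproduces the recursion values exactly.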
We shall prove that

(4.1)  Δ_{n,m} := max_{0 ≤ j ≤ 2^{m−n−1}} |R_{n,m}(j) − Φ_{n,m}(j)| → 0, as m → ∞,

where Φ_{n,m}(j) := Φ_n(2^{1−m} j).
To this end we use induction over n. The base case n = 1 is trivial. To prove the
inductive step observe first that by (3.1)
(4.2)  R_{n,m}(j) = 2^{1−m} ( Σ_{k=0}^{j} R_{n−1,m}(k) R_{n−1,m}(j−k) + 2 Σ_{k=j+1}^{2^{m−n}} R_{n−1,m}(k) R_{n−1,m}(k−j) ),

and that the quantity

C_n := max_{m>n} max_{0 ≤ j ≤ 2^{m−n−1}} R_{n,m}(j)

is finite.
On the other hand, by (2.1),

Φ_{n,m}(j) = 2^{1−m} ( ∫_0^j Φ_{n−1,m}(u) Φ_{n−1,m}(j−u) du + 2 ∫_j^{2^{m−n}} Φ_{n−1,m}(u) Φ_{n−1,m}(u−j) du ),

so that

(4.3)  Φ_{n,m}(j) = 2^{1−m} ( Σ_{k=0}^{j} Φ_{n−1,m}(k) Φ_{n−1,m}(j−k) + 2 Σ_{k=j+1}^{2^{m−n}} Φ_{n−1,m}(k) Φ_{n−1,m}(k−j) ) + ε_{n,m}(j),
with accordingly defined remainder term ε_{n,m}(j). Uniform continuity of Φ_{n−1}(x)
yields uniform convergence ε_{n,m}(j) → 0 as m → ∞, and (4.1) follows from (4.2)
and (4.3), since

Δ_{n,m} ≤ 2 [ C_{n−1} + max_{0 ≤ x ≤ 2^{1−n}} Φ_{n−1}(x) ] Δ_{n−1,m} + max_{0 ≤ j ≤ 2^{m−n}} |ε_{n,m}(j)|.
Σ_{k∈Z_+} |P(Z = k) − e^{−λ} λ^k / k!| ≤ 4 (1 − e^{−λ}) λ^{−1} ( Σ_{i∈J} Σ_{k∈B_i} E(ξ_i) E(ξ_k) + Σ_{i∈J} Σ_{k∈B_i\{i}} E(ξ_i ξ_k) ).

In the proof of the first statement of Theorem 2.2 we take Z = V_0 = Σ_{i∈J} ξ_i,
where the indicators ξ_i, i = (i_1, . . . , i_N) ∈ J,
are identically distributed with E(ξ_i) = M^{−1}, and mutually independent. Independence is due to the defining property of the matrix A. Indeed, if k ≠ i and
(without loss of generality) 1, . . . , j are the coordinates where these two vectors
differ, then
E(ξ_i ξ_k) = Σ_{b∈D_m} P(a_{i_1,1} + . . . + a_{i_j,j} = b) P(a_{i_{j+1},j+1} + . . . + a_{i_N,N} = −b; ξ_k = 1)
  = M^{−1} Σ_{b∈D_m} P(a_{i_{j+1},j+1} + . . . + a_{i_N,N} = −b; ξ_k = 1) = M^{−2} = E(ξ_i) E(ξ_k).
Therefore, we can apply Lemma 5.1 with B_i = {i}, and the first statement of
Theorem 2.2, concerning L(V_0), follows from E(V_0) = λ and

Σ_{i∈J} Σ_{k∈B_i} E(ξ_i) E(ξ_k) = L^N M^{−2} = λ M^{−1}.
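The first statement can be sanity-checked by simulation for small parameters: the empirical mean of V_0 matches λ = L^N/M. Parameter choices and function names below are illustrative, not from the paper.

```python
import random

def sample_V0(L, N, M):
    """Draw a random L x N matrix with entries uniform on {0, ..., M-1}
    and count the vectors i in {0, ..., L-1}^N whose selected entries sum
    to 0 modulo M.  By (1.2), E(V_0) = L^N / M.  (Small-parameter
    illustration only; brute-force enumeration of all L^N vectors.)"""
    A = [[random.randrange(M) for _ in range(N)] for _ in range(L)]
    count = 0
    for code in range(L ** N):
        s, rest = 0, code
        for j in range(N):          # decode i from its base-L digits
            s += A[rest % L][j]
            rest //= L
        count += s % M == 0
    return count

random.seed(1)
L, N, M, reps = 3, 3, 25, 2000
lam = L ** N / M
mean = sum(sample_V0(L, N, M) for _ in range(reps)) / reps
print(lam, mean)    # empirical mean should be close to lambda = 1.08
```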
For the second statement of Theorem 2.2, concerning L(W), we write W = Σ_{i∈J} η_i,
where η_i is the indicator of the event that i is a Wagner solution, and take
B_i = {k ∈ J : k_l = i_l for some l}, so that E(η_i) = p_{n,m} and therefore

Σ_{i∈J} Σ_{k∈B_i} E(η_i) E(η_k) = L^N (L^N − (L−1)^N) p_{n,m}^2 ≤ N L^{−1} (L^N p_{n,m})^2.
Let l_1, . . . , l_j be the coordinates where the vectors i and k differ. Then it follows that

E(η_i η_k) ≤ Σ_{b∈D_m} P(a_{k_{l_1},l_1} + . . . + a_{k_{l_j},l_j} = b; a_{i_{l_1},l_1} + . . . + a_{i_{l_j},l_j} = b; H_n(a_i) = 0)
  = M^{−1} Σ_{b∈D_m} P(a_{i_{l_1},l_1} + . . . + a_{i_{l_j},l_j} = b; H_n(a_i) = 0) = M^{−1} p_{n,m},

and we get

Σ_{i∈J} Σ_{k∈B_i\{i}} E(η_i η_k) ≤ L^N (L^N − (L−1)^N) p_{n,m} M^{−1} ≤ N L^{−1} λ L^N p_{n,m}.
Acknowledgements. The first author is grateful to Vladimir Vatutin and Andrey Zubkov for formulating an initial problem setting that eventually led to
this research project.
Bibliography
[1] Arratia, R., Goldstein, L. and Gordon, L. (1989) Two moments suffice for Poisson approximations: The Chen-Stein method. Ann. Probab., 17(1), 9–25.
[2] Fulman, J. and Goldstein, L. (2015) Stein's method and the rank distribution of random matrices over finite fields. Ann. Probab., 43(3), 1274–1314.
[3] Greferath, M. (2009) An introduction to ring-linear coding theory. In M. Sala, S. Sakata, T. Mora, C. Traverso, and L. Perret, editors, Gröbner Bases, Coding, and Cryptography, Springer, Berlin Heidelberg, 219–238.
[4] Lang, S. (2002) Algebra. Graduate Texts in Mathematics. Springer New York.
[5] Minder, L. and Sinclair, A. (2012) The extended k-tree algorithm. J. Cryptology, 25(2), 349–382.
[6] Schroeppel, R. and Shamir, A. (1981) A T = O(2^{n/2}), S = O(2^{n/4}) algorithm for certain NP-complete problems. SIAM J. Comput., 10(3), 456–464.
[7] Wagner, D. (2002) A generalized birthday problem. In CRYPTO, Springer-Verlag, 288–303.