
Data:

We use the burglary data (FBI code 05) for the year 2014. There are 14306 events, each with a time $t_i$ and a location $(x_i, y_i)$.
Model:

$$\lambda(x, y, t) = \mu(x, y) + \sum_{i:\, t_i < t} r(x - x_i, y - y_i)\,\tau(t - t_i)$$

with the stationary background rate density

$$\mu(x, y) = \frac{1}{2\pi L^2 T} \sum_i a_i \, e^{-\left[(x - x_i)^2 + (y - y_i)^2\right]/2L^2},$$

where $T$ is the total duration of the dataset (here 365 days). The two kernels $\tau$ and $r$, as well as the background weights $a_i$, are to be inverted. The smoothing length $L$ is also to be optimized. We here follow the approach of Marsan and Lengliné (2008) and use a simple histogram distribution for the two kernels: $\tau(t) = b_k$ for $T_k \le t < T_{k+1}$, and $r(\rho) = c_k$ for $R_k \le \rho < R_{k+1}$. We use the following discretization in time and distance:
$T = \{0;\ 0.1;\ 0.2;\ 0.5;\ 1;\ 2;\ 3;\ 4;\ 5;\ 7;\ 10;\ 15;\ 20;\ 30;\ 50;\ 100\}$ days, and
$R = \{0;\ 0.1;\ 0.2;\ 0.3;\ 0.4;\ 0.5;\ 0.7;\ 1;\ 1.5;\ 2;\ 3;\ 5;\ 10;\ 20\}$ km.
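For concreteness, here is a minimal Python sketch of how this intensity could be evaluated; the array names (times, xs, ys, a, b, c, T_bins, R_bins) are hypothetical and not part of the model description above:

```python
import numpy as np

def intensity(x, y, t, times, xs, ys, a, b, c, T_bins, R_bins, L, T_total=365.0):
    """Evaluate lambda(x, y, t) = Gaussian background KDE + histogram triggering sum."""
    # Stationary background: Gaussian kernels centered on all events, weighted by a_i.
    d2 = (x - xs) ** 2 + (y - ys) ** 2
    mu = np.sum(a * np.exp(-d2 / (2.0 * L ** 2))) / (2.0 * np.pi * L ** 2 * T_total)

    # Triggering: only past events contribute.
    past = times < t
    dt = t - times[past]
    rho = np.sqrt(d2[past])

    # Histogram kernels: bin k covers [T_k, T_{k+1}) in time and [R_k, R_{k+1}) in distance;
    # contributions beyond the last edge are zero.
    k_t = np.clip(np.searchsorted(T_bins, dt, side='right') - 1, 0, len(b) - 1)
    tau_vals = np.where(dt < T_bins[-1], b[k_t], 0.0)
    k_r = np.clip(np.searchsorted(R_bins, rho, side='right') - 1, 0, len(c) - 1)
    r_vals = np.where(rho < R_bins[-1], c[k_r], 0.0)
    return mu + np.sum(r_vals * tau_vals)
```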

Expectation-Maximization algorithm:

Knowing $L$, the parameters $\{a_i, b_k, c_k\}$ are inverted by Expectation-Maximization. The influence of event $i$ on event $j$ is $\nu_{ij} = r(x_j - x_i, y_j - y_i)\,\tau(t_j - t_i)$, and the sum of all the influences of past events on $j$ is $\nu_j = \sum_{i<j} \nu_{ij}$. The background rate density for event $j$ is

$$\mu_j = \mu(x_j, y_j) = \sum_i \omega_{ij} \quad\text{with}\quad \omega_{ij} = \frac{a_i}{2\pi L^2 T}\, e^{-\left[(x_j - x_i)^2 + (y_j - y_i)^2\right]/2L^2}$$

(note that the summation is now over all events $i$, thus including events $i > j$, and even $j$ itself). We define the probabilities

$$\gamma_{ij} = \frac{\nu_{ij}}{\nu_j + \mu_j}$$

that $j$ is causally triggered by $i$, and

$$\gamma_{0,ij} = \frac{\omega_{ij}}{\nu_j + \mu_j}$$

that $j$ is a background event linked to the background node $i$. These probabilities are normalized: $\sum_{i<j} \gamma_{ij} + \sum_i \gamma_{0,ij} = 1$.
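A dense-matrix sketch of this probability computation, as a direct transcription of the definitions above (for the 14306 events of the dataset one would in practice use sparse or banded storage):

```python
import numpy as np

def e_step(nu, omega):
    """E-step: nu[i, j] is the triggering influence of i on j (zero unless t_i < t_j),
    omega[i, j] is the background contribution of node i to event j.
    Returns the probabilities gamma[i, j] and gamma0[i, j]."""
    total = nu.sum(axis=0) + omega.sum(axis=0)   # nu_j + mu_j for each event j
    gamma = nu / total                           # P(j causally triggered by i)
    gamma0 = omega / total                       # P(j is a background event linked to node i)
    # Each column is now normalized: sum_i gamma[i, j] + sum_i gamma0[i, j] == 1.
    return gamma, gamma0
```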

The algorithm iterates the following steps:

- Expectation: the probabilities $\gamma_{ij}$ and $\gamma_{0,ij}$ are computed from the estimated kernels. Initially, the probabilities are all taken equal to 1 and then normalized according to $\sum_{i<j} \gamma_{ij} + \sum_i \gamma_{0,ij} = 1$.

- Maximization: knowing these probabilities, the log-likelihood is then

$$f(a, b, c) = -\sum_i a_i - \sum_i \int_0^{T - t_i} \tau(t)\, dt + \sum_{i,j} \gamma_{0,ij} \ln \omega_{ij} + \sum_{i,\, j>i} \gamma_{ij} \ln \nu_{ij}.$$

Maximizing $f$ gives

$$a_i = \sum_j \gamma_{0,ij}, \qquad b_k = \frac{\Sigma'_k}{\sum_i \Delta_{i,k}}, \qquad c_k = \frac{\Sigma''_k}{S_k \sum_{i,\, j>i} \gamma_{ij}},$$

where $\Delta_{i,k} = T_{k+1} - T_k$ if $T - t_i \ge T_{k+1}$, $\Delta_{i,k} = T - t_i - T_k$ if $T_k \le T - t_i < T_{k+1}$, and $\Delta_{i,k} = 0$ if $T - t_i < T_k$, with

$$\Sigma'_k = \sum_{i,j \,/\, T_k \le t_j - t_i < T_{k+1}} \gamma_{ij}, \qquad \Sigma''_k = \sum_{i,j \,/\, R_k \le r_{ij} < R_{k+1}} \gamma_{ij}, \qquad S_k = \pi\left(R_{k+1}^2 - R_k^2\right).$$
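A sketch of these update formulas, under the same conventions as the previous snippets; the $c_k$ normalization follows the reading above in which $r$ is a spatial probability density:

```python
import numpy as np

def m_step(gamma, gamma0, times, dists, dts, T_bins, R_bins, T_total=365.0):
    """M-step: update a_i, b_k, c_k from the current probabilities.
    dts[i, j] = t_j - t_i and dists[i, j] = distance between events i and j
    (pairs with t_i >= t_j are assumed to carry zero weight in gamma)."""
    # Background weights: a_i = sum_j gamma0[i, j]
    a = gamma0.sum(axis=1)

    # Temporal kernel: b_k = (triggering weight of pairs in time bin k) / (total exposure of bin k),
    # where the exposure Delta_{i,k} is the overlap of [T_k, T_{k+1}) with [0, T - t_i].
    upper = np.minimum(T_bins[None, 1:], (T_total - times)[:, None])
    delta = np.clip(upper - T_bins[None, :-1], 0.0, None)          # shape (N, n_time_bins)
    sel = gamma > 0
    w_time, _ = np.histogram(dts[sel], bins=T_bins, weights=gamma[sel])
    b = w_time / delta.sum(axis=0)

    # Spatial kernel: c_k = (triggering weight in annulus k) / (annulus area * total triggering weight),
    # so that r integrates to one over space.
    w_dist, _ = np.histogram(dists[sel], bins=R_bins, weights=gamma[sel])
    areas = np.pi * (R_bins[1:] ** 2 - R_bins[:-1] ** 2)
    c = w_dist / (areas * gamma.sum())
    return a, b, c
```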
Convergence is tested by requiring that all non-zero values $b_k$ and $c_k$ are changed by less than 5% in logarithm, e.g.,

$$\left| \frac{\ln b_k}{\ln b'_k} - 1 \right| < 0.05,$$

where $b_k$ is the value updated during the Maximization step, and $b'_k$ is the value prior to this step.
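Putting the steps together, a possible outer loop implementing this stopping rule; update_nu_omega is a hypothetical callback that rebuilds $\nu_{ij}$ and $\omega_{ij}$ from the updated parameters, and m_step_fn is assumed to be a closure over the data arrays:

```python
import numpy as np

def log_relative_change(new, old):
    """Max of |ln(new)/ln(old) - 1| over entries that are non-zero (and have ln(old) != 0)."""
    mask = (new > 0) & (old > 0) & (old != 1.0)
    return np.max(np.abs(np.log(new[mask]) / np.log(old[mask]) - 1.0)) if mask.any() else 0.0

def run_em(nu0, omega0, update_nu_omega, m_step_fn, max_iter=200, tol=0.05):
    """Iterate E and M steps until all non-zero b_k and c_k change by less than 5% in logarithm."""
    nu, omega = nu0, omega0
    b_old = c_old = None
    for _ in range(max_iter):
        gamma, gamma0 = e_step(nu, omega)            # Expectation
        a, b, c = m_step_fn(gamma, gamma0)           # Maximization (closure over the data)
        if b_old is not None and max(log_relative_change(b, b_old),
                                     log_relative_change(c, c_old)) < tol:
            break
        b_old, c_old = b.copy(), c.copy()
        nu, omega = update_nu_omega(a, b, c)         # rebuild influences with the new parameters
    return a, b, c
```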


The smoothing length can also be optimized during the maximization step, with

$$L^2 = \frac{\sum_{i,j} \gamma_{0,ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right]}{2 \sum_{i,j} \gamma_{0,ij}}.$$

However, doing so leads to the trivial solution $L = 0$ and $b_k = 0$, implying that $\gamma_{ij} = 0$, $\gamma_{0,ij} = 0$ if $i \ne j$, and $\gamma_{0,ii} = 1$, which has no predictive value. This solution is a global maximum ($f \to \infty$ when $L \to 0$). We therefore test two distinct approaches: (model type 1) we modify the model by imposing $\gamma_{0,ij} = 0$ if $i$ and $j$ are co-located, and invert $L$, similarly to Mohler (2014); (model type 2) we keep $L$ fixed to an a priori value, and keep the best $L$ after comparing the models with a cross-validation method.
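The weighted-variance update for $L$, written out explicitly (sq_dists is a hypothetical matrix of squared inter-event distances):

```python
import numpy as np

def update_L(gamma0, sq_dists):
    """Optimal smoothing length from the background probabilities:
    L^2 = sum_ij gamma0[i, j] * d_ij^2 / (2 * sum_ij gamma0[i, j]),
    with sq_dists[i, j] = (x_i - x_j)^2 + (y_i - y_j)^2."""
    return np.sqrt((gamma0 * sq_dists).sum() / (2.0 * gamma0.sum()))
```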
The first approach gives a best $L = 0.024$ km. For the second, we use the burglary data from the first 81 days of 2015 (1/1/2015 to 22/3/2015) and compute the log-likelihood on this time period for the intensity $\lambda(x, y, t)$ predicted by the 2014 data alone. We also cross-validate the models of type 1 to compare the two approaches. We find the best $L$ value to be 0.1 km for the models of type 2, the cross-validation giving a better fit to the 2015 data than for the models of type 1, cf. Figure 1. Cross-validation with $L = 0.024$ km in the case of the first approach gives a poor fit.
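A sketch of how such a held-out log-likelihood can be computed on the 2015 window; the space-time integral is approximated on a grid, and model is assumed to be a callable wrapping the intensity sketch above:

```python
import numpy as np

def held_out_loglik(events_2015, model, grid_x, grid_y, t_grid):
    """Point-process log-likelihood of the held-out window:
    sum_i ln lambda(x_i, y_i, t_i) - integral of lambda over the window,
    with the integral approximated on a regular space-time grid."""
    ll = sum(np.log(model(x, y, t)) for x, y, t in events_2015)
    dx, dy, dt = grid_x[1] - grid_x[0], grid_y[1] - grid_y[0], t_grid[1] - t_grid[0]
    integral = sum(model(x, y, t)
                   for t in t_grid for x in grid_x for y in grid_y) * dx * dy * dt
    return ll - integral
```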
We show in Figure 2 the two interaction kernels $\tau$ and $r$ for the type 2 model with $L = 0.1$ km. Interaction is practically negligible, apart from near-repeats occurring within less than a day of each other, which account for only 1.8% of all events.
We computed a second set of cross-validations, this time by also including the 2015 data in the triggering part: the model parameters in $\lambda(x, y, t) = \mu(x, y) + \sum_i r(x - x_i, y - y_i)\,\tau(t - t_i)$ are unchanged (in particular the background rate density $\mu(x, y)$ is thus estimated from the 2014 data only), but the triggering term $\sum_i r(x - x_i, y - y_i)\,\tau(t - t_i)$ is now computed by summing over both the 2014 and 2015 data. Remarkably, the log-likelihood is systematically found to be lower with this approach, see Table 1. This is counter-intuitive, as using more recent data to update the triggering term is expected to improve the prediction. A closer look at the time series (Figure 3) shows that there were significantly fewer events in the first 81 days of 2015 than predicted from the 2014 data. Since including the new 2015 events in the calculation of the triggering term results in a larger predicted number, doing so only strengthens the over-estimation.
The over-estimation of the number of events in 2015 highlights the fact that, practically speaking, one would like to predict just where, rather than both when and where, the next event will occur, so that only the predicted marginal density

$$\tilde{\lambda}(x, y, t) = \frac{\lambda(x, y, t)}{\iint dx\, dy\; \lambda(x, y, t)}$$

is of actual interest, instead of the complete space-time rate density $\lambda(x, y, t)$. We therefore introduce a second measure of the capacity of the model to predict the future locations of the subsequent events, as $g(a, b, c) = \sum_i \ln \tilde{\lambda}(x_i, y_i, t_i)$, where the summation is done over the 2015 events only, and the triggering term of $\lambda(x, y, t)$ is computed by summing over all preceding events (including those of 2015).
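A sketch of this location-only score; the spatial normalization is again approximated on a grid, and the names are hypothetical:

```python
import numpy as np

def location_score(events_2015, model, grid_x, grid_y):
    """g = sum_i ln[ lambda(x_i, y_i, t_i) / integral_xy lambda(x, y, t_i) ]:
    only the predicted spatial distribution at each event time is scored."""
    dx, dy = grid_x[1] - grid_x[0], grid_y[1] - grid_y[0]
    g = 0.0
    for x, y, t in events_2015:
        norm = sum(model(xg, yg, t) for xg in grid_x for yg in grid_y) * dx * dy
        g += np.log(model(x, y, t) / norm)
    return g
```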
We show in Figure 4 that type 2 models perform better than type 1 models, but more importantly that a simple (exponential) smoothing of all the previous events actually does better at predicting the location of the next event, although the improvement is only marginal. This is particularly surprising, since accounting for memory in the system should a priori improve the prediction compared to a memory-less prediction, as done with a simple smoothing. This is here due to a change in the spatial properties of the burglary events in 2015 (compared to 2014), which are found to be more distant from each other: the mean distance between any two burglaries was 13.58 km in 2014, and 14.05 km in 2015. For both years, consecutive events tend to be less distant than average, but there still exists a significant difference between the two time periods, cf. Figure 5. Exploiting the temporal clustering as done with our models leads to predicted events too close to the immediately preceding (past) event, while the simple smoothing predicts a slightly larger distance, hence a better prediction.
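For reference, a sketch of the kind of memory-less exponential-smoothing baseline referred to here; the kernel width ell and the normalization used for Figure 4 are not specified in the text, so both are assumptions:

```python
import numpy as np

def smoothed_density(x, y, t, times, xs, ys, ell):
    """Memory-less hotspot density: exponential smoothing of all events prior to time t.
    Every past event contributes exp(-d / ell), regardless of how long ago it occurred."""
    past = times < t
    d = np.sqrt((x - xs[past]) ** 2 + (y - ys[past]) ** 2)
    weights = np.exp(-d / ell)
    # Normalization of the 2D exponential kernel: integral of exp(-d/ell) over the plane = 2*pi*ell^2.
    return weights.sum() / (2.0 * np.pi * ell ** 2 * past.sum())
```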
These results cast strong doubt on the capacity of the models proposed here to outperform simple hotspot maps obtained by smoothing, for the dataset analyzed. The triggering contribution to the occurrence of future events is small (it accounts for only 1.7% for the best model). Accounting for memory in the system can therefore only provide a very modest contribution to the effectiveness of the prediction scheme.
More importantly, it is assumed that the dynamics of the process stays the same over time. Possible non-stationarity of the process is thus clearly an issue, as it will prevent the use of past information to predict the future. This is for example experienced in this analysis, as the 2015 burglary events are clearly not distributed (in time and in space) as they were in 2014. This non-stationarity is likely due to uncontrolled evolutions in the way these acts are performed but, in situations where new prediction algorithms are set up and exploited by police patrols, it could also be a response by burglars to such a change. Unlike natural processes like earthquakes, analyses like the one presented here could therefore have the ability to modify the observed process, making it more difficult to correctly predict future events.

L (km)                          0         0.01      0.02      0.05      0.1       0.2       0.4
Background events (%)           100       99.9      98.7      98.1      93.5      73.5      45.8
Difference in log-likelihood    <10^-15   <10^-7    -0.012    -0.017    -0.056    -0.17     -0.33
Table 1: percentage of background events and difference in the cost function $f/N$ between cross-validations with and without use of the 2015 data, as a function of the smoothing length $L$, for type 2 models. In all cases the likelihood is lower when the 2015 data are included in the computation of the triggering intensity.

Figure 1: mean of the cost function $f/N$ for various values of the smoothing length $L$ used to compute the background rate density $\mu(x, y)$, for the two approaches described in the text, obtained by cross-validation. The model of type 1 with optimized $L$ has $L = 0.024$ km.

Figure 2: interaction kernels $\tau$ (top graphs) and $r$ (bottom graphs) for model type 2 with $L = 0.1$ km. The two dashed lines show power laws with exponents -1.5 (for $\tau$) and -7 (for $r$).

Figure 3: number of events (in blue) and predicted number, using (magenta) or not using (green) the 2015 events in the triggering term summation.

Figure 4: difference in the cost function $-g$, normalized by the number of events to be predicted, compared to the type 2 model predictions, for the two simple smoothings: of the 2014 data (blue), and of the 2014 and 2015 data up to the prediction time (red). The simple smoothing is done using an exponential kernel. For smoothing lengths up to 0.1 km, the simple smoothing performs better than the more sophisticated model proposed here that accounts for memory effects.

Figure 5: mean distance between pairs of events separated by (n-1) events, for the two time periods analyzed separately.
