Acknowledgements
Without the help of some of the kindest, smartest and most enthusiastic people, I would never have been able to produce work as good as this project.
First of all, I thank my supervisors, Prof. Berc Rustem and Prof. Daniel Rueckert, for the unlimited ideas and support they have given me; Prof. Duncan Gillies for being my personal tutor and second marker; Dr. Daniel Kuhn for patiently listening to my talks; and Dr. George Tzallas-Regas for discussions and suggestions at any time.
I also owe a debt to Prof. Rasmus Larsen for sending me a very useful tutorial, and to Dr. Stefan Klein for his advice and explanations.
Finally, my thanks to all of my friends who have made my time at Imperial exciting.
Abstract
Contents

1 Introduction
2 Image Registration
  2.1
  2.2 Local Deformation
  2.3 Registration framework
    2.3.1 Cost Function F
    2.3.2 Gradient g
    2.3.3 Transformation W(p)
    2.3.4 Optimization
    2.3.5 Pre-condition
3 Deterministic Optimization
  3.1 Gauss-Newton (GN)
  3.2 Levenberg-Marquardt (LM)
  3.3 Quasi-Newton (QN)
  3.4 Nonlinear Conjugate Gradient (NCG)
  3.5 Step-size
  3.6
4 Recursive Subsampling and Weighted approach for Deterministic Optimizations
  4.1
  4.2
  4.3
5 Stochastic Approximation
  5.1 Stochastic Approximation
    5.1.1 Robbins-Monro and the derivative
    5.1.2 Decaying sequence
  5.2 Difference Sampling
  5.3 Sampling Strategy
    5.3.1 Deterministic Sampling
    5.3.2 Stochastic Sampling
  5.4
6 MATLAB Implementation: Vreg
  6.1 Introduction
  6.2 Implementation
    6.2.1 Cost Function F
    6.2.2 Image Gradient ∇I
    6.2.3 Transformation W(p) and Jacobian ∂W/∂p
    6.2.4 Other evaluations
  6.3 User Guide
7 Conclusion
References
List of Tables

3.1 Complexity of the registration framework
3.2 Summary of the deterministic optimization methods
4.1 Summary of results for Subsampling methods on Random Deformation

List of Figures

1.1 Pictures of Lena
3.1 Convergence rate for 3 random tests (Lena, Ankle)
3.2 Average performance on 100 random tests (Lena, Ankle)
4.4 Convergence rate of single and subsampling methods
4.5 Subsampling convergence rate
4.6 Lena: performance of single and subsampling methods
4.7 Ankle: performance of single and subsampling methods
4.8 Knee: convergence of 5 × 5 and 7 × 7 grids of control points
4.9 Knee: average performance with different grid sizes
4.12 Lena: convergence of UnWeight and Weight methods for local deformation
4.13 Lena: average performance of UnWeight and Weight methods for local deformation
5.4 MRI Knee: random uniform and random local deformations
5.5 Convergence of RUD by deterministic and stochastic methods
5.6 Convergence of RLD by deterministic and stochastic methods
5.7 MRI Knee: average performance of stochastic approximation methods
5.8 Comparison between combined Stoc-Det and stochastic methods
Chapter 1
Introduction
Given two images, 1.1(a) and 1.1(c), which are not similar, we want to register them.

[Figure 1.1: (a) Original, (b) Difference, (c) Deformed]

Image registration is defined as the process of finding a one-to-one mapping between the coordinates in the image spaces of interest such that the transformed points correspond to the same anatomical point. We also emphasize the use of the difference image, which represents the error between the two images: it is obtained by taking the absolute difference between the pixel intensities of the two images at the same coordinates.
In the next chapter, we briefly describe the image registration framework. We show that the image registration problem can be formulated as an optimization problem, and we therefore analyse the performance of different optimization approaches. In fact, no single optimization method produces the best performance for all types of applications; the choice of method depends on the particular application. We therefore review the optimization methods on different types of deformed images.
In Chapter 3, deterministic optimization methods are tested on randomly, uniformly deformed images. The Quasi-Newton (QN) method shows the best convergence rate; however, it suffers from outliers in some test cases. Besides Quasi-Newton, the Gauss-Newton (GN) and Levenberg-Marquardt (LM) methods produce consistent convergence rates, with GN slightly better than LM. The Nonlinear Conjugate Gradient (NCG) results show that the method is very dependent on the type of input images.
The interesting part comes in Chapter 4, where we present a framework for recursive subsampling registration. Subsampling techniques have been presented before; however, most of those concepts differ from ours. The test results show that the recursive subsampling technique outperforms all normal deterministic optimization methods. In addition, we introduce a weighted subsampling approach, inspired by the difference sampling of Chapter 5. The weighted subsampling methods demonstrate even more attractive results than the plain subsampling methods on locally deformed images.
Finally, the best of this thesis is here: in Chapter 5 we propose a novel approach based on stochastic approximation. The type of registration suited to this method is local deformation of local parts of images, a very common case in medical image processing. The approach employs a random non-uniform sampling method that we call Difference Sampling. The results show that for locally deformed images, the Difference Sampling stochastic methods greatly accelerate registration.
Chapter 2
Image Registration
Image registration has been an active research area over the last few decades. In general, depending on the type of application, different registration techniques are used. Image registration is primarily classified as follows:
Dimension: 2D-2D, 3D-3D, 2D-3D
Nature of Deformation: Rigid, Affine, Local Deform
Optimization procedure
Modality: monomodal, multimodal
Manual, Semi-automatic, Automatic registration
An overview of classical registration methods can be found in (30). All methods discussed in this paper emphasize the 2D-2D monomodal automatic registration application, with potential to extend to 3D-3D multimodal applications.
The type of deformation (39) of an image determines the complexity of the registration problem and affects the choice of suitable registration methods. Typically, the deformation of an image is categorised by the number of parameters, or degrees of freedom: rigid transformation, affine transformation and local deformation.
2.1
2.2
Local Deformation
T(x) = T_{global}(x) + T_{local}(x)   (2.1)

where x = (x, y), T_{global} is an affine transformation matrix and T_{local} is a local transformation matrix. In this paper we only examine the local deformation; T_{global} is therefore absorbed in 2.1. Following Rueckert's formulation (38), we have

T_{local}(x) = \sum_{m=0}^{3} \sum_{n=0}^{3} B_m(u) B_n(v) p_{i+m,j+n}   (2.2)

where p_{i,j} is a control point of the p_x × p_y grid with uniform spacing, B_m is the m-th cubic B-spline basis function (26), and:

i = \lfloor x/n_x \rfloor - 1,  j = \lfloor y/n_y \rfloor - 1,
u = x/n_x - \lfloor x/n_x \rfloor,  v = y/n_y - \lfloor y/n_y \rfloor
One of the attractive features is that the basis functions have local support: if we change a control point p_{i,j}, it only affects its local neighbourhood. The mesh of control points P acts as the parameter set for the transformation, and its resolution p_x × p_y decides the degrees of freedom (number of parameters) of the registration problem. A coarse mesh grid is less expensive to solve than a fine mesh grid; the pay-off is that it cannot model small local deformations. For example, a mesh grid of 5 × 5 control points yields a 50-parameter problem, which should not produce as high-quality a match as a 9 × 9 grid of control points (a 162-parameter problem); however, the 50-parameter problem is less expensive than the 162-parameter problem. The choice of mesh grid resolution is up to the user.
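The tensor-product evaluation of 2.2 can be sketched as follows. This is an illustrative Python version (the thesis implementation is in MATLAB; the function names and the control-grid layout here are assumptions), using the standard cubic B-spline basis:

```python
import math

def bspline_basis(m, u):
    """m-th cubic B-spline basis function (m = 0..3), u in [0, 1)."""
    if m == 0:
        return (1 - u) ** 3 / 6
    if m == 1:
        return (3 * u ** 3 - 6 * u ** 2 + 4) / 6
    if m == 2:
        return (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6
    return u ** 3 / 6

def ffd_displacement(x, y, nx, ny, ctrl):
    """Evaluate the free-form displacement of eq. 2.2 at (x, y).

    ctrl[i][j] holds the (dx, dy) displacement of control point p_{i,j};
    nx, ny are the uniform control-point spacings.
    """
    i = math.floor(x / nx) - 1
    j = math.floor(y / ny) - 1
    u = x / nx - math.floor(x / nx)
    v = y / ny - math.floor(y / ny)
    dx = dy = 0.0
    for m in range(4):          # only the 4x4 local neighbourhood matters
        for n in range(4):
            w = bspline_basis(m, u) * bspline_basis(n, v)
            px, py = ctrl[i + m][j + n]
            dx += w * px
            dy += w * py
    return dx, dy
```

Since the cubic basis functions form a partition of unity, a grid of identical control-point displacements reproduces that displacement exactly, which is a convenient sanity check of an implementation.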
2.3
Registration framework
There are various algorithms for image registration, such as difference decomposition (15) or linear regression (7); however, the gradient-based framework first proposed by Lucas and Kanade (28) is still the most widely used technique.
Given a deformed image I and a reference image T, the registration process aims to find a spatial transformation W(p), where p^T = (p_1, ..., p_n) is a set of parameters, that matches the two images: I(W(p)) ≈ T. The Lucas-Kanade algorithm iteratively generates transformations W(p_k) that reduce the difference between the two images. Generating the warp parameters p_k at iteration k requires the gradient of the cost function, g_k, and an appropriate descent parameter, a_k, that ensures the descent property of the cost function:

p_{k+1} = p_k + a_k g_k   (2.3)

The gradient of the cost function is derived in the next section. The descent parameter is defined differently by the different optimization schemes (Chapters 3 and 5).
2.3.1
Cost Function F

The cost function is the sum of squared differences (SSD) between the warped image and the reference:

F = \frac{1}{2} \sum_x \left[ I(W(p)) - T \right]^2   (2.4)

Linearising around the current parameters p with a first-order Taylor expansion,

I(W(p + \Delta p)) \approx I(W(p)) + \nabla I \frac{\partial W}{\partial p} \Delta p   (2.5)

gives

F = \frac{1}{2} \sum_x \left[ I(W(p)) + \nabla I \frac{\partial W}{\partial p} \Delta p - T \right]^2   (2.6)

where \nabla I = (\partial I/\partial x, \partial I/\partial y) is the gradient of image I evaluated at W(p) and \partial W/\partial p is the Jacobian of the transformation (2.9). Next, differentiating 2.6 with respect to \Delta p yields:

\frac{\partial F}{\partial \Delta p} = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(p)) + \nabla I \frac{\partial W}{\partial p} \Delta p - T \right] = H \Delta p + g   (2.7)

where H is the Hessian matrix of the objective function and a part of the descent parameter in Equation 2.3. The second term on the RHS is the gradient of the objective function (2.8). Expression 2.7 is only used for deterministic optimization methods; for stochastic methods we use different techniques (Chapter 5). The choice of Hessian evaluation is one of the main factors that affects the application performance.
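The per-pixel bookkeeping behind 2.7 and 2.8 is simple to state in code. The following Python sketch (illustrative only; names are not from the thesis) accumulates the gradient and the Gauss-Newton Hessian from the "steepest-descent" rows ∇I ∂W/∂p and the residuals I(W(p)) − T:

```python
def sd_row(grad_I, dW_dp):
    """Steepest-descent row at one pixel: the product (∇I · ∂W/∂p).
    grad_I: (Ix, Iy); dW_dp: 2 x n Jacobian rows at that pixel."""
    n = len(dW_dp[0])
    return [grad_I[0] * dW_dp[0][k] + grad_I[1] * dW_dp[1][k] for k in range(n)]

def gradient_and_hessian(pixels):
    """Accumulate g (eq. 2.8) and H = sum of outer products over pixels.
    pixels: list of (sd, residual) with residual = I(W(p)) - T."""
    n = len(pixels[0][0])
    g = [0.0] * n
    H = [[0.0] * n for _ in range(n)]
    for sd, r in pixels:
        for a in range(n):
            g[a] += sd[a] * r
            for b in range(n):
                H[a][b] += sd[a] * sd[b]
    return g, H
```

Note that H here drops the second-order terms, i.e. it is the Gauss-Newton approximation of the true Hessian; Chapter 3 discusses the alternatives.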
2.3.2
Gradient g

From 2.7, the gradient of the objective function is:

g = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(p)) - T \right]   (2.8)
2.3.3
Transformation W(p)

The transformation is defined by the B-spline tensor model 2.2, W(p) = T_{local}. Hence, the derivative of the deformation field with respect to the control points p is:

\frac{\partial W}{\partial p} = \sum_{m=0}^{3} \sum_{n=0}^{3} B_m(u) B_n(v)   (2.9)

For any given input images, we can compute the Jacobian \partial W/\partial p once at the beginning of the procedure and need not recompute it during the registration process. This is a big advantage in reducing the computational cost.
During the transformation process, an interpolation procedure is essential. Images are discrete, with pixel values at integer coordinates; after applying a transformation, however, the coordinates are likely to be fractional. We must therefore be able to evaluate the image pixels at arbitrary coordinates, which is achieved by interpolation. Different methods of interpolation, such as linear, bilinear and trilinear, can be used. In this paper, we use linear interpolation because of its reasonably good quality and lower cost compared to higher-order interpolation.
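For a 2D image, the linear scheme amounts to bilinear interpolation of the four surrounding pixels. A minimal Python sketch (the thesis code is MATLAB; the clamping policy at the boundary is an assumption here):

```python
import math

def interp_bilinear(img, x, y):
    """Sample image intensity at fractional coordinates (x, y).
    img is a list of rows; out-of-range coordinates are clamped."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # blend the two rows, then blend the results vertically
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bot = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bot
```

At integer coordinates this reduces to a plain pixel lookup, so an implementation can be checked against the original image directly.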
2.3.4
Optimization
2.3.5
Pre-condition
In practice, the use of cost function 2.4 is not reliable unless we include some pre-conditions. The first condition is that the input images must come from the same source, to assure monomodality. The second condition is that the deformation fields must not be folded. One way to relax the second condition is to add a regularisation term to the cost function (17). We do not include such a term; instead, we ensure that no folding is possible for randomly generated inputs by ensuring that the Jacobian of the transformation fields is non-negative.
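The folding condition can be checked numerically from the determinant of the spatial Jacobian of the deformation field. A Python sketch (illustrative; it uses forward differences and requires a strictly positive determinant, which is one common convention):

```python
def no_folding(defx, defy):
    """Check det(spatial Jacobian) > 0 everywhere via forward differences.
    defx[y][x], defy[y][x] give the deformed x- and y-coordinate of pixel (x, y)."""
    h, w = len(defx), len(defx[0])
    for yy in range(h - 1):
        for xx in range(w - 1):
            # partial derivatives of the deformed coordinates
            dxdX = defx[yy][xx + 1] - defx[yy][xx]
            dxdY = defx[yy + 1][xx] - defx[yy][xx]
            dydX = defy[yy][xx + 1] - defy[yy][xx]
            dydY = defy[yy + 1][xx] - defy[yy][xx]
            if dxdX * dydY - dxdY * dydX <= 0:   # folded or degenerate cell
                return False
    return True
```

The identity field passes this check with determinant 1 everywhere, while any reflection or fold flips the sign of the determinant somewhere.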
Chapter 3
Deterministic Optimization
A standard formula for deterministic optimization methods follows the derivation
from 2.3 and 2.7:
p_{k+1} = p_k - H_k^{-1} g_k   (3.1)
Step        (2)   (3)    (4)   (5)   (6)    (7)    (8)    (9)    (10)
Complexity  N^2   Nn^2   N^2   N^2   N^2n   N^2n   O(H)   O(·)   n^3

Table 3.1: Complexity of the registration framework, where N is the total number of pixels of an input image and n is the number of control points of the mesh grid
(3.2)
The following sections show that the choice of optimization method affects the performance because each method contributes a different complexity to evaluating Δp. The step-size complexity is often negligible because, in medical image processing applications, the initial deformation is often not very large (no folding allowed), so the approximate descent direction itself often satisfies the descent property.
Precompute: (1) Evaluate ∂W/∂p by 2.9
Iterate:
  (2) Evaluate W(p) by 2.2
  (3) Warp I to obtain I(W(p))
  (4) Evaluate the image gradient ∇I
  (5) Evaluate ∇I ∂W/∂p
  (6) Evaluate the gradient g = Σ_x [∇I ∂W/∂p]^T [I(W(p)) − T] (2.8)
  (7) Evaluate the Hessian H
  (8) Compute the step size a_k
  (9) Compute Δp and update p
  (10) Check the termination criteria:
    i. ||p_k − p*|| ≤ ε_p
    ii. ||F_k − F*|| ≤ ε_F
    iii. ||g_k − g*|| ≤ ε_g
    iv. for the NCG method: (7) is not used, and in (9) Δp is computed using 3.8
Steps (1)-(6) and (10) are essential for every method, and step (8) is negligible for a good approximation. The comparable complexity therefore relies on steps (7) and (9).
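Steps (2)-(10) can be exercised on a toy problem. The Python sketch below (illustrative only; the thesis implementation is MATLAB, and here the residuals and Jacobian are supplied as callables for a single-parameter least-squares problem) performs the Gauss-Newton update p ← p − H⁻¹g with H = JᵀJ and g = Jᵀr:

```python
def gauss_newton(residuals, jacobian, p0, max_iter=100, tol=1e-10):
    """Minimise (1/2) sum r_i(p)^2 for a scalar parameter p.

    residuals(p) -> list of residual values r_i
    jacobian(p)  -> list of derivatives dr_i/dp
    """
    p = p0
    for _ in range(max_iter):
        r = residuals(p)
        J = jacobian(p)
        g = sum(Ji * ri for Ji, ri in zip(J, r))   # gradient J^T r
        H = sum(Ji * Ji for Ji in J)               # Gauss-Newton Hessian J^T J
        step = g / H
        p -= step                                  # p <- p - H^{-1} g, eq. 3.1
        if abs(step) < tol:                        # termination criterion (10)
            break
    return p
```

For a linear residual model the method converges in a single iteration, which mirrors why GN behaves well when the SSD surface is nearly quadratic around the optimum.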
(3.3)
3.1
Gauss-Newton (GN)
(3.4)
3.2
Levenberg-Marquardt (LM)
H = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ \nabla I \frac{\partial W}{\partial p} \right] + \sum_x \left[ I(W(p)) - T \right]
\begin{pmatrix} \nabla I \frac{\partial^2 W}{\partial p_1^2} & & 0 \\ & \ddots & \\ 0 & & \nabla I \frac{\partial^2 W}{\partial p_n^2} \end{pmatrix}   (3.6)
3.3
Quasi-Newton (QN)
(3.7)
The computational cost of computing 3.7 is O(n^3); the comparable complexity is thus reduced to O(n^3).
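As an illustration of an O(n²)-per-update quasi-Newton scheme, the standard BFGS update of the inverse Hessian approximation can be written as below. This is a sketch under the assumption that a BFGS-type update is meant by 3.7 (the thesis does not spell out the variant in this extraction); s and y are the usual parameter and gradient differences:

```python
def bfgs_update(Hinv, s, y):
    """One BFGS update of the inverse Hessian approximation (n x n lists).

    s = p_{k+1} - p_k, y = g_{k+1} - g_k. Returns the updated matrix, which
    satisfies the secant condition H_new @ y == s.
    """
    n = len(s)
    sy = sum(s[i] * y[i] for i in range(n))
    Hy = [sum(Hinv[i][j] * y[j] for j in range(n)) for i in range(n)]
    yHy = sum(y[i] * Hy[i] for i in range(n))
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = (Hinv[i][j]
                         + (sy + yHy) * s[i] * s[j] / (sy * sy)
                         - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
    return out
```

Because the inverse is maintained directly, each iteration needs only matrix-vector products rather than an O(n³) linear solve, which is the source of QN's speed advantage in Table 3.2.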
3.4
Nonlinear Conjugate Gradient (NCG)

Starting from the linear conjugate gradient method for convex quadratic functions (35, p. 102), Fletcher and Reeves (14) extended the method to nonlinear problems.
d_k = -g_k + \beta_k d_{k-1}   (3.8)

\beta_k^{DY} = \frac{g_k^T g_k}{d_{k-1}^T (g_k - g_{k-1})}   (3.9)

\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}   (3.10)

The choice of \beta_k has a big influence on the convergence properties of the method. In this study, we adopt a hybrid version (9) as in (22):

\beta_k = \max(0, \min(\beta_k^{DY}, \beta_k^{HS}))   (3.11)

One practical implementation of NCG methods is to restart the iteration every m steps by setting \beta = 0; this is handled heuristically in the hybrid method above. Readers can refer to (9) and (8) for an extensive review of the convergence properties of NCG. The comparable cost to evaluate the search direction is O(n^2).
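The hybrid parameter of 3.11 is a few lines of code. A Python sketch (illustrative; vectors are plain lists, and the zero-denominator guard is an assumption of this sketch):

```python
def hybrid_beta(g_new, g_old, d):
    """Hybrid DY/HS conjugate-gradient parameter: max(0, min(beta_DY, beta_HS))."""
    yk = [a - b for a, b in zip(g_new, g_old)]            # g_k - g_{k-1}
    denom = sum(di * yi for di, yi in zip(d, yk))         # d_{k-1}^T y_k
    if denom == 0:
        return 0.0                                        # degenerate: restart
    beta_dy = sum(a * a for a in g_new) / denom           # Dai-Yuan
    beta_hs = sum(a * b for a, b in zip(g_new, yk)) / denom  # Hestenes-Stiefel
    return max(0.0, min(beta_dy, beta_hs))
```

Clipping at zero means a steepest-descent restart happens automatically whenever the smaller of the two betas is negative, which is the heuristic restart behaviour mentioned above.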
3.5
Step-size
Many step-size strategies are available to date (35, Chapter 3); the choice of an optimal step-size strategy remains one of the most difficult problems in optimization. A traditional line search method, the Armijo rule (1), guarantees global convergence, and for certain types of problems the Modified Armijo method (42) shows better performance. For our comparative study, we implement an Armijo-like method with a certain maximum number of iterations: the trial step sizes a_k = \sigma^{\lambda_k}, with \sigma \in (0, 1) and \lambda_k = 0, 1, 2, \ldots, are tried until the sufficient-decrease condition (3.12) holds, which bounds F_{k+1} - F_k in terms of \|g_{k+1}\|^2, \|g_k\|^2 and g_k^T \Delta p with a constant \delta \in (0, \frac{1}{2}].
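The classical Armijo backtracking scheme that the rule above resembles can be sketched as follows (this is the textbook condition F(p + a d) ≤ F(p) + σ a g·d, not the exact rule 3.12; all names are illustrative):

```python
def armijo_step(F, p, g, d, a0=1.0, beta=0.5, sigma=1e-4, max_tries=20):
    """Backtracking line search along direction d from point p.

    Shrinks the step a geometrically until the sufficient-decrease
    condition F(p + a d) <= F(p) + sigma * a * (g . d) holds.
    """
    gd = sum(gi * di for gi, di in zip(g, d))   # directional derivative
    f0 = F(p)
    a = a0
    for _ in range(max_tries):
        trial = [pi + a * di for pi, di in zip(p, d)]
        if F(trial) <= f0 + sigma * a * gd:
            return a
        a *= beta                               # backtrack
    return a
```

Capping the number of trials corresponds to the "certain maximum number of iterations" above: in well-conditioned registration steps the very first trial is usually accepted.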
3.6
Algorithm             | Comparable Complexity | Convergence Rate | Lena SSD | Lena Time | Ankle SSD | Ankle Time
Gauss-Newton          | N^2 n^2 + n^3         | Fast             | 47.2     | 26.7      | 20.8      | 29
Levenberg-Marquardt   | N^2 n^2 + n^3         | Medium           | 40.8     | 39.5      | 20.8      | 44.8
Quasi-Newton          | n^3                   | Very Fast        | 53.7     | 7.8       | 22.3      | 11.6
NCG                   | n^2                   | Depends          | 39.7     | 18.4      | 21.4      | 78.2

Table 3.2: Summary of the deterministic optimization methods. SSD and Time are the Lena and Ankle test measures.
Figure 3.1 (a) Lena, (b) Ankle: Convergence rate for 3 random tests. Each panel in the figure shows one random test. F is the SSD measure, T is the CPU runtime measure. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).

Figure 3.2 (a) Lena, (b) Ankle: Average performance on 100 random tests. Difference is the SSD measure, Time is the CPU runtime measure. Abbreviations as in Figure 3.1.
Chapter 4
Recursive Subsampling and
Weighted approach for
Deterministic Optimizations
4.1
paper by Sun and Guo (44); however, our proposed method is more general.
As we know, for a gradient descent optimizer, a better initial estimate of the objective parameters enhances the rate of convergence. This idea suggests a way to find a good estimate of the warp parameters p to supply to the full-resolution image registration. For instance, given images of resolution N_x × N_y with N pixels in total, we could shrink them by half to N_x/2 × N_y/2 with N/4 pixels, and downscale again by half to N_x/4 × N_y/4 with N/16 pixels. Registering the N/16-pixel images is around 4 times faster than the N/4-pixel images and 16 times faster than the full images. The idea of the recursive subsampling framework is to use the resulting warp parameters of a smaller-resolution registration phase as the initial parameter estimate for the next larger-resolution phase. To do this, we add a recursive subsampling mechanism to our traditional registration framework 2.4. The number of recursive phases indicates how many times we downsample the images; stopping the shrinking when the resolution falls below 50 × 50 is a good heuristic criterion.
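The coarse-to-fine pyramid described above can be sketched as follows (a Python illustration; the thesis code is MATLAB, and the 2×2 block-average downsampler and the 50-pixel stopping rule follow the heuristic stated in the text):

```python
def downsample(img):
    """Halve resolution by averaging 2x2 pixel blocks (img = list of rows)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x + 1]
              + img[2*y + 1][2*x] + img[2*y + 1][2*x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def pyramid(img, min_size=50):
    """Build the recursive subsampling phases, coarsest level first.

    The registration would run on levels[0], pass the resulting warp
    parameters as the initial estimate for levels[1], and so on up to
    the full-resolution image.
    """
    levels = [img]
    while (len(levels[-1]) // 2 >= min_size
           and len(levels[-1][0]) // 2 >= min_size):
        levels.append(downsample(levels[-1]))
    return levels[::-1]
```

Note that the warp parameters transfer directly between phases because the control-point grid is defined in normalised image coordinates; only the pixel sums in F, g and H change size.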
The use of recursive subsampling also reinforces the good performance of the Quasi-Newton method relative to the others: with a good initial estimate, Quasi-Newton is less likely to terminate prematurely at a local minimum.
4.2
(4.1)
where x is a pixel of an image and \Delta_x \in [0, 1]. We want to assign weights proportional to \Delta_x. Let us define the weight corresponding to pixel x as:

\omega(x) = \alpha + \log(1 + \Delta_x)   (4.2)
4.3
(4.3)
All difference measures are Sum of Squared Differences (SSD); all time measures are CPU runtime in seconds.
Recursive Subsampling methods.
First, we examine the rate of convergence of the recursive subsampling methods compared to the traditional methods. In our tests, we downscale the images by 3 levels, so there are 4 phases with N/64, N/16, N/4 and N pixels. For each random test, we use the same set of random input images of the Lena and Ankle pictures, apply both methods and compare the rates of convergence. Figure 4.4 shows the test results.
The above figure shows that the subsampling technique is generally faster and produces similar or better results than the single (normal) methods. We now compare the convergence rates of the different subsampling optimizations. Figure 4.5 shows the convergence rates of four subsampling optimizations: Subsampling Gauss-Newton (SGN), Subsampling Levenberg-Marquardt (SLM), Subsampling Quasi-Newton (SQN) and Subsampling Nonlinear-Conjugate-Gradient (SNCG). Each panel demonstrates the convergence rate of one recursive phase: panels 1, 2, 3 and 4 correspond to the N/64, N/16, N/4 and N resolution phases. The small-resolution phases cost only a fraction of the time of the large-resolution phases.
distribution. We then apply the recursive subsampling methods with and without weights. Figure 4.12 compares the convergence rates of the two approaches. Clearly, for local deformation the weighted transformation converges faster than the unweighted transformation. One disadvantage of the weighted approach is that it tends to stop prematurely as it gets close to the local minimum. However, as with the argument for the single Quasi-Newton method, we benefit from the fast convergence and can reapply the method starting from the current result. In general, the optimal values obtained by the weighted methods are almost the same as those of the unweighted methods, while the time consumed is much more attractive (Figure 4.13).
Image           | Initial SSD | SGN SSD | SLM SSD | SQN SSD | SNCG SSD | SGN Time | SLM Time | SQN Time | SNCG Time
Ankle 300x336   | 1562.15     | 20.7    | 20.7    | 21.6    | 22.1     | 20.8     | 26.7     | 9.5      | 75.0
Brain 354x353   | 3315.5      | 18.9    | 18.9    | 18.9    | 19.1     | 25.1     | 31.6     | 9.5      | 82.0
Knee 353x343    | 2526.71     | 38.5    | 38.6    | 38.2    | 39.2     | 30.1     | 33.4     | 12.3     | 64.2
Lena 256x256    | 1025.37     | 38.3    | 37.6    | 36.8    | 39.1     | 25.8     | 34.3     | 9.8      | 20
Lung 394x378    | 1327.49     | 8.1     | 8.1     | 8.3     | 8.2      | 19.7     | 27.8     | 10.1     | 98.4

Table 4.1: Summary of results for Subsampling methods on Random Deformation (100 tests per image). The suffix after an image name is the resolution. Subsampling GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).
Figure 4.4 (a) Lena, (b) Ankle: Convergence rate of single and subsampling methods. F is the SSD measure, T is the CPU runtime. The lower red dots in the subsampling methods are the convergence of the shrunk phases. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).

Figure 4.5 (a) Lena, (b) Ankle: Subsampling convergence rate. F is the SSD value, T is the runtime. Each panel shows the convergence rate of one recursive phase. The S- prefix denotes Subsampling.

Figure 4.6: Lena: Performance of single and subsampling methods. Difference is the SSD value, Time is the runtime. Single is the traditional method.

Figure 4.7: Ankle: Performance of single and subsampling methods. Difference is the SSD value, Time is the runtime. Single is the traditional method.

Figure 4.8: Knee: Convergence of 5 × 5 and 7 × 7 grids of control points. F is the SSD value, T is the runtime. The data is drawn from the final recursive phase at full resolution.

Figure 4.9: Knee: Average performance with different grid sizes. The difference measure is SSD, Time is the runtime.

Figure 4.12: Lena: Convergence of the UnWeight and Weight methods for local deformation. Difference is the SSD measure, Time is the CPU runtime.

Figure 4.13: Lena: Average performance on 100 tests of the UnWeight and Weight methods for local deformation. Difference is SSD, Time is runtime. SGN, SQN, SLM, SNCG are the subsampling optimization methods.
Chapter 5
Stochastic Approximation
5.1
Stochastic Approximation
Image registration is a large-scale optimization problem. In addition to the deterministic optimization algorithms reviewed in Chapter 3, stochastic gradient descent methods (23) are also widely investigated in current research. The approximation framework follows the same scheme as 2.3:

p_{k+1} = p_k + a_k \hat{g}_k   (5.1)

The distinctions of the Stochastic Method (SM) are that the derivative of the cost function, g(p_k), is replaced by an approximation \hat{g}_k, and that the step sizes follow a decaying sequence {a_k}. SM aims to find the unknown solution by successively reducing the inaccuracy of its estimates. Such methods have been successfully applied in many applications and have been evaluated in the image registration field (22).
The speed and accuracy of SM depend on the quality of the gradient estimate obtained by random sampling. In general, Random Uniform Sampling (RUS) is used for both monomodal and multimodal stochastic image registration (21). In this chapter, we present a novel random Difference Sampling (DS) approach, which uses either a deterministic or a stochastic sampling strategy. We argue that when the input images are deformed very locally, RUS yields too few samples at the deformed location. One solution could be to allow more iterations, ensuring that enough samples are eventually drawn from the locally deformed regions; the immediate effect of more iterations, however, is more computational time. If we can reliably detect the misaligned regions, we can greatly accelerate the registration. Our Difference Sampling stochastic method aims to detect the deformed regions based on the difference image \Delta_x = \|I_x - T_x\|, and randomly picks a subset of pixels from those regions according to a defined non-uniform probability distribution.
When the image is largely deformed, the difference image is no longer a reliable indicator of misalignment. Ideally, Difference Sampling then converges to Random Uniform Sampling as its non-uniform probabilities become almost uniform.
The registration procedure for stochastic approximation methods is similar to 3.3, except that steps (7)-(10) are replaced by 5.1 and the termination criterion is replaced by convergence of {p_k}:

Precompute: (1) Evaluate ∂W/∂p by 2.9
Iterate: (2) Evaluate W(p) by 2.2, with the pixel sums taken over Ω
Until: i. E{p_{k+1}} ≈ E{p_k}   (5.2)

where Ω is a subset of random pixels. The complexity study for deterministic optimization (3.2) shows that steps (5), (6), (7) and (9) are the most expensive, costing at least N^2 n + n^3 (SQN and SNCG). In the stochastic methods, steps (5) and (6) cost S^2 n instead of N^2 n, where S << N is the size of Ω, and steps (7) and (8) cost n. This makes the comparable cost S^2 n + n, much smaller than N^2 n + n^3.
In the next sections we describe how to pick random pixels and how to approximate the gradient from those random samples.

5.1.1
Robbins-Monro and the derivative

(5.3)

(5.4) as k → ∞
5.1.2
Decaying sequence
(5.5)
Different adaptive step-sizes are described in (20) and (21). For simplicity, and adopting (3), we employ the step-size sequence implementation of (20). The algorithm observes that the more rapidly p_k oscillates about the stationary point p*, the closer p_k is to its optimum; at the same time, the decaying sequence a_k should approach zero.

(5.6)

where Q_k^i is the number of sign changes in p_m^i − p_{m−1}^i, m = 2, ..., k, and Q_1^i = 0. A and the remaining constants are chosen heuristically depending on the application. In our experiments, we set the gain constant to 150 and A = 15.
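The oscillation counter is straightforward to compute. The Python sketch below counts sign changes in the increments of one parameter's history; the gain formula shown is only an assumed illustrative form (a / (A + Q + 1), with gain constant a = 150 and A = 15 as above), since the exact expression 5.6 follows (20) and is not recoverable from this extraction:

```python
def sign_changes(history):
    """Q_k: number of sign changes in successive increments p_m - p_{m-1}."""
    diffs = [b - a for a, b in zip(history, history[1:])]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)

def decaying_step(a=150.0, A=15.0, Q=0):
    """Illustrative adaptive Robbins-Monro gain: more oscillation
    (larger Q) gives a smaller step. The functional form is an assumption."""
    return a / (A + Q + 1)
```

The intended behaviour is visible immediately: a monotone parameter trace keeps Q = 0 and the step large, while an oscillating trace drives Q up and the step toward zero.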
5.2
Difference Sampling

Given two input images with very localised misalignment, a Random Uniform Sampling (RUS) method may bias the estimate of the difference, as in Figure 5.1, because it does not provide sufficient samples at the deformed parts. Difference Sampling (DS), in contrast, is a non-uniform sampling approach that can reduce the variance of the approximation, as in Figure 5.2. DS takes into account a probability distribution based on the current difference between the two images.
The idea of DS follows common sense: if the deformed image differs from the reference image only in small local parts, we need not pay much attention to the parts that are already identical, only to the small parts that differ. In addition, the larger the error between a pair of pixels (at the same coordinates in the two images), the more likely those pixels are to be picked. Interestingly, a few non-uniform sampling approaches have been proposed before, by Bhagalia (3) and Sabuncu (41). However, their sampling methods emphasize image edges, which is quite different from our approach and also causes bias in the case of localised deformation.
In order to study the variance reduction achieved by DS, we briefly explain how a non-uniform random distribution brings advantages in certain problems. We want to sample a subset of pixels that indicates the current misalignment. Recall that the error (2.4) can be written as:

F = \frac{1}{2} \sum_x \left[ I(W(p)) - T \right]^2   (5.7)

With X a randomly chosen pixel, the uniform estimator of the summand is

f(X)   (5.8)

and the difference-sampling estimator, with X drawn from the non-uniform distribution w, is

\frac{f(X)}{w(X)}   (5.9)

The expectations of the two estimators can be written as:

\mu_{uni} = E\left( f(X) \right)   (5.10)
\mu_{dif} = E\left( \frac{f(X)}{w(X)} \right)   (5.11)
\mu_{uni} = \mu_{dif}   (5.12)

The above equations show that the expectations of the RUS and DS estimators are the same. The use of DS is therefore only advantageous if we can formulate a distribution X \sim P_D that ensures var(f(X)/w(X)) < var(f(X)). This is possible by assigning larger weight w(x) to pixels that have more influence on f(X). How to set up the difference distribution and how to sample the data are discussed next.
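The variance argument can be demonstrated numerically. The Python sketch below (illustrative only; pixel residuals are a plain list) estimates the SSD once with uniform sampling and once with the importance-weighted estimator of (5.9); when the weights are exactly proportional to the squared residuals, the weighted estimate is exact with zero variance:

```python
import random

def ssd_uniform(residuals2, S, rng):
    """Uniform estimate of F = (1/2) * sum of squared residuals from S samples."""
    N = len(residuals2)
    picks = [residuals2[rng.randrange(N)] for _ in range(S)]
    return 0.5 * N * sum(picks) / S

def ssd_weighted(residuals2, weights, S, rng):
    """Importance-sampled estimate: draw pixel i with prob w_i, average f_i / w_i."""
    total = sum(weights)
    probs = [w / total for w in weights]
    cum, acc = [], 0.0
    for p in probs:                       # cumulative distribution for sampling
        acc += p
        cum.append(acc)
    est = 0.0
    for _ in range(S):
        r = rng.random()
        i = next(k for k, c in enumerate(cum) if r <= c)
        est += residuals2[i] / probs[i]   # unbiased: E[f(X)/w(X)] = sum f
    return 0.5 * est / S
```

With residuals concentrated in one pixel, every weighted draw lands on the informative pixel and returns the true SSD exactly, while the uniform estimator fluctuates around it.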
5.3
Sampling Strategy

w_i = \frac{\|I_i - T_i\|}{\sum_j \|I_j - T_j\|} + \epsilon_i   (5.13)

5.3.1
Deterministic Sampling

5.3.2
Stochastic Sampling
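Building the difference distribution of (5.13) in code is a normalisation of the per-pixel differences. This Python sketch adds an illustrative flattening exponent gamma as in Section 5.3.2 (the exact placement of gamma and the epsilon term are assumptions of this sketch):

```python
def difference_probs(I, T, gamma=0.7, eps=1e-6):
    """Sampling distribution from the difference image.

    Pixels with larger |I - T| get proportionally larger probability;
    gamma in (0.2, 0.8) flattens the distribution toward uniform
    (larger gamma concentrates the samples), and eps keeps every pixel
    reachable so perfectly aligned regions are not excluded forever.
    """
    raw = [abs(i - t) ** gamma + eps for i, t in zip(I, T)]
    s = sum(raw)
    return [r / s for r in raw]
```

Drawing Ω from this distribution (e.g. with a cumulative-sum search as in the previous sketch) yields the stochastic sampling strategy; freezing one draw and reusing it corresponds to the deterministic strategy.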
5.4

We examine the rate of convergence of the subsampling deterministic optimization algorithms against Robbins-Monro stochastic optimization with two sampling techniques (Difference Sampling with a deterministic strategy and with a stochastic strategy, and Random Uniform Sampling). For fairness of comparison, we take a subset of 2% of the total number of pixels for all stochastic methods. Registration is applied to MRI pictures of a knee with 512x512 pixels. Since the main part of the MRI picture is at the centre and the side parts have uniform intensity, Random-Uniform Deformation (RUD) of the image results in large misalignment at the centre of the image, while Random-Local Deformation (RLD) results in a small change in one part of the centre of the image (Figure 5.4).
The Random Uniform Sampling Stochastic method (RSS) is re-sampled at every iteration to avoid bias and to gather sufficient samples in every region of the picture. The Deterministic Sampling Stochastic method (DSS) samples only once, since it concentrates on the regions of difference and also takes neighbouring regions into account. The Stochastic Sampling Stochastic method (SSS) samples pixels based on the current difference between the two images; theoretically, therefore, it needs to be re-sampled at every iteration. However, stochastic re-sampling at every iteration is very costly, and since the change in the difference after one iteration is small, we can re-sample only every few iterations.
Figure 5.4: MRI Knee: We generate 100 random uniform deformation (RUD) images by randomly perturbing every control point, and 100 random local deformation (RLD) images by randomly perturbing one control point.
When the stochastic method no longer produces any better result, we apply deterministic optimization for further reduction.
Figure 5.5: Convergence of RUD by deterministic and stochastic methods. (a) Random-uniform-deform convergence by unweighted deterministic methods; the Weight methods are not applicable here because of the large deformation. Data for the deterministic optimizers is drawn from the last phase of recursive subsampling. SGN, SLM, SQN, SNCG are subsampling deterministic methods. DSS, RSS and SSS-5 are the Deterministic, Random and Stochastic (re-sampled every 5 iterations) Sampling Stochastic methods.

Figure 5.6: Convergence of RLD by deterministic and stochastic methods. Data for the deterministic optimizers is drawn from the last phase of recursive subsampling. Abbreviations as in Figure 5.5.

Figure 5.7: MRI Knee: Average performance of the stochastic approximation methods for the two types of deformation, RUD and RLD. The DSS/RSS/SSS suffix denotes the re-sampling frequency.

Figure 5.8: Comparison between combined Stoc-Det and stochastic methods. (a) Convergence rate of the combined and stochastic methods; F is SSD, Time is runtime. (b) Average performance on 100 tests of the combined and stochastic methods. DSS-SQN, RSS-SQN, SSS-SQN are combined methods; DSS, RSS, SSS-5 are stochastic methods.
Chapter 6
MATLAB Implementation: Vreg
6.1
Introduction
The code implementing all algorithms described in this paper was written by hand in MATLAB version 7 R14 SP3 (www.mathworks.com) with the Spline Toolbox. Some coding conventions are borrowed from (2), and the B-spline implementation follows a tutorial by R. Larsen (25). We outline some important evaluations in the registration framework 3.3.
6.2
Implementation
6.2.1
Cost Function F
6.2.2
Image Gradient ∇I
6.2.3
Transformation W(p) and Jacobian ∂W/∂p
We construct the transformation from the B-spline tensor model (2.2). The tensor B-spline is defined by two sets of control points (knots) with respect to the row and the column directions, placed at uniform spacing. In order to handle the displacement of the image boundaries, we need to add 3 extra knots beyond the boundary knots. For instance, a set of knots along one row can be constructed by:

k = augknt(0:space:rowlength,3)

Each 2D basis function is the tensor product of a row and a column B-spline basis function. Let the row functions be bx_i(x), i = 1...m, and the column functions be by_j(y), j = 1...n. Then the displacement W = (W_x, W_y) becomes:

W_x = \sum_{i=1}^{m} \sum_{j=1}^{n} bx_i(x) \, by_j(y) \, p^x_{ij}

W_y = \sum_{i=1}^{m} \sum_{j=1}^{n} bx_i(x) \, by_j(y) \, p^y_{ij}

We use the MATLAB spline construction function spmak to build the basis functions from the knot sequences. We can collect the row and column B-spline functions into two matrices:

Qx_i = bx_i(x),  Qy_j = by_j(y)

Using the Kronecker product Q = kron(speye(2),kron(Qx,Qy)) we obtain:

W = (I_2 ⊗ Qx ⊗ Qy) p = Qp

where I_2 is a sparse identity matrix of size 2 × 2. It is easy to see that ∂W/∂p = Q.

6.2.4
Other evaluations
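The Kronecker construction used above is easy to reproduce outside MATLAB. A small dense Python version for illustration (MATLAB's kron/speye build the same structure sparsely):

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as lists of lists."""
    ra, ca = len(A), len(A[0])
    rb, cb = len(B), len(B[0])
    # block (i_a, j_a) of the result is A[i_a][j_a] * B
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)]
            for i in range(ra * rb)]
```

Taking the Kronecker product with a 2 × 2 identity, as in Q = kron(speye(2), kron(Qx, Qy)), simply stacks two copies of the basis matrix block-diagonally: one block maps p^x to W_x and the other maps p^y to W_y.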
6.3
User Guide
Using Vreg is easy. The user runs MATLAB and sets the paths to the VReg package and all of its subfolders. Once this is done, call the register function with the desired parameters and let the machine work it out. The registration process for inputs of resolution less than 512 × 512 should not take more than one minute with an appropriate algorithm.
Given any two 2D images T and I where I is a deformed version of T. We
start the registration process by calling:
[F,p] =
register(T,I,p1,p2,[algo],[recur.phase],[init.warp],[max.iter],[show],[fig])
Only the first four parameters are always essential although user is encouraged
to indicate algo. The rest are not needed however you need to put empty notation
[] at the parameter that you do not want to include. List of parameters:
T,I are the filenames of the input images. I is the deformed image.
p1,p2 indicate a p1×p2 grid of control points, e.g. 7×7.
algo is the choice of algorithm:
GaussNewton, LevenbergMarquardt, QuasiNewton, NonlinearConjugateGradient or, alternatively, GN, LM, QN, NCG. Users who want to use the Weight methods have to add the arguments weight,[] at the end of the call, i.e. after [fig]. Default = 0.75.
RandomSamplingStochastic, DeterministicSamplingStochastic or, alternatively, RSS, DSS. Users of these methods can add the arguments [%],[],[A] after [fig] to indicate how many pixels should be sampled, e.g. 0.02 indicates 2% of the total pixels. Default values: %=0.02, =150, A=15.
StochasticSamplingStochastic[-resample] or SSS[-resample], where [-resample] indicates the frequency of resampling, e.g. SSS-5 means resampling after every 5 iterations. Users of this method can add the arguments [],[],[A] at the end. Note: the value lies in (0.2, 0.8), and a larger value means fewer samples (5.3.2). Default values: =0.7, =150, A=15.
recur.phase is the number of recursive phases we want. A single registration has 0 recursive phases; the Subsampling methods have 1 or more recursive phases. Note: do not chop down the image too much or it cannot be registered; recur.phase should be less than 4. Default values: 3 for Deterministic methods and always 0 for Stochastic methods.
init.warp is the initial estimate of the warp parameters; it should be the result of a previous registration with this application. Default value: zeros.
max.iter indicates the maximum number of iterations allowed. Default value: 100.
show, fig: show=1 displays the registration process in figure(fig); since this slows down the registration, it is suitable mainly for demos. Default value: show=0.
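To make the parameter list concrete, here are two example calls. They are hedged illustrations based on our reading of the signature above: the filenames are hypothetical, and the exact argument tokens expected by VReg may differ.

```matlab
% Quasi-Newton registration on a 7x7 control grid, defaults elsewhere.
% 'T.png' and 'I.png' are hypothetical filenames.
[F,p] = register('T.png','I.png',7,7,'QN');

% Gauss-Newton with 2 recursive phases, at most 200 iterations,
% displaying progress in figure(1).
[F,p] = register('T.png','I.png',7,7,'GN',2,[],200,1,1);
```

In the second call the empty notation [] keeps the default initial warp while later positions (max.iter, show, fig) are set explicitly, as described in the list above.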
Chapter 7
Conclusion
In this paper, we have discussed the performance of different types of Optimization methods through theoretical review and practical experiments. The choice of Optimization method depends mainly on the size of the input images and the type of deformation (random deformation or very localised deformation). We have shown that the newly proposed approaches, based on detecting the misaligned parts of the input images, can accelerate the registration and produce better results.

We classify the Optimization methods into a Deterministic approach and a Stochastic approach, and for each approach we construct a unified registration framework. The Deterministic approach is suitable for small and medium-sized images, while the Stochastic approach is more suitable for large images.
For the Deterministic approach, our extensive study shows that Quasi-Newton is the better choice compared to the Gauss-Newton, Levenberg-Marquardt and Nonlinear Conjugate Gradient methods. In addition, the Recursive Subsampling methods always outperform the methods without Subsampling. We also examined the effect of applying a Weight (based on the difference image) to the transformation matrix. The results show that the Weight Deterministic methods produce a better convergence rate than the UnWeight methods for very localised deformations of the input images.
For the Stochastic approach, we have demonstrated that, for localised deformation, the use of the Difference Sampling methods with either the Deterministic strategy or the Stochastic strategy produces a better convergence rate than Random Uniform Sampling. In addition, the Stochastic Sampling strategy performs slightly better than the Deterministic strategy in most of the localised deformation experiments.
Appendix A
Performance of SubSampling
methods on different images
Figure A.1 shows the images used for registration. Lena 256×256 indicates that the image Lena has a resolution of 256×256.
The box-plots below show the average performance over 100 tests.
Figure A.1: Different types of images used for the SubSampling methods test