
EE698B-Digital Video Processing

2006-2007/I
Term Project
HAWK-EYE TRACKING OF A CRICKET BALL

Ankit Misra
Y3053
ankitm@iitk.ac.in
&
Gaurav Teltia
Y3120
gteltia@iitk.ac.in

Report Last Compiled : November 15, 2006

ABSTRACT

Object tracking is a key computer vision topic that aims at detecting the position of a moving
object in a video sequence. Our project explores the problem of object tracking using two
entirely different approaches: a support vector based color matching algorithm and a family of
block matching algorithms. Reliable detection is stressed in the first approach, whereas
computational efficiency is the focus of the latter. Finally, a hybrid approach is
suggested for real-time cricket ball tracking. A Hawk-Eye demonstration has also been
implemented as an application.

Contents

1 Introduction
    1.1 Problem Statement
    1.2 Objective
    1.3 Applications

2 Design
    2.1 Color-Pixel Based
    2.2 Motion Block Matching Based

3 Problems and Inferences
    3.1 Block Size
    3.2 Search Area

4 Hybrid Approach

5 The Hawk Eye

6 Future Work

1 Introduction

1.1 Problem Statement

Develop an algorithm to track a cricket ball during a match and construct a hawk-eye view of the
global trajectory of the ball.

1.2 Objective

The aim is to track the cricket ball in a match as closely as possible. The problem poses
challenges beyond the usual ones associated with object tracking. The ball moves extremely fast
(compared with the much slower football or basketball), and its small size aggravates the
problem. Furthermore, we wish to propose a real-time algorithm, which makes the job even
tougher.

1.3 Applications

The generic “hybrid-tracking” algorithm proposed is of direct use to live TV coverage companies,
which use such algorithms to present a better analysis of the match. The hawk-eye animator
helps commentators convey their comments easily to the viewers. The proposed algorithm is
fully general; the hawk-eye animator is an add-on developed during this term project. The
tracking algorithm could find application in any surveillance and monitoring situation.

2 Design

In this section we investigate the usage of two different types of approach and then propose a hybrid
approach for optimal performance.

2.1 Color-Pixel Based

Over time, individual pixels in a video sequence undergo constant change. These changes can
stem from many possibilities including the presence of moving objects, fluctuations in lighting, and
intrinsic capture noise. Reliable background segmentation is the first step in the processing and
it must account for these changes in a graceful manner. Stauffer and Grimson in [1] push the
notion that different types of pixel fluctuations can be modeled by a weighted mixture of Gaussian
distributions. The distribution mixture for an individual pixel over time is:

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \, N(X_t, \mu_{i,t}, \sigma_{i,t})$$

where $X_t$ is the vector of YUV values of a pixel at time $t$ and $K$ is the number of Gaussian
components. The weights $\omega_{i,t}$ give the contribution of each Gaussian component to the
estimated distribution of the pixel. Multiple Gaussians are necessary to account for changing
lighting conditions and other unpredictable interactions between foreground and background. It is
desirable to use as many Gaussian classes as possible to increase the robustness of the system,
but the computation becomes quite expensive if that number is too high. Empirically, five classes
are sufficient for an accurate representation of the background layer of video sequences.

The values of the means, variances, and weights are computed and updated on a frame-by-frame
basis. The number of frames necessary for convergence to a suitable background model is quite low.
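
The frame-by-frame update can be sketched for a single pixel as follows. This is a minimal illustration in the spirit of [1], not the code used in this project: the class name `PixelMixture`, the learning rate `alpha`, the 2.5-sigma match test on the Euclidean distance, the spherical variance, and the `bg_fraction` threshold are all assumptions chosen for the example.

```python
import numpy as np

class PixelMixture:
    """Mixture of K spherical Gaussians describing one pixel over time (illustrative)."""

    def __init__(self, k=5, alpha=0.05, init_var=900.0, min_var=16.0,
                 match_sigmas=2.5, bg_fraction=0.7):
        self.w = np.full(k, 1.0 / k)      # weights omega_i
        self.mu = np.zeros((k, 3))        # means mu_i (e.g. YUV)
        self.var = np.full(k, init_var)   # variances sigma_i^2
        self.alpha, self.init_var, self.min_var = alpha, init_var, min_var
        self.match_sigmas, self.bg_fraction = match_sigmas, bg_fraction

    def _matches(self, x):
        # A component matches when x lies within match_sigmas standard deviations.
        d2 = np.sum((x - self.mu) ** 2, axis=1)
        return d2 < (self.match_sigmas ** 2) * self.var

    def update(self, x):
        """Fold one new observation of the pixel into the mixture."""
        x = np.asarray(x, dtype=np.float64)
        hits = np.flatnonzero(self._matches(x))
        fitness = self.w / np.sqrt(self.var)
        if hits.size:
            m = hits[np.argmax(fitness[hits])]        # best matching component
            self.w *= 1.0 - self.alpha
            self.w[m] += self.alpha
            diff = x - self.mu[m]
            self.mu[m] += self.alpha * diff
            self.var[m] += self.alpha * (diff @ diff - self.var[m])
            self.var[m] = max(self.var[m], self.min_var)
        else:
            # No match: replace the least probable component with a wide new one.
            m = int(np.argmin(fitness))
            self.mu[m], self.var[m], self.w[m] = x, self.init_var, self.alpha
        self.w /= self.w.sum()

    def is_background(self, x):
        """True if x matches one of the dominant (background) components."""
        x = np.asarray(x, dtype=np.float64)
        order = np.argsort(-self.w / np.sqrt(self.var))
        cum = np.cumsum(self.w[order])
        n_bg = min(int(np.searchsorted(cum, self.bg_fraction)) + 1, len(order))
        return bool(self._matches(x)[order[:n_bg]].any())
```

A pixel that keeps showing the same value quickly concentrates its weight in one narrow component, so a later observation far from that component is flagged as foreground.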

Once the background is identified for a given frame in the video sequence, it becomes quite easy to
separate the objects of interest. However, the areas in the foreground pertaining to moving objects
tend to be somewhat noisy due to the effects of interaction between the moving objects and the
background layer. Examples of these interactions are shadowing and moving of background objects.
The noise usually presents itself in the form of trails that follow the moving object or sparse fields
of noise that linger where a moving object was located in a previous frame. If the artifact is
too severe, it can be suppressed with a simple median filter.
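
A median filter on the binary foreground mask can be sketched as below. The function name `median_filter_mask` and the 3x3 window are assumptions for illustration; the report does not state the window size that was used.

```python
import numpy as np

def median_filter_mask(mask, size=3):
    """Apply a size x size median filter to a boolean foreground mask."""
    pad = size // 2
    # Replicate the border so the output has the same shape as the input.
    padded = np.pad(np.asarray(mask, dtype=np.uint8), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    # The median of 0/1 values is 1 only where most of the window is foreground.
    return np.median(windows, axis=(-2, -1)) > 0.5
```

Isolated foreground pixels disappear because they are outvoted by their neighbours, while the interior of a solid region is preserved.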

We have now isolated the foreground, which is likely to include the bat, the batsmen, etc. in
addition to the ball (our object of interest). We wish to isolate the ball from the rest of the
foreground objects. We accomplished this by using support vectors to generate the decision
function: support vector theory is used to decide whether the color of the pixel in question
matches that of the ball. The detailed theory is skipped here due to shortage of space. In a
nutshell, we apply a non-linear transformation to the RGB space so that a simple linear
decision function can be used.
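
As a rough illustration of such a decision function, the sketch below evaluates a kernel support vector classifier on one RGB pixel. Everything in it is an assumption made for the example: the RBF kernel, the three support vectors, their coefficients, the bias and `GAMMA` are hypothetical values, not the trained model from this project.

```python
import numpy as np

# Hypothetical support vectors in normalised RGB with signed dual coefficients
# (alpha_i * y_i): one "ball red" example and two background examples.
SUPPORT_VECTORS = np.array([[200, 30, 30], [90, 140, 70], [230, 230, 230]]) / 255.0
DUAL_COEFS = np.array([1.0, -1.0, -1.0])
BIAS = 0.0
GAMMA = 10.0  # assumed RBF kernel width

def decision_value(rgb):
    """Linear decision function evaluated in the kernel-induced feature space."""
    x = np.asarray(rgb, dtype=np.float64) / 255.0
    kernel = np.exp(-GAMMA * np.sum((SUPPORT_VECTORS - x) ** 2, axis=1))
    return float(DUAL_COEFS @ kernel + BIAS)

def matches_ball_color(rgb):
    """A pixel 'matches' the ball when it falls on the positive side."""
    return decision_value(rgb) > 0.0
```

The kernel plays the role of the non-linear transformation: the decision is linear in the kernel values even though it is non-linear in RGB.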

2.2 Motion Block Matching Based

We explored the capabilities of block matching algorithms when applied to object tracking.
These algorithms estimate motion on a block-by-block basis: for each block in the current frame,
a block from the previous frame is found that best matches it according to a criterion such as
the Mean Absolute Difference (MAD) or AMAD.
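
For reference, the Mean Absolute Difference between two equally sized blocks can be computed as in this short sketch (the function name `mad` is chosen here for illustration):

```python
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    # Cast to float first so that uint8 images do not wrap around on subtraction.
    a = np.asarray(block_a, dtype=np.float64)
    b = np.asarray(block_b, dtype=np.float64)
    return float(np.mean(np.abs(a - b)))
```

A value of zero means the blocks are identical; larger values indicate a poorer match.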

We implemented and compared different algorithms based on some test streams. Our evaluation
was based on two measures, CPU time for computational complexity and AMAD for quality. We
tested the performance of the following algorithms:

1. Exhaustive Search Algorithm: One of the first algorithms used for block based motion
compensation is the Full Search, or Exhaustive Search. Each block within a given search
window is compared to the current block and the best match is obtained (based on one of the
comparison criteria). Although this algorithm is the best in terms of the quality of the
predicted image and the simplicity of the algorithm, it is computationally very intensive.
With the realization that motion compensation is the most computationally intensive
operation in the coding and transmission of video streams, people started looking for more
efficient algorithms. However, there is a trade-off between the efficiency of the algorithm
and the quality of the predicted image. Keeping this trade-off in mind, many algorithms
have been developed. They are called sub-optimal because, although they are
computationally more efficient than the Full Search, they do not match its quality.
2. Three Step Search Algorithm: This algorithm was introduced by Koga et al. in 1981. It
became very popular because of its simplicity and its robust, near-optimal performance. It
searches for the best motion vectors in a coarse-to-fine search pattern. The algorithm may be
described as:

Step 1: An initial step size is picked. Eight blocks at a distance of the step size from the centre
(around the centre block) are picked for comparison.

Step 2: The step size is halved. The centre is moved to the point with the minimum distortion.

Steps 1 and 2 are repeated until the step size becomes smaller than 1.
One problem with the Three Step Search is that it uses a uniformly allocated checking point
pattern in the first step, which is inefficient for estimating small motion.
3. Four Step Search: The algorithm starts with a nine-point comparison; the remaining points
for comparison are then selected as follows:

Step 1: Start with a step size of 2. Pick nine points around the search window centre. Calculate
the distortion and find the point with the smallest distortion. If this point is the centre of
the search area, go to Step 4; otherwise go to Step 2.

Step 2: Move the centre to the point with the smallest distortion. The step size is maintained
at 2. The search pattern, however, depends on the position of the previous minimum distortion
point.

a) If the previous minimum point is located at a corner of the previous search area, five
additional checking points are picked.

b) If the previous minimum distortion point is located at the middle of the horizontal or
vertical axis of the previous search window, three additional checking points are picked.

Locate the point with the minimum distortion. If this is at the centre, go to Step 4; otherwise
go to Step 3.

Step 3: The search pattern strategy is the same as in Step 2, but the search then always
proceeds to Step 4.

Step 4: The step size is reduced to 1 and all nine points around the centre of the search are
examined.
The computational complexity of the four step search is less than that of the three step search,
while the performance in terms of quality is as good. It is also more robust than the three
step search and it maintains its performance for image sequences with complex movements like
camera zooming and fast motion. Hence it is a very attractive strategy for motion estimation.
4. Diamond Search: The DS algorithm [2] employs two search patterns, both derived from the
cross (+) shape. The first pattern, called the large diamond search pattern (LDSP), comprises
nine checking points, of which eight surround the center one to compose a diamond
shape. The second pattern, consisting of five checking points, forms a smaller diamond shape
and is called the small diamond search pattern (SDSP). In the searching procedure of the DS
algorithm, the LDSP is used repeatedly until the minimum block distortion (MBD) occurs
at the center point. The search pattern is then switched from LDSP to SDSP for
the final search stage. Among the five checking points in the SDSP, the position yielding the MBD
provides the motion vector of the best matching block. The DS algorithm is summarized as
follows.

Step 1: The initial LDSP is centered at the origin of the search window, and the 9 checking
points of LDSP are tested. If the MBD point calculated is located at the center position, go
to Step 3; otherwise, go to Step 2.

Step 2: The MBD point found in the previous search step is re-positioned as the center point
to form a new LDSP. If the new MBD point obtained is located at the center position, go to
Step 3; otherwise, recursively repeat this step.

Step 3: Switch the search pattern from LDSP to SDSP. The MBD point found in this step is
the final solution of the motion vector which points to the best matching block.

(a) Three Step Search

(b) Four Step Search

Figure 1: Block Matching Algorithms Tested
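
The four search strategies described in this section can be sketched side by side as follows. This is an illustrative implementation under stated assumptions, not the code evaluated in this project: the `Matcher` class, the MAD cost, the search radius of 7 and all function names are chosen for the example. For brevity the Four Step Search re-evaluates all nine points at every stage rather than only the three or five new ones, which selects the same point at a higher cost.

```python
import numpy as np

LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def square(step):
    """Nine-point square pattern: the centre first, then its eight neighbours."""
    return [(0, 0)] + [(dy, dx) for dy in (-step, 0, step)
                       for dx in (-step, 0, step) if (dy, dx) != (0, 0)]

class Matcher:
    """Matches one block of the current frame against the previous frame."""

    def __init__(self, prev, curr, top, left, size, radius=7):
        self.prev = np.asarray(prev, dtype=np.float64)
        self.target = np.asarray(curr, dtype=np.float64)[top:top + size, left:left + size]
        self.top, self.left, self.size, self.radius = top, left, size, radius

    def cost(self, dy, dx):
        """MAD at displacement (dy, dx); infinite outside the window or the frame."""
        y, x = self.top + dy, self.left + dx
        h, w = self.prev.shape
        outside = y < 0 or x < 0 or y + self.size > h or x + self.size > w
        if outside or max(abs(dy), abs(dx)) > self.radius:
            return float("inf")
        candidate = self.prev[y:y + self.size, x:x + self.size]
        return float(np.mean(np.abs(self.target - candidate)))

    def best(self, centre, pattern):
        """Lowest-cost point of `pattern` placed at `centre`; ties keep the earliest."""
        points = [(centre[0] + dy, centre[1] + dx) for dy, dx in pattern]
        return min(points, key=lambda p: self.cost(*p))

def full_search(m):
    r = m.radius
    return m.best((0, 0), [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)])

def three_step_search(m, step=4):
    centre = (0, 0)
    while step >= 1:                      # coarse to fine: 4, 2, 1
        centre = m.best(centre, square(step))
        step //= 2
    return centre

def four_step_search(m):
    centre = (0, 0)
    for _ in range(3):                    # Steps 1 to 3, step size 2
        nxt = m.best(centre, square(2))
        if nxt == centre:                 # minimum at the centre: go to Step 4
            break
        centre = nxt
    return m.best(centre, square(1))      # Step 4, step size 1

def diamond_search(m):
    centre = (0, 0)
    while True:                           # repeat LDSP until the centre is the MBD point
        nxt = m.best(centre, LDSP)
        if nxt == centre:
            break
        centre = nxt
    return m.best(centre, SDSP)           # final refinement with SDSP
```

Each function returns the displacement `(dy, dx)` from the block position in the current frame to its best match in the previous frame. The fast searches assume that the matching error decreases smoothly towards the true displacement; on heavily textured or noisy content they can settle on a different point than the full search.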

3 Problems and Inferences

3.1 Block Size

The selection of block size is essential to the performance of any block matching algorithm. In our
case, too, selecting the block size was not an easy task. Initially we picked a block size equal
to the size of the ball. But we observed that even a small error in the estimate of the center of
the ball made the block drift away from the ball onto the pitch: because the ball moves over the
pitch for long periods, the block slowly started tracking the pitch instead.
A smaller block size, on the other hand, gives jitter, as the ball changes in color and texture over
the course of the video and the block can match different areas of the ball. At one point
we even saw a case of internal drift, in which the block slowly drifted onto the seam and then
onto a similar-looking patch of the pitch. We finally decided on a smaller block size and solved the
problem of internal drift by checking the number of red pixels in each block: if no red
pixels were found, we switched to the color based technique for the next frame.

3.2 Search Area

The choice of search area also presented an interesting case, as the speed of the ball differs not
only between bowlers but also within a video (for example, the ball moves faster after it hits
the bat). We could not choose a large search area, as that would be computationally expensive,
while a smaller search area meant that we could lose the ball. We went for the latter, since even
if we lost the ball we could fall back on the color based approach to look for it in the next
frame. To detect a ball loss we again set a minimum threshold on the number of red pixels in a block.
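
The red-pixel check used to detect drift and ball loss can be sketched as below. The thresholds `min_red`, `margin` and `min_count` are assumed values for illustration; the report does not give the ones actually used.

```python
import numpy as np

def count_red_pixels(block_rgb, min_red=150, margin=60):
    """Count pixels whose red channel is strong and clearly dominates green and blue."""
    block = np.asarray(block_rgb, dtype=np.int32)   # avoid uint8 wrap-around
    r, g, b = block[..., 0], block[..., 1], block[..., 2]
    red = (r >= min_red) & (r - g >= margin) & (r - b >= margin)
    return int(np.count_nonzero(red))

def ball_in_block(block_rgb, min_count=1):
    """True while the tracked block still contains enough ball-colored pixels."""
    return count_red_pixels(block_rgb) >= min_count
```

When `ball_in_block` returns false for the tracked block, the tracker would hand control back to the color based detector.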

4 Hybrid Approach

After facing a different set of problems with each approach, we adopted a hybrid approach based
on both techniques, summarized as follows:

Step 1: We start by looking for the ball with the color based technique, as block matching
algorithms cannot be self-starting because of their generic nature. We find the co-ordinates
of the center of the ball as the center of mass of the best match.

Step 2: Next we take the co-ordinates from Step 1 and switch to a much faster and computationally
less expensive block matching algorithm, which tracks the ball on subsequent frames.

Step 3: As discussed in the previous section, a block matching algorithm can lose the
ball and lock onto other objects such as gloves, the pitch, portions of the bat, etc. It is
therefore necessary to keep checking the output of the block matching algorithm. For this we
count the red pixels in each block at each step; if the count is below a threshold, we
switch to the color based technique to find the ball again.

Step 4: After the color based technique gives us the ball coordinates again, we switch back to
block matching. In this fashion we switch control between the two techniques: we
employ the faster block matching algorithms for most frames but switch to the more reliable color
based technique whenever required, thus making a reliable real-time ball tracking scheme possible.
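
The control flow of Steps 1 to 4 can be sketched as a small loop. The three callables `detect_by_color`, `track_by_block` and `ball_visible` are hypothetical placeholders for the color based detector, a block matching tracker and the red-pixel check; recording `None` for a frame in which the ball was lost is likewise an assumption of this sketch.

```python
def hybrid_track(frames, detect_by_color, track_by_block, ball_visible):
    """Return one ball position per frame, or None where the ball was lost."""
    positions = []
    pos = None      # last known ball position, None while the ball is lost
    prev = None     # previous frame, needed by the block matcher
    for frame in frames:
        if pos is None:
            # Steps 1 and 4: (re)acquire the ball with the color based technique.
            pos = detect_by_color(frame)
        else:
            # Step 2: follow the ball with the cheaper block matching search.
            pos = track_by_block(prev, frame, pos)
            # Step 3: verify the match; on failure fall back on the next frame.
            if not ball_visible(frame, pos):
                pos = None
        positions.append(pos)
        prev = frame
    return positions
```

Passing the three stages in as functions keeps the switching logic independent of which block matching algorithm or color model is plugged in.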

5 The Hawk Eye

After tracking the ball and compensating for camera motion, a 3D hawk-eye view of the trajectory
was constructed. Neighboring outputs were averaged to smooth the final trajectory, and missing
values were interpolated from the other values. The output can be seen in Fig. 3.
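
The interpolation and averaging steps can be sketched for a 2D image-plane track as follows. Linear interpolation and a three-point moving average are assumptions made for this example; the report does not specify which interpolation or window was used, and the 3D reconstruction itself is not covered here.

```python
import numpy as np

def fill_and_smooth(track, window=3):
    """Interpolate missing (None) positions, then apply a moving average.

    `track` is a list of (y, x) tuples or None; `window` must be odd.
    """
    pts = np.array([(np.nan, np.nan) if p is None else p for p in track], dtype=float)
    idx = np.arange(len(pts))
    for c in range(pts.shape[1]):
        known = ~np.isnan(pts[:, c])
        # Linear interpolation between the neighbouring known samples.
        pts[:, c] = np.interp(idx, idx[known], pts[known, c])
    half = window // 2
    padded = np.pad(pts, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Average each coordinate over its neighbours to smooth the trajectory.
    return np.column_stack([np.convolve(padded[:, c], kernel, mode="valid")
                            for c in range(pts.shape[1])])
```

For example, a track such as `[(0, 0), (1, 2), None, (3, 6), (4, 8)]` has its missing third sample filled in on the line through its neighbours before the averaging is applied.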

(a) Detection From Support Vectors and Foreground When Block Matching Fails

(b) Final Output

Figure 2: Hybrid Approach

Figure 3: Hawk-Eye Animator Output from different views

6 Future Work

In future, one may wish to generate the hawk-eye animation from two orthographic views, which
would give an even better trajectory.

References
[1] Chris Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking,
CVPR99, Fort Collins, CO, (June 1999).

[2] Shan Zhu and Kai-Kuang Ma. A new diamond search algorithm for fast block-matching motion
estimation, IEEE Trans. on Image Processing, Volume 9, Number 2, February 2000, Pages 287-290.
