
Object Tracking in Video Pictures based on Image Segmentation and Pattern Matching

Takashi Morimoto, Osamu Kiriyama, Youmei Harada, Hidekazu Adachi, Tetsushi Koide and Hans Jürgen Mattausch
Research Center for Nanodevices and Systems, Hiroshima University, 1-4-2 Kagamiyama, Higashi-Hiroshima 739-8527, Japan
Phone: +81-82-424-6265, Fax: +81-82-424-3499, Email: morimoto@sxsys.hiroshima-u.ac.jp

Abstract: We propose a novel algorithm for object tracking in video pictures, based on image segmentation and pattern matching. With the image segmentation, we can detect all objects in images, no matter whether they are moving or not. Using the image segmentation results of successive frames, we exploit pattern matching in a simple feature space for tracking of the objects. Consequently, the proposed algorithm can be applied to multiple moving and still objects, even in the case of a moving camera. We describe the algorithm in detail and perform simulation experiments on object tracking which verify the tracking algorithm's efficiency. VLSI implementation of the proposed algorithm is possible.

I. INTRODUCTION

Moving-object tracking in video pictures has attracted a great deal of interest in computer vision [1]. For object recognition, navigation systems, and surveillance systems, object tracking is an indispensable first step. The conventional approach to object tracking is based on the difference between the current image and a background image [2], [3], [4], [5]. However, algorithms based on the difference image cannot simultaneously detect still objects. Furthermore, they cannot be applied to the case of a moving camera. Algorithms including the camera-motion information have been proposed previously, but they still contain problems in separating this information from the background. In this paper, we propose a novel algorithm for object tracking in video pictures. Our algorithm is based on image segmentation and pattern matching. With the image segmentation algorithm, we can extract all objects in images. The proposed tracking algorithm uses pattern matching between successive frames. As a consequence, the algorithm can simultaneously track multiple moving and still objects in video pictures and can even be applied in the case of a moving camera.

This paper is organized as follows. The proposed algorithm, consisting of three stages for image segmentation, feature extraction, as well as object tracking and motion determination, is described in detail in Sec. II. In Sec. III, we present simulated results of a multiple-object-tracking example. Section IV is devoted to the conclusions.

II. PROPOSED CONCEPT FOR MOVING OBJECTS TRACKING

A. Image Segmentation

The employed image segmentation algorithm [6] allows digital VLSI implementation and has already been verified by designed and fabricated image segmentation test chips [7]. The image segmentation algorithm can be classified as a region-growing approach. Inclusion of each pixel i in a given segment is decided by examination of the pixel's connection-weights W_ij with neighboring pixels j. For gray-scale images, the connection-weights W_ij are functions of the luminance differences |I_i − I_j| and are determined as

W_ij = I_max / (1 + |I_i − I_j|),   (1)

with I_max being the maximum value of the luminance. For color images, the connection-weights are given by the minimum of the three components in the RGB data,

W_ij = min[W_ij(R), W_ij(G), W_ij(B)].   (2)
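To make Eqs. (1) and (2) concrete, the connection-weight computation could be sketched as follows (a minimal Python sketch under our own assumptions; the 4-neighborhood and the value I_max = 255 are illustrative choices, not prescribed by the algorithm description):

```python
import numpy as np

def connection_weight(a, b, i_max=255):
    # Eq. (1): weight from the luminance difference of two neighboring pixels
    return i_max / (1.0 + abs(int(a) - int(b)))

def connection_weight_rgb(p, q, i_max=255):
    # Eq. (2): minimum of the per-channel weights for two RGB pixels p, q
    return min(connection_weight(p[c], q[c], i_max) for c in range(3))

def neighbor_weights(img, y, x):
    # Weights between pixel (y, x) and its 4-neighbors in a gray-scale image
    h, w = img.shape
    weights = {}
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # assumed 4-neighborhood
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            weights[(ny, nx)] = connection_weight(img[y, x], img[ny, nx])
    return weights
```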

The algorithm is given in Fig. 1 and proceeds as follows. First, in the initialization phase, the connection-weights are calculated. Then leader cells (self-excitable cells), which are the seed pixels of the region-growing process, are defined by the condition Σ_{j∈N(i)} W_ij > θ_p, where θ_p is a threshold and N(i) is the neighborhood of pixel i. Second, one of the leader cells is self-excited and a new region is grown from this leader cell. During each step of the growing process a new cell is excited, provided that the sum of its connection-weights with neighboring excited cells is larger than a threshold θ_z. The above growing operation is repeated as long as excitable cells exist. Finally, if there is no excitable cell left, the process finishes and the excited cells, considered to be a segment, are labeled and inhibited. The growing of new segments from leader cells is continued as long as remaining leader cells exist. Figure 2 shows the flowchart of the image segmentation algorithm.

B. Feature Extraction for Segments

In this subsection, we describe the extracted features of the segmented objects. Figure 3 shows an example of a segment for explanation purposes.
1) Area: By counting the number of pixels included in segment i of the t-th frame, we calculate the area of the object, a_i(t).
2) Width and Height: We extract the positions of the pixels Pxmax (Pxmin) which have the maximum (minimum) x-component:

Pxmax = (Xmax,x, Xmax,y),   (3a)
Pxmin = (Xmin,x, Xmin,y),   (3b)

where Xmax,x, Xmax,y, Xmin,x, and Xmin,y are the x- and y-coordinates of the rightmost and leftmost boundary pixels of segment i, respectively. In addition, we also extract

Pymax = (Ymax,x, Ymax,y),   (4a)
Pymin = (Ymin,x, Ymin,y).   (4b)

Then we calculate the width w and the height h of the objects as follows:

w_i(t) = Xmax,x − Xmin,x,   (5a)
h_i(t) = Ymax,y − Ymin,y.   (5b)

3) Position: We define the position of each object in the frame as follows:

x_i(t) = (Xmax,x + Xmin,x) / 2,   (6a)
y_i(t) = (Ymax,y + Ymin,y) / 2.   (6b)
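For illustration, the geometric features of Eqs. (3)-(6) could be computed from a binary mask of segment i roughly as follows (a sketch under our own assumptions about the data layout; pixel coordinates are taken as (x, y) with x running to the right and y downwards):

```python
import numpy as np

def geometric_features(mask):
    """mask: 2D boolean array, True for pixels belonging to segment i."""
    ys, xs = np.nonzero(mask)            # coordinates of the segment pixels
    a = xs.size                          # area a_i(t), Sec. II-B.1
    x_min, x_max = xs.min(), xs.max()    # leftmost / rightmost boundary
    y_min, y_max = ys.min(), ys.max()    # topmost / bottommost boundary
    w = x_max - x_min                    # width  w_i(t), Eq. (5a)
    h = y_max - y_min                    # height h_i(t), Eq. (5b)
    x = (x_max + x_min) / 2              # position x_i(t), Eq. (6a)
    y = (y_max + y_min) / 2              # position y_i(t), Eq. (6b)
    return a, w, h, x, y
```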



[Image Segmentation Algorithm]
1) Initialization
   a) Set all pixels (cells) i to non-excitation: X_i(0) = 0; Z_i(0) = 0; l_i = 0;
   b) Calculation of the connection-weights:
      (a) for gray-scale images: W_ik = I_max / (1 + |I_i − I_k|), k ∈ N(i);
      (b) for color images:
          W(R)_ik = I(R)_max / (1 + |I(R)_i − I(R)_k|),
          W(G)_ik = I(G)_max / (1 + |I(G)_i − I(G)_k|),
          W(B)_ik = I(B)_max / (1 + |I(B)_i − I(B)_k|),
          W_ik = min{W(R)_ik, W(G)_ik, W(B)_ik};
   c) Determination of leader pixels (cells): if (Σ_{j∈N(i)} W_ij > θ_p) then p_i = 1; otherwise p_i = 0;
   d) Initialization of the global inhibitor: Z(0) = 0;
2) Self-excitation (new segment's leader-pixel excitation)
   if (leader pixels == ∅) then stop; // terminate
   else if (find_leader() == i ∧ p_i == 1) then
      X_i(t+1) = 1; Z(t+1) = 1; // self-excitation
      go to (3. Excitation);
   else go to (2. Self-excitation);
3) Excitation (segment growing)
   Setting of the global inhibitor: Z(t) = ∨_i Z_i(t); // logical OR of Z_i(t)
   if (Z(t) == 0) then
      if (X_i(t) == 1) then X_i(t+1) = 0; Z_i(t+1) = 0; p_i = 0; l_i = 1; // inhibition (labeled)
      go to (2. Self-excitation);
   else if (X_i(t) == 0 ∧ Z_i(t) == 0) then
      S_i(t) = Σ_{k∈N(i)} W_ik X_k(t);
      if (S_i(t) > θ_z) then X_i(t+1) = 1; Z_i(t+1) = 1; // excitation
      else X_i(t+1) = 0; Z_i(t+1) = 0; // non-excitation
   else if (X_i(t) == 1 ∧ Z_i(t) == 1) then X_i(t+1) = 1; Z_i(t+1) = 0;
   go to (3. Excitation);

Fig. 1. Detailed description of the used image segmentation algorithm. W_ik, I_i and t are the connection weight between pixels i and k, the luminance of pixel i, and the time-step variable, respectively.

[Flowchart of Fig. 2: START → (a) initialization (calculation of the connection-weights W_ik, determination of the leader pixels p_i = 1) → (b) self-excitable pixel (leader pixel with p_i = 1) present? If NO, END; if YES → (c) self-excitation (image segmentation of one region starts) → (d) excitable pixel detected? If YES → (e) excitation of dependent pixels, back to (d); if NO → (f) inhibition and region labeling (image segmentation of one region ends), back to (b).]

Fig. 2. Flowchart of the used image segmentation algorithm. (a) initialization, (b) detection of self-excitable pixel, (c) self-excitation, (d) detection of excitable dependent pixels, (e) excitation, (f) inhibition of all excited pixels, and labeling.

[Figure 3 shows an example segment i with the boundary pixels Pxmin = (Xmin,x, Xmin,y), Pxmax = (Xmax,x, Xmax,y), Pymin = (Ymin,x, Ymin,y), Pymax = (Ymax,x, Ymax,y), and the width w_i and height h_i.]

Fig. 3. Explanation of the proposed feature extraction from the image segmentation result.
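To make the control flow of Figs. 1 and 2 more tangible, a simplified serial software sketch of the region-growing procedure is given below. It is our own re-interpretation of the cell-parallel algorithm, not the VLSI implementation; the 4-neighborhood and the way the thresholds theta_p and theta_z are passed in are assumptions for illustration:

```python
import numpy as np

def segment_image(img, theta_p, theta_z, i_max=255):
    """Label a gray-scale image by region growing from leader pixels."""
    h, w = img.shape
    labels = -np.ones((h, w), dtype=int)               # -1: not yet labeled

    def weight(p, q):                                   # Eq. (1)
        return i_max / (1.0 + abs(int(img[p]) - int(img[q])))

    def neighbors(p):                                   # assumed 4-neighborhood
        y, x = p
        return [(y + dy, x + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= y + dy < h and 0 <= x + dx < w]

    # Leader pixels: sum of connection-weights to the neighbors exceeds theta_p
    leaders = [(y, x) for y in range(h) for x in range(w)
               if sum(weight((y, x), q) for q in neighbors((y, x))) > theta_p]

    label = 0
    for seed in leaders:
        if labels[seed] != -1:                          # already part of a segment
            continue
        excited = {seed}                                # self-excitation of the leader
        frontier = [seed]
        while frontier:                                 # segment growing (excitation)
            grown = []
            for p in frontier:
                for q in neighbors(p):
                    if labels[q] == -1 and q not in excited:
                        s = sum(weight(q, r) for r in neighbors(q) if r in excited)
                        if s > theta_z:
                            excited.add(q)
                            grown.append(q)
            frontier = grown
        for p in excited:                               # inhibition and labeling
            labels[p] = label
        label += 1
    return labels
```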

4) Color: Using the image data at Pxmax, Pxmin, Pymax and Pymin, we define the color feature of each object as in Eq. (7) for the R (red) component,

R_i(t) = [R(Pxmax) + R(Pxmin) + R(Pymax) + R(Pymin)] / 4,   (7)

as well as by equivalent equations for the G and B components.

C. Object Tracking and Motion Determination

The proposed algorithm for object tracking exploits pattern matching with the features above and makes use of a minimum-distance search in the feature space. We now go into more detail on our algorithm. Using the image segmentation result for object i in the t-th frame, we first extract the features of the object (t, i). Here, the notation (t, i) stands for object i in the t-th frame. Then we perform the minimum-distance search in the feature space between (t, i) and (t−1, j) for all objects j in the preceding frame. Finally, the object (t, i) is identified with the object in the preceding frame which has the minimum distance from (t, i). Repeating this matching procedure for all segments in the current frame, we can identify all objects one by one and keep track of the objects between frames. A few comments on further refinements of the proposed algorithm are in order.

(1) In the calculation of the distance between (t, i) and (t−1, j) in position space, it is more appropriate to take account of the motion determination and to use the estimated positions x'_j(t) and y'_j(t),

x'_j(t) = x_j(t−1) + m_x,j(t−1),   (8a)
y'_j(t) = y_j(t−1) + m_y,j(t−1),   (8b)
m_x,j(t−1) = x_j(t−1) − x_j(t−2),   (8c)
m_y,j(t−1) = y_j(t−1) − y_j(t−2),   (8d)

instead of the raw positions x_j(t−1) and y_j(t−1) (see Fig. 4). The quantities m_x,j(t−1) and m_y,j(t−1) correspond to the motion vector of object j in the x- and y-directions. These replacements are available, and used, from the third frame onwards.
(2) We have not yet specified the distance measure used for matching. In the simulation experiments we could confirm that, besides the Euclidean distance D_E, the simpler Manhattan distance D_M is already sufficient for object tracking purposes. These two distances between vectors (x_1, ..., x_n) and (y_1, ..., y_n) are defined as

D_E = √((x_1 − y_1)^2 + ... + (x_n − y_n)^2)   and   D_M = |x_1 − y_1| + ... + |x_n − y_n|.

(3) In order to treat all object features with equal weights, it is necessary to normalize the features. One possible way is dividing them by their maximum values. Dividing by 2^n, where the integer n is determined for each feature so that approximately equal weights result, is another possibility. The second possibility has the advantage that the division can be realized by a shifting operation in a hardware realization. Figure 5 shows a detailed description of the proposed tracking algorithm.
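A minimal sketch of the matching step described above is given below, combining the Manhattan distance, the motion-compensated position estimate of Eq. (8), and the removal of matched reference objects as in Fig. 5. The choice of features entering the distance and the greedy one-by-one assignment follow our reading of the text and are assumptions for illustration:

```python
def manhattan(u, v):
    # D_M(u, v) = |u1 - v1| + ... + |un - vn|
    return sum(abs(a - b) for a, b in zip(u, v))

def predicted_position(prev, prev2):
    # Eq. (8): linear motion estimate from the two preceding frames
    mx = prev['x'] - prev2['x']
    my = prev['y'] - prev2['y']
    return prev['x'] + mx, prev['y'] + my

def match_objects(current, previous, previous2=None):
    """Assign every segment of the current frame to a segment of the previous frame.

    current, previous: {object_id: {'a','w','h','x','y','R','G','B'}} with
    already normalized feature values; previous2 holds the features from two
    frames back and enables the position estimate (third frame onwards)."""
    unmatched = dict(previous)
    assignment = {}
    for i, cur in current.items():
        best_j, best_d = None, None
        for j, ref in unmatched.items():
            x_ref, y_ref = ref['x'], ref['y']
            if previous2 is not None and j in previous2:
                x_ref, y_ref = predicted_position(ref, previous2[j])
            d = manhattan(
                (cur['a'], cur['w'], cur['h'], cur['x'], cur['y'],
                 cur['R'], cur['G'], cur['B']),
                (ref['a'], ref['w'], ref['h'], x_ref, y_ref,
                 ref['R'], ref['G'], ref['B']))
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        assignment[i] = best_j                 # identify (t, i) with (t-1, best_j)
        if best_j is not None:
            unmatched.pop(best_j)              # remove from the reference data
    return assignment
```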


[Figure 4 illustrates the position estimation: from the positions (x_j(t−2), y_j(t−2)) and (x_j(t−1), y_j(t−1)) of object j, the motion vector (m_x,j(t−1), m_y,j(t−1)) is obtained; the estimated position x'_j(t) = x_j(t−1) + m_x,j(t−1), y'_j(t) = y_j(t−1) + m_y,j(t−1) lies close to the real position (x_i(t), y_i(t)) of the matching object (t, i).]

Fig. 4. Estimation of the positions in the next frame.

[Figure 6 shows the two sample video sequences of four frames each: sequence 1 (moving balls) and sequence 2 (humans).]

Fig. 6. Sample video pictures.

[Object Tracking Algorithm]
1) Feature Extraction
   a) Extraction of the area a_i(t) and of the positions of the pixels Pxmax, Pxmin, Pymax and Pymin for segment i.
   b) Calculation of the width and height of segment i:
      w_i(t) = Xmax,x − Xmin,x, h_i(t) = Ymax,y − Ymin,y.
   c) Calculation of the (present) position (x_i(t), y_i(t)) of segment i:
      x_i(t) = (Xmax,x + Xmin,x)/2, y_i(t) = (Ymax,y + Ymin,y)/2.
   d) Calculation of the color features of segment i:
      R_i(t) = [R(Pxmax) + R(Pxmin) + R(Pymax) + R(Pymin)]/4,
      G_i(t) = [G(Pxmax) + G(Pxmin) + G(Pymax) + G(Pymin)]/4,
      B_i(t) = [B(Pxmax) + B(Pxmin) + B(Pymax) + B(Pymin)]/4.
2) Pattern Matching in the Feature Space
   if (t == 1) then
      a) Perform feature extraction for all segments.
      b) go to (image segmentation of the next frame).
   if (t ≥ 2) then
      a) Perform feature extraction for segment i.
      b) Calculation of the distances D(t, i; t−1, j) for all j.
      c) Search for the minimum distance among the distances:
         D(t, i; t−1, k) = min_j D(t, i; t−1, j).
      d) Identify (t, i) with (t−1, k) and remove (t−1, k) from the reference data.
      e) Estimation of the position of segment i in the next frame:
         x'_i(t+1) = x_i(t) + m_x,i(t), y'_i(t+1) = y_i(t) + m_y,i(t).
      f) Repeat the matching procedure [from b) to e)] for all segments in the t-th frame.
      g) go to (image segmentation of the next frame).

Fig. 5. Detailed description of the proposed object tracking algorithm.

[Figure 7 shows the image segmentation results for the third frames of Fig. 6: (a) the four objects 1-4 detected in sequence 1; (b) the two objects 1 and 2 detected in sequence 2.]

Fig. 7. The image segmentation results for the frames of Fig. 6. (a) The image segmentation results of the third frame in sequence 1. (b) The image segmentation results of the third frame in sequence 2.

III. SIMULATIONS

This section presents simulated results of the object tracking algorithm. In Fig. 6, two video sequences, each consisting of four sample frames of QVGA (320×240) size, are shown. Note that we explicitly show the object indices in the pictures. Figure 7 demonstrates the image segmentation results for the third frame of each video sequence. Let us look at the third and fourth frames of the simpler sequence 1 for evaluation of the proposed object tracking algorithm. The extracted features of the objects are listed in Table I. In this table, we have normalized the area feature by division with 2^8 and the other features by division with 2^4. Furthermore, the decimal parts of the numbers have been omitted. The tracking quality is evaluated with the Euclidean and the Manhattan distances. Between (4,1) and (3,j) (j = 1, ..., 4), for instance, the calculated Euclidean-distance results are D_E(4,1;3,1) = 1, D_E(4,1;3,2) = 11, D_E(4,1;3,3) = 3, D_E(4,1;3,4) = 9, and the Manhattan-distance results are D_M(4,1;3,1) = 2, D_M(4,1;3,2) = 21, D_M(4,1;3,3) = 9, D_M(4,1;3,4) = 16. Here, the symbols D_E(t,i;t−1,j) and D_M(t,i;t−1,j) denote the Euclidean and Manhattan distances between (t,i) and (t−1,j), respectively. Decimal parts of the Euclidean distances are again omitted. Obviously, D_M(4,1;3,1) as well as D_E(4,1;3,1) are clearly the minimum, so that the moving object (4,1) correctly matches with (3,1). In the same way, all other objects in the fourth frame match with their counterparts in the third frame correctly (see Fig. 8(a)). We have confirmed the same positive results of the proposed algorithm for the tracking from the first to the second and from the second to the third frame (see Fig. 8(a)). There exists no qualitative difference between the use of the Euclidean distance and the Manhattan distance. Thus, the use of the simple Manhattan distance turned out to be sufficient for the tracking. We have also tested the algorithm for more complicated pictures including humans (sequence 2 in Fig. 6). Figure 8(b) shows the results of the calculation of the Manhattan distances between successive frames after the normalization. One can see that both objects correctly match with their counterparts in the preceding frame. For non-rigid objects, too, the proposed algorithm based on the image segmentation and the pattern matching with the minimum Manhattan distance search was verified to work very well.


TABLE I
EXTRACTED FEATURES FOR SAMPLE 1

object   a   w   h   x    y    mx   my   R   G   B
(1,1)    5   2   2   3    8    -    -    14  5   4
(1,2)    3   2   2   6    7    -    -    10  15  3
(1,3)    3   2   2   8    6    -    -    13  4   5
(1,4)    4   2   2   11   9    -    -    15  7   5
(2,1)    5   2   2   3    8    0    0    14  5   4
(2,2)    3   2   2   5    7    -1   0    11  15  4
(2,3)    3   2   2   8    6    0    0    13  4   5
(2,4)    4   2   2   11   9    0    0    15  7   5
(3,1)    5   2   2   4    8    1    0    14  5   4
(3,2)    2   2   1   5    7    0    0    10  15  4
(3,3)    3   2   2   7    6    -1   0    14  4   5
(3,4)    4   2   2   12   10   1    1    15  7   5
(4,1)    5   2   2   4    8    0    0    15  5   4
(4,2)    2   2   1   4    7    -1   0    9   14  3
(4,3)    3   2   2   7    6    0    0    13  4   5
(4,4)    4   2   2   12   10   0    0    15  7   4
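As an illustration of how the tabulated distances arise, consider the entry between (4,1) and (3,1). Assuming that the distance is taken over the normalized features (a, w, h, x, y, R, G, B), with the position of the frame-3 object replaced by its estimate from Eq. (8), Table I gives

x'_1(4) = x_1(3) + m_x,1(3) = 4 + 1 = 5,   y'_1(4) = y_1(3) + m_y,1(3) = 8 + 0 = 8,
D_M(4,1; 3,1) = |5−5| + |2−2| + |2−2| + |4−5| + |8−8| + |15−14| + |5−5| + |4−4| = 2,

in agreement with the value quoted in the text.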

(a) The Manhattan distances between successive frames for sample 1.

Frames 1 → 2:
object   (2,1)  (2,2)  (2,3)  (2,4)
(1,1)      0     18     12     14
(1,2)     21      3     19     23
(1,3)     12     18      0     12
(1,4)     14     22     12      0

Frames 2 → 3:
object   (3,1)  (3,2)  (3,3)  (3,4)
(2,1)      1     21     10     16
(2,2)     16      4     19     25
(2,3)     11     21      2     14
(2,4)     13     25     12      2

Frames 3 → 4:
object   (4,1)  (4,2)  (4,3)  (4,4)
(3,1)      2     21      9     13
(3,2)     21      4     20     26
(3,3)      9     22      2     16
(3,4)     16     31     17      3

(b) The Manhattan distances between successive frames for sample 2.

Frames 1 → 2:
object   (2,1)  (2,2)
(1,1)      1     14
(1,2)     14      1

Frames 2 → 3:
object   (3,1)  (3,2)
(2,1)      1     12
(2,2)     12      1

Frames 3 → 4:
object   (4,1)  (4,2)
(3,1)      2     11
(3,2)     12      1

Fig. 8. Results of the minimum Manhattan distance search.

IV. CONCLUSIONS AND DISCUSSION

We have proposed an object tracking algorithm for video pictures, based on image segmentation and pattern matching of the segmented objects between frames in a simple feature space. Simulation results for frame sequences with moving balls and humans verify the suitability of the algorithm for reliable moving-object tracking. We have also confirmed that the algorithm works very well for more complicated video pictures including rotating objects and occlusion of objects. In order to extract the color features of segmented objects, we used the mean value of four boundary pixels. Thus, we cannot extract correct color features of an object that has gradation or texture. Nevertheless, the mean value turns out to represent the object's color features sufficiently well for the tracking purpose. A multicolored object would be segmented into several parts by the image segmentation algorithm. It would be recognized as a more complicated object through the identical movement of these parts. There may also be the concern that the linear motion estimation is too simple and may fail for objects moving in a complicated nonlinear way. However, if the movement is not extremely fast, the deviation from the estimated positions between successive frames is so small that correct tracking is reliably achieved. Furthermore, if mistracking occurred at some frame by reason of occlusion or of newly appearing or disappearing objects, the proposed algorithm could recover correct tracking after a couple of frames. This stability characteristic of the algorithm results from the fact that the object matching is performed in feature space between all objects in successive frames. The relative simplicity of this tracking algorithm promises that an FPGA implementation is possible and already sufficient for real-time applications with a few moving objects. As noted in Sec. III, the simple Manhattan distance is sufficient for the tracking. Thus, a VLSI implementation of the algorithm is possible by using our developed architectures for image segmentation [7] and a fully parallel associative memory with high-speed minimum Manhattan distance search [8], both of which have already been realized as VLSI circuits.

ACKNOWLEDGMENT

The work was supported in part by a Grant-in-Aid for Encouragement of Young Scientists (No. 16700184) from the Ministry of Education, Culture, Sports, Science and Technology, Japanese Government. The authors would like to thank Z. Zhu and K. Yamaoka (Research Center for Nanodevices and Systems, Hiroshima University, Japan) for fruitful discussions on image segmentation and object tracking.

REFERENCES
[1] W. G. Kropatsch and H. Bischof, Digital Image Analysis, Springer, 2001.
[2] G. L. Foresti, "A real-time system for video surveillance of unattended outdoor environments," IEEE Trans. Circuits and Systems for Video Technology, Vol. 8, No. 6, pp. 697-704, 1998.
[3] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 747-757, 2000.
[4] H. Kimura and T. Shibata, "Simple-architecture motion-detection analog V-chip based on quasi-two-dimensional processing," Ext. Abs. of the 2002 Int. Conf. on Solid State Devices and Materials (SSDM2002), pp. 240-241, 2002.
[5] S. W. Seol et al., "An automatic detection and tracking system of moving objects using double differential based motion estimation," Proc. of Int. Tech. Conf. Circ./Syst., Comput. and Comms. (ITC-CSCC2003), pp. 260-263, 2003.
[6] T. Morimoto et al., "Efficient video-picture segmentation algorithm for cell-network-based digital CMOS implementation," IEICE Trans. Inf. & Syst., Vol. E87-D, No. 2, pp. 500-503, 2004.
[7] T. Morimoto et al., "Digital low-power real-time video segmentation by region growing," Ext. Abs. of the 2004 Int. Conf. on Solid State Devices and Materials (SSDM2004), pp. 138-139, 2004.
[8] Y. Yano et al., "Fully parallel nearest Manhattan-distance-search memory with large reference-pattern number," Ext. Abs. of the 2002 Int. Conf. on Solid State Devices and Materials (SSDM2002), pp. 254-255, 2002.

