Proceedings of the

IEEE Intelligent Vehicles Symposium 2000


Dearborn (MI), USA October 3-5, 2000
Recognition of 3D Compressed Images and Its Traffic Monitoring Applications
Nicole S. Love, Ichiro Masaki*, Berthold K.P. Horn
Massachusetts Institute of Technology Artificial Intelligence Laboratories
545 Technology Square, Cambridge, MA 02139
nsl@ai.mit.edu, bkph@ai.mit.edu
*Massachusetts Institute of Technology Microsystems Technology Laboratories
Intelligent Transportation Research Center
50 Vassar Street, Cambridge, MA 02139
masaki@mit.edu
Abstract
In a digital image network for traffic monitoring, a
large number of cameras are connected to control centers
through a hierarchical network. Compressed image data and
recognition results are transmitted over the network. With
conventional approaches, each control center receives
compressed image data along with preliminary recognition
results from low-level control centers or surveillance
cameras. Each center needs to decompress image data for
further recognition processing, and if necessary the center
sends the compressed image data and recognition results to
the upper-level control center.
In order to increase the cost-efficiency of the digital
image network, we propose eliminating the decompression
required at each center by developing a recognition method
which works in the compressed domain. The mainstream
conventional image compression methods, such as the
Discrete Cosine Transform, are based on spatial frequency,
which makes it difficult to carry out recognition processes
in the compressed domain. In contrast, we compress the
image data by using attributes which are relevant both for
compression and recognition. Examples of the common
attributes are binary edge locations and the color
information surrounding the edge. This and other information
is retained in the compressed domain to enable recognition
without decompression.
1 Introduction
Until recently, image analysis and image compression
have been studied independently. Over the past few years
the interest in combining the two has increased. Re-
searchers are beginning to see the benefits of analyzing
compressed images without decompression. A few of the
advantages of combining image analysis and image com-
pression techniques include a decrease in memory usage, a
decrease in processing time and an increased efficiency in
database retrieval.
Many applications rely on efficient image compression
schemes. Video conferencing, image databases, surveillance,
and image networks are examples of such applications.
These applications require the compression of images for
efficiency. In some cases, the image quality or the data
that can be extracted from the images is also important.
The need for image compression and analysis remains key
to these applications. This paper demonstrates recognition
of 3D compressed images illustrated on traffic monitoring
applications.
A traffic monitoring network which relies on image
compression schemes may consist of a hierarchical image
network as shown in Figure 1. This network utilizes real-
time image processing to count vehicles, and determines
vehicle speeds and vehicle density. These networks also
transmit image data to higher levels in the network for ac-
cident or traffic flow analysis. A traffic monitoring system
which saves on processing time is essential for efficient op-
eration of the system. The goal of this research is to de-
velop a 3D compression algorithm which allows recogni-
tion from compressed image data without decompression.
2 Recognition of Compressed Image without
Decompression
Three applications are being developed for recognition
of an image in the 3D compressed domain. These applications
are vehicle counting, vehicle speed detection and vehicle
matching. Vehicle counting requires object detection
and tracking. Vehicle speed detection requires object
motion estimation. Vehicle matching utilizes template
matching and scaling of the images. These applications,
though specific to traffic monitoring, demonstrate the
capabilities of the system and can be used on any system
requiring recognition of 3D compressed images.
0-7803-6363-9/00/$10.00 © 2000 IEEE
Figure 1: Traffic Monitoring System Architecture
Figure 2: System Overview
3 Three-Dimensional Image Acquisition
The system consists of three modules: 3D image acqui-
sition, compression, and recognition. Figure 2 shows the
overview of the system. The 3D image acquisition module
receives three input images and produces the edge depth
map of the center image. The compression module uses the
edge depth map, edge map and the center image to com-
press the attributes of the image. The recognition unit uses
the 3D compressed data to determine the vehicle count, ve-
hicle speed or vehicle matching without decompressing the
image. The results can be stored in memory or transmitted
over the network to higher-level control centers.
The 3D compression algorithm begins with a 3D image
acquisition. 3D information is used to distinguish objects
in a scene. Conventional 2D compression algorithms are
not conducive to detection of partially occluded objects,
but the 3D information along with color allows the system
to detect partially occluded objects.
Stereo vision algorithms have used two or more cameras.
The two-camera approach produces correspondence errors.
Our system uses three cameras. The third camera is used to
reduce the number of missed correspondences. Additionally,
our system [1] uses feature correlation rather than area
correlation as in Kanade's system [3]. Feature correlation
is faster due to the decrease in data and the decrease in
calculations in the algorithm. The 3D image acquisition is
based on a trinocular vision algorithm [2] which produces
an edge depth map of the center image.
Figure 3 is a flow chart of the algorithm. The images are
acquired simultaneously from all three cameras. The cameras
are equally spaced, with their optical axes aligned.
Figure 4 shows a sample of three images taken by the
cameras. The first step is to generate the vertical edge
gradient for each image. Edge points in the left and right
images are matched using the center camera to help
eliminate false correspondences.
Figure 3: Flow chart.
Figure 4: Sample input triple.
The depth may be calculated directly for each disparity
using

$$z = b \frac{f}{x_l - x_r}$$

where $b$ is the length of the baseline (distance between left
and right camera), $f$ is the focal length, and $x_l$ and $x_r$ are
the x-coordinates in the left and right images respectively.
Figure 6 displays a histogram of the number of edge points
versus depth that can be constructed after bin averaging.
The significant peaks correspond to objects in the original
image. The three peaks correspond to the sign on the left,
the vehicle in the closest lane and the parked vehicle which
is partially occluded.
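The depth calculation and the depth histogram described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed bin width, and the choice of units (baseline in meters, focal length and image coordinates in pixels) are assumptions made for the example.

```python
from collections import Counter


def depth_from_disparity(baseline, focal_length, x_left, x_right):
    """Depth of one matched edge point: baseline * focal_length / disparity."""
    disparity = x_left - x_right
    if disparity <= 0:
        # A point in front of the cameras must have positive disparity.
        raise ValueError("disparity must be positive")
    return baseline * focal_length / disparity


def depth_histogram(depths, bin_width):
    """Count edge points per depth bin; each key is the lower edge of a bin."""
    counts = Counter()
    for z in depths:
        counts[int(z // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```

For example, with an assumed baseline of 0.5 and focal length of 800, a match at x-coordinates 120 (left) and 100 (right) has a disparity of 20 and a depth of 20. Bins with large counts then play the role of the significant peaks in the histogram.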
4 Compression
The goal of the compression algorithm is two-fold: (1)
to compress the data as much as possible without losing
any of the relevant information, and (2) to provide
a representation of the data which is conducive to object
recognition. The decision was made to use a contour-based
lossy compression method based on Mizuki's algorithm [8].
Mizuki proposed a 2D compression algorithm, which has
been extended to 3D in this research. The algorithm
produces a high compression ratio and enhances the
recognition component of the system.
Marshall [6] uses a contour-based compression method
for shape recognition. Marshall's work assumes that a
contour with length larger than some threshold is an object.
This assumption creates a problem if the contour of an
object is disconnected. An object with two or more long
contours would be considered to be more than one object. The
differences between Marshall's algorithm and this
compression algorithm are the addition of three-dimensional
data and the allowance of broken contours. Although
Marshall's method has problems with overlapping objects and
objects consisting of several contours, his work
demonstrates the potential of a contour-based compression
domain used for shape recognition.
Many image processing schemes use edge information
as a guide in object recognition; for this reason
a contour-based algorithm was used [7]. The contours
provide a skeleton of the contents of the image, which is
used for recognition of objects as well as compression.
The algorithm focuses on retaining information relevant to
recognition, such as contour, color, and distance attributes.
A flow chart of the steps is shown in Figure 5. A
description of the components is as follows:
Depth Map The depth map contains the distance of edge
pixels from the center camera. The method of deter-
mining the distance is described in Section 3.
Contour Coding The contours are determined by tracing
edges with similar color and distance information.
The contours are then coded by using the start loca-
tion and the directional codes for subsequent points in
the contour. Short contours are eliminated.
Mean Coding The mean RGB value of pixels between
contours is coded.
Distance Coding The mean distance value of contour pixels
is coded.
Color Extraction RGB values for pixels between contours
are determined and a line approximation is used
to represent the information.
Color Coding The endpoints of the line approximating
the RGB values for pixels between contours are coded.
Binary Block Matching The edge maps from two con-
secutive images are used to determine the motion vec-
tors of each n x n block.
Error Coding The reconstructed image from the motion
vectors is compared with the second image to produce
an error which is encoded.
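The Contour Coding step above stores a start location followed by directional codes. A minimal sketch of one such scheme is given below, assuming an 8-neighbour chain code. The direction numbering, the function names, and the representation of a contour as a list of (x, y) pixel coordinates are illustrative assumptions; the paper does not specify them.

```python
# Assumed 8-neighbour direction table: (dx, dy) -> code, with y growing downward.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}
STEPS = {code: step for step, code in DIRECTIONS.items()}


def encode_contour(points):
    """Return (start, codes) for a list of 8-connected (x, y) points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Raises KeyError if two consecutive points are not neighbours.
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return points[0], codes


def decode_contour(start, codes):
    """Rebuild the point list from the start location and direction codes."""
    points = [start]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


def drop_short_contours(contours, min_length):
    """Keep only contours with at least min_length points (threshold is assumed)."""
    return [c for c in contours if len(c) >= min_length]
```

Each step costs three bits in this representation, compared with a full coordinate pair per pixel, which is where the saving for long contours comes from.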
Figure 5: Compression Algorithm
The algorithm combines contour, color, and distance at-
tributes to produce an image coding system which can be
used for recognition.
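The Binary Block Matching component listed above can be illustrated with an exhaustive search over a small window. The sketch below is one possible reading, not the authors' method: it assumes edge maps stored as lists of rows of 0/1 values, scores a candidate displacement by the number of differing pixels, and treats pixels outside the previous image as 0. The names `block_mismatch`, `motion_vectors`, and the `search` radius are hypothetical.

```python
def block_mismatch(prev, curr, bx, by, dx, dy, n):
    """Count differing pixels between the n x n block at (bx, by) in curr
    and the block displaced by (dx, dy) in prev."""
    height, width = len(prev), len(prev[0])
    cost = 0
    for y in range(by, by + n):
        for x in range(bx, bx + n):
            px, py = x + dx, y + dy
            inside = 0 <= px < width and 0 <= py < height
            prev_pixel = prev[py][px] if inside else 0
            cost += prev_pixel ^ curr[y][x]
    return cost


def motion_vectors(prev, curr, n, search):
    """Map each block origin (bx, by) to the (dx, dy) with the fewest mismatches."""
    height, width = len(curr), len(curr[0])
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    # Try small displacements first so ties resolve towards (0, 0).
    candidates.sort(key=lambda d: abs(d[0]) + abs(d[1]))
    vectors = {}
    for by in range(0, height - n + 1, n):
        for bx in range(0, width - n + 1, n):
            vectors[(bx, by)] = min(
                candidates,
                key=lambda d: block_mismatch(prev, curr, bx, by, d[0], d[1], n))
    return vectors
```

In this convention a vector points from a block in the current edge map back to its best match in the previous one, so an edge that moved one pixel to the right yields (-1, 0). The residual mismatch count at the chosen vector corresponds to what the Error Coding step would have to encode.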
5 Recognition
The transportation industry has always been concerned
with the efficient and safe movement of traffic. The data
gathered from traffic monitoring is used to make real-time
traffic control decisions which affect the movement and
safety of traffic. This research is geared towards highway
monitoring to demonstrate the feasibility and benefits of
recognition of 3D compressed images. Current traffic
monitoring systems have problems with shadows and
overlapping vehicles [4], [5]. The 3D data will provide the
necessary information to eliminate the problems of shadows
and overlapping vehicles. Three applications (vehicle
counting, vehicle speed detection and vehicle matching)
will demonstrate the feasibility of the system. All three
applications require the detection of vehicles in the image.
Vehicle counting and vehicle speed detection require the
tracking of vehicles in subsequent scenes. Vehicle speed
detection utilizes motion estimates to determine the vehicle
speeds. The following sections discuss the significant com-
ponents for recognition: detection of vehicles in a scene,
tracking of vehicles in subsequent scenes, and motion esti-
mations of vehicles.
5.1 Detection of Vehicles
Vehicle counting, speed detection and vehicle match-
ing rely on the detection of vehicles in a scene. Detection
methods utilizing color, edge and depth maps still apply
in the 3D compressed domain. The 3D compressed image
retains these attributes, which allows for object detection.
Each vehicle in the scene consists of several contours.
In detecting the vehicle within the 3D compressed data, a
histogram of the number of edges at given distances can
be constructed and used to classify the vehicles. Figure 6
shows the histogram of the edges from the image used in
Figure 4. Based on the histogram the edges can be classi-
fied, distinguishing the objects in the scene. Figure 6 shows
an example of the detection of vehicles using only the dis-
tance and proximity information.
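One way to classify edges from the depth histogram is to pick its significant peaks and assign each edge to the nearest peak. The sketch below assumes a histogram given as a mapping from bin depth to edge count with an entry for every bin (including empty ones). The `min_count` and `max_gap` thresholds and both function names are illustrative assumptions, and the sketch uses depth only, without the image-plane proximity mentioned above.

```python
def find_peaks(histogram, min_count):
    """Return bin depths whose count is a local maximum of at least min_count."""
    bins = sorted(histogram)
    peaks = []
    for i, depth in enumerate(bins):
        count = histogram[depth]
        left = histogram[bins[i - 1]] if i > 0 else 0
        right = histogram[bins[i + 1]] if i + 1 < len(bins) else 0
        if count >= min_count and count > left and count >= right:
            peaks.append(depth)
    return peaks


def label_edges(edge_depths, peaks, max_gap):
    """Assign each edge depth to the nearest peak, or None if none is within max_gap."""
    labels = []
    for z in edge_depths:
        nearest = min(peaks, key=lambda p: abs(p - z), default=None)
        if nearest is not None and abs(nearest - z) <= max_gap:
            labels.append(nearest)
        else:
            labels.append(None)
    return labels
```

Edges labelled with the same peak are candidates for belonging to one object; edges labelled None lie too far from any peak and would be treated as background or noise.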
In vehicle matching, the color can be used in the detec-
tion of a vehicle. The probability of finding the same car in
different images can increase if the color is used as an in-
dicator. Database searches also rely on the color of objects
in the images. In the compression method, color attributes
are retained and can be used to detect vehicles. A search of
the compressed image database may be initiated for a par-
ticular color vehicle. The correct image(s) and the location
of the vehicle(s) in the images can be determined from the
color information. Figure 7 displays the result of searching
for a yellow car in the compressed image. Using both dis-
tance and color to detect vehicles in an image or database
is extremely useful for traffic monitoring applications.
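A color search over compressed records could look like the sketch below. The record layout (a dictionary from image identifier to a list of regions, each carrying a `mean_rgb` triple as produced by a mean-coding step) is a hypothetical stand-in for the compressed format, and Euclidean distance in RGB is a simple substitute for whatever color comparison the system actually uses.

```python
def color_distance(rgb_a, rgb_b):
    """Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(rgb_a, rgb_b)) ** 0.5


def find_regions_by_color(images, target_rgb, tolerance):
    """Return (image_id, region_index) pairs whose mean color is near target_rgb."""
    hits = []
    for image_id, regions in images.items():
        for index, region in enumerate(regions):
            if color_distance(region["mean_rgb"], target_rgb) <= tolerance:
                hits.append((image_id, index))
    return hits
```

Because only the coded mean colors are inspected, the search never reconstructs pixels, which is the property the compressed-domain approach relies on.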
5.2 Tracking Vehicles in Subsequent Images
Once a vehicle has been detected, vehicle counting and
speed detection require tracking of the vehicle through sub-
sequent images. Tracking of the vehicle can be determined
assuming only translational motion of the contours. The
matching of contours is based on color, shape, and
restricted locations of the vehicle.
Figure 6: Histogram and detected objects
Figure 7: Detection of yellow car from compressed image
Groups of contours are assigned to a vehicle by the de-
tection component. Tracking the vehicle will consist of
matching the overall color of the vehicle or if necessary
the color distribution of the contours. Also matching the
shape or more precisely the grouping of the contours which
make up the car is important in ensuring the correct vehicle
has been located. Another key component is the method
to search for the corresponding vehicle. Specifically, the
difficulty arises in deciding where to search. Limiting the
search of vehicles and checking the most likely positions
first can increase the performance and decrease the pro-
cessing time of the system.
5.3 Speed Estimation
Once a vehicle has been detected and tracked through
subsequent images, an estimate of the speed can be deter-
mined using the three-dimensional data. The location of
the vehicle can be determined in the 3D domain from the
calculation of the depth map. The tracking result and the
frame rate provide enough information to estimate the speed
of the vehicle through subsequent images. To estimate the
speed, the x, y, and z components of the average velocity
are calculated, and the magnitude of the average velocity is
taken as the estimated speed of the vehicle. All velocity
components follow this basic form:

$$v_x = \frac{(X_{i+j} - X_i)\, F_r}{j}$$

where $X_i$ is the x-coordinate in the real world of a point in
the $i$th frame, $j$ is the difference between the two frames
used to calculate the average speed, and $F_r$ is the frame
rate. The magnitude of the velocity is

$$|v| = \sqrt{v_x^2 + v_y^2 + v_z^2}$$
The speed of the vehicle will be calculated by taking the
average of all points with known estimated speed for the
particular vehicle. The motion vectors can be used to esti-
mate the speed once the vehicles have been detected.
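The speed estimate described in this section can be sketched as displacement divided by elapsed time, where the elapsed time is the frame difference divided by the frame rate. The function names and the representation of a tracked point as a pair of (x, y, z) positions are assumptions for illustration only.

```python
import math


def average_velocity(p_start, p_end, frame_gap, frame_rate):
    """Per-axis average velocity: displacement * frame_rate / frame_gap."""
    if frame_gap <= 0:
        raise ValueError("frame_gap must be positive")
    return tuple((b - a) * frame_rate / frame_gap
                 for a, b in zip(p_start, p_end))


def speed(velocity):
    """Magnitude of a velocity vector."""
    return math.sqrt(sum(c * c for c in velocity))


def vehicle_speed(point_tracks, frame_gap, frame_rate):
    """Mean speed over all tracked points of one vehicle.

    point_tracks is a list of (start_position, end_position) pairs.
    """
    speeds = [speed(average_velocity(a, b, frame_gap, frame_rate))
              for a, b in point_tracks]
    return sum(speeds) / len(speeds)
```

For instance, a point that moves from (0, 0, 20) to (3, 0, 16) over 5 frames at 30 frames per second has velocity components (18, 0, -24) and a speed of 30, in whatever length unit the depth map uses per second.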
6 Conclusion
This research focuses on recognition of 3D compressed
images without decompression. Traffic monitoring ap-
plications are developed to demonstrate the benefits of
combining image processing and image compression tech-
niques to produce an efficient system which can be used
for image networking systems and dynamic route guidance.
Any application requiring the transmission, storage
and processing of images can benefit from reduced usage of
memory and processing time by combining processing and
compression. The combination of compression and processing
is a natural extension of image compression, utilizing
its attributes and increasing its benefits. The increasing
interest in this combination may lead to new image com-
pression standards which will focus on the capability of
recognition without decompression as well as compression
performance.
References
[1] J. Bergendahl, A Computationally Efficient Stereo Vision
Algorithm for Adaptive Cruise Control, MIT Master's Thesis,
May 1997.
[2] J. Bergendahl, I. Masaki, B.K.P. Horn, Three-camera stereo
vision for intelligent transportation systems, Proceedings
of SPIE's Photonics East '96 Symposium, Boston, MA,
November 18-22, 1996.
[3] T. Kanade, A Stereo Machine for Video-Rate Dense Depth
Mapping and Its New Applications, Proc. ARPA Image
Understanding Workshop, pp. 805-814, Palm Springs, 1996.
[4] N. Kehtarnavaz, C. Huang, T. Urbanik, Video Image Sensing
for a Smart Controller at Diamond Interchanges,
Proceedings of the 1995 Annual Meeting of ITS America, pp.
447-451, March 1995.
[5] J. Malik, S. Russell, J. Weber, T. Huang, and D. Koller, A
Machine Vision Based Surveillance System for California
Roads, PATH project MOU-83 Final Report, 1995.
[6] S. Marshall, Application of Image Contours to Three
Aspects of Image Processing: Compression, Shape
Recognition and Stereopsis, IEE Proceedings-I Communications
Speech & Vision, Vol. 139, No. 1, pp. 1-8, Feb. 1992.
[7] I. Masaki, Industrial Vision Systems Based on Application-
Specific IC Chips, IEICE Transactions, Vol. E 74, No. 6,
June 1991.
[8] M. M. Mizuki, Edge Based Video Image Compression for
Low Bit Rate Applications, M.S.E.E. Thesis, MIT,
Cambridge, September 1996.