
Neural-fuzzy Approach for Content-based Retrieval of Digital Video

Siddhivinayak Kulkarni
Department of Computer Science and Mathematics, Nipissing University
100 College Drive, North Bay, ON P1B 8L7, Canada
Email: siddhik@nipissingu.ca

Abstract
As digital video databases become more and more pervasive, finding video in large databases becomes a major problem. Because of the nature of video (streamed objects), accessing the content of such databases is inherently a time-consuming operation. This paper proposes a novel neural-fuzzy approach for retrieving a specific video clip from a video database. Fuzzy logic is used for expressing queries in terms of natural language, and a neural network is designed to learn the meaning of these queries. The queries are based on features such as the colour and texture of shots, scenes and objects in video clips. The error back-propagation algorithm is proposed to learn the meaning of queries in fuzzy terms such as very similar, similar and some-what similar. Preliminary experiments were conducted on a small video database with different combinations of queries using colour and texture features along with a visual video clip, and achieved very promising results.

Keywords: Neural Network; Fuzzy Logic; Video Processing; Content-based Retrieval

1. INTRODUCTION
Searching multimedia data is different in nature from searching text-based data. Such searches involve searching by content, known as content-based retrieval. Hence multimedia data requires a different approach to indexing and retrieval compared with text-based techniques. A well-known content-based image retrieval system is QBIC [1]. It allows objects to be extracted semi-automatically through interactive outlining, or automatically based on optical flow. Cuts are also detected based on global motion, using a technique which is insensitive to individual object motion. QBIC uses a video hierarchy and extracts only 2D objects. The Photobook system [2] extracts objects through either segmentation or detection. Segmentation is based on clustering and affine motion models. Object extraction through detection searches for regions of an image which fit a particular model. The problem with extraction through detection is that it is limited to the available semantic models. 2D objects are extracted instead of 3D objects for efficiency. Smith and Chang [3] have proposed a system which uses colour sets and texture features to extract two-dimensional regions. Images can be queried by spatial relationships between regions [4]. Zhang et al. [5] present a uniform solution for content-based video retrieval and compression. Key-objects are extracted from video through motion segmentation. These key-objects are updated so that their motion descriptors are relative to the background. Several techniques have been proposed to automatically segment digital video into scenes, shots and sub-shots based on colour histogram, motion, texture and shape features [6, 7]. The VideoQ system described by Chang et al. segments video based on global motion, and tracks objects based on colour, motion and edge information [8, 9]. Kobla et al. analyze a special problem for scene change detection posed by gradual transitions due to fades, dissolves and other special-effects edits which are usually found in videos [10]. NeTra-V includes a new spatio-temporal segmentation and object-tracking scheme, and a hierarchical object-based video representation model [11]. In the GBIRD implementation of Li et al. [12], a video is first temporally segmented into scenes and the middle frame of each scene is selected as its keyframe.

This paper proposes a novel fuzzy-neural technique for retrieving digital video from a database. A fuzzy data model is proposed for interpreting queries expressed in natural terms for colour and texture, and a neural network learns the meaning of these queries. This paper is organized as follows: Section 2 describes the feature extraction techniques for colour and texture, Section 3 describes the fuzzy interpretation of queries in natural expressions, the use of a neural network to learn the meaning of these queries is described in Section 4, experimental results are given in Section 5, and finally the conclusion and future research are detailed in Section 6.


2. FEATURE EXTRACTION
First, each frame in the video scene is segmented, and then the different features of the frame are extracted and stored in a feature database. For each frame, the following features are stored:

Colour: The distribution of colour is a useful feature for image representation. Colour distribution, best represented as a histogram of intensity values, is appropriate as a global property because it does not require knowledge of how an image is composed of different objects, so the technique works extremely well for extracting global colour components from images. A colour histogram technique is proposed for extracting the colours from the images. The colour histogram for an image is constructed by counting the number of pixels of each colour; the colour of any pixel may be represented in terms of its red, green and blue components. These histograms are invariant under translation and rotation about the view axis, and change only under a change of viewing angle, a change in scale, or occlusion. Let FS denote the set of features used to represent colour content, FS = {colour}. The feature representation set of colours is rep(colour) = {red, green, blue, white, black, yellow, orange, pink, purple, brown}. An image histogram refers to the probability mass function of the image intensities; this is extended for colour images to capture the joint probabilities of the intensities of the three colour channels. More formally, the colour histogram is defined by

h_{r,g,b} = N · Prob{R = r, G = g, B = b},

where R, G and B represent the three colour channels and N is the number of pixels in the image. These RGB values are converted into Hue [0, 360], Saturation [0, 1] and Value [0, 1]. The colour of the query object is matched with the mean colour of a frame in the database as follows:

c_d = sqrt( (L_q - L_t)^2 + 4(U_q - U_t)^2 + 4(V_q - V_t)^2 )

where c_d is the weighted Euclidean colour distance in the CIE-LUV space, and the subscripts q and t refer to query and target, respectively.

Texture: The Tamura feature extraction technique is used for extracting texture features from key frames. These measures include the coarseness, contrast and orientation of the textural content of a key frame. The distance measure is simply the Euclidean distance along each texture feature, weighted by the variance along each channel:

d_tex(q, t) = sqrt( (a_q - a_t)^2 / σ_a^2 + (c_q - c_t)^2 / σ_c^2 + (o_q - o_t)^2 / σ_o^2 )

where a, c and o refer to coarseness, contrast and orientation, respectively, and σ_a, σ_c and σ_o are the variances in the corresponding features.

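As an illustration only, the following Python sketch implements the two distance measures described above. The ten-colour quantisation by nearest RGB prototype is an assumption (the paper names the colour set but not the exact pixel-to-colour mapping), and all function names are hypothetical.

```python
import numpy as np

# Rough RGB prototypes for rep(colour); the exact mapping is assumed.
COLOUR_PROTOTYPES = {
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "white": (255, 255, 255), "black": (0, 0, 0), "yellow": (255, 255, 0),
    "orange": (255, 165, 0), "pink": (255, 192, 203),
    "purple": (128, 0, 128), "brown": (139, 69, 19),
}

def colour_histogram(frame_rgb: np.ndarray) -> np.ndarray:
    """Count pixels per named colour by nearest RGB prototype; sums to N."""
    pixels = frame_rgb.reshape(-1, 3).astype(float)              # (N, 3)
    protos = np.array(list(COLOUR_PROTOTYPES.values()), float)   # (10, 3)
    nearest = np.argmin(
        ((pixels[:, None, :] - protos[None, :, :]) ** 2).sum(-1), axis=1)
    return np.bincount(nearest, minlength=len(protos))

def colour_distance(luv_q, luv_t):
    """Weighted Euclidean distance in CIE-LUV, per the paper's formula."""
    (Lq, Uq, Vq), (Lt, Ut, Vt) = luv_q, luv_t
    return np.sqrt((Lq - Lt) ** 2 + 4 * (Uq - Ut) ** 2 + 4 * (Vq - Vt) ** 2)

def texture_distance(tamura_q, tamura_t, variances):
    """Variance-weighted Euclidean distance over the Tamura features
    (coarseness, contrast, orientation)."""
    q, t, v = map(np.asarray, (tamura_q, tamura_t, variances))
    return np.sqrt((((q - t) ** 2) / v).sum())
```
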
3. FUZZY INTERPRETATION OF QUERIES

Queries in relational systems are exact-match queries: the system returns exactly those tuples a user is precisely asking for, and nothing more. To specify the resulting relation, conditions concerning known attributes of known relations can be formulated. In most cases, however, similarity must be computed on the basis of features of the images such as colour and texture; therefore, fuzzy values for interpreting queries are proposed in this research. In some applications, fuzzy systems often perform better than traditional systems because of their capability to deal with non-linearity and uncertainty. One reason is that, while traditional systems make precise decisions at every stage, fuzzy systems retain the information about uncertainty as long as possible and only draw a crisp decision at the last stage. Another advantage is that linguistic rules, when used in fuzzy systems, not only make tools more intuitive but also provide a better understanding of the outcomes. A relationship is defined to express the distribution of the truth of a variable. Theoretically, a fuzzy set F in the universe of discourse X = {x} assigns to each x a number in the range [0, a] indicating the extent to which x has the attribute F. Thus, if x is the amount of content of the extracted frame, "similar" may be considered a particular value of the fuzzy variable type, and each x is assigned a number μ_type(x) in the range 0 to a indicating the extent to which x is considered to be of that content type; the mapping μ_type : X → [0, a] is called the membership function. When the membership function is normalized (i.e. a = 1), μ_type : X → [0, 1] and the fuzzy logic is normal.


Referring to Figure 1, a query to retrieve a video clip from the database is prepared in terms of natural language, such as some-what similar, similar and very similar to some specific colour and/or texture. This approach is intended to make retrieval systems intelligent, so that they can interpret human language. The general syntax for queries is as follows:

Query = QFormat [Domain] [Visual] [Features] [Fuzzy Interpretation] [Simple] [Combination]
Domain = <sports | news | commercial | travel | cartoon>
Visual = <specify video clip for similarity>
Features = <colour | texture>
Fuzzy Interpretation = <some-what similar | similar | very similar>
Simple = <single feature or visual>
Combination = <Boolean operator AND | OR>
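For illustration, a query conforming to this syntax could be held in a simple structure before evaluation; the class and field names below are hypothetical, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Query:
    domain: str                                   # sports | news | commercial | travel | cartoon
    visual: str                                   # example video clip for similarity
    features: list = field(default_factory=list)  # "colour" and/or "texture"
    fuzzy: dict = field(default_factory=dict)     # feature -> fuzzy term
    combination: str = "AND"                      # Boolean operator AND | OR

# "Find clips very similar in colour AND some-what similar in texture."
q = Query(domain="travel", visual="clip_132.mpg",
          features=["colour", "texture"],
          fuzzy={"colour": "very similar", "texture": "some-what similar"},
          combination="AND")
```
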

Fig 1. Query interpretation

The user provides queries in terms of natural language, such as very similar, similar and some-what similar to some specific colour and texture. In this model, the interpretation domain is a fuzzy set [0, 1]. The ranges of the values used are [0.9, 1] for "very similar", [0.4, 0.5] for "similar" and [0.15, 0.25] for "some-what similar":

μ(t) = very similar        if CE ∈ [0.9, 1]
       similar             if CE ∈ [0.4, 0.5]
       some-what similar   if CE ∈ [0.15, 0.25]

The need for fuzzy querying systems has increased since the arrival of multimedia data in the field of database management systems. The results of a query are ranked according to their degree of satisfaction. "Fuzzy collection" is the term used for the video collection with all its features expressed as membership scores, and "fuzzy predicates" are the imprecise terms used in the fuzzy query. Consider a query: "Find all the video clips which are very similar in colour AND some-what similar in texture to the visual example clip." In such queries, the terms very similar and some-what similar are attributes which have no exact interpretation, and processing these predicates involves the use of the fuzzy scores of the features concerned (colour, texture).
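As a hedged sketch of how such fuzzy predicates might be evaluated, the code below maps a crisp per-feature similarity score to the term intervals above and combines features with min() for AND (max() for OR). The linear decay applied outside the stated intervals and all names are assumptions, not taken from the paper.

```python
TERM_RANGES = {"very similar": (0.9, 1.0),
               "similar": (0.4, 0.5),
               "some-what similar": (0.15, 0.25)}

def term_membership(score: float, term: str) -> float:
    """1.0 inside the term's interval, decaying linearly outside
    (the decay is illustrative; the paper defines only the intervals)."""
    lo, hi = TERM_RANGES[term]
    if lo <= score <= hi:
        return 1.0
    gap = lo - score if score < lo else score - hi
    return max(0.0, 1.0 - gap / 0.5)

def confidence(clip_scores: dict, fuzzy_query: dict, op: str = "AND") -> float:
    """Fuzzy AND = min, fuzzy OR = max over the per-feature memberships."""
    m = [term_membership(clip_scores[f], t) for f, t in fuzzy_query.items()]
    return min(m) if op == "AND" else max(m)

# Rank a toy database for: very similar colour AND some-what similar texture.
db = {132: {"colour": 0.93, "texture": 0.21},
      56:  {"colour": 0.88, "texture": 0.30}}
query = {"colour": "very similar", "texture": "some-what similar"}
ranking = sorted(db, key=lambda c: confidence(db[c], query), reverse=True)
```
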

4. LEARNING OF QUERIES

To retrieve the required video clips from the video database, it is necessary to learn the meaning of the query, which is expressed in terms of colours and textures with three types: very similar, similar and some-what similar. A supervised learning neural network is efficient at learning these features and content types. The error back-propagation neural network is proposed to learn the meaning of these queries. This approach overcomes the problem of re-training the neural network whenever there is a change in the database of video clips. The approach can also be extended to real-world databases such as those on the Internet. The ranges of minimum and maximum values for the three attributes are shown in Table 1.

Table 1. Minimum and maximum values

              very similar   similar   some-what similar
Minimum       0.9            0.4       0.15
Maximum       1.0            0.5       0.25

4.1 Neural Network Architecture

Fig 2. Neural network architecture

Table 2 indicates the parameters that were used to train the neural network and the RMS error obtained after training. The experiments were conducted by varying the number of hidden units and iterations. The optimum results were obtained with 6 hidden units and 1000 iterations. Training pairs were formed for each colour and type; there were 110 training pairs for each type, so the total number of training pairs was 330.

Table 2. Neural network parameters for experiments

Hidden units   η     α     Iterations   RMS error
5              0.7   0.2   100          0.00793
5              0.7   0.2   1000         0.00441
6              0.8   0.3   1000         0.00208
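A minimal back-propagation sketch using the Table 2 settings (6 hidden units, 1000 iterations) is given below, assuming the two remaining Table 2 parameters are the learning rate η and momentum α. The input/output encoding and the synthetic training pairs are assumptions, since Figure 2 is not reproduced here; targets are placed inside the Table 1 intervals.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Network size and Table 2 settings; the 10-unit input (e.g. a
# 10-bin colour histogram) is an assumption.
n_in, n_hidden, n_out = 10, 6, 1
eta, alpha, iterations = 0.8, 0.3, 1000   # learning rate, momentum (assumed)

W1 = rng.normal(0.0, 0.5, (n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_out, n_hidden)); b2 = np.zeros(n_out)
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)

# 330 synthetic training pairs (110 per fuzzy term), purely illustrative:
# input magnitude correlates with the target term so learning is possible.
X = np.vstack([0.2 * rng.random((110, n_in)) + base
               for base in (0.8, 0.5, 0.1)])
y = np.repeat([0.95, 0.45, 0.20], 110)[:, None]

for _ in range(iterations):
    h = sigmoid(X @ W1.T + b1)            # hidden-layer activations
    out = sigmoid(h @ W2.T + b2)          # network output in [0, 1]
    err = out - y
    d_out = err * out * (1.0 - out)       # output delta (sigmoid derivative)
    d_hid = (d_out @ W2) * h * (1.0 - h)  # hidden delta
    dW2 = -eta * (d_out.T @ h) / len(X) + alpha * dW2   # momentum updates
    dW1 = -eta * (d_hid.T @ X) / len(X) + alpha * dW1
    W2 += dW2; b2 -= eta * d_out.mean(axis=0)
    W1 += dW1; b1 -= eta * d_hid.mean(axis=0)

rms = np.sqrt(np.mean(err ** 2))          # RMS error, comparable to Table 2
print(f"RMS error after {iterations} iterations: {rms:.5f}")
```
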

5. EXPERIMENTAL RESULTS
To test the effectiveness of the developed system, the preliminary implementation of a prototype was kept simple, using a collection of video clips from the World Wide Web (WWW) as our database. Experiments were conducted for different combinations of queries in terms of colour and texture features along with video clips. Figure 3 shows the results obtained for a composite query: retrieve all video clips which contain similar texture and some-what similar colour as specified by the visual clip/image. The best video clip for the query appears first, with the highest confidence factor. In Figure 3, image/clip #132 contains similar white/blue colour and some-what similar texture of sky and snow, with a confidence factor of 0.8921. Other


images/clips appear in sequence with their confidence factors in descending order, from left to right and from top to bottom. Similarly, queries can be formed from different combinations of colour and texture along with a visual example image/clip.
Fig. 3. Video retrieval results for the query: similar texture and some-what similar colour (the query image/video is shown with the retrieved clips and their confidence factors in descending order, e.g. F = 0.8091, F = 0.7978, F = 0.7390).

6. CONCLUSION AND FUTURE RESEARCH



An interesting aspect of this research is the posing of queries in terms of natural expressions using fuzzy logic. A neural network is proposed to learn the meaning of these queries. Similarity based on low-level colour and texture features of the video clips is used to retrieve a specific clip from the video database, and very promising results were obtained. The preliminary results support our basic ideas, and the approach appears promising. This research will be extended to posing queries in terms of different objects in video clips and other features of a clip, such as motion and zoom.

References
[1] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, Query by image and video content: The QBIC system, IEEE Computer, pp. 23-32, September 1995.
[2] A. Pentland, R. Picard, and S. Sclaroff, Photobook: Tools for content-based manipulation of image databases, Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases II, pp. 34-47, 1994.
[3] J. Smith and S.-F. Chang, Automated image retrieval using colour and texture, Technical Report 414-95-20, Columbia University, July 1995.
[4] C. Li, J. Smith, L. Bergman, and V. Castelli, Sequential processing for content-based retrieval of composite objects, Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases VI, January 1998.
[5] H. Zhang, J. Wang, and Y. Altunbasak, Content-based video retrieval and compression: A unified solution, Proceedings of IEEE International Conference on Image Processing, pp. 13-16, 1997.
[6] E. Ardizzone, M. La Cascia, V. Di Gesu, and C. Valenti, Content-based indexing of image and video databases by global and spatial features, Proceedings of International Conference on Pattern Recognition, pp. 140-144, 1996.
[7] E. Ardizzone, M. La Cascia, and D. Molinelli, Motion and colour based video indexing and retrieval, Proceedings of International Conference on Pattern Recognition, pp. 135-139, 1996.
[8] S.-F. Chang, W. Chen, H. Meng, H. Sundaram, and D. Zhong, VideoQ: An automated content-based video search system using visual cues, Proceedings of the Fifth ACM International Multimedia Conference, November 1997.
[9] D. Zhong and S.-F. Chang, Video object model and segmentation for content-based video indexing, Proceedings of ISCAS, Vol. 2, pp. 1492-1495, 1997.
[10] V. Kobla, D. Doermann, and C. Faloutsos, Compressed domain video indexing techniques using DCT and motion vector information in MPEG video, Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases V, pp. 200-211, 1997.
[11] Y. Deng, D. Mukherjee, and B. Manjunath, NeTra-V: Toward an object-based video representation, Proceedings of SPIE Storage and Retrieval for Image and Video Databases VI, pp. 202-213, 1998.

[12] J. Wei, M. Drew, and Z. Li, Illumination-invariant video segmentation by hierarchical robust thresholding, Proceedings of SPIE Electronic Imaging '98, Vol. 3312, pp. 188-201, 1998.
