
Web based Image Similarity Search

Vinay Menon, Uday Babbar and Ujjwal Dasgupta

Delhi College of Engineering, Delhi

Email: {ambatvinay@gmail.com, udaydce@gmail.com, ujjwaldasgupta@gmail.com}

Abstract—This paper proposes an implementation of a web based image similarity search using a robust image matching algorithm and a user definable matching criterion. Matching is done using image signatures. Image signatures are generated by decomposing images using wavelet transforms. A database of image signatures is built by running the signature generator in conjunction with a web crawler. Search results are found by calculating the Euclidean distance between the signature of the subject image and the signatures in the database. The proposed architecture is time efficient, robust, and uses a rotation, scale and perspective invariant method.

Keywords: Image matching, Similarity, Reverse image search, Image signature

I. INTRODUCTION

As the internet becomes highly multimedia intensive, there is an increased need for tools to manipulate large collections of media and to make search for media more intuitive. Traditionally, search engines use manual captions, text in the related webpage, and the size and dimensions of images for image retrieval. This method is highly subjective and may lead to unrelated search results. The challenge is to bridge the gap between the physical characteristics of digital images (e.g. color, texture) that are used for comparison and the semantic meaning of the images that is used by humans to query the database. A focus of significant research is an algorithmic process for determining the perceptual similarity of images. The comparison algorithm has to judge differences in images in a similar way as a human would. This is easier said than done, because of some special properties of the human eye and brain.

What makes humans perceive two images as being similar? This is a difficult question for many reasons. Two images can be perceived as similar because of the presence of similar objects (e.g. a ball, a girl, a Mona Lisa portrait) or because of a similar look (same colors, shapes, textures). Recognition of objects, and hence content based image retrieval, is extremely difficult. This algorithm therefore analyzes the image composition in terms of colors, shapes and textures.

Another problem in the implementation of a search engine is that comparing images in a large scale collection is a processing power and bandwidth intensive task, so an image signature/fingerprint based solution has to be used. The algorithm should extract features from an image and use them to calculate a signature. Signature generation should be such that size, rotation and segmentation insensitive matching is possible. In addition to allowing for these comparisons, signature generation should be fast enough for a viable implementation.

II. IMPLEMENTATION

The implementation consists of three main blocks: the crawler, the signature generator and the similarity calculator.

A. WEB CRAWLER

The web crawler feeds the signature calculator with images from the internet, which in turn populates the database. It is implemented in PHP. A code snippet is given below:
class Crawler {
    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            // call_user_method() is deprecated; invoke the handler directly
            return $this->{$method}();
        }
    }

    protected function _get_images() {
        if (!empty($this->markup)) {
            preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);
            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    protected function _get_links() {
        if (!empty($this->markup)) {
            preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);
            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }
}
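The short driver below is a usage sketch only and is not part of the original implementation; the seed URL is an arbitrary example, and in the full system each extracted image URL would be handed to the signature generator described in the next section rather than simply printed.

// Usage sketch for the Crawler class above: fetch one page and list the
// image sources found on it (illustration only).
$crawler = new Crawler('http://www.example.com/');      // example seed URL
$images  = $crawler->get('images');                     // raw attribute strings of <img .../> tags

if ($images !== FALSE) {
    foreach ($images as $attributes) {
        // Pull the src attribute out of the raw attribute string.
        if (preg_match('/src\s*=\s*["\']([^"\']+)["\']/i', $attributes, $m)) {
            echo $m[1], "\n";   // would be passed to the signature generator in the full system
        }
    }
}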
B. SIGNATURE (FEATURE VECTOR) GENERATOR

This is the crux of the implementation, where all of the image processing is done. The fingerprint has to be calculated only once for each image and can then be stored in a database for fast searching and comparing.

Before the fingerprint is calculated, the image is resized/scaled to a standard size. Since color comparisons are difficult to calculate in most color spaces, especially in the popularly used RGB color space, the image is transformed into the CIE Luv color space. CIE Luv has components L*, u* and v*: the L* component defines the luminance, and u* and v* define the chrominance. CIE Luv is widely used for color differences, especially with additive colors. The CIE Luv color space is defined from CIE XYZ:

L* = 116*(Y/Yn)^(1/3) - 16    for Y/Yn > 0.008856
L* = 903.3*(Y/Yn)             for Y/Yn <= 0.008856
u* = 13*(L*)*(u' - u'n)
v* = 13*(L*)*(v' - v'n)

where u' = 4*X/(X + 15*Y + 3*Z), v' = 9*Y/(X + 15*Y + 3*Z), and u'n and v'n have the same definitions as u' and v' but applied to the white point reference.

CIE XYZ is itself defined from RGB using the following transformation:

|X|   |0.430574  0.341550  0.178325|   |Red  |
|Y| = |0.222015  0.706655  0.071330| * |Green|
|Z|   |0.020183  0.129553  0.939180|   |Blue |
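A compact PHP sketch of this conversion is given below. It is illustrative rather than the paper's own code: it assumes linear RGB components in the range [0, 1] and takes the white point to be the XYZ of RGB white (1, 1, 1).

// Sketch: convert a linear RGB pixel (components in [0, 1]) to CIE Luv
// using the matrix and formulas above (not the original implementation).
function rgb_to_xyz($r, $g, $b) {
    return array(
        0.430574 * $r + 0.341550 * $g + 0.178325 * $b,   // X
        0.222015 * $r + 0.706655 * $g + 0.071330 * $b,   // Y
        0.020183 * $r + 0.129553 * $g + 0.939180 * $b    // Z
    );
}

function xyz_to_luv($x, $y, $z, $xn, $yn, $zn) {
    // Chromaticity coordinates u', v' for the pixel and for the white point.
    $d   = $x  + 15 * $y  + 3 * $z;
    $dn  = $xn + 15 * $yn + 3 * $zn;
    $u1  = ($d > 0) ? 4 * $x / $d : 0.0;
    $v1  = ($d > 0) ? 9 * $y / $d : 0.0;
    $u1n = 4 * $xn / $dn;
    $v1n = 9 * $yn / $dn;

    // Lightness L* with the 0.008856 threshold used above.
    $t = $y / $yn;
    $L = ($t > 0.008856) ? 116 * pow($t, 1 / 3) - 16 : 903.3 * $t;

    return array($L, 13 * $L * ($u1 - $u1n), 13 * $L * ($v1 - $v1n));
}

// White point assumed to be the XYZ of RGB white (1, 1, 1).
list($xn, $yn, $zn) = rgb_to_xyz(1.0, 1.0, 1.0);
list($x, $y, $z)    = rgb_to_xyz(0.2, 0.5, 0.8);         // example pixel
list($L, $u, $v)    = xyz_to_luv($x, $y, $z, $xn, $yn, $zn);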


The fingerprint, or signature vector, is composed of three parts: a grayscale DCT part, a color DCT part and an EDH (edge direction histogram) part.

The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance with respect to the image's visual quality. The DCT transforms a signal from the spatial domain into the frequency domain; a signal in the frequency domain contains the same information as one in the spatial domain. The order of values obtained by applying the DCT runs from lowest to highest frequency. This property, together with the psychological observation that the human eye is less sensitive to the higher-order frequencies, makes it possible to compress a spatial signal by transforming it to the frequency domain and dropping the high-order values while keeping the low-order ones.
The general equation for the 2D DCT of an N by M image is:

F(u,v) = (2/N)^(1/2) * (2/M)^(1/2) * Λ(u) * Λ(v) * Σ(i=0..N-1) Σ(j=0..M-1) f(i,j) * cos[π*u*(2*i+1)/(2*N)] * cos[π*v*(2*j+1)/(2*M)]

where Λ(ξ) = 1/√2 for ξ = 0 and Λ(ξ) = 1 otherwise.

The basic operation of the DCT is as follows:

• The input image is N by M;
• f(i,j) is the intensity of the pixel in row i and column j;
• F(u,v) is the DCT coefficient in row u and column v of the DCT matrix;
• For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT;
• The DCT input is typically an 8 by 8 array of integers containing each pixel's gray scale level;
• 8 bit pixels have levels from 0 to 255.

The output array of DCT coefficients contains integers; these can range from -1024 to 1023.

[Figure: the 64 (8 x 8) DCT basis functions.]

A 2D-DCT is calculated over the L channel (luminosity) of the image. The first coefficient is the DC value, the average luminosity of the image. The following coefficients represent higher order values with increasing frequency. A number of these coefficients (the first 10) are taken and normalized to form the grayscale part of the fingerprint. This part of the fingerprint represents the basic composition of the image.
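The sketch below is an illustrative PHP implementation of the general 2D DCT above and of the coefficient selection just described; it is not the paper's code, and the row-major scan used to pick the "first" coefficients is an assumption. The same routine, applied to the u* and v* channels with only the first three coefficients kept, yields the color part of the fingerprint described next.

// Sketch: direct (O(N^2 M^2)) 2D DCT following the general equation above,
// then selection of the first few low-frequency coefficients.
function dct2($f) {
    $n = count($f);                    // number of rows
    $m = count($f[0]);                 // number of columns
    $F = array();
    for ($u = 0; $u < $n; $u++) {
        for ($v = 0; $v < $m; $v++) {
            $sum = 0.0;
            for ($i = 0; $i < $n; $i++) {
                for ($j = 0; $j < $m; $j++) {
                    $sum += $f[$i][$j]
                          * cos(M_PI * $u * (2 * $i + 1) / (2 * $n))
                          * cos(M_PI * $v * (2 * $j + 1) / (2 * $m));
                }
            }
            $lu = ($u == 0) ? 1 / sqrt(2) : 1.0;          // Λ(u)
            $lv = ($v == 0) ? 1 / sqrt(2) : 1.0;          // Λ(v)
            $F[$u][$v] = sqrt(2.0 / $n) * sqrt(2.0 / $m) * $lu * $lv * $sum;
        }
    }
    return $F;
}

// Read the DCT matrix row by row starting from the top-left corner and keep
// the first $k coefficients (the exact scan order is an assumption here).
function low_freq_coeffs($F, $k) {
    $out = array();
    foreach ($F as $row) {
        foreach ($row as $coeff) {
            if (count($out) == $k) return $out;
            $out[] = $coeff;
        }
    }
    return $out;
}

// Example: grayscale fingerprint part = first 10 coefficients of the L channel DCT.
$L = array(array(52, 55, 61, 66), array(70, 61, 64, 73),
           array(63, 59, 55, 90), array(67, 61, 68, 104));   // toy 4 x 4 luminosity block
$gray_part = low_freq_coeffs(dct2($L), 10);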

Then a 2D-DCT of the two color components (u* and v*) is calculated and used for the color part of the fingerprint. A two-dimensional DCT-II of an image or a matrix is simply the one-dimensional DCT-II performed along the rows and then along the columns (or vice versa). That is, omitting normalization and other scale factors, the 2D DCT-II is given by:

F(u,v) = Σ(i=0..N-1) Σ(j=0..M-1) f(i,j) * cos[(π/N)*(i + 1/2)*u] * cos[(π/M)*(j + 1/2)*v]

Here only the first three coefficients of each color component are taken, since the human eye is much more sensitive to luminosity than to color. This part of the fingerprint represents the color composition of the image with reduced spatial resolution compared to the grayscale part.

On the luminosity channel of the image an EDH (Edge Direction Histogram) is calculated using the Sobel operator. The Sobel kernels rely on central differences, but give greater weight to the central pixels when averaging:

     |-1  0  +1|        |-1  -2  -1|
Gx = |-2  0  +2|   Gy = | 0   0   0|
     |-1  0  +1|        |+1  +2  +1|

This is equivalent to first blurring the image using a 3 × 3 approximation to the Gaussian and then calculating first derivatives, because convolutions (and derivatives) are commutative and associative. These kernels are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one kernel for each of the two perpendicular orientations. The kernels can be applied separately to the input image to produce separate measurements of the gradient component in each orientation (call these Gx and Gy). These can then be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:

|G| = sqrt(Gx^2 + Gy^2)

The histogram consists of eight equal bins: N, NE, E, SE, S, SW, W and NW. This part of the fingerprint represents the "flow directions" of the image.

[Figure: result of applying the Sobel operator to a sample image.]
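The edge direction histogram just described can be sketched in PHP as follows; this is illustrative code, not the paper's implementation. The magnitude threshold, the treatment of border pixels and the exact mapping from gradient angle to the eight compass bins are assumptions of this sketch.

// Sketch: edge direction histogram (EDH) from Sobel gradients of a
// luminosity matrix (illustration only, not the original implementation).
function edge_direction_histogram($lum, $threshold = 50.0) {
    $kx = array(array(-1, 0, 1), array(-2, 0, 2), array(-1, 0, 1));     // Sobel Gx kernel
    $ky = array(array(-1, -2, -1), array(0, 0, 0), array(1, 2, 1));     // Sobel Gy kernel

    $rows = count($lum);
    $cols = count($lum[0]);
    $hist = array_fill(0, 8, 0);       // eight equal orientation bins (N, NE, ..., NW)

    for ($y = 1; $y < $rows - 1; $y++) {            // border pixels are skipped
        for ($x = 1; $x < $cols - 1; $x++) {
            $gx = 0.0;
            $gy = 0.0;
            for ($dy = -1; $dy <= 1; $dy++) {
                for ($dx = -1; $dx <= 1; $dx++) {
                    $gx += $kx[$dy + 1][$dx + 1] * $lum[$y + $dy][$x + $dx];
                    $gy += $ky[$dy + 1][$dx + 1] * $lum[$y + $dy][$x + $dx];
                }
            }
            $mag = sqrt($gx * $gx + $gy * $gy);     // |G| = sqrt(Gx^2 + Gy^2)
            if ($mag < $threshold) {
                continue;                           // ignore weak edges (assumed threshold)
            }
            $angle = atan2($gy, $gx);               // gradient orientation, -pi .. pi
            $bin   = (int) floor(($angle + M_PI) / (2 * M_PI) * 8) % 8;
            $hist[$bin]++;
        }
    }

    $total = array_sum($hist);
    if ($total > 0) {
        foreach ($hist as $b => $count) {
            $hist[$b] = $count / $total;            // normalise so the bins sum to 1
        }
    }
    return $hist;
}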

C. SIMILARITY CHECKER

The similarity checker calculates the nearness of the match between two images by computing a distance value.

Identical images yield the same fingerprint, similar images yield fingerprints that are similar to each other (according to a distance function), and unequal images yield totally different fingerprints. The distance function compares the fingerprints by weighting the coefficients and calculating the Euclidean distance.

The user can input various comparison considerations. For example, if the general brightness of the image should not be considered when comparing, the weight of the DC component can simply be set to zero. For less detail and a more tolerant search, the higher coefficients can be given smaller weights or set to zero. For grayscale searching, the weights of the color components are simply set to zero. For rotation or mirror invariant searching, the components are shuffled accordingly and compared again. This weight vector can be used for a lot of tuning.

The Euclidean distance between points p and q is the length of the line segment joining them. In Cartesian coordinates, if p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) are two points in Euclidean n-space, then the distance from p to q is given by:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)

The smaller the distance value, the more similar the images.
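A minimal sketch of such a weighted distance is given below; it is illustrative only. The layout of the fingerprint array and the position of the DC component within it are assumptions made for the example.

// Sketch: weighted Euclidean distance between two fingerprints, where each
// fingerprint is a flat numeric array (grayscale DCT part, colour DCT part
// and EDH part concatenated). Not the original implementation.
function fingerprint_distance(array $a, array $b, array $w) {
    $sum = 0.0;
    foreach ($a as $i => $ai) {
        $d = $ai - $b[$i];
        $sum += $w[$i] * $d * $d;      // a weight of zero simply ignores that component
    }
    return sqrt($sum);
}

// Example: ignore overall brightness by zeroing the weight of the DC
// coefficient, assumed here to be the first component of the fingerprint.
$fp1 = array(120.0, 3.1, -0.4, 0.9);   // toy fingerprints for illustration
$fp2 = array(80.0, 2.9, -0.2, 1.1);
$weights = array_fill(0, count($fp1), 1.0);
$weights[0] = 0.0;
echo fingerprint_distance($fp1, $fp2, $weights);   // smaller value = more similar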
III. EVALUATION

[Figure: some results from the implementation.]

There exists no clearly defined benchmark for assessing a proposed similarity measure on images. Indeed, human notions of similarity vary, and different definitions are appropriate for different tasks.

The approach specified here has specific strengths and weaknesses. Since this is a web based implementation, it entails large scale image matching, and speed is an important criterion.
Scaling this implementation to a full fledged search engine would require further improvements in the efficiency of the fingerprint calculation, which is central to this application.

A more efficient retrieval system could be constructed by implementing a hierarchical image classification, using certain tagged images and extrapolating the groups with this application.

IV. CONCLUSION

We have presented a search engine capable of finding similar images using an algorithm for measuring perceptual similarity. In this system a web based image crawler is first used in conjunction with the signature generator to create a database of image fingerprints. Fingerprints are constructed using processes that are fast and relatively light on the CPU. During a search, the fingerprint of the subject image is calculated and compared with those in the database. Similarity results are found by calculating Euclidean distances. The comparison is flexible: if desired, images can be compared independently of rotation or of vertical and horizontal mirroring, as well as in grayscale only, depending on which distance function is used.
function is taken.
