Abstract
A statistical correlation model for image retrieval is proposed. This model captures
the semantic relationships among images in a database from simple statistics of user-
provided relevance feedback information. It is applied in the post-processing of
image retrieval results such that more semantically related images are returned to the
user. The algorithm is easy to implement and can be efficiently integrated into an
image retrieval system to help improve the retrieval performance. Preliminary
experimental results on a database of 100,000 images show that the proposed model
could improve image retrieval performance for both content-based and text-based
queries.
Keywords
1. Introduction
As the amount of digital image data available on the Internet and in digital libraries grows rapidly, there is a great need for efficient image indexing and access tools to fully utilize this massive digital resource. Image retrieval is a research area dedicated to addressing this issue, and substantial research efforts have been made. However, by and large, the earlier image retrieval systems have all taken keyword- or text-based approaches to indexing and retrieving image data. Because image annotation is a tedious process, it is practically impossible to annotate all images on the Internet. Furthermore, due to the multiplicity of contents in a single image and the subjectivity of human perception and understanding, different users are also unlikely to annotate the same image in exactly the same way. To address these
limitations, content-based image retrieval (CBIR) approaches have been studied in the
last decade [1, 2, 3, 4, 5]. These approaches work with descriptions based on properties inherent in the images themselves, such as color, texture, and shape, and utilize them for retrieval purposes. Since visual features are automatically
extracted from images, automated indexing of image databases becomes possible.
However, despite the many research efforts, the retrieval accuracy of today's CBIR algorithms is still limited and often worse than that of keyword-based approaches. The problem stems from the fact that visual similarity measures, such as color histograms, do not in general match the perceptual semantics and subjectivity of images. In addition, each type of image feature tends to capture only one of many aspects of image similarity, and it is difficult to require a user to specify clearly which aspect
query. To address these problems, interactive relevance feedback techniques have
been proposed [6, 7, 8, 9, 10, 11, 12]. The idea is that we should incorporate human
perception subjectivity into the retrieval process and provide users opportunities to
evaluate retrieval results, and automatically refine queries on the basis of those
evaluations. Lately, this research topic has become the most challenging one in CBIR
research.
The early relevance feedback schemes for CBIR have been mainly adopted from text
document retrieval research and can be classified into two approaches: query point
movement (query refinement) and re-weighting (similarity measure refinement).
Both have been built based upon the vector model in information retrieval theory [13,
14]. Recently, more computationally robust methods that perform global optimization
have been proposed. The MindReader retrieval system formulates a minimization
problem on the parameter estimating process [9]. It allows for correlations between
attributes in addition to different weights on each component.
However, as presented above, while all the approaches adapted from text document
retrieval do improve the performance of CBIR, there are severe limitations: even with
feedback, it is still difficult to capture high level semantics of images when only low-
level image features are used in queries. The inherent problem with these approaches
is that the low-level features are often not as powerful in representing complete
semantic content of images as keywords in representing text documents. In other
words, applying the relevance feedback approaches used in text document retrieval
technologies to low-level feature based image retrieval will not be as successful as in
text document retrieval. Using low-level features alone is not effective in representing users' feedback or in describing their intentions. Furthermore, in these algorithms, the semantics potentially captured during the relevance feedback process in one query session are not memorized to continuously improve the retrieval performance of the system. To overcome these limitations, another school of thought is to
use learning approaches in incorporating semantics in relevance feedback [15, 16, 17,
18].
The PicHunter framework further extended the relevance feedback and learning idea
with a Bayesian approach [17]. With an explicit model of what users would do given the target image they want, PicHunter uses Bayes' rule to predict the target they want, given their actions. This is done via a probability distribution over
possible image targets, rather than refining a query. To achieve this, an entropy-
minimizing display algorithm is developed that attempts to maximize the information
obtained from a user at each iteration of the search. Also, this proposed framework
makes use of hidden annotation rather than a possibly inaccurate and inconsistent
annotation structure that the user must learn and make queries in. However, this
could be a disadvantage as well since it excludes the possibility of benefiting from
good annotations, which may lead to a very slow convergence.
Motivated by the work of statistical language modeling [19] and the link structure
analysis in web page search [20, 21], we propose a statistical correlation model that is
able to accumulate and memorize the semantic knowledge learnt from the relevance
feedback information of previous queries. We have also developed an effective
algorithm to apply this model in image retrieval so as to help yield better results for
future queries. This model simply estimates the probability of how likely two images
are semantically similar to each other based on the co-occurrence frequency that both
images are labeled as positive examples during a query / feedback session. It can be
trained from the users’ relevance feedback log, and dynamically updated during the
image retrieval process. The algorithm is so simple that it can be easily incorporated
into an image retrieval system.
Preliminary versions of this paper appeared in the proceedings of the 3rd International Workshop on Multimedia Information Retrieval (MIR 2001) [22] and as a keynote at the 1st International Workshop on Pattern Recognition in Information Systems (PRIS 2001).
The remainder of this paper is organized as follows. In Section 2, the definition of the
correlation model is introduced. In Section 3, the training algorithms of the model are
described. In Section 4, the image ranking schemes based on the correlation model
are explained. Preliminary experimental results on a database of 100,000 images are
presented in Section 5. Finally, concluding remarks are given in Section 6.
2. The Correlation Model
The main idea behind the proposed model is the assumption that two images represent
similar semantics if they are jointly labeled as relevant to the same query in a
relevance feedback phase. Accordingly, the model estimates the semantic correlation
between two images based on the number of search sessions in which both images are
relevant examples. A search session starts with a query phase, and is possibly
followed by one or more feedback phases. For simplicity, the number of search sessions in which two images are co-relevant is referred to as their bigram frequency, while the number of sessions in which a single image is relevant is referred to as its unigram frequency. The maximum value over all unigram and bigram frequencies is referred to as the maximum frequency. Intuitively, the larger the
bigram frequency is, the more likely that these two images are semantically similar to
each other, so the higher the semantic correlation between them. Ideally, the
correlation strength might be defined as the ratio between the bigram frequency and
the total number of search sessions. In practice, however, there are many images in
the database, and users are usually reluctant to provide feedback information.
Therefore, the bigram frequency is very small with respect to the number of queries.
Here, we define the semantic correlation between two images as the ratio between the
bigram frequency and the maximum frequency. Since the definition of bigram
frequency is symmetric, the semantic correlation is also symmetric. The self-
correlation, i.e., the correlation between an image and itself, is defined in a similar
way, except that the bigram frequency is changed with the unigram frequency of this
image. By definition, the correlation strength is within the interval between 0 and 1.
To be specific,

0 ≤ R(I, J) ≤ 1,
R(I, J) = R(J, I),
R(I, J) = U(I) / M,   if I = J,
R(I, J) = B(I, J) / M,   if I ≠ J,

where I and J are two images, B(I, J) is their bigram frequency, U(I) is the unigram frequency of image I, M is the maximum frequency, and R(I, J) is the semantic correlation strength between images I and J.
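For concreteness, the model can be sketched in a few lines of Python. The dictionary-based frequency storage and the function name below are illustrative assumptions for this sketch, not the system's actual implementation:

```python
def correlation(i, j, unigram, bigram, max_freq):
    """Semantic correlation R(I, J).

    unigram: dict mapping image id -> U(I);
    bigram:  dict mapping an ordered pair (min, max) -> B(I, J),
             stored once since B is symmetric;
    max_freq: M, the maximum over all unigram and bigram frequencies.
    """
    if max_freq == 0:
        return 0.0
    if i == j:
        # self-correlation uses the unigram frequency
        return unigram.get(i, 0) / max_freq
    # symmetric lookup: R(I, J) = R(J, I)
    return bigram.get((min(i, j), max(i, j)), 0) / max_freq
```

Because the lookup key is the ordered pair, symmetry of R follows directly from the symmetry of B.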
3. Training Algorithms
The proposed correlation model is solely determined by the unigram and bigram
frequencies of images in the database. An intuitive training method is to obtain these
frequencies from the statistics of user-provided feedback information collected in the
user log. Let A denote the query-image adjacency matrix. Its (i, j )th entry is equal
to 1 if the j th image is relevant to the i th query, and is equal to 0 otherwise. Then the
co-relevant matrix AT A contains all the necessary information: its diagonal entries are the unigram frequencies, while its off-diagonal entries are the bigram frequencies. This method is quite simple and can be used to dynamically update the correlation model during the image retrieval process.
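For illustration, this computation amounts to a single matrix product. The following NumPy sketch uses a hypothetical three-query, four-image log and recovers both frequency types at once:

```python
import numpy as np

# Toy query-image adjacency matrix A (hypothetical log):
# A[i, j] = 1 if image j is relevant to query i, else 0.
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0]])

C = A.T @ A            # co-relevant matrix
unigram = np.diag(C)   # diagonal entries: unigram frequencies U(I)
# off-diagonal entries C[i, j], i != j, are the bigram frequencies B(I, J)
```

Since C is symmetric by construction, the symmetry of the bigram frequencies comes for free.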
In our solution, the definition of unigram and bigram frequency is extended to take
account of irrelevant images. For a specific search session, we assume a positive
correlation between two positive (relevant) examples, and the corresponding bigram
frequency is increased. We assume a negative correlation between a positive example
and a negative (irrelevant) example, and their bigram frequency is decreased.
However, we do not assume any correlation between two negative examples, because
they may be irrelevant to the user’s query in many different ways. Accordingly, the
unigram frequency of a positive example is increased, while that of a negative
example is decreased. The non-feedback images are not automatically treated as
negative examples in our proposed model. Therefore, these images are excluded from
the calculation of unigram and bigram frequencies.
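One session's update under this extended scheme might be sketched as follows. The dictionary-based storage and the function name are our assumptions for illustration; only labeled images are touched, matching the rule that non-feedback images are excluded:

```python
def update_frequencies(positives, negatives, unigram, bigram):
    """Apply one search session's feedback to the frequency counts.

    positives / negatives: sets of image ids labeled in this session.
    unigram: dict image -> U(I); bigram: dict (min, max) pair -> B(I, J).
    """
    # unigram: increase for positives, decrease for negatives
    for i in positives:
        unigram[i] = unigram.get(i, 0) + 1
    for i in negatives:
        unigram[i] = unigram.get(i, 0) - 1
    # positive-positive pairs: positive correlation, bigram increased
    pos = sorted(positives)
    for a in range(len(pos)):
        for b in range(a + 1, len(pos)):
            key = (pos[a], pos[b])
            bigram[key] = bigram.get(key, 0) + 1
    # positive-negative pairs: negative correlation, bigram decreased
    for p in positives:
        for n in negatives:
            key = (min(p, n), max(p, n))
            bigram[key] = bigram.get(key, 0) - 1
    # negative-negative pairs: no correlation assumed, nothing updated
```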
To overcome the problem of data sparseness, the feedback from search sessions with the same query, either a text query or an image example, is grouped together so that feedback images from different sessions can contribute correlation information to one another.
Within each group of search sessions with the same query, the local unigram
frequency of each image, which is referred to as unigram count, is calculated at first.
Based on these counts, the global unigram and bigram frequencies are updated.
4. Image Ranking Schemes
The basic idea is to reorder the retrieved images based on the correlation model. It
comes from the following observations. Given a query, the similarity between the
query and an image in the database is measured based on their feature vectors. If the
user provides any relevance feedback, the similarity measure is refined accordingly.
Images with the highest similarities are returned as the retrieval results. Among these images, some are relevant to the query and some are not. In general, the retrieval precision declines as the number of images in consideration increases. This
implies that the feature-based similarity is also a measure of relevance although it is
often not good enough. On the other hand, the retrieved images exhibit different
relationships. Since relevant images convey semantically similar content with respect
to the query, it is likely that previous users have already judged them as co-relevant
through relevance feedback. Therefore, the correlation strength between two relevant
images is expected to be high. In contrast, as irrelevant images may be semantically
different from the query in many different aspects, it is unlikely that they have been
jointly labeled as relevant. Thus, the correlation strength between two irrelevant
images is expected to be low. Similarly, the correlation between a relevant image and
an irrelevant one is also expected to be low. Therefore, it is reasonable to assume that
images having strong correlations with the top-ranked images are likely to be relevant
to the query, even if their similarity scores defined in the feature space are low.
With each retrieved image, we associate a non-negative relevance score, which can be
treated as semantic similarity, and is initialized to its feature-based similarity. Then,
we make use of the relationship between images to iteratively update the scores in the
following way. Relevance scores are propagated into other images via the correlation
model, and each image receives a refined score: the sum of all relevance scores, each weighted by the correlation strength between this image and the others in the retrieved list. The refined relevance scores are further propagated into
others. This process repeats until the properly normalized scores converge to some
equilibrium values.
Suppose there are n images returned by the system, denoted I_1, I_2, …, I_n, ranked in descending order of their similarities. Let P denote the vector of relevance scores, and W be an n × n matrix whose (i, j)th entry is equal to the correlation strength between images I_i and I_j. The iterative refinement of relevance scores is equivalent to P' = λ_k W^k P with k increasing without bound, where λ_k is a normalization factor and P' is the vector of refined scores. As W is a symmetric matrix with only non-negative entries, it can be proved that the unit vector in the direction of P' converges to the principal eigenvector of W, which corresponds to the largest eigenvalue of W and has only non-negative entries [20]. This leads to a possible image ranking scheme: images are re-ranked based on the corresponding coordinates of the principal eigenvector of W.
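This eigenvector-based scheme can be sketched with simple power iteration. The NumPy sketch below is illustrative; the fixed iteration count is our assumption, chosen to reach convergence on small examples:

```python
import numpy as np

def eigen_rank(W, iters=50):
    """Re-rank images by the principal eigenvector of W.

    W: symmetric, non-negative correlation matrix.
    Returns image indices sorted by descending eigenvector coordinate.
    """
    n = W.shape[0]
    p = np.ones(n) / np.sqrt(n)      # start from a uniform vector
    for _ in range(iters):
        p = W @ p                     # propagate scores: W^k P
        p /= np.linalg.norm(p)        # normalization factor lambda_k
    return np.argsort(-p)             # descending coordinate order
```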
This ranking method does improve the image retrieval precision in our experiments.
However, it is not reliable enough. Unlike the link structure analysis in web page
search [20, 21], the correlation model is trained using the limited feedback
information available. Thus, it may not be well trained and may be inaccurate in some sense, so we cannot rely on it alone to re-rank images. Moreover, when the number of retrieved images is large, the extraction of the principal eigenvector becomes computationally inefficient, and images relevant to other semantics might dominate the coordinates of the principal eigenvector. Therefore, a more reasonable method is
to calculate the relevance scores in an efficient way and combine them with the
feature-based similarities in producing the final ranking.
Our ranking scheme is as follows. For image I_j, its relevance score p_j is initialized to its similarity s_j, and is iteratively updated a fixed number of times k according to the following equation:

p_j = ( Σ_{i=1}^{m} p_i × r_ij ) / ( Σ_{i=1}^{m} p_i ),   m ≤ n,   j = 1, 2, …, n,

where r_ij is the correlation strength between images I_i and I_j, i.e., r_ij = R(I_i, I_j). In this equation, only the relevance scores of the top m images are propagated to the others. The final ranking score of image I_j is then the weighted sum of the relevance score p_j and the similarity s_j:

S_j = w × s_j + (1 − w) × p_j,   0 ≤ w ≤ 1,

where w is the semantic weight.
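The complete ranking computation can be sketched as follows. This is an illustrative NumPy sketch: for simplicity it propagates scores from the first m images of the similarity-ranked list and assumes positive similarity scores, so the normalizing sum is never zero:

```python
import numpy as np

def rank_scores(s, R, k=5, m=30, w=0.5):
    """Final ranking scores S_j = w * s_j + (1 - w) * p_j.

    s: feature-based similarities of the n retrieved images,
       assumed positive and listed in descending order;
    R: n x n semantic correlation matrix;
    k: number of propagation iterations; m: images propagated;
    w: semantic weight in [0, 1].
    """
    n = len(s)
    m = min(m, n)
    p = np.asarray(s, dtype=float)     # relevance scores, init to s_j
    for _ in range(k):
        top = p[:m]
        # p_j = sum_i p_i * r_ij / sum_i p_i over the top-m images
        p = (top @ R[:m, :]) / top.sum()
    return w * np.asarray(s) + (1 - w) * p
```

With w = 1 the scheme degenerates to the original feature-based ranking, while w = 0 ranks purely by propagated relevance.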
5. Experiments
We have implemented the correlation model and integrated it with an image search
system [23], which provides the functionalities of keyword-based image search, query by image example, and relevance feedback. In this system, the image database has been greatly expanded and now contains about 100,000 images collected from more than 2,000 representative websites. These images cover a variety of categories, such as "animals", "arts", "nature", etc. Their high-level textual features and low-level visual features are extracted from the web pages containing the images and from the images themselves, respectively [24]. The following six low-level features are used in this
system: color histogram in HSV space with quantization 256, first and second color
moments in Lab space, color coherence vector in LUV space with quantization 64,
MRSAR texture feature, Tamura coarseness feature, and Tamura directionality.
The correlation model is trained using users' search and feedback data collected in the user log. After months of internal use, about 3,000 queries with relevance feedback were collected.
Two experiments were conducted to evaluate the proposed method: one on text-based image retrieval, the other on pure content-based retrieval. For the former, we chose 20
text queries. These queries are the following keywords: car, flower, tree, cat,
submarine, mars, spring, galaxy, movie star, potato, ship, space, tomb raider, woman,
mountain, Clinton, Jordan, angel, dog, and summer. We then asked two subjects to perform the image search experiments. Each of them was required to search for images with every query twice and to label all relevant and irrelevant images within the top 200 results returned by the system, according to his or her own subjective judgment. In the first search, no feedback was given; in the second, three images were selected as either positive or negative examples. All this information was stored in the log, from which
the ground-truth is extracted automatically. For CBIR, 10 image examples were
selected. So far, only one subject performed the query by example experiments. The
ground-truth is obtained by repeatedly conducting relevance feedback until no more
images relevant to the query can be retrieved.
Based on the queries and the ground-truth, the performance evaluation is conducted
automatically by setting different semantic weights. In the experiments, the number
of iterations to refine the relevance scores is set to 5 ( k = 5 ), while the number of
images to propagate relevance scores is set to 30 ( m = 30 ). Because of the
subjectivity of relevance judgment, the image retrieval precision is calculated for each
subject separately, and is averaged finally. The precision is defined as the percentage
of relevant images in the retrieved list.
The experimental results for CBIR are presented in Figure 1, and those for text-based retrieval in Figure 2, where the horizontal axis is the number of top images in consideration, the vertical axis is the corresponding retrieval precision, and w is the semantic weight. It is not surprising that the performance of text-based retrieval is
much higher than that of CBIR. In both cases, the proposed correlation model
significantly improves the retrieval precision. For CBIR, the precision is improved
from 10% to 41% for top 10 images, while from 4.6% to 18.5% for top 100 images.
6. Conclusion
Acknowledgments
We thank Fang Qian, Xiaoxin Yin and Lei Zhang for helping perform the image
search experiments.
References
About the Author – Zheng Chen received his B.S., and Ph.D. degrees in computer
science from Tsinghua University, China, in 1994 and 1999, respectively. He joined
Microsoft Research China in March 1999. His research interests include speech
recognition, natural language processing, information retrieval, multimedia
information retrieval, personal information management, and artificial intelligence.
About the Author – Hong-Jiang Zhang received his B.S. from Zhengzhou University,
China in 1982, and Ph.D. from the Technical University of Denmark in 1991, both in
electrical engineering. His research interests include video and image analysis and
processing, content-based image / video / audio retrieval, media compression and
streaming, computer vision and their applications in consumer and enterprise markets.
He has published over 120 articles in these areas. He is a senior member of the IEEE, and also serves on the editorial boards of 5 professional journals and a dozen committees of various international conferences.