A Survey of Methods used for Fast Retrieval of Images in Large scale dataset
1 M.Phil. Scholar, Department of Computer Science, Gobi Arts and Science College, Gobi, Tamil Nadu, India.
2 Assistant Professor, Department of Computer Science, Gobi Arts and Science College, Gobi, Tamil Nadu, India.
Abstract
An image retrieval system is a computer system for searching and retrieving images from a large database of digital images. The most common traditional methods of image retrieval add metadata such as keywords, captions, or descriptions to the images, so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious, and expensive. To address this issue, a large amount of research has been done on automatic image annotation. In addition, the rapid growth of social web applications and the semantic web has inspired the development of various web-based image annotation tools. With the rapid evolution of the Internet and the explosive growth of visual content on the Web, large-scale image search has attracted considerable attention. Fast similarity search for an image in a large-scale dataset remains an open research issue. This work analyzes image retrieval in large databases through the various mechanisms available in the literature.
I. INTRODUCTION
Online image repositories now contain hundreds of millions of images of all kinds of content, quality, and size. Flickr is one example of such a repository. These repositories grow rapidly day by day, which makes techniques for searching, indexing, and navigating them necessary. In current research, indexing is mainly based on manually entered tags or on the usage patterns of individuals and groups. A manually entered tag is not guaranteed to describe the actual content of the image. Consider, for example, the tag "Christmas" on Flickr. As one might expect, only a very small share of the tagged images depict the religious event. Instead, the tag often denotes the date and time of creation, so many party and vacation photos with no common theme show up.
The ambiguity and subjectivity of tags make image retrieval based on manually entered tags quite difficult. Similarity search, also known as nearest neighbor search, is one of the most fundamental problems in the image retrieval and machine learning research communities. It defines the task of finding samples that are close or closely related to a given query. It is important for several multimedia applications, such as classification, annotation, and content-based multimedia retrieval. Recently, with the rapid evolution of the Internet and the explosive growth of visual content on the Web, large-scale image search has attracted considerable attention. Comparing the query image with every sample in the database is infeasible in practice because a scan of linear complexity does not scale. For example, the photo sharing website Flickr has over 4 billion images, and the video sharing website YouTube receives more than 20 hours of uploaded video per minute. In addition, many large-scale content-based image retrieval applications suffer from the curse of dimensionality, because visual descriptors usually have hundreds or even thousands of dimensions. Storing the original data therefore becomes a serious problem, and exhaustive search becomes infeasible. The methods surveyed below address these issues and have been considered effective. This paper presents an analysis of methods from the literature for fast image retrieval in large datasets.
II. IMAGE RETRIEVAL BASED ON SIMILARITY
A. Region based similarity
In [1], an image retrieval system based on a segmented representation of the visual content is presented. The segmented representation allows images to be compared by content, and the result of this comparison is more "semantic" than that of a classical global comparison. The system compares regions using fuzzy similarity measures, which were shown to be psychologically intuitive and easy to aggregate. Exploiting the aggregation of the similarity measures obtained per region further lets the user build four different types of original visual requests. The resemblance between regions is more elaborate than the global one. In [1], the authors consider several aspects of comparison: color, shape, and position. For each aspect, a region is represented as a vector. Because these three vectors belong to different spaces and cannot be related to one another, the similarity between a pair of regions is described by three different measures, one for each pair of vectors. Formally, let R(i) and I(j) be two regions extracted from the request image R and the database image I, respectively. The region similarity $S_{reg}$ is an aggregation of three sub-measures:
$S_{reg|color}$ measures color similarity based on the histogram.
$D_{reg|position}$ evaluates the proximity of the centers of the two regions.
$S_{reg|shape}$ measures shape similarity based on the minimum bounding rectangles (MBR) of R(i) and I(j).
These three measures are combined by a weighted-average aggregation operator:
$$S_{reg}(R(i), I(j)) = c \cdot S_{reg|color}(R(i), I(j)) + p \cdot D_{reg|position}(R(i), I(j)) + s \cdot S_{reg|shape}(R(i), I(j)), \quad c + p + s = 1$$
The parameters c, p, and s balance the influence of each of the three feature spaces on the region similarity. They can be set to c = 0.6 and s = 0.2, which leaves p = 0.2 under the constraint c + p + s = 1.
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013
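The aggregation of the three sub-measures can be illustrated with a minimal Python sketch. The function name `region_similarity` and the example values are hypothetical. The sub-measures are assumed to be already computed and to lie in [0, 1], and p = 0.2 is assumed so that the weights sum to 1.

```python
def region_similarity(s_color, d_position, s_shape, c=0.6, p=0.2, s=0.2):
    """Weighted aggregation of the color, position and shape sub-measures.

    The defaults c=0.6 and s=0.2 follow the text; p=0.2 is an assumption
    derived from the constraint c + p + s = 1.
    """
    if abs(c + p + s - 1.0) > 1e-9:
        raise ValueError("weights c, p and s must sum to 1")
    return c * s_color + p * d_position + s * s_shape


# Hypothetical sub-measure values for one pair of regions R(i), I(j).
score = region_similarity(s_color=0.9, d_position=0.5, s_shape=0.7)
print(round(score, 2))  # 0.78
```

Because color carries the largest weight, two regions with similar color histograms score high even when their positions differ considerably.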
B. Similarity evaluation using simple features
The recent development of complex multimedia applications has called for new methods for organizing and retrieving video sequences and still images. In [2], results of a study on similarity evaluation in image retrieval are presented, using color, object orientation, and relative position as content features. The feature descriptors and queries are described, and query performance is evaluated with a simple prototype system. The feature extraction process is completely automated and requires no user intervention. The system in [2] is not a general-purpose tool; it is oriented to thematic image repositories, where the semantics of the stored images are limited to a specific domain. This matches the way most image collections available in the public domain or through professional and commercial distribution channels are organized: in sub-collections (directories), each covering a separate theme.
C. Fast Kernel machines
In [3], an approach to learning image similarity from Flickr groups was developed. The motivation of [3] is that Flickr groups show how people like to group similar images on the Internet. Accordingly, two images are considered similar if they are likely to belong to the same set of Flickr groups. The authors of [3] also describe SIKMA, an algorithm for training an SVM with the histogram intersection kernel on tens of thousands of training examples. SIKMA is used to train classifiers that predict Flickr group membership. The experimental results in [3] give strong evidence that the learned image similarity works better than directly measuring similarity with visual features on tasks such as image matching and unsupervised clustering.
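The histogram intersection kernel mentioned above has a simple closed form: the sum of the element-wise minima of two histograms. The following Python sketch shows only this kernel, not the SIKMA training procedure of [3]. The function name and the example histograms are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two hypothetical L1-normalised 4-bin histograms.
a = [0.5, 0.25, 0.25, 0.0]
b = [0.25, 0.25, 0.0, 0.5]
print(histogram_intersection(a, b))  # 0.25 + 0.25 + 0.0 + 0.0 = 0.5
```

For L1-normalised histograms the kernel value lies in [0, 1] and equals 1 only when the two histograms are identical.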
III. SIMILARITY SEARCH VIA HASHING
A. Self-taught Hashing
In [5], Locality-sensitive Hashing (LSH) is proposed for devising main-memory algorithms for nearest neighbor search. Unlike the authors' earlier work, the scheme does not require each hash bucket to store only one point. The approach achieves better running time, and its analysis generalizes to the case of secondary memory. Experiments were conducted on two data sets. The first contains 20,000 histograms of color images from Corel. The second contains 270,000 points. Two performance measures are reported in [5]: speed and accuracy. The approach is compared with the SR-tree. The experimental results indicate that it works well even for large numbers of dimensions and large data sizes. An additional advantage reported in [5] is that the running time is essentially determined in advance.
B. Locality-sensitive Hashing
The fundamental premise of peer-to-peer (P2P) systems is that individual peers voluntarily share resources. There is, however, an inherent tension between individual rationality and collective welfare that threatens the viability of these systems. In [7], research at the intersection of computer science and economics is discussed; it targets the design of distributed systems consisting of rational participants with diverse and selfish interests. In particular, major findings and open questions related to free-riding in P2P systems are discussed in [7], such as the factors affecting the degree of free-riding, incentive mechanisms to encourage user cooperation, and challenges in the design of incentive mechanisms for P2P systems.
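The bucket-based lookup behind locality-sensitive hashing, as described for [5] in this section, can be sketched in a few lines of Python. This is a minimal illustration that assumes binary vectors compared under Hamming distance and uses bit sampling as the hash family; it is not the exact scheme or parameter choice of [5]. The class name `BitSamplingLSH` and all parameters are hypothetical.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """LSH for binary vectors under Hamming distance (bit sampling)."""

    def __init__(self, dim, bits_per_key=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table hashes a vector by reading a fixed random subset of bits.
        self.projections = [
            [rng.randrange(dim) for _ in range(bits_per_key)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    @staticmethod
    def _key(vec, positions):
        return tuple(vec[i] for i in positions)

    def add(self, vec):
        """Store a vector; a bucket may hold any number of points."""
        idx = len(self.points)
        self.points.append(vec)
        for table, positions in zip(self.tables, self.projections):
            table[self._key(vec, positions)].append(idx)
        return idx

    def query(self, vec):
        """Return indices of colliding points, nearest (Hamming) first."""
        candidates = set()
        for table, positions in zip(self.tables, self.projections):
            candidates.update(table.get(self._key(vec, positions), []))

        def hamming(i):
            return sum(a != b for a, b in zip(self.points[i], vec))

        return sorted(candidates, key=hamming)


index = BitSamplingLSH(dim=8, bits_per_key=3, num_tables=4, seed=0)
for v in ([0, 0, 0, 0, 1, 1, 1, 1],
          [1, 1, 1, 1, 0, 0, 0, 0],
          [0, 0, 0, 0, 1, 1, 1, 0]):
    index.add(v)
print(index.query([0, 0, 0, 0, 1, 1, 1, 1]))  # index 0 is always first
```

Only the points that share a bucket with the query in at least one table are compared exactly, which is what avoids the linear scan. Using more tables raises the chance of finding a true neighbor at the cost of memory.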
C. Semi-Supervised Hashing
In [6], a semi-supervised hashing method is discussed. The method is formulated as minimizing the empirical error on the labeled data while maximizing the variance and independence of the hash bits over both the labeled and the unlabeled data. It can handle semantic as well as metric similarity. Experimental results on two large datasets (up to one million samples) demonstrate its superior performance over unsupervised and supervised methods. In [6], a semi-supervised paradigm is presented for learning efficient hash codes that can handle semantic similarity and dissimilarity among the data points. The method leads to a very simple eigen-decomposition based solution, and using iterative solvers to obtain the top eigenvalues and eigenvectors makes the computation faster.
IV. MULTIMEDIA INDEXING, SEARCH AND RETRIEVAL
In [7], the sharing of multimedia content by users who upload videos, audio, and photos to social networks on the web is considered.
A. Event-based Multimedia Indexing
Event detection in web data has recently attracted a lot of research attention, owing to the immense amount of available information and the desire of users to extract and exploit structured information from it. In [7], bridging the semantic gap between plain multimedia analysis and human perception is discussed. To support new query types and a more human-centered retrieval process, multimedia researchers have developed methods to detect events and link them to multimedia. Multimedia content can then be retrieved through event identification and indexing.
B. Time-related multimedia indexing
As discussed in [7], because of the speed at which multimedia content is shared on the social web, social media databases accumulate large volumes of data in very short time windows. A major difference between common multimedia databases and social web databases is therefore that the latter grow constantly and are continually populated with fresh content. Users consequently pose complicated queries that include time- or date-related information. These databases store millions of new records every day, so queries that filter the retrieved content by a specified time window are needed. To enable efficient search and retrieval, indexing of time-evolving social multimedia databases is required.
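A time-window filter of the kind described above can be sketched with a sorted timestamp list and binary search. This is a minimal in-memory illustration, not the indexing structure of [7]. The class name `TimeIndex`, the integer timestamps, and the item identifiers are hypothetical.

```python
import bisect


class TimeIndex:
    """Keeps (timestamp, item_id) pairs sorted by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._items = []

    def add(self, timestamp, item_id):
        # Insert at the sorted position so window() can use binary search.
        pos = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(pos, timestamp)
        self._items.insert(pos, item_id)

    def window(self, start, end):
        """Return item ids with start <= timestamp <= end, oldest first."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return self._items[lo:hi]


idx = TimeIndex()
for t, name in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    idx.add(t, name)
print(idx.window(15, 30))  # ['b', 'c']
```

Locating the window boundaries takes logarithmic time, so the cost of a query is dominated by the number of items it returns. A production system would pair such a temporal filter with a content-based index.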
V. GEOMETRY-PRESERVING VISUAL PHRASES
In [8], an approach that encodes more spatial information into the bag-of-visual-words (BoV) representation is proposed, and it is efficient enough to be applied to large-scale databases. The spatial information is encoded through geometry-preserving visual phrases (GVP). In addition to co-occurrences, the GVP method captures the local and long-range spatial layouts of the words. The search algorithm based on GVP adds little memory usage or computational time compared with the BoV method. Moreover, [8] shows that the approach can be integrated with the min-hash method to improve retrieval accuracy. The experiments in [8] use the Oxford 5K and Flickr 1M datasets. The results show that the approach outperforms the BoV method even when the latter is followed by RANSAC verification.
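The idea of a visual phrase that preserves geometry can be illustrated by offset voting: matching visual words in two images vote for the displacement between their positions, and words that agree on one displacement keep the same spatial layout in both images. The sketch below is a simplified illustration under stated assumptions, not the algorithm of [8]: it assumes translation-only layouts, quantises the offset directly with a hypothetical cell size `cell`, and uses a hypothetical `(word_id, x, y)` feature format.

```python
from collections import Counter
from math import comb


def count_gvps(features_a, features_b, k=2, cell=10):
    """Count length-k visual phrases whose layout is shared by two images.

    Each feature is a (word_id, x, y) tuple. Every pair of features with the
    same word id votes for the quantised offset between the two positions.
    """
    votes = Counter()
    for word_a, xa, ya in features_a:
        for word_b, xb, yb in features_b:
            if word_a == word_b:
                offset = ((xb - xa) // cell, (yb - ya) // cell)
                votes[offset] += 1
    # n words sharing one offset form C(n, k) phrases of length k.
    return sum(comb(n, k) for n in votes.values())


image_a = [(1, 10, 10), (2, 30, 10), (3, 10, 40)]
# Same three words shifted by (50, 20), plus one distractor for word 2.
image_b = [(1, 60, 30), (2, 80, 30), (3, 60, 60), (2, 0, 0)]
# Same three words, but in a different spatial arrangement.
image_c = [(1, 80, 30), (2, 60, 60), (3, 60, 30)]

print(count_gvps(image_a, image_b))  # 3
print(count_gvps(image_a, image_c))  # 0
```

A plain BoV comparison would rate `image_b` and `image_c` as equally similar to `image_a` on the three shared words, whereas the offset vote separates them.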
VI. LDA-BASED IMAGE RETRIEVAL
In [9], Latent Dirichlet Allocation (LDA) for retrieving images from a large-scale database is discussed. LDA is a generative probabilistic model developed for collections of text documents. In this model, documents are represented by a finite mixture over latent topics, also called hidden aspects, and each topic is characterized by a distribution over words. The main objective of [9] is to model image databases rather than text databases. LDA represents an image as a mixture of topics, i.e., as a mixture of multiple objects.
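The generative process that LDA assumes can be written compactly in its standard form. The symbols below are the customary LDA notation rather than notation taken from this survey, and reading the "words" as quantised local image features (visual words) is an assumption in line with the BoV representation of Section V.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic mixture of image } d \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic assigned to the } n\text{-th word of image } d \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
  && \text{word drawn from the chosen topic's word distribution}
\end{align*}
```

Under this reading, the inferred mixture $\theta_d$ serves as a compact representation of image $d$ that a similarity measure can compare across images.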
This work [9] studies the representation of images by LDA models in the context of query-by-example retrieval on a large real-world image database consisting of more than 246,000 images. The results in [9] show that the approach performs well. Combining an appropriate similarity measure with the LDA-based image representation outperforms previous approaches such as a pLSA-based image representation. A similarity measure based on probabilities, originally developed for information retrieval, gives the best retrieval results.
VII. NONNEGATIVE SHARED SUBSPACE LEARNING
In [10], a shared subspace learning framework is proposed that leverages a secondary source to improve retrieval performance on a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing is explicitly controlled. The framework is validated on image and video retrieval tasks, in which tags from the LabelMe dataset are used to improve image retrieval performance on a Flickr dataset and video retrieval performance on a YouTube dataset.
VIII. CONCLUSION
Image retrieval in large databases is still a problem of interest in the database and vision communities. Many techniques have been proposed for retrieving images quickly from large databases using similarity search. This survey covered fast image retrieval using similarity search based on different hashing techniques, multimedia indexing, and other major techniques such as nonnegative shared subspace learning and LDA-based image retrieval. Each of the surveyed methods performs better in some categories and worse in others. Searching and retrieving images in large collections such as YouTube and Flickr remains an open research issue.
REFERENCES
[1] J.F. Omhover, M. Detyniecki, and B. Bouchon-Meunier, "A Region-Similarity-Based Image Retrieval System," in Proc. Conf. IPMU, 2004.
[2] Eugenio Di Sciascio and Augusto Celentano, "Similarity Evaluation in Image Retrieval Using Simple Features," in Proc. Conf. Storage and Retrieval for Image and Video Databases V, SPIE's Electronic Imaging, 1997, pp. 814.
[3] Gang Wang, Derek Hoiem, and David Forsyth, "Learning Image Similarity from Flickr Groups Using Fast Kernel Machines," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 16 Jan. 2012.
[4] Dell Zhang, Deng Cai, and Jinsong Lu, "Self-taught hashing for fast similarity search," in Proc. 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2010.
[5] Aristides Gionis, Piotr Indyk, and Rajeev Motwani, "Similarity search in high dimensions via hashing," in Proc. Conf., 1999, pp. 518-529.
[6] Jun Wang, Sanjiv Kumar, and Shih-Fu Chang, "Semi-Supervised Hashing for Scalable Image Retrieval," in Proc. Conf. CVPR, 2010.
[7] Theodoros Semertzidis, Dimitrios Rafailidis, Eleftherios Tiakas, Michael G. Strintzis, and Petros Daras, "Multimedia Indexing, Search, and Retrieval in Large Databases of Social Networks," in Social Media Retrieval, 2013, pp. 43-63.
[8] Yimeng Zhang, Zhaoyin Jia, and Tsuhan Chen, "Image Retrieval with Geometry-Preserving Visual Phrases," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 809-816.
[9] Eva Hörster, Rainer Lienhart, and Malcolm Slaney, "Image Retrieval on Large-Scale Image Databases," in Proc. IEEE Transaction, 2007.
[10] Sunil Kumar Gupta, Dinh Phung, Brett Adams, Truyen Tran, and Svetha Venkatesh, "Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval," in Proc. 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 1169-1178.