Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

a School of Computer and Information, Hefei University of Technology, Hefei 230009, China
b Department of EEIS, University of Science and Technology of China, Hefei 230026, China
c College of Computer Science, Zhejiang University, Hangzhou 310027, China
d School of Computing, National University of Singapore, 117417 Singapore, Singapore
Article info

Keywords:
Video advertising
Visual relevance
Product

Abstract
We have witnessed the booming of contextual video advertising in recent years. However, existing advertising systems take only metadata into account, such as titles, descriptions and tags. This kind of text-based contextual advertising reveals a number of shortcomings in ad insertion and ad association. In this paper, we present a novel video advertising system called VideoAder. The system leverages the well-organized media information of the video corpus to embed visually relevant ads at a set of precisely located insertion positions. Given a product, we utilize content-based object retrieval techniques to identify the relevant ads and their potential embedding positions in the video stream. We then formulate ad association as an optimization problem that maximizes the total revenue of the system. Specifically, the Single-Merge and Merge methods are proposed to tackle complex queries in visual representation. Typical Feature Intensity (TFI) is used to train a classifier that automatically decides which method is more representative. Experimental results demonstrate the accuracy and feasibility of the system.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
The explosive growth of online multimedia data has brought new challenges to online video advertising. In traditional video advertising, the association between videos and ads is decided by keyword matching, as in Google AdWords and AdSense [1]. The relevance is calculated on the basis of video metadata such as title, description and tags. However, traditional text-based contextual advertising reveals a number of disadvantages. First, conventional video advertising systems usually determine ad insertion points by metadata analysis [2] or video structure [15], without considering the visual coherence between the ads and the insertion point in the video. This makes the ads highly intrusive to the audience. Second, user-tagged text is generally incomplete and inaccurate; that is, text quality varies in coverage and accuracy with the subjectivity of the text creator, which might decrease the revenue of the advertising system.
There exists a wide variety of advertising schemes for contextual video advertisement. A typical scheme for contextual relevance is the content-relevance system based on the video's webpage (e.g., Google AdSense [16]). YouTube [19] and Hulu [21] select relevant ads by mining contextual metadata and insert ads
Corresponding author.
E-mail address: hongrc.hfut@gmail.com (R. Hong).
0925-2312/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.neucom.2012.04.040
Please cite this article as: R. Hong, et al., Advertising object in web videos, Neurocomputing (2013), http://dx.doi.org/10.1016/j.neucom.2012.04.040
Fig. 2. Examples of products suitable for (a) the Single-Merge approach (Nano 6, Amazon Kindle, iPhone 3GS) and (b) the Merge approach (camera).
2. System framework
The overall system framework is depicted in Fig. 1.
2.1. Preprocessing
All videos of a video community website are decomposed into a series of keyframes, one every 5 s. Intuitively, one keyframe per 5 s
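The fixed-rate sampling described above can be sketched as follows; the function name and the use of integer second offsets are our own illustrative assumptions, not details from the paper.

```python
def keyframe_times(duration_s, step_s=5):
    """Timestamps (in seconds) at which keyframes are sampled,
    one every step_s seconds, as in the preprocessing step."""
    return list(range(0, int(duration_s), step_s))
```

For example, a 22-second clip yields keyframes at 0, 5, 10, 15 and 20 s.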
$$\mathrm{TFI}_p = \frac{1}{2|S_p|}\sum_{i=1}^{|S_p|}\sum_{j=1}^{|M_p|} E\left(S_i^p, M_j^p\right)$$
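The TFI score for a product p can be sketched as below; since the paper does not spell out the form of E(·,·), the sketch assumes it is given as a precomputed |S_p| × |M_p| matrix of pairwise matching scores.

```python
def tfi(match_scores):
    """TFI-style score: sum all pairwise scores E(S_i^p, M_j^p)
    over the |S_p| x |M_p| matrix, normalized by 2*|S_p|."""
    s = len(match_scores)                      # |S_p|
    total = sum(sum(row) for row in match_scores)
    return total / (2 * s)
```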
2.2.2. Merge
We observe that for some products, such as computers and cameras (see Fig. 2(b)), feature points are homogeneously distributed over all sides of the 3D object surface, and all sides occur with similar frequency in videos. Hence the first query representation approach can hardly represent the query integrally,
2.3. Searching
Table 1
The TFI value for 10 query products.

Product              TFI
1. Amazon Kindle     0.593
2. iPhone 3GS        0.669
3. iPod Nano 6       0.516
4. BlackBerry 9700   0.368
5. Cisco 7600 Phone  0.284
6. MacBook Pro       0.157
7. Nikon P7000       0.223
8. Nintendo Wii      0.122
9. ThinkPad          0.161
10. Xbox 360         0.101
[Figure: bar chart (residue removed); legend: Single Merge, Merge. Products 1–4 are labeled S (Single-Merge) and products 5–10 are labeled M (Merge); the bar values are not recoverable.]
$$\sum_{i=1}^{|W_d|} \mathrm{TFIDF}(W_i, d)$$
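The TF-IDF sum over the visual words of a keyframe can be sketched as follows; the paper does not give the exact TF and IDF definitions, so this sketch assumes the standard term-frequency times log inverse-document-frequency weighting.

```python
import math
from collections import Counter

def tfidf_score(doc_words, corpus):
    """Sum of TF-IDF weights over the visual words W_i of one
    keyframe d; corpus is the list of all keyframes' word lists."""
    n_docs = len(corpus)
    tf = Counter(doc_words)
    score = 0.0
    for w, f in tf.items():
        df = sum(1 for d in corpus if w in d)   # document frequency
        idf = math.log(n_docs / df)             # df > 0: doc_words is in corpus
        score += (f / len(doc_words)) * idf
    return score
```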
2.3.2. Filtering
We devise filters to make the search results more satisfactory. After ranking, we filter keyframes as follows:
(a) keyframes with too many or too few detected feature points;
(b) keyframes ranked high but with few matching points;
(c) in the case of one-to-many matching, the redundant matching points.
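Filters (a) and (b) can be sketched as below; the threshold values and dictionary keys are illustrative assumptions, since the paper does not report the cutoffs it used.

```python
def filter_keyframes(ranked, min_feat=50, max_feat=5000, min_match=10):
    """Drop ranked keyframes that (a) have too many or too few
    detected feature points, or (b) rank high but have few
    matching points. Thresholds are hypothetical."""
    kept = []
    for kf in ranked:
        if not (min_feat <= kf["n_features"] <= max_feat):
            continue                      # rule (a)
        if kf["n_matches"] < min_match:
            continue                      # rule (b)
        kept.append(kf)
    return kept
```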
2.3.3. Re-ranking
We apply the computationally expensive geometric verification to the small retrieved image set after searching and filtering the top 100 keyframes. The spatial consistency of the k (k = 10) spatially nearest neighbors is used to filter the visual words.
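One way to realize this spatial-consistency check is sketched below; the paper does not give its exact scoring rule, so this sketch assumes matches are index-paired between the query and the result keyframe and counts matches whose k nearest neighbors overlap in both images.

```python
import math

def knn(points, idx, k):
    """Indices of the k spatially nearest neighbours of points[idx]."""
    order = sorted(range(len(points)),
                   key=lambda j: math.dist(points[idx], points[j]))
    return set(order[1:k + 1])            # skip the point itself

def consistency(q_pts, r_pts, k=10):
    """Fraction of matches whose k-NN neighbourhoods agree between
    query and result keyframe; match i pairs q_pts[i] with r_pts[i]."""
    n = len(q_pts)
    ok = sum(1 for i in range(n) if knn(q_pts, i, k) & knn(r_pts, i, k))
    return ok / n
```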
2.4. Advertisement association optimization
In AdWords [1], each advertiser places bids on a number of keywords and specifies a maximum daily budget. The objective is to maximize the total revenue while respecting ad relevance.
In our framework, let $A = \{a_i\}_{i=1}^{N_a}$ denote the keywords (products) bid by advertisers, containing $N_a$ keywords. Let $B = \{b_j\}_{j=1}^{N_b}$ denote all possible insertion points in the video corpus, containing $N_b$ insertion points. Let $C = \{c_k\}_{k=1}^{N_c}$ denote the candidate ad images for insertion. Online ad insertion can be described as the selection of $M$ keywords, $N$ insertion points and $N$ ad images from $A$, $B$ and $C$; $M$ and $N$ can be given by the publisher of VideoAder. The objective of ad association is to maximize the total revenue while respecting ad relevance. Ad relevance can be measured by two factors: one is the contextual relevance between a keyword and an insertion point; the other is the local similarity between an ad image and the keyframe at the insertion point. Thus, we introduce the following three items for optimization. Let $R_b(a_i)$ denote the daily budget of keyword $a_i$, and $R_r(a_i, b_j)$ the contextual relevance between keyword $a_i$ and insertion point $b_j$. Let $R_s(b_j, c_k)$ denote the local similarity between ad image $c_k$ and the keyframe at insertion point $b_j$. Local similarity requires that we preferentially select the products whose product images are most relevant to the video scenes (e.g., product images sharing the same viewpoint as the product in the video have higher priority than those with different viewpoints).

The ad association can now be formulated as an optimization problem that simultaneously maximizes the three items [22–24]. We introduce the design variables $X = (x_1, x_2, \ldots, x_{N_a})$, $x_i \in \{0,1\}$; $Y = (y_1, y_2, \ldots, y_{N_b})$, $y_j \in \{0,1\}$; and $Z = (z_1, z_2, \ldots, z_{N_c})$, $z_k \in \{0,1\}$, where $x_i$, $y_j$, $z_k$ indicate whether keyword $a_i$, insertion point $b_j$ and ad image $c_k$ are selected ($x_i = 1$, $y_j = 1$, $z_k = 1$) or not ($x_i = 0$, $y_j = 0$, $z_k = 0$). The optimization can be expressed as the following nonlinear 0-1 programming problem [20]:
$$\max_{x,y,z} f(x,y,z) = w_b \sum_{i=1}^{N_a} x_i R_b(a_i) + w_r \sum_{i=1}^{N_a}\sum_{j=1}^{N_b} x_i y_j R_r(a_i, b_j) + w_s \sum_{j=1}^{N_b}\sum_{k=1}^{N_c} y_j z_k R_s(b_j, c_k)$$

$$\text{s.t.}\quad \sum_{i=1}^{N_a} x_i = M,\quad \sum_{j=1}^{N_b} y_j = N,\quad \sum_{k=1}^{N_c} z_k = N,\quad x_i, y_j, z_k \in \{0,1\}$$
The parameters $(w_b, w_r, w_s)$ control the emphasis on daily budget and ad relevance, and satisfy the constraints $0 \le w_b, w_r, w_s \le 1$ and $w_b + w_r + w_s = 1$. They can be set according to the importance of each optimization item.
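Evaluating the objective f(x, y, z) for one candidate selection can be sketched as below; the example weight values are illustrative, not the ones used in the paper.

```python
def objective(x, y, z, Rb, Rr, Rs, wb=0.4, wr=0.3, ws=0.3):
    """f(x,y,z) = wb * sum_i x_i*Rb[i]
                + wr * sum_{i,j} x_i*y_j*Rr[i][j]
                + ws * sum_{j,k} y_j*z_k*Rs[j][k]
    with binary selection vectors x, y, z and wb+wr+ws = 1."""
    f = wb * sum(xi * Rb[i] for i, xi in enumerate(x))
    f += wr * sum(x[i] * y[j] * Rr[i][j]
                  for i in range(len(x)) for j in range(len(y)))
    f += ws * sum(y[j] * z[k] * Rs[j][k]
                  for j in range(len(y)) for k in range(len(z)))
    return f
```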
By examining Eq. (3), we can observe that there are $C_{N_a}^{M}\,C_{N_b}^{N}\,C_{N_c}^{N}\,M N N!$ solutions in total. The search space becomes tremendous when the dataset is large. For practical usage, we introduce a heuristic search algorithm to solve the optimization problem [21]. As described in Algorithm 1, the number of solutions can be significantly decreased to $O(N_a + N_b + N_c + M N_0 N_0)$.
Algorithm 1. The heuristic search algorithm for Eq. (3)

1. Initialize: set the labels of all elements in X, Y and Z to 0.
2. Rank all elements in X by $w_b R_b(a_i)$ in descending order, select the top $M$ elements, and set their labels to 1.
3. Rank all elements in Y by $w_b R_b(a_i) + w_r R_r(a_i, b_j)$ in descending order, and select the top $N_0$ ($N < N_0 \ll N_b$) elements.
4. For each $y_j$ among the top $N_0$ elements of Y, select the $z_k$ with the maximum $w_r R_r(a_i, b_j) + w_s R_s(b_j, c_k)$.
5. For each $x_i$ among the top $M$ elements of X, select the unselected $y_j$ and $z_k$ with the maximum $w_b R_b(a_i) + w_r R_r(a_i, b_j) + w_s R_s(b_j, c_k)$, and set the labels of $y_j$ and $z_k$ to 1.
6. Output all triples with $x_i = 1$, $y_j = 1$, $z_k = 1$.
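A compact sketch of this heuristic is given below. The extracted text leaves the keyword-to-insertion-point pairing ambiguous, so this sketch resolves it greedily per keyword and folds steps 3–5 into one greedy pass; that simplification is our assumption, not the paper's exact procedure.

```python
def heuristic_select(Rb, Rr, Rs, M, N, wb=0.4, wr=0.3, ws=0.3):
    """Greedy sketch of Algorithm 1: rank keywords by weighted budget,
    then let each selected keyword claim the best remaining insertion
    point / ad image pair. Returns (keyword, point, ad) index triples."""
    Na, Nb, Nc = len(Rb), len(Rr[0]), len(Rs[0])
    top_kw = sorted(range(Na), key=lambda i: wb * Rb[i], reverse=True)[:M]
    used_b, used_c, triples = set(), set(), []
    for i in top_kw:
        if len(triples) == N:
            break
        best = None
        for j in range(Nb):
            if j in used_b:
                continue
            for k in range(Nc):
                if k in used_c:
                    continue
                score = wb * Rb[i] + wr * Rr[i][j] + ws * Rs[j][k]
                if best is None or score > best[0]:
                    best = (score, j, k)
        if best:
            _, j, k = best
            used_b.add(j)
            used_c.add(k)
            triples.append((i, j, k))
    return triples
```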
3. Experiment
To ensure that our algorithm and methods work in practice, we conduct experiments on a web video set consisting of 307 videos collected from YouTube. In order to ensure that the results make a significant impact in practice, we concentrate on 10 popular and distinctive product queries for searching and annotation. Typical queries include iPod, iPhone, Camera, etc. All the ads are collected from Amazon. For each product query, we extract the most typical or representative product image from Amazon for the Single-Merge query representation. To obtain multi-view images of products with apparent 3D structure, we search Google Images to form the merged product query representation.
Fig. 4. Search result for three products: (a) search result of Amazon Kindle, (b) search result of iPod Nano 6th and (c) search result of Nikon P7000.
Fig. 5. The demo system of VideoAder: (a) main interface and (b) video displaying and advertising interface.
Acknowledgment
This work was supported by a grant from the National Natural
Science Foundation of China, No. 61172164.
References
[1] A. Mehta, A. Saberi, U. Vazirani, V. Vazirani, AdWords and generalized on-line matching, in: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, Pittsburgh, USA, October 2005.
[2] T. Mei, X.-S. Hua, L. Yang, S. Li, VideoSense: towards effective online video advertising, ACM Multimedia, Augsburg, Germany, 2007.
[3] B.-J. Yi, J.-T. Lee, H.-W. Woo, H.-C. Rim, Contextual video advertising system using scene information inferred from video scripts, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland, July 2010.
[4] T. Lindeberg, Feature detection with automatic scale selection, Int. J. Comput. Vision 30 (2) (1998) 79–116.
[5] D.G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the International Conference on Computer Vision, Kerkyra, Greece, 1999.
[6] M. Wang, X.-S. Hua, J. Tang, R. Hong, Beyond distance measurement: constructing neighborhood similarity for video annotation, IEEE Trans. Multimedia 11 (3) (2009) 465–476.
[7] M. Wang, X.-S. Hua, Active learning in multimedia annotation and retrieval: a survey, ACM Trans. Intell. Syst. Technol. 2 (2) (2011) 10.
[8] M. Wang, X.-S. Hua, R. Hong, J. Tang, G.-J. Qi, Y. Song, Unified video annotation via multigraph learning, IEEE Trans. Circuits Syst. Video Technol. 19 (5) (2009) 733–746.
[9] R. Hong, M. Wang, M. Xu, S. Yan, T.-S. Chua, Dynamic captioning: video accessibility enhancement for hearing impairment, ACM Multimedia, Beijing, China, 2010.
[10] Z.-J. Zha, L. Yang, T. Mei, M. Wang, Z. Wang, T. Chua, X.-S. Hua, Visual query suggestion: towards capturing user intent in Internet image search, ACM TOMCCAP 6 (3) (2010).
[11] R. Hong, J. Tang, Z.-J. Zha, Z. Luo, T.-S. Chua, Mediapedia: mining web knowledge to construct multimedia encyclopedia, Lect. Notes Comput. Sci. 5916 (2010) 556–566.
[12] M. Wang, K. Yang, X.-S. Hua, H.-J. Zhang, Towards a diverse relevant search of social images, IEEE Trans. Multimedia 12 (8) (2010) 829–842.
[13] M. Wang, Y. Sheng, B. Liu, X.-S. Hua, In-image accessibility indication, IEEE Trans. Multimedia 12 (4) (2010) 330–336.
[14] M. Wang, X.-S. Hua, T. Mei, R. Hong, G.-J. Qi, Y. Song, L.R. Dai, Semi-supervised kernel density estimation for video annotation, Comput. Vis. Image Understand. 113 (3) (2009) 384–396.
[15] T. Mei, J. Guo, X.-S. Hua, F. Liu, AdOn: toward contextual overlay in-video advertising, Multimedia Syst. 16 (4) (2010) 335–344.
[16] AdSense. Available at: http://adsense.google.com
[17] R. Hong, J. Tang, H.-K. Tan, S. Yan, C.-W. Ngo, T.-S. Chua, Beyond search: event-driven summarization for web videos, ACM Trans. Multimedia Comput. Commun. Appl. 7 (4) (2011) 35.
[18] R. Hong, M. Wang, G. Li, Z.-J. Zha, T.-S. Chua, Multimedia question answering, IEEE Multimedia 19 (4) (2012) 72–78.
[19] YouTube. Available at: http://www.youtube.com
[20] Y. Gao, J. Tang, R. Hong, S. Yan, Q. Dai, N. Zhang, T.-S. Chua, Camera constraint-free view-based 3-D object retrieval, IEEE Trans. Image Process. 21 (4) (2012) 2269–2281.
[21] Hulu. Available at: http://www.hulu.com
[22] S.H. Srinivasan, N. Sawant, S. Wadhwa, vADeo: video advertising system, ACM Multimedia, Augsburg, Germany, 2007.
[23] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, UK, 2004.