1. PROBLEM DEFINITION
Our work deals with several important domains of computer science, viz.
Knowledge and Data Engineering, Artificial Intelligence (including Data Mining),
Expert Systems, Decision Support Systems and various Information Retrieval systems.
Most commercial search engines return roughly the same results for the same
query, independent of the user's actual interest, i.e. what the user really wants. They
also return some irrelevant data. Our method helps the user get relevant results for
his query. Consider the following example:
If a user submits the query "bank" to a search engine, a normal search engine
will return results about blood banks, money banks and river banks without
considering the user's real interest, i.e. whether he is interested in blood banks, money
banks or river banks. In that case the user has to state his interest manually (explicit
feedback), which takes extra effort. Our personalized search helps the user get relevant
results without this extra manual effort. Our approach uses both positive and
negative preferences to create a user profile, which is then employed by a
clustering algorithm. We will use the SPY NB algorithm to create the user profile,
followed by an agglomerative clustering algorithm.
2. LITERATURE SURVEY
Searching is one of the important tasks in computer science. If searching
takes a long time, it results in high cost and lowers system performance.
Present search techniques do not take user interest into account. Most of
them show results related to the latest data held in their primary memory.
For example, the "Kolaveri Di" song was the most searched song on Google.
Now, even if a user is interested in "cola" information related to the Coca-Cola
Company, he will still be shown results for the Kolaveri Di song, because the results
of present search engines are based on previous users' searches.
This drawback is overcome in the Concept Based User Profiles project, where
user interest is taken into consideration by creating a user profile for each query.
Two algorithms are used:
1. Suffix (SPY NB algorithm)
2. Clustering (Agglomerative clustering algorithm)
The suffix algorithm removes articles from the search query entered by the
user and converts it into keyword form, which can be compared with the
keywords in the database.
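A minimal sketch of this query-to-keyword step, assuming a hypothetical stop-word set (the articles plus a few common function words) and a deliberately naive suffix-stripping rule; a production system would more likely use an established stemmer such as Porter's. The function names `strip_suffix` and `to_keywords` are illustrative, not taken from the project.

```python
import re

# Hypothetical stop-word list: the articles named in the text plus a few function words.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "is"}

# Naive suffix list (assumption), checked in this order.
SUFFIXES = ("ing", "es", "s")


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_keywords(query: str) -> list[str]:
    """Lower-case the query, drop stop words and reduce each remaining word to a crude stem."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [strip_suffix(w) for w in words if w not in STOP_WORDS]


print(to_keywords("The banks of the river"))  # ['bank', 'river']
```

The resulting keywords can then be compared directly with keywords stored in the database, since both sides have been reduced to the same normalised form.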
The clustering algorithm forms clusters of the pages with higher weights,
which are then shown to the user.
During profile creation, the system assigns weights to the pages: +1 for pages
the user is interested in (positive interest) and 0 for the others (negative interest).
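A minimal sketch of this weighting for a single query, assuming that a clicked result page signals positive interest, that a result the user did not click signals negative interest, and that each page is described by a hypothetical set of concept terms. The page weights are then summed into a concept profile; the function names and data layout are illustrative only.

```python
from collections import defaultdict

POSITIVE, NEGATIVE = 1, 0  # page weights as described in the text


def page_weights(results: list[str], clicked: set[str]) -> dict[str, int]:
    """Assign +1 to pages the user clicked on (positive interest) and 0 to the rest."""
    return {url: POSITIVE if url in clicked else NEGATIVE for url in results}


def concept_profile(weights: dict[str, int], concepts: dict[str, set[str]]) -> dict[str, int]:
    """Sum page weights into the concepts each page contains (hypothetical concept map)."""
    profile: dict[str, int] = defaultdict(int)
    for url, weight in weights.items():
        for concept in concepts.get(url, set()):
            profile[concept] += weight
    return dict(profile)


# Example for the query "bank": the user clicked only the money-bank page.
weights = page_weights(["p1", "p2", "p3"], clicked={"p2"})
profile = concept_profile(weights, {"p1": {"blood"}, "p2": {"money", "loan"}, "p3": {"river"}})
print(profile)
```

Concepts from skipped pages stay in the profile with weight 0, so they remain distinguishable from concepts the user was never shown.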
These methods aim at capturing the user's clicking and browsing behaviour. They
deal with the user's clickthrough data, i.e. the documents the user has clicked on.
Clickthrough data in search engines can be thought of as triplets (q, r, c),
where
q = query
r = ranking
c = set of links clicked by user.
For example, the user asked the query "support vector machine", received the
ranking shown in Table 2.1.1, and then clicked on the links ranked 1, 3 and 7. Each
query corresponds to exactly one such triplet.
Table 2.1.1 Document Based Method.
1. Kernel Machines (clicked)
http://svm.first.gmd.de/
2. Support Vector Machine
http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (clicked)
http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines
http://www.support-vector.net/
5. Support Vector Machine and Kernel Methods References
http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES@JISCMAIL.AC.UK
http://www.jiscmail.ac.uk/lists/SUPPORT-VECTOR-MACHINES.html
7. Lucent Technologies: SVM demo applet (clicked)
http://svm.research.bell-labs.com/SVT/SVMsvt.html
8. Royal Holloway Support Vector Machine
http://svm.dcs.rhbnc.ac.uk/
9. Support Vector Machine - The Software
http://www.support-vector.net/software.html
Table 2.1.1 shows the ranking presented for the query "support vector
machine"; the links the user clicked on are those ranked 1, 3 and 7. This strategy for
extracting preference feedback is summarized in the following algorithm.
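A sketch of one common form of this strategy, assuming it is the rule that a clicked link is preferred over every unclicked link ranked above it (sometimes called "click > skip above"). The `Triplet` class mirrors the (q, r, c) notation with 1-based click ranks; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Triplet:
    """Clickthrough record (q, r, c): query, presented ranking, clicked ranks (1-based)."""
    q: str
    r: list[str]
    c: set[int]


def skip_above_preferences(t: Triplet) -> list[tuple[str, str]]:
    """Return (preferred, less_preferred) pairs: each clicked link is preferred
    over every unclicked link that was ranked above it."""
    pairs = []
    for i in sorted(t.c):
        for j in range(1, i):
            if j not in t.c:
                pairs.append((t.r[i - 1], t.r[j - 1]))
    return pairs


ranking = [f"link{k}" for k in range(1, 10)]
t = Triplet("support vector machine", ranking, {1, 3, 7})
print(skip_above_preferences(t))
```

For the ranking in Table 2.1.1 with clicks at ranks 1, 3 and 7 this yields five pairs: link 3 over link 2, and link 7 over links 2, 4, 5 and 6. The top-ranked click produces no pair, since no link was skipped above it.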
Table 2.2 (a) Document-term matrix DT (documents D1-D4; terms apple, recipe, pudding, football, soccer, fifa)
Table 2.2 (b) Document-category matrix DC (documents D1-D4; categories COOKING and SOCCER)
Category-term weights for COOKING and SOCCER over the same terms
user and applied the same profile to all of the user's queries. We believe that different
queries from a user should be handled differently, because a user's preferences may
vary across queries. For example, a user who prefers information about fruit for the
query "orange" may prefer information about Apple Computer for the query
"apple".
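A minimal sketch of keeping one profile per query rather than one per user, assuming profiles are stored as concept-weight dictionaries keyed by a hypothetical (user, query) pair. The class name `QueryProfiles` and its methods are illustrative only.

```python
from collections import defaultdict


class QueryProfiles:
    """Keeps a separate concept-weight profile for each (user, query) pair
    instead of a single profile per user (hypothetical structure)."""

    def __init__(self) -> None:
        self._profiles: dict[tuple[str, str], dict[str, float]] = defaultdict(dict)

    def update(self, user: str, query: str, concept: str, weight: float) -> None:
        """Add weight to a concept in the profile for this user's query."""
        profile = self._profiles[(user, query.lower())]
        profile[concept] = profile.get(concept, 0.0) + weight

    def profile(self, user: str, query: str) -> dict[str, float]:
        """Return a copy of the profile; unseen queries give an empty profile."""
        return dict(self._profiles.get((user, query.lower()), {}))


p = QueryProfiles()
p.update("u1", "orange", "fruit", 1.0)
p.update("u1", "apple", "apple computer", 1.0)
print(p.profile("u1", "orange"), p.profile("u1", "apple"))
```

The same user thus gets a fruit-oriented profile for "orange" and a computer-oriented profile for "apple", matching the example above.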
3.1.1 Project scope:
The proposed system covers important domains of computer science, viz.
Knowledge and Data Engineering, Artificial Intelligence (including Data Mining),
Expert Systems, Decision Support Systems and various Information Retrieval systems.
We focus on search engine personalization and develop several concept-based
user profiling methods that are based on both positive and negative preferences.
The profiles that capture and utilize both the user's positive and negative
preferences perform best. Profiles with negative preferences can increase the
separation between similar and dissimilar queries.
This separation provides a clear threshold at which the agglomerative clustering
algorithm can terminate, and improves the overall quality of the resulting query clusters.
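A minimal sketch of threshold-terminated agglomerative clustering over query profiles. Assumptions made here: each query's profile is a sparse concept-weight vector in which negative preferences carry negative weights (so that they can push dissimilar queries apart), similarity is cosine, clusters are merged by average linkage, and merging stops when the best pair falls below a hypothetical threshold `min_sim`. The names, profiles and threshold value are illustrative, not taken from the project.

```python
import math
from itertools import combinations

Profile = dict[str, float]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse concept-weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(profiles: dict[str, Profile], min_sim: float) -> list[set[str]]:
    """Average-linkage clustering of queries; stop when the most similar pair
    of clusters is less similar than min_sim."""
    clusters = [{q} for q in profiles]

    def linkage(x: set[str], y: set[str]) -> float:
        sims = [cosine(profiles[p], profiles[q]) for p in x for q in y]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) < min_sim:
            break  # threshold reached: remaining clusters are too dissimilar
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


profiles = {
    "apple iphone": {"apple computer": 1.0, "fruit": -1.0},
    "apple ipad": {"apple computer": 1.0, "fruit": -0.5},
    "apple pie": {"fruit": 1.0, "apple computer": -1.0},
}
print(agglomerative(profiles, min_sim=0.5))
```

With these example profiles, "apple iphone" and "apple ipad" (cosine about 0.95) are merged, while "apple pie" stays separate because its similarity to both is negative (about -1.0 and -0.95). Without the negative weights these similarities would only drop to 0, leaving a narrower gap for the threshold.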
operating system.
Tomcat Apache 6.8
Jdk1.6
MS Access
3.2.2
For example, suppose the query is "reserve bank": the query words "reserve"
and "bank" are checked against the words in the title of each URL.
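A sketch of this query-title comparison, assuming simple lower-case word tokenisation. The feature names `title_overlap` and `title_overlap_ratio` and the example titles are hypothetical.

```python
import re


def title_match_features(query: str, title: str) -> dict[str, float]:
    """Compare the words of a query with the words in a result's title."""
    q_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    t_words = set(re.findall(r"[a-z0-9]+", title.lower()))
    overlap = q_words & t_words
    return {
        "title_overlap": float(len(overlap)),  # number of query words found in the title
        "title_overlap_ratio": len(overlap) / len(q_words) if q_words else 0.0,
    }


print(title_match_features("reserve bank", "Reserve Bank of India"))
print(title_match_features("reserve bank", "Blood bank near you"))
```

Here the first title contains both query words (ratio 1.0), while the second matches only "bank" (ratio 0.5), so it would be scored as less relevant to the query.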
3.2.3
3.2.4
Browsing Features:
Simple aspects of user interaction with web pages can be captured and
quantified. These features characterize interactions with pages beyond the
results page.
For example, we compute how long users dwell on a page or domain, and the
deviation of this dwell time from the expected page dwell time for the query.
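A sketch of the two dwell-time features, assuming page visits are logged as hypothetical (query, url, enter, leave) records with timestamps in seconds, and that the expected dwell time for a query is simply the mean dwell time over all pages visited for that query. The log format and feature names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def dwell_features(visits: list[tuple[str, str, float, float]]) -> dict[tuple[str, str], dict[str, float]]:
    """For every (query, url): mean dwell time and its deviation from the
    query's expected (mean) dwell time."""
    per_pair: dict[tuple[str, str], list[float]] = defaultdict(list)
    per_query: dict[str, list[float]] = defaultdict(list)
    for query, url, enter, leave in visits:
        dwell = leave - enter
        per_pair[(query, url)].append(dwell)
        per_query[query].append(dwell)

    features = {}
    for (query, url), dwells in per_pair.items():
        avg = mean(dwells)
        features[(query, url)] = {
            "dwell_time": avg,
            "dwell_deviation": avg - mean(per_query[query]),
        }
    return features


visits = [
    ("bank", "a.com", 0.0, 30.0),
    ("bank", "b.com", 40.0, 50.0),
    ("bank", "a.com", 100.0, 150.0),
]
print(dwell_features(visits))
```

In this example a.com is read for 40 s on average, 10 s longer than the 30 s expected for the query, whereas b.com is left after 10 s, a deviation of -20 s that suggests it was less useful.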
Clickthrough Features:
Clicks are a special case of user interaction with the search engine.
For example, for a query-URL pair we record the number of clicks on the
result, as well as whether there was a click on a result below or above the
current URL. Derived feature values such as ClickRelativeFrequency and
ClickDeviation are also computed.
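A sketch of these click features for one query, under stated assumptions: the log is a hypothetical list of searches, each giving the ranking shown and the set of clicked URLs; ClickRelativeFrequency is taken as the URL's share of all clicks for the query; ClickDeviation is taken as the URL's observed click rate minus an expected click rate for the positions it was shown at, supplied here as a hypothetical `prior_ctr` table. These definitions are assumptions, not the project's exact formulas.

```python
def click_features(sessions: list[tuple[list[str], set[str]]],
                   prior_ctr: dict[int, float]) -> dict[str, dict[str, float]]:
    """Per-URL click features for a single query.

    sessions:  (ranking shown, clicked URLs) for each search of the query.
    prior_ctr: expected click rate for each 1-based rank (assumed to be given).
    """
    total_clicks = sum(len(clicked) for _, clicked in sessions)
    stats: dict[str, dict[str, float]] = {}
    for ranking, clicked in sessions:
        for pos, url in enumerate(ranking, start=1):
            s = stats.setdefault(url, {"impressions": 0, "clicks": 0, "expected": 0.0,
                                       "click_above": 0, "click_below": 0})
            s["impressions"] += 1
            s["expected"] += prior_ctr.get(pos, 0.0)
            if url in clicked:
                s["clicks"] += 1
            if any(u in clicked for u in ranking[:pos - 1]):  # a click above this URL
                s["click_above"] += 1
            if any(u in clicked for u in ranking[pos:]):  # a click below this URL
                s["click_below"] += 1

    return {
        url: {
            "ClickFrequency": s["clicks"],
            "ClickRelativeFrequency": s["clicks"] / total_clicks if total_clicks else 0.0,
            "ClickDeviation": (s["clicks"] - s["expected"]) / s["impressions"],
            "ClickAboveRate": s["click_above"] / s["impressions"],
            "ClickBelowRate": s["click_below"] / s["impressions"],
        }
        for url, s in stats.items()
    }


sessions = [(["a", "b", "c"], {"b"}), (["a", "b", "c"], {"a", "b"})]
features = click_features(sessions, prior_ctr={1: 0.5, 2: 0.25, 3: 0.125})
print(features["b"])
```

In this example "b" is clicked in both searches although only a 25% click rate is expected at rank 2, giving a ClickDeviation of 0.75, which marks it as more relevant than its position alone would suggest.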
3.2.8
3.3 EXTERNAL USER INTERFACES:
3.3.1. Hardware Interfaces:
Operating system: Windows Server 2003 and all later Windows operating
systems.
Tomcat Apache 7.1
Jdk1.7
MySQL 5.5
4. SYSTEM DESIGN
The use case view models the functionality of the system as perceived by outside
users. A use case is a coherent unit of functionality expressed as a transaction among
actors and the system.
Fig 4.1.1 Use Case Diagram for Concept Based User Profile Search Engine
Activity              Start date   End date    Duration (days)
                      04/8/16      24/8/16     21
Communication         04/8/16      10/8/16
Literature survey     11/8/16      17/8/16
Define scope          18/8/16      19/8/16
Develop SRS           20/8/16      24/8/16
                      25/8/16      5/10/16     42
                      25/8/16      31/8/16
Feasibility Analysis  01/9/16      07/9/16
                      08/9/16      09/9/16
                      10/9/16      14/9/16
                      15/9/16      21/9/16
                      22/9/16      28/9/16
                      29/9/16      5/10/16
                      05/01/17     29/03/17    84
                      05/01/17     25/01/17    21
                      26/01/17     15/02/17    21
                      16/02/17     08/03/17    21
                      09/03/17     29/03/17    21
Activity chart (weekly): weeks I-IX (Aug 4 to Sept 29, 2016) and weeks XI-XXII (Jan 5 to Mar 23, 2017)
5. TECHNICAL SPECIFICATION
5.1 ADVANTAGES
1. Relevant search results can be obtained in a short period of time.
2. The user's actual interest is taken into account by creating a user profile for
each query.
3. The same user can issue different queries, each handled according to its own profile.
5.2 DISADVANTAGES
1. User profiles cannot be shared, although sharing them could help reduce search time.
2. The existing user profiles cannot be used to predict the intent of unseen
queries.
5.3 APPLICATIONS
1. Text Mining.
3. Search Engine for Business Application.