Outline
Clustering Techniques
Web Document Clustering
Conclusion
1. Introduction
3. Clustering Techniques
Hierarchical Agglomerative clustering
Buckshot clustering
- Model
[Diagram labels: user query, search engine, clustering engine, clusters]
Introduction (Contd)
- Relevance
- Browsable Summaries
- Overlap
- Snippet-tolerance
- Speed
- Incrementality
The similarity of base clusters Bm and Bn is defined to be 1 iff
- |Bm ∩ Bn| / |Bm| > 0.5 and
- |Bm ∩ Bn| / |Bn| > 0.5
- Otherwise, their similarity is defined to be 0.
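The binary similarity rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: base clusters are assumed to be plain sets of document ids, and the names `similarity` and `merge_base_clusters` are hypothetical. The merging step (grouping base clusters that are linked, directly or through a chain, by similarity 1) is an assumption added here for illustration; the slides only define the similarity itself.

```python
from itertools import combinations


def similarity(bm, bn, threshold=0.5):
    """Return 1 if the overlap exceeds the threshold relative to BOTH
    base clusters, else 0. bm and bn are sets of document ids."""
    if not bm or not bn:
        return 0  # assumption: empty base clusters are never similar
    overlap = len(bm & bn)
    if overlap / len(bm) > threshold and overlap / len(bn) > threshold:
        return 1
    return 0


def merge_base_clusters(base_clusters, threshold=0.5):
    """Assumed merging step: union all base clusters that are connected
    through pairs with similarity 1 (connected components)."""
    parent = list(range(len(base_clusters)))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(base_clusters)), 2):
        if similarity(base_clusters[i], base_clusters[j], threshold):
            parent[find(i)] = find(j)

    merged = {}
    for i, cluster in enumerate(base_clusters):
        merged.setdefault(find(i), set()).update(cluster)
    return list(merged.values())
```

Note that the test is strict and two-sided: a small base cluster fully contained in a much larger one does not count as similar, because the overlap must exceed half of the larger cluster as well.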
Experiments
- Web documents contained 760 words on average.
- Snippets contained 50 words on average.
Execution Time
Pros
the contents of a document collection
- Also reduce the search space
Cons
- Computationally expensive
- Difficult to identify which cluster or clusters should be searched
Conclusion
- The identification of the unique requirements of document clustering of Web search engine results.
- The definition of STC, an incremental, O(n) time clustering algorithm that satisfies these requirements.
- The first experimental evaluation of clustering algorithms on Web search engine results, forming a baseline for future work.