PROJECT ON
INDEX
INTRODUCTION
What is the problem
Objective of the project
INTRODUCTION
Focused crawlers, in particular, have been introduced to satisfy the need of individuals or organizations to create and maintain subject-specific web portals or web document collections locally, or to address complex information needs.
Classic focused crawlers: take a user query and seed URLs as input and guide the search towards pages of interest. Pages pointed to by links with higher priority are downloaded first. The crawler proceeds recursively on the links contained in the downloaded pages.
Semantic crawlers: download priorities are assigned to pages by applying semantic similarity criteria to compute page-to-topic relevance: a page and the topic can be relevant if they share conceptually similar terms.
Learning crawlers: a learning crawler is supplied with a training set of relevant and non-relevant Web pages, which is used to train it.
BestFirst crawler: keeps a priority queue ordered by the similarity between the topic and the page where each link was found.
PageRank crawler: crawls in PageRank order and recomputes the ranks every 25 pages.
InfoSpiders: uses a neural network trained with backpropagation and considers the text around links.
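The BestFirst strategy described above can be sketched on a small in-memory graph. This is a minimal illustration, not a reference implementation: the toy web `TOY_WEB`, the function names, and the choice of bag-of-words cosine similarity as the topic-to-page measure are all assumptions made for the example, and no network access is involved.

```python
import heapq
import re
from collections import Counter
from math import sqrt

# Hypothetical toy web: URL -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("web crawler index", ["a", "b"]),
    "a": ("focused crawler topic relevance", ["c"]),
    "b": ("cooking recipes and food", ["d"]),
    "c": ("semantic focused crawler", []),
    "d": ("more recipes", []),
}

def term_counts(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(text_a, text_b):
    """Cosine similarity between the term counts of two texts."""
    va, vb = term_counts(text_a), term_counts(text_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def best_first_crawl(web, seed, topic, max_pages=10):
    """Visit pages in order of the topic similarity of the page where each link was found."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    visited, order = set(), []
    while frontier and len(order) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        order.append(url)
        text, links = web[url]
        score = cosine(topic, text)  # every link on this page inherits this score
        for link in links:
            if link not in visited:
                heapq.heappush(frontier, (-score, link))
    return order

order = best_first_crawl(TOY_WEB, "seed", "focused crawler")
print(order)
```

In this toy graph the link to "c" is found on the on-topic page "a", so "c" is downloaded before the off-topic page "b" even though "b" was discovered earlier.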
Local search algorithms: a crawler that relies on them will miss a relevant page if no chain of hyperlinks connects it to the pages the crawl started from. Local search algorithms can also become trapped in local optima. The shortcomings of local search algorithms became even more obvious after recent Web structural studies revealed the existence of Web communities.
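The missed-page problem can be shown with a short reachability check. This is only a sketch under assumed data: the link graph `LINKS`, the set `RELEVANT`, and the page names are hypothetical, and a plain breadth-first traversal stands in for whatever local search a real crawler uses.

```python
from collections import deque

def reachable_from(links, seeds):
    """Return every page reachable from the seeds by following hyperlinks forward."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "seed": ["p1"],
    "p1": ["p2"],
    "p2": [],
    "island": ["p2"],  # relevant, but no page reachable from the seed links to it
}
RELEVANT = {"p1", "p2", "island"}

found = reachable_from(LINKS, ["seed"])
missed = RELEVANT - found
print(sorted(missed))
```

However the frontier is prioritised, a crawler that only follows links out of pages it has already downloaded cannot reach "island", because no hyperlink chain leads to it from the seed.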
Problem with Web communities: Researchers have found that three structural properties of Web communities make local search algorithms unsuitable for building collections for scientific digital libraries.
1. Instead of linking directly to each other, many pages in the same Web community relate to each other through co-citation relationships.
2. Web pages relevant to the same domain could be separated into different Web communities by irrelevant pages.
3. Researchers found that sometimes, when there are links between Web pages belonging to two relevant Web communities, these links may all point from the pages of one community to those of the other, with none of them pointing in the reverse direction.
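Two of the properties listed above, co-citation without direct links and one-directional links between communities, can be made concrete on a small graph. The sketch below is illustrative only: the graph `LINKS`, the page names, and the helper functions are hypothetical and are not taken from the studies the slides refer to.

```python
from itertools import combinations

def co_cited_pairs(links):
    """Pairs of pages that some other page links to together (co-citation)."""
    pairs = set()
    for source, targets in links.items():
        for a, b in combinations(sorted(set(targets) - {source}), 2):
            pairs.add((a, b))
    return pairs

def directly_linked(links, a, b):
    """True if a links to b or b links to a."""
    return b in links.get(a, []) or a in links.get(b, [])

def cross_links(links, community_a, community_b):
    """Count links from A to B and from B to A."""
    a_to_b = sum(1 for s in community_a for t in links.get(s, []) if t in community_b)
    b_to_a = sum(1 for s in community_b for t in links.get(s, []) if t in community_a)
    return a_to_b, b_to_a

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "hub": ["x", "y"],    # x and y are co-cited by hub but do not link to each other
    "x": [],
    "y": [],
    "a1": ["b1"],         # community A links to community B ...
    "a2": ["b1", "b2"],
    "b1": [],             # ... but community B never links back
    "b2": [],
}

pairs = co_cited_pairs(LINKS)
related_only_by_co_citation = [p for p in pairs if not directly_linked(LINKS, *p)]
direction_counts = cross_links(LINKS, {"a1", "a2"}, {"b1", "b2"})
print(sorted(related_only_by_co_citation), direction_counts)
```

A crawler that starts at "x" has no outgoing link to follow towards "y", and a crawler that starts in community B never reaches community A, since every link between the two communities points from A to B.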
THANK YOU