Subject Code:
3,0,0,4,4
Objectives
To provide a detailed overview of the web mining process and its techniques
To understand the basics of web search, with special emphasis on web crawling
To understand the basics of indexing and the various types of query processing approaches
To appreciate the use of machine learning approaches for web content mining
To understand the role of hyperlinks in web structure mining
To appreciate the various aspects of web usage mining
Expected Outcomes
Upon completion of the course, students will be able to:
Build a sample search engine using available open-source tools
Describe the browser security model used in web security
Identify the different components of a web page that can be used for mining
Apply machine learning concepts to web content mining
Implement a page ranking algorithm and modify it for mining information
Design a system that harvests information available on the web to build recommender systems
Analyse social media data using appropriate data/web mining techniques
Modify an existing search engine to make it personalized
7 QUERY PROCESSING: Relevance Feedback and Query Expansion - Automatic Local and Global Analysis - Measuring Effectiveness and Efficiency    3    11
8 Recent Trends    2    5
The following are sample projects that can be given to students for implementation:
1. To develop a search engine for the retrieval process.
2. To develop a domain-based crawler.
3. To efficiently extract related textual information and multimedia content from documents.
4. To implement an indexing structure for multi-dimensional data of a dynamic nature.
5. To perform opinion mining and sentiment analysis on documents using web mining.
6. To implement a recommendation system.
7. To implement effective compression schemes that store data using less storage space.
8. To develop a query manager mechanism.
9. To develop an effective query refinement mechanism based on query algebra.
10. To personalize a search engine.
11. To solve data science problems from the Kaggle website.
List of Case Studies:
1. Market-customer analysis
2. Biological/DNA sequence analysis
3. Detecting software bugs
4. Improving storage performance
5. Design of structured pattern mining methods
6. Network alarm pattern mining
7. XML query access pattern analysis
8. System performance
9. Telecommunication networks
10. Financial and scientific data
11. Creating adaptive websites
12. System improvement
13. Navigation patterns (WEBLOG)
Text Books
1. Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), Springer, 2nd Edition, 2009.
2. Zdravko Markov, Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, John Wiley & Sons, Inc., 2007.
Reference Books
1. Guandong Xu, Yanchun Zhang, Lin Li, Web Mining and Social Networking: Techniques and Applications, Springer, 1st Edition, 2010.
2. Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann, 2002.
3. Adam Schenker, Graph-Theoretic Techniques for Web Content Mining, World Scientific Publishing Co., 2005.
4. Min Song, Yi-Fang Brook Wu, Handbook of Research on Text and Web Mining Technologies, Information Science Reference (an imprint of IGI Global), 2008.
Web Mining
Knowledge Areas that contain topics and learning outcomes covered in the course
This course is an elective course.
Suitable from 4th semester onwards.
Knowledge of basic mathematics is essential.
This course is designed with 100 minutes of in-classroom sessions per week, 60 minutes of video/reading instructional material per week, and 100 minutes of lab time per week, as well as 200 minutes of non-contact time spent implementing a course-related project. Generally, this course should combine lectures, in-class discussion, case studies, guest lectures, mandatory off-class reading material, and quizzes.
Additional weightage will be given based on students' rank in crowdsourced projects or Kaggle-like competitions.
45 hours (3 credit hours/week, 15-week schedule)