
VARDHAMAN COLLEGE OF ENGINEERING

(Autonomous)
Shamshabad, Hyderabad 501 218

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Academic year 2014-2015
VII Semester
COURSE DESCRIPTION

Course Code : ACS11T20
Course Title : DATA WAREHOUSING AND DATA MINING
Course Structure :
Lectures | Tutorials | Practicals | Credits
3 | 1 | - | 4
Course Coordinator : Prof L V Narasimha Prasad, Professor and Head
Team of Instructors : Mr H Venkateswara Reddy and Mr R Madana Mohan

I. Course Overview:

The course addresses the concepts, skills, methodologies, and models of data warehousing. It covers proper techniques for designing data warehouses for various business domains, and the potential uses of the data warehouse and other data repositories in mining opportunities. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.

II. Prerequisite(s):

Level | Credits | Periods / Week | Prerequisites
UG | 4 | 4 | Understand basic DWDM theory and operational concepts

III. Marks Distribution:

Sessional Marks: There shall be 2 midterm examinations. Each midterm examination consists of a subjective test of 20 marks with a duration of 2 hours. Part A of the subjective test contains 5 compulsory one-mark questions. Part B contains 5 questions, of which the student has to answer 3, each carrying 5 marks. The first midterm examination shall be conducted on the first two and a half units of the syllabus and the second midterm examination on the remaining portion. Five marks are earmarked for assignments. There shall be two assignments in every theory course. Marks shall be awarded considering the average of the two assignments in each course.

University End Exam Marks: 75

Total Marks: 100


IV. Evaluation Scheme:

S.No | Component | Duration (hours) | Marks
1 | I Mid Examination | 2 | 20
2 | I Assignment | - | 05
3 | II Mid Examination | 2 | 20
4 | II Assignment | - | 05
5 | External Examination | 3 | 75

V. Course Objectives:

I. To introduce students to the basic concepts, techniques and applications of Data Mining.
II. To learn how to preprocess data before applying data mining techniques.
III. To understand the mathematical foundations of data mining tools.
IV. To practise acquiring, parsing, filtering, mining, representing, refining, visualizing and interacting with data.
V. To develop professional and ethical attitude, effective communication skills, leadership, teamwork skill, multidisciplinary approach and an ability to relate data mining issues to broader social context.
VI. To develop skills of programming data mining algorithms using recent data mining software for solving practical problems.
VII. To gain experience of doing independent study and research.

VI. Course Outcomes:

1. Create a target data set to be used for discovery.
2. Choose the data-mining task (classification, regression, clustering, etc.). Understand and apply a wide range of clustering, estimation, prediction, and classification algorithms, including k-means clustering, BIRCH clustering, Kohonen clustering, classification and regression trees, the C4.5 algorithm, logistic regression, k-nearest neighbor, multiple regression, and neural networks.
3. Understand the mathematical statistics foundations of the algorithms outlined above.
4. Understand and apply the most current data mining techniques and applications, such as text mining, time series mining, spatial mining, web mining and other current issues.
5. Be proficient with leading data mining software, including WEKA and Clementine.
6. Understand the beneficial uses of data mining and the potential threats these activities pose.
7. Demonstrate knowledge of professional and ethical responsibilities.
8. Be able to communicate effectively in both verbal and written form.
9. Understand the impact of engineering solutions on society and be aware of contemporary issues.
10. Develop confidence for self education and ability for life-long learning.
11. Be able to participate and succeed in competitive examinations like GATE, GRE.
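As an illustration of the k-means clustering named in outcome 2, the following is a minimal sketch of the usual assign-and-update loop (Lloyd's algorithm) in plain Python. It is not taken from the course material: the function name `kmeans`, the sample points, and the choice to pass initial centroids explicitly are assumptions made for the example.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, centroids, max_iter=100):
    """Illustrative k-means: alternate assignment and update until stable.

    points    -- list of coordinate tuples
    centroids -- initial centroids (assumed to be supplied by the caller)
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical sample data: two visibly separate groups of 2-D points.
    sample = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    final_centroids, final_clusters = kmeans(sample, [(0, 0), (10, 10)])
    print(final_centroids)
    print(final_clusters)
```

Passing the starting centroids explicitly keeps the sketch deterministic. Tools such as WEKA typically choose starting centroids for the user, and the result of k-means can depend on that choice.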


VII. How Course Outcomes are assessed:

Outcome
a. An ability to apply knowledge of computing, mathematical foundations, algorithmic principles, and computer science and engineering theory in the modeling and design of computer based systems to real-world problems.
b. An ability to design and conduct experiments, as well as to analyze and interpret data.
c. An ability to design, implement, and evaluate a computer-based system, process, component, or program to meet desired needs, within realistic constraints such as economic, environmental, social, political, health and safety, manufacturability, and sustainability.
d. An ability to function effectively on multi-disciplinary teams.
e. An ability to analyze a problem, and identify, formulate and use the appropriate computing and engineering requirements for obtaining its solution.
f. An understanding of professional, ethical, legal, security and social issues and responsibilities.
g. An ability to communicate effectively, both in writing and orally.
h. The broad education necessary to analyze the local and global impact of computing and engineering solutions on individuals, organizations, and society.
i. Recognition of the need for, and an ability to engage in continuing professional development and life-long learning.
j. Knowledge of contemporary issues.
k. An ability to use current techniques, skills, and tools necessary for computing and engineering practice.
l. An ability to apply design and development principles in the construction of software and hardware systems of varying complexity.
m. An ability to recognize the importance of professional development by pursuing postgraduate studies or face competitive examinations that offer challenging and rewarding careers in computing.

Level: S S H N S H S S H S
Proficiency assessed by: -- -Assignments, Tutorials, Exams -Assignments, Exams ---Exams -Lab, Exams -- --

N = None   S = Supportive   H = Highly Related

VIII. Syllabus:

UNIT - I
DATA WAREHOUSE AND OLAP TECHNOLOGY: Data warehouses, definitions, multidimensional data model, data warehouse architecture, schemas.
INTRODUCTION TO DATA MINING: Definition of data mining, kinds of data, data mining functionalities, classification of data mining systems, primitives, major issues in data mining.

UNIT - II
DATA PREPROCESSING: Descriptive data summarization, data cleaning, data integration and transformation, data reduction, data discretization and concept hierarchy generation.
MINING FREQUENT PATTERNS AND ASSOCIATIONS: Basic concepts, efficient and scalable frequent itemset mining methods, association rule mining.


UNIT - III
CLASSIFICATION: Decision tree induction, Bayesian classification, rule based classification, prediction, accuracy and error measures.

UNIT - IV
CLUSTER ANALYSIS: Cluster analysis, categories of clustering methods, partitioning methods, hierarchical methods, density based methods, grid based methods, model based clustering methods, clustering high dimensional data, outlier analysis.

UNIT - V
MINING STREAM, TIME SERIES AND SEQUENCE DATA: Mining data streams, mining time series data, mining sequence patterns in biological data.
MINING OBJECT, SPATIAL, MULTIMEDIA, TEXT AND WEB DATA: Multidimensional analysis on complex object data types, descriptive mining on complex objects, spatial data mining, multimedia data mining, text mining, web mining.

IX. List of Text Books / References / Websites / Journals / Others

Text Books:
1. Jiawei Han and Micheline Kamber (2008), Data Mining: Concepts and Techniques, 2nd edition, Elsevier.

Reference Books:
1. Margaret H Dunham (2006), Data Mining Introductory and Advanced Topics, 2nd edition, Pearson Education.
2. Amitesh Sinha (2007), Data Warehousing, Thomson Learning.
3. Xindong Wu, Vipin Kumar (2009), The Top Ten Algorithms in Data Mining, Taylor and Francis Group.
4. Max Bramer (2007), Principles of Data Mining, Springer.
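As an illustration of the frequent itemset mining methods listed in Unit II, the following is a minimal sketch of the level-wise Apriori approach in plain Python. It is not taken from the course material or the text book: the function name `apriori`, the use of an absolute support count as the threshold, and the sample transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Illustrative Apriori: return {itemset: count} for frequent itemsets.

    transactions -- iterable of item collections
    min_support  -- minimum number of transactions containing the itemset
                    (an absolute count, assumed here for simplicity)
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count: number of transactions that contain the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items()
                   if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical market-basket data.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, n in apriori(baskets, min_support=2).items():
        print(sorted(itemset), n)
```

The prune step relies on the property that every subset of a frequent itemset is itself frequent, which is what lets the search discard most candidates without counting them.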

X. Course Plan:

The course plan is meant as a guideline and may change.

Lecture No. | Learning Objective | Topics to be covered | Reference
1-2 | To understand the database technology | Introduction to Data Mining: Definition of data mining, Evolution of database technology | T1: 1.1, 1.2
3-4 | To know the steps in KDD and kinds of data mining | Steps in KDD and Kinds of Data in mining | T1: 1.2-1.3
5-6 | Able to understand the data mining functionalities | Data Mining Functionalities | T1: 1.4
7 | Able to differentiate data mining systems | Classification of Data Mining Systems | T1: 1.6
8 | To know how to enforce task primitives in data mining algorithms | Data mining task Primitives | T1: 1.7
9 | Able to reason about issues in data mining | Major Issues in Data Mining | T1: 1.9
10 | To compare data warehouse with other data repositories | Data Warehouse and OLAP Technology: What is Data Warehouse | T1: 3.1
11-12 | To construct multidimensional data model | A Multidimensional Data Model and schemas | T1: 3.2
13-14 | To draw data warehouse architecture | Data Warehouse Architecture | T1: 3.3
15 | To formulate descriptive data summarization | Data Preprocessing: Descriptive Data Summarization | T1: 2.1-2.2
16-17 | To understand data cleaning methods | Data Cleaning | T1: 2.3
18 | To know problems in data integration | Data Integration | T1: 2.4
19-20 | Able to apply various transformations in data transformation | Data Transformation | T1: 2.4
21-22 | To understand data reduction techniques | Data Reduction | T2: 2.5
23-24 | To know Data Discretization and Concept Hierarchy Generation | Data Discretization and Concept Hierarchy Generation | T1: 2.6
25 | Able to know frequent patterns | Mining Frequent Patterns and Associations: Basic Concepts | T1: 5.1
26-30 | To understand and design algorithms to find frequent item sets | Efficient and Scalable Frequent Itemset Mining Methods | T1: 5.2
31-34 | Able to mine interesting association rules | Mining Various Kinds of Association Rules | T1: 5.3
35-36 | Able to classify data items based on their similarity | Classification and Prediction: Issues regarding classification and prediction, Bayesian classification | T1: 6.2 & 6.4
37 | To draw decision tree for classification | Classification by decision tree induction | T1: 6.3
38-39 | Apply rules for classification | Rule based classification | T1: 6.5
40 | Able to predict object behavior | Prediction | T1: 6.11
41-42 | To derive formulas for accuracy and error measures | Accuracy and Error Measures | T1: 6.12
43 | Able to group similar objects | Cluster Analysis: Cluster analysis and types of data in cluster analysis | T1: 7.1-7.2
44 | To know all the clustering algorithms | A Categorization of Major Clustering Methods | T1: 7.3
45-46 | To compare partitioning methods with hierarchical methods | Partitioning Methods, Hierarchical Methods | T1: 7.4, 7.5
47-48 | To compare density based methods with grid based methods | Density based Methods and Grid based methods | T1: 7.6-7.7
49 | To know model based clustering methods | Model based clustering methods | T1: 7.8
50-51 | To know high dimensional data clustering and outlier analysis | Clustering high dimensional data and Outlier analysis | T1: 7.9 & 7.11
52-53 | To understand time series and sequence data | Mining Stream, Time-Series and Sequence Data: Mining Data Streams | T1: 8.1
54-56 | To know in detail about the time series data | Mining Time-Series Data | T1: 8.2
57-59 | To apply mining sequence patterns in biological data | Mining Sequence Patterns in Biological Data | T1: 8.4
59-61 | To analyze spatial, text and web data as multidimensional data | Mining Object, Spatial, Multimedia, Text and Web data: Multidimensional analysis on complex object data types | T1: 10.1
62 | To know descriptive mining of complex data objects | Descriptive mining of complex data objects | T1: 10.1
63-65 | To understand spatial data mining | Spatial Data Mining | T1: 10.2
66 | To understand multimedia data mining | Multimedia Data Mining | T1: 10.3
67 | To understand text data mining | Text Mining | T1: 10.4
68 | To apply mining techniques for world wide web | Mining the World Wide Web | T1: 10.5


XI. Mapping course objectives leading to the achievement of the course outcomes:

N = None   S = Supportive   H = Highly Related

Course Objectives: I II III IV V VI VII
Programme Outcomes: a b c d e f g h i j k l m
Levels: H H H S H H H

XII. Mapping course objectives leading to the achievement of the course outcomes:

N = None   S = Supportive   H = Highly Related

Course Outcomes: 1 2 3 4 5 6 7 8 9 10 11
Programme Outcomes: a b c d e f g h i j k l m
Levels: S H H S H H H H H S H

Prepared By : Prof L V Narasimha Prasad
Date : 27 January, 2012
