
Data Warehousing & Mining Assignment I

1. What is a data warehouse? Describe various operations on a data warehouse with suitable examples.
2. (a) Give an example of generalization-based mining of plan databases by divide-and-conquer.
   (b) What is sequential pattern mining? Explain.
   (c) Explain the construction of a multilayered web information base.
3. Write a short note on the following:
   (a) Missing values
   (b) Noisy data
   (c) Inconsistent data
   (d) Data cube aggregation
4. (a) Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28. Standardize the variable as follows:
       i. Compute the mean absolute deviation of age.
       ii. Compute the Z-score for the first four measurements.
5. Write the syntax for the following data mining primitives:
   (a) The kind of knowledge to be mined.
   (b) Explain the syntax for specifying the kind of knowledge to be mined.
   (c) Measures of pattern interestingness.
6. (a) Explain the analysis of attribute relevance.
   (b) How is analytical characterization performed? Explain with an example.
7. a. Discuss the components of a data warehouse.
   b. List the differences between OLTP and OLAP.
   c. Discuss the various schematic representations in the multidimensional model.
   d. Explain the OLAP operations in the multidimensional model.
   e. Explain the design and construction of a data warehouse.
   f. Explain the three-tier data warehouse architecture.
   g. Explain indexing.

   h. Write notes on the metadata repository.
   i. Write short notes on VLDB.
8. (a) Explain data mining as a step in the process of knowledge discovery.
   (b) Briefly discuss data integration.
   (c) Briefly discuss data transformation.
9. (a) Discuss the concept hierarchy.
   (b) What is concept description? Explain.
   (c) What does the data warehouse provide for the business analyst? Explain.
   (d) How are data warehousing and OLAP related to data mining?
10. (a) Justify the role of data cube aggregation in the data reduction process with an example.
    (b) Discuss the role of numerosity reduction in the data reduction process in detail.
    (c) Explain the syntax for the following data mining primitives:
        (a) Task-relevant data
        (b) Background knowledge
        (c) Interestingness measures
11. Explain the following terms in detail:
    (a) Concept description
    (b) Variance and standard deviation
    (c) Mean, median, and mode
    (d) Quartiles, outliers, and boxplots
12. (a) Discuss the issues to be considered during the data integration process.
    (b) Draw and explain the architecture of a typical data mining system.
    (c) Describe why it is important to have a data mining query language.
    (d) Write short notes on the following in detail:
        (a) Attribute-oriented induction
        (b) Efficient implementation of attribute-oriented induction
13. (a) Explain the storage models of OLAP.
    (b) Suppose that the data for analysis include the attribute age. The age values for the data tuples are, in increasing order: 13, 16, 16, 23, 23, 25, 25, 25, 25, 30, 30, 30, 30, 35, 35, 35, 40, 40, 45, 45, 45, 70.
        a) How might you determine the outliers in the data?
        b) What other methods are there for data smoothing?
    (c) List and describe the primitives for the data mining task.
    (d) Why perform attribute relevance analysis? Explain the various methods of its
    (e) Briefly compare the following, using an example to explain your point(s):
        a) Snowflake schema, fact constellation
        b) Data cleaning, data transformation
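
One possible way to approach the outlier part of Question 13 is the common 1.5 x IQR rule. The sketch below is illustrative only: the question does not prescribe a method, the function names are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data (other quartile conventions can give slightly different values).

```python
from statistics import median


def quartiles(sorted_values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: the median-of-halves convention; for an odd number of
    values the middle value is left out of both halves.
    """
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    upper = sorted_values[half + (n % 2):]
    return median(lower), median(upper)


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    data = sorted(values)
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return q1, q3, [v for v in data if v < low or v > high]


# Age values from Question 13
ages = [13, 16, 16, 23, 23, 25, 25, 25, 25, 30, 30,
        30, 30, 35, 35, 35, 40, 40, 45, 45, 45, 70]

q1, q3, outliers = iqr_outliers(ages)
print(q1, q3, outliers)  # prints: 25 40 [70]
```

Under these assumptions Q1 = 25 and Q3 = 40, so the fences are 2.5 and 62.5 and only the value 70 is flagged. Clustering or a fitted regression function are other ways to spot outliers.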

14. (a) Define schema and operation-derived hierarchies.
    (b) Outline a data cube-based incremental algorithm for mining analytical class comparisons.
    (c) Draw and explain the star schema for the data warehouse.
    (d) What is data compression? How would you compress data using principal component analysis (PCA)?
    (e) List and describe the various types of concept hierarchies.
    (f) List the statistical measures for the characterization of data dispersion, and discuss how they can be computed efficiently in large databases.
15. a. What are the various issues in data mining? Explain each one in detail.
    b. Why preprocess the data? Explain in brief.
    c. Write short notes on GUI and DMQL. How would you design a GUI based on DMQL?
    d. Explain the various preprocessing steps to improve the accuracy, efficiency, and scalability of the classification or prediction process.
    e. Briefly discuss the data smoothing techniques.
16. (a) Suppose that the data for analysis include the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
        i. Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate your steps. Comment on the effect of the technique for the given data.
        ii. How might you determine outliers in the data?
        iii. What other methods are there for data smoothing?
    (b) Suppose that the data for analysis include the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
        (a) What is the mean of the data?
        (b) What is the median?
        (c) What is the mode of the data? Comment on the data's modality.
        (d) What is the midrange of the data?
        (e) Can you find (roughly) the first quartile (Q1) and third quartile (Q3) of the data?
        (f) Give the five-number summary of the data.
        (g) Show a boxplot of the data.
        (h) How is the quantile-quantile plot different from a quantile plot?

    (c) Write short notes on the following in detail:
        (a) Measuring the central tendency
        (b) Measuring the dispersion of data
17. (a) Justify the role of data cube aggregation in the data reduction process with an example.
    (b) Discuss the methods for numeric concept hierarchy generation.
    (c) Write and explain the basic algorithm for attribute-oriented induction.
18. The four major types of concept hierarchies are: schema hierarchies, set-grouping hierarchies, operation-derived hierarchies, and rule-based hierarchies.
    (a) Briefly define each type of hierarchy.
    (b) For each hierarchy type, provide an example.
19. (a) Write the syntax for the following data mining primitives:
        (a) Task-relevant data
        (b) Concept hierarchies
    (b) Define nominal, ordinal, and ratio-scaled variables.
    (c) How can concept description mining be performed incrementally and in a distributed manner?
    (d) Briefly discuss the data mining functionalities.
    (e) Briefly discuss the major issues in data mining regarding performance and diverse database types.
20. 1) What is data mining? In your answer, address the following:
       (a) Is it another hype?
       (b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
       (c) Explain how the evolution of database technology led to data mining.
       (d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
    2) What is information retrieval? Compare and contrast text mining with information retrieval.
    3) Write a short note on the following architectures of data mining systems:
       (a) No coupling
       (b) Loose coupling
       (c) Semitight coupling
       (d) Tight coupling
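
As a rough illustration of the smoothing asked for in Question 16(a)(i), the sketch below partitions the sorted age values into equal-depth bins of 3 and replaces every value by the mean of its bin. The function name `smooth_by_bin_means` is hypothetical, and the code is a minimal sketch of the technique rather than a model answer.

```python
def smooth_by_bin_means(values, depth):
    """Equal-depth binning: sort, split into bins of `depth` values,
    and replace each value by the mean of the bin it falls in."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), depth):
        bin_values = data[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Age values from Question 16
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

smoothed = smooth_by_bin_means(ages, depth=3)

# Show each bin next to its mean (rounded for display only)
for i in range(0, len(ages), 3):
    print(ages[i:i + 3], "->", round(smoothed[i], 2))
```

With 27 values and a depth of 3 there are nine bins. The last bin (46, 52, 70) is replaced by 56, which shows how a single extreme value pulls the mean of its bin upward.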
