Declaration:
I declare that this assignment is my individual work. I have not
copied from any other student’s work or from any other source
except where due acknowledgment is made explicitly in the text,
nor has any part been written for me by another person.
Student’s Signature:
Ramandeep kaur
Part A:
Q1. What is the difference between ROLAP and MOLAP?
Answer: -
Figure: ROLAP Architecture
Q2. In data warehouse technology, a multidimensional view can be implemented by
a relational database technique (ROLAP) or by a multidimensional database technique
(MOLAP). Briefly describe each implementation technique.
Answer: - Both implementation techniques used in data warehouse technology are
described below: -
1. ROLAP: - ROLAP servers are intermediate servers that sit between a relational back-end
server and client front-end tools. They use an RDBMS or an extended RDBMS to store and
manage warehouse data. ROLAP servers can include optimization for each back-end DBMS, as
well as additional tools and services. ROLAP technology tends to be more scalable than
MOLAP technology.
• Sorting.
• Hashing.
• Grouping operations.
II. Roll-up: - In ROLAP, roll-up aggregates the relational tables from more specific to
less specific levels of detail.
III. Drill-down: - Drill-down introduces additional dimensions into the relational tables.
IV. Incremental updating: - The whole database is broken down into segments, and updates
are then applied to the data warehouse.
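Roll-up and drill-down in ROLAP come down to relational sorting, grouping and aggregation. The following is a minimal Python sketch of a roll-up over a fact table held as a list of tuples. The table contents, the column layout (region, city, quarter, sales) and the helper name `roll_up` are hypothetical; a real ROLAP server would issue the equivalent SQL `GROUP BY` against the RDBMS instead.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical fact table: (region, city, quarter, sales)
FACT_TABLE = [
    ("East", "Boston", "Q1", 120),
    ("East", "New York", "Q1", 300),
    ("West", "Seattle", "Q1", 150),
    ("East", "Boston", "Q2", 130),
    ("West", "Seattle", "Q2", 170),
]


def roll_up(rows, key_columns, measure_column):
    """Aggregate rows to the level given by key_columns: sort on the key,
    group adjacent rows with equal keys, and sum the measure in each group."""
    key = itemgetter(*key_columns)
    return {group_key: sum(row[measure_column] for row in group)
            for group_key, group in groupby(sorted(rows, key=key), key=key)}


# City level -> (region, quarter) level
by_region_quarter = roll_up(FACT_TABLE, (0, 2), 3)
# Roll up further to region level (a single key column gives plain string keys)
by_region = roll_up(FACT_TABLE, (0,), 3)
```

Grouping on fewer columns moves to a less specific level (roll-up); grouping on more columns, such as (region, city, quarter), returns to the finer detail (drill-down).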
2. MOLAP: - MOLAP servers support multidimensional views of data through array-based
multidimensional storage engines. They map multidimensional views directly onto data cube
arrays. The advantage is fast indexing into precomputed summarized data. MOLAP servers may
use a two-level storage system to handle sparse and dense data: dense sub-cubes are
identified and stored as array structures, whereas sparse sub-cubes use compression to make
storage more efficient.
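To make the MOLAP description concrete, here is a minimal Python sketch of an array-based cube with precomputed summaries and a two-level choice between dense and sparse storage. The cube values, the 0.5 density threshold and all function names are assumptions for illustration only; production MOLAP engines use their own chunking and compression schemes.

```python
# Hypothetical 2-D cube: rows = products, columns = quarters, cells = sales.
PRODUCTS = ["TV", "Phone", "Laptop"]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
dense_cube = [
    [10, 12, 9, 15],
    [20, 18, 25, 30],
    [5, 7, 6, 8],
]


def precompute_summaries(cube):
    """Precompute totals per product, per quarter and overall, so later
    summary queries are array lookups instead of scans."""
    by_product = [sum(row) for row in cube]
    by_quarter = [sum(column) for column in zip(*cube)]
    grand_total = sum(by_product)
    return by_product, by_quarter, grand_total


def density(cube):
    """Fraction of cells in the (sub-)cube that hold a non-zero value."""
    cells = sum(len(row) for row in cube)
    filled = sum(1 for row in cube for value in row if value != 0)
    return filled / cells


def to_sparse(cube):
    """Compress a sparse sub-cube by storing only non-zero cells as {(i, j): value}."""
    return {(i, j): value
            for i, row in enumerate(cube)
            for j, value in enumerate(row) if value != 0}


def choose_storage(cube, threshold=0.5):
    """Two-level storage: keep dense sub-cubes as arrays, compress sparse ones.
    The 0.5 density threshold is an arbitrary assumption."""
    if density(cube) >= threshold:
        return "array", cube
    return "sparse", to_sparse(cube)


by_product, by_quarter, grand_total = precompute_summaries(dense_cube)
phone_total = by_product[PRODUCTS.index("Phone")]  # 93, read straight from the array
q4_total = by_quarter[QUARTERS.index("Q4")]        # 53
```

A summary query such as total Phone sales becomes an index lookup into the precomputed list rather than a scan of the base cells, which is the indexing advantage described above.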
Answer: - By association we mean that we are given a set of items and a large collection of
transactions, which are subsets of these items. The task is to find relationships between the
presence of various items within these subsets.
Association rule mining proceeds in the two steps given below: -
• Find all frequent itemsets: by definition, each of these itemsets occurs at least as
frequently as a predetermined minimum support count.
• Generate strong association rules from the frequent itemsets: by definition, these
rules must satisfy minimum support and minimum confidence.
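The two steps can be sketched in Python as follows. This is a minimal, unoptimized sketch in the style of the Apriori algorithm, assuming transactions are given as Python sets of item names; the function names `frequent_itemsets` and `strong_rules` are hypothetical. Step 1 uses an absolute minimum support count, while step 2 takes minimum support and minimum confidence as fractions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support_count):
    """Step 1: level-wise (Apriori-style) search for every itemset whose
    support count is at least min_support_count. Returns {itemset: count}."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def strong_rules(frequent, n_transactions, min_support, min_confidence):
    """Step 2: from each frequent itemset, emit rules body -> head that meet
    both minimum support and minimum confidence (both given as fractions)."""
    rules = []
    for itemset, count in frequent.items():
        support = count / n_transactions
        if len(itemset) < 2 or support < min_support:
            continue
        for r in range(1, len(itemset)):
            for body in combinations(itemset, r):
                body = frozenset(body)
                # Every subset of a frequent itemset is frequent, so its count is known.
                confidence = count / frequent[body]
                if confidence >= min_confidence:
                    rules.append((set(body), set(itemset - body), support, confidence))
    return rules
```

For example, with the hypothetical baskets {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, a minimum support count of 2, minimum support 0.5 and minimum confidence 0.7, the only strong rule is butter → bread (support 0.5, confidence 1.0).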
Association rules can also be classified according to the following criteria: -
• Based on the types of values handled in the rule: if a rule concerns associations between
the presence or absence of items, it is a Boolean association rule.
• If a rule describes associations between quantitative items or attributes, it is a
quantitative association rule. In these rules, quantitative values for items or attributes
are partitioned into intervals.
• Based on the dimensions of data involved in the rule: if the items in an association rule
reference only one dimension, it is a single-dimensional association rule; if the rule
references two or more dimensions, it is a multidimensional association rule.
• Based on the levels of abstraction involved in the rule set: some methods for association
rule mining can find rules at differing levels of abstraction.
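The interval partitioning used for quantitative association rules can be sketched as a preprocessing step that turns a numeric attribute into interval items, after which the same frequent-itemset search used for Boolean rules applies. The attribute `age`, the boundaries in `AGE_EDGES` and the record layout are hypothetical choices for this Python sketch; real systems may choose the intervals differently, for example dynamically from the data.

```python
from bisect import bisect_right

# Hypothetical interval boundaries for a quantitative attribute "age".
# Intervals: age<20, [20,30), [30,40), age>=40
AGE_EDGES = [20, 30, 40]


def age_interval(age):
    """Map a numeric age to the label of the interval that contains it."""
    i = bisect_right(AGE_EDGES, age)
    if i == 0:
        return f"age<{AGE_EDGES[0]}"
    if i == len(AGE_EDGES):
        return f"age>={AGE_EDGES[-1]}"
    return f"age=[{AGE_EDGES[i - 1]},{AGE_EDGES[i]})"


def to_items(record):
    """Turn a record with a numeric 'age' and a list of 'bought' items into a
    transaction, so quantitative and Boolean items can be mined together."""
    return {age_interval(record["age"])} | set(record["bought"])
```

For instance, a customer aged 34 who bought a laptop becomes the transaction {"age=[30,40)", "laptop"}, so a mined rule such as age=[30,40) → laptop is a quantitative association rule.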
In order for the rules to be useful, two pieces of information must be supplied along with
the actual rule: its support and its confidence.
Q4: List the steps of the KDD process and briefly describe each step.
The primary goal of the KDD process is to extract knowledge from data in the context of
large databases. It does so by using data mining methods to extract what is deemed
knowledge, according to specified measures and thresholds, using a database along with
any required preprocessing, subsampling, and transformation of that database.
Answer: - The three measures used in market basket analysis are as follows:
1. Support: - The support of an association rule is the fraction of baskets for which the
rule is true, that is, baskets that contain all the items in both the rule body and the rule head.
2. Confidence: - The confidence of an association rule is a percentage value that shows
how frequently the rule head occurs among all the baskets that contain the rule body.
The higher the value, the more often this set of items is associated together.
3. Lift: - The lift of an association rule is the ratio of the rule confidence to the
expected confidence, that is, the fraction of all baskets that contain the rule head.
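The three measures can be computed directly from their definitions. In this minimal Python sketch, the function names and the representation of baskets as Python sets are assumptions, and the basket data is hypothetical; expected confidence is taken to be the support of the rule head, which is how lift is usually defined.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= set(basket)) / len(baskets)


def confidence(baskets, body, head):
    """Share of baskets containing the body that also contain the head."""
    return support(baskets, set(body) | set(head)) / support(baskets, body)


def lift(baskets, body, head):
    """Rule confidence divided by the expected confidence (support of the head)."""
    return confidence(baskets, body, head) / support(baskets, head)


# Hypothetical baskets
baskets = [{"tea", "milk"}, {"tea", "milk"}, {"tea", "sugar"}, {"coffee"}]
```

Here the rule milk → tea has support 0.5, confidence 1.0 and lift of about 1.33. A lift above 1 means the body and head occur together more often than expected if they were independent; a lift below 1 means less often.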
Assume min_support = 40% = 2/5, min_confidence = 70%. Five transactions
are recorded in a supermarket:
Transaction table (columns: #, Transaction, Code)
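k = 1: F1 = {B, D, P, M}
k = 2: C2 = {BD, BP, BM, DP, DM, PM}; eliminate the infrequent BP, DM and PM, so F2 = {BD, BM, DP}
k = 3: C3 = {BDM}; BDM contains the infrequent subset DM, so it is pruned and F3 is empty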
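The k = 3 step can be checked with a small Python sketch of Apriori candidate generation (join, then prune). It uses only the itemsets named in the example, since the five supermarket transactions are not reproduced here; the function name `apriori_gen` and the representation of itemsets as strings of item codes are assumptions. By the Apriori property, a 3-itemset can be frequent only if every one of its 2-item subsets is frequent.

```python
from itertools import combinations

# Itemsets from the example; each letter is one item code.
C2 = {"BD", "BP", "BM", "DP", "DM", "PM"}
INFREQUENT_2 = {"BP", "DM", "PM"}
F2 = C2 - INFREQUENT_2  # {"BD", "BM", "DP"}


def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets.
    Join: merge two (k-1)-itemsets that share their first k-2 items.
    Prune: drop any candidate with a (k-1)-subset that is not frequent.
    Returns (joined, kept) so both stages are visible."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    joined, kept = set(), set()
    for a, b in combinations(prev, 2):
        if a[:k - 2] == b[:k - 2]:
            candidate = tuple(sorted(set(a) | set(b)))
            joined.add(candidate)
            if all(sub in prev_set for sub in combinations(candidate, k - 1)):
                kept.add(candidate)
    return joined, kept


joined, kept = apriori_gen(F2, 3)
# joined == {("B", "D", "M")}; kept == set() because ("D", "M") is not in F2
```

The join produces the single candidate BDM from BD and BM, and the prune step discards it because DM was eliminated at k = 2, so no 3-itemset needs to be counted against the transactions.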
Q6: A database has five transactions. Let min_sup = 60% and min_conf = 80%.