Processing OLAP Queries in Hierarchically Clustered Databases

Data & Knowledge Engineering 45 (2003) 205–224
www.elsevier.com/locate/datak
Dimitri Theodoratos a,*, Aris Tsois b
a Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
b Department of Electrical and Computer Engineering, National Technical University of Athens, Zographou 157 73, Athens, Greece
Received 11 September 2002; accepted 12 September 2002
Abstract
On-Line Analytical Processing (OLAP) is a technology that encompasses applications requiring a multidimensional and hierarchical view of data. OLAP applications often require fast response time to complex grouping/aggregation queries on enormous quantities of data. Commercial relational database management systems mainly use multiple one-dimensional indexes to process OLAP queries that restrict multiple dimensions. However, in many cases, multidimensional access methods outperform one-dimensional indexing methods.
We present an architecture for multidimensional databases that are clustered with respect to multiple hierarchical dimensions. It is based on the star schema and is called CSB star. We focus on processing OLAP queries over this schema using multidimensional access methods. Users can still formulate their queries over a traditional star schema; these queries are then rewritten by the query processor over the CSB star. We exploit the different clustering features of the CSB star to efficiently process a class of typical OLAP queries. We detect cases where the construction of an evaluation plan can be simplified, and other cases where additional processing techniques can be applied.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: On-Line Analytical Processing; Multidimensional database; Star schema; Hierarchical clustering; Grouping
and aggregation query
☆ Research supported by the European Commission under the IST Program project EDITH: European Development of Indexing Techniques for Databases with Multidimensional Hierarchies.
* Corresponding author. Tel.: +30-1-7721402; fax: +30-1-7721442.
E-mail addresses: dth@dblab.ece.ntua.gr, dth@cs.njit.edu (D. Theodoratos), atsois@dblab.ece.ntua.gr (A. Tsois).
0169-023X/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0169-023X(02)00180-5
1. Introduction
Decision support applications increasingly rely on On-Line Analytical Processing (OLAP) to analyze business-related information. OLAP is a technology that encompasses applications requiring a multidimensional view of data.
In a multidimensional view of data there is a set of measures that are the metrics of interest (e.g., total sales). The measures contain numeric data. Each of the measures is uniquely determined by a set of different and often independent dimensions (e.g., time, location, product). Dimensions have associated with them hierarchies that specify different aggregation levels of data and hence different granularities of viewing data (e.g., day, month, year for the time dimension).
Many relational OLAP systems use the star schema [14] to represent the multidimensional data model. A multidimensional database organized as a star consists of a fact table and a table for each dimension. A dimension table comprises attributes for each (aggregation) level of the dimension and other (descriptive) attributes that characterize the different levels of the dimension. The fact table has attributes for each numeric measure and foreign key attributes to the attribute of the finest granularity level of each dimension.
OLAP applications often require fast response time to complex grouping/aggregation queries on enormous quantities of data. Common techniques to improve query performance are materializing views, and making extensive use of clustering and indexing methods. In multidimensional databases these techniques have to be adapted in order to account for the multiple dimensions. Materializing views is an efficient technique when the way to compute a query using the materialized views is known. However, the general problem of answering and optimizing grouping/aggregation queries using multiple materialized views [4,24] is complex. Another difficulty of the view materialization technique is the optimal selection of views to materialize [12,23]. View selection becomes even more complex when the query pattern is not known in advance [16]. Lastly, this technique incurs significant additional space requirements and requires intricate algorithms for incrementally maintaining the materialized views [10].
Commercial relational database management systems mainly use multiple one-dimensional indexes, like compound indexes and bitmap indexes, to process OLAP queries that restrict multiple dimensions. The search key in compound indexes is a concatenation of multiple attributes. Therefore, they are useful for processing only some of the queries that restrict these attributes (typically those that restrict a prefix of the search key). Selecting views and compound indexes for materialization is a difficult task [11] and depends on the specific query set pattern. Bitmap indexes (and their variants) [19] are very popular because of their compactness and support of star joins. Nevertheless, in many cases, multidimensional access methods (e.g., R-tree) outperform bit-mapped indexing methods [21].
Contribution. In this paper we focus on processing OLAP queries in databases that are clustered with respect to multiple hierarchical dimensions using multidimensional access methods. The main contributions are the following:
• We present a multidimensional database architecture based on the star model (called CSB star). The dimension tables are organized using one-dimensional hierarchical clustering and encoding techniques, while the fact table is organized using a multidimensional access method.
• The CSB star schema is intended to be a storage option only. The users can express OLAP queries over a traditional star schema. The query processor rewrites these queries over the CSB star schema.
• We exploit the clustering features of the CSB star schema to efficiently process a class of typical OLAP queries. The expensive star-join operations needed in a traditional star schema can be essentially implemented as multidimensional range restrictions on the fact table and range restrictions on the dimension tables. Supplementary joins are implemented as merge join operations on sorted tables. Grouping operations are performed on partially sorted relations.
• In this context we detect special cases where joins of fact table tuples with tuples from the dimensions are avoided, and the grouping of the tuples is performed once before all join operations.
• Our approach is heuristic and is not based on a specific cost model. We discuss further improvements of our execution plan where individual tuples and groups of tuples can be filtered at an early stage of the processing.
Outline. The following section reviews related work. Section 3 introduces the basic concepts of multidimensional hierarchical clustering adopted here. In Section 4 the architecture of the multidimensional database is presented. Section 5 describes the class of queries considered, introduces a number of physical operators, and shows how queries in this class can be processed by exploiting the clustering scheme of the multidimensional database architecture. In Section 6, we discuss improvements of the evaluation plan. Section 7 contains concluding remarks and directions for further work.
2. Related work
Conventional query optimizers focus on the join ordering problem [22]. They exploit knowledge about the group-by clause in a query only by including the grouping columns in the list of interesting orders during join enumeration. Initial work on group-by addresses the problem of pipelining group-by and aggregation with join, and using group-by to flatten nested SQL queries [5,15]. The group-by operation can be pushed past one or more joins. This early grouping may lower the query processing cost by reducing the amount of data participating in the joins. Necessary and sufficient conditions for deciding when this transformation is valid are provided by Yan and Larson [27]. Chaudhuri and Shim [2] generalize the early grouping transformation by introducing a coalescing grouping transformation. This transformation allows one to perform an early group-by, but requires additional subsequent grouping operations that coalesce multiple groups. It also allows one to deal with the case where not all the aggregating columns are present in the node of the query evaluation plan where an early group-by operator is placed. Various other cases of early grouping and aggregation are studied and categorized by Yan and Larson [28], along with their reverse transformations of lazy grouping and aggregation. These latter transformations postpone the application of a grouping operation until after a join, and may reduce the number of input rows to the group-by if the join is selective. The authors consider both directions of transformation during query processing. Transformations as well as optimization algorithms for queries with aggregate views, and queries containing aggregate nested subqueries, are presented by Chaudhuri and Shim [3]. Query rewriting rules for pushing aggregation operators past selection conditions (and vice versa) using a generalized projection operator are presented by Gupta et al. [9].
The lazy grouping and aggregation transformation does not apply to the evaluation plans that we consider here, because the join operations do not reduce the number of tuples of the joining tables. In contrast, a transformation similar to the coalescing grouping transformation is exploited very efficiently here: an early grouping is pushed past all joins. It is worth noting that this is due to the architecture of the CSB star schema, which uses hierarchical encoding techniques, and does not apply to a traditional star join schema. Unlike the coalescing grouping transformation, the early grouping employed in this work alters the join condition. A preliminary version of part of this work is presented in [25]. A detailed cost model for the optimization of the evaluation plans considered here is elaborated by Tsois et al. [26]. Experimental results on the efficiency of the early grouping in these plans are shown by Karayiannidis et al. [13].
3. Multidimensional hierarchical clustering
This section presents the basic concepts of clustering multiple hierarchical dimensions and
processing range queries that we adopt for our research.
3.1. Multidimensional clustering and the UB-tree
Clustering places objects that are likely to be accessed together physically close to each other. The goal is to limit the number of disk accesses required to process a query. A tuple of a relation in a relational database can be viewed as a point in a multidimensional space where the dimensions are determined by the attributes of the tuple. In this context, the processing of queries can be supported by multidimensional access methods, which cluster data with respect to multiple attributes [7]. OLAP queries often impose restrictions on multiple dimensions, and multidimensional clustering can substantially speed up such queries. The main problem in the design of multidimensional access methods is that there exists no total ordering among the points in the multidimensional space that preserves spatial proximity. One way to deal with this problem heuristically is to find a total order that preserves spatial proximity to some extent. This total order is called a space-filling curve. A one-dimensional access method can be used in combination with the space-filling curve to improve the access of the points in the space. Such a solution to the problem is provided by the UB-tree [1]. The UB-tree partitions the multidimensional space into regions, each of which is mapped to one disk page. In our approach the UB-tree is used to organize the fact table of a multidimensional database.
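The UB-tree linearizes the space with the Z-order (Morton) curve, obtained by interleaving the bits of the coordinates of a point. The text above does not spell out the curve, so the following is only a minimal sketch, with a hypothetical helper `z_address`, of how a space-filling curve turns multidimensional points into a total order that keeps nearby points close.

```python
def z_address(coords, bits):
    """Interleave the bits of the coordinates, most significant bit first.

    coords: tuple of non-negative ints, each < 2**bits (one per dimension).
    Returns the Z-order (Morton) address as a single int.
    """
    z = 0
    for b in range(bits - 1, -1, -1):        # from the most to the least significant bit
        for c in coords:                      # take one bit from each dimension in turn
            z = (z << 1) | ((c >> b) & 1)
    return z


# Points of a 4x4 grid (2 bits per dimension), listed in Z-order.
points = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(points, key=lambda p: z_address(p, 2))
print(ordered[:4])   # [(0, 0), (0, 1), (1, 0), (1, 1)]: the first 2x2 quadrant
```

Points that fall in the same quadrant share a prefix of their Z-address, so a one-dimensional index on the address keeps them on the same or neighboring pages.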
3.2. Range queries on UB-trees and the Tetris algorithm
A selection condition in a query may restrict one attribute to an interval of values. A query that restricts all attributes (dimensions) to an interval is called a range query. The multidimensional interval determined by the one-dimensional intervals is called a query box. In order to answer a range query, Bayer [1] presents an algorithm for UB-trees that fetches from the disk only those regions (pages) that properly intersect the query box. The Tetris algorithm [18] is a generalization of this algorithm that efficiently combines sort operations with the evaluation of multiattribute restrictions. It operates on any multidimensional access method that creates a disjoint partitioning of the multidimensional space (e.g., UB-tree). The Tetris algorithm takes as input an attribute of a relation and a query box determined by the restrictions of a query on the relation, and returns the tuples of the relation satisfying the restrictions, ordered on the input attribute. Compared to the access methods of commercial systems, the Tetris algorithm shows significant speedups, important reductions in temporary storage requirements for the sorting process, and a several times faster production of the first results of a sort operation [18]. Here we use the Tetris algorithm to process range queries on a fact table.
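The input/output contract of the Tetris algorithm described above can be pinned down with a naive reference implementation. The sketch below is an assumption-laden illustration, not Tetris itself: it scans the whole relation and sorts the result, whereas Tetris combines the restriction with the sort and reads only the regions that intersect the query box. The names `in_box` and `range_sorted` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]
Box = Dict[str, Tuple[int, int]]   # attribute -> closed interval [lo, hi]

def in_box(row: Row, box: Box) -> bool:
    """True if every restricted attribute of the row lies in its interval."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in box.items())

def range_sorted(relation: List[Row], box: Box, sort_attr: str) -> List[Row]:
    """Reference semantics only: rows inside the query box, ordered on sort_attr.

    Full scan plus full sort; it produces the same output Tetris is specified
    to produce, but none of its I/O or temporary-storage savings.
    """
    return sorted((r for r in relation if in_box(r, box)), key=lambda r: r[sort_attr])

fact = [{"C1": 1, "C2": 5}, {"C1": 3, "C2": 2}, {"C1": 2, "C2": 9}, {"C1": 4, "C2": 4}]
print(range_sorted(fact, {"C1": (1, 3), "C2": (2, 6)}, "C2"))
# [{'C1': 3, 'C2': 2}, {'C1': 1, 'C2': 5}]
```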
3.3. Dimension hierarchies
A dimension hierarchy $D$ of depth $k$ is a list $L^k, \ldots, L^1$ of $k$ names which are called dimension or hierarchy levels. With every level $L$ in $D$, a non-empty finite set of values $\mathrm{dom}(L)$ is associated through the function dom such that $\mathrm{dom}(L^i) \cap \mathrm{dom}(L^j) = \emptyset$, $i \ne j$. In each dimension, we additionally assume an auxiliary level $L^{k+1}$ whose domain contains the single value all: $\mathrm{dom}(L^{k+1}) = \{\mathit{all}\}$. For every two levels $L^i$, $L^{i+1}$, $i \in [1, k]$, a function parent from $\mathrm{dom}(L^i)$ onto $\mathrm{dom}(L^{i+1})$ is defined. For every two levels $L^{i+1}$, $L^i$, $i \in [1, k]$, a function children from $\mathrm{dom}(L^{i+1})$ to the power set of $\mathrm{dom}(L^i)$ is defined: $\mathrm{children}(v) = \{v' \mid v' \in \mathrm{dom}(L^i) \text{ and } \mathrm{parent}(v') = v\}$. Clearly, if $v, v' \in \mathrm{dom}(L^i)$, $i > 1$, and $v \ne v'$, then $\mathrm{children}(v) \ne \emptyset$, $\mathrm{children}(v') \ne \emptyset$, and $\mathrm{children}(v) \cap \mathrm{children}(v') = \emptyset$. A value in $\mathrm{dom}(L^i)$, $i > 1$, represents a set of values in $\mathrm{dom}(L^1)$ through the function children. Level $L^1$ (the lowest level in the hierarchy of dimension $D$) corresponds to the smallest (finest) granularity of viewing data. Level $L^{k+1}$ (the top level in the hierarchy of dimension $D$) corresponds to the largest granularity of viewing data: all represents all the values in $\mathrm{dom}(L^1)$.

A dimension hierarchy defines a hierarchy tree: the nodes of the tree are the values in $\bigcup_{i=1}^{k+1} \mathrm{dom}(L^i)$, while the edges are determined by the parent function. The leaf nodes of the tree are the values in $\mathrm{dom}(L^1)$. The root node is the unique value of $\mathrm{dom}(L^{k+1})$. The path of a node (value) $n^i \in \mathrm{dom}(L^i)$, $i \in [1, k]$, is defined to be the concatenation of the nodes $n^k, n^{k-1}, \ldots, n^i$ on the path in a hierarchy tree from the root node to node $n^i$.
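A minimal sketch of these definitions, assuming a hypothetical Time hierarchy with levels Day ($L^1$), Month ($L^2$), Year ($L^3$) and the auxiliary level holding all. The dictionary `PARENT` encodes the parent function; `children` and `path` follow the definitions above, and the path lists the nodes from level $k$ down to the given node, leaving out the root all.

```python
# Hypothetical hierarchy: Day (L1) -> Month (L2) -> Year (L3) -> all (L4)
PARENT = {
    "1998-03-03": "1998-03", "1998-03-17": "1998-03",
    "1999-04-30": "1999-04",
    "1998-03": "1998", "1999-04": "1999",
    "1998": "all", "1999": "all",
}

def children(v):
    """children(v) = {v' | parent(v') = v}."""
    return {c for c, p in PARENT.items() if p == v}

def path(v):
    """Nodes n^k, ..., n^i from the top level down to v (root 'all' excluded)."""
    nodes = []
    while v != "all":
        nodes.append(v)
        v = PARENT[v]
    return list(reversed(nodes))

print(path("1998-03-17"))            # ['1998', '1998-03', '1998-03-17']
print(sorted(children("1998-03")))   # ['1998-03-03', '1998-03-17']
```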
3.4. Hierarchy clustering and encoding
OLAP queries impose restrictions on different levels of the dimension hierarchies. Hierarchy clustering and encoding [17,29] in combination with multidimensional access methods can be used to speed up these queries and to optimize the storage usage.
In order to take into account dimension hierarchies in the clustering of data, instead of the values $v$ of the lowest level of a dimension, their path $p$ in the hierarchy tree of the dimension is considered. Clustering data using the paths of the values places values that have the same parent in the hierarchy tree physically close to each other, because they have a common prefix in their paths.
Building indexes using $p$ instead of $v$ increases the size of the indexes and consequently decreases their performance. This problem can be efficiently addressed: the size of the concatenated values can be reduced using the following encoding scheme, which is quite similar to that of Markl et al. [17]. Let $v \in \mathrm{dom}(L^i)$, $i \in [1, k]$, and $v' = \mathrm{parent}(v)$. Let also $V$ be the cardinality of $\mathrm{children}(v')$, and $<_Q$ be the less-than comparison operator of the query language. We define a one-to-one function $S: \mathrm{children}(v') \to [0, V-1]$ such that for every $u, u' \in \mathrm{children}(v')$, $u <_Q u'$ implies $S(u) < S(u')$. $S(v)$ is called the surrogate of $v$. Note that if $<_Q$ is not defined in the query language for the values in $\mathrm{dom}(L^i)$, $S$ is simply defined as a one-to-one function from $\mathrm{children}(v')$ onto $[0, V-1]$. The compound surrogate of $v$, $C(v)$, is the path of $v$ where the concatenated values are replaced by their surrogates.
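The encoding can be sketched as follows, assuming the hierarchy is given as a parent map and that the query language's ordering $<_Q$ on each level coincides with Python's ordering of the (ISO-formatted) value strings. The helpers `surrogates` and `compound_surrogate` are illustrative names, and the compound surrogate is represented as a tuple of per-level surrogates rather than a packed bit string.

```python
from collections import defaultdict

def surrogates(parent):
    """S(v): rank of v among the children of parent(v), preserving <_Q."""
    siblings = defaultdict(list)
    for child, par in parent.items():
        siblings[par].append(child)
    s = {}
    for kids in siblings.values():
        # Ranks 0..V-1 in sorted order: one-to-one, onto [0, V-1], order-preserving.
        for rank, child in enumerate(sorted(kids)):
            s[child] = rank
    return s

def compound_surrogate(v, parent, s):
    """C(v): the path of v (root 'all' excluded) with each value replaced by S."""
    out = []
    while v != "all":
        out.append(s[v])
        v = parent[v]
    return tuple(reversed(out))

# Hypothetical Day -> Month -> Year hierarchy.
PARENT = {"1998-03-03": "1998-03", "1998-03-17": "1998-03", "1998-04-01": "1998-04",
          "1998-03": "1998", "1998-04": "1998", "1998": "all"}
S = surrogates(PARENT)
print(compound_surrogate("1998-03-17", PARENT, S))   # (0, 0, 1)
print(compound_surrogate("1998-04-01", PARENT, S))   # (0, 1, 0)
```

Because each $S$ preserves the sibling order, comparing compound surrogates lexicographically agrees with comparing the underlying dates.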
In a traditional approach, a certain number of bits is needed for the binary representation of the leaf nodes of the hierarchy tree. For the binary representation of the compound surrogates, a number of extra bits may be required. However, the storage overhead for the compound surrogates is not expected to be significant in practice. If the distribution of children to parents in a dimension level is uniform, then at most one extra bit of storage overhead per level is needed. Otherwise, the number of extra bits is a small percentage of the number of bits required to represent the leaf nodes of the hierarchy tree. For instance, if the average number of children at each level of the hierarchy is $2^a$, and the maximum surrogate $x$ at each level of the hierarchy is $2^m$, then a percentage of $(m-a)/a$ extra bits is required.
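A short derivation of this percentage, under the assumption stated above that each of the $k$ levels has on average $2^a$ children per parent and that surrogates at every level need about $m$ bits:

```latex
\begin{align*}
\text{bits for the leaf nodes} &\approx \log_2\bigl((2^a)^k\bigr) = k\,a,\\
\text{bits for a compound surrogate} &\approx k\,m,\\
\text{relative overhead} &= \frac{k\,m - k\,a}{k\,a} = \frac{m-a}{a}.
\end{align*}
```

With hypothetical values $a = 8$ and $m = 9$, for example, the overhead is $1/8 = 12.5\%$, independently of the depth $k$.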
4. Multidimensional database architecture
In this section, we present the architecture of our multidimensional database. The schema of the multidimensional database is a star with one fact table $F$ and dimension tables $D_1, \ldots, D_n$. Other (more normalized) schemas are also possible, for instance a snowflake schema [14], but such a choice does not essentially affect our approach for processing OLAP queries, and therefore we omit the details for simplicity of presentation. Quite often in the following, and in particular when a symbol occurs with both a subscript and a superscript, a subscript denotes the dimension and a superscript the hierarchy level. Symbols in bold denote sets of objects.
4.1. The dimension tables
The schema of a dimension table $D_i$ corresponding to a dimension hierarchy $D_i$ of depth $k_i$ consists of:
(a) A set $\mathbf{H}_i$ of hierarchy attributes $\{H_i^1, H_i^2, \ldots, H_i^{k_i}\}$ that correspond one-to-one to the $k_i$ levels of the dimension hierarchy; $H_i^1$ corresponds to the lowest level in the hierarchy (that is, the lowest granularity level) and $H_i^{k_i}$ to the highest.
(b) A set $\mathbf{F}_i$ of feature attributes that provide descriptive characterizations of the different levels of the dimension. A feature attribute $F_i^j$ characterizes the hierarchy attribute $H_i^j$. Feature attributes are optional in the schema of a dimension table.
(c) A compound surrogate attribute $C_i$. If $t$ is a tuple in table $D_i$, then $t[H_i^j] \in \mathrm{dom}(H_i^j)$, $j \in [1, k_i]$, and $\mathrm{parent}(t[H_i^j]) = t[H_i^{j+1}]$, $j \in [1, k_i - 1]$. Value $t[C_i]$ is the compound surrogate of $t[H_i^1]$.
By the definition of the compound surrogate of a value, it is clear that a point restriction on a hierarchy attribute of a dimension table can be expressed as a single range restriction on the compound surrogate attribute of this dimension table.
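To make this concrete, assume (hypothetically) that a compound surrogate is packed into an integer with a fixed number of bits per level, top level first. A point restriction $H_i^j = c$ fixes the surrogates of levels $k_i$ down to $j$ (those above $j$ follow from the parent function), that is, a bit prefix; the matching compound surrogates then form the contiguous interval obtained by filling the remaining bits with all 0s and all 1s. The helpers `pack` and `point_restriction_range` and the bit widths are illustrative.

```python
def pack(surrs, widths):
    """Pack per-level surrogates (top level first) into one integer."""
    value = 0
    for s, w in zip(surrs, widths):
        assert 0 <= s < (1 << w), "surrogate does not fit in its bit field"
        value = (value << w) | s
    return value

def point_restriction_range(prefix, widths):
    """[l, u] of all compound surrogates whose top levels equal `prefix`."""
    free_bits = sum(widths[len(prefix):])
    base = pack(prefix, widths[:len(prefix)]) << free_bits
    return base, base | ((1 << free_bits) - 1)   # free bits all 0s / all 1s

# Hypothetical widths: 2 bits for Year, 4 for Month, 5 for Day.
widths = (2, 4, 5)
lo, hi = point_restriction_range((0, 2), widths)   # Year surrogate 0, Month surrogate 2
print(lo, hi)                                       # 64 95
print(lo <= pack((0, 2, 7), widths) <= hi)          # True: a day of that month
```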
The compound surrogate attribute $C_i$ is the primary key of the dimension table $D_i$. By the definition of a dimension hierarchy, $H_i^1$ is also a key of table $D_i$, and the following functional dependencies between hierarchy attributes hold on $D_i$: $H_i^j \to H_i^{j+1}$, $j \in [1, k_i - 1]$. Furthermore, the following functional dependencies between hierarchy and feature attributes hold on $D_i$: $H_i^j \to F_i^j$, $j \in [1, k_i]$, where $F_i^j$ is a feature attribute characterizing the hierarchy attribute $H_i^j$.
We associate with a dimension table a primary index $P_i$ on $C_i$, and a clustered (compound) index $I_i$ on $H_i^{k_i}, \ldots, H_i^1, C_i$. Index $I_i$ is important for computing ranges of values on $C_i$ from point or range restrictions on the different hierarchy attributes of $D_i$.
Since the tuples of $D_i$ are clustered on $C_i$, range restrictions on $C_i$ can be computed efficiently. This clustering also provides a grouping of the tuples of the dimension table with respect to any hierarchy attribute. This property is particularly useful for evaluating grouping/aggregation queries, which are extensively used in OLAP applications.
4.2. The fact table
The schema of the fact table $F$ consists of:
(a) The compound surrogate attributes $C_1, \ldots, C_n$, one for each dimension table $D_i$, $i \in [1, n]$.
(b) A set of measure attributes $\mathbf{M} = \{M_1, \ldots, M_k\}$.
The set of attributes $\{C_1, \ldots, C_n\}$ is the primary key of $F$. Each $C_i$ in $F$ is a foreign key and refers to attribute $C_i$ of the dimension table $D_i$. Table $F$ is organized as a UB-tree on the attributes $C_1, \ldots, C_n$. Queries restricting the compound surrogate attributes in $F$ can be evaluated efficiently using the Tetris algorithm. The schema of our multidimensional database is called Compound Surrogate Based star schema (CSB star for short). Fig. 1 graphically illustrates the CSB star schema.
Fig. 1. The CSB star schema.

The clustering scheme of the CSB star schema is intended to be a storage option only, without affecting the formulation of queries by the user. User queries are easily formulated on a simple star schema (called user star schema). These queries are rewritten by the query processor over the tables $F$ and $D_1, \ldots, D_n$. In the following we consider queries that are rewritten over the CSB star schema.
5. Processing OLAP queries
We show in this section how to process OLAP queries by exploiting the clustering scheme and
the access methods of the multidimensional database architecture.
5.1. The class of queries considered
We consider OLAP queries of the form shown below. This is a general SQL query that involves joining a fact table with selected parts of dimension tables, grouping, aggregating, filtering aggregated measures, and sorting. Typical cases of OLAP operations (roll-up, drill-down, etc.) can be expressed by this SQL query.
SELECT X, A
FROM F, D_1, ..., D_n
WHERE c_J AND c_H AND c_F
GROUP BY G
HAVING c_A
ORDER BY O
Symbol $X$ is a set of hierarchy and/or feature attributes (called projected grouping attributes). $A$ is a set of aggregated measures (aggregate functions on measure attributes). Using the categorization of aggregate functions introduced by Gray et al. [8], we focus on distributive SQL aggregate functions: min, max, sum, and count. The aggregate function avg can be expressed in terms of sum and count. $G$ is a set of hierarchy and/or feature attributes (called grouping attributes); SQL requires $X$ to be a subset of $G$. Condition $c_J$ is a conjunction of join equalities on the compound surrogates of the form $F.C_i = D_i.C_i$. Condition $c_H$ is a conjunction of comparisons involving exclusively hierarchy attributes. Condition $c_F$ is a conjunction of comparisons involving exclusively feature attributes. Condition $c_A$ is a conjunction of comparisons involving exclusively aggregated measures from $A$. $O$ is a list of attributes from $X \cup A$. The comparisons involving an attribute $A$ are of the form $A\,\theta\,c$, where $\theta$ is one of the comparison operators in $\{<, \le, =, \ge, >\}$, and $c$ is a constant value.
We assume that at least one hierarchy attribute from each dimension is involved in a comparison in the WHERE clause of the query. We also assume that if comparisons of the form $H_i^j\,\theta\,c$, where $\theta \in \{<, \le, \ge, >\}$, appear in the WHERE clause of a query, then the parent function of the dimension hierarchy $D_i$ is monotone. A parent function is monotone if for every $v, v' \in \mathrm{dom}(H_i^j)$, $j = 1, \ldots, k_i$, if $v \le v'$ then $\mathrm{parent}(v) \le \mathrm{parent}(v')$. A typical example is the parent function in the Time dimension: let Date and Month be hierarchy attributes in the Time dimension such that $\mathrm{parent}(\mathit{Date}) = \mathit{Month}$. Then, for any two dates $v, v' \in \mathrm{dom}(\mathit{Date})$ such that $v \le v'$ (say $v$ = 3/3/98 and $v'$ = 30/4/99), $\mathrm{parent}(v) \le \mathrm{parent}(v')$ (March98 ≤ April99). This assumption guarantees that a range restriction on a hierarchy attribute of a dimension table can be expressed as a single range restriction on the compound surrogate of the table.
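On a finite domain the monotonicity assumption can be checked mechanically: after sorting the values, it suffices that the parents never decrease between consecutive values, since $\le$ on the parents is transitive. The sketch below uses the Date to Month example with a hypothetical `month_of` parent function; the second call uses a deliberately non-monotone mapping (day of month) for contrast.

```python
from datetime import date

def is_monotone(domain, parent):
    """True if v <= v' implies parent(v) <= parent(v') on the given domain."""
    values = sorted(domain)
    # Checking consecutive pairs is enough: <= on the parents is transitive.
    return all(parent(a) <= parent(b) for a, b in zip(values, values[1:]))

def month_of(d):
    """Hypothetical parent function Date -> Month, months ordered as (year, month)."""
    return (d.year, d.month)

dates = [date(1998, 3, 3), date(1999, 4, 30), date(1998, 12, 31)]
print(is_monotone(dates, month_of))          # True
print(is_monotone(dates, lambda d: d.day))   # False: day of month is not monotone
```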
The following proposition provides a syntactic characterization of queries whose hierarchy and feature attribute restrictions can be expressed as a single multidimensional range restriction on the compound surrogate attributes of all the dimension tables. In the following, $c_{H_i}$ ($c_{F_i}$) denotes the conjunction of comparisons in $c_H$ ($c_F$) involving exclusively hierarchy (feature) attributes of dimension table $D_i$.
Proposition 1. The restriction $c_{H_i} \wedge c_{F_i}$ on the dimension table $D_i$ can be equivalently expressed as a single range restriction on the compound surrogate attribute of $D_i$ if and only if one of the following conditions holds:
(a) No feature attributes are involved in the condition $c_{H_i} \wedge c_{F_i}$ (that is, only $c_{H_i}$ appears in the condition).
(b) There is an equality comparison $H_i^j = c$, where $c$ is a value, implied by $c_{H_i}$, such that for any comparison $F_i^m\,\theta\,c'$ in $c_{F_i}$, where $c'$ is a value, $j \le m$.
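Proposition 1 translates into a simple syntactic test. The sketch below reads condition (b) as $j \le m$: a point restriction at level $j$ fixes every level at or above $j$, so any feature attribute of those levels is constant over the resulting range. A dimension's conditions are described by two hypothetical inputs: `point_levels`, the levels $j$ with an equality $H_i^j = c$ implied by $c_{H_i}$, and `feature_levels`, the levels $m$ of the feature attributes $F_i^m$ compared in $c_{F_i}$.

```python
def single_range(point_levels, feature_levels):
    """Proposition 1: can c_Hi AND c_Fi be expressed as one range on C_i?

    point_levels:   levels j with an equality H_i^j = c implied by c_Hi
    feature_levels: levels m of the feature attributes F_i^m restricted in c_Fi
    """
    if not feature_levels:                      # condition (a): no feature attributes
        return True
    # condition (b): some implied point restriction at a level j <= every m
    return any(all(j <= m for m in feature_levels) for j in point_levels)

print(single_range([], []))       # True, by (a)
print(single_range([1], [3]))     # True: the point restriction fixes level 3 as well
print(single_range([3], [1]))     # False: F_i^1 varies inside the level-3 range
```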
In this section we consider queries whose hierarchy and feature attribute restrictions can be expressed as a multidimensional range restriction on the compound surrogate attributes of all the dimension tables (i.e., a single query box).
5.2. Physical operators
In order to construct an OLAP query evaluation plan, we use a number of physical operators that are presented below. Some of them are traditional relational operators and the others are specific to the organization of the multidimensional database.
We view a compound surrogate attribute $C_i$ as a composite attribute consisting of surrogate attributes $S_i^{k_i}, \ldots, S_i^1$. If $t$ is a tuple in the dimension table $D_i$, $t[S_i^j]$, $j \in [1, k_i]$, is the binary representation of the surrogate of $t[H_i^j]$: $t[S_i^j] = S(t[H_i^j])$.
We denote by $\pi_X$ the projection operator with duplicate retention on the set of attributes $X$, and by $\bar{\pi}_X$ the set-theoretic projection operator (SELECT DISTINCT) on $X$. $\sigma_c$ denotes the selection operator with selection condition $c$. $S_Y$ denotes the sorting operator on the list of attributes $Y$. $\bowtie_Y$ denotes the natural join operator on the list of attributes $Y$. The natural join operator is applied to tables that are sorted on the list of attributes $Y$ and is implemented as a merge join. $\Pi_{X,A}$ denotes the generalized projection operator [9] (grouping/aggregation operator), where $X$ is a set of grouping attributes and $A$ is a set of aggregate functions on measure attributes.
The Tetris operator, denoted $T([l_1, u_1], \ldots, [l_n, u_n], C_i)$, $i \in [1, n]$, can be applied to the fact table and represents the Tetris algorithm. $[l_j, u_j]$, $j \in [1, n]$, is a range of values for the compound surrogate attribute $C_j$, and $C_i$ is the compound surrogate attribute on which the resulting table is ordered (refer to Section 3). The schema of the resulting table is that of the fact table.
The Range operator, denoted $R(c_{H_i}, B)$, can be applied to the compound index $I_i$ of a dimension table $D_i$. Symbol $B$ is a boolean variable that takes values 1 and 0. The instance $R(c_{H_i}, 0)$ of the range operator on $I_i$ returns a range of values $[l_i, u_i]$ on $C_i$ such that restricting $C_i$ of $D_i$ to this range is equivalent to applying the restriction $c_{H_i}$ to $D_i$. The instance $R(c_{H_i}, 1)$ of the range operator on $I_i$ returns, besides $[l_i, u_i]$, a table $T$. The schema of $T$ is $H_i^{k_i}, \ldots, H_i^1, C_i$ and its content is the set of tuples $t$ over $H_i^{k_i}, \ldots, H_i^1, C_i$ in $I_i$ such that $l_i \le t[C_i] \le u_i$. If no lower (upper) bound is specified in $c_{H_i}$ for some hierarchy attribute, $l_i$ ($u_i$) is the value obtained by letting all the bits of a compound surrogate value be 0 (1). The use of this operator is clarified in Section 5.3.
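The contract of the Range operator can be sketched over a hypothetical in-memory stand-in for the compound index $I_i$: a list of entries `(path, c)`, where `path` holds the hierarchy values from $H_i^{k_i}$ down to $H_i^1$ and `c` is the compound surrogate, assumed sorted on `c`. The condition $c_{H_i}$ is passed as a Python predicate on the path and is assumed to select a contiguous run of entries (as guaranteed for the queries considered here). Unlike the operator described above, which fills unspecified bits with 0s or 1s, this sketch returns the tightest range covering the matching entries; it illustrates the inputs and outputs only, not how the index is searched.

```python
from typing import Callable, List, Optional, Tuple

Path = Tuple[str, ...]     # hierarchy values, top level H^k first, H^1 last
Entry = Tuple[Path, int]   # (path, compound surrogate C)

def range_op(index: List[Entry], cond: Callable[[Path], bool], b: int
             ) -> Tuple[Optional[Tuple[int, int]], Optional[List[Entry]]]:
    """Sketch of R(c_H, B): returns [l, u] and, when b == 1, the table T."""
    hits = [c for path, c in index if cond(path)]
    if not hits:                                   # empty selection: no range
        return None, ([] if b == 1 else None)
    l, u = min(hits), max(hits)
    table = [(p, c) for p, c in index if l <= c <= u] if b == 1 else None
    return (l, u), table

I = [(("1998", "03", "03"), 0), (("1998", "03", "17"), 1),
     (("1998", "04", "01"), 2), (("1999", "01", "05"), 3)]
rng, T = range_op(I, lambda p: p[:2] == ("1998", "03"), 1)
print(rng, len(T))   # (0, 1) 2
```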
5.3. Construction of the evaluation plan
We now show how to process OLAP queries of the type presented in Section 5.1. Our approach is heuristic and is not based on a specific cost model. For ease of presentation we use an example query that is general enough to encompass different processing cases, and we show how our technique can be applied to produce an evaluation plan.
Example 2. We consider the following query defined over the four-dimensional schema of Fig. 2:
SELECT F_1^3, H_2^2, H_3^3, F_3^3, sum(M)
FROM F, D_1, D_2, D_3, D_4
WHERE F.C_1 = D_1.C_1 AND F.C_2 = D_2.C_2 AND F.C_3 = D_3.C_3 AND F.C_4 = D_4.C_4 AND c_{H_1}
AND c_{H_2} AND c_{H_3} AND c_{H_4} AND c_{F_3}
GROUP BY F_1^3, H_2^2, H_3^3, F_3^3
HAVING c_A
ORDER BY H_3^3, H_2^2
A query evaluation plan for this query that takes advantage of our multidimensional database
architecture is shown in Fig. 3. This plan exploits the hierarchical clustering of the dimension
tables, the multidimensional hierarchical clustering of the fact table, and multidimensional range
query algorithms combined with a sort operation. The expensive star-join operations required in a
traditional star schema are here essentially implemented by a multidimensional range restriction
on the fact table [17]. The presence of the compound surrogate attributes in the fact table allows
for an early grouping and aggregation operation. This operation is facilitated by the fact that the
selected fact table tuples are retrieved sorted on a compound surrogate attribute.

Fig. 2. A four-dimensional schema.

Fig. 3. A query evaluation plan.

The evaluation plan comprises two kinds of nodes: operation nodes representing operations, and data nodes representing input data, intermediate results, and final results. An operation node is depicted by a small circle and is labeled by the operation(s) it represents. Some operations may be pipelined, in which case the corresponding temporary results are not actually stored on the disk. The fact table and the dimension tables are depicted by bigger circles, the compound indexes by rectangles, and the ranges of computed compound surrogate attribute values by triangles.
We can distinguish different steps in the evaluation plan: (1) The computation of ranges of compound surrogate values for each dimension based on the selection conditions $c_{H_i}$. For this computation we use the compound indexes of the dimensions without accessing the dimension tables. (2) The selection of the tuples from the dimension tables $D_i$ that satisfy the conditions $c_{H_i} \wedge c_{F_i}$. As explained later, for some dimensions this computation may not be required. (3) The selection of the tuples of the fact table that fall within the query box determined by the computed ranges of compound surrogate values. (4) The processing of the selected tuples of the fact table and the joining of these tuples with the selected tuples from the dimensions. (5) The final grouping/aggregation, restriction and sorting of the tuples from the previous step. These steps are shown by dashed boxes in Fig. 3.
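The early grouping and aggregation in this plan relies on the fact that the Tetris operator returns fact tuples sorted on a compound surrogate. Tuples with the same value at a hierarchy level share a prefix of that surrogate, so a grouping on that level can be computed in a single streaming pass. The sketch below assumes compound surrogates are tuples of per-level surrogates (top level first) and uses sum as the distributive aggregate; `early_group_sum` and the sample facts are illustrative.

```python
from itertools import groupby

def early_group_sum(sorted_facts, prefix_len):
    """Group (C, measure) pairs, already sorted on C, by the first `prefix_len`
    per-level surrogates and sum the measure in one streaming pass."""
    key = lambda fact: fact[0][:prefix_len]
    return [(k, sum(m for _, m in grp)) for k, grp in groupby(sorted_facts, key=key)]

facts = [((0, 0, 1), 10), ((0, 0, 2), 5), ((0, 1, 0), 7), ((1, 0, 0), 3)]
print(early_group_sum(facts, 2))   # [((0, 0), 15), ((0, 1), 7), ((1, 0), 3)]
```

Because sum is distributive, partial groups produced this way can still be coalesced by a later grouping operation if needed.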
Before discussing the different steps of the evaluation plan in the following, we provide some definitions.
Definition 3. A (hierarchy or feature) attribute is called a restricted attribute if it is involved in a selection condition ($c_F$ or $c_H$) in the query.
Definition 4. An attribute is called an imported attribute if it is a projected grouping hierarchy attribute or a grouping feature attribute in the query. A dimension containing imported attributes is called a joining dimension.
Example 5. In the query of Example 2, $F_1^3$ is the only restricted feature attribute. $F_1^3$, $H_2^2$, $H_3^3$, $F_3^3$ are the only imported attributes, which are also grouping attributes, while $D_1$, $D_2$, and $D_3$ are the joining dimensions.
An imported attribute needs to be added to selected fact table tuples. A grouping hierarchy attribute that is not projected in the query need not be added to the fact table tuples, since the compound surrogate attributes can be used for performing the grouping operation. For each joining dimension, a join operation is needed in order to add the imported attributes of the dimension to selected fact table tuples.
Definition 6. A joining dimension $D_i$ is called a candidate last joining dimension if one of the following two conditions holds:
(a) The first attribute in the list of sorting attributes in the query (the list $O$ of attributes in the ORDER BY clause of the query) is a hierarchy attribute of $D_i$.
(b) The first attribute in the list of sorting attributes in the query is not a hierarchy attribute (or there is no such list), there is a grouping feature attribute in the query, and there is a grouping hierarchy attribute of $D_i$ in the query.
Example 7. In the query of Example 2, $D_3$ is the only candidate last joining dimension.
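The final grouping and sorting operations can be simplified if the last join operation involves a
candidate last joining dimension, because the sort order of the tuples produced by this join can then
be exploited by these operations.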
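Definition 6 can be sketched in the same way. Under the same hypothetical `Attr` encoding (repeated here so the block stands alone), the illustrative function `candidate_last_joining` returns the set of candidate last joining dimensions. With an ORDER BY list that starts with $H_3^3$, as in Example 13, it yields $\{3\}$, matching Example 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    kind: str   # "H" = hierarchy attribute, "F" = feature attribute
    dim: int    # index i of the dimension D_i
    level: int  # level j in the hierarchy of D_i


def candidate_last_joining(joining, grouping, order_by):
    """Hypothetical helper: candidate last joining dimensions (Definition 6).

    joining  -- indexes of the joining dimensions
    grouping -- grouping attributes of the query
    order_by -- ORDER BY attributes of the query (possibly empty)
    """
    joining = set(joining)
    if order_by and order_by[0].kind == "H":
        # Condition (a): the first sorting attribute is a hierarchy
        # attribute of D_i.
        return {order_by[0].dim} & joining
    # Condition (b): no ORDER BY list, or it does not start with a
    # hierarchy attribute; a grouping feature attribute must exist.
    if not any(a.kind == "F" for a in grouping):
        return set()
    return {a.dim for a in grouping if a.kind == "H"} & joining
```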
5.4. Computing ranges of compound surrogate values
The first step in the construction of the query evaluation plan is the application of the Range
operator $R(c_{H_i}, B)$ to the compound index $I_i$ of each dimension. $R(c_{H_i}, B)$ computes the lower and the
upper bound of the range of values $[l_i, u_i]$ of the compound surrogate using the comparisons in $c_{H_i}$
and the compound index $I_i$. The range $[l_i, u_i]$ is always provided as input to the Tetris operator. It
can also be used for a selection operation on the dimension table $D_i$. If only hierarchy (and no
feature) attributes from a dimension are involved in a query, then their values can be obtained
directly from the compound index $I_i$, without accessing the dimension table $D_i$. In this case the
parameter $B$ of $R(c_{H_i}, B)$ is set to 1. In general, the application of the range operator follows the
rules below:
(a) If there is no imported attribute or restricted feature attribute of the dimension in the
query, then $B$ is set to 0 and the computed range is used only by the Tetris operator.
(b) If the imported attributes of the dimension are only hierarchy attributes and there is no
restricted feature attribute of the dimension in the query, then $B$ is set to 1. The computed range
is provided to the Tetris algorithm only, while the set of tuples retrieved from $I_i$ is used in a
subsequent join operation.
(c) If the imported attributes of the dimension include a feature attribute or there is a restricted
feature attribute of the dimension in the query, then $B$ is set to 0. The computed range is
provided both to the Tetris algorithm and to a selection operation on the corresponding dimension
table.
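Rules (a)-(c) amount to a three-way case analysis per dimension. The following is a minimal sketch, assuming attributes are given as hypothetical `(kind, dim, level)` triples; the function `range_rule` is illustrative only. Applied to the imported attributes of Example 5 and the restricted feature attribute $F_1^3$, it reproduces the assignment of Example 8.

```python
def range_rule(dim, imported, restricted):
    """Hypothetical helper: apply rules (a)-(c) of Section 5.4 to one dimension.

    imported, restricted -- sets of (kind, dim, level) triples, kind "H" or "F"
    Returns (rule, B, whether the dimension table D_i must be accessed).
    """
    imp = {a for a in imported if a[1] == dim}
    restricted_feature = any(a[0] == "F" and a[1] == dim for a in restricted)
    if not imp and not restricted_feature:
        return "a", 0, False   # range used by the Tetris operator only
    if all(a[0] == "H" for a in imp) and not restricted_feature:
        return "b", 1, False   # hierarchy values read from the compound index I_i
    return "c", 0, True        # range also restricts the dimension table D_i
```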
Example 8. In the plan of Fig. 3, rule (a) is applied to dimension $D_4$, rule (b) to dimension $D_2$,
and rule (c) to dimensions $D_1$ and $D_3$.
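To make the range computation concrete, assume, for illustration only, that $C_i$ is the bit concatenation of fixed-width level surrogates $S_i^{k_i}, \ldots, S_i^1$ with the top level in the most significant bits; the bit widths are hypothetical. A condition that fixes the surrogates from level $k_i$ down to level $j$ then selects one contiguous range, from the fixed prefix followed by all-zero bits up to the same prefix followed by all-one bits.

```python
def compound_surrogate(path, widths):
    """Concatenate level surrogates (top level first) into one integer C_i."""
    if len(path) != len(widths):
        raise ValueError("one width per surrogate is required")
    c = 0
    for s, w in zip(path, widths):
        if not 0 <= s < (1 << w):
            raise ValueError("surrogate does not fit in its bit field")
        c = (c << w) | s
    return c


def prefix_range(prefix, widths):
    """[l_i, u_i] of all C_i values whose leading surrogates equal `prefix`.

    widths -- hypothetical bit widths of S_i^{k_i}, ..., S_i^1
    """
    low_bits = sum(widths[len(prefix):])
    lo = compound_surrogate(prefix, widths[:len(prefix)]) << low_bits
    return lo, lo | ((1 << low_bits) - 1)


# Hypothetical 3-level hierarchy with 2, 3 and 4 bits per level.
print(prefix_range([1, 5], [2, 3, 4]))  # prints (208, 223)
```

Under this encoding every member below the fixed path falls inside the range and no other member does, which is what lets a single range stand in for an equality condition on a hierarchy attribute.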
5.5. Restricting the dimension tables
If the imported attributes of the dimension include a feature attribute or there is a restricted
feature attribute of the dimension in the query, the dimension table needs to be accessed. The
primary index on the compound surrogate attribute $C_i$ of the dimension is used to retrieve the
tuples of the dimension table that fall within the range of values computed by the range operator.
Of these tuples, those that satisfy the condition $c_{F_i}$ are retained and projected onto an appropriate
set of attributes. If this computation returns an empty set of tuples, the processing of the query
ends, since the answer is empty. Otherwise, the computed tuples are subsequently joined with
tuples derived from the fact table. The set $S$ of attributes of $D_i$ to be projected is determined
as follows:
(a) $S$ includes all the imported attributes of $D_i$; and
(b) from the surrogate attributes $S_i^{k_i}, \ldots, S_i^1$ of the compound surrogate $C_i$, $S$ includes the
surrogate attributes $S_i^{k_i}, \ldots, S_i^j$, where $j$ is the minimal level of the imported attributes of $D_i$.
Example 9. In the plan of Fig. 3, the set $S$ includes the surrogate attributes $S_3^4$ and $S_3^3$ from
dimension $D_3$, since $H_3^3$ and $F_3^3$ are the only imported attributes from this dimension.
If the minimal level $j$ is high enough in the dimension hierarchy, this duplicate-eliminating
projection is expected to significantly reduce the number of selected tuples.
It is important to note that the tuples from the dimension table are retrieved sorted on the
compound surrogate attribute $C_i$. Since $C_i$ is the concatenation of the surrogate attributes
$S_i^{k_i}, \ldots, S_i^1$, these tuples are also sorted with respect to the list of surrogate attributes $S_i^{k_i}, \ldots, S_i^j$.
As a consequence, the elimination of duplicates required for the projection operation can be
performed without extra cost. Furthermore, the sort order of the output tuples can be exploited
in a subsequent merge join operation with tuples derived from the fact table.
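The selection, projection and duplicate elimination of this step fit in one pass over the range scan. The sketch below assumes dimension tuples are hypothetical Python dicts already sorted on $C_i$, and that every projected non-surrogate attribute is functionally determined by the surrogates $S_i^{k_i}, \ldots, S_i^j$. Under that assumption, duplicates are adjacent, so comparing each projected tuple with the last emitted one suffices; the function name `restrict_dimension` is illustrative.

```python
def restrict_dimension(rows, c_f, keep):
    """Hypothetical helper: select on c_{F_i}, project onto S, drop duplicates.

    rows -- dimension tuples (dicts) sorted on the compound surrogate C_i,
            e.g. as produced by a primary-index scan over [l_i, u_i]
    c_f  -- predicate implementing the feature condition c_{F_i}
    keep -- attribute names of S, leading surrogates first
    Because the input is sorted on C_i, equal projected tuples are adjacent,
    so only the previously emitted tuple has to be compared.
    """
    out = []
    for row in rows:
        if c_f(row):
            t = tuple(row[a] for a in keep)
            if not out or out[-1] != t:
                out.append(t)
    return out
```

An empty result means that the answer to the query is empty, so the fact table need not be accessed.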
This step has to be performed before the next step for all the dimensions that have feature
attributes involved in selection conditions. In this way, we can avoid accessing the fact table for
queries that have an empty answer.
5.6. Multidimensional range selection and sorting
The Tetris operator takes the ranges of values on the compound surrogate attributes computed
in the first step and retrieves from the fact table the qualifying tuples sorted on a compound
surrogate attribute $C_i$. In general, the choice of the sorting attribute $C_i$ does not affect the
performance of the Tetris operator. We assume that the cache memory requirements of the Tetris
algorithm are satisfied by the available main memory [18]. The sorting attribute $C_i$ for the Tetris
algorithm is chosen from a dimension according to the following rules:
(a) $C_i$ is chosen from a joining dimension that is not a candidate last joining dimension.
(b) If $C_i$ cannot be chosen by rule (a), it is chosen from a joining dimension.
(c) If $C_i$ cannot be chosen by rules (a) or (b), it is chosen from a dimension that has a hierarchy
attribute involved as a grouping attribute in the query.
(d) If $C_i$ cannot be chosen by the previous rules, it is chosen arbitrarily.
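Rules (a)-(d) form a priority list, which the following sketch walks through. The function name is hypothetical, and resolving ties by the smallest dimension index is an arbitrary choice of this sketch (Section 6.1 suggests preferring a dimension with a grouping hierarchy attribute instead). For the dimensions of Example 10 it returns dimension 1.

```python
def tetris_sort_dimension(dims, joining, candidates, hier_grouping):
    """Hypothetical helper: pick the dimension whose C_i sorts the Tetris output.

    dims          -- all dimension indexes
    joining       -- joining dimensions
    candidates    -- candidate last joining dimensions (Definition 6)
    hier_grouping -- dimensions with a hierarchy attribute among the grouping attributes
    Ties are broken by the smallest index, an arbitrary choice.
    """
    for pool in (set(joining) - set(candidates),   # rule (a)
                 set(joining),                     # rule (b)
                 set(hier_grouping),               # rule (c)
                 set(dims)):                       # rule (d)
        if pool:
            return min(pool)
    raise ValueError("the fact table has no dimensions")
```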
Example 10. In the plan of Fig. 3, rule (a) is applicable, since $D_1$, $D_2$ and $D_3$ are the joining
dimensions and $D_3$ is the only candidate last joining dimension. The sorting attribute for the Tetris
algorithm is chosen from dimension $D_1$.
The previous rules guarantee that the sort order resulting from the Tetris operation can be
exploited in the subsequent merge join operation or in the final grouping/aggregation operation.
Furthermore, they guarantee that the sort order resulting from the last merge join operation can
be used to facilitate the final grouping/aggregation operation (if this is possible).
5.7. Processing of the selected fact table tuples
The presence of the compound surrogate attributes $C_1, \ldots, C_n$ in the fact table allows for an
early grouping and aggregation of the fact table tuples resulting from the Tetris algorithm [6]. This
operation can be performed efficiently because the tuples resulting from the Tetris operation are
already sorted (and thus grouped) with respect to one compound surrogate attribute $C_i$, $i \in [1, n]$.
The set of attributes on which the grouping is performed comprises the surrogate attributes
$S_i^{k_i}, \ldots, S_i^j$ from each dimension $D_i$ that contains a grouping attribute in the query. Level
$j$ is the minimal level of the (hierarchy and feature) grouping attributes of $D_i$ in the query.
Example 11. In the plan of Fig. 3, only dimensions $D_1$, $D_2$ and $D_3$ contain a grouping attribute in
the query. Therefore, the grouping is performed on the surrogate attributes $S_1^3, S_2^3, S_2^2, S_3^4, S_3^3$.
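Because the Tetris output is sorted on only one compound surrogate, the early grouping works on partially sorted data: tuples sharing the prefix $S_i^{k_i}, \ldots, S_i^j$ of the sorting dimension form contiguous runs, and within a run the remaining grouping keys can be aggregated in a small hash table. The sketch below assumes a hypothetical row format (a mapping from dimension index to its surrogate path, top level first, plus one measure) and computes $\mathrm{sum}(M)$. The prefix lengths `{1: 1, 2: 2, 3: 2}` correspond to the grouping surrogates of Example 11.

```python
from itertools import groupby


def early_group_sum(rows, sort_dim, prefix_len):
    """Hypothetical helper: early grouping/aggregation of the Tetris output.

    rows       -- (paths, m) pairs; paths maps a dimension index to its surrogate
                  path (top level first); rows are sorted on the path of sort_dim
    prefix_len -- dimension -> number of leading surrogates kept (k_i - j + 1);
                  dimensions without grouping attributes are omitted
    Rows are consumed run by run: a run shares the sort_dim prefix, so groups
    never span runs and only the current run's groups are held in memory.
    """
    dims = sorted(prefix_len)

    def prefix(paths, d):
        return tuple(paths[d][:prefix_len.get(d, 0)])

    for _, run in groupby(rows, key=lambda r: prefix(r[0], sort_dim)):
        acc = {}
        for paths, m in run:
            key = tuple(prefix(paths, d) for d in dims)
            acc[key] = acc.get(key, 0) + m
        yield from acc.items()
```

Groups are emitted in the order of the sorting dimension's prefix, which is the order a subsequent merge join on that dimension expects.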
Usually, in OLAP applications, a fact table contains a huge number of tuples which are
grouped to produce a small number of aggregated results. Therefore, this early grouping
operation is expected to drastically reduce the number of fact table tuples at an early stage of their
processing.
If there is no grouping feature attribute in the query, then the final grouping and aggregation
operation presented in the next step is not needed. The reason is that the compound surrogate
attributes have already been used in place of the hierarchy attributes for grouping. In this
case, the selection operation on the aggregated measures ($\sigma_{c_A}$) can follow immediately after the
early grouping operation to further reduce the number of aggregated fact table tuples that are left
to be processed.
The previous two cases of early grouping resemble the coalescing and push-down grouping
transformations [2,3,9,27,28]. They can be applied here only because all the hierarchy attributes of
a dimension are appropriately encoded in the compound surrogate attribute of this dimension.
If there is at least one joining dimension, one of them has provided its compound surrogate
attribute as the sorting attribute to the Tetris operator. In this case, the aggregated tuples are
equijoined on the common attributes with the selected tuples of this dimension. Since both sets of
tuples are sorted on their common attributes, a merge join algorithm can be applied efficiently.
Each aggregated tuple joins with at most one tuple from the dimension table (exactly one if no
feature condition restricts this dimension), so this operation does not increase the number of tuples
resulting from the grouping/aggregation operation. It does add to these tuples the imported
attributes of the joining dimension. Grouping hierarchy attributes of the dimension table that are
not projected grouping attributes in the query are not needed, since the fact table tuples have
already been grouped using the surrogate attributes.
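The merge join needed here is simpler than a general one because, after projection and duplicate elimination, the dimension side has unique join keys. A minimal sketch with a hypothetical tuple layout (join keys are tuples of surrogate values; the key functions are supplied by the caller):

```python
def merge_join(left, right, lkey, rkey):
    """Hypothetical helper: merge join in which the right input has unique keys.

    left  -- aggregated fact tuples sorted on lkey (keys may repeat)
    right -- projected dimension tuples sorted on rkey (keys unique)
    Yields (row, match) for every left tuple with a matching right tuple;
    left tuples without a match are dropped.
    """
    it = iter(right)
    match = next(it, None)
    for row in left:
        k = lkey(row)
        # Advance the dimension side past keys smaller than k.
        while match is not None and rkey(match) < k:
            match = next(it, None)
        if match is not None and rkey(match) == k:
            yield row, match
```

Each input is scanned once, so the cost is linear in the sizes of the two sorted inputs.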
A similar operation is performed on the resulting tuples for each other joining dimension. This
operation has to be preceded by a sort operation with respect to the joining attributes, and by a
project operation that eliminates the attributes not needed in subsequent operations. The sort
operation has to be performed only on the tuples resulting from the fact table, since the dimension
tuples are already sorted with respect to the joining attributes. The order of the join operations is
not significant. The only rule that has to be respected is to choose a candidate last joining
dimension for the final join operation. In this way, the sort order of the resulting tuples is exploited
in the subsequent operations.
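The join-ordering rule fits in a few lines. This sketch assumes the Tetris sorting dimension was chosen by the rules of Section 5.6, so that whenever it is itself a candidate last joining dimension, every other joining dimension is one too; the function name `join_order` is hypothetical. For Example 12 it returns the order $D_1, D_2, D_3$.

```python
def join_order(joining, tetris_dim, candidates):
    """Hypothetical helper: order the merge joins of Section 5.7.

    The dimension that sorted the Tetris output is joined first, one candidate
    last joining dimension (if any) is joined last, and the remaining joining
    dimensions, whose relative order is not significant, go in between.
    """
    rest = sorted(d for d in joining if d != tetris_dim)
    first = [tetris_dim] if tetris_dim in joining else []
    last = [d for d in rest if d in candidates][:1]
    middle = [d for d in rest if d not in last]
    return first + middle + last
```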
Example 12. In the plan of Fig. 3, dimension $D_1$ (which provides the sorting attribute to the Tetris
operator) is chosen for the first join operation, on the surrogate attribute $S_1^3$. Dimension $D_3$ (the
unique candidate last joining dimension) is chosen for the last join operation, on the surrogate
attributes $S_3^4, S_3^3$.
5.8. Final grouping/aggregation and sorting
In the last step of the evaluation plan, the tuples resulting from the last join operation are
grouped and aggregated. The aggregated tuples that satisfy the condition $c_A$ on aggregated
measures are retained and projected onto the attributes required in the output. As mentioned
previously, these operations can be avoided when there is no grouping feature attribute in the query. If
the output tuples are required to be sorted, a final sorting operation is also performed. These
operations exploit the sort order of the tuples resulting from the previous step.
Example 13. The tuples resulting from the last merge join operation are sorted with respect to the
surrogate attribute list $S_3^4, S_3^3$. Recall that this implies that these tuples are also sorted with respect
to the hierarchy attribute $H_3^3$. The sort order of these tuples is used to efficiently perform the
subsequent grouping/aggregation and sorting operations, since $H_3^3$ is both a grouping attribute in
the final grouping operation and the first attribute in the list of sorting attributes of the final
sorting operation. The projection operation is redundant, since all the grouping attributes are also
projected grouping attributes.
6. Discussion
The query evaluation plan can be further improved in some cases by a filtering operation before
the early grouping operation.
6.1. Early group filtering
Suppose that: (a) the measure attribute $M$ takes non-negative values, (b) the condition $c_A$ on
aggregated values of the query is $\mathrm{sum}(M) \ge c$, where $c$ is a constant, and (c) the sorting
compound surrogate attribute of the Tetris operator is chosen from a dimension $D_i$ that includes a
grouping hierarchy attribute $H_i^j$ in the query. Since the tuples resulting from the Tetris operation
are sorted on $C_i$, they are also sorted (and thus grouped) with respect to $S_i^{k_i}, \ldots, S_i^j$. Observe
that each aggregated tuple in the answer of the query is obtained by aggregating tuples from
a single such group. That is, each initial group is further divided in the subsequent steps into
smaller groups based on the other grouping attributes, and the final sub-groups are aggregated to
produce the result tuples. Since $M$ is non-negative, the sum over a sub-group cannot exceed the sum
over the group that contains it. Therefore, if the condition $\mathrm{sum}(M) \ge c$ is not satisfied by the
aggregated value of an initial group, it is not satisfied by the aggregated value of any of its final
sub-groups. As a consequence, we can exclude all the tuples of this group from further processing.
A similar technique for datacube queries [8] has been proposed by Ross and Zaman [20].
More generally, we can define a Filtering operator, denoted $U(S_i^{k_i}, \ldots, S_i^j; c_A)$, where
$S_i^{k_i}, \ldots, S_i^j$, $j \ge 1$, is a list of surrogate attributes from dimension $D_i$, and $c_A$ is the condition
involving aggregated measure attributes of the query from $A$. This operator can be applied, under
certain conditions, to the result $R$ of the Tetris operator $T([l_1, u_1], \ldots, [l_n, u_n], C_i)$ on the fact table.
The schema of the output table is that of $R$. The resulting table contains the tuples $t$ in $R$ that agree
on the attributes $S_i^{k_i}, \ldots, S_i^j$ with a tuple in $\sigma_{c_A}(\Pi_{S_i^{k_i}, \ldots, S_i^j, A}(R))$. Intuitively, this
operator removes from $R$ each tuple $t$ such that, when $t$ is grouped with the other tuples of $R$ that
agree with $t$ on the attributes $S_i^{k_i}, \ldots, S_i^j$ and the aggregated measure attributes of each group are
computed, the resulting values for the aggregated measure attributes in $A$ do not satisfy condition $c_A$.
Note that a tuple in the resulting table is simply a tuple from $R$ and not a tuple resulting from
grouping/aggregating some tuples from $R$. Note also that the required aggregated measure values
can be computed without extra I/O cost, since the Tetris operation outputs the selected fact table
tuples sorted on the surrogate attributes $S_i^{k_i}, \ldots, S_i^j$. The application conditions of the Filtering
operator are the following:
(1) the hierarchy attribute $H_i^j$ is a grouping attribute in the query; and
(2) a conjunct in $c_A$ has one of the following forms: (a) $\mathrm{count}(M) \ge c$, (b) $\max(M) \ge c$,
(c) $\min(M) \le c$, or (d) $\mathrm{sum}(M) \ge c$ for a measure attribute $M$ that takes non-negative values,
or $\mathrm{sum}(M) \le c$ for a measure attribute $M$ that takes non-positive values.
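Since the Tetris output arrives grouped on $S_i^{k_i}, \ldots, S_i^j$, the Filtering operator only needs to buffer one group at a time. In the sketch below, `filter_groups`, `prefix`, `agg` and `cond` are hypothetical names; the caller supplies the callables and is responsible for passing a conjunct of one of the forms listed in condition (2), because pruning is only safe for those.

```python
from itertools import groupby


def filter_groups(rows, prefix, agg, cond):
    """Hypothetical sketch of the Filtering operator U(S_i^{k_i}, ..., S_i^j; c_A).

    rows   -- Tetris output sorted on C_i, so tuples with equal prefix(t)
              are adjacent
    prefix -- returns (S_i^{k_i}, ..., S_i^j) for a tuple
    agg    -- aggregate over a list of tuples (e.g. the sum of the measure M)
    cond   -- one conjunct of c_A, evaluated on the group aggregate
    Tuples are returned unchanged; only whole groups are dropped.
    """
    for _, run in groupby(rows, key=prefix):
        group = list(run)   # only the current group is buffered
        if cond(agg(group)):
            yield from group
```

For $\mathrm{sum}(M) \ge c$ with non-negative $M$, `agg` sums the measure over the group and `cond` is `lambda s: s >= c`; for $\mathrm{count}(M) \ge c$, `agg` can simply be `len`.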
If there is more than one grouping hierarchy attribute from dimension $D_i$ in the query, it is
better to select $j$ in the Filtering operator $U(S_i^{k_i}, \ldots, S_i^j; c_A)$ to be the minimal level of the grouping
hierarchy attributes $H_i^j$. Selecting the minimal level implies that the groups defined by the
surrogate attributes $S_i^{k_i}, \ldots, S_i^j$ are minimal (at the lowest possible granularity). For the forms
of conjuncts listed above, a condition $c_A$ that is violated by a group is also violated by any
sub-group of it. In this way, the number of tuples that are pruned by the Filtering operator is maximized.
In order to favor the application of the Filtering operator, we choose the sorting attribute for
the Tetris operator from a dimension that has a grouping hierarchy attribute, without violating the
rules of Section 5.6.
6.2. Early tuple filtering
Suppose that the set $A$ of aggregated measure attributes in the query includes only the
aggregate functions min and max, and that for every aggregated measure attribute $\min(M)$ ($\max(M)$)
in $A$, a conjunct $\min(M) \le c$ ($\max(M) \ge c$) occurs in the condition $c_A$ of the query. We call the set
of these conjuncts a filtering set. If a tuple resulting from the Tetris operation satisfies none of the
conditions in the filtering set, it can be neither the minimum nor the maximum of any group that
satisfies $c_A$; therefore, it does not contribute to the answer of the query and can be excluded from
further processing. When the filtering set consists of a single conjunct, early group filtering need
not be applied once early tuple filtering has been applied: all the tuples of a group that would be
pruned by the Filtering operator have already been pruned by early tuple filtering.
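A sketch of early tuple filtering follows. A tuple can be dropped safely only when it satisfies none of the conjuncts of the filtering set; with a single conjunct this is the same as dropping the tuples that violate it. With two conjuncts, a tuple that violates $\min(M) \le c$ may still be the maximum that makes some group satisfy $\max(N) \ge c'$. Dropping tuples can only raise a group's minimum and lower its maximum, so no non-qualifying group becomes qualifying. Tuples are hypothetical dicts and the name `tuple_filter` is illustrative.

```python
def tuple_filter(rows, filtering_set):
    """Hypothetical helper: early tuple filtering when A holds only min and max.

    rows          -- Tetris output tuples (dicts keyed by attribute name)
    filtering_set -- (func, measure, c) triples: ("min", M, c) stands for
                     min(M) <= c and ("max", M, c) for max(M) >= c
    A tuple is kept if it satisfies at least one conjunct; a tuple that
    satisfies none can be neither the minimum nor the maximum of any group
    that satisfies c_A, so dropping it cannot change the answer.
    """
    def satisfies(t, func, m, c):
        return t[m] <= c if func == "min" else t[m] >= c

    return [t for t in rows
            if any(satisfies(t, func, m, c) for func, m, c in filtering_set)]
```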
7. Conclusion
Complex grouping/aggregation queries for OLAP applications process enormous quantities of
data and require fast response times. Commercial relational database management systems
mainly use multiple one-dimensional indexes to process OLAP queries that restrict multiple
dimensions. Recent research suggests that multidimensional access methods outperform
one-dimensional indexing techniques [18,21].
We present the CSB star, an architecture for a multidimensional database that is based on the
star schema. This architecture uses one-dimensional hierarchical clustering and encoding
techniques to organize the dimension tables and multidimensional access methods to organize the fact
table. Users can express their queries over a traditional star schema; the query processor then
rewrites them over a CSB star schema. We show how the features of this schema allow the
processing of a class of typical OLAP queries: (1) expensive star-join operations are essentially
reduced to multidimensional and one-dimensional range restrictions, (2) supplementary joins are
implemented as merge join operations on sorted tables, and (3) grouping operations are performed
on partially sorted data. We detect special cases where supplementary joins are avoided and a
grouping operation can be pushed past all join operations. Finally, we examine improvements of
the suggested evaluation plan where groups of tuples can be pruned at an early stage of the
processing.
An interesting extension of the present work considers a larger class of queries. In particular,
relaxing a number of the restrictions adopted here can result in queries that determine multiple
query boxes on the compound surrogate attributes. Our results apply to this case, too, by
considering the different query boxes separately. However, a cost-based optimization technique
that considers different groupings of these query boxes is expected to provide further
improvements.
References
[1] R. Bayer, The universal B-tree for multidimensional indexing: general concepts, in: Proceedings of the
International Conference on World-Wide Computing and its Applications, Lecture Notes in Computer Science,
1274, 1997, pp. 98-112.
[2] S. Chaudhuri, K. Shim, Including group-by in query optimization, in: Proceedings of the 20th International
Conference on Very Large Data Bases, 1994, pp. 354-366.
[3] S. Chaudhuri, K. Shim, Optimizing queries with aggregate views, in: Proceedings of the 5th International
Conference on Extending Database Technology, 1996, pp. 167-182.
[4] S. Dar, H.V. Jagadish, A.Y. Levy, D. Srivastava, Answering SQL queries with aggregation using views, in:
Proceedings of the 22nd International Conference on Very Large Data Bases, 1996, pp. 318-329.
[5] U. Dayal, A unified approach to processing queries that contain nested subqueries, aggregates, and quantifiers, in:
Proceedings of the 13th International Conference on Very Large Data Bases, 1987, pp. 197-208.
[6] K. Elhardt, EDITH query processing, Technical report, Transaction Software, January 2001.
[7] V. Gaede, O. Günther, Multidimensional access methods, ACM Computing Surveys 30 (2) (1997) 198-209.
[8] J. Gray, A. Bosworth, A. Layman, H. Pirahesh, Data cube: a relational aggregation operator generalizing
group-by, cross-tab, and sub-totals, in: Proceedings of the 12th International Conference on Data Engineering, 1996,
pp. 152-159.
[9] A. Gupta, V. Harinarayan, D. Quass, Aggregate-query processing in data warehousing environments, in:
Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pp. 358-369.
[10] A. Gupta, I.S. Mumick, Maintenance of materialized views: problems, techniques and applications, Data
Engineering 18 (2) (1995) 3-18.
[11] H. Gupta, V. Harinarayan, A. Rajaraman, J.D. Ullman, Index selection for OLAP, in: Proceedings of the 13th
International Conference on Data Engineering, 1997, pp. 208-219.
[12] V. Harinarayan, A. Rajaraman, J.D. Ullman, Implementing data cubes efficiently, in: Proceedings of the ACM
SIGMOD International Conference on Management of Data, 1996, pp. 205-216.
[13] N. Karayannidis, A. Tsois, T.K. Sellis, R. Pieringer, V. Markl, F. Ramsak, R. Fenk, K. Elhardt, R. Bayer,
Processing star queries on hierarchically-clustered fact tables, in: Proceedings of the 28th International Conference
on Very Large Data Bases, 2002.
[14] R. Kimball, The Data Warehouse Toolkit, John Wiley, New York, 1996.
[15] A. Klug, Equivalence of relational algebra and relational calculus query languages having aggregate functions,
Journal of the ACM 29 (3) (1982) 699-717.
[16] Y. Kotidis, N. Roussopoulos, DynaMat: a dynamic view management system for data warehouses, in: Proceedings
of the ACM SIGMOD International Conference on Management of Data, 1999, pp. 371-382.
[17] V. Markl, F. Ramsak, R. Bayer, Improving OLAP performance by multidimensional hierarchical clustering,
in: Proceedings of the International Database Engineering and Applications Symposium, 1999, pp. 165-177.
[18] V. Markl, M. Zirkel, R. Bayer, Processing operations with restrictions in RDBMS without external sorting: the
Tetris algorithm, in: Proceedings of the 15th International Conference on Data Engineering, 1999, pp. 562-571.
[19] P. O'Neil, D. Quass, Improved query performance with variant indexes, in: Proceedings of the ACM SIGMOD
International Conference on Management of Data, 1997, pp. 38-49.
[20] K.A. Ross, K.A. Zaman, Optimizing selections over datacubes, in: Proceedings of the International Conference on
Scientific and Statistical Databases, 2000, pp. 139-152.
[21] S. Sarawagi, Indexing OLAP data, Data Engineering 20 (1) (1997) 36-43.
[22] M. Steinbrunn, G. Moerkotte, A. Kemper, Heuristic and randomized optimization for the join ordering problem,
VLDB Journal 6 (1997) 191-208.
[23] D. Theodoratos, T. Sellis, Data warehouse configuration, in: Proceedings of the 23rd International Conference on
Very Large Data Bases, 1997, pp. 126-135.
[24] D. Theodoratos, T. Sellis, Answering queries on cubes using other cubes, in: Proceedings of the International
Conference on Scientific and Statistical Databases, 2000, pp. 109-122.
[25] D. Theodoratos, A. Tsois, Heuristic optimization of OLAP queries in multidimensionally hierarchically clustered
databases, in: Proceedings of the 4th International Workshop on Data Warehousing and OLAP, 2001.
[26] A. Tsois, N. Karayannidis, T.K. Sellis, D. Theodoratos, Cost-based optimization of aggregation star queries on
hierarchically clustered data warehouses, in: Proceedings of the International Workshop on Design and
Management of Data Warehouses, 2002, pp. 62-71.
[27] W. Yan, P.-A. Larson, Performing group-by before join, in: Proceedings of the 10th International Conference on
Data Engineering, 1994, pp. 89-100.
[28] W. Yan, P.-A. Larson, Eager aggregation and lazy aggregation, in: Proceedings of the 21st International
Conference on Very Large Data Bases, 1995, pp. 1-13.
[29] C. Zou, B. Salzberg, R. Ladin, Back to the future: dynamic hierarchical clustering, in: Proceedings of the 14th
International Conference on Data Engineering, 1998, pp. 578-587.
Dimitri Theodoratos received a Diploma in Electrical Engineering from the National Technical University of
Athens in 1985, a master's degree in Computer Science from the Ecole Nationale Superieure de
Telecommunication of Paris in 1986, and a Ph.D. degree in Computer Science from the University of Paris at Orsay in
1991. From 1993 to 1995 he was a European Union post-doc fellow at Rutherford Appleton Laboratory
(RAL) in the UK, and at Institut National de Recherche en Informatique et Automatique (INRIA) in Paris.
Between 1996 and 2001, he worked as a research associate at the Knowledge and Data Base Systems
Laboratory in the National Technical University of Athens, and as an assistant professor at the Computer Science
Department of the University of Ioannina and the Information and Communication Systems Department of
the University of the Aegean in Greece. Since 2001 he has been an associate professor at the Computer Science
Department of New Jersey Institute of Technology. His current research interests include Data Bases, Data
Warehousing and On-Line Analytical Processing. Dimitri Theodoratos has published several articles in
refereed journals and conferences. He has also served as a reviewer for journals and as a PC member in
international conferences.
Aris Tsois received a Diploma in Electrical and Computer Engineering from the National Technical
University of Athens (NTUA), Greece, in 1995. Currently he is a Ph.D. student at the Knowledge and Database
Systems Laboratory of NTUA under the supervision of Timos Sellis. His research interests include Databases,
Data Warehousing, On-Line Analytical Processing, Query Optimization and Artificial Intelligence. Aris Tsois
has been involved in different projects related to Data Warehousing and OLAP and has published several
articles in international conferences and workshops.
