(IJCNS) International Journal of Computer and Network Security, Vol. 2, No. 2, February 2010

Partial Aggregation for Multidimensional Online Analytical Processing Structures

Naeem Akhtar Khan (1) and Abdul Aziz (2)

(1) Faculty of Information Technology, University of Central Punjab, Lahore, Pakistan
naeemkhan@ucp.edu.pk
(2) Faculty of Information Technology, University of Central Punjab, Lahore, Pakistan
aziz@ucp.edu.pk

Abstract: Partial pre-computation for OLAP (On-Line Analytical Processing) databases has become an important research area in recent years. Partial pre-aggregation is used to speed up the response time of queries posed through an array-like decision-support interface, subject to the constraint that all pre-computed aggregates must fit into storage of a pre-determined size. The target query workload contains all base and aggregate cells stored in a multidimensional structure (i.e., a cube). These queries are in fact range queries pre-defined by users in support of decision makers. The query workload of an OLAP scenario is the set of queries expected from its users. Most published research deals only with optimization for a workload of views in the context of ROLAP (Relational OLAP). Many researchers have criticized partial-computation schemes optimized for views for their lack of support for ad-hoc querying. The other main concern is that a view may be too large to pre-compute while yielding only very small answers. In this paper, we study the problem of partial pre-computation for point queries, which are best suited to a MOLAP (Multidimensional OLAP) environment. We introduce a multidimensional approach for efficient cover-based query processing and examine the effectiveness of PC Cubes.

Keywords: OLAP, MOLAP, Data-Cubes, Data warehouse.

1. Introduction

A data warehouse (DW) is a centralized repository of summarized data whose main purpose is exploring the relationships between independent (dimensional, static) variables and dependent (dynamic) variables, the facts or measures. There is a trend within the data warehousing community towards separating the requirements for the preparation and storage needed to analyze the accumulated data from the requirements for exploring the data with the necessary tools and functionality [1]. In terms of storage, the convergent tendency is towards a multi-dimensional hypercube model [2]. In terms of analysis and the tools required for On-Line Analytical Processing (OLAP), there is a trend towards standardization as well, e.g., the OLAP Council's Multi-Dimensional Application Programmers Interface (MD-API). Although the trends favor separating storage from analysis, the actual physical implementation of DW/OLAP systems reconnects them. This is evident from the parade of acronyms used today, e.g., MOLAP, ROLAP, DOLAP, HOLAP, etc., where the physical implementation determines the advantages and disadvantages of storage access and analysis capabilities, and also determines any possible future extensions to the model. Among the models quoted above, the two most common in practice are the Multidimensional On-Line Analytical Processing (MOLAP) model and the Relational On-Line Analytical Processing (ROLAP) model.

The main advantage of ROLAP, which builds on relational database (RDB) technology, is that the database technology is well standardized (e.g., SQL2) and readily available. This permits the implementation of a physical system based on off-the-shelf technology and open standards. As this technology is well studied and researched, there are mechanisms for transactions and authorization schemes, thus allowing multi-user systems with the ability to update the data as required. The main disadvantage of this technology is that the query language as it exists (SQL) is not sufficiently powerful or flexible to support true OLAP features [1]. Furthermore, there is an impedance problem: the results returned, tables, always have to be converted to another form before further programmatic processing can be performed. The main advantage of MOLAP, which relies on generally proprietary multi-dimensional (MDD) database technology, is grounded in the disadvantages of ROLAP, the major reason for its creation. MOLAP queries are quite powerful and flexible in terms of OLAP processing. The physical model more closely matches the multidimensional model, and the impedance issue is remedied on the vendor's side. However, the MOLAP physical model has its own disadvantages: (1) there is no real standard for MOLAP; (2) there are no off-the-shelf MDD databases per se; (3) there are scalability problems; and (4) there are problems with authorizations and transactions. As the physical implementation ultimately determines the abilities of the system, it would be advisable to find a technology that combines and maximizes the advantages of both ROLAP and MOLAP while at the same time minimizing their disadvantages.

Online Analytical Processing (OLAP) has become a basic component of modern decision support systems. The authors of [3] introduced the data-cube, a relational operator and model for computing summary views of data that can, in turn, significantly improve the response time of core OLAP operations such as roll-up, drill-down, and slice and
dice; the drill-down approach is illustrated in Figure 1. Typically built on top of relational data warehouses, these summary views are formed by aggregating values across different attribute combinations. For a d-dimensional input set R, there are 2^d possible group-bys. A data-cube is often represented as a lattice that captures the inherent relationships between group-bys [5], and there are two standard data-cube representations: ROLAP (a set of relational tables) and MOLAP (multi-dimensional OLAP). The main differences between the ROLAP and MOLAP architectures are shown in Table 1 [4].

Figure 1. Hierarchies in data
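To make the 2^d figure concrete, here is a minimal sketch in Python (ours, not from the paper; the toy sales relation and all names are invented for illustration) that materializes every group-by of a 3-dimensional input, i.e., the full data cube:

```python
from itertools import combinations

# Invented toy fact table: (Product, Location, Time, measure).
rows = [
    ("pen",  "lahore",  "2010-01", 10),
    ("pen",  "karachi", "2010-01", 7),
    ("book", "lahore",  "2010-02", 3),
]
DIMS = ("P", "L", "T")  # dimension attributes at positions 0, 1, 2

# All 2^d subsets of the dimensions, from () i.e. "ALL" up to (P, L, T).
cube = {}
for k in range(len(DIMS) + 1):
    for group_by in combinations(range(len(DIMS)), k):
        view = {}
        for r in rows:
            key = tuple(r[i] for i in group_by)  # project onto group-by attrs
            view[key] = view.get(key, 0) + r[3]  # SUM aggregate
        cube[tuple(DIMS[i] for i in group_by)] = view

print(len(cube))     # 8 views = 2^3 group-bys
print(cube[("P",)])  # e.g. total sales per product
```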
The array-based structure, MOLAP (Multi-dimensional OLAP), has the advantage that native arrays provide an immediate form of indexing for cube queries. Research has shown, however, that MOLAP has scalability problems [6]. For example, high-dimensional data-cubes represent tremendously sparse spaces that are not easily adapted to the MOLAP model. Hybrid indexing schemes are normally used, significantly diminishing the power of the model.

Table 1: ROLAP vs. MOLAP

  Feature              ROLAP                          MOLAP
  Usage                Variable performance;          Good performance;
                       relational engine              multidimensional engine
  Storage and access   Tables/tuples; SQL access      Proprietary arrays; lack of a
                       language; third-party tools    standard language; sparse data
                                                      compression
  Database             Easy updating                  Difficult updating
  Size                 Large space for indexes;       2% index space;
                       gigabyte-terabyte              gigabyte

Moreover, since MOLAP has to be integrated with standard relational databases, middleware of some form must be employed to handle the conversion between relational and array-based data representations. The relational model, ROLAP (Relational OLAP), does not suffer from such restrictions: its summary records are stored directly in standard relational tables without any need for data conversion, and this table-based data representation does not pose scalability problems. Yet many current well-known commercial systems use MOLAP approaches. The main issue, as outlined in [6], is the indexing problem for the fastest execution of OLAP queries. The main problem with ROLAP is that it does not offer an immediate, fast index for OLAP queries; many well-known vendors have chosen to sacrifice scalability for performance. The query performance issue for ROLAP is discussed in [7], which proposes a novel, distributed multi-dimensional ROLAP indexing scheme. The authors showed that ROLAP's high-scalability advantage can be maintained while at the same time providing a rapid index for OLAP queries. Their distributed indexing scheme is a combination of packed R-trees with distributed disk striping and Hilbert-curve-based data ordering. Their method requires very little communication between processors and works in low-bandwidth multi-processor environments such as Beowulf-type processor clusters or workstation farms. It requires no shared disk and scales well with the number of processors used; to further improve the scalability of ROLAP with respect to the size and dimensionality of the data set (which is already better than MOLAP's scalability), they extended their indexing scheme to the partial-cube case.

The large number of group-bys, 2^d, is a major problem in practice for any data-cube scheme. They considered the case where one does not wish to build (materialize) all group-bys, but only a subset. For example, a user may want to materialize only those group-bys that are frequently used, thereby saving disk space and cube-construction time. The problem is then to find the best way to answer efficiently those less frequent OLAP queries that require group-bys that have not been materialized. To solve this problem they presented an indexing scheme, based on "surrogate group-bys", which answers such queries effectively. Their experiments showed that their distributed query engine is almost as efficient on "virtual" group-bys as on ones that actually exist. In summary, they claimed that their method provides a framework for distributed high-performance indexing of ROLAP cubes with the following properties [7]: it has low communication volume and is fully adapted to external memory; it requires no shared disk; it is incrementally maintainable; it is efficient for spatial searches in various dimensions; and it is scalable with respect to data size, dimensionality, and the number of processors. They implemented their distributed multi-dimensional ROLAP indexing scheme in C++ with STL and MPI and tested it on a 17-node Beowulf cluster (a frontend and 16 compute nodes). While easily extendable to shared-everything multi-processors, their algorithms performed well on these low-cost, commodity-based systems. Their experiments showed that close-to-optimal speedup was achieved for RCUBE index construction and updating: an RCUBE index for a fully materialized data cube of ≈640 million rows (17 gigabytes) can be generated on a 16-processor cluster in under one minute. Their method for distributed query resolution also exhibited good speedup, for example a speedup of 13.28 on 16 processors. For distributed query resolution in partial data-cubes, their experiments showed that searches against absent (i.e., non-
materialized) group-bys can typically be resolved at only a small additional cost. Their results demonstrated that it is possible to build a ROLAP data-cube that is scalable and tightly integrated with the standard relational database approach and, at the same time, provides an efficient index for OLAP queries.

2. DW and OLAP Technologies

The prominent definition of a data warehouse is a "subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management's decisions" [8]. A data warehouse is a well-organized, single-site repository of information collected from different sources. In a data warehouse, information is organized around major subjects and is modeled so as to give fast access to summarized data. "OLAP", as discussed in [9], [10], refers to the analysis functionalities generally used for exploring the data. The data warehouse has become a most important topic in the commercial world as well as in the research community. Data warehouse technology has mostly been used in the business world, for example in the finance and retail areas. The main concern is to draw benefit from the massive amounts of data that reside in operational databases. According to [11], the data-modeling paradigm for a data warehouse must fulfill requirements quite different from those of the data models in OLTP environments; the main comparison between the OLTP and OLAP environments is shown in Table 2.

Table 2: OLTP vs. OLAP

  Feature                       OLTP                      OLAP
  Amount of data retrieved      Small                     Large
  per transaction
  Level of data                 Detailed                  Aggregated
  Views                         Pre-defined               User-defined
  Age of data                   Current (60-90 days)      Historical (5-10 years) and current
  Typical write operation       Update, insert, delete    Bulk insert, almost no deletion
  Tables                        Flat tables               Multi-dimensional tables
  Number of users               Large                     Low-medium
  Data availability             High (24 hrs, 7 days)     Low-medium
  Database size                 Medium (GB-TB)            High (TB-PB)
  Query optimizing              Requires experience       Already "optimized"

The data model of the data warehouse must be simple for the decision maker to understand and to write queries against, and must get maximum efficiency from queries. Data warehouse models are called hypercubes or multidimensional models and have been formalized in [12]. The models are designed to represent measurable indicators, or facts, and the various dimensions that characterize the facts. For example, in the retail area, typical indicators are the price and amount of a purchase, the dimensions being location, product, customer and time. A dimension is usually organized in a hierarchy; for example, the location dimension can be aggregated into city, division, province and country.

The "star schema" models the data as a simple cube, where the hierarchical relationship within a dimension is not explicit but is rather encapsulated in attributes; the star schema model with its dimensions is shown in Figure 2.

Figure 2. Star Schema

In the "snowflake schema" the dimension tables are normalized, which makes it possible to represent the hierarchies explicitly by identifying each dimension separately at its different granularities. Finally, when multiple fact tables are required, the "fact constellation" or "galaxy schema" model allows the design of a collection of stars. OLAP architectures adopt a multi-tier architecture, as shown in Figure 3, where the first tier is a warehouse server, implemented by a relational DBMS. Data of interest must be extracted from OLTP systems (operational legacy databases), then cleaned and transformed by ETL (Extraction, Transformation, Loading) tools before being loaded into the warehouse.

Figure 3. How the pieces of an OLAP system fit together

This process aims to consolidate heterogeneous schemas (structural heterogeneity, semantic heterogeneity) and to reduce the data in order to make it conform to the data warehouse model (by applying aggregation and discretization functions). The data warehouse then holds high-quality, historical and homogeneous data. The second tier is a data mart; a data mart handles data received from the data warehouse, reduced to a selected subject or category. The main purpose of data marts is to isolate the data of interest for a smaller scope or department, thus permitting optimization focused on that data and increased security control. However, this intermediate data mart tier is optional, not mandatory.
The OLAP server is implemented at the third tier. It optimizes and calculates the hypercube, i.e., the set of fact values for all the relevant tuples of instances of the different dimensions (also called members). In order to optimize access to the data, query results are calculated in advance in the form of aggregates. OLAP operators allow materializing different views of the hypercube, enabling interactive queries from decision makers and analysis of the data. Common OLAP operations include drill-down, roll-up, slice and dice, and rotate. The fourth tier is an OLAP client, which provides a user interface with various reporting tools, analysis tools and data mining tools for obtaining results. Software solutions exist for all the traditional uses.
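As an illustration of these operators (ours, not from the paper; the table and column names are invented), roll-up aggregates a dimension away, drill-down is its inverse, slice fixes one dimension value, and dice restricts several dimensions at once. A sketch using pandas:

```python
import pandas as pd

# Invented fact table with three dimensions and one measure.
sales = pd.DataFrame({
    "product": ["pen", "pen", "book", "book"],
    "city":    ["lahore", "karachi", "lahore", "karachi"],
    "month":   ["2010-01", "2010-01", "2010-02", "2010-02"],
    "amount":  [10, 7, 3, 5],
})

# Roll-up: aggregate the city dimension away (city -> all cities).
rollup = sales.groupby(["product", "month"])["amount"].sum()

# Drill-down: go back to the finer (product, city, month) granularity.
detail = sales.groupby(["product", "city", "month"])["amount"].sum()

# Slice: fix one dimension value, keeping a sub-cube.
slice_lahore = sales[sales["city"] == "lahore"]

# Dice: restrict several dimensions at once.
dice = sales[(sales["city"] == "lahore") & (sales["month"] == "2010-01")]

print(rollup, detail, slice_lahore, dice, sep="\n\n")
```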
3. Distributed Index Construction for ROLAP

Different methods have been proposed for building ROLAP data-cubes [13], [14], [15], but very few results are available for the indexing of such cubes. For sequential query processing, [16] proposed an indexing model composed of a collection of B-trees. While adequate for low-dimensional data-cubes, B-trees are inappropriate for higher dimensions in that (a) multiple, redundant attribute orderings are required to support arbitrary user queries, and (b) their performance deteriorates rapidly with increasing dimensionality. In [17] the cube-tree was proposed, an indexing model based on the concept of a packed R-tree [18]. In the area of parallel query processing, the typical approach used by current commercial systems such as ORACLE 9i RAC improves throughput by distributing a stream of incoming queries over multiple processors and having each processor answer a subset of the queries. But this type of approach provides no speedup for the individual query. For OLAP queries, which are time-consuming, the parallelization of each query is important for the scalability of the entire OLAP system. With respect to the parallelization of R-tree queries for general-purpose environments, a number of researchers have presented solutions. In [19], Koudas, Faloutsos and Kamel presented a Master R-tree structure that employs a centralized index and a collection of distributed data files. Schnitzer and Leutenegger's Master-Client R-tree [20] improves upon the earlier model by partitioning the central index into a smaller master index and a set of associated client indexes. While offering considerable performance advantages in generic indexing environments, neither approach is well suited to OLAP systems. In addition to the sequential bottleneck on the main server node, both utilize partitioning methods that can lead to the localization of searches. Furthermore, neither approach provides methods for incremental updates.

Data reliability is a major issue in data warehouses, and many solutions have been proposed for it. Replication is a widely used mechanism for protecting against permanent data loss, and replica placement significantly impacts data reliability. Several replica placement policies for different objectives have been proposed and deployed in real-world systems, like RANDOM in GFS, PTN in RAID and Q-rot in [16]; [13], [21] have analyzed how system parameters such as object size, system capacity, and disk and switch bandwidth can affect a system's reliability. However, they focused only on the rough trend of their impact and did not derive the exact optimal values of these parameters. The joint impact of these parameters was also not discussed. Furthermore, for obtaining an accurate reliability value, some models are so complicated that it is difficult to figure out the best value of each parameter.

The authors of [22] worked on designing a reliable large-scale data warehouse or storage system and presented a new object-based-repair Markov model, which raises several key challenges. One problem was to determine some basic system parameters, such as the number of nodes, the total number of stored objects, and the bandwidth of the switch and the nodes. For designing a reliable system with optimal system parameters, their approach makes a significant contribution in two respects compared with previous work. Firstly, they presented a new object-based Markov model for quantifying the impact of key system parameters on system reliability under three replica placement strategies. Compared with previous, more complex models, this compact object-based model not only turns out to be easier to solve because of its smaller state transition matrix, but also leads to more integrative and practical results. Secondly, they proposed a two-step analysis process. The first step is to find the comparatively precise optimal value of a system parameter by independently analyzing its impact on system reliability while the other system parameters are fixed. The second is to figure out the best possible combination of these parameters by analyzing their integrated, complex impacts on system reliability while all of them are tuned. Their analysis showed that the optimal values do exist and have simple formulas. They presented a new, efficient object-based repair model and, by analyzing it, worked out the individual optimal values of the parameters and their optimal combination. Their results can provide engineers with direct instructions for designing reliable systems.

4. Related Work

Here we introduce the lattice view that depicts the relationships between existing views, some algorithms for the pre-computation of views, query answering using materialized views, and view selection.

4.1 The View Lattice Framework

The CUBE BY operator in [23] results in the computation of views corresponding to SQL queries grouped on all possible combinations of the dimension attributes. Those views are usually denoted by their grouping attributes, e.g., T, L, P, LT, PT, PL, PLT and ALL for an example database with the three attributes Time (T), Location (L) and Product (P). As stated in [24], a lattice is used to depict the relationships between the views: each node in the lattice represents an aggregate view, and an edge exists from node i to node j if view j can be computed from view i and contains one attribute less than view i. In this case, view i is called a parent view of j. There is one basic view on which every view depends, and the complete aggregation view "ALL" can be computed from any other view of the lattice.
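A minimal sketch (our own code, following the parent/child definition above) that builds this lattice for the three attributes: each view is a set of grouping attributes, and an edge runs from a view to each view with exactly one attribute less:

```python
from itertools import combinations

ATTRS = ("P", "L", "T")

# All 2^3 views of the lattice, each keyed by its grouping attributes.
views = [frozenset(c) for k in range(len(ATTRS) + 1)
         for c in combinations(ATTRS, k)]

# Edge i -> j iff j is a subset of i with exactly one attribute less.
edges = [(i, j) for i in views for j in views
         if j < i and len(i) - len(j) == 1]

name = lambda v: "".join(sorted(v)) or "ALL"
for i, j in sorted(edges, key=lambda e: (-len(e[0]), name(e[0]))):
    print(f"{name(i)} -> {name(j)}")
# PLT -> LT, PLT -> PL, PLT -> PT, ..., L -> ALL, P -> ALL, T -> ALL
```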
(IJCNS) International Journal of Computer and Network Security, 69
Vol. 2, No. 2, February 2010

Figure 4. Lattice View

4.2 Pre-computation of Aggregates

Figure 4 shows the lattice of views for an example database with three dimensions, Time, Location and Product, denoted T, L and P respectively. A view is labeled by the names of the dimensions on which it is aggregated; view PLT is the basic view, while PL is a parent view of P and L.

OLAP queries in a data warehouse involve a great deal of aggregation, and performance can be greatly improved by pre-computing aggregates. Many researchers have developed pre-computation algorithms for efficiently computing all possible views, which is known as view materialization. In [25] and [26], different efficient view materialization algorithms have been proposed, e.g., Overlap, PipeSort and PipeHash, which incorporate several optimization techniques, such as sorting the data in a particular order to compute all views that are prefixes of that order, computing a view from its smallest previously computed parent, and computing the views with the most common prefixes in a pipelined fashion. In [27], efforts were made to study how skewed data may affect the pre-computation of aggregates, and an approach for dynamically managing memory usage was recommended. A comparison between view materialization in MOLAP and ROLAP was made in [28], and an efficient array-based pre-computation algorithm for MOLAP was proposed; in this algorithm, partitions of the views are stored in main-memory arrays, and the computation of different views is overlapped while using minimum memory for each view. In [29], the authors examined pre-computation on compressed MOLAP databases and proposed algorithms for computing views without decompression. In [30], another pre-computation algorithm for MOLAP was suggested. One distinct feature of these algorithms is that aggregation cells are managed in the same way as the source data: primary cells and aggregation cells are stored together in a single data structure, one multidimensional array, which allows quick access to them. In this algorithm, pre-computation considers the points in multidimensional space: it examines the coordinates of the cells and relies on the relationships among cells to determine how to perform efficient aggregation. A graph-theoretical model is employed to ensure the correctness of the summation computation.
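The smallest-parent heuristic mentioned above is easy to sketch (hypothetical code of ours, reusing the `cube` dictionary from the earlier data-cube snippet; it is valid only for distributive aggregates such as SUM):

```python
# Hypothetical sketch of the "smallest parent" heuristic: materialize a
# view from its smallest already-materialized parent instead of re-scanning
# the raw data. `cube` maps a tuple of grouping attrs to
# {key_tuple: aggregated_value}, as built in the earlier snippet.

def from_smallest_parent(cube, target):
    """Materialize `target` (tuple of attrs) from its cheapest parent."""
    parents = [v for v in cube
               if set(target) < set(v) and len(v) == len(target) + 1]
    parent = min(parents, key=lambda v: len(cube[v]))  # fewest cells
    keep = [parent.index(a) for a in target]           # positions to retain
    view = {}
    for key, value in cube[parent].items():
        sub = tuple(key[i] for i in keep)              # drop one attribute
        view[sub] = view.get(sub, 0) + value           # re-aggregate (SUM)
    return view

# e.g. the (P,) view from whichever of (P, L) / (P, T) has fewer cells:
# assert from_smallest_parent(cube, ("P",)) == cube[("P",)]
```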
4.3 View Selection

The main issue in the full computation of views is storage space, so many researchers have studied this problem and recommended partial computation. In [31], an efficient approach was proposed for choosing a set of views to materialize under limited storage space. They introduced a linear cost model, which assumes that the cost of answering a query is proportional to the size of the view from which the answer can be computed; this linear cost model was verified by experiments. They then gave a greedy view selection algorithm that tries to decide which aggregate views are best for minimizing query cost. This greedy algorithm first chooses the base view. Materializing one more view can allow some queries to be answered from a smaller materialized view, reducing the query cost; their algorithm repeatedly chooses the view that produces the greatest reduction in query cost. It was proved that the benefit of the aggregate views selected by this algorithm is no worse than (0.63 - f) times the benefit of an optimal selection. The same problem was discussed in [32], where another view selection algorithm, PBS (Pick By Size), was proposed. The main difference is that PBS selects views solely based on their size: in each round, the smallest unselected view is chosen, until the total size of the selected views reaches the allocated space limit.
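The two policies can be contrasted in a few lines (our own sketch under the linear cost model; the view sizes are invented numbers, and we assume the base view is always kept so that every query stays answerable):

```python
# Illustrative sketch: the greedy selection of [31] vs. the PBS selection
# of [32]. Answering a query on view q costs the size of q's smallest
# materialized ancestor. All sizes (cell counts) are invented.
sizes = {"PLT": 6000, "PL": 4000, "PT": 5000, "LT": 1000,
         "P": 100, "L": 200, "T": 300, "ALL": 1}

def covers(a, q):  # view a can answer query q iff q's attrs are a subset
    return set(q.replace("ALL", "")) <= set(a.replace("ALL", ""))

def cost(materialized, q):
    return min(sizes[a] for a in materialized if covers(a, q))

def total_cost(materialized):
    return sum(cost(materialized, q) for q in sizes)

def greedy(k):
    """Greedy of [31]: keep the base view, then add, k times, the view
    giving the largest drop in total query cost."""
    chosen = {"PLT"}
    for _ in range(k):
        best = max((v for v in sizes if v not in chosen),
                   key=lambda v: total_cost(chosen) - total_cost(chosen | {v}))
        chosen.add(best)
    return chosen

def pbs(budget):
    """PBS of [32]: repeatedly pick the smallest unselected view until
    the space budget is exhausted."""
    chosen, used = {"PLT"}, sizes["PLT"]
    for v in sorted(sizes, key=sizes.get):  # smallest views first
        if v not in chosen and used + sizes[v] <= budget:
            chosen.add(v)
            used += sizes[v]
    return chosen

print("greedy:", greedy(3))
print("pbs:   ", pbs(8000))
```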
4.4 Query Processing

In [33], traditional query optimization algorithms were generalized to optimize queries in the presence of materialized views. In [34], techniques were proposed for rewriting a given SQL query so that it uses one or more materialized views; the authors also proposed a semantic approach for determining whether the information existing in a view is sufficient to answer a query. Another query re-writing method was suggested in [35]; this technique can utilize views having different aggregation granularities and selection regions. Generally, for an OLAP query there can be many equivalent re-writings using different materialized cubes/views in different ways, and their execution costs differ from one another. An efficient algorithm is also proposed in [35] for determining the set of materialized views to use in query re-writing.
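A toy sketch of the containment test underlying such rewritings (our own illustration, not the algorithm of [35]): a SUM query grouped on attributes G can be rewritten over a materialized view grouped on V only if G is a subset of V, and under the linear cost model the smallest usable view is preferred:

```python
def answerable_from(query_attrs, view_attrs):
    """A grouped SUM query can be rewritten over a view iff the view
    retains every attribute the query groups on (G subset of V)."""
    return set(query_attrs) <= set(view_attrs)

def pick_view(query_attrs, materialized, sizes):
    """Among usable materialized views, pick the cheapest (smallest)."""
    usable = [v for v in materialized if answerable_from(query_attrs, v)]
    return min(usable, key=lambda v: sizes[v]) if usable else None

sizes = {"PLT": 6000, "PL": 4000, "LT": 1000}
print(pick_view("L", {"PLT", "PL", "LT"}, sizes))   # -> "LT"
print(pick_view("PT", {"PLT", "PL", "LT"}, sizes))  # -> "PLT" (only cover)
```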
5. Recommendations for an Efficient Partial Pre-Aggregation

A complete, full pre-computation, in which all possible aggregates are pre-computed, can provide the best and most efficient query performance, but in our view this approach is not recommended, for the following reasons:
• A full pre-computation requires a great deal of storage space for the aggregates, so it often exceeds the available space.
• The technique is not a cost-effective use of resources: in [32] it is noted that after some level of pre-computation, the cost of the additional disk space outweighs the gains from pre-computation.
• Maintenance cost is increased.
• It requires a long load time.
• A full pre-computation is not suitable for very sparse cubes with high-cardinality dimensions, as it wastes space.

However, if a fully pre-computed environment is required, it is advisable to partition the cube in order to overcome the storage limitations: one logical cube of data should be spread across multiple physical cubes on distinct servers. This divide-and-conquer approach helps alleviate the scalability limitations of the full pre-computation approach.

The other approach, partial pre-computation of aggregates, can resolve the problems of full aggregation. The main objective of a partial pre-computation technique is to select a certain amount of aggregates to compute before query time, so that query answering time is optimized. But there are two major questions in the optimization process:
• How many partially pre-computed aggregates should be computed?
• What kinds of queries should the pre-computation strategy be optimized for?

The first question depends upon the available storage; we recommend that 40% of all possible aggregates be computed in advance. A few years ago Microsoft, as the vendor of one of the popular OLAP systems, recommended that 20% of all possible aggregates be computed in advance, but technology has since improved and storage capacity can now be obtained at very low cost. The best answer to the second question is that these queries should be those most expected from the decision makers of the OLAP application. This answer is not easy to apply, because it varies from user to user and from application to application; to treat the question systematically, one needs to precisely characterize the set of queries that is the object of optimization. This set of expected queries is often called the "query workload". The pattern of expected queries can be obtained from similar case studies and from navigational data analysis tasks. Many algorithms for partial pre-computation have already been discussed. For optimized, efficient processing of OLAP queries, the most commonly used approach is to store the results of the queries most frequently issued by decision makers in summary tables and then make use of them for evaluating other queries. In our view the PBS (Pick By Size) algorithm is fast and well suited here, as it facilitates database administrators in determining the point where diminishing returns outweigh the cost of the additional storage space; the algorithm also shows how much space should be allocated for pre-computation.
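Tying the two questions together, the following hypothetical sketch (all frequencies and sizes are invented numbers) selects group-bys for a point-query workload by benefit density, filling a space budget of 40% of the full cube as recommended above:

```python
# Hypothetical sketch: pick group-bys for a point-query workload.
# freq[v] is the expected share of point queries hitting view v, and
# sizes[v] its storage cost; both are invented.
sizes = {"PLT": 6000, "PL": 4000, "PT": 5000, "LT": 1000,
         "P": 100, "L": 200, "T": 300, "ALL": 1}
freq  = {"PLT": 0.05, "PL": 0.10, "PT": 0.05, "LT": 0.30,
         "P": 0.25, "L": 0.15, "T": 0.05, "ALL": 0.05}

def select_for_workload(budget):
    """Greedily fill the budget by benefit density: hits per unit space."""
    chosen, used = set(), 0
    for v in sorted(sizes, key=lambda v: freq[v] / sizes[v], reverse=True):
        if used + sizes[v] <= budget:
            chosen.add(v)
            used += sizes[v]
    return chosen, used

# A budget of ~40% of the fully materialized cube, as recommended above:
budget = int(0.40 * sum(sizes.values()))
print(select_for_workload(budget))
```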
6. Conclusion

Partial pre-computation is a most popular research area in the OLAP environment. Most of the published papers on partial pre-computation are about optimizing performance for a workload of views. In practice, when planning to implement a partial pre-computation strategy, users do not care about the processing overhead and the time used in determining the output of a given query; for the implementation of point queries, however, the processing overhead is the most important consideration. The PBS approach is efficient, and selection by PBS is highly effective, because the PC Cube generated by PBS leads to a shorter time for answering point queries. It has the best overall performance among the variations of member selection algorithms.

References

[1] Thomsen, Erik, OLAP Solutions: Building Multidimensional Information Systems, John Wiley and Sons, 1997.
[2] Agrawal, R., Gupta, A., Sarawagi, S., "Modeling Multidimensional Databases", Proceedings of the 13th International Conference on Data Engineering, pp. 232-243, 1997.
[3] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, "Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals", Proceedings of the 12th International Conference on Data Engineering, pages 152-159, 1996.
[4] Alejandro A. Vaisman, "Data Warehousing, OLAP, and Materialized Views: A Survey", Technical Report TR015-98, University of Buenos Aires, Computer Science Department, 1998.
[5] V. Harinarayan, A. Rajaraman, and J. Ullman, "Implementing data cubes efficiently", Proceedings of the 1996 ACM SIGMOD Conference, pages 205-216, 1996.
[6] S. Agarwal, R. Agrawal, P. Deshpande, A. Gupta, J. Naughton, R. Ramakrishnan, and S. Sarawagi, "On the computation of multidimensional aggregates", Proceedings of the 22nd International VLDB Conference, pages 506-521, 1996.
[7] F. Dehne, T. Eavis and A. Rau-Chaplin, "Parallel Multi-Dimensional ROLAP Indexing", Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'03), 2003.
[8] W. H. Inmon, Building the Data Warehouse, 3rd Edition, Wiley and Sons, 1996.
[9] S. Chaudhuri, U. Dayal, "An Overview of Data Warehousing and OLAP Technology", SIGMOD Record 26(1), 1997.
[10] P. Vassiliadis, T. Sellis, "A Survey of Logical Models for OLAP Databases", SIGMOD Record, Volume 28, Number 1, March 1999.
[11] R. Kimball, The Data Warehouse Toolkit, J. Wiley and Sons, Inc., 1996.
[12] L. Cabibbo and R. Torlone, "A Logical Approach to Multidimensional Databases", Proceedings of the 6th International Conference on Extending Database Technology (EDBT'98), Valencia, Spain, 1998.
[13] V. Harinarayan, A. Rajaraman, and J. Ullman, "Implementing data cubes efficiently", Proceedings of the 1996 ACM SIGMOD Conference, pages 205-216, 1996.
[14] K. Ross and D. Srivastava, "Fast computation of sparse data cubes", Proceedings of the 23rd VLDB Conference, pages 116-125, 1997.
[15] S. Sarawagi, R. Agrawal, and A. Gupta, "On computing the data cube", Technical Report RJ10026, IBM Almaden Research Center, San Jose, California, 1996.
[16] H. Gupta, V. Harinarayan, A. Rajaraman, and J. Ullman, "Index selection for OLAP", Proceedings of the 13th International Conference on Data Engineering, pages 208-219, 1997.
[17] N. Roussopoulos, Y. Kotidis, and M. Roussopoulos, "Cubetree: Organization of and bulk incremental updates on the data cube", Proceedings of the 1997 ACM SIGMOD Conference, pages 89-99, 1997.
[18] N. Roussopoulos and D. Leifker, "Direct spatial search on pictorial databases using packed R-trees", Proceedings of the 1985 ACM SIGMOD Conference, pages 17-31, 1985.
[19] N. Koudas, C. Faloutsos, and I. Kamel, "Declustering spatial databases on a multi-computer architecture", Proceedings of Extending Database Technologies, pages 592-614, 1996.
[20] B. Schnitzer and S. Leutenegger, "Master-client R-trees: a new parallel architecture", 11th International Conference on Scientific and Statistical Database Management, pages 68-77, 1999.
[21] I. Kamel and C. Faloutsos, "On packing R-trees", Proceedings of the Second International Conference on Information and Knowledge Management, pages 490-499, 1993.
[22] K. Du, Z. Hu, H. Wang, Y. Chen, S. Yang and Z. Yuan, "Reliability Design for Large Scale Data Warehouses", Journal of Computing, Vol. 3, No. 10, pp. 78-85, October 2008.
[23] J. Gray, A. Bosworth, A. Layman, H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tabs, and Sub-Totals", Proceedings of the International Conference on Data Engineering (ICDE'96), New Orleans, February 1996.
[24] V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing Data Cubes Efficiently", Proceedings of SIGMOD, pages 205-216, 1996.
[25] Sameet Agarwal, Rakesh Agrawal, Prasad M. Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi, "On the Computation of Multidimensional Aggregates", Proceedings of the 22nd VLDB Conference, Bombay, India, pages 506-521, 1996.
[26] P. M. Deshpande, S. Agarwal, J. F. Naughton, and R. Ramakrishnan, "Computation of Multidimensional Aggregates", Technical Report 1314, University of Wisconsin-Madison, 1996.
[27] J. X. Yu, Hongjun Lu, "Hash in Place with Memory Shifting: Datacube Computation Revisited", Proceedings of the 15th International Conference on Data Engineering, page 254, March 1999.
[28] Y. Zhao, P. M. Deshpande, and J. F. Naughton, "An Array-based Algorithm for Simultaneous Multidimensional Aggregates", Proceedings of ACM SIGMOD, pages 159-170, 1997.
[29] J. Li, D. Rotem, J. Srivastava, "Aggregation Algorithms for Very Large Compressed Data Warehouses", Proceedings of the 25th Very Large Database (VLDB) Conference, Edinburgh, Scotland, 1999.
[30] Woshun Luk, "ADODA: A Desktop Online Data Analyzer", 7th International Conference on Database Systems for Advanced Applications (DASFAA'01), Hong Kong, China, April 2001.
[31] V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing Data Cubes Efficiently", Proceedings of SIGMOD, pages 205-216, 1996.
[32] A. Shukla, P. Deshpande, and J. Naughton, "Materialized View Selection for Multidimensional Datasets", Proceedings of the 24th VLDB Conference, New York, 1998.
[33] S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim, "Optimizing Queries with Materialized Views", Proceedings of the 11th IEEE International Conference on Data Engineering, pages 190-200, 1995.
[34] D. Srivastava, S. Dar, H. V. Jagadish, A. Y. Levy, "Answering Queries with Aggregation Using Views", Proceedings of the 22nd VLDB Conference, Bombay, India, pages 318-329, 1996.
[35] C. S. Park, M. H. Kim and Y. J. Lee, "Finding an Efficient Rewriting of OLAP Queries Using Materialized Views in Data Warehouses", Decision Support Systems, Vol. 32, No. 4, pages 379-399, 2002.

Authors Profile

Naeem Akhtar Khan received the B.S. degree in Computer Science from Allama Iqbal Open University, Islamabad, Pakistan in 2005 and the M.S. degree in Computer Science from the University of Agriculture, Faisalabad, Pakistan in 2008. He is currently pursuing the Ph.D. (Computer Science) degree at the University of Central Punjab, Lahore, Pakistan. His research interests include large-scale data management, data reliability, data mining, and MOLAP.

Dr. Abdul Aziz did his M.Sc. at the University of the Punjab, Pakistan in 1989, and his M.Phil. and Ph.D. in Computer Science at the University of East Anglia, UK. He has secured many honors and awards from various institutions during his academic career. He is currently working as a full Professor at the University of Central Punjab, Lahore, Pakistan. He is the founder and Chair of the Data Mining Research Group at UCP. Dr. Aziz has delivered lectures at many universities as a guest speaker. He has published a large number of research papers in refereed international journals and conferences. His research interests include Knowledge Discovery in Databases (KDD) - Data Mining, Pattern Recognition, Data Warehousing and Machine Learning. He is a member of the editorial board of various well-known journals and international conferences, including IEEE publications (e-mail: aziz@ucp.edu.pk).