Abstract
A data warehouse is a computer system designed for archiving and analyzing an organization's
historical data, such as sales, salaries, or other information from day-to-day operations. The topic
of data warehousing encompasses architectures, algorithms, and tools for bringing together
selected data from multiple databases or other information sources into a single repository, called
a data warehouse, suitable for direct querying or analysis. A data warehouse is constructed by
integrating data from multiple heterogeneous sources.
Normally, an organization summarizes and copies information from its operational systems to
the data warehouse on a regular schedule, such as every night or every weekend; after that,
management can perform complex queries and analysis on the information without slowing
down the operational systems. A data warehouse supports analytical reporting, structured and/or ad
hoc queries, and decision making. In recent years data warehousing has become a prominent buzzword in the
database industry, but attention from the database research community has been limited. In this
paper we motivate the concept of a data warehouse.
Keywords
Data warehousing, OLAP, Data warehouse, Data warehousing architecture, Online Analytical
Processing, Database, Methodology, Data warehouse design.
Introduction
Data warehousing is a collection of decision support technologies, aimed at enabling the
knowledge worker (executive, manager, and analyst) to make better and faster decisions.
The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data
warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data.
This data helps analysts make informed decisions in an organization.
A data warehouse provides generalized and consolidated data in a multidimensional view.
Along with this generalized and consolidated view of data, a data warehouse also provides
Online Analytical Processing (OLAP) tools. These tools support interactive and effective
analysis of data in a multidimensional space. This analysis results in data generalization and
data mining.
Data mining functions such as association, clustering, classification, and prediction can be
integrated with OLAP operations to enhance the interactive mining of knowledge at multiple
levels of abstraction. This is why the data warehouse has become an important platform for data
analysis and online analytical processing.
OLAP operations include roll-up (increasing the level of aggregation) and drill-down
(decreasing the level of aggregation or increasing detail) along one or more dimension
hierarchies, slice-and-dice (selection and projection), and pivot (re-orienting the
multidimensional view of data).
usage analysis), and healthcare (for outcomes analysis). This paper presents a roadmap of data
warehousing technologies, focusing on the special requirements that data warehouses place on
database management systems.
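The OLAP operations listed above can be illustrated with a small sketch. The Python code below is a minimal, non-authoritative example that assumes a hypothetical in-memory fact table; the FACTS rows, the dimension names, and helper functions such as roll_up and pivot are invented for illustration and are not taken from any OLAP product. Roll-up and drill-down move along a year/quarter time hierarchy, slice and dice are expressed as selections followed by aggregation onto the remaining dimensions (projection), and pivot swaps the two axes of a two-dimensional view.

```python
from collections import defaultdict

# Hypothetical toy fact table: each row is one sales fact with a
# year -> quarter time hierarchy, a region, a product, and a measure.
FACTS = [
    {"year": 2023, "quarter": "Q1", "region": "North", "product": "A", "amount": 100},
    {"year": 2023, "quarter": "Q1", "region": "South", "product": "B", "amount": 150},
    {"year": 2023, "quarter": "Q2", "region": "North", "product": "B", "amount": 200},
    {"year": 2024, "quarter": "Q1", "region": "South", "product": "A", "amount": 120},
    {"year": 2024, "quarter": "Q2", "region": "North", "product": "A", "amount": 80},
]


def aggregate(rows, dims, measure="amount"):
    """Project rows onto `dims` and sum the measure for each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def roll_up(rows):
    """Roll-up: climb the time hierarchy from (year, quarter) to year."""
    return aggregate(rows, ["year"])


def drill_down(rows):
    """Drill-down: descend the time hierarchy from year to (year, quarter)."""
    return aggregate(rows, ["year", "quarter"])


def slice_(rows, dim, value):
    """Slice: select the facts where one dimension has a single value.

    The trailing underscore avoids shadowing Python's built-in `slice`.
    """
    return [r for r in rows if r[dim] == value]


def dice(rows, criteria):
    """Dice: select the facts whose dimensions fall in given value sets."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]


def pivot(view):
    """Pivot: re-orient a 2-D view by swapping its row and column axes."""
    return {(col, row): value for (row, col), value in view.items()}


if __name__ == "__main__":
    print(roll_up(FACTS))  # {(2023,): 450, (2024,): 200}
    print(drill_down(FACTS))
    north = slice_(FACTS, "region", "North")
    print(aggregate(north, ["product"]))  # {('A',): 180, ('B',): 200}
    print(pivot(aggregate(FACTS, ["region", "product"])))
```

Keeping every operation as a plain function over the same rows makes it easy to see that roll-up and drill-down only change the grouping level, while slice and dice only change which facts are kept.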
Data Warehouse Architecture
A typical data warehousing architecture includes tools for extracting data from multiple operational databases and external sources; for
cleaning, transforming and integrating this data; for loading data into the data warehouse; and for
periodically refreshing the warehouse to reflect updates at the sources and to purge data from the
warehouse, perhaps onto slower archival storage. In addition to the main warehouse, there may
be several departmental data marts. Data in the warehouse and data marts is stored and managed
by one or more warehouse servers, which present multidimensional views of data to a variety of
front-end tools: query tools, report writers, analysis tools, and data mining tools. Finally, there is
a repository for storing and managing metadata, and tools for monitoring and administering the
warehousing system. The warehouse may be distributed for load balancing, scalability, and
higher availability. In such a distributed architecture, the metadata repository is usually
replicated with each fragment of the warehouse, and the entire warehouse is administered
centrally. An alternative architecture, implemented for expediency when it may be too expensive
to construct a single logically integrated enterprise warehouse, is a federation of warehouses or
data marts, each with its own repository and decentralized administration. Designing and rolling
out a data warehouse is a complex process, consisting of the following activities:
Define the architecture, do capacity planning, and select the storage servers, database and
OLAP servers, and tools.
Define the physical warehouse organization, data placement, partitioning, and access
methods.
Design and implement scripts for data extraction, cleaning, transformation, load, and refresh.
Populate the repository with the schema and view definitions, scripts, and other metadata.
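The extraction, cleaning, transformation, load, and refresh steps described in this section can be sketched as a small incremental pipeline. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with two in-memory databases standing in for an operational source and the warehouse, and the table names (orders, sales_fact), the cleaning rule, and the id-based watermark used for refresh are hypothetical choices. Purging to archival storage, data marts, and the metadata repository are not modelled.

```python
import sqlite3


def make_source():
    """Hypothetical operational (OLTP) database with raw, inconsistent order rows."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    src.executemany(
        "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
        [(1, " north ", 100.0), (2, "SOUTH", 150.0), (3, None, 75.0), (4, "North", 200.0)],
    )
    src.commit()
    return src


def make_warehouse():
    """Separate warehouse database holding the cleaned fact table."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales_fact (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    return wh


def extract(src, after_id):
    """Extract: read only source rows newer than the last loaded id."""
    return src.execute(
        "SELECT id, region, amount FROM orders WHERE id > ? ORDER BY id", (after_id,)
    ).fetchall()


def clean(rows):
    """Clean: discard rows with a missing region (an assumed quality rule)."""
    return [r for r in rows if r[1] is not None]


def transform(rows):
    """Transform: normalise region spellings such as ' north ' and 'SOUTH'."""
    return [(oid, region.strip().title(), amount) for oid, region, amount in rows]


def load(wh, rows):
    """Load: append the prepared rows to the warehouse fact table."""
    wh.executemany("INSERT INTO sales_fact (order_id, region, amount) VALUES (?, ?, ?)", rows)
    wh.commit()


def refresh(src, wh):
    """Run one incremental ETL cycle and return the number of rows loaded."""
    # The highest order_id already in the warehouse acts as the watermark.
    (last_id,) = wh.execute("SELECT COALESCE(MAX(order_id), 0) FROM sales_fact").fetchone()
    rows = transform(clean(extract(src, last_id)))
    load(wh, rows)
    return len(rows)


if __name__ == "__main__":
    source, warehouse = make_source(), make_warehouse()
    print(refresh(source, warehouse))  # 3 (row 3 is rejected by clean)
    source.execute("INSERT INTO orders VALUES (5, 'south', 50.0)")
    source.commit()
    print(refresh(source, warehouse))  # 1
    # The analytical query runs on the warehouse, not on the operational source.
    print(warehouse.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
    ).fetchall())  # [('North', 300.0), ('South', 200.0)]
```

Because analytical queries run against the separate warehouse connection, they place no load on the operational source, and the watermark keeps each refresh incremental rather than recopying every row.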
Literature Review
Different researchers from different areas (database management, information system design,
data and information integration) have reached their own conclusions. As Mull (1983)
observes:
"We must be prepared to learn more than we can understand."
There are many sources that could be cited to illustrate the research methods used to study
data warehousing and integration concepts. The work summarized here is based on a review of the
relevant literature and on research performed in Norway and Mozambique.
Traditional information systems are not designed to manage and store strategic information.
They hold the operational data needed for daily transactions. From a decision-making
perspective, this data has no clear value for the decision process of organizations
(Domenico, 2001). Decisions are made based on administrators' experience and sometimes on
historical facts stored in different information systems.
A data warehouse is designed so that data can be stored and accessed without being restricted
to tables and relational rows. Because the data warehouse is separated from operational
databases, users' queries do not affect those systems. The data warehouse is protected from
unauthorized alteration or loss of data. It provides the foundation and resources needed for a
Decision Support System (DSS), supplying historical and integrated data.
This data serves top managers, decision makers, partners, and donors, who need brief,
summarized, and integrated information, as well as low-level managers, for whom detailed data
helps in observing tactical aspects of the organization. In this way, a data warehouse provides a
specialized database that manages information from corporate databases and external data
sources.
Basic Concepts
Data Warehouse
Many definitions of a data warehouse can be found in the literature:
Inmon (1997) states that a data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data that supports the decision-making
process.
Harjinder and Rao (1996) argue that a data warehouse is an ongoing process that
brings together data from heterogeneous systems, including historical and external
data, to meet the need for structured queries, analytical reports, and decision
support.
Kimball et al. (1998) argue that a data warehouse is the source of an organization's data,
formed by the union of all its corresponding data marts.
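Two properties in Inmon's definition, time-variant and non-volatile, can be made concrete with a short sketch. The Python code below assumes a hypothetical customer subject with a credit_limit attribute; the Snapshot and CustomerHistory names are invented for illustration. Records are only ever appended, never updated in place, and each carries its load date, so the state of the subject at any past date can still be queried.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One immutable record: subject key, attribute value, and load date."""
    customer_id: str
    credit_limit: int
    loaded_on: date


class CustomerHistory:
    """Hypothetical append-only store for one subject (customers).

    Non-volatile: there is no update or delete method, and records are frozen.
    Time-variant: every record carries the date on which it was loaded.
    """

    def __init__(self):
        self._rows = []

    def append(self, customer_id, credit_limit, loaded_on):
        self._rows.append(Snapshot(customer_id, credit_limit, loaded_on))

    def as_of(self, customer_id, when):
        """Return the latest value loaded on or before `when`, or None."""
        history = [r for r in self._rows
                   if r.customer_id == customer_id and r.loaded_on <= when]
        if not history:
            return None
        return max(history, key=lambda r: r.loaded_on).credit_limit


if __name__ == "__main__":
    h = CustomerHistory()
    h.append("C1", 1000, date(2023, 1, 31))
    h.append("C1", 1500, date(2023, 6, 30))  # a change is a new row, not an update
    print(h.as_of("C1", date(2023, 3, 1)))    # 1000
    print(h.as_of("C1", date(2023, 12, 31)))  # 1500
```

An operational system would typically overwrite the credit limit in place; keeping both rows is what lets the warehouse answer historical questions.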
Conclusion
In the area of integrating multiple, distributed, heterogeneous information sources, data
warehousing is a viable and in some cases superior alternative to traditional research solutions.
Traditional approaches request, process, and merge information from sources when queries are
posed. In the data warehousing approach, information is requested, processed, and merged
continuously, so the information is readily available for direct querying and analysis at the
warehouse. Although the concept of data warehousing is already prominent in the database
industry, we believe there are a number of important open research problems that need to be
solved to realize the flexible, powerful, and efficient data warehousing systems of
the future.
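The contrast drawn in this conclusion between the traditional approach (merging at query time) and the warehousing approach (merging in advance) can be illustrated with a minimal Python sketch. The Source, LazyMediator, and EagerWarehouse classes are hypothetical; each source simply counts how often it is queried, and the warehouse's continuous propagation of changes is simplified to an explicit refresh() call.

```python
class Source:
    """A hypothetical information source that counts how often it is queried."""

    def __init__(self, name, records):
        self.name = name
        self._records = records
        self.calls = 0

    def fetch(self):
        self.calls += 1
        return list(self._records)


def merge(sources):
    """Request and merge records from all sources into one {key: total} view."""
    merged = {}
    for src in sources:
        for key, value in src.fetch():
            merged[key] = merged.get(key, 0) + value
    return merged


class LazyMediator:
    """Traditional approach: request, process, and merge when a query is posed."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, key):
        return merge(self.sources).get(key, 0)


class EagerWarehouse:
    """Warehousing approach: merge ahead of time and answer queries locally."""

    def __init__(self, sources):
        self.sources = sources
        self.store = {}

    def refresh(self):
        self.store = merge(self.sources)

    def query(self, key):
        return self.store.get(key, 0)


if __name__ == "__main__":
    s1, s2 = Source("crm", [("north", 10), ("south", 5)]), Source("erp", [("north", 7)])
    lazy = LazyMediator([s1, s2])
    print([lazy.query("north") for _ in range(3)], s1.calls)  # [17, 17, 17] 3

    w1, w2 = Source("crm", [("north", 10), ("south", 5)]), Source("erp", [("north", 7)])
    wh = EagerWarehouse([w1, w2])
    wh.refresh()
    print([wh.query("north") for _ in range(3)], w1.calls)  # [17, 17, 17] 1
```

With three queries, the lazy mediator contacts each source three times, whereas the warehouse contacts each source once, at refresh time. The price of the eager approach is that its answers may be stale until the next refresh.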
References
Inmon, W. H. (1992). Building the Data Warehouse. John Wiley & Sons.
Wu, M.-C., and Buchmann, A. P. Research Issues in Data Warehousing. Submitted for
publication.
Vassiliadis, P., and Sellis, T. (1999). A Survey of Logical Models for OLAP Databases.
SIGMOD Record.
Luján-Mora, S., and Trujillo, J. (2003). A Comprehensive Method for Data Warehouse
Design.
Trujillo, J., and Luján-Mora, S. (2004). Physical Modeling of Data Warehouses Using
UML. DOLAP '04, Washington, DC, USA.
Luján-Mora, S., and Trujillo, J. (2006). Physical Modeling of Data Warehouses by Using
UML Component and Deployment Diagrams: Design and Implementation Issues.
Journal of Database Management.
Mishra, D., Yazici, A., and Basaran, B. P. (2008). A Case Study of Data Models in
Data Warehousing.
Ma, H., Yang, Y., and Zhang, F. (2009). The Anti-standardized Design Research of
Data Warehouse.
Alaskar, K., and Shaikh, A. (2009). Object Oriented Data Modeling for Data
Warehousing.