This is the competitive advantage, and also the challenge. All of these are goals of Business Intelligence (BI).
Data from multiple sources (usually OLTP systems or flat files) is extracted and integrated into a common repository called a data warehouse.
To build the data warehouse, we need to integrate data from multiple data sources. These sources may be databases or flat files.
Even though these sources may hold similar kinds of data, there may be considerable differences in the following ways:
Different sources may use different attribute names to represent the same data element. We need to find the semantically equivalent attributes across the data sources so that we can represent all of them as a single attribute in the data warehouse.
Sometimes different data sources may use the same attribute name to represent semantically different data elements. We should resolve this before we load data into the data warehouse.
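The attribute-matching step described above can be sketched as a rename map applied to each incoming record; all of the source names, attribute names, and records below are hypothetical, chosen only for illustration:

```python
# Map (source system, source attribute) -> canonical warehouse attribute.
# Two sources use different names ("cust_nm", "customer") for the same element.
ATTRIBUTE_MAP = {
    ("crm", "cust_nm"): "customer_name",
    ("billing", "customer"): "customer_name",
    ("crm", "dob"): "birth_date",
}

def integrate(source, record):
    """Rename a source record's attributes to the warehouse's canonical names."""
    return {ATTRIBUTE_MAP.get((source, key), key): value
            for key, value in record.items()}

row = integrate("crm", {"cust_nm": "Ada", "dob": "1990-01-01"})
# row == {"customer_name": "Ada", "birth_date": "1990-01-01"}
```

A real integration layer would also reconcile data types, units, and encodings, but the core idea is the same: semantically equivalent attributes are funneled into one warehouse attribute.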
OLTP vs OLAP
OLTP System deals with operational data. Operational data are those data
involved in the operation of a particular system.
Operational Data
Operational data are usually of local relevance
Frequent Updates
Normalized Tables
Point Query
OLAP deals with historical or archival data: data that have been archived over a long period of time.
Example: if we collect the last 10 years' data about flight reservations, the data can give us much meaningful information, such as trends in reservations. This may yield useful insights such as the peak time of travel and what kinds of people travel in the various classes (Economy/Business).
How is the profit changing over the years across different regions?
Is it financially viable to continue the production unit at location X?
During the physical design process, you convert the data gathered during the logical design
phase into a description of the physical database structure. Physical design decisions are mainly
driven by query performance and database maintenance aspects. For example, choosing a
partitioning strategy that meets common query requirements enables Oracle Database to take
advantage of partition pruning, a way of narrowing a search before performing it.
Physical Design
During the logical design phase, you defined a model for your data warehouse consisting of
entities, attributes, and relationships. The entities are linked together using relationships.
Attributes are used to describe the entities. The unique identifier (UID) distinguishes between
one instance of an entity and another.
Figure 3-1 illustrates a graphical way of distinguishing between logical and physical designs.
During the physical design process, you translate the expected schemas into actual database
structures. At this time, you have to map:
Entities to tables
Relationships to foreign key constraints
Attributes to columns
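This entity-to-table mapping can be sketched with Python's built-in sqlite3 module; the region/sales schema below is a hypothetical example, not one from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# Entity -> table, attributes -> columns, unique identifier (UID) -> primary key
conn.execute("""CREATE TABLE region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL)""")

# Relationship -> foreign key constraint linking the two tables
conn.execute("""CREATE TABLE sales (
    sale_id   INTEGER PRIMARY KEY,
    amount    REAL,
    region_id INTEGER REFERENCES region(region_id))""")
```

Each logical construct lands in exactly one physical structure: the entity becomes a table, each attribute a column, and the relationship a foreign key the database can enforce.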
Once you have converted your logical design to a physical one, you will need to create some or
all of the following structures:
Tablespaces
Tables and Partitioned Tables
Views
Integrity Constraints
Dimensions
Some of these structures require disk space. Others exist only in the data dictionary. Additionally,
the following structures may be created for performance improvement:
Indexes and Partitioned Indexes
Materialized Views
Tablespaces
A tablespace consists of one or more datafiles, which are physical structures within the operating
system you are using. A datafile is associated with only one tablespace. From a design
perspective, tablespaces are containers for physical design structures.
Tablespaces should separate objects whose characteristics differ. For example, tables should be separated from their indexes, and small tables should be separated from large tables. Tablespaces should also represent logical business units if possible. Because a tablespace is the coarsest granularity for backup and recovery or the transportable tablespaces mechanism, the logical business design affects availability and maintenance operations.
You can now use ultralarge data files, a significant improvement in very large databases.
Tables and Partitioned Tables
Tables are the basic unit of data storage. They are the container for the expected amount of raw data in your data warehouse.
Using partitioned tables instead of nonpartitioned ones addresses the key problem of supporting
very large data volumes by allowing you to decompose them into smaller and more manageable
pieces. The main design criterion for partitioning is manageability, though you will also see
performance benefits in most cases because of partition pruning or intelligent parallel processing.
For example, you might choose a partitioning strategy based on a sales transaction date and a
monthly granularity. If you have four years' worth of data, you can delete a month's data as it
becomes older than four years with a single, fast DDL statement and load new data while only
affecting 1/48th of the complete table. Business questions regarding the last quarter will only
affect three months, which is equivalent to three partitions, or 3/48ths of the total volume.
Partitioning large tables improves performance because each partitioned piece is more
manageable. Typically, you partition based on transaction dates in a data warehouse. For
example, each month, one month's worth of data can be assigned its own partition.
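The monthly partitioning scheme above can be sketched in plain Python as a toy model (it stands in for actual partition DDL, which varies by database):

```python
from collections import defaultdict
from datetime import date

# Partition key (year, month) -> the rows stored in that partition.
partitions = defaultdict(list)

def insert(row):
    d = row["txn_date"]
    partitions[(d.year, d.month)].append(row)

insert({"txn_date": date(2021, 3, 5), "amount": 10.0})
insert({"txn_date": date(2021, 3, 9), "amount": 4.0})
insert({"txn_date": date(2017, 1, 2), "amount": 7.0})

# Aging out a month touches only its partition, like a single fast DDL drop:
del partitions[(2017, 1)]

# Partition pruning: a query about March 2021 scans one partition, not the table.
march_total = sum(r["amount"] for r in partitions[(2021, 3)])  # 14.0
```

The point of the sketch is the access pattern: both the delete and the query are confined to the partitions whose keys match, which is exactly what makes partitioned tables manageable at scale.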
Table Compression
You can save disk space by compressing heap-organized tables. Partitioned tables are typical candidates for table compression.
To reduce disk use and memory use (specifically, the buffer cache), you can store tables and
partitioned tables in a compressed format inside the database. This often leads to a better scaleup
for read-only operations. Table compression can also speed up query execution. There is,
however, a cost in CPU overhead.
Table compression should be used with highly redundant data, such as tables with many foreign
keys. You should avoid compressing tables with much update or other DML activity. Although
compressed tables or partitions are updatable, there is some overhead in updating these tables,
and high update activity may work against compression by causing some space to be wasted.
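Why redundant data compresses so well can be demonstrated with a general-purpose compressor from the standard library; this is an analogy for the database's block-level compression, not the same algorithm:

```python
import zlib

# Simulate a highly redundant column: the same foreign-key-like values
# repeated over many rows (values are made up for the demonstration).
redundant = ("DE,ELECTRONICS,WAREHOUSE_7\n" * 10_000).encode()

compressed = zlib.compress(redundant)
# Repeated values shrink dramatically; unique, volatile data would not.
```

The same intuition explains the DML caveat in the text: updates scatter new, non-repeating values into compressed storage, eroding exactly the redundancy that made compression pay off.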
Views
A view is a tailored presentation of the data contained in one or more tables or other views. A
view takes the output of a query and treats it as a table. Views do not require any space in the
database.
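The "query treated as a table" idea can be shown with sqlite3; the sales schema is a hypothetical example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100.0), ("EU", 50.0), ("US", 70.0)])

# A view stores only the query text, not result rows, so it needs no data space.
conn.execute("""CREATE VIEW region_sales AS
                SELECT region, SUM(amount) AS total
                FROM sales GROUP BY region""")

rows = conn.execute("SELECT * FROM region_sales ORDER BY region").fetchall()
# rows == [('EU', 150.0), ('US', 70.0)]
```

Because the view is re-evaluated on every access, it always reflects the current contents of `sales`; this is the key contrast with the materialized views discussed later.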
Integrity Constraints
Integrity constraints are used to enforce business rules associated with your database and to
prevent invalid information in the tables. Integrity constraints in data warehousing differ
from constraints in OLTP environments. In OLTP environments, they primarily prevent the
insertion of invalid data into a record; this is less of a problem in data warehousing
environments because accuracy has typically been ensured before loading. In data warehousing environments,
constraints are mostly used for query rewrite. NOT NULL constraints are particularly common in data
warehouses. Under some specific circumstances, constraints need space in the database. These
constraints are in the form of the underlying unique index.
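A small sqlite3 sketch (hypothetical product table) shows both points: the constraint rejecting invalid data, and a unique constraint being backed by an index that occupies space:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    sku        TEXT NOT NULL UNIQUE)""")  # UNIQUE is enforced via an index

conn.execute("INSERT INTO product VALUES (1, 'A-100')")
try:
    conn.execute("INSERT INTO product VALUES (2, NULL)")  # NOT NULL rejects this
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In a warehouse, the load pipeline would normally catch such rows before they reach the table, which is why the text says constraints there serve query rewrite more than data policing.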
See Also:
Chapter 7, "Integrity Constraints"
Indexes and Partitioned Indexes
Indexes are optional structures associated with tables or clusters. In addition to the classical B-tree indexes, bitmap indexes are very common in data warehousing environments. Bitmap indexes are optimized index structures for set-oriented operations. Additionally, they are necessary for some optimized data access methods such as star transformations.
Indexes are just like tables in that you can partition them, although the partitioning strategy is not
dependent upon the table structure. Partitioning indexes makes it easier to manage the data
warehouse during refresh and improves query performance.
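A toy model of the bitmap-index idea, using Python integers as bit vectors (real bitmap indexes add compression and many other refinements):

```python
# One bit vector per distinct value of a low-cardinality column,
# here the travel class from the flight-reservation example.
rows = ["Economy", "Business", "Economy", "Economy", "Business"]

bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)  # set bit i

# Set-oriented predicates become cheap bitwise logic instead of row scans:
economy = bitmaps["Economy"]                      # bits 0, 2, 3 set
either = bitmaps["Economy"] | bitmaps["Business"] # OR combines predicates

matching_rows = [i for i in range(len(rows)) if (economy >> i) & 1]
# matching_rows == [0, 2, 3]
```

This is why bitmap indexes suit warehouse queries: combining several WHERE conditions reduces to ANDing and ORing bit vectors, which is fast regardless of how many rows qualify.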
See Also:
Chapter 6, " Indexes" and Chapter 15, " Maintaining the Data
Warehouse"
Materialized Views
Materialized views are query results that have been stored in advance so long-running
calculations are not necessary when you actually execute your SQL statements. From a physical
design point of view, materialized views resemble tables or partitioned tables and behave like
indexes in that they are used transparently and improve performance.
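SQLite has no materialized views, but the precompute-and-store idea can be emulated with an ordinary table built from the query (hypothetical sales schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100.0), ("US", 70.0), ("EU", 50.0)])

# Emulated materialized view: the aggregation runs once and its rows are stored.
conn.execute("""CREATE TABLE mv_region_sales AS
                SELECT region, SUM(amount) AS total
                FROM sales GROUP BY region""")

# Later queries read the stored result instead of re-aggregating the base table.
total = conn.execute(
    "SELECT total FROM mv_region_sales WHERE region = 'EU'").fetchone()
# total == (150.0,)
```

Unlike a real materialized view, this emulation is not used transparently by the optimizer and must be refreshed by hand when `sales` changes, which is precisely the machinery databases with materialized views provide for you.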
Dimensions
Granularity is the extent to which a system is broken down into small parts, either the system
itself or its description or observation. It is the extent to which a larger entity is subdivided. For
example, a yard broken into inches has finer granularity than a yard broken into feet.
The terms granularity, coarse, and fine are relative, used when comparing systems or
descriptions of systems. An example of increasingly fine granularity: a list of nations in the
United Nations, a list of all states/provinces in those nations, a list of all counties in those states,
etc.
Decision Support System (DSS)
A decision support system (DSS) is a computer program application that analyzes business data and presents it so that users can make business decisions more easily. It is an "informational application" (to distinguish it from an "operational application" that collects the data in the course of normal business operation).
A decision support system may present information graphically and may include an expert
system or artificial intelligence (AI). It may be aimed at business executives or some other group
of knowledge workers.
Figure 1.13. The system development life cycle for the data warehouse environment is almost exactly the opposite of the classical SDLC.
DATA MODELING TECHNIQUES FOR A DATA WAREHOUSE
Chapter 6. Data Modeling for a Data Warehouse
6.1 Why Data Modeling Is Important
    Visualization of the business world
    The essence of the data warehouse architecture
    Different approaches of data modeling
6.2 Data Modeling Techniques
6.3 ER Modeling
6.3.1 Basic Concepts
6.3.1.1 Entity
6.3.1.2 Relationship
6.3.1.3 Attributes
6.3.1.4 Other Concepts
6.3.2 Advanced Topics in ER Modeling
6.3.2.1 Supertype and Subtype
6.3.2.2 Constraints
6.3.2.3 Derived Attributes and Derivation Functions
6.4 Dimensional Modeling
6.4.1 Basic Concepts
6.4.1.1 Fact
6.4.1.2 Dimension
    Dimension Members
    Dimension Hierarchies
6.4.1.3 Measure
6.4.2 Visualization of a Dimensional Model
6.4.3 Basic Operations for OLAP
6.4.3.1 Drill Down and Roll Up
6.4.3.2 Slice and Dice
6.4.4 Star and Snowflake Models
6.4.4.1 Star Model
6.4.4.2 Snowflake Model
6.4.5 Data Consolidation
6.5 ER Modeling and Dimensional Modeling
In general, we can say that a DSS is a computerized system for helping make decisions. A
decision is a choice between alternatives based on estimates of the values of those alternatives.
Supporting a decision means helping people working alone or in a group gather intelligence,
generate alternatives and make choices. Supporting the choice making process involves
supporting the estimation, the evaluation and/or the comparison of alternatives. In practice,
references to DSS are usually references to computer applications that perform such a supporting role.
Online Transaction Processing: OLTP also refers to computer processing in which the computer responds immediately to user requests. An automatic teller machine (ATM) at a bank is an example of transaction processing.
A DSS (decision support system) helps top executives make decisions. It is generally based on historical data. An OLTP (online transaction processing) system is the system where day-to-day transactions are handled. It is based on current data.
OLAP:
1) Stores the historical data of an organization.
2) Data is used for BI / business strategic decisions.
OLTP:
1) Real-time transactional data.
2) Data is used to keep track of transaction details.
Horizon
OLTP databases store live operational information. An invoice, for example, once paid, is possibly moved to some sort of backup store, perhaps upon period closing. On the other side, 5-10 year horizons are usual for strategic analysis to identify trends. Extending the life of operational data would not be enough (besides possibly impacting performance).
Even keeping that data indexed and online for years, you would surely face compatibility problems. It is quite improbable that your current invoice fields and references are the same as those of 10 years ago!
But neither performance nor compatibility is the biggest concern under a large horizon. The real problem is business dynamics. Today business constantly changes, and the traditional entity-relationship approach is too vulnerable to change. I will explore this point further in the next post with a practical example.
Refresh
OLTP requires instant updates. When you withdraw money from an ATM, your balance must be updated immediately. OLAP has no such requirement: nobody needs instant information to make a strategic business decision.
This allows OLAP data to be refreshed daily, which leaves extra time and resources for cleansing and accruing data. If, for example, an invoice was canceled, we wouldn't like to see its value first inflating sales figures and later being reverted.
More time and more resources would also allow better indexing to address huge tables covering
the extended horizon.
This is possibly the most evident difference between the two approaches. OLTP perfectly fits traditional entity-relationship or object-oriented models. We usually refer to information as attributes related to entities, objects, or classes, such as product price, invoice amount, or client name. The mapping can be a simple, one-argument function of the entity.
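The one-argument mapping can be sketched in a few lines of Python; the `Product` class and `price` attribute are illustrative stand-ins for any entity and attribute:

```python
# In an entity-relationship or object model, an attribute behaves as a
# one-argument function of its entity: value = attribute(entity).
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price

price = lambda product: product.price  # price: Product -> float

p = Product("widget", 9.99)
value = price(p)  # 9.99
```

Dimensional models break this simple picture: a measure such as sales amount is a function of several arguments at once (product, region, time), which is what the fact/dimension structure captures.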
A dimensional database needs to be designed to support queries that retrieve a large number of records and that summarize data in different ways. A dimensional database tends to be subject oriented and aims to answer questions such as: What products are selling well? At what time of year do certain products sell best? In what regions are sales weakest?
If you attempt to use a database that is designed for OLTP as your data warehouse, query
performance will be very slow and it will be difficult to perform analysis on the data.
The following table summarizes the key differences between OLTP and OLAP
databases:
Many of the problems that businesses attempt to solve are multidimensional in nature. For
example, SQL queries that create summaries of product sales by region, region sales by product,
and so on, might require hours of processing on an OLTP database. However, a dimensional
database could process the same queries in a fraction of the time.
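The "summaries by region, by product, and so on" pattern can be sketched in plain Python; the sales rows below are made-up sample data:

```python
from collections import defaultdict

sales = [
    {"region": "EU", "product": "A", "amount": 10.0},
    {"region": "EU", "product": "B", "amount": 5.0},
    {"region": "US", "product": "A", "amount": 7.0},
]

def summarize(dimension):
    """Total the sales measure along one dimension of the data."""
    totals = defaultdict(float)
    for row in sales:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

by_region = summarize("region")    # {'EU': 15.0, 'US': 7.0}
by_product = summarize("product")  # {'A': 17.0, 'B': 5.0}
```

A dimensional schema makes every such rollup a grouping along a dimension table; an OLTP schema would instead force joins across many normalized tables for each summary.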
Besides the characteristic schema design differences between OLTP and OLAP databases, the
query optimizer typically should be tuned differently for these two types of tasks. For example,
in OLTP operations, the OPTCOMPIND setting (as specified by the environment variable or by
the configuration parameter of that name) should typically be set to zero, to favor nested-loop
joins. OLAP operations, in contrast, tend to be more efficient with an OPTCOMPIND setting of
2 to favor hash-join query plans. For more information, see the OPTCOMPIND environment
variable and the OPTCOMPIND configuration parameter. See the IBM Informix Performance
Guide for additional information about OPTCOMPIND, join methods, and the query optimizer.
IBM Informix also supports the SET ENVIRONMENT OPTCOMPIND statement to change
OPTCOMPIND setting dynamically during sessions in which both OLTP and OLAP operations
are required. See the IBM Informix Guide to SQL: Syntax for more information about the SET
ENVIRONMENT statement of SQL.
Informix is designed to help businesses better leverage their existing information assets as they
move into an on-demand business environment. In this type of environment, mission-critical
database management applications typically require combination systems. The applications need
both online transaction processing (OLTP), and batch and decision support systems (DSS),
including online analytical processing (OLAP).
Archive
An archive is a collection of computer files that have been packaged together for
backup, to transport to some other location, for saving away from the computer so
that more hard disk storage can be made available, or for some other purpose. An
archive can include a simple list of files or files organized under a directory or
catalog structure (depending on how a particular program supports archiving).
Archiving a conversation will hide it from your messages view, while deleting a
conversation from Messages permanently removes the entire conversation and its
history.
To archive a conversation, simply click the "x" next to the conversation. The conversation's history will be preserved, and you will still be able to find it later. If the same person sends you a new message later, the archived conversation will reappear, and the new message will be added to it.
If you click "Delete All" at the bottom of the page, the full conversation history will
be permanently cleared from your messages. You can also check the boxes next to
individual messages and click "Delete Selected" to permanently delete parts of the
conversation.
A program file may require a data file to work; a data file holds information that a program file may use. For example, a program file might be a shortcut that you click on to run a program such as Notepad, while a data file is a *.doc or *.txt file, which contains data.
A program file may only contain binary operation codes, addresses and embedded
data as permitted by the designers of the computer processor that is going to be
executing the program. A data file can be in any format as determined by the
programmers.
Well, a program file contains code that can be translated by a compiler and run as a
program on a computer.
A program file refers to sets of instructions that are put together to build up the
application you will be using. Several files are linked between each other to see the
application that pops in your screen.
A data file is basically the application's input, as it provides what is needed for your program to work and to convert single pieces of data into understandable information.
Answer: Files with ".log" and ".txt" extensions are both plain text files. This means they can
both be viewed with a standard text editor like Notepad for Windows or TextEdit for Mac OS X.
The difference between the two file types is that .LOG files are typically generated
automatically, while .TXT files are created by the user. For example, when a software installer is
run, it may create a log file that contains a log of files that were installed. Log files typically have
one entry per line, which includes information such as the filename, the action (created, moved,
deleted, etc.), and the location of the file.
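Parsing such a one-entry-per-line log is straightforward; the comma-separated layout and the sample line below are hypothetical, since real installers each use their own format:

```python
# Hypothetical installer-log entry: filename, action, location on one line.
line = "setup.dll,created,C:\\Program Files\\App"

# Split on the first two commas only, so commas in the path would survive.
filename, action, location = line.split(",", 2)
# filename == 'setup.dll', action == 'created'
```

Splitting with a maximum count is the detail that matters here: the trailing field (the location) may itself contain the delimiter, so it must be kept whole.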
Abstract (summary)
The terms précis or synopsis are used in some publications to refer to the same thing that other publications might call an "abstract". In management reports, an executive summary usually contains more information (and often more sensitive information) than the abstract does.