Data Mining

ABSTRACT
Data warehousing supports business analysis and decision making by creating an enterprise-wide, integrated database of summarized, historical information. It integrates data from multiple incompatible sources. By transforming data into meaningful information, a data warehouse allows the business manager to perform more substantive, accurate, and consistent analysis. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases and answer queries against them effectively. A data warehouse is, of course, a database, but one that contains summarized information. Integrating data mining with a data warehouse yields benefits such as a better querying process, performance sharing, and more reliable information. The following sections present the concepts of data warehousing and data mining.
Contents
1) Introduction, Features, Decision Support Systems
2) Data warehouse schemas
3) Microsoft Data Warehousing Framework
4) Data mining working procedure
5) Data warehouse with data mining
6) An approach to Client/Server data warehousing
7) Applications
8) Conclusion
INTRODUCTION:
Modern organizations are under enormous pressure from recent developments in technology. They clearly need rapid access to all kinds of information. To support this, they need to consider the past and identify relevant trends, and any trend analysis requires a database. Most organizations run very large databases for normal daily transactions. These are known as operational databases; in most cases they have not been designed to store historical data or to respond to queries, but simply to support the applications that handle day-to-day transactions. The second type of database found in organizations is the data warehouse. It is designed for strategic decision support and is largely built up from the databases that make up the operational database. The basic characteristic of a data warehouse is that it contains vast amounts of data, which can mean billions of records. Smaller, local data warehouses are called data marts. Because a data warehouse is designed especially for decision support queries, only the data needed for decision support is extracted from the operational data and stored in the data warehouse, along with the time at which it was retrieved from the operational databases.
Need for a Data Warehouse:

To summarize large volumes of data.
To integrate data from different sources.
To give decision makers access to past data.
To enable people to make informed decisions.
FEATURES:
1. Time dependent: The warehouse contains information collected over time, which implies that there must always be a connection between the information in the warehouse and the time when it was entered.
2. Non-volatile (permanent): Data in the data warehouse is never updated but is used only for queries. End users who want to update data must use the operational database. This means that the data warehouse will always be filled with historical data.
3. Subject oriented: The data warehouse is built around all the existing applications of the operational data. It is designed specifically for decision support, while the operational databases contain information for day-to-day use.
4. Integrated: In a data warehouse it is essential to integrate the information drawn from the operational sources and make it consistent; only one name must exist to describe each individual entity.
DECISION SUPPORT SYSTEM:

When designing a decision support system, particular importance should be placed on the requirements of the end users and on the hardware (h/w) and software (s/w) products that will be required.
The requirements of the end users: Some end users need specific query tools so that they can build their queries themselves. Others are interested only in a particular part of the information; a specific type of application can be built around that part to speed up the query process.
H/w and S/w products of a decision support system:

Working in a client/server environment allows great flexibility in choosing the appropriate s/w for end users, because each individual need can be catered for on a local workstation. The h/w requirements depend on the type of data warehouse and the techniques with which you want to work. There are two basic types of data warehouse:
1. Enterprise data warehouses: The enterprise data warehouse contains corporate-wide information integrated from multiple operational data sources for consolidated data analysis. Typically it is composed of several subject areas, such as customers, products, and sales, and is used for both tactical and strategic decision making.
2. Data marts: Data marts contain a subset of corporate-wide data that is built for use by an individual department or division of an organization. Unlike the enterprise data warehouse, data marts are often built from the bottom up by departmental resources for a specific support application or group of users. Data marts contain summarized, and often detailed, data about a subject area.
DATA WAREHOUSE SCHEMAS:
A multidimensional data model identifies the dimensions, their hierarchies, the measure functions, and so on, for the design of a data cube. The data cube is realized in the design phase, where various schemas are employed.
1. Star schema: It is a modeling paradigm in which the data warehouse contains a single large fact table and a set of smaller dimension tables, one for each dimension.
[Figure: Star schema. A central fact table with columns Dim1-key, Dim2-key, Dim3-key, and Summary, linked to three dimension tables: Dim1table (Dim1Attrib), Dim2table (Dim2Attrib), and Dim3table (Dim3Attrib).]
Fact table:
It contains detailed summary data. Each tuple consists of a foreign key to each dimension table and corresponds to only one tuple in each dimension table.
Dimension table:
It consists of columns that correspond to the attributes of the dimension. One tuple in a dimension table may correspond to more than one tuple in the fact table.
A 1:N relationship exists between each dimension table and the fact table. The star schema is easy to understand, and hierarchies are easy to define. It reduces the number of physical joins and is easy to maintain.
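As a minimal sketch of the star schema described above, the following Python example builds one fact table and three dimension tables in an in-memory SQLite database and answers a decision support query with a single join. The table names, column names, and sample rows (dim_product, fact_sales, and so on) are assumptions made for illustration; they do not come from any particular warehouse product.

```python
import sqlite3

def build_star_schema(conn):
    """Create a small star schema: one fact table, three dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- Each fact row holds one foreign key per dimension plus a summary measure.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            store_key   INTEGER REFERENCES dim_store(store_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            amount      REAL
        );
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Pen", "Stationery"), (2, "Lamp", "Household")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                     [(1, "Delhi"), (2, "Pune")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, 1), (2, 2023, 2)])
    # One dimension row (product 1) matches several fact rows: the 1:N relationship.
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 10.0), (1, 2, 1, 5.0), (2, 1, 2, 40.0)])

def sales_by_category(conn):
    """Total sales per product category, using one join to one dimension table."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(sales_by_category(connection))
```

Because the category attribute is stored directly in the dimension table, the query needs only one join between the fact table and that dimension.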
2. Snowflake schema:
It consists of a single fact table and multiple dimension tables. The difference between the star schema and the snowflake schema is that in the star schema the dimension tables are denormalized, whereas in the snowflake schema they are normalized.
[Figure: Snowflake schema. A central fact table linked to the Dimension1, Dimension2, and Dimension3 tables.]
Because its dimension tables are normalized, the snowflake schema is easier to maintain and saves storage space.
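The normalization step that distinguishes a snowflake schema from a star schema can be sketched in a few lines of Python. The example below takes a denormalized product dimension, in which the category name is repeated on every row, and splits it into a product table and a separate category table. The function name and the sample rows are hypothetical and serve only to illustrate the idea.

```python
def snowflake_dimension(rows):
    """Split a denormalized (product_key, name, category) dimension into
    a normalized product table and a category table."""
    category_keys = {}   # category name -> surrogate key
    products = []
    for product_key, name, category in rows:
        # Assign a new key the first time a category is seen, reuse it afterwards.
        category_key = category_keys.setdefault(category, len(category_keys) + 1)
        products.append((product_key, name, category_key))
    categories = [(key, name) for name, key in category_keys.items()]
    return products, categories

if __name__ == "__main__":
    denormalized = [
        (1, "Pen", "Stationery"),
        (2, "Pencil", "Stationery"),
        (3, "Lamp", "Household"),
    ]
    products, categories = snowflake_dimension(denormalized)
    print(products)
    print(categories)
```

Each category name is now stored once, which is where the storage saving comes from; the cost is an extra join whenever a query needs the category name.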
Microsoft Data Warehousing Framework:

The goal of the data warehousing framework is to simplify the design, implementation, and management of data warehousing solutions. The framework describes the relationships between the various components used in the process of building, using, and managing a data warehouse.
[Figure: Microsoft Data Warehousing Framework. Components shown: Data Warehouse/Data Mart Design; Information Directory; Operational Sources; Data Transform/Cleaning; Data Marts or Data Warehouse; End-User Tools; Data Warehouse Management; Repository (persistent shared metadata). Other labels: Building, Using, Schema, Transform, Schedule, Repl, Info Publish, OLAP.]
The core of the Microsoft framework is a set of enabling technologies comprising the data transport layer and the integrated data repository. Operational data must pass through a cleaning and transformation stage before being placed into the data marts or data warehouse, in order to conform to the decisions laid out during the design stage. End-user tools, including desktop productivity products, specialized analysis products, and custom programs, are used to gain access to the information in the data warehouse. Ideally, user access is through a directory facility that enables the user to search for appropriate and relevant data to resolve business questions, and that provides a layer of security between the users and the backend systems. Finally, a variety of tools come into play for the management of the data warehouse environment, such as scheduling repeated tasks and managing multiserver networks.
Microsoft Repository provides the integration point for the metadata shared by the various tools used in the data warehousing process. Shared metadata allows for the transparent integration of multiple tools from a variety of vendors, without the need for specialized interfaces between each of the products.
Data Mining
Data mining, or knowledge discovery in databases, is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Data mining is the search for relationships and global patterns that exist in large databases but are hidden among vast amounts of data.
WORKING PROCEDURE:
Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Four types of relationships are generally sought:

Classes: Stored data is used to locate data in predetermined groups.
Clusters: Data items are grouped according to logical relationships or consumer preferences.
Associations: Data can be mined to identify associations.
Sequential patterns: Data is mined to anticipate behavior patterns and trends.
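To make the idea of an association concrete, the sketch below computes two commonly used measures for a candidate rule "customers who buy X also buy Y": support (the fraction of transactions containing all the items) and confidence (the fraction of transactions containing X that also contain Y). These measures are a standard way to score associations but are not named in the text above, and the shopping baskets are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    matching = sum(1 for basket in transactions if wanted <= set(basket))
    return matching / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also
    contain `consequent`. Returns 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

if __name__ == "__main__":
    # Hypothetical market baskets.
    baskets = [
        {"bread", "butter"},
        {"bread", "butter", "milk"},
        {"bread"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "butter"}))
    print(confidence(baskets, {"bread"}, {"butter"}))
```

In this sample, bread and butter appear together in half of the baskets, and two of the three baskets that contain bread also contain butter.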
Major Steps:
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data with application software.
5. Present the data in a useful format, such as a graph or table.
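The steps above can be sketched end to end in plain Python. This is only an illustration under simplifying assumptions: the "warehouse" is an in-memory list rather than a multidimensional database, and the field names (region, amount, loaded_on) and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date

def extract(operational_rows):
    """Extract: keep only the rows that carry a usable amount."""
    return [row for row in operational_rows if row.get("amount") is not None]

def transform(rows):
    """Transform: make region names consistent and amounts numeric."""
    return [{"region": r["region"].strip().title(), "amount": float(r["amount"])}
            for r in rows]

def load(warehouse, rows, load_date):
    """Load: append rows to the warehouse, stamped with the load date."""
    warehouse.extend({**row, "loaded_on": load_date} for row in rows)

def analyze(warehouse):
    """Analyze: total amount per region."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def present(totals):
    """Present: a plain-text table, one region per line."""
    return "\n".join(f"{region:<10}{total:>10.2f}"
                     for region, total in sorted(totals.items()))

if __name__ == "__main__":
    operational = [
        {"region": " north ", "amount": "10.5"},
        {"region": "NORTH", "amount": 4.5},
        {"region": "south", "amount": None},   # dropped during extraction
        {"region": "South", "amount": "2"},
    ]
    warehouse = []
    load(warehouse, transform(extract(operational)), date(2024, 1, 1))
    print(present(analyze(warehouse)))
```

Stamping each loaded row with a date reflects the time-dependent nature of warehouse data, and appending rather than updating reflects its non-volatile nature.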
Techniques in Data Mining:

1. Artificial Neural Networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
2. Decision Trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
3. Genetic Algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
4. Rule Induction: The extraction of useful if-then rules from data based on statistical significance.
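As a small illustration of how a decision tree yields a classification rule, the sketch below finds the best single split on one numeric attribute, which amounts to a one-level tree (often called a decision stump). It is a deliberate simplification and not a full tree-building algorithm: it considers one attribute, one rule direction ("value above threshold means positive"), and scores splits by the number of misclassified records. The attribute and the sample data are hypothetical.

```python
def best_split(values, labels):
    """Return (threshold, errors) for the rule
    'if value > threshold then True else False'
    that misclassifies the fewest records."""
    best = None
    for threshold in sorted(set(values)):
        errors = sum((value > threshold) != label
                     for value, label in zip(values, labels))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

if __name__ == "__main__":
    # Hypothetical records: customer income (in thousands) and whether they bought.
    incomes = [20, 25, 40, 60, 80]
    bought = [False, False, True, True, True]
    threshold, errors = best_split(incomes, bought)
    print(f"if income > {threshold} then buys (misclassified: {errors})")
```

A full decision tree would apply this kind of split recursively to each resulting subset and over many attributes, producing a set of nested if-then rules.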
Data warehouse with data mining:

Data mining: As is well known, in mining, enormous quantities of debris have to be removed before diamonds or gold can be found. The analogy that, with a computer, you can automatically find the one 'information diamond' among the tons of data debris in your database is of course very attractive. Integrating data mining into a decision support system is very helpful. The sole function of a data warehouse is to supply the information needed to make adequate decisions. In some cases you can use standard SQL tools for decision support, but if you want to compare millions of records and do not know exactly what type of information you require, or if you want to find hidden data, then you have to turn to data mining. In many cases you will find that you need a separate computer for data mining; trying to mine operational data is almost impossible, because there are different applications with different types of attributes and different data types, but no historical data. With a data warehouse this problem does not exist: all the information has been transferred from the operational database to the data warehouse; furthermore, in many cases you can clean the data before commencing data mining.
[Figure: The relationship between operational data, a data warehouse, and data marts. Labels: operational data; data warehouse (extracts from several databases); data marts.]
Client/Server and data warehousing:

Over the past few years it has proved very difficult to build effective decision support systems, because the techniques available were not able to support the end user satisfactorily. End users would ideally like to have available all kinds of techniques, such as GUIs, statistical techniques, windowing mechanisms, and visualization techniques, so that they can easily access the data being sought. This means that a great deal of local computing power is needed at each workstation, and the client/server technique is the solution to this problem. Client/server involves dispersing the s/w over several computers and creating an environment for the end users in which it appears that each is working on just one system. The heavy load of the GUI or other visual techniques can be processed on the local machines, and all the database tasks can be handled by a dedicated database server. In this way the database server can be completely optimized for the database. In some cases you can buy special databases that operate with a specific type of h/w. With client/server you only have to change the piece of s/w that is related to the end user; the other applications do not require alteration. Of all the techniques currently available on the market, client/server represents the best choice for building a data warehouse.
APPLICATIONS:
Data warehousing:
a. Sales and marketing analysis across many industries.
b. Inventory turn and product tracking in manufacturing.
c. Profitable lane or driver risk analysis in transportation.
d. Claims analysis or fraud detection in insurance.
Data Mining:
Retail/Marketing: Identifying buying patterns from customers.
Banking: Detecting patterns of fraudulent credit card use.
Healthcare:
1. Identifying the behavior of risky customers.
2. Identifying successful medical therapies for different illnesses.
Conclusion: Getting the right information to the right people at the right time is the key to making the right decisions. The data warehouse provides the path that makes this possible, by supplying the data on which data mining operates.
