DOCUMENTATION

ON

( K.D.D. )

Department:
Database and Database Management Systems

Developed By:

ACHARYA DUSHYANT                PATEL SHAILESH
B.E. Computer Engineering,      B.E. Computer Engineering,
Semester V,                     Semester V,
00CE54                          00CE43

Shree U.V. Patel College of Engineering,
Kherva, Mehsana.

DATA MINING – The Knowledge Discovery in Database

PREFACE

“You have no choice but to operate in a world shaped by globalization and the information
revolution. There are two options: adapt or die.”
- Andy Grove, Chairman, Intel

The last few years have seen a growing recognition of information as a key business tool.
Those who successfully gather, analyze, understand and act upon information are among the
winners in this new “information age”.
In any business, gathering information is not sufficient; it must also be stored for
future use. Database Management Systems are great tools for defining and storing information
as a database, and for analyzing it and answering questions or queries. Still, some of the
questions we ask, and some of the data we want, are not directly accessible from the database.
Simply looking at the database does not let you make certain decisions, and it may not let you
predict the market or the behavior of customers.
This is where Data Mining becomes important to the user. Data Mining is not a magic wand,
but it can find the “hidden” information in your database, and it can predict the market and
the customers up to a certain level of accuracy. Data Mining can also draw results from
multiple databases, which may reside in different DBMSs or belong to different companies.
We have tried to give a detailed introduction to Data Mining and its main features, and
also to describe the development of a Data Mining or K.D.D. application.
We hope that after reading this report one can better understand Data Mining and can
develop an application, that is, Data Mining software, which provides facilities for mining a
database in the real sense. In the real sense, such software should have Artificial
Intelligence to detect models or methods for Data Mining.

ACKNOWLEDGEMENT

First of all, we want to thank CONVERGENCE for giving us the opportunity to show our
ability in front of the students and the experts at their seminar.

We are very thankful to our college, Shree U.V. Patel College of Engineering, for
providing as much help as possible, and especially to our Computer Department, which gave us
a very friendly atmosphere in which to carry out this research and supported us in every way.

We are thankful to Mr. H. R. Chaudhari and Mr. P. B. Thakkar for guiding us in selecting
the topic and in gathering the basic information on it.

The people connected with us who helped in this research, directly or indirectly, are
also important to us; we thank them from our hearts.

From,
Acharya Dushyant
Patel Shailesh

INDEX

CONTENTS

Abstract

What is Data Mining?

Learning from Past Mistakes

Data Mining: What it can’t do

Does Data Mining replace skilled professionals?

Data Warehousing

Data Mining and Data Warehousing

Data Mining and OLAP

Data Mining Applications

Successful Data Mining

The Ten Steps of Data Mining

Conclusion

Bibliography

ABSTRACT

Data Mining gains its name, and to some degree its popularity, by playing off the idea
that the data you have stored is much like a “mountain”, and that buried within the mountain
(just as buried within your data) are certain “gems” of great value. The problem is that there
are also lots of non-valuable rocks and rubble in the mountain that need to be mined through
and discarded in order to get to that which is valuable. The trick is that, both for mountains
of rock and for mountains of data, you need some power tools to unearth the value. For rock,
this means earthmovers and dynamite; for data, this means powerful computers and Data Mining
software.

Data Mining is a process by which organizations uncover patterns hidden in their data that
can be used to predict the behavior of customers, products and processes.

The database here can be global, or there may be several databases on different DBMSs; the
Data Mining process can extract data from all of them and give you the results you want. It
yields information that is present in the database but may not be directly visible.

Data Mining can produce results, combinations or specific characteristics of customers,
products or processes that are useful for further work. It can be said that there is some
Artificial Intelligence in Data Mining, which is why it is also called K.D.D. (Knowledge
Discovery in Database).

Data Mining is the tool that gives your data intelligence for a particular model or task.
Building Data Mining software is straightforward if you follow the proper steps.
Data Mining is the tool that makes your database INTELLIGENT.

What is Data Mining? ::

Databases today can range in size into the terabytes — more than
1,000,000,000,000 bytes of data. Within these masses of data lies hidden information of
strategic importance. But when there are so many trees, how do you draw meaningful
conclusions about the forest?
The newest answer is Data Mining, which is being used both to
increase revenues and to reduce costs. The potential returns are enormous. Innovative
organizations worldwide are already using data mining to locate and appeal to higher-value
customers, to reconfigure their product offerings to increase sales, and to minimize losses due
to error or fraud.
Data mining is a process that uses a variety of data analysis tools to
discover patterns and relationships in data that may be used to make valid predictions.
The first and simplest analytical step in data mining is to describe the
data — summarize its statistical attributes (such as means and standard deviations), visually
review it using charts and graphs, and look for potentially meaningful links among variables
(such as values that often occur together). Collecting, exploring and selecting the right data
are critically important.
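As a minimal sketch of this descriptive step, the Python snippet below summarizes one numeric column and counts which values occur together. The records and the field names (`age`, `product`, `region`) are invented for illustration and do not come from any real database.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical customer records (illustrative only).
records = [
    {"age": 25, "product": "card", "region": "north"},
    {"age": 35, "product": "loan", "region": "south"},
    {"age": 45, "product": "card", "region": "north"},
    {"age": 55, "product": "card", "region": "north"},
]

# Statistical attributes of one numeric column.
ages = [r["age"] for r in records]
age_mean = mean(ages)   # central tendency
age_std = stdev(ages)   # sample standard deviation

# Values that often occur together: count each (product, region) pair.
pair_counts = Counter((r["product"], r["region"]) for r in records)
most_common_pair, pair_count = pair_counts.most_common(1)[0]

print(f"mean age = {age_mean}, std dev = {age_std:.2f}")
print(f"most frequent pair: {most_common_pair} ({pair_count} records)")
```

In practice the same summary would normally be followed by charts, which this sketch leaves out.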
But data description alone cannot provide an action plan. You must
build a predictive model based on patterns determined from known results, then test that
model on results outside the original sample. A good model should never be confused with
reality (you know a road map isn’t a perfect representation of the actual road), but it can be a
useful guide to understanding your business.
The final step is to empirically verify the model. For example, from a
database of customers who have already responded to a particular offer, you’ve built a model
predicting which prospects are likeliest to respond to the same offer. Can you rely on this
prediction? Send a mailing to a portion of the new list and see what results you get.
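The test mailing described above can be evaluated with a very small calculation. The sketch below assumes, as an addition to the text, that a randomly chosen group is mailed alongside the model-selected group so that the two response rates can be compared; the outcome lists are invented.

```python
def response_rate(responses):
    """Fraction of mailed prospects who responded (1 = yes, 0 = no)."""
    return sum(responses) / len(responses)

# Hypothetical outcomes of a small test mailing (illustrative only).
model_selected = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # prospects the model ranked highest
random_sample = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]   # prospects chosen at random (assumed control group)

# Lift: how many times better the model-selected group responds.
lift = response_rate(model_selected) / response_rate(random_sample)

print(f"model: {response_rate(model_selected):.0%}, "
      f"random: {response_rate(random_sample):.0%}, lift: {lift:.1f}x")
```

A lift clearly above 1 on a sample of realistic size would suggest the model's prediction can be relied on; ten records per group, as here, is far too few for a real decision.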
Data mining is often referred to as K.D.D. (Knowledge Discovery in Database), because in
mining the data we are carrying out a process of knowledge discovery in the database.

Learning from Past Mistakes ::


“Those who cannot remember the past are condemned to repeat it”.
-G. Santayana
Data mining works the same way a human being does: it uses historical information
(experience) to learn from the past. However, in order for the data mining technology to pull
the “gold” out of your database, you do have to tell it what the gold looks like (what business
problem you would like to solve). It then uses the description of that “gold” to look for
similar examples in the database, and uses these pieces of information from the past to
develop a predictive model of what will happen in the future.

Data mining: What it can’t do ::


Data mining is a tool, not a magic wand. It won’t sit in your database
watching what happens and send you e-mail to get your attention when it sees an interesting
pattern. It doesn’t eliminate the need to know your business, to understand your data, or to
understand analytical methods. Data mining assists business analysts with finding patterns
and relationships in the data — it does not tell you the value of the patterns to the
organization. Furthermore, the patterns uncovered by data mining must be verified in the real
world.
Remember that the predictive relationships found via data mining are
not necessarily causes of an action or behavior. For example, data mining might determine
that males with incomes between $50,000 and $65,000 who subscribe to certain magazines
are likely purchasers of a product you want to sell. While you can take advantage of this
pattern, say by aiming your marketing at people who fit the pattern, you should not assume
that any of these factors cause them to buy your product.
To ensure meaningful results, it’s vital that you understand your data.
The quality of your output will often be sensitive to outliers (data values that are very
different from the typical values in your database), irrelevant columns or columns that vary
together (such as age and date of birth), the way you encode your data, and the data you leave
in and the data you exclude. Algorithms vary in their sensitivity to such data issues, but it is
unwise to depend on a data mining product to make all the right decisions on its own.
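Two of the data issues named above, outliers and columns that vary together, can be checked before any model is built. The sketch below is one simple way to do so, assuming a two-standard-deviation cutoff for outliers and a hand-written Pearson correlation; the cutoff, the helper names and the sample values are all illustrative choices.

```python
from math import sqrt
from statistics import mean, stdev

def outliers(values, z_limit=2.0):
    """Values lying more than z_limit standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > z_limit * s]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical incomes in thousands; one value is far from the rest.
incomes = [40, 42, 45, 47, 50, 52, 55, 400]
income_outliers = outliers(incomes)

# Age and year of birth carry the same information, so they vary together.
ages = [30, 41, 25, 58]
birth_years = [1971, 1960, 1976, 1943]
redundancy = pearson(ages, birth_years)

print("outliers:", income_outliers)
print(f"correlation(age, birth year) = {redundancy:.2f}")
```

A correlation at or near -1 or +1 indicates that one of the two columns can usually be dropped.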
Data mining will not automatically discover solutions without
guidance. Rather than setting the vague goal, “Help improve the response to my direct mail
solicitation,” you might use data mining to find the characteristics of people who (1) respond
to your solicitation, or (2) respond AND make a large purchase. The patterns data mining
finds for those two goals may be very different.
Although a good data mining tool shelters you from the intricacies of
statistical techniques, it requires you to understand the workings of the tools you choose and
the algorithms on which they are based. The choices you make in setting up your data mining
tool and the optimizations you choose will affect the accuracy and speed of your models.

Does Data Mining replace skilled professionals? ::


Data mining does not replace skilled business analysts or managers, but
rather gives them a powerful new tool to improve the job they are doing. Any company that
knows its business and its customers is already aware of many important, high-payoff
patterns that its employees have observed over the years. What data mining can do is confirm
such empirical observations and find new, subtle patterns that yield steady incremental
improvement (plus the occasional breakthrough insight).

Data warehousing ::
Data warehousing is a blend of technologies aimed at the effective
integration of operational databases into an environment that enables the strategic use of data.
These technologies include relational and multidimensional database management systems,
client/server architecture, metadata modeling and repositories, graphical user interfaces, and
much more.

Data mining and Data warehousing ::

Frequently, the data to be mined is first extracted from an enterprise data warehouse into a
data mining database or data mart (Figure 1). There is some real
benefit if your data is already part of a data warehouse. As we shall see later on, the problems
of cleansing data for a data warehouse and for data mining are very similar. If the data has
already been cleansed for a data warehouse, then it most likely will not need further cleaning
in order to be mined. Furthermore, you will have already addressed many of the problems of
data consolidation and put in place maintenance procedures.

The Data Mining database may be a logical rather than a physical subset of your data
warehouse, provided that the data warehouse DBMS can support the
additional resource demands of Data Mining. If it cannot, then you will be better off with a
separate Data Mining database.

A data warehouse is not a requirement for data mining. Setting up a large data warehouse
that consolidates data from multiple sources, resolves data integrity
problems, and loads the data into a query database can be an enormous task, sometimes
taking years and costing millions of dollars. You could, however, mine data from one or more
operational or transactional databases by simply extracting it into a read-only database
(Figure 2). This new database functions as a type of data mart.

Data mining and OLAP ::

One of the most common questions from data processing professionals is about the
difference between data mining and OLAP (On-Line Analytical
Processing). As we shall see, they are very different tools that can complement each other.
OLAP is part of the spectrum of decision support tools. Traditional
query and report tools describe what is in a database. OLAP goes further; it’s used to answer
why certain things are true. The user forms a hypothesis about a relationship and verifies it
with a series of queries against the data. For example, an analyst might want to determine the
factors that lead to loan defaults. He or she might initially hypothesize that people with low
incomes are bad credit risks and analyze the database with OLAP to verify (or disprove) this
assumption. If that hypothesis were not borne out by the data, the analyst might then look at
high debt as the determinant of risk. If the data did not support this guess either, he or she
might then try debt and income together as the best predictor of bad credit risks.
In other words, the OLAP analyst generates a series of hypothetical patterns and
relationships and uses queries against the database to verify them or disprove
them. OLAP analysis is essentially a deductive process. But what happens when the number
of variables being analyzed is in the dozens or even hundreds? It becomes much more
difficult and time-consuming to find a good hypothesis (let alone be confident that there is
not a better explanation than the one found), and analyze the database with OLAP to verify or
disprove it.
Data mining is different from OLAP because rather than verify
hypothetical patterns, it uses the data itself to uncover such patterns. It is essentially an
inductive process. For example, suppose the analyst who wanted to identify the risk factors
for loan default were to use a data mining tool. The data mining tool might discover that
people with high debt and low incomes were bad credit risks (as above), but it might go
further and also discover a pattern the analyst did not think to try, such as that age is also a
determinant of risk.
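The contrast between the two approaches can be sketched in a few lines of Python. The loan records and attribute names below are invented, and ranking attributes by the gap in default rate is only a toy stand-in for a real data mining algorithm; it is meant to show the direction of reasoning, not a recommended method.

```python
# Hypothetical loan records (illustrative only): binary attributes plus a default flag.
loans = [
    {"low_income": 1, "high_debt": 1, "under_30": 1, "defaulted": 1},
    {"low_income": 1, "high_debt": 0, "under_30": 0, "defaulted": 0},
    {"low_income": 0, "high_debt": 1, "under_30": 1, "defaulted": 1},
    {"low_income": 0, "high_debt": 0, "under_30": 1, "defaulted": 1},
    {"low_income": 1, "high_debt": 1, "under_30": 0, "defaulted": 0},
    {"low_income": 0, "high_debt": 0, "under_30": 0, "defaulted": 0},
]

def default_rate(rows):
    """Share of the given loans that defaulted."""
    return sum(r["defaulted"] for r in rows) / len(rows) if rows else 0.0

def rate_gap(attr):
    """Default rate where attr is 1 minus default rate where attr is 0."""
    with_attr = [r for r in loans if r[attr] == 1]
    without = [r for r in loans if r[attr] == 0]
    return default_rate(with_attr) - default_rate(without)

# OLAP style (deductive): the analyst picks one hypothesis and checks it with a query.
low_income_gap = rate_gap("low_income")

# Data mining style (inductive): let the data rank every candidate attribute.
attributes = [k for k in loans[0] if k != "defaulted"]
ranked = sorted(attributes, key=lambda a: abs(rate_gap(a)), reverse=True)

print(f"hypothesis 'low income raises defaults': gap = {low_income_gap:+.2f}")
print("attribute ranked first by the data:", ranked[0])
```

In this invented data the analyst's low-income hypothesis is not borne out, while the scan surfaces the age attribute, which the analyst had not thought to test.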
Here is where data mining and OLAP can complement each other.
Before acting on the pattern, the analyst needs to know what the financial implications would
be of using the discovered pattern to govern who gets credit. The OLAP tool can allow the
analyst to answer those kinds of questions. Furthermore, OLAP is also complementary in the
early stages of the knowledge discovery process because it can help you explore your data,
for instance by focusing attention on important variables, identifying exceptions, or finding
interactions. This is important because the better you understand your data, the more effective
the knowledge discovery process will be.

Data mining applications ::


Data mining is increasingly popular because of the substantial
contribution it can make. It can be used to control costs as well as contribute to revenue
increases.
Many organizations are using data mining to help manage all phases
of the customer life cycle, including acquiring new customers, increasing revenue from
existing customers, and retaining good customers. By determining characteristics of good
customers (profiling), a company can target prospects with similar characteristics. By
profiling customers who have bought a particular product it can focus attention on similar
customers who have not bought that product (cross-selling). By profiling customers who have
left, a company can act to retain customers who are at risk for leaving (reducing churn or
attrition), because it is usually far less expensive to retain a customer than acquire a new one.
Data mining offers value across a broad spectrum of industries.
Telecommunications and credit card companies are two of the leaders in applying data
mining to detect fraudulent use of their services. Insurance companies and stock exchanges
are also interested in applying this technology to reduce fraud. Medical applications are
another fruitful area: data mining can be used to predict the effectiveness of surgical
procedures, medical tests or medications. Companies active in the financial markets use data
mining to determine market and industry characteristics as well as to predict individual
company and stock performance. Retailers are making more use of data mining to decide
which products to stock in particular stores (and even how to place them within a store), as
well as to assess the effectiveness of promotions and coupons. Pharmaceutical firms are
mining large databases of chemical compounds and of genetic material to discover substances
that might be candidates for development as agents for the treatments of disease.

Successful data mining ::

There are two keys to success in data mining. First is coming up with
a precise formulation of the problem you are trying to solve. A focused statement usually
results in the best payoff. The second key is using the right data. After choosing from the data
available to you, or perhaps buying external data, you may need to transform and combine it
in significant ways.
The more the model builder can “play” with the data, build models,
evaluate results, and work with the data some more (in a given unit of time), the better the
resulting model will be. Consequently, the degree to which a data mining tool supports this
interactive data exploration is more important than the algorithms it uses.
Ideally, the data exploration tools (graphics/visualization,
query/OLAP) are well integrated with the analytics or algorithms that build the models.

The Ten Steps of Data Mining ::

Here is a process for extracting hidden knowledge from your data warehouse, your customer
information file, or any other company database.

1. Identify The Objective


Before you begin, be clear on what you hope to accomplish with
your analysis. Know in advance the business goal of the data mining. Establish whether or
not the goal is measurable. Some possible goals are to
· find sales relationships between specific products or services
· identify specific purchasing patterns over time
· identify potential types of customers
· find product sales trends.
2. Select The Data
Once you have defined your goal, your next step is to select the data
to meet this goal. This may be a subset of your data warehouse or a data mart that contains
specific product information. It may be your customer information file. Segment as much as
possible the scope of the data to be mined.
Here are some key issues.
· Are the data adequate to describe the phenomena the data mining analysis is
attempting to model?
· Can you enhance internal customer records with external lifestyle and demographic
data?
· Are the data stable—will the mined attributes be the same after the analysis?
· If you are merging databases can you find a common field for linking them?
· How current and relevant are the data to the business goal?
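The question about a common linking field can be made concrete with a small sketch. It assumes both sources share a hypothetical `customer_id` field; the record contents are invented. Records with no match are reported rather than silently dropped, since a low match rate is itself a sign that the data may not be adequate.

```python
# Hypothetical internal records and purchased external demographic data.
internal = [
    {"customer_id": 101, "total_spend": 2500},
    {"customer_id": 102, "total_spend": 400},
    {"customer_id": 103, "total_spend": 980},
]
external = [
    {"customer_id": 101, "household_size": 4},
    {"customer_id": 103, "household_size": 2},
]

# Index the external data by the common field for fast lookup.
external_by_id = {row["customer_id"]: row for row in external}

merged, unmatched = [], []
for row in internal:
    extra = external_by_id.get(row["customer_id"])
    if extra is None:
        unmatched.append(row["customer_id"])   # no external data for this customer
    else:
        merged.append({**row, **extra})        # enhanced customer record

print(f"{len(merged)} records enhanced, unmatched ids: {unmatched}")
```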
3. Prepare The Data
Once you've assembled the data, you must decide which attributes to
convert into usable formats. Consider the input of domain experts—creators and users of the
data.
· Establish strategies for handling missing data, extraneous noise, and outliers
· Identify redundant variables in the dataset and decide which fields to exclude
· Decide on a log or square transformation, if necessary
· Visually inspect the dataset to get a feel for the database
· Determine the distribution frequencies of the data
You can postpone some of these decisions until you select a data
mining tool. For example, if you need a neural network or polynomial network you may have
to transform some of your fields.
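Several of the preparation tasks listed above can be sketched on a single column. The example assumes one possible strategy for each task: missing values are filled with the median, a base-10 log transformation compresses the skewed range, and the frequency distribution is taken per order of magnitude. These choices and the sample values are illustrative; the right strategy depends on the data and on the tool selected later.

```python
from collections import Counter
from math import log10
from statistics import median

# Hypothetical income column with missing entries (None) and one extreme value.
incomes = [32000, None, 45000, 1200000, 51000, None, 38000]

# Missing data: fill with the median, which the extreme value barely affects.
known = [v for v in incomes if v is not None]
fill = median(known)
filled = [fill if v is None else v for v in incomes]

# Log transformation: brings the extreme value closer to the rest.
log_incomes = [log10(v) for v in filled]

# Distribution frequencies: count values per order of magnitude.
bins = Counter(int(x) for x in log_incomes)

print("fill value:", fill)
print("frequencies by power of ten:", dict(bins))
```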
4. Audit The Data
Evaluate the structure of your data in order to determine the
appropriate tools.
· What is the ratio of categorical/binary attributes in the database?
· What is the nature and structure of the database?
· What is the overall condition of the dataset?
· What is the distribution of the dataset?
Balance the objective assessment of the structure of your data against
your users' need to understand the findings. Neural nets, for example, don't explain their
results.
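The first audit question can be answered mechanically. The sketch below classifies each column of a hypothetical dataset by the Python type of its values and reports the share of categorical or binary attributes; real data would need a more careful check, for example for numeric codes that are really categories.

```python
# Hypothetical rows (illustrative only).
rows = [
    {"gender": "F", "owns_home": True, "income": 42000.0, "region": "north"},
    {"gender": "M", "owns_home": False, "income": 58000.0, "region": "south"},
]

def kind(value):
    """Rough attribute type; bool is tested first because it is a subclass of int."""
    if isinstance(value, bool):
        return "binary"
    if isinstance(value, (int, float)):
        return "numeric"
    return "categorical"

# Classify each column from the first row's values.
kinds = {col: kind(val) for col, val in rows[0].items()}
non_numeric = [c for c, k in kinds.items() if k != "numeric"]
ratio = len(non_numeric) / len(kinds)

print(kinds)
print(f"categorical/binary share: {ratio:.0%}")
```

A dataset that is mostly categorical, as here, would point toward tools that handle such attributes directly.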
5. Select The Tools
Two concerns drive the selection of the appropriate data mining tool
—your business objectives and your data structure. Both should guide you to the same tool.
Consider these questions when evaluating a set of potential tools.
· Is the data set heavily categorical?
· What platforms do your candidate tools support?
· Are the candidate tools ODBC-compliant?
· What data format can the tools import?
No single tool is likely to provide the answer to your data mining
project. Some tools integrate several technologies into a suite of statistical analysis programs,
a neural network, and a symbolic classifier.
6. Format The Solution
In conjunction with your data audit, your business objective and the
selection of your tool determine the format of your solution. The key questions are:
· What is the optimum format of the solution—decision tree, rules, C code, SQL
syntax?
· What are the available format options?
· What is the goal of the solution?
· What do the end-users need—graphs, reports, code?
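To illustrate what "rules" and "SQL syntax" can mean as solution formats, the sketch below stores one mined rule as data and renders it either as a SQL query or as a check applied directly in code. The rule, the table name `customers` and the column names are hypothetical, and the SQL is built by plain string formatting only because the values are fixed numbers; production code should use parameterized queries.

```python
# A mined rule as (column, operator, value) conditions; hypothetical example.
rule = [("income", ">=", 50000), ("income", "<=", 65000), ("subscribes", "=", 1)]

def rule_to_sql(table, conditions):
    """Render the rule as a SQL SELECT statement (illustration only, not injection-safe)."""
    where = " AND ".join(f"{col} {op} {val}" for col, op, val in conditions)
    return f"SELECT * FROM {table} WHERE {where}"

def rule_matches(record, conditions):
    """Apply the same rule to a single record in code."""
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "=": lambda a, b: a == b,
    }
    return all(ops[op](record[col], val) for col, op, val in conditions)

sql = rule_to_sql("customers", rule)
print(sql)
print(rule_matches({"income": 60000, "subscribes": 1}, rule))
```

Keeping the rule as data means the same finding can be delivered in whichever format the end-users need.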
7. Construct The Model
At this point the data mining process itself begins. Usually the first
step is to use a random number seed to split the data into a training set and a test set and
construct and evaluate a model. The generation of classification rules, decision trees,
clustering sub-groups, scores, code, weights and evaluation data/error rates takes place at this
stage. Resolve these issues:
· Are error rates at acceptable levels? Can you improve them?
· What extraneous attributes did you find? Can you purge them?
· Is additional data or a different methodology necessary?
· Will you have to train and test a new data set?
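The split-train-evaluate cycle described in this step can be sketched as follows. The labelled records are invented, the 70/30 split and the seed value are arbitrary choices, and the "model" is a deliberately simple threshold halfway between the two class means; it stands in for whatever technique the selected tool provides.

```python
import random
from statistics import mean

# Hypothetical labelled records: (debt_ratio, defaulted).
data = [(i / 20, 0) for i in range(1, 10)] + [(i / 20, 1) for i in range(11, 20)]

def split(records, train_fraction=0.7, seed=42):
    """Shuffle with a fixed random seed so the split is reproducible."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train):
    """Toy model: threshold halfway between the mean of each class."""
    lo = mean(x for x, y in train if y == 0)
    hi = mean(x for x, y in train if y == 1)
    return (lo + hi) / 2

def error_rate(threshold, records):
    """Fraction of records the threshold rule classifies wrongly."""
    wrong = sum((x > threshold) != bool(y) for x, y in records)
    return wrong / len(records)

train, test = split(data)
threshold = fit_threshold(train)
test_error = error_rate(threshold, test)

print(f"threshold = {threshold:.3f}, test error = {test_error:.0%}")
```

The error rate is measured only on the test set, which the model never saw during training; this is what makes it a fair estimate for the first question in the list above.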
8. Validate The Findings
Share and discuss the results of the analysis with the business client
or domain expert. Ensure that the findings are correct and appropriate to the business
objectives.
· Do the findings make sense?
· Do you have to return to any prior steps to improve results?
· Can you use other data mining tools to replicate the findings?
9. Deliver The Findings
Provide a final report to the business unit or client. The report should
document the entire data mining process including data preparation, tools used, test results,
source code, and rules. Some of the issues are:
· Will additional data improve the analysis?
· What strategic insight did you discover and how is it applicable?
· What proposals can result from the data mining analysis?
· Do the findings meet the business objective?
10. Integrate The Solution
Share the findings with all interested end-users in the appropriate
business units. You might wind up incorporating the results of the analysis into the
company's business procedures. Some of the data mining solutions may involve
· SQL syntax for distribution to end-users
· C code incorporated into a production system
· Rules integrated into a decision support system.
Although data mining tools automate database analysis, they can lead
to faulty findings and erroneous conclusions if you're not careful. Bear in mind that data
mining is a business process with a specific goal—to extract a competitive insight from
historical records in a database.

CONCLUSION

Data mining offers great promise in helping organizations uncover patterns hidden in their
data that can be used to predict the behavior of customers, products
and processes. However, data mining tools need to be guided by users who understand the
business, the data, and the general nature of the analytical methods involved. Realistic
expectations can yield rewarding results across a wide range of applications, from improving
revenues to reducing costs.
Building models is only one step in knowledge discovery. It’s vital to
properly collect and prepare the data, and to check the models against the real world. The
“best” model is often found after building models of several different types, or by trying
different technologies or algorithms.
Choosing the right data mining products means finding a tool with
good basic capabilities, an interface that matches the skill level of the people who’ll be using
it, and features relevant to your specific business problems. After you’ve narrowed down the
list of potential solutions, get a hands-on trial of the likeliest ones.

BIBLIOGRAPHY

· Building Data Mining Applications for CRM
  By: Alex Berson, Stephen Smith, Kurt Thearling
· Introduction to Data Mining and Knowledge Discovery
  By: Two Crows Corporation
· www.db.cs.sfu.ca/sections/publications/
· www.dmreview.com
· www.research.microsoft.com/datamine
· www.ascencialsoftware.com
· www.data-mine.com
