ON
( K.D.D. )
Department:
Database and Database Management Systems
Developed By:
Mehsana.
PREFACE
ACKNOWLEDGEMENT
From,
Acharya Dushyant
Patel Shailesh
DATA MINING – The Knowledge Discovery in Database 4
INDEX
Abstract
Data Warehousing
Conclusion
Bibliography
ABSTRACT
Data mining takes its name, and to some degree its popularity,
from the image of stored data as a “mountain”: buried within the mountain, just as within
your data, are certain “gems” of great value. The problem is that the mountain also holds
plenty of worthless rock and rubble that must be dug through and discarded to reach what is
valuable. For mountains of rock and mountains of data alike, you need power tools to
unearth that value. For rock, this means earthmovers and dynamite; for data, it means
powerful computers and data mining software.
The data may sit in one global database or be spread across
several databases running on different DBMSs; the data mining process can draw on all of
them and return the results you want. It brings out information that is present in the
database but not directly visible.
Data mining is the tool that gives your data the intelligence
needed for a particular model or task. Building data mining software is straightforward if
you follow the proper steps.
In short, data mining is the tool that makes your database
INTELLIGENT.
Databases today can range in size into the terabytes — more than
1,000,000,000,000 bytes of data. Within these masses of data lies hidden information of
strategic importance. But when there are so many trees, how do you draw meaningful
conclusions about the forest?
The newest answer is Data Mining, which is being used both to
increase revenues and to reduce costs. The potential returns are enormous. Innovative
organizations worldwide are already using data mining to locate and appeal to higher-value
customers, to reconfigure their product offerings to increase sales, and to minimize losses due
to error or fraud.
Data mining is a process that uses a variety of data analysis tools to
discover patterns and relationships in data that may be used to make valid predictions.
The first and simplest analytical step in data mining is to describe the
data — summarize its statistical attributes (such as means and standard deviations), visually
review it using charts and graphs, and look for potentially meaningful links among variables
(such as values that often occur together). Collecting, exploring and selecting the right data
are critically important.
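The descriptive step above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the sample figures and the helper names describe and co_occurrences are assumptions made for the example, not part of any particular data mining tool.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, stdev


def describe(values):
    """Return the mean and sample standard deviation of a numeric column."""
    return {"mean": mean(values), "stdev": stdev(values)}


def co_occurrences(baskets):
    """Count how often each unordered pair of items appears in the same record."""
    counts = Counter()
    for basket in baskets:
        # sorted() makes ("bread", "milk") and ("milk", "bread") the same key
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data, for illustration only.
purchase_amounts = [20.0, 35.0, 50.0, 15.0]
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
]

summary = describe(purchase_amounts)
pairs = co_occurrences(baskets)
print(summary)
print(pairs.most_common(2))
```

Pairs with high counts are only candidates for "values that often occur together"; whether the link is meaningful still has to be judged against the business question.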
But data description alone cannot provide an action plan. You must
build a predictive model based on patterns determined from known results, then test that
model on results outside the original sample. A good model should never be confused with
reality (you know a road map isn’t a perfect representation of the actual road), but it can be a
useful guide to understanding your business.
The final step is to empirically verify the model. For example, from a
database of customers who have already responded to a particular offer, you’ve built a model
predicting which prospects are likeliest to respond to the same offer. Can you rely on this
prediction? Send a mailing to a portion of the new list and see what results you get.
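The trial mailing described above might look like the following sketch. The prospect list, the 10% trial fraction, the response figures, and the 5-point tolerance are all hypothetical values chosen for illustration; a real project would pick them to suit the campaign.

```python
import random


def pick_trial_group(prospects, fraction, seed=0):
    """Pick a random portion of the prospect list to receive the trial mailing."""
    rng = random.Random(seed)
    k = max(1, round(len(prospects) * fraction))
    return rng.sample(prospects, k)


def response_rate(responses):
    """Share of mailed prospects who responded (responses is a list of booleans)."""
    return sum(responses) / len(responses)


def model_holds_up(predicted_rate, observed_rate, tolerance=0.05):
    """True when the observed rate is within `tolerance` of the model's prediction."""
    return abs(predicted_rate - observed_rate) <= tolerance


prospects = [f"prospect-{i}" for i in range(1000)]
trial_group = pick_trial_group(prospects, fraction=0.1)

# Hypothetical outcome of the trial mailing: 12 of 100 recipients responded.
responses = [True] * 12 + [False] * 88
observed = response_rate(responses)
print(len(trial_group), observed, model_holds_up(0.10, observed))
```

Sampling at random, rather than mailing the first names on the list, keeps the trial group representative of the full prospect list.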
Data mining is often referred to as K.D.D., Knowledge Discovery in
Database, because in mining the data we are carrying out the process of discovering
knowledge held in the database.
to your solicitation, or (2) respond AND make a large purchase. The patterns data mining
finds for those two goals may be very different.
Although a good data mining tool shelters you from the intricacies of
statistical techniques, it requires you to understand the workings of the tools you choose and
the algorithms on which they are based. The choices you make in setting up your data mining
tool and the optimizations you choose will affect the accuracy and speed of your models.
Data Warehousing:
Data warehousing is a blend of technologies aimed at the effective
integration of operational databases into an environment that enables the strategic use of data.
These technologies include relational and multidimensional database management systems,
client/server architecture, metadata modeling and repositories, graphical user interfaces, and
much more.
left, a company can act to retain customers who are at risk for leaving (reducing churn or
attrition), because it is usually far less expensive to retain a customer than acquire a new one.
Data mining offers value across a broad spectrum of industries.
Telecommunications and credit card companies are two of the leaders in applying data
mining to detect fraudulent use of their services. Insurance companies and stock exchanges
are also interested in applying this technology to reduce fraud. Medical applications are
another fruitful area: data mining can be used to predict the effectiveness of surgical
procedures, medical tests or medications. Companies active in the financial markets use data
mining to determine market and industry characteristics as well as to predict individual
company and stock performance. Retailers are making more use of data mining to decide
which products to stock in particular stores (and even how to place them within a store), as
well as to assess the effectiveness of promotions and coupons. Pharmaceutical firms are
mining large databases of chemical compounds and of genetic material to discover substances
that might be candidates for development as agents for the treatments of disease.
There are two keys to success in data mining. First is coming up with
a precise formulation of the problem you are trying to solve. A focused statement usually
results in the best payoff. The second key is using the right data. After choosing from the data
available to you, or perhaps buying external data, you may need to transform and combine it
in significant ways.
The more the model builder can “play” with the data, build models,
evaluate results, and work with the data some more (in a given unit of time), the better the
resulting model will be. Consequently, the degree to which a data mining tool supports this
interactive data exploration is more important than the algorithms it uses.
Ideally, the data exploration tools (graphics/visualization,
query/OLAP) are well integrated with the analytics or algorithms that build the models.
You can postpone some of these decisions until you select a data
mining tool. For example, if you need a neural network or polynomial network you may have
to transform some of your fields.
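As an illustration of the kind of field transformation a neural network typically needs, the sketch below rescales a numeric field into the range 0 to 1 and turns a categorical field into 0/1 indicators. The function names and sample values are assumptions for the example; the exact transformations required depend on the tool you select.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as one list of 0/1 indicators per record."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical fields, for illustration only.
ages = [20, 30, 40, 60]
regions = ["north", "south", "north", "east"]

scaled_ages = min_max_scale(ages)
categories, encoded_regions = one_hot(regions)
print(scaled_ages)
print(categories, encoded_regions)
```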
4. Audit The Data
Evaluate the structure of your data in order to determine the
appropriate tools.
· What is the ratio of categorical/binary attributes in the database?
· What is the nature and structure of the database?
· What is the overall condition of the dataset?
· What is the distribution of the dataset?
Balance the objective assessment of the structure of your data against
your users' need to understand the findings. Neural nets, for example, don't explain their
results.
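A first pass at the audit questions above can be automated. The sketch below assumes the data has already been loaded as a list of dictionaries, one per record, with None marking a missing value; the thresholds used to label an attribute as binary, categorical, or numeric are simplifying assumptions, not a standard.

```python
def audit(rows):
    """Profile each attribute of a list-of-dicts dataset."""
    report = {}
    for name in rows[0]:
        present = [r[name] for r in rows if r[name] is not None]
        distinct = set(present)
        if len(distinct) <= 2:
            kind = "binary"
        elif all(isinstance(v, (int, float)) for v in present):
            kind = "numeric"
        else:
            kind = "categorical"
        report[name] = {
            "kind": kind,
            "distinct": len(distinct),
            "missing": len(rows) - len(present),
        }
    return report


def categorical_ratio(report):
    """Fraction of attributes that are categorical or binary."""
    flagged = sum(1 for col in report.values() if col["kind"] != "numeric")
    return flagged / len(report)


# Hypothetical records, for illustration only.
rows = [
    {"age": 34, "region": "north", "responded": True},
    {"age": 51, "region": "south", "responded": False},
    {"age": None, "region": "east", "responded": True},
    {"age": 27, "region": "north", "responded": False},
]

report = audit(rows)
ratio = categorical_ratio(report)
print(report)
print(ratio)
```

The missing-value counts speak to the overall condition of the dataset, and the ratio gives a rough answer to how heavily categorical it is.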
5. Select The Tools
Two concerns drive the selection of the appropriate data mining tool
—your business objectives and your data structure. Both should guide you to the same tool.
Consider these questions when evaluating a set of potential tools.
· Is the data set heavily categorical?
· What platforms do your candidate tools support?
· Are the candidate tools ODBC-compliant?
· What data format can the tools import?
No single tool is likely to provide the answer to your data mining
project. Some tools integrate several technologies into a suite of statistical analysis programs,
a neural network, and a symbolic classifier.
6. Format The Solution
In conjunction with your data audit, your business objective and the
selection of your tool determine the format of your solution. The key questions are:
· What is the optimum format of the solution: decision tree, rules, C code, or SQL syntax?
· What are the available format options?
· What is the goal of the solution?
· What do the end-users need—graphs, reports, code?
7. Construct The Model
It is at this point that the data mining process begins. Usually the first
step is to use a random number seed to split the data into a training set and a test set and
construct and evaluate a model. The generation of classification rules, decision trees,
clustering sub-groups, scores, code, weights and evaluation data/error rates takes place at this
stage. Resolve these issues:
· Are error rates at acceptable levels? Can you improve them?
· What extraneous attributes did you find? Can you purge them?
· Is additional data or a different methodology necessary?
· Will you have to train and test a new data set?
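The split, build, and evaluate cycle described in this step can be sketched as follows. The model here is a deliberately simple one-attribute rule learner chosen for brevity, not the method of any particular tool, and the records, the 30% test fraction, and the seed value are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split(records, test_fraction=0.3, seed=42):
    """Shuffle with a fixed seed, then cut into training and test sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def train_one_rule(training, attribute, target):
    """Learn 'attribute value -> most common target' from the training set."""
    by_value = defaultdict(Counter)
    for row in training:
        by_value[row[attribute]][row[target]] += 1
    rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    # Fallback prediction for attribute values never seen in training.
    default = Counter(row[target] for row in training).most_common(1)[0][0]
    return rules, default


def error_rate(rules, default, holdout, attribute, target):
    """Fraction of held-out records the rule set misclassifies."""
    wrong = sum(1 for row in holdout if rules.get(row[attribute], default) != row[target])
    return wrong / len(holdout)


# Hypothetical customer records: plan type mostly, but not always, predicts churn.
records = (
    [{"plan": "basic", "churned": True}] * 9
    + [{"plan": "basic", "churned": False}]
    + [{"plan": "premium", "churned": False}] * 9
    + [{"plan": "premium", "churned": True}]
)

train_set, holdout = split(records)
rules, default = train_one_rule(train_set, "plan", "churned")
err = error_rate(rules, default, holdout, "plan", "churned")
print(rules, err)
```

Fixing the seed makes the split repeatable, so a change in the error rate can be traced to a change in the data or the model rather than to a different shuffle.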
8. Validate The Findings
Share and discuss the results of the analysis with the business client
or domain expert. Ensure that the findings are correct and appropriate to the business
objectives.
· Do the findings make sense?
· Do you have to return to any prior steps to improve results?
· Can you use other data mining tools to replicate the findings?
9. Deliver The Findings
Provide a final report to the business unit or client. The report should
document the entire data mining process including data preparation, tools used, test results,
source code, and rules. Some of the issues are:
· Will additional data improve the analysis?
· What strategic insight did you discover and how is it applicable?
· What proposals can result from the data mining analysis?
· Do the findings meet the business objective?
10. Integrate The Solution
Share the findings with all interested end-users in the appropriate
business units. You might wind up incorporating the results of the analysis into the
company's business procedures. Some of the data mining solutions may involve:
· SQL syntax for distribution to end-users
· C code incorporated into a production system
· Rules integrated into a decision support system.
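One way the first option, SQL syntax distributed to end-users, might be realized is to turn mined rules into a query. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the customers table, its columns, and the two rules are hypothetical, and a production system would use its own schema and DBMS.

```python
import sqlite3

# Hypothetical rules from a mining run: (column, operator, value) triples that
# must all hold for a customer to be flagged.
RULES = [("plan", "=", "basic"), ("months_active", "<", 6)]
ALLOWED_COLUMNS = {"plan", "months_active"}
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def rules_to_sql(table, rules):
    """Build a parameterized SELECT whose WHERE clause encodes the rules.

    The table name is assumed to come from trusted code, not from user input.
    """
    clauses, params = [], []
    for column, operator, value in rules:
        # Column names and operators cannot be bound as parameters, so they
        # are checked against a whitelist before being placed in the SQL text.
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported rule: {column} {operator}")
        clauses.append(f"{column} {operator} ?")
        params.append(value)
    return f"SELECT id FROM {table} WHERE " + " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, plan TEXT, months_active INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "basic", 3), (2, "basic", 12), (3, "premium", 2)],
)

sql, params = rules_to_sql("customers", RULES)
flagged = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(flagged)
```

Keeping the rule values as bound parameters means the same query text can be reused when the thresholds are retuned.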
Although data mining tools automate database analysis, they can lead
to faulty findings and erroneous conclusions if you're not careful. Bear in mind that data
mining is a business process with a specific goal—to extract a competitive insight from
historical records in a database.
CONCLUSION
BIBLIOGRAPHY
www.data-mine.com