Data Mining: A Comprehensive Overview of Key Concepts, Tools, Stages and Applications

1. Definition
2. Overview
3. History
4. Evolution
5. Scope
6. Stages
7. Process
8. Relationships
9. Elements
10. Data Warehousing vs Data mining
11. Data Mining tools
12. Knowledge Discovery in Database
13. Advantages/Disadvantages
"Data mining is the process of discovering
meaningful new correlations, patterns and trends
by sifting through large amounts of data stored in
repositories, using pattern recognition
technologies as well as statistical and
mathematical techniques."
• Data mining tools predict future trends and behaviors, allowing
businesses to make proactive, knowledge-driven decisions.
• The prospective analysis offered by data mining moves beyond the analyses of
past events provided by the retrospective tools typical of decision support
systems.
• Data mining tools can answer business questions that traditionally were
too time-consuming to resolve.
• They scour databases for hidden patterns, finding predictive information
that experts may miss because it lies outside their expectations.
• Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing
information resources, and can be integrated with new products and
systems as they are brought online.
• Data mining is the evolution of a field with a long history, but the term
itself was only introduced relatively recently, in the 1990s.
• Statistics is the foundation of most technologies on which data mining
is built.
• Its roots can be traced back along three family lines:
    Classical statistics
    Artificial intelligence
    Machine learning
• It is finding increasing acceptance in science and business areas that
need to analyze large amounts of data to discover trends they
could not otherwise find.
• Classical statistics embraces concepts such as regression analysis,
standard distributions, standard deviation, variance and cluster
analysis, all of which are used to study data and data relationships.
• These are the building blocks that underpin more advanced statistical
analyses.
• At the heart of today's data mining tools and techniques, classical
statistical analysis plays a significant role.
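As a minimal sketch of these building blocks, the Python below computes a mean, a population variance, a standard deviation and a least-squares regression line. The data values and the names `describe` and `fit_line` are illustrative assumptions, not part of the original slides.

```python
import math

def describe(values):
    """Return the mean, population variance and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance, math.sqrt(variance)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: advertising spend versus units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [3.0, 5.0, 7.0, 9.0, 11.0]

mean, variance, std_dev = describe(units_sold)
slope, intercept = fit_line(ad_spend, units_sold)
print(mean, variance, std_dev)
print(slope, intercept)
```

The regression line summarizes a relationship between two variables, while the variance and standard deviation summarize the spread of one.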
• Artificial intelligence is built upon heuristics (methods that often rapidly
lead to a solution that is usually close to the best possible answer) as opposed
to statistics, and attempts to apply human-thought-like processing to statistical problems.
• Since this approach requires vast computer processing power, it was not
practical until the early 1980s, when computers began to offer useful
power at reasonable prices.
• Certain AI concepts were adopted by some high-end commercial
products, such as query optimization modules for Relational Database
Management Systems (RDBMS).
• Machine learning is the union of statistics and artificial intelligence.
• It is an evolution of artificial intelligence because it blends artificial
intelligence heuristics with advanced statistical analysis.
• Machine learning attempts to let computer programs learn about the
data they study, such that programs make different decisions based on
the qualities of the studied data, using statistics for fundamental
concepts, and adding more advanced AI heuristics and algorithms to
achieve its goals.
• Automated prediction of trends and behaviors. A typical
example of a predictive problem is targeted marketing. Data
mining uses data on past promotional mailings to identify the
targets most likely to maximize return on investment in future
mailings.
  Examples:
    - forecasting bankruptcy
    - identifying segments of a population likely to respond similarly to given
      events
• Automated discovery of previously unknown patterns. Data
mining tools sweep through databases and identify previously
hidden patterns in one step.
  Examples:
    - analysis of retail sales data to identify seemingly unrelated products that
      are often purchased together (e.g., beer and diapers)
    - detecting fraudulent credit card transactions and identifying anomalous
      data that could represent data entry keying errors
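The retail example can be sketched as counting how often pairs of products appear in the same basket. The baskets, the 0.5 threshold and the name `pair_support` below are illustrative assumptions; real tools use more scalable algorithms on far larger data.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return, for every product pair, the share of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

# Hypothetical sales data: one list of products per transaction.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
    ["bread", "chips"],
]

support = pair_support(baskets)
# Keep only pairs bought together in at least half of all baskets (assumed cutoff).
frequent = {pair: s for pair, s in support.items() if s >= 0.5}
print(frequent)
```

Pairs that clear the threshold are candidates for a closer look; the threshold itself is a business choice rather than a fixed rule.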
• Stage 1: Exploration
Data preparation, cleaning and transformation.
• Stage 2: Model building and validation
Considering various models and choosing the best one
based on their performance.
• Stage 3: Deployment
Using the model selected as best in Stage 2 and applying it to
new data in order to generate predictions or estimates of the
expected outcome.
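The three stages can be sketched end to end on a toy dataset. Everything here is an illustrative assumption: the data, the two candidate models (a constant mean and a fitted line), and the use of mean squared error on held-out rows as the performance measure.

```python
def clean(rows):
    """Stage 1: drop rows with missing values and convert the rest to floats."""
    return [(float(x), float(y)) for x, y in rows if x is not None and y is not None]

def fit_mean(train):
    """Candidate model A: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Candidate model B: least-squares line through the training points."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mse(model, rows):
    """Mean squared error of a model on a list of (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Hypothetical raw data with two incomplete rows.
raw = [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.2),
       (4, 7.8), (5, None), (6, 12.1), (7, 13.9)]

# Stage 1: exploration (preparation and cleaning).
data = clean(raw)
train, validation = data[:4], data[4:]

# Stage 2: build candidate models and validate them on held-out rows.
candidates = {"mean": fit_mean, "line": fit_line}
models = {name: fit(train) for name, fit in candidates.items()}
scores = {name: mse(model, validation) for name, model in models.items()}
best_name = min(scores, key=scores.get)

# Stage 3: deploy the best model on new data.
prediction = models[best_name](8.0)
print(best_name, prediction)
```

Holding back rows for validation is what lets Stage 2 compare models on data they were not fitted to.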
• Classes: Stored data is used to locate data in predetermined
groups. For example, a restaurant chain could mine customer
purchase data to determine when customers visit and what they
typically order. This information could be used to increase traffic
by having daily specials.
• Clusters: Data items are grouped according to logical
relationships or consumer preferences. For example, data can
be mined to identify market segments or consumer affinities.
• Associations: Data can be mined to identify associations. The
beer-and-diapers example is an example of associative mining.
• Sequential patterns: Data is mined to anticipate behavior
patterns and trends. For example, an outdoor equipment retailer
could predict the likelihood of a backpack being purchased
based on a consumer's purchase of sleeping bags and hiking
shoes.
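The backpack example can be sketched as a conditional frequency: among customers who bought both sleeping bags and hiking shoes, what share also bought a backpack? The purchase histories and the name `confidence` are illustrative assumptions, and the sketch ignores purchase order for simplicity.

```python
def confidence(histories, antecedent, consequent):
    """Share of customers holding every antecedent item who also hold the consequent.

    Returns None when no customer bought all antecedent items.
    """
    antecedent = set(antecedent)
    matching = [h for h in histories if antecedent <= set(h)]
    if not matching:
        return None
    return sum(consequent in h for h in matching) / len(matching)

# Hypothetical purchase histories, one set of products per customer.
histories = [
    {"sleeping bag", "hiking shoes", "backpack"},
    {"sleeping bag", "hiking shoes"},
    {"sleeping bag", "hiking shoes", "backpack", "tent"},
    {"hiking shoes"},
    {"sleeping bag", "hiking shoes", "backpack"},
]

likelihood = confidence(histories, ["sleeping bag", "hiking shoes"], "backpack")
print(likelihood)
```

A retailer could use such an estimate to decide which customers receive a backpack promotion.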
• Extract, transform, and load transaction data onto the data
warehouse system.
• Store and manage the data in a multidimensional database
system.
• Provide data access to business analysts and information
technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.

• Data Warehouse: "is a repository (or archive) of information
gathered from multiple sources, stored under a unified schema,
at a single site." (Silberschatz)
Collect data → Store in a single repository
Allows for easier query development, as a single repository
can be queried.
• Data Mining:
Analyzing databases or data warehouses to discover
patterns in the data and gain knowledge.
Knowledge is power.
• Data mining tools are software components and theories that
allow users to extract information from data. The tools provide
individuals and companies with the ability to gather large
amounts of data and use it to make determinations about a
particular user or groups of users.
• Data mining tools can be classified into one of three
categories:
1. traditional data mining tools,
2. dashboards, and
3. text-mining tools.
Traditional data mining tools
• Traditional data mining tools help companies establish data patterns
and trends by using a number of complex algorithms and techniques.
• Some of these tools are installed on the desktop to monitor the
data and highlight trends, and others capture information
residing outside a database.
• The majority are available in both Windows and UNIX versions,
although some specialize in one operating system only.
• While some may concentrate on one database type, most will
be able to handle any data using online analytical processing
or a similar technology.
Dashboards
• Dashboards are installed on computers to monitor information in a database.
• Dashboards reflect data changes and updates onscreen,
often in the form of a chart or table, enabling the user to
see how the business is performing.
• Historical data can also be referenced, enabling the user to
see where things have changed (e.g., an increase in sales from
the same period last year).
• This functionality makes dashboards easy to use and
particularly appealing to managers who wish to have an
overview of the company's performance.
Text-mining tools
• Text-mining tools are so called because of their ability to mine
data from different kinds of text, for example from Microsoft Word
and Acrobat PDF documents to simple text files.
• These tools scan content and convert the selected data into a
format that is compatible with the tool's database, thus
providing users with an easy and convenient way of accessing
data without the need to open different applications.
• Scanned content can be unstructured (i.e., information is
scattered almost randomly across the document, including
e-mails, Internet pages, audio and video data) or structured (i.e.,
the data's form and purpose are known, such as content found
in a database).
m The most prevalent used in data
• The term Knowledge Discovery in Databases (KDD) was coined in 1989
by Gregory Piatetsky-Shapiro.
• Users are able to process raw data, mine the data for
information, and interpret the various results in the
form of information management.
• The raw data can include information such as financials, client lists,
policy and procedure documents, shareholder registers,
and even electronic copies of contractual
agreements with customers and vendors.
• With a data mining tool, it is possible to conduct a
focused search for the data that is needed, rather than
having to pore through all the stored data manually.

• Selection: We select data relevant to some criteria,
e.g., the transactions of credit card customers.
• Preprocessing: Unnecessary information is removed.
• Transformation: Data is transformed in order to be suitable for data mining.
• Data mining: Extraction of patterns from the data.
• Interpretation and evaluation: Patterns obtained in the data mining
stage are converted into knowledge, which in turn is used to support
decision making.
• Data visualization: Makes it possible for the analyst to gain a deeper,
more intuitive understanding of the data. It helps users to examine
large volumes of data and detect patterns visually.
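The KDD steps can be sketched as a small pipeline over credit card transactions. The records, field names and the 100.0 spending cutoff are illustrative assumptions, and a per-customer total stands in for the pattern-extraction step, which in practice would use a real mining algorithm.

```python
from collections import defaultdict

# Hypothetical raw records; "note" is a field the analysis does not need.
records = [
    {"customer": "A", "channel": "credit card", "amount": "120.50", "note": "gift"},
    {"customer": "B", "channel": "cash", "amount": "15.00", "note": ""},
    {"customer": "A", "channel": "credit card", "amount": "80.00", "note": ""},
    {"customer": "C", "channel": "credit card", "amount": None, "note": "refund?"},
    {"customer": "C", "channel": "credit card", "amount": "40.00", "note": ""},
]

# Selection: keep only the data relevant to the criterion (credit card transactions).
selected = [r for r in records if r["channel"] == "credit card"]

# Preprocessing: drop incomplete rows and the unnecessary "note" field.
cleaned = [
    {"customer": r["customer"], "amount": r["amount"]}
    for r in selected
    if r["amount"] is not None
]

# Transformation: convert amounts from text to numbers.
transformed = [(r["customer"], float(r["amount"])) for r in cleaned]

# Data mining (stand-in): total spend per customer.
totals = defaultdict(float)
for customer, amount in transformed:
    totals[customer] += amount

# Interpretation: turn the totals into a decision-ready list of high-value customers.
high_value = sorted(c for c, total in totals.items() if total >= 100.0)
print(dict(totals), high_value)
```

Each step consumes only the output of the previous one, which mirrors the staged structure of the KDD process.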
Advantages:
• Historical data can be used to predict future trends.
• Knowledge about new trends can be used to improve products
and services.
• Knowledge hidden in large volumes of data can be extracted.
• Data mining is used in developing models to predict outcomes
of future situations.
Disadvantages:
• Limited information
• Noise and missing data
• User interaction and prior knowledge
• Uncertainty
