Understanding Data Mining Techniques and Their Importance in Future Planning

Understanding data mining: a basis for future planning and development
What is data mining?
Importance of data mining
Classes of data mining technique
How does data mining work?
What technological infrastructure is required?
Problems
Data mining is the process of collecting, searching through, and analyzing a large amount of data in a database so as to discover patterns or relationships.
- Bose & Mahapatra (2001): the process of identifying interesting patterns in databases that can then be used in decision making
- Turban et al. (2007): a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information
- Frawley et al. (1992): the objective of data mining is to obtain useful, non-explicit information from data stored in large repositories
Data mining helps analyze data from different perspectives, dimensions, or angles and summarize it into useful information.
It plays an important role in financial fraud detection (FFD) because it is often applied to extract and uncover the hidden truths behind very large quantities of data.
Data
- any facts, numbers, or text that can be processed by a computer
- this includes operational or transactional data such as sales, cost, inventory, payroll, and accounting
- nonoperational data such as industry sales and forecast data
Information
- the correlations or patterns among the data can provide useful information
Knowledge
- information can be converted into knowledge about historical patterns and future trends
- Classification: builds and uses a model to predict the categorical labels of unknown objects, distinguishing between objects of different classes
- Clustering: divides objects into conceptually meaningful groups (clusters), with the objects in a group being similar to one another but very dissimilar to the objects in other groups
- Prediction: estimates numeric and ordered future values based on the patterns of a data set
- Outlier detection: measures the distance between data objects to detect those that are grossly different from or inconsistent with the remaining data set
- Regression: a statistical methodology used to reveal the relationship between one or more independent variables and a dependent variable
- Visualization: the easily understandable presentation of data, and the methodology that converts complicated data characteristics into clear patterns so that the patterns or relationships uncovered in the data mining process can be viewed
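The distance-based idea behind outlier detection can be sketched as follows. This is a minimal illustration, not a method taken from the slides: the function name `find_outliers`, the sample transaction amounts, and the cutoff of 2 standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def find_outliers(values, cutoff=2.0):
    """Return values whose distance from the mean exceeds `cutoff` standard deviations."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - centre) / spread > cutoff]


# Hypothetical transaction amounts; 950 is grossly different from the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 950]
outliers = find_outliers(amounts)
print(outliers)
```

Distance from the mean is only one possible measure. With several attributes per object, a distance between whole records (for example Euclidean distance) would take its place.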
Stages of data mining
- Exploration: this stage starts with preparing the data, for example data cleaning, transformation, and selecting records.
- Model building and validation: this stage involves choosing the best model based on predictive performance.
- Deployment: this stage builds on the previous two by implementing the model that was chosen and applying it to the data to generate predictions or patterns.
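The three stages can be sketched end to end on a toy data set. Everything here is an assumption made for illustration: the (advertising spend, sales) records, the two candidate models, and the use of mean absolute error on a holdout set as the measure of predictive performance.

```python
from statistics import mean


def clean(records):
    """Exploration: drop records with missing values and convert the rest to floats."""
    return [(float(x), float(y)) for x, y in records if x is not None and y is not None]


def fit_mean(train):
    """Candidate model 1: always predict the average target value."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train):
    """Candidate model 2: least-squares straight line y = a + b * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def mean_abs_error(model, data):
    """Validation metric: average absolute prediction error."""
    return mean(abs(model(x) - y) for x, y in data)


# Hypothetical records: (advertising spend, sales); one record is incomplete.
raw = [(1, 12), (2, 19), (3, None), (4, 41), (5, 48), (6, 62), (7, 69), (8, 81)]

# Stage 1: exploration (cleaning and selecting records).
data = clean(raw)
train, holdout = data[:5], data[5:]

# Stage 2: model building and validation (keep the model with the lowest holdout error).
candidates = {"mean": fit_mean(train), "line": fit_line(train)}
errors = {name: mean_abs_error(model, holdout) for name, model in candidates.items()}
best_name = min(errors, key=errors.get)

# Stage 3: deployment (apply the chosen model to new data).
prediction = candidates[best_name](9)
print(best_name, round(prediction, 1))
```

On this data the straight line has a much lower holdout error than the constant model, so it is the one deployed.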
Data mining consists of five elements
- extract, transform, and load transaction data onto the data warehouse system
- store and manage the data in a multidimensional database system
- provide data access to business analysts and information technology professionals
- analyze data by application software
- present data in a useful format, such as a graph or table
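The five elements above can be walked through in miniature. This is a sketch under stated assumptions: an in-memory SQLite table stands in for the data warehouse and multidimensional database, and the table name `sales` and the sample transactions are invented for the example.

```python
import sqlite3

# Hypothetical raw transaction records: (region, product, amount as text).
raw_transactions = [
    ("north", "widget", "120.50"),
    ("north", "gadget", "80.00"),
    ("south", "widget", "200.25"),
    ("south", "widget", "99.75"),
]

# 1. Extract, transform, and load: convert amounts to numbers, normalise region names.
rows = [(region.upper(), product, float(amount)) for region, product, amount in raw_transactions]

# 2. Store and manage the data (an in-memory SQLite table stands in for the warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# 3. Provide data access and 4. analyze: total sales by region and product.
summary = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()
conn.close()

# 5. Present the result in a useful format, here a plain-text table.
print(f"{'Region':<8}{'Product':<10}{'Total':>10}")
for region, product, total in summary:
    print(f"{region:<8}{product:<10}{total:>10.2f}")
```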
Tools used for data mining:
i) artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure
ii) decision trees: tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset
iii) logistic regression: a type of regression analysis used for predicting the outcome of a categorical dependent variable (a variable that can take on a limited number of categories) based on one or more predictor variables
iv) genetic algorithms: optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution
vi) data visualization: the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships
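To make the decision tree entry more concrete, the sketch below learns a single split (a one-level tree, often called a decision stump) and turns it into a classification rule. The names `learn_stump` and `classify`, the labels, and the transaction amounts are assumptions for illustration.

```python
def learn_stump(samples):
    """Find the threshold on a single feature that misclassifies the fewest samples.

    `samples` is a list of (value, label) pairs with labels "fraud" or "normal".
    The learned rule is: value > threshold -> "fraud", otherwise "normal".
    """
    values = sorted({v for v, _ in samples})
    # Candidate thresholds lie halfway between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(samples) + 1
    for t in candidates:
        errors = sum((v > t) != (label == "fraud") for v, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors


def classify(value, threshold):
    """Apply the learned rule to a new value."""
    return "fraud" if value > threshold else "normal"


# Hypothetical labelled transactions: (amount, label).
training = [(20, "normal"), (35, "normal"), (50, "normal"),
            (400, "fraud"), (650, "fraud"), (900, "fraud")]
threshold, errors = learn_stump(training)
print(f"Rule: amount > {threshold} -> fraud ({errors} training errors)")
print(classify(30, threshold), classify(700, threshold))
```

A full decision tree repeats this kind of split recursively on each resulting subset and across several attributes.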
As companies started collecting and saving basic data on computers, they were able to answer detailed questions more quickly and easily.
Internal auditors can use spreadsheets to undertake simple data mining exercises or to produce summary tables. Using the spreadsheet, auditors can review complex data in a simplified format and drill down where necessary to find the underlying assumptions or information.
Size of database
- the more data being processed and maintained, the more powerful the system required
Query complexity
- the more complex the queries and the greater the number of queries being processed, the more powerful the system required
Outlier detection and visualization have seen only limited use, possibly because outliers are difficult to detect.
Cost sensitivity
- The cost of misclassification (false positive and false negative errors) differs: a false negative error (misclassifying fraudulent activity as normal activity) is more costly than a false positive error (misclassifying normal activity as fraudulent activity).
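The effect of unequal misclassification costs can be shown with a small calculation. The 10:1 cost ratio, the function name `total_cost`, and the two sets of predictions are assumptions for illustration. The slides only state that a false negative costs more than a false positive.

```python
def total_cost(actual, predicted, cost_fn=10.0, cost_fp=1.0):
    """Sum misclassification costs, charging more for missed fraud than for false alarms."""
    cost = 0.0
    for a, p in zip(actual, predicted):
        if a == "fraud" and p == "normal":
            cost += cost_fn  # false negative: fraud passed off as normal
        elif a == "normal" and p == "fraud":
            cost += cost_fp  # false positive: normal activity flagged as fraud
    return cost


actual = ["fraud", "fraud", "normal", "normal", "normal", "normal"]
# Model A misses one fraud case; model B raises three false alarms instead.
model_a = ["fraud", "normal", "normal", "normal", "normal", "normal"]
model_b = ["fraud", "fraud", "fraud", "fraud", "fraud", "normal"]

print(total_cost(actual, model_a))  # one false negative
print(total_cost(actual, model_b))  # three false positives
```

Under these assumed costs, model B makes more errors than model A yet incurs the lower total cost, which is why plain accuracy can be a misleading yardstick for fraud detection.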
The right combination of imaginative human and
computer skills can work small wonders on large sets of data.

Large amounts of data can be retrieved from various websites and databases.
http://www.analyticbridge.com/profiles/blogs/fraud-detection-using-interactive-construction-andanalysis-of
http://www.ijsce.org/attachments/File/NCAI2011/IJSCE_NCAI2011_025.pdf
http://www.ijser.org/researchpaper%5CFraudDetection-of-Credit-Card-Payment-System-byGenetic-Algorithm.pdf
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Monterey, CA: Wadsworth International Group.
