required? Problems
interesting patterns in databases that can then be used in decision making
Turban et al. (2007): a process that uses statistical, mathematical, artificial intelligence and machine learning techniques to extract and identify useful information
Frawley et al. (1992): the objective of data mining is to obtain useful, non-explicit information from data stored in large repositories
different perspectives, different dimensions or angles and summarizing it into useful information
Plays an important role in Financial Fraud Detection (FFD) because it is often applied to extract and uncover the hidden truths behind very large quantities of data
Data
-any facts, numbers or text that can be processed by a computer
-this includes operational or transactional data such as sales, cost, inventory, payroll and accounting
-nonoperational data such as industry sales, forecast data
Information
-the correlation or pattern among the data can provide useful information
Knowledge
-information can be converted into knowledge about historical patterns and future trends
predict the categorical labels of unknown objects to distinguish between objects of different classes
Clustering
-divide objects into conceptually meaningful groups (clusters), with the objects in a group being similar to one another but very dissimilar to the objects in other groups
Prediction
-estimates numeric and ordered future values based on the patterns of a data set
objects to detect those objects that are grossly different from or inconsistent with the remaining data set
Regression
-statistical methodology used to reveal the relationship between one or more independent variables and a dependent variable
Visualization
-easily understandable presentation of data, and the methodology that converts complicated data characteristics into clear patterns so that the relationships uncovered in the data mining process can be viewed
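As a rough illustration of detecting objects that are grossly different from the rest of a data set, the sketch below flags values by z-score. The function name `find_outliers`, the threshold of 2.0 and the sample amounts are assumptions made for this example and do not come from the slides; the z-score rule is only one of many possible outlier tests.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)          # sample standard deviation
    if sigma == 0:                 # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [120.0, 95.5, 130.0, 110.0, 105.0, 98.0, 5000.0, 115.0]
suspicious = find_outliers(amounts)
print(suspicious)
```

The threshold is kept low here because, in a sample of only eight values, no single point can reach a z-score of 3; a larger data set would usually justify a stricter cut-off.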
Exploration
-this stage starts with preparing data, such as data cleaning, transformation and selecting records
Model building and validation
-involves choosing the best model based on predictive performance
Deployment
-this stage builds on the previous two stages by implementing the model that was chosen and applying it to the data to generate predictions or patterns
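The three stages can be sketched in a few lines. Everything here is hypothetical: the records, the candidate thresholds and the "amount above a threshold" rule stand in for real data and real models, and the candidates are scored on the same cleaned records purely to keep the sketch short (a separate validation set would normally be used).

```python
# Hypothetical records: (transaction amount, fraud label); None marks a missing amount.
raw = [(120.0, 0), (None, 0), (4800.0, 1), (95.0, 0),
       (5100.0, 1), (130.0, 0), (4500.0, 1), (110.0, 0)]

# Stage 1, exploration: clean the data by dropping incomplete records.
clean = [(a, y) for a, y in raw if a is not None]

def accuracy(threshold, records):
    """Share of records where the rule 'amount > threshold' matches the label."""
    hits = sum(1 for a, y in records if (1 if a > threshold else 0) == y)
    return hits / len(records)

# Stage 2, model building and validation: score each candidate rule, keep the best.
candidates = [100.0, 1000.0, 6000.0]
best = max(candidates, key=lambda t: accuracy(t, clean))

# Stage 3, deployment: apply the chosen rule to new, unlabeled amounts.
new_amounts = [90.0, 4700.0]
predictions = [1 if a > best else 0 for a in new_amounts]
print(best, predictions)
```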
-extract, transform, and load transaction data onto the data warehouse system
-store and manage the data in a multidimensional database system
-provide data access to business analysts and information technology professionals
-analyze data by application software
-present data in a useful format, such as a graph or table
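The steps above can be sketched end to end on a toy scale. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the data warehouse (it is relational, not a multidimensional database), and the CSV text, the `sales` table and the column names are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical transaction data as it might arrive from an operational system.
RAW_CSV = """date,region,amount
2024-01-05,north, 120.50
2024-01-05,south,80.00
2024-01-06,north,200.00
"""

def extract(text):
    """Read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize region names and convert amounts to numbers."""
    return [(r["date"], r["region"].strip().upper(), float(r["amount"]))
            for r in rows]

def load(conn, rows):
    """Store the cleaned rows in the (stand-in) warehouse table."""
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Analyze: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Present: a small text table.
for region, total in totals:
    print(f"{region:<8}{total:>10.2f}")
```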
Tools used for data mining:
i) artificial neural networks - non-linear predictive models that learn through training and resemble biological neural networks in structure
ii) decision tree - tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset
iii) logistic regression - a type of regression analysis used for predicting the outcome of a categorical dependent variable (a variable that can take on a limited number of categories) based on one or more predictor variables
iv) genetic algorithms - optimization techniques that use processes such as genetic combination, mutation and natural selection in a design based on the concepts of natural evolution
vi) data visualization - the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships
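To make one of these tools concrete, here is a minimal sketch of logistic regression with a single predictor, fitted by plain batch gradient descent. The data (amounts in thousands with a fraud label), the learning rate and the number of epochs are assumptions chosen for the example; a real project would more likely rely on an established statistics library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log-loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training data: amount (in thousands) and a binary fraud label.
amounts = [0.1, 0.2, 0.15, 0.3, 4.5, 5.0, 4.8, 5.2]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(amounts, is_fraud)

def predict(x):
    """Predict the category (1 = fraud) for a new amount."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print(predict(0.2), predict(5.0))
```

The dependent variable here takes only two categories, which is the simplest case of the categorical outcome described above.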
computers, they were able to start answering detailed questions quicker and with more ease
Internal auditors can use spreadsheets to undertake simple data mining exercises or to produce summary tables. Using the spreadsheet, auditors can review complex data in a simplified format and drill down where necessary to find the underlying assumptions or information
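The same summarize-then-drill-down idea can be sketched outside a spreadsheet. The ledger entries and the helper names `summarize` and `drill_down` are hypothetical and serve only to illustrate the workflow.

```python
from collections import defaultdict

# Hypothetical ledger entries: (department, vendor, amount).
entries = [
    ("IT", "Acme", 1200.0),
    ("IT", "Globex", 300.0),
    ("HR", "Acme", 150.0),
    ("IT", "Acme", 800.0),
]

def summarize(rows):
    """Summary table: total amount per department."""
    totals = defaultdict(float)
    for dept, _vendor, amount in rows:
        totals[dept] += amount
    return dict(totals)

def drill_down(rows, dept):
    """Return the detailed entries behind one department's total."""
    return [r for r in rows if r[0] == dept]

summary = summarize(entries)
detail = drill_down(entries, "IT")
print(summary)
print(detail)
```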
size of database
-the more data being processed and maintained, the more powerful the system required
query complexity
-the more complex the queries and the greater the number of queries being processed, the more powerful the system required
as a normal activity) more costly than a false positive error (misclassifying a normal activity as a fraudulent activity)
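One way to act on this asymmetry is to choose the decision threshold that minimizes total misclassification cost rather than the number of errors. In the sketch below the 10:1 cost ratio, the scores and the labels are all assumptions made for illustration; the slides do not give actual costs.

```python
def total_cost(scores, labels, threshold, cost_fn=10.0, cost_fp=1.0):
    """Sum the cost of false negatives and false positives at a threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1 and not flagged:    # fraud classified as normal
            cost += cost_fn
        elif label == 0 and flagged:      # normal activity classified as fraud
            cost += cost_fp
    return cost

# Hypothetical fraud scores from some model, with the true labels.
scores = [0.05, 0.20, 0.32, 0.35, 0.40, 0.45, 0.55, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 0, 0, 1, 1]
thresholds = [0.3, 0.5, 0.8]

# With false negatives ten times as costly, the lowest threshold wins.
best_asymmetric = min(thresholds, key=lambda t: total_cost(scores, labels, t))

# With equal costs, a higher threshold is preferred instead.
best_equal = min(thresholds,
                 key=lambda t: total_cost(scores, labels, t, cost_fn=1.0))
print(best_asymmetric, best_equal)
```

When missed fraud is the expensive error, the cheaper choice is to flag more activity and accept additional false alarms.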
http://www.analyticbridge.com/profiles/blogs/fraud-detection-using-interactive-construction-andanalysis-of
http://www.ijsce.org/attachments/File/NCAI2011/IJSCE_NCAI2011_025.pdf
http://www.ijser.org/researchpaper%5CFraudDetection-of-Credit-Card-Payment-System-byGenetic-Algorithm.pdf
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Monterey, CA: Wadsworth International Group.