
Data mining (also known as Knowledge Discovery in Databases, or KDD) has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1]. It uses machine learning, statistical, and visualization techniques to discover knowledge and present it in a form that is easily comprehensible to humans.

Pattern matching is the act of checking a sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact. The patterns generally take the form of either sequences or tree structures. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, outputting some component of the matched pattern, and substituting the matching pattern with some other token sequence (i.e., search and replace).
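
As a rough illustration of two of the uses listed above (locating a pattern and search-and-replace), here is a minimal sketch of exact matching over a token sequence. The function names `find_all` and `replace_all` and the sample sentence are hypothetical, chosen only for this example; a naive left-to-right scan is assumed rather than any particular optimized algorithm.

```python
def find_all(tokens, pattern):
    """Return every start index where `pattern` occurs exactly in `tokens`."""
    n, m = len(tokens), len(pattern)
    if m == 0:
        return []
    # Compare each window of length m against the pattern, token by token.
    return [i for i in range(n - m + 1) if tokens[i:i + m] == pattern]


def replace_all(tokens, pattern, replacement):
    """Substitute each non-overlapping occurrence of `pattern`, left to right."""
    out, i, m = [], 0, len(pattern)
    while i < len(tokens):
        if m and tokens[i:i + m] == pattern:
            out.extend(replacement)  # emit the replacement tokens
            i += m                   # skip past the matched window
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical sample data: a sentence split into word tokens.
tokens = "the cat sat on the mat near the cat".split()
locations = find_all(tokens, ["the", "cat"])
replaced = replace_all(tokens, ["the", "cat"], ["a", "dog"])
```

Because the comparison is exact, a pattern such as `["the", "cats"]` would yield no locations at all, which is the behavior that separates pattern matching from pattern recognition.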

What is KDD? It is not a syndrome (as I first thought when I heard about it), and it is not the name of a DJ either. And don't you dare associate it with the annual conference on Knowledge Discovery and Data Mining organized by SIGKDD. It actually means Knowledge Discovery in Databases, and the concept emerged in 1989 to refer to the broad process of finding knowledge in data. It refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Knowledge discovery differs from machine learning in that the task is more general and is concerned with issues specific to databases.

Oops! Does that sound familiar to you? I bet it does. Then what the bleep is data mining? Actually, these two terms were used interchangeably for several years, with no distinction made, until a kind of consensus was reached within the community. We still have two terms, but with slightly different meanings. KDD is now viewed as the overall process of discovering useful knowledge from data, while data mining is viewed as the application of particular algorithms for extracting patterns from data, without the additional steps of the KDD process such as data cleaning, data reduction, and concept hierarchy generation, which can even extend to the infrastructure of the project. To me, it sounds a bit fishy. I mean, how on earth would a newcomer know this difference between KDD and data mining? The names don't tell you anything.

In machine learning, pattern recognition is the assignment of a label to a given input value. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (for example, determining whether a given email is "spam" or "non-spam"). However, pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence. Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to do "fuzzy" matching of inputs. This is opposed to pattern matching algorithms, which look for exact matches in the input with pre-existing patterns. A common example of a pattern-matching algorithm is regular expression matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and word processors. In contrast to pattern recognition, pattern matching is generally not considered a type of machine learning, although pattern-matching algorithms (especially with fairly general, carefully tailored patterns) can sometimes succeed in providing output of similar quality to that provided by pattern-recognition algorithms.
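
To make the regular expression case concrete, the following sketch uses Python's standard `re` module to find match locations, pull out one component of each match, and perform a search and replace. The sample text and the e-mail pattern are assumptions made for illustration; the pattern is deliberately simplified and is not a complete address grammar.

```python
import re

# Hypothetical sample text for the example.
text = "Contact alice@example.com or bob@example.org for details."

# Simplified e-mail pattern (illustrative only). The parenthesized group
# captures the domain so that it can be reported as a component of the match.
pattern = re.compile(r"[\w.]+@([\w.]+\.\w+)")

# Locations of each match within the text.
locations = [m.start() for m in pattern.finditer(text)]

# One component (the captured domain) of each match.
domains = [m.group(1) for m in pattern.finditer(text)]

# Search and replace: substitute every match with a placeholder.
redacted = pattern.sub("<email>", text)

# Exact matching does not generalize: a spelled-out address is not found.
spelled_out = pattern.search("alice at example dot com")
```

The last line shows the limitation described above: the spelled-out address carries the same information for a human reader, but the pattern finds nothing, whereas a pattern-recognition approach would aim to give a reasonable answer for such an input.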

Through the use of automated statistical analysis (or "data mining") techniques, businesses are discovering new trends and patterns of behavior that previously went unnoticed. Once they have uncovered this intelligence, they can use it predictively in a variety of applications. Brian James, assistant coach of the Toronto Raptors, uses data mining techniques to rack and stack his team against the rest of the NBA. The Bank of Montreal's business intelligence and knowledge discovery program is used to gain insight into customer behavior.
