Zhe shen
12/6/2017
DATA MINING 2
Several data mining processes have been identified, including KDD, SEMMA, and
CRISP-DM.
2. Why do you think the early phases (understanding of the business and
The early phases are the longest stages of a data mining project because they primarily
involve learning. Learning and understanding occur in these phases, and as a result they
cannot be automated. Considerable time must be taken to understand the business as well as
the data, since any mistake made at these stages affects the entire data mining project (Olson,
Business understanding
This phase primarily involves understanding the business objectives and the current situation,
including the available resources, constraints, and assumptions. It also involves
developing data mining objectives and a data mining plan (Turban, Sharda & Delen,
2015).
Data understanding
This phase involves collecting the available data, becoming familiar with it, and analyzing its
gross and surface properties. The quality of the data is examined to determine cases of missing
Data preparation
Data preparation takes most of the project’s time, and its outcome constitutes the final data set.
Once the available data sources have been identified, the data are constructed and formatted
as required.
Modeling
In this phase, the modeling techniques are selected and validated. Models are then
obtained by running the modeling tool on the prepared dataset. The models are then examined to
ensure they meet the business objectives (Olson, Shi & Shi, 2007).
Evaluation
The model results are then evaluated against the business objectives identified
during the first phase. The decision whether to proceed to deployment is made during this phase.
Deployment
The information obtained from the data mining process is then presented in a manner that can be
utilized by the stakeholders. The final report generated during this phase should provide an
overview of the experience during the data mining process and identify areas of improvement.
4. What are the main data preprocessing steps? Briefly describe each step and provide
relevant examples.
Data integration: entails gathering, selecting, and combining data from multiple sources. This step
reconciles the various databases to identify inconsistencies and redundancies (Han, Pei &
Kamber, 2011).
Data cleaning: this step involves identifying missing values, detecting
inconsistencies, and eliminating anomalies. For example, correcting errors and filling in missing
values.
Data transformation: involves normalizing the data as well as constructing new attributes.
For instance, this step is associated with aggregation and normalization (Olson, Shi &
Shi, 2007).
Data reduction: entails producing a smaller volume of data that yields the same analytical
results. It is achieved, for instance, by balancing skewed data or reducing the number of attributes
and records.
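The four steps above can be illustrated with a minimal sketch in plain Python. The records, field names, and helper functions below are hypothetical and chosen only for illustration; they are not taken from the cited sources, and real projects would typically rely on a dedicated data mining tool. Mean imputation and min-max normalization are used here as one possible choice for cleaning and transformation.

```python
from statistics import mean

# Hypothetical toy records from two sources (all names and values are illustrative).
sales = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},  # missing age
    {"id": 3, "age": 45, "income": 58000},
]
support = [
    {"id": 1, "tickets": 2},
    {"id": 2, "tickets": 0},
    {"id": 3, "tickets": 5},
    {"id": 3, "tickets": 5},  # redundant duplicate row
]

def integrate(left, right, key="id"):
    """Data integration: combine two sources on a shared key."""
    lookup = {}
    for row in right:
        lookup[row[key]] = row  # a repeated key overwrites the earlier row
    return [{**row, **lookup.get(row[key], {})} for row in left]

def clean(rows, field):
    """Data cleaning: fill missing values of `field` with the mean of known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = mean(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def transform(rows, field):
    """Data transformation: add a min-max normalized copy of `field` in [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against division by zero
    return [{**r, field + "_norm": (r[field] - lo) / span} for r in rows]

def reduce_attributes(rows, keep):
    """Data reduction: keep only the selected attributes."""
    return [{k: r[k] for k in keep} for r in rows]

merged = integrate(sales, support)
cleaned = clean(merged, "age")
scaled = transform(cleaned, "income")
final = reduce_attributes(scaled, ["id", "age", "income_norm", "tickets"])

for row in final:
    print(row)
```

In this sketch the duplicate support row is absorbed during integration, the missing age is imputed during cleaning, income is rescaled during transformation, and the raw income column is dropped during reduction.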
There are several differences between SEMMA and CRISP-DM. SEMMA was not designed
for the general business environment, unlike CRISP-DM, which is applicable across
the various data mining tools used in industry. SEMMA was primarily developed for SAS Enterprise
Miner. CRISP-DM is a six-phase model that covers the data mining process
from start to finish (Olson, Shi & Shi, 2007). In contrast, SEMMA specifically focuses on SAS
Enterprise Miner and on model development, overlooking the initial stages that are contained
References
Turban, E., Sharda, R., & Delen, D. (2015). Decision support and business intelligence systems.
Olson, D. L., Shi, Y., & Shi, Y. (2007). Introduction to business data mining (Vol. 10, pp. 2250-
Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.