
white paper

using analytic services data mining framework for classification

predicting the enrollment of students at a university – a case study

Data Mining is the process of knowledge discovery involving finding
hidden patterns and associations, constructing analytical models,
performing classification and prediction, and presenting mining results. Data
Mining is one of the functional groups that is offered with Hyperion System
9 BI+ Analytic Services – a highly scalable enterprise class architecture
analytic server (OLAP). The Data Mining Framework within Analytic Services
integrates data mining functions with OLAP and provides the users with
highly flexible and extensible on-line analytical mining capabilities. On-line
analytical mining greatly enhances the power of exploratory data analysis by
providing users with the facilities for data mining on different subsets of data
at different levels of abstraction in combination with the core analytic services
like drill up, drill down, pivoting, filtering, slicing and dicing – all performed
on the same OLAP data source.

introduction

This paper focuses on using Naïve Bayes, one of the Data Mining algorithms shipped in the box with Analytic Services, to develop a model that solves a typical business problem in the admissions department at an academic university – referred to as "ABC University" in this paper. The paper details the approach taken to solve the problem and explains the steps performed using Analytic Services in general, and the Analytic Services Data Mining Framework in particular, to arrive at the solution.

problem statement

One of the problems related to managing admissions that typical universities face is being able to predict with reasonable accuracy the likelihood that an applicant will eventually enroll in an academic program. Universities typically incur considerable expense in promoting their programs and in following up with prospective candidates. Identifying applicants with a higher likelihood of enrollment into the program will help the university channel the promotional expenditure in a more gainful way. Candidates typically apply to more than one university to widen their chances of getting enrolled within that academic year. Universities that can quickly arrive at a decision on an applicant stand a higher chance of getting acceptance from candidates.

ABC University collects a variety of data from applicants as part of the admissions process: demographic, geographic, test scores, financial information, etc. In addition, the admissions department at the ABC University also has

hyperion.com

acceptance information from the previous year's admissions process. The problem at hand is to use all this available data to predict whether an applicant will choose to enroll. The ABC University is also interested in analyzing the composite factors influencing the enrollment decision. This additional analysis is useful in adjusting the admissions policy at the university and in ensuring effective cost management in the admissions department.

available data

The admissions department is currently gathering demographic, geographic, test scores, financial information, etc., from applicants as part of the admissions process. There is also historical data available indicating the actual enrollment status of applicants, along with all the other attributes that were collected as part of the admission process. The dataset made available has 33 different attributes for each applicant, inclusive of the decision result attribute. There are in all about 11,000 records available.

Table 1: List of potential mining attributes available in database


preparing for data mining

cube is the data source
The algorithms in the Data Mining Framework are designed to work on data present within an Analytic Services cube. The design of the cube should take into consideration the data needs for all kinds of analyses (OLAP and Data Mining) that the user is interested in performing. Once the data is brought into the cube environment it can then be accessed through the Data Mining Framework for predictive analytics.

The Data Mining Framework uses MDX expressions to identify sections within the cube to obtain input data for the algorithm as well as to write back the results. The Data Mining Framework can only take regular dimension members as mining attributes. This implies that only data referenced through regular dimension members (not through attribute dimensions or user defined attributes) can be presented as input data to the Data Mining Framework. Accordingly, the data that is required for predictive analytics should be modeled within the standard dimensions and measures of a cube.

In the case study discussed in this paper, the primary business requirement was to build a classification model for prediction. Since there were no other accompanying business requirements, the design of the Analytic Services cube was driven primarily by the Data Mining analytics need. For example, we have not used any attribute dimension modeling in the case study. However, in the generic case it is more likely that the cube caters to both regular OLAP analytics and predictive analytics within the same dimensional model.

preparing mining attributes
The available input data can broadly be of two data types – 'number' or 'string'. However, since measures in Analytic Services are essentially stored in the database in a numerical format, 'string' type input data will have to be encoded into 'number' type data before being stored in Analytic Services. For example, if the gender information is available as a string stating 'Male' or 'Female', it needs to be encoded into a numeric value – like '1' or '0' – before being stored as a measure in the Analytic Services OLAP database.

Mining attributes can be of two types – 'categorical' or 'numerical'. Mining attributes that describe discrete information content like gender ('Male' or 'Female'), zip code (95054, 94304, 90210, etc.), customer category ('Gold', 'Silver', 'Blue'), or status information ('Applied', 'Approved', 'Declined', 'On Hold') are termed 'categorical' attribute types. Mining attributes that describe continuous information content like sales, revenue, income, etc. are termed 'numerical' attribute types. The Analytic Services Data Mining Framework can work with algorithms that handle both categorical and numerical attribute types. Among the algorithms that are shipped in the box with the Analytic Services Data Mining Framework, the Naïve Bayes and Decision Tree algorithms can handle both categorical and numerical mining attribute types and treat them accordingly.

One of the key steps in Data Mining is the data auditing or data conditioning phase. This involves putting together, cleansing, categorizing, normalizing, and properly encoding the data. This step is usually performed outside the Data Mining tool. The effectiveness of a Data Mining algorithm is largely dependent on the quality and completeness of the source data. In some cases, for various mathematical reasons, the available input data may also need to be transformed before it is brought into a Data Mining environment. Transformations may sometimes also include splitting or combining input data columns. Some of these transformations may be done on the input dataset outside the Data Mining Framework by using standard data manipulation techniques available in ETL tools or RDBMS environments. For the current case the input data does not need any mathematical transformation, but some encoding is needed to convert data into a format that can be processed within the Analytic Services OLAP environment.

In the current problem at the ABC University, the available input data consisted of both 'string' and 'number' data types. The list below gives some of the input data that needed encoding from 'string' type into 'number' type:
• Identity related data – like Gender, City, State, Ethnicity
• Data related to the application process – like Application Status, Primary Source of contact, Applicant Type, etc.
• Date related data – like Application Date, Source Date, etc. (Dates were available in the original dataset as strings in two different formats – "yymmdd" and "mm/dd/yy" – and had to be encoded into a number.)

In the current case study, these encodings were done outside the Analytic Services environment by constructing look-up master tables in which the 'string' type inputs were listed in tabular format and the records were sequentially numbered. Subsequently, each 'string' type input was referred to by its corresponding numeric identifier during data load into Analytic Services. Table 2 shows a few samples of how such mapping files look.

State ID | State Name        Application Status ID | Application Status
1        | VT                3                     | Applied
2        | CA                4                     | Offered Admission
3        | MA                5                     | Paid Fees
4        | MI                6                     | Enrolled
5        | NH
6        | NJ

Table 2: Typical mapping of numeric identifiers
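The lookup-table encoding and the date normalization described above can be sketched in a few lines. This is an illustrative sketch rather than any Hyperion tooling; the `build_lookup` and `encode_date` helpers, and the choice of a `yyyymmdd` integer as the encoded date, are assumptions made for the example.

```python
from datetime import datetime

def build_lookup(values):
    """Assign a sequential numeric ID to each distinct string value,
    mimicking the look-up master tables in Table 2."""
    table = {}
    for v in values:
        if v not in table:
            table[v] = len(table) + 1
    return table

def encode_date(raw):
    """Normalize the two source formats ("yymmdd" and "mm/dd/yy")
    into a single yyyymmdd integer; the century pivot follows
    Python's two-digit-year convention (an assumption)."""
    fmt = "%m/%d/%y" if "/" in raw else "%y%m%d"
    d = datetime.strptime(raw, fmt)
    return d.year * 10000 + d.month * 100 + d.day

# Example: encode the 'State' strings from Table 2
states = ["VT", "CA", "MA", "MI", "NH", "NJ"]
state_ids = build_lookup(states)
print(state_ids["CA"])          # 2
print(encode_date("040915"))    # 20040915
print(encode_date("09/15/04"))  # 20040915
```

During data load, each record would then carry the numeric identifier in place of the original string, as described above.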


preparing the cube
After all the input data has been identified and made ready, the next step is to design an outline and load the data into an Analytic Services cube.

In the context of the current case the Analytic Services outline was created as follows:
• All the input data (measures in the OLAP context) were organized into five groups (a two level hierarchy created in the measures dimension) based on a logical grouping of measures. The details of each measure group are explained in Table 3: Analytic Services outline expanded.
• Data load is performed just as it is normally done for any Analytic Services cube.

At this stage we have:
• Designed an Analytic Services cube
• Loaded it with relevant data

It should be noted that the steps described so far are generic to Analytic Services cube building and did not need any specific support from the Analytic Services Data Mining Framework.

The five measure groups and their contents were:

• Measures related to information about the applicants' identity. Some of these measures were transformed from 'string' type to 'number' type to facilitate modeling them within the Analytic Services database context.
• Measures related to various test scores and high school examination results.
• Measures related to the context of the applicant's application processing.
• Measures related to the academic background.
• Measures providing information about the financial support and funding associated with the applicant.

Table 3: Analytic Services outline expanded


identifying the optimal set of mining attributes
It is necessary to reduce the number of attributes / variables presented to an algorithm so that the information content is enhanced and the noise minimized. This is usually performed using supporting mathematical techniques to ensure that the most significant attributes are retained within the dataset that is presented to the algorithm. It should be noted here that the choice of significant attributes is driven more by the particular data than by the problem itself. Attribute analysis or attribute conditioning is one of the initial steps in the Data Mining process and is currently performed outside the Data Mining Framework. The main objective during this exercise is to identify a subset of mining attributes that are highly correlated with the predicted attribute, while ensuring that the correlation within the identified subset of attributes is as low as possible.

The Analytic Services platform provides a wide variety of tools and techniques that can be used in the attribute selection process. One method to identify an optimal set of attributes is to use special data reduction techniques implemented within Analytic Services through Custom Defined Functions (CDFs). Additionally, users can use data visualization tools like Hyperion Visual Explorer to arrive at a decision on the effectiveness of specific attributes in contributing to the overall predictive strength of the Data Mining algorithm. Depending on the nature of the problem, users may choose an appropriate tool and technique for deciding the optimal set of attributes.

One of the advantages of working with the Analytic Services Data Mining Framework is the inherent capability in Analytic Services to support customized methods for attribute selection through Custom Defined Functions (CDFs). This is essential since the process of mining attribute selection can vary significantly across problems, and an extensible toolkit comes in very handy for customizing a method to suit a specific problem.

In the current case at ABC University, a CDF was used to identify the correlation effects amongst the available set of mining attributes. A thorough analysis of various subsets of the available mining attributes was performed to identify a subset that is highly correlated with the predicted mining attribute and at the same time has low correlation scores within the subset itself. Since some Data Mining algorithms (like Naïve Bayes, Neural Net) are quite sensitive to inter-attribute dependencies, an attempt was made to outline the clusters of mutually dependent attributes, with a certain degree of success. From each cluster a single, most convenient attribute was selected. For this case study, an expert made the decision, but this process can be generalized to a large degree. An optimal set of five mining attributes was identified after this exercise. Table 4 shows the list of identified mining attributes, grouped by the input attribute type – categorical or numerical.

Categorical Type | Numerical Type
FARecieved       | StudBudget
AppStatus        | TotalAward
Applicant Type   |

Table 4: Optimal set of mining attributes identified

At this stage we have:
• Designed an Analytic Services cube
• Loaded it with relevant data
• Identified the optimal subset of measures (mining attributes)

modeling the problem
We will now use the Data Mining Framework to define an appropriate model (for the business problem) based on the Analytic Services cube and the identified subset of mining attributes (measures). Setting up the model includes selecting the algorithm, defining algorithm parameters and identifying the input data location and output data location for the algorithm.

choosing the algorithm
The next step in the Data Mining process is to pick the appropriate algorithm. There is a set of six basic algorithms provided in the Data Mining Framework – Naïve Bayes, Regression, Decision Tree, Neural Network, Clustering and Association Rules. The Analytic Services Data Mining Framework also allows for the inclusion of new algorithms through a well defined process described in the vendor guide that is part of the Data Mining SDK. The six basic algorithms are a sample set shipped with the product to provide a starting point for using the Data Mining Framework. Choosing an algorithm for a specific problem needs basic knowledge of the problem domain and of the applicability of specific mathematical techniques to efficiently solve problems in that domain.

The specific problem discussed in this paper falls into a class of problems termed classification problems. The need here is to classify each applicant into a discrete set of classes on the basis of certain numerical and categorical information available about the applicant. The 'class' referred to in this context is the status of the applicant's application looked at from an enrollment perspective: "will enroll" or "will not enroll". There is historical data available indicating which kind of applicants (with a specific combination of categorical and numerical factors associated with them) have gone ahead, accepted offers from the ABC University and subsequently enrolled into the programs. There is data available for the negative case as well – i.e. applicants that did not eventually enroll into the program.
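The selection idea described above — keep attributes highly correlated with the predicted attribute while pruning clusters of mutually dependent attributes — can be sketched with plain Pearson correlations. This is a simplified stand-in for the CDF-based analysis in the text; only the attribute names come from Table 4, and the column values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_attributes(data, target, max_mutual=0.5):
    """Greedy selection: rank attributes by |correlation with target|,
    then keep only attributes whose pairwise |r| with every attribute
    already chosen stays below max_mutual (one per dependency cluster)."""
    ranked = sorted(data, key=lambda a: abs(pearson(data[a], target)), reverse=True)
    chosen = []
    for attr in ranked:
        if all(abs(pearson(data[attr], data[c])) <= max_mutual for c in chosen):
            chosen.append(attr)
    return chosen

# Toy illustration with made-up values: 'FARecieved' tracks
# 'TotalAward' closely, so only one of the pair survives.
data = {
    "TotalAward": [0, 1, 2, 3, 4, 5],
    "FARecieved": [0, 1, 1, 2, 2, 3],   # mutually dependent with TotalAward
    "StudBudget": [5, 1, 4, 0, 3, 2],   # largely independent
}
target = [0, 0, 1, 1, 1, 1]
print(select_attributes(data, target))  # ['TotalAward', 'StudBudget']
```

In the actual case study this pruning was guided by an expert; the greedy rule here only illustrates why a highly inter-correlated pair contributes one attribute, not two.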


Given that this problem can be looked at as a classification problem and that there is historical information available, one of the algorithms suitable for the analysis is the Naïve Bayes classification algorithm. We chose Naïve Bayes for modeling this particular business problem.

deciding on the algorithm parameters
Every algorithm has a set of parameters that control its behavior. Users need to choose the parameters based on their knowledge of the problem domain and the characteristics of the input data. Analytic Services provides adequate support for such preliminary analysis of data using Hyperion Visual Explorer or the Analytic Services Spreadsheet Client. Users are free to analyze the data using any convenient tool and determine their choices for the various algorithm parameters.

Each of the algorithms has a set of parameters that determine the way the algorithm will process the input data. For the current case, the algorithm chosen is Naïve Bayes and it has four parameters that need to be specified – "Categorical, Numerical, RangeCount, Threshold". The details of each of the parameters and the implications of setting them are described in the online help documentation.

Out of the selected list of attributes we have a few that are of categorical type, and hence our choice for the 'Categorical' parameter is 'yes'. Similarly, there are attributes of numerical type, and hence the choice for the 'Numerical' parameter is also 'yes'. The data was analyzed using a histogram plot to understand the distribution before deciding on the value to be provided for the 'RangeCount' parameter. This parameter needs to be large enough to allow the algorithm to use all the variety available in the data and at the same time small enough to prevent overfitting. From the analysis of the input data for this particular case, setting this parameter to '12' seemed reasonable. The 'RangeCount' controls the binning process in the algorithm. It should be emphasized that binning schemes (including bin count) really depend on the specific circumstances and may vary to a great degree between different problems.

At this stage we have:
• Designed an Analytic Services cube
• Loaded it with relevant data
• Identified the optimal subset of measures (mining attributes)
• Chosen the algorithm suitable for the problem
• Identified the parameter values for the chosen algorithm

applying the data mining framework

Now that we have completed all the preparatory steps for Data Mining, the next step is to use the Data Mining Wizard in the Administration Services Console to build a Data Mining model for the business problem. There are three steps involved in effectively using the Data Mining functionality to provide predictive solutions to business problems:
1. Building the Data Mining model
2. Testing the Data Mining model
3. Applying the Data Mining model

Each of these steps, performed using the Data Mining Wizard in the Administration Services Console, uses MDX expressions to define the context within the cube for the data mining operation. Various accessors, specified as MDX expressions, identify data locations within the cube. The framework uses the data in these locations as input to the algorithm or writes output to the specified location.

Accessors need to be defined for each of the algorithms so as to let the algorithm know specific contexts for each of the following:
• (the attribute domain) the expression to identify the factors of our analysis that will be used for prediction [In the current context this expression pertains to the mining attributes that we identified]
• (the sequence domain) the expression to identify the cases/records that need to be analyzed [In the current context this expression will identify the list of applicants]
• (the external domain) the expression to identify whether multiple models need to be built [Not relevant in the current context]
• (the anchor) the expression to specify additional restrictions from dimensions that are not really participating in this data mining operation [In the current context all the dimensions of the cube that we used have relevance to the problem. Accordingly, the anchor in the current context only helps restrict the algorithm scope to the right measure in the 'Measures' dimension]

Additional details for each of these expressions can be obtained from the online help documentation.

building the data mining model
To access the Data Mining Framework, you will need to bring up the Data Mining Wizard in the Administration Services Console and choose the appropriate application and database as shown in Figure 1 on the next page.

Figure 1: Choosing the application and database

In the next screen (Figure 2 below), depending on whether you are building a new model or revising an existing model, you choose the appropriate task option.

Figure 2: Creating a Build Task


Figure 3: Settings to handle missing data

This will bring up the wizard screen for setting the algorithm parameters and the accessor information associated with the chosen algorithm, in this case Naïve Bayes. The user selects a node in the left pane to see and provide values for the appropriate options and fields displayed in the right pane. As shown in Figure 3, select "Choose mining task settings" to set how to handle missing data in the cube. The choice in this case is to replace it with 'As NaN' (Not-A-Number).

The Naïve Bayes algorithm requires that we declare upfront if we plan to use either or both of 'Categorical' and 'Numerical' predictors. In the context of the current case, we have both categorical and numerical attribute types and hence the choice is 'True' for both these parameters. 'RangeCount' was decided at 12. 'Threshold' was fixed at 1e-4, a very small value. Figure 4 shows the completed screen for the parameter settings.

Figure 4: Setting parameters


The Naïve Bayes algorithm has two predictor accessors – 'Numerical Predictor' and 'Categorical Predictor' – and one target accessor. Figure 5 shows the various domains that need to be defined for the accessors. Table 5 shows the values that were used for the case being discussed. All the information provided during this stage of model building is preserved in a template file so as to facilitate reuse of the information if necessary.

Figure 5: Accessors associated with Naive Bayes algorithm

Table 5: Setting up accessors for the “build” mode while using Naive Bayes algorithm


Figure 6: Generating the template and model

Once the accessors are defined, the Data Mining Wizard will prompt the user to provide names for the template and model that will be generated at this stage. Figure 6 shows the screen in which the model and template names need to be defined.

At this stage we have:
• Built a Data Mining model using the Naïve Bayes algorithm

testing the data mining model
The next step is to test the newly built model to verify that it satisfies the level of statistical significance needed for the model to be put to use. Ideally, a part of the input data (with valid known outcomes – historical data) will be set aside as a test dataset to verify the goodness of the Data Mining model developed by the algorithm. Testing the model on this test dataset and comparing the outcomes predicted by the model against the known outcomes (historical data) is also one of the processes supported by the Data Mining Wizard. A 'test' mode template can be created by a process similar to creating a 'build' mode template as described in the previous section. While building the 'test' mode template the user needs to provide a 'Confidence' parameter to let the Data Mining Framework know the minimum confidence level necessary to declare the model a valid one. We specified a value of 0.95 for the 'Confidence' parameter. The exact steps in the wizard and descriptions of the various parameters can be obtained from the online help documentation.


Once the process is completed, the results of the test (the name of which was specified in the last step of the Data Mining Wizard) appear against the 'Model Results' node. Figure 7 shows the node in the Administration Services Console 'Enterprise View' pane where the 'Mining Results' node is visible.

The model can be queried within the Administration Services Console interface to obtain a list of the model accessors by using the "Query Result" functionality. Invoking "Show Result" for the 'Test' accessor will indicate the result of the test. Figure 8 below shows the list of model accessors in the result set of a model based on the Naïve Bayes algorithm used in the test mode.

If the 'Test' accessor has a value of 1.0 then the test is deemed successful and the model is declared 'good' or 'valid' for prediction. Figure 9 shows the result of the test for the case being discussed in this paper.

At this stage we have:
• Built a Data Mining model using the Naïve Bayes algorithm
• Verified the model as valid with 95% confidence

Figure 7: Model Results node in the Administration Services Console interface

Figure 8: Model accessors for result set associated with a model based on Naive Bayes algorithm

Figure 9: Test results


applying the data mining model

The intent at this stage is to use the recently constructed Data Mining model to predict whether new applicants are likely to enroll into the program. Using the Data Mining model in the apply mode is similar to the earlier two steps. The Data Mining Wizard guides the user to provide the parameters appropriate to the 'apply' mode. The 'Target' domain is usually different in the 'apply' mode since data is written back to the cube. The details of the various accessors and the associated domains can be obtained from the online help documentation. Table 6 shows the values that were provided to the Data Mining Wizard to use the model in the 'apply' mode.

Just as in the 'build' mode, the names of the results model and template are specified in the wizard and the template is saved before the model is executed. The results of the prediction are written into the location specified by the 'Target' accessor – the mining attribute referred to by the MDX expression {[ActualStatus]}. The results can be visualized either by querying the model results in the Administration Services Console using the "Query Result" functionality as described in the previous section, or by accessing the cube and reviewing the data written back to it. One option for viewing the results is to use the Analytic Services Spreadsheet Client to connect to the database and view the cube data for the 'ActualStatus' measure.

interpreting the results

The results of the Data Mining model need to be interpreted in the context of the business problem it is attempting to solve. Any transformation done to the input measures needs to be appropriately adjusted for while interpreting the results. In the context of the case being discussed in this paper, the intent was to predict whether applicants were likely to enroll at the ABC University. The possible outcomes in this case are that the applicant will either enroll or not enroll. The model was verified against the entire set of available data (over 11300 records).

the confusion matrix
You can construct a confusion matrix by listing the 'false positives' and 'false negatives' in a tabular format. A 'false positive' happens when the model predicts that an applicant will enroll and in reality the applicant does not enroll. A 'false negative' happens when the model predicts that an applicant will not enroll and in reality the applicant does enroll. The results predicted by the model can be compared with the actual outcomes available in the historical data to build the confusion matrix. In general, for such classification problems it is most likely that one of these ('false positives' or 'false negatives') will be slightly more important than the other in a business context. In the case being discussed in this paper, a 'false negative' means lost revenue, whereas a 'false positive'
‘false negative’ means lost revenue, whereas a ‘false positive’

Table 6: Setting up accessors for the “apply” mode while using Naive Bayes algorithm


means additional promotional expenditure in trying to follow Mining Wizard. The details of each of these transformations,
up on an applicant who will eventually not enroll. The what they do and how to use them can be obtained from the
importance of each should be analyzed in the context of the Analytic Services online help documentation. This list of
business and the model needs to be rebuilt if necessary with a transformations is further extensible through the import of
different training set (historical data) or with a different set of custom Java routines written specifically for the purpose. The
attributes. details of how to write Java routines to be imported as
Figure 10 below shows the confusion matrix constructed additional transforms can be obtained from the vendor guide
using the data set that was analyzed as part of this case study. that is shipped as part of the Data Mining SDK
It is evident from the confusion matrix that the model
mapping
predicted that 1550 (1478 + 72) students will enroll. Of that,
only 1478 actually enrolled and 72 did not enroll. This implies In some cases when the model has been developed for a
that there were 72 false positives. Similarly, the model
predicted that 9805 (9356 + 449) students will not enroll. Of
that, only 9356 actually did not enroll, whereas 449 actually
did enroll. This implies that there were 449 false negatives.

Figure 10: Confusion matrix to analyze the model's
effectiveness in prediction

analyzing the results
On further analysis of the results, the following observations
can be made:

Incorrect Predictions    # of Cases    Percentage of Cases
False positives          72            0.634%
False negatives          449           3.954%
Total                    521           4.59%

Success rate of the model: 95.41% (only 521 incorrect
predictions in 11355 cases)

additional functionality
The Analytic Services Data Mining Framework offers more
functionality that can be used when deploying models in real
business scenarios. Some of the further steps that can be
considered include:

transformations
The Data Mining Framework also offers the ability to apply a
transform to the input data just before it is presented to the
algorithm. Similarly, the output data can be transformed
before being written into the Analytic Services cube. The Data
Mining Framework offers a basic list of transformations – exp,
log, pow, scale, shift, linear – that can be used through the
Data Mining Wizard.

mapping
When a model built in one context needs to be used in a
different context, the 'Mapping' functionality is useful. Through
this functionality the user can provide information to the Data
Mining Framework on how to interpret the existing model
accessors in the new context in which it is being deployed. More
information on using this functionality can be obtained from
the online help documentation.

import/export of pmml models
The Data Mining Framework allows for portability through
import and export of mining models using the PMML format.

setting up models for scoring
The Data Mining models built using the Analytic Services
Data Mining Framework can also be set up for 'scoring'. In
'scoring' mode the user interacts with the model in real time
and the results are not written to the database. The input data
can either be sourced from the cube or supplied through data
templates that the user fills in during execution. The 'scoring'
mode of deployment can be combined with custom applications
built using the developer tools provided by Hyperion Application
Builder, to make applications that cater to a specific business
process while leveraging the powerful predictive analytic
capability of the Analytic Services Data Mining Framework.
The online help documentation provides additional details on
how to 'score' a Data Mining model.

using the data mining framework in batch mode
There is also a batch mode interface to the functionality
provided in the Data Mining Framework. Scripts written using
the MaxL command interface can perform almost all of the
functionality that is exposed through the Data Mining Wizard.
Details of the MaxL commands and their usage can be obtained
from the online help documentation.

building custom applications
Custom applications can be developed using Analytic Services
as the backend database and the developer tools provided with
Hyperion Application Builder. The functionality provided by
the Data Mining Framework can be invoked through APIs.
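The error analysis above is simple enough to verify by hand. The following sketch (plain Python, not part of the Framework; the cell counts are taken from the case study, with true positives derived from the stated totals) recomputes the rates reported in the table:

```python
# Confusion-matrix cells from the case study (11355 applicants in total).
total = 11355
false_pos = 72        # predicted "will enroll", actually did not
false_neg = 449       # predicted "will not enroll", actually did
true_neg = 9356       # predicted "will not enroll", actually did not
true_pos = total - (false_pos + false_neg + true_neg)   # 1478, by subtraction

errors = false_pos + false_neg                  # 521 incorrect predictions
success_rate = (true_pos + true_neg) / total    # fraction classified correctly

print(f"false positives: {false_pos / total:.3%}")   # 0.634%
print(f"false negatives: {false_neg / total:.3%}")   # 3.954%
print(f"success rate:    {success_rate:.2%}")        # 95.41%
```

The derived true-positive count (1478) follows from the figures quoted in the text: 11355 total cases minus the 9805 predicted non-enrollments leaves 1550 predicted enrollments, of which 72 were false positives.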


summary
Data Mining is one of the functional groups among the
comprehensive enterprise-class analytic functionalities offered
within Analytic Services. This case study focused on using the
'Naïve Bayes' algorithm to solve a classification problem
modeled on a real-life data set. It was possible to achieve a
95.41% success rate in the classification exercise using the
Analytic Services Data Mining Framework.

Some of the business benefits of Data Mining in the OLAP
context that can be illustrated from the current case include:
• It can serve as a discovery tool in a critical decision-support
process. This includes evaluation of the critical parameters
affecting the outcome of a customer's (applicant's) behavior.
ABC University had initially assumed that certain time-related
factors played the strongest role in influencing the decision to
enroll. The Data Mining exercise proved this not to be true; in
fact, certain financial attributes ranked highest.
• The successful prediction mechanism can become the base
for a full-blown risk-management application. ABC University,
for example, can devise a policy to invest more promotional
expenditure in tracking applicants with distinctly higher
academic credentials but a moderate probability of enrollment.
Similarly, the prediction mechanism can help the admissions
department make decisions on admission offers even before it
has seen the entire applicant pool.
• It can serve as an operational control and reporting tool.
Traditional OLAP reporting can provide visibility into the state
of the admissions operations, the extent of funds utilization,
and various other financial/operational indicators, in all
providing better control over the conformance between planned
and actual business positions.

suggested reading
1. Data Mining: Concepts and Techniques
   Jiawei Han, Micheline Kamber
2. Data Mining Techniques: For Marketing, Sales, and Customer
   Relationship Management
   Michael J. A. Berry, Gordon S. Linoff
3. Data Mining Explained
   Rhonda Delmater, Monte Hancock Jr.
4. Data Mining: A Hands-On Approach for Business
   Professionals (Data Warehousing Institute Series)
   Robert Groth

footnote
1 Breaking up a continuous range of data into discrete
segments/bins.
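For readers curious about the mechanics behind the 'Naïve Bayes' classification the case study relies on, here is a minimal, self-contained sketch in plain Python. The applicant attributes and training rows are invented for illustration only (they are not the ABC University data set, and this is not the Framework's implementation): each candidate is scored by multiplying the class prior by smoothed per-attribute likelihoods, and the most probable class wins.

```python
from collections import Counter

# Toy training data (hypothetical attributes): each row pairs a dict of
# categorical attribute values with the observed outcome (enrolled or not).
rows = [
    ({"aid": "yes", "visit": "yes"}, True),
    ({"aid": "yes", "visit": "no"},  True),
    ({"aid": "no",  "visit": "yes"}, True),
    ({"aid": "no",  "visit": "no"},  False),
    ({"aid": "no",  "visit": "no"},  False),
    ({"aid": "yes", "visit": "no"},  False),
]

def naive_bayes(rows, candidate):
    """Return the class maximizing P(class) * prod P(attr=value | class),
    with Laplace (add-one) smoothing on each binary attribute."""
    classes = Counter(label for _, label in rows)
    scores = {}
    for cls, n_cls in classes.items():
        score = n_cls / len(rows)                      # class prior
        for attr, value in candidate.items():
            matches = sum(1 for feats, label in rows
                          if label == cls and feats[attr] == value)
            score *= (matches + 1) / (n_cls + 2)       # smoothed likelihood
        scores[cls] = score
    return max(scores, key=scores.get)

print(naive_bayes(rows, {"aid": "yes", "visit": "yes"}))   # → True
```

The "naïve" part is the independence assumption: each attribute's likelihood is estimated separately, which keeps both model building and scoring cheap even over many attributes.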

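The discretization described in the footnote can also be sketched in a few lines. This example (plain Python; the bin count and score values are illustrative assumptions) shows how a continuous attribute might be broken into equal-width bins before being handed to a categorical algorithm such as Naïve Bayes:

```python
def bin_index(value, low, high, n_bins):
    """Map a continuous value to an equal-width bin index in [0, n_bins - 1]."""
    if value >= high:                 # clamp the top edge into the last bin
        return n_bins - 1
    width = (high - low) / n_bins
    return int((value - low) / width)

# Illustrative applicant scores on a 0-100 scale, split into 4 bins.
scores = [12, 47, 55, 88, 100]
print([bin_index(s, 0, 100, 4) for s in scores])   # → [0, 1, 2, 3, 3]
```

Equal-width binning is only one option; equal-frequency bins or domain-driven cut points are common alternatives when the data is skewed.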
© Copyright 2005 Hyperion Solutions Corporation. All rights reserved. “Hyperion,” the Hyperion “H” logo, and Hyperion’s product names are trademarks of Hyperion. References to
other companies and their products use trademarks owned by the respective companies and are for reference purpose only. 5164_0805

hyperion.com
