
International Journal of Electronic Business Management, Vol. 3, No. 4, pp. 301-310 (2005)

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION
Yu-Min Chiang*, Yu-Chieh Lo and Shang-Yi Lin
Department of Industrial Engineering and Management, I-Shou University, Kaohsiung (840), Taiwan

ABSTRACT
Nowadays, the capability to collect data has expanded enormously and provides enterprises with huge amounts of data. Interesting knowledge and high-value information about customers can be extracted by data mining. By following a market segmentation strategy, an enterprise can increase its expected profits. However, for the enterprise, the customers' basic data, including demographic and geographic variables, are easier to obtain than the customers' behavioral data. Customer value may therefore be predicted from the customers' basic data; by then taking marketing activities only toward customers with high value, enterprises can avoid unnecessary marketing cost. This research constructed a marketing decision model that uses the demographic and geographic variables as the input of three individual classifiers - a BP network, a decision tree, and the Mahalanobis distance - to predict a new customer's value. In order to improve the prediction accuracy, the study also combined the three classifiers to predict a new customer's value through different combining methods. The results show that the multiple classifier system combined by a BP network has the best prediction accuracy.

Keywords: Customer Relationship Management, Data Mining, Neural Network, Mahalanobis Distance, Decision Tree

1. INTRODUCTION
The issues of customer relationship management (CRM) have attracted much concern nowadays, and the business operation model has gradually turned from product-focused to customer-centric. Enterprises have come to realize that customer information is one of their key assets. As enterprises explore customer behavior in depth, they find that not all customers bring profits and that just a small percentage of all users of the products, the best customers, account for a large portion of an organization's sales. Since customers are different, concentrating on the heavy-user market segment is an attractive strategy. A CRM system is a process to compile information that increases understanding of how to manage an organization's relationships with its customers [16]. By cooperating with marketing activities, CRM systems can bring a lot of profit and help enterprises survive in a continuously changing and competitive environment. CRM systems play the roles of collecting customer data and intensifying the relationship between customers and organizations, thus achieving the objective of establishing customer loyalty. Most
* Corresponding author: ymchiang@isu.edu.tw

businesses agree that it costs six times as much to get a new customer as it does to keep an old one. Prahalad and Ramaswamy [10] showed that if an organization can increase its customer retention rate by 5%, the profits from customers will move up 25% on average. Generally speaking, the customer data of an organization are gathered from interactions with customers, such as the customers' basic data and the sales transaction data. By analyzing these data, the organization can understand customer differences. Once organizations learn more and more about their customers, they can use that knowledge to serve them better. Data mining is a well-known analytic technique that can be used to turn customer data into customer knowledge. Typically, 20 percent of the customers buy 80 percent of the product sold; this is the famous 80/20 principle. These 20 percent are heavy users and may be the best customers, and in terms of marketing, focusing on heavy users can bring more revenue. Therefore, organizations should divide a heterogeneous market into a number of smaller, more homogeneous subgroups, which is called customer segmentation. In the past, most customer segmentation was differentiated by RFM (Recency, Frequency and Monetary) scores or consumption patterns, and the segmentation process is implemented by mining huge historical transaction data. Owing to the easily obtained property of the customers' basic data, if an organization could make the marketing decision for newly coming customers according to their basic data, it could avoid unnecessary marketing cost and a complex data mining process.

The aim of this study is to test the feasibility of using the customers' basic data, including demographic variables and geographic variables, in marketing decisions. For that purpose, the customer data, including the basic data and the purchasing transaction data, of a warehouse in southern Taiwan will be used. The research focused on skin care products and their corresponding transaction data from June to November 2004. The study applies classifiers to categorize customers into two groups by the customers' basic data. One group is the potential high-value customers, on whom organizations should take marketing activities. The other group is the customers with relatively low value, who do not need much marketing effort. Three kinds of classifiers, including the back-propagation (BP) neural network, the decision tree, and the Mahalanobis distance, are used for customer classification. Recently, multiple classifier systems have been used in practical applications to improve classification accuracy [4,11,15]. Consequently, the research combines the individual classifiers into a multiple classifier system to get better classification accuracy.

This paper is organized as follows: first, the issues of data mining techniques, multiple classifiers, and the neural network are presented. The following section is devoted to describing the data and the analysis methodology. In the next section, the performance of the three individual classifiers and of the multiple classifier system is reported. The final section summarizes the findings of the research and outlines some suggestions for future research.
2. LITERATURE REVIEW

2.1 Data Mining
Data mining (DM) is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules [1]. Given the enormous size of databases, DM is the technology for knowledge discovery in databases. DM is an interdisciplinary field that combines statistics, database management, computer science, artificial intelligence, machine learning, and mathematical algorithms. This technology provides different methodologies for decision-making, problem solving, analysis, diagnosis, integration, learning, and innovation [8]. Berry and Linoff [1] defined six common DM tasks: classification, estimation, prediction, affinity grouping or market basket analysis, clustering, and profiling. A popular application of data mining with CRM is customer segmentation. The purpose of segmentation is to identify behavioral segments and to tailor products, services, and marketing messages to each segment [5,13]. Data mining should be embedded in a business's CRM strategy that spells out the actions to be taken as a result of what is learned through DM.

2.2 Multiple Classifiers
The combination of multiple classifiers has been used in practical applications to improve classification accuracy. The combination methods can be divided into two categories: serial combination and parallel combination. A multiple classifier with serial combination links single classifiers in a sequence, and the output of a classifier is passed to the classifier in the next position in the sequence. A parallel combination approach, in contrast, considers all the outputs of the classifiers and integrates them by a combination algorithm. Previous methods for parallel classifier combination include majority vote, naive Bayes, the behavior-knowledge space method, the Borda count, and neural networks. Schiele [12] deemed that combining only a few classifiers can obtain good performance and that combining a suitable number of classifiers can increase the robustness of the classifier; he also mentioned that combining complementary classifiers can raise the robustness and accuracy of the overall classifier. It is practicable to combine simple classifiers rather than design a single complex classifier. When the data are binary, the most used combination method is majority vote, whose major limitation is that the number of classifiers must be odd. If the data are of continuous type, the ways to combine them include the Bayes method, maximum precision, decision trees, the self-organizing map (SOM), and so on. Table 1 summarizes previous research on multiple classifiers and the results. They all showed that multiple classifiers perform better than a single classifier.
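As a concrete illustration of the parallel combination just described, the following minimal Python sketch combines the outputs of several binary classifiers by majority vote; the label names and the three hypothetical predictions are illustrative assumptions, not part of the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Parallel combination of binary classifiers by majority vote.
    `predictions` holds one label per classifier; using an odd number
    of classifiers guarantees that no tie can occur."""
    votes = Counter(predictions)
    return votes.most_common(1)[0][0]

# Three hypothetical classifiers voting on one customer:
print(majority_vote(["high_value", "low_value", "high_value"]))  # -> "high_value"
```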

2.3 Artificial Neural Network
An artificial neural network is an information processing technology inspired by the biological brain and its neural system. An artificial neural network is a kind of computing system that consists of software and hardware. It offers a mathematical model that attempts to mimic the ability of an organism's nervous system to learn from experience. An artificial neuron imitates the biological neuron, which receives signals or inputs from the outside environment or from other neurons; after the summation and transfer computation, the neuron outputs a signal to other neurons or to the outside environment. Neural network learning can be supervised or unsupervised. Learning is accomplished by modifying the network connection weights while a set of input instances is repeatedly passed through the network. Neural networks (the "artificial" is usually dropped) continue to grow in popularity within the business, scientific, and academic worlds. They are a powerful tool readily applied to prediction, classification, and clustering.


3. METHODOLOGY
This research intends to apply data mining techniques to segment customers based on the transaction data, and to construct a multiple classifier system that predicts the value of a new customer by utilizing the demographic and geographic variables of the customers' basic data. The architecture of the research is outlined in Figure 1. The study first applies the back-propagation network (BPN), the Mahalanobis distance (MD), and the decision tree individually as the individual classifiers, and then combines the three individual classifiers into a multiple classifier system. The parallel combination methods adopted in the research include majority vote, BPN, and the SOM network. The prediction accuracy of the individual classifiers and of the multiple classifier system will be compared by an empirical case. The following subsections introduce the three individual classifiers adopted in the study.

Table 1: Research on multiple classifiers and their contents
Xu et al. [15]: Used the Bayesian formalism, the voting principle, and Dempster-Shafer belief theory to combine different classifiers.
Wang et al. [14]: Proposed a Kohonen self-organizing neural network to integrate the results of five other neural network classifiers, demonstrated on a radar recognition problem. The proposed combination method is insensitive to classifier correlation.
Sboner et al. [11]: Combined a linear discriminant algorithm, k-NN, and a decision tree by majority vote to classify 152 skin images.
Figure 1: The architecture of the research (customers' demographic and geographic data are fed to Classifier 1, Classifier 2, and Classifier 3; the predicted customer value determines the decision: high value - take marketing activities; low value - do not take marketing activities)
3.1 Back-propagation Neural Network
A back-propagation network is a feed-forward neural network that calculates output values from input values. The structure of the network is typical of networks used for prediction and classification. The neurons are organized into three layers, as shown in Figure 2. Each neuron in the input layer is connected to exactly one input variable and does not do any computation; it simply copies its input value to its output value. In the study, the input variables are the customers' basic data, and the variables should be normalized before being input to the network. The next layer is called the hidden layer because it is connected neither to the inputs nor to the output of the network. Each neuron in the hidden layer is fully connected to all neurons in the input layer and calculates its output by multiplying the value of each input by its corresponding weight, adding these up, and applying the transfer function. The output layer represents the output variable of the network, and its neurons are fully connected to all neurons in the hidden layer. Back-propagation learning adjusts the connection weights to minimize the error, that is, the difference between the network output and the target (actual result). The error is fed back through the network, and the adjustment process continues until the error converges to an acceptable value. The neural network is used to calculate a single value that represents whether to take marketing activities, so there is only one neuron in the output layer. For the network parameter settings, the research applied the design of experiments to determine the best network parameters.

Figure 2: The structure of the back-propagation neural network

The learning algorithm of the BP network is introduced briefly as follows.

Nomenclature
m: number of neurons in the input layer
l: number of neurons in the hidden layer
n: number of neurons in the output layer
\eta: learning rate of the network
W_{hi}: weight matrix from the input layer to the hidden layer, h = 1, 2, ..., m, i = 1, 2, ..., l
W_{ij}: weight matrix from the hidden layer to the output layer, i = 1, 2, ..., l, j = 1, 2, ..., n
\theta_i: bias vector of the hidden layer, i = 1, 2, ..., l
\theta_j: bias vector of the output layer, j = 1, 2, ..., n
X: input vector
T: target vector
Y: output vector
\delta: error

Algorithm
Step 1: Set the learning rate \eta.
Step 2: Randomize the initial values of W_{hi}, W_{ij}, \theta_i, and \theta_j.
Step 3: Input a training sample with input vector X and target vector T. Suppose it is the t-th sample.

X(t) = [X_1(t), X_2(t), ..., X_m(t)]    (1)
T(t) = [T_1(t), T_2(t), ..., T_n(t)]    (2)

Step 4: Calculate the output of the hidden layer (Y_i) and the output of the output layer (Y_j).

net_i(t) = \sum_h [W_{hi}(t) X_h(t)] - \theta_i(t)    (3)
Y_i(t) = f(net_i(t)) = 1 / (1 + exp(-net_i(t)))    (4)
net_j(t) = \sum_i W_{ij}(t) Y_i(t) - \theta_j(t)    (5)
Y_j(t) = f(net_j(t)) = 1 / (1 + exp(-net_j(t)))    (6)

Step 5: Calculate the error \delta. The error of the output layer (\delta_j) is calculated by the following equation:

\delta_j(t) = Y_j(t) [1 - Y_j(t)] [T_j(t) - Y_j(t)]    (7)

The error of the hidden layer (\delta_i) is calculated as follows:

\delta_i(t) = Y_i(t) [1 - Y_i(t)] \sum_j [W_{ij}(t) \delta_j(t)]    (8)

Step 6: Calculate the amount of the weight change (\Delta W) and bias change (\Delta \theta).
From the hidden layer to the output layer:

\Delta W_{ij}(t) = \eta \delta_j(t) Y_i(t)    (9)
\Delta \theta_j(t) = -\eta \delta_j(t)    (10)

From the input layer to the hidden layer:

\Delta W_{hi}(t) = \eta \delta_i(t) X_h(t)    (11)
\Delta \theta_i(t) = -\eta \delta_i(t)    (12)

Step 7: Update all the weights and biases.
From the hidden layer to the output layer:

W_{ij}(t+1) = W_{ij}(t) + \Delta W_{ij}(t)    (13)
\theta_j(t+1) = \theta_j(t) + \Delta \theta_j(t)    (14)

From the input layer to the hidden layer:

W_{hi}(t+1) = W_{hi}(t) + \Delta W_{hi}(t)    (15)
\theta_i(t+1) = \theta_i(t) + \Delta \theta_i(t)    (16)

Step 8: Repeat Step 3 to Step 7 until the error converges to a specified value or the specified number of learning cycles has been executed.
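For concreteness, the following minimal NumPy sketch implements the per-sample update rules of Steps 1-8 above. The layer size, learning rate, initial-weight range, and toy data are illustrative assumptions; this is not the authors' Matlab implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, T, l=4, eta=0.5, epochs=1000):
    """Per-sample back-propagation following Steps 1-8.
    X: (num_samples, m) inputs, T: (num_samples, n) targets in [0, 1]."""
    m, n = X.shape[1], T.shape[1]
    # Step 2: random initial weights and biases
    W_hi = rng.uniform(-0.5, 0.5, (m, l))   # input -> hidden
    W_ij = rng.uniform(-0.5, 0.5, (l, n))   # hidden -> output
    th_i = rng.uniform(-0.5, 0.5, l)        # hidden biases
    th_j = rng.uniform(-0.5, 0.5, n)        # output biases
    for _ in range(epochs):                 # Step 8: repeat
        for x, t in zip(X, T):              # Step 3: one sample at a time
            # Step 4: forward pass, Eqs. (3)-(6)
            Y_i = sigmoid(x @ W_hi - th_i)
            Y_j = sigmoid(Y_i @ W_ij - th_j)
            # Step 5: errors, Eqs. (7)-(8)
            d_j = Y_j * (1 - Y_j) * (t - Y_j)
            d_i = Y_i * (1 - Y_i) * (W_ij @ d_j)
            # Steps 6-7: weight/bias changes and updates, Eqs. (9)-(16)
            W_ij += eta * np.outer(Y_i, d_j)
            th_j += -eta * d_j
            W_hi += eta * np.outer(x, d_i)
            th_i += -eta * d_i
    return W_hi, W_ij, th_i, th_j

# Toy usage (XOR-like data), just to show the call signature:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
weights = train_bp(X, T)
```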

3.2 Decision Tree
A decision tree is a simple structure in which non-terminal nodes represent tests on one or more attributes and terminal nodes represent decision outcomes. Decision trees are powerful and popular for both prediction and classification. They are easy to understand and can be transformed into rules, which makes them very attractive: the path from the root node to a terminal node forms a classification rule. Figure 3 shows a general tree structure. Decision trees are constructed using only those attributes best able to differentiate the concepts to be learned. The data at each node are repeatedly split into smaller and smaller groups in such a way that each new generation of nodes has greater purity than its ancestors with respect to the target variable [1]. At the start of the process, there is a training set consisting of preclassified records of the target. The tree is built by splitting the records at each node according to a function of a single input variable. The remaining training set instances test the accuracy of the constructed tree; if the decision tree classifies the instances correctly, the process terminates.
Figure 3: A binary decision tree (the root node tests Condition 1, leading either to Class A or to a second test on Condition 2, which leads to Class B or Class C)

The research adopts the CART [2] algorithm, which grows binary trees and continues splitting as long as new splits that increase purity can be found. The CART algorithm identifies candidate subtrees through a process of repeated pruning, relying on a concept called the adjusted error rate to identify the least useful branches, which are then pruned. The process of using decision trees for data mining is as follows. First, select the decision tree algorithm; the paper adopts CART. Second, construct the decision tree from the training data set. Third, prune the decision tree by evaluation.

3.3 Mahalanobis Distance
Ambiguous and inconsistent multi-dimensional data are difficult to explore for the relationships among them. The Mahalanobis distance (MD) is a distance function that measures the homogeneity of multivariate data by means of the covariance matrix, and it is also considered one of the binary classification methods. When the data are homogeneous, the MD will be small; in most cases the MD is less than 2. The MD has significance in pattern recognition [7], and it can also be used as the core of a manufacturing control system [6].

Suppose a customer has k characteristics (Z_1, ..., Z_k) that can be used for classification. Let Z_{ij} denote the i-th characteristic of the j-th customer, i = 1, ..., k and j = 1, ..., n, where k is the number of fields of the customers' basic data and n is the number of customers. The original data should be normalized by Eq. (17) before calculating the MD:

z_{ij} = (Z_{ij} - m_i) / \sigma_i    (17)

where

m_i = (1/n) (Z_{i1} + Z_{i2} + ... + Z_{in})    (18)
\sigma_i = \sqrt{ [ (Z_{i1} - m_i)^2 + ... + (Z_{in} - m_i)^2 ] / (n - 1) }    (19)

The normalized data are shown in Table 2; all characteristics now have mean 0 and standard deviation 1. The MD relies on the correlation between the characteristics. Let r_{st} be the correlation coefficient between the s-th and t-th characteristics:

r_{st} = (1/(n-1)) \sum_{l=1}^{n} z_{sl} z_{tl} = (1/(n-1)) (z_{s1} z_{t1} + z_{s2} z_{t2} + ... + z_{sn} z_{tn})    (20)

The correlation matrix R can be expressed by Eq. (21):

R = [ 1       r_{12}   ...   r_{1k}
      r_{21}  1        ...   r_{2k}
      ...     ...      ...   ...
      r_{k1}  r_{k2}   ...   1      ]    (21)

Let A be the inverse of the correlation matrix R:

A = R^{-1} = [ a_{11}  a_{12}  ...  a_{1k}
               a_{21}  a_{22}  ...  a_{2k}
               ...     ...     ...  ...
               a_{k1}  a_{k2}  ...  a_{kk} ]    (22)

Table 2: Standardized data
Characteristics \ No. |  1      |  2      | ... |  j      | ... |  n
Z_1                   |  z_{11} |  z_{12} | ... |  z_{1j} | ... |  z_{1n}
Z_2                   |  z_{21} |  z_{22} | ... |  z_{2j} | ... |  z_{2n}
...
Z_i                   |  z_{i1} |  z_{i2} | ... |  z_{ij} | ... |  z_{in}
...
Z_k                   |  z_{k1} |  z_{k2} | ... |  z_{kj} | ... |  z_{kn}

The Mahalanobis distance is defined by Eq. (23):

D_\lambda^2 = (1/k) \sum_{j=1}^{k} \sum_{i=1}^{k} a_{ij} z_{i\lambda} z_{j\lambda},   \lambda = 1, 2, ..., n    (23)

All the MDs form the Mahalanobis space. In general, the value of the MD is less than 2.5, and the probability of the MD being larger than 4 is quite low. A threshold must be designated as the decision criterion in the MD space; in the study, the threshold is obtained by setting the Type I and Type II errors.
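A minimal NumPy sketch of Eqs. (17)-(23) follows. The random stand-in data and the threshold value of 2.0 are illustrative assumptions; the paper selects the threshold from the Type I and Type II errors and used Matlab rather than Python.

```python
import numpy as np

def mahalanobis_distances(Z):
    """Eqs. (17)-(23): Z is an (n_customers, k_characteristics) array."""
    n, k = Z.shape
    m = Z.mean(axis=0)                 # Eq. (18): per-characteristic mean
    s = Z.std(axis=0, ddof=1)          # Eq. (19): sample standard deviation
    z = (Z - m) / s                    # Eq. (17): standardized data (Table 2)
    R = (z.T @ z) / (n - 1)            # Eqs. (20)-(21): correlation matrix
    A = np.linalg.inv(R)               # Eq. (22): inverse correlation matrix
    # Eq. (23): scaled Mahalanobis distance for each customer
    return np.einsum('li,ij,lj->l', z, A, z) / k

# Hypothetical usage: classify by comparing D^2 with a chosen threshold.
Z = np.random.default_rng(1).normal(size=(200, 4))  # stand-in for 4 basic-data fields
d2 = mahalanobis_distances(Z)
threshold = 2.0                                     # illustrative; tuned via Type I/II errors in the paper
predicted_category = np.where(d2 < threshold, 1, 2)
```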

4. AN EMPIRICAL STUDY
The research took the customer data of a large warehouse in southern Taiwan as an example to verify the effectiveness of the proposed approach. We focused on the purchasing transaction data of skin care products from June 1 to November 30, 2004. There were 4119 customers and 10361 transaction records in total.

4.1 Data Preprocessing and Customer Segmentation
The study utilized Microsoft SQL Server 2000 as the data processing platform and built a CRM database including both a transaction table and a customer table. The customer table has two kinds of data: demographic variables and geographic variables. The demographic variables include Age, Gender, Occupation, Income, Education, and Marital Status. There is just one geographic variable to be analyzed, Commercial Circle, which represents the distance between a customer and the warehouse. The paper distinguishes the customer value according to the segmentation results of Luo [9], in which the customers were differentiated into six clusters: premium customers, prospect customers, uneconomic customers, new customers, undesirable customers, and lost customers. If a business can increase its retention rate by 5%, it can increase profit by 25% on average. Since the customer values of the six segments are different, the enterprise should make its marketing decisions carefully. The study suggests advertising to those customers with high value and saving the marketing cost of low-value customers. In terms of marketing, the research defines two categories of customers. The first category is defined as the marketing target and consists of premium customers and prospect customers. Customers in the second category, including uneconomic customers, new customers, undesirable customers, and lost customers, are of low profitability or even cause negative profit, so the enterprise should avoid taking marketing activities toward them. The distribution of the two categories is shown in Table 3.

Table 3: The distribution of customers
Category 1 (Marketing): Premium 103, Prospect 636; subtotal 739 customers (17.94%)
Category 2 (No Marketing Needed): Uneconomic 164, New 1238, Undesirable 894, Lost 1084; subtotal 3380 customers (82.06%)
Total: 4119 customers (100%)

To increase the accuracy of prediction, the research combines the three individual classifiers and compares the performance of the multiple classifier system with that of the individual classifiers. The accuracy is defined as the percentage of correct classifications over the total samples in the test data set. It can be observed that the number of customers in Category 2 is much larger than that in Category 1. To avoid over-learning on the training data set causing low accuracy, the study randomly selected 739 customers from Category 2 and divided the customers in the two categories into training samples and test samples, as shown in Table 4.

4.2 Accuracy of the Individual Classifier
The research divided customers into two categories. Category 1 contains the customers who create high value for the business and are expected to respond to marketing activities. On the other hand, Category 2 contains customers with low value to the business, for whom it is not worthwhile to take marketing activities. The purpose of the paper is to use the customers' basic data to predict which category a new customer belongs to. The considered basic data include Age, Gender, Occupation, Income, Education, Marital Status, and Commercial Circle. The following subsections illustrate the accuracy of the three individual classifiers - BP network, decision tree, and Mahalanobis distance - in predicting the value of a new customer.

Table 4: The distribution of the training and test data sets
Category 1: Training 246, Test 493, Total 739
Category 2: Training 246, Test 493, Total 739
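A sketch of the balancing and splitting step described above: Category 2 is randomly undersampled to the 739 customers of Category 1, and each category is then split into 246 training and 493 test samples. The pandas/scikit-learn workflow, the column name "category", and the random seed are assumptions; the paper prepared the data with SQL Server.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def balance_and_split(customers: pd.DataFrame, seed: int = 42):
    """Undersample Category 2 to the size of Category 1, then split each
    category into training and test samples (246/493 per category, Table 4)."""
    cat1 = customers[customers["category"] == 1]
    cat2 = customers[customers["category"] == 2].sample(n=len(cat1), random_state=seed)
    balanced = pd.concat([cat1, cat2])
    train, test = train_test_split(
        balanced,
        train_size=246 / 739,             # per-category sizes from Table 4
        stratify=balanced["category"],    # keep the 1:1 category ratio
        random_state=seed,
    )
    return train, test
```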

4.2.1 Back-propagation Network
The collected customers' basic data consist of seven variables, but not all of them have a significant influence on classification. Taking the Chi-square test on all the variables between the two categories, we found that only four variables show a significant difference: Commercial Circle, Age, Gender, and Education. These four variables are used as the input variables of the network. The paper applies the Taguchi method to determine the optimal parameters of the BP network. The output layer contains two neurons, one representing Category 1 and the other representing Category 2. We designed a network with one hidden layer, so the network architecture became 4-h-2, where h is the number of hidden neurons. The study tested three settings of h based on the sum of the numbers of input and output neurons: half of the sum, the sum, and double the sum. Matlab 7.0 was used as the neural network analysis tool, and the study chose the LM (Levenberg-Marquardt) training algorithm for the BP network. Three important parameters in the LM training algorithm are mu, mu_dec, and mu_inc. We selected three levels for all the factors affecting the BP performance when conducting the Taguchi experiments. After the parameter optimization procedure, the study obtained the corresponding weights of the network by learning from the training samples and used the test samples to evaluate the performance of the BP classifier. The accuracy of the BP network on the test data set is 81.71%.
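A hedged sketch of the kind of Chi-square screening described above, using pandas and SciPy. The DataFrame column names, the 0.05 significance level, and the treatment of continuous variables are assumptions rather than the authors' exact procedure.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def significant_variables(customers: pd.DataFrame, alpha: float = 0.05):
    """Chi-square test of independence between each basic-data variable and the
    two customer categories; keep variables with a p-value below alpha.
    In practice, continuous variables such as Age and Commercial Circle
    would be binned before building the contingency table."""
    candidates = ["Age", "Gender", "Occupation", "Income",
                  "Education", "Marital Status", "Commercial Circle"]
    selected = []
    for col in candidates:
        table = pd.crosstab(customers[col], customers["category"])
        chi2, p, dof, expected = chi2_contingency(table)
        if p < alpha:
            selected.append(col)
    return selected
```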

4.2.2 Decision Tree
The paper utilized Answer Tree 3.1 as the decision tree analysis tool. The training data set was used for constructing a decision tree with 9 rules. After the pruning procedure, the decision tree finally has 4 rules (Figure 4). A rule is created by following one path of the tree. The 4 rules and their corresponding accuracy are described below.

Figure 4: The binary decision tree used to classify customers

Rule 1: IF Commercial Circle <= 15, THEN classify the customer to Category 1. (Accuracy is 85.14%)
Rule 2: IF Commercial Circle > 15 & Age <= 34.5, THEN classify the customer to Category 2. (Accuracy is 85.85%)
Rule 3: IF Commercial Circle > 15 & Age > 34.5 & Age <= 49.5, THEN classify the customer to Category 1. (Accuracy is 71.03%)
Rule 4: IF Commercial Circle > 15 & Age > 49.5, THEN classify the customer to Category 2. (Accuracy is 70.83%)

Using the four rules to classify the test data set, the accuracy is 81.30%.
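The four rules can be applied directly as a small function, as sketched below. The split values 15, 34.5, and 49.5 come from the rules above, but the direction of each comparison is an assumption (the comparison operators were garbled in the source), so this is an illustration rather than the authors' exact tree.

```python
def classify_by_rules(commercial_circle: float, age: float) -> int:
    """Apply the four pruned decision-tree rules of Section 4.2.2.
    Returns 1 (marketing target) or 2 (no marketing needed).
    NOTE: only the split values are given explicitly in the paper;
    the inequality directions here are assumed."""
    if commercial_circle <= 15:
        return 1          # Rule 1
    if age <= 34.5:
        return 2          # Rule 2
    if age <= 49.5:
        return 1          # Rule 3
    return 2              # Rule 4

print(classify_by_rules(commercial_circle=20, age=40))  # hypothetical customer -> 1
```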

4.2.3 Mahalanobis Distance
The research utilized Matlab 7.0 to calculate the Mahalanobis distance between customers. The customers' basic data contain 7 variables. To test the effectiveness of the variables in the MD calculation, the research first performed Taguchi experiments adopting the L8(2^7) orthogonal array. The experimental results showed that the most significant variables in the MD calculation are Commercial Circle, Age, Occupation, and Marital Status. These four variables were used as the characteristics described in Section 3.3 to calculate the MD. The MD classification results on the test samples indicate an accuracy of 68.29%.

The accuracy of the three individual classifiers is listed in Table 5. It can be observed that the BP network has the best performance in predicting customer value, and the MD method has the worst performance.

Table 5: The accuracy of the individual classifier
BP: 81.71%   MD: 68.29%   Decision Tree: 81.30%

4.3 Multiple Classifier System
To increase the prediction accuracy of customer value, the study combined the three individual classifiers by the majority vote strategy, BP combination, and SOM combination. The performance of the different combination methods is analyzed carefully and illustrated in the following.

4.3.1 Combination by Majority Vote Strategy
As the name of the method suggests, the result of the majority vote is decided by voting: when the affirmative votes for one category outnumber the negative votes, the classification result is that category. In this case, if at least two of the three classifiers categorize a customer into Category 1, the voting strategy classifies the customer into Category 1, and vice versa. Using the majority vote strategy to combine the multiple classifiers, the accuracy rises to 84.55%.

4.3.2 Combination by Back-propagation Network
The results of the three individual classifiers are taken as the network input, and the output layer still has 2 neurons to indicate the two categories into which the customers should be classified. Note that if the result of an individual classifier is Category 1, the input value for the corresponding input neuron is 1; otherwise, the input value is 2. The hidden layer in this model was also set to one layer, and the number of hidden neurons was set to half of the sum of the numbers of input and output neurons. The BP combination method generated 85.57% classification accuracy on the test data set.

4.3.3 Combination by SOM Network
The input layer of the SOM network also has 3 neurons, which represent the three individual classifiers' outputs, respectively. The SOM network topology is set to 1x2. The research also utilized Matlab 7.0 for the SOM network application. The classification result on the test data indicates an accuracy of 83.54%.

Table 6 lists the performance of the three combination methods. Compared to the accuracy shown in Table 5, the accuracy is improved whichever combination method is used, and the BP combination method generates the best prediction result. Table 7 shows part of the classification results: if a customer is classified into the right category, the value is 1; on the contrary, if the customer is wrongly classified, the value is 0. Observe the customer with id 510151607: among the three individual classifiers, only the decision tree gives the correct classification, while both BP and MD give the wrong classification. However, after the combination by BP and SOM, the customer is classified into the right category. Customers with ids 510151579, 510155919, and 510156036 are wrongly classified by the decision tree, but after combination they are all classified into the right category. The results show that combining multiple classifiers can cover the mistakes of the individual classifiers and therefore increases the accuracy of the classification.

Table 6: Comparison of different combination methods
Majority Vote: 84.55%   BP Combination: 85.57%   SOM Combination: 83.54%
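To illustrate the BP-combination idea, the sketch below feeds the three individual classifiers' outputs (coded 1 or 2, as in Section 4.3.2) into a small neural-network combiner. scikit-learn's MLPClassifier stands in for the authors' Matlab BP network, and the data, accuracies, and parameters shown are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def bp_combination(pred_bp, pred_dt, pred_md, true_category):
    """Train a small neural-network combiner on the three classifiers'
    outputs (1 or 2) to predict the final category."""
    X = np.column_stack([pred_bp, pred_dt, pred_md])
    combiner = MLPClassifier(hidden_layer_sizes=(2,),   # roughly half of (3 inputs + 2 outputs)
                             max_iter=2000, random_state=0)
    combiner.fit(X, true_category)
    return combiner

# Hypothetical usage with made-up individual-classifier outputs:
rng = np.random.default_rng(0)
y = rng.integers(1, 3, size=100)                        # true categories 1 or 2
noisy = lambda acc: np.where(rng.random(100) < acc, y, 3 - y)
combiner = bp_combination(noisy(0.82), noisy(0.81), noisy(0.68), y)
print(combiner.predict([[1, 1, 2]]))                    # combined decision for one new customer
```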

5. CONCLUSIONS AND FUTURE WORKS


The research utilized the customers' basic data to predict customer value in order to offer a suitable marketing strategy. The customers' basic data are easier to obtain than the customer behavior data extracted from the purchasing transaction data. If businesses can use the customers' basic data to predict a new customer's value, they can make a suitable marketing decision in advance of the actual purchasing behavior. The value of the research is to provide businesses such an opportunity to respond faster to newly coming customers and to decrease the marketing cost.

According to the findings of the research, not all the variables in the customers' basic data were used as input variables of the three individual classifiers. Among them, two input variables are shared by all three individual classifiers: Commercial Circle and Age. This means that the two variables play an important role in the warehouse's operation. The distance between a customer and the warehouse influences the customer's inclination to go to the warehouse, and the age of a customer has a great effect on the purchasing behavior for skin care products. The empirical analyses show that the performance of the multiple classifier system is better than that of the individual classifiers, whichever combination method is used, and the best combination method is the BP network. The results are useful for a business to estimate a new customer's value and apply a suitable marketing strategy to that customer.

Though the built multiple classifier system performs well in predicting customer value, the authors suggest some directions for future research:
1. The MD method is only effective in classifying objects into two categories, which limits the customer classification and the corresponding marketing strategy. Further research can be devoted to exploring the performance of other classifiers.
2. Emotion marketing is based on consumers' personality. Some research has tried to build a relationship between the constellation (zodiac sign) and personality. Since most businesses collect the birthday data of their customers, there is a chance to investigate how the constellation affects customer purchasing behavior.
3. The marketing cost is not considered in the paper, and the effect of the marketing strategies is not evaluated by the research. Both should be considered in future research on the topic.

Table 7: Part of the classification results of all the classifiers used in the research. For each customer (CARD_NO), a value of 1 means the classifier assigned the right category and 0 means it did not, for the three individual classifiers (MD, BP, Decision Tree) and the three combinations (Majority Vote, BP Combination, SOM Combination). For example, customer 510150719 is misclassified by MD (0) but correctly classified by BP, the decision tree, and all three combined classifiers (1).

REFERENCES

1. Berry, M. J. A. and Linoff, G. S., 1997, Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons Inc., New York.
2. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J., 1984, Classification and Regression Trees, Wadsworth International Group, Monterey, CA.
3. Frawley, W. J., Piatetsky-Shapiro, G. and Matheus, C. J., 1991, Knowledge Discovery in Databases: An Overview, Knowledge Discovery in Databases, AAAI/MIT Press, pp. 1-30.
4. Gunter, S. and Bunke, H., 2004, Feature selection algorithms for the generation of multiple classifier systems and their application to handwritten word recognition, Pattern Recognition Letters, Vol. 25, No. 11, pp. 1323-1336.
5. Ha, S. H. and Park, S. C., 1998, Application of data mining tools to hotel data mart on the Intranet for database marketing, Expert Systems with Applications, Vol. 15, No. 1, pp. 1-31.
6. Hayashi, S., Tanaka, Y. and Kodama, E., 2001, A new manufacturing control system using Mahalanobis distance for maximising productivity, IEEE International Symposium on Semiconductor Manufacturing Conference, pp. 59-62.
7. Kato, N., Abe, M. and Nemoto, Y., 1997, A handwritten character recognition system using modified Mahalanobis distance, Systems and Computers in Japan, Vol. 28, No. 1, pp. 46-55.
8. Liao, S. H., 2003, Knowledge management technologies and applications - literature review from 1995 to 2002, Expert Systems with Applications, Vol. 25, No. 2, pp. 155-164.
9. Luo, Y. C., 2005, The Application of Data Mining Technique to Customer Segmentation and Marketing Decision, Master Thesis, Department of Industrial Engineering and Management, I-Shou University.
10. Prahalad, C. K. and Ramaswamy, V., 2000, Co-opting customer competence, Harvard Business Review, Vol. 76, No. 1, pp. 79-87.
11. Sboner, A., 2003, A multiple classifier system for early melanoma diagnosis, Artificial Intelligence in Medicine, Vol. 27, No. 1, pp. 29-44.
12. Schiele, B., 2002, How many classifiers do I need? International Conference on Pattern Recognition, Quebec, Canada.
13. Shaw, M. J., Subramaniam, C., Tan, G. W. and Welge, M. E., 2001, Knowledge management and data mining for marketing, Decision Support Systems, Vol. 31, pp. 127-137.
14. Wang, Y. H., Ma, S. D. and Tan, T. N., 1999, Combination of multiple classifiers with neural networks, International Federation of Automatic Control, pp. 165-169.
15. Xu, L., Krzyzak, A. and Suen, C. Y., 1992, Method of combining multiple classifiers and their application to handwritten numeral recognition, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 22, No. 3, pp. 418-435.
16. Zikmund, W. G., McLeod, R. and Gilbert, F. W., 2003, Customer Relationship Management: Integrating Marketing Strategy and Information Technology, John Wiley & Sons Inc., New York.
ABOUT THE AUTHORS


Yu-Min Chiang is an assistant professor in the Department of Industrial Engineering and Management at I-Shou University, Taiwan, R.O.C. She received her Ph.D. degree in Industrial Engineering and Engineering Management from National Tsing-Hua University. Her current research and teaching interests are in logistics management, data mining, and automated inspection. She is a member of CIIE, ORST, and TAAI.

Yu-Chieh Lo is a graduate student of Industrial Engineering and Management at I-Shou University. Her research interest is customer relationship management.

Shang-Yi Lin is a graduate student of Industrial Engineering and Management at I-Shou University. His research interests are data mining and customer relationship management.

(Received September 2005, revised November 2005, accepted December 2005)
