
Churn Prediction in Telecommunication Industry Using
Rough Set Approach
Adnan Amin1, Changez Khan1, Saeed Shehzad2, Imtiaz Ali1, Sajid Anwar1,*
1 Institute of Management Science, Peshawar, Pakistan
2 City University of Science and Technology, Peshawar, Pakistan
{geoamins, changezkhanbangash, saeedshehzad, sajid.anwar}@gmail.com

Abstract. Customer churn is a critical concern in the rapidly growing, mature, and competitive telecommunication sector and is of great importance to project managers. Due to the high cost of acquiring new customers, customer churn prediction has emerged as an indispensable part of the telecom sector's strategic decision-making and planning process. It is important to forecast customer churn behavior in order to retain those customers who will churn or may possibly churn. This study makes use of rough set theory, a rule-based decision-making technique, to extract rules for predicting churn. Experiments were performed to explore the performance of four different algorithms (Exhaustive, Genetic, Covering, and LEM2) for calculating the rule sets, and rough set classification was carried out with each of the four resulting rule sets. The n-fold and hold-out methods were used to evaluate the performance of the rough set classification. It is observed that, out of these four algorithms, rough set classification with rules calculated by the genetic algorithm yields the most suitable performance. Moreover, when the proposed technique is applied to a publicly available dataset, the results show that it can correctly predict all customers who will churn or may possibly churn and also provides useful information to strategic decision makers.

1 Introduction
Customer churn is one of the mounting issues of today's rapidly growing and competitive telecom sector. The focus of the telecom sector has shifted from acquiring new customers to retaining existing ones because of the high cost associated with acquisition [1]. Retaining existing customers also leads to improved sales and reduced marketing cost compared to attracting new customers. These facts have ultimately made churn prediction an indispensable part of the telecom sector's strategic decision making and planning process. Customer retention is one of the primary objectives of customer relationship management (CRM), and its importance has led to the development of various tools that support predictive modelling and classification. In order to properly manage customer churn, an accurate and reliable churn prediction model is needed that learns historical patterns from existing data and generates decision rules.
In recent decades, organizations have increasingly focused on long-term relationships with their customers, observing customer behavior over time using various knowledge discovery (KDD) techniques [2], [3], [4], [5], [6] to extract hidden relationships between entities and attributes in large data banks. These trends have prompted many companies to invest in CRM systems to maintain customer information. A customer-centric approach is very common, particularly in the telecommunication sector, for predicting customer behavior from the historical data stored in CRM. Such data provides useful information to decision makers for strategic planning about their customers; data can be considered good only if the system can convert it into meaningful information. Given the mounting issue of customer churn, an integrated system is needed that identifies churn behavior before customers are lost, so that suitable action can be taken in advance. This type of information can also help to identify target customers and maximize customer strength [7].
Several studies have addressed customer churn analysis in industries such as banking, online gaming, airlines, telecommunication, financial services, and social network services [8], to handle the prominent problem of customers shifting from one service or company to a competitor. However, rough set theory has not been widely applied to predicting customer churn in the telecommunication sector. Therefore, this study explores the application of rough set theory to churn prediction in the telecommunication sector by constructing predictive classifiers that forecast churn based on accumulated knowledge.
The rest of the paper is organized as follows: the next section presents customer churn and related prediction modelling. Section 3 provides preliminaries on rough set theory and the algorithms used for rule generation, and section 4 discusses the evaluation measures. The evaluation setup is explained in section 5, the proposed churn prediction technique and its evaluation are presented in section 6, and the results are discussed in section 7. The paper is concluded in section 8.

2 Customer Churn and Prediction Modeling


Customer churn refers to customers who have decided to stop using a service or product, or to leave the company altogether, and shift to a competitor in the market; in short, churn is the shift from one service provider to a competitor. The literature identifies the following types of churners [9]: (i) Active churner (volunteer): a customer who decides to quit the contract and move to another provider. (ii) Passive churner (non-volunteer): a customer whose service is discontinued by the company. (iii) Rotational churner (silent): a customer whose contract is discontinued without prior knowledge of either party (customer or company).
The first two types of churn can be predicted with traditional approaches in terms of a Boolean class value, but the third type is difficult to predict because such customers may only possibly churn in the near future. Decision makers and marketers should aim to decrease the churn ratio, because it is well known that existing customers are a company's most valuable asset compared to newly acquired ones [1]. Due to advances in machine learning, the marketing strategy of competitive companies has been evolving from product-oriented to customer-centric [1]. Database technologies not only provide decision makers with useful information about past and current customer behavior but also support future prediction with the help of predictive modelling techniques.
Churn prediction has received tremendous attention from researchers in both academia and industry, and numerous studies have addressed churn prediction in various domains such as financial services [10], the banking sector [11], [12], complaint and repair services [13], credit card accounts [7], the airline sector [14], social networks [15], and insurance companies [16]. Although various studies have been published on churn prediction in service-based industries, no standard model of churn prediction modelling techniques exists [3]. For instance, service-based firms need to use data mining as a tool to predict customers at risk of churning [17]. Input variable or feature selection also plays a significant role in churn prediction, as discussed in [3] and [18].
From the research community's perspective, the dataset also plays a vital role, since it allows researchers to rank and compare existing and new results, which is useful for designing a benchmarking technique for churn prediction modelling [19]. The relevant studies [2], [3], [4], [5], [15], [17], [18], [20], [21], [22], [23], and [24] have used a similar dataset for churn prediction in the telecommunication industry. From the literature [4], [5], [15], [17], [20], [21], [25], [26], [27], [28], and [29], it is observed that several studies have investigated and compared the performance of various machine learning algorithms on a similar dataset. However, it is argued in [3] that no standard model exists for customer churn prediction and that it is important to construct more accurate churn prediction classifiers. In response, the new generation of churn prediction and management focuses on classification accuracy [1]. This study explores the performance of four algorithms (Exhaustive, Genetic, Covering, and LEM2) for calculating rule sets and builds a more accurate and suitable rough set based churn prediction classifier. It also addresses four important research questions (discussed in section 5). Preliminaries of rough set theory are explained in the next section.

3 The Rough Set Theory

Rough set theory was originally proposed by Pawlak [30] in 1982. Its fundamental idea is the analysis and classification of imprecise or uncertain information. Rough set theory provides precise concepts of lower and upper approximation and of the boundary region that separates them. Pawlak [31] defined the rough set approximations and the boundary mathematically as follows: suppose X ⊆ U and B is an equivalence relation in the information system IS = (U, B). Then BL(X) = ∪{Y ∈ U/B : Y ⊆ X} is the lower approximation, whose elements are certainly members of X, while BU(X) = ∪{Y ∈ U/B : Y ∩ X ≠ Ø} is the upper approximation, whose elements are possibly members of X. The boundary region is BR(X) = BU(X) − BL(X).
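The approximations can be illustrated with a short sketch (Python; the toy decision table, attribute names, and helper name are hypothetical and only mirror the attributes used later in the paper). It groups the objects into the equivalence classes of IND(B) and collects the classes that are fully contained in, or overlap, the target set X.

```python
from collections import defaultdict

def approximations(universe, rows, condition_attrs, target_set):
    """Lower/upper approximation and boundary region of target_set (X)
    with respect to the equivalence classes induced by condition_attrs (B)."""
    classes = defaultdict(set)
    for obj in universe:                                   # build U/B
        key = tuple(rows[obj][a] for a in condition_attrs)
        classes[key].add(obj)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target_set:                         # Y subset of X  -> certain
            lower |= eq_class
        if eq_class & target_set:                          # Y intersects X -> possible
            upper |= eq_class
    return lower, upper, upper - lower                     # BL, BU, BR = BU - BL

# Hypothetical toy decision table (values already discretized).
rows = {
    1: {"Intl_Plan": "yes", "CustServ_Calls": "high", "Churn": True},
    2: {"Intl_Plan": "yes", "CustServ_Calls": "high", "Churn": False},
    3: {"Intl_Plan": "no",  "CustServ_Calls": "low",  "Churn": False},
    4: {"Intl_Plan": "no",  "CustServ_Calls": "high", "Churn": True},
}
churners = {i for i, r in rows.items() if r["Churn"]}
print(approximations(rows.keys(), rows, ["Intl_Plan", "CustServ_Calls"], churners))
# -> ({4}, {1, 2, 4}, {1, 2}): object 4 certainly churns, 1 and 2 lie in the boundary
```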

3.1 Decision Table


A decision table is a special case of an information system (IS) [32]. It is simply a table, as in a relational database, consisting of columns and rows, where columns represent properties (attributes) and rows represent objects (instances). Formally, an information system is a pair IS = (U, A), where U is a non-empty finite set of instances called the universe and A is a non-empty finite set of attributes or properties, i.e., A = {a1, a2, a3, …, an}, such that a : U → Va for every a ∈ A. A decision system is any information system of the form S = (U, C ∪ {d}), where C is the set of conditional attributes and d ∉ C is the decision attribute; the union of C and {d} gives the set A.

3.2 Indiscernibility, Reduct and Core


Given an information system IS = (U, A) and any subset of attributes B ⊆ A, the indiscernibility relation IND(B) relates objects that have identical values on all attributes in B: if (i, j) ∈ U × U belongs to IND(B), then i and j are indiscernible by the attributes of B. An attribute a ∈ B is dispensable if IND(B) = IND(B − {a}); otherwise it is indispensable. The set B is called independent if all of its attributes are indispensable. The attribute reduction process finds the more important attributes that preserve the indiscernibility relation of the full attribute set: a reduct is a minimal subset of attributes with this property, so RED(B) ⊆ A. The core is the intersection of all reducts, Core(B) = ∩ Red(B), where Red(B) is the set of all reducts of B. The core and reducts are used in the proposed study to identify the important attributes, so that attributes that do not affect the classification result can be removed to reduce computational cost and complexity.
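A brute-force sketch of these definitions is given below (Python; the helper names are hypothetical, the toy `rows` table from the sketch in the previous subsection is reused, and exhaustive enumeration is only feasible for small attribute sets, whereas RSES uses the algorithms of section 3.4).

```python
from itertools import combinations

def ind_partition(rows, attrs):
    """Equivalence classes of IND(attrs): objects indiscernible on attrs."""
    classes = {}
    for obj, row in rows.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return frozenset(frozenset(c) for c in classes.values())

def reducts(rows, attrs):
    """All minimal attribute subsets that preserve IND(attrs) (brute force)."""
    full = ind_partition(rows, attrs)
    found = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            preserves = ind_partition(rows, subset) == full
            minimal = not any(set(r) <= set(subset) for r in found)
            if preserves and minimal:
                found.append(subset)
    return found

conditional = ["Intl_Plan", "CustServ_Calls"]
reds = reducts(rows, conditional)                               # list of reducts
core = set(conditional).intersection(*map(set, reds)) if reds else set()
print(reds, core)
```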

3.3 Cut and Discretization


Discretization is the process of grouping an attribute's values based on calculated cuts, converting continuous variables into discrete attributes [7]. Unseen objects may fail to match any rule, and continuous values can increase computational cost and slow down the learning process, so cut and discretization methods are used in order to obtain high-quality classification [32].
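As a sketch of how cuts map a continuous attribute onto interval labels (Python; the helper name is hypothetical, the cut values below are simply read off the Day_Mins intervals that appear in the rules of Table 5, and the actual cut search performed by RSES is more involved [34]).

```python
import bisect

def discretize(value, cuts):
    """Map a continuous value to the '(low,high)' interval label defined by
    the sorted cut points, matching the labels used in the induced rules."""
    bounds = ["-Inf"] + [str(c) for c in cuts] + ["Inf"]
    i = bisect.bisect_right(cuts, value)
    return f"({bounds[i]},{bounds[i + 1]})"

# Cut points for Day_Mins as they appear in the rule intervals of Table 5.
day_mins_cuts = [78.65, 108.8, 151.05, 189.25, 195.45, 208.85, 263.55, 281.05]
print(discretize(120.0, day_mins_cuts))   # -> "(108.8,151.05)"
print(discretize(300.0, day_mins_cuts))   # -> "(281.05,Inf)"
```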

3.4 Rules Generation


Rule generation is one of the most important stages in the machine learning process. After the reduct computation, decision rules can be constructed by overlaying the reduct sets on the decision table. A rule can be expressed mathematically as (ai1 = v1) ∧ … ∧ (aik = vk) ⇒ d = vd, where 1 ≤ i1 < … < ik ≤ m and vi ∈ Vai. The expression is usually written as IF C THEN D, where C is a set of conditions and D is the decision value. To extract the decision rules, the following algorithms can be used to produce rule sets by matching the training instances against the set of generated reducts [32].
Exhaustive Algorithm: It incrementally examines subsets of features and returns the required reducts. It must be used with care, because it may lead to extensive computation for complex and large decision tables. It is based on a Boolean reasoning approach [33].
Genetic Algorithm: An order-based genetic algorithm coupled with a heuristic; this evolutionary method is presented in [34], [35]. It is used to reduce the computational cost for large and complex decision tables.
Covering Algorithm: A customized implementation of the LEM2 idea, available as the covering method in RSES. It was introduced by Jerzy Grzymala-Busse [36].
RSES LEM2 Algorithm: A separate-and-conquer technique paired with the lower and upper approximations of rough set theory; it is based on the local covering determined for each object of the decision class [36] and is an implementation of LEM2 [32].
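The overlay of a reduct on the decision table can be sketched as follows (Python; a simplified illustration with hypothetical helper names, in the spirit of the global-rules option used later in section 6.2, keeping only rules whose condition part leads to a single decision).

```python
from collections import Counter

def induce_rules(rows, reduct, decision="Churn"):
    """Project every training object onto the reduct attributes; each distinct
    value pattern becomes one 'IF C THEN D' rule with its support (strength)."""
    strength = Counter()
    decisions = {}
    for row in rows.values():
        cond = tuple((a, row[a]) for a in reduct)
        strength[cond] += 1
        decisions.setdefault(cond, set()).add(row[decision])
    rules = []
    for cond, count in strength.items():
        if len(decisions[cond]) == 1:            # keep only certain (consistent) rules
            rules.append((dict(cond), next(iter(decisions[cond])), count))
    return rules

# On the toy table of section 3, induce_rules(rows, ["Intl_Plan", "CustServ_Calls"])
# yields rules such as ({'Intl_Plan': 'no', 'CustServ_Calls': 'low'}, False, 1).
```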

4 Evaluation Measures

It is nearly impossible to build a perfect classifier or model that correctly characterizes all instances of the test set [23]. To assess the classification results, we count the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). FN instances actually belong to the positive class P (TP + FN = P) but are wrongly classified as negative, while FP instances belong to the negative class N (TN + FP = N) but are wrongly classified as positive. The following measures were used to evaluate the proposed classifiers and approaches.

• Sensitivity: It measures the fraction of churn customers who are correctly identified as true churns.
Sensitivity = TP / P = Recall (R) (1)
• Specificity: It measures the fraction of true non-churn customers who are correctly identified.
Specificity = TN / N, so 1 − Specificity = FP / N (2)
• Precision: It characterizes the number of correctly predicted churns over the total number of churns predicted by the proposed approach. It can be formally expressed as:
Precision = TP / (TP + FP) (3)
• Accuracy: The overall accuracy of the classifier is calculated by:
Acc = (TP + TN) / (P + N) (4)
• Misclassification: Error on the training data is not a good indicator of performance on future data.
Misclassification Error (MisErr) = 1 – Accuracy (5)
Type-I Error =1–specificity=FP/(FP+TN) (6)
Type-II Error=1–sensitivity=FN/(TP+FN) (7)
• F-Measure: A composite measure of precision and recall that summarizes the test's accuracy. It can be interpreted as a weighted average of precision and recall.
F-Measure = 2 · (Precision · Recall) / (Precision + Recall) (8)

• Lift: It indicates the predictive power of the classifier relative to a random selection, by comparing the precision to the overall churn rate in the test set.
Lift = Precision / (P / (P + N)) (9)
• Coverage: The fraction of cases satisfying both the condition and the decision over the number of cases satisfying the decision only.
COV = (number of cases satisfying Condition and Decision) / (number of cases satisfying Decision) (10)
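All of these measures can be computed directly from the confusion matrix, as in the following sketch (Python; the function name is hypothetical, and equation (9) is interpreted here as precision divided by the overall churn rate P/(P+N), following the verbal definition of lift above).

```python
def evaluate(tp, fp, fn, tn):
    """Evaluation measures of equations (1)-(9), from the confusion matrix."""
    p, n = tp + fn, tn + fp
    sensitivity = tp / p                      # eq. (1), recall
    specificity = tn / n                      # eq. (2)
    precision = tp / (tp + fp)                # eq. (3)
    accuracy = (tp + tn) / (p + n)            # eq. (4)
    mis_err = 1 - accuracy                    # eq. (5)
    type1 = fp / (fp + tn)                    # eq. (6)
    type2 = fn / (tp + fn)                    # eq. (7)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)   # eq. (8)
    lift = precision / (p / (p + n))          # eq. (9), precision vs. churn rate
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "mis_err": mis_err,
            "type1": type1, "type2": type2, "f_measure": f_measure, "lift": lift}

# e.g. the Genetic row of Table 3: evaluate(tp=863, fp=0, fn=19, tn=118)
```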
5 Evaluation Setup

In this section, we present the experiments conducted to fulfill the objectives of the proposed study. We designed and conducted two experiments to evaluate the potential of rough set theory as a knowledge discovery and classification technique for identifying churn in the telecommunication sector. The experiments were performed with the Rough Set Exploration System (RSES) [32]. The RSES toolkit is used to: (i) convert the dataset into a decision table; (ii) apply cut and discretization methods to convert continuous attributes into discrete attributes; (iii) extract the decision rule set from the training set; and (iv) validate the results. The proposed approach was evaluated experimentally. Specifically, the evaluation addresses the following points:
• P1: Which features are more indicative for churn prediction in telecom sector?
• P2: Which algorithm (Exhaustive, Genetic, Covering, and LEM2) is more appropriate for calculating
rules set for rough set classification approach in telecommunication sector?
• P3: What is the predictive power of the proposed approach on churn prediction in telecom sector?
• P4: Can derived rules help the decision makers in strategic decision making and planning process?

5.1 Data Preparation and Feature Selection


This study uses a publicly available dataset [37]. Feature selection is an important step in the knowledge discovery process; it identifies the relevant variables or attributes among the large number of attributes in a dataset and reduces the computational cost [34]. To ensure that only relevant features are included in the decision table, which also reduces the computational cost and addresses P1 (Which features are more indicative for churn prediction in telecom sector?), the most appropriate attributes were selected from the dataset using the "Information Gain Attribute Evaluator" feature-ranking method of the WEKA toolkit [38]. It evaluates the worth of each attribute through an information gain measurement with respect to the class value. This selection and ranking of attributes significantly improves computational efficiency and classification [39]. The dataset contains 19 conditional features, including one unique identifier, and a decision attribute for a telecom/wireless operator. After feature ranking, the most relevant, top-ranked attributes were included in the decision table. Table 1 lists these attributes with their descriptions.

Table 1. List of selected top-ranked attributes that influence the classification performance.
Attributes Description
Int’l Plan Whether a customer subscribed international plan or not.
VMail Plan Whether a customer subscribed Voice Mail plan or not.
Day Charges A continuous variable that holds day time call charges.
Day Mins No. of minutes that a customer has used in daytime.
CustServ Calls Total no. of calls made by a customer to customer service.
VMail Msg Indicates the number of voice mail messages.
Int'l Calls Total no. of international calls made.
Int'l Charges A continuous variable that holds international call charges.
Int'l Mins No. of minutes used during international calls.
Eve Charges A continuous variable that holds evening time call charges.
Eve Mins No. of minutes that a customer has used at evening time.
Churn? The class label indicating whether a customer is churn or non-churn.

The information table which consists of objects, conditional attributes and decision attribute are orga-
nized in Table 2.

Table 2. Organization of attributes for decision table.


Sets Description
Objects { 3333 distinct objects}
Conditional Attributes {Intl_Plan, VMail_Plan, VMail_Msg, Day_Mins, Day_Charges, Eve_Mins, Eve_Charges,
Intl_Mins, Intl_Calls, Intl_Charges, CustServ_Calls}
Decision Attribute {Churn?}
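Before moving to the evaluation, the information-gain ranking described in sub-section 5.1 can be sketched as follows (Python; `rows` is assumed to be a list of dictionaries holding already-discretized attribute values, and the computation only mirrors the measure behind WEKA's InfoGainAttributeEval rather than calling WEKA itself).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Churn?"):
    """Class entropy minus the entropy remaining after splitting
    on the (discretized) attribute's values."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value, count in Counter(r[attribute] for r in rows).items():
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (count / len(rows)) * entropy(subset)
    return base - remainder

def rank_attributes(rows, attributes):
    """Attributes sorted by decreasing information gain (feature ranking)."""
    return sorted(attributes, key=lambda a: information_gain(rows, a), reverse=True)
```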

6 Evaluation of Rough Set Based Rule Induction Methods Using Hold-Out Cross-Validation

In this section, we evaluate the four algorithms for rule calculation with the rough set based classification approach using the RSES toolkit. All methods are applied to the same telecom dataset.

6.1 Training and Test Sets


The performance of a classifier can be evaluated through several validation methods discussed in the literature [40], [41]. In [41], the dataset was divided into training and validation sets; the training set contained 50% non-churn and 50% churn customers, while the validation set consisted of 30% churn and 70% non-churn. To obtain a better predictive model, the classifier in [21] was validated by splitting the dataset into 67% for training and 33% for testing, with the churn proportion in the training set oversampled to 50% churn and 50% non-churn. Separate validation procedures were also used in [40] and [41]. In a subsequent study [42], the proposed model was validated by comparing the obtained results with regression, KNN, and decision tree. In [20], the dataset was divided in an 80:20 ratio, with 80% used for training and the remaining data for testing. After multiple splitting attempts during our experiments, we concluded that the best performance in this study is obtained when the split factor parameter of the RSES toolkit, which randomly splits the data into two disjoint sub-tables, is set to 0.7. Setting the split factor to 0.7 means that the training set contains 70% of the objects of the decision table, while the test set contains the remaining 30%. The dataset [37] consists of 3333 instances, of which 85.5% are non-churn (NC) customers and 14.49% are churn (C). The training set contains 2333 customers (85.16% NC, 14.8% C), and the test set contains 1000 instances (86.3% NC, 13.7% C).
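The split-factor behaviour described above can be sketched as follows (Python; the function name and seed are hypothetical, since RSES performs the random split internally).

```python
import random

def holdout_split(object_ids, split_factor=0.7, seed=2015):
    """Randomly split the decision table into two disjoint sub-tables:
    split_factor of the objects for training, the rest for testing."""
    shuffled = list(object_ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(split_factor * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# For the 3333-instance dataset [37] this yields roughly 2333 training
# and 1000 test objects, as reported above.
# train_ids, test_ids = holdout_split(range(1, 3334))
```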

Table 3. Train-and-Test (hold-out) Cross Validation applied.

Methods TP FP FN TN COV PRE REC ER ACC


Exhaustive 828 35 39 98 1 0.959 0.96 0.074 0.926
Genetic 863 0 19 118 1 1 0.98 0.019 0.981
Covering 525 37 41 37 0.64 0.934 0.93 0.122 0.878
LEM2 571 19 26 52 0.668 0.968 0.96 0.067 0.933

Table 3 shows that the highest accuracy, across all evaluation measures, is obtained when the genetic algorithm is used to calculate the rule set for rough set based churn prediction. This addresses P2 by identifying the most appropriate of the four targeted algorithms, and it also demonstrates suitable predictive power for churn prediction in the telecom sector, which relates to P3.

6.2 Discretization, Reduct and Rule Sets


The cut and discretization process is carefully performed on the prepared decision table. The RSES software provides cut and discretization methods that add cuts for a given attribute one by one in successive loops; all objects of the decision table are considered at every iteration, which yields a smaller number of cuts [34].
The RSES tool offers genetic and exhaustive algorithms for calculating the reduct sets from the decision table [34], [35]. The exhaustive algorithm requires care because it may lead to high computational cost for large and complicated decision tables, whereas the genetic algorithm can be used to reduce the computational cost for such tables [34]. Moreover, this study finds that calculating decision rules with the genetic algorithm, combined with rough set based classification, is more suitable than the other classification methods discussed. Table 4 lists the generated reduct sets.

Table 4. List of reduct sets.


Reduct Set Strength
{Day_Mins, Eve_Mins, Intl_Mins, Intl_Calls} 4
{VMail_Plan, Day_Mins, Intl_Calls, CustServ_Calls} 4
{Intl_Plan,VMail_Plan,Eve_Mins,Intl_Mins,Intl_Calls, CustServ_Calls} 6
{ Intl_Plan, VMail_Plan, Day_Mins, Churn} 4
{ Intl_Plan, VMail_Plan, Day_Mins, Intl_Mins } 4
{ Intl_Plan, VMail_Plan, Day_Mins, Eve_Mins, Intl_Calls } 5
{ Intl_Plan, VMail_Plan, Day_Mins, Eve_Mins, Intl_Mins } 5
{ Intl_Plan, VMail_Plan, Day_Mins, Eve_Mins } 4
{ Intl_Plan, VMail_Plan, Day_Mins} 3
{Intl_Plan, VMail_Plan, Day_Mins, Eve_Mins, Intl_Mins, Intl_Calls} 6
{Intl_Plan, VMail_Plan, Day_Mins, Eve_Mins, Intl_Mins, CustServ_Calls} 6
{Intl_Plan, VMail_Plan, Day_Mins, Eve_Mins, Intl_Mins,CustServ_Calls, Churn } 7

Once the reduct sets are obtained, the rule set can easily be generated from them. The decision rules can be obtained with either of two methods in RSES, global rules or local rules. In the proposed study, the global rules method was selected; it scans the training set object by object and generates decision rules by matching objects against the reduct sets and selecting attributes from the conditional attributes of each reduct. The local method is slightly faster than the global method but generates many more cuts. The decision rules have the form "if C then D", where C is a condition and D refers to the decision attribute. To address P4, important decision rules are induced from the decision table using the best-performing method of Table 3 (the genetic method). Based on these simple and easily interpretable rules, decision makers can understand the flow of customer churn behavior and adopt suitable strategic plans to retain likely churners. All generated decision rules cannot be shown here due to page limitations. The number of rules indicating non-churn customers is greater than the number of rules generated for the churn class, because the non-churn to churn ratio in the training set is 85.5% to 14.49%. Table 5 shows some of the decision rules, where Churn=False denotes non-churn customers and Churn=True denotes churn customers. The strength refers to the number of cases matched by the generated rule.

Table 5. List of generated rules for the non-churn and churn classes.


Decision Rules Strength
If(Intl_Plan="(-Inf,0.5)") & (Day_Mins="(195.45,208.85)") & (CustServ_Calls= "(1.5,2.5)") then (Churn=False) 37
If (Intl_Plan="(-Inf,0.5)")&(Day_Mins="(189.25,195.45)")&(CustServ_Calls= "(0.5,1.5)") then (Churn= False ) 35
If (Intl_Plan="(-Inf,0.5)") &(Day_Mins="(108.8,151.05)") &(CustServ_Calls="(2.5,3.5)") then (Churn= False) 64
If (Intl_Plan="(-Inf,0.5)") &(Day_Mins="(78.65,108.8)")&(CustServ_Calls="(-Inf,0.5)") then (Churn= False) 31
If (Day_Mins="(108.8,151.05)")&(Intl_Mins="(-Inf,8.65)")&(CustServ_Calls = "(1.5,2.5)") then (Churn= False) 34
If (Intl_Plan="(-Inf,0.5)") & (VMail_Plan="(0.5,Inf)") & (Eve_Mins ="(134.15,167.45)") &(CustServ_Calls="(0.5,1.5)") 41
then (Churn= False)
If (VMail_Plan="(-Inf,0.5)")&(Eve_Mins="(-Inf,134.15)")&(CustServ_Calls ="(0.5,1.5)") then (Churn= False) 47
If (VMail_Plan="(-Inf,0.5)")&(Eve_Mins="(134.15,167.45)")&(CustServ_Calls ="(2.5,3.5)") then (Churn= False) 39
If (VMail_Plan="(0.5,Inf)") &(Intl_Calls="(2.5,3.5)") &(CustServ_Calls = "(1.5,2.5)") then (Churn= False) 38
If (VMail_Plan ="(0.5,Inf)")& (Day_Mins="(108.8,151.05)")&(CustServ_Calls ="(1.5,2.5)") then (Churn= False) 36
If (Intl_Plan="(0.5,Inf)")&(Intl_Mins="(13.35,Inf)") then (Churn=True) 35
If (Day_Mins="(108.8,151.05)")&(Eve_Mins ="(207.35,227.15)") &(CustServ_Calls ="(3.5,Inf)")then (Churn= True) 10
If (Day_Mins="(78.65,108.8)")&(CustServ_Calls="(3.5,Inf)") then(Churn= True) 9
If (VMail_Plan="(-Inf,0.5)") &(Day_Mins= "(281.05,Inf)")&(Intl_Calls= "(3.5,4.5)") then (Churn= True) 9
If (VMail_Plan="(-Inf,0.5)") &(Day_Mins="(281.05,Inf)")&(Intl_Calls="(2.5,3.5)") then (Churn= True) 8
If (VMail_Plan="(-Inf,0.5)")& (Day_Mins="(108.8,151.05)")&(Intl_Calls="(-Inf,2.5)")& (CustServ_Calls="(3.5,Inf)") 8
then (Churn= True)
If(VMail_Plan="(-Inf,0.5)")&(Day_Mins="(108.8,151.05)")&(Intl_Mins="(8.75,10.35)")&(CustServ_Calls ="(3.5,Inf)") 8
then (Churn= True)
If(Intl_Plan="(-Inf,0.5)")&(VMail_Plan="(-Inf,0.5)")&(Day_Mins="(263.55,281.05)")&(Eve_Mins="(207.35, 8
227.15)") then (Churn= True)
If (Day_Mins="(281.05,Inf)") &(Intl_Calls="(2.5,3.5)")&(CustServ_Calls="(0.5,1.5)") then (Churn= True) 6
If (Day_Mins="(-Inf,78.65)")&(CustServ_Calls="(3.5,Inf)") then (Churn= True) 5
If (Day_Mins="(281.05,Inf)")&(Eve_Mins="(254.15,273.85)") then (Churn=True) 5
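A sketch of how a rule from Table 5 can be matched against a raw customer record is given below (Python; the helper name is hypothetical, the interval labels are parsed back into numeric bounds, the open/closed treatment of the bounds is an assumption, and Intl_Plan is presumed to be encoded 0/1 before discretization, since RSES handles this matching internally).

```python
def satisfies(rule_conditions, record):
    """True if the record meets every '(low,high)' interval condition of a rule."""
    def in_interval(value, label):
        low, high = label.strip("()").split(",")
        lo = float("-inf") if low == "-Inf" else float(low)
        hi = float("inf") if high == "Inf" else float(high)
        return lo < value <= hi          # boundary handling is an assumption
    return all(in_interval(record[attr], label)
               for attr, label in rule_conditions.items())

# First churn rule of Table 5: if Intl_Plan in (0.5,Inf) and Intl_Mins in (13.35,Inf)
# then Churn = True.
rule = {"Intl_Plan": "(0.5,Inf)", "Intl_Mins": "(13.35,Inf)"}
customer = {"Intl_Plan": 1, "Intl_Mins": 14.2}
print(satisfies(rule, customer))   # -> True
```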

7 Results & Discussion

This section reports the evaluation results and discussion. The number of churners is much smaller than the number of non-churn customers in the selected dataset, which can make the learning process difficult for a churn prediction classifier. Nevertheless, both churn prediction experiments fully classified the validation set with 100% coverage of all instances and correctly classified the actual churn and non-churn customers in the telecom sector.

7.1 Comparison
Here we first address P3: What is the predictive power of the proposed approach for churn prediction in the telecom sector? Table 6 presents the sensitivity, specificity, precision, lift, misclassification, accuracy, and F-measure values and compares the prediction performance with other predictive models applied to the same dataset, where A = Neural Network [2], B = Decision Tree [20], C = Neural Network [20], D = SVM [20], E = RBF kernel SVM [24], F = linear kernel SVM [24], G = polynomial kernel SVM [24], H = SIG kernel SVM [24], and I = proposed approach. Comparing the proposed approach "I" with the other churn prediction techniques applied to a similar dataset, it is clear that the proposed approach performs very well relative to the previously applied techniques.

Table 6. Predictive Performance of proposed & previous approaches applied on the same dataset.

A B C D E F G H I

Sensitivity 81.75 76.47 83.90 83.37 52.12 39.15 59.44 30.49 100
Specificity 94.70 79.49 83.50 84.04 94.70 95.32 96.29 93.12 86.13
Precision 66.27 80.60 83.40 84.20 81.90 79.05 80.95 71.43 97.85
Lift 0.0001 0.0002 0.0002 0.0002 0.0007 0.0005 0.0008 0.0004 0.0001
Type-1 Error 5.30 20.51 16.50 15.96 5.30 4.68 3.71 6.88 13.87
Type-2 Error 18.25 23.53 16.10 16.63 47.88 60.85 40.56 69.51 0.00
MisErr 6.76 22.10 16.30 16.30 14.37 22.14 11.44 29.47 1.90
Accuracy 93.2 77.9 83.7 83.7 85.6 77.9 88.6 70.5 98.1

F-Measure 73.2 78.5 83.7 83.8 63.7 52.4 68.5 42.7 98.9

7.2 Features’ Analysis


Having addressed P1 in sub-section 5.1, we further examined feature sensitivity to determine which features are more indicative of churn in the telecom sector. Fig. 1 shows the points of inflection of variables such as CustServ_Calls, Intl_Charges, Eve_Charges, Day_Charges, Day_Mins, Intl_Plan, and Eve_Mins; above these points the churn rate decreases and the curve reflects the churn behavior, while for the features below the curve line, e.g., Intl_Calls, VMail_Messages, and VMail_Plan, the churn rate decreases constantly. The curve increases at an increasing rate up to the points of inflection, and beyond these points churn behavior increases at a decreasing rate. This implies that the churn rate is high for the features above the curve, except Intl_Calls, VMail_Messages, and VMail_Plan.

Fig. 1. Sensitivity (y-axis) versus 1 − specificity (x-axis).

8 Conclusion
Customer churn is a critical concern in the rapidly growing and competitive telecommunication sector, and due to the high cost of acquiring new customers, churn prediction has emerged as an indispensable part of the strategic decision making and planning process. This study explored the application of rough set theory to churn prediction in the telecommunication sector by constructing improved predictive classifiers that forecast churn based on accumulated knowledge. To evaluate the proposed technique, a benchmarking study was carried out, and the results show improved churn prediction accuracy (Table 3) and good evaluation measures compared to other state-of-the-art techniques (Table 6). The study also investigated the performance of four algorithms (Exhaustive, Genetic, Covering, and LEM2) for calculating rule sets and extracted useful decision rules that are easily interpretable by decision makers or project managers, supporting more suitable decisions and strategic planning. In future work, personalized profiling of churners will be explored to help decision makers design more effective marketing strategies and efficiently establish long-term relationships with customers.

References
1. Hadden J., A. Tiwari, R. Roy and D. Ruta. Computer assisted customer churn management: State-of-the-art and future trends. Journal of Computers and Operations Research 10, (2007) pp. 2902-2917.
2. Anuj Sharma & Prabin Kumar. A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services. IJCSA
Application 27, (2011) pp. 0975-8887.
3. Wouter Verbeke, David Martens, Christophe Mues, Bart Baesens. Building comprehensible customer churn prediction models with advance
rule induction techniques. Expert Systems with Applications 38, (2011) pp. 2354– 2364.
4. Kirui, Clement, Li Hong, Wilson Cheruiyot, and Hillary Kirui. “Predicting Customer Churn in Mobile Telephony Industry Using Probabilistic
Classifiers in Data Mining”. International Journal of Computer Science Issues 10, (2013)
5. Hung S. Y., D. C. Yen and H. Y. Wang. “Applying data mining to telecom churn management”, Expert Systems with Applications 31 (2006)
pp. 515–524.
6. Huang B., M. T. Kechadi and B. Buckley. Customer churn prediction in telecommunications, Expert Systems with Applications 39, (2012)
pp.1414–1425.
7. Chiun-Sin Lina, Gwo-Hshiung Tzenga and YangChieh Chinc, Combined rough set theory and flow network graph to predict customer churn
in credit card accounts. Expert System with Application 38 (2011) pp.8-15.
8. L. Yan, R. H. Wolniewicz, and R. Dodier. Predicting customer behaviour in telecommunications, IEEE Intelligent Systems 2, (2004) pp. 50–
58.
9. Vladislav Lazarov and Marius Capota. Churn Prediction, Business Analytics Course. TUM Computer Science 2007.
10. Dirk Van den Poel, Bart Larivière. Customer attrition analysis for financial services using proportional hazard models. European Journal of
Operational Research 157, (2004) pp.196-217,.
11. Chitra, K., and B. Subashini. "Customer Retention in Banking Sector using Predictive Data Mining Technique." ICIT 2011.
12. Dr. U. Devi Prasad & S. Madhavi. Prediction of Churn Behavior of Bank Customers Using Data Mining Tools, Business Intelligence Journal
5, 2012.
13. Tiwari J. Hadden A., R. Roy, and D. Ruta. Churn Prediction using Complaints Data. International Journal of Intelligent Technology 13, pp.
158–163, (2006).
14. James J.H Liou. A novel decision rules approach for customer relationship management of the airline market. Journal of Expert systems with
applications, 2008.
15. Richard J. Oentaryo, Ee-Peng Lim, David Lo, Feida Zhu, and Philips K. Prasetyo. Collective Churn Prediction in Social Network. International
Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE/ACM, 2012.
16. Soeini, Reza Allahyari, and Keyvan Vahidy Rodpysh. "Applying Data Mining to Insurance Customer Churn Management." International Pro-
ceedings of Computer Science & Information Technology 30, 2012.
17. Hossein Abbasimehr, Mostafa Setak, M.J. Tarokh. The Application of Neuro-Fuzzy Classifier for Customer Churn Prediction, Procedia Infor-
mation Technology & Computer Science 1(2012) pp. 1643-1648.
18. Aimée, Backiel, Bart Baesens, and Gerda Claeskens. "Mining Telecommunication Networks to Enhance Customer Lifetime Predictions."
Available at SSRN, 2014.
19. Vandecruys, Olivier, David Martens, Bart Baesens, Christophe Mues, Manu De Backer, and Haesen. "Mining software repositories for com-
prehensible software fault prediction models." Journal of Systems and software 81, no. 5, (2008) pp: 823-839.
20. Essam Shaaban, Yehia Helmy, Ayman Khedr, Mona Nasr. A Proposed Churn Prediction Model. International Journal of Engineering Research
and Applications 2, (2012) pp. 693-697.
21. Hossein Abbasimehr, Mostafa Setak, M.J. Tarokh. A Neuro-Fuzzy Classifier for Customer Churn Prediction, International Journal of Computer Applications 19, (2011) pp. 0975-8887.
22. Rahul J. Jadhav and Usharani T. Pawar. Churn Prediction in Telecommunication Using Data Mining Technology. IJACSA 2 2011.
23. Burez J., D. Van den Poel. Handling class imbalance in customer churn prediction. Expert Systems with Applications 36, (2009) pp.4626–4636.
24. Innut Brandusoiu and Gavril Toderean. Churn Prediction in the telecommunications sector using support vector machines. Annals of the Oradea University
Fascicle of management and technological engineering 1 (2013).
25. Khan, Imran, et al. "Intelligent Churn prediction for Telecommunication Industry." International Journal of Innovation and Applied Studies 4.
No 1, (2013) pp: 165-170.
26. Idris, Adnan, Asifullah Khan, and Yeon Soo Lee. "Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost
based ensemble classification." Applied intelligence 39. (2013) pp: 659-672.
27. Khan, Afaq Alam, Sanjay Jamwal, and M. M. Sepehri. "Applying data mining to customer churn prediction in an internet service provider."
International Journal of Computer Applications 9. (2010) pp: 8-14.
28. Verbeke, Wouter, et al. "New insights into churn prediction in the telecommunication sector: A profit driven data mining approach." European
Journal of Operational Research 218. No 1 (2012) pp: 211-229.
29. Qureshi, Saad Ahmed, et al. "Telecommunication subscribers' churn prediction model using machine learning." Digital Information Manage-
ment (ICDIM), 2013 Eighth International Conference on. IEEE, 2013.
30. Pawlak, Z. Rough sets, International journal of computer and information science 5, (1982) pp. 341-356.
31. Pawlak, Z. Rough Set: theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht, 1991.
32. Bazan Jan G. and Marcin Szczuka. The Rough Set Exploration System, Transactions on Rough Sets III, LNCS 3400 springer, (2005) pp. 37–
56.
33. Nguyen, H. S., and S. H. Nguyen. "Analysis of stulong data by rough set exploration system (RSES)." Proceedings of the ECML/PKDD
Workshop. 2003.
34. Bazan J., Nguyen H.S., Nguyen S.H., Synak P., Wróblewski J. Rough Set Algorithms in Classification Problem. Physica-Verlag, Heidelberg,
New York, (2000) pp. 49 – 88.
35. Wroblewski J., Genetic algorithms in decomposition and classification problem. In: Skowron A., Polkowski L.(ed.), Rough Sets in Knowledge
Discovery 1, Physica Verlag, Heidelberg, (1998) pp. 471–487.
36. Grzymala-Busse, J., A New Version of the Rule Induction System LERS. Fundamenta Informaticae, Vol. 31, (1997) pp. 27–39.
37. Source of Dataset http://www.sgi.com/tech/mlc/db/
38. Witten, I. H. and Frank, E. Data mining: Practical machine learning tools and techniques with java implementations. San Francisco, CA: Morgan
Kaufmann Publishers Inc, 2000b
39. Jasmina Novakovic, Using Information Gain Attribute Evaluation to Classify Sonar Targets. 17th Telecommunications forum TELFOR Serbia,
Belgrade, November, (2009) pp. 24-26.
40. Dudoit, Sandrine, and Mark J. van der Laan. "Asymptotics of cross-validated risk estimation in estimator selection and performance assess-
ment." Statistical Methodology 2, (2005) pp: 131-154.
41. John Hadden, A Customer Profiling Methodology for Churn Prediction, PhD thesis school of applied sciences manufacturing department, 2008.
42. Datta, P., Masand, B., Mani, D. R. & Li, B. Automated cellular modelling and prediction on a large scale. Issues on the application of data mining, (2001) pp: 485-502.
