
Computers in Industry 64 (2013) 57-67

Contents lists available at SciVerse ScienceDirect

Computers in Industry
journal homepage: www.elsevier.com/locate/compind

Process Mining for the multi-faceted analysis of business processes - A case study
in a financial services organization
Jochen De Weerdt a,*, Annelies Schupp a, An Vanderloock a, Bart Baesens a,b
a Department of Decision Sciences and Information Management, Katholieke Universiteit Leuven, Naamsestraat 69, B-3000 Leuven, Belgium
b School of Management, University of Southampton, Highfield Southampton SO17 1BJ, United Kingdom

ARTICLE INFO

Article history:
Received 7 September 2011
Received in revised form 9 August 2012
Accepted 18 September 2012
Available online 12 October 2012

ABSTRACT

Most organizations have some kind of process-oriented information system that keeps track of business events. Process Mining starts from event logs extracted from these systems in order to discover, analyze, diagnose and improve processes as well as organizational, social and data structures. Notwithstanding the large number of contributions to the process mining literature over the last decade, the number of studies actually demonstrating the applicability and value of these techniques in practice has been limited. As a consequence, there is a need for real-life case studies that suggest methodologies for conducting process mining analyses and that show the benefits of applying them in real-life environments. In this paper we present a methodological framework for a multi-faceted analysis of real-life event logs based on Process Mining. As such, we demonstrate the usefulness and flexibility of process mining techniques to expose organizational inefficiencies in a real-life case study centered on the back office process of a large Belgian insurance company. Our analysis shows that process mining techniques constitute an ideal means to tackle organizational challenges by suggesting process improvements and creating company-wide process awareness.
© 2012 Elsevier B.V. All rights reserved.

Keywords:
Process Mining
Event log analysis
Real-life application
Financial services industry

1. Introduction
These days it is impossible for an organization to operate without some sort of enterprise information system. During the last decades, information systems (IS) have transformed from simple systems with limited functionality into complex, integrated architectures. As a result, it has become harder to understand and monitor how these systems affect the execution of everyday processes in organizations. Process Mining [17] offers a solution based on the extraction, analysis, diagnosis and visualization of the data recorded by an IS during process execution. Although major contributions to the process mining literature have so far been predominantly technical in nature, the techniques have proved their usefulness in practice as well. Nevertheless, application-oriented studies have received only modest attention. Therefore, this study demonstrates the benefits and challenges of applying process mining techniques in practice through a multi-faceted analysis of business processes within the back office of a Belgian insurance company.

* Corresponding author. Tel.: +32 16 32 68 87; fax: +32 16 32 66 24.


E-mail addresses: jochen.deweerdt@econ.kuleuven.be (J. De Weerdt),
annelies.schupp@student.kuleuven.be (A. Schupp),
an.vanderloock@student.kuleuven.be (A. Vanderloock),
bart.baesens@econ.kuleuven.be (B. Baesens).
0166-3615/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.compind.2012.09.010

Process Mining goes beyond the capabilities of traditional business intelligence tools [6] with respect to process analysis. Accordingly, it can be considered a proficient means for helping organizations understand their actual way of working, thereby serving as a foundation for process improvement. This is mainly because the cornerstone of Process Mining is real data describing how business operations are actually carried out in an organization. This differs significantly from other approaches to process improvement, for instance those relying on interviews with key stakeholders.
Based on existing literature and our own experiences, we describe a methodological framework that structures the process mining study in a financial services organization. This framework is similar to earlier works [3,13]; however, it puts an emphasis on data extraction and exploration as well as on the multi-faceted nature of analyzing process execution data. Furthermore, this study clarifies the benefits as well as the challenges of conducting a real-life process mining study. For instance, it points out the importance of intensive two-way communication between process analysts on the one hand and the organization's experts and management on the other.
Accordingly, this paper is structured as follows. First, Section 2
outlines how the field has emerged in the last decades and why the
application of process mining techniques in services organizations
faces distinctive challenges. We continue with a discussion of


former real-life case studies in Section 3. Then Section 4 elaborates on the research methodology followed, which is applied within a financial services organization in Section 5. Finally, we formulate conclusions in Section 6.

2. State-of-the-art of Process Mining

In [4], Dumas et al. bring out one of the most influential trends of the past decades: the shift from a data orientation to a process orientation. In the seventies and eighties, most information systems were built on top of the operating system with the single goal of storing, retrieving and presenting information. Data was at the center of design, such that process modeling was limited to the boundaries of the information system. This could result in inefficiencies, low responsiveness and a poor understanding of how routines were actually executed. It was only in the early nineties, with the emergence of management techniques such as process reengineering, that the modeling of processes received more attention. The changing context stimulated the development of process-aware information systems (PAISs), which can be found along the whole value chain: ERP (Enterprise Resource Planning), WfM (Workflow Management), CRM (Customer Relationship Management), case handling, B2B (business-to-business) and SCM (Supply Chain Management) systems. Most of these systems keep track of the actual execution of business processes by logging large amounts of data that form the input for process analysis techniques.

2.1. Process Mining and Business Intelligence

In the early nineties, Business Intelligence (BI) tools gained popularity. Golfarelli et al. [6] define BI as the process of turning data into information and then into knowledge. It can be seen as a bottom-up analysis of enterprise data to improve decision making and to help managers understand their business and its positioning in the competitive environment.
Ever since process-related management techniques emerged, a new subdomain of BI adopting this change of focus has materialized: Business Process Intelligence (BPI). BPI is a set of integrated tools that supports business and IT users in managing process execution quality by providing features such as analysis, prediction, monitoring, control and optimization [7]. The topic of this paper, Process Mining, finds itself at the intersection of BPI and two other domains, namely Business Process Analysis (BPA) and Business Activity Monitoring (BAM). These last two domains can be situated within the field of Business Process Management. Fig. 1 illustrates the links between the different terms, with Process Mining at the center of attention. Firstly, the overlap between BPI and Process Mining stems from the underlying idea of gathering ex-post process knowledge from logged data. Yet the goal of doing so differs significantly between the two. While Process Mining extracts relevant process information by using discovery techniques, BPI focuses more on the analysis of predefined key performance indicators. It starts from an external incentive, where the critical points are assumed to be known upfront.

Fig. 1. Process Mining in a broader context.
[Figure: diagram relating Process Mining to Business Intelligence, BPI, Business Process Management, BPA, BAM and WfM.]

2.2. Process Mining and Business Process Management

The second domain with which Process Mining overlaps is Business Process Management (BPM), as shown on the left-hand side of Fig. 1. According to Weske et al. [25], BPM can be defined as "supporting business processes using methods, techniques, and software to design, enact, control, and analyze operational processes involving humans, organizations, applications, documents and other sources of information". From a BPM view, the information system supports the complete life cycle of operational business processes. In this sense, Business Process Management can be seen as an extension of a more traditional technique called Workflow Management (WfM) [20], since BPM covers the whole life cycle of business processes. The relationship between BPM and Process Mining is depicted in Fig. 2. It presents the BPM life cycle, subdivided into four primary phases: Design, System Configuration, Execution and Diagnosis. As such, it shows how operational business processes are conceived, set up, enacted and analyzed.
Within the BPM field, a posteriori process diagnosis is largely covered by the term Business Process Analysis (BPA). When the focus shifts towards real-time process performance management, the field of Business Activity Monitoring (BAM) emerges. BAM tools are often dashboard-like applications allowing business executives to monitor business processes at run time. In contrast to the common focus of BPA and BAM on obtaining statistics on aggregate data, Process Mining works at a deeper level of detail by digging into the exact paths of execution.

Fig. 2. Process Mining within the BPM life cycle.

Process Mining comprises a collection of techniques to analyze the information stored in event logs, where the analysis focuses on the discovery, monitoring and improvement of processes. In Fig. 2, three often distinguished types of tasks are represented: discovery, conformance and extension tasks. One key aspect of discovery is learning process models from execution data. However, discovery tasks can also target other aspects such as the originators, for instance by revealing the social networks in an organization. Besides discovery tasks, conformance checking aims at quantifying the discrepancy between a process model (either an a priori or a discovered process model) and the process executions


as registered in an event log. Since existing process models are not always available, contrasting the actual process behavior with management's expectations, business rules or legislative regulations is a valuable option as well. The latter is often referred to as compliance verification. Finally, the diagnostic information that can be derived from an event log allows for a third type of tasks: extension tasks. The goal is to enrich a process, organizational or social model with other information. Bottleneck analysis, for example, scrutinizes the timestamp information of activities, process paths, performers and organizational units.
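To make the discovery and conformance tasks more concrete, the following minimal Python sketch (not a ProM algorithm) assumes an event log given as a list of traces, each trace being a list of activity names. It derives the directly-follows relation, which underlies many discovery algorithms, and flags traces that contain a transition missing from a reference set. The activity names and the `allowed` set are hypothetical; actual conformance checking replays traces on a full process model rather than on a set of transitions.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is directly followed by activity b across all traces."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def non_conforming(log, allowed):
    """Return the indices of traces that use at least one transition outside `allowed`."""
    deviating = []
    for i, trace in enumerate(log):
        if any((a, b) not in allowed for a, b in zip(trace, trace[1:])):
            deviating.append(i)
    return deviating


# Hypothetical example log and reference behavior.
log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],  # skips the check
]
allowed = {("register", "check"), ("check", "approve"), ("check", "reject")}

print(directly_follows(log)[("register", "check")])  # 2
print(non_conforming(log, allowed))  # [2]
```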
2.3. Challenges for Process Mining in services organizations
Given the human-centric nature of typical business processes in services organizations, process mining analysis can be valuable by providing objective insights into the actual way of working, because it is founded on execution data. However, three main challenges complicate the application of process mining in such flexible environments. These challenges originate mainly from the fact that many of the information systems supporting the business processes are process-unaware.
• Data capturing. Information systems that lack a process orientation often fail to capture data in a way that allows analysis from a process viewpoint. Nevertheless, due to the decreasing cost of data storage and an increased awareness of potential benefits, large amounts of execution data are often being captured in today's organizations. However, extracting and converting these data so that they can be used for process mining analysis is often demanding.
• Process scope definition. Another consequence of the absence of a process-aware information system is that the actual processes themselves must be identified. Frequently, actual business processes span multiple organizational units. Furthermore, distinct business processes regularly share similar information infrastructures and data storage systems, which again complicates the correct delineation of a true business process.
• Process mining analysis. A final challenge consists of applying process mining techniques to the execution data. Given that a typical information system in such flexible environments imposes very few restrictions, a wide variety of behavior is observed and stored. Applying process mining techniques to such complex data sets is generally arduous.


With the purpose of addressing these challenges, Sections 4 and 5 outline a methodology framework and a case study which demonstrate possible solution strategies and guidelines for the analysis of business processes in a services environment. First, the next section provides an overview of related work.
3. Related work
In the process mining literature, attention has largely been devoted to the development of novel techniques and algorithms, with a strong focus on the control-flow discovery perspective [21]. Only a limited number of articles describe practical applications. As such, this work complements former, rather technical contributions by proposing a distinctive methodology for the analysis of process inefficiencies within services organizations. Our findings are validated with a case study in which several existing process mining techniques are applied to a data set originating from the back office process of a financial services company. Although practical process mining cases are rare in the literature, real-life applications are essential to prove the utility of Process Mining in practice. Eleven relevant papers in which Process Mining is used as a practical tool to help organizations understand their business are listed in Table 1.
The first process mining application study was described in [18]. The authors verify the applicability of existing process mining techniques such as HeuristicsMiner and social network analysis to the invoice handling process of a local public services office. The major contribution of this study was the demonstration of process mining techniques according to the different perspectives of an event log: control-flow, organizational and case data information. In that sense, our case study is similar, even though the actual approach towards analyzing the different perspectives is distinct.
Public services have been a popular application area for Process Mining. Next to [18], Alves de Medeiros et al. [2] took an interest in four processes from Dutch municipalities to test the robustness of the Genetic Miner algorithm. Also in [16], a municipality process is used to demonstrate the application of organizational analysis techniques. Note that these studies, in contrast to our work, focus on only one specific perspective for analysis. In [3], a custom six-phase methodology for process diagnostics is proposed and validated with another public services case study. In contrast to the methodology presented in the next section, this framework is rather coarse-grained, whereas our framework is specifically

Table 1
An overview of practical applications of process mining in the literature.
Author

Domain

Year

Event log details


# Cases

Van der Aalst et al. [18]


Alves de Medeiros et al. [2]

Mans et al. [11]


Song and Van der Aalst [16]
Bozkaya et al. [3]
Jans et al. [10]
Rozinat et al. [14]
Rozinat et al. [15]

Goedertier et al. [5]


Van der Aalst et al. [19]

Rebuge and Ferreira [13]

Public services
Public services

Healthcare
Public services
Public services
Financial services
Wafer scanner industry
Public services

Telecom industry
Public services

Healthcare

2007
2007

2008
2008
2009
2009
2009
2009

2011
2011

2011

14.279
4 event logs:
35
100
358
407
627
570
83.611
10.000
24
2 event logs:
363
570
17.812
2 event logs:
796
1.882
627

# Activity types

# Events

# Originators

17

147.579

487

14
17
11
19
376
9
60
7
720

3.023
276.333
62.531
154.966

109
60
290

5
10
42

1.817
6.616
80.401

13
110

376

5.187
11.985


targeted towards the analysis of flexible processes in services environments.
Rozinat et al. [15] conducted two more case studies in the public services domain. However, their contribution with respect to the application of process mining in practice is rather limited. Instead, process mining techniques are applied in order to validate and assess the quality of simulation models. There is some common ground with our work in the sense that the proposed simulation models integrate multiple perspectives. Yet, our study makes use of multiple perspectives to search for inefficiencies instead of creating a simulation model. The last two real-life case studies performed in this domain are presented in [19]. That paper extends current process mining techniques with a new approach: the discovered process models are used to forecast the completion time of running instances. In both real-life cases, the data is split fifty-fifty into a learning set and a test set, with the main focus on the quality of the time predictions. Although this study contributes to illustrating the applicability of process mining in practice, it does not present a general methodology framework.
Also in the private sector, the applicability of process mining techniques has been studied. Goedertier et al. [5] apply different process discovery techniques within the telecom industry. It was found that applying these techniques to the highly flexible process at hand created severe challenges for both knowledge discovery and knowledge evaluation. Since this study entails an extensive empirical evaluation, it can be considered a key contribution to the literature on applying control-flow discovery to real-life data.
Furthermore, the healthcare industry can benefit considerably from Process Mining as well. Mans et al. [11] and Rebuge and Ferreira [13] demonstrate how process mining can be applied to complex care pathways. Their results reveal the possibility of uncovering understandable models from large groups of patients with the purpose of tracking deviations from guidelines. The latter study also provides a true methodology for applying process mining in the context of healthcare. Although there are some commonalities, such as data preprocessing and the different analysis perspectives, their methodology is centered on the application of sequence clustering in order to discover different usage scenarios.
A somewhat rare application in a manufacturing context is presented in [14]. The study focuses on the conformance aspect in the wafer scanner industry. The paper points out that vital information for compliance purposes is missing in today's processes, but that more audit data will become available in the future. The application of Process Mining for auditing is further explored in [10]. In this study, it is shown that internal fraud can be investigated by means of process mining techniques such as the LTL-checker in ProM.¹
In conclusion, this section shows that Process Mining can be
successfully applied in a wide variety of practical situations.
However, there are still a lot of unexplored paths and there exists a
need for more real-life case studies providing evidence for the
effectiveness of Process Mining.
4. A methodological framework for applying Process Mining in
practice
The objective of this paper is to conduct a case study that demonstrates the utility of Process Mining in practice and to suggest how Process Mining can be applied in real life. It has been found that many of the proposed algorithms have difficulties in dealing with real-life event logs, since these logs typically exhibit much less structured process behavior [5,8,22]. Today, this remains one of the most important challenges for process mining research. Therefore, in order to apply process mining techniques in practice, guidelines are needed on how to conduct such a case study. Accordingly, this study proposes a methodological framework in line with earlier works [3,13]. Our Process Mining Methodology Framework (PMMF), as depicted in Fig. 4, emphasizes the early phases of such an analysis, i.e. data preparation and exploration. Furthermore, our framework acknowledges the multi-faceted analysis that is often required, especially when analyzing highly flexible business processes where the underlying information system allows its users a lot of freedom. Because of this flexibility, the logged data is typically less structured and much more difficult to analyze. The application of our framework demonstrated in Section 5 can help process miners to overcome different issues encountered during a process mining analysis.

¹ http://www.promtools.org/prom5/.
The framework consists of five major building blocks: preparation, exploration, perspectivization, analysis and results. Before any analyses can be performed, data must be collected and prepared. The first component in the preparation phase is data extraction. A first challenge crops up when the process scope and the time frame are to be determined. The goal is to extract from the bulk of information stored in the system the data that is relevant for the analysis. There is a clear trade-off between a too wide and a too narrow process scope. When too much data is extracted, two or more processes can get mixed up, yielding a process model that is hard to interpret. If less data is incorporated in the analysis, a straightforward interpretation can be obtained, but the danger of missing part of the knowledge increases. The problem of scope determination also turns up in the time dimension. A time frame that is too wide can cause an overflow of data, which leads to large computation times for existing process mining techniques and often to incomprehensible results. On the other hand, if the process fluctuates strongly, for example because of seasonality, a small time frame may not be long enough to include such patterns. In general, an accurate process scope can be defined by scrutinizing the different activities registered in the execution data. Furthermore, case data also often provides valuable information on how to identify different business processes within one information system. The process of delineating a business process within the execution data of an information system is depicted in Fig. 3. Defining an appropriate time frame requires either expert judgment or an iterative analysis procedure, as explained further.
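The delineation in terms of activity scope and time frame can be sketched as a simple filter. The Python sketch below is illustrative only and assumes events are (case_id, activity, timestamp) tuples; the rule that a case belongs to the time frame when its first event falls inside it is one hypothetical choice among several (one could also require the case to end, or to be fully contained, within the window). The function name and the example data are hypothetical.

```python
from datetime import datetime


def delineate(events, activities, start, end):
    """Keep in-scope activities of cases whose first event falls in [start, end).

    events: iterable of (case_id, activity, timestamp) tuples (assumed layout).
    activities: set of activity names considered part of the business process.
    """
    # Determine when each case starts, using all of its events.
    first_seen = {}
    for case_id, _activity, ts in events:
        if case_id not in first_seen or ts < first_seen[case_id]:
            first_seen[case_id] = ts
    in_frame = {c for c, ts in first_seen.items() if start <= ts < end}
    # Apply both the time frame (per case) and the activity scope (per event).
    return [e for e in events if e[0] in in_frame and e[1] in activities]


events = [
    ("d1", "scan", datetime(2011, 1, 3)),
    ("d1", "assess", datetime(2011, 1, 5)),
    ("d1", "print", datetime(2011, 1, 6)),      # outside the activity scope
    ("d2", "scan", datetime(2010, 12, 20)),     # case starts before the frame
    ("d2", "assess", datetime(2011, 1, 2)),
]
result = delineate(events, {"scan", "assess"}, datetime(2011, 1, 1), datetime(2011, 7, 1))
print([(c, a) for c, a, _ in result])  # [('d1', 'scan'), ('d1', 'assess')]
```

Iterating over the scope and frame, as the framework prescribes, then amounts to rerunning such a filter with adjusted parameters after each inspection.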
Fig. 3. Delineation of a business process in terms of process scope (activities, case data) and time frame.
[Figure labels: Business activities, Case data.]

The next phase is process data exploration, where a first impression of the data is gathered. For this purpose, a lot of statistical information can be assessed. Also, with the availability of different control-flow mining algorithms, a number of initial visualizations of the process can be obtained. Relying on these exploration steps, business experts can iteratively improve the process scope and time frame in order to ensure that the input data is suitable for further analysis. The feedback loop for adjusting both process scope and time frame is imperative and is therefore explicitly depicted in Fig. 4.
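As an illustration of the kind of statistical information gathered during exploration, the Python sketch below computes a few basic descriptive statistics. It assumes an event log represented as a dictionary from case identifiers to activity sequences; this representation, the function name and the example data are hypothetical, and the study itself performed this exploration in ProM.

```python
from collections import Counter


def log_statistics(traces):
    """Basic descriptive statistics of an event log given as {case_id: [activities]}."""
    n_cases = len(traces)
    n_events = sum(len(t) for t in traces.values())
    activity_counts = Counter(a for t in traces.values() for a in t)
    return {
        "cases": n_cases,
        "events": n_events,
        "activity_types": len(activity_counts),
        "mean_trace_length": n_events / n_cases if n_cases else 0.0,
        "most_common_activities": activity_counts.most_common(3),
    }


traces = {
    "d1": ["scan", "assess", "archive"],
    "d2": ["scan", "assess", "assess", "archive"],
    "d3": ["scan", "archive"],
}
stats = log_statistics(traces)
print(stats["cases"], stats["events"], stats["activity_types"])  # 3 9 3
print(stats["mean_trace_length"])  # 3.0
```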
After agreeing upon a satisfactory data set in terms of time frame and scope, different analysis perspectives can be identified. Typically, process data will contain control-flow, organizational and case data information. In specific situations, such as the case study presented in the next section, multiple event logs can be constructed for analysis. This is an atypical way of working, since usually one single event log captures the three dimensions at once. However, specifically in process-unaware environments, it proves beneficial to construct multiple event logs for analyzing the different perspectives. For instance, in a Document Management System (DMS), document types are often an important source of information. By constructing a dedicated event log in which document types are considered as activities, multiple control-flow and performance analysis techniques can be applied to provide valuable insights. In similar fashion, analyzing organizational flows can be worthwhile as well.
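Constructing several event logs from the same records essentially means choosing which attribute plays the role of the activity. The Python sketch below illustrates this under stated assumptions: records are dictionaries with hypothetical keys `case`, `time`, `action`, `team` and `doc_type`, and times are plain integers for brevity. The optional collapsing of consecutive identical values is one way to keep only hand-overs between teams or changes of document type; it is an illustrative choice, not the exact procedure used in the case study.

```python
from itertools import groupby


def project(records, attribute, collapse_repeats=False):
    """Build one event log {case: [values]} using `attribute` as the activity.

    records: dicts with at least 'case', 'time' and the chosen attribute (assumed keys).
    collapse_repeats: merge consecutive identical values, so that e.g. a team log
    only contains events where the case changes hands.
    """
    log = {}
    for rec in sorted(records, key=lambda r: (r["case"], r["time"])):
        log.setdefault(rec["case"], []).append(rec[attribute])
    if collapse_repeats:
        log = {case: [value for value, _ in groupby(seq)] for case, seq in log.items()}
    return log


records = [
    {"case": "d1", "time": 1, "action": "scan", "team": "mailroom", "doc_type": "proposal"},
    {"case": "d1", "time": 2, "action": "assess", "team": "underwriting", "doc_type": "proposal"},
    {"case": "d1", "time": 3, "action": "approve", "team": "underwriting", "doc_type": "life proposal"},
]
print(project(records, "action"))  # {'d1': ['scan', 'assess', 'approve']}
print(project(records, "team", collapse_repeats=True))  # {'d1': ['mailroom', 'underwriting']}
print(project(records, "doc_type", collapse_repeats=True))  # {'d1': ['proposal', 'life proposal']}
```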
After identifying the different analysis dimensions and deciding upon the construction of multiple event logs from the execution data, the actual analysis can be initiated. The analysis phase is subdivided into two major segments: the basic discovery analysis and the more detailed compliance and performance analysis. In the discovery phase, we distinguish a control-flow, an organizational and a case data perspective on the data. The control-flow perspective includes the analysis of activity sequences within the business process. Furthermore, the data can be explored from an organizational point of view, for instance by investigating the teams involved in the process. Finally, it is often valuable to explore the underlying data elements of process executions in order to discover particular patterns.
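For the organizational perspective, a common starting point is a social network of hand-overs between performers. The Python sketch below counts, in the spirit of the handover-of-work metric used in social network mining, how often one team directly passes a case to another. It is a simplified sketch: it only considers direct successions, assumes a team log of the form {case: [teams]}, and the team names are hypothetical.

```python
from collections import Counter


def handover_network(team_log):
    """Count hand-overs of work: team a passes a case directly to team b (a != b)."""
    edges = Counter()
    for seq in team_log.values():
        for a, b in zip(seq, seq[1:]):
            if a != b:  # staying within the same team is not a hand-over
                edges[(a, b)] += 1
    return edges


team_log = {
    "d1": ["mailroom", "underwriting", "mailroom"],
    "d2": ["mailroom", "underwriting", "underwriting", "archive"],
}
for (src, dst), n in handover_network(team_log).items():
    print(f"{src} -> {dst}: {n}")
```

The resulting weighted edges can then be visualized as a graph or ranked to identify the teams that exchange most work.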
Typically, a discovery analysis will highlight different points of interest for which a more thorough assessment is appropriate. The PMMF considers two different types of in-depth analyses: performance and compliance analysis. In the case study presented below, management was mainly interested in case throughput times. On the one hand, performance analysis can be carried out in general for the whole set of process executions. On the other hand, it is typically worthwhile to explore performance in more detail. Subsets of traces are then investigated in order to examine the impact of control-flow or other characteristics of these process executions. The in-depth analysis also comprises compliance analyses, which can be of interest to the organization. Compliance verification is about validating whether reality is consistent with expected or required process behavior. In Section 5.4, compliance issues are thoroughly investigated because process inefficiencies coincided with behavior deviating from management's expectations.
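The distinction between overall and subset-level performance analysis can be illustrated by computing case throughput times and grouping them by variant, i.e. by the exact activity sequence of a case. The Python sketch below assumes (case_id, activity, timestamp) tuples and uses the median per variant; both the tuple layout and the choice of the median are hypothetical illustrations rather than the measures used in the study.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def throughput_by_variant(events):
    """Median case throughput time (in days) per variant (activity sequence).

    events: iterable of (case_id, activity, timestamp) tuples (assumed layout).
    """
    cases = defaultdict(list)
    for case_id, activity, ts in events:
        cases[case_id].append((ts, activity))
    per_variant = defaultdict(list)
    for evs in cases.values():
        evs.sort()  # chronological order; equal timestamps fall back to activity name
        variant = tuple(a for _, a in evs)
        duration = (evs[-1][0] - evs[0][0]).total_seconds() / 86400
        per_variant[variant].append(duration)
    return {v: median(d) for v, d in per_variant.items()}


events = [
    ("d1", "scan", datetime(2011, 3, 1)), ("d1", "approve", datetime(2011, 3, 4)),
    ("d2", "scan", datetime(2011, 3, 1)), ("d2", "approve", datetime(2011, 3, 2)),
    ("d3", "scan", datetime(2011, 3, 1)), ("d3", "query broker", datetime(2011, 3, 3)),
    ("d3", "approve", datetime(2011, 3, 11)),
]
print(throughput_by_variant(events))
# {('scan', 'approve'): 2.0, ('scan', 'query broker', 'approve'): 10.0}
```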
Finally, the result phase is the closing stage of the PMMF. The outcome of the analyses will be a valuable starting point for process improvement or even process reengineering steps in order to optimize the business process at hand. Based on the new insights obtained with Process Mining, management can define new goals and measurements, for instance to resolve the identified process inefficiencies.
5. Case study
Since the purpose of this paper is to demonstrate the usefulness of process mining analyses in practice, we describe a case study in the financial services industry. This industry is of particular interest for Process Mining, since it contains plenty of human-centric business processes for which the analysis of event logs proves especially worthwhile. The case at hand involves a large Belgian insurance company. Products include life and non-life insurance and retirement savings, which are mainly offered through a large network of brokers.
5.1. Context
The core information system underlying the insurance company's back office can best be described as a Document Management System. As soon as a physical document arrives at the company, be it an e-mail, a claim, an application offer, etc., it is digitized and added to the DMS. Each document receives a document ID and a document type before it is sent to a team for further processing. From then on, the system keeps track of every step taken during the document's life cycle by recording which actions were taken by which team at which moment in time. These data are the cornerstone of the process mining case study outlined below.
Although plenty of process-related information was available, the company had no clear idea of what its back office processes actually looked like, nor could it provide accurate performance estimations for the different document types. The next section explains how the answers to these questions were found using the Process Mining Methodology Framework.
5.2. Data preparation and exploration
The first phase in any process mining analysis consists of the preparation and exploration of the available process data. All the event information needed to conduct a process mining analysis was present in the DMS of the company. In a first step, these data were extracted from the DMS and converted into a standard event log storage format, in this case MXML.
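The conversion step itself is not described in further detail, so the following is only a Python sketch of what serializing extracted DMS records to MXML might look like. The element names follow the MXML structure read by ProM 5 (WorkflowLog, Process, ProcessInstance, AuditTrailEntry, WorkflowModelElement, EventType, Timestamp, Originator) to the best of our knowledge; schema attributes, Data elements and Source information are omitted, every event is assumed to be of type "complete", and the input tuple layout, function name and process identifier are hypothetical. In practice, dedicated conversion tooling is usually preferable.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime


def to_mxml(events, process_id="back_office"):
    """Serialize (case_id, activity, timestamp, originator) tuples as a minimal MXML string."""
    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    cases = defaultdict(list)
    for case_id, activity, ts, originator in events:
        cases[case_id].append((ts, activity, originator))
    for case_id, evs in cases.items():
        instance = ET.SubElement(process, "ProcessInstance", id=str(case_id))
        for ts, activity, originator in sorted(evs):  # chronological order per case
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            ET.SubElement(entry, "EventType").text = "complete"  # assumption: completion events only
            ET.SubElement(entry, "Timestamp").text = ts.isoformat()
            ET.SubElement(entry, "Originator").text = originator
    return ET.tostring(root, encoding="unicode")


xml_text = to_mxml([
    ("d1", "scan", datetime(2011, 3, 1, 9, 0), "mailroom"),
    ("d1", "assess", datetime(2011, 3, 1, 11, 0), "underwriting"),
])
print(xml_text)
```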

Fig. 4. The Process Mining Methodology Framework.

5.2.1. Process data exploration and scope adjustment


Next, the extracted data was imported into the analysis tool ProM, where an initial inspection could start. A major challenge encountered was the substantial size of the data set, causing

Table 2
Characteristics of the event logs applying trace-based frequency ltering for the different perspectivesthe number of process instances (# PI), the number of distinct process
instances (# DPI), the number of activity types (# AT).
Frequency threshold

Event log characteristics

0
50
0
50
0
50

44.880
34.769
44.880
40.406
44.880
39.396

# PI
Control-ow log
Team ow log
Document ow log

(100%)
(77.47%)
(100%)
(90.47%)
(100%)
(87.78%)

difculties for both data processing and data analysis. A rst


important reason why the data was difcult to work with was
the absence of almost any restriction on the possible execution
paths within the system. It was found that the DMS was
nowhere near the more structured process behavior typically
found in for example BPMSs. Due to the unlimited number of
possible process executions, any typical control-ow process
model rendered incomprehensible. As a consequence, it proved
very difcult to mark out the boundaries of the process under
investigation in the larger DMS. Therefore, the data scope had to
be ne-tuned multiple times, going through the scope adjustment loop after each inspection, until a satisfactory data set was
obtained.
In consultation with the insurance companys management,
the nal event log consisted of the execution paths of all the
proposals dealt with in a six-month period. As explained in the
previous section, the execution data could be decoupled into three
different event logs according to three different information
perspectives. The rst log corresponded to the control-ow,
where each case (being a document) followed a certain sequence
of activities. A second log was constituted from a performer
viewpoint: each case now followed a certain sequence of teams.
More specically, each event corresponds to a change of hands, so
that each time a document was forwarded, that event was
included. The last log was composed from a case data view based
on the attribute document type. This was useful, since an
incoming document receives a preliminary type, which can be
further specied or changed during its life time. In the same way
as the former log, the document type ow can be tracked per
process execution.
Constructing multiple event logs from the same data set according to three different perspectives is an atypical way of working. Usually, an event log is centered on control-flow activity events, each performed by a certain originator and enriched with a number of case data elements, so that all three analysis perspectives can be investigated by analyzing this single event log. In this case, however, we constructed a separate event log for each perspective. This different strategy stems from both the specificity of the data and the expectations of the organization's executives. After several discussions, it appeared that the team flow and the document type flow represented the business process best. Accordingly, it was valuable to make control-flow mining techniques available for the sequences of teams and of document types as well.
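The three logs described above can all be derived from a single table of DMS records. The sketch below is a minimal illustration under stated assumptions: the record layout (`doc_id`, `timestamp`, `activity`, `team`, `doc_type`) is hypothetical, and the extraction procedure actually used in the case study is not described in this section. Team and document type entries are kept only when the value changes, mirroring the "change of hands" events of the team log and the type changes of the document type log.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Record:
    """One row of a hypothetical DMS audit table (field names are assumptions)."""
    doc_id: str
    timestamp: int
    activity: str
    team: str
    doc_type: str


def changes_only(values):
    """Keep a value only when it differs from the previous one."""
    out = []
    for v in values:
        if not out or out[-1] != v:
            out.append(v)
    return out


def build_logs(records):
    """Project one record table onto three logs, each mapping doc_id -> sequence."""
    control_flow, team_flow, doc_type_flow = {}, {}, {}
    ordered = sorted(records, key=attrgetter("doc_id", "timestamp"))
    for doc_id, group in groupby(ordered, key=attrgetter("doc_id")):
        rows = list(group)
        control_flow[doc_id] = [r.activity for r in rows]
        # First team, then one entry per change of hands.
        team_flow[doc_id] = changes_only(r.team for r in rows)
        # Preliminary type, then one entry per (re)classification.
        doc_type_flow[doc_id] = changes_only(r.doc_type for r in rows)
    return control_flow, team_flow, doc_type_flow


# Hypothetical example: activity names and the preliminary type "proposal" are invented.
records = [
    Record("d1", 1, "Register", "In", "proposal"),
    Record("d1", 2, "Classify", "Central", "private proposal"),
    Record("d1", 3, "Assess", "A1", "private proposal"),
    Record("d1", 4, "Archive", "A1", "private proposal"),
]
cf, tf, dtf = build_logs(records)
print(cf["d1"], tf["d1"], dtf["d1"])
```

Because the three logs share the same case identifiers, a case selected in one perspective can be looked up directly in the other two.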
5.3. Discovery analysis
According to the methodological framework, the three event logs are first analyzed in an exploratory way in order to find interesting observations for further analysis.

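Heuristic net characteristics

Log                  Frequency threshold   # DPI   # AT            # Arcs   # Activities
Control-flow log     0                     4.494   15 activities   209      16
Control-flow log     50                    667     13 activities   69       14
Team flow log        0                     1.222   59 teams        498      61
Team flow log        50                    77      30 teams        165      32
Document flow log    0                     2.027   189 documents   209      191
Document flow log    50                    41      22 documents    69       24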

5.3.1. Control-flow event log: business activities


The control-flow based event log contained a total of 44.880 cases (documents) with fifteen real business activities. A discovery analysis typically starts with the visualization of the underlying process model. For this event log, visualization is strongly complicated by the fact that the log consists of 4.494 distinct process executions (DPI), as illustrated in Table 2. The complexity is emphasized even further by the fact that 3.345 instances follow an activity sequence that no other instance shares.
One method to limit the variety in process behavior is to narrow the scope to more frequent behavior. For this case study, the log was filtered based on process instance frequencies. A process instance frequency is interpreted as the number of different process instances that traverse the same sequence of activities. Applying a frequency threshold of fifty, a total of 34.769 cases, or 77.47% of all cases, were retained. HeuristicsMiner [24] was then applied to this filtered data set in order to visualize the process as in Fig. 5. This process model covers 667 frequent process executions. Accordingly, the model only conforms to mainstream behavior and thus excludes exceptions and infrequent cases that might be interesting as well.
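The frequency-based filtering described above can be sketched as follows. This is a minimal illustration, assuming a log represented as a Python dict from case id to activity list; it is not the ProM filter used in the study. `variant_stats` counts how many cases share each exact sequence (the distinct process executions), variants occurring once correspond to the unique activity sequences, and the returned coverage corresponds to the percentage of retained cases (77.47% for the control-flow log at threshold 50).

```python
from collections import Counter


def variant_stats(log):
    """Count how many cases follow each exact activity sequence (variant)."""
    return Counter(tuple(trace) for trace in log.values())


def filter_by_variant_frequency(log, threshold):
    """Keep cases whose variant is shared by at least `threshold` cases; also return coverage."""
    freq = variant_stats(log)
    kept = {cid: trace for cid, trace in log.items() if freq[tuple(trace)] >= threshold}
    coverage = len(kept) / len(log) if log else 0.0
    return kept, coverage


# Toy log: 60 cases follow A-B-C, 30 follow A-C, and one case is a singleton.
log = {f"c{i}": ["A", "B", "C"] for i in range(60)}
log.update({f"d{i}": ["A", "C"] for i in range(30)})
log["e0"] = ["A", "B", "B", "C"]

freq = variant_stats(log)
singletons = sum(1 for n in freq.values() if n == 1)
kept, coverage = filter_by_variant_frequency(log, threshold=50)
print(len(freq), singletons, len(kept), round(coverage, 4))  # 3 1 60 0.6593
```

Everything below the threshold (here the A-C cases and the singleton) disappears from the discovered model, which is why exceptions have to be studied separately.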
From a business perspective, no particular conclusions can be drawn from the control-flow analysis. Although the analysis indicates a high degree of behavioral freedom granted by the information system, this freedom is often necessary and efficient. More stringent conditions on possible activity sequences might be introduced in order to reduce the users' level of self-determination, but there is no guarantee that the process itself would improve significantly in this way. In order to tune the analysis to business interests, attention shifted to the features business users work with on a daily basis: documents and other teams. The next paragraphs explore the corresponding perspectives.
5.3.2. Organizational event log: team routing
The process data can also be examined from an organizational perspective, where a flow represents the sequence of teams that worked on the same document. A first attempt to visualize the team flows resulted in a completely incomprehensible model, reflecting the complexity of the data in the log: 1.222 distinct paths involving 59 teams, represented by 498 arcs in the dependency graph (see Table 2). To allow for the extraction of useful knowledge from the log, the number of team paths needed to be reduced in a similar fashion as described in Section 5.3.1. By focusing on frequent behavior, a general view of how teams were interconnected could be obtained. As with the control-flow perspective, the event log was filtered for team paths that were traversed by at least fifty different document instances. The result involved only 30 teams and 77 distinct paths, presented by the dependency graph in Fig. 6. Although this might imply a drastic cut in the data, the opposite appeared to be true: more than ninety percent of all the log traces were retained after filtering.
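The dependency graphs in Figs. 5 and 6 are produced by HeuristicsMiner [24], which derives its arcs from directly-follows counts. The sketch below shows only the dependency measure as defined by Weijters et al., a => b = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), with |a>a| / (|a>a| + 1) for self-loops, applied to team sequences. It omits the algorithm's other heuristics (for example the relative-to-best threshold, length-two loop handling and split/join detection), and the 0.9 threshold is an illustrative assumption rather than the setting used in the study.

```python
from collections import Counter


def directly_follows(log):
    """Count |a>b|: how often b immediately follows a within the same trace."""
    df = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df


def dependency(df, a, b):
    """HeuristicsMiner dependency measure a => b; values close to 1 indicate a strong dependency."""
    if a == b:  # length-one loop
        n = df[(a, a)]
        return n / (n + 1)
    ab, ba = df[(a, b)], df[(b, a)]  # Counter returns 0 for unseen pairs
    return (ab - ba) / (ab + ba + 1)


def dependency_arcs(df, threshold=0.9):
    """Keep only observed pairs whose dependency value reaches the threshold."""
    return {
        (a, b): round(dependency(df, a, b), 3)
        for (a, b) in df
        if dependency(df, a, b) >= threshold
    }


# Toy team log: 20 straightforward routings and 5 back-and-forth routings.
log = {f"s{i}": ["In", "Central", "A1"] for i in range(20)}
log.update({f"r{i}": ["Central", "A1", "Central"] for i in range(5)})
df = directly_follows(log)
print(dependency_arcs(df))  # {('In', 'Central'): 0.952}
```

In this toy log, Central hands the document to A1 in every case, yet the five back-and-forth routings pull the Central-A1 dependency down to 20/31 and below the threshold. Thresholds can thus hide behavior that is frequent but not clear-cut.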

J. De Weerdt et al. / Computers in Industry 64 (2013) 57–67


Fig. 5. Visualization of the dependency graph for the control-flow log with a frequency threshold of 50.

Looking more closely at the team flow process model, three teams can be pointed out that operate on cases more than 10.000 times: In, Central and A1. First of all, the In-team received part of the incoming mail and forwarded the documents to a specific team for further processing. If there was confusion about which team was next, the document was sent to the Central-team. This team, involved 15.415 times in the flow, was responsible for those forwarded mails and for most other incoming documents. It acted as a central administration point and first defined the document type. Afterwards, the documents were forwarded to a team specialized in the document type at hand. When a document was wrongly forwarded, the receiving team most likely sent it back. A last team with a high involvement (10.025 times) was the A1-team, mainly because this team could treat multiple document types.
Fig. 6. Visualization of the dependency graph for the team flow log with a frequency threshold of 50. The enlargement shows one of the many reiterated handovers between teams.

Fig. 7. Visualization (dependency graph) for the document type event log with a frequency threshold of 50.

By discussing these initial findings with the business users, a first possible inefficiency was identified. Several teams stated that they were burdened by reiterated handovers, i.e., documents that were sent back and forth between teams. Since this issue was confirmed by our analysis (see Fig. 6), this behavior is investigated further in Section 5.4.2.2.
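Section 5.4.2.2 separates the process executions in which a particular team appears more than once, using the LTL-checker in ProM. The following plain-Python sketch applies the same criterion to a team flow log (assumed, as before, to be a dict from case id to team sequence); it is a stand-in for the idea and does not use ProM's LTL syntax. It also lists immediate back-and-forth handovers (X -> Y -> X), a narrower pattern like the Control/A1 exchange highlighted in Fig. 6.

```python
from collections import Counter


def repeated_teams(trace):
    """Teams that appear more than once in a document's team sequence."""
    counts = Counter(trace)
    return sorted(team for team, n in counts.items() if n > 1)


def ping_pongs(trace):
    """Immediate back-and-forth handovers X -> Y -> X."""
    return [(x, y) for x, y, z in zip(trace, trace[1:], trace[2:]) if x == z and x != y]


def split_log(team_log):
    """Separate cases with reiterated handovers from the remaining cases."""
    reiterated = {cid: t for cid, t in team_log.items() if repeated_teams(t)}
    normal = {cid: t for cid, t in team_log.items() if cid not in reiterated}
    return reiterated, normal


# Hypothetical team sequences (team names taken from the case, routings invented).
team_log = {
    "d1": ["In", "Central", "A1"],
    "d2": ["Central", "Control", "A1", "Control"],
    "d3": ["In", "Central", "A1", "Central", "A1"],
}
reiterated, normal = split_log(team_log)
print(sorted(reiterated), list(normal))  # ['d2', 'd3'] ['d1']
print(ping_pongs(team_log["d3"]))  # [('Central', 'A1'), ('A1', 'Central')]
```

Note that case d2 is caught by the "appears more than once" criterion and also contains one immediate Control/A1 exchange, whereas a team could in principle reappear much later without any ping-pong.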
5.3.3. Case data event log: document types
Finally, an exploratory analysis of the document types was carried out, investigating the sequences of document type changes. Because displaying the model for the complete event log was infeasible, a filter similar to that of the former perspectives was applied to visualize only frequent document paths. The results of the filtering were striking: the number of document types decreased from 189 to only 22 and the number of different possible paths declined from 2.027 to 41. Because almost 90 percent of the logged data is retained and forms the input for the resulting process model in Fig. 7, the applied filtering can be considered very effective in limiting the behavior and thus in deriving insights from the data. Even though the system allows a large amount of freedom, much of the behavior is captured by a small number of document type sequences. This emphasizes the relevance of analyzing the most frequent document paths.
An analysis of the discovered model highlighted the issue of wrong document classification. For instance, certain documents change from a private proposal to a business proposal or vice versa. In consultation with business experts, this problem was selected for further analysis.
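The classification problem can be checked mechanically: a document type trace violates the rule when it contains both a private proposal and a business proposal. The sketch below flags such traces and compares mean throughput times expressed in working days of 9 h, the unit used in the in-depth analysis. It assumes that throughput times are already available in working hours per case (converting calendar timestamps into working hours is not shown) and compares against all other cases, whereas the study compares against a benchmark log; the type labels are plain strings chosen for illustration.

```python
from statistics import mean

WORKING_DAY_HOURS = 9  # throughput is expressed in working days of 9 h


def is_misclassified(doc_types):
    """True if a private and a business proposal occur in the same document type trace."""
    return "private proposal" in doc_types and "business proposal" in doc_types


def mean_working_days(hours):
    return mean(h / WORKING_DAY_HOURS for h in hours)


def classification_impact(doc_type_log, throughput_hours):
    """Return (number of misclassified cases, extra mean throughput in working days)."""
    bad = [throughput_hours[c] for c, t in doc_type_log.items() if is_misclassified(t)]
    good = [throughput_hours[c] for c, t in doc_type_log.items() if not is_misclassified(t)]
    return len(bad), mean_working_days(bad) - mean_working_days(good)


# Hypothetical cases; "proposal" stands for an invented preliminary type.
doc_type_log = {
    "d1": ["proposal", "private proposal"],
    "d2": ["business proposal", "private proposal"],
    "d3": ["private proposal"],
    "d4": ["proposal", "business proposal", "private proposal"],
}
throughput_hours = {"d1": 72, "d2": 153, "d3": 90, "d4": 135}
print(classification_impact(doc_type_log, throughput_hours))  # (2, 7.0)
```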

5.4. In-depth analysis: tracking process inefficiencies
In the in-depth analysis, the investigation was taken a step further, digging deeper into the data to come up with interesting insights. Because the main focus was on performance in terms of throughput time, it was decided to create a benchmark event log to better judge unwanted process behavior.
5.4.1. Performance analysis: creating a benchmark
In order to assess the impact of the compliance issues explained in the next sections, a benchmark log was created first. It represents standard behavior corresponding to very frequent, normal process executions. The analysis started from the document perspective since this was the most intuitive view for management. After investigating the available document flow patterns, it was agreed to select the six most frequent flows to form the benchmark set of cases. These flows are depicted in Fig. 8. By means of the IDs of the cases following these standard document flows, the same cases were also assessed from a control-flow and an organizational perspective. As such, questions like "which teams treated those documents?" and "what business activities are usually performed in a standard process?" were answered.
The histogram in Fig. 9(a) displays the throughput time distribution of the benchmark event log. The x-axis depicts the throughput time, calculated in working days of 9 h; the vertical axis shows the frequency in number of instances. The mean time a document resided in the system before it was terminated is 8.26 working days, with a standard deviation of 8.5 days. In the final analysis sections, we employ the characteristics of this benchmark event log as a point of reference to evaluate bottlenecks and other process inefficiencies.

Fig. 8. Visualization (dependency graph) of the six most frequent document type sequences.

5.4.2. Compliance analysis
Verifying compliance is an important step towards process improvement. Compliance can be assessed by verifying strictly defined rules and regulations, but also by contrasting management's expectations with the actual process behavior. Both types of compliance analysis feature in this section. First, a formal analysis was executed based on a business rule concerning document classification. Next, a more informal analysis examined whether management's expectations regarding reiterated team handovers were confirmed.
5.4.2.1. Wrong document classification. An example of a business rule in the insurance company's back office is the requirement that certain document types are not allowed to appear in the same document type sequence. However, the data contained several violations of this rule. Often, documents were originally classified as a business proposal but later in the process reclassified as a private proposal, or the other way around. Note that, due to the thresholds applied by the HeuristicsMiner algorithm, misclassification paths are not directly detectable in the process model in Fig. 7. Moreover, some paths are present in the model but difficult to identify because the simplified dependency graph notation in ProM disguises the types of splits and joins. Although it is possible to visualize this split/join information, doing so makes a heuristic net harder for business users to interpret and makes the discovered process models less suitable for discussion with organization executives. This is in line with Mendling et al. [12], who investigate the factors influencing process model understandability. Better visualization techniques are therefore required to avoid wrong interpretations and improve understandability for business users.
To estimate the impact of this inefficiency, all document type traces in which both a private proposal and a business proposal occurred were filtered out. There appeared to be 1.422 cases in the data fulfilling this condition. A performance analysis on traces with one or more wrong document classifications showed that such a document took on average 7.41 working days longer to complete than a normal trace (Fig. 9(b)). This is a substantial difference, especially because other faulty document type reclassifications occur in the process as well. Since these can be expected to cause a similar throughput time problem, improvement actions seem indispensable in this area. As such, the analyses can provide important decision support for shaping a process reengineering plan.

Fig. 9. Different control-flow models for the simple event log. (a) Benchmark event log, (b) document classification problem, (c) reiterated handover problem.

5.4.2.2. Reiterated team handover. At the insurance company, teams complained about a large number of documents that were forwarded and then sent back to the sender. It was observed that reiterated handovers between teams occurred in 841 cases. Therefore, we decided to investigate their influence on the efficiency of the process in terms of throughput time. In order to separate the process executions with repeated forwarding, the analysis started from the organizational log. By using the LTL-checker in ProM, we were able to identify and separate those process executions in which a particular team appeared more than once. An example of such a reiterated handover, between the teams Control and A1, is shown on the right-hand side of Fig. 6.
Afterwards, the performance information for all reiterated handover cases combined was examined. Our presumption was confirmed: a reiterated handover trace took longer than standard behavior (Fig. 9(c)). On average, it took twice as long as a normal execution path (17 compared to 8 working days), which is even worse than the document classification problem. Although this behavior represents only 2 percent of the log, it is worth investigating its causes, since it impacts throughput very negatively.
5.5. Results: process improvement measures

Ultimately, the results of the case study were considered remarkable by the organization's management. By contrasting actual behavior as registered in the logged data with expectations and requirements, different insights and guidelines for process improvement could be formulated. Firstly, we observed that a process mining study can only be as good as the quality of its input data. For instance, clear verb-object names for denoting activities prove very helpful for interpreting the data. Furthermore, for enhanced performance analysis, both the start and end timestamps of activities should be recorded. Based on these findings, a number of recommendations were proposed to improve and guarantee data quality.
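The recommendation to record both start and end timestamps can be motivated with a small calculation: only when both are known can a case's throughput time be split into time spent working on activities and time spent waiting between them. The sketch below assumes hypothetical activity instances with start and end times in working hours and strictly sequential (non-overlapping) activities; activity names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityInstance:
    activity: str
    start: float  # working hours since the case entered the system (assumed unit)
    end: float


def decompose(trace):
    """Split throughput time into processing and waiting time.

    Assumes activities do not overlap; with end timestamps only, the
    processing part could not be separated from the waiting part.
    """
    trace = sorted(trace, key=lambda a: a.start)
    processing = sum(a.end - a.start for a in trace)
    throughput = trace[-1].end - trace[0].start
    return processing, throughput - processing


# Hypothetical case with three activities.
case = [
    ActivityInstance("Classify", 0.0, 0.5),
    ActivityInstance("Assess", 4.0, 6.0),
    ActivityInstance("Archive", 20.0, 20.5),
]
print(decompose(case))  # (3.0, 17.5)
```

In this toy case, 17.5 of the 20.5 working hours are waiting time, a distinction that a log with completion timestamps alone cannot reveal.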
Secondly, the opportunity to work out a business case for the introduction of OCR (Optical Character Recognition) technology was identified. Since document misclassification appeared to be an important source of process inefficiency, OCR might resolve this issue. By replacing the current manual process with a fully automated scanning and classification process, in which the type of the document is recognized based on keywords, significant process performance gains can be expected.
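The keyword-based recognition step envisaged here can be illustrated with a deliberately simple scorer that runs on text already produced by OCR. The keyword lists and the fallback label are hypothetical; the paper does not specify how the classification would be implemented, and a production system would more likely derive its rules from labelled documents.

```python
import re

# Hypothetical keyword lists per document type.
KEYWORDS = {
    "business proposal": {"company", "vat", "fleet", "employer"},
    "private proposal": {"household", "family", "homeowner"},
}


def classify(ocr_text, keywords=KEYWORDS, default="unclassified"):
    """Return the document type whose keywords occur most often in the OCR text.

    Ties are resolved in favour of the type listed first; texts without any
    keyword get `default` so that they can be routed to manual classification.
    """
    tokens = re.findall(r"[a-z]+", ocr_text.lower())
    scores = {doc_type: sum(tok in words for tok in tokens) for doc_type, words in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(classify("Proposal for the company fleet, VAT number BE0123"))  # business proposal
print(classify("Home insurance for a family of four"))  # private proposal
print(classify("Letter without any keyword"))  # unclassified
```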
Finally, the exposure of multiple inefficiencies offers a great opportunity for management to make employees aware of them and to improve the business process through better training and guidance. For instance, the downside of frequent reiterated handovers could be emphasized in order to reduce process inefficiency.
6. Conclusion
In this paper, we addressed the applicability of Process Mining in real-life environments. We acknowledge the importance of applied process mining studies, especially since their conclusions often differ from the contributions advanced in studies that introduce novel analysis techniques. Moreover, the related work section indicates that the practical application of process mining techniques deserves more attention, also in the academic literature. Accordingly, we provide a methodological framework structuring the process of a real-life analysis and illustrate its usefulness in a multi-faceted case study situated in the financial services industry. The starting point of the analysis is process data extracted from a Document Management System implemented by a large Belgian insurance company to support its back office processes.
The case study shows that data preparation and data exploration are important elements of a process mining analysis. More specifically, the information feedback loop between process analysts and business experts proves to be a critical success factor. This two-way communication is essential to discover useful business insights with the techniques of the process mining field. Accordingly, the feedback loop allows for improvement of both the process scope and the quality of the data. A second observation is the usefulness of combining several perspectives to get a fuller understanding of the process, which is captured by the third and fourth phases of our methodological framework. First, a discovery analysis can indicate which perspectives are interesting from a management point of view. This enables a more focused and more valuable scope for further analysis. Second, integrating several perspectives in an innovative way makes it possible to search for inefficiencies instead of solely discovering process models. Furthermore, based on the outcomes of our analysis, we find that Process Mining can be an effective and efficient technique to expose organizational challenges. Moreover, a process mining analysis on real-life event logs can be the starting point for the involved management to formulate specific measures to improve the business processes.
Another conclusion is that process mining techniques are applicable to both structured and less structured information systems. In the case of a less process-based information system such as a Document Management System, Process Mining makes it possible to obtain a model of the actual process based on the data in the event log. Process Mining proves especially useful to contrast expected behavior with actual behavior as reflected in the data. Since flexibility is an important characteristic of information systems that are process-aware but not entirely process-based, Process Mining is an ideal means to discover how the system is used and where process inefficiencies should be countered.
Furthermore, the case study uncovers several limitations of applying currently available process mining techniques to real-life data. First of all, the greatest strength of Process Mining, its reliance on the actual data, is a weakness as well. Today, in many organizations, logging infrastructures are imperfect and business activities executed outside the system cannot be taken into account in the analysis. Also, process analysts must rely on the quality of the delivered input data: both the depth and the correctness of the results are largely determined by the quality of the process data. Secondly, process mining techniques still struggle with substantial amounts of data reflecting unstructured process behavior. Highly complex processes, which can be expected in highly flexible environments such as financial services back offices, are a definite challenge. As explained in this study, the use of relevant filters can be a solution to extract important knowledge from such event logs. However, we think that future research in this area is required to enhance and adapt process mining techniques to real-life environments.
Nevertheless, we are convinced that process mining techniques will keep improving and will thereby increase their usefulness in practice. This work shows, for instance, how Process Mining can provide novel and powerful ways to analyze business processes. Notwithstanding a number of interesting approaches (e.g. [9,23]), academic research should focus even more on how process mining techniques can be improved to meet practical requirements with respect to interpretability and scalability.
References
[2] A.K. Alves de Medeiros, A.J.M.M. Weijters, W.M.P. van der Aalst, Genetic process mining: an experimental evaluation, Data Mining and Knowledge Discovery 14 (2) (2007) 245–304.
[3] M. Bozkaya, J. Gabriels, J.M.E.M. van der Werf, Process diagnostics: a method based on process mining, in: A. Kusiak, S. Lee (Eds.), eKNOW, IEEE Computer Society, 2009, pp. 22–27.
[4] M. Dumas, W.M.P. van der Aalst, A.H.M. ter Hofstede, Process Aware Information Systems: Bridging People and Software Through Process Technology, Wiley-Interscience, 2005.
[5] S. Goedertier, J. De Weerdt, D. Martens, J. Vanthienen, B. Baesens, Process discovery in event logs: an application in the telecom industry, Applied Soft Computing 11 (2) (2011) 1697–1710.
[6] M. Golfarelli, S. Rizzi, I. Cella, Beyond data warehousing: what's next in business intelligence? in: I.-Y. Song, K.C. Davis (Eds.), DOLAP, ACM, 2004, pp. 1–6.
[7] D. Grigori, F. Casati, M. Castellanos, U. Dayal, M. Sayal, M.-C. Shan, Business process intelligence, Computers in Industry 3 (2004) 321–343.
[8] C.W. Günther, Process mining in flexible environments, Ph.D. thesis, TU Eindhoven, 2009.
[9] C.W. Günther, W.M.P. van der Aalst, Fuzzy mining: adaptive process simplification based on multi-perspective metrics, in: Proceedings of the 5th International Conference BPM 2007, Brisbane, Australia, September 24–28, 2007, (2007), pp. 328–343.
[10] M. Jans, J.M. van der Werf, N. Lybaert, K. Vanhoof, A business process mining application for internal transaction fraud mitigation, Expert Systems with Applications 38 (10) (2011) 13351–13359.
[11] R.S. Mans, H. Schonenberg, M. Song, W.M.P. van der Aalst, P.J.M. Bakker, Application of process mining in healthcare: a case study in a Dutch hospital, in: A.L.N. Fred, J. Filipe, H. Gamboa (Eds.), BIOSTEC (Selected Papers), Communications in Computer and Information Science, vol. 25, Springer, 2008, pp. 425–438.
[12] J. Mendling, H.A. Reijers, J. Cardoso, What makes process models understandable? in: Proceedings of the 5th International Conference BPM 2007, Brisbane, Australia, September 24–28, 2007, (2007), pp. 48–63.
[13] A. Rebuge, D.R. Ferreira, Business process analysis in healthcare environments: a methodology based on process mining, Information Systems 37 (2) (2012) 99–116.
[14] A. Rozinat, I.S.M. de Jong, C.W. Günther, W.M.P. van der Aalst, Process mining applied to the test process of wafer scanners in ASML, IEEE Transactions on Systems, Man, and Cybernetics, Part C 39 (4) (2009) 474–479.
[15] A. Rozinat, R.S. Mans, M. Song, W.M.P. van der Aalst, Discovering simulation models, Information Systems 34 (3) (2009) 305–327.
[16] M. Song, W.M.P. van der Aalst, Towards comprehensive support for organizational mining, Decision Support Systems 46 (1) (2008) 300–317.
[17] W.M.P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer, 2011.
[18] W.M.P. van der Aalst, H.A. Reijers, A.J.M.M. Weijters, B.F. van Dongen, A.K. Alves de Medeiros, M. Song, H.M.W. Verbeek, Business process mining: an industrial application, Information Systems 32 (5) (2007) 713–732.



[19] W.M.P. van der Aalst, M.H. Schonenberg, M. Song, Time prediction based on process mining, Information Systems 36 (2) (2011) 450–475.
[20] W.M.P. van der Aalst, A.H.M. ter Hofstede, M. Weske, Business process management: a survey, Business Process Management (2003) 1–12.
[21] W.M.P. van der Aalst, A.J.M.M. Weijters, L. Maruster, Workflow mining: discovering process models from event logs, IEEE Transactions on Knowledge and Data Engineering 16 (9) (2004) 1128–1142.
[22] G.M. Veiga, D.R. Ferreira, Understanding spaghetti models with sequence clustering for ProM, in: S. Rinderle-Ma, S.W. Sadiq, F. Leymann (Eds.), Business Process Management Workshops, Lecture Notes in Business Information Processing, vol. 43, Springer, 2009, pp. 92–103.
[23] A.J.M.M. Weijters, J.T.S. Ribeiro, Flexible heuristics miner (FHM), in: CIDM, IEEE, 2011, pp. 310–317.
[24] A.J.M.M. Weijters, W.M.P. van der Aalst, A.K. Alves de Medeiros, Process mining with the HeuristicsMiner algorithm, BETA Working Paper Series 166, TU Eindhoven, 2006.
[25] M. Weske, W.M.P. van der Aalst, H.M.W.E. Verbeek, Advances in business process management, Data & Knowledge Engineering 50 (1) (2004) 1–8.
Jochen De Weerdt received a Master's degree in Business Economics - Information Systems Engineering from KU Leuven, Belgium. He is currently employed as a scientific researcher at the Department of Decision Sciences and Information Management at the KU Leuven. His research interests include data mining, process mining, and web intelligence.

Annelies Schupp received a Master's degree in Business Economics - Information Systems Engineering from the KU Leuven, Belgium. She is currently working as a functional/business analyst at AE nv.

An Vanderloock received a Master's degree in Business Economics - Information Systems Engineering from the KU Leuven, Belgium. She is currently working as a functional/business analyst at AE nv.

Bart Baesens holds a Master's degree in Business Engineering (option: Management Informatics) and a PhD in Applied Economic Sciences from KU Leuven, Belgium. He is currently an associate professor at KU Leuven, and a guest lecturer at the University of Southampton (United Kingdom). He has done extensive research on data mining and its applications, for instance in the fields of CRM, credit risk management, fraud detection, software engineering, business process intelligence, web analytics and mining, and social networks.
