Process Mining 101

Introduction

Modern enterprise systems maintain detailed records of the events that occur during the execution of the business processes they support. For example, a customer relationship management (CRM) system keeps track of virtually every interaction from the moment a customer makes an initial inquiry until the moment the customer places their first purchase order. Meanwhile, an enterprise resource planning (ERP) system at a manufacturing company keeps track of all purchasing events, inventory movements, invoice approvals, and other transactions that occur within the company’s purchase-to-pay process.

The same can be said of industry-specific enterprise systems such as lending management systems (banking), claims management systems (insurance), and hospital management systems (healthcare). Each of these systems supports one or more end-to-end processes and, in doing so, collects records of every step in the process.

These records allow us to retrace the execution of every instance of a process from start to end. For example, the events in a CRM system allow us to see how a customer inquiry becomes a quotation, how a quotation becomes a purchase order, and how this purchase order is delivered and invoiced.

Process mining is a collection of methods to extract and consolidate records of the execution of a business process and to analyze these records by means of different types of visualizations. These visualizations allow us to identify issues and opportunities for improvement such as bottlenecks, sources of waste and root causes of service-level violations.

The starting point for process mining is a collection of records representing every step in the execution of an end-to-end business process, such as an order-to-cash process or a claim-to-resolution process. This collection of records is called an event log. Given an event log, process mining tools provide several analytic capabilities to uncover business process improvement opportunities from different perspectives.

Learn more about the benefits of process mining in our introductory guide “What is Process Mining?”.

The four key capabilities of process mining

Process mining tools such as Apromore provide four main analytics capabilities: automated discovery, conformance checking, performance mining and variant analysis.

Automated process discovery

Automated process discovery techniques take as input an event log and produce an “as-is” model of the business process. In Apromore, the discovered model can take the form of a process map or of a BPMN model.

A process map (also called a dependency graph) shows the activities of the process and the transitions between consecutive activities, also known as “directly-follows relations” between activities. The activities (nodes) and the directly-follows relations may be annotated with frequency information.
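To make the notion of a directly-follows relation concrete, here is a minimal Python sketch that counts activity frequencies and directly-follows frequencies from a toy event log. The log contents and the function name are illustrative assumptions; this is not how Apromore computes its process maps.

```python
from collections import Counter, defaultdict

# Hypothetical toy event log: (case_id, activity, completion_timestamp).
events = [
    ("c1", "Register request", "2024-01-01T09:00"),
    ("c1", "Examine request", "2024-01-01T10:00"),
    ("c1", "Check ticket", "2024-01-01T11:00"),
    ("c2", "Register request", "2024-01-02T09:00"),
    ("c2", "Check ticket", "2024-01-02T09:30"),
    ("c2", "Examine request", "2024-01-02T10:15"),
    ("c3", "Register request", "2024-01-03T09:00"),
    ("c3", "Examine request", "2024-01-03T09:45"),
    ("c3", "Check ticket", "2024-01-03T10:30"),
]


def directly_follows(events):
    """Return (activity frequencies, directly-follows frequencies)."""
    traces = defaultdict(list)
    # ISO 8601 timestamps in a uniform format sort correctly as strings.
    for case_id, activity, _ts in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    node_freq = Counter()
    arc_freq = Counter()
    for trace in traces.values():
        node_freq.update(trace)
        # Pair each activity with the one that immediately follows it.
        arc_freq.update(zip(trace, trace[1:]))
    return node_freq, arc_freq


node_freq, arc_freq = directly_follows(events)
for (source, target), count in sorted(arc_freq.items()):
    print(f"{source} -> {target}: {count}")
```

The nodes of a process map correspond to the keys of `node_freq` and the arcs to the keys of `arc_freq`; the counts are the frequency annotations.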

A BPMN process model shows activities and flows (directly-follows relations) but it also shows different types of gateways, particularly exclusive gateways and parallel gateways. This view allows us to better appreciate the decision points, rework loops and parallel branches of a process.

Automated process discovery techniques also allow us to visualize the social network of a process, in other words, the workers who intervene in the process and the handoffs between these workers.
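As a rough illustration of the handoff idea, the sketch below counts how often work passes from one worker to another within a case. The resource names and the log layout are assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Hypothetical toy log: (case_id, activity, resource, position in time).
events = [
    ("c1", "Register request", "Ann", 1),
    ("c1", "Examine request", "Bob", 2),
    ("c1", "Check ticket", "Bob", 3),
    ("c2", "Register request", "Ann", 1),
    ("c2", "Examine request", "Cy", 2),
    ("c2", "Check ticket", "Ann", 3),
]


def handoffs(events):
    """Count handoffs (source worker, target worker) between consecutive events."""
    by_case = defaultdict(list)
    for case_id, _activity, resource, _ts in sorted(events, key=lambda e: (e[0], e[3])):
        by_case[case_id].append(resource)

    counts = Counter()
    for resources in by_case.values():
        for source, target in zip(resources, resources[1:]):
            if source != target:  # same worker continuing is not a handoff
                counts[(source, target)] += 1
    return counts


handoff_counts = handoffs(events)
print(handoff_counts)
```

Each key of `handoff_counts` can be read as a directed edge of the social network, weighted by how often the handoff was observed.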

The automated process discovery capabilities of Apromore are encapsulated in the Process Discoverer plugin, as shown in the following demonstration:

Automated Discovery in Apromore

Conformance checking

Conformance checking is about checking that the executions of a business process recorded in an event log abide by a prescribed or expected process behavior. The input of conformance checking is either a set of compliance rules or a prescribed process model. The output is a list of violations of the compliance rules or a list of deviations with respect to the process model.

For example, a common compliance rule in a purchase-to-pay process is that an invoice cannot be approved unless the corresponding purchase order has previously been approved. Another recurrent compliance rule is that if an invoice has been approved by a given employee, the corresponding payment must be triggered by a different employee. This latter rule is called the “four-eyes principle”. In this context, the goal of conformance checking is to determine if every execution of the purchase-to-pay process fulfills these and other compliance rules.
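The two rules above can be sketched as simple checks over the events of each case. The activity names (“Approve purchase order”, “Approve invoice”, “Trigger payment”), the resource names and the log layout are assumptions for illustration; a real log will use its own labels, and Apromore’s own rule checking works differently from this sketch.

```python
from collections import defaultdict

# Hypothetical toy log: (case_id, activity, resource, position in time).
events = [
    ("po1", "Approve purchase order", "Ann", 1),
    ("po1", "Approve invoice", "Bob", 2),
    ("po1", "Trigger payment", "Cy", 3),
    ("po2", "Approve invoice", "Bob", 1),        # no prior purchase order approval
    ("po2", "Approve purchase order", "Ann", 2),
    ("po2", "Trigger payment", "Cy", 3),
    ("po3", "Approve purchase order", "Ann", 1),
    ("po3", "Approve invoice", "Bob", 2),
    ("po3", "Trigger payment", "Bob", 3),        # same employee approves and pays
]


def check_rules(events):
    """Return a list of (case_id, violated rule) pairs."""
    cases = defaultdict(list)
    for case_id, activity, resource, _ts in sorted(events, key=lambda e: (e[0], e[3])):
        cases[case_id].append((activity, resource))

    violations = []
    for case_id, trace in cases.items():
        po_approved = False
        invoice_approvers = set()
        for activity, resource in trace:
            if activity == "Approve purchase order":
                po_approved = True
            elif activity == "Approve invoice":
                if not po_approved:
                    violations.append((case_id, "invoice approved before purchase order"))
                invoice_approvers.add(resource)
            elif activity == "Trigger payment" and resource in invoice_approvers:
                violations.append((case_id, "four-eyes principle violated"))
    return violations


violations = check_rules(events)
for case_id, rule in violations:
    print(case_id, "-", rule)
```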

The following video demonstrates how to check compliance with business rules in Apromore:

Conformance Checking in Apromore

Performance mining

Performance mining techniques take as input an event log and extract performance analytics of the underlying business process, which help to answer questions such as: “Where are the bottlenecks in the process?”, “Which activities consume the highest amount of effort (processing time)?”, or “How does the process perform when the workload is higher-than-usual?”

Common performance analytics concern the frequency and duration of the various process activities and handovers between activities, e.g. case frequency or average activity duration.
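As a minimal sketch of these two metrics, the following Python code computes the case frequency and the average duration of each activity. It assumes that every event carries both a start and a completion timestamp; the field layout and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical toy log: (case_id, activity, start, completion).
events = [
    ("c1", "Register request", "2024-01-01 09:00", "2024-01-01 09:10"),
    ("c1", "Examine request", "2024-01-01 10:00", "2024-01-01 10:30"),
    ("c2", "Register request", "2024-01-02 09:00", "2024-01-02 09:20"),
    ("c2", "Examine request", "2024-01-02 09:30", "2024-01-02 10:20"),
    ("c2", "Examine request", "2024-01-02 11:00", "2024-01-02 11:10"),
]


def activity_stats(events):
    """Per activity: number of cases it occurs in and mean duration in minutes."""
    cases = defaultdict(set)
    minutes = defaultdict(list)
    for case_id, activity, start, end in events:
        duration = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        cases[activity].add(case_id)
        minutes[activity].append(duration.total_seconds() / 60)
    return {
        activity: {
            "case_frequency": len(cases[activity]),
            "avg_duration_min": sum(values) / len(values),
        }
        for activity, values in minutes.items()
    }


stats = activity_stats(events)
for activity, values in stats.items():
    print(activity, values)
```

Note that case frequency counts each case once, even when an activity is repeated within the case, whereas the average duration is taken over every occurrence.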

In Apromore, performance analytics can be shown to the user in the form of charts, e.g. via Apromore’s Performance Dashboard, or by “enhancing” a process model automatically discovered from the log via annotations and by color-coding the various elements (e.g. an activity shown in a darker blue has been observed more frequently in the log than the others). Performance analytics can also be exported for analysis via third-party business intelligence tools.

The following video demonstrates how to do performance mining in Apromore:

Performance Mining in Apromore

Variant analysis

Variant analysis techniques take as input two or more event logs (corresponding to different variants of the same business process) and produce as output a list of differences. Typically, one of the event logs contains all the cases that end up in a positive outcome according to some criterion, while the other log contains all the cases that end up in a negative outcome. For example, the first log may contain all cases where the customer was satisfied, while the second one contains all cases that led to a complaint. Or the first log may contain all cases where the process completed on time, while the second one contains the delayed cases. Variant analysis techniques help diagnose the reasons why certain executions of a business process (i.e. certain process cases) do not lead to a desirable outcome.

In order to analyze multiple variants of a process, one has to start by extracting the event logs corresponding to each of these variants. In Apromore, this is achieved by means of log filters. Once the logs of the process variants have been extracted, Apromore allows us to compare two or more variants of a business process in three ways:

By means of side-by-side comparison of the process maps or the BPMN process models of these variants. This comparison can be done using the frequency view or the duration view.
By opening both logs simultaneously using the performance dashboard plugin. This allows us to compare the variants with respect to a wide range of performance metrics and charts.
By means of multi-log animation, which means animating multiple logs simultaneously on top of the same BPMN model.
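Outside of a process mining tool, the basic idea of producing a list of differences between two variants can be sketched in a few lines of Python. The example below compares, for each activity, the share of cases in which it occurs in two logs. The traces, including the “Reexamine request” activity, are invented for illustration, and this is only one of many possible ways to compare variants.

```python
from collections import Counter

# Hypothetical traces (one list of activities per case) for two variants.
on_time = [
    ["Register request", "Examine request", "Check ticket"],
    ["Register request", "Check ticket"],
]
delayed = [
    ["Register request", "Examine request", "Reexamine request", "Check ticket"],
    ["Register request", "Examine request", "Reexamine request", "Check ticket"],
]


def activity_case_share(traces):
    """Fraction of cases in which each activity occurs at least once."""
    counts = Counter()
    for trace in traces:
        counts.update(set(trace))
    return {activity: n / len(traces) for activity, n in counts.items()}


def compare_variants(log_a, log_b):
    """Difference in case share (log_b minus log_a) for every activity."""
    share_a = activity_case_share(log_a)
    share_b = activity_case_share(log_b)
    activities = sorted(set(share_a) | set(share_b))
    return {a: share_b.get(a, 0.0) - share_a.get(a, 0.0) for a in activities}


differences = compare_variants(on_time, delayed)
for activity, delta in differences.items():
    print(f"{activity}: {delta:+.2f}")
```

In this toy data, the activity that appears only in the delayed cases stands out with the largest difference, which is the kind of signal variant analysis looks for.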

The following video demonstrates how to extract and compare multiple variants of a process in Apromore:

Variant analysis in Apromore

Event logs

In order to analyze a business process using process mining techniques, we need to extract an event log from the information system(s) that support the execution of the process. It is possible to extract event logs from almost any enterprise system out there, be it from ERP or CRM systems such as SAP, Dynamics, Salesforce, or ServiceNow, or from vertically specialized systems such as manufacturing execution systems, insurance management systems, hospital management systems, etc.

An event log is a set of event records. Each event record consists of the following attributes:

A reference to the process activity being performed, e.g. ‘Register request’, ‘Examine request’, ‘Check ticket’, and so forth.
Additionally, each event recorded in the log must be linkable to a case, via a case identifier, e.g. the order number for an order-to-cash process, or the claim number for a claims handling process.
In order to reconstruct a case, one must be able to order the events recorded in the log according to the time when they occurred. This requires that each event record have a completion timestamp, capturing the date and time when the activity was completed.
Optionally, additional attributes such as the start timestamp of the activity, the resource that performed the activity, the amount of a request, or the geographic area where the request originated. These can be used to obtain more fine-grained insights and to identify different process variants for our analysis.
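The attributes listed above map naturally onto a table with one row per event. As a minimal sketch, the following Python code reads such a table in CSV form and reconstructs each case as a time-ordered sequence of activities. The column names (`case_id`, `activity`, `end_time`, `resource`) and the sample rows are assumptions; real exports will use their own names and formats.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical event log in CSV form, deliberately not sorted by time.
CSV_TEXT = """case_id,activity,end_time,resource
1001,Examine request,2024-01-01T10:00:00,Bob
1001,Register request,2024-01-01T09:00:00,Ann
1002,Register request,2024-01-02T09:00:00,Ann
1001,Check ticket,2024-01-01T11:00:00,Bob
1002,Check ticket,2024-01-02T09:30:00,Cy
"""


def load_traces(csv_text):
    """Group events by case identifier and order them by completion timestamp."""
    cases = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        completed = datetime.fromisoformat(row["end_time"])
        cases[row["case_id"]].append((completed, row["activity"]))
    # Sorting the (timestamp, activity) pairs orders each case chronologically.
    return {
        case_id: [activity for _, activity in sorted(case_events)]
        for case_id, case_events in cases.items()
    }


traces = load_traces(CSV_TEXT)
for case_id, trace in traces.items():
    print(case_id, trace)
```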

Time to try it yourself

If you wish to replicate the analysis shown in the video demonstrations of this tutorial yourself, you can sign up for a trial of Apromore Enterprise Edition. Once you sign up, you will find a few sample event logs in the Apromore workspace. The event logs that we used in the above videos can be found in the folder “Example Event Logs”. The event log of the manufacturing process is called Production_Data, while one of the Sepsis patient treatment event logs is called Sepsis Cases. You can also find the Production Data log here and the Sepsis Cases log here. These and other event logs are also available in the public collection of event logs of the IEEE Task Force on Process Mining.

Exercise

If you would like to get hands-on with process mining, we close this tutorial with an exercise based on the sepsis patient treatment event log. This event log retraces the clinical pathways of 1050 patients treated for sepsis in a Dutch hospital.

The Sepsis event log is available in the Apromore Community Edition and in the Apromore Enterprise Edition (trial), under the folder “Example Event Logs”. The file is called Sepsis_Cases. In case you don’t find it, you can download it from here (download this file and then upload it into your Apromore workspace).

To start this exercise, you should split the Sepsis event log into two logs for the purpose of variant analysis:

One log should contain all cases where the patient’s age is 30 years or younger (SEPSIS_young).
The second log should contain all cases where the patient’s age is 65 or above (SEPSIS_old).
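In Apromore, this split is done with log filters. For readers who want to see the logic spelled out, here is a minimal Python sketch of the same split on a toy log. The case identifiers, ages and activity names are invented, and the sketch assumes the age is available as one number per case.

```python
# Hypothetical case attribute: patient age per case.
case_ages = {"p1": 25, "p2": 70, "p3": 45, "p4": 30, "p5": 65}

# Hypothetical toy log: (case_id, activity).
events = [
    ("p1", "Register"), ("p1", "Release"),
    ("p2", "Register"), ("p2", "Triage"), ("p2", "Release"),
    ("p3", "Register"), ("p3", "Release"),
    ("p4", "Register"), ("p4", "Triage"),
    ("p5", "Register"), ("p5", "Release"),
]


def split_by_age(case_ages, young_max=30, old_min=65):
    """Return the case ids of the young and the old variant."""
    young = {case for case, age in case_ages.items() if age <= young_max}
    old = {case for case, age in case_ages.items() if age >= old_min}
    return young, old


def filter_log(events, case_ids):
    """Keep only the events that belong to the given cases."""
    return [event for event in events if event[0] in case_ids]


young_cases, old_cases = split_by_age(case_ages)
sepsis_young = filter_log(events, young_cases)
sepsis_old = filter_log(events, old_cases)
print(len(sepsis_young), "events in the young variant")
print(len(sepsis_old), "events in the old variant")
```

Cases with an age between the two thresholds (such as `p3` here) belong to neither log, which matches the way the exercise defines the two variants.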

Given these two logs, answer the following questions using Apromore:

What is the average case duration (also called cycle time) of each of these two variants of the process?
What are the bottlenecks of each of these two variants? In other words, what are the transitions between tasks with the highest waiting times?
Describe the differences between the frequency and the order in which activities are executed in the two variants. Hint: In the Process Discoverer plugin, consider using the abstraction slider to hide some of the most infrequent arcs so as to make the maps more readable.
In addition to the differences in bottlenecks, and the differences in activity frequencies, what other differences do you observe between these two variants, which might help to explain the observed differences in cycle times?

Thinking about using Apromore in a project?

Get in touch to learn more about how you can get started with Apromore through a Proof-of-Value project.

If you want to learn more about the fascinating world of process mining, you can enroll in a public course on process mining by The University of Melbourne. Alternatively, you can contact us to discuss your corporate training requirements on process mining and Apromore. We have a range of training courses delivered both online and face-to-face.