Process Mining 101

This introductory tutorial on process mining offers a tour of the four main capabilities of process mining, and illustrates these using Apromore. The tutorial lasts approximately 90 minutes.


Today’s organizations rely heavily on enterprise systems to support their business operations. These include enterprise resource planning (ERP) systems, which support processes such as production, delivery, and payroll, and customer relationship management (CRM) systems, which support marketing, sales, and customer service processes. These systems record a vast amount of valuable data, which can be analyzed to reveal useful information to support business operations.

Process mining is a family of techniques to analyze the performance and conformance of business processes based on data produced during their execution. Process mining allows us to extract events recorded by standardized enterprise systems, such as ERP or CRM systems, as well as by industry-specific systems, such as claims management systems (insurance), hospital management systems (healthcare) or student management systems (education). For example, every time a purchase order arrives, or every time a service is delivered, an event is recorded in an ERP system. Process mining techniques take as input large collections of such events, which we call event logs. Given such event logs, process mining techniques allow us to discover and analyze the process in order to identify sources of inefficiencies and defects, and to drill down into their root causes.

Introducing process mining (Copyright 2015, Queensland University of Technology).

Benefits of process mining

Process mining techniques complement tactical business intelligence dashboards. While tactical BI dashboards allow managers and analysts to get an aggregate picture of the health of a business process, process mining techniques allow us to dig deeper. Specifically, they allow us:

  • to understand how the process is actually being executed (“what tasks are performed and in what order?”),
  • to break down the performance of a process to the level of individual tasks, resources, and handoffs (e.g. “what’s the most congested part of the process?”, “what’s the average waiting time between these two activities?”),
  • to differentiate between process variants (“why do certain low-value loan applications take longer than others to be processed?”),
  • to characterise deviations from policies/regulations and understand the root causes of such deviations (e.g. “how frequently are invoices released without being approved, and why?”), and
  • to predict future process performance and outcomes as processes unfold (“will claim 123 be handled within 5 days, as per our SLA?”, “will loan offer 456 be rejected by the client?”). 

Process mining allows analysts to extract a wide range of insights. In particular, it allows analysts to:

  • identify root causes explaining why certain process variants perform better or worse than others,
  • confirm or refute long-held beliefs about the behavior of a process,
  • identify problem areas in existing business processes, such as anomalies and bottlenecks, and
  • uncover violations of compliance rules, or other deviations from the expected behavior of the process, which might be affecting the performance of the process.

The four key capabilities of process mining

Process mining tools such as Apromore provide four main analytics capabilities: automated discovery, conformance checking, performance mining and variant analysis.

The four main capabilities of process mining (Copyright 2015, Queensland University of Technology)

Automated process discovery techniques take as input an event log and produce an “as-is” model of the business process. In Apromore, the discovered model can take the form of a process map or of a BPMN model.

A process map (also called a dependency graph) shows the activities of the process and the transitions between consecutive activities, also known as “directly-follows relations” between activities. The activities (nodes) and the directly-follows relations (arcs) may be annotated with frequency information.
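
As a minimal sketch of how such a map can be derived, the Python snippet below counts directly-follows relations in a small, made-up event log. The log contents, the tuple layout and the function name `directly_follows` are illustrative assumptions; this is not a description of how Apromore computes its process maps.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, completion timestamp).
# ISO 8601 strings in a uniform format sort chronologically.
event_log = [
    ("c1", "Register request", "2024-01-01T09:00"),
    ("c1", "Examine request", "2024-01-01T10:00"),
    ("c1", "Decide", "2024-01-01T11:00"),
    ("c2", "Register request", "2024-01-02T09:00"),
    ("c2", "Check ticket", "2024-01-02T09:30"),
    ("c2", "Decide", "2024-01-02T12:00"),
    ("c3", "Register request", "2024-01-03T09:00"),
    ("c3", "Examine request", "2024-01-03T09:45"),
    ("c3", "Decide", "2024-01-03T10:15"),
]


def directly_follows(log):
    """Count how often one activity is directly followed by another within a case."""
    traces = defaultdict(list)
    # Group events by case, ordered by timestamp within each case.
    for case_id, activity, _timestamp in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its immediate successor in the same case.
        counts.update(zip(activities, activities[1:]))
    return counts


dfg = directly_follows(event_log)
for (source, target), frequency in sorted(dfg.items()):
    print(f"{source} -> {target}: {frequency}")
```

Each key of `dfg` corresponds to an arc of the process map, and its count is the frequency annotation on that arc.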

A BPMN process model shows activities and flows (directly-follows relations) but it also shows different types of gateways, particularly exclusive gateways and parallel gateways. This view allows us to better appreciate the decision points, rework loops and parallel branches of a process.

Automated process discovery techniques also allow us to visualize the social network of a process, in other words, the workers who take part in the process and the handoffs between these workers.
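
To illustrate the idea of handoffs, here is a small sketch that counts how often work passes from one worker to another within a case. The event data, the resource names and the function name `handoffs` are assumptions made for this example only.

```python
from collections import Counter
from itertools import groupby

# Hypothetical events: (case id, completion timestamp, resource).
events = [
    ("c1", "2024-01-01T09:00", "Alice"),
    ("c1", "2024-01-01T10:00", "Bob"),
    ("c1", "2024-01-01T11:00", "Bob"),
    ("c1", "2024-01-01T12:00", "Carol"),
    ("c2", "2024-01-02T09:00", "Alice"),
    ("c2", "2024-01-02T10:30", "Bob"),
]


def handoffs(events):
    """Count handoffs (giver, receiver) between consecutive events of a case."""
    counts = Counter()
    ordered = sorted(events)  # tuples sort by case id, then by timestamp
    for _case_id, case_events in groupby(ordered, key=lambda e: e[0]):
        resources = [resource for _, _, resource in case_events]
        for giver, receiver in zip(resources, resources[1:]):
            if giver != receiver:  # the same worker continuing is not a handoff
                counts[(giver, receiver)] += 1
    return counts


handoff_counts = handoffs(events)
for (giver, receiver), n in sorted(handoff_counts.items()):
    print(f"{giver} -> {receiver}: {n}")
```

The resulting pairs can be read as the weighted edges of a social network in which the nodes are workers.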

The automated process discovery capabilities of Apromore are encapsulated in the Process Discoverer plugin, as shown in the following demonstration.

Automated process discovery in Apromore

Conformance checking is about verifying that the executions of a business process recorded in an event log abide by the prescribed or expected process behavior. The input of conformance checking is an event log together with either a set of compliance rules or a prescribed process model. The output is a list of violations of the compliance rules or a list of deviations with respect to the process model.

For example, a common compliance rule in a purchase-to-pay process is that an invoice cannot be approved, unless the corresponding purchase order has been previously approved. Another recurrent compliance rule is that if an invoice has been approved by a given employee, the corresponding payment must be triggered by a different employee. This latter rule is called the “four-eyes principle”. In this context, the goal of conformance checking is to determine if every execution of the purchase-to-pay process fulfills these and other compliance rules.
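
The two rules above can be sketched as simple checks over the events of each case. The snippet below is a hedged illustration: the activity names (“Approve purchase order”, “Approve invoice”, “Trigger payment”), the sample data and the function `check_compliance` are hypothetical, and the code is unrelated to how Apromore evaluates compliance rules.

```python
from collections import defaultdict

# Hypothetical purchase-to-pay events: (case id, activity, timestamp, resource).
events = [
    ("p1", "Approve purchase order", "2024-03-01T09:00", "Ann"),
    ("p1", "Approve invoice", "2024-03-05T10:00", "Ben"),
    ("p1", "Trigger payment", "2024-03-06T10:00", "Cho"),
    ("p2", "Approve invoice", "2024-03-02T09:00", "Ben"),
    ("p2", "Approve purchase order", "2024-03-03T09:00", "Ann"),
    ("p2", "Trigger payment", "2024-03-04T09:00", "Cho"),
    ("p3", "Approve purchase order", "2024-03-01T08:00", "Ann"),
    ("p3", "Approve invoice", "2024-03-02T08:00", "Ben"),
    ("p3", "Trigger payment", "2024-03-03T08:00", "Ben"),
]


def check_compliance(events):
    """Return a list of (case id, violated rule) pairs."""
    cases = defaultdict(list)
    for case_id, activity, _ts, resource in sorted(events, key=lambda e: (e[0], e[2])):
        cases[case_id].append((activity, resource))

    violations = []
    for case_id, trace in cases.items():
        po_approved = False
        invoice_approvers = set()
        for activity, resource in trace:
            if activity == "Approve purchase order":
                po_approved = True
            elif activity == "Approve invoice":
                # Rule 1: the purchase order must already be approved.
                if not po_approved:
                    violations.append((case_id, "invoice approved before purchase order"))
                invoice_approvers.add(resource)
            elif activity == "Trigger payment" and resource in invoice_approvers:
                # Rule 2 (four-eyes): approver and payer must be different people.
                violations.append((case_id, "four-eyes principle violated"))
    return violations


violations = check_compliance(events)
for case_id, rule in violations:
    print(case_id, rule)
```

In this made-up log, case p1 satisfies both rules, while p2 and p3 each violate one of them.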

The following video demonstrates how to check compliance with business rules in Apromore.

Conformance checking in Apromore

Performance mining techniques take as input an event log and extract performance analytics of the underlying business process, which help to answer questions such as: “Where are the bottlenecks in the process?”, “Which activities consume the highest amount of effort (processing time)?”, or “How does the process perform when the workload is higher than usual?”

Common performance analytics concern the frequency and duration of the various process activities and handovers between activities, e.g. case frequency or average activity duration.
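
As a rough illustration of these two measures, the sketch below computes the case frequency and the average duration of each activity from a made-up log that records both start and completion timestamps. The data and the function name `activity_stats` are assumptions for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case id, activity, start timestamp, completion timestamp).
events = [
    ("c1", "Register request", "2024-01-01T09:00", "2024-01-01T09:10"),
    ("c1", "Examine request", "2024-01-01T10:00", "2024-01-01T10:40"),
    ("c2", "Register request", "2024-01-02T09:00", "2024-01-02T09:20"),
    ("c2", "Examine request", "2024-01-02T11:00", "2024-01-02T11:20"),
    ("c3", "Register request", "2024-01-03T09:00", "2024-01-03T09:15"),
]


def activity_stats(events):
    """Per activity: number of cases it occurs in and average duration in minutes."""
    durations = defaultdict(list)
    cases = defaultdict(set)
    for case_id, activity, start, end in events:
        elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        durations[activity].append(elapsed.total_seconds() / 60)
        cases[activity].add(case_id)
    return {
        activity: {
            "case_frequency": len(cases[activity]),
            "avg_duration_min": sum(values) / len(values),
        }
        for activity, values in durations.items()
    }


stats = activity_stats(events)
for activity, values in stats.items():
    print(activity, values)
```

Note that activity durations can only be computed when the log contains start timestamps in addition to completion timestamps.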

In Apromore, performance analytics can be shown to the user in the form of charts, e.g. via Apromore’s Performance Dashboard, or by “enhancing” a process model automatically discovered from the log via annotations and by color-coding the various elements (e.g. an activity in a darker blue color indicates that the activity has been observed more frequently than the others in the log). Performance analytics can also be exported for analysis via third-party business intelligence tools.

The following video demonstrates how to do performance mining in Apromore.

Performance mining in Apromore

Variant analysis techniques take as input two or more event logs (corresponding to different variants of the same business process) and produce as output a list of differences. Typically, one of the event logs contains all the cases that end up in a positive outcome according to some criterion, while the other log contains all the cases that end up in a negative outcome. For example, the first log may contain all cases where the customer was satisfied, while the second one contains all cases that led to a complaint. Or the first log may contain all cases where the process completed on time, while the second one contains the delayed cases. Variant analysis techniques help diagnose the reasons why certain executions of a business process (i.e. certain process cases) do not lead to a desirable outcome.
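
The on-time versus delayed example can be sketched in a few lines of Python. The snippet splits a made-up log into two variants using a hypothetical five-day deadline and then compares, per activity, the share of cases in which it occurs. The data, the deadline and the function names are illustrative assumptions; real variant analysis techniques compare many more aspects than this.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical events: (case id, activity, completion timestamp).
events = [
    ("c1", "Register", "2024-01-01T09:00"),
    ("c1", "Approve", "2024-01-02T09:00"),
    ("c2", "Register", "2024-01-01T09:00"),
    ("c2", "Request more info", "2024-01-04T09:00"),
    ("c2", "Approve", "2024-01-09T09:00"),
    ("c3", "Register", "2024-01-05T09:00"),
    ("c3", "Request more info", "2024-01-08T09:00"),
    ("c3", "Approve", "2024-01-12T09:00"),
]


def split_by_deadline(events, deadline):
    """Split cases into on-time and delayed variants based on their cycle time."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        cases[case_id].append((datetime.fromisoformat(timestamp), activity))
    on_time, delayed = {}, {}
    for case_id, trace in cases.items():
        trace.sort()  # order the events of the case by timestamp
        cycle_time = trace[-1][0] - trace[0][0]
        target = on_time if cycle_time <= deadline else delayed
        target[case_id] = [activity for _, activity in trace]
    return on_time, delayed


def activity_case_share(variant):
    """Fraction of the variant's cases in which each activity occurs."""
    counts = Counter(a for trace in variant.values() for a in set(trace))
    return {a: n / len(variant) for a, n in counts.items()}


on_time, delayed = split_by_deadline(events, deadline=timedelta(days=5))
share_on_time = activity_case_share(on_time)
share_delayed = activity_case_share(delayed)

# Positive values: the activity is more common among delayed cases.
differences = {
    a: share_delayed.get(a, 0.0) - share_on_time.get(a, 0.0)
    for a in set(share_on_time) | set(share_delayed)
}
for activity, diff in sorted(differences.items()):
    print(f"{activity}: {diff:+.2f}")
```

In this toy log, the only activity that distinguishes the delayed cases is “Request more info”, which is the kind of difference a variant analysis aims to surface.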

In order to analyze multiple variants of a process, one has to start by extracting the event logs corresponding to each of these variants. In Apromore, this is achieved by means of log filters. Once the logs of the process variants have been extracted, Apromore allows us to compare two or more variants of a business process in three ways:

  • By means of side-by-side comparison of the process maps or the BPMN process models of these variants. This comparison can be done using the frequency view or the duration view.
  • By opening both logs simultaneously using the performance dashboard plugin. This allows us to compare the variants with respect to a wide range of performance metrics and charts.
  • By means of multi-log animation, which means running an animation of multiple logs simultaneously on top of the same BPMN model.

The following video demonstrates how to extract and compare multiple variants of a process in Apromore.

Business process variant analysis in Apromore

Event logs

In order to analyze a business process using process mining techniques, we need to extract an event log from the information system(s) that support the execution of the process. It is possible to extract event logs from almost any enterprise system out there, be it from ERP or CRM systems such as SAP, Dynamics, Salesforce, or ServiceNow, or from vertically specialized systems such as manufacturing execution systems, insurance management systems, hospital management systems, etc.

Attributes of an event log. The attributes marked in red are mandatory.

An event log is a set of event records. Each event record has the following attributes:

  • An activity: a reference to the process activity being performed, e.g. ‘Register request’, ‘Examine request’, ‘Check ticket’, and so forth.
  • A case identifier: each event recorded in the log must be linkable to a case, e.g. via the order number for an order-to-cash process, or the claim number for a claims handling process.
  • A completion timestamp: in order to reconstruct how each case unfolded, one must be able to order the events recorded in the log according to the time when they occurred. This requires that each event record has a completion timestamp, capturing the date and time when the activity was completed.
  • Optionally, additional attributes, such as the start timestamp of the activity, the resource that performed it, the amount of a request, or the geographic area where the request originated. These can be used to obtain more fine-grained insights and to identify different process variants for our analysis.

Structure of an event log (Copyright 2015, Queensland University of Technology)
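
To make this structure concrete, the following sketch reads a tiny event log from CSV text and checks that the three required attributes described above (case identifier, activity and completion timestamp) are present in every record. The column names, the sample rows, the `Event` class and the `read_event_log` function are assumptions chosen for this example; they are not a format prescribed by Apromore.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    case_id: str
    activity: str
    end_time: datetime
    attributes: dict = field(default_factory=dict)  # optional attributes


# Hypothetical CSV export of an event log.
RAW = """case_id,activity,end_time,resource,amount
1001,Register request,2024-01-01T09:00,Alice,250
1001,Examine request,2024-01-01T10:30,Bob,250
1002,Register request,2024-01-02T08:15,Alice,900
"""

MANDATORY = ("case_id", "activity", "end_time")


def read_event_log(text):
    """Parse CSV text into Event objects, rejecting incomplete records."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        missing = [name for name in MANDATORY if not row.get(name)]
        if missing:
            raise ValueError(f"event is missing mandatory attributes: {missing}")
        optional = {k: v for k, v in row.items() if k not in MANDATORY}
        events.append(
            Event(
                case_id=row["case_id"],
                activity=row["activity"],
                end_time=datetime.fromisoformat(row["end_time"]),
                attributes=optional,
            )
        )
    return events


log = read_event_log(RAW)
for event in log:
    print(event.case_id, event.activity, event.end_time, event.attributes)
```

Everything beyond the three required columns ends up in `attributes`, mirroring the optional attributes that enable finer-grained analysis.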

Time to try it yourself

If you wish to replicate the analyses shown in the video demonstrations of this tutorial yourself, you can sign up for a trial of Apromore Enterprise Edition. Once you sign up, you will find a few sample event logs in the Apromore workspace. The event logs that we used in the above videos can be found in the folder “Example Event Logs”. The event log of the manufacturing process is called Production_Data, while the one of the sepsis patient treatment process is called Sepsis Cases. You can also download the Production Data log from here and the Sepsis Cases log from here. These and other event logs are also available in the public collection of event logs of the IEEE Task Force on Process Mining.


In case you would like to get hands-on with process mining, we close this tutorial with an exercise based on the sepsis patient treatment event log. This event log retraces the clinical pathways of 1050 patients treated for sepsis in a Dutch hospital.

The Sepsis event log is available in the Apromore Community Edition and in the Apromore Enterprise Edition (trial), under folder “Example Event Logs”. The file is called Sepsis_Cases. In case you don’t find it, you can download it from here (download this file and then upload it into your Apromore workspace).

To start this exercise, you should split the Sepsis event log into two logs for the purpose of variant analysis:

  • One log should contain all cases where the patient’s age is 30 years or younger (SEPSIS_young).
  • The second log should contain all cases where the patient’s age is 65 or above (SEPSIS_old).

Given these two logs, answer the following questions using Apromore:

  • What is the average case duration (also called cycle time) of each of these two variants of the process?
  • What are the bottlenecks of each of these two variants? In other words, what are the transitions between tasks with the highest waiting times?
  • Describe the differences between the frequency and the order in which activities are executed in the two variants. Hint: In the Process Discoverer plugin, consider using the abstraction slider to hide some of the most infrequent arcs so as to make the maps more readable.
  • In addition to the differences in bottlenecks, and the differences in activity frequencies, what other differences do you observe between these two variants, which might help to explain the observed differences in cycle times?

Thinking about using Apromore in a project?

Get in touch to know more about how you can get started with Apromore, with a Proof-of-Value project.

If you want to learn more about the fascinating world of process mining, you can enroll in a public course on process mining by The University of Melbourne. Alternatively, you can contact us to discuss your corporate training requirements on process mining and Apromore. We have a range of training courses delivered both online and face-to-face.
