Canonical format

Apromore’s canonical process format provides a common, unambiguous representation of business processes captured in different notations and/or at different abstraction levels, such that all process models can be treated alike. The idea behind this format is to represent only the structural characteristics of a process model that are common to the majority of modeling languages. Language-specific concepts are omitted because they cannot be meaningfully interpreted when dealing with process models originating from different notations, i.e. when “cross-language” operations have to be performed. Moreover, this canonical format is agnostic to graphical information such as shapes, line thickness and positions, which is contained in a concrete process definition. This information is stored separately in the form of annotations, and only used when a canonical model needs to be presented to the user or converted to a native format.

There are at least five advantages in using a canonical format for the provision of advanced process model repository features:

Standardization: a canonical format makes it possible to standardize software access to process definitions via a set of APIs. This is achieved through the Canonizer service, which allows the various algorithms to work on a common process structure. In this way, cross-language operations such as similarity search and process merging can be performed directly and chained together, i.e. without the need to first convert a model into another model’s notation.

Efficiency: avoiding language conversions in turn improves the overall system efficiency. Moreover, canonical elements can be indexed by their specific meaning in order to speed up queries, e.g. searching for all clone fragments within the repository. Searching large collections for models with particular properties can be very time-consuming, so a single optimized format that avoids on-the-fly, ad-hoc conversions is preferable from a performance point of view.

Interchangeability: annotations also capture non-structural aspects of a process model, such as graphical information or process semantics, which can be automatically inferred from a concrete process definition. By organizing these annotations in profiles, a profile inferred from a process model can be applied to another canonical model, and a canonical model can have multiple profiles. In this way it is for example possible to switch between different graphical representations while keeping the same process structure. The same mechanism can also be used to attach process semantics to process structures, in order to perform operations such as soundness analysis.

Reusability: the canonical format is also used as the format for storing business process patterns and industry reference models. On the one hand, this facilitates the execution of those operations that involve such content, e.g. conformance analysis or pattern-based completion. On the other hand, it makes this content virtually available in every process modeling language that is supported by the repository.

Flexibility: the elements of a canonical format are defined through an inheritance mechanism such that at the highest abstraction level a process is simply seen as a directed, attributed graph. This allows algorithms to treat process models at different levels of granularity, depending on the type of operation required by the user.

Without a common process format, a variant of each algorithm would need to be implemented for every (new) process modeling language. Moreover, ad-hoc conversions from one language to another would need to be put in place, to allow cross-language operations such as comparisons and merges.
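The annotation profiles described under Interchangeability can be illustrated with a small Python sketch. It is not Apromore's actual data model: the class names CanonicalModel and Profile, the function apply_profile and the shape values are all assumptions made for illustration. The point is only that the process structure is stored once, while several interchangeable profiles carry the non-structural information.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalModel:
    """Structure only: node identifiers and directed edges (hypothetical class)."""
    node_ids: list
    edges: list  # list of (source id, target id) pairs


@dataclass
class Profile:
    """A named set of annotations, keyed by node id (hypothetical class)."""
    name: str
    annotations: dict = field(default_factory=dict)


def apply_profile(model, profile):
    """Pair every node of the model with its annotations from the profile.

    Nodes without an entry in the profile get an empty annotation dict;
    the model itself is never modified.
    """
    return {node_id: profile.annotations.get(node_id, {}) for node_id in model.node_ids}


model = CanonicalModel(node_ids=["a", "b"], edges=[("a", "b")])

# Two graphical profiles for the same structure; the shape names are made up.
look_one = Profile("look-one", {"a": {"shape": "rounded-rectangle"}, "b": {"shape": "circle"}})
look_two = Profile("look-two", {"a": {"shape": "hexagon"}})

print(apply_profile(model, look_one)["a"]["shape"])
print(apply_profile(model, look_two)["a"]["shape"])
```

Switching from one profile to the other changes how node "a" would be drawn, while the node list and the edges of the canonical model stay the same.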


The meta-model of the canonical process format is defined using the UML class diagram shown below.

A CanonicalProcess is a container for a set of Nets, ResourceTypes and Objects. Each Net is a directed, attributed graph made up of Nodes and Edges, and represents a process or a subprocess. The top process is indicated as root, while all other Nets are marked as subnets. Nodes can be of type Routing or Work, while Edges represent links between Nodes. Routing nodes capture all elements of a process model which are used for routing purposes (i.e. no work is performed from a business perspective), and as such they have more than one incoming edge and/or more than one outgoing edge. They can be Splits (ORSplit for inclusive data-driven choice, XORSplit for exclusive data-driven choice and ANDSplit for parallel branching), Joins (ORJoin for synchronizing merge, XORJoin for simple merge and ANDJoin for synchronization), and States (to indicate the state before an event-driven decision is made or right after a merge). Splits have one incoming edge and multiple outgoing edges, Joins have multiple incoming edges and one outgoing edge, and States can have multiple incoming and outgoing edges. The conditions upon which an (X)ORSplit choice is made must be specified via the attribute condition of each Edge leaving the (X)ORSplit. In addition, one such Edge can be marked as default to indicate the branch to be chosen if the conditions associated with all other Edges leaving the same Split evaluate to false.
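The structural rules for Splits and Joins can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: the classes Node, Edge and Net and the function check_routing are hypothetical and do not reproduce Apromore's schema or API. It also assumes that an Edge marked as default does not need a condition of its own, and that at most one Edge per Split may be the default.

```python
from dataclasses import dataclass, field
from typing import Optional

SPLITS = {"ORSplit", "XORSplit", "ANDSplit"}
JOINS = {"ORJoin", "XORJoin", "ANDJoin"}


@dataclass
class Node:
    id: str
    kind: str  # e.g. "Task", "Event", "State", "XORSplit", "ANDJoin"


@dataclass
class Edge:
    source: str
    target: str
    condition: Optional[str] = None
    default: bool = False


@dataclass
class Net:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, source, target, condition=None, default=False):
        self.edges.append(Edge(source, target, condition, default))

    def incoming(self, node_id):
        return [e for e in self.edges if e.target == node_id]

    def outgoing(self, node_id):
        return [e for e in self.edges if e.source == node_id]


def check_routing(net):
    """Return a list of violations of the Split/Join rules (empty if none)."""
    problems = []
    for node in net.nodes.values():
        n_in = len(net.incoming(node.id))
        out = net.outgoing(node.id)
        if node.kind in SPLITS and not (n_in == 1 and len(out) > 1):
            problems.append(f"{node.id}: a Split needs one incoming and several outgoing edges")
        if node.kind in JOINS and not (n_in > 1 and len(out) == 1):
            problems.append(f"{node.id}: a Join needs several incoming and one outgoing edge")
        if node.kind in {"ORSplit", "XORSplit"}:
            for edge in out:
                # Assumption: the default edge may omit its condition.
                if edge.condition is None and not edge.default:
                    problems.append(f"{node.id}: edge to {edge.target} has no condition")
            if sum(edge.default for edge in out) > 1:
                problems.append(f"{node.id}: more than one default edge")
    return problems


net = Net()
net.add_node("check", "Task")
net.add_node("x1", "XORSplit")
net.add_node("approve", "Task")
net.add_node("reject", "Task")
net.add_edge("check", "x1")
net.add_edge("x1", "approve", condition="amount < 1000")
net.add_edge("x1", "reject", default=True)

print(check_routing(net))  # prints []
```

States are deliberately left unchecked here, since they may have any number of incoming and outgoing edges.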

Unlike Routing nodes, Work nodes capture those elements of a process which are relevant from a business perspective. Work nodes have at most one incoming edge and one outgoing edge and can be partitioned into Tasks and Events. A Task node models a process element which actively performs some work as part of a process, e.g. preparing an invoice or processing a message. Task nodes can be atomic, or compound if they enclose a net describing their behavior. The enclosed net is indicated as subnet. Events are used to signal the beginning or the end of a process, or to signal something that has happened during a process execution. Events can be specialized into Message events, to capture a message being sent or received, and Time events, to capture e.g. a timeout or a delay.
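The partition of Work nodes into Tasks and Events, and the nesting of subnets inside compound Tasks, maps naturally onto a small class hierarchy. The following Python sketch is an assumption-laden illustration, not the canonical format's real schema: the class names MessageEvent and TimeEvent, the direction field and the helper count_atomic_tasks are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    id: str
    name: str = ""


@dataclass
class Work(Node):
    """Node that is relevant from a business perspective."""


@dataclass
class Net:
    id: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Task(Work):
    subnet: Optional[Net] = None  # set only for compound tasks

    @property
    def is_compound(self):
        return self.subnet is not None


@dataclass
class Event(Work):
    """Start, end, or something that happened during execution."""


@dataclass
class MessageEvent(Event):
    direction: str = "received"  # hypothetical field: "sent" or "received"


@dataclass
class TimeEvent(Event):
    """A timeout or a delay."""


def count_atomic_tasks(net):
    """Count atomic tasks, descending into the subnets of compound tasks."""
    total = 0
    for node in net.nodes:
        if isinstance(node, Task):
            total += count_atomic_tasks(node.subnet) if node.is_compound else 1
    return total


invoice_steps = Net("n2", [Task("t2", "Fill out invoice"), Task("t3", "Check invoice")])
root = Net("n1", [
    Event("e1", "Start"),
    Task("t1", "Prepare invoice", subnet=invoice_steps),
    MessageEvent("e2", "Invoice sent", direction="sent"),
    Task("t4", "Archive invoice"),
    Event("e3", "End"),
])

print(count_atomic_tasks(root))  # prints 3
```

An algorithm that only needs the coarse view can treat "Prepare invoice" as a single Task, while one that needs the detail can follow the subnet, as count_atomic_tasks does.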

Work nodes can be associated with one or more ResourceTypes and Objects. Each ResourceType captures a class of organizational resources participating in the process, i.e. a group of concrete resources rather than the resources themselves. These can be Human, e.g. a position or role in an organization, or Nonhuman, e.g. an information system or equipment. For instance, the Human ResourceType “Finance Officer” may refer to the set of persons of an organization with role Finance Officer. ResourceTypes can have one or more specializations, e.g. “Finance Officer” may be specialized into “Senior Finance Officer” and “Junior Finance Officer”. This relation is transitive and antisymmetric, and typically indicates a separation of duties. Each association between a Work node and a ResourceType indicates that a resource of that ResourceType is required to carry out the Work node. Therefore, a Work node associated with the same ResourceType n times means that n resources of that ResourceType are required to carry out the given Work node (e.g. this captures the concept of teamwork for human resources, i.e. a set of persons all working on the same task). The association between Work nodes and ResourceTypes can specify a qualifier to indicate the status a given ResourceType takes when performing the associated Work node, e.g. only one person of all the persons with role “Finance Officer” associated with Work node “Prepare invoice” is qualified as Responsible person. The association between a Work node and a ResourceType can be optional to indicate that the work may be performed without involving the specific resource (see the attribute optional of resourceTypeRef).
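Two points from this paragraph lend themselves to a short Python sketch: counting how many resources of each ResourceType a Work node requires, and following the transitive specialization relation. Everything here is illustrative: the class ResourceTypeRef only loosely mirrors the resourceTypeRef element mentioned above, the function names are made up, and the "Chief Finance Officer" and "Accounting System" entries are hypothetical data added to show transitivity and optionality.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Direct specializations; the "Chief Finance Officer" entry is hypothetical.
SPECIALIZATIONS = {
    "Finance Officer": ["Senior Finance Officer", "Junior Finance Officer"],
    "Senior Finance Officer": ["Chief Finance Officer"],
}


def all_specializations(resource_type, relation=SPECIALIZATIONS):
    """Return direct and indirect specializations (the relation is transitive)."""
    found = set()
    stack = list(relation.get(resource_type, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(relation.get(current, []))
    return sorted(found)


@dataclass(frozen=True)
class ResourceTypeRef:
    resource_type: str
    qualifier: Optional[str] = None
    optional: bool = False


def required_resources(refs):
    """Count mandatory resources per ResourceType; n references mean n resources."""
    return Counter(ref.resource_type for ref in refs if not ref.optional)


# Associations of the Work node "Prepare invoice".
prepare_invoice_refs = [
    ResourceTypeRef("Finance Officer", qualifier="Responsible"),
    ResourceTypeRef("Finance Officer"),
    ResourceTypeRef("Accounting System", optional=True),
]

required = required_resources(prepare_invoice_refs)
print(required["Finance Officer"])  # prints 2
print(all_specializations("Finance Officer"))
```

Optional associations are left out of the count because the Work node may be carried out without them.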

Objects capture organizational business objects that are involved in the process. These can be physical artifacts, e.g. a paper-based invoice (Hard object), or information artifacts, e.g. a file or variable representing an electronic invoice (Soft object). For the latter, the type of the object must be specified, e.g. the file extension or variable type. Objects can be associated with a Work node via an input relation if they are utilized by the Work node, and/or via an output relation if they are produced by the Work node. These relations correspond to read/write operations in the case of Soft objects. An object used as both input and output of a Work node indicates that the object is updated, e.g. an invoice is filled out or a variable changes its content. Moreover, input objects can be marked as consumed if they are destroyed while being used by a Work node. Similar to ResourceTypes, the association between Objects and Work nodes can also be tagged optional to capture a situation where the Work node may be performed without using or producing the specific object.
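The way input and output relations combine can be made concrete with a small Python sketch. The class ObjectRef, the function classify_access and the labels "read", "written", "updated" and "consumed" are assumptions chosen for this illustration, not names taken from the canonical format. The sketch also assumes that an object which is both input and output is reported as updated even if its input relation is marked as consumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    object_id: str
    kind: str  # "input" or "output"
    consumed: bool = False  # meaningful for input relations only
    optional: bool = False


def classify_access(refs):
    """Map each object of one Work node to how that node accesses it."""
    inputs = {r.object_id for r in refs if r.kind == "input"}
    outputs = {r.object_id for r in refs if r.kind == "output"}
    consumed = {r.object_id for r in refs if r.kind == "input" and r.consumed}
    result = {}
    for obj in inputs | outputs:
        if obj in inputs and obj in outputs:
            result[obj] = "updated"  # both input and output
        elif obj in consumed:
            result[obj] = "consumed"  # destroyed while being used
        elif obj in inputs:
            result[obj] = "read"
        else:
            result[obj] = "written"
    return result


# Object relations of a hypothetical Work node "Fill out invoice".
refs = [
    ObjectRef("invoice", "input"),
    ObjectRef("invoice", "output"),
    ObjectRef("order form", "input", consumed=True),
    ObjectRef("price list", "input", optional=True),
    ObjectRef("receipt", "output"),
]

access = classify_access(refs)
print(access["invoice"])  # prints updated
print(access["order form"])  # prints consumed
```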

Nodes, ResourceTypes and Objects can be configurable. This is denoted by their optional attribute configurable. A node’s configuration options are indicated through annotations outside a canonical representation. Configuration is an important aspect for large model repositories. However, given the diversity of languages and concepts, the configuration mechanism itself is not part of the canonical process format.