Master’s Project: More on process mining.

More on Discovery Plug-ins for ProM.

Last week I talked about the discovery plug-ins for the tool ProM, which take only the event log as input. I showed how to mine the control-flow perspective of a process from the event log alone, using the alpha-algorithm as an example. By doing this, you answer the question “How are the cases actually being executed?”.

These plug-ins are basically process mining algorithms. You give them the event log as input, and then you ‘mine’ the event log for something: you run the algorithm, it goes through the event log, and it returns something (for example, a Petri net capturing the control-flow perspective), depending on what kind of algorithm it is.

There are other things you can mine with discovery plug-ins. You can mine case-related information about a process. By doing this, you answer questions like “What are the most frequent paths in the process?”, “Are there any loops?”, and “Can I simplify the log by abstracting the most frequent paths?”.
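As a rough sketch of what this kind of case-related mining boils down to (a toy illustration with made-up traces, not how a ProM plug-in is actually implemented): counting identical traces answers the frequent-paths question, and a task repeated within a trace signals a loop.

```python
from collections import Counter

# Toy event log: each case is a sequence of task names.
# (Hypothetical data, not the actual tutorial log.)
log = [
    ["Register", "Analyze Defect", "Repair (Simple)", "Test Repair", "Archive Repair"],
    ["Register", "Analyze Defect", "Repair (Simple)", "Test Repair", "Archive Repair"],
    ["Register", "Analyze Defect", "Repair (Complex)", "Test Repair",
     "Repair (Complex)", "Test Repair", "Archive Repair"],
]

# Most frequent path: count identical traces (variants).
variants = Counter(tuple(trace) for trace in log)
most_frequent, freq = variants.most_common(1)[0]

# Loop detection: a trace contains a loop if some task occurs more than once.
has_loop = any(len(set(trace)) < len(trace) for trace in log)

print(freq)      # 2
print(has_loop)  # True
```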

You can also mine organisational information about a process. This answers questions like “How many people are involved in a specific case?” and “What is the communication structure and dependencies among people?”. You mine this with the social-network miner plug-in.

If you just want to know how many people are involved across all the cases in the log, you can simply use the log summary:

I’m still using the same example as in the last post: the event log of the (one) process of a telephone repair company. You can see the different resources above. We have testers, solvers, and the system. So to answer the question, there are 12 people involved in all the cases together. You can also use the inspector to inspect specific cases (process instances) one at a time and check who was involved.
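Conceptually, the log summary’s head-count is just the number of distinct resources appearing in any event. A minimal sketch, with hypothetical task/resource pairs rather than the real tutorial log:

```python
# Toy log: each case is a list of (task, resource) events.
# Resource names here are made up for illustration.
log = [
    [("Register", "System"), ("Analyze Defect", "Tester1"),
     ("Repair (Simple)", "SolverS1"), ("Test Repair", "Tester1"),
     ("Archive Repair", "System")],
    [("Register", "System"), ("Analyze Defect", "Tester2"),
     ("Repair (Complex)", "SolverC3"), ("Test Repair", "Tester2"),
     ("Archive Repair", "System")],
]

# The log-summary answer: distinct resources across all cases.
resources = {resource for trace in log for _, resource in trace}
print(len(resources))  # 5
```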

For the rest of the questions, you can use the social network plug-in. For example, let’s check whether some employees outperform others; we can find out who is better at fixing defects. We first filter the event log to show only the relevant tasks performed by solvers. Then we run the plug-in to mine a handover-of-work social network. We get this:

The graph shows which employees handed over work to other employees in the cases (process instances). SolverS3 and solverC3 have the best performance: the telephones they fix always pass the tests, and are therefore not resent to the repair department and on to other solvers. That’s why you don’t see arrows going out of these two employees, only arrows coming in to them, which means they only get work handed over from other solvers. As for the oval shapes, taken from the tutorial:

The oval shape of the nodes in the graph visually expresses the relation between the in and out degree of the connections (arrows) between these nodes. A higher proportion of in-going arcs leads to more vertical oval shapes while higher proportions of outgoing arcs produce more horizontal oval shapes. From this remark, can you tell which employee has more problems to fix the defects?

To answer the question from the tutorial: it seems that solverS2 and solverS1 have the most difficulty fixing the defects, because they have the highest ratio of outgoing to incoming connections (both 5/1).
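The handover-of-work relation itself is simple to sketch: within each case (here already filtered to solver tasks), resource B receives a handover from resource A whenever B’s event directly follows A’s. A toy version with made-up solver sequences, not the tutorial’s actual data:

```python
from collections import defaultdict

# Hypothetical solver sequences per case: each next solver received
# the phone because the previous fix failed the test.
cases = [
    ["SolverS1", "SolverS2", "SolverS3"],
    ["SolverS2", "SolverC3"],
    ["SolverS1", "SolverS1", "SolverC3"],
]

in_degree = defaultdict(int)
out_degree = defaultdict(int)
for case in cases:
    for a, b in zip(case, case[1:]):   # b directly follows a
        out_degree[a] += 1             # a handed work over
        in_degree[b] += 1              # b received work

# Solvers whose repairs always pass have no outgoing handovers.
print(out_degree["SolverS3"], in_degree["SolverS3"])  # 0 1
```

In the real plug-in the in/out proportions are what drive the vertical or horizontal oval shapes of the nodes.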

You can also use discovery plug-ins to verify, with temporal logic, whether the cases in a log satisfy certain properties. An example of this from the tutorial:

We know that after an attempt to fix the defect, the telephone should be tested to check whether it is indeed repaired.

Thus, we could use the “LTL Checker” plug-in to verify the property: does the task “Test Repair” always happen after the task “Repair (Simple)” and before the task “Archive Repair”?

And so ends the tutorial, which gave us a brief overview of how some discovery plug-ins can be used for mining knowledge about processes.

Some Literature

Apart from reading the Process Mining Manifesto, which gives you a general introduction to process mining, I’ve also read “Process Mining for the multi-faceted analysis of business processes—A case study in a financial services organization”. As the title says, it’s a case study, so I got to read about the impact of process mining on real organizations.

Aside from the case-specific content, the article also discusses broader subjects, such as the link between process mining and business intelligence. Process mining can be seen as the link between Business Intelligence and Business Process Management (BPM); it occurs during the diagnosis phase of BPM. Normally, Business Process Analysis (BPA) and Business Activity Monitoring (BAM) are the techniques used for getting statistics on a business process, but process mining provides deeper insight by going into the exact paths of execution. Process mining discovers, monitors and enhances processes by extracting knowledge from event logs.

Master’s Project: Diving into Process Mining

From now on I will also use this blog to talk about my Master’s project. It’s a project required to complete my Master in Information Management, worth 15 ECTS out of the 60 for the whole Master. The Master’s thesis is then written based on that project. I’ll try to update this blog as much as I can.

Process Mining

My project is about process mining. What exactly is process mining? Well, according to the Process Mining Manifesto, it is:

 a relatively young research discipline that sits between computational intelligence and data mining on the one hand, and process modeling and analysis on the other hand.

And the idea of it is:

The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today’s (information) systems.

So it’s about algorithms that are able to extract knowledge from event logs. Put more simply: process mining is a set of techniques used to discover new processes, monitor existing processes and even improve existing processes.

We’re more accustomed to the idea of data mining, which is extracting knowledge from data warehouses. The focus was always on data until around the ’90s, when, with things like business process re-engineering emerging, processes started becoming important as well. Process mining fills the gap between Business Intelligence and Business Process Management.

The tool

Today I was playing around with the tool used for process mining, called ProM. ProM gives you a framework in which you can use process mining plug-ins. I also followed an introductory tutorial for using the tool.

ProM enables you to use an event log to perform various actions using plug-ins. These plug-ins can be categorized into three groups:

  • Discovery: These plug-ins take only the event log as input. They answer questions like: How are the cases actually being executed? Are the rules being obeyed?
  • Conformance: These plug-ins check how well the data in the event log matches the prescribed behavior of deployed process models. They help you monitor processes.
  • Extension: These plug-ins take a model and the event log as input, and discover information that will enhance the model.

A cool example of the plug-ins I used today is the alpha-algorithm. It mines a Petri net out of the event log. I used the event log given in the tutorial; it’s about a telephone repair process in a company.

In the image above you see the key data on the event log: 1 process, 1000 instances of that process, accumulating 10476 events, a mean of about 10.5 events per instance.
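The mean follows directly from the counts:

```python
n_cases, n_events = 1000, 10476
mean_events = n_events / n_cases  # events per process instance
print(mean_events)  # 10.476
```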

In the image above you see the Actions tab of ProM, with the event log I used as input. In Actions I searched for the alpha-algorithm, which comes installed as part of a more general plug-in.

Above you see the result of the alpha-algorithm. It actually makes a process model out of the event log, which is very cool. In more detail: the alpha-algorithm mines the control-flow perspective of a process. From the tutorial:

The control-flow perspective of a process establishes the dependencies among its tasks. Which tasks precede which other ones? Are there concurrent tasks? Are there loops? In short, what is the process model that summarizes the flow followed by most/all cases in the log? This information is important because it gives you feedback about how cases are actually being executed in the organization.

So with this model you can check whether the process in your company is actually executed the way you thought it was, or whether the model shows that things are done differently. This plug-in falls under the first category, “Discovery”: it answers the question “How are the cases actually being executed?”.
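The first step of the alpha-algorithm can be sketched concisely: from the log it derives a directly-follows relation, and from that it classifies pairs of tasks as causal or parallel (concurrent). This toy version uses a made-up two-trace log; the real algorithm then builds the Petri net from these relations.

```python
from itertools import product

# Toy log: in one trace b comes before c, in the other c before b,
# which the alpha-algorithm reads as b and c being concurrent.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
]

# Directly-follows relation x > y: y immediately follows x in some trace.
follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

tasks = {x for t in log for x in t}
causal, parallel = set(), set()
for x, y in product(tasks, tasks):
    if (x, y) in follows and (y, x) not in follows:
        causal.add((x, y))      # x -> y: x causes y
    elif (x, y) in follows and (y, x) in follows:
        parallel.add((x, y))    # x || y: concurrent tasks

print(("a", "b") in causal)    # True
print(("b", "c") in parallel)  # True
```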