Published June 18, 2018 | Version v1
Conference paper (open access)

Distributed Learning of Process Models for Next Activity Prediction

  • 1. University of Bari Aldo Moro
  • 2. University of Bari Aldo Moro and Exprivia S.p.A.
  • 3. University of Bari Aldo Moro and Consorzio Interuniversitario Nazionale per l'Informatica (CINI)

Description

Process mining is a research discipline that aims to discover, monitor and improve real processes using event logs. In this paper we tackle the problem of next-activity prediction/recommendation via "nested prediction model" learning: we first identify recurrent and frequent sequences of activities, and then we learn a prediction model for each frequent sequence. The key design principle of the proposed solution is the ability to process massive logs by means of a parallel and distributed approach (exploiting the Spark parallel computation framework) that can make reasonable decisions in the absence of perfect models. Indeed, given the classical minimum support threshold and a user-specified error bound, our approach exploits the Chernoff bound to mine "approximate" frequent sequences with statistical error guarantees on their actual supports. Experiments on real-world log data demonstrate the effectiveness of the proposed approach.
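The abstract names three ingredients (frequent activity sequences, one prediction model per frequent sequence, and a Chernoff-bound guarantee on supports) without giving their exact form. The following is a minimal single-machine Python sketch under explicit assumptions, not the authors' Spark implementation: sequences are contiguous runs of activities; support is the fraction of traces containing a sequence; the bound is the additive Chernoff-Hoeffding form 2·exp(−2nε²) ≤ δ, which gives a sample size n ≥ ln(2/δ)/(2ε²); partition counts merged with `reduce` stand in for Spark's `reduceByKey`; and each "nested" model is simply the most frequent next activity rather than a trained classifier. All function names (`chernoff_sample_size`, `approx_frequent_sequences`, `learn_nested_models`, `predict_next`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from functools import reduce


def chernoff_sample_size(eps, delta):
    """Traces needed so that ONE sequence's sampled support is within eps of
    its true support with probability >= 1 - delta, from the two-sided
    additive Chernoff-Hoeffding bound 2 * exp(-2 * n * eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sequences_in(trace, max_len):
    """Distinct contiguous activity sequences of length 1..max_len in a trace."""
    return {tuple(trace[i:i + k])
            for k in range(1, max_len + 1)
            for i in range(len(trace) - k + 1)}


def count_partition(traces, max_len):
    """'Map' side: support counts for one partition (each trace counts once)."""
    counts = Counter()
    for trace in traces:
        counts.update(sequences_in(trace, max_len))
    return counts


def approx_frequent_sequences(log, min_sup, eps, delta,
                              max_len=3, n_parts=4, seed=0):
    """Hypothetical sampling-based miner: returns {sequence: estimated support}."""
    n = min(len(log), chernoff_sample_size(eps, delta))
    sample = random.Random(seed).sample(log, n)
    # Round-robin partitions stand in for Spark partitions; here they are
    # processed sequentially on one machine.
    parts = [sample[i::n_parts] for i in range(n_parts)]
    # 'Reduce' side: merge partial counts, analogous to reduceByKey.
    total = reduce(lambda a, b: a + b,
                   (count_partition(p, max_len) for p in parts), Counter())
    # Lowering the threshold by eps keeps a sequence whose true support is
    # >= min_sup with probability >= 1 - delta (per-sequence guarantee).
    return {s: c / n for s, c in total.items() if c / n >= min_sup - eps}


def learn_nested_models(log, frequent):
    """Hypothetical 'nested' models: for each frequent sequence, remember the
    activity that most often follows it (a stand-in for a learned classifier)."""
    max_len = max(map(len, frequent), default=0)
    followers = defaultdict(Counter)
    for trace in log:
        for j in range(1, len(trace)):              # trace[j] is the next activity
            for k in range(1, min(j, max_len) + 1):
                seq = tuple(trace[j - k:j])         # the k activities before j
                if seq in frequent:
                    followers[seq][trace[j]] += 1
    return {seq: c.most_common(1)[0][0] for seq, c in followers.items()}


def predict_next(models, prefix):
    """Predict with the model of the longest frequent suffix of the prefix."""
    for k in range(len(prefix), 0, -1):
        activity = models.get(tuple(prefix[-k:]))
        if activity is not None:
            return activity
    return None


# Usage on a toy log of 100 traces (not data from the paper).
log = [list("abcd"), list("abce"), list("abd"), list("bcd")] * 25
freq = approx_frequent_sequences(log, min_sup=0.5, eps=0.1, delta=0.05)
models = learn_nested_models(log, freq)
print(predict_next(models, ["x", "b", "c"]))  # d
```

The guarantee in the sketch holds per sequence; covering all candidate sequences simultaneously would require a union bound, for example replacing δ with δ divided by the number of candidates. Predicting from the longest frequent suffix is one plausible way to choose among the nested models, picked here for illustration only.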

Files

short_ distributed-learning-process.pdf (751.4 kB)
md5:65ce9ce4d77b5bfcbe5a5b0adaf2d1b3

Additional details

Funding

TOREADOR – TrustwOrthy model-awaRE Analytics Data platfORm (European Commission, grant 688797)
MAESTRA – Learning from Massive, Incompletely annotated, and Structured Data (European Commission, grant 612944)