Distributed Learning of Process Models for Next Activity Prediction
- 1. University of Bari Aldo Moro
- 2. University of Bari Aldo Moro and Exprivia S.p.A.
- 3. University of Bari Aldo Moro and Consorzio Interuniversitario Nazionale per l'Informatica (CINI)
Description
Process mining is a research discipline that aims to discover, monitor and improve real processes using event logs. In this paper we tackle the problem of next activity prediction/recommendation via "nested prediction model" learning: we first identify frequent, recurring sequences of activities and then learn a prediction model for each frequent sequence. The key design principle of the proposed solution is the ability to process massive logs in a parallel and distributed fashion (by exploiting the Spark parallel computation framework) while still making reasonable decisions in the absence of perfect models. Specifically, given the classical minimum support threshold and a user-specified error bound, our approach exploits the Chernoff bound to mine "approximate" frequent sequences with statistical error guarantees on their actual supports. Experiments on real-world log data show the effectiveness of the proposed approach.
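The description does not say which form of the Chernoff bound is used, so the following is only a minimal sketch of how such a guarantee is commonly obtained. It assumes the standard additive (Hoeffding) form, P(|s_sample - s_true| > epsilon) <= 2 exp(-2 n epsilon^2), where n is the number of sampled traces. The function names and the failure probability `delta` are hypothetical and do not come from the paper.

```python
import math


def chernoff_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: user-specified error bound on the relative support.
    delta:   assumed probability that the bound is violated (not in the paper).
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def support_interval(sample_support: float, epsilon: float) -> tuple[float, float]:
    """Range that contains the actual support with probability >= 1 - delta."""
    return max(0.0, sample_support - epsilon), min(1.0, sample_support + epsilon)


def lowered_threshold(min_support: float, epsilon: float) -> float:
    """Threshold to apply on the sample so that sequences whose actual
    support reaches min_support are unlikely to be missed."""
    return max(0.0, min_support - epsilon)


if __name__ == "__main__":
    n = chernoff_sample_size(epsilon=0.01, delta=0.05)
    print(n)                              # number of traces to sample
    print(support_interval(0.20, 0.01))   # bounds on the actual support
    print(lowered_threshold(0.10, 0.01))  # relaxed mining threshold
```

Under these assumptions the required sample size depends only on `epsilon` and `delta`, not on the size of the log, which is what makes sampling attractive for massive logs. Note that the guarantee holds for each sequence individually; covering all candidate sequences at once would need an additional correction such as a union bound.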
Files
Name | Size |
---|---|
short_ distributed-learning-process.pdf (md5:65ce9ce4d77b5bfcbe5a5b0adaf2d1b3) | 751.4 kB |