Dataset Open Access

Artificial datasets for online Declare discovery

Burattin, Andrea


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Burattin, Andrea</dc:creator>
  <dc:date>2015-03-16</dc:date>
  <dc:description>This file contains two datasets.

1. Periodical Sudden Drifts

For this case study, we have generated two synthetic logs (\(\mathcal{L}_1\) and \(\mathcal{L}_2\)) by modeling two variants of the insurance claim process described in [1] in CPN Tools and by simulating the models. \(\mathcal{L}_1\) contains 14,840 events and \(\mathcal{L}_2\) contains 16,438 events.

We merged the logs (eight alternations of \(\mathcal{L}_1\) and \(\mathcal{L}_2\)) using the Stream Package in ProM (the source code of the package is publicly available at https://svn.win.tue.nl/repos/prom/Packages/Stream/Trunk). The same package has been used to transform the resulting log into an event stream. The event stream contains 250,224 events and has several sudden concept drifts (one for every switch from \(\mathcal{L}_1\) to \(\mathcal{L}_2\)).

2. Gradual Drifts

We have considered two variants of the insurance claim process described in [1], \(\mathcal{M}_1'\) (with 21 activities) and \(\mathcal{M}_2'\) (with 19 activities). We have also designed 6 additional models \(\mathcal{M}_a,\dots, \mathcal{M}_f\) that represent the intermediate steps from \(\mathcal{M}_1'\) to \(\mathcal{M}_2'\). Consecutive models in this sequence are therefore very similar: \(\mathcal{M}_1'\) and \(\mathcal{M}_a\), \(\mathcal{M}_a\) and \(\mathcal{M}_b\), \(\mathcal{M}_b\) and \(\mathcal{M}_c\), and so on. We simulated these models to generate 8 logs (\(\mathcal{L}_1', \mathcal{L}_a, \dots,\mathcal{L}_f, \mathcal{L}_2'\)). \(\mathcal{L}_1'\) contains 139,938 events, \(\mathcal{L}_2'\) contains 128,696 events, and \(\mathcal{L}_a,\dots,\mathcal{L}_f\) together contain 77,231 events.

Using the Stream Package, we have generated an event stream containing 345,865 events.

 

References

[1] R. J. C. Bose, “Process Mining in the Large: Preprocessing, Discovery, and Diagnostics,” Ph.D. dissertation, Eindhoven University of Technology, 2012.
</dc:description>
  <dc:identifier>https://zenodo.org/record/19187</dc:identifier>
  <dc:identifier>10.5281/zenodo.19187</dc:identifier>
  <dc:identifier>oai:zenodo.org:19187</dc:identifier>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/publicdomain/zero/1.0/legalcode</dc:rights>
  <dc:subject>process mining</dc:subject>
  <dc:subject>event log</dc:subject>
  <dc:subject>event stream</dc:subject>
  <dc:subject>artificial dataset</dc:subject>
  <dc:title>Artificial datasets for online Declare discovery</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
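
The event counts quoted in the description can be cross-checked with simple arithmetic. The Python sketch below assumes that the sudden-drift stream is exactly eight repetitions of \(\mathcal{L}_1\) followed by \(\mathcal{L}_2\), and that the gradual-drift stream contains each of the eight simulated logs exactly once. The description does not state either assumption explicitly, and all constant and function names are illustrative only.

```python
# Event counts as stated in the dataset description.
L1_EVENTS = 14_840              # log L_1 (sudden drifts)
L2_EVENTS = 16_438              # log L_2 (sudden drifts)
ALTERNATIONS = 8                # "eight alternations of L_1 and L_2"
SUDDEN_STREAM_EVENTS = 250_224  # stated size of the sudden-drift stream

L1_PRIME_EVENTS = 139_938        # log L_1' (gradual drifts)
L2_PRIME_EVENTS = 128_696        # log L_2' (gradual drifts)
INTERMEDIATE_EVENTS = 77_231     # logs L_a ... L_f, altogether
GRADUAL_STREAM_EVENTS = 345_865  # stated size of the gradual-drift stream


def sudden_total(l1: int, l2: int, alternations: int) -> int:
    """Total events if the stream is `alternations` copies of L_1 then L_2 (assumption)."""
    return alternations * (l1 + l2)


def gradual_total(first: int, intermediate: int, last: int) -> int:
    """Total events if every simulated log is used exactly once (assumption)."""
    return first + intermediate + last


if __name__ == "__main__":
    sudden = sudden_total(L1_EVENTS, L2_EVENTS, ALTERNATIONS)
    gradual = gradual_total(L1_PRIME_EVENTS, INTERMEDIATE_EVENTS, L2_PRIME_EVENTS)
    print("sudden-drift stream matches:", sudden == SUDDEN_STREAM_EVENTS)
    print("gradual-drift stream matches:", gradual == GRADUAL_STREAM_EVENTS)
```

Under these assumptions both computed totals agree with the stream sizes stated in the description (250,224 and 345,865 events).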