Published August 26, 2024 | Version v1
Dataset Open

Code and data for "ActivityGen: Extracting Enabled Activities from Screenshots"

Description

The code and data for the paper "ActivityGen: Extracting Enabled Activities from Screenshots" are provided here.

Abstract

Many tasks in organizations are performed in a desktop environment. It is possible to record users' interactions in a desktop environment by taking screenshots when an action happens. The result is an interaction log. By considering the associated images of a record, it is possible to detect which activity was performed and which activities were enabled. This information can be extracted, resulting in a translucent event log. Such a translucent event log is valuable and can be used as input for dedicated process-mining techniques. The results can be used to analyze human-computer interactions or create bots for robotic process automation. However, current techniques for extracting information on enabled activities rely on template matching, which is rigid and sensitive to variations. To solve this issue, we present our modular framework, ActivityGen. ActivityGen detects and labels graphical user interface elements by also considering additional information. ActivityGen uses more advanced techniques to overcome the limitations of previous approaches and can extract information without a user's input. Furthermore, it can be adjusted to a user's needs. It detects graphical user interface elements more accurately than state-of-the-art techniques and labels them faster, more robustly, and in a more domain-oriented manner.

Data

The ReDraw_CLS and ReDraw_ViSM datasets are described in the paper.

The basis for ReDraw_CLS is the ReDraw dataset. We focus on the following components: Button, CheckBox, EditText, Image, ImageButton (which we refer to as icon), RadioButton, and Switch. We noticed that the examples of ImageView and ImageButton are similar, primarily consisting of icon images. Therefore, we removed the ImageView class and introduced an Image class instead. The Image class contains images from the COCO 2017 validation set and the YouTube Thumbnails dataset, enabling the detection of general website images.

ReDraw_ViSM extends ReDraw_CLS with 6,000 synthetically created buttons, which are distributed across the train, test, and validation sets in the same ratio as the existing data.
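
The proportional distribution of the synthetic buttons can be illustrated with a minimal sketch. It is not taken from the provided code: the function name allocate_proportionally and the split sizes in the example are hypothetical, and the actual split sizes of ReDraw_CLS should be read from the dataset itself. Leftover items from rounding are assigned by largest remainder, which is one possible choice among several.

```python
from typing import Dict


def allocate_proportionally(n_new: int, split_sizes: Dict[str, int]) -> Dict[str, int]:
    """Distribute n_new items across splits in proportion to their current sizes."""
    total = sum(split_sizes.values())
    if total <= 0:
        raise ValueError("split sizes must sum to a positive number")

    # Exact (fractional) share of the new items for each split.
    exact = {name: n_new * size / total for name, size in split_sizes.items()}

    # Start from the floor of each share.
    counts = {name: int(share) for name, share in exact.items()}

    # Hand the items lost to rounding to the splits with the largest remainders.
    leftover = n_new - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical split sizes, chosen only for illustration.
example_sizes = {"train": 7000, "validation": 1500, "test": 1500}
new_counts = allocate_proportionally(6000, example_sizes)
print(new_counts)
```

The returned counts always sum to the requested number of new items, so no synthetic sample is dropped or duplicated by rounding.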

lm_basic and lm_extended contain the training text for the language models.

Code

The code can be used to run ActivityGen. We also provide our evaluation scripts. The models do not have to be trained, because their weights are provided in the model folder.

Files

Code.zip

Files (4.0 GB)

Checksum                                Size
md5:a91394003ad08e99ef750fd2d970dfb7    553.6 MB
md5:bf84d5a83251878dc4d6c54c4134c19e    229.4 kB
md5:31a90d1971eab0cd2ae7ee3996927ff1    161.8 kB
md5:5f8eaa5b7b9f04f3a1df2c56c8718a7d    1.7 GB
md5:ed25055799b859fcc4c4cb7a33bb749b    1.7 GB
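
After downloading, a file can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and matches_checksum are hypothetical and not part of the provided code. Which checksum belongs to which file is not stated in this listing and has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so that multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum(path_to_download, "md5:a91394003ad08e99ef750fd2d970dfb7") returns True only if the downloaded file is byte-identical to the file with that checksum, where path_to_download stands for wherever the file was saved.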

Additional details

Software

Programming language
Python