The Clarity Software Documentation Dataset

Anonymous Authors

doi:10.5281/zenodo.5822884

Published January 5, 2022 | Version 1.0

Dataset Open

This repository holds the Clarity Dataset, a companion to the SANER'22 paper entitled "An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation". The dataset consists of 45,998 captions, 10,204 GUI screenshots, and XML metadata files (akin to the HTML that specifies a web page's GUI) of Android applications. The natural-language (NL) captions were obtained from human labelers, passed through several quality-control mechanisms, and contain both high-level (screen-level) and low-level (component-level) descriptions of screen functionality. The dataset is intended as a new source of data for software documentation techniques that can take advantage of the rich pixel-based information contained in screenshots.

Files

Files (12.3 GB)

Name	MD5	Size
Clarity-Data.zip	b0017a0ed1495c5942c33835889172fa	12.3 GB
README.md	34d826f7f0a64d475eb88d4e6736294a	1.5 kB
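
Because the archive is large, it can be worth checking a download against the MD5 digests listed above before unpacking it. The following is a minimal sketch of one way to do that in Python, not a tool shipped with the dataset. It assumes both files were saved under their listed names in a single directory; the function names `md5_of_file` and `verify_download` are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the Files table of this record.
EXPECTED_MD5 = {
    "Clarity-Data.zip": "b0017a0ed1495c5942c33835889172fa",
    "README.md": "34d826f7f0a64d475eb88d4e6736294a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so the 12.3 GB archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory="."):
    """Map each expected file name to True (digest matches),
    False (digest differs), or None (file not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, outcome in verify_download().items():
        print(f"{name}: {labels[outcome]}")
```

Run from the download directory, the script prints one status line per file. A mismatch usually indicates an incomplete or corrupted download rather than a problem with the dataset itself.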

	All versions	This version
Views	794	536
Downloads	189	161
Data volume	1.8 TB	1.4 TB

DOI: 10.5281/zenodo.5822884
Resource type: Dataset
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 5, 2022
Modified: January 6, 2022
