Dataset Open Access

The Clarity Software Documentation Dataset

Anonymous Authors

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Anonymous Authors</dc:creator>
  <dc:description>This repository holds the Clarity dataset, a companion to the SANER'22 paper entitled "An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation". The dataset consists of 45,998 captions, 10,204 GUI screenshots, and XML metadata files (akin to the "HTML" that specifies a GUI) of Android applications. The natural-language (NL) captions were obtained from human labelers, underwent several quality control mechanisms, and contain both high-level (screen-level) and low-level (component-level) descriptions of screen functionality. The dataset is meant as a new source of data to augment techniques for software documentation that can take advantage of the rich pixel-based information contained within screenshots.</dc:description>
  <dc:subject>Software Documentation</dc:subject>
  <dc:title>The Clarity Software Documentation Dataset</dc:title>
</oai_dc:dc>