Dataset Open Access

The Clarity Software Documentation Dataset

Anonymous Authors

JSON-LD Export

{
  "description": "This repository holds the Clarity dataset, a companion to the SANER'22 paper entitled \"An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation\". The dataset consists of 45,998 captions for 10,204 GUI screenshots of Android applications, together with XML metadata files (akin to the \"HTML\" for stipulating GUIs). The natural language captions were obtained from human labelers, underwent several quality control mechanisms, and contain both high-level (screen) and low-level (component) descriptions of screen functionality. This dataset is meant as a new source of data to augment techniques for software documentation that can take advantage of the rich pixel-based information contained within screenshots.",
  "license": "",
  "creator": [
    {
      "affiliation": "Anonymous",
      "@type": "Person",
      "name": "Anonymous Authors"
    }
  ],
  "url": "",
  "datePublished": "2022-01-05",
  "version": "1.0",
  "keywords": [
    "Software Documentation"
  ],
  "@context": "",
  "distribution": [
    {
      "contentUrl": "",
      "encodingFormat": "zip",
      "@type": "DataDownload"
    },
    {
      "contentUrl": "",
      "encodingFormat": "md",
      "@type": "DataDownload"
    }
  ],
  "identifier": "",
  "@id": "",
  "@type": "Dataset",
  "name": "The Clarity Software Documentation Dataset"
}
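
The record above can be read programmatically once it is saved as well-formed JSON. The following is a minimal sketch under that assumption: `RECORD` is a trimmed copy of the fields shown above (empty values kept as they appear), and `summarize` is a hypothetical helper name, not part of any published tooling for this dataset.

```python
import json

# Trimmed copy of the export shown above. Empty strings are kept as-is
# because the original values are not available in this record.
RECORD = """
{
  "@type": "Dataset",
  "name": "The Clarity Software Documentation Dataset",
  "version": "1.0",
  "datePublished": "2022-01-05",
  "distribution": [
    {"@type": "DataDownload", "encodingFormat": "zip", "contentUrl": ""},
    {"@type": "DataDownload", "encodingFormat": "md", "contentUrl": ""}
  ]
}
"""


def summarize(record_text: str) -> dict:
    """Return name, version, date, and download formats of a Dataset record."""
    record = json.loads(record_text)
    if record.get("@type") != "Dataset":
        raise ValueError("not a Dataset record")
    # Collect the declared format of every downloadable file.
    formats = [
        item["encodingFormat"]
        for item in record.get("distribution", [])
        if item.get("@type") == "DataDownload"
    ]
    return {
        "name": record["name"],
        "version": record["version"],
        "datePublished": record["datePublished"],
        "formats": formats,
    }


summary = summarize(RECORD)
print(summary)
```

The sketch only inspects metadata; it makes no assumption about the layout of the files inside the zip archive, which this record does not describe.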
All versions This version
Views 8271
Downloads 1815
Data volume 123.3 GB 86.3 GB
Unique views 6860
Unique downloads 1411

