{
  "DOI": "10.5281/zenodo.3724807",
  "abstract": "We used the YouTube Data API\u00a0to augment the\u00a0YouTube 8M\u00a0corpus by crawling a variety of meta data for the videos.\n\n\nFirst point of interest was the \"video resource,\"\u00a0which comprises data about the video, such as the video\u2019s title, description, uploader name, tags, view count, and more. Also included in the meta data is whether comments have been left for the video. If so, we downloaded them as well, including information about their authors, likes, dislikes, and responses.\n\n\nThere is no property which specifies a video\u2019s\u00a0language, since this information is not mandatory when uploading a video. Also, the API provides only information about the available captions, but not the captions themselves. Only the uploader of a video is given access to its captions via the API; we extracted them using youtube-dl.\u00a0For each video, all manually created captions were downloaded, and auto-generated captions in the \"default\"\u00a0language and English. The \"default\"\u00a0auto-generated caption gives perhaps the only hint at a video\u2019s original language.\n\n\nFinally, we downloaded all thumbnails used to advertise a video, which are not available via the API, but only via a canonical URL. Our corpus provides the possibility to recreate the way a video is presented on YouTube (meta data and thumbnail), what the actual content is ((sub)titles and descriptions), and how its viewers reacted (comments).\n\nIf you use this dataset in your publication, please cite the dataset as outlined in the right column.",
  "author": [
    {
      "family": "Qu",
      "given": "Jiani"
    },
    {
      "family": "Hi\u00dfbach",
      "given": "Anny Marleen"
    },
    {
      "family": "Gollub",
      "given": "Tim"
    },
    {
      "family": "Potthast",
      "given": "Martin"
    }
  ],
  "event": "The Sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP)",
  "event_place": "Zurich",
  "id": "3724807",
  "issued": {
    "date-parts": [
      [
        "2018",
        "07",
        "05"
      ]
    ]
  },
  "publisher": "Zenodo",
  "title": "Webis YouTube 8M Augmented 2018",
  "type": "dataset"
}