Dataset Open Access

TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 2, May 2020)

Baran, Erdal; Dimitrov, Dimitar


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/748603fe-bde3-4d77-81c4-b68807d4c6d3/TweetsCOV19_052020.n3.gz"
      }, 
      "checksum": "md5:e08e4b873841e737cb8cf1835370af4d", 
      "bucket": "748603fe-bde3-4d77-81c4-b68807d4c6d3", 
      "key": "TweetsCOV19_052020.n3.gz", 
      "type": "gz", 
      "size": 404722462
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/748603fe-bde3-4d77-81c4-b68807d4c6d3/TweetsCOV19_052020.tsv.gz"
      }, 
      "checksum": "md5:4e8fc16a2bea5cd3421578522fb87f22", 
      "bucket": "748603fe-bde3-4d77-81c4-b68807d4c6d3", 
      "key": "TweetsCOV19_052020.tsv.gz", 
      "type": "gz", 
      "size": 197659685
    }
  ], 
  "owners": [
    102132
  ], 
  "doi": "10.5281/zenodo.4593502", 
  "stats": {
    "version_unique_downloads": 273.0, 
    "unique_views": 1028.0, 
    "views": 1078.0, 
    "version_views": 1078.0, 
    "unique_downloads": 273.0, 
    "version_unique_views": 1028.0, 
    "volume": 88202455958.0, 
    "version_downloads": 353.0, 
    "downloads": 353.0, 
    "version_volume": 88202455958.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.4593502", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4593501", 
    "bucket": "https://zenodo.org/api/files/748603fe-bde3-4d77-81c4-b68807d4c6d3", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4593501.svg", 
    "html": "https://zenodo.org/record/4593502", 
    "latest_html": "https://zenodo.org/record/4593502", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.4593502.svg", 
    "latest": "https://zenodo.org/api/records/4593502"
  }, 
  "conceptdoi": "10.5281/zenodo.4593501", 
  "created": "2021-03-10T15:11:35.241964+00:00", 
  "updated": "2021-03-11T00:27:25.465187+00:00", 
  "conceptrecid": "4593501", 
  "revision": 4, 
  "id": 4593502, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.4593502", 
    "description": "<p><strong><a href=\"https://data.gesis.org/tweetscov19/\">TweetsCOV19</a></strong><strong> </strong>is a semantically annotated corpus of Tweets about the COVID-19 pandemic. It is a subset of <a href=\"https://data.gesis.org/tweetskb\">TweetsKB</a> and aims to capture online discourse about various aspects of the pandemic and its societal impact. <strong>Metadata</strong> information about the tweets as well as extracted <strong>entities</strong>, <strong>sentiments</strong>, <strong>hashtags</strong>, <strong>user mentions</strong>, and <strong>resolved URLs </strong>are exposed in RDF using established RDF/S vocabularies*.</p>\n\n<p>We also provide a <em><strong>tab-separated values (tsv)</strong></em> version of the dataset. Each line contains the features of one tweet instance. Features are separated by a tab character (&quot;\\t&quot;). The following list indicates the feature indices:</p>\n\n<ol>\n\t<li>Tweet Id: Long.</li>\n\t<li>Username: String. Encrypted for privacy reasons*.</li>\n\t<li>Timestamp: Format ( &quot;EEE MMM dd HH:mm:ss Z yyyy&quot; ).</li>\n\t<li>#Followers: Integer.</li>\n\t<li>#Friends: Integer.</li>\n\t<li>#Retweets: Integer.</li>\n\t<li>#Favorites: Integer.</li>\n\t<li>Entities: String. For each entity, we aggregate the original text, the annotated entity, and the score produced by the <a href=\"https://github.com/yahoo/FEL\">FEL</a> library. Entities are separated from one another by the char &quot;;&quot;. Within each entity, the three parts are separated by the char &quot;:&quot;, so that each entry is stored as &quot;original_text:annotated_entity:score;&quot;. If FEL did not find any entities, we have stored &quot;null;&quot;.</li>\n\t<li>Sentiment: String. <a href=\"http://sentistrength.wlv.ac.uk/\">SentiStrength</a> produces a score for positive (1 to 5) and negative (-1 to -5) sentiment. The two numbers are separated by the whitespace char &quot; &quot;. The positive sentiment is stored first, followed by the negative sentiment (e.g. &quot;2 -1&quot;).</li>\n\t<li>Mentions: String. If the tweet contains mentions, we remove the char &quot;@&quot; and concatenate the mentions with whitespace char &quot; &quot;. If no mentions appear, we have stored &quot;null;&quot;.</li>\n\t<li>Hashtags: String. If the tweet contains hashtags, we remove the char &quot;#&quot; and concatenate the hashtags with whitespace char &quot; &quot;. If no hashtags appear, we have stored &quot;null;&quot;.</li>\n\t<li>URLs: String. If the tweet contains URLs, we concatenate the URLs using &quot;:-: &quot;. If no URLs appear, we have stored &quot;null;&quot;.</li>\n</ol>\n\n<p>To extract the dataset from <a href=\"https://data.gesis.org/tweetskb\">TweetsKB</a>, we compiled a seed list of 268 COVID-19-related <a href=\"https://data.gesis.org/tweetscov19/keywords.txt\">keywords</a>.</p>\n\n<p><em>* For the sake of privacy, we anonymize&nbsp;user IDs&nbsp;and we do not provide the text of the tweets.</em></p>", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "title": "TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 2, May 2020)", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4593501"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4593502"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "covid-19"
      }, 
      {
        "id": "twitter-datasets"
      }
    ], 
    "keywords": [
      "twitter", 
      "tweets", 
      "linked data", 
      "microblogging", 
      "RDF", 
      "csv", 
      "covid-19", 
      "coronavirus"
    ], 
    "publication_date": "2021-03-10", 
    "creators": [
      {
        "name": "Baran, Erdal"
      }, 
      {
        "name": "Dimitrov, Dimitar"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "url", 
        "identifier": "https://data.gesis.org/tweetscov19/", 
        "relation": "isDocumentedBy", 
        "resource_type": "dataset"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4593501", 
        "relation": "isVersionOf"
      }
    ]
  }
}
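
The twelve-column tsv layout given in the description field can be read with a short parser. The following is a minimal sketch in Python, not an official reader for the dataset: the names Entity, Tweet, parse_line, and iter_tweets are hypothetical, the strptime pattern is an assumed Python equivalent of the Java-style "EEE MMM dd HH:mm:ss Z yyyy" format, and the entity splitting assumes that the original text contains no ";" and that the annotated entity contains no ":".

```python
import gzip
from datetime import datetime
from typing import Iterator, List, NamedTuple

# Assumed strptime equivalent of the pattern "EEE MMM dd HH:mm:ss Z yyyy".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"
NUM_FIELDS = 12  # one column per feature listed in the description


class Entity(NamedTuple):
    original_text: str
    annotated_entity: str
    score: float


class Tweet(NamedTuple):
    tweet_id: int
    username: str
    timestamp: datetime
    followers: int
    friends: int
    retweets: int
    favorites: int
    entities: List[Entity]
    positive: int
    negative: int
    mentions: List[str]
    hashtags: List[str]
    urls: List[str]


def _is_null(field: str) -> bool:
    # The description states that missing values are stored as "null;".
    return field.strip() in ("", "null", "null;")


def parse_entities(field: str) -> List[Entity]:
    if _is_null(field):
        return []
    entities = []
    for chunk in field.split(";"):
        # Split from the right so colons inside the original text survive.
        parts = chunk.rsplit(":", 2)
        if len(parts) != 3:
            continue  # empty trailing chunk or malformed entry
        text, annotated, score = parts
        try:
            entities.append(Entity(text, annotated, float(score)))
        except ValueError:
            continue  # score was not numeric; skip rather than guess
    return entities


def parse_tokens(field: str) -> List[str]:
    # Mentions and hashtags are whitespace-separated.
    return [] if _is_null(field) else field.split()


def parse_urls(field: str) -> List[str]:
    if _is_null(field):
        return []
    return [url.strip() for url in field.split(":-:") if url.strip()]


def parse_line(line: str) -> Tweet:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(fields)}")
    positive, negative = (int(value) for value in fields[8].split())
    return Tweet(
        tweet_id=int(fields[0]),
        username=fields[1],
        timestamp=datetime.strptime(fields[2], TIMESTAMP_FORMAT),
        followers=int(fields[3]),
        friends=int(fields[4]),
        retweets=int(fields[5]),
        favorites=int(fields[6]),
        entities=parse_entities(fields[7]),
        positive=positive,
        negative=negative,
        mentions=parse_tokens(fields[9]),
        hashtags=parse_tokens(fields[10]),
        urls=parse_urls(fields[11]),
    )


def iter_tweets(path: str) -> Iterator[Tweet]:
    # Stream the gzip-compressed tsv file without unpacking it to disk.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield parse_line(line)
```

Calling iter_tweets("TweetsCOV19_052020.tsv.gz") on a local copy of the file would yield one Tweet per line. Streaming keeps memory use flat, which matters for a compressed file of roughly 198 MB.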