Dataset Open Access

TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 3, June 2020 - December 2020)

Baran, Erdal; Dimitrov, Dimitar


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/0a1cd686-f5e1-4cb0-a55f-e47af50dbde4/TweetsCOV19_062020_122020.n3.gz"
      }, 
      "checksum": "md5:2b40fbbc223f1e2fa497c3987f15dfdc", 
      "bucket": "0a1cd686-f5e1-4cb0-a55f-e47af50dbde4", 
      "key": "TweetsCOV19_062020_122020.n3.gz", 
      "type": "gz", 
      "size": 2063924726
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/0a1cd686-f5e1-4cb0-a55f-e47af50dbde4/TweetsCOV19_062020_122020.tsv.gz"
      }, 
      "checksum": "md5:9d3b2f63d3d8c9d3a3898360543efcdc", 
      "bucket": "0a1cd686-f5e1-4cb0-a55f-e47af50dbde4", 
      "key": "TweetsCOV19_062020_122020.tsv.gz", 
      "type": "gz", 
      "size": 1003769666
    }
  ], 
  "owners": [
    102132
  ], 
  "doi": "10.5281/zenodo.4593524", 
  "stats": {
    "version_unique_downloads": 252.0, 
    "unique_views": 1168.0, 
    "views": 1237.0, 
    "version_views": 1237.0, 
    "unique_downloads": 252.0, 
    "version_unique_views": 1168.0, 
    "volume": 430233108118.0, 
    "version_downloads": 323.0, 
    "downloads": 323.0, 
    "version_volume": 430233108118.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.4593524", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4593523", 
    "bucket": "https://zenodo.org/api/files/0a1cd686-f5e1-4cb0-a55f-e47af50dbde4", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4593523.svg", 
    "html": "https://zenodo.org/record/4593524", 
    "latest_html": "https://zenodo.org/record/4593524", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.4593524.svg", 
    "latest": "https://zenodo.org/api/records/4593524"
  }, 
  "conceptdoi": "10.5281/zenodo.4593523", 
  "created": "2021-03-10T15:12:24.222549+00:00", 
  "updated": "2021-03-24T23:16:28.446518+00:00", 
  "conceptrecid": "4593523", 
  "revision": 5, 
  "id": 4593524, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.4593524", 
    "description": "<p><strong><a href=\"https://data.gesis.org/tweetscov19/\">TweetsCOV19</a></strong> is a semantically annotated corpus of Tweets about the COVID-19 pandemic. It is a subset of <a href=\"https://data.gesis.org/tweetskb\">TweetsKB</a> and aims to capture online discourse about various aspects of the pandemic and its societal impact. <strong>Metadata</strong> about the tweets as well as extracted <strong>entities</strong>, <strong>sentiments</strong>, <strong>hashtags</strong>, <strong>user mentions</strong>, and <strong>resolved URLs</strong> are exposed in RDF using established RDF/S vocabularies*.</p>\n\n<p>We also provide a <em><strong>tab-separated values (tsv)</strong></em> version of the dataset. Each line contains the features of one tweet, separated by a tab character (&quot;\\t&quot;). The features appear in the following order:</p>\n\n<ol>\n\t<li>Tweet Id: Long.</li>\n\t<li>Username: String. Encrypted for privacy reasons*.</li>\n\t<li>Timestamp: Format &quot;EEE MMM dd HH:mm:ss Z yyyy&quot;.</li>\n\t<li>#Followers: Integer.</li>\n\t<li>#Friends: Integer.</li>\n\t<li>#Retweets: Integer.</li>\n\t<li>#Favorites: Integer.</li>\n\t<li>Entities: String. For each entity, we store the original text, the annotated entity, and the score produced by the <a href=\"https://github.com/yahoo/FEL\">FEL</a> library, separated by the character &quot;:&quot;, i.e. &quot;original_text:annotated_entity:score;&quot;. Consecutive entities are separated by the character &quot;;&quot;. If FEL found no entities, the field contains &quot;null;&quot;.</li>\n\t<li>Sentiment: String. <a href=\"http://sentistrength.wlv.ac.uk/\">SentiStrength</a> produces a positive (1 to 5) and a negative (-1 to -5) sentiment score. The two scores are separated by a space character (&quot; &quot;), positive first and negative second (e.g. &quot;2 -1&quot;).</li>\n\t<li>Mentions: String. If the tweet contains mentions, we remove the character &quot;@&quot; and join the mentions with a space character (&quot; &quot;). If the tweet contains no mentions, the field contains &quot;null;&quot;.</li>\n\t<li>Hashtags: String. If the tweet contains hashtags, we remove the character &quot;#&quot; and join the hashtags with a space character (&quot; &quot;). If the tweet contains no hashtags, the field contains &quot;null;&quot;.</li>\n\t<li>URLs: String. If the tweet contains URLs, we join them using &quot;:-: &quot;. If the tweet contains no URLs, the field contains &quot;null;&quot;.</li>\n</ol>\n\n<p>To extract the dataset from <a href=\"https://data.gesis.org/tweetskb\">TweetsKB</a>, we compiled a seed list of 268 COVID-19-related <a href=\"https://data.gesis.org/tweetscov19/keywords_v1.1.txt\">keywords</a>.</p>\n\n<p><em>* For the sake of privacy, we anonymize user IDs and do not provide the text of the tweets.</em></p>", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "title": "TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 3, June 2020 - December 2020)", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4593523"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4593524"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "covid-19"
      }, 
      {
        "id": "twitter-datasets"
      }
    ], 
    "keywords": [
      "twitter", 
      "tweets", 
      "linked data", 
      "microblogging", 
      "RDF", 
      "csv", 
      "covid-19", 
      "coronavirus"
    ], 
    "publication_date": "2021-03-10", 
    "creators": [
      {
        "name": "Baran, Erdal"
      }, 
      {
        "name": "Dimitrov, Dimitar"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "url", 
        "identifier": "https://data.gesis.org/tweetscov19/", 
        "relation": "isDocumentedBy", 
        "resource_type": "dataset"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4593523", 
        "relation": "isVersionOf"
      }
    ]
  }
}
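Each entry in the record's files list carries an MD5 checksum of the form md5:<hex>. The following minimal sketch, assuming the archive has already been downloaded to a local path, recomputes the digest in fixed-size chunks and compares it with that value; the function name verify_md5 is our own and not part of any Zenodo tooling.

```python
import hashlib
from pathlib import Path


def verify_md5(path, checksum, chunk_size=1 << 20):
    """Return True if the file at `path` matches a Zenodo-style 'md5:<hex>' checksum.

    The file is read in chunks (1 MiB by default) so that multi-gigabyte
    archives do not have to fit in memory.
    """
    algorithm, _, expected = checksum.partition(":")
    if algorithm != "md5" or not expected:
        raise ValueError(f"unsupported checksum format: {checksum!r}")
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

For the TSV archive, for example, verify_md5("TweetsCOV19_062020_122020.tsv.gz", "md5:9d3b2f63d3d8c9d3a3898360543efcdc") should return True if the download is intact. Reading in chunks keeps memory use flat even for the roughly 2 GB N3 archive.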
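The record's description lists twelve tab-separated features for the TSV version. The sketch below parses one such line into typed Python values. It is an illustration under stated assumptions, not an official reader: the class and function names (Entity, TweetRecord, parse_line) are hypothetical; entity triples are split from the right on the assumption that only the original text may contain ":"; the URL separator is treated as ":-:" with surrounding whitespace stripped; and the Java-style timestamp pattern "EEE MMM dd HH:mm:ss Z yyyy" is mapped to Python's strptime directives, which assumes an English (C) locale for day and month names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

NULL = "null;"  # placeholder used for empty Entities, Mentions, Hashtags, and URLs
# Python equivalent (assumed) of the Java pattern "EEE MMM dd HH:mm:ss Z yyyy".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


@dataclass
class Entity:
    original_text: str
    annotated_entity: str
    score: float


@dataclass
class TweetRecord:
    tweet_id: int
    username: str
    timestamp: datetime
    followers: int
    friends: int
    retweets: int
    favorites: int
    entities: List[Entity]
    positive_sentiment: int
    negative_sentiment: int
    mentions: List[str]
    hashtags: List[str]
    urls: List[str]


def parse_entities(value: str) -> List[Entity]:
    if value == NULL:
        return []
    entities = []
    for chunk in value.split(";"):
        if not chunk:
            continue  # the trailing ';' leaves an empty last chunk
        # Assumption: only original_text may contain ':', so split from the right.
        text, entity, score = chunk.rsplit(":", 2)
        entities.append(Entity(text, entity, float(score)))
    return entities


def parse_words(value: str) -> List[str]:
    # Mentions and hashtags are space-separated, with '@' and '#' already removed.
    return [] if value == NULL else value.split()


def parse_urls(value: str) -> List[str]:
    # Assumption: URLs are joined with ":-:", possibly followed by a space.
    if value == NULL:
        return []
    return [u.strip() for u in value.split(":-:") if u.strip()]


def parse_line(line: str) -> TweetRecord:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
    # Sentiment is "positive negative", e.g. "2 -1".
    positive, negative = (int(x) for x in fields[8].split())
    return TweetRecord(
        tweet_id=int(fields[0]),
        username=fields[1],
        timestamp=datetime.strptime(fields[2], TIMESTAMP_FORMAT),
        followers=int(fields[3]),
        friends=int(fields[4]),
        retweets=int(fields[5]),
        favorites=int(fields[6]),
        entities=parse_entities(fields[7]),
        positive_sentiment=positive,
        negative_sentiment=negative,
        mentions=parse_words(fields[9]),
        hashtags=parse_words(fields[10]),
        urls=parse_urls(fields[11]),
    )
```

Combined with gzip.open(path, "rt", encoding="utf-8"), which assumes the archive is UTF-8 encoded, parse_line can be applied to each line of TweetsCOV19_062020_122020.tsv.gz in turn. Raising on a wrong field count, rather than padding, makes malformed lines easy to spot.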
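Because the compressed TSV archive is about 1 GB, it is usually better to stream it than to decompress it to disk first. As a minimal sketch of that approach, assuming the file is UTF-8 encoded, the hypothetical helper top_hashtags below reads the archive line by line and tallies the Hashtags feature (the eleventh column), skipping rows whose value is "null;".

```python
import gzip
from collections import Counter

HASHTAG_COLUMN = 10  # 0-based index of the Hashtags feature (11th column)
NULL = "null;"       # value stored when a tweet has no hashtags


def top_hashtags(path, n=10):
    """Stream a gzipped TweetsCOV19-style TSV file and return the n most common hashtags."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) <= HASHTAG_COLUMN or fields[HASHTAG_COLUMN] == NULL:
                continue
            # Lower-case so that variants such as COVID19 and covid19 are counted together.
            counts.update(tag.lower() for tag in fields[HASHTAG_COLUMN].split())
    return counts.most_common(n)
```

For example, top_hashtags("TweetsCOV19_062020_122020.tsv.gz", n=20) would return the 20 most frequent hashtags in Part 3 as (tag, count) pairs. Lower-casing is a choice made in this sketch, not in the dataset; drop the .lower() call to keep case variants distinct.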
Statistic         All versions  This version
Views             1,237         1,237
Downloads         323           323
Data volume       430.2 GB      430.2 GB
Unique views      1,168         1,168
Unique downloads  252           252
