Dataset Open Access

TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 2, May 2020)

Baran, Erdal; Dimitrov, Dimitar


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.4593502</identifier>
  <creators>
    <creator>
      <creatorName>Baran, Erdal</creatorName>
      <givenName>Erdal</givenName>
      <familyName>Baran</familyName>
    </creator>
    <creator>
      <creatorName>Dimitrov, Dimitar</creatorName>
      <givenName>Dimitar</givenName>
      <familyName>Dimitrov</familyName>
    </creator>
  </creators>
  <titles>
    <title>TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 2, May 2020)</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2021</publicationYear>
  <subjects>
    <subject>twitter</subject>
    <subject>tweets</subject>
    <subject>linked data</subject>
    <subject>microblogging</subject>
    <subject>RDF</subject>
    <subject>csv</subject>
    <subject>covid-19</subject>
    <subject>coronavirus</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2021-03-10</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4593502</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsDocumentedBy" resourceTypeGeneral="Dataset">https://data.gesis.org/tweetscov19/</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.4593501</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/covid-19</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/twitter-datasets</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;&lt;a href="https://data.gesis.org/tweetscov19/"&gt;TweetsCOV19&lt;/a&gt;&lt;/strong&gt; is a semantically annotated corpus of tweets about the COVID-19 pandemic. It is a subset of &lt;a href="https://data.gesis.org/tweetskb"&gt;TweetsKB&lt;/a&gt; and aims to capture online discourse about various aspects of the pandemic and its societal impact. &lt;strong&gt;Metadata&lt;/strong&gt; about the tweets as well as extracted &lt;strong&gt;entities&lt;/strong&gt;, &lt;strong&gt;sentiments&lt;/strong&gt;, &lt;strong&gt;hashtags&lt;/strong&gt;, &lt;strong&gt;user mentions&lt;/strong&gt;, and &lt;strong&gt;resolved URLs&lt;/strong&gt; are exposed in RDF using established RDF/S vocabularies*.&lt;/p&gt;

&lt;p&gt;We also provide a &lt;em&gt;&lt;strong&gt;tab-separated values (tsv)&lt;/strong&gt;&lt;/em&gt; version of the dataset. Each line contains the features of one tweet instance. Features are separated by a tab character (&amp;quot;\t&amp;quot;). The following list gives the order of the features:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Tweet Id: Long.&lt;/li&gt;
	&lt;li&gt;Username: String. Encrypted for privacy reasons*.&lt;/li&gt;
	&lt;li&gt;Timestamp: Format ( &amp;quot;EEE MMM dd HH:mm:ss Z yyyy&amp;quot; ).&lt;/li&gt;
	&lt;li&gt;#Followers: Integer.&lt;/li&gt;
	&lt;li&gt;#Friends: Integer.&lt;/li&gt;
	&lt;li&gt;#Retweets: Integer.&lt;/li&gt;
	&lt;li&gt;#Favorites: Integer.&lt;/li&gt;
	&lt;li&gt;Entities: String. For each entity, we aggregated the original text, the annotated entity, and the score produced by the &lt;a href="https://github.com/yahoo/FEL"&gt;FEL&lt;/a&gt; library. Entities are separated from one another by the char &amp;quot;;&amp;quot;, and the three fields of each entity are separated by the char &amp;quot;:&amp;quot;, giving &amp;quot;original_text:annotated_entity:score;&amp;quot;. If FEL did not find any entities, we have stored &amp;quot;null;&amp;quot;.&lt;/li&gt;
	&lt;li&gt;Sentiment: String. &lt;a href="http://sentistrength.wlv.ac.uk/"&gt;SentiStrength&lt;/a&gt; produces a score for positive (1 to 5) and negative (-1 to -5) sentiment. We separated these two numbers by the whitespace char &amp;quot; &amp;quot;. The positive sentiment is stored first, followed by the negative sentiment (e.g. &amp;quot;2 -1&amp;quot;).&lt;/li&gt;
	&lt;li&gt;Mentions: String. If the tweet contains mentions, we remove the char &amp;quot;@&amp;quot; and concatenate the mentions with whitespace char &amp;quot; &amp;quot;. If no mentions appear, we have stored &amp;quot;null;&amp;quot;.&lt;/li&gt;
	&lt;li&gt;Hashtags: String. If the tweet contains hashtags, we remove the char &amp;quot;#&amp;quot; and concatenate the hashtags with whitespace char &amp;quot; &amp;quot;. If no hashtags appear, we have stored &amp;quot;null;&amp;quot;.&lt;/li&gt;
	&lt;li&gt;URLs: String. If the tweet contains URLs, we concatenate the URLs using &amp;quot;:-: &amp;quot;. If no URLs appear, we have stored &amp;quot;null;&amp;quot;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To extract the dataset from &lt;a href="https://data.gesis.org/tweetskb"&gt;TweetsKB&lt;/a&gt;, we compiled a seed list of 268 COVID-19-related &lt;a href="https://data.gesis.org/tweetscov19/keywords.txt"&gt;keywords&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;* For the sake of privacy, we anonymize&amp;nbsp;user IDs&amp;nbsp;and we do not provide the text of the tweets.&lt;/em&gt;&lt;/p&gt;</description>
  </descriptions>
</resource>
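
The tab-separated layout described in the abstract above can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the function names, the dictionary keys, and the example values in the comments are hypothetical, and the `strptime` pattern is an assumed translation of the Java-style timestamp pattern "EEE MMM dd HH:mm:ss Z yyyy" (it relies on an English locale for day and month names). Entity triples are split from the right on the assumption that the original text may itself contain ":".

```python
from datetime import datetime
from typing import NamedTuple


class Entity(NamedTuple):
    text: str      # original text span
    entity: str    # annotated entity
    score: float   # score produced by FEL


# Assumed Python equivalent of "EEE MMM dd HH:mm:ss Z yyyy".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"
NULL_MARKER = "null;"  # stored when a field has no values


def parse_entities(field: str) -> list:
    """Split 'original_text:annotated_entity:score;' triples."""
    if field.strip() == NULL_MARKER:
        return []
    entities = []
    for chunk in field.split(";"):
        # Split from the right so a ':' inside the original text is kept.
        parts = chunk.rsplit(":", 2)
        if len(parts) != 3:
            continue  # skips the empty chunk after the trailing ';'
        try:
            entities.append(Entity(parts[0], parts[1], float(parts[2])))
        except ValueError:
            continue  # score was not numeric; skip the malformed triple
    return entities


def parse_space_list(field: str) -> list:
    """Mentions and hashtags: whitespace-separated, or 'null;'."""
    return [] if field.strip() == NULL_MARKER else field.split()


def parse_urls(field: str) -> list:
    """URLs are joined with ':-: ', or the field is 'null;'."""
    if field.strip() == NULL_MARKER:
        return []
    return [u.strip() for u in field.split(":-:") if u.strip()]


def parse_line(line: str) -> dict:
    """Parse one line of the tsv file into a dictionary (keys are hypothetical)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
    positive, negative = fields[8].split()
    return {
        "tweet_id": int(fields[0]),
        "username": fields[1],
        "timestamp": datetime.strptime(fields[2], TIMESTAMP_FORMAT),
        "followers": int(fields[3]),
        "friends": int(fields[4]),
        "retweets": int(fields[5]),
        "favorites": int(fields[6]),
        "entities": parse_entities(fields[7]),
        "sentiment": (int(positive), int(negative)),
        "mentions": parse_space_list(fields[9]),
        "hashtags": parse_space_list(fields[10]),
        "urls": parse_urls(fields[11]),
    }
```

Calling `parse_line` on each line of the tsv file yields one dictionary per tweet; lines that do not have exactly twelve fields raise a `ValueError` instead of being silently misread.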