Dataset Open Access
Baran, Erdal; Dimitrov, Dimitar
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nmm##2200000uu#4500</leader> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">twitter</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">tweets</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">linked data</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">microblogging</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">RDF</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">csv</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">covid-19</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">coronavirus</subfield> </datafield> <controlfield tag="005">20210311002725.0</controlfield> <controlfield tag="001">4593502</controlfield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="a">Dimitrov, Dimitar</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">404722462</subfield> <subfield code="z">md5:e08e4b873841e737cb8cf1835370af4d</subfield> <subfield code="u">https://zenodo.org/record/4593502/files/TweetsCOV19_052020.n3.gz</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">197659685</subfield> <subfield code="z">md5:4e8fc16a2bea5cd3421578522fb87f22</subfield> <subfield code="u">https://zenodo.org/record/4593502/files/TweetsCOV19_052020.tsv.gz</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2021-03-10</subfield> </datafield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="p">openaire_data</subfield> <subfield code="p">user-covid-19</subfield> <subfield code="p">user-twitter-datasets</subfield> <subfield code="o">oai:zenodo.org:4593502</subfield> 
</datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="a">Baran, Erdal</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 2, May 2020)</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-covid-19</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-twitter-datasets</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield> <subfield code="a">Creative Commons Attribution 4.0 International</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"><p><strong><a href="https://data.gesis.org/tweetscov19/">TweetsCOV19</a></strong> is a semantically annotated corpus of tweets about the COVID-19 pandemic. It is a subset of <a href="https://data.gesis.org/tweetskb">TweetsKB</a> and aims to capture online discourse about various aspects of the pandemic and its societal impact. <strong>Metadata</strong> about the tweets and the extracted <strong>entities</strong>, <strong>sentiments</strong>, <strong>hashtags</strong>, <strong>user mentions</strong>, and <strong>resolved URLs</strong> are exposed in RDF using established RDF/S vocabularies*.</p> <p>We also provide a <em><strong>tab-separated values (tsv)</strong></em> version of the dataset. Each line contains the features of one tweet, separated by a tab character (&quot;\t&quot;), in the following order:</p> <ol> <li>Tweet Id: Long.</li> <li>Username: String. 
Encrypted for privacy reasons*.</li> <li>Timestamp: String, format &quot;EEE MMM dd HH:mm:ss Z yyyy&quot;.</li> <li>#Followers: Integer.</li> <li>#Friends: Integer.</li> <li>#Retweets: Integer.</li> <li>#Favorites: Integer.</li> <li>Entities: String. For each entity, we store the original text, the annotated entity, and the score produced by the <a href="https://github.com/yahoo/FEL">FEL</a> library. Entities are separated from each other by &quot;;&quot;, and the three parts of each entity by &quot;:&quot;, giving &quot;original_text:annotated_entity:score;&quot;. If FEL found no entities, we store &quot;null;&quot;.</li> <li>Sentiment: String. <a href="http://sentistrength.wlv.ac.uk/">SentiStrength</a> produces a positive score (1 to 5) and a negative score (-1 to -5). The two scores are separated by a single space, positive first (e.g. &quot;2 -1&quot;).</li> <li>Mentions: String. If the tweet contains mentions, we remove the &quot;@&quot; character and join the mentions with a space &quot; &quot;. If there are no mentions, we store &quot;null;&quot;.</li> <li>Hashtags: String. If the tweet contains hashtags, we remove the &quot;#&quot; character and join the hashtags with a space &quot; &quot;. If there are no hashtags, we store &quot;null;&quot;.</li> <li>URLs: String. If the tweet contains URLs, we join them with &quot;:-: &quot;. 
If there are no URLs, we store &quot;null;&quot;.</li> </ol> <p>To extract the dataset from <a href="https://data.gesis.org/tweetskb">TweetsKB</a>, we compiled a seed list of 268 COVID-19-related <a href="https://data.gesis.org/tweetscov19/keywords.txt">keywords</a>.</p> <p><em>* For the sake of privacy, we anonymize user IDs and do not provide the text of the tweets.</em></p></subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">url</subfield> <subfield code="i">isDocumentedBy</subfield> <subfield code="a">https://data.gesis.org/tweetscov19/</subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">doi</subfield> <subfield code="i">isVersionOf</subfield> <subfield code="a">10.5281/zenodo.4593501</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.5281/zenodo.4593502</subfield> <subfield code="2">doi</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">dataset</subfield> </datafield> </record>
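The tsv layout described in the record can be read with a short parser. The following is a minimal, hypothetical Python sketch; the names `parse_line`, `read_tweets` and `TweetRecord` are ours and are not part of any dataset tooling. It assumes UTF-8 text, no header row, exactly 12 tab-separated columns in the order listed above, English day and month names in the timestamp, and that FEL entity names and scores never contain ":".

```python
import gzip
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List

NULL = "null;"  # placeholder stored when a multi-valued field is empty
# strptime equivalent of the pattern "EEE MMM dd HH:mm:ss Z yyyy";
# %a and %b assume English day/month names (C or English locale).
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"
URL_SEPARATOR = ":-: "
NUM_COLUMNS = 12


@dataclass
class Entity:
    original_text: str
    annotated_entity: str
    score: float


@dataclass
class TweetRecord:
    tweet_id: int
    username: str  # anonymized (encrypted) username
    timestamp: datetime
    followers: int
    friends: int
    retweets: int
    favorites: int
    entities: List[Entity]
    positive_sentiment: int
    negative_sentiment: int
    mentions: List[str]
    hashtags: List[str]
    urls: List[str]


def parse_entities(value: str) -> List[Entity]:
    """Parse 'original_text:annotated_entity:score;' entries into Entity objects."""
    if value == NULL:
        return []
    entities = []
    for chunk in value.split(";"):
        if not chunk:
            continue  # the trailing ';' leaves an empty piece
        # Split from the right, assuming the entity name and score contain no ':'
        text, entity, score = chunk.rsplit(":", 2)
        entities.append(Entity(text, entity, float(score)))
    return entities


def parse_list(value: str, sep: str) -> List[str]:
    """Split a multi-valued field; the 'null;' placeholder becomes an empty list."""
    if value == NULL:
        return []
    return [item for item in value.split(sep) if item]


def parse_line(line: str) -> TweetRecord:
    """Parse one line of the tsv file into a TweetRecord."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != NUM_COLUMNS:
        raise ValueError(f"expected {NUM_COLUMNS} fields, got {len(cols)}")
    # Sentiment is "positive negative", e.g. "2 -1"
    positive, negative = (int(x) for x in cols[8].split())
    return TweetRecord(
        tweet_id=int(cols[0]),
        username=cols[1],
        timestamp=datetime.strptime(cols[2], TIMESTAMP_FORMAT),
        followers=int(cols[3]),
        friends=int(cols[4]),
        retweets=int(cols[5]),
        favorites=int(cols[6]),
        entities=parse_entities(cols[7]),
        positive_sentiment=positive,
        negative_sentiment=negative,
        mentions=parse_list(cols[9], " "),
        hashtags=parse_list(cols[10], " "),
        urls=parse_list(cols[11], URL_SEPARATOR),
    )


def read_tweets(path: str) -> Iterator[TweetRecord]:
    """Stream records from the gzipped tsv file, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield parse_line(line)
```

Streaming with a generator keeps memory flat, which matters for a file of roughly 200 MB compressed; `read_tweets("TweetsCOV19_052020.tsv.gz")` yields one `TweetRecord` per line.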
Statistic | All versions | This version |
---|---|---|
Views | 1,078 | 1,078 |
Downloads | 353 | 353 |
Data volume | 88.2 GB | 88.2 GB |
Unique views | 1,028 | 1,028 |
Unique downloads | 273 | 273 |
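Each file entry in the MARC record (tag 856) gives the file size in bytes (subfield `s`) and, in this record, an MD5 checksum (subfield `z`). A downloaded copy can be checked against those values. The following is a minimal Python sketch; the helper names `md5_of` and `verify` are hypothetical, and the script assumes the files were saved under their original names in the working directory.

```python
import hashlib
import os

# File sizes (bytes) and MD5 checksums copied from the record's 856 fields.
EXPECTED = {
    "TweetsCOV19_052020.n3.gz": (404722462, "e08e4b873841e737cb8cf1835370af4d"),
    "TweetsCOV19_052020.tsv.gz": (197659685, "4e8fc16a2bea5cd3421578522fb87f22"),
}


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives are never fully loaded."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_size: int, expected_md5: str) -> bool:
    """Cheap size check first, then the full MD5 comparison."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of(path) == expected_md5


if __name__ == "__main__":
    for name, (size, md5) in EXPECTED.items():
        if os.path.exists(name):
            print(name, "OK" if verify(name, size, md5) else "MISMATCH")
        else:
            print(name, "not downloaded")
```

Comparing the size first avoids hashing several hundred megabytes when a download was simply truncated.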