00000nmm##2200000uu#4500 4593502 doi 10.5281/zenodo.4593502 oai:zenodo.org:4593502 user-covid-19 user-twitter-datasets Dimitrov, Dimitar TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 2, May 2020) Baran, Erdal url:https://data.gesis.org/tweetscov19/ info:eu-repo/semantics/openAccess Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 spdx twitter tweets linked data microblogging RDF csv covid-19 coronavirus <a href="https://data.gesis.org/tweetscov19/">TweetsCOV19</a> is a semantically annotated corpus of Tweets about the COVID-19 pandemic. It is a subset of <a href="https://data.gesis.org/tweetskb">TweetsKB</a> and aims at capturing online discourse about various aspects of the pandemic and its societal impact. Metadata information about the tweets as well as extracted entities, sentiments, hashtags, user mentions, and resolved URLs are exposed in RDF using established RDF/S vocabularies*. We also provide a tab-separated values (tsv) version of the dataset. Each line contains the features of one tweet instance. Features are separated by a tab character ("\t"). The following list gives the feature indices: <ol> <li>Tweet Id: Long.</li> <li>Username: String. Encrypted for privacy reasons*.</li> <li>Timestamp: Format ( "EEE MMM dd HH:mm:ss Z yyyy" ).</li> <li>#Followers: Integer.</li> <li>#Friends: Integer.</li> <li>#Retweets: Integer.</li> <li>#Favorites: Integer.</li> <li>Entities: String. For each entity, we aggregated the original text, the annotated entity and the score produced by the <a href="https://github.com/yahoo/FEL">FEL</a> library. Each entity is separated from the next by the character ";". Within an entity, the three parts are separated by the character ":", so that each entity is stored as "original_text:annotated_entity:score;". If FEL did not find any entities, we stored "null;".</li> <li>Sentiment: String. 
<a href="http://sentistrength.wlv.ac.uk/">SentiStrength</a> produces a score for positive (1 to 5) and negative (-1 to -5) sentiment. We separated these two numbers with a whitespace character " ". The positive sentiment is stored first, followed by the negative sentiment (e.g. "2 -1").</li> <li>Mentions: String. If the tweet contains mentions, we remove the character "@" and concatenate the mentions with a whitespace character " ". If no mentions appear, we stored "null;".</li> <li>Hashtags: String. If the tweet contains hashtags, we remove the character "#" and concatenate the hashtags with a whitespace character " ". If no hashtags appear, we stored "null;".</li> <li>URLs: String. If the tweet contains URLs, we concatenate the URLs using ":-: ". If no URLs appear, we stored "null;".</li> </ol> To extract the dataset from <a href="https://data.gesis.org/tweetskb">TweetsKB</a>, we compiled a seed list of 268 COVID-19-related <a href="https://data.gesis.org/tweetscov19/keywords.txt">keywords</a>. * For the sake of privacy, we anonymize user IDs and we do not provide the text of the tweets. Zenodo 2021-03-10 user-covid-19 user-twitter-datasets info:eu-repo/semantics/other 20210311002725.0 197659685 md5:4e8fc16a2bea5cd3421578522fb87f22 https://zenodo.org/records/4593502/files/TweetsCOV19_052020.tsv.gz 404722462 md5:e08e4b873841e737cb8cf1835370af4d https://zenodo.org/records/4593502/files/TweetsCOV19_052020.n3.gz open https://data.gesis.org/tweetscov19/ Is documented by url 10.5281/zenodo.4593501 isVersionOf doi
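The twelve-field tsv layout listed in the description can be read with a few lines of Python. The following is a minimal sketch, not code shipped with the dataset: the names `Tweet`, `parse_line`, `parse_entities` and `parse_list` are hypothetical, the sample line is invented for illustration, and the Java-style timestamp pattern "EEE MMM dd HH:mm:ss Z yyyy" is assumed to correspond to the `strptime` format `%a %b %d %H:%M:%S %z %Y` under an English locale. It also assumes that neither the annotated entity nor the score contains the character ":".

```python
from datetime import datetime
from typing import NamedTuple

NULL = "null;"  # placeholder the description gives for empty fields
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # assumed equivalent of the Java pattern


class Tweet(NamedTuple):  # hypothetical container, one attribute per feature
    tweet_id: int
    username: str
    timestamp: datetime
    followers: int
    friends: int
    retweets: int
    favorites: int
    entities: list   # (original_text, annotated_entity, score) tuples
    sentiment: tuple  # (positive, negative)
    mentions: list
    hashtags: list
    urls: list


def parse_entities(field):
    """Split 'original_text:annotated_entity:score;' items into tuples."""
    if field.strip() == NULL:
        return []
    entities = []
    for item in field.split(";"):
        if not item.strip():
            continue  # skip the empty piece after the trailing ';'
        # Split from the right so a ':' inside the original text survives
        # (assumption: entity name and score contain no ':').
        text, entity, score = item.rsplit(":", 2)
        entities.append((text, entity, float(score)))
    return entities


def parse_list(field, sep=None):
    """Split a concatenated field; sep=None splits on whitespace."""
    if field.strip() == NULL:
        return []
    return [part.strip() for part in field.split(sep) if part.strip()]


def parse_line(line):
    """Parse one tab-separated line into a Tweet."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    positive, negative = f[8].split()
    return Tweet(
        tweet_id=int(f[0]),
        username=f[1],
        timestamp=datetime.strptime(f[2], TIMESTAMP_FORMAT),
        followers=int(f[3]),
        friends=int(f[4]),
        retweets=int(f[5]),
        favorites=int(f[6]),
        entities=parse_entities(f[7]),
        sentiment=(int(positive), int(negative)),
        mentions=parse_list(f[9]),
        hashtags=parse_list(f[10]),
        urls=parse_list(f[11], sep=":-:"),
    )


# Invented sample line, for illustration only (not taken from the dataset).
SAMPLE_LINE = "\t".join([
    "1234567890", "abc123", "Fri May 01 12:30:00 +0000 2020",
    "10", "20", "3", "4",
    "covid:COVID-19_pandemic:-1.5;", "2 -1", "null;",
    "covid19 stayhome", "https://a.example/x:-: https://b.example/y",
])
tweet = parse_line(SAMPLE_LINE)
```

To process the published file, the same function could be applied line by line to a stream opened with `gzip.open(path, "rt", encoding="utf-8")`; the UTF-8 encoding is an assumption, since the record does not state it.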