Dataset Open Access

Japanese COVID-19 Tweets from 2020-01-17 to 2020-04-30 (40,720,545 tweets and 105,317,606 retweets)

Toriumi, Fujio; Sakaki, Takeshi; Yoshida, Mitsuo

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3892867", 
  "author": [
    {
      "family": "Toriumi, Fujio"
    }, 
    {
      "family": "Sakaki, Takeshi"
    }, 
    {
      "family": "Yoshida, Mitsuo"
    }
  ], 
  "issued": {
    "date-parts": [
  "abstract": "<p><strong>Abstract</strong> (our paper)</p>\n\n<p>The spread of COVID-19, the so-called new coronavirus, is currently having an enormous social and economic impact on the entire world. Under these circumstances, the spread of information about the new coronavirus on SNS is having a significant impact on economic losses and social decision-making. In this study, we investigated how the new coronavirus became a social topic in Japan and how it has been discussed. To determine what kind of impact it had on people, we collected and analyzed Japanese tweets containing words related to the new coronavirus. First, we analyzed the bias of users who tweeted. The results make clear that the bias of users who tweeted about the new coronavirus almost disappeared after February 28, 2020, when the new coronavirus landed in Japan and a state of emergency was declared in Hokkaido, and the new coronavirus became a popular topic. Second, we analyzed the emotional words included in tweets to determine how people feel about the new coronavirus. The results show that the occurrence of a particular social event can change the emotions expressed on social media.</p>\n\n<p><strong>Data</strong></p>\n\n<p>Tweets_YYYY-MM-DD.tsv.gz:<br>\nThe first column is the tweet id, the second column is the date and time (JST) when the tweet was posted, the third column is a flag indicating whether the tweet was used for emotion analysis, and the fourth column is the tweet id of the retweet source. The fourth column is empty if the tweet is not a retweet.<br>\nThis data was collected by giving the query &quot;\u65b0\u578b\u80ba\u708e OR \u6b66\u6f22 OR \u30b3\u30ed\u30ca OR \u30a6\u30a4\u30eb\u30b9 OR \u30a6\u30a3\u30eb\u30b9&quot; to the Twitter Search API. Therefore, most of the tweets are Japanese tweets.<br>\nWe conducted emotion analysis on tweets, excluding retweets and tweets containing links.</p>\n\n<p>KL-Divergence.tsv.gz:<br>\nThe first column is the date (JST), and the second column is the KL-Divergence value, which measures the bias of the users who posted tweets related to COVID-19.<br>\nThe value of KL-Divergence was calculated over all users appearing in Tweets_YYYY-MM-DD.tsv.gz. Based on the sampling stream data, we determined that a value below 0.6 indicates no bias.</p>\n\n<p>Emotions_by_ML-Ask.tsv.gz:<br>\nThe first column is the date (JST), the second and subsequent columns are the number of tweets for each emotion, and the last column is the number of tweets analyzed for the day.<br>\nFor this analysis, we only used tweets with a value of 1 in the third column of Tweets_YYYY-MM-DD.tsv.gz. We used <a href=\"\">pymlask</a> (Python implementation of <a href=\"\">ML-Ask</a>) to estimate the emotion of each tweet.</p>\n\n<p><strong>Publication</strong></p>\n\n<p>This data set was created for our study. If you make use of this data set, please cite:<br>\nFujio Toriumi, Takeshi Sakaki, Mitsuo Yoshida. Social Emotions Under the Spread of COVID-19 Using Social Media. <em>Transactions of the Japanese Society for Artificial Intelligence (in Japanese)</em>. vol.35, no.4, pp.F-K45_1-7, 2020.<br>\n\u9ce5\u6d77\u4e0d\u4e8c\u592b, \u698a\u525b\u53f2, \u5409\u7530\u5149\u7537. \u30bd\u30fc\u30b7\u30e3\u30eb\u30e1\u30c7\u30a3\u30a2\u3092\u7528\u3044\u305f\u65b0\u578b\u30b3\u30ed\u30ca\u798d\u306b\u304a\u3051\u308b\u611f\u60c5\u5909\u5316\u306e\u5206\u6790. <em>\u4eba\u5de5\u77e5\u80fd\u5b66\u4f1a\u8ad6\u6587\u8a8c</em>. vol.35, no.4, pp.F-K45_1-7, 2020.<br>\n<a href=\"\"></a></p>", 
  "title": "Japanese COVID-19 Tweets from 2020-01-17 to 2020-04-30 (40,720,545 tweets and 105,317,606 retweets)", 
  "type": "dataset", 
  "id": "3892867"
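
The file layouts described in the Data part of the abstract can be read with a few lines of Python. The following is a minimal sketch, not the authors' code. It assumes the files are UTF-8, tab-separated, and have no header row (rows that do not parse as numbers are skipped in the KL reader as a precaution). The names `TweetRow`, `parse_tweet_rows`, `read_tweets`, and `dates_without_bias` are hypothetical, and the timestamp is kept as text because its exact format is not stated in the record.

```python
import csv
import gzip
from typing import Iterable, Iterator, List, NamedTuple, Optional


class TweetRow(NamedTuple):
    """One row of Tweets_YYYY-MM-DD.tsv.gz (field names are illustrative)."""
    tweet_id: str
    posted_at_jst: str                 # kept as text; format not documented here
    used_for_emotion: bool             # True when the third column is "1"
    retweet_source_id: Optional[str]   # None when the fourth column is empty


def parse_tweet_rows(lines: Iterable[str]) -> Iterator[TweetRow]:
    """Parse tab-separated lines in the four-column layout described above."""
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        # Pad short rows in case a trailing empty column was dropped.
        fields = fields + [""] * (4 - len(fields))
        tweet_id, posted_at, flag, source = fields[:4]
        yield TweetRow(tweet_id, posted_at, flag == "1", source or None)


def read_tweets(path: str) -> Iterator[TweetRow]:
    """Stream rows from a gzip-compressed daily tweet file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from parse_tweet_rows(handle)


def dates_without_bias(lines: Iterable[str], threshold: float = 0.6) -> List[str]:
    """Return dates from KL-Divergence.tsv.gz rows whose value is below threshold.

    The default of 0.6 is the cut-off stated in the dataset description.
    """
    dates = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < 2:
            continue
        try:
            value = float(fields[1])
        except ValueError:
            continue  # assumption: skip a header or malformed row if present
        if value < threshold:
            dates.append(fields[0])
    return dates
```

For example, iterating over `read_tweets("Tweets_2020-01-17.tsv.gz")` and keeping rows where `used_for_emotion` is true would select the tweets that the record says were passed to the emotion analysis. Tweet ids are kept as strings so that they are never rounded or reformatted.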
Statistic         All versions  This version
Views             1,300         1,298
Downloads         1,278         1,278
Data volume       20.7 GB       20.7 GB
Unique views      1,197         1,195
Unique downloads  154           154

