Dataset Open Access

Japanese COVID-19 Tweets from 2020-01-17 to 2020-04-30 (40,720,545 tweets and 105,317,606 retweets)

Toriumi, Fujio; Sakaki, Takeshi; Yoshida, Mitsuo


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Toriumi, Fujio</dc:creator>
  <dc:creator>Sakaki, Takeshi</dc:creator>
  <dc:creator>Yoshida, Mitsuo</dc:creator>
  <dc:date>2020-07-01</dc:date>
  <dc:description>Abstract (our paper)

The spread of COVID-19, the so-called new coronavirus, is having an enormous social and economic impact worldwide. Under these circumstances, the spread of information about the new coronavirus on social networking services (SNS) significantly affects economic losses and social decision-making. In this study, we investigated how the new coronavirus became a social topic in Japan and how it has been discussed. To determine what kind of impact it had on people, we collected and analyzed Japanese tweets containing words related to the new coronavirus. First, we analyzed the bias of the users who tweeted. The results show that this bias almost disappeared after February 28, 2020, when the new coronavirus had landed in Japan and a state of emergency was declared in Hokkaido, and the new coronavirus became a popular topic. Second, we analyzed the emotional words included in tweets to examine how people feel about the new coronavirus. The results show that the occurrence of a particular social event can change the emotions expressed on social media.

Data

Tweets_YYYY-MM-DD.tsv.gz:
Each line has four tab-separated columns: the first is the tweet id, the second is the date and time (JST) at which the tweet was posted, the third is a flag indicating whether the tweet was used for emotion analysis, and the fourth is the tweet id of the retweet source. The fourth column is empty if the tweet is not a retweet.
The data were collected by issuing the query "新型肺炎 OR 武漢 OR コロナ OR ウイルス OR ウィルス" to the Twitter Search API. Consequently, most of the tweets are in Japanese.
Emotion analysis was conducted only on tweets that are not retweets and do not contain links.

KL-Divergence.tsv.gz:
The first column is the date (JST), and the second column is the KL-Divergence value that measures the bias of the users who posted tweets related to COVID-19.
The KL-Divergence value was calculated over all users appearing in Tweets_YYYY-MM-DD.tsv.gz. Based on the sampling stream data, we treat a value below 0.6 as indicating no bias.

Emotions_by_ML-Ask.tsv.gz:
The first column is the date (JST), the second and subsequent columns give the number of tweets for each emotion, and the last column gives the number of tweets analyzed for that day.
For this analysis, we used only tweets with a value of 1 in the third column of Tweets_YYYY-MM-DD.tsv.gz. We used pymlask (a Python implementation of ML-Ask) to estimate the emotion of each tweet.

Publication

This data set was created for our study. If you make use of this data set, please cite:
Fujio Toriumi, Takeshi Sakaki, Mitsuo Yoshida. Social Emotions Under the Spread of COVID-19 Using Social Media. Transactions of the Japanese Society for Artificial Intelligence (in Japanese). vol.35, no.4, pp.F-K45_1-7, 2020.
鳥海不二夫, 榊剛史, 吉田光男. ソーシャルメディアを用いた新型コロナ禍における感情変化の分析. 人工知能学会論文誌. vol.35, no.4, pp.F-K45_1-7, 2020.
https://doi.org/10.1527/tjsai.F-K45</dc:description>
  <dc:identifier>https://zenodo.org/record/3892867</dc:identifier>
  <dc:identifier>10.5281/zenodo.3892867</dc:identifier>
  <dc:identifier>oai:zenodo.org:3892867</dc:identifier>
  <dc:relation>doi:10.5281/zenodo.3892866</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/covid-19</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/zenodo</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/publicdomain/zero/1.0/legalcode</dc:rights>
  <dc:subject>Information diffusion</dc:subject>
  <dc:subject>Social emotions</dc:subject>
  <dc:subject>SNS analysis</dc:subject>
  <dc:subject>COVID-19</dc:subject>
  <dc:title>Japanese COVID-19 Tweets from 2020-01-17 to 2020-04-30 (40,720,545 tweets and 105,317,606 retweets)</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
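
The file layouts described in the record above can be read along the following lines. This is a minimal sketch, not the authors' tooling: the names TweetRow, parse_tweet_rows, unbiased_dates, read_gz_lines, and NO_BIAS_THRESHOLD are made up for illustration, and it assumes the files are UTF-8, carry no header row, and mark emotion-analysis tweets with a literal "1" in the third column. The timestamp format is not documented in the record, so the second column is kept as raw text.

```python
import csv
import gzip
from typing import Iterable, Iterator, List, NamedTuple, Optional

# From the record: KL-Divergence values below 0.6 are treated as "no bias".
NO_BIAS_THRESHOLD = 0.6


class TweetRow(NamedTuple):
    tweet_id: str
    posted_at_jst: str                # raw text; the timestamp format is not documented
    used_for_emotion: bool            # True when the third column is "1"
    retweet_source_id: Optional[str]  # None when the fourth column is empty


def parse_tweet_rows(lines: Iterable[str]) -> Iterator[TweetRow]:
    """Yield one TweetRow per tab-separated line (assumes no header row)."""
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        # Pad in case a trailing empty column was dropped from the line.
        tweet_id, posted_at, flag, source = (fields + [""] * 4)[:4]
        yield TweetRow(tweet_id, posted_at, flag == "1", source or None)


def unbiased_dates(lines: Iterable[str]) -> List[str]:
    """Return the dates whose KL-Divergence value is below the threshold."""
    dates = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) >= 2 and float(fields[1]) < NO_BIAS_THRESHOLD:
            dates.append(fields[0])
    return dates


def read_gz_lines(path: str) -> Iterator[str]:
    """Stream text lines from one of the .tsv.gz files."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from handle


# Made-up sample rows for illustration only; these are not real records.
sample_tweets = [
    "1001\t2020-02-28 12:00:00\t1\t",
    "1002\t2020-02-28 12:05:00\t0\t1001",
]
rows = list(parse_tweet_rows(sample_tweets))
originals = [r for r in rows if r.retweet_source_id is None]
```

Against the real files, the same functions would be fed from read_gz_lines, for example parse_tweet_rows(read_gz_lines("Tweets_2020-02-28.tsv.gz")). Streaming line by line avoids loading a whole day of tweets into memory.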