Dataset Open Access

A dataset of media releases (Twitter, News and Comments, YouTube, Facebook) from Poland related to COVID-19 for open research

Andrzej Jarynowski

Researcher(s)
Daniel Płatek
Work package leader(s)
Małgorzata Stochmal

Social behavior has a fundamental impact on the dynamics of infectious diseases such as COVID-19, challenging public health mitigation strategies and possibly political consensus. The widespread use of traditional and social media on the Internet provides an invaluable source of information on societal dynamics during pandemics. With this dataset, we aim to understand the mechanisms of COVID-19 epidemic-related social behavior in Poland using methods of computational social science and digital epidemiology. We collected and analyzed material on the perception of COVID-19 on the Polish-language Internet from 15.01 to 31.07 (06.08 for articles) and labeled the data quantitatively (Twitter, YouTube, articles) and qualitatively (Facebook, articles and article comments) using an infomediological approach.

- manually labelled 1,449 articles / Facebook posts from Lower Silesia (facebook_articles_lower_silesia.zip) and 111 texts from outside this region;

- manually labelled the 1,000 most popular tweets (twits_annotated.xlsx) with the categories is_fake (categorical and numeric), topic, and sentiment;

- extracted 57,306 representative articles (articles_till_06_08.zip) in Polish with the topic "Coronavirus" in the article body, using the Eventregistry.org tool;

- extracted 1,015,199 tweets (tweets_till_31_07_users.zip and tweets_till_31_07_text.zip) in Polish with the hashtag #Koronawirus, using the Twitter API;

- collected 1,574 YouTube videos found with the keyword "Koronawirus" and 247,575 comments on them (youtube_comments_till_31_07.zip and youtube_movies.csv), using the Google API;

- We supplemented the media observations with an analysis of 244 empirical social studies on COVID-19 in Poland up to 25.05 (empirical_social_studies.csv).
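The internal layout of the zip archives is not described here, so a reasonable first step after downloading is to list what each archive contains before loading anything. The sketch below uses only the Python standard library. The helper names `list_members` and `preview_csv` are hypothetical, and the assumptions that an archive holds comma-separated, UTF-8 encoded text files are guesses that should be checked against the coding books.

```python
import csv
import io
import zipfile
from itertools import islice


def list_members(zip_path):
    """Return (name, uncompressed size in bytes) for every file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def preview_csv(zip_path, member, n_rows=5, encoding="utf-8"):
    """Return the first n_rows rows of one archive member, parsed as CSV.

    Assumption: the member is comma-separated text in the given encoding.
    Neither is stated in the dataset description.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(islice(csv.reader(text), n_rows))


# Example (requires the downloaded file in the working directory):
# for name, size in list_members("tweets_till_31_07_text.zip"):
#     print(name, size)
```

Reading only the first few rows keeps memory use small, which matters for the larger archives such as the tweet texts.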

Reports, analyses, and coding books can be found in Polish at: http://www.infodemia-koronawirusa.pl

Main report (in Polish): https://depot.ceon.pl/handle/123456789/19215

Files (147.7 MB)

articles_till_06_08.zip (11.1 MB), md5:3838a903a340fa7223e2892a833a2484
empirical_social_studies.csv (94.3 kB), md5:5a7f7909df84ec51247b9bd53346888b
facebook_articles_lower_silesia.zip (1.2 MB), md5:9f7be5eedbbb3f4b01a859d1b6fd719c
tweets_till_31_07_text.zip (69.8 MB), md5:ff73803828c782e4691eda40f4152106
tweets_till_31_07_users.zip (25.9 MB), md5:0ddfb4b0b8c0394b650f00f0e790bb5d
twits_annotated.xlsx (155.4 kB), md5:7712e830524eff7339007001d8976609
youtube_comments_till_31_07.zip (38.6 MB), md5:a6b8f92ac7203a02329dd2f5c78c950d
youtube_movies.csv (923.8 kB), md5:bc4fedcad2ed636116b2f6313a73fcbb

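The MD5 checksums in the file listing above can be used to confirm that each download arrived intact. The following is a minimal sketch using Python's standard `hashlib`. The function names `md5_of` and `verify_downloads` are hypothetical, and it assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "articles_till_06_08.zip": "3838a903a340fa7223e2892a833a2484",
    "empirical_social_studies.csv": "5a7f7909df84ec51247b9bd53346888b",
    "facebook_articles_lower_silesia.zip": "9f7be5eedbbb3f4b01a859d1b6fd719c",
    "tweets_till_31_07_text.zip": "ff73803828c782e4691eda40f4152106",
    "tweets_till_31_07_users.zip": "0ddfb4b0b8c0394b650f00f0e790bb5d",
    "twits_annotated.xlsx": "7712e830524eff7339007001d8976609",
    "youtube_comments_till_31_07.zip": "a6b8f92ac7203a02329dd2f5c78c950d",
    "youtube_movies.csv": "bc4fedcad2ed636116b2f6313a73fcbb",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Return {filename: 'ok' | 'mismatch' | 'missing'} for each expected file."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        else:
            results[name] = "ok" if md5_of(path) == want else "mismatch"
    return results


# Example: print(verify_downloads("."))
```

A "mismatch" result usually indicates a truncated or corrupted download, in which case fetching the file again is the simplest remedy.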
  • Jarynowski A, Wójta-Kempa M, Belik V. TRENDS IN PERCEPTION OF COVID-19 IN POLISH INTERNET, Polish Epidemiological Review

  • Jarynowski A, Wójta-Kempa M, Płatek D, Krzowski Ł, Belik V. Spatial Diversity of COVID-19 Cases in Poland Explained by Mobility Patterns - Preliminary Results 2020; http://dx.doi.org/10.2139/ssrn.3621152
