There is a newer version of the record available.

Published May 12, 2022 | Version 1
Dataset Open

Twitter Conversations about the COVID-19 Omicron Variant: A Large Scale Dataset of more than 500,000 Tweets

  • 1. University of Cincinnati

Description

Please cite the following paper when using this dataset:

N. Thakur and C.Y. Han, “An Exploratory Study of Tweets about the SARS-CoV-2 Omicron Variant: Insights from Sentiment Analysis, Language Interpretation, Source Tracking, Type Classification, and Embedded URL Detection,” Preprints, 2022, DOI: 10.20944/preprints202205.0238.v2

Abstract

This open-access dataset is one of the salient contributions of the above-mentioned paper. It presents the Tweet IDs of 537,702 Tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.

Data Description

The Tweet IDs are presented in 7 different .txt files, grouped by the month in which the associated tweets were posted. The following table provides the details of these dataset files. The data collection followed a keyword-based approach: tweets containing the "omicron" keyword were filtered, collected, and added to this dataset.

Filename                 No. of Tweet IDs   Date Range of the Tweet IDs
TweetIDs_November.txt    17271              November 24, 2021 to November 30, 2021
TweetIDs_December.txt    101393             December 1, 2021 to December 31, 2021
TweetIDs_January.txt     95055              January 1, 2022 to January 31, 2022
TweetIDs_February.txt    91571              February 1, 2022 to February 28, 2022
TweetIDs_March.txt       100787             March 1, 2022 to March 31, 2022
TweetIDs_April.txt       94409              April 1, 2022 to April 20, 2022
TweetIDs_May.txt         37216              May 1, 2022 to May 12, 2022

In the above table, the last date for May is May 12, as it was the most recent date at the time of data collection and dataset upload. The dataset will be updated to incorporate more recent tweets.
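As a quick sanity check after downloading, the per-file counts in the table can be compared against the files on disk. The following is a minimal sketch, assuming each .txt file stores one Tweet ID per line (the record does not state the file layout explicitly) and that the files sit in a local directory of your choosing. The helper names `read_tweet_ids` and `check_counts` are illustrative and not part of the dataset.

```python
from pathlib import Path

# Counts copied from the table in this record.
EXPECTED_COUNTS = {
    "TweetIDs_November.txt": 17271,
    "TweetIDs_December.txt": 101393,
    "TweetIDs_January.txt": 95055,
    "TweetIDs_February.txt": 91571,
    "TweetIDs_March.txt": 100787,
    "TweetIDs_April.txt": 94409,
    "TweetIDs_May.txt": 37216,
}


def read_tweet_ids(path):
    """Return the non-empty lines of a Tweet ID file as strings.

    Assumption: one Tweet ID per line. IDs are kept as strings so that
    no tool downstream rounds them as floating-point numbers.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_counts(data_dir="."):
    """Map each dataset file found in data_dir to (found, expected)."""
    report = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(data_dir) / name
        if path.exists():
            report[name] = (len(read_tweet_ids(path)), expected)
    return report


if __name__ == "__main__":
    print("Total expected:", sum(EXPECTED_COUNTS.values()))
    for name, (found, expected) in check_counts().items():
        status = "ok" if found == expected else "MISMATCH"
        print(f"{name}: found {found}, expected {expected} [{status}]")
```

The seven counts in the table sum to the 537,702 Tweet IDs stated in the abstract, so a mismatch in any single file most likely points to an incomplete download or a different file layout than assumed here.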

The dataset contains only Tweet IDs, in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs must be hydrated, that is, resolved back into full tweet objects, before they can be analyzed. The Hydrator application may be used to hydrate this dataset (link to download and a step-by-step tutorial on how to use Hydrator).
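Hydrator handles request batching internally. For readers who prefer to script hydration themselves, the sketch below shows only the batching step and makes no network calls. It assumes a lookup endpoint that accepts a comma-separated list of up to 100 IDs per request, which is how the Twitter API v2 tweet lookup was documented at the time of writing; verify the limit against the current API documentation before relying on it. The names `batched_ids` and `ids_parameter` are hypothetical helpers, not part of Hydrator or any Twitter library.

```python
from itertools import islice


def batched_ids(tweet_ids, batch_size=100):
    """Yield successive lists of at most batch_size Tweet IDs.

    batch_size=100 is an assumption based on the documented per-request
    limit of the tweet lookup endpoint; adjust it if the limit differs.
    """
    iterator = iter(tweet_ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def ids_parameter(batch):
    """Join one batch into the comma-separated form used for an `ids` query value."""
    return ",".join(batch)


if __name__ == "__main__":
    # Placeholder IDs for illustration only; real IDs come from the .txt files.
    placeholder_ids = [str(n) for n in range(1, 251)]
    for number, batch in enumerate(batched_ids(placeholder_ids), start=1):
        print(f"request {number}: {len(batch)} IDs")
```

Tweets that have been deleted or made private since collection cannot be hydrated, so the number of hydrated tweets may be lower than the number of IDs in the files.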

Files

Files (11.3 MB)

Checksum                                 Size
md5:bdeed27af3e00e5f3530af4447e119b6     2.0 MB
md5:bb0ce09222e7d04ff0a979210cbe332f     2.1 MB
md5:e2c74a45f854b43882489bc4bfa4401f     1.9 MB
md5:b5793e434c880480cf015e4dca32551e     2.0 MB
md5:6292bc79c58547a5892def6393da05b9     2.1 MB
md5:3ccec04bcef7179c6f5dd5a70e7e968c     781.5 kB
md5:bb839faa7cdf1b573a803331601aa222     362.7 kB
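The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The listing here does not show which checksum belongs to which file, so the expected value has to be taken from the record page for the specific file being checked. The function names are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listing("TweetIDs_May.txt", "md5:...")`, with the checksum copied from the record page for that file, returns `True` only if the local copy is byte-for-byte identical to the published one.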