Published November 19, 2024 | Version Main
Dataset Restricted

Labeled Datasets for Research on Information Operations

Description

Compliance with Platform Terms

To comply with the platform terms, we ask that you download one data file per researcher, per day. By requesting access, you agree to abide by these rules.

README

19-November-2024
Contact: Observatory on Social Media

Dataset Articles
This dataset was collected and processed as described in the paper "Labeled Datasets for Research on Information Operations."

Description
These datasets contain data curated for research on information operations (IO) and include both labeled IO data and control data. They cover 26 verified IO campaigns from various countries and provide comprehensive records of posts from IO accounts alongside control posts from legitimate accounts that discussed similar topics during the same periods. The datasets support the development and benchmarking of IO detection methods by allowing coordinated accounts to be compared with organic ones.
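As an illustration of the benchmarking use case described above, the following minimal sketch scores a detector's output against account-level labels. The account IDs, the labels, and the detector output are all made up for the example, and the choice of precision, recall, and F1 is an assumption rather than the evaluation protocol of the paper.

```python
from typing import Dict, Set


def precision_recall_f1(true_io: Set[str], predicted_io: Set[str]) -> Dict[str, float]:
    """Score the accounts flagged as IO against the accounts labeled as IO."""
    true_positives = len(true_io & predicted_io)
    precision = true_positives / len(predicted_io) if predicted_io else 0.0
    recall = true_positives / len(true_io) if true_io else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: account ID -> True for a control account, False for an IO account.
labels = {"a1": False, "a2": False, "a3": True, "a4": True, "a5": True}
true_io = {account for account, is_control in labels.items() if not is_control}

# Hypothetical detector output: the accounts a method flagged as IO.
predicted_io = {"a1", "a3"}

scores = precision_recall_f1(true_io, predicted_io)
print(scores)
```

Scoring at the account level rather than the post level is one possible design choice; a post-level evaluation would need the same labels expanded to every post.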

License
This dataset is available under the Attribution-NonCommercial-NoDerivatives 4.0 International license. If you use this data, please cite the original paper.

Dataset Content
The dataset includes anonymized fields to preserve privacy, and is structured with the following columns:

  • postid: Unique identifier for each post within the dataset.
  • post_text: The textual content of the post. PII inside post_text, such as mentions and URLs, is hashed.
  • application_name: Hashed version of the name of the application or platform from which the post was made.
  • post_language: Language in which the post was written.
  • in_reply_to_postid: Anonymized ID of the post this entry is replying to, if applicable.
  • in_reply_to_accountid: Anonymized ID of the account the post is replying to, if applicable.
  • post_time: Timestamp indicating when the post was made.
  • accountid: Unique anonymized ID for the account that created the post.
  • account_profile_description: Description provided by the account holder in their profile.
  • follower_count: Number of followers the account had at the time of data collection.
  • following_count: Number of accounts the user was following at the time of data collection.
  • account_creation_date: Date when the account was created.
  • is_repost: Boolean indicator if the post is a repost.
  • reposted_accountid: Anonymized ID of the original account that made the reposted post, if applicable.
  • reposted_postid: Anonymized ID of the original post that was reposted, if applicable.
  • hashtags: Hashtags included in the post content, if any.
  • urls: Hashed URLs shared within the post, if any.
  • account_mentions: Anonymized IDs of accounts mentioned within the post, if any.
  • is_control: Boolean indicator marking whether the post is from a control (True) or IO (False) account.
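
The columns above could be read and grouped as in the following sketch. It assumes a CSV export with a header row and booleans written as "True"/"False"; the actual file format and value encoding are not stated in this README, and the sample rows are invented. The helper names (parse_bool, load_posts, accounts_by_label) are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using a subset of the documented columns.
# Assumption: real files may use a different format or boolean encoding.
SAMPLE = """postid,accountid,post_text,is_repost,is_control,follower_count
p1,acc_a,hello world,False,False,10
p2,acc_a,second post,True,False,10
p3,acc_b,organic post,False,True,250
"""


def parse_bool(value: str) -> bool:
    """Interpret 'True'/'true'/'1' as True and anything else as False."""
    return value.strip().lower() in {"true", "1"}


def load_posts(handle):
    """Read posts from a CSV handle and convert the typed columns."""
    posts = []
    for row in csv.DictReader(handle):
        row["is_repost"] = parse_bool(row["is_repost"])
        row["is_control"] = parse_bool(row["is_control"])
        row["follower_count"] = int(row["follower_count"])
        posts.append(row)
    return posts


def accounts_by_label(posts):
    """Group account IDs into 'io' and 'control' using the is_control column."""
    groups = defaultdict(set)
    for post in posts:
        label = "control" if post["is_control"] else "io"
        groups[label].add(post["accountid"])
    return dict(groups)


posts = load_posts(io.StringIO(SAMPLE))
groups = accounts_by_label(posts)
print(groups)
```

Note that is_control is True for control posts and False for IO posts, so a detector that predicts "IO" should be compared against the negation of this column.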

Data for the different campaigns are organized in separate versions of this repository, which are listed below and in the shared Excel file.

Campaign Name URL
Armenia https://doi.org/10.5281/zenodo.14141550
Bangladesh https://doi.org/10.5281/zenodo.14188947
Catalonia https://doi.org/10.5281/zenodo.14188959
China_1 https://doi.org/10.5281/zenodo.14188970
China_2 https://doi.org/10.5281/zenodo.14188975
Cuba Part 1 https://doi.org/10.5281/zenodo.14188984
Cuba Part 2 https://doi.org/10.5281/zenodo.14189008
Ecuador https://doi.org/10.5281/zenodo.14189015
Egypt_UAE https://doi.org/10.5281/zenodo.14189018
Ghana_Nigeria https://doi.org/10.5281/zenodo.14189028
Iran_1 https://doi.org/10.5281/zenodo.14189037
Iran_2 https://doi.org/10.5281/zenodo.14189038
Iran_3 https://doi.org/10.5281/zenodo.14189041
Iran_4 https://doi.org/10.5281/zenodo.14189047
Iran_5 https://doi.org/10.5281/zenodo.14189048
Iran_6 https://doi.org/10.5281/zenodo.14189053
Qatar https://doi.org/10.5281/zenodo.14189058
Russia_1 https://doi.org/10.5281/zenodo.14189061
Russia_2 https://doi.org/10.5281/zenodo.14189072
Russia_3 https://doi.org/10.5281/zenodo.14189075
Russia_4 https://doi.org/10.5281/zenodo.14189078
Russia_5 https://doi.org/10.5281/zenodo.14189081
Spain https://doi.org/10.5281/zenodo.14189086
Thailand https://doi.org/10.5281/zenodo.14189095
UAE https://doi.org/10.5281/zenodo.14189098
Venezuela_1 https://doi.org/10.5281/zenodo.14189107
Venezuela_2 https://doi.org/10.5281/zenodo.14189110
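
Because downloads are limited to one data file per researcher, per day, it can help to plan them in advance. The sketch below spreads campaigns over consecutive days. It assumes one data file per campaign record, which this README does not confirm, and it only prints a plan; it does not download anything. The function name plan_downloads and the start date are hypothetical.

```python
from datetime import date, timedelta

# A few DOIs copied from the table above; extend with the remaining rows as needed.
CAMPAIGN_DOIS = {
    "Armenia": "https://doi.org/10.5281/zenodo.14141550",
    "Bangladesh": "https://doi.org/10.5281/zenodo.14188947",
    "Catalonia": "https://doi.org/10.5281/zenodo.14188959",
}


def plan_downloads(campaigns, start):
    """Assign one campaign to each consecutive day, beginning at `start`."""
    return [
        (start + timedelta(days=offset), name, CAMPAIGN_DOIS[name])
        for offset, name in enumerate(campaigns)
    ]


# Hypothetical start date for the plan.
schedule = plan_downloads(list(CAMPAIGN_DOIS), date(2025, 1, 6))
for day, name, doi in schedule:
    print(day.isoformat(), name, doi)
```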

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

To comply with the platform terms, we ask that you download one data file per researcher, per day. When submitting your application below, please confirm your agreement to these terms.

Additional details

Dates

Available
2024-11-19