Labeled Datasets for Research on Information Operations
Description
Compliance with Platform Terms
To comply with the platform terms, we ask that you download no more than one data file per researcher per day. By requesting access, you agree to abide by these rules.
README
19-November-2024
Contact: Observatory on Social Media
Dataset Articles
This dataset was collected and processed as described in the paper "Labeled Datasets for Research on Information Operations."
Description
These datasets contain data curated for research on information operations (IO) and include both labeled IO data and control data. The datasets cover 26 verified IO campaigns from various countries. For each campaign, they provide records of posts from IO accounts alongside control posts from legitimate accounts that discussed similar topics during the same periods. The datasets support the development and benchmarking of IO detection methods by allowing coordinated accounts to be compared with organic ones.
License
This dataset is available under the Attribution-NonCommercial-NoDerivatives 4.0 International license. If you use this data, please cite the original paper.
Dataset Content
The dataset includes anonymized fields to preserve privacy, and is structured with the following columns:
- postid: Unique identifier for each post within the dataset.
- post_text: The textual content of the post. PII inside post_text, such as mentions and URLs, is hashed.
- application_name: Hashed version of the name of the application or platform from which the post was made.
- post_language: Language in which the post was written.
- in_reply_to_postid: Anonymized ID of the post this entry is replying to, if applicable.
- in_reply_to_accountid: Anonymized ID of the account the post is replying to, if applicable.
- post_time: Timestamp indicating when the post was made.
- accountid: Unique anonymized ID for the account that created the post.
- account_profile_description: Description provided by the account holder in their profile.
- follower_count: Number of followers the account had at the time of data collection.
- following_count: Number of accounts the user was following at the time of data collection.
- account_creation_date: Date when the account was created.
- is_repost: Boolean indicator if the post is a repost.
- reposted_accountid: Anonymized ID of the original account that made the reposted post, if applicable.
- reposted_postid: Anonymized ID of the original post that was reposted, if applicable.
- hashtags: Hashtags included in the post content, if any.
- urls: Hashed URLs shared within the post, if any.
- account_mentions: Anonymized IDs of accounts mentioned within the post, if any.
- is_control: Boolean indicator marking whether the post is from a control (True) or IO (False) account.
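As an illustration of how these columns might be used, the following is a minimal sketch that separates IO posts from control posts and counts repost pairs. It makes several assumptions that this README does not confirm: that a data file can be read as CSV, and that the boolean columns are serialized as text such as "True" and "False". The sample rows are synthetic and contain only a subset of the columns; the helper names (parse_bool, load_posts, split_by_label, repost_edges) are hypothetical.

```python
import csv
import io
from collections import Counter

# Synthetic rows for illustration only; these are NOT real dataset content.
# Assumption: the data files are CSV with the column names listed above.
SAMPLE_CSV = """postid,accountid,is_repost,reposted_accountid,is_control
p1,a1,False,,False
p2,a2,True,a1,False
p3,c1,False,,True
p4,c2,True,c1,True
p5,a1,True,a2,False
"""


def parse_bool(value):
    """Interpret a textual boolean (assumes 'True'/'False'-style values)."""
    return value.strip().lower() in {"true", "1"}


def load_posts(handle):
    """Read posts from a CSV file handle and convert the boolean columns."""
    posts = []
    for row in csv.DictReader(handle):
        row["is_control"] = parse_bool(row["is_control"])
        row["is_repost"] = parse_bool(row["is_repost"])
        posts.append(row)
    return posts


def split_by_label(posts):
    """Return (IO posts, control posts) based on the is_control column."""
    io_posts = [p for p in posts if not p["is_control"]]
    control_posts = [p for p in posts if p["is_control"]]
    return io_posts, control_posts


def repost_edges(posts):
    """Count (reposting account, original account) pairs among reposts."""
    return Counter(
        (p["accountid"], p["reposted_accountid"])
        for p in posts
        if p["is_repost"] and p["reposted_accountid"]
    )


posts = load_posts(io.StringIO(SAMPLE_CSV))
io_posts, control_posts = split_by_label(posts)
io_edges = repost_edges(io_posts)
```

To use this with a downloaded file, replace the in-memory sample with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved. If the actual files use a different format or boolean encoding, the loading step will need to be adjusted.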
Data for different campaigns are organized in separate versions of this repository, which can also be found below or in the shared Excel file.
Dates
- Available: 2024-11-19