Published May 14, 2025 | Version v2
Dataset | Open access

MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane

Description

We present a Multiplatform Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 98,662 relevant social media posts from Reddit, X, TikTok, and YouTube.
 
In addition, all relevant posts are annotated on three dimensions (Humanitarian Classes, Bias Classes, and Information Integrity Classes) using a multimodal approach that considers both textual and visual content (text, images, and videos), providing a richly labeled dataset for in-depth analysis.
 
To the best of our knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated hurricane dataset. We envision that MASH can support research on the societal impact of hurricanes, such as disaster severity classification, event detection, public sentiment analysis, and bias identification.
 

Usage Notice

This dataset includes four annotation files:
 
• reddit_anno_publish.csv
• tiktok_anno_publish.csv
• twitter_anno_publish.csv
• youtube_anno_publish.csv
 
Each file contains post IDs and corresponding annotations on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes.
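Because the four files share the same annotation dimensions, it can be convenient to read them into a single collection while remembering which platform each row came from. The sketch below uses only the Python standard library. The exact CSV column names are not documented on this page, so the function keeps whatever columns the header defines; the `source_platform` key it adds is a hypothetical name chosen for illustration, and UTF-8 encoding is an assumption.

```python
import csv
from pathlib import Path

# File names as listed in the dataset description.
ANNOTATION_FILES = {
    "reddit": "reddit_anno_publish.csv",
    "tiktok": "tiktok_anno_publish.csv",
    "twitter": "twitter_anno_publish.csv",
    "youtube": "youtube_anno_publish.csv",
}


def load_annotations(data_dir):
    """Read every annotation file found in data_dir into one list of row dicts.

    Each row keeps the columns defined by its CSV header, plus a hypothetical
    'source_platform' key recording which file it came from. Missing files
    are skipped so a partial download can still be inspected.
    """
    rows = []
    for platform, filename in ANNOTATION_FILES.items():
        path = Path(data_dir) / filename
        if not path.exists():
            continue
        # 'utf-8-sig' also tolerates a leading byte-order mark (an assumption
        # about how the files were written).
        with path.open(newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                row["source_platform"] = platform
                rows.append(row)
    return rows
```

For example, `rows = load_annotations("path/to/MASH")` returns one dict per annotated post across all platforms that are present in the directory.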
 
To protect user privacy, only post IDs are released. We recommend retrieving the full post content via the official APIs of each platform, in accordance with their respective terms of service.
• Reddit API (https://www.reddit.com/dev/api)
• TikTok API (https://developers.tiktok.com/products/research-api)
• X/Twitter API (https://developer.x.com/en/docs/x-api)
• YouTube API (https://developers.google.com/youtube/v3)
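Several of these APIs accept a list of IDs per lookup request, so retrieving content for tens of thousands of posts usually means splitting the released IDs into batches. The sketch below only does the batching; it makes no network calls. The per-request limits are assumptions based on the X API v2 tweet lookup, YouTube Data API v3 `videos.list`, and Reddit `/api/info` endpoints and should be checked against each platform's current documentation. TikTok is left out because its limit is not stated here.

```python
from itertools import islice

# Assumed maximum number of IDs per lookup request; verify before use.
ASSUMED_BATCH_LIMITS = {
    "twitter": 100,  # X API v2 tweet lookup ('ids' parameter)
    "youtube": 50,   # YouTube Data API v3 videos.list ('id' parameter)
    "reddit": 100,   # Reddit /api/info ('id' parameter, fullnames)
}


def batch_ids(post_ids, platform):
    """Yield successive lists of post IDs, each small enough for one request."""
    size = ASSUMED_BATCH_LIMITS[platform]
    it = iter(post_ids)
    # islice pulls at most `size` items; an empty list ends the loop.
    while batch := list(islice(it, size)):
        yield batch
```

Each batch can then be joined with commas (for example `",".join(batch)`) to form the ID parameter of a single request, subject to each platform's authentication and rate limits.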
 

Humanitarian Classes

Each post is annotated with seven binary humanitarian classes. For each class, the label is either:
 
• True – the post contains this humanitarian information
• False – the post does not contain this information
 
These seven humanitarian classes include:
 
• Casualty: The post reports people or animals who are killed, injured, or missing during the hurricane.
• Evacuation: The post describes the evacuation, relocation, rescue, or displacement of individuals or animals due to the hurricane.
• Damage: The post reports damage to infrastructure or public utilities caused by the hurricane.
• Advice: The post provides advice, guidance, or suggestions related to hurricanes, including how to stay safe, protect property, or prepare for the disaster.
• Request: The post requests help, support, or resources due to the hurricane.
• Assistance: The post describes assistance provided by individuals, communities, or organizations, including both physical aid and emotional or psychological support.
• Recovery: The post describes efforts or activities related to the recovery and rebuilding process after the hurricane. 
 
Note: A single post may be labeled as True for multiple humanitarian categories.
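Since the seven humanitarian labels are independent booleans, a post is naturally represented as a multi-hot vector for multi-label classification. The sketch below assumes, without confirmation from this page, that each class has its own CSV column named exactly as listed above and that cells hold the strings `True`/`False`; the helper names are hypothetical.

```python
# Column names are assumed to match the class names listed above.
HUMANITARIAN_CLASSES = [
    "Casualty", "Evacuation", "Damage", "Advice",
    "Request", "Assistance", "Recovery",
]


def parse_bool(value):
    """Convert a 'True'/'False' cell (case-insensitive, also '1'/'0') to bool."""
    text = str(value).strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"unexpected label value: {value!r}")


def humanitarian_vector(row):
    """Return the seven labels as a multi-hot list of 0/1 in a fixed order."""
    return [int(parse_bool(row[name])) for name in HUMANITARIAN_CLASSES]
```

A post labeled True for Casualty, Damage, and Assistance maps to `[1, 0, 1, 0, 0, 1, 0]`; keeping a fixed class order makes the vectors directly usable as multi-label targets.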
 

Bias Classes

Each post is annotated with five binary bias classes. For each class, the label is either:
 
• True – the post contains this bias information
• False – the post does not contain this information
 
These five bias classes include:
 
• Linguistic Bias: The post contains biased, inappropriate, or offensive language, with a focus on word choice, tone, or expression.
• Political Bias: The post expresses political ideology, showing favor or disapproval toward specific political actors, parties, or policies.
• Gender Bias: The post contains biased, stereotypical, or discriminatory language or viewpoints related to gender. 
• Hate Speech: The post contains language that expresses hatred, hostility, or dehumanization toward a specific group or individual, especially those belonging to minority or marginalized communities.
• Racial Bias: The post contains biased, discriminatory, or stereotypical statements directed toward one or more racial or ethnic groups. 
 
Note: A single post may be labeled as True for multiple bias categories.
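Because a post can carry several bias labels at once, one simple analysis is to count how often pairs of bias classes co-occur on the same post. The sketch below assumes, as an unconfirmed guess about the file layout, one column per bias class named as listed above with `True`/`False` string values; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Column names are assumed to match the class names listed above.
BIAS_CLASSES = [
    "Linguistic Bias", "Political Bias", "Gender Bias",
    "Hate Speech", "Racial Bias",
]


def active_bias_labels(row):
    """Return the bias classes whose cell reads 'True' (case-insensitive)."""
    return [name for name in BIAS_CLASSES
            if str(row.get(name, "")).strip().lower() == "true"]


def bias_cooccurrence(rows):
    """Count how often each pair of bias classes is True on the same post.

    Pairs are keyed in the order of BIAS_CLASSES, e.g.
    ('Linguistic Bias', 'Hate Speech').
    """
    pairs = Counter()
    for row in rows:
        for a, b in combinations(active_bias_labels(row), 2):
            pairs[(a, b)] += 1
    return pairs
```

Posts with zero or one active bias label contribute no pairs, so the counter reflects only genuinely multi-label posts.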
 

Information Integrity Classes

Each post is also annotated with a single information integrity class, represented by an integer:
 
• -1 → False information (i.e., misinformation or disinformation)
• 0 → Unverifiable information (unclear or lacking sufficient evidence)
• 1 → True information (verifiable and accurate)
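Unlike the other two dimensions, information integrity is a single three-valued label, so it maps cleanly onto an enumeration. The sketch below encodes the three documented codes and rejects anything else; the class and function names are hypothetical, and it assumes the cells are stored as plain integers such as `-1`, `0`, and `1`.

```python
from collections import Counter
from enum import IntEnum


class Integrity(IntEnum):
    """Information integrity codes as documented for MASH."""
    FALSE_INFO = -1    # misinformation or disinformation
    UNVERIFIABLE = 0   # unclear or lacking sufficient evidence
    TRUE_INFO = 1      # verifiable and accurate


def parse_integrity(value):
    """Map a cell such as '-1', '0', or '1' to an Integrity member.

    Raises ValueError for non-integers or codes outside the three above.
    """
    return Integrity(int(str(value).strip()))


def integrity_distribution(values):
    """Count how many posts fall into each integrity class."""
    return Counter(parse_integrity(v) for v in values)
```

Failing loudly on unexpected codes helps catch parsing problems (for example, a column read from the wrong file) before they silently skew class distributions.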
 

Key Notes

  1. Version 1 is no longer available. 

Files

Total size: 8.9 MB (four CSV files)
 
• 790.4 kB, md5:ba748b6cdec10c3f0d31a0b81fc2968a
• 4.4 MB, md5:9e870621d00f795160298d90ab6c099a
• 3.6 MB, md5:d93fe0d657a588a358e966acc821f265
• 146.3 kB, md5:acaed8ebbdb49e3cbba6c90ddeb464d2
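Each file is published with an MD5 checksum, so a download can be checked for corruption before use. The sketch below uses Python's standard `hashlib`; the function names are hypothetical, and the file is read in chunks so memory use stays constant regardless of file size.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 matches `expected`.

    `expected` may include the 'md5:' prefix shown in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

A mismatch usually means an incomplete or corrupted download, in which case the file should be fetched again.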