Published May 7, 2025 | Version v1
Dataset Open

Triangulation and Utilization of multimodal data for fake news detection with social constructs

Authors/Creators

  • 1. National Textile University

Contributors

Contact person:

  • 1. National Textile University

Description

Many existing fake news datasets cover either news content or social media posts in isolation. To overcome this limitation, we present a triangulated dataset that systematically interlinks four components: original news articles, social media posts, multimedia content, and veracity labels. The original news articles are sourced from NELA-GT and from mainstream media outlets, providing a baseline of original reporting. Each article is paired with its social media derivatives, namely posts from platforms such as Twitter and Reddit, together with metadata such as engagement statistics and bot-likelihood scores. Multimedia content, covering both images and videos, is drawn from datasets such as FakeNewsNet to support analysis of visual misinformation. Veracity labels are curated from fact-checked claims provided by the TruthSeekers repository, so that each instance carries a fact-checked assessment of truthfulness.

The resulting dataset contains 158,400 aligned instances spanning several modalities: text, image data, social interaction context, and temporal metadata. These data points are aligned with a multi-tiered method: URL and keyword matching with a Levenshtein distance threshold (distance < 5), semantic filtering with BERTScore (score > 0.85), and multimodal validation with CLIP similarity scores (similarity > 0.7). Applying all three tiers is intended to ensure high-confidence matching across modalities.
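The alignment tiers described above can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: the function names `levenshtein` and `is_aligned` are hypothetical, the BERTScore and CLIP values are assumed to be computed elsewhere and passed in as plain floats, and combining the tiers with a strict logical AND is an assumption, since the description does not say how the tiers interact.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def is_aligned(
    article_key: str,
    post_key: str,
    semantic_score: float,
    clip_score: Optional[float] = None,
    max_edit: int = 5,
    min_semantic: float = 0.85,
    min_clip: float = 0.7,
) -> bool:
    """Return True if an article/post pair passes every tier.

    Assumptions (not confirmed by the dataset description):
    - article_key / post_key are the URL or keyword strings being matched;
    - semantic_score is a precomputed BERTScore-style value;
    - clip_score is a precomputed CLIP similarity, or None when the
      pair has no image to validate, in which case the tier is skipped.
    """
    # Tier 1: URL/keyword match, edit distance strictly below the threshold.
    if levenshtein(article_key.lower(), post_key.lower()) >= max_edit:
        return False
    # Tier 2: semantic filter, score strictly above the threshold.
    if semantic_score <= min_semantic:
        return False
    # Tier 3: multimodal validation, only when an image score is available.
    if clip_score is not None and clip_score <= min_clip:
        return False
    return True
```

For example, `is_aligned("climate accord signed", "climate accord signd", 0.9, 0.8)` passes all three tiers, while lowering the semantic score to 0.80 rejects the pair at the second tier.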

Compared to existing datasets such as FakeNewsNet and LIAR, our triangulated dataset offers several critical advantages. It uniquely includes social context features like bot scores and retweet graphs, supports multimodal pairings of text, images, and social media posts, and allows for provenance tracking by comparing original and manipulated versions of content. For instance, it enables detailed tracing of how a legitimate BBC article titled “Climate Accord Signed” may be repurposed into a misleading viral tweet like “Politicians FAKED climate deal!” accompanied by doctored images. This level of integration provides researchers with a powerful tool to study the lifecycle and mutation of fake news across platforms and modalities.

Files

FND_Triangulations.ipynb

Files (179.3 kB)

Checksum  Size
md5:4f31273b9dd1733bb5f4034708e700c8  913 Bytes
md5:fe4c5999dbe6ac5904cf45c4c8ec4552  90 Bytes
md5:396cac2668d1c40147c8ecf9ad3f5384  521 Bytes
md5:5784bbc431d975b2459d284ae9cee744  10.3 kB
md5:280774425e36437a403778bfcfd75982  918 Bytes
md5:eab94ff82e39e04893af621bab86b403  68 Bytes
md5:65d265b711f2c72d4db7667ac785976b  4.7 kB
md5:42c3b700271e6e6f0f8b7a938da4c6fd  348 Bytes
md5:36fd514097907c18576d51f56a2e862d  119 Bytes
md5:9ee93975608d833cfe1b31c6686cf5e3  714 Bytes
md5:ab10ad7559436ddf03e56ecb15bed109  426 Bytes
md5:00ca98488e4bea374a2397a58be4f164  160.2 kB
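
The MD5 checksums listed for each file can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the path passed in is whatever local name the downloaded file was saved under, since file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a local file against a listed value such as 'md5:4f31...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:00ca98488e4bea374a2397a58be4f164")` returns `True` only if the local copy matches the 160.2 kB entry byte for byte.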

Additional details

Software

Development Status
Active