Triangulation and Utilization of multimodal data for fake news detection with social constructs
Description
Many existing fake news datasets cover either news content or social media posts in isolation. To overcome this limitation, we present a comprehensive, triangulated dataset that systematically interlinks four essential components: original news articles, social media posts, multimedia content, and veracity labels. The original news articles are sourced from NELA-GT as well as mainstream media outlets, providing a foundational layer of factual reporting. These articles are paired with their corresponding social media derivatives, which include posts from platforms such as Twitter and Reddit, along with metadata such as engagement statistics and bot-likelihood scores. Multimedia content, including both images and videos, is incorporated from datasets like FakeNewsNet to allow for visual misinformation analysis. Veracity labels are curated from fact-checked claims provided by the TruthSeekers repository, so that each instance is associated with a trusted assessment of truthfulness.
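The four interlinked components can be pictured as a single record per instance. The sketch below is illustrative only: the class names and every field name (`article_source`, `bot_likelihood`, `media_paths`, and so on) are hypothetical and are not taken from the published files, whose actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SocialPost:
    """One social media derivative of an article (hypothetical fields)."""
    platform: str                       # e.g. "Twitter" or "Reddit"
    text: str
    engagement: Dict[str, int] = field(default_factory=dict)
    bot_likelihood: Optional[float] = None  # assumed to lie in [0, 1]


@dataclass
class TriangulatedInstance:
    """One aligned instance linking the four components (hypothetical fields)."""
    article_text: str
    article_source: str                 # e.g. "NELA-GT" or a mainstream outlet
    posts: List[SocialPost] = field(default_factory=list)
    media_paths: List[str] = field(default_factory=list)
    veracity_label: Optional[str] = None


# Placeholder values for illustration, not real data from the dataset.
example = TriangulatedInstance(
    article_text="Example article body",
    article_source="NELA-GT",
    posts=[SocialPost(platform="Twitter", text="Example post",
                      engagement={"retweets": 12}, bot_likelihood=0.4)],
    media_paths=["example.jpg"],
    veracity_label="false",
)
```

Keeping the posts and media as lists on the article record reflects the one-to-many relationship described above, where a single article may have several social media derivatives.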
The resulting dataset contains 158,400 aligned instances spanning text, image data, social interaction context, and temporal metadata. These data points are aligned through a multi-tiered method: URL and keyword matching using Levenshtein distance thresholds (<5), semantic filtering via BERTScore (>0.85), and multimodal validation using CLIP similarity scores (>0.7). Together, these techniques ensure high-confidence matching across modalities.
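The tiered alignment can be sketched roughly as follows. Only the three thresholds come from the description above. The sketch assumes that the tiers are applied in sequence, that a pair must pass all of them, and that the lexical tier compares two key strings (for example a normalized URL or keyword string). The BERTScore and CLIP scores are passed in as caller-supplied functions instead of being computed here, and `is_aligned` is a hypothetical helper name.

```python
from typing import Callable, Optional

# Thresholds quoted in the dataset description.
LEVENSHTEIN_MAX = 5    # keep pairs with edit distance < 5
BERTSCORE_MIN = 0.85   # keep pairs with BERTScore > 0.85
CLIP_MIN = 0.7         # keep pairs with CLIP similarity > 0.7


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, using two rolling DP rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_aligned(article_key: str,
               post_key: str,
               semantic_score: Callable[[], float],
               clip_score: Optional[Callable[[], float]] = None) -> bool:
    """Hypothetical cascade: lexical tier, then semantic, then multimodal.

    The scorers are zero-argument callables, so the more expensive models
    only run for pairs that survive the cheaper tiers.
    """
    if levenshtein(article_key.lower(), post_key.lower()) >= LEVENSHTEIN_MAX:
        return False
    if semantic_score() <= BERTSCORE_MIN:
        return False
    # The multimodal tier is skipped when a pair has no image (an assumption).
    if clip_score is not None and clip_score() <= CLIP_MIN:
        return False
    return True
```

In practice the two callables would wrap a BERTScore implementation and a CLIP model. Ordering the tiers from cheapest to most expensive is a design choice of this sketch, not something the description states.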
Compared to existing datasets such as FakeNewsNet and LIAR, our triangulated dataset offers several critical advantages. It uniquely includes social context features like bot scores and retweet graphs, supports multimodal pairings of text, images, and social media posts, and allows for provenance tracking by comparing original and manipulated versions of content. For instance, it enables detailed tracing of how a legitimate BBC article titled “Climate Accord Signed” may be repurposed into a misleading viral tweet like “Politicians FAKED climate deal!” accompanied by doctored images. This level of integration provides researchers with a powerful tool to study the lifecycle and mutation of fake news across platforms and modalities.
Files
FND_Triangulations.ipynb
Total size: 179.3 kB
| Checksum | Size |
|---|---|
| md5:4f31273b9dd1733bb5f4034708e700c8 | 913 Bytes |
| md5:fe4c5999dbe6ac5904cf45c4c8ec4552 | 90 Bytes |
| md5:396cac2668d1c40147c8ecf9ad3f5384 | 521 Bytes |
| md5:5784bbc431d975b2459d284ae9cee744 | 10.3 kB |
| md5:280774425e36437a403778bfcfd75982 | 918 Bytes |
| md5:eab94ff82e39e04893af621bab86b403 | 68 Bytes |
| md5:65d265b711f2c72d4db7667ac785976b | 4.7 kB |
| md5:42c3b700271e6e6f0f8b7a938da4c6fd | 348 Bytes |
| md5:36fd514097907c18576d51f56a2e862d | 119 Bytes |
| md5:9ee93975608d833cfe1b31c6686cf5e3 | 714 Bytes |
| md5:ab10ad7559436ddf03e56ecb15bed109 | 426 Bytes |
| md5:00ca98488e4bea374a2397a58be4f164 | 160.2 kB |
Additional details
Software
- Development Status: Active