Published November 14, 2025 | Version v1
Dataset Open

Taxonomy Construction of Factual Claims from Social Media

  • 1. The University of Texas at Arlington

Description

This dataset accompanies the paper "LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media" (Findings of ACL 2025). It contains the curated data used in the paper's taxonomy construction experiments, which focus on factual claims extracted from social media discussions in three topic domains: COVID-19 vaccines, climate change, and cybersecurity. The dataset is designed to support research in taxonomy construction and factual claim analysis.

Contents

  • tweets.csv: The IDs of 384,676 tweets collected from X (formerly Twitter) for the three domains above. (Note: The Facebook data used in the paper are not included due to data-sharing restrictions and privacy policies.)
  • Taxonomies: Nine final taxonomies of factual claims, one for each combination of the three LLMs (Zephyr, GPT-4o mini, Gemini 2.0 Flash) and the three domain datasets. Each taxonomy has three hierarchical levels: broad, medium, and detailed topics.
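
The three-level hierarchy (broad, medium, detailed) can be flattened into one path per detailed topic, which is often easier to count or compare across models. The following is a minimal sketch only: the JSON schema of the taxonomy files is not documented here, so the nested layout in `sample` (broad topics as keys, medium topics as nested keys, detailed topics as list items) and all topic names in it are hypothetical. The walker itself accepts any mix of nested objects and arrays, so it may need adjusting once the actual files are inspected.

```python
from __future__ import annotations

import json
from typing import Any, Iterator


def iter_topic_paths(node: Any, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield one tuple per leaf, listing the topic names from root to leaf."""
    if isinstance(node, dict):
        # Each key is treated as a topic name one level deeper.
        for name, child in node.items():
            yield from iter_topic_paths(child, prefix + (str(name),))
    elif isinstance(node, list):
        # A list holds sibling topics at the same level.
        for child in node:
            yield from iter_topic_paths(child, prefix)
    elif node is None:
        # A null value marks a topic with no children.
        if prefix:
            yield prefix
    else:
        # Any other scalar is treated as a leaf topic name.
        yield prefix + (str(node),)


# Hypothetical taxonomy fragment; the real files may be structured differently.
sample = json.loads("""
{
  "Climate Science": {
    "Temperature Trends": ["Global warming rate", "Record heat"]
  },
  "Policy": {
    "Carbon Pricing": ["Carbon tax"]
  }
}
""")

paths = list(iter_topic_paths(sample))
for broad, medium, detailed in paths:
    print(f"{broad} > {medium} > {detailed}")
```

To apply this to a downloaded file such as climate_taxonomy_gemini.json, replace `sample` with the result of `json.load` on the opened file, and check the printed paths against the file's actual structure before relying on them.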

Files

climate_taxonomy_gemini.json

Files (8.2 MB)

MD5 checksum                            Size
md5:69ab3f809af94d30c1868fa1ec6f9a50    2.8 kB
md5:783b9331c887dd52179733bc45dc1e57    16.7 kB
md5:a35e93fcc1688171d199f44d2874091b    19.8 kB
md5:1f3f36dc780ac661517a90359ccedef7    2.7 kB
md5:e64665807460410d307ba997a13c2691    10.8 kB
md5:e01a636a19102b8462d273054b8e1195    13.0 kB
md5:52d1c2e94e08d021ec192563844bdb79    3.1 kB
md5:13834689a1ccd2ff511da69faf4dec46    5.9 kB
md5:bfc959ddf3adf51a5749f4d9fae8d024    7.2 kB
md5:497e7a9b70814865b3f5b22bcdcab45e    8.1 MB
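
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The sketch below uses only Python's standard `hashlib` module; the helper names `file_md5` and `matches_checksum` are illustrative, and the temporary file merely stands in for a real download. MD5 is adequate for detecting accidental corruption, not for security purposes.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:69ab...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return file_md5(path) == expected


# Demonstration on a throwaway file standing in for a downloaded one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_listed = "md5:" + file_md5(demo_path)
demo_ok = matches_checksum(demo_path, demo_listed)
demo_bad = matches_checksum(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a real check, pass the local path of a downloaded file together with the corresponding `md5:` string from the listing.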