Published November 14, 2025
| Version v1
Dataset
Open
Taxonomy Construction of Factual Claims from Social Media
Description
This dataset accompanies the paper "LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media" (Findings of ACL 2025). It contains the curated data used in the paper's taxonomy construction experiments, which focus on factual claims extracted from social media discussions in three topic domains: COVID-19 vaccines, climate change, and cybersecurity. The dataset is intended to support research on taxonomy construction and factual claim analysis.
Contents
- tweets.csv: The IDs of 384,676 tweets collected from X (formerly Twitter) for the three domains above. (Note: the Facebook data used in the paper are not included because of data-sharing restrictions and privacy policies.)
- Taxonomies: Nine final taxonomies of factual claims, one for each combination of three LLMs (Zephyr, GPT-4o mini, Gemini 2.0 Flash) and the three domains. Each taxonomy has three hierarchical levels: broad, medium, and detailed topics.
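The three-level hierarchy described above can be traversed with a few lines of Python. The sketch below is illustrative only: the JSON schema of the taxonomy files is not documented on this page, so the nested layout `{broad: {medium: [detailed, ...]}}`, the placeholder topic names, and the helper `iter_paths` are all assumptions and should be adapted after inspecting a real file.

```python
import json
from typing import Iterator

# Hypothetical toy input. The real files' schema is not documented here.
# Assumed layout: {broad topic: {medium topic: [detailed topic, ...]}}
SAMPLE_JSON = """
{
  "broad A": {
    "medium A1": ["detailed A1a", "detailed A1b"],
    "medium A2": ["detailed A2a"]
  },
  "broad B": {
    "medium B1": ["detailed B1a"]
  }
}
"""


def iter_paths(taxonomy: dict) -> Iterator[tuple[str, str, str]]:
    """Yield one (broad, medium, detailed) tuple per detailed topic."""
    for broad, mediums in taxonomy.items():
        for medium, details in mediums.items():
            for detailed in details:
                yield broad, medium, detailed


taxonomy = json.loads(SAMPLE_JSON)
paths = list(iter_paths(taxonomy))

# Print each root-to-leaf path through the assumed hierarchy.
for broad, medium, detailed in paths:
    print(f"{broad} > {medium} > {detailed}")
```

To apply this to a downloaded file, replace `json.loads(SAMPLE_JSON)` with `json.load(open(path))`. If the file's structure differs from the assumed layout, the loops in `iter_paths` need to change accordingly.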
Files
(8.2 MB)
| File (MD5 checksum) | Size |
|---|---|
| md5:69ab3f809af94d30c1868fa1ec6f9a50 | 2.8 kB |
| md5:783b9331c887dd52179733bc45dc1e57 | 16.7 kB |
| md5:a35e93fcc1688171d199f44d2874091b | 19.8 kB |
| md5:1f3f36dc780ac661517a90359ccedef7 | 2.7 kB |
| md5:e64665807460410d307ba997a13c2691 | 10.8 kB |
| md5:e01a636a19102b8462d273054b8e1195 | 13.0 kB |
| md5:52d1c2e94e08d021ec192563844bdb79 | 3.1 kB |
| md5:13834689a1ccd2ff511da69faf4dec46 | 5.9 kB |
| md5:bfc959ddf3adf51a5749f4d9fae8d024 | 7.2 kB |
| md5:497e7a9b70814865b3f5b22bcdcab45e | 8.1 MB |
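The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are illustrative, not part of the dataset, and because this listing does not show which checksum belongs to which file name, the caller has to supply the right pairing from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:497e...'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:<checksum from the file's row>")` returns `True` when the local copy matches the published digest. MD5 is adequate for detecting accidental corruption but is not a defense against deliberate tampering.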