Published June 8, 2025 | Version v1
Dataset Open

Truthfulness Stance Detection on Claim-Tweet (TSD-CT)

  • 1. The University of Texas at Arlington

Description

Under the final threshold, TSD-CT contains 5,331 finalized claim-tweet pairs (including 269 screening and training pairs) covering 2,201 unique factual claims. 
The label distribution is as follows: 2,104 (39.47%) are labeled as Positive, 882 (16.57%) as Neutral/No Stance, 883 (16.54%) as Negative, 309 (5.80%) as Different Topics, and 1,153 (21.62%) as Problematic.
Claim veracity is also diverse: 722 claims are labeled false (32.80%), 353 pants-fire (16.04%), 340 barely-true (15.45%), 292 half-true (13.27%), 287 mostly-true (13.04%), and 207 true (9.40%). On average, each claim-tweet pair contains 34.61 tokens and 261.76 characters. Additionally, 2,169 pairs (40.71%) include at least one hyperlink, indicating an external reference or contextual support. Across the 845 topics, the dataset is dominated by discussions of coronavirus (1,573 pairs; 29.51%) and public health (870; 16.32%), followed by Donald Trump (583; 10.94%), elections (438; 8.22%), economy (390; 7.32%), health care (329; 6.17%), crime (295; 5.54%), government regulation (255; 4.78%), drugs (240; 4.50%), and science (239; 4.49%).
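The verdict percentages quoted above can be reproduced from the counts alone. The following is a minimal sketch of that arithmetic. The names `VERDICT_COUNTS`, `percent`, and `shares` are illustrative and not part of the dataset, and the counts are copied from the description rather than read from the file.

```python
# Verdict counts as stated in the description (per unique claim).
VERDICT_COUNTS = {
    "false": 722,
    "pants-fire": 353,
    "barely-true": 340,
    "half-true": 292,
    "mostly-true": 287,
    "true": 207,
}

# The counts should add up to the number of unique claims.
total_claims = sum(VERDICT_COUNTS.values())


def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {verdict: percent(n, total_claims) for verdict, n in VERDICT_COUNTS.items()}

print("unique claims:", total_claims)
for verdict, share in shares.items():
    print(f"{verdict:12s} {share:6.2f}%")
```

The same helper could be applied to any other count in the description, given the matching denominator (pairs or claims).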

Fields/Columns

  1. id:

    • Type: Integer or String
    • Description: Unique identifier for each claim or record.
  2. claim_author:

    • Type: String
    • Description: The author of the claim.
  3. claim:

    • Type: String
    • Description: The text of the claim being analyzed.
  4. tweet:

    • Type: String (always "REDACTED")
    • Description: Placeholder for the tweet text, which has been redacted for privacy.
  5. screening:

    • Type: String or Boolean
    • Description: Indicates whether the claim has been screened.
  6. answered:

    • Type: Boolean
    • Description: Indicates whether the claim has been answered or fact-checked.
  7. tweet_url_title:

    • Type: String
    • Description: Title or description of the tweet's URL, if applicable.
  8. claim_timestamp:

    • Type: DateTime
    • Description: Timestamp when the claim was made.
  9. tweet_timestamp:

    • Type: DateTime
    • Description: Timestamp when the associated tweet was posted.
  10. tweet_id:

    • Type: String or Integer
    • Description: Unique identifier for the tweet.
  11. tweet_userhandle:

    • Type: String
    • Description: Twitter handle of the user who posted the tweet.
  12. retweet_count:

    • Type: Integer
    • Description: Number of retweets for the tweet.
  13. reply_count:

    • Type: Integer
    • Description: Number of replies to the tweet.
  14. like_count:

    • Type: Integer
    • Description: Number of likes for the tweet.
  15. quote_count:

    • Type: Integer
    • Description: Number of quote tweets for the tweet.
  16. claim_source:

    • Type: String
    • Description: Source of the claim (e.g., news outlet, individual, etc.).
  17. claim_verdict:

    • Type: String
    • Description: Fact-check verdict for the claim (e.g., true, half-true, false, pants-fire).
  18. factcheck_timestamp:

    • Type: DateTime
    • Description: Timestamp when the claim was fact-checked.
  19. claim_review_summary:

    • Type: String
    • Description: Summary of the claim review.
  20. claim_review:

    • Type: String
    • Description: Detailed review of the claim.
  21. factcheck_url:

    • Type: String
    • Description: URL to the fact-checking article or source.
  22. claim_tags:

    • Type: List of Strings
    • Description: Tags or categories associated with the claim.
  23. claimbuster_score:

    • Type: Float
    • Description: Score assigned by ClaimBuster, indicating the claim's importance or likelihood of being fact-checked.
  24. pair_id:

    • Type: String or Integer
    • Description: Identifier for the claim-tweet pair.
  25. factcheck_author_url:

    • Type: String
    • Description: URL to the profile of the fact-checking author.
  26. factcheck_post_time:

    • Type: DateTime
    • Description: Time when the fact-checking post was published.
  27. factcheck_author_info:

    • Type: String
    • Description: Information about the fact-checking author.
  28. subset:

    • Type: String
    • Description: Subset or category of the dataset (e.g., training, testing, validation).
  29. annotator_agreement:

    • Type: Float or String
    • Description: Claim-tweet pair label. One of Positive (1), Neutral/No Stance (0), Negative (-1), Different Topics (2), Problematic (3).
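As a rough illustration of how the `annotator_agreement` codes listed above might be decoded, the sketch below tallies labels using only the Python standard library. It makes several assumptions: that the column holds either the numeric code (possibly written as a float such as `1.0`) or the label name itself, and that the file is a plain comma-separated CSV with a header row. The helper names `normalize_code` and `label_counts` are hypothetical, and the sample rows are made up for the demonstration, not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Label codes as listed for the annotator_agreement column.
LABELS = {
    "1": "Positive",
    "0": "Neutral/No Stance",
    "-1": "Negative",
    "2": "Different Topics",
    "3": "Problematic",
}


def normalize_code(raw):
    """Turn spellings such as ' 1 ', '1.0' or '-1.0' into '1' or '-1'.

    Values that are not numeric are returned stripped but otherwise unchanged.
    """
    text = raw.strip()
    try:
        return str(int(float(text)))
    except (ValueError, OverflowError):
        return text


def label_counts(rows):
    """Count decoded labels over an iterable of dict-like rows."""
    counts = Counter()
    for row in rows:
        code = normalize_code(row["annotator_agreement"])
        if code in LABELS:
            counts[LABELS[code]] += 1
        elif code in LABELS.values():
            # Assumption: the column may already contain the label name.
            counts[code] += 1
        else:
            counts["Unknown (" + code + ")"] += 1
    return counts


# In-memory stand-in for the real file (made-up rows, not dataset content).
sample = io.StringIO(
    "pair_id,claim,tweet,annotator_agreement\n"
    "1,claim a,REDACTED,1\n"
    "2,claim b,REDACTED,-1.0\n"
    "3,claim c,REDACTED,0\n"
    "4,claim d,REDACTED,Positive\n"
)
counts = label_counts(csv.DictReader(sample))
for label, n in counts.most_common():
    print(label, n)
```

To run it against the downloaded file, the `sample` object could be replaced with something like `open("TSD-CT.csv", newline="", encoding="utf-8")`. The encoding is an assumption and may need adjusting.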

Files

TSD-CT.csv

Files (62.8 MB)

  • 29.3 MB, md5:fe637bf998322468d4b130467fe0804c
  • 33.5 MB, md5:7bb2449b803cc492f3a7e61e766aac12
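The md5 values listed above can be used to check that a download is intact. Below is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are illustrative, and because the listing here does not show which checksum belongs to which file name, the pairing has to be confirmed on the record page. The self-check at the end runs on a throwaway temporary file, not on the dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:abc...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Self-check on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
demo_ok = matches_listing(demo_path, "md5:" + hashlib.md5(b"hello").hexdigest())
os.remove(demo_path)
print(demo_ok)
```

For a real download, a call such as `matches_listing("TSD-CT.csv", "md5:...")` with the checksum copied from the record would return `True` only if the file is byte-for-byte identical to the published one.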

Additional details

Software

Repository URL: https://idir.uta.edu/stance_annotation
Programming language: PHP, HTML+PHP
Development status: Active