Truthfulness Stance Detection on Claim-Tweet (TSD-CT)
Description
Under the final threshold, TSD-CT contains 5,331 finalized claim-tweet pairs (including 269 screening and training pairs) covering 2,201 unique factual claims.
The label distribution is as follows: 2,104 (39.47%) are labeled as Positive, 882 (16.57%) as Neutral/No Stance, 883 (16.54%) as Negative, 309 (5.80%) as Different Topics, and 1,153 (21.62%) as Problematic.
Claim veracity is also diverse: 722 claims are labeled false (32.80%), 353 pants-fire (16.04%), 340 barely-true (15.45%), 292 half-true (13.27%), 287 mostly-true (13.04%), and 207 true (9.40%). On average, each claim-tweet pair contains 34.61 tokens and 261.76 characters. Additionally, 2,169 pairs (40.71%) include at least one hyperlink, indicating an external reference or contextual support. Across the 845 topics, the most frequent are coronavirus (1,573 pairs; 29.51%) and public health (870; 16.32%), followed by Donald Trump (583; 10.94%), elections (438; 8.22%), economy (390; 7.32%), health care (329; 6.17%), crime (295; 5.54%), government regulation (255; 4.78%), drugs (240; 4.50%), and science (239; 4.49%).
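The verdict percentages above are shares of the 2,201 unique claims. As a minimal sketch of that arithmetic, the snippet below recomputes them from the reported counts; the helper name `shares` is hypothetical and not part of the dataset or its tooling.

```python
def shares(counts):
    """Return each category's percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Verdict counts as reported in the description above.
verdict_counts = {
    "false": 722,
    "pants-fire": 353,
    "barely-true": 340,
    "half-true": 292,
    "mostly-true": 287,
    "true": 207,
}

# The six verdict counts add up to the number of unique claims.
assert sum(verdict_counts.values()) == 2201

for name, pct in shares(verdict_counts).items():
    print(f"{name}: {pct:.2f}%")
```

The same helper can be pointed at any other count table derived from the CSV, such as per-topic pair counts.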
Fields/Columns
- id:
  - Type: Integer or String
  - Description: Unique identifier for each claim or record.
- claim_author:
  - Type: String
  - Description: The author of the claim.
- claim:
  - Type: String
  - Description: The text of the claim being analyzed.
- tweet:
  - Type: String (always "REDACTED")
  - Description: Placeholder for the tweet text, which has been redacted for privacy.
- screening:
  - Type: String or Boolean
  - Description: Indicates whether the claim has been screened.
- answered:
  - Type: Boolean
  - Description: Indicates whether the claim has been answered or fact-checked.
- tweet_url_title:
  - Type: String
  - Description: Title or description of the tweet's URL, if applicable.
- claim_timestamp:
  - Type: DateTime
  - Description: Timestamp when the claim was made.
- tweet_timestamp:
  - Type: DateTime
  - Description: Timestamp when the associated tweet was posted.
- tweet_id:
  - Type: String or Integer
  - Description: Unique identifier for the tweet.
- tweet_userhandle:
  - Type: String
  - Description: Twitter handle of the user who posted the tweet.
- retweet_count:
  - Type: Integer
  - Description: Number of retweets for the tweet.
- reply_count:
  - Type: Integer
  - Description: Number of replies to the tweet.
- like_count:
  - Type: Integer
  - Description: Number of likes for the tweet.
- quote_count:
  - Type: Integer
  - Description: Number of quote tweets for the tweet.
- claim_source:
  - Type: String
  - Description: Source of the claim (e.g., news outlet, individual, etc.).
- claim_verdict:
  - Type: String
  - Description: Verdict of the claim (e.g., true, mostly-true, half-true, barely-true, false, pants-fire).
- factcheck_timestamp:
  - Type: DateTime
  - Description: Timestamp when the claim was fact-checked.
- claim_review_summary:
  - Type: String
  - Description: Summary of the claim review.
- claim_review:
  - Type: String
  - Description: Detailed review of the claim.
- factcheck_url:
  - Type: String
  - Description: URL to the fact-checking article or source.
- claim_tags:
  - Type: List of Strings
  - Description: Tags or categories associated with the claim.
- claimbuster_score:
  - Type: Float
  - Description: Score assigned by ClaimBuster, indicating the claim's importance or likelihood of being fact-checked.
- pair_id:
  - Type: String or Integer
  - Description: Identifier for paired records (e.g., claim and fact-check).
- factcheck_author_url:
  - Type: String
  - Description: URL to the profile of the fact-checking author.
- factcheck_post_time:
  - Type: DateTime
  - Description: Time when the fact-checking post was published.
- factcheck_author_info:
  - Type: String
  - Description: Information about the fact-checking author.
- subset:
  - Type: String
  - Description: Subset or category of the dataset (e.g., training, testing, validation).
- annotator_agreement:
  - Type: Float or String
  - Description: Claim-tweet pair label. One of Positive (1), Neutral/No Stance (0), Negative (-1), Different Topics (2), Problematic (3).
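The numeric codes in the label description above can be mapped back to label names when reading the CSV. The following is a minimal sketch using only the Python standard library. It assumes the label is stored in the `annotator_agreement` column as documented, and that codes may be serialized either as integers or as floats (for example `1` or `1.0`); the helper names `decode_label` and `label_counts` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Code-to-name mapping taken from the annotator_agreement description.
LABEL_NAMES = {
    1: "Positive",
    0: "Neutral/No Stance",
    -1: "Negative",
    2: "Different Topics",
    3: "Problematic",
}


def decode_label(raw):
    """Map a raw cell such as "1", "-1" or "1.0" to its label name."""
    code = int(float(raw))  # assumption: codes may be written as floats
    return LABEL_NAMES[code]


def label_counts(csv_file, column="annotator_agreement"):
    """Count label names over an open CSV file object with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(decode_label(row[column]) for row in reader)


# Tiny made-up sample, not real dataset rows.
sample = io.StringIO("pair_id,annotator_agreement\n1,1\n2,-1\n3,1.0\n4,3\n")
print(label_counts(sample))
```

To run this against the released file, pass an open handle instead of the in-memory sample, for example `open("TSD-CT.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.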
Files
TSD-CT.csv
Total size: 62.8 MB
| Checksum | Size |
|---|---|
| md5:fe637bf998322468d4b130467fe0804c | 29.3 MB |
| md5:7bb2449b803cc492f3a7e61e766aac12 | 33.5 MB |
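The md5 values listed above can be used to check that a download is intact. Below is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file name is not stated here, so the pairing in the usage line is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum, with or without the 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("TSD-CT.csv", "md5:fe637bf998322468d4b130467fe0804c")` would return `True` only if the local file is byte-identical to the published file with that checksum.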
Additional details
Software
- Repository URL: https://idir.uta.edu/stance_annotation
- Programming language: PHP, HTML+PHP
- Development Status: Active