Published September 13, 2023 | Version 1.0.0
Dataset Open

Temporal Validity Change Prediction - Dataset

Authors/Creators

  • 1. University of Innsbruck

Contributors

Supervisor:

  • 1. University of Innsbruck

Description

This dataset contains data for temporal validity change prediction (TVCP), an NLP task that will be defined in an upcoming publication. The dataset consists of five columns:

  • target - A Tweet ID. The tweet text is not included and must be retrieved ("rehydrated") manually via the Twitter API.
  • follow_up - A synthetic follow-up tweet that semantically relates to the target tweet.
  • context_only_tv - The expected temporal validity duration of the target tweet, when read in isolation.
  • combined_tv - The expected temporal validity duration of the target tweet, when read together with the follow-up tweet.
  • change - The TVCP task label, i.e., whether the temporal validity duration of the target tweet is decreased, unchanged (neutral), or increased by the information in the follow-up tweet.

The duration labels (context_only_tv, combined_tv) are class indices into the following ordered list of duration classes:
[no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]
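
As a minimal sketch, the class indices can be mapped back to readable labels as follows. The list is copied from the description above; treating the indices as 0-based is an assumption that should be checked against the data, and the helper names tv_label and decode_row are hypothetical.

```python
# Duration classes in the order given in the description.
# ASSUMPTION: class indices are 0-based; verify against the CSV files.
TV_CLASSES = [
    "no time-sensitive information",
    "less than one minute",
    "1-5 minutes",
    "5-15 minutes",
    "15-45 minutes",
    "45 minutes - 2 hours",
    "2-6 hours",
    "more than 6 hours",
    "1-3 days",
    "3-7 days",
    "1-4 weeks",
    "more than one month",
]


def tv_label(index):
    """Return the duration class name for a class index (assumed 0-based)."""
    return TV_CLASSES[int(index)]


def decode_row(row):
    """Return a copy of a row (dict keyed by column name) with readable labels added."""
    decoded = dict(row)
    decoded["context_only_tv_label"] = tv_label(row["context_only_tv"])
    decoded["combined_tv_label"] = tv_label(row["combined_tv"])
    return decoded
```

Rows read with csv.DictReader hold the indices as strings, which is why tv_label converts its argument with int() before the lookup.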

Different dataset splits are provided.

  • "dataset.csv" contains the full dataset.
  • "train.csv", "val.csv", "test.csv" contain an 80-10-10 train-val-test split.
  • "train[0-4].csv" and "test[0-4].csv" contain the training and test data for each of the 5 folds of a 5-fold cross-validation. Each train file contains 80% of the data, and the corresponding test file contains the remaining 20%. To replicate the original experiments, sort the train file by the preprocessed target tweet text, then take the first 12.5% of target tweets as validation data. This yields a 70-10-20 train-val-test split.
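
The validation split for a fold could be derived roughly as sketched below. Taking 12.5% of the 80% training portion gives 0.125 × 0.8 = 10% of the full data, which is where the 70-10-20 proportions come from. Several details are assumptions rather than the original procedure: the preprocessing is not specified here, so text_of is a hypothetical callable that returns the preprocessed text for a tweet ID; the 12.5% is applied to distinct target tweets; and the count is rounded down.

```python
def split_train_val(rows, text_of, val_fraction=0.125):
    """Split one fold's training rows into (train, val) by target tweet.

    rows: list of dicts, each with a "target" key (the tweet ID).
    text_of: hypothetical callable mapping a tweet ID to its preprocessed text.
    """
    # Distinct target tweets, sorted by preprocessed text.
    # The ID is used as a tie-breaker so the order is deterministic.
    targets = sorted({row["target"] for row in rows},
                     key=lambda t: (text_of(t), t))

    # ASSUMPTION: the validation count is rounded down.
    n_val = int(len(targets) * val_fraction)
    val_targets = set(targets[:n_val])

    # All rows of a selected target tweet go to validation together.
    train = [r for r in rows if r["target"] not in val_targets]
    val = [r for r in rows if r["target"] in val_targets]
    return train, val
```

Splitting by target tweet rather than by row keeps every follow-up of a given tweet on the same side of the split, which avoids the same tweet text appearing in both training and validation data.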

Files

dataset.csv

Files (3.9 MB)

Checksum Size
md5:e23c16405dd96aea8414303b77e2814a 557.0 kB
md5:2c6d89087f481f8862db0ef0dd557a17 56.3 kB
md5:dd4f52ffb21b0c416b56ab271ac9ee04 111.5 kB
md5:b444e8b78782f6373fc1550e328b3f8c 110.3 kB
md5:afe54a50d2416f533c934b43c802c83d 111.9 kB
md5:43955d433d98f93520b0229bbe5654b4 110.5 kB
md5:7d5cfd34f1768ef7b998cef18e028a94 113.0 kB
md5:730dca26724dc5612d32482df2304200 446.0 kB
md5:52a4d57256f6e0304b4fe053c810de4d 445.6 kB
md5:767d5cd09b37c257d75993d8c3d2d9bb 446.8 kB
md5:5bc398d658347dea67e65d9e8e4eab3e 445.2 kB
md5:aff8af493fe93bd1fd5f075e31411dca 446.6 kB
md5:277da00eb5af992a20cc5b5c862f07af 444.1 kB
md5:4474fdc8111c24414b0f46e9509b82e2 54.8 kB
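
A downloaded file can be checked against its listed checksum with a short script such as the sketch below. It uses only Python's standard hashlib module; the function names are hypothetical, and the listing here does not show which checksum belongs to which file, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    # Drop the 'md5:' prefix if present and compare case-insensitively.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```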