This is the training data used to produce the results shown in the paper listed below.
Source: Sampled public tweets from the Twitter streaming API.
Date range: September 14 to October 27, 2010.
data.arff: holds the original training data, before resampling.
data_balanced.arff: holds the resampled training data.
data.instance_to_id.pickle: holds a Python pickle that maps instance IDs in the data.arff file to meme IDs in the Truthy database. To view the page for a particular meme ID, append the ID to http://truthy.indiana.edu/m?id=
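
The following is a minimal sketch of how the pickle might be read and turned into meme page URLs. It assumes the pickle holds a dict-like object keyed by instance ID with meme IDs as values; the actual layout is not documented here and should be checked after loading. The function names load_instance_to_id and meme_url are hypothetical helpers, not part of the dataset.

```python
import pickle

# URL prefix taken from the description above; the meme ID is appended to it.
MEME_URL_PREFIX = "http://truthy.indiana.edu/m?id="


def load_instance_to_id(path):
    """Load the pickled instance-ID-to-meme-ID mapping from `path`.

    encoding="latin1" lets Python 3 read pickles written by Python 2,
    which is a guess based on the age of the data, not a documented fact.
    Only unpickle files from a source you trust: unpickling can run code.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def meme_url(meme_id):
    """Build the meme page URL by appending the meme ID to the prefix."""
    return MEME_URL_PREFIX + str(meme_id)
```

A typical use would be to call load_instance_to_id("data.instance_to_id.pickle"), inspect the type of the returned object, and then pass one of its meme IDs to meme_url.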