Published July 5, 2011 | Version v1
Dataset Open

Astroturf/Legitimate Classification

  • 1. Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, Bloomington


This is the training data used to produce the results shown in the paper listed below.

  • Source: Public tweets sampled from the Twitter streaming API.
  • Date range: September 14 to October 27, 2010.
  • Contains:
    1. data.arff: the original (un-resampled) training data.
    2. data_balanced.arff: the resampled training data.
    3. data.instance_to_id.pickle: a Python pickle that maps instance IDs in data.arff to meme IDs in the Truthy database. To view the page for a particular meme ID, go to
  • Please cite:


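The files listed above can be read with the Python standard library alone. The sketch below is an illustration, not part of the original release. The helper names read_arff and load_instance_map are hypothetical. It assumes the ARFF files use the dense (non-sparse) row format with unquoted attribute names, and it makes no assumption about the object stored in the pickle beyond returning it as is. Because the pickle was presumably written with Python 2, encoding="latin1" is passed so that Python 3 can decode any byte strings it contains.

```python
import csv
import pickle


def read_arff(lines):
    """Parse a dense ARFF document into (attribute_names, rows).

    `lines` is any iterable of text lines, e.g. an open file.
    Assumes unquoted attribute names and no sparse `{...}` data rows.
    All values are returned as strings.
    """
    names, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            # csv handles single-quoted values that contain commas
            parsed = next(csv.reader([line], quotechar="'",
                                     skipinitialspace=True))
            rows.append(parsed)
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name
            names.append(line.split(None, 2)[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


def load_instance_map(path):
    """Return whatever object is stored in the pickle at `path`."""
    with open(path, "rb") as fh:
        # latin1 lets Python 3 read pickles written by Python 2
        return pickle.load(fh, encoding="latin1")


# Example usage (file names as listed in this record):
#   with open("data.arff") as fh:
#       names, rows = read_arff(fh)
#   instance_to_id = load_instance_map("data.instance_to_id.pickle")
```

Unpickling can execute arbitrary code, so load the pickle only from a copy of this record that you trust.
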

Files (148.2 kB)

Name Size
15.0 kB
133.1 kB