Published January 8, 2021 | Version v3
Dataset Open

Trust and Believe – Should We? Evaluating the Trustworthiness of Twitter Users

Description

This model analyzes Twitter users and assigns each a score calculated from their social profile, the credibility of their tweets, and the h-index score of their tweets. Users with a higher score are considered not only more influential but also more credible in what they tweet. The model is based on both user-level and content-level features of a Twitter user. The details of feature extraction and of calculating the Influence score are given in the paper.

Description
We used Python to extract the features from Twitter and generate the dataset. The modAL framework randomly selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human annotator manually labels the selected points. We generated a dataset for 50000 Twitter users and then used different classifiers to classify each Twitter user as either Trusted or Untrusted.

Organization
The project consists of the following files:

Dataset.csv
The dataset consists of different features of 50000 Twitter users (Politicians) without labels.

Manually_labeled-Dataset.csv
This CSV file contains the Twitter users that were manually classified as Trusted or Untrusted.

feature_extraction.py
This Python script calculates the Influence score of a Twitter user and is then used to generate the dataset. The Influence score is based on:

- Social reputation of the user
- Content score of the tweets
- Credibility of the tweets
- Index score for the number of retweets and likes
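The index score listed above can be illustrated with a minimal sketch. It assumes the script applies the standard h-index definition to per-tweet counts (the largest h such that at least h tweets each have at least h retweets or likes); the exact formula is described in the paper, not here. The function name `h_index` and the sample counts are hypothetical.

```python
def h_index(counts):
    """Return the largest h such that at least h items have a count >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` counts
        else:
            break
    return h


# Hypothetical per-tweet engagement counts for one user (not from the dataset).
retweets = [10, 8, 5, 4, 3, 0]
likes = [25, 3, 3, 1]

print(h_index(retweets), h_index(likes))  # prints "4 3"
```

Under this definition, a single viral tweet cannot raise the score on its own; a user needs many tweets with consistent engagement.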

Activelearner.ipynb
To classify a large pool of unlabeled data, we used an active learning model (modAL framework). Active learning is a semi-supervised approach well suited to situations in which unlabeled data is abundant but manual labeling is expensive. The active learner randomly selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human annotator manually labels the selected points. We then use four different classifiers (Support Vector Machine, Logistic Regression, Multilayer Perceptron and Random Forest) to classify each Twitter user as either Trusted or Untrusted.
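The selection of ambiguous points can be sketched in plain Python. The description does not name the three sampling techniques; this sketch assumes they are least-confidence, margin, and entropy sampling, which are the uncertainty measures modAL provides. It does not use modAL itself, and the function names and probabilities below are hypothetical.

```python
import math


def least_confidence(probs):
    """Uncertainty as 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes; a small gap means ambiguous."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_ambiguous(pool_probs, strategy):
    """Return the index of the pool item to send to the human annotator."""
    indices = range(len(pool_probs))
    if strategy == "least_confidence":
        return max(indices, key=lambda i: least_confidence(pool_probs[i]))
    if strategy == "margin":
        return min(indices, key=lambda i: margin(pool_probs[i]))
    if strategy == "entropy":
        return max(indices, key=lambda i: entropy(pool_probs[i]))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical classifier outputs [P(Trusted), P(Untrusted)] for three users.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
]

for strategy in ("least_confidence", "margin", "entropy"):
    print(strategy, most_ambiguous(pool_probs, strategy))
```

With only two classes, all three measures pick the same point (index 1 here); they can disagree when there are three or more classes.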

twitter_reputation.ipynb
We used different regression models to test their performance on our generated dataset. This notebook was for testing only and is no longer part of our work. It trains and evaluates three regression models:
1. Multilayer perceptron
2. Deep neural network
3. Linear regression

twitter_credentials.py
To extract the features of Twitter users, one first needs to authenticate by providing the credentials given in this file.

Screen names (Screen_name_1.txt, Screen_name_2.txt, Screen_name_3.txt)
These text files contain the screen_names of all the Twitter users, all of whom are politicians. We removed the names of all politicians whose accounts are private, as well as those who have no followers/followings. We also removed duplicate names. The text of the tweets is not saved.
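The filtering rules above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes each user is a dictionary whose keys mirror the Twitter API v1.1 user fields (`screen_name`, `protected`, `followers_count`, `friends_count`), that an account is dropped if either count is zero, and that duplicates are matched case-insensitively. The function name and sample records are hypothetical.

```python
def clean_screen_names(users):
    """Return screen names of public accounts with followers and followings,
    keeping only the first occurrence of each name."""
    seen = set()
    kept = []
    for user in users:
        name = user["screen_name"]
        key = name.lower()  # assumption: compare names case-insensitively
        if user["protected"]:
            continue  # private account
        if user["followers_count"] == 0 or user["friends_count"] == 0:
            continue  # assumption: drop if either count is zero
        if key in seen:
            continue  # duplicate name
        seen.add(key)
        kept.append(name)
    return kept


# Hypothetical user records (not taken from the dataset).
users = [
    {"screen_name": "senator_a", "protected": False,
     "followers_count": 1200, "friends_count": 300},
    {"screen_name": "mp_b", "protected": True,
     "followers_count": 900, "friends_count": 150},
    {"screen_name": "mayor_c", "protected": False,
     "followers_count": 0, "friends_count": 10},
    {"screen_name": "Senator_A", "protected": False,
     "followers_count": 1200, "friends_count": 300},
    {"screen_name": "governor_d", "protected": False,
     "followers_count": 50, "friends_count": 75},
]

print(clean_screen_names(users))  # prints ['senator_a', 'governor_d']
```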

Files (20.8 MB)

md5:30cfff05e1e0271f93e9b54fdfb2e1c7  246.9 kB
md5:7c0ed22e9057bd13d9ec7562775902dd  122.5 kB
md5:bc2069929f3eda42b2f18d373b929524  8.0 MB
md5:65f6ad30af1d811ac7a7c1e517a19eef  5.2 MB
md5:f30dcecbda00598edb79b118a45b1c65  4.8 MB
md5:db763ae375606b3d53b27bf0438395d7  139.5 kB
md5:cb00faa555179c983929b262e479b4fb  7.7 kB
md5:db763ae375606b3d53b27bf0438395d7  139.5 kB
md5:e3fd7e7780a1c2d10dae340b1d279ae6  3.7 kB
md5:eaec2b325ff70321f2005265b4b4b664  383.3 kB
md5:b725ee52983015e778547e600ec58d4d  530.8 kB
md5:314223e201b3f801dd5743028f11cd67  734.3 kB
md5:db763ae375606b3d53b27bf0438395d7  139.5 kB
md5:b59ce852d9609141490389817866b73b  8.5 kB
md5:b025461dc6833617003246245e0ea8a7  83 Bytes
md5:b31de63987aa967482656b95f825c1b0  310.2 kB
md5:fe37ad4e13ccc114608931f9f66d9441  6.4 kB