Trust and Believe – Should We? Evaluating the Trustworthiness of Twitter Users

doi:10.5281/zenodo.7014109

Published January 8, 2021 | Version v3

Dataset Open

Tanveer Khan

This model analyzes Twitter users and assigns each a score calculated from their social profile, the credibility of their tweets, and the h-index score of their tweets. Users with a higher score are considered not only more influential but also more credible in what they tweet. The model is based on both user-level and content-level features of a Twitter user. The details of feature extraction and of calculating the Influence score are given in the paper.

Description
We used Python to extract the features from Twitter and generate the dataset. The modAL framework randomly selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human annotator manually labels the selected points. We generated a dataset of 50000 Twitter users and then used different classifiers to classify each Twitter user as either Trusted or Untrusted.

Organization
The project consists of the following files:

Dataset.csv
The dataset consists of different features of 50000 Twitter users (Politicians) without labels.

Manually_labeled-Dataset.csv
This CSV file contains the Twitter users that were manually classified as Trusted or Untrusted.

feature_extraction.py
This Python script calculates the Influence score of a Twitter user and is then used to generate the dataset. The Influence score is based on:

- Social reputation of the user
- Content score of the tweets
- Tweets credibility
- Index score for the number of re-tweets and likes
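
The index score in the last item is referred to in the overview as an h-index score. A minimal sketch of one way to compute it, assuming it follows the usual h-index definition (the largest h such that at least h tweets each have at least h re-tweets or likes). The function name and the sample counts are illustrative and are not taken from feature_extraction.py:

```python
def h_index(counts):
    """Largest h such that at least h items have a count of at least h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-tweet engagement counts for one user.
retweets_per_tweet = [25, 8, 5, 3, 3, 0]
likes_per_tweet = [40, 12, 2, 1]

retweet_h = h_index(retweets_per_tweet)  # 3 tweets have at least 3 re-tweets
like_h = h_index(likes_per_tweet)        # 2 tweets have at least 2 likes
```

How the two index values are combined with the other three components is described in the paper and is not reproduced here.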

Activelearner.ipynb
To classify a large pool of unlabeled data, we used an active learning model (modAL framework). Active learning is a semi-supervised approach suited to situations in which unlabeled data is abundant but manual labeling is expensive. The active learner randomly selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human annotator manually labels the selected points. We then use four different classifiers (Support Vector Machine, Logistic Regression, Multilayer Perceptron, and Random Forest) to classify each Twitter user as either Trusted or Untrusted.
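
The three sampling techniques are not named in this description. modAL provides three uncertainty-based query strategies (least-confidence, margin, and entropy sampling), so the sketch below assumes those are the ones meant. It is written in plain Python rather than with modAL itself to show how each measure ranks ambiguity; the function names and the probabilities are hypothetical:

```python
import math


def least_confidence(probs):
    """1 - max probability: higher means the classifier is less sure."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes: lower means more ambiguous."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy of the prediction: higher means more ambiguous."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_ambiguous(pool_probs, score, highest=True):
    """Index of the pool item to hand to the human annotator next."""
    scores = [score(p) for p in pool_probs]
    pick = max if highest else min
    return pick(range(len(scores)), key=scores.__getitem__)


# Hypothetical [P(Trusted), P(Untrusted)] predictions for four unlabeled users.
pool = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20], [0.30, 0.70]]

query_lc = most_ambiguous(pool, least_confidence)
query_margin = most_ambiguous(pool, margin, highest=False)
query_entropy = most_ambiguous(pool, entropy)
```

With only two classes, all three measures pick the same user (the one closest to a 50/50 prediction); they can disagree when there are more than two classes.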

twitter_reputation.ipynb
We used different regression models to test their performance on our generated dataset. This notebook was for testing only and is no longer part of our work. It trains and tests three regression models:
1. Multilayer perceptron
2. Deep neural network
3. Linear regression

twitter_credentials.py
To extract the features of Twitter users, one first needs to authenticate by providing the credentials in this file.
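
A minimal sketch of how such credentials might be checked and used, assuming the four standard OAuth 1.0a values and the Tweepy library. The dictionary layout, the helper names, and the choice of Tweepy are assumptions; the actual contents of twitter_credentials.py are not shown in this description:

```python
# Hypothetical credential layout; all four values are placeholders to fill in.
CREDENTIALS = {
    "consumer_key": "",
    "consumer_secret": "",
    "access_token": "",
    "access_token_secret": "",
}


def missing_credentials(creds):
    """Return the names of credentials that are still empty."""
    return [name for name, value in creds.items() if not value]


def build_api(creds):
    """Create an authenticated Tweepy client (requires the tweepy package)."""
    missing = missing_credentials(creds)
    if missing:
        raise ValueError("Fill in: " + ", ".join(missing))
    import tweepy  # imported lazily so the check above works without Tweepy
    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Checking for empty values before contacting the API gives a clear error message instead of an authentication failure partway through feature extraction.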

Screen names (Screen_name_1.txt, Screen_name_2.txt, Screen_names_3.txt)
These text files contain the screen names of all the Twitter users, all of whom are politicians. We removed the names of politicians whose accounts are private, as well as those who have no followers or followings. We also removed duplicate names. The text of the tweets is not saved.
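
The cleaning rules above can be sketched as a single filter. This is an illustration under stated assumptions, not the authors' script: the function name and sample records are hypothetical, the field names `protected`, `followers_count`, and `friends_count` follow the Twitter API v1.1 user object, and an account is dropped if either its follower or its following count is zero:

```python
def clean_screen_names(users):
    """Keep public accounts with followers and followings, without duplicates."""
    seen = set()
    kept = []
    for user in users:
        name = user["screen_name"]
        key = name.lower()  # Twitter screen names are case-insensitive
        if key in seen:
            continue  # duplicate name
        if user["protected"]:
            continue  # private account
        if user["followers_count"] == 0 or user["friends_count"] == 0:
            continue  # no followers or no followings
        seen.add(key)
        kept.append(name)
    return kept


# Hypothetical user records.
users = [
    {"screen_name": "alice_mp", "protected": False, "followers_count": 1200, "friends_count": 300},
    {"screen_name": "Alice_MP", "protected": False, "followers_count": 1200, "friends_count": 300},
    {"screen_name": "bob_senate", "protected": True, "followers_count": 900, "friends_count": 40},
    {"screen_name": "carol_gov", "protected": False, "followers_count": 0, "friends_count": 10},
    {"screen_name": "dave_rep", "protected": False, "followers_count": 50, "friends_count": 75},
]

cleaned = clean_screen_names(users)
```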

Files (20.8 MB)

Name	MD5	Size
Activelearner.ipynb	30cfff05e1e0271f93e9b54fdfb2e1c7	246.9 kB
data_set.csv	7c0ed22e9057bd13d9ec7562775902dd	122.5 kB
Dataset.csv	bc2069929f3eda42b2f18d373b929524	8.0 MB
Dataset1.csv	65f6ad30af1d811ac7a7c1e517a19eef	5.2 MB
Dataset1.xlsx	f30dcecbda00598edb79b118a45b1c65	4.8 MB
datasettest.csv	db763ae375606b3d53b27bf0438395d7	139.5 kB
feature_extraction.py	cb00faa555179c983929b262e479b4fb	7.7 kB
Manually_labeled-Dataset.csv	db763ae375606b3d53b27bf0438395d7	139.5 kB
README.md	e3fd7e7780a1c2d10dae340b1d279ae6	3.7 kB
Screen_name_1.txt	eaec2b325ff70321f2005265b4b4b664	383.3 kB
Screen_name_2.txt	b725ee52983015e778547e600ec58d4d	530.8 kB
Screen_names_3.txt	314223e201b3f801dd5743028f11cd67	734.3 kB
Trainingset.csv	db763ae375606b3d53b27bf0438395d7	139.5 kB
twiteer_reputation.py	b59ce852d9609141490389817866b73b	8.5 kB
twitter_credentials.py	b025461dc6833617003246245e0ea8a7	83 Bytes
twitter_reputation.ipynb	b31de63987aa967482656b95f825c1b0	310.2 kB
twitter_user_names.txt	fe37ad4e13ccc114608931f9f66d9441	6.4 kB

	All versions	This version
Views	1,232	449
Downloads	1,683	984
Data volume	5.3 GB	2.5 GB
