Trust and Believe – Should We? Evaluating the Trustworthiness of Twitter Users
Description
This model analyzes Twitter users and assigns each user a score calculated from their social profile, the credibility of their tweets, and the h-index score of their tweets. Users with a higher score are considered more influential, and their tweets are considered more credible. The model is based on both the user-level and the content-level features of a Twitter user. The details of feature extraction and of the Influence score calculation are given in the paper.
We used Python to extract the features from Twitter and generate the dataset. The modAL framework randomly selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human manually annotates the selected data. We generated a dataset for 50000 Twitter users and then used different classifiers to classify each Twitter user as either Trusted or Untrusted.
Organization
The project consists of the following files:
Dataset.csv
The dataset contains the features of 50000 Twitter users (politicians), without labels.
Manually_labeled-Dataset.csv
This CSV file contains the Twitter users that were manually classified as Trusted or Untrusted.
feature_extraction.py
This Python script calculates the Influence score of a Twitter user and is then used to generate the dataset. The Influence score is based on:
- Social reputation of the user
- Content score of the tweets
- Tweets credibility
- Index score for the number of re-tweets and likes
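The index score can be illustrated with a short sketch. It assumes the standard h-index definition (the largest h such that at least h tweets each have at least h retweets or likes); the exact formula used by feature_extraction.py is given in the paper and may differ. The function name `h_index` and the sample counts are hypothetical.

```python
def h_index(counts):
    """Return the largest h such that at least h items have a count >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-tweet engagement counts for one user (not real data).
retweets = [10, 8, 5, 4, 3, 0]
likes = [25, 3, 3, 1]

retweet_h = h_index(retweets)  # four tweets have at least 4 retweets each
like_h = h_index(likes)        # three tweets have at least 3 likes each
```

Under this definition a single viral tweet raises the score by at most one, so the index rewards consistent engagement rather than one outlier.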
Activelearner.ipynb
To classify a large pool of unlabeled data, we used an active learning model (modAL framework). It is a semi-supervised learning approach that is ideal for situations in which unlabeled data is abundant but manual labeling is expensive. The active learner randomly selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human manually annotates the selected data. We then use four different classifiers (Support Vector Machine, Logistic Regression, Multilayer Perceptron and Random Forest) to classify each Twitter user as either Trusted or Untrusted.
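The description does not name the three sampling techniques. The sketch below assumes they are the three uncertainty measures that modAL offers (least confidence, margin, and entropy) and shows how each one picks the most ambiguous user from a pool. It uses only the standard library rather than the modAL API, and the function names and probabilities are hypothetical.

```python
import math


def least_confidence(probs):
    """1 - max probability: higher means the classifier is less sure."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes: smaller means more ambiguous."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy in nats: higher means more ambiguous."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_ambiguous(pool_probs, score, higher_is_ambiguous=True):
    """Return the index of the pool item the given measure finds most ambiguous."""
    scores = [score(p) for p in pool_probs]
    pick = max if higher_is_ambiguous else min
    return pick(range(len(scores)), key=scores.__getitem__)


# Hypothetical [P(Trusted), P(Untrusted)] predictions for three unlabeled users.
pool = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]

query_lc = most_ambiguous(pool, least_confidence)
query_margin = most_ambiguous(pool, margin, higher_is_ambiguous=False)
query_entropy = most_ambiguous(pool, entropy)
```

With only two classes, all three measures rank the pool identically, because each depends only on how far the predicted probability is from 0.5. They can disagree once there are three or more classes.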
twitter_reputation.ipynb
We trained and evaluated different regression models to test their performance on our generated dataset (this was only for testing and is no longer part of our work).
Training and testing three regression models:
1. Multilayer perceptron
2. Deep neural network
3. Linear regression
twitter_credentials.py
To extract the features of Twitter users, one first needs to authenticate by providing the credentials in this file.
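The layout of twitter_credentials.py is not described here. As one possible arrangement, the sketch below reads the four values that Twitter's OAuth 1.0a flow requires (consumer key, consumer secret, access token, access token secret) from environment variables, so that secrets stay out of version control. The class name, function name, and environment variable names are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterCredentials:
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


# Hypothetical environment variable names, one per credential field.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Build credentials from a mapping, failing early if any value is missing."""
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return TwitterCredentials(**{field: env[var] for field, var in ENV_NAMES.items()})
```

Calling `load_credentials()` with no argument reads from the process environment; passing a plain dictionary makes the function easy to exercise without real keys.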
Screen names (Screen_name_1.txt, Screen_name_2.txt, Screen_name_3.txt)
These text files contain the screen_names of all the Twitter users, all of whom are politicians. We removed the names of politicians whose accounts are private, as well as those who have no followers/followings. We also removed duplicate names. The text of the tweets is not saved.
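The filtering rules above can be sketched as follows. The field names (`screen_name`, `protected`, `followers_count`, `friends_count`) follow the Twitter API v1.1 user object; the function name and the sample profiles are hypothetical. The sketch assumes an account is dropped when either its follower count or its following count is zero, which is one reading of "no followers/followings".

```python
def filter_accounts(profiles):
    """Keep public accounts with followers and followings, dropping duplicate names."""
    seen = set()
    kept = []
    for profile in profiles:
        name = profile["screen_name"].lower()  # screen names are case-insensitive
        if name in seen:
            continue  # duplicate name
        seen.add(name)
        if profile["protected"]:
            continue  # private account
        if profile["followers_count"] == 0 or profile["friends_count"] == 0:
            continue  # assumption: either count being zero removes the account
        kept.append(profile["screen_name"])
    return kept


# Hypothetical profiles, not taken from the dataset.
profiles = [
    {"screen_name": "alice", "protected": False, "followers_count": 120, "friends_count": 80},
    {"screen_name": "bob", "protected": True, "followers_count": 40, "friends_count": 10},
    {"screen_name": "carol", "protected": False, "followers_count": 0, "friends_count": 15},
    {"screen_name": "Alice", "protected": False, "followers_count": 120, "friends_count": 80},
    {"screen_name": "dave", "protected": False, "followers_count": 5, "friends_count": 9},
]

kept_names = filter_accounts(profiles)
```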