Emoji Gestures in Russian Tweets: Moscow
Description
The dataset consists of 48,838 tweets, each containing one of 31 gesture emoji (different hand configurations) or one of its skin tone variants (e.g. 🙏🙏🏿🙏🏾🙏🏽🙏🏼🙏🏻). The tweets are written in Russian and were posted within 50 km of Moscow, Russia, during May-August 2021. The dataset can be used to investigate how Russian-speaking Twitter users use gesture emoji. The following Python libraries were used to collect and preprocess the tweets: tweepy, re, preprocessor, emoji, regex, string, nltk.
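Because every gesture emoji can appear with or without a skin tone modifier, analyses often need to group the variants under their base emoji. The following is a minimal sketch of one way to do that, not the project's own code. It relies only on the fact that the Unicode skin tone modifiers occupy the code points U+1F3FB to U+1F3FF; the function names `strip_skin_tone` and `skin_tones` are hypothetical.

```python
# Unicode skin tone (Fitzpatrick) modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def strip_skin_tone(text: str) -> str:
    """Remove skin tone modifiers so that all variants map to the base emoji."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)


def skin_tones(text: str) -> list:
    """Return the skin tone modifiers found in the text, in order of appearance."""
    return [ch for ch in text if ch in SKIN_TONE_MODIFIERS]


# The six variants from the description collapse to a single base emoji.
variants = ["🙏", "🙏🏿", "🙏🏾", "🙏🏽", "🙏🏼", "🙏🏻"]
bases = {strip_skin_tone(v) for v in variants}
```

Applied to a column such as all_emoji, this kind of normalization makes it possible to count gestures independently of skin tone, or to count skin tone choices independently of gesture.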
The dataset contains 11 columns:
- preprocessed: preprocessed text of the tweet (4 steps)
- all_emoji: lists all emoji in a given tweet
- hashtags: lists all hashtags in a given tweet
- user_encoded: encoded Twitter user name: the first 3 characters of the user name and the first 3 characters of the user's location
- location_encoded: location of the user: "moscow", "moscow_region", or "other"
- mention_present: checks whether each tweet contains mentions
- url_present: checks whether each tweet contains a URL
- preprocess_tweet: preprocessing step 1: tokenizing mentions, URLs, and hashtags
- lowercase_tweet: preprocessing step 2: lowercasing
- remove_punct_tweet: preprocessing step 3: removing punctuation
- tokenize_tweet: preprocessing step 4: tokenizing
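The four preprocessing steps listed above can be illustrated with a short sketch. This is an approximation built on the standard library only, not the project's actual pipeline, which used the preprocessor and nltk libraries. The placeholder tokens (`URL`, `MENTION`, `HASHTAG`), the regular expressions, and the whitespace tokenizer are all assumptions made for illustration.

```python
import re
import string

# Simplified patterns; real tweet entities can be more complex than this.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(text):
    """Run four steps analogous to the dataset's preprocessing columns."""
    # Step 1: replace URLs, mentions, and hashtags with placeholder tokens.
    # URLs go first because they may themselves contain '#' or '@'.
    step1 = URL_RE.sub("URL", text)
    step1 = MENTION_RE.sub("MENTION", step1)
    step1 = HASHTAG_RE.sub("HASHTAG", step1)
    # Step 2: lowercase.
    step2 = step1.lower()
    # Step 3: remove ASCII punctuation (emoji are left untouched).
    step3 = step2.translate(str.maketrans("", "", string.punctuation))
    # Step 4: tokenize; whitespace splitting stands in for a real tokenizer.
    step4 = step3.split()
    return {
        "mention_present": bool(MENTION_RE.search(text)),
        "url_present": bool(URL_RE.search(text)),
        "preprocess_tweet": step1,
        "lowercase_tweet": step2,
        "remove_punct_tweet": step3,
        "tokenize_tweet": step4,
    }


result = preprocess("Спасибо, @ivan! 🙏🏻 https://t.co/abc #Москва")
```

The returned keys mirror the column names so that each intermediate result can be compared against the corresponding column; exact matches should not be expected, since the original libraries handle edge cases differently.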
Further information on the research project can be found here: https://github.com/mzhukovaucsb/emoji_gestures/
Files
| Name | Size | Checksum |
|---|---|---|
| emoji_gestures_dataset_russian_update.csv | 35.5 MB | md5:9f159a0288dc83ace4762d7ebf510218 |
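The MD5 checksum in the file listing can be used to confirm that a download is intact. Below is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are hypothetical, and the file is assumed to have been saved locally under its original name.

```python
import hashlib

# Checksum as published in the file listing.
EXPECTED_MD5 = "9f159a0288dc83ace4762d7ebf510218"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

For example, `verify_download("emoji_gestures_dataset_russian_update.csv")` would return `True` for an uncorrupted copy of the file described above.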