There is a newer version of the record available.

Published January 13, 2022 | Version v4
Dataset Open

Emoji Gestures in Russian Tweets: Moscow

Authors/Creators

  • 1. University of California, Santa Barbara

Description

The dataset consists of 48,838 tweets, each containing one of 31 gesture emoji (different hand configurations), either in its default form or with a skin tone modifier (e.g. 🙏🙏🏿🙏🏾🙏🏽🙏🏼🙏🏻). All tweets are in Russian and were posted within 50 km of Moscow, Russia, during May-August 2021. The dataset can be used to investigate how Russian-speaking Twitter users use gesture emoji. The tweets were collected and preprocessed with the following Python libraries: tweepy, re, preprocessor, emoji, regex, string, nltk.
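To illustrate how a gesture emoji and its skin tone variants can be matched together, here is a minimal sketch using only the standard `re` module. It is not the collection code used for this dataset: the `GESTURES` list is an illustrative subset of three emoji (the full list of 31 is not reproduced here), and the helper names `find_gestures` and `strip_skin_tone` are hypothetical. It relies on the fact that the Unicode skin tone modifiers occupy the range U+1F3FB to U+1F3FF and directly follow the base emoji.

```python
import re

# Unicode skin tone (Fitzpatrick) modifiers: U+1F3FB .. U+1F3FF.
SKIN_TONES = "\U0001F3FB-\U0001F3FF"

# Illustrative subset only (assumption): 🙏 👍 👋.
# The dataset itself covers 31 gesture emoji.
GESTURES = ["\U0001F64F", "\U0001F44D", "\U0001F44B"]

# A base gesture optionally followed by one skin tone modifier.
GESTURE_RE = re.compile(
    "(?:" + "|".join(re.escape(g) for g in GESTURES) + ")[" + SKIN_TONES + "]?"
)


def find_gestures(text):
    """Return every gesture emoji in text, keeping any skin tone modifier."""
    return GESTURE_RE.findall(text)


def strip_skin_tone(emoji_seq):
    """Reduce an emoji sequence to its base form by dropping skin tone modifiers."""
    return re.sub("[" + SKIN_TONES + "]", "", emoji_seq)
```

Calling `strip_skin_tone` on each match groups all skin tone variants of a gesture under one base emoji, which may be convenient when counting gestures regardless of skin tone.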

The dataset contains 11 columns:

  1. preprocessed

    preprocessed text of the tweet (4 steps)

  2. all_emoji

    lists all emoji in a given tweet

  3. hashtags

    lists all hashtags in a given tweet

  4. user_encoded

    encoded Twitter user name: the first 3 characters of the user name and the first 3 characters of the user's location

  5. location_encoded

    location of the user: "moscow", "moscow_region", or "other"

  6. mention_present

    indicates whether the tweet contains mentions

  7. url_present

    indicates whether the tweet contains a URL

  8. preprocess_tweet

    preprocessing step 1: tokenizing mentions, URLs, and hashtags

  9. lowercase_tweet

    preprocessing step 2: lowercasing

  10. remove_punct_tweet

    preprocessing step 3: removing punctuation

  11. tokenize_tweet

    preprocessing step 4: tokenizing
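The four preprocessing steps and the derived columns above can be sketched roughly as follows. This is an approximation under stated assumptions, not the author's pipeline: it uses only the standard `re` and `string` modules in place of the `preprocessor` and `nltk` libraries named in the description, the placeholder strings `URL`, `MENTION`, and `HASHTAG` are invented for illustration, and whitespace splitting stands in for whatever tokenizer was actually used. The function names mirror the column names; `run_pipeline` is a hypothetical helper.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess_tweet(text):
    """Step 1: replace URLs, mentions and hashtags with placeholder tokens."""
    # URLs go first so that '#' or '@' inside a URL is not treated separately.
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("MENTION", text)
    return HASHTAG_RE.sub("HASHTAG", text)


def lowercase_tweet(text):
    """Step 2: lowercase the text."""
    return text.lower()


def remove_punct_tweet(text):
    """Step 3: remove ASCII punctuation (emoji are left untouched)."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize_tweet(text):
    """Step 4: split on whitespace (assumption: stands in for the real tokenizer)."""
    return text.split()


def run_pipeline(text):
    """Build a dict with the step-by-step columns for one raw tweet."""
    step1 = preprocess_tweet(text)
    step2 = lowercase_tweet(step1)
    step3 = remove_punct_tweet(step2)
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mention_present": bool(MENTION_RE.search(text)),
        "url_present": bool(URL_RE.search(text)),
        "preprocess_tweet": step1,
        "lowercase_tweet": step2,
        "remove_punct_tweet": step3,
        "tokenize_tweet": tokenize_tweet(step3),
    }
```

Keeping each step as its own function mirrors the way the dataset stores every intermediate result in a separate column, so a later step can be re-run or replaced without repeating the earlier ones.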

Further information on the research project can be found at https://github.com/mzhukovaucsb/emoji_gestures/

Files

emoji_gestures_dataset_russian_update.csv (35.5 MB)
md5:9f159a0288dc83ace4762d7ebf510218
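After downloading the file, its integrity can be checked against the md5 checksum listed above before reading it. The sketch below uses only the standard library. It assumes the CSV is UTF-8 encoded, comma-delimited, and has a header row, none of which is stated on this page; `md5_of_file` and `load_rows` are hypothetical helper names.

```python
import csv
import hashlib

# Checksum as listed for emoji_gestures_dataset_russian_update.csv.
EXPECTED_MD5 = "9f159a0288dc83ace4762d7ebf510218"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_rows(path, expected_md5=None):
    """Optionally verify the checksum, then read the CSV into a list of dicts."""
    if expected_md5 is not None and md5_of_file(path) != expected_md5:
        raise ValueError("md5 mismatch: the file may be corrupted or a different version")
    # Assumption: UTF-8, comma-delimited, first row is the header.
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

A typical call would be `load_rows("emoji_gestures_dataset_russian_update.csv", EXPECTED_MD5)`. Since a newer version of the record exists, a checksum mismatch may simply mean that a different version of the file was downloaded.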