There is a newer version of the record available.

Published December 23, 2021 | Version v1
Dataset Open

Emoji Gestures in Russian Tweets: Moscow

Authors/Creators

  • 1. University of California, Santa Barbara

Description

The dataset consists of 48,838 tweets. Each tweet contains one of the 31 gesture emoji (different hand configurations) or one of its skin tone variants (e.g. 🙏🙏🏿🙏🏾🙏🏽🙏🏼🙏🏻), and was posted in Russian within 50 km of Moscow, Russia, during May-August 2021. The dataset can be used to investigate how Russian-speaking users of the Twitter platform use gesture emoji. Python libraries used for collecting and preprocessing the tweets: tweepy, re, preprocessor, emoji, regex, string, nltk.
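When counting gesture emoji, it can help to collapse the skin tone variants onto their base emoji. The sketch below is one way to do that, not the code used to build the dataset. It relies on the five Unicode skin tone modifiers (U+1F3FB to U+1F3FF) following the base emoji; the function names are hypothetical.

```python
# Unicode skin tone (Fitzpatrick) modifiers: U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def strip_skin_tone(text):
    """Return text with all skin tone modifiers removed."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)


def skin_tone_of(text):
    """Return the first skin tone modifier found in text, or None."""
    for ch in text:
        if ch in SKIN_TONE_MODIFIERS:
            return ch
    return None


if __name__ == "__main__":
    variants = ["🙏", "🙏🏿", "🙏🏾", "🙏🏽", "🙏🏼", "🙏🏻"]
    # All six variants collapse to the same base emoji.
    print({strip_skin_tone(v) for v in variants})
```

Applied to the all_emoji column, this would allow grouping tweets by gesture regardless of skin tone, while skin_tone_of keeps the modifier available as a separate variable.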

The dataset contains 12 columns:

  1. tweet_original

    original text of the tweet

  2. preprocessed

    preprocessed text of the tweet (4 steps)

  3. all_emoji

    lists all emoji in a given tweet

  4. hashtags

    lists all hashtags in a given tweet

  5. user_encoded

    encoded Twitter user name: the first 3 characters of the user name and the first 3 characters of the user's location

  6. location_encoded

    location of the user: "moscow", "moscow_region", or "other"

  7. mention_present

    indicates whether the tweet contains a mention

  8. url_present

    indicates whether the tweet contains a URL

  9. preprocess_tweet

    preprocessing step 1: tokenizing mentions, URLs, and hashtags

  10. lowercase_tweet

    preprocessing step 2: lowercasing

  11. remove_punct_tweet

    preprocessing step 3: removing punctuation

  12. tokenize_tweet

    preprocessing step 4: tokenizing
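The four preprocessing steps in columns 9-12 can be approximated as follows. This is a minimal sketch using only the standard library, not the original pipeline (which, per the description, used preprocessor and nltk). It assumes that "tokenizing" mentions, URLs, and hashtags means replacing them with placeholder tokens; the regular expressions, the placeholder strings, and the whitespace tokenizer are simplifications chosen for illustration.

```python
import re
import string

# Simplified patterns (assumptions, not the rules used for the dataset).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess_tweet(text):
    """Step 1: replace URLs, mentions, and hashtags with placeholders."""
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("MENTION", text)
    text = HASHTAG_RE.sub("HASHTAG", text)
    return text


def lowercase_tweet(text):
    """Step 2: lowercase the text (works for Cyrillic as well)."""
    return text.lower()


def remove_punct_tweet(text):
    """Step 3: drop ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize_tweet(text):
    """Step 4: split on whitespace (a stand-in for an nltk tokenizer)."""
    return text.split()


def run_pipeline(text):
    """Apply the four steps in order and return the token list."""
    step1 = preprocess_tweet(text)
    step2 = lowercase_tweet(step1)
    step3 = remove_punct_tweet(step2)
    return tokenize_tweet(step3)


if __name__ == "__main__":
    print(run_pipeline("Привет, @user! Смотри https://t.co/abc #Москва 🙏🏽"))
```

Note that string.punctuation covers ASCII punctuation only, so characters such as « » or … would survive step 3 in this sketch.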

Further information on the research project can be found here: https://github.com/mzhukovaucsb/emoji_gestures/

Files

emoji_gestures_dataset_russian.csv

34.0 MB
md5:432f90b26b82157116909559a9e5fcda
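The MD5 checksum listed above can be used to confirm that the file downloaded intact. The following is a minimal sketch; the helper name md5_of_file and the chunk size are illustrative choices, and the local path is assumed to match the file name on this page.

```python
import hashlib

# Checksum as listed on this record page.
EXPECTED_MD5 = "432f90b26b82157116909559a9e5fcda"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, md5_of_file("emoji_gestures_dataset_russian.csv") == EXPECTED_MD5 should evaluate to True for an uncorrupted download of this version of the file.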