There is a newer version of this record available.

Software Open Access

The TSL machine: parser, lemma analysis, sentiment analysis and autocoding for Telegram chats

Giovanni Spitale; Federico Germani; Nikola Biller-Andorno


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.5533907", 
  "language": "eng", 
  "title": "The TSL machine: parser, lemma analysis, sentiment analysis and autocoding for Telegram chats", 
  "issued": {
    "date-parts": [
      [
        2021, 
        9, 
        28
      ]
    ]
  }, 
  "abstract": "<p>The purpose of this tool is to perform NLP analysis on Telegram chats. Telegram chats can be exported as .json files from the official client, Telegram Desktop (v. 2.9.2.0).&nbsp;</p>\n\n<p>The files are parsed and the content is used to populate a message dataframe, which is then anonymized.&nbsp;</p>\n\n<p><strong>The software calculates and displays the following information:</strong></p>\n\n<ul>\n\t<li>user count (n of users, new users per day, removed users per day);</li>\n\t<li>message count (n and relative frequency of messages, messages per day);</li>\n\t<li>autocoded messages (anonymized message dataframe with code weights assigned to each message based on a customizable set of regex rules);</li>\n\t<li>prevalence of codes (n and relative frequency);</li>\n\t<li>prevalence of lemmas&nbsp;(n and relative frequency);</li>\n\t<li>prevalence of lemmas segmented by autocode (n and relative frequency);</li>\n\t<li>mean sentiment per day;</li>\n\t<li>mean sentiment&nbsp;segmented by autocode.</li>\n</ul>\n\n<p><strong>The software outputs:</strong></p>\n\n<ul>\n\t<li>messages_df_anon.csv - an anonymized file containing the progressive id of the message, the date, the unique pseudonym of the sender, and the text;</li>\n\t<li>usercount_df.csv - user count dataframe;</li>\n\t<li>user_activity_df.csv - user activity dataframe;</li>\n\t<li>messagecount_df.csv - message count dataframe;</li>\n\t<li>messages_df_anon_coded.csv -&nbsp;an anonymized file containing the progressive id of the message, the date, the unique pseudonym of the sender,&nbsp;the text, the codes, and the sentiment;</li>\n\t<li>autocode_freq_df.csv - general prevalence of codes;</li>\n\t<li>lemma_df.csv - lemma frequency;</li>\n\t<li>autocode_freq_df_[rule_name].csv - lemma frequency in coded messages, one file per rule;</li>\n\t<li>daily_sentiment_df.csv - daily sentiment;</li>\n\t<li>sentiment_by_code_df.csv - sentiment segmented by code;</li>\n\t<li>messages_anon.txt - anonymized text file generated from the message data frame, for easy import in other software for text mining or qualitative analysis;</li>\n\t<li>messages_anon_MaxQDA.txt - anonymized text file generated from the message data frame, formatted specifically for MaxQDA (to track speakers and codes).</li>\n</ul>\n\n<p>Dependencies:</p>\n\n<ul>\n\t<li>pandas (1.2.1)</li>\n\t<li>json</li>\n\t<li>random</li>\n\t<li>os</li>\n\t<li>re</li>\n\t<li>tqdm (4.62.2)</li>\n\t<li>datetime (4.3)</li>\n\t<li>matplotlib (3.4.3)</li>\n\t<li>spaCy (3.1.2) + it_core_news_md</li>\n\t<li>wordcloud (1.8.1)</li>\n\t<li>Counter</li>\n\t<li>feel_it (1.0.3)</li>\n\t<li>torch (1.9.0)</li>\n\t<li>numpy (1.21.1)</li>\n\t<li>transformers (4.3.3)</li>\n</ul>\n\n<p>This code is optimized for Italian.&nbsp;</p>\n\n<p>Lemma analysis is based on spaCy, which provides several other models for other languages (&nbsp;<a href=\"https://spacy.io/models\">https://spacy.io/models</a>&nbsp;) so it can easily be adapted.</p>\n\n<p>Sentiment analysis is performed using <a href=\"https://github.com/MilaNLProc/feel-it\">FEEL-IT: Emotion and Sentiment Classification for the Italian Language</a>&nbsp;(Kudos to Federico Bianchi &lt;f.bianchi@unibocconi.it&gt;; Debora Nozza &lt;debora.nozza@unibocconi.it&gt;; and Dirk Hovy &lt;dirk.hovy@unibocconi.it&gt;). Their work is specific to Italian. To perform sentiment analysis in other languages, one could consider nltk.sentiment.</p>\n\n<p>The code is structured in a Jupyter-lab notebook, heavily commented for future reference.</p>", 
  "author": [
    {
      "family": "Giovanni Spitale"
    }, 
    {
      "family": "Federico Germani"
    }, 
    {
      "family": "Nikola Biller-Andorno"
    }
  ], 
  "version": "1.0.0", 
  "type": "article", 
  "id": "5533907"
}
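
The abstract describes parsing a Telegram export, anonymizing senders, and assigning code weights to each message from a customizable set of regex rules. The following is a minimal sketch of those three steps, not the notebook's actual code. It uses only the standard library rather than pandas. The rule names and patterns in `RULES`, the helper names (`flatten_text`, `pseudonymize`, `autocode`, `code_prevalence`), the sequential `user_N` pseudonyms, and the assumed export fields (`type`, `id`, `date`, `from_id`, `text`) are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical rule set (code name -> regex); not taken from the TSL notebook.
RULES = {
    "vaccine": r"\bvaccin\w*",
    "greenpass": r"\bgreen\s?pass\b",
}


def flatten_text(text):
    """Return plain text from a message's text field.

    Assumption: the field is either a string or a list mixing strings
    and dicts that carry a "text" key (formatted fragments, links).
    """
    if isinstance(text, str):
        return text
    return "".join(
        part if isinstance(part, str) else part.get("text", "")
        for part in text
    )


def pseudonymize(messages):
    """Keep regular messages only and replace each sender with a stable alias."""
    aliases = {}
    rows = []
    for msg in messages:
        if msg.get("type") != "message":
            continue  # skip service entries such as joins and leaves
        sender = msg.get("from_id")
        # Same sender always maps to the same alias: user_1, user_2, ...
        alias = aliases.setdefault(sender, f"user_{len(aliases) + 1}")
        rows.append({
            "id": msg["id"],
            "date": msg["date"],
            "sender": alias,
            "text": flatten_text(msg.get("text", "")),
        })
    return rows


def autocode(rows, rules=RULES):
    """Attach a weight per code: the number of regex matches in the text."""
    compiled = {name: re.compile(p, re.IGNORECASE) for name, p in rules.items()}
    for row in rows:
        row["codes"] = {
            name: len(rx.findall(row["text"])) for name, rx in compiled.items()
        }
    return rows


def code_prevalence(rows):
    """Return {code: (n of coded messages, relative frequency)}."""
    counts = Counter()
    for row in rows:
        for name, weight in row["codes"].items():
            if weight > 0:
                counts[name] += 1
    total = len(rows)
    return {name: (n, n / total) for name, n in counts.items()} if total else {}
```

In use, one would load the exported file with `json.load`, pass its message list to `pseudonymize`, and then call `autocode` and `code_prevalence` on the result. Counting matches is only one possible weighting; a binary flag per rule would work equally well for prevalence tables.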