There is a newer version of this record available.

Software Open Access

The TSL machine: parser, lemma analysis, sentiment analysis and autocoding for Telegram chats

Giovanni Spitale; Federico Germani; Nikola Biller-Andorno


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Bianchi F, Nozza D, Hovy D. FEEL-IT: Emotion and Sentiment Classification for the Italian Language. In: Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics; 2021. https://github.com/MilaNLProc/feel-it</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">natural language processing</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">NLP</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">telegram</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">covid-19</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">social listening</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">green pass</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">vaccine</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">freedom</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">ethics</subfield>
  </datafield>
  <controlfield tag="005">20210928134829.0</controlfield>
  <controlfield tag="001">5533907</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Zurich - Institute of Biomedical Ethics and History of Medicine</subfield>
    <subfield code="0">(orcid)0000-0002-5604-0437</subfield>
    <subfield code="a">Federico Germani</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Zurich - Institute of Biomedical Ethics and History of Medicine</subfield>
    <subfield code="0">(orcid)0000-0001-7661-1324</subfield>
    <subfield code="a">Nikola Biller - Andorno</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">5903284</subfield>
    <subfield code="z">md5:67bb88c8016699f875c4e486470e7bfc</subfield>
    <subfield code="u">https://zenodo.org/record/5533907/files/telegram social listening v1.0.0.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-09-28</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o">oai:zenodo.org:5533907</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Zurich - Institute of Biomedical Ethics and History of Medicine</subfield>
    <subfield code="0">(orcid)0000-0002-6812-0979</subfield>
    <subfield code="a">Giovanni Spitale</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">The TSL machine: parser, lemma analysis, sentiment analysis and autocoding for Telegram chats</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This tool performs NLP analysis on Telegram chats. Chats can be exported as .json files from the official client, Telegram Desktop (v. 2.9.2.0).&lt;/p&gt;

&lt;p&gt;The files are parsed and their content is used to populate a message dataframe, which is then anonymized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The software calculates and displays the following information:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;user count (n of users, new users per day, removed users per day);&lt;/li&gt;
	&lt;li&gt;message count (n and relative frequency of messages, messages per day);&lt;/li&gt;
	&lt;li&gt;autocoded messages (anonymized message dataframe with code weights assigned to each message based on a customizable set of regex rules);&lt;/li&gt;
	&lt;li&gt;prevalence of codes (n and relative frequency);&lt;/li&gt;
	&lt;li&gt;prevalence of lemmas&amp;nbsp;(n and relative frequency);&lt;/li&gt;
	&lt;li&gt;prevalence of lemmas segmented by autocode (n and relative frequency);&lt;/li&gt;
	&lt;li&gt;mean sentiment per day;&lt;/li&gt;
	&lt;li&gt;mean sentiment&amp;nbsp;segmented by autocode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The software outputs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;messages_df_anon.csv - an anonymized file containing the sequential ID of the message, the date, the unique pseudonym of the sender, and the text;&lt;/li&gt;
	&lt;li&gt;usercount_df.csv - user count dataframe;&lt;/li&gt;
	&lt;li&gt;user_activity_df.csv - user activity dataframe;&lt;/li&gt;
	&lt;li&gt;messagecount_df.csv - message count dataframe;&lt;/li&gt;
	&lt;li&gt;messages_df_anon_coded.csv - an anonymized file containing the sequential ID of the message, the date, the unique pseudonym of the sender, the text, the codes, and the sentiment;&lt;/li&gt;
	&lt;li&gt;autocode_freq_df.csv - general prevalence of codes;&lt;/li&gt;
	&lt;li&gt;lemma_df.csv - lemma frequency;&lt;/li&gt;
	&lt;li&gt;autocode_freq_df_[rule_name].csv - lemma frequency in coded messages, one file per rule;&lt;/li&gt;
	&lt;li&gt;daily_sentiment_df.csv - daily sentiment;&lt;/li&gt;
	&lt;li&gt;sentiment_by_code_df.csv - sentiment segmented by code;&lt;/li&gt;
	&lt;li&gt;messages_anon.txt - anonymized text file generated from the message data frame, for easy import in other software for text mining or qualitative analysis;&lt;/li&gt;
	&lt;li&gt;messages_anon_MaxQDA.txt - anonymized text file generated from the message data frame, formatted specifically for MaxQDA (to track speakers and codes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dependencies:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;pandas (1.2.1)&lt;/li&gt;
	&lt;li&gt;json&lt;/li&gt;
	&lt;li&gt;random&lt;/li&gt;
	&lt;li&gt;os&lt;/li&gt;
	&lt;li&gt;re&lt;/li&gt;
	&lt;li&gt;tqdm (4.62.2)&lt;/li&gt;
	&lt;li&gt;datetime (4.3)&lt;/li&gt;
	&lt;li&gt;matplotlib (3.4.3)&lt;/li&gt;
	&lt;li&gt;spaCy (3.1.2) + it_core_news_md&lt;/li&gt;
	&lt;li&gt;wordcloud (1.8.1)&lt;/li&gt;
	&lt;li&gt;Counter&lt;/li&gt;
	&lt;li&gt;feel_it (1.0.3)&lt;/li&gt;
	&lt;li&gt;torch (1.9.0)&lt;/li&gt;
	&lt;li&gt;numpy (1.21.1)&lt;/li&gt;
	&lt;li&gt;transformers (4.3.3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This code is optimized for Italian.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Lemma analysis is based on spaCy, which provides models for many other languages (&lt;a href="https://spacy.io/models"&gt;https://spacy.io/models&lt;/a&gt;), so it can easily be adapted.&lt;/p&gt;

&lt;p&gt;Sentiment analysis is performed using &lt;a href="https://github.com/MilaNLProc/feel-it"&gt;FEEL-IT: Emotion and Sentiment Classification for the Italian Language&lt;/a&gt; (kudos to Federico Bianchi &amp;lt;f.bianchi@unibocconi.it&amp;gt;, Debora Nozza &amp;lt;debora.nozza@unibocconi.it&amp;gt;, and Dirk Hovy &amp;lt;dirk.hovy@unibocconi.it&amp;gt;). Their work is specific to Italian. To perform sentiment analysis in other languages, one could consider nltk.sentiment.&lt;/p&gt;

&lt;p&gt;The code is structured as a JupyterLab notebook, heavily commented for future reference.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.5533906</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.5533907</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>
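
The description in field 520 says that each message receives code weights based on a customizable set of regex rules, and that the prevalence of codes is reported as n and relative frequency. The sketch below illustrates that idea only. The rule names, the patterns, the helper names `autocode` and `code_prevalence`, and the choice to define a weight as the number of regex matches are all assumptions made for illustration; they are not taken from the TSL machine notebook, which works on a pandas dataframe rather than a plain list.

```python
import re
from collections import Counter

# Hypothetical rule set (code name -> regex pattern); not the rules shipped with the tool.
RULES = {
    "vaccine": r"\bvaccin\w*",
    "green_pass": r"\bgreen\s?pass\b",
    "freedom": r"\blibert\w*",
}


def autocode(text, rules=RULES):
    """Return {code: weight}; weight is assumed here to be the number of matches."""
    weights = {}
    for code, pattern in rules.items():
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            weights[code] = len(hits)
    return weights


def code_prevalence(messages, rules=RULES):
    """Return {code: (n, relative frequency)} counted over messages carrying the code."""
    counts = Counter()
    for text in messages:
        counts.update(autocode(text, rules).keys())
    total = len(messages)
    if total == 0:
        return {}
    return {code: (n, n / total) for code, n in counts.items()}


# Invented example messages, already anonymized (no sender information).
messages = [
    "Il green pass limita la libertà",
    "Vaccino o non vaccino, il greenpass resta",
    "Buongiorno a tutti",
]
coded = [autocode(m) for m in messages]
prevalence = code_prevalence(messages)
```

Keeping the rules in a plain dictionary means a code can be added or refined by editing one pattern, which is one way to read "customizable" in the description.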