Software Open Access
Giovanni Spitale;
Federico Germani;
Nikola Biller - Andorno
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nmm##2200000uu#4500</leader> <datafield tag="999" ind1="C" ind2="5"> <subfield code="x">Bianchi F, Nozza D, Hovy D. FEEL-IT: Emotion and Sentiment Classification for the Italian Language. In: Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics; 2021. https://github.com/MilaNLProc/feel-it</subfield> </datafield> <datafield tag="041" ind1=" " ind2=" "> <subfield code="a">eng</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">natural language processing</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">NLP</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">telegram</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">covid-19</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">social listening</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">green pass</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">vaccine</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">freedom</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">ethics</subfield> </datafield> <controlfield tag="005">20210928134829.0</controlfield> <controlfield tag="001">5533907</controlfield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">University of Zurich - Institute of Biomedical Ethics and History of Medicine</subfield> <subfield code="0">(orcid)0000-0002-5604-0437</subfield> <subfield code="a">Federico Germani</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">University of Zurich - Institute of Biomedical Ethics and History of Medicine</subfield> <subfield 
code="0">(orcid)0000-0001-7661-1324</subfield> <subfield code="a">Nikola Biller - Andorno</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">5903284</subfield> <subfield code="z">md5:67bb88c8016699f875c4e486470e7bfc</subfield> <subfield code="u">https://zenodo.org/record/5533907/files/telegram social listening v1.0.0.zip</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2021-09-28</subfield> </datafield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="p">software</subfield> <subfield code="o">oai:zenodo.org:5533907</subfield> </datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="u">University of Zurich - Institute of Biomedical Ethics and History of Medicine</subfield> <subfield code="0">(orcid)0000-0002-6812-0979</subfield> <subfield code="a">Giovanni Spitale</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">The TSL machine: parser, lemma analysis, sentiment analysis and autocoding for Telegram chats</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield> <subfield code="a">Creative Commons Attribution 4.0 International</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"><p>The purpose of this tool is to perform NLP analysis on Telegram chats. Chats can be exported as .json files from the official client, Telegram Desktop (v. 
2.9.2.0).&nbsp;</p> <p>The files are parsed, and their content is used to populate a message dataframe, which is then anonymized.&nbsp;</p> <p><strong>The software calculates and displays the following information:</strong></p> <ul> <li>user count (n of users, new users per day, removed users per day);</li> <li>message count (n and relative frequency of messages, messages per day);</li> <li>autocoded messages (anonymized message dataframe with code weights assigned to each message based on a customizable set of regex rules);</li> <li>prevalence of codes (n and relative frequency);</li> <li>prevalence of lemmas&nbsp;(n and relative frequency);</li> <li>prevalence of lemmas segmented by autocode (n and relative frequency);</li> <li>mean sentiment per day;</li> <li>mean sentiment&nbsp;segmented by autocode.</li> </ul> <p><strong>The software outputs:</strong></p> <ul> <li>messages_df_anon.csv - an anonymized file containing the progressive id of the message, the date, the unique pseudonym of the sender, and the text;</li> <li>usercount_df.csv - user count dataframe;</li> <li>user_activity_df.csv - user activity dataframe;</li> <li>messagecount_df.csv - message count dataframe;</li> <li>messages_df_anon_coded.csv -&nbsp;an anonymized file containing the progressive id of the message, the date, the unique pseudonym of the sender,&nbsp;the text, the codes, and the sentiment;</li> <li>autocode_freq_df.csv - general prevalence of codes;</li> <li>lemma_df.csv - lemma frequency;</li> <li>autocode_freq_df_[rule_name].csv - lemma frequency in coded messages, one file per rule;</li> <li>daily_sentiment_df.csv - daily sentiment;</li> <li>sentiment_by_code_df.csv - sentiment segmented by code;</li> <li>messages_anon.txt - anonymized text file generated from the message dataframe, for easy import into other software for text mining or qualitative analysis;</li> <li>messages_anon_MaxQDA.txt - anonymized text file generated from the message dataframe, formatted specifically for MaxQDA 
(to track speakers and codes).</li> </ul> <p>Dependencies:</p> <ul> <li>pandas (1.2.1)</li> <li>json</li> <li>random</li> <li>os</li> <li>re</li> <li>tqdm (4.62.2)</li> <li>datetime (4.3)</li> <li>matplotlib (3.4.3)</li> <li>spaCy (3.1.2) + it_core_news_md</li> <li>wordcloud (1.8.1)</li> <li>Counter</li> <li>feel_it (1.0.3)</li> <li>torch (1.9.0)</li> <li>numpy (1.21.1)</li> <li>transformers (4.3.3)</li> </ul> <p>This code is optimized for Italian.&nbsp;</p> <p>Lemma analysis is based on spaCy, which provides models for other languages (&nbsp;<a href="https://spacy.io/models">https://spacy.io/models</a>&nbsp;), so it can easily be adapted.</p> <p>Sentiment analysis is performed using <a href="https://github.com/MilaNLProc/feel-it">FEEL-IT: Emotion and Sentiment Classification for the Italian Language</a>&nbsp;(Kudos to Federico Bianchi &lt;f.bianchi@unibocconi.it&gt;; Debora Nozza &lt;debora.nozza@unibocconi.it&gt;; and Dirk Hovy &lt;dirk.hovy@unibocconi.it&gt;). Their work is specific to Italian. To perform sentiment analysis in other languages, one could consider nltk.sentiment.</p> <p>The code is structured as a JupyterLab notebook, heavily commented for future reference.</p></subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">doi</subfield> <subfield code="i">isVersionOf</subfield> <subfield code="a">10.5281/zenodo.5533906</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.5281/zenodo.5533907</subfield> <subfield code="2">doi</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">software</subfield> </datafield> </record>
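The record's description outlines a pipeline of parsing the exported .json, pseudonymizing senders, and assigning code weights with regex rules. The following is a minimal sketch of those three steps, not the notebook's actual code. It assumes the Telegram Desktop export layout (a top-level `messages` list whose entries carry `type`, `date`, `from`, `from_id`, and `text`, where `text` may be a string or a list of strings and entity dicts). The function names, the `RULES` dictionary, the `user_NNNN` pseudonym scheme, and the sample chat are all hypothetical, and plain dictionaries stand in for the pandas dataframes the tool uses.

```python
import json
import re
from collections import Counter

# Hypothetical rule set (code name -> regex); not the rules shipped with the tool.
RULES = {
    "vaccine": re.compile(r"\bvaccin\w*", re.IGNORECASE),
    "green_pass": re.compile(r"\bgreen\s?pass\b", re.IGNORECASE),
}


def flatten_text(text):
    """Return plain text; assumes text is a string or a list of strings/entity dicts."""
    if isinstance(text, str):
        return text
    return "".join(
        part if isinstance(part, str) else part.get("text", "") for part in text
    )


def parse_export(raw_json):
    """Build pseudonymized message rows from an exported chat (assumed layout)."""
    data = json.loads(raw_json)
    pseudonyms = {}  # real sender id -> stable pseudonym
    rows = []
    for msg in data.get("messages", []):
        if msg.get("type") != "message":
            continue  # skip service events such as joins and leaves
        sender = msg.get("from_id") or msg.get("from")
        pseudonym = pseudonyms.setdefault(sender, f"user_{len(pseudonyms) + 1:04d}")
        rows.append({
            "id": len(rows) + 1,  # progressive message id
            "date": msg.get("date"),
            "sender": pseudonym,
            "text": flatten_text(msg.get("text", "")),
        })
    return rows


def autocode(rows, rules=RULES):
    """Attach a weight per code: the number of regex matches in the message text."""
    for row in rows:
        row["codes"] = {name: len(rx.findall(row["text"])) for name, rx in rules.items()}
    return rows


def code_prevalence(rows):
    """Return {code: (n of coded messages, relative frequency)}."""
    counts = Counter()
    for row in rows:
        for name, weight in row["codes"].items():
            if weight > 0:
                counts[name] += 1
    total = len(rows)
    return {name: (n, n / total) for name, n in counts.items()} if total else {}


# Invented sample chat, for illustration only.
SAMPLE = json.dumps({"name": "demo chat", "messages": [
    {"id": 1, "type": "service", "date": "2021-09-01T10:00:00", "action": "join_group_by_link"},
    {"id": 2, "type": "message", "date": "2021-09-01T10:01:00", "from": "Alice",
     "from_id": "user1", "text": "Il green pass e il vaccino"},
    {"id": 3, "type": "message", "date": "2021-09-01T10:02:00", "from": "Bob",
     "from_id": "user2", "text": ["Vaccini: ", {"type": "link", "text": "https://example.org"}]},
    {"id": 4, "type": "message", "date": "2021-09-02T09:00:00", "from": "Alice",
     "from_id": "user1", "text": "Buongiorno"},
]})

rows = autocode(parse_export(SAMPLE))
prevalence = code_prevalence(rows)
```

Sequential pseudonyms keep the sketch deterministic, but they reveal the order in which users first posted; a random mapping, as the `random` dependency suggests the tool may use, avoids that.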
| | All versions | This version |
|---|---|---|
| Views | 290 | 15 |
| Downloads | 17 | 2 |
| Data volume | 63.4 MB | 11.8 MB |
| Unique views | 265 | 15 |
| Unique downloads | 15 | 2 |