Software Open Access
Giovanni Spitale;
Federico Germani;
Nikola Biller-Andorno
Title: The TSL machine: parser, lemma analysis, sentiment analysis and autocoding for Telegram chats
Version: 1.0.1
Publication date: 2021-09-28
Resource type: Software (open access)
License: CC-BY-4.0
Language: English
DOI: https://doi.org/10.5281/zenodo.5534045 (all versions: https://doi.org/10.5281/zenodo.5533906)
Record: https://zenodo.org/record/5534045
File: telegram social listening v1.0.1.zip (3,438,679 bytes, md5:2b624215bed80522138e544cd3bbf9ac)

Creators (all at the University of Zurich - Institute of Biomedical Ethics and History of Medicine):

- Giovanni Spitale (ORCID 0000-0002-6812-0979)
- Federico Germani (ORCID 0000-0002-5604-0437)
- Nikola Biller-Andorno (ORCID 0000-0001-7661-1324)

Keywords: natural language processing, NLP, telegram, covid-19, social listening, green pass, vaccine, freedom, ethics

Reference: Bianchi F, Nozza D, Hovy D. FEEL-IT: Emotion and Sentiment Classification for the Italian Language. In: Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics; 2021. https://github.com/MilaNLProc/feel-it

This tool performs NLP analysis on Telegram chats. Telegram chats can be exported as .json files from the official client, Telegram Desktop (v. 2.9.2.0).

The files are parsed and their content is used to populate a message dataframe, which is then anonymized.

**The software calculates and displays the following information:**

- user count (n of users, new users per day, removed users per day);
- message count (n and relative frequency of messages, messages per day);
- autocoded messages (anonymized message dataframe with code weights assigned to each message based on a customizable set of regex rules);
- prevalence of codes (n and relative frequency);
- prevalence of lemmas (n and relative frequency);
- prevalence of lemmas segmented by autocode (n and relative frequency);
- mean sentiment per day;
- mean sentiment segmented by autocode.

**The software outputs:**

- messages_df_anon.csv - an anonymized file containing the sequential id of the message, the date, the unique pseudonym of the sender, and the text;
- usercount_df.csv - user count dataframe;
- user_activity_df.csv - user activity dataframe;
- messagecount_df.csv - message count dataframe;
- messages_df_anon_coded.csv - an anonymized file containing the sequential id of the message, the date, the unique pseudonym of the sender, the text, the codes, and the sentiment;
- autocode_freq_df.csv - general prevalence of codes;
- lemma_df.csv - lemma frequency;
- autocode_freq_df_[rule_name].csv - lemma frequency in coded messages, one file per rule;
- daily_sentiment_df.csv - daily sentiment;
- sentiment_by_code_df.csv - sentiment segmented by code;
- messages_anon.txt - anonymized text file generated from the message dataframe, for easy import into other software for text mining or qualitative analysis;
- messages_anon_MaxQDA.txt - anonymized text file generated from the message dataframe, formatted specifically for MaxQDA (to track speakers and codes).

**Dependencies:**

- pandas (1.2.1)
- json
- random
- os
- re
- tqdm (4.62.2)
- datetime (4.3)
- matplotlib (3.4.3)
- spaCy (3.1.2) + it_core_news_md
- wordcloud (1.8.1)
- Counter (from the standard-library collections module)
- feel_it (1.0.3)
- torch (1.9.0)
- numpy (1.21.1)
- transformers (4.3.3)

**This code is optimized for Italian, however:**

- Lemma analysis is based on spaCy, which provides models for several other languages (https://spacy.io/models), so it can easily be adapted.
- Sentiment analysis is performed using FEEL-IT: Emotion and Sentiment Classification for the Italian Language (https://github.com/MilaNLProc/feel-it; kudos to Federico Bianchi <f.bianchi@unibocconi.it>, Debora Nozza <debora.nozza@unibocconi.it>, and Dirk Hovy <dirk.hovy@unibocconi.it>). Their work is specific to Italian. To perform sentiment analysis in other languages, one could consider nltk.sentiment.

The code is structured as a JupyterLab notebook, heavily commented for future reference.

The software comes with a toy dataset consisting of Wikiquote quotations copy-pasted into a chat created by the research group. Have fun exploring it.
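The parse, anonymize, and autocode steps named in the description can be sketched with the standard library alone. This is a minimal illustration, not the authors' notebook code. The export field names (`messages`, `type`, `date`, `from`, `text`) are assumed from typical Telegram Desktop exports and may differ between client versions. The two regex rules, the sequential `user_0001` pseudonym scheme, and the use of match counts as code weights are hypothetical choices made for this example.

```python
import json
import re
from collections import Counter

# Hypothetical miniature export; the field names are assumptions about the
# Telegram Desktop JSON format, and the messages are invented.
EXPORT = json.loads("""
{"messages": [
  {"id": 1, "type": "message", "date": "2021-09-01T10:00:00",
   "from": "Alice", "text": "Serve il green pass?"},
  {"id": 2, "type": "service", "date": "2021-09-01T10:05:00",
   "actor": "Bob", "action": "join_group_by_link"},
  {"id": 3, "type": "message", "date": "2021-09-01T11:00:00",
   "from": "Bob", "text": ["Il ", {"type": "bold", "text": "vaccino"}, " funziona"]},
  {"id": 4, "type": "message", "date": "2021-09-02T09:30:00",
   "from": "Alice", "text": "Buongiorno a tutti"}
]}
""")

# Illustrative autocoding rules (rule name -> regex); not the authors' rule set.
RULES = {
    "green_pass": re.compile(r"green\s*pass", re.IGNORECASE),
    "vaccine": re.compile(r"vaccin\w*", re.IGNORECASE),
}


def flatten_text(text):
    """Return the message text as one string (it may be a list of parts)."""
    if isinstance(text, str):
        return text
    return "".join(p if isinstance(p, str) else p.get("text", "") for p in text)


def parse_and_anonymize(export):
    """Keep regular messages and replace each sender with a stable pseudonym."""
    pseudonyms = {}
    rows = []
    for msg in export["messages"]:
        if msg.get("type") != "message":
            continue  # skip service events such as joins and leaves
        sender = msg.get("from") or "unknown"
        # Assumed scheme: sequential aliases in order of first appearance.
        alias = pseudonyms.setdefault(sender, f"user_{len(pseudonyms) + 1:04d}")
        rows.append({
            "id": msg["id"],
            "date": msg["date"][:10],  # keep the day only
            "sender": alias,
            "text": flatten_text(msg.get("text", "")),
        })
    return rows


def autocode(rows, rules):
    """Attach the number of matches per rule to each row as a code weight."""
    for row in rows:
        row["codes"] = {name: len(rx.findall(row["text"])) for name, rx in rules.items()}
    return rows


rows = autocode(parse_and_anonymize(EXPORT), RULES)
messages_per_day = Counter(row["date"] for row in rows)
code_prevalence = Counter(
    name for row in rows for name, weight in row["codes"].items() if weight > 0
)

for row in rows:
    print(row["id"], row["date"], row["sender"], row["codes"])
print(dict(messages_per_day))
print(dict(code_prevalence))
```

Keeping the pseudonym map in a plain dictionary makes the replacement stable within one run, so per-user counts remain meaningful after the real names are dropped. A real pipeline would also need to handle names mentioned inside message text, which this sketch does not attempt.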
Metric | All versions | This version |
---|---|---|
Views | 290 | 275 |
Downloads | 17 | 15 |
Data volume | 63.4 MB | 51.6 MB |
Unique views | 265 | 258 |
Unique downloads | 15 | 15 |