A dataset of Spanish tweets on people and communities LGBTQI+ during the COVID-19 pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es]
Description
The LGBTQI+ Dataset 2020-2022_es is a collection of 410,015 original tweets extracted from the social network Twitter between January 1, 2020, and December 31, 2022. To ensure data quality and relevance, retweets, replies, and other duplicate content were excluded, retaining only original tweets. The tweets were collected by Jacinto Mata (University of Huelva, I2C/CITES) using Python, the twarc2 tool, and the Twitter Academic API v2. This data collection is part of the project “Conspiracy Theories and Hate Speech Online: Comparison of patterns in narratives and social networks about COVID-19, immigrants and refugees and LGBTI people [NON-CONSPIRA-HATE!]”, PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033 and by FEDER/EU.
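As a minimal sketch of the retweet and reply exclusion, assuming each record follows the Twitter API v2 tweet object, in which retweets and replies carry a `referenced_tweets` entry of type `retweeted` or `replied_to`, original tweets could be identified as shown here. The helper name `is_original` is hypothetical, and the source does not state how quote tweets were handled, so this sketch keeps them.

```python
# Hypothetical helper for illustration; not part of the dataset or of twarc2.
# Assumes Twitter API v2 tweet objects with an optional "referenced_tweets" list.
EXCLUDED_TYPES = {"retweeted", "replied_to"}


def is_original(tweet: dict) -> bool:
    """Return True if the tweet is neither a retweet nor a reply."""
    references = tweet.get("referenced_tweets") or []
    return all(ref.get("type") not in EXCLUDED_TYPES for ref in references)
```

Because the published dataset already contains only original tweets, a check like this would mainly serve as a sanity test after loading the data.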
The search criteria (words and hashtags) used for the data collection followed the objectives of the aforementioned project and were defined by Estrella Gualda, Francisco Javier Santos Fernández and Jacinto Mata (University of Huelva, Spain). Terms and hashtags used for the search and extraction of tweets were: #orgullogay, #orgullotrans, #OrgulloLGTB, #OrgulloLGTBI, #Díadelorgullo, #TRANSFOBIA, #transexuales, #LGTB, #LGTBI, #LGTBIQ, #LGTBQ, #LGTBQ+, anti-gay, "anti gay", anti-trans, "anti trans", "Ley Anti-LGTB", "ley trans", "anti-ley trans".
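For illustration only, the listed terms could be combined into a single search query string along these lines. The exact query used for the collection is not given here, so this is an assumption: the function name `build_query` is hypothetical, and the `-is:retweet` and `-is:reply` operators are Twitter API v2 search operators that may or may not have been applied at query time.

```python
# Search terms copied from the description; quoted entries are exact phrases.
TERMS = [
    "#orgullogay", "#orgullotrans", "#OrgulloLGTB", "#OrgulloLGTBI",
    "#Díadelorgullo", "#TRANSFOBIA", "#transexuales", "#LGTB", "#LGTBI",
    "#LGTBIQ", "#LGTBQ", "#LGTBQ+", "anti-gay", '"anti gay"', "anti-trans",
    '"anti trans"', '"Ley Anti-LGTB"', '"ley trans"', '"anti-ley trans"',
]


def build_query(terms, exclude_retweets=True, exclude_replies=True):
    """Join terms with OR and optionally append exclusion operators.

    Hypothetical helper: a sketch of one possible query, not the
    project's actual query.
    """
    query = "(" + " OR ".join(terms) + ")"
    if exclude_retweets:
        query += " -is:retweet"
    if exclude_replies:
        query += " -is:reply"
    return query


if __name__ == "__main__":
    print(build_query(TERMS))
```

A string of this form could then be passed to a search client such as twarc2, together with the start and end dates of the collection period.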
This dataset, collected within the framework of the NON-CONSPIRA-HATE! project, aims to identify and map online hate speech narratives and conspiracy theories directed at LGBTIQ+ people and community. It is also intended to support the comparison of communication patterns in social media (rhetoric, language, micro-discourses, semantic networks, emotions, etc.) across the different datasets collected in this project. The dataset further contributes to mapping the actors, communities, and networks that spread hate messages and conspiracy theories, with the aim of understanding the patterns and strategies used by extremist sectors on social media. The dataset includes messages that address a wide range of topics related to the LGBTQI+ community, such as rights, visibility, the fight against discrimination and transphobia, as well as debates surrounding the Trans Law and other related issues. It includes expressions of support and celebration of Pride as well as hate speech and opposition to LGBTQI+ rights, along with debates and controversies surrounding these issues.
This dataset offers a wide range of research possibilities across disciplines, as the following examples illustrate:
Social Sciences & Digital Humanities:
- Analysis of opinions, attitudes, and trends toward the LGBTIQ+ people and community.
- Studies on the evolution of public discourse and polarization around issues such as transphobia, hate speech, disinformation, LGBTIQ+ rights and pride, and others.
- Analysis of social and political actors, leaders or organizations disseminating diverse narratives on LGBTIQ+ issues.
- Research on the impact of specific events (e.g., Pride Day) on social media conversations.
- Investigations on social and semantic networks around LGBTIQ+ people and community.
- Analysis of narratives, discourses and rhetoric around gender identity and sexual diversity.
- Comparative studies on the representation of the LGBTIQ+ people and community in different cultural or geographic contexts.
Computer Science and Artificial Intelligence:
- Development of algorithms for the automatic detection of hate speech, discriminatory language, or offensive content.
- Training natural language processing (NLP) models to analyze sentiments and emotions in texts related to the LGBTIQ+ people and community.
For more information on other technical details of the dataset and the structure of the .jsonl data, see the “Readme.txt” file.
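As a rough sketch of how the .jsonl data might be read, the following code streams a file line by line and counts tweets per year. It assumes Twitter API v2 field names (`created_at` as an ISO 8601 timestamp) and tolerates both one-tweet-per-line files and twarc2-style response pages that nest tweets under a `data` list. The function names are hypothetical, and the “Readme.txt” file remains the authoritative description of the actual structure.

```python
import json
from collections import Counter
from pathlib import Path


def iter_tweets(path):
    """Yield tweet dicts from a .jsonl file (one JSON document per line)."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            data = record.get("data") if isinstance(record, dict) else None
            if isinstance(data, list):
                # Assumed twarc2-style page: tweets nested under "data".
                yield from data
            else:
                # Assumed flat layout: the line is itself a tweet object.
                yield record


def tweets_per_year(path):
    """Count tweets by the year prefix of their "created_at" timestamp."""
    counts = Counter()
    for tweet in iter_tweets(path):
        created = tweet.get("created_at", "")
        counts[created[:4] or "unknown"] += 1
    return counts
```

Calling `tweets_per_year` on a local copy of the data file (the file name depends on the download) would return a `Counter` keyed by year, which gives a quick check against the 2020-2022 collection period.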
Additional details
Related works
- Is cited by
- Journal article: https://www.cussoc.it/journal/article/view/334 (URL)
Funding
- Agencia Estatal de Investigación
- Conspiracy Theories and Online Hate Speech: Comparison of patterns in narratives and social networks about COVID 19, immigrants, refugees and LGBTI people [NON CONSPIRA HATE!] PID 2021-123983OB-I00
- Universidad de Huelva
- Ayudas a Grupos y Centros de Investigación EPIT-2024-2025
Dates
- Available: 2025-02-16 (Dataset)