A dataset of Spanish tweets on people and communities LGBTQI+ during the COVID-19 pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es]

Mata, Jacinto; Gualda, Estrella

doi:10.5281/zenodo.15071096

Published March 23, 2025 | Version v2

Dataset Restricted

1. Universidad de Huelva - Escuela Técnica Superior de Ingeniería
2. Universidad de Huelva

The LGBTQI+ Dataset 2020-2022_es is a collection of 410,015 original tweets extracted from the social network Twitter between January 1, 2020, and December 31, 2022. To ensure data quality and relevance, retweets, replies, and other duplicate content were excluded, retaining only original tweets. The tweets were collected by Jacinto Mata (University of Huelva, I2C/CITES) using Python, the twarc2 tool, and Twitter's Academic API v2. This data collection is part of the project “Conspiracy Theories and Hate Speech Online: Comparison of patterns in narratives and social networks about COVID-19, immigrants and refugees and LGBTI people [NON-CONSPIRA-HATE!]”, PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033 and by FEDER/EU.

The search criteria (words and hashtags) used for the data collection followed the objectives of the aforementioned project and were defined by Estrella Gualda, Francisco Javier Santos Fernández and Jacinto Mata (University of Huelva, Spain). Terms and hashtags used for the search and extraction of tweets were: #orgullogay, #orgullotrans, #OrgulloLGTB, #OrgulloLGTBI, #Díadelorgullo, #TRANSFOBIA, #transexuales, #LGTB, #LGTBI, #LGTBIQ, #LGTBQ, #LGTBQ+, anti-gay, "anti gay", anti-trans, "anti trans", "Ley Anti-LGTB", "ley trans", "anti-ley trans".
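
As an illustration only, the terms listed above could be combined into a single Twitter API v2 search query along the following lines. This is a minimal sketch, not the authors' actual query: the function name `build_query`, the `lang:es` filter, and the use of `-is:retweet -is:reply` to approximate "original tweets only" are assumptions, and how the API tokenizes characters such as "+" in `#LGTBQ+` is not verified here.

```python
# Search terms as listed in the dataset description.
TERMS = [
    "#orgullogay", "#orgullotrans", "#OrgulloLGTB", "#OrgulloLGTBI",
    "#Díadelorgullo", "#TRANSFOBIA", "#transexuales", "#LGTB", "#LGTBI",
    "#LGTBIQ", "#LGTBQ", "#LGTBQ+", "anti-gay", "anti gay", "anti-trans",
    "anti trans", "Ley Anti-LGTB", "ley trans", "anti-ley trans",
]


def build_query(terms, lang="es"):
    """Join the terms with OR and append filters (hypothetical helper).

    The language and retweet/reply filters are assumptions made for this
    sketch; the dataset description does not state the exact operators used.
    """
    def fmt(term):
        # Quote terms containing a space or hyphen so they match as exact phrases.
        if " " in term or "-" in term:
            return f'"{term}"'
        return term

    group = " OR ".join(fmt(t) for t in terms)
    return f"({group}) lang:{lang} -is:retweet -is:reply"


query = build_query(TERMS)
print(query)
print(len(query), "characters")
```

A string like this could then be passed to twarc2's `search` command together with its `--archive`, `--start-time`, and `--end-time` options, assuming access to a full-archive search endpoint.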

This dataset was collected within the NON-CONSPIRA-HATE! project with the aim of identifying and mapping online hate speech narratives and conspiracy theories directed at LGBTIQ+ people and communities. It is also intended to support the comparison of communication patterns on social media (rhetoric, language, micro-discourses, semantic networks, emotions, etc.) across the different datasets collected in this project. The dataset further contributes to mapping the actors, communities, and networks that spread hate messages and conspiracy theories, in order to understand the patterns and strategies used by extremist sectors on social media. The dataset includes messages that address a wide range of topics related to the LGBTQI+ community, such as rights, visibility, the fight against discrimination and transphobia, and debates surrounding the Trans Law and other related issues. It includes expressions of support and celebration of Pride as well as hate speech and opposition to LGBTQI+ rights, along with debates and controversies surrounding these issues.

This dataset offers a wide range of possibilities for research in various disciplines, as the following examples illustrate:

Social Sciences & Digital Humanities:
- Analysis of opinions, attitudes, and trends toward LGBTIQ+ people and communities.
- Studies on the evolution of public discourse and polarization around issues such as transphobia, hate speech, disinformation, LGBTIQ+ rights and pride, and others.
- Analysis of social and political actors, leaders, or organizations disseminating diverse narratives on LGBTIQ+ issues.
- Research on the impact of specific events (e.g., Pride Day) on social media conversations.
- Investigations of social and semantic networks around LGBTIQ+ people and communities.
- Analysis of narratives, discourses, and rhetoric around gender identity and sexual diversity.
- Comparative studies on the representation of LGBTIQ+ people and communities in different cultural or geographic contexts.

Computer Science and Artificial Intelligence:
- Development of algorithms for the automatic detection of hate speech, discriminatory language, or offensive content.
- Training natural language processing (NLP) models to analyze sentiments and emotions in texts related to LGBTIQ+ people and communities.

For more information on other technical details of the dataset and the structure of the .jsonl data, see the “Readme.txt” file.
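
For readers unfamiliar with the .jsonl format (one JSON object per line), the following is a minimal sketch of how such a file might be read and summarized. The field names (`created_at`, `data`) follow the general shape of Twitter API v2 / twarc2 output and are assumptions here; the authoritative structure is the one documented in “Readme.txt”. The helper names `iter_tweets` and `tweets_per_year` are hypothetical, and the sample records are invented for illustration, not taken from the dataset.

```python
import io
import json
from collections import Counter


def iter_tweets(lines):
    """Yield one tweet dict per record from an iterable of JSONL lines.

    Handles two layouts (both assumed, not confirmed for this dataset):
    a flattened file with one tweet per line, or API response pages
    where the tweets sit in a "data" list.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        data = record.get("data")
        if isinstance(data, list):
            yield from data
        else:
            yield record


def tweets_per_year(lines, date_field="created_at"):
    """Count tweets per calendar year using the ISO 8601 timestamp prefix."""
    counts = Counter()
    for tweet in iter_tweets(lines):
        stamp = tweet.get(date_field, "")
        if len(stamp) >= 4:
            counts[stamp[:4]] += 1
    return counts


# Invented sample records standing in for a real file.
sample = io.StringIO(
    '{"id": "1", "text": "ejemplo uno", "created_at": "2020-06-28T10:00:00.000Z"}\n'
    '{"data": [{"id": "2", "text": "ejemplo dos", "created_at": "2021-06-28T10:00:00.000Z"},'
    ' {"id": "3", "text": "ejemplo tres", "created_at": "2021-07-01T09:30:00.000Z"}]}\n'
)
counts = tweets_per_year(sample)
print(dict(counts))
```

With a real file, the `io.StringIO` sample would be replaced by an open file handle, for example `open(path, encoding="utf-8")`, which streams the data line by line rather than loading it all into memory.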

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Data availability: <This dataset has “Restricted Access”. Nevertheless, information is available upon reasonable request from the authors. Researchers interested in accessing the “LGBTQI+ Dataset 2020-2022_es” must complete a request form and send it through Zenodo or by e-mail to: nonconspirahate@uhu.es [Subject: LGBTQI+ DATASET 2020-2022_es].

This form includes ethical commitments concerning Twitter data and the obligation to properly cite the data source. Access to the data is subject to approval of the request and compliance with Twitter's data protection and ethical guidelines.>

Data Request Form: <Include the following information to request access to the dataset:
Applicant's Name:
Institution:
Email:
Purpose of the Research & Exploitation of Data:
Declare Ethical Commitments:
1. I commit to using the data solely for the purposes specified in this request.
2. I commit not to share the data with third parties without prior consent from the research team.
3. I commit to complying with all data protection and ethical guidelines of Twitter.
4. I commit to properly citing the data source in all publications and presentations derived from its use.

Recommended Citation:
Mata, J. & Gualda, E. (2025). A dataset of Spanish tweets on people and communities LGBTQI+ during the COVID-19 pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es]. 1.0. Zenodo.
https://doi.org/10.5281/zenodo.14878434

Applicant's Signature:
Date: >

Please send the request as specified above and also submit your e-mail address via the “Restricted Access” area on Zenodo, so that access can be authorised once the form has been received.

Additional details

Is cited by: Journal article: https://www.cussoc.it/journal/article/view/334 (URL)

Agencia Estatal de Investigación
Conspiracy Theories and Online Hate Speech: Comparison of patterns in narratives and social networks about COVID 19, immigrants, refugees and LGBTI people [NON CONSPIRA HATE!] PID 2021-123983OB-I00
Universidad de Huelva
Ayudas a Grupos y Centros de Investigación EPIT-2024-2025

Available: 2025-02-16
