Published March 23, 2025 | Version v5
Dataset Restricted

A dataset of Spanish tweets on people and communities LGBTQI+ during the COVID-19 pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es]

  • 1. Universidad de Huelva - Escuela Técnica Superior de Ingeniería
  • 2. Universidad de Huelva

Description

The LGBTQI+ Dataset 2020-2022_es is a collection of 410,015 original tweets extracted from the social network Twitter between January 1, 2020, and December 31, 2022. To ensure data quality and relevance, retweets, replies, and other duplicate content were excluded, retaining only original tweets. The tweets were collected by Jacinto Mata (University of Huelva, I2C/CITES) using the Python programming language, the twarc2 tool, and the Academic API v2 of Twitter. This data collection is part of the project “Conspiracy Theories and Hate Speech Online: Comparison of patterns in narratives and social networks about COVID-19, immigrants and refugees and LGBTI people [NON-CONSPIRA-HATE!]”, PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by FEDER/EU.

The search criteria (words and hashtags) used for the data collection followed the objectives of the aforementioned project and were defined by Estrella Gualda, Francisco Javier Santos Fernández and Jacinto Mata (University of Huelva, Spain). Terms and hashtags used for the search and extraction of tweets were: #orgullogay, #orgullotrans, #OrgulloLGTB, #OrgulloLGTBI, #Díadelorgullo, #TRANSFOBIA, #transexuales, #LGTB, #LGTBI, #LGTBIQ, #LGTBQ, #LGTBQ+, anti-gay, "anti gay", anti-trans, "anti trans", "Ley Anti-LGTB", "ley trans", "anti-ley trans".

This dataset, collected within the framework of the NON-CONSPIRA-HATE! project, aims to identify and map online hate speech narratives and conspiracy theories directed at LGBTIQ+ people and communities. Additionally, the dataset is intended to support the comparison of communication patterns in social media (rhetoric, language, micro-discourses, semantic networks, emotions, etc.) across the different datasets collected in this project. This dataset also contributes to mapping the actors, communities, and networks that spread hate messages and conspiracy theories, with the aim of understanding the patterns and strategies used by extremist sectors on social media. The dataset includes messages that address a wide range of topics related to the LGBTQI+ community, such as rights, visibility, the fight against discrimination and transphobia, as well as debates surrounding the Trans Law and other related issues. It includes expressions of support and celebration of Pride as well as hate speech and opposition to LGBTQI+ rights, along with debates and controversies surrounding these issues.

This dataset offers a wide range of possibilities for research in various disciplines, as the following examples illustrate:

Social Sciences & Digital Humanities:
- Analysis of opinions, attitudes, and trends toward LGBTIQ+ people and communities.
- Studies on the evolution of public discourse and polarization around issues such as transphobia, hate speech, disinformation, LGBTIQ+ rights and pride, and others.
- Analysis of social and political actors, leaders or organizations disseminating diverse narratives on LGBTIQ+ issues.
- Research on the impact of specific events (e.g., Pride Day) on social media conversations.
- Investigations on social and semantic networks around LGBTIQ+ people and communities.
- Analysis of narratives, discourses and rhetoric around gender identity and sexual diversity.
- Comparative studies on the representation of LGBTIQ+ people and communities in different cultural or geographic contexts.

Computer Science and Artificial Intelligence:
- Development of algorithms for the automatic detection of hate speech, discriminatory language, or offensive content.
- Training natural language processing (NLP) models to analyze sentiments and emotions in texts related to LGBTIQ+ people and communities.

For more information on other technical details of the dataset and the structure of the .jsonl data, see the “Readme.txt” file.

Technical info

This README.txt file was generated on <20250214> by <ESTRELLA GUALDA CABALLERO (ESEIS/COIDESO/CISCOA-Lab) and JACINTO MATA (I2C/CITES), Universidad de Huelva, Spain>

------------------
GENERAL INFORMATION
-------------------

Title of Dataset: <A dataset of Spanish tweets on people and communities LGBTQI+ during the pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es]>

Author Information <JACINTO MATA, Universidad de Huelva, I2C/CITES, Escuela Técnica Superior de Ingeniería, Campus El Carmen, Avda. de las Fuerzas Armadas, s/n. - 21007 Huelva, Spain, mata@uhu.es, https://orcid.org/0000-0001-5329-9622>

Principal Investigator Project PID2021-123983OB-I00 [NON-CONSPIRA-HATE!]: <Estrella Gualda, Universidad de Huelva, ESEIS/COIDESO/CISCOA-Lab, Facultad de Trabajo Social, Avda. Tres de Marzo, s/n, 21007-Huelva, estrella@uhu.es, ORCID: http://orcid.org/0000-0003-0220-2135>

Date and description of data collection: <The "LGBTQI+ Dataset 2020-2022_es" is a collection of original tweets extracted from the social network Twitter between January 1, 2020, and December 31, 2022, focusing on topics related to LGBTIQ+ people and communities. To ensure data quality and relevance, retweets, replies, and other duplicate content were excluded, retaining only original tweets.

The dataset includes messages that address a wide range of topics related to the LGBTQI+ community, such as rights, visibility, the fight against discrimination and transphobia, as well as debates surrounding the Trans Law and other related issues. It includes expressions of support and celebration of Pride as well as hate speech and opposition to LGBTQI+ rights, along with debates and controversies surrounding these issues.

This dataset offers a wide range of possibilities for research in various disciplines, as the following examples illustrate:

Social Sciences & Digital Humanities:
- Analysis of opinions, attitudes, and trends toward LGBTIQ+ people and communities.
- Studies on the evolution of public discourse and polarization around issues such as transphobia, hate speech, disinformation, LGBTIQ+ rights and pride, and others.
- Analysis of social and political actors, leaders or organizations disseminating diverse narratives on LGBTIQ+ issues.
- Research on the impact of specific events (e.g., Pride Day) on social media conversations.
- Investigations on social and semantic networks around LGBTIQ+ people and communities.
- Analysis of narratives, discourses and rhetoric around gender identity and sexual diversity.
- Comparative studies on the representation of LGBTIQ+ people and communities in different cultural or geographic contexts.

Computer Science and Artificial Intelligence:
- Development of algorithms for the automatic detection of hate speech, discriminatory language, or offensive content.
- Training natural language processing (NLP) models to analyze sentiments and emotions in texts related to LGBTIQ+ people and communities.>

Geographic location of data collection: <Data were collected at the University of Huelva, coordinates latitude: 37.27066375, longitude: -6.9235446418407>

Funding: <Data were collected in the context of the I+D+i Project titled "Conspiracy Theories and Online Hate Speech: Comparison of Patterns in Narratives and Social Networks about COVID-19, Immigrants, Refugees, and LGBTIQ+ People [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU". Principal Investigator: Estrella Gualda. University of Huelva, ESEIS/COIDESO/CISCOA-Lab, Spain. We are also grateful for the support of our research group: "Estudios Sociales e Intervención Social" (GrupoESEIS), the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva>

General description: <The "LGBTQI+ Dataset 2020-2022_es" is one of the datasets collected in the NON-CONSPIRA-HATE! project with the aim of identifying and mapping online hate speech narratives and conspiracy theories directed at LGBTIQ+ people and communities. Additionally, the dataset is intended to support the comparison of communication patterns in social media (rhetoric, language, micro-discourses, semantic networks, emotions, etc.) across the different datasets collected in this project. This dataset also contributes to mapping the actors, communities, and networks that spread hate messages and conspiracy theories, with the aim of understanding the patterns and strategies used by extremist sectors on social media>

Keywords: <LGBTIQ+ (Lesbian, Gay, Bisexual, Transgender, Queer/Questioning, Intersex, and others), Transphobia, Trans law, Anti-trans law, Anti-gay, Anti-trans, Anti-LGBTIQ+ law, online hate speech, conspiracy theories, Twitter, Computational Social Science, Computational Sociology>

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Data availability: <This dataset has "Restricted Access". Nevertheless, information is available upon reasonable request from the authors. Researchers interested in accessing the "LGBTQI+ Dataset 2020-2022_es" must complete a request form and send it through Zenodo or by e-mail to: nonconspirahate@uhu.es [Subject: LGBTQI+ DATASET 2020-2022_es].

This form includes ethical commitments concerned with Twitter data, and the obligation to properly cite the data source. Access to the data is subject to the approval of the request and compliance with Twitter's data protection and ethical guidelines>

Data Request Form: <Include the following information to have access to the dataset:
Applicant's Name:
Institution:
Email:
Purpose of the Research & Exploitation of Data:
Declare Ethical Commitments:
1. I commit to using the data solely for the purposes specified in this request.
2. I commit not to share the data with third parties without prior consent from the research team.
3. I commit to complying with all data protection and ethical guidelines of Twitter.
4. I commit to properly citing the data source in all publications and presentations derived from its use.

Recommended Citation:
Mata, J. & Gualda, E. (2025). A dataset of Spanish tweets on people and communities LGBTQI+ during the COVID-19 pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es]. 1.0. Zenodo.
https://doi.org/10.5281/zenodo.14878434

Applicant's Signature:
Date:  >

Rights and permissions: <The "LGBTQI+ Dataset 2020-2022_es" is subject to a specific usage license. Researchers interested in accessing the dataset must adhere to the terms of the chosen license. The dataset is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0>

Citation to publications that cite or use the data: <
- Santos Fernández, F.J. (2024). Online Homophobia: Hate Speech and Conspiracy Theories towards LGBTQI+ people on Twitter in Spain. Culture e Studi del Sociale, 9(1), 39-56. https://www.cussoc.it/journal/article/view/334
- Santos Fernández, F.J., Ibáñez-Meseguer, J., Canillo, A., Gómez-Luque, S. & Gualda, E. (2024). Mapeando la diversidad de discursos de odio sobre personas LGTBIQ+ en Twitter. XV Congreso Español de Sociología. Universidad Pablo de Olavide, Sevilla, 26-29 de junio de 2024.
- Santos Fernández, F.J., Gómez-Luque, S., Canillo, A., Ibáñez-Meseguer, J. & Gualda, E. (2024). El incendiario debate sobre la "Ley trans" en España. XV Congreso Español de Sociología. Universidad Pablo de Olavide, Sevilla, 26-29 de junio de 2024.>

--------------------
DATA & FILE OVERVIEW
--------------------

File list: <The file "LGBTIQ_es_full.jsonl.zip" contains the "LGBTQI+ Dataset 2020-2022_es">

Total size: <The file size is 322.02 MB>
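
For readers who obtain access, the following is a minimal sketch of how the archive might be streamed without unpacking it to disk. It assumes that "LGBTIQ_es_full.jsonl.zip" holds one or more UTF-8 ".jsonl" members with one tweet object per line; this README does not confirm that layout, and the example record further below is wrapped in "[ ]", so the actual layout should be checked first. The function names are illustrative only.

```python
import io
import json
import zipfile
from collections import Counter
from typing import Iterable, Iterator


def iter_tweets(zip_path: str) -> Iterator[dict]:
    """Yield one tweet dict per non-empty line of every .jsonl member in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Assumption: only members ending in .jsonl contain tweet data.
            if not member.endswith(".jsonl"):
                continue
            with archive.open(member) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield json.loads(line)


def tweets_per_month(tweets: Iterable[dict]) -> Counter:
    """Count tweets by the YYYY-MM prefix of their ISO 8601 'created_at' field."""
    return Counter(t["created_at"][:7] for t in tweets if "created_at" in t)
```

Because `iter_tweets` is a generator, the whole collection never has to be held in memory at once, which may matter for an archive of this size.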

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Data collection: <As part of the project "Conspiracy Theories and Hate Speech Online: Comparison of patterns in narratives and social networks about COVID-19, immigrants and refugees and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by FEDER/EU, Jacinto Mata extracted a dataset of 410,015 organic tweets in Spanish posted between 2020 and 2022 [LGBTQI+ Dataset 2020-2022_es], using the Python programming language, the twarc2 tool (Summers et al., 2023) and the Academic API v2 of Twitter.>

Keywords for data collection: <The words and hashtags used for the collection followed the objectives of the aforementioned project and were defined by Estrella Gualda, Francisco Javier Santos Fernández and Jacinto Mata. Terms and hashtags used for the search and extraction of tweets were: #orgullogay, #orgullotrans, #OrgulloLGTB, #OrgulloLGTBI, #Díadelorgullo, #TRANSFOBIA, #transexuales, #LGTB, #LGTBI, #LGTBIQ, #LGTBQ, #LGTBQ+, anti-gay, "anti gay", anti-trans, "anti trans", "Ley Anti-LGTB", "ley trans", "anti-ley trans">
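
The exact query string submitted to the API is not given in this README. As an illustration only, the sketch below shows one way the listed terms could be combined into a single Twitter API v2 search query and passed to the twarc2 command line. The `lang:es`, `-is:retweet` and `-is:reply` operators are assumptions chosen to mirror the description (Spanish, original tweets only); the function names, the output file name and the date bounds passed to `--start-time` and `--end-time` are hypothetical.

```python
# Terms copied from the list above; multi-word phrases keep their double quotes.
SEARCH_TERMS = [
    "#orgullogay", "#orgullotrans", "#OrgulloLGTB", "#OrgulloLGTBI",
    "#Díadelorgullo", "#TRANSFOBIA", "#transexuales", "#LGTB", "#LGTBI",
    "#LGTBIQ", "#LGTBQ", "#LGTBQ+", "anti-gay", '"anti gay"', "anti-trans",
    '"anti trans"', '"Ley Anti-LGTB"', '"ley trans"', '"anti-ley trans"',
]


def build_query(terms: list[str], lang: str = "es", originals_only: bool = True) -> str:
    """Join the terms with OR and append the (assumed) language and originality filters."""
    query = "(" + " OR ".join(terms) + ")"
    if lang:
        query += f" lang:{lang}"
    if originals_only:
        # Assumption: retweets and replies were excluded at query time.
        query += " -is:retweet -is:reply"
    return query


def twarc2_command(query: str, start: str = "2020-01-01", end: str = "2023-01-01",
                   out: str = "tweets.jsonl") -> list[str]:
    """Build (but do not run) a twarc2 full-archive search command as an argument list."""
    return ["twarc2", "search", "--archive",
            "--start-time", start, "--end-time", end, query, out]


if __name__ == "__main__":
    print(" ".join(twarc2_command(build_query(SEARCH_TERMS))))
```

Returning the command as a list rather than a shell string avoids quoting problems with the embedded double quotes if it is later handed to `subprocess.run`.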

--------------------------
DATA-SPECIFIC INFORMATION: TWEET DATA JSON STRUCTURE AND EXAMPLE
--------------------------

[{
  "_id": {
    "$oid": "63e4c1beb82b43c83e3ce873"
  },
  "author_id": "702461646204243968",
  "possibly_sensitive": false,
  "created_at": "2022-12-31T23:58:14.000Z",
  "edit_controls": {
    "edits_remaining": 5,
    "is_edit_eligible": false,
    "editable_until": "2023-01-01T00:28:14.000Z"
  },
  "text": "@Dandastur Yo diría que no, antes han puesto con la orquesta gente joven (cito de memoria, algo de gracia perderá)\n\"Han ido posponiendo el cambio de nombre, como el PSOE con la ley trans\"\n\"Bueno, como el PSOE en general\"",
  "public_metrics": {
    "retweet_count": 0,
    "reply_count": 0,
    "like_count": 1,
    "quote_count": 0,
    "impression_count": 44
  },
  "in_reply_to_user_id": "259734564",
  "id": "1609338172235825153",
  "conversation_id": "1609336914603298816",
  "edit_history_tweet_ids": [
    "1609338172235825153"
  ],
  "entities": {
    "mentions": [
      {
        "start": 0,
        "end": 10,
        "username": "Dandastur",
        "id": "259734564",
        "created_at": "2011-03-02T13:56:02.000Z",
        "pinned_tweet_id": "728862357594750976",
        "entities": {
          "url": {
            "urls": [
              {
                "start": 0,
                "end": 22,
                "url": "http://t.co/WemORp7KAf",
                "expanded_url": "http://www.jugadoresdefortuna.com",
                "display_url": "jugadoresdefortuna.com"
              }
            ]
          },
          "description": {
            "urls": [
              {
                "start": 112,
                "end": 135,
                "url": "https://t.co/c7TFPHBAEq",
                "expanded_url": "https://play.spotify.com/album/7ops7tph6jMMwsjVejFxbw",
                "display_url": "play.spotify.com/album/7ops7tph…"
              }
            ]
          }
        },
        "location": "Entre Frankfurt y Asturias",
        "protected": false,
        "name": "Danda",
        "profile_image_url": "https://pbs.twimg.com/profile_images/997205486272311296/LkS3sRzU_normal.jpg",
        "description": "Traductor de series y videojuegos. Izquierdista pesado. En internet desde 1996. Fui batería de rock progresivo: https://t.co/c7TFPHBAEq Aviso: RETUITEO MUCHO",
        "url": "http://t.co/WemORp7KAf",
        "verified": false,
        "public_metrics": {
          "followers_count": 2782,
          "following_count": 922,
          "tweet_count": 669895,
          "listed_count": 156
        }
      }
    ],
    "annotations": [
      {
        "start": 165,
        "end": 168,
        "probability": 0.9728,
        "type": "Organization",
        "normalized_text": "PSOE"
      },
      {
        "start": 204,
        "end": 207,
        "probability": 0.9782,
        "type": "Organization",
        "normalized_text": "PSOE"
      }
    ]
  },
  "context_annotations": [
    {
      "domain": {
        "id": "131",
        "name": "Unified Twitter Taxonomy",
        "description": "A taxonomy of user interests. "
      },
      "entity": {
        "id": "847878884917886977",
        "name": "Politics",
        "description": "Politics"
      }
    },
    {
      "domain": {
        "id": "131",
        "name": "Unified Twitter Taxonomy",
        "description": "A taxonomy of user interests. "
      },
      "entity": {
        "id": "1516798152279425028",
        "name": "Spain politics"
      }
    }
  ],
  "lang": "es",
  "referenced_tweets": [
    {
      "type": "replied_to",
      "id": "1609336914603298816",
      "author_id": "259734564",
      "possibly_sensitive": false,
      "created_at": "2022-12-31T23:53:15.000Z",
      "edit_controls": {
        "edits_remaining": 5,
        "is_edit_eligible": true,
        "editable_until": "2023-01-01T00:23:15.000Z"
      },
      "text": "Por favor, que alguien me confirme que la TVE facha actual NO le ha pasado la censura y el filtro fachificador a los chistes de este año, veo muchos datos y pocos chistes. Me encantaban las hostias que le repartían a Rivera antes de que se quitase de en medio. https://t.co/4TllwALWqb",
      "public_metrics": {
        "retweet_count": 0,
        "reply_count": 2,
        "like_count": 0,
        "quote_count": 0,
        "impression_count": 308
      },
      "conversation_id": "1609336914603298816",
      "edit_history_tweet_ids": [
        "1609336914603298816"
      ],
      "entities": {
        "urls": [
          {
            "start": 261,
            "end": 284,
            "url": "https://t.co/4TllwALWqb",
            "expanded_url": "https://twitter.com/c_3peor/status/1609333955156639744",
            "display_url": "twitter.com/c_3peor/status…"
          }
        ],
        "annotations": [
          {
            "start": 42,
            "end": 44,
            "probability": 0.519,
            "type": "Organization",
            "normalized_text": "TVE"
          },
          {
            "start": 217,
            "end": 222,
            "probability": 0.853,
            "type": "Person",
            "normalized_text": "Rivera"
          }
        ]
      },
      "lang": "es",
      "referenced_tweets": [
        {
          "type": "quoted",
          "id": "1609333955156639744"
        }
      ],
      "reply_settings": "everyone",
      "author": {
        "created_at": "2011-03-02T13:56:02.000Z",
        "id": "259734564",
        "pinned_tweet_id": "728862357594750976",
        "entities": {
          "url": {
            "urls": [
              {
                "start": 0,
                "end": 22,
                "url": "http://t.co/WemORp7KAf",
                "expanded_url": "http://www.jugadoresdefortuna.com",
                "display_url": "jugadoresdefortuna.com"
              }
            ]
          },
          "description": {
            "urls": [
              {
                "start": 112,
                "end": 135,
                "url": "https://t.co/c7TFPHBAEq",
                "expanded_url": "https://play.spotify.com/album/7ops7tph6jMMwsjVejFxbw",
                "display_url": "play.spotify.com/album/7ops7tph…"
              }
            ]
          }
        },
        "location": "Entre Frankfurt y Asturias",
        "protected": false,
        "username": "Dandastur",
        "name": "Danda",
        "profile_image_url": "https://pbs.twimg.com/profile_images/997205486272311296/LkS3sRzU_normal.jpg",
        "description": "Traductor de series y videojuegos. Izquierdista pesado. En internet desde 1996. Fui batería de rock progresivo: https://t.co/c7TFPHBAEq Aviso: RETUITEO MUCHO",
        "url": "http://t.co/WemORp7KAf",
        "verified": false,
        "public_metrics": {
          "followers_count": 2782,
          "following_count": 922,
          "tweet_count": 669895,
          "listed_count": 156
        }
      }
    }
  ],
  "reply_settings": "everyone",
  "author": {
    "created_at": "2016-02-24T11:54:42.000Z",
    "id": "702461646204243968",
    "protected": false,
    "username": "echidnamoroso",
    "name": "Echidnamoroso",
    "profile_image_url": "https://pbs.twimg.com/profile_images/747020599445295104/y9Z5F1wk_normal.jpg",
    "description": "",
    "verified": false,
    "public_metrics": {
      "followers_count": 42,
      "following_count": 248,
      "tweet_count": 2170,
      "listed_count": 0
    }
  },
  "in_reply_to_user": {
    "created_at": "2011-03-02T13:56:02.000Z",
    "id": "259734564",
    "pinned_tweet_id": "728862357594750976",
    "entities": {
      "url": {
        "urls": [
          {
            "start": 0,
            "end": 22,
            "url": "http://t.co/WemORp7KAf",
            "expanded_url": "http://www.jugadoresdefortuna.com",
            "display_url": "jugadoresdefortuna.com"
          }
        ]
      },
      "description": {
        "urls": [
          {
            "start": 112,
            "end": 135,
            "url": "https://t.co/c7TFPHBAEq",
            "expanded_url": "https://play.spotify.com/album/7ops7tph6jMMwsjVejFxbw",
            "display_url": "play.spotify.com/album/7ops7tph…"
          }
        ]
      }
    },
    "location": "Entre Frankfurt y Asturias",
    "protected": false,
    "username": "Dandastur",
    "name": "Danda",
    "profile_image_url": "https://pbs.twimg.com/profile_images/997205486272311296/LkS3sRzU_normal.jpg",
    "description": "Traductor de series y videojuegos. Izquierdista pesado. En internet desde 1996. Fui batería de rock progresivo: https://t.co/c7TFPHBAEq Aviso: RETUITEO MUCHO",
    "url": "http://t.co/WemORp7KAf",
    "verified": false,
    "public_metrics": {
      "followers_count": 2782,
      "following_count": 922,
      "tweet_count": 669895,
      "listed_count": 156
    }
  }
}]
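
The nested structure shown above can be reduced to a flat row for tabular analysis. The following is a minimal sketch, assuming that every record carries the same keys as this single example ("id" and "created_at" are treated as required, everything else as optional). `flatten_tweet` and the output column names are illustrative and are not part of the dataset or of twarc.

```python
from datetime import datetime, timezone
from typing import Any


def flatten_tweet(tweet: dict[str, Any]) -> dict[str, Any]:
    """Reduce one nested tweet object to a flat row of commonly used fields."""
    metrics = tweet.get("public_metrics", {})
    entities = tweet.get("entities", {})
    # Timestamps in the example look like "2022-12-31T23:58:14.000Z" (UTC).
    created = datetime.strptime(tweet["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "id": tweet["id"],
        "created_at": created.replace(tzinfo=timezone.utc),
        "username": tweet.get("author", {}).get("username"),
        "lang": tweet.get("lang"),
        "text": tweet.get("text", ""),
        "like_count": metrics.get("like_count", 0),
        "retweet_count": metrics.get("retweet_count", 0),
        # Accounts mentioned in the text.
        "mentions": [m["username"] for m in entities.get("mentions", [])],
        # Named entities detected by Twitter (e.g. organizations, persons).
        "named_entities": [a["normalized_text"] for a in entities.get("annotations", [])],
        # Topic labels from the context annotations.
        "topics": [c["entity"]["name"] for c in tweet.get("context_annotations", [])],
        # Relationship types such as "replied_to" or "quoted".
        "reference_types": [r["type"] for r in tweet.get("referenced_tweets", [])],
    }
```

Applied to the example record, this would give the username "echidnamoroso", the mention "Dandastur", the named entities "PSOE" (twice), the topics "Politics" and "Spain politics", and the reference type "replied_to".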

 

 

<REFERENCES>

Ed Summers, Igor Brigadir, Sam Hames, Hugo van Kemenade, Peter Binkley, tinafigueroa, Nick Ruest, Walmir, Dan Chudnov, David Thiel, Betsy, Ryan Chartier, celeste, Hause Lin, Alice, Andy Chosak, Mirko Lenz, R. Miles McCain, Ian Milligan, Andreas Segerberg, Daniyal Shahrokhian, Melanie Walsh, Leonard Lausen, Nicholas Woodward, eggplants, Ashwin Ramaswami, Boyd Nguyen, Darío Hereñú, Dmitrijs Milajevs, and Frederik Elwert (2023). Docnow/twarc: v2.14.0. Zenodo [Computer Software]. https://doi.org/10.5281/zenodo.7799050
More information on twarc in:
- Twarc2. https://twarc-project.readthedocs.io/en/latest/twarc2_en_us/
- GitHub. https://github.com/DocNow/twarc


Please send the request as specified in the SHARING/ACCESS INFORMATION section, and also submit your e-mail address through the "Restricted Access" area on Zenodo, so that access can be authorised once the form has been received.

 

 


Additional details

Related works

Is cited by
Journal article: https://www.cussoc.it/journal/article/view/334 (URL)

Funding

Agencia Estatal de Investigación
Conspiracy Theories and Online Hate Speech: Comparison of patterns in narratives and social networks about COVID 19, immigrants, refugees and LGBTI people [NON CONSPIRA HATE!] PID 2021-123983OB-I00
Universidad de Huelva
Ayudas a Grupos y Centros de Investigación EPIT-2024-2025

Dates

Available
2025-02-16
Dataset