Published June 12, 2024 | Version v1
Dataset Restricted

Corpus of LGBTQIA+ Advocacy to the Taiwanese Government (Traditional Characters)

Authors/Creators

  • 1. University of Verona

Description

This corpus was compiled from texts published on the website https://hotline.org.tw/ between 13 March 2011 and 21 February 2024. It comprises 111,126 tokens, 90,273 words, and 4,370 sentences. It also contains 15,567 lemmas and 13,559 unique word forms (including non-words). The uploaded file provides the corpus in two versions: a plain-text version (without POS tags or lemmas, but retaining all structures and structural attributes) and a vertical file (one token per line, with POS tags, lemmas, structures, and attributes).
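For readers unfamiliar with the vertical format, the following is a minimal sketch of how counts like those above could be recomputed from a vertical file. It assumes one token per line with tab-separated positional attributes in the order word, tag, lemma, and structures written as XML-like tags on their own lines (for example `<doc ...>`, `<s>`, `</s>`, and the self-closing `<g/>`). The column order, the `summarize` helper, and the sample tokens and tags are hypothetical and illustrative only; they are not taken from the released files, and the sketch is not the procedure used to produce the published figures.

```python
import re
from collections import Counter

# ASSUMPTION: positional attributes are word, tag, lemma (tab-separated).
# The real column order of the released vertical file is not stated; adjust as needed.
COLUMNS = ("word", "tag", "lemma")

# Matches an opening structure tag such as <doc id="1"> or <s> and captures its name.
OPEN_TAG = re.compile(r"^<([A-Za-z_][\w.-]*)(?:\s[^>]*)?>$")


def is_structure(line: str) -> bool:
    """Structure lines are XML-like tags with no tab-separated columns."""
    return line.startswith("<") and line.endswith(">") and "\t" not in line


def summarize(lines, columns=COLUMNS):
    """Hypothetical helper: count tokens, structures, unique forms/lemmas and tags."""
    tokens = 0
    structures = Counter()
    forms, lemmas, tags = set(), set(), Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if is_structure(line):
            # Only opening tags are counted; closing (</s>) and self-closing (<g/>) tags are skipped.
            m = OPEN_TAG.match(line)
            if m and not line.endswith("/>"):
                structures[m.group(1)] += 1
            continue
        fields = dict(zip(columns, line.split("\t")))
        tokens += 1
        forms.add(fields["word"])
        if "lemma" in fields:
            lemmas.add(fields["lemma"])
        if "tag" in fields:
            tags[fields["tag"]] += 1
    return {
        "tokens": tokens,
        "sentences": structures["s"],
        "documents": structures["doc"],
        "unique_forms": len(forms),
        "unique_lemmas": len(lemmas),
        "tags": tags,
    }


# Invented sample in the assumed layout (not an excerpt from the corpus).
sample = """<doc id="1">
<s>
台灣\tNR\t台灣
同志\tNN\t同志
諮詢\tNN\t諮詢
熱線\tNN\t熱線
<g/>
。\tPU\t。
</s>
</doc>
"""
stats = summarize(sample.splitlines())
print(stats["tokens"], stats["sentences"], stats["unique_lemmas"])  # 5 1 5
```

Treating any tab-free line of the form `<...>` as markup keeps the parser independent of which structures and attributes a given corpus defines; if the file escapes literal angle brackets in tokens (e.g. as `&lt;`), this heuristic will not misread them as structures.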

POS tagging and lemmatization were performed with the Sketch Engine platform (http://www.sketchengine.eu; Kilgarriff et al. 2004, Kilgarriff et al. 2014).

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

References

  • Adam Kilgarriff, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. "The Sketch Engine: Ten Years On." Lexicography, 1: 7-36, 2014.
  • Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell. "The Sketch Engine." Proceedings of the 11th EURALEX International Congress, 105-116, 2004.