Corpus of Government Communications to the Taiwanese LGBTQIA+ Community (Traditional Characters)
Description
This corpus was compiled from texts addressing the LGBTQIA+ community that were published on the official website of the Taiwanese government between 12 November 2009 and 6 February 2024. It comprises 211,644 tokens, 175,725 words, and 5,382 sentences, with 26,924 lemmas and 23,325 unique word forms (including non-words). The uploaded file contains two versions of the corpus: a plain text version (without POS tags or lemmas, but retaining all structures and structural attributes) and a vertical file (the corpus in vertical format, including POS tags, lemmas, structures, and attributes).
POS tagging and lemmatization were performed with the Sketch Engine platform (http://www.sketchengine.eu; Kilgarriff et al. 2004, Kilgarriff et al. 2014).
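For readers who want to process the vertical file programmatically, the following is a minimal Python sketch, not part of the original record. It assumes the usual Sketch Engine vertical layout: one token per line with tab-separated attributes, and structures such as documents and sentences written as XML-like tags on their own lines. The column order (word, POS tag, lemma), the structure names `<doc>` and `<s>`, and the tag labels in the sample are assumptions for illustration; check them against the actual file before relying on this.

```python
def read_vertical(lines):
    """Yield (word, tag, lemma) tuples from vertical-format lines.

    Assumption: token lines are tab-separated as word, POS tag, lemma.
    Lines that look like structure tags (<doc ...>, <s>, </s>) are skipped.
    This tag check is a heuristic, not a full parser.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            continue  # structural line, not a token
        cols = line.split("\t")
        cols += [""] * (3 - len(cols))  # pad if attributes are missing
        yield tuple(cols[:3])


def count_sentences(lines):
    """Count opening sentence tags, assuming sentences are marked with <s>."""
    return sum(
        1 for raw in lines
        if raw.strip() == "<s>" or raw.startswith("<s ")
    )


# Hypothetical sample; the tags and attribute values are made up.
SAMPLE = [
    '<doc id="1">',
    "<s>",
    "同志\tNa\t同志",
    "權益\tNa\t權益",
    "</s>",
    "</doc>",
]

if __name__ == "__main__":
    for word, tag, lemma in read_vertical(SAMPLE):
        print(word, tag, lemma)
    print("sentences:", count_sentences(SAMPLE))
```

To apply this to the real file, pass an open file object (for example `open(path, encoding="utf-8")`, with `path` pointing at the downloaded vertical file) in place of `SAMPLE`.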
References
- Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, David Tugwell. The Sketch Engine. Proceedings of the 11th EURALEX International Congress: 105-116, 2004.
- Adam Kilgarriff, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, Vít Suchomel. The Sketch Engine: ten years on. Lexicography, 1: 7-36, 2014.