Published November 18, 2019 | Version v1
Conference paper Open

Gender, language, and society: word embeddings as a reflection of social inequalities in linguistic corpora

  • 1. University of Ljubljana, Ljubljana, Slovenia
  • 2. Queen Mary University of London, London, UK
  • 3. University of Helsinki, Helsinki, Finland
  • 4. Jožef Stefan Institute

Description

Research on language and gender has a long tradition, and large electronic text corpora and novel computational methods for representing word meaning have recently opened new directions. We explain how gender can be analysed using word embeddings: vector representations of words that are computationally derived from lexical context in large corpora and capture a degree of semantics. Because they are derived from naturally occurring text, these representations also capture human biases and stereotypes and reflect social inequalities. For example, the relation between the English words man and programmer can correspond to that between woman and homemaker. In Slovene, the availability of male and female forms for many occupation words means that such effects might be reduced; however, we study a range of such relations and show that some gender bias still persists (e.g. the relation between the words woman and secretary is very similar to that between man and boss).
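The idea of comparing word relations can be illustrated with a minimal sketch. It assumes that a relation between two words is approximated by the difference of their vectors and that two relations are compared by cosine similarity; this is one common approach and not necessarily the exact procedure used in the paper. The function names and the three-dimensional toy vectors are hypothetical and invented purely for illustration; they are not taken from any trained model or from the paper's data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def offset(vectors, source, target):
    """Vector difference target - source, used as a stand-in for their relation."""
    return [t - s for s, t in zip(vectors[source], vectors[target])]


def relation_similarity(vectors, pair_a, pair_b):
    """Cosine similarity between the offsets of two word pairs."""
    return cosine(offset(vectors, *pair_a), offset(vectors, *pair_b))


# Toy vectors, made up for illustration only (NOT real embeddings).
TOY_VECTORS = {
    "man": [1.0, 0.0, 0.2],
    "woman": [-1.0, 0.0, 0.2],
    "boss": [0.8, 1.0, 0.5],
    "secretary": [-0.8, 1.0, 0.4],
}

score = relation_similarity(
    TOY_VECTORS, ("woman", "secretary"), ("man", "boss")
)
print(f"similarity of woman->secretary and man->boss offsets: {score:.3f}")
```

With real embeddings, the dictionary would be replaced by vectors loaded from a trained model; a score close to 1 would indicate that the two word pairs stand in a similar relation in the embedding space.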

Files

supej-et-al19sss.pdf (2.0 MB)
md5:3b5075d3b356212d53d74995200f6dab

Additional details

Funding

European Commission: EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant 825153)