Gender, language, and society: word embeddings as a reflection of social inequalities in linguistic corpora
- 1. University of Ljubljana, Ljubljana, Slovenia
- 2. Queen Mary University of London, London, UK
- 3. University of Helsinki, Helsinki, Finland
- 4. Jožef Stefan Institute
Description
Research on language and gender has a long tradition, and large electronic text corpora and novel computational methods for representing word meaning have recently opened new directions. We explain how gender can be analysed using word embeddings: vector representations of words that are computationally derived from lexical context in large corpora and capture a degree of semantic information. Because they are derived from naturally occurring text, embeddings also capture human biases and stereotypes and reflect social inequalities. For example, the relation between the English words man and programmer can correspond to that between woman and homemaker. In Slovene, the availability of male and female forms for many occupation words means that such effects might be reduced; however, we study a range of such relations and show that some gender bias still persists (e.g. the relation between the words woman and secretary is very similar to that between man and boss).
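One common way to compare two word relations in an embedding space is to take the offset between each pair of vectors and measure the cosine similarity of the two offsets. The sketch below illustrates that idea only; it is not necessarily the procedure used in the paper. The function names and the three-dimensional toy vectors are hypothetical and invented for illustration, and they are not values from any trained model or from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def offset(emb, a, b):
    """Difference vector emb[b] - emb[a], i.e. the relation from a to b."""
    return [y - x for x, y in zip(emb[a], emb[b])]

def relation_similarity(emb, a, b, c, d):
    """How similar the relation a -> b is to the relation c -> d."""
    return cosine(offset(emb, a, b), offset(emb, c, d))

# Toy vectors, made up for illustration (NOT real embeddings).
toy = {
    "man":       [1.0, 0.0, 0.0],
    "woman":     [0.0, 1.0, 0.0],
    "boss":      [1.0, 0.2, 1.0],
    "secretary": [0.0, 1.0, 1.0],
}

score = relation_similarity(toy, "man", "boss", "woman", "secretary")
print(f"similarity of man->boss and woman->secretary: {score:.3f}")
```

With real embeddings, `toy` would be replaced by a lookup into a trained model; a score close to 1 would indicate that the two relations point in nearly the same direction in the vector space.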
Files
supej-et-al19sss.pdf (2.0 MB, md5:3b5075d3b356212d53d74995200f6dab)