Published May 20, 2021 | Version v3
Dataset Open

SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection

  • 1. Queen Mary University of London
  • 2. Oxford Brookes University

Description

This repository presents the Sina Weibo Sexism Review (SWSR) dataset, which contains sexism-related posts in Chinese collected from Sina Weibo, together with the Chinese lexicon SexHateLex.

The SWSR dataset consists of two files, SexWeibo.csv and SexComment.csv. The SexHateLex lexicon is a list of 3016 abusive terms in the file SexHateLex.txt.
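A minimal sketch of how the three files might be read, using only the Python standard library. It assumes the files are UTF-8 encoded, that SexHateLex.txt holds one term per line, and that the CSV files have a header row. None of these details are stated here, so check them against the README. The helper names (`load_lexicon`, `load_rows`, `count_lexicon_hits`) are hypothetical and not part of the dataset.

```python
import csv
from pathlib import Path


def load_lexicon(path):
    """Read a lexicon file, assuming one term per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row.

    'utf-8-sig' also accepts files that start with a byte-order mark.
    """
    with open(path, encoding="utf-8-sig", newline="") as fh:
        return list(csv.DictReader(fh))


def count_lexicon_hits(text, lexicon):
    """Count how many lexicon terms occur as substrings of the given text."""
    return sum(1 for term in lexicon if term in text)


if __name__ == "__main__":
    # File names are taken from the description; the paths assume the
    # files were downloaded into the current working directory.
    lexicon_path = Path("SexHateLex.txt")
    weibo_path = Path("SexWeibo.csv")
    if lexicon_path.exists() and weibo_path.exists():
        lexicon = load_lexicon(lexicon_path)
        rows = load_rows(weibo_path)
        print(f"{len(lexicon)} lexicon terms, {len(rows)} rows")
        if rows:
            # Column names are not documented here, so print them
            # rather than guessing which one holds the post text.
            print("columns:", list(rows[0].keys()))
```

Substring matching is used in `count_lexicon_hits` because Chinese text has no whitespace word boundaries; a word segmenter would be an alternative if stricter matching is needed.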

Our work has been published in the journal Online Social Networks and Media. If you use this dataset, please cite:

Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga, SWSR: A Chinese dataset and lexicon for online sexism detection, Online Social Networks and Media, Volume 27, 2022, 100182, ISSN 2468-6964, https://doi.org/10.1016/j.osnem.2021.100182.

If you have any queries or suggestions about our work, please contact us via a.jiang@qmul.ac.uk. We also welcome ideas and collaboration related to Chinese sexist speech.

Files

README.md

Files (3.7 MB)

Checksum                                Size
md5:30c4ad7b93a2d4cb18134199141b5cb1    2.2 kB
md5:dc057099da4455b4036b72fe9b758d91    2.3 MB
md5:641671e316ae413012ca5c96a0ff3c48    27.2 kB
md5:eb32a276afde64335ec92341079a102e    1.3 MB
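The MD5 checksums listed above can be used to check that a download is intact. The following is a sketch only: the listing does not show which checksum belongs to which file, so the code tests membership in the set of listed values rather than a per-file match. The function names `md5_of` and `is_listed` are hypothetical.

```python
import hashlib

# Checksums copied from the file listing. The file each one belongs to
# is not shown there, so they are kept as an unordered set.
LISTED_MD5 = {
    "30c4ad7b93a2d4cb18134199141b5cb1",
    "dc057099da4455b4036b72fe9b758d91",
    "641671e316ae413012ca5c96a0ff3c48",
    "eb32a276afde64335ec92341079a102e",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of(path) in LISTED_MD5
```

For example, `is_listed("SexWeibo.csv")` should return True for an uncorrupted copy, assuming SexWeibo.csv is one of the four listed files.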

Additional details

References

  • A. Jiang, X. Yang, Y. Liu and A. Zubiaga (2021). SWSR: A Chinese Dataset and Lexicon for Sexist Hate Speech Detection. Under review.