Published May 20, 2021 | Version v3
Dataset Open

SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection

  • 1. Queen Mary University of London
  • 2. Oxford Brookes University


Our repository presents the Sina Weibo Sexism Review (SWSR) dataset, which contains sexism-related posts in Chinese collected from Sina Weibo, as well as the Chinese lexicon SexHateLex.

The SWSR dataset consists of two files, SexWeibo.csv and SexComment.csv, and the SexHateLex lexicon contains a list of 3016 abusive terms in the file SexHateLex.txt.
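As a starting point for working with these files, the following is a minimal loading sketch rather than an official loader. It assumes the files are UTF-8 encoded, that the CSV files have a header row, and that SexHateLex.txt holds one term per line. None of this is stated on this page, so check it against the downloaded files. The helper names `load_lexicon`, `load_rows`, and `count_lexicon_hits` are hypothetical and introduced only for illustration.

```python
import csv
from pathlib import Path


def load_lexicon(path):
    """Read a lexicon file, ASSUMING one term per line in UTF-8."""
    # "utf-8-sig" also accepts plain UTF-8 and strips a byte-order mark if present.
    text = Path(path).read_text(encoding="utf-8-sig")
    # Drop blank lines and surrounding whitespace.
    terms = [line.strip() for line in text.splitlines() if line.strip()]
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


def load_rows(path):
    """Read a CSV file into a list of dicts, ASSUMING a header row."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))


def count_lexicon_hits(text, terms):
    """Count how many distinct lexicon terms occur as substrings of `text`."""
    return sum(1 for term in terms if term in text)
```

Calling `load_rows("SexWeibo.csv")` and inspecting the keys of the first row shows the actual column names, which can then be passed to `count_lexicon_hits` together with the result of `load_lexicon("SexHateLex.txt")`. Substring matching is used here because Chinese text is not whitespace-delimited; this is a simplification, not necessarily the matching procedure used by the authors.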

Our work has been published in the Journal of Online Social Networks and Media. If you are interested in this dataset, please cite: 

Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga, SWSR: A Chinese dataset and lexicon for online sexism detection, Online Social Networks and Media, Volume 27, 2022, 100182, ISSN 2468-6964.

If you have any queries or suggestions about our work, please contact us. We also welcome any ideas or collaboration related to Chinese sexist speech.


Files (3.7 MB)

Size
2.2 kB
2.3 MB
27.2 kB
1.3 MB

Additional details


  • A. Jiang, X. Yang, Y. Liu and A. Zubiaga (2021). SWSR: A Chinese Dataset and Lexicon for Sexist Hate Speech Detection. Under review.