Published March 27, 2023 | Version 1.0.0
Dataset Open

Bengali Identity Bias Evaluation Dataset (BIBED)

  • 1. University of Colorado Boulder
  • 2. University of Toronto

Description

Critical studies have found that NLP systems exhibit bias based on gender and racial identities. However, few studies have focused on identities defined by cultural factors such as religion and nationality. Compared with English, such research is even more limited for major languages like Bengali because labeled datasets are unavailable. Our paper (see the reference) describes a process for developing a bias evaluation dataset that highlights cultural influences on identity. We also provide this Bengali dataset as an artifact that can contribute to future critical research.

If you find this dataset useful, please cite the associated paper:

Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).

BibTeX:

@inproceedings{das-etal-2023-toward,
    title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity",
    author = "Das, Dipto  and
      Guha, Shion  and
      Semaan, Bryan",
    booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.c3nlp-1.8",
    pages = "68--83",
}

Files

Files (72.2 MB)

Checksum | Size
md5:2bb096c500cc747d0c2f8d430ef87bd5 | 10.2 MB
md5:d78e1df7d8697f1027ee01050658c3d8 | 29.1 MB
md5:392cb31c4df185c229fb46f1c5382544 | 17.5 MB
md5:d05829a61199d5195e9b77aaa706b6b8 | 14.5 MB
md5:3b581cf1c7f249cafcc35f2d345790f0 | 436.2 kB
md5:deeaa741a0c3f7284ab9f9da491c400f | 436.2 kB
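
The md5 values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch of such a check in Python, using only the standard library. The function names and the local file name are hypothetical, chosen for illustration; the listing does not give the actual file names, so substitute the path of the file you downloaded.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:2bb0...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Hypothetical local path: replace with the file you actually downloaded.
    local_file = Path("downloaded_file")
    listed_value = "md5:2bb096c500cc747d0c2f8d430ef87bd5"
    if local_file.exists():
        print(matches_listed_checksum(local_file, listed_value))
    else:
        print("File not found:", local_file)
```

A result of False suggests a corrupted or incomplete download, or that the file was compared against the checksum of a different entry in the listing.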

Additional details

References

  • Dipto Das, Shion Guha, and Bryan Semaan. 2023. Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at the 17th Conference of the European Chapter of the Association for Computational Linguistics. 68-83.