Bengali Identity Bias Evaluation Dataset (BIBED)
Creators
- 1. University of Colorado Boulder
- 2. University of Toronto
Description
Critical studies have found that NLP systems exhibit bias based on gender and racial identities. However, few studies have focused on identities defined by cultural factors such as religion and nationality. Compared with English, such research is even more limited in major languages like Bengali because labeled datasets are unavailable. Our paper (see the reference) describes a process for developing a bias evaluation dataset that highlights cultural influences on identity. We also provide this Bengali dataset as an artifact that can contribute to future critical research.
If you find this dataset useful, please cite the associated paper:
Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).
BibTeX:
@inproceedings{das-etal-2023-toward,
    title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity",
    author = "Das, Dipto and Guha, Shion and Semaan, Bryan",
    booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.c3nlp-1.8",
    pages = "68--83",
}
Files
Total size: 72.2 MB
MD5 checksum | Size |
---|---|
md5:2bb096c500cc747d0c2f8d430ef87bd5 | 10.2 MB |
md5:d78e1df7d8697f1027ee01050658c3d8 | 29.1 MB |
md5:392cb31c4df185c229fb46f1c5382544 | 17.5 MB |
md5:d05829a61199d5195e9b77aaa706b6b8 | 14.5 MB |
md5:3b581cf1c7f249cafcc35f2d345790f0 | 436.2 kB |
md5:deeaa741a0c3f7284ab9f9da491c400f | 436.2 kB |
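The MD5 values listed for the files can be used to check that a download is intact. The following is a minimal sketch of such a check in Python, not part of the dataset itself. The function names `md5_of_file` and `matches_checksum` are illustrative, and the file names are not shown in this listing, so any path passed to them is an assumption supplied by the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:2bb0...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:2bb096c500cc747d0c2f8d430ef87bd5")` should return `True` for an uncorrupted copy of the 10.2 MB file, where `local_path` is wherever that file was saved.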
Additional details
References
- Dipto Das, Shion Guha, and Bryan Semaan. 2023. Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at the 17th Conference of the European Chapter of the Association for Computational Linguistics. 68-83.