Published July 12, 2025
Version v1
Conference paper
Open
Nonstandard English and the Automated Scoring of Open-Ended Math Problems
Affiliations
- 1. University of Minnesota, USA
- 2. Weizmann Institute of Science, Israel
- 3. CNR-ITD, Italy
- 4. University of Palermo, Italy
- 5. University of Illinois at Urbana-Champaign, USA
Description
Recent advances in AI have opened the door for the automated scoring of open-ended math problems, which were previously much more difficult to assess at scale. However, we know that biases still remain in some of these algorithms. For example, recent research on the automated scoring of student essays has shown that certain varieties of English are more strongly penalized for non-standard English than they are for other differences that reduce the quality of students' writing. This study examines that issue in a new domain, investigating the potential for large language models to accurately grade open-ended math problems produced by students who speak and write in non-standard English. Specifically, we look at four features of African American Vernacular English (AAVE), which range in the degree to which they are unique to AAVE or are common in other non-standard dialects. We then compare the scoring of answers that were produced by students using these dialect features to a control group of synthetic data--where we converted all non-standard dialect features to standard English. Results show that minor changes in the number of dialect features per student response do not impact GPTs automated scoring, but prompt engineering efforts did.
Files

| Name | Size | Checksum |
|---|---|---|
| 2025.EDM.long-papers.195.pdf | 4.2 MB | md5:a94dabe0e6244c7362d23f0fd5d0b184 |