Bias, Randomness, and Blind-Faith: Large Language Model Code Generation and Security Analysis
Description
This dataset accompanies a paper submitted to USENIX '25 that tests bias and security in code generated by the large language models (LLMs) ChatGPT, Claude, and Gemini. Three trials were completed, each testing five bias categories.
The file Trial Charts.zip includes the trial model versions, the final results from each category and test, and the results of manual analysis. Each bias category has its own .csv file (Sex, Age, Race & Ethnicity, Experience, Special Circumstances) for each trial's results, plus a file for the category's overall results named with the first letter(s) of the bias followed by "Overall" (e.g., A Overall for Age, SC Overall for Special Circumstances). The files labeled Manual Analysis (or Manual A.) highlight some of the differences between the biased code and the control code, and also include notable outputs from each bias.
The file Trial Results.zip includes all of the raw data from the bias testing; the data was collected in OneNote. The files "Trial [1-3]" include all outputs from all categories. The files "Trial [1-3] Retests" include any non-control retests. The files "[1-3] Control Retests" give the results of retesting the control for all three trials. The files "[1-3] A and B Tests" show the results of retesting with the new labels "a" and "b".
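As a starting point for working with the dataset, the sketch below reads every .csv file inside an archive such as Trial Charts.zip into memory with Python's standard library. The archive path and any file names inside it are assumptions based on the naming scheme described above, not guaranteed contents.

```python
import csv
import io
import zipfile

def load_csvs(archive):
    """Read every .csv inside a zip archive into a dict of row lists,
    keyed by the file's name inside the archive."""
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                with zf.open(name) as fh:
                    # utf-8-sig tolerates a BOM from spreadsheet exports
                    text = io.TextIOWrapper(fh, encoding="utf-8-sig")
                    tables[name] = list(csv.reader(text))
    return tables

# Example (path is an assumption; adjust to where the archive is saved):
# tables = load_csvs("Trial Charts.zip")
# print(tables.keys())  # e.g. per-bias files such as an "A Overall" chart
```

From the returned dict, each per-bias or Overall chart can then be inspected as plain rows or handed to a dataframe library.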
Files (16.2 MB total)

Name | Size | md5
---|---|---
Trial Charts.zip | 43.5 kB | 3f682af3e6ed189acb2a1331ef0a5eb4
Trial Results.zip | 16.2 MB | f8e34ccd981b479fa9443e7b587aa9b0