Published December 23, 2025 | Version v1
Preprint | Open access

Code Generation Competition: 16 Proprietary vs. Open-Source LLMs & Iterative Learning Based on FDA Adverse Event Reporting System

Authors/Creators

  • 1. ChemicalQDevice

Description

Few effective, goal-oriented, iterative benchmarking studies of LLM code generation exist. Measuring successive improvements on high-dimensional, complex problems is desirable, in contrast to conventional code assessments. Inspired by a recent CodeClash study, this tournament focuses primarily on generating functions that achieve a perfect competition task score on three recent FDA FAERS files. Here, Opus 4.5 Extended was primarily used to build a novel Python evaluation engine that scores each pair of LLM-generated codes on correctness, methodology, code quality, and algorithm effectiveness, both against a fixed reference standard and head-to-head. The notebook then automated the grading of Code A and Code B and wrote their answers, along with the reference standard of drug-reaction signals, to CSV files. The bracket was organized at scale: 16 LLMs, with 8 proprietary LLMs on the left and 8 open-source LLMs on the right. The 8 Round 1 winners were then given their corresponding notebooks together with a competition prompt to generate the next round’s code submission. Iterative learning, in the form of improved final scores, was observed for several Round 2 winners, each building on its prior-round competition code, its competitors’ code, and the results. GPT-5.2-pro and the Gemini 2.5 Pro API were effective at iterative learning toward the FAERS dataset goal, while Kimi K2 Thinking showed the largest single-round score increase (+0.405). Contestant models came from the xAI, OpenAI, Gemini, Claude, DeepSeek, Kimi, GLM, MiniMax, and Qwen families.
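The record does not describe the evaluation engine’s internals. As a minimal sketch, assuming each submission reduces to a set of (drug, reaction) signal pairs and that correctness is measured as F1 overlap with the fixed reference standard (both are assumptions; the actual engine also scores methodology, code quality, and algorithm effectiveness), head-to-head grading of Code A and Code B plus the CSV export might look like the following. All function names are hypothetical.

```python
import csv
import io

# Hypothetical schema: a "signal" is a (drug, reaction) pair flagged by a
# submission. The record does not specify the engine's actual data model.
Signal = tuple[str, str]


def f1_against_reference(predicted: set[Signal], reference: set[Signal]) -> float:
    """Return the F1 overlap between predicted signals and the reference standard."""
    if not predicted and not reference:
        return 1.0  # both empty: treat as perfect agreement
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def head_to_head(code_a: set[Signal], code_b: set[Signal],
                 reference: set[Signal]) -> dict:
    """Score Code A and Code B against the same reference and pick a winner."""
    score_a = f1_against_reference(code_a, reference)
    score_b = f1_against_reference(code_b, reference)
    if score_a > score_b:
        winner = "A"
    elif score_b > score_a:
        winner = "B"
    else:
        winner = "tie"
    return {"score_a": round(score_a, 3), "score_b": round(score_b, 3),
            "winner": winner}


def signals_to_csv(code_a: set[Signal], code_b: set[Signal],
                   reference: set[Signal]) -> str:
    """Write both answers and the reference standard to one CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "drug", "reaction"])
    for source, signals in (("code_a", code_a), ("code_b", code_b),
                            ("reference", reference)):
        for drug, reaction in sorted(signals):
            writer.writerow([source, drug, reaction])
    return buffer.getvalue()


if __name__ == "__main__":
    # Toy, made-up signals purely for illustration.
    reference = {("drugX", "nausea"), ("drugX", "rash"),
                 ("drugY", "headache"), ("drugZ", "fatigue")}
    code_a = {("drugX", "nausea"), ("drugX", "rash"), ("drugY", "headache")}
    code_b = {("drugX", "nausea"), ("drugQ", "dizziness")}
    print(head_to_head(code_a, code_b, reference))
    # {'score_a': 0.857, 'score_b': 0.333, 'winner': 'A'}
```

F1 is only one plausible correctness metric here; a weighted composite across the four grading criteria would slot into `head_to_head` the same way, with the per-code scores replacing the single F1 value.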

Files

Code Generation Competition 16 Proprietary vs. Open-Source LLMs Iterative Learning Based on FDA Adverse Event Reporting System.pdf

Files (17.8 MB)

  • md5:ef715cc5f2a55057567115d67a84f930 (2.0 MB)
  • md5:2883049d659f2f265a3482775e987500 (125.9 kB)
  • md5:303fe85dceaf975164029f19b0cd6160 (1.3 MB)
  • md5:2a83e2ab1e078da53ccb8486fa6a6548 (2.6 MB)
  • md5:d251808e5017dd8ead24785e55401b66 (1.5 MB)
  • md5:25195221f77c4b7181748457a1811471 (2.7 MB)
  • md5:6b309a6e55b52e5056d2f03500f08a3b (5.8 MB)
  • md5:ba2f229b147c138a0b807aa8e3b083f9 (1.0 MB)
  • md5:32b5b84b749724ed439b3991a7e08d5a (719.6 kB)

Additional details

Dates

Created
2025-12-23