CompBioBench v1: A benchmark of 100 diverse, verifiable questions for agents for computational biology
Description
We introduce CompBioBench v1, a benchmark of 100 diverse tasks for evaluating agentic systems in computational biology. Mathematics and programming admit systematic verification more readily than biology, where data are inherently noisy and open to interpretation. To enable objective evaluation without reducing tasks to prescriptive checklists, we propose a new benchmark-construction strategy that combines synthetic or augmented data with scrambling or scrubbing of the metadata of real datasets. The resulting problems are challenging, each has a single ground-truth answer, and solving them requires multi-step reasoning, tool use, bespoke code, and interaction with real-world external resources. The benchmark spans genomics, transcriptomics, epigenomics, single-cell analysis, human genetics, and machine learning workflows. Questions are curated by domain experts to cover a broad range of skills at varying levels of difficulty.
This record contains all questions, metadata, and input data files associated with CompBioBench v1.
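To make the idea of metadata scrambling more concrete, the following is a purely illustrative sketch and not the authors' construction pipeline. It assumes a hypothetical metadata table with `sample_id` and `tissue` fields (neither name comes from the benchmark): identifying labels are replaced with anonymous IDs in shuffled order, and the hidden mapping becomes the single ground-truth answer key.

```python
import random


def scramble_metadata(records, seed=0):
    """Anonymize sample IDs and withhold the 'tissue' label.

    `records` is a list of dicts with hypothetical keys 'sample_id' and
    'tissue'. Returns (id_map, answer_key): id_map links each original
    sample_id to its anonymous ID, and answer_key links each anonymous ID
    to the withheld label.
    """
    rng = random.Random(seed)  # fixed seed keeps the scramble reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)  # break any ordering that could leak the labels

    id_map, answer_key = {}, {}
    for index, record in enumerate(shuffled, start=1):
        anonymous_id = f"sample_{index:03d}"
        id_map[record["sample_id"]] = anonymous_id
        answer_key[anonymous_id] = record["tissue"]
    return id_map, answer_key


if __name__ == "__main__":
    # Made-up example records, used only to show the call.
    example = [
        {"sample_id": "A1", "tissue": "liver"},
        {"sample_id": "B7", "tissue": "brain"},
        {"sample_id": "C3", "tissue": "heart"},
    ]
    id_map, answer_key = scramble_metadata(example, seed=42)
    print(sorted(answer_key))
```

In a setup like this, a task would ship only the anonymized data, and a submitted answer could be checked against `answer_key` by exact match.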
Files
Total size: 12.0 GB

| MD5 checksum | Size |
|---|---|
| b9d72c04c018ee25798cc93ba77c1964 | 57.4 kB |
| 043dd0395898f2a71b6e81aea6a92276 | 12.0 GB |
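Because the record lists an MD5 checksum for each file, a download can be checked for corruption before use. The sketch below uses Python's standard `hashlib` module. The function names and the local path in the usage comment are assumptions for illustration, since the file names are not shown in this listing.

```python
import hashlib

# MD5 checksums as listed in this record (file names are not shown here).
LISTED_MD5 = {
    "b9d72c04c018ee25798cc93ba77c1964",  # 57.4 kB file
    "043dd0395898f2a71b6e81aea6a92276",  # 12.0 GB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so that
    a multi-gigabyte download never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed=LISTED_MD5):
    """True if the file's MD5 digest equals one of the listed checksums."""
    return md5_of_file(path) in listed


# Usage (hypothetical local path; substitute the actual downloaded file):
# print(matches_listed_checksum("path/to/downloaded_file"))
```

Reading in chunks matters mainly for the 12.0 GB file. A mismatch most often indicates an incomplete or corrupted download.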