Multi-Agent Development of a Domain-Specific Scientific Application: Complexity Classes in Building StellarPop
Authors/Creators
Description
Abstract. We present StellarPop, a Ruby on Rails stellar population synthesis (SPS) pipeline, as a case study in building a non-trivial domain-specific scientific application using a two-agent LLM workflow. Claude (Anthropic) serves as engineer and scientist: it designs the architecture, directs physics decisions, generates implementation prompts, and tests the coding results. Codex (OpenAI CLI) serves as coder, implementing all code changes from Claude’s prompts without independently making science or architecture decisions. The author serves as tester, comparing the LLM calculations with peer-reviewed astrophysics data. We identify and analyze four interlocking complexity classes that challenged this workflow: (1) scientific complexity, including age/metallicity degeneracy, calibration instability, and objective function misalignment; (2) data complexity, including SDSS catalog provenance integrity, multi-release DR management, and photometric identity traceability across heterogeneous external sources; (3) agentic complexity, including prompt precision requirements, context boundary management between agents, and the tension between generation speed and physics correctness; and (4) architectural complexity, arising from implementing domain science in Ruby on Rails with no prior SPS precedent in that framework. We document specific failure modes, including fabricated astrophysical context, silent physics errors that survived multiple review cycles, calibration regressions introduced by correct bug fixes, and the non-monotonic nature of scientific progress in an LLM-assisted workflow. The blackboard architecture underlying StellarPop, itself the subject of companion theoretical work on deterministic LLM blackboard pipelines, proved well suited to modular knowledge source management.
We argue that multi-agent LLM workflows are potentially viable for domain-specific scientific software development but require explicit role separation, documented calibration protocols, physics validation that is never fully delegated to the coding agent, and persistent artifact generation to survive context resets.
Files
| Name | Size | Checksum |
|---|---|---|
| stellarpop_draft_10.5281:zenodo.19414914.pdf | 130.8 kB | md5:478e0311691b69602b1d98dc967b8199 |
Additional details
Related works
- Describes
- Software: 10.5281/zenodo.19412078 (DOI)
Dates
- Created
- 2026-04-03
Software
- Repository URL
- https://github.com/unixneo/stellar_pop
- Programming language
- Ruby
- Development Status
- Active