Hound: Relation-First Knowledge Graphs for Complex-System Reasoning in Security Audits
Authors/Creators
Description
Hound is a graph-based audit agent that improves system-level reasoning across interrelated components in complex codebases. Instead of relying on broad file chunks or language-specific tooling, Hound builds flexible, analyst-defined knowledge graphs (e.g., monetary/value flows, authentication/authorization roles, call graphs, invariants) with compact annotations. Investigations are planned in two phases: a Coverage sweep to quickly map components, then an Intuition/Saliency phase that targets high-impact, contradiction-rich leads. A persistent belief system tracks hypotheses with explicit evidence and confidence, while a QA Finalizer reviews high-confidence items over full source context to confirm or reject findings.
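The description above can be made concrete with a minimal Python sketch of a relation-first graph with compact annotations, alongside a hypothesis store that tracks evidence and confidence. Everything here is an illustrative assumption and not taken from the Hound codebase: the class names (`KnowledgeGraph`, `Hypothesis`), the example nodes, the fixed-step confidence update, and the 0.8 finalizer threshold.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical relation-first graph: typed edges plus short node notes."""
    name: str
    edges: list = field(default_factory=list)   # (src, relation, dst) triples
    notes: dict = field(default_factory=dict)   # node -> list of annotations

    def relate(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def annotate(self, node, note):
        self.notes.setdefault(node, []).append(note)

    def neighbors(self, node, relation=None):
        """Exact retrieval: follow outgoing edges, optionally of one relation type."""
        return [dst for src, rel, dst in self.edges
                if src == node and (relation is None or rel == relation)]


@dataclass
class Hypothesis:
    """Hypothetical belief record with explicit evidence and confidence."""
    claim: str
    confidence: float = 0.3
    evidence: list = field(default_factory=list)
    status: str = "open"

    def add_evidence(self, item, supports, step=0.2):
        # Assumed update rule: move confidence by a fixed step, clamped to [0, 1].
        self.evidence.append(item)
        delta = step if supports else -step
        self.confidence = round(min(1.0, max(0.0, self.confidence + delta)), 2)


def ready_for_finalizer(hypotheses, threshold=0.8):
    """Select open, high-confidence hypotheses for a final review pass."""
    return [h for h in hypotheses if h.status == "open" and h.confidence >= threshold]


# Analyst-defined value-flow graph over made-up contract components.
flows = KnowledgeGraph("value_flows")
flows.relate("Vault.deposit", "credits", "Vault.balance")
flows.relate("Vault.withdraw", "debits", "Vault.balance")
flows.relate("Vault.withdraw", "calls", "Token.transfer")
flows.annotate("Vault.withdraw", "balance is debited after the external call")

lead = Hypothesis("Vault.withdraw can be re-entered through Token.transfer")
lead.add_evidence("external call precedes balance update", supports=True)
lead.add_evidence("no reentrancy guard on withdraw", supports=True)
lead.add_evidence("Token.transfer invokes a receiver hook", supports=True)

weak = Hypothesis("Vault.deposit miscounts shares")
weak.add_evidence("share math matches the spec", supports=False)

queue = ready_for_finalizer([lead, weak])
```

In this sketch only `lead` reaches the finalizer queue, which mirrors the idea that confirmation over full source context is reserved for high-confidence items.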
On a five-project subset of ScaBench, Hound improves micro-averaged recall and F1 over a baseline LLM analyzer (recall 31.2% vs. 8.3%; F1 14.2% vs. 9.8%), at the cost of a modest drop in precision that is typical of exploratory audits. The gains stem from relation-first graphs, which enable exact cross-component retrieval, and from a disciplined hypothesis lifecycle. The artifact includes code, graph builders, benchmark harnesses, and scripts to reproduce the tables and HTML reports.
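The description reports recall and F1 but not precision. As a rough check on the size of the trade-off, the sketch below back-solves precision from the two reported figures, assuming F1 is the usual harmonic mean of micro precision and recall. The function name `implied_precision` is introduced here for illustration, and because the inputs are rounded percentages the results are approximate and may differ from the values reported in the paper.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Inputs are the rounded figures quoted in the description.
hound = implied_precision(recall=0.312, f1=0.142)
baseline = implied_precision(recall=0.083, f1=0.098)

print(f"Hound implied precision:    {hound:.1%}")     # 9.2%
print(f"Baseline implied precision: {baseline:.1%}")  # 12.0%
```

Under these assumptions, precision falls by roughly three percentage points while recall rises by about 23 points.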
Files
| Name | Size | Checksum |
|---|---|---|
| mueller_hound_2025_v2.pdf | 371.3 kB | md5:6f2e6fb0e0f0cf646f94206cc1a3b639 |
Additional details
Software
- Repository URL: https://github.com/muellerberndt/hound
- Programming language: Python