Published May 10, 2024 | Version v1
Software
Open
Inspect AI: Framework for Large Language Model Evaluations
Authors/Creators
Description
Inspect is a fully open-source, extensible framework for rigorous evaluation of large language models (LLMs). It supports comprehensive, reproducible assessments across a broad range of task domains, including coding, reasoning, knowledge, agentic tasks, behaviour, and multimodal understanding. It ships with extensive tooling, over 100 pre-built benchmarks, and visualisation utilities. Inspect is designed to provide an excellent developer experience while keeping evaluations reproducible when run at scale.
Files
| Name | Size | Checksum |
|---|---|---|
| inspect_white_paper.pdf | 84.3 kB | md5:844b2d7ba7ef8b5254d8c42455435f94 |
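The MD5 checksum listed for inspect_white_paper.pdf can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum as listed for inspect_white_paper.pdf in this record.
EXPECTED_MD5 = "844b2d7ba7ef8b5254d8c42455435f94"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the PDF was saved in the current working directory.
    pdf = Path("inspect_white_paper.pdf")
    if pdf.exists():
        print("checksum OK" if matches_listed_checksum(pdf) else "checksum mismatch")
    else:
        print(f"{pdf} not found; download it first")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a guarantee against deliberate tampering.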
Additional details
Software
- Repository URL: https://github.com/UKGovernmentBEIS/inspect_ai
- Programming language: Python
- Development Status: Active