Published May 10, 2024 | Version v1
Software Open

Inspect AI: Framework for Large Language Model Evaluations

Description

Inspect is an open-source, extensible framework for rigorous evaluation of large language models (LLMs). It supports comprehensive, reproducible assessments across a broad range of task domains, including coding, reasoning, knowledge, agentic tasks, behaviour, and multimodal understanding. It ships with extensive tooling, over 100 pre-built benchmarks, and visualisation utilities. Inspect is designed to provide an excellent developer experience while keeping evaluations reproducible when run at scale.

Files

inspect_white_paper.pdf (84.3 kB)
md5:844b2d7ba7ef8b5254d8c42455435f94
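
The MD5 checksum listed for inspect_white_paper.pdf can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_published_checksum` and the local file path are illustrative assumptions, not part of the record or of Inspect itself.

```python
import hashlib
from pathlib import Path

# Checksum as published on the record page for inspect_white_paper.pdf.
EXPECTED_MD5 = "844b2d7ba7ef8b5254d8c42455435f94"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location of the downloaded file; adjust as needed.
    pdf = Path("inspect_white_paper.pdf")
    if pdf.exists():
        print("checksum OK" if matches_published_checksum(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found; download it first")
```

MD5 is adequate here for detecting a corrupted or truncated download, but it should not be treated as protection against deliberate tampering.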

Additional details

Software

Repository URL: https://github.com/UKGovernmentBEIS/inspect_ai
Programming language: Python
Development status: Active