Inspect AI: Framework for Large Language Model Evaluations

AI Security Institute, UK

doi:10.5281/zenodo.18434279

Published May 10, 2024 | Version v1

Resource type: Software | Access: Open

Inspect is a fully open-source, extensible framework for rigorous evaluation of large language models (LLMs). It enables comprehensive, reproducible assessments across a broad range of task domains—including coding, reasoning, knowledge, agentic tasks, behaviour, and multimodal understanding—supported by extensive tooling, over 100 pre-built benchmarks, and visualisation utilities. Inspect is designed to provide an excellent developer experience while enabling evaluations that can be reproducibly run at scale.

Files

Name	Size	Checksum
inspect_white_paper.pdf	84.3 kB	md5:844b2d7ba7ef8b5254d8c42455435f94

Additional details

Repository URL: https://github.com/UKGovernmentBEIS/inspect_ai
Programming language: Python
Development Status: Active

	All versions	This version
Views	165	165
Downloads	43	43
Data volume	4.5 MB	4.5 MB

Resource type: Software
Publisher: Zenodo

License: MIT License

A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Technical metadata

Created: January 30, 2026
Modified: February 5, 2026
