Published November 11, 2025 | Version v1
Preprint | Open access

Evaluating and Regulating Agentic AI: A Study of Benchmarks, Metrics, and Regulation

Description

Agentic AI represents a new generation of Artificial Intelligence (AI) systems capable of perceiving, reasoning, planning, and acting toward goals with a degree of autonomy. Unlike traditional AI models that merely generate outputs, these systems maintain memory, interact with their environment, and adapt over time. However, evaluating such interactive and evolving behavior remains a significant challenge. While several recent surveys have examined agentic AI architectures, components, and applications, few have systematically reviewed how these systems are evaluated, particularly with respect to performance, reliability, and governance across an evolving agentic AI ecosystem. This paper addresses that gap by reviewing recent progress in the development and assessment of agentic AI, focusing on three core dimensions: benchmarks, metrics, and governance. We analyze how current evaluation frameworks capture reasoning, planning, collaboration, and ethical alignment across single- and multi-agent systems. Ultimately, this study aims to establish a unified foundation for building trustworthy, auditable, and human-aligned AI agents. The project webpage is available at project link.

Files

1350845.pdf (3.2 MB)
md5:b2a5e5eb074a4579d235d3f1aaaad0fd

Additional details

Funding

European Commission
AIXPERT - An agentic, multi-layer, GenAI-powered backbone to make an AI system explainable, accountable, and transparent (101214389)