Evaluating and Regulating Agentic AI: A Study of Benchmarks, Metrics, and Regulation
Description
Agentic AI represents a new generation of Artificial Intelligence (AI) systems capable of perceiving, reasoning, planning, and acting toward goals with a degree of autonomy. Unlike traditional AI models that merely generate outputs, these systems maintain memory, interact with their environment, and adapt over time. Evaluating such interactive and evolving behavior, however, remains a significant challenge. While several recent surveys have examined agentic AI architectures, components, and applications, few have systematically reviewed their evaluation, particularly regarding performance, reliability, and governance across an evolving agentic AI ecosystem. This paper addresses that gap by reviewing recent progress in the development and assessment of agentic AI, focusing on three core dimensions: benchmarks, metrics, and governance. We analyze how current evaluation frameworks capture reasoning, planning, collaboration, and ethical alignment across single- and multi-agent systems. Ultimately, this study aims to establish a unified foundation for building trustworthy, auditable, and human-aligned AI agents. The project webpage is available at project link.
Files

| Name | Size | MD5 |
|---|---|---|
| 1350845.pdf | 3.2 MB | b2a5e5eb074a4579d235d3f1aaaad0fd |
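Since the record publishes an MD5 checksum for the PDF, a downloaded copy can be checked for integrity before use. A minimal sketch using Python's standard `hashlib`, assuming the file has been saved locally as `1350845.pdf`:

```python
import hashlib

# Expected MD5 checksum published alongside the file in the record above.
EXPECTED_MD5 = "b2a5e5eb074a4579d235d3f1aaaad0fd"

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    # Assumes the PDF was downloaded into the current directory.
    digest = md5_of("1350845.pdf")
    print("OK" if digest == EXPECTED_MD5 else f"MISMATCH: {digest}")
```

Reading in chunks keeps memory use constant regardless of file size; a mismatch indicates a corrupted or incomplete download.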