AI agents
agent evaluation
tool use
operational scorecards
LLM reliability
structured outputs
workflow evaluation
human-in-the-loop review
