Published March 31, 2026 | Version v1
Video/Audio Open

Why Does Your Agent Check Old Receipts First?

  • 1. My Weird Prompts
  • 2. Google DeepMind
  • 3. Resemble AI

Description

Episode summary: When an AI agent is asked to book a flight, why does it waste time checking your travel history first? This episode dives into the "agentic friction" that causes AI assistants to be overly zealous and slow. We explore the mechanics of tool selection in n8n, the role of semantic matching, and why system prompts often fail to curb this behavior. Discover practical strategies, including the "Plan Step" technique, to make your agents faster, more efficient, and less prone to derailing workflows.

Show Notes

### Agentic Friction: Why Your AI Assistant Overthinks Simple Tasks

When you ask an AI agent to book a flight from Tel Aviv to New York, the model faces a critical split-second decision: should it check your past travel history or immediately search for current flights? This "fork in the road" is where many real-world agent builds fail. Instead of acting efficiently, the agent often becomes a digital hoarder, rummaging through old receipts when it should be executing the task at hand.

The core problem lies in how models evaluate tool calls. In platforms like n8n, developers provide tools with descriptions that act as "ad copy" for the LLM. The model performs a semantic matching game, comparing the user's prompt against these descriptions. If the prompt mentions "New York" and a tool is labeled "Travel History," the model sees a connection and triggers the tool—even if it's functionally unnecessary. This leads to what's known as the "eagerness" problem, where the agent defaults to gathering every possible scrap of data before answering.
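To make the matching game concrete, here is a deliberately crude sketch. Real systems use embedding similarity; this toy version substitutes word overlap, and the tool names, descriptions, and threshold are all illustrative, not taken from the episode. The point it demonstrates is the failure mode itself: "New York" appears in both descriptions, so both tools clear the bar.

```python
# Toy model of semantic tool selection. Word overlap stands in for
# embedding similarity; tools and threshold are hypothetical.

def overlap_score(prompt: str, description: str) -> float:
    """Crude stand-in for embedding similarity: fraction of the
    description's words that also appear in the prompt."""
    p, d = set(prompt.lower().split()), set(description.lower().split())
    return len(p & d) / max(len(d), 1)

TOOLS = {
    "flight_search": "Search current flights between two cities, e.g. Tel Aviv to New York",
    "travel_history": "Look up the user's past bookings and trips to cities like New York",
}

def select_tools(prompt: str, threshold: float = 0.15) -> list[str]:
    """Return every tool whose description scores above the threshold."""
    return [name for name, desc in TOOLS.items()
            if overlap_score(prompt, desc) >= threshold]

# Both tools are selected, even though travel_history adds nothing here:
print(select_tools("Book me a flight from Tel Aviv to New York"))
```

Tightening the travel_history description (e.g. dropping city names, anchoring it to phrases like "last time" or "past trips") is exactly the "tool descriptions matter" fix discussed later.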

### The Cost of Over-Research

In a typical scenario, an agent might trigger a flight search via Kiwi and a RAG query to Pinecone at the same time. The flight search takes three seconds, but the vector database query—hampered by cold-start latency—takes twelve. The agent waits for both, so the user sees at least a twelve-second delay when the calls run in parallel, and a full fifteen seconds if they run back to back. Worse, the retrieved "past bookings" data often adds zero value to the current query, such as simply noting that the user flew to New York in 2024.
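The latency arithmetic can be sketched with stand-in sleeps (scaled down 10x so the demo runs quickly; the tool names and timings mirror the episode's example, not any real API):

```python
# Why one slow RAG call dominates end-to-end latency: in parallel the
# agent still waits for the slowest call; sequentially the delays add up.
import asyncio
import time

async def flight_search():      # fast external API, ~3 s in the episode
    await asyncio.sleep(0.3)    # scaled down 10x for the demo
    return "flights found"

async def rag_query():          # cold-start vector DB, ~12 s in the episode
    await asyncio.sleep(1.2)
    return "user flew to New York in 2024"

async def main():
    start = time.perf_counter()
    await asyncio.gather(flight_search(), rag_query())
    parallel = time.perf_counter() - start

    start = time.perf_counter()
    await flight_search()
    await rag_query()
    sequential = time.perf_counter() - start
    print(f"parallel ≈ {parallel:.1f}s, sequential ≈ {sequential:.1f}s")

asyncio.run(main())
```

Either way, the cheapest call the agent makes is the one it skips: dropping the unnecessary RAG query beats any amount of parallelism.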

This behavior stems from the model's training. Reinforcement Learning from Human Feedback (RLHF) has conditioned models to be "good assistants," prioritizing thoroughness over speed. However, in production environments, users prefer a ninety-percent accurate answer in two seconds over a ninety-nine-percent accurate answer in twenty. The model's internal architecture lacks a "cost-benefit analysis" for tool calls, treating expensive, slow RAG pipelines the same as fast, local tools.
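One way to bolt on the missing cost-benefit analysis is to make cost explicit outside the model. This sketch attaches a rough latency estimate to each tool and filters the candidate set against a time budget; the names, costs, and budget are illustrative assumptions, not part of any framework's API:

```python
# Hypothetical cost-aware tool gating: each tool carries a rough latency
# estimate, and expensive tools are dropped unless they fit the budget.

TOOL_COSTS = {               # estimated seconds per call (illustrative)
    "flight_search": 3.0,
    "travel_history_rag": 12.0,
}

def within_budget(candidates: list[str], budget_s: float) -> list[str]:
    """Greedily keep the cheapest tools that fit the latency budget."""
    kept, spent = [], 0.0
    for name in sorted(candidates, key=TOOL_COSTS.get):
        if spent + TOOL_COSTS[name] <= budget_s:
            kept.append(name)
            spent += TOOL_COSTS[name]
    return kept

print(within_budget(["flight_search", "travel_history_rag"], budget_s=5.0))
# -> ['flight_search']  (the 12 s RAG call does not fit a 5 s budget)
```

This treats "expensive, slow RAG pipelines" differently from fast local tools by construction, which is precisely the distinction the model itself lacks.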

### The Brittleness of System Prompts

Developers often try to curb this eagerness with system prompts like, "Only check RAG if the user asks about preferences." However, these prompts are brittle. If the user says, "Use the same airline as last time," an overly restrained agent might fail to retrieve necessary history and ask redundant questions. Conversely, if the leash is too loose, the agent becomes expensive and slow.

Another issue is tool naming. A generically named tool like "Memory_Search" invites overuse, acting as a crutch for the agent. And because there is no feedback loop across conversations, the agent starts each turn from a blank slate and often repeats the same mistakes.

### Solutions: From Planning to Observability

One effective strategy is the "Plan Step." Instead of moving directly from user prompt to tool call, insert an intermediate phase where the model generates a plan. For example: "The user is asking for current flight options. I need the Kiwi tool. I do not need the Travel History tool because no specific preferences were mentioned." This approach, implemented via multi-node workflows in n8n, adds minimal latency compared to unnecessary RAG calls and forces the agent to show its work.
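A minimal Plan Step can be sketched as a single extra LLM call that returns a JSON plan naming the tools the agent may use. `call_llm` here is a hypothetical stand-in for whatever model client you have (in n8n, this would be an extra LLM node ahead of the agent node), and the prompt wording is illustrative:

```python
# Hypothetical "Plan Step": ask the model for a JSON plan before any
# tool runs, then let the agent call only the tools the plan names.
import json

PLAN_PROMPT = """You are a planner. Given the user request, output JSON:
{"tools": [...], "reasoning": "..."}
Available tools: flight_search, travel_history.
Only include travel_history if the user refers to past trips or preferences.
User request: {request}"""

def plan_step(request: str, call_llm) -> list[str]:
    """Run the planning prompt and return the allowed tool list."""
    raw = call_llm(PLAN_PROMPT.replace("{request}", request))
    plan = json.loads(raw)
    return plan["tools"]        # downstream agent is restricted to these

# Stubbed model response, for illustration only:
fake_llm = lambda prompt: (
    '{"tools": ["flight_search"], '
    '"reasoning": "No past preferences mentioned."}'
)
print(plan_step("Book a flight from Tel Aviv to New York", fake_llm))
```

The key design choice is that the plan is enforced outside the model: the agent physically cannot reach for Travel History unless the planner listed it, so the gate does not depend on the agent restraining itself.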

Improving observability is also crucial. While execution logs show what the agent did, they don't reveal why. Using reasoning models or Chain of Thought techniques can illuminate the internal logic, helping developers debug and refine tool selection.
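A lightweight way to capture the "why," assuming no particular tracing stack: wrap each tool call so the agent's stated reason is logged next to the call itself. The wrapper and field names below are illustrative:

```python
# Sketch of reason-aware tool logging: record *why* a tool was chosen
# alongside the call, not just that it happened.
import json
import time

def logged_call(tool_name: str, reason: str, fn, *args):
    """Log a structured entry (tool + stated reason), then run the tool."""
    entry = {"ts": time.time(), "tool": tool_name, "reason": reason}
    print(json.dumps(entry))    # in production: ship to your trace store
    return fn(*args)

result = logged_call(
    "flight_search",
    "User asked for current flights; no history needed.",
    lambda city: f"searching flights to {city}",
    "New York",
)
```

With reasons captured per call, a trace of twenty turns will show not only that Memory_Search fired, but which phrase in the prompt the agent thought justified it.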

### Key Takeaways

- **Tool Descriptions Matter**: Broad or vague descriptions lead to overuse. Be specific to guide the agent's choices.
- **Latency vs. Accuracy**: Users prioritize speed. Optimize for quick, accurate responses rather than exhaustive data gathering.
- **Plan Before Acting**: A "Plan Step" can reduce unnecessary tool calls and improve efficiency.
- **Observability Gaps**: Use reasoning models to understand the "why" behind tool selection, not just the "what."

In the race to build reliable agentic systems, addressing the "eagerness" problem is a critical step. By refining tool definitions, incorporating planning phases, and improving observability, developers can create agents that are not only smart but also swift.

Listen online: https://myweirdprompts.com/episode/agent-tool-selection-eagerness

Notes

My Weird Prompts is an AI-generated podcast. Episodes are produced using an automated pipeline: voice prompt → transcription → script generation → text-to-speech → audio assembly. Archived here for long-term preservation. AI CONTENT DISCLAIMER: This episode is entirely AI-generated. The script, dialogue, voices, and audio are produced by AI systems. While the pipeline includes fact-checking, content may contain errors or inaccuracies. Verify any claims independently.

Files

agent-tool-selection-eagerness-cover.png

Files (33.8 MB)
