Published April 1, 2026 | Version v1
Video/Audio Open

Building a Sandbox for Agentic AI

  • 1. My Weird Prompts
  • 2. Google DeepMind
  • 3. Resemble AI

Description

Episode summary: The barrier to entry for autonomous AI agents is dropping fast, but the complexity is skyrocketing. In this episode, we explore the "sandbox philosophy" for agentic AI—creating a safe, disposable environment where you can experiment without fear. We discuss why local setups are risky, how to leverage a VPS with Docker for isolation, and secure networking with Tailscale. Plus, we walk through practical projects like a movie recommendation bot and a multi-agent code review system to illustrate key concepts in agent orchestration and error handling.

Show Notes

The rise of autonomous AI agents brings a unique kind of anxiety: the fear that one wrong keystroke could corrupt your system or drain your API credits. The solution isn't to avoid experimentation, but to build a "safe sandbox" where failure is just a data point, not a disaster. This approach transforms the learning process from a high-stakes gamble into a controlled, educational experience.

**The Case for a Disposable Environment** The first step to understanding agentic AI is moving away from local development. While a local Python environment is fine for simple scripts, it's a minefield for autonomous agents. An agent with a Python interpreter tool acts like a remote shell that thinks for itself; running it on your personal laptop is a security risk. The solution is a disposable canvas: a Virtual Private Server (VPS).

A VPS provides an "air gap" by default. If an agent goes haywire—filling the disk with logs or changing the root password—you can simply hit "rebuild" and have a clean slate in sixty seconds. For beginners, services like DigitalOcean offer pre-configured "AI Agent" droplets that set up the necessary Linux environment and drivers.

**Layered Security: VPS, Docker, and Tailscale** A sandbox isn't just one layer; it's a set of concentric defenses. Even on a VPS, you shouldn't give an agent free rein. The best practice is to run the agent inside a restricted user account or, better yet, a Docker container.

Docker containers are the ideal "Lego block" for agentic testing. Using the `--rm` flag ensures that the entire container is deleted the moment you exit, leaving no residual files or broken paths. For advanced testing, you can even give an agent the ability to spin up its own Docker containers to run code it writes, as frameworks like E2B do.
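As a concrete sketch, here is one way to construct such a disposable `docker run` invocation from Python. The image name, network policy, and memory cap below are illustrative assumptions, not settings from the episode:

```python
import subprocess

def run_in_sandbox(image: str, command: list[str]) -> list[str]:
    """Build a `docker run` invocation for a throwaway agent container."""
    return [
        "docker", "run",
        "--rm",                # disposable: container and its writable layer vanish on exit
        "--network", "none",   # no outbound calls unless you explicitly allow them
        "--memory", "512m",    # hard resource cap so a runaway agent can't exhaust the host
        image,
        *command,
    ]

cmd = run_in_sandbox("python:3.12-slim", ["python", "-c", "print('hello from the sandbox')"])
# subprocess.run(cmd, check=True)  # uncomment on a machine with Docker installed
```

The `--network none` and `--memory` flags are optional hardening on top of `--rm`; loosen them as a given experiment requires.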

Security extends to network access. A VPS is a computer on the public internet, which introduces risks. Tools like Tailscale create a zero-config VPN, making your VPS appear as a local device without opening ports to the open internet. Coupled with Cloudflare Access for authentication, this creates a robust "Zero Trust" model that catches mistakes before they become disasters.

**Project 1: The Movie Recommendation Bot (Level One)** A simple movie recommendation bot is a perfect "Level One" project because it immediately exposes the friction points of agentic reasoning. Unlike a standard LLM prompt, an agent must:

1. Identify the user's location (for geo-specific streaming libraries).
2. Query a live database (like JustWatch or TMDB).
3. Cross-reference results with the user's "Seen" list in a local database (e.g., SQLite).
4. Reason about why a specific movie fits the user's preferences.
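To make step 3 concrete, here is a minimal sketch of the "Seen"-list cross-reference using SQLite. The in-memory database and movie titles are toy stand-ins for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the agent's local watch-history DB
conn.execute("CREATE TABLE seen (title TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO seen VALUES (?)", [("Heat",), ("Blade Runner",)])

def filter_unseen(candidates: list[str]) -> list[str]:
    """Step 3: drop anything already in the user's 'seen' list."""
    seen = {row[0] for row in conn.execute("SELECT title FROM seen")}
    return [t for t in candidates if t not in seen]

recommendations = filter_unseen(["Heat", "Arrival", "Dune"])
# recommendations == ["Arrival", "Dune"]
```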

To manage this complexity, you use a "Planner" pattern. Instead of a single prompt, the agent first generates a step-by-step plan. This acts as a cognitive "pre-flight check," allowing you to see where the logic might fail. Additionally, using a library like PydanticAI enforces type safety. By defining a structured "Movie" object, you force the LLM to return valid data; if it tries to give a fuzzy answer, the code crashes at validation—in a test project, this crash is your best friend.
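PydanticAI builds on Pydantic models, so the validation idea can be sketched with Pydantic alone. The `Movie` fields and the year-range check below are illustrative assumptions, not the episode's actual schema:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Movie(BaseModel):
    """Structured output the LLM must satisfy; fuzzy answers fail validation."""
    title: str
    year: int
    streaming_service: str
    reason: str

    @field_validator("year")
    @classmethod
    def plausible_year(cls, v: int) -> int:
        if not 1888 <= v <= 2100:
            raise ValueError("year out of range")
        return v

good = Movie(title="Arrival", year=2016, streaming_service="Netflix",
             reason="Cerebral sci-fi that matches the user's preferences")

try:
    # A fuzzy, non-numeric year crashes at validation -- exactly what you want in a test project.
    Movie(title="Arrival", year="sometime in the 2010s",
          streaming_service="Netflix", reason="...")
except ValidationError as exc:
    print("crashed at validation:", exc.error_count(), "error(s)")
```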

**Project 2: The Code Review Agent (Level Two)** Moving to "Level Two," a multi-agent code review system demonstrates the power of orchestration. Using a framework like CrewAI, you can define three distinct agents:

* **The Developer:** Writes the Python script.
* **The Security Auditor:** Scans the code for vulnerabilities like SQL injection or hardcoded keys.
* **The Refactorer:** Rewrites the code based on the Auditor's feedback.
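CrewAI's actual API differs, but the handoff pattern can be sketched framework-free, with stub functions standing in for the LLM calls. Everything below is a hypothetical illustration of the Developer → Auditor → Refactorer loop:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    role: str
    act: Callable[[str], str]  # in a real system this would call an LLM

# Stub behaviors standing in for LLM calls (hypothetical, for illustration):
developer = Agent("Developer",
                  lambda task: 'query = f"SELECT * FROM users WHERE id={user_id}"')
auditor = Agent("Security Auditor",
                lambda code: "REJECT: possible SQL injection" if 'f"SELECT' in code else "APPROVE")
refactorer = Agent("Refactorer",
                   lambda code: 'query = "SELECT * FROM users WHERE id=?"  # parameterized')

code = developer.act("write a user lookup")
verdict = auditor.act(code)          # the f-string query trips the injection check
if verdict.startswith("REJECT"):
    code = refactorer.act(code)      # rewrite based on the Auditor's feedback
print(auditor.act(code))             # the parameterized version passes the audit
```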

This setup highlights "Agentic Friction." You'll observe emergent behaviors, like the Auditor rejecting perfectly fine code or the Developer getting stuck in a loop trying to satisfy an odd security requirement. Because the system is sandboxed, you can even have the Developer execute its own code and let the Auditor analyze the runtime errors, creating a closed-loop learning system. The key takeaway here is managing token quotas; without iteration limits, agents can burn through credits in "politeness loops" or perfectionist cycles.
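A minimal guard against those runaway cycles might look like the loop below. The iteration cap, token budget, and per-call cost are illustrative stand-ins, not real framework settings:

```python
MAX_ITERATIONS = 5
TOKEN_BUDGET = 20_000

tokens_used = 0
for iteration in range(MAX_ITERATIONS):
    # response, cost = call_llm(...)   # hypothetical LLM call
    cost = 4_000                       # stand-in for tokens consumed this round
    tokens_used += cost
    if tokens_used > TOKEN_BUDGET:
        print(f"Budget exhausted after {iteration + 1} iterations; stopping.")
        break
    # if auditor_approves(response): break   # normal exit once the agents converge
else:
    print("Hit MAX_ITERATIONS without convergence; stopping anyway.")
```

Both exits are failures worth logging: one catches cost blowups, the other catches perfectionist loops that never converge.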

**Project 3: Personal Finance Analyst (Level Three)** The final project discussed is a Personal Finance Analyst, a data-heavy application that introduces Retrieval-Augmented Generation (RAG). This agent would need to securely access financial data, query APIs for market information, and provide structured analysis—all within the safety of the sandbox. It reinforces the core lesson: the goal of these test projects isn't to build a production-ready app, but to understand every failure mode that occurs during development.
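The retrieval half of RAG can be sketched with a toy bag-of-words similarity search. A real setup would use an embedding model and a vector store; the documents below are fabricated stand-ins:

```python
import math
from collections import Counter

documents = {  # stand-in for a vector store of the user's financial records
    "jan_statement": "groceries 420 rent 1500 dining 180",
    "feb_statement": "groceries 390 rent 1500 travel 600",
    "broker_note": "index fund up 4 percent this quarter",
}

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real RAG setup uses an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    """Return the most relevant document to prepend to the LLM prompt."""
    q = embed(query)
    return max(documents, key=lambda name: cosine(q, embed(documents[name])))

print(retrieve("index fund performance"))  # -> broker_note
```

Retrieval narrows the prompt to relevant records, so sensitive data stays local and only the retrieved snippet reaches the model.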

By building these projects in a layered, disposable environment, you move from fearing agent behavior to understanding it. The sandbox becomes a playground for learning, where "blue smoke" moments are just another step in the engineering journey.

Listen online: https://myweirdprompts.com/episode/building-sandbox-agentic-ai

Notes

My Weird Prompts is an AI-generated podcast. Episodes are produced using an automated pipeline: voice prompt → transcription → script generation → text-to-speech → audio assembly. Archived here for long-term preservation. AI CONTENT DISCLAIMER: This episode is entirely AI-generated. The script, dialogue, voices, and audio are produced by AI systems. While the pipeline includes fact-checking, content may contain errors or inaccuracies. Verify any claims independently.

Files (25.0 MB)

building-sandbox-agentic-ai-cover.png

Name | Size
md5:ebde8be4abbebe65030d80de324d25de | 417.5 kB
md5:32106eea584abb4c1eec55ca5b916532 | 1.6 kB
md5:e974035c6f2f158a24270857a5edefb4 | 24.5 MB
md5:f466fd353bcba43c5e5426323ee8ea1f | 29.3 kB
