Published January 6, 2026 | Version v1
Video/Audio Open

Ep. 182: Beyond the Transformer: The New AI Architecture Wars

  • 1. My Weird Prompts
  • 2. Google DeepMind
  • 3. Resemble AI

Description

Episode summary: For years, the transformer has been the undisputed king of AI, but its "quadratic bottleneck" is starting to show its age. In this episode, Herman and Corn dive into the 2026 landscape of alternative architectures like Mamba, RWKV, and x-LSTM that promise linear scaling and infinite context. Discover how hybrid models are combining the reasoning power of attention with the efficiency of state-space models to redefine what's possible in language modeling.

Show Notes

In the rapidly evolving landscape of 2026, the artificial intelligence community is witnessing a fundamental shift in how large language models (LLMs) are built. For nearly a decade, the "Transformer" architecture—defined by its self-attention mechanism—was considered the pinnacle of machine learning. However, as Herman and Corn discuss in the latest episode of *My Weird Prompts*, the era of the transformer monoculture is coming to an end. The discussion centers on the "quadratic bottleneck" and the innovative new architectures designed to shatter it.

### The Problem with Attention

Herman opens the discussion by explaining why the industry is looking beyond the transformer. While the attention mechanism lets models understand context by comparing every word in a sequence with every other word, it comes at a steep price. The process is "quadratic": double the length of a text, and the computational work and memory required quadruple. This creates a massive barrier to processing long-form content, such as entire books or large codebases. Furthermore, the key-value (KV) cache these models need in order to function grows alongside the text, eventually driving up memory use, energy costs, and hardware requirements.
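The scaling argument above can be made concrete with a toy calculation (not from the episode; the function names are illustrative):

```python
# Count the pairwise comparisons self-attention performs for n tokens.
def attention_comparisons(n: int) -> int:
    # Every token attends to every token, itself included: n * n pairs.
    return n * n

# Doubling the sequence length quadruples the work:
assert attention_comparisons(2000) == 4 * attention_comparisons(1000)

# The KV cache grows linearly with the sequence: one key and one value
# vector per token, per layer, per attention head.
def kv_cache_entries(n_tokens: int, n_layers: int, n_heads: int) -> int:
    return 2 * n_tokens * n_layers * n_heads

print(attention_comparisons(1000))  # 1,000,000 pairs for a 1k-token prompt
```

Linear KV-cache growth is gentler than quadratic compute, but at million-token contexts even the cache alone becomes a hardware problem.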

### The Renaissance of Recurrent Neural Networks

One of the most surprising developments discussed by the brothers is the comeback of Recurrent Neural Networks (RNNs). Historically, RNNs were sidelined because they processed information strictly one step at a time, making them a poor fit for the parallel processing power of modern GPUs. However, new models like RWKV (Receptance Weighted Key Value) and x-LSTM have changed the game.

Herman highlights RWKV-7, which uses clever mathematics to let the model be trained in parallel like a transformer while running like an RNN at inference time. This "shapeshifting" ability means the model keeps a constant memory footprint, regardless of whether it is processing ten words or ten thousand. Similarly, x-LSTM (a modernized version of the classic Long Short-Term Memory network) has emerged as a "Pareto-dominant" force, proving that old concepts can be revitalized with modern "warp drives" to outperform traditional transformers in efficiency.

### The Rise of Mamba and State Space Models

The conversation then turns to what many consider the most significant challenger to the transformer: State Space Models (SSMs), specifically the Mamba architecture. Developed by Albert Gu and Tri Dao, Mamba represents a departure from discrete text processing toward a more continuous, fluid stream of information.

Unlike earlier SSMs that treated all information equally, Mamba introduced "selective" mechanisms. This allows the model to decide which information is worth remembering and which can be discarded, much like a human taking notes during a lecture. The primary advantage of Mamba and its successor, Mamba-2, is linear scaling. In a linear system, doubling the input only doubles the work, making it incredibly fast and efficient. This allows for context windows spanning millions of tokens on hardware that would typically struggle to run a standard transformer.
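The "selective" idea, deciding per token what is worth writing into memory, can be caricatured in a few lines. This is a deliberately simplified gate, not Mamba's actual parameterization; `selective_scan` and `w_gate` are hypothetical names for illustration:

```python
import numpy as np

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))

def selective_scan(tokens: np.ndarray, w_gate: np.ndarray,
                   decay: float = 0.95) -> np.ndarray:
    """For each token, an input-dependent gate decides how strongly it is
    written into the fixed-size state; the rest is effectively forgotten."""
    state = np.zeros(tokens.shape[1])
    for x in tokens:
        g = sigmoid(x @ w_gate)        # in (0, 1): "is this worth remembering?"
        state = decay * state + g * x  # salient tokens update the state more
    return state

rng = np.random.default_rng(1)
tokens = rng.normal(size=(100, 16))
w_gate = rng.normal(size=16)
assert selective_scan(tokens, w_gate).shape == (16,)
```

Because the loop does constant work per token, doubling the sequence only doubles the runtime, which is the linear scaling the episode describes.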

### Hybrid Vigor: The Best of Both Worlds

Perhaps the most practical insight from the episode is the rise of hybrid architectures. Herman and Corn discuss Jamba, an architecture from AI21 Labs that refuses to choose between attention and state-space models. By interleaving transformer layers with Mamba layers, Jamba achieves "hybrid vigor."

These models use a small amount of attention to handle complex, non-linear reasoning tasks while utilizing Mamba layers for the bulk of the heavy lifting. This approach allows for high-quality reasoning without the massive memory overhead of the KV cache. As of 2026, these hybrid models are enabling sophisticated "reasoning" capabilities on consumer devices like smartphones, which was previously unthinkable.
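The interleaving pattern is easy to sketch. The ratio below is illustrative, not Jamba's published configuration, and `hybrid_stack` is a hypothetical helper:

```python
def hybrid_stack(n_layers: int, attention_every: int = 8) -> list[str]:
    """Build a layer plan that places one attention layer among
    several state-space (Mamba) layers."""
    return [
        "attention" if i % attention_every == attention_every - 1 else "mamba"
        for i in range(n_layers)
    ]

stack = hybrid_stack(32)
# Only 4 of 32 layers use attention, so only those 4 need a KV cache;
# the other 28 carry a fixed-size recurrent state.
assert stack.count("attention") == 4
assert stack.count("mamba") == 28
```

The design trade-off is that KV-cache memory scales with the number of attention layers, so keeping that fraction small is what makes long contexts fit on modest hardware.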

### The Impossible Trinity

As the episode draws to a close, Herman introduces the concept of the "Impossible Trinity" in LLM design: training parallelization, low-latency inference, and high performance. Historically, a model could achieve only two of the three. Transformers offered parallelization and performance but lacked low-latency inference at scale; RNNs offered low-latency inference but lacked training parallelization.

New contenders like Microsoft's RetNet (Retentive Network) are attempting to solve this by providing a unified theoretical framework that hits all three points of the triangle. While the transformer remains a powerful tool, the discussion between Herman and Corn makes it clear that the future of AI belongs to architectures that can scale linearly, think deeply, and run efficiently. The monoculture has broken, and the resulting diversity in AI architecture is driving the next great leap in machine intelligence.

Listen online: https://myweirdprompts.com/episode/ai-architectures-beyond-transformers

Notes

My Weird Prompts is an AI-generated podcast. Episodes are produced using an automated pipeline: voice prompt → transcription → script generation → text-to-speech → audio assembly. Archived here for long-term preservation. AI CONTENT DISCLAIMER: This episode is entirely AI-generated. The script, dialogue, voices, and audio are produced by AI systems. While the pipeline includes fact-checking, content may contain errors or inaccuracies. Verify any claims independently.

Files

ai-architectures-beyond-transformers-cover.png

Files (26.4 MB)

Name | Size
md5:801392686342d1c5187a13134de80a23 | 6.6 MB
md5:6ad274ee690804b7dddf773012506f7d | 1.6 kB
md5:279cdf68f6892d7cba919d90b9a0587b | 19.8 MB
md5:ce75056df11e46cd5d8de8506df6013d | 23.6 kB
