Why I’m Betting Against the AGI Hype
An Engineer’s (Philosophical) Perspective.
There’s a specific kind of clarity you get from building systems that have to actually work. It’s a discipline of constraints, a respect for the difference between a beautiful theory and a stable architecture. It gives you a well-calibrated bullshit detector. And when I look at the breathless hype around Artificial General Intelligence, my detector isn’t just buzzing. It’s screaming.
I’ve been trying to work through this question methodically, based on everything I know about how complex systems actually work, how energy and computation relate, and what architectural constraints mean in practice. I’ve listened carefully to the skeptics, who have made important points about the limitations of current approaches. I’ve listened to the optimists—the researchers and executives claiming we’re on the verge of artificial general intelligence. I’ve read the technical papers, followed the debates, tried to understand both sides charitably.
And here’s what I’ve concluded: no AGI breakthrough is on the near horizon. The current AI bubble is very real. And it’s very likely to pop in the near future.
So my thesis is narrow but strong: current LLM-based approaches are extraordinarily useful tools, but the specific claims that they are on a straight-line path to AGI look, from a systems perspective, like string theory circa 1995—beautiful, expensive, and structurally unable to reach what it promises.
None of this rules out AGI in principle, or via radically different architectures. I’m arguing specifically that this route—scaled LLMs plus light architectural tweaks—is overwhelmingly unlikely to deliver what’s being promised on the timelines being sold.
But as I worked through the technical arguments, I kept bumping into something deeper. The AGI-from-LLMs thesis fails not because it’s too ambitious, but because it’s the wrong kind of ambition—an attempt to engineer a solution to what is fundamentally a philosophical confusion about the nature of intelligence itself.
The constraints aren’t just engineering challenges—they point to a fundamental misunderstanding about what kind of problem we’re trying to solve. I’ve been an engineer long enough to recognize a pattern: when smart people can’t solve a problem, they sometimes redefine the problem to fit their solution. Not through conscious deception, but through a subtle shift in framing that makes the impossible seem merely difficult.
That’s what I think is happening with AGI-from-LLMs.
Let me show you what I mean.
The Constraint Problem
When you build real systems that have to work in the real world, you learn something fundamental: constraints matter. Not as obstacles to work around, but as fundamental limits that shape what’s actually possible.
Current large language models—the GPT-4s and Claudes of the world—are impressive. Genuinely impressive. But they have architectural limitations that keep revealing themselves as I dig deeper. What started as “here are some engineering challenges” has become “these might be categorical differences from what actual intelligence requires.”
Here’s what I mean: Your brain right now, reading these words, is doing something remarkable. You’re not just processing these words sequentially like tokens in a prediction engine. You’re holding multiple levels of meaning simultaneously. You’re connecting what you’re reading to things you already know. You’re evaluating whether it makes sense. You’re predicting where the argument is going. You’re monitoring your own understanding and adjusting your attention based on confusion or interest.
All of this happens in a unified experiential field. All of it updates continuously, fluidly, without discrete steps. And crucially—this is where it gets interesting—there’s no clear separation between learning and using what you’ve learned. Your brain isn’t frozen while it processes information. It’s constantly updating its models based on what it encounters. The predictions you’re generating right now are being produced by models that are simultaneously being refined by the prediction errors you’re experiencing.
This is what neuroscientists call predictive processing or active inference. Your brain generates expectations, compares them to reality, processes the difference, and uses that error signal to update both immediate predictions and deeper models. All of this happens simultaneously at multiple timescales—from milliseconds to years.
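The loop described above—generate a prediction, compare it to reality, and let the error update both a fast estimate and a slower underlying model—can be sketched in a few lines. This is a toy illustration of the idea only, not a model of any actual neural circuit; the variable names, learning rates, and two-timescale split are invented for the demo.

```python
import random

# Toy predictive-processing loop: the system predicts the next observation,
# compares it to reality, and uses the prediction error to update both a
# fast estimate (the current state) and a slower "deeper model" (the hidden
# drift rate) -- learning and inference happening in the same pass.

random.seed(0)

drift_true = 0.5            # hidden rate at which the world actually changes
estimate, drift_model = 0.0, 0.0
world = 0.0
FAST_LR, SLOW_LR = 0.5, 0.05   # two timescales of updating

for step in range(2000):
    world += drift_true + random.gauss(0, 0.1)   # reality moves on
    prediction = estimate + drift_model          # generate expectation
    error = world - prediction                   # compare to reality
    estimate = prediction + FAST_LR * error      # fast update (state)
    drift_model += SLOW_LR * error               # slow update (deep model)

# The slow model has inferred the hidden drift rate, even though it only
# ever saw prediction errors.
print(round(drift_model, 1))
```

The point of the sketch: there is no separate "training phase"—the model that produces each prediction is the same one being revised by that prediction's error.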
And here’s the kicker: all of this happens at roughly twenty watts of power consumption, with response times measured in milliseconds to seconds.
The LLM Reality Check
Now compare that to what LLMs actually do.
Current large language models separate learning and inference completely. They’re trained—which takes weeks on massive compute clusters consuming megawatts of power—then they’re frozen. At inference time, the architecture is fixed. There’s no real-time model updating. No continuous learning integrated with processing. No adaptive restructuring based on what the system is encountering.
When you query GPT-4, you’re not getting a system that learns from your interaction and updates its understanding in real-time. You’re getting sophisticated pattern-matching through a fixed network that was trained on historical data and then locked in place. The architecture can’t modify itself based on what it’s processing. It can’t monitor its own reasoning and adjust strategy. It can’t restructure its approach when it encounters something genuinely novel.
The energy situation has improved—current optimized inference runs at approximately 0.2-0.5 watt-hours per typical query, far better than earlier systems. But that’s still just for processing through a frozen network. Add the continuous learning that biological intelligence does automatically, and you’re back to requiring massive computational overhead.
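To make that comparison concrete, here is the back-of-envelope arithmetic using only the figures quoted above—the roughly 20 W brain and the 0.2-0.5 Wh query. Purely illustrative; per-query energy varies widely by model, hardware, and provider.

```python
# Back-of-envelope energy comparison, using the figures cited in the text.
BRAIN_WATTS = 20.0
QUERY_WH_LOW, QUERY_WH_HIGH = 0.2, 0.5   # optimized inference, per query

brain_wh_per_hour = BRAIN_WATTS * 1.0    # 20 Wh buys an hour of a whole brain

# How many frozen-network queries does one brain-hour of energy buy?
queries_per_brain_hour_low = brain_wh_per_hour / QUERY_WH_HIGH   # worst case
queries_per_brain_hour_high = brain_wh_per_hour / QUERY_WH_LOW   # best case

# An hour of whole-brain operation -- perception, motor control, AND
# continuous learning -- costs about as much as 40-100 inference passes
# through a frozen network, before any learning overhead is added.
print(round(queries_per_brain_hour_low), round(queries_per_brain_hour_high))
```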
As an engineer, I started here: “Okay, these are hard problems, but smart people are working on them.” But the deeper I dug, the more I realized something: these aren’t just hard problems. They might be pointing to a fundamental misunderstanding about what intelligence is.
The “Just Solve These Problems” Trap
When I raise these issues with AGI optimists, I get a familiar response: “Sure, those are challenges, but we have specific proposals. These problems will be solved.”
And yes, we do have proposals. Real technical work, not just hope. But here’s what concerns me: The gap between “works in principle at small scale” and “works in practice at the scales required for AGI” keeps revealing something systematic.
Let me be specific about what achieving AGI from scaled LLMs would actually require:
First, continuous learning during inference. Researchers have proposed test-time training (TTT) layers—hidden states that update via self-supervised learning during inference. It’s a real technical proposal. But it’s only been demonstrated at 1.3 billion parameters. Memory I/O challenges and computational overhead remain unsolved at production scales. The mechanism exists in principle, but doesn’t work at the scales that matter.
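For intuition about what “updating via self-supervised learning during inference” even means, here is a deliberately tiny sketch of the pattern: a one-weight predictor that takes a gradient step on its own prediction error while it is being used. This is a toy in the spirit of test-time training, not the actual TTT-layer architecture, and every name and constant in it is invented for the demo.

```python
import numpy as np

# Sketch of "learning during inference": compare a frozen predictor against
# one that keeps taking self-supervised gradient steps while serving.

rng = np.random.default_rng(0)

def run(stream, learn_at_test_time):
    w = 0.0                       # single weight: predict x_next ≈ w * x
    lr = 0.05
    errors = []
    for x, x_next in zip(stream[:-1], stream[1:]):
        pred = w * x
        err = x_next - pred
        errors.append(err * err)
        if learn_at_test_time:    # the TTT-style step: update while serving
            w += lr * err * x     # gradient step on squared prediction error
    return sum(errors[-100:]) / 100   # error near the end of the stream

# A stream whose dynamics the "pretrained" (frozen, w = 0) model never saw:
stream = [1.0]
for _ in range(999):
    stream.append(0.9 * stream[-1] + rng.normal(0, 0.01))

frozen_err = run(stream, learn_at_test_time=False)
adaptive_err = run(stream, learn_at_test_time=True)
print(adaptive_err < frozen_err)   # adapting during inference wins
```

The mechanism is simple at this scale; the unsolved part, as the text notes, is making it work at billions of parameters within production memory and compute budgets.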
Second, solving catastrophic forgetting—the problem where updating weights to learn new things destroys what you learned previously. And here’s where it gets really concerning: research shows that catastrophic forgetting actually intensifies as models scale from 1B to 7B parameters.
Stop and think about that for a moment. You’d expect larger models to handle continuous learning better. But they don’t. Each approach that works at smaller scales shows diminished effectiveness at frontier model sizes. This isn’t “we haven’t solved it yet”—it’s “the problem gets worse as you scale up, exactly where you need it to get easier.”
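Catastrophic forgetting itself is easy to reproduce at toy scale. The sketch below—two invented tasks, a two-weight linear model—shows the mechanism: sequential SGD on task B overwrites exactly the weights task A needed, because nothing in plain gradient descent protects them.

```python
import numpy as np

# Minimal demonstration of catastrophic forgetting: fit task A, then keep
# training on task B only, and watch task A's error blow up.

rng = np.random.default_rng(1)

def make_task(w_true):
    X = rng.normal(size=(200, 2))
    y = X @ w_true + rng.normal(0, 0.01, size=200)
    return X, y

def sgd(w, X, y, epochs=50, lr=0.05):
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w += lr * (yi - xi @ w) * xi   # per-sample gradient step
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

task_a = make_task(np.array([1.0, 0.0]))   # task A wants w = (1, 0)
task_b = make_task(np.array([0.0, 1.0]))   # task B wants w = (0, 1)

w = sgd(np.zeros(2), *task_a)
err_a_before = mse(w, *task_a)             # low: A was just learned

w = sgd(w, *task_b)                        # learn B, with no replay of A
err_a_after = mse(w, *task_a)              # A has been overwritten

print(err_a_after > 100 * err_a_before)
```

The research finding the text cites is the troubling part: the standard mitigations for this effect lose ground, rather than gain it, as models grow.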
Third, massive improvements in energy efficiency. Multiple technical pathways exist: IBM’s NorthPole chip demonstrates 25-72× efficiency gains through architectural changes alone. Neuromorphic systems show 100-1000× improvements on event-driven workloads. The gap to Landauer’s limit suggests orders of magnitude improvement is physically possible.
These are real engineering directions, not fantasies. But—and this is critical—none have been demonstrated at frontier model scales with the additional overhead of continuous learning. We’re not just optimizing inference through fixed weights anymore—we’re talking about running gradient computation and backpropagation constantly during operation.
Fourth, metacognitive monitoring. Current approaches like chain-of-thought reasoning achieve impressive results, but research confirms they work through external scaffolding rather than genuine self-monitoring. The system doesn’t actually observe and adjust its own processing strategies the way biological intelligence does continuously. It follows prompted reasoning patterns, which is clever but categorically different from integrated metacognition.
And fifth, you’d need all of this to work together in a stable, coherent way that maintains unified understanding while the architecture is being continuously modified.
Now, I’m trying to be fair here. I’m walking a wire—maintaining the precarious balance between acknowledging the genuine impressiveness of the technology and seeing fundamental architectural limitations. The AGI enthusiasts have fallen off the wire into pure hype. The pure skeptics who dismiss all progress have fallen off the other side.
But the wire still holds for those willing to hold the tension: impressive engineering, possible category error.
The Scale Paradox
Here’s something that genuinely surprised me when I dug into recent research: Some of these problems don’t just remain unsolved at scale—they actually get worse.
The catastrophic forgetting problem is the clearest example. Research shows it intensifies moving from 1B to 7B parameters. Approaches that work at smaller scales lose effectiveness at frontier sizes. This creates a paradox: the scaling that’s supposed to get you closer to AGI is simultaneously making one of the fundamental problems harder to solve.
This isn’t an engineering challenge where more resources help. This is an architectural barrier where the approach that’s supposed to lead to AGI is fighting against itself.
Even Ilya Sutskever—the chief architect of the scaling paradigm at OpenAI—now says the “age of scaling” is over and new architectural approaches are needed. When the most prominent scaling maximalist admits pure scaling is insufficient, that tells you the barriers are real.
The Compound Probability Problem
Here’s where the engineering mindset really matters. These aren’t one big problem—they’re multiple problems that all need to be solved, and some of them get harder as you scale.
Let me be careful about this. Some of these problems might not be fully independent—neuromorphic architectures could potentially address both energy and continuous learning simultaneously. Test-time compute scaling opens new dimensions orthogonal to some of these challenges.
But even accounting for that, let’s work through the math:
Continuous learning at frontier scale: 40% chance of being solved in the next few years (it works at small scale, but memory and overhead challenges remain unsolved at production scale).
Catastrophic forgetting: 30% (it worsens with scale, which is the opposite direction from what we need).
True metacognitive monitoring: 20% (current approaches are categorically different from what’s needed).
Energy efficiency breakthrough: 60% (multiple pathways exist, though none proven at scale with continuous learning overhead).
Architectural integration: 40% (some solutions might address multiple problems, but we’ve never seen it work together).
You can argue with any individual number here; what matters is the shape of the problem: multiple hard constraints that multiply, not a single unlock away from god-mode.
The compound probability: 0.4 × 0.3 × 0.2 × 0.6 × 0.4 ≈ 0.006, or about 0.6%
Even if you’re more generous and roughly double my estimates for each problem: 0.7 × 0.6 × 0.4 × 0.9 × 0.7 = 10.6%
So we’re looking at somewhere between well under 1% and roughly 11% probability over the next five years, depending on how optimistic you are about the individual problems.
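The products above take a couple of lines to check, and the same code lets you swap in your own estimates. Note the simplification the text already concedes: the factors are treated as independent.

```python
# Compound-probability arithmetic for the five requirements estimated above.
skeptical = {
    "continuous learning at frontier scale": 0.40,
    "catastrophic forgetting solved":        0.30,
    "true metacognitive monitoring":         0.20,
    "energy efficiency breakthrough":        0.60,
    "architectural integration":             0.40,
}
generous = [0.7, 0.6, 0.4, 0.9, 0.7]   # roughly doubled, capped at 0.9

def compound(probabilities):
    result = 1.0
    for p in probabilities:
        result *= p                     # all must happen together
    return result

print(f"{compound(skeptical.values()):.1%}")   # skeptical estimates
print(f"{compound(generous):.1%}")             # generous estimates
```

Plug in your own numbers: unless you push every factor close to certainty, the product stays small, which is the structural point.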
And this is why I find the AGI hype so disconnected from reality. The industry is committing hundreds of billions of dollars based on implicit assumptions of 60-80% probability or higher, when realistic compound probability analysis gives single-digit percentages even with generous assumptions.
The “In Principle” Problem
The response I often get is: “But it works in principle! We have proposals for how to address these problems.”
And yes, we do have proposals. Test-time training layers are a real mechanism. In-context learning as gradient descent is elegant mathematics. These are genuine technical ideas.
But “having proposals” is very different from “proposals that work at the scales that matter.”
Let me borrow an analogy from physics. Consider what I’ll call the “Alcubierre warp drive problem.” Yes, faster-than-light travel “works in principle” if you can create exotic matter with negative energy density and manipulate spacetime. The math technically allows it. But “works in principle” just means “works if we assume away all the impossibly hard parts.”
Similarly, yes, you can create unlimited gold through nuclear transmutation “in principle”—you just need unlimited feedstock and unlimited energy input. Technically possible! Completely absurd economically and thermodynamically.
AGI from scaled LLMs “works in principle” the same way. The proposals exist. But when test-time training only works at 1.3B parameters and faces unsolved challenges at production scale, when catastrophic forgetting intensifies rather than improves with scale, when metacognition remains external scaffolding rather than integrated architecture—“works in principle” means “works if we solve all the fundamental problems that remain unsolved at the scales that matter.”
That’s not an argument for feasibility. It’s a tautology dressed as analysis.
What the Pattern Reveals
Here’s what bothers me as I look at all these constraints together: They’re not random engineering challenges. They’re pointing to something systematic.
Every example of adaptive, general intelligence we’ve observed—from insects to humans—operates in a specific way: learning and using what you’ve learned aren’t separate processes. They happen simultaneously, in mutual modification. When a bird learns to navigate in flight, it’s not updating a frozen model after each session—it’s continuously adapting its understanding while using that understanding to navigate.
Now, maybe you could build general intelligence differently. Maybe you can separate learning and inference in time—train a model, freeze it, use it, then retrain later—and still get something that deserves to be called AGI.
But here’s my question: Why would we expect that to work?
Nature has run millions of experiments in intelligence across millions of species over hundreds of millions of years. And they all converged on integrated learning-inference happening simultaneously. Not because there’s something magical about biology, but because this approach solves a fundamental problem: how to remain adaptive in environments that are constantly changing in unpredictable ways.
When LLM advocates say “we can achieve AGI by separating what nature does simultaneously,” they’re making an extraordinary claim. Maybe they’re right. But the burden of proof sits with them, not with those of us pointing out that every working example of general intelligence operates differently.
A Historical Parallel
There’s a pattern here that reminds me of something from physics. In the 1980s, superstring theory swept through theoretical physics like a revolution. The mathematics was elegant, the promise extraordinary: a “theory of everything” that would unify all fundamental forces. The brightest minds flocked to it. Departments reorganized around it. Careers were built on the assumption that breakthrough was imminent.
The theory worked beautifully—in principle. The mathematics was sophisticated and internally consistent. But it required assumptions that couldn’t be tested: extra dimensions we couldn’t observe, energy scales we couldn’t reach, predictions we couldn’t verify. “Just scale up the particle accelerators,” the advocates said. “Just wait for the technology to catch up to the theory.”
Forty years later, string theory has produced remarkable mathematics but no empirically testable predictions. No experimental verification. No connection to measurable reality. It didn’t fail because the people working on it weren’t brilliant—they were and are. It plateaued because the gap between “works in principle with untestable assumptions” and “works in practice within physical constraints” turned out to be unbridgeable with the approach they’d chosen.
The current AI hype cycle feels structurally identical. Elegant scaling laws. Sophisticated mathematics. Confident predictions of imminent breakthroughs. “Just scale up the compute,” the advocates say. “Just wait for the architecture to catch up to the theory.”
But the problems we’re encountering—catastrophic forgetting intensifying with scale, energy requirements growing rather than shrinking, the persistent gap between narrow task performance and general adaptability—these aren’t bugs to be fixed. They might be signals that we’re in the wrong theoretical framework entirely.
I’m not saying LLMs are useless—they’re extraordinarily useful for specific tasks, just as string theory has produced useful mathematics. But “useful tool” is very different from “path to AGI.” And the more we scale within the current paradigm, the more we might be doing the equivalent of building bigger particle accelerators for a theory that can’t make contact with empirical reality.
The Category Error at the Center of the AGI Dream
This is where I need to be explicit about what I think is actually happening. The AGI enthusiasts have, perhaps without fully realizing it, made a fundamental philosophical mistake—what philosophers call a category error.
They’ve redefined the problem to fit their solution:
The original question: “How do we create systems that can understand, reason, and adapt the way humans do?”
The redefined question: “How do we scale pattern-matching until it exhibits behaviors that look like understanding, reasoning, and adaptation?”
These aren’t the same question. And the confusion between them reveals a deeper misunderstanding about what kind of thing intelligence is.
Here’s the error: AGI advocates treat consciousness as a complicated engineering problem rather than a complex phenomenon. Complicated problems scale. Complex ones don’t.
A complicated problem has many parts, but they’re ultimately reducible. You can solve it through sufficient resources, clever optimization, and engineering persistence. Building a faster computer is complicated. So is sending humans to Mars. These are hard, but they’re solvable through scaling up what we already know how to do.
A complex problem involves irreducible tensions, value judgments, questions about what something fundamentally is. These can’t be solved just by adding resources and clever optimization. They require understanding the nature of the thing itself—what makes it what it is, rather than just what it appears to do.
Intelligence, consciousness, meaning-making—these are complex in this deeper sense. They involve:
Understanding what it means for a system to be truly adaptive versus merely responsive
Distinguishing between prediction and comprehension
Recognizing the difference between processing symbols and meaning something by them
Grasping why embodiment and continuous learning might not be incidental features but constitutive ones
The AGI-from-LLMs thesis treats all of this as merely complicated: just scale the compute, optimize the architecture, throw enough resources at it, and intelligence will emerge.
But you can’t engineer your way to consciousness by scaling statistical pattern-matching, any more than you can engineer your way to love by optimizing neurotransmitter levels. The attempt itself reveals a misunderstanding about what kind of thing you’re trying to create.
This isn’t a minor technical confusion. It’s a fundamental philosophical error that no amount of engineering brilliance can overcome. And it’s why I think the more we scale within the current paradigm, the more we’re building the equivalent of bigger particle accelerators for a theory that’s in the wrong conceptual space entirely.
The “Alien Intelligence” Dodge
When I make these arguments, I often encounter a particular response: “But maybe AGI from LLMs will be a form of alien intelligence—something that operates on fundamentally different principles than biological intelligence. Your epistemological framework assumes intelligence has to work like biological systems, but that’s just carbon chauvinism. Silicon-based intelligence might emerge through entirely different mechanisms.”
Fine. I’m willing to entertain that possibility. But if you’re claiming this, you need to do more than gesture at science fiction concepts.
Explain to me what that actually means.
Not in vague terms about “emergent properties” or “alien cognition.” In concrete, falsifiable terms:
What does “alien intelligence” that emerges from scaled LLMs actually do differently? How does it remain adaptive in novel situations without continuous learning? How does it understand rather than predict without some form of integrated experience-model updating? What replaces the biological mechanisms for handling catastrophic forgetting—and if nothing does, why would we expect scaling to magically solve a problem that intensifies with scale?
The “alien intelligence” framing sounds sophisticated, but it’s often deployed as a thought-terminating cliché—a way to avoid specifying mechanisms or addressing constraints. It’s the AGI equivalent of “God works in mysterious ways.”
Here’s what makes it a dodge: If you’re claiming intelligence can work on radically different principles, you still need to explain what those principles are and why they would emerge from your specific architecture.
You don’t get to say:
“Intelligence might not require continuous learning” without explaining how a system remains adaptive in genuinely novel environments without it
“The separation of learning and inference might not matter” without explaining how the system handles the problems this separation creates
“Energy efficiency might not be a real constraint” without explaining how to overcome thermodynamic limits
“Maybe it just emerges at scale” without specifying what mechanisms would cause this emergence
The burden of proof doesn’t disappear because you add the word “alien” in front of “intelligence.” If anything, it increases. Extraordinary claims require extraordinary evidence, and “intelligence that works on completely different principles than every example we’ve ever observed” is an extraordinary claim.
Moreover, the “alien intelligence” framing often sneaks in an assumption: that because LLMs can do impressive things we didn’t explicitly program them to do, they must be on a path toward general intelligence through some mechanism we don’t need to understand.
But this confuses “we didn’t explicitly program this specific behavior” with “emergent general intelligence.” LLMs display emergent capabilities within the bounds of their architecture—better at tasks we didn’t specifically train them for, yes. But these emergent capabilities are still bounded by the fundamental separation of learning and inference, still constrained by the architecture’s inability to continuously adapt, still limited by all the problems we’ve discussed.
“Alien intelligence” that still can’t learn during inference, that still suffers catastrophic forgetting that intensifies with scale, that still requires megawatts to do what biological intelligence does at twenty watts—that’s not alien intelligence. That’s the same architectural limitations with a science fiction framing.
If you want to claim that AGI will emerge through principles radically different from biological intelligence, I’m genuinely interested. But “alien intelligence” can’t be a wildcard you play to avoid specifying mechanisms or addressing constraints.
Show me the principles. Explain how they work. Demonstrate why they would emerge from scaled LLMs specifically rather than requiring fundamentally different architectures. Address why the problems that get worse with scale somehow don’t matter for this alien intelligence.
Until then, “it’ll be alien intelligence” is just a sophisticated way of saying “maybe something miraculous will happen if we scale enough.”
And I don’t bet on miracles.
The Collapse of Something Essential
Let me try to articulate what I think is actually missing—and why I think it might not be fixable by better engineering.
When your brain processes information and learns from experience, these aren’t two separate processes happening at different times. They’re two aspects of the same continuous activity. You’re learning from experience while using what you learned to process new experience, which generates new learning, which modifies how you process the next moment.
This isn’t just convenient—I think it might be essential to what makes intelligence general and adaptive. The ability to update your understanding based on what you’re encountering right now, while you’re encountering it, using models that are being modified by that very encounter.
LLMs don’t do this. They rigidly separate learning (a massive, offline process during training) from inference (a fixed, real-time process during deployment). They’re sequential, not simultaneous. Separated, not integrated.
Now, I need to be careful here. I’m making a claim that goes beyond pure engineering: that the simultaneity of learning and inference isn’t just how biological intelligence happens to work, but might be constitutive of what makes it general and adaptive.
Can I prove this? No. But consider the evidence:
Every instance of adaptive, general intelligence we’ve observed operates this way. When we’ve built systems that separate learning and inference (like current LLMs), they excel at narrow tasks but struggle with genuine adaptability and transfer. And critically, the problems we encounter trying to add continuous learning to LLMs aren’t just hard—some of them get worse as we scale up.
Maybe that’s just because we haven’t scaled enough. Maybe sequential separation can achieve what simultaneous integration does, we just need more compute and better algorithms.
But that’s the claim that needs defending, not mine. Nature has converged on integrated learning-inference across every example of general intelligence. The burden of proof sits with those claiming we can do it differently.
Why You Can’t Just Bolt It On
And here’s what makes this particularly challenging: You can’t just add continuous learning to a system designed for fixed inference and expect it to behave like biological intelligence. The integration has to be fundamental, not an afterthought.
The requirements for retrofitting this are staggering. The proposals involve running what is essentially continuous, training-level computation—gradient descent, backpropagation, real-time weight updates—during what’s currently “inference time,” all while maintaining stability and avoiding the catastrophic forgetting that we know gets worse with scale.
But nobody has demonstrated continuous gradient computation, backpropagation, and weight updates running constantly during inference at frontier model scales within energy budgets that would make deployment feasible.
The efficiency improvements we’ve seen—NorthPole’s 25-72× gains, neuromorphic systems’ 100-1000× improvements—are for inference through fixed weights. Adding continuous learning during inference would require orders of magnitude more computation, so those efficiency gains would first be consumed just getting back to today’s inference costs, let alone improving on them.
That’s not one engineering challenge—that’s solving energy efficiency and then solving it again for an entirely different computational profile. You need breakthrough efficiency gains just to make continuous learning as cheap as current fixed-weight inference, and then you need additional breakthroughs to make it economically viable for deployment at scale.
Nobody has shown a path to doing this. The advocates simply assume the efficiency improvements will materialize because they need them to.
One could be forgiven for seeing this as wishcasting—what NASA accident researchers called “go fever”: the excessive eagerness to proceed with a launch despite warning signs, schedule pressure, and unresolved risks. It was identified as a pathological cultural tendency after both the Challenger and Columbia disasters: the impulse to keep going because you’re already committed, because billions have been spent, because careers depend on it, because turning back feels like failure.
The Economic Reality
So why is the industry pouring hundreds of billions into this if the technical case is so uncertain?
Because bubbles don’t require technical validity—they require narrative momentum and aligned incentives.
Sequoia Capital calculates a $600 billion annual revenue gap—the difference between what current AI infrastructure investment would need to generate versus actual revenue (roughly $60-85 billion in 2025). The industry “strain ratio” of capex to revenue at approximately 6× exceeds the peaks of the railroad bubble (2×) and telecom bubble (4×).
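As a sanity check on those numbers, the implied growth multiple is straightforward arithmetic. The dollar figures are the ones quoted in the text (Sequoia’s gap estimate and the rough $60-85 billion of 2025 revenue); treat them as illustrative, not independently audited.

```python
# Rough check of the revenue-gap arithmetic cited in the text.
gap = 600e9                        # Sequoia's annual revenue gap
actual_low, actual_high = 60e9, 85e9   # rough actual 2025 AI revenue

needed_low = actual_low + gap      # revenue needed to close the gap
needed_high = actual_high + gap

# How many times current revenue would have to multiply just to close it:
multiple_low = needed_low / actual_high    # most favorable case
multiple_high = needed_high / actual_low   # least favorable case

print(round(multiple_low, 1), round(multiple_high, 1))   # roughly 8x to 11x
```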
The AGI narrative serves multiple purposes that have nothing to do with actual AGI feasibility:
It justifies enormous capital investment in compute infrastructure. It provides cover for tech oligarchs positioning themselves as builders of the future. It creates urgency around “AI safety” that benefits incumbents by raising regulatory barriers. It distracts from the immediate harms of current AI deployment. And it lets companies raise money on dreams rather than demonstrated returns.
Now, there are important differences from historical bubbles. Today’s tech giants generate massive profits—NVIDIA’s 53% net margin, the Magnificent 7 combining for over $200 billion in annual profits. This isn’t the dot-com bubble where companies had no revenue. Current AI leaders have strong balance sheets and can largely fund expansion from cash flow rather than debt.
But here’s the critical insight: Without AGI or near-AGI capabilities materializing by 2028-2030, current investment levels are difficult to justify through narrow AI applications alone. Even Morgan Stanley’s bullish projection of $1.1 trillion in GenAI revenue by 2028 leaves a substantial gap if infrastructure spending reaches projected levels.
You can keep a bubble inflated for a long time if enough powerful people benefit from keeping it inflated. But eventually reality asserts itself. The laws that govern energy efficiency, the mathematics of compound probability, the difference between what you can optimize and what requires fundamental rethinking—these don’t negotiate.
The ground is approaching.
The Moral Stakes
The economic bubble is the most visible part of the spectacle, but it’s also the most transient. The ground is approaching, and capital will eventually have to reckon with physics. What’s more dangerous is the philosophical architecture this bubble is being built to justify.
If I’m right that AGI-from-LLMs represents a category error—treating consciousness as an engineering problem rather than as something complex and irreducible—then the stakes go beyond economics.
The AGI narrative provides technical-sounding justification for a future where human judgment is systematically devalued. If intelligence is just sophisticated computation, then human consciousness—with all its messiness, its embodiment, its meaning-making—is just an inferior version of what machines will do better.
The AGI project, as currently sold, embodies what happens when intellectual sophistication is applied without regard for human dignity—when optimization replaces meaning, when efficiency trumps agency, when the lived experience of being conscious is treated as a problem to be solved rather than as reality to be honored.
The impulse beneath the AGI narrative says: Let the superior intelligence of the machine—and by extension, its creators—handle things. Let those smart enough to build god govern on its behalf. Let technical expertise replace democratic choice about what kind of future we want to build together.
The economic bubble will pop. But if we’re not careful about the category error it’s built on, the moral architecture being constructed—where human judgment is treated as inferior to machine optimization, where consciousness is framed as an engineering challenge—that could outlast the bubble itself.
The Expert Disagreement
The landscape of expert opinion reveals something important. It’s strikingly bimodal—not a normal distribution but two distinct camps:
Industry leaders at major AI labs predict AGI in the near term. Dario Amodei at Anthropic has predicted that AI systems will be “better than almost all humans at almost everything” within two to three years, possibly by 2027.
Academic researchers and some industry figures are far more skeptical. Yann LeCun calls LLMs “a dead end, a distraction” and proposes entirely different architectures. Gary Marcus calls sky-high valuations based on LLMs becoming AGI “just a fantasy.” François Chollet notes that even advanced reasoning models “only get 4% on ARC-AGI-2”—suggesting genuine novel reasoning remains elusive.
Most significant is Ilya Sutskever’s evolution. The former OpenAI chief scientist and chief architect of the scaling paradigm now declares the “age of scaling” is over and says new approaches are needed. This isn’t an external critic—this is the person who built the approach acknowledging its limitations.
When the people building the spectacle see one reality and the people studying the foundations see another, you should pay attention to the foundations.
What I’m Betting On
I’m not just skeptical—I’m willing to bet on this analysis. I’ve taken positions shorting the thesis that AGI breakthroughs will justify current AI valuations in the near term. Not because I’m certain about timing—the market can stay irrational longer than you can stay solvent, as the saying goes. But because the compound probability math, combined with what I think is a fundamental category error, says the breakthroughs probably aren’t coming.
This isn’t pessimism about technology. I spent my career building systems that actually worked. I know what real progress looks like versus what hype looks like.
Real progress shows you mechanisms demonstrated at scale, addresses constraints with working solutions, and provides evidence of feasibility within realistic budgets. What we have instead are proposals and small-scale demonstrations. We don’t have solutions working at frontier scale. We have problems that worsen rather than improve as systems scale up. And we have what looks like a fundamental misunderstanding about what kind of problem we’re trying to solve.
The current AGI narrative asks you to have faith that all these problems will be solved simultaneously, that demonstrations at small scale will somehow work at production scale despite evidence to the contrary, that you can separate what nature integrates and still get the same result.
The skeptics who raise persistent, technically grounded objections—researchers like Gary Marcus and François Chollet, among others—are often dismissed as trolls rather than engaged on the merits. When critics are treated as annoyances instead of answered, you should ask why. The bear case is clear. The architectural barriers are real. The compound probabilities don’t lie.
The Bottom Line
Based on everything I know as someone who built systems for a living, here’s what I believe:
The AGI-from-LLMs thesis appears to be a category error—an attempt to solve through engineering optimization what might require fundamentally different thinking about the nature of intelligence itself.
Current LLM approaches have architectural limitations that aren’t minor engineering challenges. We have specific technical proposals for addressing some of them, but none demonstrated at frontier scales. Some problems—like catastrophic forgetting—actually worsen as systems scale up. And the separation of learning and inference, which works fine for narrow tasks, might be fundamentally incompatible with the kind of general adaptiveness we’re aiming for.
Using compound probability with realistic estimates for each unsolved problem, I arrive at somewhere between a 1% and 15% probability of AGI emerging from scaled LLMs over a five-year horizon.
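The compound-probability argument can be sketched in a few lines. To be clear about what this is: the list of problems and the per-problem success estimates below are purely illustrative assumptions of mine, not figures from any study—the point is the structure of the argument, in which even generous individual odds multiply down to a small joint probability.

```python
# A minimal sketch of the compound-probability argument.
# Each entry is (low, high): a hypothetical range for the probability
# that this problem gets solved at frontier scale within five years.
# These numbers are illustrative assumptions, not measured estimates.
problems = {
    "continual learning without catastrophic forgetting": (0.3, 0.6),
    "genuine novel reasoning beyond pattern matching":    (0.2, 0.5),
    "unified learning and inference at frontier scale":   (0.3, 0.6),
    "grounded world models from text-centric training":   (0.4, 0.7),
}

# If AGI requires solving every one of these problems, the joint
# probability is the product of the individual probabilities
# (treating them as independent, which is itself generous: positive
# correlation between failures would push the range lower).
low, high = 1.0, 1.0
for lo, hi in problems.values():
    low *= lo
    high *= hi

print(f"joint probability: {low:.1%} to {high:.1%}")
```

With these placeholder inputs the product works out to roughly 0.7% to 12.6%—inside the 1–15% ballpark. The exact numbers matter less than the shape: multiplying four merely-uncertain steps together yields a long shot.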
That’s dramatically lower than the 60-80%+ probability the market is implicitly pricing in. The gap represents massive mispricing.
Even insiders are acknowledging the limitations now. Ilya Sutskever says the age of scaling is over. Yann LeCun calls LLMs a dead end. The bimodal expert distribution itself suggests high uncertainty, not convergent knowledge.
The architectural barriers are real. The economic strain is historically extreme. Without AGI-level capabilities by 2028-2030, current investment levels face serious correction risk.
I could be wrong. Maybe I’m seeing category errors where there are just hard engineering problems. Maybe simultaneous learning-inference isn’t actually necessary for general intelligence. Maybe all these problems will be solved, and my probability estimates are too pessimistic.
But I’m claiming that my estimates are based on constraint analysis, compound probability, engineering judgment about the gap between “works at small scale” and “works at scale that matters,” and serious consideration of what intelligence actually requires.
And those estimates say: No AGI breakthrough on the near horizon. Real bubble dynamics. Likely to pop.
Two plus two equals four. There are twenty-four hours in a day. And when the problems you need to solve get worse as you scale up—when the chief architect of scaling admits the approach is insufficient—when you’re trying to add essential features to an architecture designed without them—when every working example of general intelligence operates differently than your proposal—you should update your confidence accordingly.
I’m betting against the hype. And I’m comfortable with that bet.
Because sometimes the most sophisticated position is recognizing that you might be trying to solve the wrong kind of problem—even when you’re surrounded by very expensive computational infrastructure and very sophisticated people claiming that any day now, if we just scale a bit more, something miraculous will emerge.
The real danger isn’t that machines will become intelligent—it’s that we’ll mistake impressive computation for understanding and surrender our judgment to those who control the servers.
The circus continues. The ground approaches. And some of us are paying attention to the actual distance.
One of my interlocutors, someone I relied on to pressure test this argument, sent me a frustrated response: “What do you know, Mike? You’re not a researcher operating at the state-of-the-art in the field.”
That is the question, isn’t it? Whether only people building the bubble are allowed to question it — or whether those of us who understand constraints, incentives, and category errors have not just the right, but the responsibility, to say: this doesn’t add up.
Go Deeper into the Circus
The Ethics Theater: Why We Need Democratic Oversight of AI Development
This is, after all, a philosophy blog. But what I’m about to tell you isn’t abstract philosophy—it’s about who gets to make decisions that will reshape civilization, and why leaving those decisions to people with massive financial stakes in the outcome is an absurd proposition.