The Wrong Hole

"Did I ever tell you what the definition of insanity is? Insanity is doing the exact same fucking thing over and over again, expecting shit to change. That is crazy. The first time somebody told me that, I dunno, I thought they were bullshitting me, so, boom, I shot him. The thing is... he was right. And then I started seeing, everywhere I looked, everywhere I looked all these fucking pricks, everywhere I looked, doing the exact same fucking thing... over and over and over and over again thinking, 'This time is going to be different. No, no, no please... This time is going to be different.'"

— Vaas Montenegro, Far Cry 3


1. The Parable

The Levi Strauss story is Silicon Valley’s favorite parable. During the Gold Rush, miners went broke chasing nuggets while Strauss sold them jeans and got rich. The lesson: sell shovels, not dreams. NVIDIA, we’re told, is the Levi Strauss of the AI gold rush—inevitable winners regardless of which AI company strikes it rich.

It’s a comforting narrative. It’s also wrong. Not wrong in degree. Wrong in kind.

The parable assumes the miners were digging in the right place. Shovels are agnostic to location. You can dig anywhere with a shovel—California, Alaska, your backyard. The shovel doesn’t care. It just moves dirt.

But NVIDIA’s shovels aren’t agnostic. They’re optimized for a very specific type of digging: massively parallel matrix multiplication. Tensor cores. CUDA. The entire stack is purpose-built for one paradigm—autoregressive transformer training and inference.

If the gold isn’t underground, the shovel isn’t just useless. The shovel is the problem.


2. The Math (Without the Formula)

Yann LeCun has been saying a version of this for years, and the core point is almost too simple:

If a system has even a small chance of going wrong at each step, then as you chain more steps together, the probability of staying “on the right track” collapses fast. That’s not a dunk on any one lab. It’s a horizon problem.
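If you do want the back-of-the-envelope version, it fits in a few lines. A minimal sketch, assuming each step goes wrong independently with probability e (the error rates below are illustrative, and the independence assumption is generous to the model):

```python
# If each generated step is independently correct with probability (1 - e),
# the chance the whole chain stays on track after n steps is (1 - e) ** n.
# The error rates here are illustrative, not measured.
for e in (0.01, 0.001):           # 1% and 0.1% per-step error
    for n in (20, 200, 2000):     # horizon length in steps
        p = (1 - e) ** n
        print(f"e={e:<5}  n={n:>4}  P(still on track) = {p:.3f}")
```

Even at a 0.1% per-step slip rate, an error-free 2,000-step run happens only about 13% of the time.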

Autoregressive models don’t just make mistakes. They compound them. And once a model drifts, it starts conditioning on its own drift. You can add RLHF, guardrails, constitutional layers—useful, sometimes necessary—but these are corrections layered on top of a generator that is not built for stable long-horizon control.

The claim isn’t “LLMs never work.” They obviously do. The claim is sharper: as you push toward longer, higher-stakes sequences—planning, agency, autonomy—fragility rises faster than confidence. If the destination is reliable, controllable intelligence, this is not a small issue. It’s structural.

LeCun’s blunt conclusion follows: you don’t patch your way out of an architectural tendency like this. You redesign.


3. The Epistemological Trap

Here’s where it gets uncomfortable.

GPUs existed before deep learning. They were built for video games—rendering pixels, shading polygons. Then researchers noticed that the same parallel processing architecture happened to be very good at matrix operations. And matrix operations happened to be the bottleneck for neural networks.

So we scaled neural networks to fit what GPUs could do.

Then transformers arrived—an architecture that maps cleanly onto GPU throughput at scale. Self-attention is matrix multiplication. Feed-forward layers are matrix multiplication. Training is a festival of batched dense linear algebra.
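To make that literal, here is a toy single-head self-attention in plain NumPy. The shapes and weights are made up for illustration; the point is that every heavy operation below is a dense matrix multiply plus a softmax:

```python
import numpy as np

# Toy single-head self-attention. Shapes and weights are illustrative,
# not any particular model's. Every expensive step is a dense matmul.
T, d = 8, 16                        # sequence length, model width
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))     # token embeddings

Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv    # three matmuls

scores = Q @ K.T / np.sqrt(d)       # another matmul: (T, T) attention scores
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax

out = w @ V                         # and one more matmul
print(out.shape)                    # (8, 16)
```

Stack a feed-forward block on top (two more matmuls), repeat the layer a few dozen times, and batch it: that is exactly the workload GPUs were already shaped for.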

Whether or not anyone intended it, the ecosystem selected what fit the machine.

And now we’ve quietly absorbed the implicit premise: intelligence is whatever this machine can produce when we scale it.

The hammer didn’t find the nail. The hammer created a world where “nails” are what get funded, published, benchmarked, and rewarded.


4. The Biological Counterargument (Used Carefully)

Brains don’t backprop the way our training loops do. Biological neurons are sparse, asynchronous, event-driven. They fire or they don’t. They don’t compute dense attention scores across every other neuron in a context window. They don’t do floating-point matrix multiplication at all.
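For contrast, here is the textbook cartoon of event-driven computation, a leaky integrate-and-fire neuron. The constants are arbitrary and this is nobody's biophysical model; it is just here to show a unit that integrates, fires a binary spike, and is otherwise silent:

```python
# Toy leaky integrate-and-fire neuron: sparse, event-driven, binary output.
# Constants are arbitrary illustration, not a biophysical model.
def lif(inputs, leak=0.9, threshold=1.0):
    v, spikes = 0.0, []
    for current in inputs:
        v = leak * v + current      # potential decays, then integrates input
        if v >= threshold:          # fire...
            spikes.append(1)
            v = 0.0                 # ...and reset
        else:
            spikes.append(0)        # ...or stay silent: no event, no work
    return spikes

print(lif([0.3, 0.4, 0.5, 0.0, 0.0, 0.9, 0.9]))  # [0, 0, 1, 0, 0, 0, 1]
```

No dense attention, no floating-point matmul: computation happens only when something fires.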

Maybe that’s fine. Maybe artificial intelligence doesn’t need to mimic biological intelligence. Planes don’t flap their wings.

But planes still obey aerodynamics. They didn’t succeed by ignoring physics—they succeeded by finding a different solution to the same constraints: energy, stability, control, and feedback from the real world.

What constraints are transformers solving? Token prediction. Next-word probability. That’s not a constraint imposed by intelligence. That’s a constraint imposed by what we knew how to train efficiently on GPUs in 2017.

We’re not flying. We’re building very elaborate catapults and calling the arc “flight.”


5. Capability vs Reliability

This is the part that gets blurred on purpose.

Digging produces something. Benchmarks improve. Demos get smoother. The system gets more capable in the narrow sense: it can imitate, summarize, code, converse, style-match.

But the destination everyone is pricing isn’t just capability. It’s reliable agency: systems that can plan, act, stay grounded, and remain correct under long-horizon pressure.

A generator can look brilliant for 200 tokens and still be brittle at 2,000. That isn’t a cosmetic problem. It’s the difference between a parlor trick and an engine.


6. The Sunk Cost Spiral

The sunk cost is staggering.

NVIDIA’s market cap has touched the multi-trillion-dollar range. Microsoft has poured enormous sums into AI infrastructure. Google, Amazon, Meta—collectively, hundreds of billions more. Data centers consuming gigawatts. Talent wars for people who know how to scale transformers. An entire economy organized around making the current paradigm work.

And every dollar spent is another reason to believe the paradigm is correct.

This is the trap. Investment doesn’t just follow conviction—it creates conviction. You cannot spend hundreds of billions of dollars on something and then say, publicly, “we may be building the wrong thing.” Not if you’re a public company. Not if you’re a CEO with a board. Not if you’re a researcher whose career is anchored to the paradigm.

The shovel sellers need you to keep digging.

But more than that—you need you to keep digging. Because if you stop, you have to face what you already spent.


7. The Wrong Coordinates

What if the gold is in the sky? What if it’s at the bottom of the ocean? Out in space?

You’ll never find it by digging. But everyone around you is digging. The best diggers in the world. The most funded diggers. The diggers with PhDs and Nature papers and TED talks. They’re all digging, and they’re all finding something—ore, artifacts, interesting rocks. Progress. Metrics improving quarter over quarter.

Just not gold.

And the kid who asks “what if we should be flying, or swimming?” gets laughed out of the room. Look at all these holes. Look at the benchmarks. The latest model scores better than the last on every eval. That’s progress. That’s science.

Except the evals were designed to measure digging proficiency. Of course better diggers score better on digging tests. That tells you nothing about whether digging is the right activity.

The social proof is airtight. The consensus is overwhelming. And the consensus points in a direction that may never reach the destination everyone claims to be heading toward.


8. The Loop

Here’s what’s both amusing and chilling:

Even if the bubble pops—even if the AI trade unwinds and magazine covers declare an “AI winter”—it won’t teach the right lesson.

Valuation resets don’t trigger paradigm resets. After the dot-com crash, survivors didn’t question the internet. They just got cheaper and rebuilt. Fair enough—the internet wasn’t wrong. Just the prices.

But if autoregressive-as-the-core is wrong? A drawdown doesn’t teach anyone that the architecture is brittle at the destination. It teaches them “we over-invested too fast.” And then what?

Bigger clusters. Bigger context windows. Bigger datasets. Better scaffolding. Better guardrails. “This time it scales.”

Same activity. More conviction. A louder claim that it has to work—because look how much we’ve spent.


9. What Survives

Markets don’t reward truth. They reward survivability.

Every bubble eventually reveals a layer of real value underneath the hype. Dot-com didn’t kill the internet—but it did kill a lot of companies that thought “eyeballs” were a business model. The survivors were the ones that found durable value creation beneath the story.

What survives here?

Not necessarily the current paradigm as the endpoint. If long-horizon reliability and control are the destination, then the winner may be whatever comes next: world models, energy-based approaches, neuromorphic or event-driven compute, analog methods, or something we can’t even name yet.

Because the NVIDIA hegemony doesn’t just fund one approach—it shapes what feels thinkable. The research ecosystem is optimized for transformer-on-GPU work. Funding flows to what GPUs can train. Careers are built on what GPUs can run.

The shovel didn’t just limit the search. The shovel defined what “finding” means.


10. The Only Trade

So what do you do?

You stop digging. Not because you’re certain about next quarter’s stock prices—you can be right or wrong about that and it’s almost beside the point. You stop because you’ve recognized that the activity itself may be misdirected.

This is different from waiting for a pullback. A pullback means you want in at a better price. This is recognizing that no price makes the thesis work. The hole is pointed at bedrock.

Cash isn’t a hedge. Cash isn’t a fear trade. Cash is optionality for a world that doesn’t exist yet—because the current paradigm makes alternatives hard to fund, hard to research, and hard to imagine.

When the real breakthrough comes, it won’t look like a better transformer. It won’t be “just add more GPUs.” It might not even look like “AI” as we’ve defined it. The diggers will dismiss it. “That’s not real AI. Real AI scales with compute.”

And that’s how you’ll know it might be the real thing.

Personally, my gut check is simple: how much duct tape does it need to look intelligent?

Right now, it’s duct tape all the way down. Kludgy as hell.