The AI industry has spent the last five years trying to make compute cheaper through software. Quantization. Pruning. Distillation. Sparse attention. Flash attention. Mixture of experts. Each technique squeezing incremental efficiency out of hardware that was never designed for this workload in the first place.

The results are impressive. They are also, fundamentally, band-aids. We are optimizing around a problem we haven't named.

The problem is not that our algorithms are inefficient. The problem is that we are running the wrong physics.

A GPU computes by moving electrons through logic gates — billions of times per second, at enormous energy cost. The precision is perfect. The energy accounting is brutal. A single multiply-accumulate operation consumes around ten picojoules. A single forward pass through a large model requires hundreds of billions of them. The math is unforgiving.
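To see how unforgiving, run the numbers quoted above. This is a back-of-the-envelope sketch in Python, assuming roughly ten picojoules per multiply-accumulate and a couple hundred billion of them per forward pass; both figures are order-of-magnitude estimates, not measurements.

```python
# Back-of-the-envelope inference energy, using the order-of-magnitude
# figures quoted above (assumed, not measured).
PJ = 1e-12                       # one picojoule, in joules

energy_per_mac = 10 * PJ         # ~10 pJ per multiply-accumulate
macs_per_pass = 2e11             # "hundreds of billions" of MACs per forward pass

energy_per_pass = energy_per_mac * macs_per_pass
print(f"~{energy_per_pass:.1f} J per forward pass")                  # ~2 J

# A 1,000-token response repeats the forward pass roughly once per token.
tokens = 1_000
print(f"~{energy_per_pass * tokens / 1e3:.1f} kJ per {tokens}-token response")
```

Two joules per pass sounds modest until it is multiplied by every token of every request.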

What makes this particularly wasteful is what we're actually computing. Neural network inference is, at its core, a probabilistic operation. We're not solving differential equations to twelve decimal places. We're sampling from a distribution, estimating a score, finding a likely configuration. The answer doesn't need to be exact. It needs to be right.

The simulation tax

Here is the thing nobody says out loud: most of what a GPU does during AI inference is simulate physics. Diffusion models literally run the reverse of a physical diffusion process. Score-based generative models estimate the gradient of the log of a probability density — the same gradient that governs how particles move in a thermal system. The computation we are performing on silicon is a mathematical description of something nature does for free.
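The correspondence is not a metaphor. The Langevin update that describes a particle relaxing in a thermal potential is, line for line, the update a score-based sampler executes in floating point. Here is a minimal sketch; the Gaussian target, step size, and iteration count are illustrative choices, and a real model would replace the analytic score with a trained network.

```python
import numpy as np

# Score of a standard Gaussian target: grad log p(x) = -x.
# A score-based model learns this gradient with a network; a thermal system
# feels it directly, as the force pulling a particle toward equilibrium.
def score(x):
    return -x

rng = np.random.default_rng(0)
x = rng.normal(size=10_000) * 5.0    # start far from the target distribution
step = 0.01

# Langevin dynamics: drift along the score, plus injected thermal noise.
# On a GPU, every iteration is paid for in multiply-accumulates.
for _ in range(2_000):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)

print(f"sample mean {x.mean():+.3f}, std {x.std():.3f}")   # ~0 and ~1
```

A physical system running the same dynamics reaches the same distribution with no arithmetic at all; that is the sense in which silicon pays to describe what nature does for free.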

We call this the simulation tax. Every operation that models a physical process on digital hardware pays it. The tax is not small. The gap between the energy required to simulate a physical process and the energy required to run it — actually run it, on a substrate that obeys the right physics natively — is measured in orders of magnitude.

Landauer's principle gives us the theoretical floor: erasing one bit of information at room temperature costs at minimum 0.018 electron volts, or about three zeptojoules. Current hardware operates roughly ten billion times above this limit. That is not an engineering gap. That is a physics gap.
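The floor is one line of arithmetic: Landauer's bound is k·T·ln 2 per erased bit. A quick check of the figures above, taking room temperature as 300 K:

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                   # room temperature, K
EV = 1.602176634e-19        # one electron volt, in joules

landauer = K_B * T * math.log(2)        # minimum energy to erase one bit
print(f"{landauer / 1e-21:.1f} zJ, or {landauer / EV:.3f} eV per bit")   # ~2.9 zJ, ~0.018 eV

# Compare against the ~10 pJ multiply-accumulate quoted earlier: even a single
# bit erasure sits billions of times below it, before counting the many bit
# operations inside one MAC.
mac_energy = 10e-12
print(f"one MAC is ~{mac_energy / landauer:.1e}x the single-bit floor")
```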

What software optimization misses

Software optimization works within the constraints of the substrate. It cannot change the substrate. Quantization reduces precision — it does not change the fact that each reduced-precision operation still moves electrons through logic gates. Sparsity skips operations — it does not change the energy cost of the operations that remain. Distillation makes models smaller — it does not change what the hardware is.
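A toy model of that ceiling makes the point. The per-operation energies below are assumed, purely illustrative values for a digital substrate; the structure of the calculation is what matters, because every row is still electrons moving through logic gates.

```python
# Assumed, illustrative per-MAC energies (picojoules) on one digital substrate.
ENERGY_PER_MAC_PJ = {"fp32": 10.0, "fp16": 4.0, "int8": 1.5, "int4": 0.8}

macs_per_pass = 2e11      # order-of-magnitude MAC count for one forward pass
kept_fraction = 0.5       # fraction of operations that survive pruning/sparsity

for precision, pj in ENERGY_PER_MAC_PJ.items():
    joules = macs_per_pass * kept_fraction * pj * 1e-12
    print(f"{precision}: ~{joules:.2f} J per forward pass")

# Quantization shrinks pj, sparsity shrinks kept_fraction, distillation shrinks
# macs_per_pass. None of them touch the substrate that sets the per-gate cost.
```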

These techniques are not wrong. They are necessary given current hardware. But they share a common ceiling: they cannot surpass the efficiency of the substrate they run on. And the substrate is wrong.

The correct question is not how to make digital hardware run AI faster. It is what physical substrate natively computes what AI actually needs — and whether we can build it.

The substrate question

AI is not special. The computations it performs — sampling, integration, score estimation, probabilistic inference — have natural physical analogs. Resistor networks perform matrix-vector multiplication passively, at femtojoule cost, as a consequence of Ohm's law and Kirchhoff's current law. Thermal systems sample from Boltzmann distributions natively, without any computation at all. The physics is already doing the work. We are just not using it.
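A simulated sketch of the resistor-network case: store the weights as conductances, apply the inputs as voltages, and the matrix-vector product appears as currents, Ohm's law at each crossing and Kirchhoff's current law on each output wire. The array size and values here are arbitrary, and a real analog crossbar adds noise, device nonlinearity, and readout cost that this toy ignores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Weights as conductances G (siemens), inputs as voltages v (volts).
G = rng.uniform(1e-6, 1e-5, size=(4, 3))   # 4 output wires x 3 input wires
v = rng.uniform(0.0, 0.5, size=3)

# Ohm's law at each cell:        i_jk = G[j, k] * v[k]
# Kirchhoff's law on each wire:  I[j] = sum_k G[j, k] * v[k]
I = G @ v                                   # the matrix-vector product, as currents

print(I)  # output currents in amperes; digitally, this line costs 12 MACs
```

In the physical array the sum happens because charge is conserved; in the simulation above it happens because we paid for the multiplications.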

The history of computing has always moved toward closer alignment between the substrate and the computation. Digital computers were not faster mechanical calculators — they were a different substrate entirely, matched to a different class of problems. The next shift will not come from making digital computers faster. It will come from recognizing that some problems belong to a different substrate.

AI inference and training — at least the class of problems defined by probabilistic generative modeling — belong to physics. The question is who builds the hardware that lets them run there.

That is what we are working on.