The engine used to be the car. People bought Mustangs for the engine, Hemis for the engine; the war between V8s and turbos sold magazines. Then engines got reliable. The difference moved elsewhere: cabin, software, range, feel.

The same thing is happening to AI.

For three years the model was the product. Benchmarks were horsepower. Context windows were torque. Every release turned into a spec-sheet war.

That war isn’t ending. It’s splitting.

At one end, frontier models too large and expensive to live anywhere except the cloud. At the other, sufficient models small enough to live where you do. Once the local model is good enough for most tasks, the product is no longer the model. It is the unit around it.

And once the unit runs locally, intelligence stops being something you reach for. It becomes something that is already there.


This is a barbell.

On one end, frontier models keep getting larger, hungrier, more expensive. They live in data centers because they cannot live anywhere else. They get reserved for the work that justifies the cost: novel science, deep reasoning across enormous corpora, ten thousand permutations running in parallel to find the one that holds.

On the other end, the sufficient model. Not equal to the frontier. Close enough that for most tasks the user no longer feels the gap, until the work demands deep reasoning, massive context, or near-zero tolerance for error.

Once the ceiling stops binding, the next gradient is efficiency. The model compresses. It learns to run on hardware that used to be too small for it. First on a laptop. Then on a phone. Then on a thermostat.

The doorbell is the test case. The doorbell that ships next will not stream video to a server to ask who is there. It will know enough locally.


The first time you notice is not in a benchmark. It is when the train loses Wi-Fi and Cleo keeps working.

Cleo lives on my laptop, running on Gemma and Qwen. Biff lives in the cloud, on GPT 5.5. Different families, different homes. I don’t think about which.

Biff handles training. Cleo handles correspondence. Last week Biff flagged a sleep dip and shifted my workouts. Cleo cleared a stack of wedding emails while I was on a call. One synthesizes months of data. The other clears my inbox. Both feel like the same thing.

I don’t ration Cleo. I don’t watch her token use. I don’t think about cost. She just runs.

That last sentence is the regime change. Every assumption the industry currently runs on — pay-per-token, API gates, capability metered like electricity in a 1920s town — depends on intelligence being scarce. The sufficient model dissolves that assumption at the device.


The interesting object is no longer the model. It is what wraps the model. Call it the unit.

This isn’t the cloud harness reframed. That harness lives in someone else’s data center, runs against someone else’s accounts, and disappears when the connection drops. The unit is a portable local operator. It runs on your hardware. It holds your context. It works offline.

A local model does the answering. Memory holds your context across sessions. A dispatcher routes problems the local model can’t handle out to the cloud. An observability layer watches the unit’s behavior and feeds the result back as self-improvement. A compiler lets the unit write its own software when its built-in tools run short. Integrations connect it to the services you already live in. A simulator rehearses actions before the unit commits to them.

Drop this enclosure into a phone, a laptop, a car, a doorbell. The hardware varies. The unit stays the same.
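The enclosure above can be sketched in code. This is a minimal illustration, not a real product's architecture: every name here (`Unit`, `LocalModel`, the `should_escalate` hook) is hypothetical, and the observability, compiler, integration, and simulator layers are elided to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable

class LocalModel:
    """Stand-in for a small on-device model doing the answering."""
    def answer(self, prompt: str, context: list[str]) -> str:
        # Real local inference would run here; we echo for illustration.
        return f"local-answer({prompt})"

def never_escalate(prompt: str) -> bool:
    """Default dispatcher stub: keep everything local."""
    return False

@dataclass
class Unit:
    model: LocalModel = field(default_factory=LocalModel)
    memory: list[str] = field(default_factory=list)          # context across sessions
    should_escalate: Callable[[str], bool] = never_escalate  # routes out to the cloud

    def handle(self, prompt: str) -> str:
        if self.should_escalate(prompt):
            # A real unit would dispatch to a frontier API here.
            return "dispatched-to-cloud"
        answer = self.model.answer(prompt, self.memory)
        self.memory.append(prompt)  # memory persists across calls
        return answer

# The same Unit object would ship unchanged on a phone, laptop, or
# doorbell; only the hardware underneath varies.
unit = Unit()
print(unit.handle("who is at the door?"))
```

The point of the shape is that the hardware-specific parts live outside the object: the unit itself is just the model, the memory, and the hook that decides what leaves the device.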


The cloud doesn’t go away. It changes role.

The cloud becomes the parallelization layer. You go there to fan out across ten thousand sandboxes, or to run agents that hold state across days. The cloud is a feed the local unit checks in on.

The unit is the operator. The cloud is what the operator dispatches to.

The dispatcher is the membrane between the two economies. The local economy is free. The cloud economy is metered. The dispatcher decides which problems pay the toll. A unit that knows what it doesn’t know is worth more than a unit with a bigger model. The dispatcher is where the unit earns its keep.
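The membrane can be made concrete with a toy routing rule. A minimal sketch, assuming the unit can estimate its own confidence and the task's demands; the thresholds and field names (`context_tokens`, `error_tolerance`, `local_confidence`) are invented for illustration, not drawn from any real system.

```python
def dispatch(task: dict, confidence_floor: float = 0.8) -> str:
    """Decide whether a task stays in the free local economy or
    pays the toll to the metered cloud economy."""
    needs_frontier = (
        task.get("context_tokens", 0) > 32_000        # massive context
        or task.get("error_tolerance", 1.0) < 0.01    # near-zero tolerance for error
        or task.get("local_confidence", 1.0) < confidence_floor  # knows what it doesn't know
    )
    return "cloud" if needs_frontier else "local"

print(dispatch({"local_confidence": 0.95}))   # local
print(dispatch({"context_tokens": 200_000}))  # cloud
```

The interesting line is the confidence check: the rule escalates not because the task is big, but because the unit doubts itself, which is exactly the sense in which knowing what it doesn't know is worth more than a bigger model.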


The pattern repeats. Every step change at the frontier becomes a sufficient-model step change once efficiency catches up. The barbell shifts together, with a lag. What sat at the frontier end two years ago sits at the sufficient end now. The ceiling rises. The floor rises faster.

What that compounds into is abundance.

Not abundance of frontier capability — that stays scarce, by physics. Abundance of enough capability. Free, local, ambient. The phone runs a unit. The laptop runs a unit. The car, the doorbell, the thermostat, the speaker, the watch. Most of the AI most people touch will be sufficient and most of it will cost nothing.

The industry is not priced for this. The product layer is not designed for this. The mental model most operators use to reason about AI assumes a meter is running. When the meter stops, the questions invert.

You stop asking how to get the model. You start asking what is worth doing with intelligence you no longer have to budget for.

The model was the engine. The unit is the car. Abundance is the road they were always going to make.