Self-Driving Software

An engineer working with a coding agent today doesn’t move through stages. There are no stages. There’s intent, there’s generation, and there’s observation. The agent receives context. It produces software. The software either works or it doesn’t. If it doesn’t, the loop tightens and runs again.

I’ve watched engineers who started after Cursor launched try to explain what they do. They can’t map it to the SDLC because they never learned it. They never sat through sprint planning. They never estimated story points. They never waited three days for a PR review. They just build things. Describe what you want. The agent writes it. You look at it. You iterate. You ship.

And the software they produce is fine.

The stages that defined how software gets built for thirty years didn’t get faster. They merged. Then most of them disappeared. Software is learning to drive itself. At Level 2, your hands are still on the wheel. At Level 5, you’re choosing the destination. The mechanics of getting there are completely consumed.

But, what happens to the people in the car?

The Two Collapses

The first collapse: four roles become one.

Designer, engineer, product manager, and data scientist existed as separate disciplines because each required deep specialization and the handoffs between them were expensive. The designer prototyped in Figma. The engineer implemented in code. The PM specified in slides and docs. The data scientist built models in notebooks. Four roles, four tools, four artifacts. By the time a product ships, the original intent has been through four rounds of telephone.

Agents eliminate the translation. When one person can describe a UI and the agent renders it, build a data pipeline, iterate on business logic, and ship a working prototype in the same session, the four disciplines collapse into one. Call this person a Builder. Say what you want, look at what the agent made, say what’s wrong, go again.

I’ve always been a toolmaker — product instinct, design sense, an impatient relationship with engineering. Last month I built me and my fiance a wedding website. Personalized RSVPs, per-guest itineraries, dietary restriction handling, travel logistics, plus-one management. A design problem, a data problem, a product problem, and an engineering problem. A year ago it’s four people and a timeline. I did it in a weekend. Perfectly custom software. The boundaries between those disciplines stopped mattering when the agent could move between them faster than I could write a brief.

The second collapse: the Builder falls away.

As agents improve, you move upstream. First you’re telling the agent to do a task. Then you’re asking it to accomplish a thing. Then you’re pointing it at a goal. Then you’re describing what needs to be true in the world, and the agent works backward from there.

At each step, more of the Builder’s work gets absorbed by the model. The execution migrates into the machine, the same way typesetting migrated into desktop publishing, the same way drafting migrated into CAD. The tool internalized the knowledge.

What remains is someone I’d call a Producer. Think of a reality TV producer. They don’t perform. They don’t operate the cameras. They create the conditions where the right thing happens, ensure it gets captured, and make the calls about what stays and what gets cut. The Producer seeks an outcome. They marshal the resources, manage the distribution, and hold the contact point where capability meets need.

In ninety days I’ve shipped over forty applications. A friend started a GLP-1 medication and was anxious about understanding the patterns of their symptoms, so I built them a voice-driven tracker that uses AI to surface correlations they’d never catch in a journal. I’m chasing a marathon time my current training approach won’t get me to alone, so I built a coaching skill that pulls my biometric and sleep data and adapts the plan when reality diverges from the schedule. A wine social network. A personalized markdown viewer built around craft typography. A photographers-only sharing site. Dozens of reusable skills — deployment incantations, design systems, data persistence patterns.

Each of these existed because someone needed something to be true in the world. I don’t know what my CloudFormation templates do. When the wine app broke in production, the agent fixed it faster than I could have found the log file. Some of these apps have stuck. Some have hardcore daily users. Some have people paying for them. Not all. The value is in the velocity: enough shots on goal that the ones which connect matter.

The execution decisions were the agent’s. Whether the outcome was right, whether the product was serving people, whether it was worth existing: those were mine.

After the Doublings

The autonomous task horizon for frontier agents has been doubling since 2023, according to METR’s benchmark research. Through 2025, the horizon at the 50% reliability threshold doubled roughly every seven months. The latest data: four months. Three and a half minutes when GPT-4 launched. Fourteen and a half hours today. The curve is steepening.

Take the marathon coaching skill. Today, I tell the agent to research the science: periodization, taper protocols, heart rate variability, sleep-performance correlations. Distill it. Use that as the framework for adapting a training plan. Here’s my biometric data, my sleep history, my race goals. Here’s the outcome: get me to the start line healthy, finish under 3:30. Find what’s relevant. Structure the plan however the science supports.

A few doublings in, the system handles the full loop. It ingests the science, adapts to biometric data, detects patterns across users, proposes changes to the training logic and the interface, tests them, measures retention, iterates. I’m not evaluating individual coaching decisions anymore. I’m not curating. The system does that.

What I’m doing is asking whether the marathon coach should exist for ultramarathon runners. Whether there’s a version for physical therapy patients rehabbing a knee. Whether a partnership with Garmin changes the data picture enough to justify the integration. Whether the product is reaching the people who need it.

That’s the Producer at full resolution. Not managing the product. Managing the outcome. The system builds, measures, and improves. The Producer decides what should exist in the world, ensures it reaches people, and knows when it’s done.

Software used to be built by humans and monitored by machines. Now it’s built by machines and curated by humans. Soon the curation will be absorbed too. What remains is the decision about what should exist at all.

For Now

Everything I’ve described follows the autonomy curve toward Level 5. At Level 5, the car drives itself. The human chooses the destination.

So: when does the Producer get subsumed? If agents can research, build, distribute, analyze, and improve — creating the conditions for outcomes as capably as they write code — what’s left? Maybe the Producer doesn’t get subsumed. Maybe the role recedes until the only remaining act is deciding what should exist, which is less a job than an expression of values. Or maybe that gets absorbed too.

I’m trying to find out. Every day I try to move further upstream. Every task I touch, I ask: why am I doing this? Why isn’t the AI doing it for me? Not as a thought experiment. As pressure. To make the future arrive faster, to find the edges of what agents can’t yet do by refusing to accept the current boundary.

The boundary keeps moving. The list of things I do that agents can’t is shorter than it was last month. It’ll be shorter again next month.

The software is driving itself. For now, someone still has to know where it should go.