Where Software is Going — Luke Swithenbank

AI made generating code cheap. The disciplines that used to be optional, the harness and knowing what good looks like, are now the whole game.

Everyone is watching the models. I think the model is the least interesting part.

The engineers getting the most out of AI aren’t doing anything new. They’re doing what good engineers have always done: building the harness around the model and knowing what good looks like. That used to be optional. It isn’t anymore.

Addy Osmani’s whitepaper on the new SDLC is the best articulation I’ve seen of why. I agree with a lot of it, but it’s nuanced, so here’s where I land.

An agent is more harness than model

Start with the harness. Addy defines it as everything except the model, which is true, and I’d explicitly count the context and the tools as part of it. The tools a model uses drastically improve its output. It’s much easier to string together powerful tools than to do the lower-level work yourself, the same way a scripting language gets you further, faster, than a low-level compiled one.

This is just platform work, and it’s how I’ve always approached it. Building good harnesses into the SDLC, even before AI, was the biggest contributor to the speed and quality of a codebase. That meant the right gates in CI and the right tools for the engineers. It keeps mattering after the code ships, too: the right observability and debugging tools are what let people resolve issues quickly when they need to.
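To make "gates" concrete, here is a minimal sketch of a harness check in Python. It is an illustration under stated assumptions, not a description of any particular CI system: the `Gate` class, the `run_gates` helper, the gate names, and the 400-line threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Gate:
    """One named check that a change must pass before it can merge."""
    name: str
    check: Callable[[Dict[str, int]], bool]


def run_gates(change: Dict[str, int], gates: List[Gate]) -> Tuple[bool, List[str]]:
    """Run every gate and return (all_passed, names_of_failed_gates)."""
    failed = [gate.name for gate in gates if not gate.check(change)]
    return (not failed, failed)


# Hypothetical gates and thresholds, for illustration only.
GATES = [
    Gate("tests_pass", lambda c: c["tests_failed"] == 0),
    Gate("lint_clean", lambda c: c["lint_errors"] == 0),
    Gate("diff_reviewable", lambda c: c["lines_changed"] <= 400),
]

change = {"tests_failed": 0, "lint_errors": 2, "lines_changed": 120}
ok, failed = run_gates(change, GATES)
print(ok, failed)  # False ['lint_clean']
```

The point of the shape is that the gates are data: the same list can run against a human's change or an agent's, and adding a gate doesn't touch the runner.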

None of that is new. There’s just more riding on it now.

Context engineering is the cost knob

Context engineering is the knob that sets your bill. I hadn’t thought of it as static context versus dynamic context before, but that split makes a lot of sense.

For myself, I put so much into static context that it then gets overrun, which isn’t great for my bill. It does mean I understand exactly what’s going on, though. I’ve been toying with a friend’s knowledge base (equalseat.ai) and I’m hoping it takes care of a lot of the dynamically retrieved documents.
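A back-of-the-envelope sketch shows why static context hits the bill so hard. It assumes the static context is resent on every turn and deliberately ignores output tokens, prompt caching, and growing conversation history. The function name, token counts, and price are all made up for the comparison.

```python
def input_cost(static_tokens: int, dynamic_tokens_per_turn: int,
               turns: int, price_per_million: float) -> float:
    """Estimate input-token spend when the static context is resent every turn.

    Deliberately simplified: ignores output tokens, prompt caching,
    and conversation history that grows over the session.
    """
    tokens_per_turn = static_tokens + dynamic_tokens_per_turn
    return tokens_per_turn * turns * price_per_million / 1_000_000


# All figures below are hypothetical.
heavy = input_cost(static_tokens=50_000, dynamic_tokens_per_turn=2_000,
                   turns=40, price_per_million=3.0)
lean = input_cost(static_tokens=5_000, dynamic_tokens_per_turn=2_000,
                  turns=40, price_per_million=3.0)
print(f"heavy static context: {heavy:.2f}")
print(f"lean static context:  {lean:.2f}")
```

With these invented numbers the heavy setup costs roughly seven times the lean one for the same 40 turns, because the static part is paid for on every single turn.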

I could do a better job of using skills as I need them. A lot of my tasks aren’t as repeatable as I’d like, so I often don’t write them down as explicit skills. Maybe that means I should compose things better, but it’s something I just don’t do that often. Maybe I should run the Claude usage analytics?

Verification is the line

Verification is the line between vibe coding and engineering. I watch the AI work so I can verify what it’s done. I don’t put much time into formal specs or evals to check it’s done the right thing, beyond code review and CI/CD gates.

When most of the work is creative, you have to know what good looks like, and I learn that by seeing results and forming opinions on them. A lot of people download skills off the internet to get good results quickly, because those skills bake in what good looks like. They don’t have to learn it all from scratch, and they get to leverage the work of others.

This is really about evals: checking the AI is moving in the right direction. I need to do this more. You have to be able to measure the output as well as how it got there. The same idea holds outside software; it’s just harder to measure guardrails there. It’s still possible, but you have to know what good looks like there too, and actually think about it.
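For a sense of scale, an eval can start very small. The sketch below is one hypothetical shape: each case pairs a prompt with a checker that encodes what good looks like, and the score is the pass rate. `run_evals`, `fake_agent`, and the cases are invented for illustration, and it only scores the final output, not how the agent got there.

```python
from typing import Callable, List, Tuple

# An eval case pairs a prompt with a checker that encodes "what good looks like".
Case = Tuple[str, Callable[[str], bool]]


def run_evals(generate: Callable[[str], str], cases: List[Case]) -> float:
    """Return the fraction of cases whose generated output passes its checker."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, is_good in cases if is_good(generate(prompt)))
    return passed / len(cases)


def fake_agent(prompt: str) -> str:
    """Stand-in for a real model call; it just upper-cases the prompt."""
    return prompt.upper()


CASES: List[Case] = [
    ("hello", lambda out: out == "HELLO"),  # passes
    ("abc", lambda out: out.islower()),     # fails: output is "ABC"
]

score = run_evals(fake_agent, CASES)
print(score)  # 0.5
```

Swapping `fake_agent` for a real model call and tracking the score over time is what turns this into a benchmark. The hard part is still writing checkers that capture good output.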

AI compresses the SDLC

Requirements, architecture and verification stay slow, while generating the code is now much, much faster.

I haven’t found that requirements double as a spec and a prototype in my own work, but I think that’s a great idea and it makes a lot of sense. Architecture is by far the biggest thing the model can’t do well, and the part we have to weigh in on. Implementation speeds up sometimes, but if there isn’t a good harness or good reviewing, things slow down immensely while the system races to accommodate generating that much code.

I haven’t really wired evals up properly, so again, that’s something I need to get better at, so that everything runs against a benchmark. You can tell that TDD is how these tools want to work. The maintenance angle is fair too: code that was too risky to touch becomes workable once an LLM can understand it and write tests for it. That doesn’t mean we should touch things we don’t understand.

The economics

All of it lands on one decision, and it’s the part of Addy’s article I refer back to most. There’s a point where agentic engineering makes sense, and a point where vibe coding does. It just depends on maintenance.

I see this often at the moment. Most people don’t need full engineering, and vibe coding works beautifully. Their app is small and only needs to run for a handful of people. It’s like a shared Excel spreadsheet: it does the job and nothing more. It’s when things get popular and need proper maintenance that it gets hectic. Context engineering and model routing are financial levers, and you have to pull them as things heat up. If you don’t, it bites, fast.
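Model routing as a lever can be sketched in a few lines. This is a toy, not a recommendation: the tier names `small-model` and `large-model`, the task fields, and the 50,000-token threshold are all hypothetical.

```python
from typing import Dict, Union

Task = Dict[str, Union[int, bool]]


def route(task: Task) -> str:
    """Pick a model tier from rough task traits (thresholds are made up)."""
    needs_big = (
        bool(task.get("touches_architecture", False))
        or int(task.get("context_tokens", 0)) > 50_000
    )
    return "large-model" if needs_big else "small-model"


print(route({"context_tokens": 3_000}))                                # small-model
print(route({"context_tokens": 3_000, "touches_architecture": True}))  # large-model
print(route({"context_tokens": 80_000}))                               # large-model
```

The default is the cheap tier, and a task has to earn the expensive one. That default is what keeps the bill in check as usage grows.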

Managing teams of agents

Addy spruiks an agents-cli command that lets you build your own prototype managed agent. I think that’s nice, but it’s not foundational.

What’s foundational is humans managing teams of agents. We like creating hierarchies, and this is the same move. He’s spot on that going from conductor to orchestrator is a skills shift. I thought I was an orchestrator. I don’t think I am. I can’t handle more than 10 agents at a time without a lot of help.

What actually changed

So where is software going? Not toward magic, and not toward whoever has the best model. It’s going toward the people who know what good looks like and can build the systems that enforce it.

The harness and knowing what good looks like were always the job. AI didn’t change that. It just made them much harder to skip.

I’m not all the way there myself. I still haven’t wired my evals up properly, and I cap out around 10 agents before I need help. But that’s the stuff to get good at now. Not prompting. It’s the same platform work I’ve been doing for years; it just matters more now.