Agent prompts evolved from yelling to specs

I was looking at our agent prompts the other day and was struck by how quickly the way we build the agent has changed.

In 2023, our prompts looked like emergency broadcasts: “ALWAYS LOOK AT RECENT DEPLOYMENTS FIRST!!!” and “JUST GIVE ME JSON, ONLY THE FACTS!!!” We were yelling at the models, pleading for them to just follow basic instructions.

Fast forward to today, and the prompts read more like technical specs with nuanced instructions. The architecture has grown up quite a bit beyond those simple input/output patterns.
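To make the contrast concrete, here is a hypothetical before/after sketch. The wording, section names, and schema rules are illustrative, not our actual prompts:

```python
# Hypothetical before/after sketch of the shift from shouted rules to a
# spec-style prompt. All wording and field names here are illustrative.

PROMPT_2023 = (
    "ALWAYS LOOK AT RECENT DEPLOYMENTS FIRST!!! "
    "JUST GIVE ME JSON, ONLY THE FACTS!!!"
)

PROMPT_SPEC = """
## Investigation order
1. Check deployments from the last 24h before inspecting runtime state.
2. Prefer the logging pipeline; fall back to raw pod logs only when the
   pipeline is unavailable, and state why you fell back.

## Output contract
Return JSON matching the agreed schema. Omit speculation; mark any
unverified claim with "confidence": "low".
"""
```

The second version reads like a spec: ordered procedures, stated preferences with escape hatches, and an explicit output contract instead of all-caps demands.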

Here’s a recent example that made me think: last week, our agent bypassed the logging pipeline and went straight to kubectl logs to inspect a flaky service. When an engineer tried to redirect it, the agent came back with an analysis arguing that kubectl was the better choice because the log streams might contain too much data. It was wrong in this case (we’ve since fixed it), but I found myself momentarily engaged in what felt like a technical debate. Of course, it’s just an LLM following patterns, but still… the nature of our engineering challenges has clearly shifted from “please just work correctly” to handling more sophisticated reasoning paths.

This is an interesting shift in how we build these tools. We’re not dealing with deterministic systems anymore, so we’re spending more time teaching them how to build models of complex environments rather than giving them explicit instructions.

We’ve adapted our engineering organization accordingly.

It’s been a fascinating feedback loop: as the system gets better, we crank up the complexity of our simulations accordingly.

What’s been most effective isn’t piling on more instructions or data sources. Instead, it’s been about developing systems that can formulate testable hypotheses about distributed systems and methodically validate them, much as any of us would troubleshoot a complex issue. No simulation will ever cover all the weird stuff that happens in production (as we all know too well), so we’re optimizing for generalized reasoning patterns instead.
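The hypothesize-then-validate pattern can be sketched in a few lines. This is a toy illustration, not our implementation; the `Hypothesis` type, the observation fields, and the example claims are all invented for the sketch:

```python
from dataclasses import dataclass
from typing import Callable

# Toy sketch of a hypothesize-then-validate loop: propose candidate
# explanations for an incident, then keep only the ones the observed
# data actually supports. Names and thresholds are illustrative.

@dataclass
class Hypothesis:
    claim: str
    check: Callable[[dict], bool]  # True if observations support the claim

def triage(observations: dict, hypotheses: list[Hypothesis]) -> list[str]:
    """Return the claims that survive validation against the observations."""
    return [h.claim for h in hypotheses if h.check(observations)]

obs = {"error_rate": 0.12, "recent_deploy": True, "p99_ms": 900}

candidates = [
    Hypothesis("bad deploy caused the error spike",
               lambda o: o["recent_deploy"] and o["error_rate"] > 0.05),
    Hypothesis("upstream latency, not errors",
               lambda o: o["p99_ms"] > 2000 and o["error_rate"] < 0.01),
]

surviving = triage(obs, candidates)  # only supported claims remain
```

The point of the shape isn’t the toy thresholds; it’s that each explanation is paired with a cheap, explicit check, so the agent discards hypotheses the same way an engineer would.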

The engineering challenge has evolved from writing procedural code to designing systems with more advanced reasoning capabilities. I’ve become a big fan of applying adversarial techniques to agent development; it just ‘fits’, and I think we’ll see more teams adopting similar approaches.
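One hedged sketch of what “adversarial” can mean here: a generator perturbs simulated incident scenarios looking for cases the agent mishandles, and those failures feed back into fixes. Everything below is a stand-in built for the sketch; `agent_diagnose` and `expected` are toy functions, not a real agent or simulator API:

```python
import random

# Sketch of an adversarial loop over a toy simulator: mutate scenarios,
# compare the agent's diagnosis against ground truth, and collect the
# cases it gets wrong. All functions here are illustrative stand-ins.

def agent_diagnose(scenario: dict) -> str:
    # Stand-in for a real agent call. This toy agent only recognizes
    # deploy-related incidents.
    return "bad_deploy" if scenario["recent_deploy"] else "unknown"

def expected(scenario: dict) -> str:
    # Ground truth from the toy simulator.
    return "bad_deploy" if scenario["recent_deploy"] else "network"

def perturb(scenario: dict, rng: random.Random) -> dict:
    # Flip one boolean attribute to probe a nearby, slightly harder case.
    mutated = dict(scenario)
    key = rng.choice(list(mutated))
    if isinstance(mutated[key], bool):
        mutated[key] = not mutated[key]
    return mutated

def find_failures(seed_scenario: dict, trials: int = 50) -> list[dict]:
    rng = random.Random(0)  # deterministic for reproducible red-teaming
    failures = []
    for _ in range(trials):
        s = perturb(seed_scenario, rng)
        if agent_diagnose(s) != expected(s):
            failures.append(s)
    return failures

seed = {"recent_deploy": True, "error_rate": 0.12}
failures = find_failures(seed)  # scenarios the toy agent mishandles
```

In this toy run, every failure is a scenario where the deploy flag was flipped off, exposing the blind spot the agent has for non-deploy incidents; that’s exactly the kind of gap the adversarial loop is meant to surface before production does.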
