A production agent I work on runs as a library: a web service holds a compiled graph in-process, and a small facade exposes it to the chat inbox, the cron jobs, and the workflow hooks. The obvious “grown-up” move was to migrate it onto the framework’s dedicated server runtime: a separate process with its own API. I did the whole migration, including an SDK shim, server-side auth middleware, and a streaming compatibility layer.

Then I rolled it all back and decided we’re not adopting it.

The server runtime solved problems I didn’t have and added ones I did: more processes to deploy, more surfaces to debug, more distance between a bug and its stack trace.

What the server actually buys you

A dedicated agent server is real engineering and genuinely useful — for the right shape of problem. It earns its keep when you need to scale runs independently, isolate tenants, or let many services share one agent over the network. Those are multi-service concerns.

My agent isn’t that. It serves one product, in one process, behind one facade. In-process, the wins are concrete and the server can’t match them: prefix-cache hits across calls, a shared HTTP connection pool, and context propagation across threads without serialization. The facade already gives me the one thing the server was supposed to provide: a clean boundary that the rest of the system calls through.
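To make the in-process wins concrete, here is a minimal sketch of what such a facade could look like. Everything in it is hypothetical: `AgentFacade`, `echo_graph`, and `request_id` are names I made up for illustration, and the "graph" is a plain function standing in for a compiled graph, not any framework's real API. It only shows the shape: one shared worker pool, and request context handed to the worker thread by reference rather than serialized.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# Hypothetical request-scoped context (trace ID, tenant, and so on).
request_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)


@dataclass
class AgentFacade:
    """Single entry point for the inbox, cron jobs, and workflow hooks."""

    graph: Callable[[dict], dict]  # stand-in for the compiled graph
    pool: ThreadPoolExecutor       # one pool shared by every caller

    def run(self, source: str, payload: dict) -> dict:
        # copy_context() snapshots the caller's context variables so the
        # worker thread sees them; nothing crosses a process boundary.
        ctx = contextvars.copy_context()
        state = {"source": source, **payload}
        return self.pool.submit(ctx.run, self.graph, state).result()


def echo_graph(state: dict) -> dict:
    # Assumed toy graph: reads the propagated context and echoes the input.
    return {"reply": f"[{request_id.get()}] {state['source']}: {state['text']}"}


facade = AgentFacade(graph=echo_graph, pool=ThreadPoolExecutor(max_workers=4))
request_id.set("req-42")
result = facade.run("cron", {"text": "nightly summary"})
print(result)
```

The worker thread reads `request_id` without any marshalling step, which is the part a network hop would force you to rebuild by hand.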

Don’t pay for absent complexity

The trap is treating the heavier architecture as the more mature one. By default it isn’t. A server runtime is a bet that you’ll need distribution; if you don’t, you’ve bought deployment and debugging cost for a benefit that never arrives.

The discipline that actually scales is the opposite: keep the runtime embedded until something concrete forces it out — a real isolation boundary, a real scaling wall. Adopting the server is reversible. The complexity it sheds onto everything around it is not. I still read how the server-based harnesses are built and steal their good ideas — I just translate each one into “what does this look like embedded in one process,” and skip the part that assumes a fleet.
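One way to keep that door open without paying for it now is to put the seam in the facade itself. This is a sketch under my own assumptions, not the framework's design: `AgentBackend`, `EmbeddedBackend`, and `Facade` are hypothetical names, and the graph is again a plain callable.

```python
from typing import Callable, Protocol


class AgentBackend(Protocol):
    """Hypothetical seam: anything that can run one agent invocation."""

    def invoke(self, payload: dict) -> dict: ...


class EmbeddedBackend:
    """Runs the graph directly in the caller's process."""

    def __init__(self, graph: Callable[[dict], dict]) -> None:
        self._graph = graph

    def invoke(self, payload: dict) -> dict:
        return self._graph(payload)


class Facade:
    """Callers depend on this class only, never on where the graph runs."""

    def __init__(self, backend: AgentBackend) -> None:
        self._backend = backend

    def handle(self, payload: dict) -> dict:
        return self._backend.invoke(payload)


# Toy graph standing in for the real compiled graph.
facade = Facade(EmbeddedBackend(lambda p: {"echo": p["text"]}))
out = facade.handle({"text": "hi"})
print(out)
```

If a real isolation or scaling wall ever shows up, a remote backend would be one more class behind the same `invoke` signature, and the callers would not need to change.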