Now — Ethan Yan

Building the HR agent is my main focus — a production recruiting agent that handles real talent workflows end to end, not a demo. The hard part isn't any single task; it's that the requests combine combinatorially, and the old reflex of writing a new tool for every new need is O(number of requests) of effort against an effectively infinite space. You never catch up.

So it runs on the code-action pattern: instead of choosing from a wall of bespoke tools, the agent writes code in a sandbox that calls a small set of stable primitives — looping, filtering, and batching in one script that runs once and returns only the aggregate. "Re-tag two hundred candidates" stops being two hundred model round-trips and becomes a few lines, and the maintenance cost drops from O(requests) to O(primitives). It's a LangGraph runtime kept in-process behind a facade — no separate server, because that complexity wouldn't earn its keep.

Code-action is only one pillar. The rest of the design is about accuracy, latency, cost, and trust — the parts that decide whether it survives production:

Two-tier model routing. Every request is first classified by intent, then by task complexity. The main intent runs on a large model; the workers it spawns run on a mini model — accurate where it counts, faster and far cheaper everywhere else, with lower token spend overall. Genuinely hard work — writing code, generating custom data charts — skips straight to the large model.
Domain-scoped tools, fetched on demand. Tools are bound by task domain, and the agent calls a search tool to pull only the handful a task actually needs — instead of dragging the whole catalogue into every prompt.
Prompt layering for cache. A fixed system prompt that hits the cache, with user-specific context injected dynamically on top — cheap to repeat, still personalised.
Guardrails on both ends. Input is screened for safety, output is filtered, the agent has explicit capability boundaries, and its data access never exceeds the current user's permissions — the sandbox only ever holds their token, so the agent can write anything but never steps past what they're allowed to see.
Verification after the fact. When a task finishes, a separate eval pass checks the result — the agent never grades its own work — with extra guardrails on batch operations and human gates where a hiring decision actually belongs.

Alongside it:

Refining how I design agents — turning hard-won patterns (loop engineering, context engineering, memory, code-action) into a repeatable approach for agents that hold up in production.
On the side: polishing OpenFix, writing the AI-agent engineering essays, and maintaining this site.

This is a now page — a snapshot of the present, not a changelog.

What I'm focused on.