Building the HR agent is my main focus — a production recruiting agent that handles real talent workflows end to end, not a demo. The hard part isn't any single task; it's that the requests combine combinatorially, and the old reflex of writing a new tool for every new need is O(number of requests) of effort against an effectively infinite space. You never catch up.
So it runs on the code-action pattern: instead of choosing from a wall of bespoke tools, the agent writes code in a sandbox that calls a small set of stable primitives — looping, filtering, and batching in one script that runs once and returns only the aggregate. "Re-tag two hundred candidates" stops being two hundred model round-trips and becomes a few lines, and the maintenance cost drops from O(requests) to O(primitives). It's a LangGraph runtime kept in-process behind a facade — no separate server, because that complexity wouldn't earn its keep.
Code-action is only one pillar. The rest of the design is about accuracy, latency, cost, and trust — the parts that decide whether it survives production:
- Two-tier model routing. Every request is first classified by intent, then by task complexity. The main intent runs on a large model; the workers it spawns run on a mini model — accurate where it counts, faster and far cheaper everywhere else, with lower token spend overall. Genuinely hard work — writing code, generating custom data charts — skips straight to the large model.
- Domain-scoped tools, fetched on demand. Tools are bound by task domain, and the agent calls a search tool to pull only the handful a task actually needs — instead of dragging the whole catalogue into every prompt.
- Prompt layering for cache. A fixed system prompt that hits the cache, with user-specific context injected dynamically on top — cheap to repeat, still personalised.
- Guardrails on both ends. Input is screened for safety, output is filtered, the agent has explicit capability boundaries, and its data access never exceeds the current user's permissions — the sandbox only ever holds their token, so the agent can write anything but never steps past what they're allowed to see.
- Verification after the fact. When a task finishes, a separate eval pass checks the result — the agent never grades its own work — with extra guardrails on batch operations and human gates where a hiring decision actually belongs.
Alongside it:
- Refining how I design agents — turning hard-won patterns (loop engineering, context engineering, memory, code-action) into a repeatable approach for agents that hold up in production.
- On the side: polishing OpenFix, writing the AI-agent engineering essays, and maintaining this site.
This is a now page — a snapshot of the present, not a changelog.