Building Gordon: Docker's AI Agent
A behind-the-scenes look at building Gordon - Docker's AI agent. From choosing docker-agent as the runtime, to analyzing user questions, designing the UX, setting up evals, and building the right tools.
Over the past year, I've been part of the team building Gordon - Docker's AI agent. If you've used Docker Desktop recently, you've probably seen it: click the Gordon icon in the sidebar or run docker ai in your terminal, and you get an agent that actually understands your containers, your images, and your environment. It doesn't just answer questions - it takes action. But building an AI agent that millions of developers trust with their code, containers, images, Compose files, builds, and CI pipelines wasn't straightforward. This is the story of how we built it - the decisions we made, the things we got wrong, and what we learned along the way.

Gordon v1

The first version of Gordon was built before most of the agentic tooling we have today existed. Gordon has been powering Docker's AI experience since the beginning - on docs.docker.com, in support, and inside Docker Desktop.

We wrote the initial agentic loop using LangGraph, wired up a RAG system over Docker's documentation so Gordon could answer questions grounded in real content, and built what we called "recipes" - deterministic code paths that handled specific tasks like generating a Dockerfile or debugging a container. Think of recipes as the predecessor to MCP and tool calling, but entirely custom. Each recipe was a handcrafted flow: detect the user's intent, gather the right context, and execute a sequence of steps that we knew worked.

It shipped. People used it. And we learned a ton - what users actually needed, where the LLM struggled, and how brittle deterministic flows become when you try to cover every edge case. We were also building on GPT-4o-era models - capable, but a far cry from what's available now.
