Surfaces

Pattern 23 of 26

Headless and CI Agents

Scales with commits, not headcount

Agents running in CI pipelines, cron jobs, and background processes with no human watching. Code review bots, automated refactoring, dependency updates, test generation. They run on every PR, every commit, around the clock. This is where agents stop being tools and start being infrastructure, and the operational requirements change to match.

Why it matters

Headless agents scale with your commit volume, not your headcount. A team of three can have the review coverage of a team of ten if the agent is reliable. That changes what small teams can actually ship.

Deep Dive

Headless agents run without a UI and without a human in the loop: in CI pipelines, triggered by webhooks, or embedded in automated workflows. Claude Code's -p flag makes this explicit, running a single prompt non-interactively and exiting when done. I use this for things like automated documentation updates and PR description generation on this site. GitHub Copilot Autofix runs as a bot on every security alert. GitLab Duo runs on every merge request. The output is not impressive in any single instance. The leverage comes from the fact that it runs every time without anyone having to remember to do it.
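As a rough sketch of what calling the -p flag from a CI script might look like, the wrapper below shells out to the CLI and turns every outcome into a structured result. It assumes the `claude` executable is on the PATH. The `HeadlessResult` type, the `run_headless` name, the 300-second timeout, and the injectable `runner` parameter are illustrative choices, not part of Claude Code.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class HeadlessResult:
    ok: bool
    output: str
    error: str


def build_command(prompt: str) -> list:
    # `claude -p <prompt>` runs a single prompt non-interactively and exits.
    return ["claude", "-p", prompt]


def run_headless(prompt: str, timeout_s: int = 300, runner=subprocess.run) -> HeadlessResult:
    """Run one prompt with no human in the loop and never raise to the caller.

    `runner` is injectable (hypothetical design choice) so the wrapper can be
    tested without the real CLI installed.
    """
    try:
        proc = runner(
            build_command(prompt),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return HeadlessResult(False, "", f"timed out after {timeout_s}s")
    except FileNotFoundError:
        return HeadlessResult(False, "", "claude CLI not found on PATH")

    if proc.returncode != 0:
        # Surface the failure instead of passing partial output downstream.
        message = (proc.stderr or "").strip() or f"exit code {proc.returncode}"
        return HeadlessResult(False, proc.stdout or "", message)
    return HeadlessResult(True, (proc.stdout or "").strip(), "")
```

A pipeline step could call `run_headless("Write a PR description for the staged diff")` and fail the job whenever `ok` is false, so a broken run is visible rather than silently skipped.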

The operational requirements for headless agents differ from interactive ones in one important way: the agent cannot ask questions. If the task is ambiguous, it has two acceptable options: make the most reasonable interpretation and document it, or fail cleanly and report why. Silent failure and execution on a wrong assumption are both worse outcomes than a clean error. A well-designed CLAUDE.md combined with Claude Code's -p flag gives the agent the structure it needs to make those choices consistently. The instruction file is doing a lot of work here. An agent with a vague CLAUDE.md in headless mode is essentially guessing.
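The "interpret and document, or fail cleanly" rule can be made concrete with a small decision function. The sketch below is an illustration only: the changelog scenario, the `Decision` type, the root-level tie-break, and exit code 2 are all assumptions made for the example, not anything Claude Code prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Decision:
    target: Optional[str] = None       # file the agent will act on
    assumption: Optional[str] = None   # recorded when the agent had to interpret
    error: Optional[str] = None        # set when the agent refuses to guess

    @property
    def exit_code(self) -> int:
        # Non-zero exit lets the CI job fail visibly (2 is an arbitrary choice).
        return 0 if self.error is None else 2


def pick_changelog(paths: Sequence[str]) -> Decision:
    """Choose which changelog to update without asking anyone."""
    if not paths:
        return Decision(error="no changelog found; nothing was modified")
    if len(paths) == 1:
        return Decision(target=paths[0])

    # Hypothetical tie-break: a single root-level file is the most
    # reasonable reading, so proceed but write the assumption down.
    root_level = [p for p in paths if "/" not in p]
    if len(root_level) == 1:
        return Decision(
            target=root_level[0],
            assumption=f"{len(paths)} changelogs exist; chose root-level {root_level[0]}",
        )

    # No defensible interpretation: fail cleanly and say why.
    return Decision(error="ambiguous: candidates are " + ", ".join(paths))
```

The point of the shape is that every path ends in one of three explicit states: an unambiguous target, a target plus a logged assumption, or an error with a reason. None of them is silent.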

Security is the uncomfortable part of headless agents with write access. An agent that can push code, merge PRs, or modify configuration without human review is a meaningful attack surface. The approach that works: sandbox the execution environment, scope permissions to the minimum needed, require human review for high-risk actions like deletes and deploys, and implement rate limiting so a compromised agent cannot cause damage at machine speed. GitHub Copilot Autofix sits at the conservative end of this spectrum by only suggesting fixes rather than applying them. Full autonomous deployment agents sit at the other end. Most production uses are somewhere in between, and where exactly you land should be determined by how reversible the actions are.
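Three of those controls (minimal permissions, human review for high-risk actions, and rate limiting) can be sketched as a single gate that every agent action passes through. This is a minimal illustration under stated assumptions: the action names, the `ActionGate` class, and the sliding-window limiter are hypothetical, and sandboxing itself has to happen at the environment level rather than in code like this.

```python
import time
from collections import deque

# Hypothetical action names; a real deployment would define its own.
ALLOWED = {"comment", "open_pr", "push_branch"}   # minimum the agent needs
NEEDS_REVIEW = {"delete_branch", "deploy"}        # hard to reverse


class ActionGate:
    """Checks each requested action against scope, review, and rate limits."""

    def __init__(self, max_actions: int, window_s: float, clock=time.monotonic):
        self.max_actions = max_actions
        self.window_s = window_s
        self.clock = clock          # injectable so tests can control time
        self._recent = deque()      # timestamps of recently allowed actions

    def check(self, action: str) -> str:
        if action in NEEDS_REVIEW:
            return "needs_review"   # hand off to a human, do not execute
        if action not in ALLOWED:
            return "denied"         # outside the agent's scoped permissions

        now = self.clock()
        # Drop timestamps that have aged out of the sliding window.
        while self._recent and now - self._recent[0] >= self.window_s:
            self._recent.popleft()
        if len(self._recent) >= self.max_actions:
            return "rate_limited"   # caps damage at machine speed
        self._recent.append(now)
        return "allowed"
```

Which actions belong in the review set follows the reversibility argument above: a comment can be deleted later, while a deploy or a branch deletion may not be easy to undo.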

In the Wild

Claude Code -p flag (headless/CI)
Codex on GitHub
GitHub Copilot Autofix
GitLab Duo

Go Deeper

Article: Enabling Claude Code to Work More Autonomously
Docs: Claude Code Headless Mode
Article: Found Means Fixed: Introducing Code Scanning Autofix
Article: Making Claude Code More Secure and Autonomous

Related Patterns