AGENTS.md: The Configuration Standard Quietly Reshaping Coding Agents

A README written for agents instead of humans sounded obvious. The empirical results are not.

In August 2025, OpenAI shipped a small convention as part of its Codex tooling: a Markdown file named AGENTS.md, sitting next to a project’s README, telling coding agents how to work in that codebase. It was a minor release at the time. By December 2025 it had been adopted by more than sixty thousand public GitHub repositories and was read natively by virtually every serious coding agent: Codex, Claude Code (via the closely related CLAUDE.md), Cursor, Aider, Devin, GitHub Copilot, Gemini CLI, Windsurf, Amazon Q, and dozens of others. On December 9, 2025, OpenAI co-founded the Agentic AI Foundation under the Linux Foundation and donated AGENTS.md to it alongside Anthropic’s MCP and Block’s Goose framework. The standard now sits under the same kind of vendor-neutral foundation governance as the Linux kernel and Kubernetes.

Then, in February 2026, a team at ETH Zurich and LogicStar.ai published the first rigorous empirical study of whether AGENTS.md files actually help. The finding, in one sentence: most of them don’t, and the LLM-generated ones actively hurt. The study has rattled the standard’s most enthusiastic users because it contradicts the operating recommendation that ships with most coding-agent tooling. It is also the most useful guidance the AGENTS.md community has produced to date.

This post is the practical synthesis: what AGENTS.md is, what the empirical research found, what to include and leave out, how the OpenAI monorepo pattern works, and how AGENTS.md fits with MCP, skills, and tool definitions.

What AGENTS.md is and why it suddenly matters

AGENTS.md is plain Markdown. The file sits at the root of a repository (or anywhere in a subtree), and coding agents read it before doing work in that scope. It is not a schema. It is not parsed structurally. The agent ingests the whole file as part of its context and uses the contents to inform its behavior: which commands to run for building and testing, which conventions to follow, which directories to avoid, which security boundaries to respect. When files are nested, the AGENTS.md closest to the file being edited takes precedence wherever instructions conflict; explicit user instructions override any file.

The standard exists because coding agents face a specific problem. A foundation model is broadly competent at writing code in any language and any framework, but it has no way of knowing the project-specific things competent humans pick up by working in a codebase for a week. Which test command actually runs. Which linter the team uses. Which directories are generated and should not be edited. Which database the project actually uses (the README often lies, the lockfile tells the truth). AGENTS.md is a place to put those things.

The adoption numbers and governance shift are the reason the standard now matters beyond the agent-tooling community.

Milestone	Date	Source
OpenAI ships AGENTS.md with Codex	August 2025	OpenAI
Reaches 60,000+ public repositories	by December 2025	Linux Foundation press release
Donated to Agentic AI Foundation	December 9, 2025	Linux Foundation
ETH Zurich empirical study published	February 12, 2026	arxiv.org/abs/2602.11988

The Linux Foundation donation settled whether AGENTS.md would remain an OpenAI-specific convention or become a true industry standard. Anthropic’s CLAUDE.md was already convergent in syntax; Cursor’s .cursorrules, Aider’s CONVENTIONS.md, and various other variants were being quietly unified around AGENTS.md as the canonical name. The donation made that explicit. As of early 2026, a single AGENTS.md file in a repository is read by virtually every major coding agent. That cross-vendor compatibility is the property that makes the format genuinely valuable.

What the ETH Zurich study actually found

The study, by Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev, built a new benchmark called AGENTbench: one hundred thirty-eight real-world Python software-engineering tasks drawn from twelve niche repositories that all had developer-committed context files. The tasks were derived from over five thousand real GitHub pull requests. The team also ran the same agents against SWE-bench Lite (three hundred tasks) to check generalizability.

Four coding agents were tested across three conditions: no context file, an LLM-generated AGENTS.md file (the kind produced by running /init in Claude Code or the equivalent), and the developer-written file already in the repository. The agents: Claude Code with Sonnet 4.5, Codex with GPT-5.2, Codex with GPT-5.1 mini, and Qwen Code with Qwen3-30B.

Headline results, averaged across agents:

Configuration	Task success vs. baseline	Inference cost
No context file	Baseline	Baseline
LLM-generated AGENTS.md	−3% (hurt in 5 of 8 settings)	+20% to +23%
Developer-written AGENTS.md	+4% (marginal gain)	+19%

Two things in this table are doing the work. First, the LLM-generated context files — the ones produced automatically by tooling, often as a one-line setup step the docs encourage you to run — reduced agent performance in five of eight settings and increased inference cost in every setting. Net-negative on both axes. Second, even the human-written files only produced a four percent average improvement on AGENTbench and were essentially flat on SWE-bench Lite, while still increasing cost by nineteen percent. The premium AGENTS.md was supposed to earn back through higher success rates is mostly not there.
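One way to read the table is as cost per solved task rather than as two separate columns. The sketch below assumes the percentages are relative changes against the baseline (the table does not say whether they are relative changes or percentage points) and uses the lower +20% cost figure for LLM-generated files. The function name is illustrative, not something from the study.

```python
def relative_cost_per_success(success_change: float, cost_change: float) -> float:
    """Cost per solved task relative to the no-context-file baseline.

    Both arguments are relative changes, e.g. 0.04 for "+4%".
    A result above 1.0 means each solved task costs more than at baseline.
    """
    return (1.0 + cost_change) / (1.0 + success_change)


# Figures from the table above, read as relative changes (an assumption).
llm_generated = relative_cost_per_success(success_change=-0.03, cost_change=0.20)
developer_written = relative_cost_per_success(success_change=0.04, cost_change=0.19)

print(f"LLM-generated:     {llm_generated:.2f}x baseline cost per solved task")
print(f"Developer-written: {developer_written:.2f}x baseline cost per solved task")
```

Under those assumptions, each solved task costs roughly 1.24 times the baseline with an LLM-generated file and roughly 1.14 times with a developer-written one.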

The deeper analysis explained why. Both kinds of context files cause agents to explore more: they read more files, run more tests, and write longer reasoning traces. That exploration drives the cost increase. Sometimes it finds something useful; more often it just consumes tokens. In particular, the architectural overviews and directory listings that most AGENTS.md files include do not help agents navigate faster, because modern coding agents are already good at discovering project structure on their own. A hand-maintained listing adds overhead without adding signal.

When the researchers removed the README and other documentation from the test repositories, LLM-generated context files improved performance by 2.7 percent. That suggests the generated files mostly restate what the existing documentation already says: they help only when they are the sole source of that information.

Section-by-section: what to include and what to leave out

The table below combines three sources: the ETH Zurich research, a broader empirical analysis of more than two thousand production files done in November 2025, and the consensus emerging from teams that have refined their AGENTS.md files over multiple iterations.

Section	Include?	Reasoning
Build commands	Yes	Cannot be discovered reliably from code alone; agents need to know what works
Test commands	Yes	Same logic; the exact invocation matters and is rarely obvious from package config
Code-style requirements	Yes	Project-specific conventions the agent cannot infer (e.g. “we use snake_case, even in TS”)
Security boundaries	Yes	Most files don’t have these; they are the highest-leverage thing to add
Directories not to edit	Yes	Generated code, vendored deps, anything the agent should leave alone
Tech stack overview	Skip	Agent discovers from lockfiles and imports faster than it reads your prose
Directory structure / file listings	Skip	Agent will explore on its own; ETH study shows listings don’t speed navigation
Architecture overview	Skip	Distracts the agent toward unbounded exploration without improving outcomes
Coding conventions (general)	Skip	If the conventions are already in the linter config, the agent will pick them up
Marketing-language intro	Skip	Actively harmful; the agent will quote it back as a factual claim

The pattern is consistent: include what the agent cannot discover from the code or its tooling, leave out what it can. The unit test of a well-written AGENTS.md file is that every line in it gives the agent information it could not derive from looking around. Lines that fail that test should be deleted.

Two additional practical principles. First, keep it short. Three of the empirically best-performing AGENTS.md files in production are under fifty lines. The temptation to “just add one more section” is the path to the LLM-generated baseline that hurts performance. Second, version-control it like code, not like documentation. Review changes. Remove stale instructions. Treat it as a configuration file, not a wiki page.
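Both principles lend themselves to a check that runs in code review. The following is a minimal sketch of such a check, not an established tool: the keyword list is a hypothetical heuristic that mirrors the "Skip" rows above, and the heading detection is naive (a `#` comment inside a code fence would also be treated as a heading).

```python
import re

MAX_LINES = 50  # threshold taken from the "under fifty lines" observation
# Hypothetical keyword list mirroring the "Skip" rows of the table.
SKIP_HEADINGS = ("architecture", "directory structure", "file listing",
                 "tech stack", "overview")


def lint_agents_md(text: str) -> list[str]:
    """Return human-readable warnings for the contents of an AGENTS.md file."""
    warnings = []
    lines = text.splitlines()
    if len(lines) >= MAX_LINES:
        warnings.append(f"{len(lines)} lines; aim for under {MAX_LINES}")
    for number, line in enumerate(lines, start=1):
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match is None:
            continue  # not a Markdown heading
        heading = match.group(1)
        if any(keyword in heading.lower() for keyword in SKIP_HEADINGS):
            warnings.append(
                f"line {number}: '{heading}' looks discoverable; consider deleting"
            )
    return warnings
```

A check like this cannot judge whether a line is useful, only whether the file is drifting toward the patterns the research found unhelpful. The judgment still belongs to the reviewer.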

The OpenAI monorepo pattern

OpenAI’s main internal repository contains eighty-eight AGENTS.md files. This is not redundancy; it is the recommended pattern for monorepos and the answer to “where do I put project-specific instructions in a codebase that contains many projects.”

The mechanism is scope. Each AGENTS.md file applies to its directory and everything below it, until a more deeply nested AGENTS.md takes over. A file at the repository root contains repo-wide instructions: which Python version, which linter, what make test does. A file inside services/billing/ contains billing-service-specific instructions: which database, which feature flags, which deploy command. A file inside services/billing/migrations/ might contain migration-specific instructions: how to write a backwards-compatible migration, which patterns are forbidden.
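The scoping rule can be sketched as a walk from the edit target up to the repository root. This is an illustration of the idea rather than any particular agent's loader; the function name is hypothetical, and real agents may differ in how they discover and order these files.

```python
from pathlib import Path


def applicable_agents_files(target: Path, repo_root: Path) -> list[Path]:
    """Return the AGENTS.md files that apply to `target`, ordered root-first.

    The last element is the file nearest to the target.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    # A target that is a file (or does not exist yet) is scoped by its directory.
    directory = target if target.is_dir() else target.parent
    if directory != repo_root and repo_root not in directory.parents:
        raise ValueError(f"{target} is not inside {repo_root}")

    found = []
    while True:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == repo_root:
            break
        directory = directory.parent
    return list(reversed(found))
```

For a file under services/billing/migrations/ in a repository with AGENTS.md at the root and in services/billing/, this returns the root file followed by the billing file. Directories without their own file contribute nothing to the list.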

Three rules make this pattern work.

First, deeper files inherit; they do not replace. The agent reads from the file closest to the edit target and walks upward, accumulating context, and where two files conflict the deeper one wins. A monorepo with sensibly nested AGENTS.md files behaves like a sensibly nested configuration tree.

Second, deeper files should be smaller. The root AGENTS.md is the only one that can afford to be comprehensive (and it should still be short). Nested files should be sharp deltas — “in this directory, the test command is different,” “in this directory, the database is read-only” — not full copies of the root.

Third, the absence of an AGENTS.md is itself a signal. A directory with no AGENTS.md tells the agent “the rules from the parent apply here.” Adding a file with the same content as the parent is noise that costs tokens and adds no value.
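The second and third rules suggest a simple review check: flag nested files that repeat their parent instead of stating a delta. The sketch below compares lines verbatim and ignores headings, which is a deliberately crude assumption; paraphrased duplication would slip through. The function names are hypothetical.

```python
def _instruction_lines(text: str) -> list[str]:
    """Non-blank, non-heading lines with surrounding whitespace stripped."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def repeated_from_parent(parent_text: str, child_text: str) -> list[str]:
    """Lines of a nested AGENTS.md that already appear verbatim in its parent."""
    parent_lines = set(_instruction_lines(parent_text))
    return [line for line in _instruction_lines(child_text) if line in parent_lines]


def is_pure_noise(parent_text: str, child_text: str) -> bool:
    """True when the nested file adds nothing the parent does not already say."""
    child_lines = _instruction_lines(child_text)
    if not child_lines:
        return False  # an empty file is a different problem
    return len(repeated_from_parent(parent_text, child_text)) == len(child_lines)
```

A nested file for which `is_pure_noise` returns True is a candidate for deletion; one with a few repeated lines is a candidate for trimming down to its delta.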

How AGENTS.md fits with MCP, skills, and tool definitions

A surprising amount of confusion exists about how AGENTS.md relates to the other layers of agent context. The clean mental model is that they sit at different levels of the stack.

AGENTS.md is repository-scoped context. It tells the agent things that are true about this specific codebase. It is read every time the agent does work in this repo. It is checked into version control. It is the closest thing to a project’s .editorconfig for coding agents.

MCP servers are capability extensions. They give the agent tools — read this database, query this issue tracker, deploy to this environment. MCP is about what the agent can do; AGENTS.md is about what conventions to follow when doing it. They compose: an MCP server might give the agent the ability to query a database, and the AGENTS.md file might say “use this database for read-only diagnostic queries only.”

Skills (the term Claude Code uses for reusable instruction bundles) are cross-repository workflows. A “review this PR” skill, an “investigate this incident” skill. They are reusable across projects. AGENTS.md customizes how a skill behaves in a specific project — the skill says “run the project’s test suite”; the AGENTS.md says “the test suite is make test-fast.”

Tool definitions are the contract. AGENTS.md cannot create capabilities the agent does not have through tool definitions; it can only direct how the agent uses them.

A common anti-pattern: putting MCP-server configuration or skill definitions inside AGENTS.md. The format is not designed for that, the agents do not interpret it that way, and it crowds out the kind of project-specific guidance the file actually does well. Keep MCP configuration in MCP configuration files. Keep skill definitions in skill files. Keep AGENTS.md focused on project-specific conventions and commands.
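This anti-pattern is also easy to spot mechanically. The sketch below scans fenced blocks in an AGENTS.md file for JSON with a top-level `mcpServers` key. That key name is an assumption based on the configuration format several MCP clients use; other clients use different keys, so treat this as a heuristic, and the function name as hypothetical.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built up to keep this example readable
BLOCK = re.compile(re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def embedded_mcp_servers(agents_md: str) -> list[str]:
    """Return names of MCP servers configured inside fenced JSON blocks."""
    names = []
    for body in BLOCK.findall(agents_md):
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            continue  # not JSON, so not the configuration we are looking for
        if not isinstance(data, dict) or "mcpServers" not in data:
            continue
        servers = data["mcpServers"]
        if isinstance(servers, dict):
            names.extend(sorted(servers))
        else:
            names.append("<unnamed>")
    return names
```

A non-empty result means server configuration has leaked into the file and probably belongs in the client's own MCP configuration instead.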

AGENTS.md is now infrastructure. It is governed by the Linux Foundation, read by virtually every major coding agent, and has now been tested by rigorous empirical research, which puts it in the same category as .gitignore, .editorconfig, and Dockerfile: small standard files that quietly shape how a codebase is worked in. The teams that get the most out of it are not the ones with the most thorough files; they are the ones with the most ruthlessly minimal ones. Write it by hand. Keep it short. Include what the agent cannot find for itself, and trust the agent to find everything else. The cost is a few minutes of thought up front. The benefit is an agent that actually does what you wanted instead of one that explores eagerly, costs twenty percent more, and gets it wrong slightly more often. The ETH Zurich researchers did the work. The rest of us just need to read it.