How to Write a Spec for AI Agents: Template, EARS Format, and Examples
A copy-paste spec template, the EARS format, negative scope, and the Smart Kid test: how to write a software requirements specification an AI agent can actually build from.
A spec is only good if someone with none of your context can read it and build the right thing. That test is the whole job.
People who search for a “spec template” or a “software requirements specification” are usually solving a human coordination problem. They want a document that aligns a team, gets stakeholder sign-off, and survives a handoff. The classic IEEE-830 SRS format, with its introduction, scope, functional requirements, non-functional requirements, interface specifications, and constraints, was designed for exactly that. It solved it reasonably well.
The job changes when the reader of the spec is an AI agent.
A human team carries context between meetings. They ask clarifying questions before they start. They notice when something looks wrong and check with you. An AI agent does none of that. It starts each session with zero memory of every conversation you have had, zero recollection of the decisions you made last week, and zero intuition about your implicit business rules. Whatever the agent needs to know must be in the document.
Anthropic states the target plainly in its Claude Code best practices: “the most useful specs are self-contained: they name the files and interfaces involved, state what is out of scope, and end with an end-to-end verification step that proves the feature works.” That is the bar. The rest of this piece is how you clear it.
Most people who try Spec-Driven Development already understand the why. They have read up on what Spec-Driven Development is, believe the spec is the right artifact to give an AI agent, and then open a blank file and write: “The system should handle user authentication securely.” Then they wonder why the output is still wrong.
The problem is not commitment. Writing a good spec is its own skill, and almost nothing teaches it. This article covers the test for spec quality, the sentence format that closes ambiguity, what you must explicitly exclude, a full worked example from a real fintech, and a copy-paste template.
This is the practical companion to Spec-Driven Development with Claude Code and to the case study that shows what these documents produce at scale.
The one test of a good spec
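The test comes before any template. Call it the Smart Kid Principle: imagine handing the feature to a smart kid who has never seen your product, your codebase, or a single one of your past conversations.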
You wouldn’t tell her: “Do that thing with the tasks.”
You’d say: “When someone creates a task, save the title, check that the person has permission in that workspace, and send a real-time notification to everyone currently viewing that list.”
That level of specificity is the spec. The agent is not unintelligent: it has no context between sessions. Every conversation starts from zero. The spec is the only memory it gets.
The principle is not about dumbing things down. It forces every implicit rule into the open. “Check permissions” is easy to say in conversation. In a spec, you have to write: which permissions? On which operations? What happens on failure: silent reject or error response? Permission check before or after input validation? Each of those is a question the agent will answer somehow. The principle is how you control those answers.
Be specific, not generic
Vague requirements are where agents hallucinate scope. The fix is precision.
Vague vs. executable
| Vague: the agent guesses | Executable: the agent knows |
|---|---|
| "The system should be fast." | GET /api/v1/tasks responds under 500ms at p95 for lists up to 1,000 tasks. |
| "Validate the title." | Empty → "Title is required"; 1 char → "Min 2 characters"; 501 chars → "Max 500". |
| "Handle errors gracefully." | On provider timeout, retry 3x with exponential backoff, then queue for manual review. |
| "The UI should look clean." | Task list renders under 200ms. Skeleton shown during load. Empty state: "No tasks yet. Create one." |
The right question to ask for each requirement: “If I gave this line to someone with no context, could they implement it with no further questions?” If the answer is no, make it more specific. Adjectives (fast, clean, secure, robust) are not requirements. They are prompts for the agent to fill in with its own assumptions.
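The “Handle errors gracefully” row shows how little an executable requirement leaves to guess. Here is a minimal Python sketch of that row, assuming the provider call raises `TimeoutError` on timeout and reading “retry 3x” as three retries after the first attempt. The `on_give_up` callback is a hypothetical stand-in for the manual-review queue, and the 0.1s base delay is an assumption; the row does not specify one.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    on_give_up: Callable[[Exception], None],
    retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run `call`. On TimeoutError, retry up to `retries` times, doubling the
    wait each time (base_delay, 2*base_delay, 4*base_delay, ...). If every
    attempt times out, pass the last error to `on_give_up` and return None."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):  # the first try plus `retries` retries
        try:
            return call()
        except TimeoutError as exc:
            last_error = exc
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)
    assert last_error is not None
    on_give_up(last_error)  # e.g. push the request onto a manual-review queue
    return None
```

Injecting `sleep` keeps the backoff testable without real waiting, so the row's numbers (three retries, doubling delays) become directly checkable.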
Declare the negative scope
What you explicitly won’t build is as important as what you will. This is the single best defense against an agent “helpfully” adding features you never asked for.
Write it as a flat list, early in the document. Call the section “Non-Goals” or “Out of Scope”. Be blunt:
```markdown
## Non-Goals (v1)
- No recurring tasks
- No calendar integration (planned v2)
- No time tracking
- No task dependencies (subtasks only, max 50 per task)
- No bulk operations (multi-select, bulk delete)
- No offline mode
```
This section prevents a class of errors that’s invisible until it’s expensive: the agent extends the data model for a feature you didn’t want, and now you’re three hours into refactoring something you never asked to build.
I’ve started adding a one-liner rationale for any item that might look like an oversight. “No time tracking. The product does not compete on analytics.” That line closes a gap the agent might otherwise fill with the wrong answer.
Concrete examples beat adjectives
Abstract validation rules are consistently misimplemented. Concrete examples are not.
Don’t write “validate the task title appropriately.” Write a table:
Title field: validation examples
| Input | Expected result |
|---|---|
| "" (empty string) | Error: "Title is required" |
| "A" (1 character) | Error: "Title must be at least 2 characters" |
| "Review PR #123" (valid) | Success: title saved |
| "A" × 501 (501 characters) | Error: "Title must be 500 characters or fewer" |
| " " (whitespace only) | Error: "Title is required" (trim before validation) |
That last row is the one that bites every time. No agent will add it unless you do, because nothing in “validate the task title” implies trim-before-validate. Concrete examples turn interpretation into verification: the agent either produces the exact output for that input, or it doesn’t.
This pattern works for any validation, any state machine, any conditional flow. Think in input/output pairs, then write the pairs down. It’s the most direct way to specify behavior.
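Here is the title table turned into a validator plus a table-driven check, as a minimal Python sketch. `validate_title` is a hypothetical name, and two details are assumptions the table does not settle: the length limits apply to the trimmed title, and the trimmed value is what gets saved.

```python
from typing import List, Optional, Tuple


def validate_title(raw: str) -> str:
    """Return the title to save, or raise ValueError with the exact message from the table."""
    title = raw.strip()  # trim BEFORE validating, so "   " counts as empty
    if not title:
        raise ValueError("Title is required")
    if len(title) < 2:
        raise ValueError("Title must be at least 2 characters")
    if len(title) > 500:
        raise ValueError("Title must be 500 characters or fewer")
    return title


# Each row of the table becomes one (input, expected error or None) pair.
CASES: List[Tuple[str, Optional[str]]] = [
    ("", "Title is required"),
    ("A", "Title must be at least 2 characters"),
    ("Review PR #123", None),
    ("A" * 501, "Title must be 500 characters or fewer"),
    ("   ", "Title is required"),
]

for raw, expected_error in CASES:
    try:
        validate_title(raw)
        actual_error = None
    except ValueError as exc:
        actual_error = str(exc)
    assert actual_error == expected_error, (raw, actual_error)
```

The loop is the spec table, row for row: if an agent's implementation drifts from any message, the matching assertion fails.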
Write requirements in EARS
Plain English requirements sound reasonable until you implement them. “The system should validate workspace permissions.” On which operations? On failure, reject silently or throw an error? Before or after input validation?
EARS (Easy Approach to Requirements Syntax) closes that gap. Alistair Mavin developed it at Rolls-Royce PLC while analyzing airworthiness regulations for a jet engine control system. It was first published in 2009 and is now used by Airbus, NASA, Intel, and Bosch, among others. It maps almost perfectly onto what AI agents need: unambiguous, structured sentences with explicit triggers and conditions.
The six EARS patterns
| Pattern | Template | Example |
|---|---|---|
| Ubiquitous | THE SYSTEM SHALL [action]. | THE SYSTEM SHALL validate workspace permissions on every task operation. |
| Event-driven | WHEN [trigger], THE SYSTEM SHALL [response]. | WHEN a task is completed, THE SYSTEM SHALL record the timestamp and the completing user. |
| State-driven | WHILE [state], THE SYSTEM SHALL [constraint]. | WHILE a task is archived, THE SYSTEM SHALL NOT allow edits. |
| Unwanted behavior | IF [unwanted condition], THE SYSTEM SHALL [mitigation]. | IF more than 50 subtasks are created, THE SYSTEM SHALL show "Subtask limit reached" and reject the operation. |
| Optional | WHERE [feature flag / config], THE SYSTEM SHALL [behavior]. | WHERE notifications are enabled, THE SYSTEM SHALL notify all assignees when a task changes status. |
| Complex | WHEN [event] AND [condition], THE SYSTEM SHALL [A] BEFORE [B]. | WHEN a task is completed AND an automation rule is configured, THE SYSTEM SHALL run the automation BEFORE updating the task status. |
You don’t use all six for every requirement. Match the pattern to the requirement type. A constant invariant is Ubiquitous. A user action is Event-driven. A guard condition is State-driven. Pick the template, fill in the blanks, and the ambiguity dissolves.
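Because every EARS template has a fixed shape, a few regular expressions can flag requirements that match none of them. Here is a rough Python sketch, assuming requirements are written exactly as in the table above (uppercase keywords, “THE SYSTEM SHALL”, a closing period). `classify_ears` is a hypothetical helper, and it checks form only: a well-formed sentence can still be vague.

```python
import re
from typing import Optional

# One regex per row of the pattern table. Order matters: a Complex requirement
# also starts with WHEN, so it must be tried before Event-driven.
EARS_PATTERNS = [
    ("Complex", re.compile(r"^WHEN .+ AND .+, THE SYSTEM SHALL .+ BEFORE .+\.$")),
    ("Event-driven", re.compile(r"^WHEN .+, THE SYSTEM SHALL .+\.$")),
    ("State-driven", re.compile(r"^WHILE .+, THE SYSTEM SHALL .+\.$")),
    ("Unwanted behavior", re.compile(r"^IF .+, THE SYSTEM SHALL .+\.$")),
    ("Optional", re.compile(r"^WHERE .+, THE SYSTEM SHALL .+\.$")),
    ("Ubiquitous", re.compile(r"^THE SYSTEM SHALL .+\.$")),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS pattern a requirement follows, or None if it fits none.
    Matching is case-sensitive on purpose: "The system should..." must fail."""
    text = " ".join(requirement.split())  # undo line wrapping and double spaces
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None
```

Run it over every FR line before handing the spec over; anything that returns `None` is a sentence the agent will have to interpret.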
One practical note on vocabulary: EARS uses SHALL for required behavior and SHALL NOT for prohibited behavior. If you prioritize with MoSCoW, map the modal verbs directly: SHALL = Must Have, SHOULD = Should Have, MAY = Could Have. Do not weaken them by accident. “The system should validate permissions” is not the same requirement as “THE SYSTEM SHALL validate permissions.” The first gives the agent an opt-out. The second does not.
The Implementation FAQ technique
Before your agent sees the spec, ask yourself: what will it have to guess?
List every ambiguity, then answer them in the spec as an explicit Q&A section. Call it “Implementation FAQ” or “Open Questions”. Every gap you surface becomes a decision you made intentionally, not a wrong assumption baked silently into the code.
Here’s what this looks like for a task management feature:
```markdown
## Implementation FAQ

**Q: What happens when a user tries to delete a task that has subtasks?**
A: Cascade delete all subtasks. Prompt for confirmation:
"This will also delete 3 subtasks. Continue?" Require explicit
confirmation before proceeding.

**Q: Who can see unassigned tasks?**
A: All members of the workspace, regardless of role. Only workspace
owners can assign tasks to others.

**Q: What happens if an assignee is removed from a workspace while they
have open tasks?**
A: Tasks remain open. The assignee field becomes null. A system
notification is sent to the workspace owner listing affected tasks.

**Q: Can a task belong to more than one project?**
A: No, one task belongs to exactly one project. This is a v1
constraint, not a design choice to revisit.

**Q: What timezone is used for due dates?**
A: Store as UTC. Display in the authenticated user's profile timezone.
If no timezone is set, display UTC with a "(UTC)" label.
```
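The timezone answer is specific enough to implement directly. Here is a minimal Python sketch, assuming due dates arrive as timezone-aware `datetime` values in UTC and that the caller has already resolved the profile's timezone name into a `tzinfo` (for example with `zoneinfo.ZoneInfo`, Python 3.9+). The function name and the `%Y-%m-%d %H:%M` display format are assumptions.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def display_due_date(due_utc: datetime, profile_tz: Optional[tzinfo]) -> str:
    """Render a stored UTC due date per the FAQ answer: the user's profile
    timezone if set, otherwise UTC with an explicit "(UTC)" label."""
    if due_utc.tzinfo is None or due_utc.utcoffset() != timedelta(0):
        raise ValueError("due dates must be stored as timezone-aware UTC")
    if profile_tz is not None:
        return due_utc.astimezone(profile_tz).strftime("%Y-%m-%d %H:%M")
    return due_utc.strftime("%Y-%m-%d %H:%M") + " (UTC)"
```

Storage and display stay separate, exactly as the FAQ answer separates them: the stored value never changes, only its rendering does.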
The Q&A entries you most need are the ones that aren’t on the happy path: deletion with cascades, conflicting states, removed users, timezone handling, concurrent edits. Those are exactly the cases agents handle worst when left to guess, because the training data is saturated with happy-path implementations and thin on edge cases.
I write the FAQ by imagining the agent mid-implementation, hitting a decision point where the spec is silent. What would it have to guess? That question goes in the FAQ, with the correct answer next to it.
From SRS to AI-agent spec
The classic software requirements specification had five main sections: introduction, overall description, functional requirements, non-functional requirements, and external interfaces. Teams wrote them as prose organized by feature. That structure is still the right skeleton. What changes for AI agents is not the shape but the assumptions underneath it.
A traditional SRS could rely on shared context. Every engineer on the team had attended the planning sessions. They understood the product’s history. They asked questions in standup. Implicit knowledge didn’t need to be written down because it existed in people’s heads. An AI agent has none of that. If your spec says “validate permissions,” the agent cannot ask what you mean. It guesses and keeps going.
Three changes make the SRS executable by an agent instead of just readable by a team.
First, EARS closes the sentence-level ambiguity that IEEE-830 named but never gave you a tool for. The standard lists “unambiguous” as a characteristic of a good SRS. EARS gives you the grammar to enforce it at the sentence level: WHEN, WHILE, IF, WHERE, SHALL.
Second, negative scope protects you from plausible additions. A human team knows what they’re not building because they were in the room when the decision was made. The agent was not. An explicit “Out of Scope” section is required, not optional.
Third, a “Confirm before building” closing line stops an eager agent from racing into code before it has verified its understanding. You end the spec by asking the agent to restate the key requirements in its own words before writing a character of code. If it misunderstood anything, you find out at zero cost, not after three sessions of implementation.
Everything else in the classic SRS (the traceability IDs, the acceptance criteria, the constraints section, the risk table) transfers directly. The structure was right. The audience changed.
A spec that must never double-bill
The best way to see how these techniques combine is a hard real case. This is close to the spec I wrote for the crypto fintech: a payment charge endpoint where a retry must never bill a customer twice.
```markdown
# Spec: Create a payment charge (POST /v1/charges)
# Status: requirements:approved

## Overview
A merchant creates a charge against a customer. This moves money, so it
must be safe to retry and impossible to double-bill.

## In scope
- Create a charge from an authenticated merchant request.
- Return the charge id and status.
- Guarantee exactly-once billing under client retries.

## Out of scope (v1)
- Refunds (separate spec: 005-refunds).
- Partial captures.
- Multi-currency. All amounts are BRL, stored as integer cents. Never float.

## Functional requirements (EARS)
- FR-1 WHEN a merchant POSTs a charge with a valid Idempotency-Key,
  THE SYSTEM SHALL create at most one charge for that key.
- FR-2 WHEN the same Idempotency-Key is replayed within 24h,
  THE SYSTEM SHALL return the original charge and create no new one.
- FR-3 IF the amount is <= 0,
  THE SYSTEM SHALL reject with 422 "amount must be positive".
- FR-4 IF the merchant is over its rate limit,
  THE SYSTEM SHALL reject with 429 and a Retry-After header.
- FR-5 WHILE a charge is pending,
  THE SYSTEM SHALL NOT allow a second capture.

## Acceptance criteria (examples the agent must satisfy)
- amount=1000, key=abc -> 201, status=pending
- same key=abc, replayed -> 200, same charge id, no new row
- amount=0 -> 422 "amount must be positive"
- amount=-50 -> 422 "amount must be positive"
- 6th request in 1s, one merchant -> 429, Retry-After: 1

## Non-functional
- p95 latency under 300ms at 200 requests/sec across all merchants.
- Every money value is an integer. No floating point anywhere in the path.

## Data
- charges(id, merchant_id, amount_cents, currency, status,
  idempotency_key, created_at)
- UNIQUE(merchant_id, idempotency_key)  # this is what enforces FR-1

## Verification
- Integration test replays one key 50x concurrently; assert exactly one
  row and one ledger entry.
- Load test holds p95 < 300ms at 200 rps.

## Confirm before building
Do not write code until you restate FR-1 through FR-5 and the uniqueness
constraint in your own words. If any acceptance criterion is ambiguous,
ask before implementing.
```
Let me walk through why each part earns its place.
“Out of scope” names what this spec does not cover. Without it, the agent might extend the charge table with a refund_amount column because refunds seem related. That’s extra columns, migration files, and a data model now coupled to a feature you haven’t specced yet.
FR-1 and FR-2 together specify idempotency from both directions: on first receipt, create one charge; on replay, return the original. Saying it twice closes both directions of the gap. One EARS sentence is not enough here because the two situations (first call vs. repeated call) produce different HTTP responses.
FR-3 and FR-4 are the “unwanted behavior” pattern. They specify what the system must do when things go wrong. Without them, the agent chooses its own error responses. Sometimes 400, sometimes 500, sometimes nothing.
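FR-4 plus the last acceptance criterion pin down the error response, though not the limit itself. Here is a minimal Python sketch of one way to satisfy them, assuming a fixed one-second window and a limit of 5 requests per merchant, which is what “6th request in 1s -> 429, Retry-After: 1” implies. The class and function names are hypothetical, and `now` is passed in explicitly so the behavior is testable.

```python
import math
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per merchant in each `window`-second window."""

    def __init__(self, limit: int = 5, window: float = 1.0) -> None:
        self.limit = limit
        self.window = window
        # merchant_id -> (start of the current window, requests counted in it)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def check(self, merchant_id: str, now: float) -> Optional[int]:
        """Return None if the request is allowed, else Retry-After in whole seconds."""
        start, count = self._windows.get(merchant_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the old window has expired; open a new one
        if count >= self.limit:
            return max(1, math.ceil(start + self.window - now))
        self._windows[merchant_id] = (start, count + 1)
        return None


def rate_limit_response(
    limiter: FixedWindowLimiter, merchant_id: str, now: float
) -> Optional[Tuple[int, Dict[str, str]]]:
    """FR-4: None lets the request continue; otherwise a 429 with Retry-After."""
    retry_after = limiter.check(merchant_id, now)
    if retry_after is None:
        return None
    return 429, {"Retry-After": str(retry_after)}
```

A sliding window or a token bucket would pass the same acceptance criterion; the spec does not choose one, so the algorithm is a natural Implementation FAQ entry.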
The acceptance criteria section is not a test file. It is a table of input/output pairs living in the spec so the agent can verify its own work before you review anything. Each row is a check the agent can run.
“Every money value is an integer. No floating point anywhere in the path.” That one sentence prevents the floating-point rounding error that bites a payment system on the second day in production. It lives in the spec, not a code comment, because comments don’t survive a session boundary.
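To see why the integer rule matters, compare float arithmetic with integer cents. Here is a minimal Python sketch, assuming amounts reach the API as decimal strings such as "19.99" (the spec itself only fixes the storage format: BRL as integer cents). `to_cents` is a hypothetical helper; it rejects sub-cent input instead of rounding it, and malformed strings raise `decimal.InvalidOperation` from the `Decimal` constructor.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly, so money math drifts.
assert 0.1 + 0.2 != 0.3


def to_cents(amount: str) -> int:
    """Parse a decimal string such as "19.99" into integer cents (1999).
    Raises ValueError for non-finite values or more than two decimal places."""
    value = Decimal(amount)
    if not value.is_finite():
        raise ValueError(f"amount must be a finite number: {amount!r}")
    cents = value * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"amount has more than 2 decimal places: {amount!r}")
    return int(cents)


# Summing integer cents is exact; format for display only at the edge.
total = to_cents("0.10") + to_cents("0.20")
assert total == 30
```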
The UNIQUE constraint in the data model is how FR-1 is actually enforced. If you leave it out, the agent might enforce idempotency in application logic, typically a check-then-insert. Under concurrent retries, two requests can both see no existing charge and both insert. The database constraint lets only one of them succeed.
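Here are FR-1 and FR-2 enforced by the database rather than by application logic. This is a minimal Python sketch using SQLite as a stand-in (the spec does not name a database), assuming SQLite 3.24 or later for `INSERT ... ON CONFLICT DO NOTHING` and a random UUID as the charge id. `create_charge` and `replay_concurrently` are hypothetical names; the second mirrors the spec's Verification step by replaying one key 50 times in parallel. The 24-hour replay window from FR-2 is not modeled.

```python
import os
import sqlite3
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS charges (
    id              TEXT PRIMARY KEY,
    merchant_id     TEXT NOT NULL,
    amount_cents    INTEGER NOT NULL CHECK (amount_cents > 0),
    currency        TEXT NOT NULL DEFAULT 'BRL',
    status          TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (merchant_id, idempotency_key)  -- this is what enforces FR-1
)
"""


def create_charge(db_path: str, merchant_id: str, key: str, amount_cents: int) -> Tuple[int, str]:
    """Return (HTTP status, charge id): 201 on first receipt (FR-1), 200 on replay (FR-2)."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")  # FR-3; the caller maps this to 422
    # isolation_level=None means autocommit: the INSERT is its own transaction.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        cur = conn.execute(
            "INSERT INTO charges (id, merchant_id, amount_cents, status, idempotency_key) "
            "VALUES (?, ?, ?, 'pending', ?) "
            "ON CONFLICT (merchant_id, idempotency_key) DO NOTHING",
            (str(uuid.uuid4()), merchant_id, amount_cents, key),
        )
        inserted = cur.rowcount == 1  # 0 means the key already existed
        (charge_id,) = conn.execute(
            "SELECT id FROM charges WHERE merchant_id = ? AND idempotency_key = ?",
            (merchant_id, key),
        ).fetchone()
        return (201 if inserted else 200), charge_id
    finally:
        conn.close()


def replay_concurrently(times: int = 50) -> Tuple[List[int], int, int]:
    """Send one key `times` times in parallel against a fresh database.
    Returns (statuses, number of distinct charge ids, rows in charges)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "charges.db")
        setup = sqlite3.connect(db_path)
        setup.execute(SCHEMA)
        setup.commit()
        setup.close()
        with ThreadPoolExecutor(max_workers=10) as pool:
            futures = [pool.submit(create_charge, db_path, "m_1", "abc", 1000) for _ in range(times)]
            results = [f.result() for f in futures]
        check = sqlite3.connect(db_path)
        (rows,) = check.execute("SELECT COUNT(*) FROM charges").fetchone()
        check.close()
    statuses = [status for status, _ in results]
    return statuses, len({charge_id for _, charge_id in results}), rows
```

PostgreSQL supports the same `INSERT ... ON CONFLICT (merchant_id, idempotency_key) DO NOTHING` form, so the shape carries over.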
“Confirm before building” is the last line. The agent restates FR-1 through FR-5 in its own words before typing a character of code. If it misunderstood FR-2, you learn that now, at zero cost. Skip this line and you learn it after the implementation.
Three documents, not one
A spec for a real feature is not one document. It’s three, and they have a strict order.
Feature spec folder
- spec/
  - tasks-feature/
    - .status: the gate (requirements:draft → approved, design:draft → approved, tasks:draft → approved)
    - requirements.md (WHAT): approved before design begins
    - design.md (HOW): approved before tasks begin
    - tasks.md (HOW MUCH): approved before implementation begins
The .status file is the gate. The agent reads it at the start of every session. If it says requirements:draft, no design work proceeds. Each document exists in a dependency chain: the design maps to requirements, the tasks map to design. If you write all three at once without the approval gates, you get waterfall. With the gates, you get deliberate iteration where each phase is locked before the next opens.
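The gate is simple enough to check mechanically. Here is a minimal Python sketch, assuming a `.status` format the article does not define: one `phase:state` entry per line, such as `requirements:approved`. `parse_status` and `may_start` are hypothetical names, and a missing phase counts as not approved.

```python
from typing import Dict

ORDER = ["requirements", "design", "tasks", "implementation"]


def parse_status(text: str) -> Dict[str, str]:
    """Parse lines like 'requirements:approved' into {'requirements': 'approved'}."""
    status: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        phase, _, state = line.partition(":")
        status[phase.strip()] = state.strip()
    return status


def may_start(step: str, status_text: str) -> bool:
    """True only if every phase before `step` is marked approved.
    `step` is one of 'design', 'tasks', or 'implementation'."""
    status = parse_status(status_text)
    earlier = ORDER[: ORDER.index(step)]
    return all(status.get(phase) == "approved" for phase in earlier)
```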
requirements.md: WHAT
This is the document you write first and the only one you hand to a non-technical stakeholder for review. It stays technology-independent.
Sections: Overview, Goals, Non-Goals, User Stories (US-001…) with acceptance criteria, Functional Requirements (FR-001… in EARS with MoSCoW priority), Non-Functional Requirements (NFR-001…), Constraints, Decisions (D-001…), Implementation FAQ (Q-001…), Success Metrics, Risks.
These IDs (US-001, FR-001, NFR-001, D-001) are how the design traces back to requirements, how tasks trace back to design, and how you answer “why does this code exist?” six months later without reading the whole codebase. Never skip them.
design.md: HOW
This is the technical document. Every functional requirement must have a corresponding design decision. If a requirement doesn’t map to the design, either the design is incomplete or the requirement doesn’t need implementing.
Sections: Executive Summary (the architecture in two sentences), Requirements Mapping (explicit table: FR-001 maps to which design section), System Architecture, Data Model, API Contract, Edge Cases, Testing and Verification Strategy, Technical Decisions (TD-001… with alternatives considered and rationale), Risks.
The requirements mapping table is the most important section and the most skipped. It forces a check: every FR must have a home in the design. Any FR with no mapping is a gap, and a gap in the design is a gap in the code.
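The mapping check is mechanical too. Here is a rough Python sketch, assuming requirements are headings like `### FR-001: ...` in requirements.md. It treats any mention of an FR ID anywhere in design.md as a mapping, which is looser than a real requirements mapping table. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import List

FR_HEADING = re.compile(r"^###\s+(FR-\d{3})\b", re.MULTILINE)  # "### FR-001: ..."
FR_REFERENCE = re.compile(r"\bFR-\d{3}\b")  # any mention of an FR ID


def defined_requirements(requirements_md: str) -> List[str]:
    """FR IDs declared as headings, in document order."""
    return FR_HEADING.findall(requirements_md)


def unmapped_requirements(requirements_md: str, design_md: str) -> List[str]:
    """FR IDs from requirements.md that design.md never mentions: each one is a gap."""
    mapped = set(FR_REFERENCE.findall(design_md))
    return [fr for fr in defined_requirements(requirements_md) if fr not in mapped]


def duplicate_ids(requirements_md: str) -> List[str]:
    """FR IDs declared more than once, which would break traceability."""
    counts = Counter(defined_requirements(requirements_md))
    return sorted(fr for fr, n in counts.items() if n > 1)
```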
tasks.md: HOW MUCH
This is what the agent implements, one task at a time. Tasks are sized at 2-4 hours each. Anything larger gets split.
Sections: Requirement Coverage (FR-to-task traceability), Implementation Readiness Check (a pass/fail gate: are requirements and design approved? are all Implementation FAQ items, Q-001 onward, answered?), and Tasks. Each task has a title, a requirement reference (e.g., FR-003), the files to touch, an estimated size, dependencies on other tasks, acceptance criteria, and verification commands.
Each task’s acceptance criteria are what the agent runs to verify its own work before marking the task done. If the criteria aren’t there, the agent declares victory based on whether it believes the code looks right, not whether it actually works.
The requirements.md template
Drop this into your feature folder, fill in each section, and hand it to the agent. This is the exact structure I use on every feature.
```markdown
# [Feature Name]: Requirements

**Status:** draft
**Version:** 1.0
**Author:** [name]
**Date:** [YYYY-MM-DD]

---

## Overview

[One paragraph: what this feature does, why it exists now, and who it's for.]

## Goals

- [Measurable goal 1, e.g., "Users can create and assign tasks in under 30 seconds."]
- [Measurable goal 2]
- [Measurable goal 3]

## Non-Goals (v1)

- [Thing you will NOT build, be explicit]
- [Another thing out of scope, add rationale if the omission might look like a mistake]
- [Third item]

## User Stories

### US-001: [Story title]

**As a** [persona],
**I want to** [action],
**so that** [benefit].

**Acceptance criteria:**

- [ ] [Observable, testable criterion]
- [ ] [Observable, testable criterion]
- [ ] [Edge case criterion]

### US-002: [Story title]

[Repeat structure]

---

## Functional Requirements

### FR-001: [Requirement name] [Must Have]

THE SYSTEM SHALL [specific, unambiguous behavior].

**Priority:** Must Have
**User story:** US-001
**Notes:** [Any clarification or related constraint]

### FR-002: [Requirement name] [Must Have]

WHEN [trigger], THE SYSTEM SHALL [response].

**Priority:** Must Have
**User story:** US-001

### FR-003: [Requirement name] [Should Have]

WHILE [state], THE SYSTEM SHALL NOT [prohibited action].

**Priority:** Should Have
**User story:** US-002

### FR-004: [Requirement name] [Must Have]

IF [unwanted condition], THE SYSTEM SHALL [mitigation].

**Priority:** Must Have
**User story:** US-001

### FR-005: [Requirement name] [Could Have]

WHERE [feature flag / config], THE SYSTEM SHALL [behavior].

**Priority:** Could Have
**User story:** US-002

### FR-006: [Requirement name] [Must Have]

WHEN [event] AND [condition], THE SYSTEM SHALL [A] BEFORE [B].

**Priority:** Must Have
**User story:** US-001

[Continue with FR-007, FR-008...]

---

## Non-Functional Requirements

### NFR-001: Performance

THE SYSTEM SHALL respond to [specific endpoint or operation] in under
[X]ms at p95 for [load condition, e.g., "lists up to 1,000 tasks"].

### NFR-002: Security

THE SYSTEM SHALL [specific security behavior, e.g., "validate a signed
JWT on every mutation before any business logic runs"].

### NFR-003: Accessibility

THE SYSTEM SHALL meet WCAG 2.1 AA for all new UI components in scope.

---

## Constraints

- **Technology:** [e.g., "Must use the existing PostgreSQL instance, no new databases."]
- **Timeline:** [e.g., "Must ship before [date] to support [event]."]
- **Compliance:** [e.g., "All PII fields must be encrypted at rest."]
- **Integration:** [e.g., "Must call the existing notification service via its current API, no schema changes."]

---

## Decisions

### D-001: [Decision title]

**Decision:** [What was decided]
**Rationale:** [Why, include what problem it solves]
**Alternatives considered:** [What else was evaluated and why it was rejected]
**Date:** [YYYY-MM-DD]

---

## Implementation FAQ

**Q: [Anticipated ambiguity 1, focus on edge cases and deletion behavior]**
A: [Explicit answer, no hedging, no "it depends"]

**Q: [Anticipated ambiguity 2, conflicting states, concurrent operations]**
A: [Explicit answer]

**Q: [Access / visibility edge case]**
A: [Explicit answer, who sees what under which conditions]

**Q: [Timezone / locale / formatting question if relevant]**
A: [Explicit answer, specify storage format and display format separately]

---

## Success Metrics

- [ ] [Metric 1, e.g., "P95 latency for task list under 500ms in staging under 1,000-task load."]
- [ ] [Metric 2, user-observable outcome if no instrumentation exists]
- [ ] [Metric 3]

---

## Risks

| Risk | Likelihood | Impact | Mitigation |
| -------- | ---------------- | ---------------- | ------------ |
| [Risk 1] | Low / Med / High | Low / Med / High | [Mitigation] |
| [Risk 2] | | | |
```
Before you hand a spec to the agent
Spec readiness gate
- Required: The Smart Kid Principle passes: someone with no context can build the right thing from this. If you'd have to explain anything verbally that isn't in the document, the document isn't done.
- Required: Every functional requirement is in EARS format. If you wrote 'should be fast' or 'handle errors gracefully', find it and replace it.
- Required: Non-Goals are explicit: at least 3 things you're not building. The absence of a Non-Goals section is a scope risk, not a clean spec.
- Required: Validation rules have input/output example tables. Prose validation rules are almost always underspecified. Tables close the gap.
- Required: All User Stories have testable acceptance criteria. If a criterion isn't independently testable, it's a vague goal, not a requirement.
- Required: Stable IDs exist and are unique: US-001, FR-001, NFR-001, D-001. You'll reference these in design.md and tasks.md. Missing IDs break traceability.
- Required: The Implementation FAQ covers the top 3 edge cases. At minimum: the deletion cascade, the conflicting-state scenario, and any access-control edge case.
- Required: Performance requirements have numbers, not adjectives. 'Fast' is not a requirement. '500ms at p95 for lists up to 1,000 tasks' is.
- Required: The spec ends with a 'Confirm before building' line. The agent restates the key requirements in its own words before typing code. If it misunderstood anything, you learn now, not after.
- Required: The .status file says 'requirements:draft' until you've reviewed and approved the spec. Draft is not approved. The agent must not proceed to design on a draft spec.
- Required: Non-Functional Requirements cover performance, security, and accessibility. These are the three sections most consistently missing from first drafts.
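Several of these items can be checked by a script before the spec reaches the agent; the Smart Kid Principle and the quality of the FAQ answers still need a human read. Here is a rough Python sketch, assuming the heading conventions of the requirements.md template (`## Non-Goals`, `### FR-001: ...`). `readiness_problems` is a hypothetical name, and the vague-phrase list is an assumption seeded from this article's own examples.

```python
import re
from collections import Counter
from typing import List

# Phrases the gate says to hunt down; extend the list with your team's favorites.
VAGUE_PHRASES = ["should be fast", "handle errors gracefully", "look clean", "appropriately"]
# Headings such as "### FR-001: Create task" or "### US-002: Assign task".
ID_HEADING = re.compile(r"^#{2,4}\s+((?:US|FR|NFR|D)-\d{3})\b", re.MULTILINE)
# Everything between the Non-Goals heading and the next "## " heading (or the end).
NON_GOALS = re.compile(r"^## Non-Goals.*?$(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def readiness_problems(spec: str) -> List[str]:
    """Return the mechanically checkable gate failures; an empty list means none found."""
    problems: List[str] = []

    section = NON_GOALS.search(spec)
    bullets = re.findall(r"^- ", section.group(1), re.MULTILINE) if section else []
    if len(bullets) < 3:
        problems.append(f"Non-Goals has {len(bullets)} item(s); need at least 3")

    lowered = spec.lower()
    if "confirm before building" not in lowered:
        problems.append("missing a 'Confirm before building' line")
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            problems.append(f"vague phrase: '{phrase}'")

    counts = Counter(ID_HEADING.findall(spec))
    problems.extend(f"duplicate ID: {i}" for i in sorted(counts) if counts[i] > 1)
    return problems
```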
FAQ
How long should a requirements.md be?
Long enough to prevent wrong assumptions, short enough to stay maintained. A typical production feature lands between 400 and 800 lines, including the template boilerplate.
If you're over 1,000 lines, ask whether this is one feature or two. If you're under 200, ask whether you've actually answered the hard questions, or just written the questions.
Do I need all three documents for every feature?
For a throwaway script or a two-hour fix, no. Just prompt and go. The overhead isn't worth it.
For anything that spans multiple sessions, involves real architecture choices, or handles data you can't easily rewrite, all three earn their keep. The design.md prevents the most expensive mistakes; the tasks.md prevents the most wasted implementation effort.
What's the difference between Non-Goals and Constraints?
Non-Goals are features you're choosing not to build in this version: 'no recurring tasks.' Constraints are things you have to work within: 'must use the existing database, no new services.'
Non-Goals protect scope. Constraints shape the solution space. Both belong in requirements, in separate sections.
Can I use this template with agents other than Claude Code?
Yes. The format is plain markdown. Any agent that can read files benefits from this structure: Cursor, Copilot, GPT-4o in a custom GPT, Gemini. The EARS syntax and stable IDs are model-agnostic.
The .status gate file is specific to the SDD workflow I use, but the documents themselves work with any capable agent.
My team writes specs in Confluence or Notion. Do I have to switch to markdown files?
Not strictly. The value is in the structure and the content, not the format. You can write a spec in Notion and paste it into the agent context at the start of each session.
The reason I use markdown files in the repo is traceability: the spec lives next to the code it governs, it's versioned in git alongside it, and the agent reads it directly without any copy-paste step. That last point matters more than it sounds. Friction is how specs get skipped.
How do I handle a requirement that changes mid-implementation?
Update requirements.md, add a change note with the date, set the .status back to 'requirements:draft', and re-approve before the agent continues.
The discipline is: never let the agent implement from a spec you haven't re-read since the change. A changed requirement that doesn't propagate to the design creates a contradiction between what the code does and what the spec says. Future you has to untangle it cold.
What if I don't know all the answers when writing the spec?
Write what you do know, and put the open questions explicitly in the Implementation FAQ section, marked as open, not answered. That's still better than silence, because the agent sees the question and knows to ask rather than guess.
Then resolve them before you approve the spec. An open question in an approved spec is a delayed bug.
Where to go next
A good spec takes 30-90 minutes to write for a typical feature. That time returns in the first implementation session: you spend it on problems that are actually hard, not on debugging misunderstood requirements.
- What Is Spec-Driven Development?: the methodology behind this approach
- Spec-Driven Development with Claude Code: the hands-on workflow
- How I built a 13-app crypto fintech in 70 days with SDD: what this produces at scale