TDD with AI Coding Agents: The Write-Failing-Test Pattern
Test-driven development was already a practice that many developers agreed with in principle but struggled to maintain in practice. Writing the test first is slow. It requires you to think carefully about the interface before you've built it. It feels like doing the work twice.
AI coding agents change this dynamic more than people realize. The "write a failing test, then implement" loop becomes genuinely faster when an agent is doing the implementation step. The discipline part of TDD stays with you. The grunt work shifts to the model.
This guide covers the practical workflow, not the philosophy. How to structure prompts so the agent implements against your tests, what to do when the agent cheats, and how to catch the failure modes before they waste your time.
Why TDD specifically works well with AI agents
When you write the test first, you're forced to define the interface precisely: what the function accepts, what it returns, what it throws. That precision is exactly what makes an AI agent's implementation more accurate.
Think about the alternative: you describe a function in plain language, the agent implements it, and then you find out the edge cases it doesn't handle. You correct those, the agent updates the implementation, you test again. That's an iterative guessing game.
With TDD, you front-load the specification work into the tests themselves. The agent sees failing tests with concrete inputs and expected outputs. There's much less to guess. The implementation loop is tighter.
The other advantage is that the agent can't hallucinate a solution that passes your tests. It either passes them or it doesn't. This is a meaningful reliability improvement over reviewing implementation-first code, where subtle bugs might not be obvious until they hit production.
The basic workflow
The pattern is simple:
- Write the tests yourself (or with minimal AI help)
- Confirm the tests fail for the right reason
- Hand the failing tests to the agent with a clear implementation prompt
- Run the tests
- If they don't pass, show the failure output to the agent and iterate
Step 1 is non-negotiable. The agent should not be writing the tests in TDD mode. The whole point is that your tests define the specification. If the agent writes both the tests and the implementation, it'll write tests that pass its implementation, not tests that enforce your requirements.
There's a place for agent-generated tests (coverage tests after the fact, edge case expansion), but that's not TDD.
A real example: parsing a config file
Let's say you're building a configuration parser. You want it to handle a specific format, validate required fields, and return useful errors. Here's how the TDD workflow looks.
Write the tests first:
// config-parser.test.ts
import { parseConfig } from './config-parser'
describe('parseConfig', () => {
test('returns a valid config for minimal valid input', () => {
const input = `
port: 3000
host: localhost
`
const result = parseConfig(input)
expect(result.port).toBe(3000)
expect(result.host).toBe('localhost')
})
test('throws ConfigError when port is missing', () => {
const input = `host: localhost`
expect(() => parseConfig(input)).toThrow(ConfigError)
expect(() => parseConfig(input)).toThrow('port is required')
})
test('throws ConfigError when port is not a number', () => {
const input = `port: abc\nhost: localhost`
expect(() => parseConfig(input)).toThrow(ConfigError)
expect(() => parseConfig(input)).toThrow('port must be a number')
})
test('returns default host when host is not specified', () => {
const input = `port: 8080`
const result = parseConfig(input)
expect(result.host).toBe('127.0.0.1')
})
test('throws on empty input', () => {
expect(() => parseConfig('')).toThrow(ConfigError)
})
})
Run these. They all fail because parseConfig doesn't exist yet. Good. That's the right state.
Write the implementation prompt:
Implement parseConfig() in config-parser.ts to make these tests pass.
The tests are in config-parser.test.ts.
The function signature should be:
parseConfig(input: string): Config
Where Config is:
type Config = { port: number; host: string }
Use a custom ConfigError class (extend Error) for all error cases.
Parse the YAML-like format with simple line-by-line parsing, no external
YAML library.
That's it. The tests define the behavior precisely. The prompt defines the interface and the constraint (no external library).
In Claude Code, you'd attach the test file and say:
Here's config-parser.test.ts with failing tests. Implement config-parser.ts
to make all tests pass. Use a ConfigError class for all error cases.
Run the tests after the agent implements. If something fails, paste the failure output directly:
2 tests still failing:
FAIL config-parser.test.ts
✗ throws ConfigError when port is not a number
Expected: ConfigError with message "port must be a number"
Received: Error with message "Invalid type"
✗ returns default host when host is not specified
Expected: "127.0.0.1"
Received: undefined
Fix these two failures.
This is a much tighter feedback loop than reviewing code visually and asking "does this handle all edge cases?"
The "cheating" problem and how to handle it
AI agents will sometimes make tests pass in ways that technically pass but violate the spirit of the test. The most common form: the agent hardcodes return values that match your test inputs instead of implementing actual logic.
Example: if your test checks that add(2, 3) returns 5, a cheating implementation literally returns 5. This doesn't happen often in real code, but it does happen with edge cases. If your only test for "default host" checks that a specific input returns 127.0.0.1, the agent might special-case that string rather than implementing default fallback logic.
The defense: write tests with varied inputs. If you're testing default behavior, use two different inputs that should both trigger the same default:
test('returns default host for any input without a host field', () => {
expect(parseConfig('port: 3000').host).toBe('127.0.0.1')
expect(parseConfig('port: 8080\ndebug: true').host).toBe('127.0.0.1')
})
Now hardcoding is impossible. The logic has to be real.
More complex example: an async service with external dependencies
TDD gets interesting when the implementation involves async operations and external services. The pattern still works; you just need to think about mocking upfront.
Tests for a notification service:
// notification-service.test.ts
import { NotificationService } from './notification-service'
import { EmailClient } from './email-client'
vi.mock('./email-client')
describe('NotificationService', () => {
let service: NotificationService
let mockEmailClient: vi.Mocked<EmailClient>
beforeEach(() => {
mockEmailClient = new EmailClient() as vi.Mocked<EmailClient>
mockEmailClient.send.mockResolvedValue({ id: 'mock-id' })
service = new NotificationService(mockEmailClient)
})
test('sends an email to the user when notified', async () => {
await service.notifyUser('[email protected]', 'Your report is ready')
expect(mockEmailClient.send).toHaveBeenCalledWith({
to: '[email protected]',
subject: 'Notification',
body: 'Your report is ready'
})
})
test('retries once if the first send fails', async () => {
mockEmailClient.send
.mockRejectedValueOnce(new Error('Temporary failure'))
.mockResolvedValue({ id: 'mock-id' })
await service.notifyUser('[email protected]', 'Message')
expect(mockEmailClient.send).toHaveBeenCalledTimes(2)
})
test('throws NotificationError if both retries fail', async () => {
mockEmailClient.send.mockRejectedValue(new Error('Persistent failure'))
await expect(
service.notifyUser('[email protected]', 'Message')
).rejects.toThrow(NotificationError)
})
})
These tests define: the interface (dependency injection for the email client), the retry behavior (one retry), and the error surface (NotificationError). The agent now has a complete specification to work from.
Implementation prompt:
Implement NotificationService in notification-service.ts to make these tests pass.
The service takes an EmailClient as a constructor argument.
EmailClient.send() is already typed in email-client.ts.
The service should:
- Inject dependencies, not instantiate them internally
- Retry exactly once on failure, not more
- Throw NotificationError (custom class extending Error) if both attempts fail
Keep it simple. No exponential backoff here, just one retry.
When to skip TDD and use AI directly
TDD isn't always the right approach. Some cases where writing tests first creates more friction than value:
Exploratory code. If you're not sure what the right interface is yet, writing tests first locks you into an interface before you've thought it through. Write a throwaway implementation to explore the problem, then add tests once the shape is clearer.
Simple utilities with obvious behavior. A function that formats a date string doesn't need tests before it's written. The behavior is obvious; test after.
UI components. Testing React components in a TDD fashion is possible but often produces brittle tests that fight every refactor. Snapshot tests and visual tests are better handled after the component exists.
Infrastructure code. Database migrations, deployment scripts, CI configuration: these don't benefit from TDD.
For business logic, services, parsing/validation code, and any function with non-trivial edge cases, TDD with AI agents is genuinely faster than the alternative.
Setting up your project for AI-assisted TDD
A few project setup choices that make AI-assisted TDD smoother:
Use a fast test runner. Vitest and pytest both have watch modes that run relevant tests on save in under a second. When the iteration loop is fast, the TDD cycle is comfortable. If your tests take 30 seconds to run, you'll abandon the discipline.
Keep test files close to source files. config-parser.ts and config-parser.test.ts in the same directory means the agent can find both immediately without you having to specify paths.
Name your test files predictably. *.test.ts, *.spec.ts, test_*.py: pick one convention and stick to it. The agent needs to find test files reliably.
Add a test script to your CLAUDE.md or .cursorrules. Tell the agent how to run tests:
## Testing
- Run unit tests: `npm run test`
- Run tests in watch mode: `npm run test:watch`
- Run specific file: `npx vitest run src/path/to/file.test.ts`
- Run tests for a changed file: `npx vitest related src/path/to/file.ts`
When the agent knows how to run the tests, it can verify its own implementation. That's the closest thing to autonomous TDD you'll get from current tools.
For broader error handling patterns in AI-assisted code, including what to do when agent-generated implementations have subtle bugs, the error recovery patterns guide covers the production side of the equation.