Spec-Driven Development with LLMs: Precise Engineering Through Specifications
LLMs are transforming how we write code, but they've also exposed a fundamental truth: vague instructions produce vague implementations. This post introduces spec-driven development (SDD)—a methodology for building reliable software when working with Large Language Models as coding assistants. Specs aren't just documentation; they're the contract that ensures both humans and AI produce exactly what you need.
Why Specs Matter More Than Ever in the Age of LLMs
LLMs like Claude are remarkably capable coding assistants, but they have a fundamental limitation: they can only build what you describe. Vague instructions produce vague implementations. Incomplete requirements lead to incomplete features.
This is where spec-driven development becomes essential:
```
Without Specs                        With Specs
─────────────────────────────        ─────────────────────────────
"Add health endpoints"               "Implement requirements 1.1
  → Ambiguous implementation          through 1.5 from the
  - What status codes?                control-plane-health-endpoints
  - What response format?             spec"
  - Which dependencies to check?       → Precise implementation
                                       - HTTP 200 with JSON
                                       - RFC3339 timestamps
                                       - Database/cache checks
                                       - Configurable timeouts
```
The Contract Between Human and Machine
Think of a spec as a legally binding contract between you and the LLM:
You specify exactly what you want, with testable acceptance criteria
The LLM implements according to those criteria
Tests verify the implementation matches the spec
Everyone wins: you get what you asked for, the LLM has clear guidance
Specs Prevent "AI Drift"
Without specs, LLMs can:
Make assumptions about behavior you didn't intend
Add features you didn't ask for
Implement patterns that don't match your architecture
Miss edge cases that seem obvious to you
With specs, these problems largely disappear. The LLM has explicit requirements to follow, and tests verify compliance.
Specs as Versioned Code Artifacts
Critical principle: Specs are not separate documentation—they are first-class code artifacts that live alongside your implementation.
Directory Structure
```
your-project/
├── .kiro/
│   └── specs/                        # All specifications
│       ├── README.md                 # Spec conventions and overview
│       ├── mage-build-system/
│       │   ├── requirements.md       # What we're building
│       │   ├── design.md             # How we'll build it
│       │   └── tasks.md              # Implementation checklist
│       ├── authentication-middleware/
│       │   ├── requirements.md
│       │   ├── design.md
│       │   └── tasks.md
│       └── control-plane-health-endpoints/
│           ├── requirements.md
│           ├── design.md
│           └── tasks.md
├── internal/                         # Implementation
├── cmd/                              # Entry points
└── test/                             # Integration tests
```
Why Version Specs with Code?
Traceability: `git blame` shows who changed what requirement and when
History: You can see how requirements evolved over time
Context: Understanding why code exists by reading the original spec
Synchronization: Specs and code stay in sync through the same PR process
Onboarding: New engineers read specs to understand the system
Specs as Living Documentation
Unlike external documentation that drifts from reality, versioned specs:
Are reviewed in PRs alongside code changes
Must be updated when requirements change
Provide audit trails for compliance and debugging
Explain rationale that comments can't capture
For example:
```markdown
- [x] 12. Final checkpoint - Ensure all tests pass and architecture is validated
  - Build successful: `go build ./...` passes
  - Property-based tests passing: All test structure alignment tests pass
  - **PBT Status**:
    - Test structure alignment: All 6 properties PASS
    - Layer-based directory structure: All 5 properties PASS
    - Migration completeness: All 6 properties PASS
  - **Overall Status**: CLEAN architecture migration COMPLETE
```
This is permanent, searchable history of what was verified and when.
The Three-Document Structure
Every feature has three documents that work together:
| Document | Purpose | Audience | LLM Usage |
|---|---|---|---|
| requirements.md | What to build and why | Product, Engineering | Context for implementation |
| design.md | How to build it | Engineering | Architecture guidance |
| tasks.md | Step-by-step checklist | Implementation (Human or LLM) | Direct instructions |
How LLMs Use Each Document
When working with an LLM on a feature:
1. Share requirements.md → LLM understands the goal and constraints
2. Share design.md → LLM follows your architecture decisions
3. Work through tasks.md → LLM implements each task with clear scope
4. Run verification → Tests confirm correctness
Requirements: The "What" and "Why"
The requirements.md file defines success criteria. For LLMs, this is especially critical—they need explicit, testable statements.
The EARS Pattern
We use EARS (Easy Approach to Requirements Syntax) for machine-parseable requirements:
| Keyword | Meaning | Example |
|---|---|---|
| WHEN | Trigger condition | WHEN a client sends GET /health |
| THE | System component | THE System |
| SHALL | Mandatory behavior | SHALL return HTTP 200 |
| SHALL NOT | Forbidden behavior | SHALL NOT log secrets |
| IF | Conditional | IF the cache is nil |
Example: Health Endpoints Requirements
For example:
```markdown
### Requirement 1

**User Story:** As a platform operator, I want a basic health check endpoint,
so that I can verify the Control Plane service is running and responsive.

#### Acceptance Criteria

1. WHEN a client sends GET /health, THE System SHALL return HTTP 200 with JSON
2. WHEN the health endpoint responds, THE System SHALL include status field
   with value "healthy"
3. WHEN the health endpoint responds, THE System SHALL include timestamp field
   in RFC3339 format
4. WHEN the health endpoint responds, THE System SHALL include service field
   with value "control-plane"
5. WHEN the health endpoint responds, THE System SHALL include version field
   with the current service version
```
Why this works for LLMs:
Each criterion is specific and testable
Values are explicitly stated ("healthy", "RFC3339")
No ambiguity about expected behavior
The Glossary: Shared Vocabulary
Define terms once, use them everywhere:
```markdown
## Glossary

- **Task_Store**: Persistent storage for tasks (JSON file)
- **Zero_Magic**: Architectural principle requiring explicit behavior,
  no automatic discovery, and inspectable operations
- **12_Factor**: Application design methodology emphasizing configuration
  via environment, stateless processes, and explicit dependencies
```
The underscore convention (Task_Store not "task store") makes terms searchable and unambiguous for both humans and LLMs.
Design: The "How"
The design.md document captures architectural decisions that the LLM must follow.
Why Design Documents Matter for LLMs
Without design guidance, LLMs will:
Choose their own patterns (which may not match your codebase)
Make their own architectural decisions (which you'll have to reverse)
Miss integration points (which cause bugs later)
With design documents:
From mage-build-system/design.md, on Mage target organization: targets are organized into namespaces for clarity (100% namespaced):
```go
// Build namespace - compilation and build management (9 targets)
type Build mg.Namespace

func (Build) Default() error    // Build for current platform
func (Build) All() error        // Build for all platforms
func (Build) LinuxAmd64() error // Build for linux-amd64
```
The LLM now knows:
- Use namespaces (not flat functions)
- Follow the naming convention
- Match the existing pattern
Correctness Properties
A critical part of design documents is **correctness properties**—formal statements about system behavior:
```markdown
### Property 4: Status Code Mapping

*For any* ready endpoint response, if all checks have value "ok" then
HTTP status should be 200, and if any check has value "error" then
HTTP status should be 503.

**Validates: Requirements 2.3, 2.4, 4.2, 4.3**
```
These properties:
Define invariants that must always hold
Become property-based tests in the implementation
Provide verification criteria for LLM output
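As a sketch of how a property maps onto code, a status-mapping helper like the hypothetical `StatusCodeFor` below would satisfy Property 4. The function name and signature are invented for illustration; the real implementation may structure this differently.

```go
package main

import "net/http"

// StatusCodeFor implements Property 4: if all checks are "ok" the status
// is 200; if any check is "error" the status is 503.
// (Hypothetical helper, not taken from the actual codebase.)
func StatusCodeFor(checks map[string]string) int {
	for _, result := range checks {
		if result == "error" {
			return http.StatusServiceUnavailable // 503
		}
	}
	return http.StatusOK // 200
}
```

A property-based test can then throw arbitrary check maps at this function and assert the invariant holds for every one of them, not just hand-picked examples.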
Tasks: The "When"
The tasks.md file is the implementation checklist—direct instructions for whoever (human or LLM) is writing the code.
Structure for LLM Consumption
- [ ] 1. Create health check logic file and implement dependency testing
- Create `internal/control/health.go` with package declaration and imports
- Implement `CheckDatabaseHealth(db *gorm.DB) string` function
- Handle nil database connection (return "error")
- Execute ping with 500ms timeout
- Return "ok" on success, "error" on failure
- _Requirements: 5.1, 5.2, 5.3, 5.5_
- [ ] 1.1 Write property test for database health check function
- **Property 5: Database Check Result Mapping**
- **Validates: Requirements 5.2, 5.3**
- Tag: `Feature: control-plane-health-endpoints, Property 5`
Key elements:
Checkboxes track progress
Specific file paths eliminate guessing
Requirement references enable verification
Testing tasks follow implementation tasks
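For illustration, task 1 above might come out roughly as follows. The task calls for `*gorm.DB`; this sketch substitutes the standard library's `*sql.DB` so it stays dependency-free, and is not the project's actual code.

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

// CheckDatabaseHealth pings the database with a 500ms timeout, per task 1.
// The spec's signature takes *gorm.DB; *sql.DB is used here to stay stdlib-only.
func CheckDatabaseHealth(db *sql.DB) string {
	if db == nil {
		return "error" // nil connection handled explicitly (Requirement 5.2)
	}
	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()
	if err := db.PingContext(ctx); err != nil {
		return "error" // ping failure (Requirement 5.3)
	}
	return "ok" // successful ping (Requirement 5.1)
}
```

Note how every branch in the function traces back to a numbered requirement listed in the task, which is exactly what makes the LLM's output verifiable.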
The Implementation-Test Pattern
Notice how every implementation task has corresponding test tasks:
Task 1: Implement feature X
Task 1.1: Write unit tests for X
Task 1.2: Write property test for X
Task 2: Implement feature Y
Task 2.1: Write unit tests for Y
...
Task N: Final checkpoint - verify all tests pass
This ensures nothing ships without verification.
The Verification Pyramid: Ensuring Correctness
Specs are only valuable if we can verify the implementation matches them. We use a multi-layered verification approach:
```
                        ┌─────────────────┐
                        │    E2E Tests    │  ← Full system verification
                        │    (Minutes)    │
                        └────────┬────────┘
                                 │
                        ┌────────┴────────┐
                        │   Integration   │  ← Component interaction
                        │    (Seconds)    │
                        └────────┬────────┘
                                 │
                  ┌──────────────┴──────────────┐
                  │    Property-Based Tests     │  ← Universal properties
                  │          (Seconds)          │
                  └──────────────┬──────────────┘
                                 │
        ┌────────────────────────┴────────────────────────┐
        │                   Unit Tests                    │  ← Individual functions
        │                 (Milliseconds)                  │
        └────────────────────────┬────────────────────────┘
                                 │
┌────────────────────────────────┴────────────────────────────────┐
│                     Linting & Formatting                        │  ← Code quality
│                        (Milliseconds)                           │
└─────────────────────────────────────────────────────────────────┘
```
Layer 1: Linting and Formatting
Purpose: Ensure code quality before tests even run
```
mage quality:lint   # Run golangci-lint
mage quality:fmt    # Format code with gofmt
mage quality:vet    # Run go vet
mage quality:check  # Verify formatting (CI-friendly)
```
Why this matters for LLM output:
LLMs sometimes generate code with style inconsistencies
Linting catches security issues, bugs, and anti-patterns
Formatting ensures consistent code style
From mage-build-system/requirements.md:
### Requirement 7: Validation and Quality Targets
1. THE Build_System SHALL provide a `quality:lint` target for golangci-lint
2. THE Build_System SHALL provide a `quality:fix` target for auto-fix
3. THE Build_System SHALL provide a `quality:fmt` target for formatting
4. THE Build_System SHALL provide a `quality:check` for verifying format (CI)
5. THE Build_System SHALL provide a `quality:vet` for running go vet
Layer 2: Unit Tests
Purpose: Verify individual functions behave correctly
```go
// TestNewTask_EmptyTitle verifies that an empty title returns an error.
func TestNewTask_EmptyTitle(t *testing.T) {
	_, err := NewTask("", PriorityMedium)
	if err != ErrEmptyTitle {
		t.Errorf("expected ErrEmptyTitle, got %v", err)
	}
}
```
Maps to requirements:
5. WHEN the title is empty, THE System SHALL return an error "title is required"
Layer 3: Property-Based Tests
Purpose: Verify universal properties hold across ALL valid inputs
```go
// Feature: control-plane-health-endpoints, Property 5: Database check result mapping
func TestProperty_DatabaseCheckResultMapping(t *testing.T) {
	parameters := gopter.DefaultTestParameters()
	parameters.MinSuccessfulTests = 100
	properties := gopter.NewProperties(parameters)

	properties.Property("database check returns correct status", prop.ForAll(
		func(dbState string) bool {
			switch dbState {
			case "working":
				return CheckDatabaseHealth(workingDB) == "ok"
			case "nil":
				return CheckDatabaseHealth(nil) == "error"
			case "failed":
				return CheckDatabaseHealth(failedDB) == "error"
			}
			return true
		},
		gen.OneConstOf("working", "nil", "failed"),
	))

	properties.TestingRun(t)
}
```
Why property tests are essential:
Unit tests verify specific examples
Property tests verify universal truths
LLMs may miss edge cases that properties catch
From clean-architecture-reorganization/design.md:
**Property 6: Dependency rule enforcement**
*For any* Go source file in the repository, its import statements
should follow the dependency rule where entities import nothing
from other layers, use cases import only from entities, adapters
import from entities and use cases, and drivers import from any layer.
**Validates: Requirements 6.1, 6.2, 6.3, 8.1, 8.2, 8.3, 8.4**
Layer 4: Integration Tests
Purpose: Verify components work together correctly
```go
func TestJSONStore_Persistence(t *testing.T) {
	// Create temp directory
	tmpDir := t.TempDir()
	storePath := filepath.Join(tmpDir, "tasks.json")

	// Create store and add task
	store1, _ := NewJSONStore(storePath)
	task, _ := NewTask("Persistent task", PriorityLow)
	store1.Add(*task)

	// Create new store instance (simulates restart)
	store2, err := NewJSONStore(storePath)
	if err != nil {
		t.Fatalf("failed to create second store: %v", err)
	}

	// Verify task persisted
	tasks, _ := store2.GetAll()
	if len(tasks) != 1 {
		t.Errorf("expected 1 task, got %d", len(tasks))
	}
}
```
Maps to requirements:
### Requirement 5: Data Persistence
4. WHEN the application starts, THE System SHALL load tasks from Task_Store
5. WHEN the Task_Store file doesn't exist, THE System SHALL create it
Layer 5: End-to-End Tests
Purpose: Verify the complete system works as intended
From control-plane-health-endpoints/tasks.md:
- [x] 9. Write integration tests for failure scenarios
- Test database failure scenario
- Start Control Plane server
- Stop database container
- Make GET /ready request
- Verify HTTP 503 response
- Verify database check is "error"
The Verification Commands
```
mage test:unit         # Run unit tests (fast)
mage test:property     # Run property-based tests
mage test:integration  # Run integration tests (with testcontainers)
mage test:e2e          # Run end-to-end tests (with KIND)
mage test:all          # Run all tests
mage test:coverage     # Generate coverage report
```
Real Examples from Our Codebase
Example 1: Mage Build System Migration
The challenge: Migrate from Makefile to Mage while maintaining all functionality.
How specs helped:
19 detailed requirements covering every target
Design document with exact interface signatures
23 implementation tasks with checkboxes
Verification:
- [x] 23. Final Validation
- Run full test suite (mage test:all)
- Build for all platforms (mage build:all)
- Generate all code (mage gen:all)
- Validate all specs (mage validate:specs)
- Test release process (mage release:dryRun)
- Verify CI/CD workflows pass
Outcome: Complete migration with zero functionality loss, fully verified.
Example 2: CLEAN Architecture Reorganization
The challenge: Restructure entire codebase to follow CLEAN architecture.
How specs helped:
Property-based tests verify architectural constraints
Import restrictions enforced by linting rules
Clear migration path in tasks document
Key property test:
**Property 6: Dependency rule enforcement**
*For any* Go source file in the repository, its import statements
should follow the dependency rule...
Outcome: Architecture constraints are automatically verified on every commit.
Example 3: Authentication Middleware
The challenge: Implement JWT auth with development bypass mode.
How specs helped:
Clear requirements for production vs development behavior
Design specifies use of go-chi/jwtauth (no custom crypto)
Tests verify both modes work correctly
Key requirement:
1. WHEN running in development mode with X-Test-Namespace header present,
THE Authentication_Middleware SHALL use the header value as the namespace
2. WHEN running in development mode without X-Test-Namespace header,
THE Authentication_Middleware SHALL use a default namespace "default"
Working with LLMs: The Spec-Test-Verify Loop
Here's the workflow for LLM-assisted development:
Step 1: Write the Spec First
Before engaging the LLM:
Write requirements.md with testable acceptance criteria
Write design.md with architecture and interfaces
Write tasks.md with implementation checklist
Step 2: Share Context with the LLM
You: "I need to implement the control-plane-health-endpoints feature.
Here are the specs: [paste requirements.md, design.md, tasks.md]
Please implement task 1."
Step 3: LLM Implements
The LLM follows:
Requirements for behavior
Design for architecture
Tasks for scope
Step 4: Verify with Tests
```
# After LLM generates code
mage quality:lint   # Does it pass linting?
mage quality:fmt    # Is it formatted correctly?
mage test:unit      # Do unit tests pass?
mage test:property  # Do properties hold?
```
Step 5: Iterate if Needed
If verification fails:
You: "Task 1 is failing property test 5. The requirement says:
'WHEN the database connection is nil, THE System SHALL return error'
But the implementation returns 'ok'. Please fix."
The LLM has specific feedback to address.
Step 6: Mark Complete and Continue
- [x] 1. Create health check logic file ← Mark done
- [x] 1.1 Write property test ← Mark done
- [ ] 2. Define response types ← Next task
Specs Provide Insight: The "Why" Behind the "What"
Specs aren't just for implementation—they're permanent records of decision-making.
Understanding Intent
Six months from now, when someone asks "why does the cache check return 'ok' when the cache is nil?":
For example:
5. WHEN the cache connection is nil, THE System SHALL mark cache status
as "ok" (cache is optional)
The spec explains the requirement. The design explains the rationale:
For example:
### Cache Connectivity Errors
**Scenarios**:
- Cache connection is nil → Return "ok" (cache is optional)
- Cache ping fails → Return "error" status
**Rationale**: The cache is used for performance optimization, not core
functionality. A missing cache should not prevent the service from
being marked as ready.
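That rationale translates into very little code. The sketch below assumes a `Pinger` interface standing in for the real cache client (an assumption for illustration, not the project's actual type):

```go
package main

import "errors"

// Pinger stands in for the cache client; any type with Ping() satisfies it.
type Pinger interface {
	Ping() error
}

// CheckCacheHealth treats a nil cache as healthy, because the cache is
// optional: a missing cache must not block readiness.
func CheckCacheHealth(cache Pinger) string {
	if cache == nil {
		return "ok" // cache is optional; nil → "ok" by design
	}
	if err := cache.Ping(); err != nil {
		return "error" // a configured cache that fails must be reported
	}
	return "ok"
}

// failingCache is a stand-in whose ping always fails, for testing the error path.
type failingCache struct{}

func (failingCache) Ping() error { return errors.New("connection refused") }
```

The asymmetry with the database check (nil database → "error", nil cache → "ok") is exactly the kind of decision that lives in the design document rather than a code comment.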
Debugging with Specs
When a bug is reported:
Find the relevant spec
Check if the requirement covers this case
If yes → implementation bug (fix the code)
If no → spec gap (update spec, then code)
Onboarding with Specs
New team members can:
Read specs to understand what the system does
Read designs to understand how it's built
Read tasks to see what was verified
Use `git log` on specs to see their evolution
Best Practices and Anti-Patterns
Best Practices
1. Write Specs Before Implementation
Even if the LLM could "just figure it out," specs ensure you get what you actually need.
2. Make Every Requirement Testable
Bad: "The system should be fast"
Good: "THE System SHALL respond within 100 milliseconds"
3. Include Verification in Tasks
Every implementation task should have corresponding test tasks:
- [ ] 3. Implement feature X
- [ ] 3.1 Write unit tests for X
- [ ] 3.2 Write property test for X
4. Run Full Verification Before Merge
```
mage quality:all && mage test:all
```
5. Update Specs When Requirements Change
Specs must stay synchronized with code. If a PR changes behavior, it must update the spec.
6. Reference Requirements in Tests
```go
// Requirement 1.3: timestamp field in RFC3339 format
func TestHealthResponse_TimestampFormat(t *testing.T) {
	...
}
```
Anti-Patterns to Avoid
1. Writing Specs After Implementation
This defeats the purpose. Specs guide implementation, not document it after the fact.
2. Skipping Tests "Because the LLM Seems Right"
LLMs are confident even when wrong. Always verify.
3. Vague Acceptance Criteria
Bad: "The system should handle errors gracefully"
Good: "WHEN the database query fails, THE System SHALL return HTTP 503"
4. Not Running Linting
LLM output often has subtle issues that linting catches.
5. Orphan Tests
Every test should trace to a requirement. No requirement? No test needed.
6. Treating Specs as Separate from Code
Specs live in the repo, are reviewed in PRs, and evolve with the code.
Conclusion
Spec-driven development with LLMs is about precision and verification:
Specs define success with testable acceptance criteria
LLMs implement following explicit guidance
Tests verify the implementation matches the spec
Versioned specs provide permanent, searchable history
The result:
Reliable code that does exactly what you specified
Comprehensive tests that catch regressions
Living documentation that explains why code exists
Efficient LLM collaboration with clear contracts
Specs aren't overhead—they're the foundation of quality.
Last updated: 2026-01-11