Learn how to create feature specs from code with Claude Code by extracting real app behavior from routes and components, then producing a living spec and a gaps list.

People disagree about what an app does because they remember different versions of it. Support remembers the last angry ticket. Sales remembers the demo path. Engineers remember what the feature was meant to do. Ask three people and you get three confident answers, and none of them match the current build.
Over time, the code becomes the only source that stays current. Docs drift, tickets get closed, and quick fixes pile up. A route gets a new validation rule. A UI toggle changes a default. A handler starts returning different errors. Nobody updates the spec because it feels optional, and each change feels too small to document.
That creates predictable problems. Teams ship changes that break edge cases they didn't know existed. QA tests the happy path and misses rules buried in handlers. New teammates copy behavior from the UI without understanding real constraints. Stakeholders debate opinions instead of pointing to agreed behavior.
A good outcome isn't a perfect document. It's shared clarity. Everyone should be able to answer: "What happens if I do X?" and "What does the system guarantee?" without guessing. You get fewer surprises, smaller review cycles, and fewer "Wait, it already does that" moments because the team is looking at the same truth.
When a spec matches the code, it becomes safe to plan changes. You can spot what's stable, what's accidental, and what's missing before you ship.
A living spec is a short, editable description of what the app actually does today. It's not a one-time document. It changes whenever behavior changes, so the team can trust it.
When people talk about feature specs written from code (for example, using Claude Code), the goal is simple: read real behavior from routes, handlers, and screens, then write it down in plain language.
A useful living spec focuses on what users can see and what the system promises. It should cover:
- What users can do and what they see in response (screens, messages, redirects)
- Inputs and validation rules that cause real errors
- Who is allowed to do what (roles, ownership, feature flags)
- What success and failure look like
- Side effects: records changed, notifications sent, jobs queued
What it should not cover is how the code is organized. If you start naming files and refactor plans, you're drifting into implementation detail. Avoid:
- File and folder names
- Internal layering (controllers, services, repositories)
- Refactor plans and future architecture ideas
A gaps list is separate. It's a small list of mismatches and unknowns you find while writing the spec.
Example: one route rejects files over 10MB, but the UI says 25MB. That's a gap until the team decides which rule is real and updates either the code or the spec.
Start small. If you try to document the whole app, you'll end up with a pile of notes nobody trusts. Pick one slice users can describe in a sentence, like "invite a teammate," "checkout," or "reset password." Good scopes are a single feature area, one module, or one user journey from entry point to outcome.
Choose your entry point based on where truth lives:
- If the rules live on the server, start with routes and handlers.
- If the experience is mostly in the UI (wizards, gated screens, client-side checks), start with the entry components.
- If you're unsure, start server-side and use the UI to fill in what users actually see.
Before you read code, collect a few inputs so mismatches stand out quickly: any existing API docs, old product notes, support tickets, and the "known pain points" people complain about. These don't override the code, but they help you notice missing states like errors, edge cases, and permissions.
Keep the spec format boring and consistent. Teams align faster when every spec reads the same way.
Use this structure repeatedly and your feature specs will stay readable, comparable, and easy to update.
Start with server entry points. Routes and handlers show "what the app does" in concrete terms: who can call it, what they must send, what they get back, and what changes in the system.
List the routes in scope and map each one to a user intent. Don't write "POST /api/orders." Write "Place an order" or "Save a draft." If you can't name the intent in plain words, that's already a spec gap.
As you read each handler, capture inputs and validation rules as user-facing requirements. Include required fields, allowed formats, and the rules that cause real errors. For example: "Email must be valid," "Quantity must be at least 1," "Start date can't be in the past."
Write auth and role checks the same way. Instead of "middleware: requireAdmin," document: "Only admins can cancel any order. Regular users can only cancel their own order within 10 minutes." If the code checks ownership, feature flags, or tenant boundaries, include those too.
Then note outputs and outcomes. What does success return (a created ID, an updated object)? What do common failures look like (401 not signed in, 403 not allowed, 404 not found, 409 conflict, 422 validation error)?
Finally, record side effects because they're part of behavior: records created or updated, emails or notifications sent, events published, background jobs queued, and anything that triggers other flows. These details prevent surprises when teams rely on the spec later.
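To make that concrete, here is a minimal sketch of the kind of handler these notes come from. It is not from any real codebase: the route, fields, and in-memory stubs are invented for illustration, and the comments show the plain-language spec line each check would turn into.

```typescript
// A minimal sketch, not the article's code: the route, fields, and the
// in-memory stubs below are invented for illustration only.
import express, { type Request, type Response, type NextFunction } from "express";

type Order = { id: string; userId: string; placedAt: number; status: "open" | "shipped" | "cancelled" };
type User = { id: string; isAdmin: boolean };

const orders = new Map<string, Order>();                      // stand-in data store
const enqueue = async (_job: string, _payload: object) => {}; // stand-in job queue

// Stand-in auth middleware: a real app would resolve the session here.
function requireUser(_req: Request, res: Response, next: NextFunction) {
  res.locals.user = { id: "u1", isAdmin: false } satisfies User;
  next();
}

const app = express();
app.use(express.json());

// Intent: "Cancel my order" (not just "POST /api/orders/:id/cancel").
app.post("/api/orders/:id/cancel", requireUser, async (req: Request, res: Response) => {
  const user = res.locals.user as User;
  const order = orders.get(req.params.id);

  // Spec line: "Cancelling an order that doesn't exist returns 404."
  if (!order) return res.status(404).json({ error: "not_found" });

  // Spec line: "Only admins can cancel any order. Regular users can only
  // cancel their own order within 10 minutes of placing it."
  const isOwner = order.userId === user.id;
  const withinWindow = Date.now() - order.placedAt < 10 * 60 * 1000;
  if (!user.isAdmin && !(isOwner && withinWindow)) {
    return res.status(403).json({ error: "not_allowed" });
  }

  // Spec line: "Orders that have already shipped can't be cancelled (409)."
  if (order.status === "shipped") return res.status(409).json({ error: "already_shipped" });

  // Side effects belong in the spec too: "Cancelling updates the order,
  // queues a refund, and emails the customer."
  order.status = "cancelled";
  await enqueue("refund-payment", { orderId: order.id });
  await enqueue("send-cancellation-email", { orderId: order.id });

  // Spec line: "Success returns the cancelled order."
  return res.json(order);
});

app.listen(3000);
```

The code itself is disposable; what matters is that every branch and side effect becomes one sentence a PM or QA can read.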
Routes tell you what the app can do. Components tell you what users actually experience. Treat the UI as part of the contract: what shows up, what gets blocked, and what happens when things go wrong.
Start by finding the entry screens for the feature. Look for the page component, layout wrapper, and a few "decision" components that control fetching, permissions, and navigation. That's usually where real behavior lives.
As you read components, capture rules users can feel: when actions are disabled, required steps, conditional fields, loading states, and how errors appear (inline field errors vs toast, auto-retry, "try again" buttons). Also note state and caching behavior such as stale data showing first, optimistic updates, or "last saved" timestamps.
Watch for hidden flows that silently change what users see. Search for feature flags, experiment buckets, and admin-only gates. Note silent redirects too, like sending logged-out users to sign-in or sending users without access to an upgrade screen.
A concrete example: on a "Change Email" screen, document that Save stays disabled until the email is valid, a spinner shows during the request, success triggers a confirmation banner, and backend validation errors render under the input. If the code shows a flag like newEmailFlow, note both variants and what differs.
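As a rough sketch of where that behavior lives in code (the component, endpoint, and flag wiring are invented, not taken from a real app), the screen described above might look like this:

```tsx
// Invented "Change Email" form: component, endpoint, and newEmailFlow handling
// are assumptions used to show how UI rules map to spec lines.
import { useState } from "react";

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

export function ChangeEmailForm({ newEmailFlow }: { newEmailFlow: boolean }) {
  const [email, setEmail] = useState("");
  const [saving, setSaving] = useState(false);
  const [serverError, setServerError] = useState<string | null>(null);
  const [saved, setSaved] = useState(false);

  const isValid = EMAIL_RE.test(email);

  async function handleSave() {
    setSaving(true);
    setServerError(null);
    const res = await fetch("/api/account/email", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ email }),
    });
    setSaving(false);
    if (!res.ok) {
      // Spec line: "Backend validation errors render under the input."
      const body = await res.json();
      setServerError(body.error ?? "Something went wrong");
      return;
    }
    // Spec line: "Success shows a confirmation banner."
    setSaved(true);
  }

  return (
    <div>
      {/* Spec line: the flag changes the success message, so note both variants. */}
      {saved && <p role="status">{newEmailFlow ? "Check your inbox to confirm." : "Email updated."}</p>}
      <input value={email} onChange={(e) => setEmail(e.target.value)} aria-label="New email" />
      {serverError && <p role="alert">{serverError}</p>}
      {/* Spec line: "Save stays disabled until the email is valid; a spinner shows during the request." */}
      <button onClick={handleSave} disabled={!isValid || saving}>
        {saving ? "Saving..." : "Save"}
      </button>
    </div>
  );
}
```

Read this way, the spec lines almost write themselves: when Save is enabled, what the spinner means, where errors show, and which flag changes the flow.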
Write each UI flow as short steps (what the user does, what the UI does back) and keep conditions and errors next to the step they affect. That keeps the spec readable and makes gaps easier to spot.
Raw notes from routes and components are useful, but hard to discuss. Rewrite what you observed into a spec a PM, designer, QA, and engineer can all read and agree on.
A practical pattern is one user story per route or screen. Keep it small and specific. For example: "As a signed-in user, I can reset my password so I can regain access." If the code shows different behavior by role (admin vs user), split it into separate stories instead of hiding it in footnotes.
Then write acceptance criteria that mirror real code paths, not the ideal product. If the handler returns 401 when the token is missing, that's a criterion. If the UI disables submit until a field is valid, that's a criterion.
Include data rules in plain language, especially the ones that cause surprises: limits, ordering, uniqueness, required fields. "Usernames must be unique (checked on save)" is clearer than "unique index."
Edge cases are often the difference between a nice doc and a useful one. Call out empty states, null values, retries, timeouts, and what users see when an API call fails.
When you hit unknowns, mark them instead of guessing:
- "Unknown: couldn't find where this is enforced."
- "Needs decision: the code and the UI disagree."
- "Assumed: inferred from the UI, not verified in the handler."
Those markers turn into quick team questions instead of silent assumptions.
A gaps list is not a second Jira. It's a short, evidence-based record of where code and intended behavior don't match, or where nobody can clearly explain what "correct" is. Done well, it becomes a tool for agreement, not a planning fight.
Be strict about what counts as a gap:
- The code and the UI disagree (limits, messages, permissions).
- Nobody can say whether a behavior is intended or accidental.
- A rule is enforced in only one place (UI only or API only).
- The spec can't state the behavior plainly without a decision.
When you log a gap, include three parts so it stays grounded:
- Evidence: where you saw it (route, file, screen) and what it does
- Impact: who it affects and why it matters
- Type: bug, missing decision, or missing feature
Evidence is what keeps the list from becoming opinions. For example: "POST /checkout/apply-coupon accepts expired coupons, but CouponBanner.tsx blocks them in the UI. Impact: revenue and user confusion. Type: bug or missing decision (confirm intended rule)."
Keep it short. Set a hard cap, like 10 items for the first pass. If you find 40 issues, group them into patterns (validation inconsistencies, permission checks, empty states) and keep only the top examples.
Avoid dates and scheduling inside the gaps list. If you need ownership, keep it lightweight: note who should make the decision (product) or who can verify the behavior (engineering), then move the real planning to your backlog.
Pick a small, high-traffic scope: checkout with promo codes and shipping options. The goal isn't to rewrite the whole product, just to capture what the app does today.
Start with backend routes. This is often where rules show up first. You might find routes like POST /checkout/apply-promo, GET /checkout/shipping-options, and POST /checkout/confirm.
From those handlers, write behavior in plain words:
- "Applying a valid promo code recalculates the total and returns the new breakdown."
- "Unknown or ineligible codes return an error."
- "Shipping options depend on the destination and the items in the cart."
- "Confirm re-validates the promo and shipping selection before charging."
Then check UI components. A PromoCodeInput might only refresh totals after a successful response and render errors inline under the input. A ShippingOptions component might auto-select the cheapest option on first load and trigger a full price breakdown refresh when the user changes it.
Now you have a readable spec and a small gaps list. For example: error messages differ between the promo route and the UI ("Invalid code" vs "Not eligible"), and nobody can point to a clear tax rounding rule (per line vs order total).
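To show how a gap like that sits right in the code, here is a hypothetical pair of snippets; the names and wording are invented to mirror the example, not pulled from a real checkout:

```typescript
// Hypothetical snippets (names invented) showing the kind of mismatch the
// gaps list records; neither side is "the" checkout code from the example.
type Coupon = { code: string; expiresAt: number };

// Server side: the handler only checks that the code exists, so expired
// coupons are accepted. Observed behavior: "Unknown codes return 'Invalid code'."
function applyPromo(code: string, coupons: Map<string, Coupon>) {
  const coupon = coupons.get(code);
  if (!coupon) return { status: 422, error: "Invalid code" };
  return { status: 200, discountApplied: true };
}

// UI side: the component blocks expired coupons before any request is made,
// with different wording. Observed behavior: "Expired codes show 'Not eligible'."
function canApplyInUi(coupon: Coupon, now = Date.now()) {
  if (coupon.expiresAt < now) return { ok: false, message: "Not eligible" };
  return { ok: true };
}

// Gap entry: "apply-promo accepts expired coupons, but the UI blocks them, and
// the error wording differs ('Invalid code' vs 'Not eligible').
// Impact: revenue and user confusion. Type: bug or missing decision."
```

Neither side is wrong on its own; the gap is that nobody has decided which rule is real.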
In planning, the team agrees on reality first, then decides what to change. Instead of debating opinions, you review documented behaviors, pick one inconsistency to fix, and leave the rest as "known current behavior" until it's worth revisiting.
A spec only helps if the team agrees it matches reality. Do a short read-through with one engineer and one product person. Keep it tight: 20-30 minutes focused on what users can do and what the system does in response.
During the read-through, turn statements into yes/no questions. "When a user hits this route, do we always return 403 without a session?" "Is this empty state intentional?" This separates intended behavior from accidental behavior that slipped in over time.
Agree on vocabulary before you edit anything. Use the words users see in the UI (button labels, page titles, error messages). Add internal names only when they help engineers find the code (route names, component names). This prevents mismatches like product saying "Workspace" while the spec says "Org."
To keep it current, make ownership and cadence explicit:
- Whoever ships a behavior change updates the spec in the same delivery window.
- One person owns the gaps list and reviews it briefly each week.
- The spec lives where the team already works, close to the code.
If you're using a tool like Koder.ai, snapshots and rollback can help you compare "before" and "after" behavior when you update a spec, especially after a big refactor.
The fastest way to lose trust in a spec is to describe the product you want, not the product you have. Keep a hard rule: every statement should be backed by something you can point to in code or a real screen.
Another common trap is copying the code's shape into the document. A spec that reads like "Controller -> Service -> Repository" isn't a spec, it's a folder map. Write in user-facing terms: what triggers the action, what the user sees, what gets saved, and what errors look like.
Permissions and roles are often ignored until the end, then everything breaks. Add access rules early, even if they're messy. Call out which roles can view, create, edit, delete, export, or approve, and where the rule is enforced (UI only, API only, or both).
Don't skip non-happy paths. Real behavior hides in retries, partial failures, and time-based rules like expirations, cooldowns, scheduled jobs, or "only once per day" limits. Treat these as first-class behaviors.
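As a small, invented sketch of where those time-based rules tend to hide, with the spec line each check implies:

```typescript
// Invented example: an invite-resend rule with an expiration and a cooldown.
const DAY_MS = 24 * 60 * 60 * 1000;

type Invite = { lastSentAt: number; expiresAt: number };

function canResendInvite(invite: Invite, now = Date.now()) {
  // Spec line: "Expired invites can't be resent."
  if (now > invite.expiresAt) return { ok: false, reason: "expired" };
  // Spec line: "An invite can only be resent once per day."
  if (now - invite.lastSentAt < DAY_MS) return { ok: false, reason: "cooldown" };
  return { ok: true };
}
```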
A quick way to surface gaps is to check for:
- Limits that differ between the UI and the API (file sizes, quantities, rate limits)
- Permission checks enforced in only one place
- Error messages that don't match between screens and handlers
- Feature flags with variants nobody has documented
- Empty states, retries, and timeouts with no defined behavior
Finally, keep your gaps list moving. Each gap should be labeled as one of: "unknown, needs a decision," "bug, needs a fix," or "missing feature, needs a plan." If nothing gets labeled, the list stalls and the spec stops being "living."
Do a fast pass for clarity, coverage, and actionability. Someone who didn't write it should understand what the feature does today, and what's still unclear.
Read the spec like a new teammate on day one. If they can summarize the feature in a minute, you're close. If they keep asking "where does this start?" or "what's the happy path?" tighten the opening.
Check:
- Every statement is backed by code or a real screen.
- The happy path, common errors, and permissions are all covered.
- Each gap is specific, testable, and has a single next action.
- Someone outside the feature could follow the main flow from entry point to outcome.
Each gap should be specific and testable. Instead of "Error handling unclear," write: "If payment provider returns 402, UI shows a generic toast; confirm desired message and retry behavior." Add a single next action (ask product, add a test, inspect logs) and note who should answer it.
Pick one feature area and timebox it to 60 minutes. Choose something small but real (login, checkout, search, an admin screen). Write one sentence of scope: what's included and what's out.
Run the workflow once end-to-end: skim key routes/handlers, trace the main UI flow, and write down observable behaviors (inputs, outputs, validation, error states). If you get stuck, log the question as a gap and move on.
When you're done, share the spec where the team can comment, and set one rule: any shipped behavior change must update the spec in the same delivery window, even if it's five lines.
Keep gaps separate from the backlog. Group them into "unknown behavior," "inconsistent behavior," and "missing tests," then review them briefly each week to decide what matters now.
If drafting and iteration feel slow, a chat-based builder like Koder.ai can help you get a first version down quickly. Describe the feature, paste key snippets or route names, refine the wording in conversation, and export source when you need it. The point is speed and shared clarity, not a bigger process.
Start with one small, user-visible slice (for example, “reset password” or “invite a teammate”). Read the routes/handlers to capture rules and outcomes, then read the UI flow to capture what users actually see (disabled states, errors, redirects). Write it up using a consistent template and log unknowns as a separate gaps list.
Default: treat current code behavior as the source of truth and document it.
If the behavior looks accidental or inconsistent, don’t “fix it” in the spec—mark it as a gap with evidence (where you saw it and what it does), then get a decision to update either the code or the spec.
Keep it boring and repeatable. A practical template is:
- Scope: one sentence on what's in and what's out
- User stories: one per route or screen
- Acceptance criteria: mirrored from real code paths
- Data rules: limits, uniqueness, required fields, ordering
- Edge cases and errors: empty states, timeouts, retries
- Open questions: the gaps list, kept separate
This keeps specs readable and makes mismatches easier to spot.
Write rules as user-facing requirements, not as code notes.
Examples:
- "Email must be valid."
- "Quantity must be at least 1."
- "Start date can't be in the past."
- "Usernames must be unique (checked on save)."
Capture what triggers an error and what the user experiences when it happens.
Focus on what’s observable:
- Records created or updated
- Emails or notifications sent
- Events published and background jobs queued
- Anything that triggers other flows
Side effects matter because they affect other features and support/ops expectations.
If the UI blocks something the API allows (or vice versa), log it as a gap until a decision is made.
Record:
- What the API actually allows, and where (route or handler)
- What the UI blocks or allows, and where (component)
- The impact of the mismatch on users or data
Then agree on one rule and update both code and spec to match.
Keep the gaps list small and evidence-based. Each item should have:
- Evidence: where you saw it and what it does
- Impact: who it affects and why it matters
- Type: bug, missing decision, or missing feature
Avoid scheduling or turning it into a second backlog.
Document them explicitly instead of hiding them.
Include:
- Expirations and cooldowns
- Scheduled jobs and "only once per day" limits
- Retries, timeouts, and partial failures
These are usually where surprises and bugs come from.
Keep it short: a 20–30 minute read-through with one engineer and one product person.
Turn statements into yes/no checks (for example, “Do we always return 403 when not allowed?”). Align on vocabulary using the UI’s words (labels and messages) so everyone means the same thing.
Put the spec close to the code and make updates part of shipping.
Practical defaults:
- Keep the spec next to the code, or wherever the team actually reads.
- Update it in the same delivery window as any shipped behavior change.
- Review the gaps list briefly each week and label what matters now.
The goal is small, frequent edits—not a big rewrite.