Claude Code test generation prompt for boundary-case tests

Why happy-path test generation wastes time

Auto-generated test suites often look impressive: dozens of tests, lots of setup code, and every function name shows up somewhere. But many of those tests are just “it works when everything is normal” checks. They pass easily, they rarely catch bugs, and they still cost time to read and maintain.

With a typical Claude Code test generation prompt, the model tends to mirror the example inputs it sees. You get variations that look different but cover the same behavior. The result is a big suite with thin coverage where it matters.

High-signal tests are different. They’re the small set that would have caught last month’s incident. They fail when behavior changes in a risky way, and they stay stable when harmless refactors happen. One high-signal test can be worth twenty “returns the expected value” checks.

Low-value happy-path generation usually has a few clear symptoms:

Many tests differ only in input labels, not in what can break.
Assertions are shallow (“not null”, “status is 200”) instead of checking meaning.
Setup is heavier than the behavior being tested, so people stop updating tests.
Coverage looks high, but edge cases are untouched.

Imagine a function that applies a discount code. Happy-path tests confirm that “SAVE10” reduces the price. Real bugs hide elsewhere: 0 or negative prices, expired codes, rounding edges, or maximum discount caps. Those are the cases that cause bad totals, angry customers, and midnight rollbacks.

The goal is to move from “more tests” to “better tests” by aiming at three targets: boundaries, failure modes, and invariants.

The three targets: boundaries, failure modes, invariants

If you want high-signal unit tests, stop asking for “more tests” and start asking for three specific kinds. This is the core of a Claude Code test generation prompt that produces useful coverage instead of a pile of “works on normal input” checks.

1) Boundaries (where bugs hide)

Boundaries are the edges of what the code accepts or produces. Many real defects are off-by-one, empty-state, or timeout problems that never show up in a happy path.

Think in terms of minimums and maximums (0, 1, max length), empty vs present ("", [], nil), off-by-one (n-1, n, n+1), and time limits (near the cutoff).

Example: if an API accepts “up to 100 items”, test 100 and 101, not just 3.
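A minimal sketch of that boundary pair, assuming a hypothetical `acceptBatch` function and a `MAX_ITEMS` limit of 100 (both names are illustrative, not taken from any real API). The assertions are plain TypeScript so the snippet runs on its own; in a Jest suite they would be `expect` calls.

```typescript
// Hypothetical limit and function, used only to illustrate the boundary.
const MAX_ITEMS = 100;

type BatchResult =
  | { ok: true; count: number }
  | { ok: false; error: "TOO_MANY_ITEMS" };

function acceptBatch(items: string[]): BatchResult {
  if (items.length > MAX_ITEMS) {
    return { ok: false, error: "TOO_MANY_ITEMS" };
  }
  return { ok: true, count: items.length };
}

// Tiny stand-in for a test framework's assertion.
function assertEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

const makeItems = (n: number): string[] =>
  Array.from({ length: n }, (_, i) => `item-${i}`);

// Exactly at the limit: must be accepted.
const atLimit = acceptBatch(makeItems(MAX_ITEMS));
assertEqual(atLimit.ok, true, "100 items accepted");

// One past the limit: must be rejected with a specific error, not just "falsy".
const overLimit = acceptBatch(makeItems(MAX_ITEMS + 1));
assertEqual(overLimit.ok ? "none" : overLimit.error, "TOO_MANY_ITEMS", "101 items rejected");
```

A test with 3 items would pass against both `>` and `>=` in the limit check; only the 100/101 pair tells them apart.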

2) Failure modes (prove it fails safely)

Failure modes are the ways the system can break: bad inputs, missing dependencies, partial results, or upstream errors. Good failure mode tests check behavior under stress, not just output under ideal conditions.

Example: when a database call fails, does the function return a clear error and avoid writing partial data?
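One way to make that question testable is sketched below. `FakeDb` and `saveOrder` are hypothetical names, and the snapshot-based rollback stands in for whatever transaction mechanism a real database provides.

```typescript
// Hypothetical in-memory "database" that can be told to fail on the Nth write.
class FakeDb {
  rows: string[] = [];
  private writes = 0;
  private failOnWrite: number | null;

  constructor(failOnWrite: number | null = null) {
    this.failOnWrite = failOnWrite;
  }

  insert(row: string): void {
    this.writes += 1;
    if (this.failOnWrite !== null && this.writes === this.failOnWrite) {
      throw new Error("connection lost");
    }
    this.rows.push(row);
  }
}

type SaveResult = { ok: true } | { ok: false; error: "STORAGE_UNAVAILABLE" };

// Writes a header row plus one row per line item; restores the snapshot on failure.
function saveOrder(db: FakeDb, orderId: string, lines: string[]): SaveResult {
  const snapshot = [...db.rows];
  try {
    db.insert(`order:${orderId}`);
    for (const line of lines) {
      db.insert(`line:${orderId}:${line}`);
    }
    return { ok: true };
  } catch {
    db.rows = snapshot; // undo any partial write
    return { ok: false, error: "STORAGE_UNAVAILABLE" };
  }
}

// Fail on the second write, i.e. after the header row has already been inserted.
const db = new FakeDb(2);
const saveResult = saveOrder(db, "o1", ["apple", "pear"]);

if (saveResult.ok || saveResult.error !== "STORAGE_UNAVAILABLE") {
  throw new Error("expected a clear, typed error");
}
if (db.rows.length !== 0) {
  throw new Error("expected no partial data after the failure");
}
```

The second check is the one that matters: a test that only inspects the returned error would still pass if the header row had been left behind.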

3) Invariants (rules that must always hold)

Invariants are truths that should remain true before and after a call. They turn vague correctness into crisp assertions.

Examples:

“Balance never goes negative” after any withdrawal attempt.
“IDs are unique” even if you create items quickly.
“On error, no state changes” (no new rows, no flags flipped).

When you focus on these three targets, you get fewer tests, but each one carries more signal.

Prep: extract a small contract before writing tests

If you ask for tests too early, you usually get a pile of polite “works as expected” checks. A simple fix is to write a tiny contract first, then generate tests from that contract. It’s the fastest way to turn a Claude Code test generation prompt into something that finds real bugs.

A useful contract is short enough to read in one breath. Aim for 5 to 10 lines that answer three questions: what goes in, what comes out, and what else changes.

A 5-10 line contract template

Write the contract in plain language, not code, and include only what you can test.

Inputs: types, allowed ranges, and what counts as “empty” or “missing”.
Output: return value or error shape, and what “success” guarantees.
Side effects: changes to state, database rows, network calls, files, logs.
Assumptions: things callers often get wrong (timezone, encoding, auth, ordering).
“Must never happen”: crash, silent data loss, double charge, info leak, partial writes.

Once you have that, scan it for where reality can break your assumptions. Those become boundary cases (min/max, zero, overflow, empty strings, duplicates) and failure modes (timeouts, permission denied, unique constraint violations, corrupted input).

Here’s a concrete example for a feature like reserveInventory(itemId, qty):

The contract might say qty must be a positive integer, the function should be atomic, and it should never create negative stock. That instantly suggests high-signal tests: qty = 0, qty = 1, qty greater than available, concurrent calls, and a forced database error halfway through.
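As a rough illustration, the first three of those tests could look like the sketch below. The in-memory `stock` map, the error codes, and the result shape are assumptions made for the example; the concurrency and forced-database-error cases need a real storage layer and are left out.

```typescript
// Hypothetical in-memory stock table; a real implementation would use a transaction.
const stock = new Map<string, number>([["sku-1", 3]]);

type ReserveResult =
  | { ok: true; remaining: number }
  | { ok: false; error: "INVALID_QTY" | "UNKNOWN_ITEM" | "INSUFFICIENT_STOCK" };

function reserveInventory(itemId: string, qty: number): ReserveResult {
  if (!Number.isInteger(qty) || qty < 1) return { ok: false, error: "INVALID_QTY" };
  const available = stock.get(itemId);
  if (available === undefined) return { ok: false, error: "UNKNOWN_ITEM" };
  if (qty > available) return { ok: false, error: "INSUFFICIENT_STOCK" };
  stock.set(itemId, available - qty);
  return { ok: true, remaining: available - qty };
}

function assertEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

const errorOf = (r: ReserveResult): string | null => (r.ok ? null : r.error);

// qty = 0: rejected, and stock is untouched.
assertEqual(errorOf(reserveInventory("sku-1", 0)), "INVALID_QTY", "qty 0 rejected");
assertEqual(stock.get("sku-1"), 3, "stock unchanged after invalid qty");

// qty = 1: smallest valid reservation.
const one = reserveInventory("sku-1", 1);
assertEqual(one.ok ? one.remaining : -1, 2, "qty 1 leaves 2 in stock");

// qty greater than available: rejected, and stock never goes negative.
assertEqual(errorOf(reserveInventory("sku-1", 3)), "INSUFFICIENT_STOCK", "over-reserve rejected");
assertEqual(stock.get("sku-1"), 2, "stock unchanged after over-reserve");
```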

If you’re using a vibe-coding tool like Koder.ai, the same workflow applies: write the contract in chat first, then generate tests that directly attack boundaries, failure modes, and the “must never happen” list.

Prompt pattern: the high-signal test blueprint

Use this Claude Code test generation prompt when you want fewer tests, but each one pulls its weight. The key move is to force a test plan first, then generate test code only after you approve the plan.

You are helping me write HIGH-SIGNAL unit tests.

Context
- Language/framework: <fill in>
- Function/module under test: <name + short description>
- Inputs: <types, ranges, constraints>
- Outputs: <types + meaning>
- Side effects/external calls: <db, network, clock, randomness>

Contract (keep it small)
1) Preconditions: <what must be true>
2) Postconditions: <what must be true after>
3) Error behavior: <how failures are surfaced>

Task
PHASE 1 (plan only, no code):
A) Propose 6-10 tests max. Do not include “happy path” unless it protects an invariant.
B) For each test, state: intent, setup, input, expected result, and WHY it is high-signal.
C) Invariants: list 3-5 invariants and how each will be asserted.
D) Boundary matrix: propose a small matrix of boundary values (min/max/empty/null/off-by-one/too-long/invalid enum).
E) Failure modes: list negative tests that prove safe behavior (no crash, no partial write, clear error).
Stop after PHASE 1 and ask for approval.

PHASE 2 (after approval):
Generate the actual test code with clear names and minimal mocks.

A practical trick is to require the boundary matrix as a compact table, so gaps are obvious:

Dimension	Valid edge	Just outside	“Weird” value	Expected behavior
length	0	-1	10,000	error vs clamp vs accept
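Once the matrix is agreed, it can be carried into the test file almost row for row. The sketch below assumes a hypothetical `normalizeLength` function with a made-up policy (negative values are errors, values above an assumed `MAX_LENGTH` of 255 are clamped); your own "error vs clamp vs accept" decisions would replace it.

```typescript
const MAX_LENGTH = 255; // assumed limit, for illustration only

type Outcome =
  | { kind: "accept"; value: number }
  | { kind: "clamp"; value: number }
  | { kind: "error" };

function normalizeLength(n: number): Outcome {
  if (!Number.isInteger(n) || n < 0) return { kind: "error" };
  if (n > MAX_LENGTH) return { kind: "clamp", value: MAX_LENGTH };
  return { kind: "accept", value: n };
}

// Each row is one cell of the boundary matrix.
const matrix: Array<{ label: string; input: number; expected: Outcome["kind"] }> = [
  { label: "valid edge (0)", input: 0, expected: "accept" },
  { label: "just outside (-1)", input: -1, expected: "error" },
  { label: "at max", input: MAX_LENGTH, expected: "accept" },
  { label: "one past max", input: MAX_LENGTH + 1, expected: "clamp" },
  { label: "weird value (10,000)", input: 10000, expected: "clamp" },
];

for (const row of matrix) {
  const actual = normalizeLength(row.input).kind;
  if (actual !== row.expected) {
    throw new Error(`${row.label}: expected ${row.expected}, got ${actual}`);
  }
}
```

Keeping the rows as data makes a missing cell as visible in code review as it is in the table.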

If Claude proposes 20 tests, push back. Ask it to merge similar cases and keep only the ones that would catch a real bug (off-by-one, wrong error type, silent data loss, broken invariant).

Step-by-step: run the prompt and turn output into tests

Start with a small, concrete contract for the behavior you want. Paste the function signature, a short description of inputs and outputs, and any existing tests (even if they’re only happy-path). This keeps the model anchored in what the code actually does, not what it guesses.

Next, ask for a risk table before asking for any test code. Require three columns: boundary cases (edges of valid input), failure modes (bad input, missing data, timeouts), and invariants (things that must always be true). Add one sentence per row: “why this can break.” A simple table reveals gaps faster than a pile of test files.

Then choose the smallest set of tests where each one has a unique bug-catching purpose. If two tests fail for the same reason, keep the stronger one.

A practical selection rule:

Keep tests that hit different boundaries (min, max, empty, off-by-one).
Keep tests that prove safe behavior under failure (clear error, no partial write, no crash).
Keep tests that assert an invariant (ordering, totals, idempotency, no duplicates).
Cut tests that only repeat “works with normal input.”

Finally, require a short explanation per test: what bug it would catch if it fails. If the explanation is vague (“validates behavior”), the test is probably low-signal.

How to encode invariants into assertions

An invariant is a rule that should stay true no matter which valid input you pass in. With invariant-based testing, you first write the rule in plain language, then turn it into an assertion that can fail loudly.

Pick 1 or 2 invariants that actually protect you from real bugs. Good invariants are often about safety (no data loss), consistency (same inputs, same outputs), or limits (never exceed caps).

Turn an invariant into a check you can prove

Write the invariant as a short sentence, then decide what evidence your test can observe: return values, stored data, emitted events, or calls to dependencies. Strong assertions check both outcome and side effects, because many bugs hide in “it returned OK, but wrote the wrong thing.”

For example, say you have a function that applies a coupon to an order:

Invariant: the final total is never negative.
Invariant: applying the same coupon twice does not discount twice.

Now encode those as assertions that measure something concrete:

expect(result.total).toBeGreaterThanOrEqual(0)
expect(db.getOrder(orderId).discountCents).toBe(originalDiscountCents)

Avoid vague asserts like “returns expected result”. Assert the specific rule (non-negative), and the specific side effect (discount stored once).
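For context, here is a self-contained sketch of how those two assertions might sit in a test. The in-memory `orders` map stands in for `db`, and `applyCouponToOrder` with its capping logic is a hypothetical implementation written only so the checks have something to run against.

```typescript
type Order = { subtotalCents: number; discountCents: number; appliedCoupons: string[] };

// Hypothetical in-memory store standing in for the database.
const orders = new Map<string, Order>();
orders.set("order-1", { subtotalCents: 500, discountCents: 0, appliedCoupons: [] });

function applyCouponToOrder(orderId: string, code: string, amountCents: number): { total: number } {
  const order = orders.get(orderId);
  if (!order) throw new Error(`unknown order ${orderId}`);
  if (!order.appliedCoupons.includes(code)) {
    order.appliedCoupons.push(code);
    // Cap the discount so the total can never drop below zero.
    order.discountCents = Math.min(order.subtotalCents, order.discountCents + amountCents);
  }
  return { total: order.subtotalCents - order.discountCents };
}

// Violating case this test targets: a coupon larger than the subtotal, applied twice.
const first = applyCouponToOrder("order-1", "BIG1000", 1000);
const originalDiscountCents = orders.get("order-1")!.discountCents;
const second = applyCouponToOrder("order-1", "BIG1000", 1000);

// Invariant 1: the final total is never negative (checked on the return value).
if (first.total < 0 || second.total < 0) {
  throw new Error("total went negative");
}
// Invariant 2: the same coupon does not discount twice (checked on stored state).
if (orders.get("order-1")!.discountCents !== originalDiscountCents) {
  throw new Error("coupon was applied twice");
}
```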

Add a counterexample note so the test stays sharp

For each invariant, add a short note in the test about what data would violate it. This keeps the test from drifting into a happy-path check later.

A simple pattern that holds up over time:

Put the invariant in the test name.
Assert the invariant on the output.
Assert the key side effect (or lack of side effect).
Add one comment describing a violating case (for example, a huge coupon value or duplicate application).

Failure modes: write tests that prove safe behavior

High-signal tests are often the ones that confirm your code fails safely. If a model only writes happy-path tests, you learn almost nothing about how the feature behaves when inputs and dependencies get messy.

Start by deciding what “safe” means for this feature. Does it return a typed error? Does it fall back to a default? Does it retry once and then stop? Write that expected behavior down in one sentence, then make the tests prove it.

When you ask Claude Code for failure mode tests, keep the goal strict: cover the ways the system can break, and assert the exact response you want. A useful line is: “Prefer fewer tests with stronger assertions over many shallow tests.”

Failure categories that tend to produce the best tests:

Bad inputs: invalid formats, missing required fields, out-of-range values
Dependency failures: timeouts, 500s, empty responses, corrupted payloads
Ordering issues: out-of-order events, duplicates, partial writes
Concurrency: racing updates, idempotency checks
Recovery behavior: when you return an error vs fall back vs retry

Example: you have an endpoint that creates a user and calls an email service to send a welcome message. A low-value test checks “returns 201.” A high-signal failure test checks that if the email service times out, you either (a) still create the user and return 201 with an “email_pending” flag, or (b) return a clear 503 and do not create the user. Pick one behavior, then assert both the response and the side effects.

Also test what you do not leak. If validation fails, ensure nothing is written to the database. If a dependency returns a corrupted payload, ensure you don’t throw an unhandled exception or return raw stack traces.
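The sketch below picks option (a) from the example above and asserts both the response and the stored state. Everything in it is hypothetical: `createUser`, the in-memory `users` array, the synchronous `EmailSender` type, and the error text are stand-ins for a real endpoint and email client.

```typescript
type User = { id: number; email: string; email_pending: boolean };
type EmailSender = (to: string) => void; // throws on failure

const users: User[] = []; // stand-in for the users table

function createUser(
  email: string,
  sendWelcome: EmailSender
): { status: number; user?: User; error?: string } {
  const trimmed = email.trim();
  if (!trimmed.includes("@")) {
    return { status: 400, error: "INVALID_EMAIL" }; // nothing is written
  }
  const user: User = { id: users.length + 1, email: trimmed, email_pending: false };
  try {
    sendWelcome(trimmed);
  } catch {
    user.email_pending = true; // chosen behavior (a): keep the user, flag the email
  }
  users.push(user);
  return { status: 201, user };
}

// A dependency that always times out, with internal details in its message.
const timingOut: EmailSender = () => {
  throw new Error("ETIMEDOUT at smtp.internal:25");
};

const created = createUser("ada@example.com", timingOut);

// Response: 201 with the pending flag set.
if (created.status !== 201 || created.user?.email_pending !== true) {
  throw new Error("expected 201 with email_pending");
}
// Side effect: exactly one user stored.
if (users.length !== 1) throw new Error("expected the user to be stored");
// No leak: the dependency's error text must not reach the response.
if (JSON.stringify(created).includes("ETIMEDOUT")) {
  throw new Error("internal error detail leaked");
}

// Validation failure: clear error and no write.
const rejected = createUser("not-an-email", timingOut);
if (rejected.status !== 400 || users.length !== 1) {
  throw new Error("expected 400 and no new row");
}
```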

Common traps that create low-value tests

Low-value test sets usually happen when the model is rewarded for volume. If your Claude Code test generation prompt asks for “20 unit tests,” you often get tiny variations that look thorough but catch nothing new.

Common traps:

Look-alike tests: the same “valid input” test repeated with different strings or numbers.
Tests that mirror the code: asserting private steps or helper calls instead of observable behavior.
Mocking everything: replacing database, clock, network, and config all at once.
Weak assertions: only checking “no error,” “not null,” or “status is 200.”
Dirty shared state: leaving behind seeded data, modified globals, or cached values.

Example: imagine a “create user” function. Ten happy-path tests might vary the email string and still miss the important stuff: rejecting duplicate emails, handling an empty password, and guaranteeing returned user IDs are unique and stable.

Guardrails that help in review:

Require each test to name the risk it covers (boundary, failure mode, or invariant).
Avoid implementation-only checks unless they change observable behavior.
Keep mocks minimal, and allow a small number of tests that hit the real integration point when that’s feasible.
Demand strong assertions: exact outputs, state changes, and error types/messages.
Add cleanup rules so tests don’t depend on order.

Example: turning one feature into a small, strong test set

Imagine one feature: applying a coupon code at checkout.

Contract (small and testable): given a cart subtotal in cents and an optional coupon, return a final total in cents. Rules: percentage coupons round down to the nearest cent, fixed coupons subtract a fixed amount, and totals never go below 0. A coupon can be invalid, expired, or already used.

Don’t ask for “tests for applyCoupon()”. Ask for boundary case testing, failure mode tests, and invariants tied to this contract.

Boundaries to force edge behavior

Pick inputs that tend to break math or validation: an empty coupon string, subtotal = 0, subtotal just below and above a minimum spend, a fixed discount larger than the subtotal, and a percent like 33% that creates rounding.

Failure modes to prove safe behavior

Assume coupon lookup can fail and state can be wrong: the coupon service is down, the coupon is expired, or the coupon is already redeemed by this user. The test should prove what happens next (coupon rejected with a clear error, total unchanged).

A minimal, high-signal test set (5 tests) and what each catches:

Reject empty or whitespace code: catches “accepts blank as valid” bugs and bad trimming.
Percent coupon rounding (subtotal 101 cents, 33%): catches rounding mistakes and off-by-one cents.
Fixed discount greater than subtotal (subtotal 500 cents, discount 1000 cents): proves the invariant that total never becomes negative.
Minimum spend boundary (subtotal 999 vs 1000 cents, with a 1000-cent minimum): catches wrong comparison logic (< vs <=).
Coupon lookup failure or timeout: proves safe fallback (no discount applied) and stable error handling.

If these pass, you’ve covered the common breakpoints without filling the suite with duplicate happy-path tests.
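To make the set concrete, here is one possible shape for it. This is a sketch under stated assumptions, not a reference implementation: the `Coupon` type, the error codes, the injected `Lookup` function, and the coupon codes are all invented for the example, and "round down" is interpreted as rounding the discount (not the total) down to a whole cent.

```typescript
type Coupon =
  | { kind: "percent"; percent: number; minSpendCents?: number }
  | { kind: "fixed"; amountCents: number; minSpendCents?: number };

type Lookup = (code: string) => Coupon | undefined; // may throw if the service is down

type ApplyResult = {
  totalCents: number;
  error?: "INVALID_CODE" | "MIN_SPEND_NOT_MET" | "LOOKUP_FAILED";
};

function applyCoupon(subtotalCents: number, code: string, lookup: Lookup): ApplyResult {
  const trimmed = code.trim();
  if (trimmed === "") return { totalCents: subtotalCents, error: "INVALID_CODE" };

  let coupon: Coupon | undefined;
  try {
    coupon = lookup(trimmed);
  } catch {
    return { totalCents: subtotalCents, error: "LOOKUP_FAILED" }; // safe fallback
  }
  if (!coupon) return { totalCents: subtotalCents, error: "INVALID_CODE" };
  if (subtotalCents < (coupon.minSpendCents ?? 0)) {
    return { totalCents: subtotalCents, error: "MIN_SPEND_NOT_MET" };
  }
  const discount =
    coupon.kind === "percent"
      ? Math.floor((subtotalCents * coupon.percent) / 100) // assumed: discount rounds down
      : coupon.amountCents;
  return { totalCents: Math.max(0, subtotalCents - discount) };
}

const catalog = new Map<string, Coupon>([
  ["PCT33", { kind: "percent", percent: 33 }],
  ["FIX1000", { kind: "fixed", amountCents: 1000 }],
  ["MIN1000", { kind: "fixed", amountCents: 100, minSpendCents: 1000 }],
]);
const lookup: Lookup = (code) => catalog.get(code);
const brokenLookup: Lookup = () => {
  throw new Error("coupon service down");
};

// The five tests as data; the minimum-spend boundary contributes two rows.
const cases: Array<{ label: string; got: ApplyResult; total: number; error?: ApplyResult["error"] }> = [
  { label: "blank code rejected", got: applyCoupon(500, "   ", lookup), total: 500, error: "INVALID_CODE" },
  { label: "33% of 101 rounds down", got: applyCoupon(101, "PCT33", lookup), total: 68 },
  { label: "fixed discount above subtotal", got: applyCoupon(500, "FIX1000", lookup), total: 0 },
  { label: "just below minimum spend", got: applyCoupon(999, "MIN1000", lookup), total: 999, error: "MIN_SPEND_NOT_MET" },
  { label: "exactly at minimum spend", got: applyCoupon(1000, "MIN1000", lookup), total: 900 },
  { label: "lookup failure", got: applyCoupon(500, "PCT33", brokenLookup), total: 500, error: "LOOKUP_FAILED" },
];

for (const c of cases) {
  if (c.got.totalCents !== c.total || c.got.error !== c.error) {
    throw new Error(`failed: ${c.label}`);
  }
}
```

Every row checks both the total and the error code, so a change that returns the right total with the wrong error (or the reverse) still fails.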

Quick checklist for high-signal AI-generated tests

Before you accept what the model generates, do a fast quality pass. The goal is tests that each protect you from a specific, likely bug.

Use this checklist as a gate:

Boundaries per input: for each input field (strings, IDs, timestamps, flags), include at least one edge case (empty vs whitespace-only, max length, zero vs negative, missing optional fields, one past the limit).
Dependency failures: include at least one test where a dependency misbehaves (database timeout, third-party API 500, expired auth token). Prove safe behavior (clear error, no partial writes).
Invariants with strong assertions: pick 1-3 rules that must always hold and assert them directly. Avoid vague asserts like “response is ok”.
One unique bug per test: read each test title and ask, “What exact bug would this catch?” If two tests answer the same question, merge them.
Removal test: try deleting a test. If nothing meaningful is lost (no boundary, no failure mode, no invariant), it didn’t earn its place.

A quick practical trick after generation: rename tests to “should <behavior> when <edge condition>” and “should not <bad outcome> when <failure>”. If you can’t rename them cleanly, they’re not focused.

If you’re building with Koder.ai, this checklist also fits nicely with snapshots and rollback: generate tests, run them, and roll back if the new set adds noise without improving coverage.

Next steps: make this a repeatable workflow

Treat your prompt as a reusable harness, not a one-off request. Save one blueprint prompt (the one that forces boundaries, failure modes, and invariants) and reuse it for every new function, endpoint, or UI flow.

A simple habit that upgrades results fast: ask for one sentence per test explaining what bug it would catch. If that sentence is generic, the test is probably noise.

Keep a living list of domain invariants for your product. Don’t store it in your head. Add to it whenever you find a real bug.

A lightweight workflow you can repeat:

Extract a tiny contract: inputs, outputs, error handling, and 3 to 5 invariants.
Run the blueprint prompt and request boundaries, failure modes, invariants, plus one-line justifications.
Implement only the top 5 to 10 tests that cover distinct risks.
Refactor, then re-run the prompt to see what new risks appear.
Prune duplicates and keep the tests that would have caught past incidents.

If you build apps via chat, run this cycle inside Koder.ai (koder.ai) so the contract, the plan, and the generated tests live in one place. When a refactor changes behavior unexpectedly, snapshots and rollback make it easier to compare and iterate until your high-signal set stays stable.

FAQ

How many unit tests should I generate per function?

Default: aim for a small set that would catch a real bug.

A quick cap that works well is 6–10 tests per unit (function/module). If you need more, it usually means your unit is doing too much or your contract is unclear.

What’s wrong with generating lots of happy-path tests?

Happy-path tests mostly prove that your example still works. They tend to miss the stuff that breaks in production.

High-signal tests target:

Boundaries (0/1/max, empty/null, off-by-one)
Failure modes (timeouts, invalid inputs, dependency errors)
Invariants (rules that must always hold, like “no partial write on error”)

What should I write down before asking an AI to generate tests?

Start with a tiny contract you can read in one breath:

Inputs: types, allowed ranges, what counts as empty/missing
Outputs: success shape and error shape
Side effects: what can be written/changed (DB, files, network)
“Must never happen”: crash, silent data loss, double charge, partial writes

Then generate tests from that contract, not from examples alone.

Which boundary cases are usually worth testing?

Test these first:

Min/max values (0, 1, max, max+1)
Empty vs present ("", [], null/nil)
Off-by-one (n-1, n, n+1)
Formatting edges (whitespace-only strings, leading zeros)
Time cutoffs (just before/after expiry)

Pick one or two per input dimension so each test covers a unique risk.

How do I write a good “failure mode” test instead of a shallow one?

A good failure-mode test proves two things:

The function returns a clear, expected error (type/message/status).
It fails safely:

no partial state changes
no leaked internal details
no retries or side effects you didn’t intend

If there’s a database write involved, always check what happened in storage after the failure.

How do I turn an invariant into a test assertion?

Default approach: turn the invariant into an assertion on observable outcomes.

Examples:

“Total never negative” → expect(total).toBeGreaterThanOrEqual(0)
“On error, no state changes” → assert no new rows / no flags flipped
“Idempotent” → call twice and assert the second call doesn’t change state

Prefer checking both return value and side effects, because many bugs hide in “returned OK but wrote the wrong thing.”

When is a happy-path test still worth writing?

It’s worth keeping a happy-path test when it protects an invariant or a critical integration.

Good reasons to keep one:

It asserts a key invariant on normal input (e.g., rounding rules)
It locks down an API contract that callers rely on
It guards against a past incident regression

Otherwise, trade it for boundary/failure tests that catch more classes of bugs.

What should I ask the model to output before generating test code?

Push for PHASE 1: plan only first.

Require the model to provide:

6–10 proposed tests max
For each: intent, setup, input, expected result, why it’s high-signal
A small boundary matrix
A failure-mode list
3–5 invariants and how to assert them

Only after you approve the plan should it generate code. This prevents “20 look-alike tests” output.

How do I avoid tests that are brittle because they mock too much?

Default: mock only the boundary you don’t own (DB/network/clock), and keep everything else real.

To avoid over-mocking:

Don’t mock internal helpers just to mirror implementation
Use a real in-memory version when feasible, or a small fake with clear behavior
Mock the clock/randomness only when it affects the assertion

If a test breaks on refactor but behavior didn’t change, it’s often over-mocked or too implementation-coupled.
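A small fake is often enough. The sketch below injects a clock as a plain function instead of mocking a time library; `isTokenValid`, the `Clock` type, and the "valid strictly before expiry" rule are assumptions chosen for illustration.

```typescript
type Clock = () => number; // milliseconds since the Unix epoch

function isTokenValid(expiresAtMs: number, now: Clock): boolean {
  // Assumed rule: valid strictly before the expiry instant. Your contract may differ.
  return now() < expiresAtMs;
}

// A fake with one obvious behavior: it always returns the time it was given.
const fixedClock = (ms: number): Clock => () => ms;

const expiresAt = Date.UTC(2030, 0, 1, 0, 0, 0);

// Just before, exactly at, and just after the cutoff.
if (!isTokenValid(expiresAt, fixedClock(expiresAt - 1))) {
  throw new Error("token should be valid 1 ms before expiry");
}
if (isTokenValid(expiresAt, fixedClock(expiresAt))) {
  throw new Error("token should be invalid at the expiry instant");
}
if (isTokenValid(expiresAt, fixedClock(expiresAt + 1))) {
  throw new Error("token should be invalid 1 ms after expiry");
}
```

Because the test depends only on the `Clock` signature, refactoring the internals of `isTokenValid` does not break it unless the cutoff behavior itself changes. In production code the caller would pass `Date.now`.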

How can I quickly tell if an AI-generated test is low-value?

Use a simple deletion test:

If you delete the test and lose no boundary, no failure mode, and no invariant, it didn’t earn its place.

Also scan for duplicates:

If two tests would fail for the same bug, keep the one with the stronger assertion.
If assertions are just “not null” or “status 200,” strengthen them or remove the test.