Joel Spolsky's software truths still help when AI can write code fast. Learn how to keep tests, hiring, and simplicity focused on correctness.

AI can produce working-looking code in minutes. That changes the pace of a project, but it doesn’t change what makes software succeed. The lessons in Joel Spolsky’s “software truths” were never really about typing speed. They were about judgment, feedback loops, and avoiding self-inflicted complexity.
What’s changed is the cost of creating code. You can ask for three approaches, five variations, or a full rewrite and get something back instantly. What hasn’t changed is the cost of choosing the right approach, checking it, and living with it for months. Time saved on writing often moves to deciding what you meant, validating edge cases, and making sure today’s quick win doesn’t become tomorrow’s maintenance tax.
Correctness, security, and maintainability still take real time because they rely on proof, not confidence. A login flow isn’t done when it compiles. It’s done when it reliably rejects bad inputs, handles weird states, and doesn’t leak data. AI can sound certain while missing one crucial detail, like a permissions check on an endpoint or a race condition in a payment update.
AI is strongest when you treat it like a fast draft machine. It shines at boilerplate, repetitive patterns, quick refactors, and exploring options you can compare side by side. Used well, it compresses the “blank page” phase.
AI hurts most when you hand it vague goals and accept the output at face value. The same failure patterns show up again and again: hidden assumptions (unstated business rules), untested paths (error handling, retries, empty states), confident mistakes (plausible code that’s subtly wrong), and “clever” solutions that are hard to explain later.
If code is cheap, the new scarce resource is trust. These truths matter because they protect that trust: with users, with teammates, and with your future self.
When AI can generate a feature in minutes, it’s tempting to treat testing as the slow part you need to eliminate. Spolsky’s point still holds: the slow part is where the truth is. Code is easy to produce. Correct behavior is not.
A useful shift is to treat tests as requirements you can run. If you can’t describe the expected behavior in a checkable way, you’re not done thinking. In AI-assisted work, this matters more, not less, because the model can confidently produce something that’s only slightly wrong.
Start testing with the things that would hurt most if they broke. For most products, that’s core flows (signup, checkout, save, export), permissions (who can view, edit, delete), and data integrity (no duplicates, correct totals, safe migrations). Then cover the edges that tend to cause late-night incidents: empty inputs, long text, time zones, retries, and flaky external boundaries like payments, emails, and file uploads.
AI is great at proposing test cases, but it can’t know what you actually promised users. Use it like a brainstorming partner: ask for missing edge cases, abuse scenarios, and permission combinations. Then do the human work: match coverage to your real rules and remove tests that only “test the implementation” instead of the behavior.
Make failures actionable. A failing test should tell you what broke, not send you on a scavenger hunt. Keep tests small, name them like sentences, and make error messages specific.
Say you build a simple “team notes” app with AI help. CRUD screens show up fast. The correctness risk isn’t the UI. It’s access control and data rules: a user must not see another team’s notes, edits must not overwrite newer changes, and deleting a note shouldn’t leave orphaned attachments. Tests that lock these rules down will feel like the bottleneck, but they’re also your safety net.
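Here's a minimal sketch of one of those rules as a runnable requirement, using Node's built-in test runner; the Note shape and the listNotes helper are stand-ins for your real data layer, and the test name reads like the rule it protects.
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

type Note = { id: string; teamId: string; body: string };

// Stand-in for the real data layer: always filter by the caller's team.
function listNotes(all: Note[], callerTeamId: string): Note[] {
  return all.filter((note) => note.teamId === callerTeamId);
}

const notes: Note[] = [
  { id: "1", teamId: "team-a", body: "roadmap" },
  { id: "2", teamId: "team-b", body: "salaries" },
];

test("a user never sees another team's notes", () => {
  const visible = listNotes(notes, "team-a");
  assert.deepEqual(
    visible.map((note) => note.id),
    ["1"],
    "team-a should only see its own note"
  );
});
```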
When testing is the bottleneck, it forces clarity. That clarity is what keeps fast code from turning into fast bugs.
One of the most durable truths is that simple code wins over clever code. AI makes it tempting to accept fancy abstractions because they arrive polished and fast. The cost shows up later: more places for bugs to hide, more files to scan, and more “what is this even doing?” moments.
When code is cheap, complexity is what you pay for. A small, boring design is easier to test, easier to change, and easier to explain. That matters even more when the first draft came from a model that can sound confident while being subtly wrong.
A practical rule is to keep functions, components, and modules small enough that a teammate can review them in minutes, not hours. If a React component needs multiple custom hooks, a local state machine, and a generic “smart renderer” layer, pause and ask whether you’re solving a real problem or just accepting architecture because AI offered it.
A few “simplicity tests” help you push back: can a teammate review the change in minutes, can you explain it in plain language, does each new abstraction remove duplication in at least three places, and would deleting a layer change any behavior you actually care about?
Prompts matter here. If you ask for “the best architecture,” you often get an overbuilt one. Ask for constraints that push toward fewer moving parts. For example: use the simplest approach with the fewest files; avoid new abstractions unless they remove duplication in three or more places; prefer explicit code over generic helpers.
A concrete example: you ask AI to add role-based access to an admin page. The clever version introduces a permission framework, decorators, and a config DSL. The simple version checks the user role in one place, gates routes in one place, and logs denied access. The simple version is easier to review, easier to test, and harder to misinterpret.
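As a rough sketch (the requireRole helper, the req.user shape, and the Express wiring are illustrative, not a prescribed API), the simple version can be this small:
```ts
import express, { Request, Response, NextFunction } from "express";

type Role = "admin" | "member";

// However your auth middleware attaches the user; this shape is an assumption.
interface AuthedRequest extends Request {
  user?: { id: string; role: Role };
}

// One place that checks roles, logs denials, and gates routes.
function requireRole(role: Role) {
  return (req: Request, res: Response, next: NextFunction) => {
    const user = (req as AuthedRequest).user;
    if (!user || user.role !== role) {
      console.warn("access denied", { path: req.path, userId: user?.id });
      return res.status(403).json({ error: "forbidden" });
    }
    next();
  };
}

const app = express();
app.use("/admin", requireRole("admin")); // the only gate for admin routes
app.get("/admin/users", (_req, res) => res.json({ users: [] }));
```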
If you’re building in a chat-based tool like Koder.ai, simplicity also makes snapshots and rollback more valuable. Small, obvious changes are easier to compare, keep, or revert.
When code is easy to produce, the scarce skill is choosing what should exist at all and making sure it’s correct. The old “hire great programmers” advice still applies, but the job shifts. You’re not hiring someone to type faster. You’re hiring someone to judge, refine, and defend the product.
The most valuable people in AI-assisted development tend to share four traits: judgment (what matters), taste (what good looks like), debugging skill (finding the real cause), and communication (making tradeoffs clear). They can take an AI-written feature that “mostly works” and turn it into something you can trust.
Instead of asking for a perfect solution from scratch, give candidates an AI-generated pull request (or a pasted diff) with a few realistic problems: unclear naming, a hidden edge case, missing tests, and a small security mistake.
Ask them to explain what the code is trying to do in plain language, find the highest-risk parts, propose fixes, and add (or outline) tests that would catch regressions. If you want a strong signal, also ask how they would change the instructions so the next AI attempt is better.
This reveals how they think under real conditions: imperfect code, limited time, and the need to choose priorities.
AI often sounds confident. Good hires are comfortable pushing back. They can say no to a feature that adds complexity, no to a change that weakens security, and no to shipping without proof.
A concrete signal is how they respond to “Would you merge this?” Strong candidates don’t answer with a vibe. They give a decision and a short list of required changes.
Example: you ask for a “quick” access-control update and the AI suggests sprinkling checks across handlers. A strong candidate rejects that approach and proposes one clear authorization layer, plus tests for admin and non-admin paths.
Finally, build shared standards so the team edits AI output in the same way. Keep it simple: one definition of done, consistent review expectations, and a testing baseline.
When AI can generate a lot of code in minutes, it’s tempting to skip the thinking and just iterate. That works for demos. It breaks down when you need correctness, predictable behavior, and fewer surprises.
A good prompt is usually a short spec in disguise. Before you ask for code, turn the vague goal into a few acceptance criteria and explicit non-goals. This prevents the AI (and your team) from quietly expanding scope.
Keep the spec small but specific. You’re not writing a novel. You’re setting boundaries around inputs, outputs, error handling, constraints, and explicit non-goals, ideally with a few concrete examples.
Define “done” before generation, not after. “Done” should be more than “it compiles” or “the UI looks right.” Include test expectations, backward compatibility, and what gets monitored after release.
Example: you want “add password reset.” A clearer spec might say: users request reset by email; links expire in 15 minutes; the same message appears whether the email exists or not; rate limit per IP; log reset attempts without storing tokens in plain text. Non-goal: no redesign of the login page. Now your prompt has guardrails and reviews get simpler.
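One of those rules is easy to turn into a check you can run. This sketch assumes a hypothetical requestPasswordReset function and uses in-memory stand-ins for the real database and mailer:
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

const RESET_TOKEN_TTL_MS = 15 * 60 * 1000; // links expire in 15 minutes

const knownEmails = new Set(["ada@example.com"]);
const pendingResets = new Map<string, number>(); // email -> expiry timestamp

function requestPasswordReset(email: string): { message: string } {
  if (knownEmails.has(email)) {
    // Real code would store a hashed token and send the email here.
    pendingResets.set(email, Date.now() + RESET_TOKEN_TTL_MS);
  }
  // Same response either way, so the endpoint never reveals which emails exist.
  return { message: "If that address exists, a reset link is on its way." };
}

test("the response does not reveal whether an email is registered", () => {
  const known = requestPasswordReset("ada@example.com");
  const unknown = requestPasswordReset("nobody@example.com");
  assert.equal(known.message, unknown.message);
});
```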
Keep a lightweight change log of decisions. One paragraph per decision is enough. Note why you chose an approach and why you rejected alternatives. When someone asks “why is it like this?” two weeks later, you’ll have an answer.
The biggest shift with AI is that producing code is easy. The hard part is deciding what the code should do and proving it does it.
Start by writing the goal and constraints in plain language. Include what must never happen, what can be slow, and what is out of scope. A good constraint is testable: “No user should see another user’s data,” or “Totals must match the finance export to the cent.”
Before you ask for code, ask for a simple design and the tradeoffs. You want the AI to show its reasoning in a form you can judge: what it will store, what it will validate, and what it will log. If it proposes something clever, push back and request the simplest version that still meets the constraints.
A repeatable loop looks like this: write the acceptance checks in plain language, ask for the simplest design that meets them, generate one small slice, verify it with tests and failure cases, capture a snapshot or commit, and repeat.
Here’s a small scenario: you add “refund status” to an order screen. The AI can generate the UI quickly, but correctness lives in edge cases. What if a refund is partial? What if the payment provider retries a webhook? Write those cases first, then implement one slice (database column plus validation) and verify it with tests before moving on.
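A sketch of that slice as plain logic you can test before touching the UI; the Order and RefundEvent shapes are assumptions, not any provider’s real schema:
```ts
type Order = {
  totalCents: number;
  refundedCents: number;
  processedEventIds: string[]; // guards against webhook retries
};

type RefundEvent = { id: string; amountCents: number };

type RefundStatus = "none" | "partial" | "refunded";

export function applyRefundEvent(order: Order, event: RefundEvent): Order {
  // Providers retry webhooks, so applying the same event twice must be a no-op.
  if (order.processedEventIds.includes(event.id)) return order;

  const refundedCents = order.refundedCents + event.amountCents;
  if (refundedCents > order.totalCents) {
    throw new Error("refund exceeds order total");
  }
  return {
    ...order,
    refundedCents,
    processedEventIds: [...order.processedEventIds, event.id],
  };
}

export function refundStatus(order: Order): RefundStatus {
  if (order.refundedCents === 0) return "none";
  return order.refundedCents < order.totalCents ? "partial" : "refunded";
}
```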
If you use Koder.ai, features like planning mode, snapshots, and rollback fit naturally into this loop: plan first, generate in slices, and capture a safe restore point for every meaningful change.
When code generation is fast, it’s tempting to treat code as the work product. It isn’t. The work product is behavior: the app does the right thing, even when things go wrong.
AI often sounds sure, even when it’s guessing. The failure mode is skipping the boring part: running tests, checking edge cases, and validating real inputs.
A simple habit helps: before you accept a change, ask “How do we know this is correct?” If the answer is “it looks right,” you’re gambling.
AI loves to add extras: caching, retries, more settings, more endpoints, a nicer UI. Some of those ideas are good, but they raise risk. Many bugs come from “nice to have” features that nobody asked for.
Keep a hard boundary: solve the problem you set out to solve, then stop. If a suggestion is valuable, capture it as a separate task with its own tests.
A large AI-generated commit can hide a dozen unrelated decisions. Review becomes rubber-stamping because nobody can hold it all in their head.
Treat chat output as a draft. Break it into small changes you can read, run, and revert. Snapshots and rollback are only helpful if you take them at sensible points.
A few simple limits prevent most pain: one feature per change set, one database migration per change set, one high-risk area at a time (auth, payments, data deletion), tests updated in the same change, and a clear “how to verify” note.
AI may reproduce patterns from training data or suggest dependencies you don’t understand. Even when licensing is fine, the bigger risk is security: hard-coded secrets, weak token handling, or unsafe file and query operations.
If you can’t explain what a snippet does, don’t ship it. Ask for a simpler version, or rewrite it yourself.
Many “it worked on my machine” bugs are really data and scale bugs. AI can create schema changes without thinking about existing rows, large tables, or downtime.
A realistic example: the model adds a new NOT NULL column to a PostgreSQL table and backfills every existing row inside one long transaction. In production, that can lock the table and break the app. Always consider what happens with a million rows, a slow network, or a failed deploy halfway through.
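A safer sequence, sketched here with node-postgres (the table and column names are made up), is to add the column as nullable, backfill in small batches, and only then enforce NOT NULL:
```ts
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the PG* env vars

async function migrate(): Promise<void> {
  // 1. Add the column without NOT NULL so the ALTER itself is quick.
  await pool.query(`ALTER TABLE orders ADD COLUMN IF NOT EXISTS currency text`);

  // 2. Backfill in batches so no single statement holds locks for long.
  let updated = 0;
  do {
    const result = await pool.query(
      `UPDATE orders SET currency = 'USD'
       WHERE id IN (SELECT id FROM orders WHERE currency IS NULL LIMIT 1000)`
    );
    updated = result.rowCount ?? 0;
  } while (updated > 0);

  // 3. Only now make the column required.
  await pool.query(`ALTER TABLE orders ALTER COLUMN currency SET NOT NULL`);
}

migrate().finally(() => pool.end());
```
On very large tables you might go further (for example, a NOT VALID check constraint validated separately), but even this version avoids one long, blocking statement.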
Imagine a small internal request tracker: people submit requests, managers approve or reject, and finance marks items as paid. It sounds simple, and with AI assistance you can generate screens and endpoints quickly. The part that slows you down is the same old truth: the rules, not the typing.
Start by writing down the minimum that must be correct. If you can’t explain it in plain words, you can’t test it.
A tight first-version definition often looks like this: fields (title, requester, department, amount, reason, status, timestamps); roles (requester, approver, finance, admin); statuses (draft, submitted, approved, rejected, paid). Then state the transitions that matter: only an approver can move submitted to approved or rejected; only finance can move approved to paid.
Use AI in a controlled order so you can catch mistakes early: generate the data model and status definitions first, then the transition endpoints, then the permission and transition tests, and only then the screens.
The highest-value tests aren’t “does the page load.” They’re permission checks and state transitions. Prove, for example, that a requester can’t approve their own request, an approver can’t mark something paid, rejected requests can’t be paid, and (if it’s your rule) amounts can’t be edited after submission.
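Here’s a sketch of those rules as checks you can run; the canTransition guard is illustrative, but notice that the tests name the business rule, not the implementation:
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

type Role = "requester" | "approver" | "finance" | "admin";
type Status = "draft" | "submitted" | "approved" | "rejected" | "paid";

function canTransition(input: {
  role: Role;
  from: Status;
  to: Status;
  isOwnRequest: boolean;
}): boolean {
  const { role, from, to, isOwnRequest } = input;
  if (from === "submitted" && (to === "approved" || to === "rejected")) {
    // Approvers decide, but never on their own requests.
    return role === "approver" && !isOwnRequest;
  }
  if (from === "approved" && to === "paid") {
    return role === "finance"; // approvers can't mark things paid
  }
  return false; // everything else, including rejected -> paid, is forbidden
}

test("a requester cannot approve their own request", () => {
  assert.equal(
    canTransition({ role: "requester", from: "submitted", to: "approved", isOwnRequest: true }),
    false
  );
});

test("rejected requests can never be paid", () => {
  assert.equal(
    canTransition({ role: "finance", from: "rejected", to: "paid", isOwnRequest: false }),
    false
  );
});
```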
What takes longest is clarifying edge cases. Can an approver change their mind after rejecting? What if two approvers click approve at the same time? What if finance needs to partially pay? AI can generate code for any answer you pick, but it can’t pick the answer for you. Correctness comes from making those calls, then forcing the code to obey them.
AI can produce a lot of code fast, but the last mile is still human work: proving it does what you meant, and failing safely when it doesn’t.
Before you start checking boxes, pick the smallest “done” definition that matters. For a small feature, that might be one happy path, two failure paths, and a quick readability pass. For payments or auth, raise the bar.
Say AI adds “bulk invite users” to an admin screen. The happy path works, but the real risk is edge cases: duplicate emails, partial failures, and rate limits. A solid ship decision might be one automated test for duplicates, one manual check for partial-failure messaging, and a rollback plan.
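That duplicates test can be tiny. This sketch assumes a hypothetical dedupeInvites helper; the point is that the riskiest behavior gets a check you can rerun before every change:
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical helper: normalize and drop duplicates before sending invites.
function dedupeInvites(emails: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const raw of emails) {
    const email = raw.trim().toLowerCase();
    if (!seen.has(email)) {
      seen.add(email);
      unique.push(email);
    }
  }
  return unique;
}

test("duplicate emails in a bulk invite are only invited once", () => {
  const result = dedupeInvites(["a@example.com", "A@example.com ", "b@example.com"]);
  assert.deepEqual(result, ["a@example.com", "b@example.com"]);
});
```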
When code is cheap, the risk shifts to decision quality: what you asked for, what you accepted, and what you shipped. The fastest way to make these truths pay off in AI-assisted work is to add guardrails that prevent “almost right” changes from slipping through.
Start with a one-page spec for the next feature. Keep it plain: who it’s for, what it should do, what it should not do, and a handful of acceptance tests written in everyday language. Those acceptance tests become your anchor when the AI suggests a tempting shortcut.
A guardrail set that scales without a lot of process overhead: a one-page spec per feature, change sets small enough to review in minutes, tests updated in the same change, a snapshot before anything risky, and one shared definition of done.
Prompts are part of your process now. Agree on a house style: what libraries are allowed, how errors are handled, what “done” means, and what tests must pass. If a prompt can’t be reused by another teammate, it’s probably too vague.
If you prefer a chat-first way to build web, backend, and mobile apps, Koder.ai (koder.ai) is one example of a vibe-coding platform where planning mode, snapshots, and source code export can support these guardrails. The tool can speed up drafts, but the discipline is what keeps humans in charge of correctness.
Treat AI output like a fast draft, not a finished feature. Start by writing 3–5 pass/fail acceptance checks, then generate one small slice (one endpoint, one screen, one migration) and verify it with tests and failure-case checks before moving on.
Tests stay the bottleneck because they’re where you discover what the code actually does. AI can produce plausible logic that misses one key rule (permissions, retries, edge states). Tests turn your expectations into something you can run, repeat, and trust.
Start with what would hurt most: core flows (signup, checkout, save, export), permissions (who can view, edit, delete), and data integrity (no duplicates, correct totals, safe migrations). Then cover the edges that cause late-night incidents, like empty inputs, retries, time zones, and flaky external services.
Add more coverage after the “high damage” behaviors are locked down.
Ask for the simplest approach with explicit constraints, then delete extra layers unless they pay rent. A good rule: don’t introduce a new abstraction unless it removes duplication in 3+ places or makes correctness easier to prove.
Write a short spec: inputs, outputs, errors, constraints, and non-goals. Include concrete examples (sample requests/responses, edge cases). Then define “done” upfront: required tests, backward-compat expectations, and a quick “how to verify” note.
Break it up. Keep each change set reviewable in minutes: one feature per change set, one database migration per change set, one high-risk area at a time, tests updated in the same change, and a short “how to verify” note.
This makes review real instead of rubber-stamping.
Don’t trust confidence—trust proof. Run tests, try malformed inputs, and verify permission boundaries. Also look for common AI traps: missing auth checks, unsafe query building, weak token handling, and silent error swallowing.
Prefer explicit transition endpoints over “update anything.” For example: submit, approve, reject, pay instead of a generic update route. Then write tests that enforce who can do each transition and which transitions are forbidden.
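A sketch of what that looks like with Express-style routes; the paths and the rule comments are illustrative:
```ts
import express from "express";

const app = express();
app.use(express.json());

// Each transition gets its own endpoint, so each one enforces its own rule.
app.post("/requests/:id/submit", (_req, res) => res.sendStatus(204));  // requester only
app.post("/requests/:id/approve", (_req, res) => res.sendStatus(204)); // approver only, never their own request
app.post("/requests/:id/reject", (_req, res) => res.sendStatus(204));  // approver only
app.post("/requests/:id/pay", (_req, res) => res.sendStatus(204));     // finance only, and only from "approved"

// There is deliberately no generic PUT /requests/:id that can change status.
```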
Give candidates an AI-generated diff with real problems: unclear naming, a missing test, an edge case, and a small security issue. Ask them to explain intent, find the highest-risk parts, propose fixes, and outline the tests they’d add.
Use tool features to support a disciplined loop: plan first, generate in small slices, snapshot before risky changes, and roll back if validation fails. In a chat-based platform like Koder.ai, this pairs well with planning mode, snapshots, and rollback—especially when changes touch auth, payments, or migrations.