Joel Spolsky's software truths still help when AI can write code fast. Learn how to keep tests, hiring, and simplicity focused on correctness.

AI can produce working-looking code in minutes. That changes the pace of a project, but it doesn’t change what makes software succeed. The lessons in Joel Spolsky’s “software truths” were never really about typing speed. They were about judgment, feedback loops, and avoiding self-inflicted complexity.
What’s changed is the cost of creating code. You can ask for three approaches, five variations, or a full rewrite and get something back instantly. What hasn’t changed is the cost of choosing the right approach, checking it, and living with it for months. Time saved on writing often moves to deciding what you meant, validating edge cases, and making sure today’s quick win doesn’t become tomorrow’s maintenance tax.
Correctness, security, and maintainability still take real time because they rely on proof, not confidence. A login flow isn’t done when it compiles. It’s done when it reliably rejects bad inputs, handles weird states, and doesn’t leak data. AI can sound certain while missing one crucial detail, like a permissions check on an endpoint or a race condition in a payment update.
AI is strongest when you treat it like a fast draft machine. It shines at boilerplate, repetitive patterns, quick refactors, and exploring options you can compare side by side. Used well, it compresses the “blank page” phase.
AI hurts most when you hand it vague goals and accept the output at face value. The same failure patterns show up again and again: hidden assumptions (unstated business rules), untested paths (error handling, retries, empty states), confident mistakes (plausible code that’s subtly wrong), and “clever” solutions that are hard to explain later.
If code is cheap, the new scarce resource is trust. These truths matter because they protect that trust: with users, with teammates, and with your future self.
When AI can generate a feature in minutes, it’s tempting to treat testing as the slow part you need to eliminate. Spolsky’s point still holds: the slow part is where the truth is. Code is easy to produce. Correct behavior is not.
A useful shift is to treat tests as requirements you can run. If you can’t describe the expected behavior in a checkable way, you’re not done thinking. In AI-assisted work, this matters more, not less, because the model can confidently produce something that’s only slightly wrong.
Start testing with the things that would hurt most if they broke. For most products, that’s core flows (signup, checkout, save, export), permissions (who can view, edit, delete), and data integrity (no duplicates, correct totals, safe migrations). Then cover the edges that tend to cause late-night incidents: empty inputs, long text, time zones, retries, and flaky external boundaries like payments, emails, and file uploads.
AI is great at proposing test cases, but it can’t know what you actually promised users. Use it like a brainstorming partner: ask for missing edge cases, abuse scenarios, and permission combinations. Then do the human work: match coverage to your real rules and remove tests that only “test the implementation” instead of the behavior.
Make failures actionable. A failing test should tell you what broke, not send you on a scavenger hunt. Keep tests small, name them like sentences, and make error messages specific.
Say you build a simple “team notes” app with AI help. CRUD screens show up fast. The correctness risk isn’t the UI. It’s access control and data rules: a user must not see another team’s notes, edits must not overwrite newer changes, and deleting a note shouldn’t leave orphaned attachments. Tests that lock these rules down will feel like the bottleneck, but they’re also your safety net.
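Here's a minimal sketch of one of those rules as a runnable requirement, using Node's built-in test runner; the Note shape and the listNotes helper are stand-ins for your real data layer, and the test name reads like the rule it protects.
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

type Note = { id: string; teamId: string; body: string };

// Stand-in for the real data layer: always filter by the caller's team.
function listNotes(all: Note[], callerTeamId: string): Note[] {
  return all.filter((note) => note.teamId === callerTeamId);
}

const notes: Note[] = [
  { id: "1", teamId: "team-a", body: "roadmap" },
  { id: "2", teamId: "team-b", body: "salaries" },
];

test("a user never sees another team's notes", () => {
  const visible = listNotes(notes, "team-a");
  assert.deepEqual(
    visible.map((note) => note.id),
    ["1"],
    "team-a should only see its own note"
  );
});
```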
When testing is the bottleneck, it forces clarity. That clarity is what keeps fast code from turning into fast bugs.
One of the most durable truths is that simple code wins over clever code. AI makes it tempting to accept fancy abstractions because they arrive polished and fast. The cost shows up later: more places for bugs to hide, more files to scan, and more “what is this even doing?” moments.
When code is cheap, complexity is what you pay for. A small, boring design is easier to test, easier to change, and easier to explain. That matters even more when the first draft came from a model that can sound confident while being subtly wrong.
A practical rule is to keep functions, components, and modules small enough that a teammate can review them in minutes, not hours. If a React component needs multiple custom hooks, a local state machine, and a generic “smart renderer” layer, pause and ask whether you’re solving a real problem or just accepting architecture because AI offered it.
A few “simplicity tests” help you push back: can a teammate review the change in minutes, can you explain it in plain language, does each new abstraction remove duplication in at least three places, and would deleting a layer change any behavior you actually care about?
Prompts matter here. If you ask for “the best architecture,” you often get an overbuilt one. Ask for constraints that push toward fewer moving parts. For example: use the simplest approach with the fewest files; avoid new abstractions unless they remove duplication in three or more places; prefer explicit code over generic helpers.
A concrete example: you ask AI to add role-based access to an admin page. The clever version introduces a permission framework, decorators, and a config DSL. The simple version checks the user role in one place, gates routes in one place, and logs denied access. The simple version is easier to review, easier to test, and harder to misinterpret.
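As a rough sketch (the requireRole helper, the req.user shape, and the Express wiring are illustrative, not a prescribed API), the simple version can be this small:
```ts
import express, { Request, Response, NextFunction } from "express";

type Role = "admin" | "member";

// However your auth middleware attaches the user; this shape is an assumption.
interface AuthedRequest extends Request {
  user?: { id: string; role: Role };
}

// One place that checks roles, logs denials, and gates routes.
function requireRole(role: Role) {
  return (req: Request, res: Response, next: NextFunction) => {
    const user = (req as AuthedRequest).user;
    if (!user || user.role !== role) {
      console.warn("access denied", { path: req.path, userId: user?.id });
      return res.status(403).json({ error: "forbidden" });
    }
    next();
  };
}

const app = express();
app.use("/admin", requireRole("admin")); // the only gate for admin routes
app.get("/admin/users", (_req, res) => res.json({ users: [] }));
```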
If you’re building in a chat-based tool like Koder.ai, simplicity also makes snapshots and rollback more valuable. Small, obvious changes are easier to compare, keep, or revert.
When code is easy to produce, the scarce skill is choosing what should exist at all and making sure it’s correct. The old “hire great programmers” advice still applies, but the job shifts. You’re not hiring someone to type faster. You’re hiring someone to judge, refine, and defend the product.
The most valuable people in AI-assisted development tend to share four traits: judgment (what matters), taste (what good looks like), debugging skill (finding the real cause), and communication (making tradeoffs clear). They can take an AI-written feature that “mostly works” and turn it into something you can trust.
Instead of asking for a perfect solution from scratch, give candidates an AI-generated pull request (or a pasted diff) with a few realistic problems: unclear naming, a hidden edge case, missing tests, and a small security mistake.
Ask them to explain what the code is trying to do in plain language, find the highest-risk parts, propose fixes, and add (or outline) tests that would catch regressions. If you want a strong signal, also ask how they would change the instructions so the next AI attempt is better.
This reveals how they think under real conditions: imperfect code, limited time, and the need to choose priorities.
AI often sounds confident. Good hires are comfortable pushing back. They can say no to a feature that adds complexity, no to a change that weakens security, and no to shipping without proof.
A concrete signal is how they respond to “Would you merge this?” Strong candidates don’t answer with a vibe. They give a decision and a short list of required changes.
Example: you ask for a “quick” access-control update and the AI suggests sprinkling checks across handlers. A strong candidate rejects that approach and proposes one clear authorization layer, plus tests for admin and non-admin paths.
Finally, build shared standards so the team edits AI output in the same way. Keep it simple: one definition of done, consistent review expectations, and a testing baseline.
When AI can generate a lot of code in minutes, it’s tempting to skip the thinking and just iterate. That works for demos. It breaks down when you need correctness, predictable behavior, and fewer surprises.
A good prompt is usually a short spec in disguise. Before you ask for code, turn the vague goal into a few acceptance criteria and explicit non-goals. This prevents the AI (and your team) from quietly expanding scope.
Keep the spec small but specific. You’re not writing a novel. You’re setting boundaries around inputs, outputs, error handling, constraints, and explicit non-goals, ideally with a few concrete examples.
Define “done” before generation, not after. “Done” should be more than “it compiles” or “the UI looks right.” Include test expectations, backward compatibility, and what gets monitored after release.
Example: you want “add password reset.” A clearer spec might say: users request reset by email; links expire in 15 minutes; the same message appears whether the email exists or not; rate limit per IP; log reset attempts without storing tokens in plain text. Non-goal: no redesign of the login page. Now your prompt has guardrails and reviews get simpler.
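One of those rules is easy to turn into a check you can run. This sketch assumes a hypothetical requestPasswordReset function and uses in-memory stand-ins for the real database and mailer:
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

const RESET_TOKEN_TTL_MS = 15 * 60 * 1000; // links expire in 15 minutes

const knownEmails = new Set(["ada@example.com"]);
const pendingResets = new Map<string, number>(); // email -> expiry timestamp

function requestPasswordReset(email: string): { message: string } {
  if (knownEmails.has(email)) {
    // Real code would store a hashed token and send the email here.
    pendingResets.set(email, Date.now() + RESET_TOKEN_TTL_MS);
  }
  // Same response either way, so the endpoint never reveals which emails exist.
  return { message: "If that address exists, a reset link is on its way." };
}

test("the response does not reveal whether an email is registered", () => {
  const known = requestPasswordReset("ada@example.com");
  const unknown = requestPasswordReset("nobody@example.com");
  assert.equal(known.message, unknown.message);
});
```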
Keep a lightweight change log of decisions. One paragraph per decision is enough. Note why you chose an approach and why you rejected alternatives. When someone asks “why is it like this?” two weeks later, you’ll have an answer.
The biggest shift with AI is that producing code is easy. The hard part is deciding what the code should do and proving it does it.
Start by writing the goal and constraints in plain language. Include what must never happen, what can be slow, and what is out of scope. A good constraint is testable: “No user should see another user’s data,” or “Totals must match the finance export to the cent.”
Before you ask for code, ask for a simple design and the tradeoffs. You want the AI to show its reasoning in a form you can judge: what it will store, what it will validate, and what it will log. If it proposes something clever, push back and request the simplest version that still meets the constraints.
A repeatable loop looks like this: write the acceptance checks in plain language, ask for the simplest design that meets them, generate one small slice, verify it with tests and failure cases, capture a snapshot or commit, and repeat.
Here’s a small scenario: you add “refund status” to an order screen. The AI can generate the UI quickly, but correctness lives in edge cases. What if a refund is partial? What if the payment provider retries a webhook? Write those cases first, then implement one slice (database column plus validation) and verify it with tests before moving on.
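A sketch of that slice as plain logic you can test before touching the UI; the Order and RefundEvent shapes are assumptions, not any provider’s real schema:
```ts
type Order = {
  totalCents: number;
  refundedCents: number;
  processedEventIds: string[]; // guards against webhook retries
};

type RefundEvent = { id: string; amountCents: number };

type RefundStatus = "none" | "partial" | "refunded";

export function applyRefundEvent(order: Order, event: RefundEvent): Order {
  // Providers retry webhooks, so applying the same event twice must be a no-op.
  if (order.processedEventIds.includes(event.id)) return order;

  const refundedCents = order.refundedCents + event.amountCents;
  if (refundedCents > order.totalCents) {
    throw new Error("refund exceeds order total");
  }
  return {
    ...order,
    refundedCents,
    processedEventIds: [...order.processedEventIds, event.id],
  };
}

export function refundStatus(order: Order): RefundStatus {
  if (order.refundedCents === 0) return "none";
  return order.refundedCents < order.totalCents ? "partial" : "refunded";
}
```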
If you use Koder.ai, features like planning mode, snapshots, and rollback fit naturally into this loop: plan first, generate in slices, and capture a safe restore point for every meaningful change.
When code generation is fast, it’s tempting to treat code as the work product. It isn’t. The work product is behavior: the app does the right thing, even when things go wrong.
AI often sounds sure, even when it’s guessing. The failure mode is skipping the boring part: running tests, checking edge cases, and validating real inputs.
A simple habit helps: before you accept a change, ask “How do we know this is correct?” If the answer is “it looks right,” you’re gambling.
AI loves to add extras: caching, retries, more settings, more endpoints, a nicer UI. Some of those ideas are good, but they raise risk. Many bugs come from “nice to have” features that nobody asked for.
Keep a hard boundary: solve the problem you set out to solve, then stop. If a suggestion is valuable, capture it as a separate task with its own tests.
A large AI-generated commit can hide a dozen unrelated decisions. Review becomes rubber-stamping because nobody can hold it all in their head.
Treat chat output as a draft. Break it into small changes you can read, run, and revert. Snapshots and rollback are only helpful if you take them at sensible points.
A few simple limits prevent most pain: one feature per change set, one database migration per change set, one high-risk area at a time (auth, payments, data deletion), tests updated in the same change, and a clear “how to verify” note.
AI may reproduce patterns from training data or suggest dependencies you don’t understand. Even when licensing is fine, the bigger risk is security: hard-coded secrets, weak token handling, or unsafe file and query operations.
If you can’t explain what a snippet does, don’t ship it. Ask for a simpler version, or rewrite it yourself.
Many “it worked on my machine” bugs are really data and scale bugs. AI can create schema changes without thinking about existing rows, large tables, or downtime.
A realistic example: the model adds a new NOT NULL column to a PostgreSQL table and backfills every existing row inside one long transaction. In production, that can lock the table and break the app. Always consider what happens with a million rows, a slow network, or a failed deploy halfway through.
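A safer sequence, sketched here with node-postgres (the table and column names are made up), is to add the column as nullable, backfill in small batches, and only then enforce NOT NULL:
```ts
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the PG* env vars

async function migrate(): Promise<void> {
  // 1. Add the column without NOT NULL so the ALTER itself is quick.
  await pool.query(`ALTER TABLE orders ADD COLUMN IF NOT EXISTS currency text`);

  // 2. Backfill in batches so no single statement holds locks for long.
  let updated = 0;
  do {
    const result = await pool.query(
      `UPDATE orders SET currency = 'USD'
       WHERE id IN (SELECT id FROM orders WHERE currency IS NULL LIMIT 1000)`
    );
    updated = result.rowCount ?? 0;
  } while (updated > 0);

  // 3. Only now make the column required.
  await pool.query(`ALTER TABLE orders ALTER COLUMN currency SET NOT NULL`);
}

migrate().finally(() => pool.end());
```
On very large tables you might go further (for example, a NOT VALID check constraint validated separately), but even this version avoids one long, blocking statement.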
Imagine a small internal request tracker: people submit requests, managers approve or reject, and finance marks items as paid. It sounds simple, and with AI assistance you can generate screens and endpoints quickly. The part that slows you down is the same old truth: the rules, not the typing.
Start by writing down the minimum that must be correct. If you can’t explain it in plain words, you can’t test it.
A tight first-version definition often looks like this: fields (title, requester, department, amount, reason, status, timestamps); roles (requester, approver, finance, admin); statuses (draft, submitted, approved, rejected, paid). Then state the transitions that matter: only an approver can move submitted to approved or rejected; only finance can move approved to paid.
Use AI in a controlled order so you can catch mistakes early: generate the data model and status definitions first, then the transition endpoints, then the permission and transition tests, and only then the screens.
The highest-value tests aren’t “does the page load.” They’re permission checks and state transitions. Prove, for example, that a requester can’t approve their own request, an approver can’t mark something paid, rejected requests can’t be paid, and (if it’s your rule) amounts can’t be edited after submission.
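Here’s a sketch of those rules as checks you can run; the canTransition guard is illustrative, but notice that the tests name the business rule, not the implementation:
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

type Role = "requester" | "approver" | "finance" | "admin";
type Status = "draft" | "submitted" | "approved" | "rejected" | "paid";

function canTransition(input: {
  role: Role;
  from: Status;
  to: Status;
  isOwnRequest: boolean;
}): boolean {
  const { role, from, to, isOwnRequest } = input;
  if (from === "submitted" && (to === "approved" || to === "rejected")) {
    // Approvers decide, but never on their own requests.
    return role === "approver" && !isOwnRequest;
  }
  if (from === "approved" && to === "paid") {
    return role === "finance"; // approvers can't mark things paid
  }
  return false; // everything else, including rejected -> paid, is forbidden
}

test("a requester cannot approve their own request", () => {
  assert.equal(
    canTransition({ role: "requester", from: "submitted", to: "approved", isOwnRequest: true }),
    false
  );
});

test("rejected requests can never be paid", () => {
  assert.equal(
    canTransition({ role: "finance", from: "rejected", to: "paid", isOwnRequest: false }),
    false
  );
});
```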
What takes longest is clarifying edge cases. Can an approver change their mind after rejecting? What if two approvers click approve at the same time? What if finance needs to partially pay? AI can generate code for any answer you pick, but it can’t pick the answer for you. Correctness comes from making those calls, then forcing the code to obey them.
AI can produce a lot of code fast, but the last mile is still human work: proving it does what you meant, and failing safely when it doesn’t.
Before you start checking boxes, pick the smallest “done” definition that matters. For a small feature, that might be one happy path, two failure paths, and a quick readability pass. For payments or auth, raise the bar.
Say AI adds “bulk invite users” to an admin screen. The happy path works, but the real risk is edge cases: duplicate emails, partial failures, and rate limits. A solid ship decision might be one automated test for duplicates, one manual check for partial-failure messaging, and a rollback plan.
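That duplicates test can be tiny. This sketch assumes a hypothetical dedupeInvites helper; the point is that the riskiest behavior gets a check you can rerun before every change:
```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical helper: normalize and drop duplicates before sending invites.
function dedupeInvites(emails: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const raw of emails) {
    const email = raw.trim().toLowerCase();
    if (!seen.has(email)) {
      seen.add(email);
      unique.push(email);
    }
  }
  return unique;
}

test("duplicate emails in a bulk invite are only invited once", () => {
  const result = dedupeInvites(["a@example.com", "A@example.com ", "b@example.com"]);
  assert.deepEqual(result, ["a@example.com", "b@example.com"]);
});
```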
When code is cheap, the risk shifts to decision quality: what you asked for, what you accepted, and what you shipped. The fastest way to make these truths pay off in AI-assisted work is to add guardrails that prevent “almost right” changes from slipping through.
Start with a one-page spec for the next feature. Keep it plain: who it’s for, what it should do, what it should not do, and a handful of acceptance tests written in everyday language. Those acceptance tests become your anchor when the AI suggests a tempting shortcut.
A guardrail set that scales without a lot of process overhead: a one-page spec per feature, change sets small enough to review in minutes, tests updated in the same change, a snapshot before anything risky, and one shared definition of done.
Prompts are part of your process now. Agree on a house style: what libraries are allowed, how errors are handled, what “done” means, and what tests must pass. If a prompt can’t be reused by another teammate, it’s probably too vague.
If you prefer a chat-first way to build web, backend, and mobile apps, Koder.ai (koder.ai) is one example of a vibe-coding platform where planning mode, snapshots, and source code export can support these guardrails. The tool can speed up drafts, but the discipline is what keeps humans in charge of correctness.
Treat AI output like a fast draft, not a finished feature. Start by writing 3–5 pass/fail acceptance checks, then generate one small slice (one endpoint, one screen, one migration) and verify it with tests and failure-case checks before moving on.
Tests stay the bottleneck because they’re where you discover what the code actually does. AI can produce plausible logic that misses one key rule (permissions, retries, edge states). Tests turn your expectations into something you can run, repeat, and trust.
Start with what would hurt most: core flows (signup, checkout, save, export), permissions (who can view, edit, delete), and data integrity (no duplicates, correct totals, safe migrations). Then cover the edges that cause late-night incidents, like empty inputs, retries, time zones, and flaky external services.
Add more coverage after the “high damage” behaviors are locked down.
Ask for the simplest approach with explicit constraints, then delete extra layers unless they pay rent. A good rule: don’t introduce a new abstraction unless it removes duplication in 3+ places or makes correctness easier to prove.
Write a short spec: inputs, outputs, errors, constraints, and non-goals. Include concrete examples (sample requests/responses, edge cases). Then define “done” upfront: required tests, backward-compat expectations, and a quick “how to verify” note.
Break it up. Keep each change set reviewable in minutes: one feature per change set, one database migration per change set, one high-risk area at a time, tests updated in the same change, and a short “how to verify” note.
This makes review real instead of rubber-stamping.
Don’t trust confidence—trust proof. Run tests, try malformed inputs, and verify permission boundaries. Also look for common AI traps: missing auth checks, unsafe query building, weak token handling, and silent error swallowing.
Prefer explicit transition endpoints over “update anything.” For example: submit, approve, reject, pay instead of a generic update route. Then write tests that enforce who can do each transition and which transitions are forbidden.
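A sketch of what that looks like with Express-style routes; the paths and the rule comments are illustrative:
```ts
import express from "express";

const app = express();
app.use(express.json());

// Each transition gets its own endpoint, so each one enforces its own rule.
app.post("/requests/:id/submit", (_req, res) => res.sendStatus(204));  // requester only
app.post("/requests/:id/approve", (_req, res) => res.sendStatus(204)); // approver only, never their own request
app.post("/requests/:id/reject", (_req, res) => res.sendStatus(204));  // approver only
app.post("/requests/:id/pay", (_req, res) => res.sendStatus(204));     // finance only, and only from "approved"

// There is deliberately no generic PUT /requests/:id that can change status.
```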
Give candidates an AI-generated diff with real problems: unclear naming, a missing test, an edge case, and a small security issue. Ask them to explain intent, find the highest-risk parts, propose fixes, and outline the tests they’d add.
Use tool features to support a disciplined loop: plan first, generate in small slices, snapshot before risky changes, and roll back if validation fails. In a chat-based platform like Koder.ai, this pairs well with planning mode, snapshots, and rollback—especially when changes touch auth, payments, or migrations.