A practical guide to the performance-first mindset associated with John Carmack: profiling, frame-time budgets, tradeoffs, and shipping complex real-time systems.

John Carmack is often treated like a legend of game engines, but the useful part isn’t the mythology—it’s the repeatable habits. This isn’t about copying one person’s style or assuming “genius moves.” It’s about practical principles that reliably lead to faster, smoother software, especially when deadlines and complexity pile up.
Performance engineering means making software meet a speed target on real hardware, under real conditions—without breaking correctness. It’s not “make it fast at any cost.” It’s a disciplined loop: set a budget, measure, change one thing, measure again, and keep only what provably helps.
That mindset shows up in Carmack’s work again and again: argue with data, keep changes explainable, and prefer approaches you can maintain.
Real-time graphics is unforgiving because it has a deadline every frame. If you miss it, the user feels it immediately as stutter, input lag, or uneven motion. Other software can hide inefficiency behind queues, loading screens, or background work. A renderer can’t negotiate: you either finish in time, or you don’t.
That’s why the lessons generalize beyond games. Any system with tight latency requirements—UI, audio, AR/VR, trading, robotics—benefits from thinking in budgets, understanding bottlenecks, and avoiding surprise spikes.
You’ll get checklists, heuristics, and decision patterns you can apply to your own work: how to set frame-time (or latency) budgets, how to profile before optimizing, how to pick the “one thing” to fix, and how to prevent regressions so performance becomes routine—not a late-stage panic.
Carmack-style performance thinking starts with a simple switch: stop talking about “FPS” as the primary unit and start talking about frame time.
FPS is a reciprocal (“60 FPS” sounds good, “55 FPS” sounds close), but the user experience is driven by how long each frame takes—and, just as importantly, how consistent those times are. A jump from 16.6 ms to 33.3 ms is instantly visible even if your average FPS still looks respectable.
A real-time product has multiple budgets, not just “render faster”: CPU time per frame, GPU time per frame, memory, streaming and I/O, and input latency.
These budgets interact. Saving GPU time by adding CPU-heavy batching can backfire, and reducing memory can increase streaming or decompression costs.
If your target is 60 FPS, your total budget is 16.6 ms per frame. A rough breakdown might look like this: CPU work (simulation, animation, draw submission) and GPU work (geometry, shading, post-processing) each fitting inside 16.6 ms in parallel, with a little headroom deliberately left for OS, driver, and spike noise.
If either CPU or GPU exceeds the budget, you miss the frame. This is why teams talk about being “CPU-bound” or “GPU-bound”—not as labels, but as a way to decide where the next millisecond can realistically come from.
The point isn’t chasing a vanity metric like “highest FPS on a high-end PC.” The point is defining what fast enough means for your audience—hardware targets, resolution, battery limits, thermals, and input responsiveness—and then treating performance as explicit budgets you can manage and defend.
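To make that concrete, here is a minimal sketch of how a team might write those budgets down in code; the subsystem split and the 14 ms / 2.6 ms numbers are illustrative assumptions, not a recommendation.

```cpp
#include <cstdio>

// Hypothetical per-frame budgets at a 60 FPS target (all numbers illustrative).
struct FrameBudget {
    double total_ms   = 16.6;  // hard deadline for the whole frame
    double cpu_ms     = 14.0;  // simulation, animation, draw submission
    double gpu_ms     = 14.0;  // geometry, shading, post-processing
    double reserve_ms = 2.6;   // headroom for OS/driver noise and spikes
};

// Compare a measured frame against the budget and name the limiter.
void CheckFrame(const FrameBudget& b, double cpu_ms, double gpu_ms) {
    // Assuming CPU and GPU work overlap, the frame cost is whichever finishes last.
    const double frame_ms = (cpu_ms > gpu_ms) ? cpu_ms : gpu_ms;
    if (frame_ms > b.total_ms) {
        std::printf("Missed frame: %.2f ms (%s-bound)\n",
                    frame_ms, cpu_ms > gpu_ms ? "CPU" : "GPU");
    } else if (cpu_ms > b.cpu_ms || gpu_ms > b.gpu_ms) {
        std::printf("Within deadline but over sub-budget: CPU %.2f ms, GPU %.2f ms\n",
                    cpu_ms, gpu_ms);
    }
}

int main() {
    FrameBudget budget;
    CheckFrame(budget, /*cpu_ms=*/12.1, /*gpu_ms=*/17.4);  // example measurement
}
```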
Carmack’s default move isn’t “optimize,” it’s “verify.” Real-time performance problems are full of plausible stories—GC pauses, “slow shaders,” “too many draw calls”—and most of them are wrong in your build on your hardware. Profiling is how you replace intuition with evidence.
Treat profiling like a first-class feature, not a last-minute rescue tool. Capture frame times, CPU and GPU timelines, and the counts that explain them (triangles, draw calls, state changes, allocations, cache misses if you can get them). The goal is to answer one question: where is the time actually going?
A useful model: in every slow frame, one thing is the limiting factor. Maybe it’s the GPU stuck on a heavy pass, the CPU stuck in animation update, or the main thread stalled on synchronization. Find that constraint first; everything else is noise.
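One low-friction way to start answering that question is a scoped timer dropped around any suspect block. This is a generic sketch using the standard clock, not any particular engine's profiler; the subsystem name is made up.

```cpp
#include <chrono>
#include <cstdio>

// Measures wall-clock time for the enclosing scope and prints it on exit.
// In a real build you would feed this into your profiler/telemetry instead of printf.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        std::printf("%s: %.3f ms\n", label_, ms);
    }
private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

void UpdateAnimation() {          // hypothetical subsystem
    ScopedTimer t("animation");   // where is the time actually going?
    // ... animation work ...
}

int main() { UpdateAnimation(); }
```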
A disciplined loop keeps you from thrashing: capture a baseline, form a hypothesis, change one thing, re-measure, and keep the change only if the numbers clearly improve.
If the improvement isn’t clear, assume it didn’t help—because it probably won’t survive the next content drop.
Performance work is especially vulnerable to self-deception: profiling a debug build, comparing captures from different scenes, crediting your change for run-to-run noise, or trusting averages that hide the spikes users actually feel.
Profiling first keeps your effort focused, your tradeoffs justified, and your changes easier to defend in review.
Real-time performance problems feel messy because everything is happening at once: gameplay, rendering, streaming, animation, UI, physics. Carmack’s instinct is to cut through the noise and identify the dominant limiter—the one thing currently setting your frame time.
Most slowdowns fall into a few buckets: the CPU (simulation, animation, draw submission), the GPU (shading, fill rate, bandwidth), memory and streaming, or synchronization between threads and between CPU and GPU.
The point isn’t to label it for a report—it’s to pick the right lever.
A few fast experiments can tell you what’s really in control: drop the render resolution (if frame time falls, you’re GPU-bound), disable or cap one subsystem at a time, pause streaming, or halve entity counts and see what moves.
You rarely win by shaving 1% off ten systems. Find the biggest cost that repeats every frame and attack that first. Removing a single 4 ms offender beats weeks of micro-optimizations.
After you fix the big rock, the next biggest rock becomes visible. That’s normal. Treat performance work as a loop: measure → change → re-measure → re-prioritize. The goal isn’t a perfect profile; it’s steady progress toward predictable frame time.
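A sketch of the “biggest rock first” habit: collect per-system timings for a frame, sort them, and report the dominant cost. The system names and numbers are illustrative.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct SystemCost {
    std::string name;
    double ms;   // measured time this frame
};

// Sort measured costs and print the single biggest offender first.
void ReportLimiter(std::vector<SystemCost> costs) {
    if (costs.empty()) return;
    std::sort(costs.begin(), costs.end(),
              [](const SystemCost& a, const SystemCost& b) { return a.ms > b.ms; });
    std::printf("Dominant cost: %s at %.2f ms\n", costs[0].name.c_str(), costs[0].ms);
}

int main() {
    // Illustrative numbers: one 4 ms offender dwarfs everything else.
    ReportLimiter({{"physics", 1.2}, {"shadow pass", 4.1},
                   {"animation", 0.9}, {"UI", 0.4}});
}
```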
Average frame time can look fine while the experience still feels bad. Real-time graphics is judged by the worst moments: the dropped frame during a big explosion, the hitch when entering a new room, the sudden stutter when a menu opens. That’s tail latency—rare-but-not-rare-enough slow frames that users immediately notice.
A game running at 16.6 ms most of the time (60 FPS) but spiking to 60–120 ms every few seconds will feel “broken,” even if the average still prints as 20 ms. Humans are sensitive to rhythm. A single long frame breaks input predictability, camera motion, and audio/visual sync.
Spikes often come from work that isn’t evenly spread: shader compilation on first use, asset loading and decompression, level streaming, allocator or garbage-collector cleanup, and one-off initialization the first time a system runs.
The goal is to make expensive work predictable: precompile and prewarm ahead of time, move loading off the critical path, and amortize big jobs across many frames so no single frame pays the whole bill.
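One common way to amortize a big job, shown as a sketch: give a work queue a fixed time slice each frame and stop when the slice is spent. The 2 ms budget and the task queue here are illustrative assumptions.

```cpp
#include <chrono>
#include <deque>
#include <functional>
#include <utility>

// Runs queued background tasks until the per-frame budget is spent,
// so a large job is spread over many frames instead of causing one spike.
class TimeSlicedQueue {
public:
    void Push(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    void RunFor(double budget_ms) {
        const auto start = std::chrono::steady_clock::now();
        while (!tasks_.empty()) {
            tasks_.front()();
            tasks_.pop_front();
            const double spent =
                std::chrono::duration<double, std::milli>(
                    std::chrono::steady_clock::now() - start).count();
            if (spent >= budget_ms) break;   // stop before we blow the frame
        }
    }
private:
    std::deque<std::function<void()>> tasks_;
};

int main() {
    TimeSlicedQueue streaming;
    for (int i = 0; i < 1000; ++i)
        streaming.Push([] { /* decompress one asset chunk */ });
    // Each frame: give streaming at most 2 ms of the frame budget.
    streaming.RunFor(2.0);
}
```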
Don’t just plot an average FPS line. Record per-frame timings and visualize p95 and p99 frame time, the single worst frame in each window, and how often you exceed the budget.
If you can’t explain your worst 1% frames, you haven’t really explained performance.
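Computing those worst-case numbers from a recorded run takes only a few lines. A sketch, assuming you have already logged per-frame times in milliseconds:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Returns the frame time at the given percentile (e.g. 0.95 for p95).
double Percentile(std::vector<double> frame_ms, double p) {
    std::sort(frame_ms.begin(), frame_ms.end());
    const size_t idx = static_cast<size_t>(p * (frame_ms.size() - 1));
    return frame_ms[idx];
}

int main() {
    // Illustrative capture: mostly 16.6 ms with a few large spikes.
    std::vector<double> frames(1000, 16.6);
    frames[100] = 80.0; frames[500] = 120.0; frames[900] = 60.0;

    // With spikes this rare, even p99 can look clean, which is why the
    // worst frame belongs on the dashboard too.
    std::printf("p95: %.1f ms  p99: %.1f ms  worst: %.1f ms\n",
                Percentile(frames, 0.95), Percentile(frames, 0.99),
                *std::max_element(frames.begin(), frames.end()));
}
```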
Performance work gets easier the moment you stop pretending you can have everything at once. Carmack’s style pushes teams to name the tradeoff out loud: what are we buying, what are we paying, and who feels the difference?
Most decisions sit on a few axes: visual quality versus frame time, memory versus streaming cost, CPU versus GPU load, and flexibility versus simplicity.
If a change improves one axis but quietly taxes three others, document it. “This adds 0.4 ms GPU and 80 MB VRAM to gain softer shadows” is a usable statement. “It looks better” isn’t.
Real-time graphics isn’t about perfection; it’s about hitting a target consistently. Agree on thresholds like a frame-time target on baseline hardware, a maximum acceptable spike, a memory ceiling, and a load-time limit.
Once the team agrees that, say, 16.6 ms at 1080p on the baseline GPU is the goal, arguments become concrete: does this feature keep us under budget, or force a downgrade elsewhere?
When you’re unsure, choose options you can undo: feature flags, scalable quality tiers, and config-driven settings rather than hard-coded assumptions.
Reversibility protects the schedule. You can ship the safe path and keep the ambitious one behind a toggle.
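In practice, “behind a toggle” can be as simple as a config-driven flag that selects between the shipping path and the ambitious one. A sketch with hypothetical setting names:

```cpp
#include <cstdio>

// Hypothetical runtime settings; in a real project these would come from
// a config file, command line, or the in-game options menu.
struct RenderSettings {
    bool use_experimental_shadows = false;  // ambitious path, off by default
    int  quality_tier = 1;                  // 0 = low, 1 = medium, 2 = high
};

void RenderShadows(const RenderSettings& s) {
    if (s.use_experimental_shadows) {
        std::puts("experimental soft shadows (+0.4 ms GPU, illustrative)");
    } else {
        std::puts("shipping shadow path");  // the safe, measured default
    }
}

int main() {
    RenderSettings settings;                     // ships with the safe default
    RenderShadows(settings);
    settings.use_experimental_shadows = true;    // flip the toggle to compare
    RenderShadows(settings);
}
```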
Avoid overengineering invisible wins. A 1% average improvement is rarely worth a month of complexity—unless it removes stutter, fixes input latency, or prevents a hard memory crash. Prioritize the changes players notice immediately, and let the rest wait.
Performance work gets dramatically easier when the program is right. A surprising amount of “optimization” time is actually spent chasing correctness bugs that merely look like performance issues: an accidental O(N²) loop from duplicated work, a render pass running twice because a flag didn’t reset, a memory leak that slowly increases frame time, or a race condition that turns into random stutter.
A stable, predictable engine gives you clean measurements. If behavior changes between runs, you can’t trust profiles, and you’ll end up optimizing noise.
Disciplined engineering practices directly help performance: assertions that catch duplicated or skipped work, deterministic update order, leak tracking, and tests that fail loudly instead of silently burning frame time.
Many frame-time spikes are “Heisenbugs”: they disappear when you add logging or step through a debugger. The antidote is deterministic reproduction.
Build a small, controlled test harness: fixed random seeds, scripted camera paths and inputs, a frozen content set, and a replay mode that can step through the same captured session every time.
When a hitch shows up, you want a button that replays it 100 times—not a vague report that it “sometimes happens after 10 minutes.”
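A minimal sketch of that harness idea: fix the seed, script the inputs, and step the simulation with a fixed timestep so the same capture replays identically on demand. Every name here is illustrative.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct InputFrame { float move_x; float move_y; bool fire; };  // scripted input

// Deterministic harness: fixed seed, fixed timestep, recorded inputs.
// Running it twice must produce identical behavior, or profiles can't be trusted.
void ReplayCapture(uint32_t seed, const std::vector<InputFrame>& inputs) {
    uint32_t rng = seed;                       // deterministic PRNG state
    const double dt_ms = 16.6;                 // fixed timestep, no wall-clock drift
    for (size_t frame = 0; frame < inputs.size(); ++frame) {
        rng = rng * 1664525u + 1013904223u;    // LCG: same seed -> same sequence
        // ... step simulation with inputs[frame], rng, and dt_ms ...
        (void)dt_ms;
    }
    std::printf("replayed %zu frames from seed %u\n", inputs.size(), seed);
}

int main() {
    std::vector<InputFrame> capture(600, {0.0f, 1.0f, false});  // ~10 s at 60 FPS
    for (int i = 0; i < 100; ++i)        // "a button that replays it 100 times"
        ReplayCapture(/*seed=*/42, capture);
}
```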
Speed work benefits from small, reviewable changes. Large refactors create multiple failure modes at once: regressions, new allocations, and hidden extra work. Tight diffs make it easier to answer the only question that matters: what changed in frame time, and why?
Discipline isn’t bureaucracy here—it’s how you keep measurements trustworthy so optimization becomes straightforward instead of superstitious.
Real-time performance isn’t only about “faster code.” It’s about arranging work so the CPU and GPU can do it efficiently. Carmack repeatedly emphasized a simple truth: the machine is literal. It loves predictable data and hates avoidable overhead.
Modern CPUs are incredibly fast—until they’re waiting on memory. If your data is scattered across lots of little objects, the CPU spends time chasing pointers instead of doing math.
A useful mental model: don’t make ten separate shopping trips for ten items. Put them in one cart and walk the aisles once. In code, that means keeping frequently used values close together (often in arrays or tightly packed structs) so each cache line fetch brings in data you’ll actually use.
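A sketch of the difference, using a particle update as the hypothetical hot loop: scattered heap objects versus tightly packed parallel arrays.

```cpp
#include <vector>

// Pointer-chasing layout: each particle is a separate heap allocation,
// so the update loop jumps around memory and wastes cache line fetches.
struct ParticleNode { float x, y, vx, vy; ParticleNode* next; };

// Cache-friendly layout: hot fields packed in contiguous arrays, so each
// cache line the CPU pulls in is full of data the loop actually uses.
struct ParticleSoA {
    std::vector<float> x, y, vx, vy;

    void Update(float dt) {
        for (size_t i = 0; i < x.size(); ++i) {   // linear walk, prefetch-friendly
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};

int main() {
    ParticleSoA particles;
    for (int i = 0; i < 10000; ++i) {
        particles.x.push_back(0.f);  particles.y.push_back(0.f);
        particles.vx.push_back(1.f); particles.vy.push_back(0.5f);
    }
    particles.Update(0.016f);
}
```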
Frequent allocations create hidden costs: allocator overhead, memory fragmentation, and unpredictable pauses when the system has to tidy up. Even if each allocation is “small,” a steady stream of them can become a tax you pay every frame.
Common fixes are intentionally boring: reuse buffers, pool objects, and prefer long-lived allocations for hot paths. The goal isn’t cleverness—it’s consistency.
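A sketch of one of those boring fixes: a fixed-size pool that hands out reusable slots instead of touching the allocator on the hot path. The names and capacity are illustrative.

```cpp
#include <cstddef>
#include <vector>

// A trivially simple free-list pool: allocate all slots up front, then
// acquire/release indices on the hot path with no per-frame heap traffic.
template <typename T>
class Pool {
public:
    explicit Pool(size_t capacity) : slots_(capacity) {
        for (size_t i = 0; i < capacity; ++i) free_.push_back(i);
    }
    T* Acquire() {
        if (free_.empty()) return nullptr;   // exhausted: a real engine would
        size_t idx = free_.back();           // grow, recycle, or assert here
        free_.pop_back();
        return &slots_[idx];
    }
    void Release(T* obj) {
        free_.push_back(static_cast<size_t>(obj - slots_.data()));
    }
private:
    std::vector<T> slots_;
    std::vector<size_t> free_;
};

struct Projectile { float x, y, vx, vy; };

int main() {
    Pool<Projectile> projectiles(1024);      // long-lived allocation, made once
    Projectile* p = projectiles.Acquire();   // per-frame use: no new/delete
    if (p) { *p = {0, 0, 10, 0}; projectiles.Release(p); }
}
```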
A surprising amount of frame time can disappear into bookkeeping: state changes, draw calls, driver work, syscalls, and thread coordination.
Batching is the “one big cart” version of rendering and simulation. Instead of issuing many tiny operations, group similar work so you cross expensive boundaries fewer times. Often, cutting overhead beats micro-optimizing a shader or inner loop—because the machine spends less time preparing to work and more time actually working.
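A sketch of that idea applied to draw submission: group draws that share the same expensive state (here, a material) so the costly boundary is crossed once per group instead of once per object. The “renderer” here is just printf; a real one would issue instanced or merged draws.

```cpp
#include <cstdio>
#include <map>
#include <vector>

struct Draw { int material_id; int mesh_id; };

// Naive path: one state change plus one draw call per object.
// Batched path: group by material, bind once per group, submit the group.
void SubmitBatched(const std::vector<Draw>& draws) {
    std::map<int, std::vector<int>> by_material;
    for (const Draw& d : draws) by_material[d.material_id].push_back(d.mesh_id);

    for (const auto& [material, meshes] : by_material) {
        std::printf("bind material %d once\n", material);   // expensive boundary
        std::printf("  draw %zu meshes in one batch\n", meshes.size());
    }
}

int main() {
    SubmitBatched({{1, 10}, {2, 11}, {1, 12}, {1, 13}, {2, 14}});
}
```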
Performance work isn’t only about faster code—it’s also about having less code. Complexity has a cost you pay every day: bugs take longer to isolate, fixes require more careful testing, iteration slows because every change touches more moving parts, and regressions creep in through rarely used paths.
A “clever” system can look elegant until you’re on a deadline and a frame spike shows up only on one map, one GPU, or one settings combo. Every extra feature flag, fallback path, and special case multiplies the number of behaviors you need to understand and measure. That complexity doesn’t just waste developer time; it often adds runtime overhead (extra branches, allocations, cache misses, synchronization) that’s hard to see until it’s too late.
A good rule: if you can’t explain the performance model to a teammate in a few sentences, you probably can’t reliably optimize it.
Simple solutions have two advantages: they are easier to measure and explain, and they are easier to change when the profiler tells you something new.
Sometimes the fastest path is removing a feature, cutting an option, or collapsing multiple variants into one. Fewer features means fewer code paths, fewer state combinations, and fewer places for performance to silently degrade.
Deleting code is also a quality move: the best bug is the one you remove by deleting the module that could generate it.
Patch (surgical fix) when the problem is local, the fix is small and easy to measure, and a release is close.
Refactor (simplify structure) when the same area keeps regressing, the performance model is no longer explainable, or every fix has to touch several fragile paths.
Simplicity is not “less ambitious.” It’s choosing designs that stay understandable under pressure—when performance matters most.
Performance work only sticks if you can tell when it slips. That’s what performance regression testing is: a repeatable way to detect when a new change makes the product slower, less smooth, or heavier on memory.
Unlike functional tests (which answer “does it work?”), regression tests answer “does it still feel the same speed?” A build can be 100% correct and still be a bad release if it adds 4 ms of frame time or doubles loading.
You don’t need a lab to start—just consistency.
Pick a small set of baseline scenes that represent real usage: one GPU-heavy view, one CPU-heavy view, and one “worst case” stress scene. Keep them stable and scripted so the camera path and inputs are identical run to run.
Run tests on fixed hardware (a known PC/console/devkit). If you change drivers, OS, or clock settings, record it. Treat the hardware/software combo like part of the test fixture.
Store results in a versioned history: commit hash, build config, machine ID, and the measured metrics. The goal is not a perfect number—it’s a trustworthy trend line.
Favor metrics that are hard to argue with: average and p95/p99 frame time, the worst frame in the run, memory high-water mark, and load time.
Define simple thresholds (for example: p95 frame time must not regress more than 5%).
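The threshold check itself can be tiny. A sketch that compares a new run's p95 against a stored baseline, using the 5% tolerance above as an illustrative default:

```cpp
#include <cstdio>

// Fails the build (non-zero exit) if p95 frame time regresses beyond tolerance.
// Baseline and current values would come from your stored run history.
int CheckRegression(double baseline_p95_ms, double current_p95_ms,
                    double tolerance = 0.05) {
    const double limit = baseline_p95_ms * (1.0 + tolerance);
    if (current_p95_ms > limit) {
        std::printf("REGRESSION: p95 %.2f ms > allowed %.2f ms\n",
                    current_p95_ms, limit);
        return 1;
    }
    std::printf("OK: p95 %.2f ms within %.2f ms\n", current_p95_ms, limit);
    return 0;
}

int main() {
    return CheckRegression(/*baseline=*/17.0, /*current=*/18.4);  // ~8% worse: fails
}
```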
Treat regressions like bugs with an owner and a deadline.
First, bisect to find the change that introduced it. If the regression blocks a release, revert quickly and re-land with a fix.
When you fix it, add guardrails: keep the test, add a note in code, and document the expected budget. The habit is the win—performance becomes something you maintain, not something you “do later.”
“Shipping” isn’t a calendar event—it’s an engineering requirement. A system that only runs well in the lab, or only hits frame time after a week of manual tweaking, isn’t done. Carmack’s mindset treats real-world constraints (hardware variety, messy content, unpredictable player behavior) as part of the spec from day one.
When you’re close to release, perfection is less valuable than predictability. Define the non-negotiables in plain terms: target FPS, worst-case frame-time spikes, memory limits, and load times. Then treat anything that violates them as a bug, not “polish.” This reframes performance work from optional optimization into reliability work.
Not all slowdowns matter equally. Fix the top user-visible problems first: hitches and spikes players feel, input latency, memory crashes, and painful load times, and leave small average-time wins for later.
Profiling discipline pays off here: you’re not guessing which issue “seems big,” you’re choosing based on measured impact.
Late-cycle performance work is risky because “fixes” can introduce new costs. Use staged rollouts: land instrumentation first, then the change behind a toggle, then widen exposure. Prefer performance-safe defaults—settings that protect frame time even if they slightly reduce visual quality—especially for auto-detected configurations.
If you ship multiple platforms or tiers, treat defaults as a product decision: it’s better to look a touch less fancy than to feel unstable.
Translate tradeoffs into outcomes: “This effect costs 2 ms every frame on mid-tier GPUs, which risks dropping below 60 FPS during fights.” Offer options, not lectures: reduce resolution, simplify the shader, limit spawn rate, or accept a lower target. Constraints are easier to accept when framed as concrete choices with clear user impact.
You don’t need a new engine or a rewrite to adopt Carmack-style performance thinking. You need a repeatable loop that makes performance visible, testable, and hard to accidentally break.
If you want to operationalize these habits across a team, the key is reducing friction: quick experiments, repeatable harnesses, and easy rollbacks.
Koder.ai can help here when you’re building the surrounding tooling—not the engine itself. Because it’s a vibe-coding platform that generates real, exportable source code (web apps in React; backends in Go with PostgreSQL; mobile in Flutter), you can quickly spin up internal dashboards for frame-time percentiles, regression history, and “performance review” checklists, then iterate via chat as requirements evolve. Snapshots and rollback are also a practical match for the “change one thing, re-measure” loop.
If you want more practical guidance, browse /blog or see how teams operationalize this on /pricing.
Frame time is the time per frame in milliseconds (ms), and it maps directly to how much work the CPU/GPU did.
Pick a target (e.g., 60 FPS) and convert it to a hard deadline (16.6 ms). Then split that deadline into explicit budgets.
Example starting point: at 60 FPS, give the CPU and the GPU up to 16.6 ms each (they run in parallel), and keep a small reserve for OS, driver, and spike headroom.
Treat these as product requirements, and adjust based on platform, resolution, thermals, and input-latency goals.
Start by making your tests repeatable, then measure before changing anything.
Only after you know where the time goes should you decide what to optimize.
Run fast, targeted experiments that isolate the limiter: lower the render resolution, disable or cap one subsystem at a time, pause streaming, and watch which change actually moves frame time.
Avoid rewriting systems until you can name the dominant cost in milliseconds.
Because users feel the worst frames, not the average.
Track p95 and p99 frame time, the single worst frame, and how often you exceed the budget.
A build that averages 16.6 ms but spikes to 80 ms will still feel broken.
Make expensive work predictable and scheduled: precompile shaders, prewarm pools and caches, stream assets ahead of need, and amortize big jobs across many frames.
Also log spikes so you can reproduce and fix them, not just “hope they go away.”
Make the tradeoff explicit in numbers and user impact.
Use statements like: “this adds 0.4 ms GPU and 80 MB VRAM to gain softer shadows.”
Then decide based on agreed thresholds: does the change keep you under the frame-time, memory, and loading budgets on baseline hardware, or does it force a downgrade elsewhere?
Because unstable correctness makes performance data untrustworthy.
Practical steps: fix correctness bugs first, make runs deterministic (fixed seeds, scripted inputs, stable content), and keep diffs small so each measurement maps to one change.
If behavior changes run-to-run, you’ll end up optimizing noise instead of bottlenecks.
Most “fast code” work is really “memory and overhead” work.
Focus on data layout and cache locality, allocation churn, and per-call overhead such as draw calls, state changes, and synchronization.
Often, cutting overhead produces larger wins than tweaking an inner loop.
Make performance measurable, repeatable, and hard to accidentally break.
Measure: capture a baseline (average, p95, worst spike) for frame time and key subsystems.
Budget: set a per-frame budget for CPU and GPU (and memory if you’re tight). Write the budget down next to the feature goal.
Isolate: reproduce the cost in a minimal scene or test. If you can’t reproduce it, you can’t reliably fix it.
Optimize: change one thing at a time. Prefer changes that reduce work, not just “make it faster.”
Validate: re-profile, compare deltas, and check for quality regressions and correctness issues.
Document: record what changed, why it helped, and what to watch for in the future.
If you’re unsure, prefer reversible decisions (feature flags, scalable quality tiers).
When a regression appears: bisect, assign an owner, and revert quickly if it blocks release.