A practical guide to the performance-first mindset associated with John Carmack: profiling, frame-time budgets, tradeoffs, and shipping complex real-time systems.

John Carmack is often treated like a legend of game engines, but the useful part isn’t the mythology—it’s the repeatable habits. This isn’t about copying one person’s style or assuming “genius moves.” It’s about practical principles that reliably lead to faster, smoother software, especially when deadlines and complexity pile up.
Performance engineering means making software meet a speed target on real hardware, under real conditions—without breaking correctness. It’s not “make it fast at any cost.” It’s a disciplined loop: set a budget, measure, change one thing, measure again, and keep only what provably helps.
That mindset shows up in Carmack’s work again and again: argue with data, keep changes explainable, and prefer approaches you can maintain.
Real-time graphics is unforgiving because it has a deadline every frame. If you miss it, the user feels it immediately as stutter, input lag, or uneven motion. Other software can hide inefficiency behind queues, loading screens, or background work. A renderer can’t negotiate: you either finish in time, or you don’t.
That’s why the lessons generalize beyond games. Any system with tight latency requirements—UI, audio, AR/VR, trading, robotics—benefits from thinking in budgets, understanding bottlenecks, and avoiding surprise spikes.
You’ll get checklists, heuristics, and decision patterns you can apply to your own work: how to set frame-time (or latency) budgets, how to profile before optimizing, how to pick the “one thing” to fix, and how to prevent regressions so performance becomes routine—not a late-stage panic.
Carmack-style performance thinking starts with a simple switch: stop talking about “FPS” as the primary unit and start talking about frame time.
FPS is a reciprocal (“60 FPS” sounds good, “55 FPS” sounds close), but the user experience is driven by how long each frame takes—and, just as importantly, how consistent those times are. A jump from 16.6 ms to 33.3 ms is instantly visible even if your average FPS still looks respectable.
A real-time product has multiple budgets, not just “render faster”: CPU time per frame, GPU time per frame, memory, streaming and I/O, and input latency.
These budgets interact. Saving GPU time by adding CPU-heavy batching can backfire, and reducing memory can increase streaming or decompression costs.
If your target is 60 FPS, your total budget is 16.6 ms per frame. A rough breakdown might look like this: CPU work (simulation, animation, draw submission) and GPU work (geometry, shading, post-processing) each fitting inside 16.6 ms in parallel, with a little headroom deliberately left for OS, driver, and spike noise.
If either CPU or GPU exceeds the budget, you miss the frame. This is why teams talk about being “CPU-bound” or “GPU-bound”—not as labels, but as a way to decide where the next millisecond can realistically come from.
The point isn’t chasing a vanity metric like “highest FPS on a high-end PC.” The point is defining what fast enough means for your audience—hardware targets, resolution, battery limits, thermals, and input responsiveness—and then treating performance as explicit budgets you can manage and defend.
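To make that concrete, here is a minimal sketch of how a team might write those budgets down in code; the subsystem split and the 14 ms / 2.6 ms numbers are illustrative assumptions, not a recommendation.

```cpp
#include <cstdio>

// Hypothetical per-frame budgets at a 60 FPS target (all numbers illustrative).
struct FrameBudget {
    double total_ms   = 16.6;  // hard deadline for the whole frame
    double cpu_ms     = 14.0;  // simulation, animation, draw submission
    double gpu_ms     = 14.0;  // geometry, shading, post-processing
    double reserve_ms = 2.6;   // headroom for OS/driver noise and spikes
};

// Compare a measured frame against the budget and name the limiter.
void CheckFrame(const FrameBudget& b, double cpu_ms, double gpu_ms) {
    // Assuming CPU and GPU work overlap, the frame cost is whichever finishes last.
    const double frame_ms = (cpu_ms > gpu_ms) ? cpu_ms : gpu_ms;
    if (frame_ms > b.total_ms) {
        std::printf("Missed frame: %.2f ms (%s-bound)\n",
                    frame_ms, cpu_ms > gpu_ms ? "CPU" : "GPU");
    } else if (cpu_ms > b.cpu_ms || gpu_ms > b.gpu_ms) {
        std::printf("Within deadline but over sub-budget: CPU %.2f ms, GPU %.2f ms\n",
                    cpu_ms, gpu_ms);
    }
}

int main() {
    FrameBudget budget;
    CheckFrame(budget, /*cpu_ms=*/12.1, /*gpu_ms=*/17.4);  // example measurement
}
```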
Carmack’s default move isn’t “optimize,” it’s “verify.” Real-time performance problems are full of plausible stories—GC pauses, “slow shaders,” “too many draw calls”—and most of them are wrong in your build on your hardware. Profiling is how you replace intuition with evidence.
Treat profiling like a first-class feature, not a last-minute rescue tool. Capture frame times, CPU and GPU timelines, and the counts that explain them (triangles, draw calls, state changes, allocations, cache misses if you can get them). The goal is to answer one question: where is the time actually going?
A useful model: in every slow frame, one thing is the limiting factor. Maybe it’s the GPU stuck on a heavy pass, the CPU stuck in animation update, or the main thread stalled on synchronization. Find that constraint first; everything else is noise.
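One low-friction way to start answering that question is a scoped timer dropped around any suspect block. This is a generic sketch using the standard clock, not any particular engine's profiler; the subsystem name is made up.

```cpp
#include <chrono>
#include <cstdio>

// Measures wall-clock time for the enclosing scope and prints it on exit.
// In a real build you would feed this into your profiler/telemetry instead of printf.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        std::printf("%s: %.3f ms\n", label_, ms);
    }
private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

void UpdateAnimation() {          // hypothetical subsystem
    ScopedTimer t("animation");   // where is the time actually going?
    // ... animation work ...
}

int main() { UpdateAnimation(); }
```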
A disciplined loop keeps you from thrashing: capture a baseline, form a hypothesis, change one thing, re-measure, and keep the change only if the numbers clearly improve.
If the improvement isn’t clear, assume it didn’t help—because it probably won’t survive the next content drop.
Performance work is especially vulnerable to self-deception: profiling a debug build, comparing captures from different scenes, crediting your change for run-to-run noise, or trusting averages that hide the spikes users actually feel.
Profiling first keeps your effort focused, your tradeoffs justified, and your changes easier to defend in review.
Real-time performance problems feel messy because everything is happening at once: gameplay, rendering, streaming, animation, UI, physics. Carmack’s instinct is to cut through the noise and identify the dominant limiter—the one thing currently setting your frame time.
Most slowdowns fall into a few buckets: the CPU (simulation, animation, draw submission), the GPU (shading, fill rate, bandwidth), memory and streaming, or synchronization between threads and between CPU and GPU.
The point isn’t to label it for a report—it’s to pick the right lever.
A few fast experiments can tell you what’s really in control: drop the render resolution (if frame time falls, you’re GPU-bound), disable or cap one subsystem at a time, pause streaming, or halve entity counts and see what moves.
You rarely win by shaving 1% off ten systems. Find the biggest cost that repeats every frame and attack that first. Removing a single 4 ms offender beats weeks of micro-optimizations.
After you fix the big rock, the next biggest rock becomes visible. That’s normal. Treat performance work as a loop: measure → change → re-measure → re-prioritize. The goal isn’t a perfect profile; it’s steady progress toward predictable frame time.
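A sketch of the “biggest rock first” habit: collect per-system timings for a frame, sort them, and report the dominant cost. The system names and numbers are illustrative.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct SystemCost {
    std::string name;
    double ms;   // measured time this frame
};

// Sort measured costs and print the single biggest offender first.
void ReportLimiter(std::vector<SystemCost> costs) {
    if (costs.empty()) return;
    std::sort(costs.begin(), costs.end(),
              [](const SystemCost& a, const SystemCost& b) { return a.ms > b.ms; });
    std::printf("Dominant cost: %s at %.2f ms\n", costs[0].name.c_str(), costs[0].ms);
}

int main() {
    // Illustrative numbers: one 4 ms offender dwarfs everything else.
    ReportLimiter({{"physics", 1.2}, {"shadow pass", 4.1},
                   {"animation", 0.9}, {"UI", 0.4}});
}
```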
Average frame time can look fine while the experience still feels bad. Real-time graphics is judged by the worst moments: the dropped frame during a big explosion, the hitch when entering a new room, the sudden stutter when a menu opens. That’s tail latency—rare-but-not-rare-enough slow frames that users immediately notice.
A game running at 16.6 ms most of the time (60 FPS) but spiking to 60–120 ms every few seconds will feel “broken,” even if the average still prints as 20 ms. Humans are sensitive to rhythm. A single long frame breaks input predictability, camera motion, and audio/visual sync.
Spikes often come from work that isn’t evenly spread: shader compilation on first use, asset loading and decompression, level streaming, allocator or garbage-collector cleanup, and one-off initialization the first time a system runs.
The goal is to make expensive work predictable: precompile and prewarm ahead of time, move loading off the critical path, and amortize big jobs across many frames so no single frame pays the whole bill.
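One common way to amortize a big job, shown as a sketch: give a work queue a fixed time slice each frame and stop when the slice is spent. The 2 ms budget and the task queue here are illustrative assumptions.

```cpp
#include <chrono>
#include <deque>
#include <functional>
#include <utility>

// Runs queued background tasks until the per-frame budget is spent,
// so a large job is spread over many frames instead of causing one spike.
class TimeSlicedQueue {
public:
    void Push(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    void RunFor(double budget_ms) {
        const auto start = std::chrono::steady_clock::now();
        while (!tasks_.empty()) {
            tasks_.front()();
            tasks_.pop_front();
            const double spent =
                std::chrono::duration<double, std::milli>(
                    std::chrono::steady_clock::now() - start).count();
            if (spent >= budget_ms) break;   // stop before we blow the frame
        }
    }
private:
    std::deque<std::function<void()>> tasks_;
};

int main() {
    TimeSlicedQueue streaming;
    for (int i = 0; i < 1000; ++i)
        streaming.Push([] { /* decompress one asset chunk */ });
    // Each frame: give streaming at most 2 ms of the frame budget.
    streaming.RunFor(2.0);
}
```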
Don’t just plot an average FPS line. Record per-frame timings and visualize p95 and p99 frame time, the single worst frame in each window, and how often you exceed the budget.
If you can’t explain your worst 1% frames, you haven’t really explained performance.
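Computing those worst-case numbers from a recorded run takes only a few lines. A sketch, assuming you have already logged per-frame times in milliseconds:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Returns the frame time at the given percentile (e.g. 0.95 for p95).
double Percentile(std::vector<double> frame_ms, double p) {
    std::sort(frame_ms.begin(), frame_ms.end());
    const size_t idx = static_cast<size_t>(p * (frame_ms.size() - 1));
    return frame_ms[idx];
}

int main() {
    // Illustrative capture: mostly 16.6 ms with a few large spikes.
    std::vector<double> frames(1000, 16.6);
    frames[100] = 80.0; frames[500] = 120.0; frames[900] = 60.0;

    // With spikes this rare, even p99 can look clean, which is why the
    // worst frame belongs on the dashboard too.
    std::printf("p95: %.1f ms  p99: %.1f ms  worst: %.1f ms\n",
                Percentile(frames, 0.95), Percentile(frames, 0.99),
                *std::max_element(frames.begin(), frames.end()));
}
```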
Performance work gets easier the moment you stop pretending you can have everything at once. Carmack’s style pushes teams to name the tradeoff out loud: what are we buying, what are we paying, and who feels the difference?
Most decisions sit on a few axes: visual quality versus frame time, memory versus streaming cost, CPU versus GPU load, and flexibility versus simplicity.
If a change improves one axis but quietly taxes three others, document it. “This adds 0.4 ms GPU and 80 MB VRAM to gain softer shadows” is a usable statement. “It looks better” isn’t.
Real-time graphics isn’t about perfection; it’s about hitting a target consistently. Agree on thresholds like a frame-time target on baseline hardware, a maximum acceptable spike, a memory ceiling, and a load-time limit.
Once the team agrees that, say, 16.6 ms at 1080p on the baseline GPU is the goal, arguments become concrete: does this feature keep us under budget, or force a downgrade elsewhere?
When you’re unsure, choose options you can undo: feature flags, scalable quality tiers, and config-driven settings rather than hard-coded assumptions.
Reversibility protects the schedule. You can ship the safe path and keep the ambitious one behind a toggle.
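In practice, “behind a toggle” can be as simple as a config-driven flag that selects between the shipping path and the ambitious one. A sketch with hypothetical setting names:

```cpp
#include <cstdio>

// Hypothetical runtime settings; in a real project these would come from
// a config file, command line, or the in-game options menu.
struct RenderSettings {
    bool use_experimental_shadows = false;  // ambitious path, off by default
    int  quality_tier = 1;                  // 0 = low, 1 = medium, 2 = high
};

void RenderShadows(const RenderSettings& s) {
    if (s.use_experimental_shadows) {
        std::puts("experimental soft shadows (+0.4 ms GPU, illustrative)");
    } else {
        std::puts("shipping shadow path");  // the safe, measured default
    }
}

int main() {
    RenderSettings settings;                     // ships with the safe default
    RenderShadows(settings);
    settings.use_experimental_shadows = true;    // flip the toggle to compare
    RenderShadows(settings);
}
```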
Avoid overengineering invisible wins. A 1% average improvement is rarely worth a month of complexity—unless it removes stutter, fixes input latency, or prevents a hard memory crash. Prioritize the changes players notice immediately, and let the rest wait.
Performance work gets dramatically easier when the program is right. A surprising amount of “optimization” time is actually spent chasing correctness bugs that merely look like performance issues: an accidental O(N²) loop from duplicated work, a render pass running twice because a flag didn’t reset, a memory leak that slowly increases frame time, or a race condition that turns into random stutter.
A stable, predictable engine gives you clean measurements. If behavior changes between runs, you can’t trust profiles, and you’ll end up optimizing noise.
Disciplined engineering practices directly help performance: assertions that catch duplicated or skipped work, deterministic update order, leak tracking, and tests that fail loudly instead of silently burning frame time.
Many frame-time spikes are “Heisenbugs”: they disappear when you add logging or step through a debugger. The antidote is deterministic reproduction.
Build a small, controlled test harness: fixed random seeds, scripted camera paths and inputs, a frozen content set, and a replay mode that can step through the same captured session every time.
When a hitch shows up, you want a button that replays it 100 times—not a vague report that it “sometimes happens after 10 minutes.”
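A minimal sketch of that harness idea: fix the seed, script the inputs, and step the simulation with a fixed timestep so the same capture replays identically on demand. Every name here is illustrative.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct InputFrame { float move_x; float move_y; bool fire; };  // scripted input

// Deterministic harness: fixed seed, fixed timestep, recorded inputs.
// Running it twice must produce identical behavior, or profiles can't be trusted.
void ReplayCapture(uint32_t seed, const std::vector<InputFrame>& inputs) {
    uint32_t rng = seed;                       // deterministic PRNG state
    const double dt_ms = 16.6;                 // fixed timestep, no wall-clock drift
    for (size_t frame = 0; frame < inputs.size(); ++frame) {
        rng = rng * 1664525u + 1013904223u;    // LCG: same seed -> same sequence
        // ... step simulation with inputs[frame], rng, and dt_ms ...
        (void)dt_ms;
    }
    std::printf("replayed %zu frames from seed %u\n", inputs.size(), seed);
}

int main() {
    std::vector<InputFrame> capture(600, {0.0f, 1.0f, false});  // ~10 s at 60 FPS
    for (int i = 0; i < 100; ++i)        // "a button that replays it 100 times"
        ReplayCapture(/*seed=*/42, capture);
}
```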
Speed work benefits from small, reviewable changes. Large refactors create multiple failure modes at once: regressions, new allocations, and hidden extra work. Tight diffs make it easier to answer the only question that matters: what changed in frame time, and why?
Discipline isn’t bureaucracy here—it’s how you keep measurements trustworthy so optimization becomes straightforward instead of superstitious.
Real-time performance isn’t only about “faster code.” It’s about arranging work so the CPU and GPU can do it efficiently. Carmack repeatedly emphasized a simple truth: the machine is literal. It loves predictable data and hates avoidable overhead.
Modern CPUs are incredibly fast—until they’re waiting on memory. If your data is scattered across lots of little objects, the CPU spends time chasing pointers instead of doing math.
A useful mental model: don’t make ten separate shopping trips for ten items. Put them in one cart and walk the aisles once. In code, that means keeping frequently used values close together (often in arrays or tightly packed structs) so each cache line fetch brings in data you’ll actually use.
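A sketch of the difference, using a particle update as the hypothetical hot loop: scattered heap objects versus tightly packed parallel arrays.

```cpp
#include <vector>

// Pointer-chasing layout: each particle is a separate heap allocation,
// so the update loop jumps around memory and wastes cache line fetches.
struct ParticleNode { float x, y, vx, vy; ParticleNode* next; };

// Cache-friendly layout: hot fields packed in contiguous arrays, so each
// cache line the CPU pulls in is full of data the loop actually uses.
struct ParticleSoA {
    std::vector<float> x, y, vx, vy;

    void Update(float dt) {
        for (size_t i = 0; i < x.size(); ++i) {   // linear walk, prefetch-friendly
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};

int main() {
    ParticleSoA particles;
    for (int i = 0; i < 10000; ++i) {
        particles.x.push_back(0.f);  particles.y.push_back(0.f);
        particles.vx.push_back(1.f); particles.vy.push_back(0.5f);
    }
    particles.Update(0.016f);
}
```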
Frequent allocations create hidden costs: allocator overhead, memory fragmentation, and unpredictable pauses when the system has to tidy up. Even if each allocation is “small,” a steady stream of them can become a tax you pay every frame.
Common fixes are intentionally boring: reuse buffers, pool objects, and prefer long-lived allocations for hot paths. The goal isn’t cleverness—it’s consistency.
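A sketch of one of those boring fixes: a fixed-size pool that hands out reusable slots instead of touching the allocator on the hot path. The names and capacity are illustrative.

```cpp
#include <cstddef>
#include <vector>

// A trivially simple free-list pool: allocate all slots up front, then
// acquire/release indices on the hot path with no per-frame heap traffic.
template <typename T>
class Pool {
public:
    explicit Pool(size_t capacity) : slots_(capacity) {
        for (size_t i = 0; i < capacity; ++i) free_.push_back(i);
    }
    T* Acquire() {
        if (free_.empty()) return nullptr;   // exhausted: a real engine would
        size_t idx = free_.back();           // grow, recycle, or assert here
        free_.pop_back();
        return &slots_[idx];
    }
    void Release(T* obj) {
        free_.push_back(static_cast<size_t>(obj - slots_.data()));
    }
private:
    std::vector<T> slots_;
    std::vector<size_t> free_;
};

struct Projectile { float x, y, vx, vy; };

int main() {
    Pool<Projectile> projectiles(1024);      // long-lived allocation, made once
    Projectile* p = projectiles.Acquire();   // per-frame use: no new/delete
    if (p) { *p = {0, 0, 10, 0}; projectiles.Release(p); }
}
```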
A surprising amount of frame time can disappear into bookkeeping: state changes, draw calls, driver work, syscalls, and thread coordination.
Batching is the “one big cart” version of rendering and simulation. Instead of issuing many tiny operations, group similar work so you cross expensive boundaries fewer times. Often, cutting overhead beats micro-optimizing a shader or inner loop—because the machine spends less time preparing to work and more time actually working.
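A sketch of that idea applied to draw submission: group draws that share the same expensive state (here, a material) so the costly boundary is crossed once per group instead of once per object. The “renderer” here is just printf; a real one would issue instanced or merged draws.

```cpp
#include <cstdio>
#include <map>
#include <vector>

struct Draw { int material_id; int mesh_id; };

// Naive path: one state change plus one draw call per object.
// Batched path: group by material, bind once per group, submit the group.
void SubmitBatched(const std::vector<Draw>& draws) {
    std::map<int, std::vector<int>> by_material;
    for (const Draw& d : draws) by_material[d.material_id].push_back(d.mesh_id);

    for (const auto& [material, meshes] : by_material) {
        std::printf("bind material %d once\n", material);   // expensive boundary
        std::printf("  draw %zu meshes in one batch\n", meshes.size());
    }
}

int main() {
    SubmitBatched({{1, 10}, {2, 11}, {1, 12}, {1, 13}, {2, 14}});
}
```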
Performance work isn’t only about faster code—it’s also about having less code. Complexity has a cost you pay every day: bugs take longer to isolate, fixes require more careful testing, iteration slows because every change touches more moving parts, and regressions creep in through rarely used paths.
A “clever” system can look elegant until you’re on a deadline and a frame spike shows up only on one map, one GPU, or one settings combo. Every extra feature flag, fallback path, and special case multiplies the number of behaviors you need to understand and measure. That complexity doesn’t just waste developer time; it often adds runtime overhead (extra branches, allocations, cache misses, synchronization) that’s hard to see until it’s too late.
A good rule: if you can’t explain the performance model to a teammate in a few sentences, you probably can’t reliably optimize it.
Simple solutions have two advantages: they are easier to measure and explain, and they are easier to change when the profiler tells you something new.
Sometimes the fastest path is removing a feature, cutting an option, or collapsing multiple variants into one. Fewer features means fewer code paths, fewer state combinations, and fewer places for performance to silently degrade.
Deleting code is also a quality move: the best bug is the one you remove by deleting the module that could generate it.
Patch (surgical fix) when the problem is local, the fix is small and easy to measure, and a release is close.
Refactor (simplify structure) when the same area keeps regressing, the performance model is no longer explainable, or every fix has to touch several fragile paths.
Simplicity is not “less ambitious.” It’s choosing designs that stay understandable under pressure—when performance matters most.
Performance work only sticks if you can tell when it slips. That’s what performance regression testing is: a repeatable way to detect when a new change makes the product slower, less smooth, or heavier on memory.
Unlike functional tests (which answer “does it work?”), regression tests answer “does it still feel the same speed?” A build can be 100% correct and still be a bad release if it adds 4 ms of frame time or doubles loading.
You don’t need a lab to start—just consistency.
Pick a small set of baseline scenes that represent real usage: one GPU-heavy view, one CPU-heavy view, and one “worst case” stress scene. Keep them stable and scripted so the camera path and inputs are identical run to run.
Run tests on fixed hardware (a known PC/console/devkit). If you change drivers, OS, or clock settings, record it. Treat the hardware/software combo like part of the test fixture.
Store results in a versioned history: commit hash, build config, machine ID, and the measured metrics. The goal is not a perfect number—it’s a trustworthy trend line.
Favor metrics that are hard to argue with: average and p95/p99 frame time, the worst frame in the run, memory high-water mark, and load time.
Define simple thresholds (for example: p95 frame time must not regress more than 5%).
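The threshold check itself can be tiny. A sketch that compares a new run's p95 against a stored baseline, using the 5% tolerance above as an illustrative default:

```cpp
#include <cstdio>

// Fails the build (non-zero exit) if p95 frame time regresses beyond tolerance.
// Baseline and current values would come from your stored run history.
int CheckRegression(double baseline_p95_ms, double current_p95_ms,
                    double tolerance = 0.05) {
    const double limit = baseline_p95_ms * (1.0 + tolerance);
    if (current_p95_ms > limit) {
        std::printf("REGRESSION: p95 %.2f ms > allowed %.2f ms\n",
                    current_p95_ms, limit);
        return 1;
    }
    std::printf("OK: p95 %.2f ms within %.2f ms\n", current_p95_ms, limit);
    return 0;
}

int main() {
    return CheckRegression(/*baseline=*/17.0, /*current=*/18.4);  // ~8% worse: fails
}
```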
Treat regressions like bugs with an owner and a deadline.
First, bisect to find the change that introduced it. If the regression blocks a release, revert quickly and re-land with a fix.
When you fix it, add guardrails: keep the test, add a note in code, and document the expected budget. The habit is the win—performance becomes something you maintain, not something you “do later.”
“Shipping” isn’t a calendar event—it’s an engineering requirement. A system that only runs well in the lab, or only hits frame time after a week of manual tweaking, isn’t done. Carmack’s mindset treats real-world constraints (hardware variety, messy content, unpredictable player behavior) as part of the spec from day one.
When you’re close to release, perfection is less valuable than predictability. Define the non-negotiables in plain terms: target FPS, worst-case frame-time spikes, memory limits, and load times. Then treat anything that violates them as a bug, not “polish.” This reframes performance work from optional optimization into reliability work.
Not all slowdowns matter equally. Fix the top user-visible problems first: hitches and spikes players feel, input latency, memory crashes, and painful load times, and leave small average-time wins for later.
Profiling discipline pays off here: you’re not guessing which issue “seems big,” you’re choosing based on measured impact.
Late-cycle performance work is risky because “fixes” can introduce new costs. Use staged rollouts: land instrumentation first, then the change behind a toggle, then widen exposure. Prefer performance-safe defaults—settings that protect frame time even if they slightly reduce visual quality—especially for auto-detected configurations.
If you ship multiple platforms or tiers, treat defaults as a product decision: it’s better to look a touch less fancy than to feel unstable.
Translate tradeoffs into outcomes: “This effect costs 2 ms every frame on mid-tier GPUs, which risks dropping below 60 FPS during fights.” Offer options, not lectures: reduce resolution, simplify the shader, limit spawn rate, or accept a lower target. Constraints are easier to accept when framed as concrete choices with clear user impact.
You don’t need a new engine or a rewrite to adopt Carmack-style performance thinking. You need a repeatable loop that makes performance visible, testable, and hard to accidentally break.
If you want to operationalize these habits across a team, the key is reducing friction: quick experiments, repeatable harnesses, and easy rollbacks.
Koder.ai can help here when you’re building the surrounding tooling—not the engine itself. Because it’s a vibe-coding platform that generates real, exportable source code (web apps in React; backends in Go with PostgreSQL; mobile in Flutter), you can quickly spin up internal dashboards for frame-time percentiles, regression history, and “performance review” checklists, then iterate via chat as requirements evolve. Snapshots and rollback are also a practical match for the “change one thing, re-measure” loop.
If you want more practical guidance, browse /blog or see how teams operationalize this on /pricing.
Frame time is the time per frame in milliseconds (ms), and it maps directly to how much work the CPU/GPU did.
Pick a target (e.g., 60 FPS) and convert it to a hard deadline (16.6 ms). Then split that deadline into explicit budgets.
Example starting point: at 60 FPS, give the CPU and the GPU up to 16.6 ms each (they run in parallel), and keep a small reserve for OS, driver, and spike headroom.
Treat these as product requirements, and adjust based on platform, resolution, thermals, and input-latency goals.
Start by making your tests repeatable, then measure before changing anything.
Only after you know where the time goes should you decide what to optimize.
Run fast, targeted experiments that isolate the limiter: lower the render resolution, disable or cap one subsystem at a time, pause streaming, and watch which change actually moves frame time.
Avoid rewriting systems until you can name the dominant cost in milliseconds.
Because users feel the worst frames, not the average.
Track p95 and p99 frame time, the single worst frame, and how often you exceed the budget.
A build that averages 16.6 ms but spikes to 80 ms will still feel broken.
Make expensive work predictable and scheduled: precompile shaders, prewarm pools and caches, stream assets ahead of need, and amortize big jobs across many frames.
Also log spikes so you can reproduce and fix them, not just “hope they go away.”
Make the tradeoff explicit in numbers and user impact.
Use statements like: “this adds 0.4 ms GPU and 80 MB VRAM to gain softer shadows.”
Then decide based on agreed thresholds: does the change keep you under the frame-time, memory, and loading budgets on baseline hardware, or does it force a downgrade elsewhere?
Because unstable correctness makes performance data untrustworthy.
Practical steps: fix correctness bugs first, make runs deterministic (fixed seeds, scripted inputs, stable content), and keep diffs small so each measurement maps to one change.
If behavior changes run-to-run, you’ll end up optimizing noise instead of bottlenecks.
Most “fast code” work is really “memory and overhead” work.
Focus on data layout and cache locality, allocation churn, and per-call overhead such as draw calls, state changes, and synchronization.
Often, cutting overhead produces larger wins than tweaking an inner loop.
Make performance measurable, repeatable, and hard to accidentally break.
Measure: capture a baseline (average, p95, worst spike) for frame time and key subsystems.
Budget: set a per-frame budget for CPU and GPU (and memory if you’re tight). Write the budget down next to the feature goal.
Isolate: reproduce the cost in a minimal scene or test. If you can’t reproduce it, you can’t reliably fix it.
Optimize: change one thing at a time. Prefer changes that reduce work, not just “make it faster.”
Validate: re-profile, compare deltas, and check for quality regressions and correctness issues.
Document: record what changed, why it helped, and what to watch for in the future.
If you’re unsure, prefer reversible decisions (feature flags, scalable quality tiers).
When a regression appears: bisect, assign an owner, and revert quickly if it blocks release.