Go + Postgres performance tuning playbook for AI-generated APIs: pool connections, read query plans, index smartly, paginate safely, and shape JSON fast.

AI-generated APIs can feel fast in early testing. You hit an endpoint a few times, the dataset is small, and requests come in one at a time. Then real traffic shows up: mixed endpoints, bursty load, colder caches, and more rows than you expected. The same code can start to feel randomly slow even though nothing actually broke.
Slow usually shows up in a few ways: latency spikes (most requests are fine, some take 5x to 50x longer), timeouts (a small percent fail), or CPU running hot (Postgres CPU from query work, or Go CPU from JSON, goroutines, logging, and retries).
A common scenario is a list endpoint with a flexible search filter that returns a large JSON response. In a test database, it scans a few thousand rows and finishes quickly. In production, it scans a few million rows, sorts them, and only then applies a LIMIT. The API still "works", but p95 latency explodes and a few requests time out during bursts.
To separate database slowness from app slowness, keep the mental model simple.
If the database is slow, your Go handler spends most of its time waiting for the query. You may also see many requests stuck "in flight" while Go CPU looks normal.
If the app is slow, the query finishes quickly, but time is lost after the query: building large response objects, marshaling JSON, running extra queries per row, or doing too much work per request. Go CPU rises, memory rises, and latency grows with response size.
"Good enough" performance before launch is not perfection. For many CRUD endpoints, aim for stable p95 latency (not just the average), predictable behavior under bursts, and no timeouts at your expected peak. The goal is straightforward: no surprise slow requests when data and traffic grow, and clear signals when something drifts.
Before you tune anything, decide what "good" means for your API. Without a baseline, it's easy to spend hours changing settings and still not know whether you improved things or just moved the bottleneck.
Three numbers usually tell you most of the story: average (or median) latency, p95 latency, and error rate.
p95 is the "bad day" metric. If p95 is high but average is fine, a small set of requests is doing too much work, getting blocked on locks, or triggering slow plans.
Make slow queries visible early. In Postgres, enable slow query logging with a low threshold for pre-launch testing (for example, 100-200 ms), and log the full statement so you can copy it into a SQL client. Keep this temporary. Logging every slow query in production gets noisy fast.
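A minimal way to turn that on for a test database looks like this (ALTER SYSTEM needs superuser rights; managed Postgres services usually expose the same setting in their console):
-- Log any statement slower than 150 ms, including the full SQL text.
ALTER SYSTEM SET log_min_duration_statement = '150ms';
SELECT pg_reload_conf();
-- Later, raise the threshold or set it to -1 to disable the logging again.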
Next, test with real-looking requests, not just a single "hello world" route. A small set is enough if it matches what users will actually do: a list call with filters and sorting, a detail page with a couple of joins, a create or update with validation, and a search-style query with partial matches.
If you're generating endpoints from a spec (for example, with a vibe-coding tool like Koder.ai), run the same handful of requests repeatedly with consistent inputs. That makes changes like indexes, pagination tweaks, and query rewrites easier to measure.
Finally, pick a target you can say out loud. Example: "Most requests stay under 200 ms p95 at 50 concurrent users, and errors stay under 0.5%." The exact numbers depend on your product, but a clear target prevents endless tinkering.
A connection pool keeps a limited number of open database connections and reuses them. Without a pool, each request may open a new connection, and Postgres wastes time and memory managing sessions instead of running queries.
The goal is to keep Postgres busy doing useful work, not context-switching between too many connections. This is often the first meaningful win, especially for AI-generated APIs that can quietly turn into chatty endpoints.
In Go, you usually tune max open connections, max idle connections, and connection lifetime. A safe starting point for many small APIs is a small multiple of your CPU cores (often 5 to 20 total connections), with a similar number kept idle, and recycling connections periodically (for example, every 30 to 60 minutes).
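With database/sql, that tuning is a few explicit calls. A minimal sketch, with illustrative numbers rather than recommendations for your workload:
import (
    "database/sql"
    "time"
    _ "github.com/jackc/pgx/v5/stdlib" // pgx driver behind database/sql; swap in your own driver
)
// configurePool applies conservative pool limits for one API instance.
func configurePool(dsn string) (*sql.DB, error) {
    db, err := sql.Open("pgx", dsn)
    if err != nil {
        return nil, err
    }
    db.SetMaxOpenConns(20)                  // hard cap on connections from this instance
    db.SetMaxIdleConns(20)                  // keep warm connections between bursts
    db.SetConnMaxLifetime(45 * time.Minute) // recycle connections periodically
    db.SetConnMaxIdleTime(5 * time.Minute)  // drop connections that sit idle too long
    return db, nil
}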
If you're running multiple API instances, remember the pool multiplies. A pool of 20 connections across 10 instances is 200 connections hitting Postgres, which is how teams unexpectedly run into connection limits.
Pool problems feel different from slow SQL.
If the pool is too small, requests wait before they even reach Postgres. Latency spikes, but database CPU and query times may look fine.
If the pool is too big, Postgres looks overloaded: lots of active sessions, memory pressure, and uneven latency across endpoints.
A fast way to separate the two is to time your DB calls in two parts: time spent waiting for a connection vs time spent executing the query. If most time is "waiting", the pool is the bottleneck. If most time is "in query", focus on SQL and indexes.
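A minimal sketch of that split using pgxpool (the query and the fetchOrders name are placeholders):
import (
    "context"
    "log"
    "time"
    "github.com/jackc/pgx/v5/pgxpool"
)
// fetchOrders times connection acquisition separately from query execution,
// so logs show whether the pool or the SQL is the bottleneck.
func fetchOrders(ctx context.Context, pool *pgxpool.Pool, userID int64) error {
    start := time.Now()
    conn, err := pool.Acquire(ctx) // time spent here is pool wait
    if err != nil {
        return err
    }
    defer conn.Release()
    waited := time.Since(start)
    queryStart := time.Now()
    rows, err := conn.Query(ctx, "SELECT id, status FROM orders WHERE user_id = $1 LIMIT 50", userID)
    if err != nil {
        return err
    }
    defer rows.Close()
    for rows.Next() {
        // scan into your structs here
    }
    log.Printf("pool wait=%s query+scan=%s", waited, time.Since(queryStart))
    return rows.Err()
}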
Useful quick checks: compare total connections across all API instances against Postgres's max_connections, and watch how often requests wait to acquire a connection from the pool.
If you use pgxpool, you get a Postgres-first pool with clear stats and good defaults for Postgres behavior. If you use database/sql, you get a standard interface that works across databases, but you need to be explicit about pool settings and driver behavior.
A practical rule: if you're all-in on Postgres and want direct control, pgxpool is often simpler. If you rely on libraries that expect database/sql, stick with it, set the pool explicitly, and measure waits.
Example: an endpoint that lists orders might run in 20 ms, but under 100 concurrent users it jumps to 2 s. If logs show 1.9 s waiting for a connection, query tuning won't help until the pool and total Postgres connections are sized correctly.
When an endpoint feels slow, check what Postgres is actually doing. A quick read of EXPLAIN often points to the fix in minutes.
Run this on the exact SQL your API sends:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, created_at
FROM orders
WHERE user_id = $1 AND status = $2
ORDER BY created_at DESC
LIMIT 50;
A few lines matter most. Look at the top node (what Postgres chose) and the totals at the bottom (how long it took). Then compare estimated vs actual rows. Big gaps usually mean the planner guessed wrong.
If you see Index Scan or Index Only Scan, Postgres is using an index, which is usually good. Bitmap Heap Scan can be fine for medium-sized matches. Seq Scan means it read the whole table, which is only OK when the table is small or nearly every row matches.
Common red flags: a Seq Scan on a large table when you expected an index, a big Sort node feeding your ORDER BY, and estimated row counts that are far from the actual ones.
Slow plans usually come from a handful of patterns: no index matching the WHERE + ORDER BY pattern (for example, (user_id, status, created_at)), or functions wrapped around columns in the WHERE (for example, WHERE lower(email) = $1), which can force scans unless you add a matching expression index.
If the plan looks odd and estimates are way off, stats are often stale. Run ANALYZE (or let autovacuum catch up) so Postgres learns current row counts and value distributions. This matters after large imports or when new endpoints start writing a lot of data quickly.
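Assuming the orders and users tables from the examples above, the fixes might look like this (index names are hypothetical; CONCURRENTLY avoids blocking writes but cannot run inside a transaction):
-- Refresh planner statistics after a large import or heavy writes.
ANALYZE orders;
-- Expression index so WHERE lower(email) = $1 can use an index instead of scanning.
CREATE INDEX CONCURRENTLY IF NOT EXISTS users_lower_email_idx
  ON users (lower(email));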
Indexes help only when they match how your API queries data. If you build them from guesses, you get slower writes, larger storage, and little to no speedup.
A practical way to think about it: an index is a shortcut for a specific question. If your API asks a different question, Postgres ignores the shortcut.
If an endpoint filters by account_id and sorts by created_at DESC, a single composite index often beats two separate indexes. It helps Postgres find the right rows and return them in the right order with less work.
A few rules of thumb usually hold up: match the index to the endpoint's real WHERE + ORDER BY, put equality filters first and the sort column last, prefer one composite index over several single-column indexes, and don't add an index you can't tie to a query you've measured.
Example: if your API has GET /orders?status=paid and it always shows newest first, an index like (status, created_at DESC) is a good fit. If most queries also filter by customer, (customer_id, status, created_at) can be better, but only if that's how the endpoint actually runs in production.
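As a sketch, that first index could be created like this (the index name is hypothetical):
-- Matches WHERE status = $1 ORDER BY created_at DESC LIMIT n without an extra sort.
CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_status_created_idx
  ON orders (status, created_at DESC);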
If most traffic hits a narrow slice of rows, a partial index can be cheaper and faster. For example, if your app mostly reads active records, indexing only WHERE active = true keeps the index smaller and more likely to stay in memory.
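A sketch of that partial index, assuming a hypothetical records table; the index's WHERE clause must match the filter the query actually uses:
CREATE INDEX CONCURRENTLY IF NOT EXISTS records_active_created_idx
  ON records (created_at DESC)
  WHERE active = true;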
To confirm an index helps, do two quick checks: run EXPLAIN (or EXPLAIN ANALYZE in a safe environment) and look for an index scan that matches your query, then re-run the same requests and compare latency before and after.
Remove unused indexes carefully. Check usage stats (for example, whether an index has been scanned). Drop one at a time during low-risk windows, and keep a rollback plan. Unused indexes aren't harmless. They slow inserts and updates on every write.
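One way to see those usage stats before dropping anything (idx_scan counts how often each index has been scanned since statistics were last reset):
SELECT relname AS table_name,
       indexrelname AS index_name,
       idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;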
Pagination is often where a fast API starts feeling slow, even when the database is healthy. Treat pagination as a query design problem, not a UI detail.
LIMIT/OFFSET looks simple, but deeper pages usually cost more. Postgres still has to walk past (and often sort) the rows you're skipping. Page 1 might touch a few dozen rows. Page 500 might force the database to scan and discard tens of thousands just to return 20 results.
It can also create unstable results when rows are inserted or deleted between requests. Users may see duplicates or miss items because the meaning of "row 10,000" changes as the table changes.
Keyset pagination asks a different question: "Give me the next 20 rows after the last row I saw." That keeps the database working on a small, consistent slice.
A simple version uses an increasing id:
SELECT id, created_at, title
FROM posts
WHERE id > $1
ORDER BY id
LIMIT 20;
Your API returns a next_cursor equal to the last id in the page. The next request uses that value as $1.
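A minimal sketch of that flow in Go with pgxpool (the Post struct and listPosts name are placeholders):
import (
    "context"
    "time"
    "github.com/jackc/pgx/v5/pgxpool"
)
type Post struct {
    ID        int64
    CreatedAt time.Time
    Title     string
}
// listPosts returns one page plus the cursor for the next request; afterID is 0 on the first page.
func listPosts(ctx context.Context, pool *pgxpool.Pool, afterID int64) ([]Post, int64, error) {
    rows, err := pool.Query(ctx,
        "SELECT id, created_at, title FROM posts WHERE id > $1 ORDER BY id LIMIT 20", afterID)
    if err != nil {
        return nil, 0, err
    }
    defer rows.Close()
    var posts []Post
    var nextCursor int64
    for rows.Next() {
        var p Post
        if err := rows.Scan(&p.ID, &p.CreatedAt, &p.Title); err != nil {
            return nil, 0, err
        }
        posts = append(posts, p)
        nextCursor = p.ID // the last id in the page becomes next_cursor
    }
    return posts, nextCursor, rows.Err()
}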
For time-based sorting, use a stable order and break ties. created_at alone is not enough if two rows share the same timestamp. Use a compound cursor:
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;
A few rules prevent duplicates and missing rows: keep a unique column as the final tie-breaker in the ORDER BY (usually id), make the cursor carry every column you sort by (created_at and id together), and keep the sort order identical from request to request.
A surprisingly common reason an API feels slow isn't the database. It's the response. Large JSON takes longer to build, longer to send, and longer for clients to parse. The fastest win is often returning less.
Start with your SELECT. If an endpoint needs only id, name, and status, ask for those columns and nothing else. SELECT * quietly gets heavier over time as tables gain long text, JSON blobs, and audit columns.
Another frequent slowdown is N+1 response building: you fetch a list of 50 items, then run 50 more queries to attach related data. It might pass tests, then collapses under real traffic. Prefer a single query that returns what you need (careful joins), or two queries where the second batches by IDs.
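For example, the second query can batch by the IDs returned from the first one instead of running once per row (a sketch, assuming a hypothetical order_items table):
-- One batched query instead of one query per order in the page.
SELECT order_id, product_id, quantity
FROM order_items
WHERE order_id = ANY($1);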
A few ways to keep payloads smaller without breaking clients: return only the fields the screen needs, keep nested objects shallow (an id and a display name often beat a full embedded record), and add an include= flag (or a fields= mask) so list responses stay lean and detail responses opt into extras.
Should you shape JSON in Postgres or in Go? Both can be fast. Choose based on what you're optimizing for.
Postgres JSON functions (jsonb_build_object, json_agg) are useful when you want fewer round trips and predictable shapes from one query. Shaping in Go is useful when you need conditional logic, reuse structs, or keep SQL easier to maintain. If your JSON-building SQL becomes hard to read, it becomes hard to tune.
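A sketch of the Postgres-side approach, assuming a users table with a name column alongside the orders examples above:
-- One row per user with their orders pre-shaped as JSON.
SELECT u.id,
       u.name,
       json_agg(
         jsonb_build_object('id', o.id, 'status', o.status, 'created_at', o.created_at)
         ORDER BY o.created_at DESC
       ) AS recent_orders
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = ANY($1)
GROUP BY u.id, u.name;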
A good rule is: let Postgres filter, sort, and aggregate. Then let Go handle final presentation.
If you're generating APIs quickly (for example with Koder.ai), adding include flags early helps avoid endpoints that bloat over time. It also gives you a safe way to add fields without making every response heavier.
You don't need a huge test lab to catch most performance issues. A short, repeatable pass surfaces the problems that turn into outages once traffic shows up, especially when the starting point is generated code that you plan to ship.
Before changing anything, write down a small baseline: p95 latency per key endpoint, error rate, and your top slow queries by total time at your expected concurrency.
Start small, change one thing at a time, and re-test after each change.
Run a 10 to 15 minute load test that looks like real usage. Hit the same endpoints your first users will hit (login, list pages, search, create). Then sort routes by p95 latency and total time spent.
Check for connection pressure before tuning SQL. A pool that's too large overwhelms Postgres. A pool that's too small creates long waits. Look for rising wait time to acquire a connection and connection counts that spike during bursts. Adjust pool and idle limits first, then re-run the same load.
EXPLAIN the top slow queries and fix the biggest red flag. The usual culprits are full table scans on large tables, sorts on large result sets, and joins that explode row counts. Pick the single worst query and make it boring.
Add or adjust one index, then re-test. Indexes help when they match your WHERE and ORDER BY. Don't add five at once. If your slow endpoint is "list orders by user_id ordered by created_at", a composite index on (user_id, created_at) can be the difference between instant and painful.
Tighten responses and pagination, then re-test again. If an endpoint returns 50 rows with large JSON blobs, your database, network, and client all pay. Return only the fields the UI needs, and prefer pagination that doesn't slow down as tables grow.
Keep a simple change log: what changed, why, and what moved in p95. If a change doesn't improve your baseline, revert it and move on.
Most performance problems in Go APIs on Postgres are self-inflicted. The good news is that a few checks catch many of them before real traffic arrives.
One classic trap is treating pool size like a speed knob. Setting it "as high as possible" often makes everything slower. Postgres spends more time juggling sessions, memory, and locks, and your app starts timing out in waves. A smaller, stable pool with predictable concurrency usually wins.
Another common mistake is "index everything." Extra indexes can help reads, but they also slow writes and can change query plans in surprising ways. If your API inserts or updates frequently, every extra index adds work. Measure before and after, and re-check plans after adding an index.
Pagination debt sneaks in quietly. Offset pagination looks fine early, then p95 rises over time because the database has to walk past more and more rows.
JSON payload size is another hidden tax. Compression helps bandwidth, but it doesn't remove the cost of building, allocating, and parsing large objects. Trim fields, avoid deep nesting, and return only what the screen needs.
If you only watch average response time, you'll miss where real user pain starts. p95 (and sometimes p99) is where pool saturation, lock waits, and slow plans show up first.
A quick pre-launch self-check: size the pool deliberately instead of maxing it out, add indexes only for queries you've measured, prefer cursor pagination for tables that will grow, trim response payloads to what the screen needs, watch p95 rather than the average, and re-check EXPLAIN after adding indexes or changing filters.
Before real users arrive, you want evidence that your API stays predictable under stress. The goal isn't perfect numbers. It's catching the few issues that cause timeouts, spikes, or a database that stops accepting new work.
Run checks in a staging environment that resembles production (similar DB size range, same indexes, same pool settings): measure p95 latency per key endpoint under load, capture your top slow queries by total time, watch pool wait time, EXPLAIN (ANALYZE, BUFFERS) the worst query to confirm it's using the index you expect, and sanity-check payload sizes on your busiest routes.
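For the "top slow queries by total time" part, pg_stat_statements is the usual source if the extension is enabled; note the timing columns are total_exec_time / mean_exec_time on Postgres 13+ (total_time / mean_time on older versions):
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;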
Then do one worst-case run that mimics how products break: request a deep page, apply the broadest filter, and try it with a cold start (restart the API and hit the same request first). If deep pagination gets slower every page, switch to cursor-based pagination before launch.
Write down your defaults so the team makes consistent choices later: pool limits and timeouts, pagination rules (max page size, whether offset is allowed, cursor format), query rules (select only needed columns, avoid SELECT *, cap expensive filters), and logging rules (slow query threshold, how long to keep samples, how to label endpoints).
If you build and export Go + Postgres services with Koder.ai, doing a short planning pass before deployment helps keep filters, pagination, and response shapes intentional. Once you start tuning indexes and query shapes, snapshots and rollback make it easier to undo a "fix" that helps one endpoint but hurts others. If you want a single place to iterate on that workflow, Koder.ai on koder.ai is designed around generating and refining those services through chat, then exporting the source when you're ready.
Start by separating DB wait time from app work time.
Add simple timing around “wait for connection” and “query execution” to see which side dominates.
Use a small baseline you can repeat: pick a clear target like “p95 under 200 ms at 50 concurrent users, errors under 0.5%,” then only change one thing at a time and re-test the same request mix.
Enable slow query logging with a low threshold in pre-launch testing (for example 100–200 ms) and log the full statement so you can copy it into a SQL client.
Keep it temporary: once you’ve found the worst offenders, switch to sampling or raise the threshold.
A practical default is a small multiple of CPU cores per API instance, often 5–20 max open connections, with similar max idle connections, and recycle connections every 30–60 minutes.
Two common failure modes: a pool that’s too small makes requests queue before they ever reach Postgres, while a pool that’s too big overloads Postgres with active sessions and memory pressure.
Remember pools multiply across instances (20 connections × 10 instances = 200 connections).
Time DB calls in two parts: time spent waiting to acquire a connection from the pool, and time spent executing the query.
If most time is pool wait, adjust pool sizing, timeouts, and instance counts. If most time is query execution, focus on EXPLAIN and indexes.
Also confirm you always close rows promptly so connections return to the pool.
Run EXPLAIN (ANALYZE, BUFFERS) on the exact SQL your API sends and look for: sequential scans on large tables, estimated row counts far from the actuals, and large sorts feeding your ORDER BY.
Indexes should match what the endpoint actually does: filters + sort order.
Good default approach: one composite index that matches the endpoint’s real WHERE + ORDER BY pattern. Use a partial index when most traffic hits a predictable subset of rows.
Example pattern: most reads only touch rows where active = true. A partial index like ... WHERE active = true stays smaller, is more likely to fit in memory, and reduces write overhead versus indexing everything.
Confirm with EXPLAIN that Postgres actually uses it for your high-traffic queries.
LIMIT/OFFSET gets slower on deep pages because Postgres still has to walk past (and often sort) the skipped rows. Page 500 can be dramatically more expensive than page 1.
Prefer keyset (cursor) pagination: ask for the rows after the last row you saw, keep the ORDER BY stable, and include a unique tie-breaker like id.
Trimming responses is usually worth it for list endpoints. The fastest response is the one you don’t send. Practical wins: select only the columns the UI needs (avoid SELECT *), and add include= or fields= flags so clients opt into heavy fields. You’ll often reduce Go CPU, memory pressure, and tail latency just by shrinking payloads.
For indexes, match the endpoint’s real WHERE + ORDER BY and fix the biggest red flag first; don’t tune everything at once. Example: if you filter by user_id and sort by newest, an index like (user_id, created_at DESC) is often the difference between stable p95 and spikes. Re-check EXPLAIN after adding or changing an index.
For keyset pagination, include a unique tie-breaker (usually id), keep the ORDER BY identical across requests, and encode (created_at, id) or similar into a cursor. This keeps each page cost roughly constant as tables grow.