stop letting postgresql crawl: 9 battle‑tested database optimizations every engineer should use

why postgresql feels slow (and how to fix it)

postgresql is fast, but it’s unforgiving when schemas, queries, and infrastructure are sloppy. if you’re a beginner, full‑stack dev, or devops engineer, these battle‑tested optimizations will help you turn “why is this query crawling?” into “that’s snappy.” each tip includes practical steps and code you can paste into psql.

1) measure before you tune: enable query diagnostics

don’t guess. turn on the tools that show you what’s actually slow.

  • enable auto_explain to log slow plan details.
  • use explain (analyze, buffers) to see real work and i/o.
  • log slow statements to catch offenders in prod.
-- in postgresql.conf or via alter system:
shared_preload_libraries = 'auto_explain'
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_min_duration = '200ms'
log_min_duration_statement = '500ms'
log_statement = 'none'

-- per-session deep dive:
explain (analyze, buffers, verbose)
select * from orders where customer_id = 42 order by created_at desc limit 50;

2) create the right index (and the right type)

missing or wrong indexes are the top cause of slow queries.

  • b-tree for equality and range on sortable data.
  • gin for jsonb containment, arrays, full-text.
  • brin for huge, naturally ordered tables (time-series).
  • partial indexes for frequently filtered subsets.
  • covering indexes (include) to avoid table hits.
-- equality/range
create index on orders (customer_id, created_at desc);

-- covering index for common query
create index orders_customer_created_cover
on orders (customer_id, created_at desc) include (status, total_cents);

-- partial index for "open" orders only
create index orders_open_idx on orders (customer_id)
where status = 'open';

-- jsonb containment with gin (jsonb_path_ops is smaller and faster, but supports only the @> operator)
create index users_prefs_gin on users using gin (preferences jsonb_path_ops);

-- massive time-series table: brin
create index events_brin on events using brin (occurred_at);

checklist: is your where or join using columns with indexes? if not, fix that first.
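to find candidates quickly, the statistics collector already tracks scan counts per table. this query against pg_stat_user_tables ranks tables by sequential scans; a high seq_scan next to a low idx_scan usually means a missing index:

-- tables doing the most sequential scans (index candidates)
select relname, seq_scan, idx_scan, n_live_tup
from pg_stat_user_tables
order by seq_scan desc
limit 20;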

3) kill sequential scans with good join and where patterns

make filters sargable (search-argument-able) so postgres can actually use your indexes.

  • avoid functions on indexed columns in where (wrap values, not columns).
  • consistent data types across joins and predicates.
  • use =, in, between, and prefix like patterns where possible.
-- bad: function on column prevents index usage
select * from users where lower(email) = '[email protected]';

-- good: normalize the parameter (if emails are stored lowercase),
-- or make the function form indexable with an expression index:
-- create index on users (lower(email));
select * from users where email = lower('[email protected]');

-- bad: implicit cast blocks efficient plan
select * from orders where id = '123'; -- id is bigint

-- good:
select * from orders where id = 123;

-- prefix like can use a btree with text_pattern_ops (or the c collation); for ilike or infix matches, use pg_trgm
create extension if not exists pg_trgm;
create index users_email_trgm on users using gin (email gin_trgm_ops);
select * from users where email ilike 'john%';

4) vacuum, analyze, and autovacuum tuning

postgres uses mvcc, which creates dead tuples. without vacuum, bloat grows and queries crawl.

  • autovacuum must be on and tuned for write-heavy tables.
  • analyze updates statistics for the planner.
  • manual vacuum (full) only as a last resort (it locks).
-- per table tuning for hot tables
alter table orders set (autovacuum_vacuum_scale_factor = 0.05,
                        autovacuum_analyze_scale_factor = 0.03,
                        autovacuum_vacuum_threshold = 1000,
                        autovacuum_analyze_threshold = 500);

-- when deploying large changes:
vacuum (analyze) orders;

-- if bloat is severe and maintenance window is available:
vacuum (full, analyze) orders; -- locks table, use with caution
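to see whether autovacuum is keeping up, watch dead-tuple counts per table; a steadily growing n_dead_tup on a hot table means the settings above need to be more aggressive:

-- dead tuples and last (auto)vacuum/analyze per table
select relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
from pg_stat_user_tables
order by n_dead_tup desc
limit 20;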

5) control data size: archiving and partitioning

huge tables make every operation harder. partitioning keeps hot data fast and cold data out of the way.

  • range partition by date for logs, events, orders.
  • pruning skips entire partitions, reducing i/o.
  • archive or move cold partitions to cheaper storage.
-- native partitioning by month
create table events (
  id bigserial,
  occurred_at timestamptz not null,
  payload jsonb,
  primary key (id, occurred_at) -- a pk on a partitioned table must include the partition key
) partition by range (occurred_at);

create table events_2025_07 partition of events
for values from ('2025-07-01') to ('2025-08-01');

create table events_2025_08 partition of events
for values from ('2025-08-01') to ('2025-09-01');

-- ensure constraints/indexes on each partition as needed
create index on events_2025_08 (occurred_at);
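to confirm pruning actually happens, check the plan: with a filter on the partition key, only the matching partition should appear.

-- the plan should scan only events_2025_08, not every partition
explain
select count(*) from events
where occurred_at >= '2025-08-01' and occurred_at < '2025-08-15';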

6) write queries that scale

small changes in sql can yield major wins.

  • prefer select explicit_columns over select * to reduce i/o.
  • use limit with an index-friendly order by.
  • replace offset pagination with keyset pagination.
  • use union all instead of union when duplicates are impossible.
-- keyset pagination example
select id, created_at, total_cents
from orders
where customer_id = 42
  and (created_at, id) < (timestamptz '2025-08-01', 987654)
order by created_at desc, id desc
limit 50;

-- avoid select *
select id, name, email from users where id = 123;
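the union all tip deserves its own sketch. union deduplicates with an extra sort or hash step; when the inputs can't overlap, union all skips that work entirely (orders_2024 and orders_2025 are hypothetical per-year tables for illustration):

-- union would add a dedup step; union all just appends
select id, total_cents from orders_2024 where status = 'open'
union all
select id, total_cents from orders_2025 where status = 'open';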

7) optimize connections and pooling (devops must-have)

too many connections cause context switching and memory pressure. use a pooler in full‑stack apps and microservices.

  • use pgbouncer in transaction pooling mode for web workloads.
  • right-size max_connections; prefer fewer with pooling.
  • tune work_mem and shared_buffers sensibly.
# pgbouncer.ini essentials
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 50
reserve_pool_size = 10
server_idle_timeout = 30
-- postgres basics (example values, measure!)
alter system set max_connections = 200;     -- requires a restart
alter system set shared_buffers = '8GB';    -- ~25% of ram is a common start; requires a restart
alter system set work_mem = '64MB';         -- per sort/hash node, per query
select pg_reload_conf();                    -- picks up reloadable settings only
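to see whether pooling is actually needed, count connections by state; a large pile of 'idle' sessions is the classic sign that a pooler would help:

-- where are your connections going?
select state, count(*)
from pg_stat_activity
group by state
order by count(*) desc;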

8) keep transactions short and index-friendly

long transactions hold locks, prevent vacuum progress, and inflate bloat. open late, close early.

  • do not keep transactions open across network calls or user think time.
  • batch writes with copy or multi-row inserts.
  • add indexes concurrently to avoid blocking writes.
-- multi-row insert:
insert into metrics (ts, key, val)
values 
  (now(), 'cpu', 0.42),
  (now(), 'mem', 0.77);

-- fast bulk load (server-side path; from psql, use \copy for client-side files):
copy metrics (ts, key, val) from '/tmp/metrics.csv' with (format csv, header);

-- online index creation:
create index concurrently idx_orders_status on orders (status);

9) use materialized views and caching for read hotspots

for expensive aggregations, precompute and refresh on a schedule or trigger.

  • materialized views for heavy joins/aggregations.
  • application cache or redis for ultra-hot keys.
  • incremental refresh with triggers or cron jobs.
-- heavy daily revenue summary
create materialized view daily_revenue as
select date_trunc('day', created_at) as day,
       sum(total_cents) as revenue_cents
from orders
group by 1;

-- refresh on schedule (outside sql): e.g., cron, pg_cron, or job runner
refresh materialized view concurrently daily_revenue;

-- refresh ... concurrently requires a unique index on the matview
create unique index on daily_revenue (day);
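if you want the refresh scheduled inside the database, pg_cron can do it; this is a sketch assuming the pg_cron extension is installed and permitted on your server (managed providers vary):

-- nightly refresh at 00:05 (assumes pg_cron is available)
create extension if not exists pg_cron;
select cron.schedule('refresh-daily-revenue', '5 0 * * *',
  $$refresh materialized view concurrently daily_revenue$$);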

bonus: stats, extensions, and safe defaults

  • pg_stat_statements to rank slowest and most frequent queries.
  • pg_partman for easier partition maintenance.
  • pg_repack to remove bloat without long locks.
-- enable and query pg_stat_statements
create extension if not exists pg_stat_statements;
select query, calls, total_exec_time, mean_exec_time -- named total_time/mean_time before postgresql 13
from pg_stat_statements
order by total_exec_time desc
limit 20;

common anti-patterns to avoid

  • select * in apis returning large rows.
  • offset 100000 limit 50 instead of keyset pagination.
  • functions on indexed columns in where and join conditions.
  • unbounded text searches without trigram or full-text indexes.
  • leaving autovacuum disabled “temporarily.”

quick tuning checklist

  • observe: turn on logs, auto_explain, pg_stat_statements.
  • index: create proper indexes for your top 20 queries.
  • vacuum/analyze: tune autovacuum; watch bloat.
  • refactor sql: sargable filters, keyset pagination, explicit columns.
  • partition: time-based for large append-only tables.
  • pool: pgbouncer for apps; cap connections.
  • short tx: keep transactions brief; use copy for bulk.
  • cache: matviews/redis for hot reads.

for full-stack, devops, and seo-focused teams

fast databases power fast apis and uis, which improve core web vitals and seo. in ci/cd, add query plan checks and regression tests for your most critical endpoints. treat performance as a feature: profile locally, benchmark in staging, monitor in production.

sample performance test script (psql + bash)

#!/usr/bin/env bash
set -euo pipefail
query="explain (analyze, buffers, format json)
select id, created_at from orders
where customer_id = 42
order by created_at desc
limit 100;"
psql "$DATABASE_URL" -Atc "$query" | jq '.[0].Plan
  | {total_ms: ."Actual Total Time", rows: ."Actual Rows",
     shared_hit: ."Shared Hit Blocks", shared_read: ."Shared Read Blocks"}'

next steps

  • pick your slowest endpoint and run explain (analyze, buffers).
  • create or fix the top two missing indexes.
  • tune autovacuum for your busiest tables.
  • introduce pgbouncer if you have many app processes.
  • plan partitioning for any table over 50–100m rows.

make these habits part of your engineering playbook. your postgresql won’t crawl—it’ll sprint.
