PostgreSQL Is Eating the Database World — Including Vector Search
When pgvector wins, when it doesn't, and the scaling patterns that decide. A practitioner's view of the database consolidation now sweeping AI stacks
TL;DR
The most boring database choice in 2025 is the most expensive habit to break — and increasingly the most correct one.
The technology industry has a long-running joke that comes back every few years: “you don’t need a new database, just use Postgres.” It is a joke because it is partly right, and the part that is right keeps expanding. PostgreSQL started as a relational database. Then it absorbed JSON workloads that briefly belonged to MongoDB. Then full-text search that briefly belonged to Elasticsearch. Then time-series with TimescaleDB. Then geospatial with PostGIS. The pattern is the same each time: a specialized database appears, gains market share, then loses ground to a Postgres extension that does most of the job inside the database the team already operates.
Vector search is the latest cycle. Three years ago, building anything with embeddings meant standing up a separate vector database — Pinecone, Weaviate, Qdrant, Milvus, all genuinely good products. Today, pgvector is the default choice for most production RAG and semantic-search workloads, and the question has flipped from “which vector database should we use?” to “is there a reason we shouldn’t just enable the extension?” For the workloads most teams actually have, the answer is usually no.
This post is about how to make that decision with eyes open. The “just use Postgres” instinct is correct often enough that ignoring it costs real money. It is also wrong sometimes, in specific ways worth being clear about.
Why “just use Postgres” keeps winning
The Postgres consolidation pattern has three drivers, and understanding them is the difference between using pgvector well and accidentally over- or under-investing in your data layer.
First, operational consolidation matters more than peak performance for most teams. A specialized database delivers higher peak performance on its specific workload, but it adds an entire operational dimension: another deployment to monitor, another backup strategy, another permission system, another set of failure modes, another migration tool. For the median engineering team running a SaaS product, peak performance on one workload is rarely the binding constraint. Operational surface area usually is. Consolidating vectors into Postgres means one connection pool, one backup, one set of SQL skills.
Second, the cost comparison goes the wrong way. Specialized vector databases bill for the index. Postgres extensions bill for nothing beyond the database you are already running. If you have a Postgres instance with headroom — and most teams do — adding a vector column is free in licensing terms. The cost shows up only if the workload outgrows the existing instance, at which point the comparison is sizing up Postgres versus the entire spend on the new database plus the operational overhead of running both.
Third, vector search is no longer a moat. Hierarchical Navigable Small World (HNSW) is the dominant approximate-nearest-neighbor algorithm across essentially all modern vector databases. pgvector implements it. So do Pinecone, Weaviate, and Qdrant. The algorithmic gap between “vector database” and “database with a vector index” is essentially gone. The remaining differences are operational: how the system shards, how it handles updates, how it composes filtering with similarity.
That is the strategic picture. The tactical picture is where teams either save serious money or accidentally break production.
What pgvector actually does
pgvector adds three things to Postgres: a vector column type, distance operators (cosine, L2, inner product, and a few less-common ones), and two index types — IVFFlat (older, list-based) and HNSW (graph-based, the practical default since pgvector 0.5.0). That is it. Everything else is regular Postgres: rows, joins, transactions, partitioning, replication, all working the way they already work.
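Getting from zero to that setup takes a few statements. A minimal sketch, assuming a 1536-dimensional embedding model and the illustrative documents schema used in the queries below:

```sql
-- Enable the extension (bundled with most managed Postgres offerings)
CREATE EXTENSION IF NOT EXISTS vector;

-- An ordinary table with a vector column; 1536 dimensions matches
-- OpenAI's text-embedding models, so adjust to whatever model you use
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    title     text NOT NULL,
    embedding vector(1536) NOT NULL
);

-- Graph-based ANN index for the cosine-distance operator <=>
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```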
The query pattern looks like a standard SQL query with a similarity operator:
```sql
SELECT id, title
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;
```

The <=> operator computes cosine distance to the query vector. With an HNSW index, this query runs in the same low-millisecond range as any other indexed lookup, while the tenant_id filter and the LIMIT compose naturally with the vector search. The thing that took a separate database in 2023 is now a column.
This composition is the underrated win. Most production AI features need vectors combined with relational data — filter by tenant, exclude soft-deleted rows, join to user metadata, sort within a date range. In a separate vector database, every one of those constraints is either a metadata filter the index has to support natively, or a round trip to the primary database that costs latency and creates consistency problems. In Postgres, they are just WHERE clauses.
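To make that concrete, here is a sketch of one such composed query; the author_id, deleted_at, and created_at columns and the users table are hypothetical additions to the schema above:

```sql
-- Similarity search composed with tenant isolation, soft-delete
-- exclusion, a relational join, and a date window
SELECT d.id, d.title, u.display_name,
       d.embedding <=> $1 AS distance
FROM documents d
JOIN users u ON u.id = d.author_id
WHERE d.tenant_id = $2
  AND d.deleted_at IS NULL
  AND d.created_at >= now() - interval '90 days'
ORDER BY d.embedding <=> $1
LIMIT 10;
```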
The HNSW tuning knobs that matter in production
Three parameters control HNSW behavior. Most production problems with pgvector come from misunderstanding one of them.
- m (build-time) is the maximum number of connections each node gets in the graph. Default: 16. Higher values give better recall at the cost of memory and build time; a reasonable production range is 12 to 48. For high-dimensional embeddings (think the 1,536-dimensional vectors OpenAI's embedding APIs return), pushing m higher tends to pay off.
- ef_construction (build-time) is the size of the candidate list during graph construction. Default: 64. Higher values give a better-quality index at the cost of much slower index builds. Increase this if you are seeing recall gaps that ef_search alone cannot close.
- ef_search (query-time, session-level) is the size of the candidate list during search. Default: 40, maximum 1,000. This is your primary speed-versus-recall dial in production: set it lower for fast queries with acceptable recall, higher when recall matters more than latency.
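The build-time knobs go in the CREATE INDEX statement and cannot be changed without a rebuild. A sketch with illustrative values, continuing the documents table from earlier:

```sql
-- m and ef_construction are fixed at build time; changing them later
-- means rebuilding the index
CREATE INDEX documents_embedding_hnsw
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
```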
The pattern that survives contact with production: pick build-time parameters once based on offline recall testing on a representative subset of your real data, then leave them alone. Tune ef_search per query class — an autocomplete path can run at low ef_search; a “give me the top documents for this customer’s actual question” path runs higher. The cost of getting this wrong is silent: you ship, recall looks fine on the launch dataset, then it degrades as the corpus grows and nobody can tell why.
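One way to express per-query-class tuning, assuming the two paths described above, is SET LOCAL, which scopes hnsw.ef_search to a single transaction:

```sql
-- Autocomplete path: favor latency over recall
BEGIN;
SET LOCAL hnsw.ef_search = 40;
SELECT id, title FROM documents ORDER BY embedding <=> $1 LIMIT 5;
COMMIT;

-- High-stakes retrieval path: favor recall over latency
BEGIN;
SET LOCAL hnsw.ef_search = 200;
SELECT id, title FROM documents ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;
```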
A few more practical points. HNSW indexes consume memory in proportion to the dataset; the index wants to fit in RAM for fast queries, and falling out of memory is the failure mode that surprises teams. Plan for index size in the same order of magnitude as the raw vector data, often somewhat larger. Index builds are slow — building an HNSW index on a few million rows takes hours, not minutes. Build indexes concurrently in production, and stagger builds so they do not collide with other heavy workloads.
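The settings that make those builds tolerable, sketched with illustrative values that should be sized to the instance rather than copied:

```sql
-- Let the build keep the graph in memory (size to your instance)
SET maintenance_work_mem = '8GB';

-- Recent pgvector versions parallelize HNSW builds across workers
SET max_parallel_maintenance_workers = 7;

-- Build without blocking writes to the table
CREATE INDEX CONCURRENTLY documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
```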
When pgvector is enough — and when it isn’t
The honest sizing guide, based on what I have seen survive production:
pgvector is enough when:
- your corpus is under roughly fifty million vectors on a properly sized instance;
- your workload combines vectors with relational filters (which is most of them);
- your team already knows Postgres;
- you need transactional consistency between vectors and the rest of your data; or
- you do not have dedicated infrastructure staff for a second database.

This covers the substantial majority of production RAG, semantic search, recommendation, and classification workloads.
Reach for a dedicated vector database when:
- you are operating at hundreds of millions to billions of vectors with high query volume;
- your workload is essentially pure vector search with minimal relational filtering;
- you need horizontal sharding of the vector index itself (something pgvector does not provide natively, though Citus and application-level partitioning can approximate it); or
- your team has the operational depth to run two systems well and the workload genuinely justifies it.
Be honest about which bucket you are in. The mistake teams make most often is assuming they are in the second bucket when their actual workload puts them in the first. Most “we need a vector database for scale” rationales rest on projected scale that never arrives. Start with pgvector. Migrate later if you actually hit a wall. Migrating from pgvector to a dedicated store is also easier than the reverse, because the relational data stays put.
Scaling: partitioning, replicas, and the multi-tenant problem
The patterns for scaling pgvector are the same ones that work for any other Postgres workload, with a few wrinkles specific to vectors.
Partitioning is the first lever. For multi-tenant workloads, partition the table by tenant or tenant-group and put a separate HNSW index on each partition. This keeps individual indexes small enough to fit comfortably in memory and lets large tenants scale independently. The query planner uses partition pruning to skip irrelevant partitions before the vector search runs. The operational cost is that index builds happen per-partition, but for multi-tenant systems this is usually a feature rather than a bug.
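A sketch of that layout, re-declaring the illustrative documents table from earlier as a hash-partitioned one (list partitioning by tenant group works just as well when tenants are known up front):

```sql
CREATE TABLE documents (
    id        bigserial,
    tenant_id bigint NOT NULL,
    title     text NOT NULL,
    embedding vector(1536) NOT NULL,
    PRIMARY KEY (id, tenant_id)   -- partition key must be in the PK
) PARTITION BY HASH (tenant_id);

-- Eight partitions; repeat for remainders 1 through 7
CREATE TABLE documents_p0 PARTITION OF documents
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);

-- Indexing the parent creates one small HNSW index per partition
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```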
Read replicas scale query throughput linearly. Vector queries are read-heavy, and Postgres replicas have been a well-understood scaling pattern for two decades. Most teams underuse this lever for vector workloads because they instinctively reach for sharding first.
pgvectorscale is the extension worth knowing about. Built by Tiger Data (formerly Timescale), it adds a DiskANN-based index that keeps part of the index on disk rather than requiring everything in RAM, plus statistical binary quantization to compress the stored vectors. The result is closer to dedicated-vector-database performance at scales where vanilla pgvector starts to strain. If you are pushing the edge of what HNSW-in-RAM allows, this is the first step before considering a different database entirely.
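If you try it, the moving parts look roughly like this; the extension and index-type names are as documented by the pgvectorscale project, so verify against its current docs:

```sql
-- pgvectorscale layers on top of pgvector (CASCADE pulls it in)
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- StreamingDiskANN index: disk-resident graph plus compressed vectors
CREATE INDEX ON documents
USING diskann (embedding vector_cosine_ops);
```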
The multi-tenant problem deserves a specific note. Vector workloads in B2B SaaS are rarely “ten million vectors”; they are “ten million vectors split across five thousand tenants, each with very different query patterns.” A single big HNSW index across all tenants performs poorly because the tenant filter happens after the approximate search, throwing away most of the candidate set when a tenant holds only a small share of the corpus. The remedy is partitioning by tenant (per-partition indexes, partition pruning before the vector search) or, for very large tenants, separate tables. The wrong remedy is increasing ef_search until recall recovers, which buys recall back by paying latency on every query and never fixes the underlying issue. Many production pgvector incidents that get blamed on “Postgres not handling vector scale” turn out to be exactly this misdiagnosis.
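Partition pruning is easy to verify. A sketch against the partitioned layout above, run in psql with the query vector bound to a variable:

```sql
-- \set qvec '[0.013, -0.042, ...]'  -- full 1536-element literal elided
EXPLAIN (COSTS OFF)
SELECT id, title
FROM documents
WHERE tenant_id = 42
ORDER BY embedding <=> :'qvec'::vector
LIMIT 10;
-- Expected shape: an index scan on the single partition that holds
-- tenant 42's rows, using that partition's HNSW index, not all eight.
```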
The teams that win on this stack do not pick “Postgres” or “Pinecone” as a tribal identity. They look at the workload, apply the sizing guide above, and reach for the right tool — which, for most production workloads in 2025, happens to be the database they already have. The strategic value of “just use Postgres” is not that Postgres is always the right answer. It is that picking it costs almost nothing to try, and the cost of moving off later, if you ever need to, is one of the cheaper migrations in the entire data stack. That is the asymmetry that keeps making the boring choice the correct one.