Vector search in PostgreSQL has gone from “experimental hack” to a serious production option, largely thanks to the pgvector extension. But as teams push from thousands to tens or hundreds of millions of embeddings, a natural question emerges:
How do you keep vector search fast and cost‑effective as the dataset grows, without adding yet another external database?
This is exactly the problem pgvectorscale is designed to address.
In this article, we’ll take a detailed look at pgvectorscale: what it is, how it fits into the Postgres ecosystem, how it scales vector search, and what trade‑offs you should understand before using it.
Note: Names, concepts and high‑level behavior in this article are based on the state of the ecosystem as of late 2024. For exact configuration options and up‑to‑date syntax, always refer to the official pgvectorscale documentation and GitHub repository.
1. Background: Vector Search in PostgreSQL with pgvector
Before we can understand what pgvectorscale adds, we need a quick refresher on pgvector and the challenges it faces at scale.
1.1 What pgvector does well
pgvector is a Postgres extension that introduces a vector data type and similarity operators for storing and querying embeddings, for example:
- Text embeddings from OpenAI, Cohere, etc.
- Image or audio embeddings
- User or item embeddings for recommendations
Typical usage:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id bigserial PRIMARY KEY,
content text,
embedding vector(1536) -- dimension must match your embedding model
);
You can then insert embeddings and run similarity search with operators such as:
- <-> for Euclidean (L2) distance
- <#> for (negative) inner product (useful when vectors are normalized)
- <=> for cosine distance
Example query:
SELECT id, content
FROM documents
ORDER BY embedding <-> $1 -- $1 is a parameter: the query embedding
LIMIT 10;
For small datasets (e.g., up to a few hundred thousand rows), this can be performant even with a sequential scan, especially if caching and hardware are decent.
1.2 Indexing in pgvector and its limits
To scale beyond that, pgvector provides index support (IVFFlat and, in more recent versions, HNSW). With the right index, pgvector can perform approximate nearest neighbor (ANN) search, turning a full table scan into an index probe.
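For reference, plain pgvector index creation looks like this (syntax from pgvector's own documentation; lists is an IVFFlat tuning parameter you would size for your data):

-- IVFFlat: clusters vectors into lists and probes a subset of lists per query
CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- HNSW: graph-based index; usually better recall/latency, at higher build cost
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);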
Typical characteristics of pgvector alone:
- Great for:
- 10k–1M vectors on modest hardware
- Low latency and acceptable memory usage in this range
- Starts to hurt when:
- You’re storing tens or hundreds of millions of embeddings
- You need high recall AND sub‑100ms latency
- You can’t afford to keep all vectors or all index structures in RAM
The core issue is that standard indexing strategies and full‑precision floats can become memory‑heavy and I/O‑bound at larger scales.
This is where pgvectorscale enters.
2. What Is pgvectorscale?
At a high level:
pgvectorscale is a PostgreSQL extension built to make vector search scale to much larger datasets by compressing and indexing embeddings more efficiently, while remaining inside Postgres.
You can think of it as a companion to pgvector:
- pgvector: defines the vector type, operators, and basic index methods.
- pgvectorscale: adds scalable index structures and compression mechanisms to handle large-scale vector search more efficiently.
2.1 Goals of pgvectorscale
The design goals (as communicated in blogs, talks, and documentation) typically include:
Scale to much larger datasets
- Millions to hundreds of millions of vectors
- Without requiring that all vectors (or the full index) live in RAM
Reduce memory and storage footprint
- Use compression techniques (e.g., quantization) to store vector data more compactly
- Allow higher throughput and more data on the same hardware
Remain “Postgres‑native”
- No external vector database to deploy, secure, and operate
- Leverage existing Postgres features: transactions, roles, extensions, backups, etc.
Offer tunable recall vs. performance
- Higher recall (more accurate results, slower queries), or
- Lower recall (faster queries, less CPU/I/O)
2.2 How pgvectorscale fits in the stack
A common mental model is:
- Application layer
  - Your app makes queries and passes embeddings (from your chosen model).
- Postgres with pgvector
  - Stores embeddings as vector.
  - Exposes similarity operators.
- pgvectorscale
  - Provides specialized index types and compression for that vector column.
  - Accelerates similarity queries when the planner chooses those indexes.
From your application’s point of view, you stay within the Postgres world: SQL, extensions, tables, indexes.
3. Core Ideas Behind pgvectorscale
Even without diving into every algorithm detail, there are a few key concepts that explain how pgvectorscale can scale vector search.
3.1 Approximate Nearest Neighbor (ANN)
Exact nearest neighbor search over large, high‑dimensional embeddings is expensive:
- Cost grows roughly linearly with the number of vectors (O(N))
- High-dimensional distance calculations are CPU-heavy
ANN index structures use clever approximations so that:
- You don’t inspect every vector for each query
- You accept a tunable trade‑off:
- You may miss some of the true nearest neighbors
- In return, you get much faster queries
pgvectorscale builds on this general ANN idea, just as systems like FAISS, HNSWlib, and specialized vector databases do.
3.2 Compression and Quantization
Storage and memory become bottlenecks when you have:
- High‑dimensional embeddings (e.g., 768, 1024, 1536 dimensions)
- Stored as 32-bit floats (float4)
A single 1536‑dimensional vector in float32 is:
1536 dims * 4 bytes ≈ 6 KB per vector.
At 100 million vectors, that’s hundreds of gigabytes just for embeddings, before indexes and metadata.
To scale, pgvectorscale leans on compression, often implemented through variations of quantization, such as:
- Scalar quantization: reduce the precision of each component (e.g., store 8-bit instead of 32-bit values) while trying to preserve distances.
- Product quantization (PQ) and related techniques: break the vector into sub-vectors and quantize each in a way that allows fast approximate distances.
Conceptually:
- Instead of storing the full 32‑bit float for each dimension, you store a compact code.
- At query time, you compute distances in the compressed space first.
- Optionally, you can then re‑rank a small candidate set using the full‑precision embeddings (if you kept them).
This is why pgvectorscale can reduce memory/storage footprint dramatically while still returning good nearest‑neighbor results.
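As a rough illustration of the potential savings for a single 1536-dimensional vector (real implementations add per-vector overhead, so treat these as ballpark figures):
- float32 (full precision): 1536 * 4 bytes ≈ 6 KB
- 8-bit scalar quantization: 1536 * 1 byte ≈ 1.5 KB (about 4x smaller)
- 1-bit binary quantization: 1536 * 1 bit ≈ 192 bytes (about 32x smaller)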
3.3 Two‑stage retrieval: candidate generation + re‑ranking
A common pattern in scalable vector search (and used in many systems that pgvectorscale is inspired by) is:
- Candidate generation
- Use a fast, compressed index to retrieve a small set of candidate IDs.
- This leverages approximations and cheap distance computations.
- Re‑ranking
- Fetch original embeddings (if stored) or higher‑precision representations.
- Compute exact distances for just these candidates.
- Return the top‑k results with higher accuracy.
pgvectorscale’s architecture is naturally aligned with this approach: compression for speed and scale, plus optional “exact” re‑ranking to preserve relevance.
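The pattern can be sketched in plain SQL. This is only an illustration of the idea (pgvectorscale performs an analogous step internally over its compressed representation); the candidate pool size of 100 is an arbitrary assumption:

SELECT id, title
FROM (
    SELECT id, title, embedding
    FROM documents
    ORDER BY embedding <-> $1      -- stage 1: index-assisted, approximately ordered
    LIMIT 100                      -- candidate pool size (tunable)
) AS candidates
ORDER BY embedding <-> $1          -- stage 2: recompute distances over only 100 rows
LIMIT 10;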
4. Installing and Setting Up pgvectorscale
Since installation specifics can vary by distribution (e.g., Timescale stack, cloud offerings, native packages), here is the conceptual process you’ll likely encounter. Always follow the official instructions for your environment.
4.1 Prerequisites
- PostgreSQL 14+ (exact required version may vary; check docs)
- pgvector installed and enabled: CREATE EXTENSION IF NOT EXISTS vector;
- Sufficient privileges to install/enable extensions.
4.2 Installing pgvectorscale
On many setups, pgvectorscale will be available as a shared library and extension, so you can enable it with:
CREATE EXTENSION IF NOT EXISTS pgvectorscale;
In self‑hosted scenarios you may:
- Install the OS package (e.g., .deb, .rpm, Homebrew, etc.), or
- Build from source (make && make install) and ensure the library directory is visible to Postgres.
Again, the exact commands depend on your platform and packaging.
4.3 Creating a table and storing embeddings
Assume you’re building a document search feature:
CREATE TABLE documents (
id bigserial PRIMARY KEY,
title text,
body text,
-- 1536-dimensional embeddings, e.g., from OpenAI text-embedding-3-large
embedding vector(1536)
);
Insert data from your application:
INSERT INTO documents (title, body, embedding)
VALUES
($1, $2, $3::vector); -- where $3 is something like '[0.1, 0.2, ...]'
4.4 Adding a pgvectorscale index (conceptual)
pgvectorscale introduces its own index type / method, which you apply to the vector column.
The high‑level pattern is:
CREATE INDEX CONCURRENTLY IF NOT EXISTS documents_embedding_pgvectorscale_idx
ON documents
-- using a pgvectorscale-provided index access method
-- with options tuned for your workload
(embedding);
Because index names, access method names, and option keys are extension‑specific and can change across versions, you should:
- Check the pgvectorscale docs for:
  - The name of the index access method (the USING ... clause)
  - Any necessary operator classes or opfamilies (e.g., L2 vs cosine)
  - Tuning parameters (number of lists, probes, compression settings, etc.)
A representative example follows below.
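For orientation, recent pgvectorscale releases document an access method named diskann. A creation statement looks roughly like the following; the access method and operator class names are version-dependent assumptions here, so verify them against the current docs:

-- Sketch only: names and supported operator classes may differ in your version.
-- Use an L2 operator class instead if you query with <-> and your version supports it.
CREATE INDEX CONCURRENTLY IF NOT EXISTS documents_embedding_diskann_idx
    ON documents
    USING diskann (embedding vector_cosine_ops);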
Once created, your queries typically look identical to standard pgvector queries, but the planner will use the pgvectorscale index to accelerate them.
5. Querying with pgvectorscale
From the application’s perspective, using pgvectorscale is often transparent: you keep using pgvector’s similarity operators while the planner decides to use the new index.
5.1 Example similarity query
Assume:
- You have embedding vector(1536) in documents.
- You're using Euclidean distance for similarity.
Query:
-- $1 is the query embedding: a 1536-dim vector
SELECT id, title, body
FROM documents
ORDER BY embedding <-> $1
LIMIT 10;
If:
- The pgvectorscale extension is installed,
- The correct index is in place,
- Planner settings are not preventing index use (e.g., forcing sequential scans),
then Postgres will typically choose to use the pgvectorscale index to execute this query.
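To verify which path the planner takes, EXPLAIN works as usual (a sketch, with $1 bound to the query embedding as in the earlier examples):

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title
FROM documents
ORDER BY embedding <-> $1
LIMIT 10;
-- Look for an index scan on the vector index rather than a sequential scan plus sort.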
5.2 Cosine vs inner product vs L2
pgvector supports multiple distance metrics; pgvectorscale typically follows suit by:
- Providing operator classes / index support specialized for:
- L2 / Euclidean distance
- Inner product
- Cosine distance (often implemented via normalized vectors and inner product)
Choice depends on:
- Your model’s recommended similarity metric
- Whether vectors are normalized to unit length
Ensure your index is created with the correct operator class to match the metric you’ll use in queries. The official docs will specify the exact syntax.
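As a sketch of how metric, query operator, and operator class line up (class names follow pgvector's conventions; confirm which of them pgvectorscale's access method accepts in your version):

-- Metric / operator / operator class pairing:
--   <->  L2 (Euclidean) distance   -> vector_l2_ops
--   <#>  negative inner product    -> vector_ip_ops
--   <=>  cosine distance           -> vector_cosine_ops
-- An index built with vector_cosine_ops serves <=> queries but not <-> queries:
SELECT id, title FROM documents ORDER BY embedding <=> $1 LIMIT 10;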
5.3 Hybrid search (semantic + keyword)
Because pgvectorscale lives inside Postgres, it plays nicely with other indexing features:
- Full-text search (tsvector / tsquery)
- B-tree and GIN indexes
- JSONB queries
You can combine filters and similarity search (assuming the table also has category and published_at columns):
SELECT id, title, body
FROM documents
WHERE
category = 'finance'
AND published_at > now() - interval '30 days'
ORDER BY
embedding <-> $1 -- semantic similarity
LIMIT 10;
And you can blend text relevance scores with vector scores in a hybrid ranking strategy, all in SQL.
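For example, here is a hedged sketch of blending the two signals; the 0.3/0.7 weights are illustrative, and in practice you would store a precomputed tsvector column with a GIN index rather than calling to_tsvector inline:

WITH scored AS (
    SELECT id, title,
           ts_rank(to_tsvector('english', body),
                   plainto_tsquery('english', $2)) AS text_score,
           1 - (embedding <=> $1)                  AS vector_score  -- cosine similarity
    FROM documents
    WHERE to_tsvector('english', body) @@ plainto_tsquery('english', $2)
)
SELECT id, title
FROM scored
ORDER BY 0.3 * text_score + 0.7 * vector_score DESC
LIMIT 10;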
6. Tuning and Trade‑offs in pgvectorscale
The real power of pgvectorscale comes from its tuning knobs. While the exact parameter names differ by version, the trade‑offs they control are common to most ANN and compression‑based systems.
6.1 Key dimensions to think about
When tuning pgvectorscale, you’re balancing:
- Recall (accuracy)
- How often the retrieved neighbors are truly the closest by the chosen metric.
- Latency (per query)
- How quickly you can return results, especially p95/p99.
- Throughput
- How many queries per second your system can serve.
- Memory and storage usage
- How much RAM and disk are needed for embeddings and indexes.
- Index build and maintenance cost
- How long it takes to build or rebuild indexes.
- How updates (INSERT/UPDATE/DELETE) behave.
pgvectorscale gives you parameters that roughly move you along these axes.
6.2 Typical categories of settings
Without claiming specific parameter names, here are common categories of settings you will see and need to tune:
Index granularity / partitions / lists
- Controls how embeddings are grouped or partitioned internally.
- Higher granularity can improve recall but may increase memory and build time.
Search breadth / probes / visited nodes
- Controls how much of the index is explored per query.
- More probes → higher recall, higher latency; fewer probes → faster but less accurate.
Compression level / code size
- Determines how aggressively embeddings are compressed.
- Stronger compression → smaller index, faster I/O, but more approximation error.
Re‑ranking size (candidate pool)
- How many candidates are pulled from the compressed index for exact re‑ranking.
- Larger candidate pool → higher recall, more CPU per query.
Build vs query trade‑offs
- Some settings influence both build time and query time (e.g., number of centroids or graph connectivity).
- If you rebuild rarely, you may accept longer index builds for faster queries.
The official docs for pgvectorscale will map these conceptual categories to actual parameters and recommended values for different dataset sizes.
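As a concrete but version-dependent illustration, recent pgvectorscale releases expose some of these knobs as GUC settings you can adjust per session or per transaction. The names below reflect settings documented at the time of writing and should be treated as assumptions to verify against the docs for your version:

SET diskann.query_search_list_size = 100;  -- search breadth: larger values raise recall and latency
SET diskann.query_rescore = 50;            -- number of candidates re-ranked with full-precision distances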
6.3 Practical tuning workflow
A sensible approach:
Start simple
- Use defaults recommended by the pgvectorscale docs for your dimension and estimated dataset size.
Collect realistic queries
- Prepare a test set:
  - Query embeddings
  - Ground-truth nearest neighbors (computed offline with exact search on a subset or in a separate system, if needed)
Measure recall vs latency
- For each configuration, run your queries and measure:
  - Average and tail latency
  - Recall@k (e.g., recall of the top 10 neighbors compared to exact search; see the sketch at the end of this subsection)
Adjust parameters
- Increase search breadth or re‑ranking size if recall is too low.
- Increase compression or reduce search breadth if latency is too high and recall is acceptable.
Monitor in production
- Log query latency and, where feasible, approximate recall on sampled queries over time.
- Watch resource usage (CPU, RAM, disk I/O).
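A rough recall@10 check for a single query embedding can be done entirely in SQL, assuming the table (or a sampled subset) is small enough that an exact scan is feasible and $1 is bound to the query embedding as in earlier examples:

-- Ground truth: force an exact scan for this one query by disabling index scans.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
CREATE TEMP TABLE exact_top10 AS
SELECT id FROM documents ORDER BY embedding <-> $1 LIMIT 10;
COMMIT;

-- Approximate results via the ANN index; recall@10 = |overlap| / 10.
SELECT count(*) / 10.0 AS recall_at_10
FROM (
    SELECT id FROM documents ORDER BY embedding <-> $1 LIMIT 10
) AS approx
JOIN exact_top10 USING (id);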
7. When (and When Not) to Use pgvectorscale
7.1 Good use cases for pgvectorscale
Consider pgvectorscale if:
You already rely on Postgres as your primary data store
- And you want vector search without:
  - Running an additional vector database,
  - Implementing complex replication or dual-write patterns.
Your dataset is beyond what plain pgvector can comfortably handle
- Tens of millions or more embeddings.
- Your current indexes are becoming too big, slow, or memory‑hungry.
You care about operational simplicity
- Fewer moving parts (no separate vector DB),
- Single backup/restore strategy,
- Consistent authentication and authorization in Postgres.
You accept a tunable approximation
- Perfect recall is not required; near‑perfect is acceptable.
- You’re willing to test and tune parameters.
Typical domains:
- Semantic search over large document corpora
- Retrieval‑augmented generation (RAG) at scale
- Personalized recommendations and similarity search
- Log / event search with embedding‑based anomaly detection
7.2 Situations where pgvectorscale may not be ideal
You might not need or want pgvectorscale if:
Your dataset is small
- For under ~100k–500k vectors, simple pgvector (with or without a basic index) may be more than adequate.
- Avoid unnecessary complexity.
You require exact nearest neighbors at all costs
- Some workloads (e.g., certain scientific/financial applications) demand exactness.
- ANN and compression inherently introduce approximation, even if small.
You’re already invested in a dedicated vector database
- If you’ve chosen and operationalized a system like Pinecone, Weaviate, Qdrant, etc., the marginal benefit of pgvectorscale may be less.
Complex multi‑node / cross‑region Postgres topologies
- At very large scale and strict availability requirements, you may find simpler horizontal scaling stories in specialized vector stores.
- Postgres can scale, but you must plan carefully for replication and failover when indexes are heavy.
8. Operational Considerations
Beyond pure performance, you should think about how pgvectorscale behaves operationally in a real system.
8.1 Backup and restore
Because pgvectorscale is a native Postgres extension, its indexes are managed like any other Postgres index:
- Physical backups (e.g., pg_basebackup, file system snapshots, cloud snapshots) will capture:
  - Data files
  - Index files, including pgvectorscale's structures
- Logical backups (e.g., pg_dump) will:
  - Dump table data
  - Regenerate indexes via CREATE INDEX commands on restore
Implications:
- If you use logical backups, index rebuild time matters:
- Large compressed indexes can take time to rebuild.
- Plan restore windows accordingly.
8.2 Replication
- Streaming replication (physical replication):
- Index updates are replicated along with other changes.
- Standby instances can serve read‑only vector queries using pgvectorscale indexes.
- Logical replication:
- Typically replicates table changes (INSERT/UPDATE/DELETE).
- Target instances must have:
- The same extension versions installed
- Indexes defined to match query plans
In both cases:
- Ensure pgvectorscale is installed on all nodes.
- Test failover and recovery with real workloads.
8.3 Updates, deletes, and index maintenance
As with any sophisticated index:
- Frequent updates to embeddings can lead to:
- Index fragmentation
- Potential performance degradation over time
- You may need:
- Periodic reindexing
- Vacuuming and regular maintenance
Patterns that help:
- Use append‑heavy designs when possible (e.g., versioned rows).
- Batch updates of embeddings instead of constant small updates.
- Schedule index rebuilds during low‑traffic periods if needed.
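Where a rebuild is needed, standard Postgres tooling applies. For example, using the illustrative index name from earlier, REINDEX ... CONCURRENTLY (Postgres 12+) rebuilds the index without taking long exclusive locks:

-- Rebuild the ANN index while allowing concurrent reads and writes
REINDEX INDEX CONCURRENTLY documents_embedding_diskann_idx;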
8.4 Monitoring
Monitor at several layers:
- Database resource metrics
- CPU usage
- Memory usage / shared buffers hit ratio
- Disk I/O (read/write throughput, IOPS)
- Query performance
- Latency distributions (avg, p95, p99)
- EXPLAIN (ANALYZE, BUFFERS) to understand query paths
- Index health
- Size on disk
- Growth over time
- Rebuild times in staging
Some systems or integrations may expose additional pgvectorscale‑specific metrics (e.g., recall estimates, internal cache hit rates). Use them where available.
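For index size and growth, standard catalog functions are enough (the index name here is the illustrative one used earlier):

-- Current on-disk size of the ANN index and of the table as a whole
SELECT pg_size_pretty(pg_relation_size('documents_embedding_diskann_idx')) AS index_size,
       pg_size_pretty(pg_total_relation_size('documents'))                 AS table_total_size;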
9. Comparing pgvectorscale to Alternative Approaches
It’s helpful to place pgvectorscale in context.
9.1 pgvectorscale vs “plain” pgvector indexes
Plain pgvector:
- Simpler setup.
- Good for small to medium‑size datasets.
- Offers ANN index methods (IVFFlat, HNSW), but typically without aggressive compression.
pgvectorscale (on top of pgvector):
- Introduces compressed, scale‑oriented indexes.
- Targets much larger datasets while staying in Postgres.
- More tuning knobs and complexity.
Rough guidelines:
- Start with pgvector alone.
- Move to pgvectorscale when:
- Dataset size or query load makes performance insufficient,
- Or memory footprint becomes problematic.
9.2 pgvectorscale vs external vector databases
External vector DBs (e.g., Pinecone, Weaviate, Qdrant, Milvus):
- Pros:
- Designed from the ground up for vector workloads.
- Often simpler horizontal scaling.
- Rich operational tooling and observability for vector search.
- Cons:
- Another system to deploy, patch, monitor, back up, and secure.
- Network hops between Postgres and vector DB.
- Potential consistency and dual‑write issues.
pgvectorscale:
- Pros:
- Single database (Postgres) for structured data and embeddings.
- Transactional consistency between vector and relational data.
- Simpler dev and ops story for many teams.
- Cons:
- You inherit Postgres’s scaling characteristics.
- Tuning may be more involved to reach the same throughput/latency as a specialized system at extreme scale.
The right choice depends on:
- Your scale (data size, QPS)
- Your team’s operational tolerance for extra systems
- How important transactional guarantees and simplicity are to you
10. Best Practices for Using pgvectorscale Effectively
10.1 Design with the right embedding model and dimension
- Choose an embedding model that:
- Matches your domain (code, documents, images, etc.).
- Doesn’t have an unnecessarily large dimension.
- Larger dimensions improve expressiveness but increase:
  - Storage
  - Index build time
  - Query cost
- If possible, keep dimensions in the few hundred to low thousands range.
10.2 Normalize embeddings consistently
- If you plan to use cosine similarity:
- Normalize embeddings to unit length in your application before inserting.
- Consistent preprocessing:
- Tokenization
- Casing
- Language filtering
- This matters more than small differences in index tuning.
10.3 Partition data logically
For very large datasets:
- Consider table partitioning (e.g., by tenant, region, or time).
- Each partition can have its own:
- pgvectorscale index
- Maintenance and rebuild schedule
- Benefits:
- Smaller indexes per partition.
- Improved cache locality.
- Easier rolling maintenance.
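A minimal sketch of this layout, assuming a tenant_id column and list partitioning (table and index names are illustrative, and the diskann access method name follows the earlier, version-dependent example):

-- Partitioned parent; the partition key must be part of the primary key
CREATE TABLE documents_by_tenant (
    id         bigserial,
    tenant_id  int NOT NULL,
    title      text,
    body       text,
    embedding  vector(1536),
    PRIMARY KEY (id, tenant_id)
) PARTITION BY LIST (tenant_id);

-- One partition per tenant, each with its own ANN index
CREATE TABLE documents_tenant_1 PARTITION OF documents_by_tenant FOR VALUES IN (1);
CREATE INDEX ON documents_tenant_1 USING diskann (embedding vector_cosine_ops);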
10.4 Use hybrid scoring when appropriate
Don’t rely solely on vector distance if your use case can benefit from:
- Keyword matching (exact or fuzzy)
- Metadata filters (author, date, tags)
- Business rules (boost certain categories)
Combine scores:
- Use CTEs or subqueries to compute separate scores.
- Normalize and combine them in a final ORDER BY.
This keeps vector search as one signal among many, which often improves relevance.
10.5 Test recall and user‑perceived quality
Technical metrics (recall@k, latency) are important, but they are proxies for one thing:
Do users find the results relevant?
Run:
- A/B tests where feasible.
- Human evaluation rounds on typical queries.
- Offline analysis using labeled datasets (if you have them).
Tune pgvectorscale parameters with both technical and user‑centric metrics in mind.
11. Summary
pgvectorscale is an important step in the evolution of Postgres‑native vector search:
- It builds on pgvector to provide scalable, compressed index structures for embeddings.
- It aims to let you store and query very large collections of vectors without leaving Postgres.
- It gives you tunable trade‑offs between:
- Recall
- Latency
- Memory and storage
- It keeps the operational story simple:
- Single database for structured data and vectors
- Familiar tools for backup, replication, and security
You should consider pgvectorscale when:
- You’re hitting the scaling limits of plain pgvector.
- You want to avoid adding a specialized vector database.
- You’re willing to invest in tuning for your workload.
At the same time, it’s not a silver bullet:
- For small datasets, it may be unnecessary complexity.
- For extreme scale or highly specialized needs, dedicated vector databases may still make sense.
Used thoughtfully, pgvectorscale allows many teams to push Postgres significantly further as the central engine for both traditional relational workloads and modern AI‑driven vector search.
12. Further Resources
To go deeper and get the latest, consult:
- The pgvectorscale GitHub repository
- Source code
- Installation instructions
- Examples and reference configuration options
- The pgvector documentation
- Vector type definition
- Similarity operators and metrics
- Basic indexing strategies
- Blog posts and talks from:
- The maintainers of pgvectorscale
- Postgres and vector‑search communities
Because pgvectorscale is actively evolving, always rely on the official docs for:
- Exact index creation syntax
- Supported operator classes and distance metrics
- Recommended tuning parameters for your data size and embedding dimension
When planning a deployment, start from the specifics of your workload (data size, QPS, latency targets, embedding model) and build a concrete pgvectorscale configuration and benchmarking plan around them.