A Hybrid-Search Recipe With pgvector + pg_trgm
A dedicated vector DB is overkill for most RAG workloads. pgvector + pg_trgm + reciprocal-rank fusion gets you 90% of the quality at zero extra infra.
Schema
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE kb_chunks (
id uuid PRIMARY KEY,
source_kind text NOT NULL,
source_id text NOT NULL,
content text NOT NULL,
embedding vector(1024) NOT NULL,
tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);
CREATE INDEX kb_chunks_emb_hnsw_idx ON kb_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX kb_chunks_tsv_gin_idx ON kb_chunks USING gin (tsv);
The query
WITH sem AS (
SELECT id, 1 - (embedding <=> :q_vec) AS score
FROM kb_chunks ORDER BY embedding <=> :q_vec LIMIT 50
),
kw AS (
SELECT id, ts_rank_cd(tsv, plainto_tsquery(:q_text)) AS score
FROM kb_chunks WHERE tsv @@ plainto_tsquery(:q_text) LIMIT 50
)
SELECT c.id, c.content,
COALESCE(1.0 / (60 + sem_rank.rank), 0) + COALESCE(1.0 / (60 + kw_rank.rank), 0) AS fused
FROM kb_chunks c
LEFT JOIN (SELECT id, rank() OVER (ORDER BY score DESC) FROM sem) sem_rank USING (id)
LEFT JOIN (SELECT id, rank() OVER (ORDER BY score DESC) FROM kw) kw_rank USING (id)
WHERE c.id IN (SELECT id FROM sem UNION SELECT id FROM kw)
ORDER BY fused DESC
LIMIT 8;
Reciprocal rank fusion (the 1 / (60 + rank) trick) smooths across the two score distributions without needing to normalise them.
Chunking notes
- 400–800 tokens per chunk for prose.
- Respect section boundaries — a chunk cut mid-sentence confuses both the embedder and the reader.
- Attach a ~30-token
short_descriptionper chunk to thekb_chunks.metadata. The agent's citation quality lives and dies on that summary.