A Hybrid-Search Recipe With pgvector + pg_trgm

A dedicated vector DB is overkill for most RAG workloads. pgvector + pg_trgm + reciprocal-rank fusion gets you 90% of the quality at zero extra infra.

Schema

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE kb_chunks (
  id          uuid PRIMARY KEY,
  source_kind text NOT NULL,
  source_id   text NOT NULL,
  content     text NOT NULL,
  embedding   vector(1024) NOT NULL,
  tsv         tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX kb_chunks_emb_hnsw_idx ON kb_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX kb_chunks_tsv_gin_idx  ON kb_chunks USING gin  (tsv);

The query

WITH sem AS (
  SELECT id, 1 - (embedding <=> :q_vec) AS score
  FROM kb_chunks ORDER BY embedding <=> :q_vec LIMIT 50
),
kw AS (
  SELECT id, ts_rank_cd(tsv, plainto_tsquery(:q_text)) AS score
  FROM kb_chunks WHERE tsv @@ plainto_tsquery(:q_text) LIMIT 50
)
SELECT c.id, c.content,
  COALESCE(1.0 / (60 + sem_rank.rank), 0) + COALESCE(1.0 / (60 + kw_rank.rank), 0) AS fused
FROM kb_chunks c
LEFT JOIN (SELECT id, rank() OVER (ORDER BY score DESC) FROM sem) sem_rank USING (id)
LEFT JOIN (SELECT id, rank() OVER (ORDER BY score DESC) FROM kw) kw_rank  USING (id)
WHERE c.id IN (SELECT id FROM sem UNION SELECT id FROM kw)
ORDER BY fused DESC
LIMIT 8;

Reciprocal rank fusion (the 1 / (60 + rank) trick) smooths across the two score distributions without needing to normalise them.

Chunking notes

400–800 tokens per chunk for prose.
Respect section boundaries — a chunk cut mid-sentence confuses both the embedder and the reader.
Attach a ~30-token short_description per chunk to the kb_chunks.metadata. The agent's citation quality lives and dies on that summary.