RAG & Search — Retrieval & Ranking

This page defines the retrieval contracts and ranking strategy for:

lexical (FTS) retrieval
vector (pgvector) retrieval
hybrid fusion (recommended)

Retrieval flow (hybrid)

flowchart TD
  Query[UserQuery] --> Lexical[FTS_plainto_tsquery]
  Query --> VecEmbed[QueryEmbedding]
  VecEmbed --> Vector[pgvector_KNN]
  Lexical --> Fuse[RRF_Fusion]
  Vector --> Fuse
  Fuse --> TopK[TopK_Candidates]

Query embedding

Rules:

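Embed queries with the same embedding model and version as your indexed content; vectors from different models are not comparable.
Normalize query text (trim, collapse whitespace), but do not over-sanitize: stripping punctuation, stopwords, or casing can change what the query means.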
Cache query embeddings per request when multiple vector searches are performed.
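A minimal sketch of the normalization and per-request caching rules. `normalize_query`, `RequestEmbeddingCache`, and the `embed_fn` callable are hypothetical names; `embed_fn` stands in for whatever embedding client you use and is assumed to call the same model as the indexed content.

```python
import re
from typing import Callable, Dict, List

Vector = List[float]


def normalize_query(text: str) -> str:
    # Trim and collapse runs of whitespace only; punctuation, case, and
    # stopwords are kept so the query keeps its meaning.
    return re.sub(r"\s+", " ", text).strip()


class RequestEmbeddingCache:
    """Per-request cache: create one instance per request, then discard it."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        # embed_fn is a hypothetical stand-in for your embedding client.
        self._embed_fn = embed_fn
        self._cache: Dict[str, Vector] = {}
        self.calls = 0  # number of real embedding calls made

    def embed(self, query: str) -> Vector:
        key = normalize_query(query)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._embed_fn(key)
        return self._cache[key]
```

Keying the cache on the normalized text means two vector searches in the same request that differ only in whitespace share one embedding call. A fresh cache per request keeps it from growing without bound or leaking across users.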

Vector retrieval (pgvector)

Similarity operator

Pick one similarity metric for a system and keep it consistent:

<-> (L2 distance)
<=> (cosine distance)

Standard:

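document which operator is used, build the vector index with the matching operator class (vector_l2_ops for <->, vector_cosine_ops for <=>) so queries can use it, and confirm the embedding model is meant to be compared with that metric.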
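To make the choice concrete, here is a plain-Python sketch (not pgvector itself) of the two quantities these operators compute: `<->` returns Euclidean distance and `<=>` returns cosine distance, 1 minus cosine similarity. For unit-length vectors the squared L2 distance equals twice the cosine distance, so both operators produce the same ordering; for unnormalized vectors they can disagree, which is why the metric must be fixed per system. The sample vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # The quantity pgvector's <-> operator returns: Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # The quantity pgvector's <=> operator returns: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


query = [1.0, 0.0]
docs: Dict[str, List[float]] = {"a": [10.0, 1.0], "b": [0.9, 0.5]}

# Raw vectors: "b" is closer in L2, "a" is closer in angle.
by_l2 = sorted(docs, key=lambda d: l2_distance(query, docs[d]))       # ['b', 'a']
by_cos = sorted(docs, key=lambda d: cosine_distance(query, docs[d]))  # ['a', 'b']

# Unit vectors: ||u - v||^2 == 2 * cosine_distance(u, v), so the orders agree.
unit_query = normalize(query)
unit_docs = {d: normalize(v) for d, v in docs.items()}
by_l2_unit = sorted(unit_docs, key=lambda d: l2_distance(unit_query, unit_docs[d]))
by_cos_unit = sorted(unit_docs, key=lambda d: cosine_distance(unit_query, unit_docs[d]))
# both ['a', 'b']
```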

Minimal SQL pattern (KNN)

Example (scope-filtered KNN over chunks):

SELECT id, canonical_source, title, summary, content
FROM rag_chunks
WHERE scope_id = :scope_id
ORDER BY embedding <=> :query_embedding
LIMIT :top_k_vector;
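For tests, a brute-force reference implementation of the query above can help: it reproduces the filter, ordering, and limit exactly, so it can check results on small fixtures or estimate the recall of an approximate index (HNSW or IVFFlat), which trades some accuracy for speed. The `Chunk` type, `exact_knn`, and `recall_at_k` are hypothetical helpers, and cosine distance is assumed to match the `<=>` operator used in the SQL.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    # Hypothetical in-memory mirror of the rag_chunks columns the query uses.
    id: str
    scope_id: str
    embedding: List[float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Same quantity as pgvector's <=> operator: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def exact_knn(chunks: List[Chunk], scope_id: str,
              query_embedding: Sequence[float], top_k_vector: int) -> List[str]:
    # WHERE scope_id = :scope_id
    candidates = [c for c in chunks if c.scope_id == scope_id]
    # ORDER BY embedding <=> :query_embedding
    candidates.sort(key=lambda c: cosine_distance(c.embedding, query_embedding))
    # LIMIT :top_k_vector
    return [c.id for c in candidates[:top_k_vector]]


def recall_at_k(approx_ids: List[str], exact_ids: List[str]) -> float:
    # Share of the exact top-k that the indexed (approximate) query also returned.
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

pgvector's documentation notes that with approximate indexes the WHERE clause is applied after the index scan, so a selective scope_id filter can also return fewer than top_k_vector rows; comparing against an exact reference like this one is a way to detect that.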

Guardrails and knobs

Define and document:

top_k_vector: how many candidates to retrieve from vector search (e.g., 20–100)
max query length, and max number of embedding requests per user request
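A minimal sketch of enforcing these guardrails per request. `VectorGuardrails`, `RequestGuard`, and `GuardrailError` are hypothetical names, and the default numbers are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorGuardrails:
    # Placeholder values (hypothetical); choose and document your own.
    top_k_vector: int = 50
    max_query_chars: int = 2000
    max_embedding_requests: int = 3


class GuardrailError(ValueError):
    """Raised when a request exceeds a documented retrieval limit."""


class RequestGuard:
    """Tracks one request's usage; create a new instance per request."""

    def __init__(self, limits: VectorGuardrails) -> None:
        self.limits = limits
        self.embedding_requests = 0

    def clamp_top_k(self, requested: int) -> int:
        # Never fetch more vector candidates than the documented ceiling.
        return max(1, min(requested, self.limits.top_k_vector))

    def check_query(self, query: str) -> str:
        # Reject instead of truncating, so the query is never silently cut.
        if len(query) > self.limits.max_query_chars:
            raise GuardrailError(
                f"query is longer than {self.limits.max_query_chars} characters")
        return query

    def reserve_embedding_request(self) -> None:
        # Call before each embedding request; fails once the limit is used up.
        if self.embedding_requests >= self.limits.max_embedding_requests:
            raise GuardrailError("embedding request limit reached for this request")
        self.embedding_requests += 1
```

This version rejects an over-long query rather than truncating it, since truncation can drop the part of the query that matters. In tool contexts the caller is expected to catch `GuardrailError` and return a safe message.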

Lexical retrieval (Postgres FTS)

Search vector composition

Standards:

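build search_vector from title + summary + content (and optionally selected metadata fields)
wrap each field in coalesce(field, '') so a single NULL field does not turn the whole concatenation into NULL
choose a language configuration (commonly english) and use the same one when building search_vector and when parsing queries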

Query type

Default:

plainto_tsquery for user-entered text

Use more advanced tsquery composition only when you have a clear UX requirement and tests.

Ranking

Default:

ts_rank (or ts_rank_cd, which also rewards query terms that appear close together) to sort lexical matches

Minimal SQL pattern (FTS)

Example (scope-filtered FTS over stored search_vector):

SELECT id, canonical_source, title, summary, content,
       ts_rank(search_vector, plainto_tsquery('english', :q)) AS rank
FROM rag_chunks
WHERE scope_id = :scope_id
  AND search_vector @@ plainto_tsquery('english', :q)
ORDER BY rank DESC
LIMIT :top_k_lexical;

Hybrid fusion (recommended): Reciprocal Rank Fusion (RRF)

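RRF is robust and simple: it merges ranked lists from multiple retrieval channels using only each item's rank, so vector distances and ts_rank scores, which are on incomparable scales, never need to be normalized against each other.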

Standard knobs

Define and document the following per system:

top_k_vector: number of vector candidates
top_k_lexical: number of lexical candidates
final_k: number of fused candidates to return
rrf_k: the RRF constant (commonly 60)
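One way to keep these knobs documented and validated in a single place, sketched as a frozen Python dataclass. `HybridConfig` and its defaults (apart from the commonly used rrf_k = 60) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    # Placeholder defaults except rrf_k; tune and document per system.
    top_k_vector: int = 50
    top_k_lexical: int = 50
    final_k: int = 8
    rrf_k: int = 60

    def __post_init__(self) -> None:
        for name in ("top_k_vector", "top_k_lexical", "final_k"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")
        if self.rrf_k < 0:
            raise ValueError("rrf_k must be >= 0")
        # Fusion can never return more distinct ids than both lists hold together.
        if self.final_k > self.top_k_vector + self.top_k_lexical:
            raise ValueError("final_k exceeds top_k_vector + top_k_lexical")
```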

RRF definition

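Given a document's 1-based rank r in one list and the RRF constant k (rrf_k), its contribution from that list is:

\[ \text{contribution} = \frac{1}{k + r} \]

The final score is the sum of these contributions over every list that contains the document; a list that does not contain it contributes nothing:

\[ \text{score}(d) = \sum_{\ell \ni d} \frac{1}{k + r_\ell(d)} \]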
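A worked example with the common rrf_k = 60, comparing a chunk ranked 1st by vector search and 3rd by lexical search against a chunk ranked 1st by vector search only (the ranks are illustrative):

```latex
\begin{aligned}
\text{both channels:} \quad & \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.03227 \\
\text{vector only:}   \quad & \frac{1}{60 + 1} \approx 0.01639
\end{aligned}
```

Agreement between channels roughly doubles a chunk's score, while moving from rank 1 to rank 3 within a single list changes its contribution only slightly (1/61 versus 1/63). A large k is what makes fusion favor consensus over any one channel's top pick.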

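Minimal fusion implementation (Python)

from collections import defaultdict

def rrf_fuse(ranked_lists, rrf_k=60, final_k=10):
    # Each list holds chunk ids, best first; ranks are 1-based.
    scores = defaultdict(float)
    for ranked_list in ranked_lists:
        for rank, chunk_id in enumerate(ranked_list, start=1):
            scores[chunk_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores, key=scores.get, reverse=True)[:final_k]

final_ids = rrf_fuse([vector_ids, lexical_ids], rrf_k, final_k)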

Failure and degradation

Standards:

if one channel fails, return results from the remaining channel
if both fail, return an empty list with a safe message (do not raise in tool contexts)
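A sketch of these degradation rules around RRF. The channel callables `vector_search` and `lexical_search`, the function `hybrid_search`, and the message text are hypothetical; each channel is assumed to return chunk ids in rank order. `_rrf` repeats the fusion step so the block is self-contained.

```python
import logging
from collections import defaultdict
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)

NO_RESULTS_MESSAGE = "No reference documents could be retrieved for this query."


def _rrf(ranked_lists: List[List[str]], rrf_k: int, final_k: int) -> List[str]:
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:final_k]


def hybrid_search(
    query: str,
    vector_search: Callable[[str], List[str]],
    lexical_search: Callable[[str], List[str]],
    rrf_k: int = 60,
    final_k: int = 8,
) -> Tuple[List[str], str]:
    """Return (fused_ids, message); a failing channel never raises out of here."""
    results: List[List[str]] = []
    for name, channel in (("vector", vector_search), ("lexical", lexical_search)):
        try:
            results.append(channel(query))
        except Exception:  # degrade to the remaining channel instead of failing
            logger.exception("%s retrieval failed; continuing without it", name)
    fused = _rrf(results, rrf_k, final_k) if results else []
    return fused, ("" if fused else NO_RESULTS_MESSAGE)
```

With only one surviving list, RRF preserves that list's order, so the degraded result is simply the remaining channel's ranking truncated to final_k.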

Context formatting for RAG

RAG retrieval must output bounded, structured context.

Required fields per reference block

Each returned chunk must include:

title (chunk title or page title)
canonical source (URL or stable ID)
optional summary
optional contextual prefix (from metadata) when available
content

Reference block template (generic)

=== REFERENCE DOCUMENT ===
TITLE: <title>
CANONICAL SOURCE: <canonical_source>
IMPORTANT: When referencing this content, ALWAYS cite the CANONICAL SOURCE

Summary: <optional_summary>
Context: <optional_contextual_prefix>

CONTENT:
<chunk_content>
=== END REFERENCE DOCUMENT ===
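A minimal sketch that renders the template above from a chunk record. `ReferenceChunk` and `format_reference_block` are hypothetical names; omitting the Summary and Context lines when the value is missing is an assumption, since the template does not say how absent optional fields should appear.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReferenceChunk:
    # Field names are assumptions; map them from your rag_chunks row.
    title: str
    canonical_source: str
    content: str
    summary: Optional[str] = None
    contextual_prefix: Optional[str] = None


def format_reference_block(chunk: ReferenceChunk) -> str:
    lines: List[str] = [
        "=== REFERENCE DOCUMENT ===",
        f"TITLE: {chunk.title}",
        f"CANONICAL SOURCE: {chunk.canonical_source}",
        "IMPORTANT: When referencing this content, ALWAYS cite the CANONICAL SOURCE",
        "",
    ]
    # Optional fields: the whole line is omitted when the value is missing.
    optional: List[str] = []
    if chunk.summary:
        optional.append(f"Summary: {chunk.summary}")
    if chunk.contextual_prefix:
        optional.append(f"Context: {chunk.contextual_prefix}")
    if optional:
        lines += optional + [""]
    lines += ["CONTENT:", chunk.content, "=== END REFERENCE DOCUMENT ==="]
    return "\n".join(lines)
```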

Context budgeting

Define and enforce:

a maximum number of chunks (final_k)
a maximum total size of returned context, in characters or tokens

When over budget:

prefer trimming number of chunks over truncating mid-chunk
if truncating, truncate at paragraph boundaries when possible
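A sketch of these budgeting rules, applied to chunk text before it is wrapped in reference blocks (truncating a formatted block would cut off its END marker). Characters stand in for tokens here; a token budget would replace `len()` with a tokenizer count. `fit_to_budget` and `truncate_at_paragraph` are hypothetical names, and paragraphs are assumed to be separated by blank lines.

```python
from typing import List


def fit_to_budget(texts: List[str], final_k: int, max_chars: int,
                  sep: str = "\n\n") -> List[str]:
    """Keep whole chunks in rank order until final_k or max_chars would be exceeded."""
    kept: List[str] = []
    used = 0
    for text in texts[:final_k]:
        cost = len(text) + (len(sep) if kept else 0)
        if used + cost > max_chars:
            break  # drop the remaining chunks rather than cut this one
        kept.append(text)
        used += cost
    if not kept and texts and final_k > 0:
        # Even the top chunk is too large: truncate it at a paragraph boundary.
        kept.append(truncate_at_paragraph(texts[0], max_chars))
    return kept


def truncate_at_paragraph(text: str, max_chars: int) -> str:
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    cut = head.rfind("\n\n")
    # Fall back to a hard cut only when no paragraph break fits in the budget.
    return head[:cut] if cut > 0 else head
```

The loop stops at the first chunk that does not fit instead of skipping ahead to smaller, lower-ranked chunks, which keeps the returned set a strict prefix of the fused ranking. Truncation happens only when even the top chunk exceeds the budget.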

Tool-call budgeting (AI agent integrations)

If retrieval is exposed as an AI tool:

enforce a per-request “soft cap” on retrieval tool calls
when exhausted, return a message instructing the agent to proceed without more lookups
tools must not fail when dependencies are missing (no DB session, no scope_id); return a safe message instead of raising
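A sketch of a retrieval tool that follows all three rules. `ToolContext`, `retrieval_tool`, the injected `search` callable, and the message strings are hypothetical; a new `ToolContext` is assumed to be created per request so the counter resets. The cap is "soft" in the sense that exceeding it returns an instruction to the agent rather than raising an error.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

BUDGET_EXHAUSTED = (
    "Retrieval limit reached for this request. Proceed with the information "
    "already gathered; do not call this tool again."
)
UNAVAILABLE = "Retrieval is unavailable right now; proceed without reference documents."
NO_MATCHES = "No reference documents matched this query."


@dataclass
class ToolContext:
    # Hypothetical per-request state; names are illustrative.
    db_session: Optional[Any]
    scope_id: Optional[str]
    max_calls: int = 3
    calls_made: int = 0


def retrieval_tool(ctx: ToolContext, query: str,
                   search: Callable[[Any, str, str], List[str]]) -> str:
    """Agent-facing tool: always returns a string and never raises."""
    if ctx.db_session is None or not ctx.scope_id:
        return UNAVAILABLE  # missing dependencies: safe message, no exception
    if ctx.calls_made >= ctx.max_calls:
        return BUDGET_EXHAUSTED  # soft cap reached
    ctx.calls_made += 1
    try:
        results = search(ctx.db_session, ctx.scope_id, query)
    except Exception:
        return UNAVAILABLE
    return "\n\n".join(results) if results else NO_MATCHES
```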