RAG & Search — Retrieval & Ranking
This page defines the retrieval contracts and ranking strategy for:
- lexical (FTS) retrieval
- vector (pgvector) retrieval
- hybrid fusion (recommended)
Retrieval flow (hybrid)
flowchart TD
Query[UserQuery] --> Lexical[FTS_plainto_tsquery]
Query --> VecEmbed[QueryEmbedding]
VecEmbed --> Vector[pgvector_KNN]
Lexical --> Fuse[RRF_Fusion]
Vector --> Fuse
Fuse --> TopK[TopK_Candidates]
Query embedding
Rules:
- Embed queries with the same embedding model and version family used for the indexed content; vectors from different models are not comparable.
- Normalize query text (trim, collapse whitespace); do not over-sanitize (don’t remove meaning).
- Cache query embeddings per request when multiple vector searches are performed.
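The rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: `normalize_query`, `RequestEmbeddingCache`, and the `embed_fn` callable are hypothetical names, and the actual embedding call depends on your provider.

```python
import re
from typing import Callable, Dict, List


def normalize_query(text: str) -> str:
    """Trim and collapse runs of whitespace; leave every other character intact."""
    return re.sub(r"\s+", " ", text).strip()


class RequestEmbeddingCache:
    """Per-request cache: create one instance per incoming request."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn  # hypothetical provider call (assumption)
        self._cache: Dict[str, List[float]] = {}

    def get(self, raw_query: str) -> List[float]:
        # Key on the normalized text so trivially different inputs share one embedding.
        key = normalize_query(raw_query)
        if key not in self._cache:
            self._cache[key] = self._embed_fn(key)
        return self._cache[key]
```

Scoping the cache to a single request keeps it from outliving a change of embedding version.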
Vector retrieval (pgvector)
Similarity operator
Pick one similarity metric for a system and keep it consistent:
- <-> (L2 distance)
- <=> (cosine distance)
Standard:
- document which operator is used and ensure embeddings are generated accordingly.
Minimal SQL pattern (KNN)
Example (scope-filtered KNN over chunks):
SELECT id, canonical_source, title, summary, content
FROM rag_chunks
WHERE scope_id = :scope_id
ORDER BY embedding <=> :query_embedding
LIMIT :top_k_vector;
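One way to supply the bind parameters for the query above is to render the embedding in pgvector's text input format (`[x1,x2,...]`). The sketch below assumes that approach; `to_vector_literal` and `knn_params` are hypothetical helper names. Depending on the driver, the parameter may also need an explicit cast such as `CAST(:query_embedding AS vector)`, or a driver-level adapter may be used instead.

```python
import math
from typing import Dict, Sequence, Union


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Render an embedding in pgvector's text input format, e.g. '[0.5,1.0]'."""
    if not embedding:
        raise ValueError("embedding must not be empty")
    values = [float(x) for x in embedding]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding must contain only finite numbers")
    return "[" + ",".join(repr(v) for v in values) + "]"


def knn_params(
    scope_id: str, embedding: Sequence[float], top_k_vector: int
) -> Dict[str, Union[str, int]]:
    """Build the bind parameters named in the KNN SQL pattern."""
    return {
        "scope_id": scope_id,
        "query_embedding": to_vector_literal(embedding),
        "top_k_vector": top_k_vector,
    }
```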
Guardrails and knobs
Define and document:
- top_k_vector: how many candidates to retrieve from vector search (e.g., 20–100)
- max query length / max embedding requests per request
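A minimal sketch of how these knobs might be enforced in application code. All names (`RetrievalLimits`, `clamp_top_k`, `enforce_query_length`, `EmbeddingRequestBudget`) and all default values are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalLimits:
    # Default values are placeholders chosen for the example only.
    min_top_k_vector: int = 1
    max_top_k_vector: int = 100
    max_query_chars: int = 2000
    max_embedding_requests: int = 3


def clamp_top_k(requested: int, limits: RetrievalLimits) -> int:
    """Force a caller-supplied top_k_vector into the documented range."""
    return max(limits.min_top_k_vector, min(requested, limits.max_top_k_vector))


def enforce_query_length(query: str, limits: RetrievalLimits) -> str:
    """Cut overly long queries before they reach the embedding provider."""
    return query[: limits.max_query_chars]


class EmbeddingRequestBudget:
    """Counts embedding calls within one request."""

    def __init__(self, limits: RetrievalLimits) -> None:
        self._remaining = limits.max_embedding_requests

    def try_consume(self) -> bool:
        """Return True and spend one unit if budget remains, else False."""
        if self._remaining <= 0:
            return False
        self._remaining -= 1
        return True
```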
Lexical retrieval (Postgres FTS)
Search vector composition
Standards:
- build search_vector from title + summary + content (and optionally selected metadata fields)
- use coalesce(...) to avoid null issues
- choose a language configuration (commonly english) and keep it consistent
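To illustrate the composition standards, the sketch below builds a null-safe `to_tsvector(...)` SQL expression from a list of column names. The helper name `search_vector_expression` and the identifier check are assumptions made for this example; where the expression ends up (a generated column, a trigger, or an explicit UPDATE) is a separate design decision.

```python
import re
from typing import Sequence

# Conservative whitelist for column and configuration names (assumption).
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def search_vector_expression(fields: Sequence[str], language: str = "english") -> str:
    """Return a to_tsvector(...) expression over null-safe, space-joined fields."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in (*fields, language):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # coalesce(field, '') keeps a single NULL column from nulling the whole vector.
    joined = " || ' ' || ".join(f"coalesce({field}, '')" for field in fields)
    return f"to_tsvector('{language}', {joined})"
```

Passing the language configuration explicitly, rather than relying on the session default, helps keep the stored vector and the query-side `plainto_tsquery` call consistent.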
Query type
Default:
- plainto_tsquery for user-entered text
Use more advanced tsquery composition only when you have a clear UX requirement and tests.
Ranking
Default:
- ts_rank (or ts_rank_cd) to sort lexical matches
Minimal SQL pattern (FTS)
Example (scope-filtered FTS over stored search_vector):
SELECT id, canonical_source, title, summary, content,
ts_rank(search_vector, plainto_tsquery('english', :q)) AS rank
FROM rag_chunks
WHERE scope_id = :scope_id
AND search_vector @@ plainto_tsquery('english', :q)
ORDER BY rank DESC
LIMIT :top_k_lexical;
Hybrid fusion (recommended): Reciprocal Rank Fusion (RRF)
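RRF is robust and simple: it merges ranked lists from multiple retrieval channels using only each item's rank position, so the channels' raw scores (distances, ts_rank values) never need to be normalized against each other.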
Standard knobs
Define and document the following per system:
- top_k_vector: number of vector candidates
- top_k_lexical: number of lexical candidates
- final_k: number of fused candidates to return
- rrf_k: the RRF constant (commonly 60)
RRF definition
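Given a document’s 1-based rank \(r\) in a list, its contribution is:
\[ \text{score} = \frac{1}{k + r} \]
where \(k\) is the rrf_k constant. The final score is the sum of contributions across all lists in which the document appears.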
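As a worked illustration (the ranks are made up for the example), with \(k = 60\) a document ranked 1st by vector search and 3rd by lexical search receives:

```latex
\[
\text{score} = \frac{1}{60 + 1} + \frac{1}{60 + 3}
             = \frac{1}{61} + \frac{1}{63}
             \approx 0.0164 + 0.0159
             = 0.0323
\]
```

A document ranked 1st in only one list would score about 0.0164, so appearing in both lists outweighs a single top position.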
Minimal fusion implementation (pseudocode)
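from collections import defaultdict

scores = defaultdict(float)  # missing ids start at 0.0
for ranked_list in [vector_ids, lexical_ids]:
    # rank is 1-based, matching the RRF definition
    for rank, doc_id in enumerate(ranked_list, start=1):
        scores[doc_id] += 1 / (rrf_k + rank)
# highest fused score first; keep the top final_k ids
final_ids = sorted(scores, key=scores.get, reverse=True)[:final_k]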
Failure and degradation
Standards:
- if one channel fails, return results from the remaining channel
- if both fail, return an empty list with a safe message (do not raise in tool contexts)
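The degradation rules can be sketched as a fusion wrapper that tolerates channel failures. This is one possible shape, not a prescribed API: `fuse_with_degradation`, the `channels` mapping of zero-argument callables, and the message strings are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def fuse_with_degradation(
    channels: Dict[str, Callable[[], List[str]]],
    final_k: int,
    rrf_k: int = 60,
) -> Tuple[List[str], str]:
    """Run each channel, skip the ones that fail, and RRF-fuse the rest.

    Returns (ids, message) and never raises on channel failure.
    """
    ranked_lists: List[List[str]] = []
    failed: List[str] = []
    for name, fetch in channels.items():
        try:
            ranked_lists.append(fetch())
        except Exception:  # broad on purpose: tool contexts must not raise
            failed.append(name)

    if not ranked_lists:
        return [], "Search is temporarily unavailable."

    scores: Dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)

    # Score descending, then id ascending so ties are deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    message = f"degraded: {', '.join(failed)} unavailable" if failed else "ok"
    return ordered[:final_k], message
```

With a single surviving channel, RRF reduces to that channel's own ordering, so no special-case code path is needed.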
Context formatting for RAG
RAG retrieval must output bounded, structured context.
Required fields per reference block
Each returned chunk must include:
- title (chunk title or page title)
- canonical source (URL or stable ID)
- optional summary
- optional contextual prefix (from metadata) when available
- content
Reference block template (generic)
=== REFERENCE DOCUMENT ===
TITLE: <title>
CANONICAL SOURCE: <canonical_source>
IMPORTANT: When referencing this content, ALWAYS cite the CANONICAL SOURCE
Summary: <optional_summary>
Context: <optional_contextual_prefix>
CONTENT:
<chunk_content>
=== END REFERENCE DOCUMENT ===
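A minimal formatter for the template above might look like this. The function name `format_reference_block` is hypothetical, and omitting the Summary and Context lines when those fields are absent is an assumption; the template itself does not say how empty optional fields should be rendered.

```python
from typing import Optional


def format_reference_block(
    title: str,
    canonical_source: str,
    content: str,
    summary: Optional[str] = None,
    contextual_prefix: Optional[str] = None,
) -> str:
    """Render one chunk as a reference block following the generic template."""
    lines = [
        "=== REFERENCE DOCUMENT ===",
        f"TITLE: {title}",
        f"CANONICAL SOURCE: {canonical_source}",
        "IMPORTANT: When referencing this content, ALWAYS cite the CANONICAL SOURCE",
    ]
    # Optional fields are skipped entirely when missing (assumption).
    if summary:
        lines.append(f"Summary: {summary}")
    if contextual_prefix:
        lines.append(f"Context: {contextual_prefix}")
    lines += ["CONTENT:", content, "=== END REFERENCE DOCUMENT ==="]
    return "\n".join(lines)
```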
Context budgeting
Define and enforce:
- final_k: maximum number of chunks
- max total characters/tokens of returned context
When over budget:
- prefer trimming number of chunks over truncating mid-chunk
- if truncating, truncate at paragraph boundaries when possible
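The budgeting policy can be sketched as follows. The sketch counts characters only (a token budget would need a tokenizer) and treats a blank line as the paragraph boundary; both choices, along with the names `truncate_at_paragraph` and `apply_context_budget`, are assumptions.

```python
from typing import List


def truncate_at_paragraph(text: str, max_chars: int) -> str:
    """Cut text to max_chars, preferring the last blank-line paragraph break."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    cut = head.rfind("\n\n")
    # Fall back to a hard cut when no paragraph break fits in the budget.
    return head[:cut] if cut > 0 else head


def apply_context_budget(chunks: List[str], final_k: int, max_chars: int) -> List[str]:
    """Keep whole chunks in rank order until the character budget is spent."""
    kept: List[str] = []
    used = 0
    for chunk in chunks[:final_k]:
        if used + len(chunk) > max_chars:
            break  # drop this and all lower-ranked chunks rather than cut mid-chunk
        kept.append(chunk)
        used += len(chunk)
    # Truncate only when even the top-ranked chunk does not fit.
    if not kept and chunks and final_k > 0 and max_chars > 0:
        kept.append(truncate_at_paragraph(chunks[0], max_chars))
    return kept
```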
Tool-call budgeting (AI agent integrations)
If retrieval is exposed as an AI tool:
- enforce a per-request “soft cap” on retrieval tool calls
- when exhausted, return a message instructing the agent to proceed without more lookups
- tools must be safe under missing dependencies (no DB session, no scope_id)
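A sketch of a tool wrapper that applies these three rules. `RetrievalTool`, the `search_fn(scope_id, query)` signature, the default cap of 3, and the message wording are hypothetical; adapt them to your agent framework.

```python
from typing import Callable, Dict, List, Optional, Union

ToolResult = Dict[str, Union[List[str], str]]


class RetrievalTool:
    """Wraps a retrieval callable with a per-request soft cap on calls."""

    def __init__(
        self,
        search_fn: Optional[Callable[[str, str], List[str]]],
        scope_id: Optional[str],
        max_calls: int = 3,  # illustrative default, not a recommendation
    ) -> None:
        self._search_fn = search_fn
        self._scope_id = scope_id
        self._remaining = max_calls

    def __call__(self, query: str) -> ToolResult:
        # Missing dependencies: answer safely instead of raising.
        if self._search_fn is None or not self._scope_id:
            return {"results": [], "message": "Search is unavailable for this request."}
        if self._remaining <= 0:
            return {
                "results": [],
                "message": "Lookup budget exhausted. Proceed with the information already retrieved.",
            }
        self._remaining -= 1
        try:
            return {"results": self._search_fn(self._scope_id, query), "message": "ok"}
        except Exception:  # broad on purpose: tools must not raise into the agent loop
            return {"results": [], "message": "Search failed. Proceed without this lookup."}
```

Creating one instance per request keeps the cap scoped to that request.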