flowCreate.solutions

RAG & Search — Operations

This page defines operational standards for keeping RAG/search reliable over time:

  • backfills
  • re-indexing
  • batching and concurrency
  • degradation modes during outages
  • cost controls

Operational principles

  • Idempotent jobs: every job must be safe to re-run without duplicating data or corrupting state.
  • Scoped safety: jobs must support running for a single scope_id (tenant/org/project) or a small subset of IDs.
  • Observable: every run must report counts, durations, and failures.
  • Failure-tolerant: provider outages should not break core CRUD or cause cascading failures.

Backfills

Backfills exist because embeddings and search indexes are derived data, and derived data can become:

  • missing (provider failure, partial indexing)
  • stale (new embedding_version)
  • invalid (bug fix in preprocessing)

What to backfill

Common targets:

  • embedding IS NULL
  • search_vector IS NULL
  • embedding_version != <current_version>
  • status in (pending, error)
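A minimal Python sketch of these targeting rules follows. It assumes a hypothetical `IndexedRow` shape whose fields mirror the columns named above and returns every reason a row qualifies, which is also useful for the failure report. In practice the same predicates would normally live in the backfill query's WHERE clause.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical row shape; field names mirror the columns listed above.
@dataclass
class IndexedRow:
    id: str
    embedding: Optional[List[float]]
    search_vector: Optional[str]
    embedding_version: Optional[str]
    status: str

BACKFILL_STATUSES = {"pending", "error"}

def backfill_reasons(row: IndexedRow, current_version: str) -> List[str]:
    """Return every reason this row is a backfill target (empty list = healthy)."""
    reasons = []
    if row.embedding is None:
        reasons.append("missing_embedding")
    if row.search_vector is None:
        reasons.append("missing_search_vector")
    if row.embedding_version != current_version:  # None also counts as stale here
        reasons.append("stale_embedding_version")
    if row.status in BACKFILL_STATUSES:
        reasons.append(f"status_{row.status}")
    return reasons
```

One SQL caveat: in PostgreSQL, `embedding_version != <current_version>` evaluates to NULL (not true) when `embedding_version` is NULL, so `embedding_version IS DISTINCT FROM <current_version>` is the safer predicate for a nullable column.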

Backfill job contract

Required inputs:

  • scope_id (optional but strongly recommended)
  • batch_size
  • max_concurrency
  • dry_run mode (recommended)

Required outputs:

  • counts: scanned, updated, skipped, failed
  • timings: total duration and per-batch duration
  • failure samples: top N errors with row IDs
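One way to encode this contract is sketched below in Python. `BackfillConfig`, `BackfillReport`, `run_backfill`, and the `needs_update`/`update_row` callables are hypothetical names. The loop is sequential, so `max_concurrency` is only carried in the config here, and the failure samples keep the first N failures rather than ranking errors by frequency.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class BackfillConfig:
    batch_size: int                  # used by whatever produces the batches
    max_concurrency: int             # for a worker pool; unused in this sequential sketch
    scope_id: Optional[str] = None   # tenant/org/project; None means "all scopes"
    dry_run: bool = True             # default to the non-mutating mode

    def __post_init__(self) -> None:
        if self.batch_size < 1 or self.max_concurrency < 1:
            raise ValueError("batch_size and max_concurrency must be >= 1")

@dataclass
class BackfillReport:
    scanned: int = 0
    updated: int = 0   # in dry_run, rows that would have been updated
    skipped: int = 0
    failed: int = 0
    total_seconds: float = 0.0
    batch_seconds: List[float] = field(default_factory=list)
    failure_samples: List[Tuple[str, str]] = field(default_factory=list)  # (row_id, error)

def run_backfill(
    cfg: BackfillConfig,
    batches: Iterable[List[str]],          # row IDs, already filtered to cfg.scope_id
    needs_update: Callable[[str], bool],
    update_row: Callable[[str], None],
    max_samples: int = 10,
) -> BackfillReport:
    report = BackfillReport()
    start = time.monotonic()
    for batch in batches:
        batch_start = time.monotonic()
        for row_id in batch:
            report.scanned += 1
            if not needs_update(row_id):
                report.skipped += 1
                continue
            try:
                if not cfg.dry_run:
                    update_row(row_id)
                report.updated += 1
            except Exception as exc:
                # One bad row must not abort the run; record it and continue.
                report.failed += 1
                if len(report.failure_samples) < max_samples:
                    report.failure_samples.append((row_id, repr(exc)))
        report.batch_seconds.append(time.monotonic() - batch_start)
    report.total_seconds = time.monotonic() - start
    return report
```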

Idempotency standards

  • Use stable IDs for rows.
  • Avoid “insert new chunk rows every run” unless you have strong dedupe guarantees.
  • Prefer UPSERT/update-in-place patterns.
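A sketch of update-in-place with stable IDs follows, using SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24 or later; PostgreSQL's UPSERT has the same shape) as a stand-in database. The `chunks` table, the `chunk_id` hashing scheme, and the `status` values are assumptions. Re-running with unchanged content keeps the existing status, while changed content is re-marked `pending` for the backfill.

```python
import hashlib
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS chunks (
    id TEXT PRIMARY KEY,
    doc_id TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    status TEXT NOT NULL
)
"""

def chunk_id(doc_id: str, chunk_index: int) -> str:
    # Stable ID: the same (document, position) always maps to the same row.
    return hashlib.sha256(f"{doc_id}:{chunk_index}".encode("utf-8")).hexdigest()[:32]

def upsert_chunks(conn: sqlite3.Connection, doc_id: str, chunks: List[str]) -> None:
    rows = [(chunk_id(doc_id, i), doc_id, i, text) for i, text in enumerate(chunks)]
    with conn:  # one transaction per document
        conn.executemany(
            """
            INSERT INTO chunks (id, doc_id, chunk_index, content, status)
            VALUES (?, ?, ?, ?, 'pending')
            ON CONFLICT(id) DO UPDATE SET
                -- keep the existing status if the content did not change,
                -- otherwise mark the chunk for the embedding backfill
                status = CASE WHEN chunks.content = excluded.content
                              THEN chunks.status ELSE 'pending' END,
                content = excluded.content
            """,
            rows,
        )
```

This sketch does not delete trailing chunks when a document shrinks; that cleanup would need its own idempotent step.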

Re-indexing (FTS)

If search_vector is stored:

  • define a deterministic “search text” generator
  • compute search_vector = to_tsvector(<lang>, <search_text>)
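A sketch of a deterministic generator in Python follows. `build_search_text`, its inputs, the `documents` table, and the `'english'` configuration are illustrative assumptions. The idea is that normalization (Unicode NFKC, collapsed whitespace, sorted and deduplicated tags) makes the output depend only on the row's content, not on storage order or incidental formatting.

```python
import re
import unicodedata
from typing import Iterable, Optional

def _norm(s: Optional[str]) -> str:
    # NFKC-normalize and collapse all whitespace runs to single spaces.
    s = unicodedata.normalize("NFKC", s or "")
    return re.sub(r"\s+", " ", s).strip()

def build_search_text(title: Optional[str], body: Optional[str], tags: Iterable[str] = ()) -> str:
    """Same inputs give byte-identical output, regardless of tag order or duplicates."""
    tag_text = " ".join(sorted({_norm(t) for t in tags if _norm(t)}))
    parts = [_norm(title), _norm(body), tag_text]
    return "\n".join(p for p in parts if p)

# Parameterized PostgreSQL statement (DB-API %s placeholders). The 'english'
# text search configuration is an assumption; use the <lang> that fits the data.
REINDEX_SQL = "UPDATE documents SET search_vector = to_tsvector('english', %s) WHERE id = %s"
```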

Re-indexing standards:

  • can be run independently of embedding backfills
  • safe to run many times
  • use batching to avoid large transactions
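A batched re-index loop is sketched below against SQLite so it runs anywhere. `fake_tsvector` is only a stand-in for PostgreSQL's `to_tsvector`, and the `docs` table with a stored `search_text` column and integer IDs is an assumption. Batches are selected by keyset pagination (`id > last_id`), and each batch is committed before the next one is read, which keeps transactions short. Because every run recomputes values from the source text, re-running it produces the same result.

```python
import sqlite3

def fake_tsvector(text: str) -> str:
    # Stand-in for PostgreSQL to_tsvector(): sorted unique lowercase tokens.
    return " ".join(sorted(set(text.lower().split())))

def reindex_fts(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Recompute search_vector for every row, one short transaction per batch."""
    last_id, total = 0, 0
    while True:
        rows = conn.execute(
            "SELECT id, search_text FROM docs WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return total
        conn.executemany(
            "UPDATE docs SET search_vector = ? WHERE id = ?",
            [(fake_tsvector(text or ""), row_id) for row_id, text in rows],
        )
        conn.commit()  # end this batch's transaction before reading the next one
        last_id = rows[-1][0]
        total += len(rows)
```

Keyset pagination is used instead of `OFFSET` because an offset query still has to walk past every skipped row, so it slows down as the job progresses.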

Embedding provider outages and degradation

Degradation modes

Define behavior for each layer:

  • Create/update write-path:
    • resilient mode (recommended): accept the write and mark for backfill
    • strict mode: fail the write if search fields cannot be updated
  • Read-path search:
    • if vector retrieval fails: lexical-only
    • if lexical retrieval fails: vector-only
    • if both fail: return empty results with a user-safe message
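Both layers are sketched below in Python. The function names, the `mode` labels, the `"done"` status, the user-facing message, and the placeholder fusion step (an order-preserving dedupe rather than a real fusion method such as reciprocal rank fusion) are all assumptions. Search fields are computed before the write so that strict mode can fail before anything is persisted.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

log = logging.getLogger("rag.search")
Retriever = Callable[[str], List[str]]  # query -> ranked document IDs

@dataclass
class SearchResult:
    hits: List[str]
    mode: str          # "hybrid" | "lexical_only" | "vector_only" | "unavailable"
    message: str = ""

def _try(name: str, retrieve: Retriever, query: str) -> Optional[List[str]]:
    try:
        return retrieve(query)
    except Exception:
        log.warning("%s retrieval failed; degrading", name, exc_info=True)
        return None

def search(query: str, vector: Retriever, lexical: Retriever, final_k: int = 10) -> SearchResult:
    vec, lex = _try("vector", vector, query), _try("lexical", lexical, query)
    if vec is not None and lex is not None:
        # Placeholder fusion (order-preserving dedupe); swap in RRF or similar.
        return SearchResult(list(dict.fromkeys(vec + lex))[:final_k], "hybrid")
    if lex is not None:
        return SearchResult(lex[:final_k], "lexical_only")
    if vec is not None:
        return SearchResult(vec[:final_k], "vector_only")
    return SearchResult([], "unavailable", "Search is temporarily unavailable. Please try again.")

def prepare_write(row: dict, embed: Callable[[str], List[float]], strict: bool = False) -> dict:
    """Compute search fields before persisting, so strict mode can fail the write cleanly."""
    try:
        return {**row, "embedding": embed(row["content"]), "status": "done"}
    except Exception:
        if strict:
            raise  # strict mode: the caller's write fails
        # resilient mode: accept the write and leave the row for the backfill job
        return {**row, "embedding": None, "status": "pending"}
```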

Circuit breaking and budgets

Standards:

  • enforce a per-request budget for embedding calls (especially in AI tool contexts)
  • implement timeouts and capped retries
  • consider a circuit breaker to reduce repeated timeouts during provider incidents
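A minimal, single-threaded sketch of a consecutive-failure circuit breaker plus a per-request call budget follows. The class names, thresholds, and half-open behavior are assumptions rather than a prescribed design. Timeouts are expected to be set on the provider client itself, retries are left to the caller, and the injectable `clock` exists only to make the cooldown testable.

```python
import time
from typing import Any, Callable, Optional

class CircuitOpenError(RuntimeError):
    pass

class BudgetExceededError(RuntimeError):
    pass

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures. Once `cooldown_s`
    has elapsed, calls are allowed again (half-open); any failure re-opens it."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown_s:
            raise CircuitOpenError("embedding provider circuit is open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0       # success closes the circuit
        self.opened_at = None
        return result

class EmbeddingBudget:
    """Per-request cap on embedding calls (e.g. one budget per AI tool invocation)."""

    def __init__(self, max_calls: int):
        self.remaining = max_calls

    def spend(self) -> None:
        if self.remaining <= 0:
            raise BudgetExceededError("embedding budget for this request is exhausted")
        self.remaining -= 1
```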

Batching and concurrency

Batch sizing

Guidelines:

  • choose batch_size to keep memory stable and avoid long transactions
  • commit per batch (or more often) so that a failure does not discard work already completed
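A sketch of commit-per-batch follows, using Python's `sqlite3` as a stand-in database. `batched` and `mark_indexed` are hypothetical helpers (Python 3.12 also ships `itertools.batched`, which yields tuples), and the `docs` table with a `status` column is an assumption. Using the connection as a context manager commits each batch on success and rolls back only the failing batch, so earlier batches stay committed and a re-run can pick up the remainder.

```python
import itertools
import sqlite3
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Lazily yield lists of at most `size` items, so memory stays bounded."""
    it = iter(items)
    while batch := list(itertools.islice(it, size)):
        yield batch

def mark_indexed(conn: sqlite3.Connection, ids: Iterable[int], batch_size: int) -> int:
    committed = 0
    for batch in batched(ids, batch_size):
        with conn:  # commit on success; on error, roll back only this batch
            conn.executemany(
                "UPDATE docs SET status = 'done' WHERE id = ?",
                [(i,) for i in batch],
            )
        committed += len(batch)
    return committed
```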

Concurrency

Standards:

  • cap concurrent embedding calls (max_concurrency)
  • use exponential backoff with jitter on provider errors
  • ensure DB connections are not exhausted by background workers
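A sketch with `asyncio` follows: a semaphore caps in-flight embedding calls at `max_concurrency`, and provider errors are retried with capped exponential backoff using "full jitter" (a uniform random delay between zero and the exponential bound). `ProviderError`, `embed_with_retry`, `embed_all`, and the default delays are assumptions; `sleep` is injectable only so tests do not have to wait.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Sequence

Vector = List[float]

class ProviderError(Exception):
    """Hypothetical retryable error raised by the embedding client."""

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * (2 ** attempt)))

async def embed_with_retry(embed: Callable[[str], Awaitable[Vector]], text: str,
                           max_retries: int = 4, sleep=asyncio.sleep) -> Vector:
    for attempt in range(max_retries + 1):
        try:
            return await embed(text)
        except ProviderError:
            if attempt == max_retries:
                raise  # retries are capped; surface the error
            await sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")

async def embed_all(texts: Sequence[str], embed: Callable[[str], Awaitable[Vector]],
                    max_concurrency: int = 4, sleep=asyncio.sleep) -> List[Vector]:
    sem = asyncio.Semaphore(max_concurrency)

    async def one(text: str) -> Vector:
        async with sem:  # at most max_concurrency calls (and retries) in flight
            return await embed_with_retry(embed, text, sleep=sleep)

    return list(await asyncio.gather(*(one(t) for t in texts)))
```

Holding the semaphore through backoff sleeps is a deliberate choice here: it keeps total pressure on a struggling provider bounded. If each task also holds a database connection, keeping `max_concurrency` at or below the worker's pool size is one simple way to avoid exhausting connections.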

Cost controls

Standards:

  • cache query embeddings per request (and optionally across requests with a short TTL)
  • prefer hybrid retrieval with a smaller final_k over sending large context payloads
  • store summaries/contextual prefixes when they reduce the needed chunk count
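A sketch of a query-embedding cache keyed by `(embedding_version, query)` follows, so a version bump never serves vectors from the previous model. With `ttl_s=None` it is meant to be created once per request; with a small TTL it can be shared across requests. The class name, the exact-string key, and the lack of a size bound are assumptions; a shared instance would need eviction in practice.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]

class QueryEmbeddingCache:
    """ttl_s=None: lives for a single request. ttl_s=<seconds>: may be shared
    across requests; entries expire after ttl_s. No size bound in this sketch."""

    def __init__(self, embed: Callable[[str], Vector], embedding_version: str,
                 ttl_s: Optional[float] = None, clock: Callable[[], float] = time.monotonic):
        self.embed = embed
        self.embedding_version = embedding_version
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, Vector]] = {}

    def get(self, query: str) -> Vector:
        key = (self.embedding_version, query)
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and (self.ttl_s is None or now - cached[0] < self.ttl_s):
            return cached[1]  # cache hit: no provider call
        vector = self.embed(query)
        self._store[key] = (now, vector)
        return vector
```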

Monitoring and alerting (minimum viable)

Track and alert on:

  • embedding error rate (by provider + embedding_version)
  • indexing lag (pending/error counts)
  • retrieval latency percentiles
  • DB query latency for vector and FTS queries
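A sketch of two of these metrics in plain Python follows: nearest-rank latency percentiles and error rate grouped by `(provider, embedding_version)`. The function names and the event tuple shape are assumptions. A real deployment would more likely export raw measurements to a metrics system and compute percentiles and alert thresholds there.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def error_rates(events: Iterable[Tuple[str, str, bool]]) -> Dict[Tuple[str, str], float]:
    """events: (provider, embedding_version, ok) -> error rate per (provider, version)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    errors: Dict[Tuple[str, str], int] = defaultdict(int)
    for provider, version, ok in events:
        totals[(provider, version)] += 1
        if not ok:
            errors[(provider, version)] += 1
    return {key: errors[key] / totals[key] for key in totals}
```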