Background Workers
This document defines generic engineering standards for background workers in backend services.
Workers are long-running async loops that perform maintenance, monitoring, and scheduled processing outside the request/response path.
Architecture
Workers run as asyncio tasks inside the application runtime and are started/stopped through FastAPI lifespan. Keep the architecture simple-by-default; only split into separate processes/services when required by scale or isolation.
Lifespan Events:
Workers are started/stopped via FastAPI lifespan events (a dedicated lifespan.py is recommended).
```python
import asyncio
from contextlib import asynccontextmanager, suppress

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: start one or more worker loops
    tasks = [
        asyncio.create_task(worker_loop_a()),
        asyncio.create_task(worker_loop_b()),
    ]
    yield
    # Shutdown: cancel tasks gracefully
    for t in tasks:
        t.cancel()
    for t in tasks:
        with suppress(asyncio.CancelledError):
            await t
```
Common worker categories
Workers should fall into clear categories:
- Scheduled monitors: periodic scans and health checks (e.g., security scans, DB health).
- Reconciliation/sync loops: keep derived state consistent (e.g., periodic reconciliation jobs).
- Queue processors: claim pending work items from the database and process them safely.
- Maintenance: periodic cleanup or aggregation tasks.
- Operational reporting: periodic suite/report generation where appropriate.
Concurrency Control
Background workers must be safe under multiple app processes (e.g., multiple Uvicorn workers). Use layered concurrency controls:
- Startup single-run lock (process-level): ensure only one process starts the worker tasks.
  - A common pattern is a file lock in the OS temp directory created with `O_EXCL` so only one process “wins”.
- Per-loop single-run lock (database-level): ensure only one process executes a cycle at a time.
  - A common pattern is a PostgreSQL advisory lock (`pg_try_advisory_lock`) held for the duration of the cycle.
Queue/claiming pattern (safe parallelism)
When processing “pending items” from the database:
- Prefer `SELECT ... FOR UPDATE SKIP LOCKED` to claim work without contention.
- If items must be processed sequentially per tenant/resource, add a second lock keyed by that id (e.g., transaction-scoped advisory locks).
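A claiming query following this pattern might look like the sketch below; the `work_items` table and its columns are hypothetical:

```python
# Hypothetical claiming query: atomically claim up to 10 pending items.
# SKIP LOCKED lets parallel workers claim disjoint sets without blocking
# on rows another worker has already locked.
CLAIM_SQL = """
UPDATE work_items
SET status = 'processing', claimed_at = now()
WHERE id IN (
    SELECT id
    FROM work_items
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 10
)
RETURNING id;
"""
```

For per-tenant sequencing, taking a transaction-scoped advisory lock (e.g., `pg_advisory_xact_lock(tenant_id)`) before processing serializes work on that tenant while leaving other tenants fully parallel.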
Configuration
Workers should be enabled/disabled via environment variables. Standards:
- Each worker loop should check its flag on each cycle so it can be disabled without redeploying.
- Defaults may be environment-aware (e.g., disable some workers in local/test by default).
Example pattern:
```
ENABLE_<WORKER_NAME>_WORKER=1
<WORKER_NAME>_INTERVAL_SECONDS=300
```
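A minimal helper for reading these variables on each cycle could look like this; the helper names are illustrative, only the variable-naming pattern comes from the standard above:

```python
import os


def worker_enabled(name: str, default: bool = True) -> bool:
    """Read ENABLE_<NAME>_WORKER on every cycle so the worker can be
    toggled without redeploying."""
    raw = os.getenv(f"ENABLE_{name}_WORKER")
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def worker_interval(name: str, default: int = 300) -> int:
    """Read <NAME>_INTERVAL_SECONDS, falling back to a default."""
    return int(os.getenv(f"{name}_INTERVAL_SECONDS", default))
```

Because the flag is re-read each cycle, flipping the environment variable (where the platform supports live env updates) or restarting a single process is enough to drain the worker.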
Loop structure standards
Every worker loop should follow a predictable structure:
- Initialize required resources (e.g., ensure DB engine/session factory exists).
- Optional jitter on startup (small randomized delay) to avoid thundering-herd starts.
- In the main loop (`while True:`):
  - Check the enable flag; exit if disabled.
  - Acquire the per-loop lock (if used).
  - Execute one cycle of work inside a fresh DB session.
  - Release the lock.
  - Sleep for the configured interval (plus optional jitter).
  - Catch and log loop-level exceptions; sleep briefly and continue (avoid tight crash loops).
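The structure above can be sketched as a generic skeleton. Lock acquisition and session handling are folded into the caller-supplied `run_cycle` for brevity; the function signature is illustrative:

```python
import asyncio
import logging
import random

logger = logging.getLogger("worker")


async def worker_loop(
    enabled,               # callable returning the current enable flag
    run_cycle,             # async callable performing one cycle of work
    interval: float = 300.0,
    jitter: float = 5.0,
):
    """Generic worker loop: jittered start, flag check, one cycle per
    iteration, and crash isolation with a short backoff."""
    # Startup jitter to avoid thundering-herd starts across processes.
    await asyncio.sleep(random.uniform(0, jitter))
    while True:
        if not enabled():
            logger.info("worker disabled; exiting loop")
            return
        try:
            await run_cycle()
        except Exception:
            # Isolate cycle failures; never let them escape the loop.
            logger.exception("worker cycle failed; retrying after short sleep")
            await asyncio.sleep(5)
            continue
        await asyncio.sleep(interval + random.uniform(0, interval * 0.1))
```

A loop like this is what `asyncio.create_task(...)` would wrap in the lifespan example above.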
Database session standards
- Use a dedicated worker DB session/engine (separate from request sessions) so long-running loops do not interfere with request handling.
- Keep each cycle’s DB session scope tight (open → work → commit/close).
Observability + safety
- Log lifecycle events: loop started, lock acquired/released, cycle complete, cycle error.
- Prefer structured “counts”/summaries (e.g., scanned, processed, errors) so operators can see impact.
- Never let a worker crash the app: failures should be isolated to the loop and retried safely.
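A cycle summary with structured counts might be emitted like this (the helper and its field names are illustrative):

```python
import logging

logger = logging.getLogger("worker")


def log_cycle_summary(scanned: int, processed: int, errors: int) -> None:
    # Structured counts so operators can see each cycle's impact at a glance.
    logger.info(
        "cycle complete scanned=%d processed=%d errors=%d",
        scanned, processed, errors,
    )
```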