Example Tracker — Streaming Endpoint Stability
Started: YYYY-MM-DD
Owner: AI agent / Engineering
Objective: Eliminate a reproducible “hang” in a streaming endpoint so requests always release resources cleanly and tests exit reliably.
Ground Rules for Agents
- Verify every claim against source code and logs; do not rely on assumptions.
- Keep changes minimal and scoped to the streaming/cleanup path unless the lead approves broader refactors.
- Do not introduce migrations or schema changes unless explicitly instructed.
- After each substantive change, rerun the smallest reproducer test and record results in this tracker.
- Update this tracker before ending a work session, including failures and rollbacks.
Workflow
- Reproduce: confirm the hang with a timeout-wrapped test run and collect logs (a reproducer sketch follows this list).
- Locate cleanup paths: identify where streaming generators, background tasks, and DB sessions are created and closed.
- Hypothesize root cause: describe which resource is leaking (task, session, connection, generator) and under what condition (client disconnect, test cancellation, exception).
- Implement smallest fix: cancellation/cleanup in finally, explicit session close/rollback, and defensive task awaiting (see the cleanup sketch after this list).
- Verify: rerun the reproducer plus a small related suite; confirm no regressions in the response contract.
- Document: update progress, record the final approach, and list follow-ups.
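A minimal reproducer sketch for the first step, assuming pytest-asyncio, httpx, and a locally running app; the BASE_URL and the /events path are placeholders for the real endpoint. Wrapping the consuming coroutine in asyncio.wait_for turns the hang into a TimeoutError the suite can report, instead of blocking forever:

```python
import asyncio

import httpx
import pytest

BASE_URL = "http://localhost:8000"  # placeholder; point at the app under test


@pytest.mark.asyncio
async def test_stream_completes_within_timeout():
    """Reproducer: the stream must finish (or fail) within 10s, never hang."""

    async def consume() -> None:
        async with httpx.AsyncClient(base_url=BASE_URL) as client:
            # /events is a placeholder path for the hanging streaming endpoint.
            async with client.stream("GET", "/events") as response:
                async for _chunk in response.aiter_bytes():
                    pass  # drain to completion (or break early to mimic disconnect)

    # wait_for converts a hang into a TimeoutError instead of a stuck test run.
    await asyncio.wait_for(consume(), timeout=10)
```

Breaking out of the loop early (instead of draining) exercises the client-disconnect path, which is worth reproducing separately since it takes a different cleanup route.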
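For the smallest-fix step, a sketch of deterministic cleanup in an async streaming generator. The session and worker objects are hypothetical stand-ins for this app's per-request DB session and background producer task; the pattern, not the names, is the point: every acquired resource is released in one finally block, whatever the exit path.

```python
import asyncio
import contextlib
from typing import AsyncIterator


async def stream_events(session, worker: asyncio.Task) -> AsyncIterator[bytes]:
    """Stream chunks, then release every per-request resource in one place.

    `session` (with assumed fetch_next()/close() methods) and `worker` (a
    background producer task) are hypothetical stand-ins for the real app's
    per-request resources.
    """
    try:
        while True:
            chunk = await session.fetch_next()  # assumed API
            if chunk is None:
                break
            yield chunk
    finally:
        # Runs on normal completion, on client disconnect (GeneratorExit),
        # and on test cancellation (CancelledError) alike.
        worker.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await worker        # defensive await: reap the cancelled task
        await session.close()   # explicit close so the connection is returned
```

Funneling disconnects, cancellations, and normal completion through the same finally block is what makes process exit deterministic: there is exactly one place where resources can fail to be released.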
Progress Tracker
✅ Completed
- Created this tracker with reproducible steps and initial hypotheses.
- Identified the streaming cleanup path and all resources created per request.
🔄 In Progress
- Implement deterministic cleanup (cancel tasks, close sessions, finalize stream) and verify process exit.
⏳ Pending
- Add regression tests to prevent resource leaks in future changes (see the test sketch below).
- Document operational monitoring notes (what logs/metrics indicate a recurrence).
🚫 Out of Scope
- Full redesign of streaming transport or API contract.
- Performance tuning unrelated to resource cleanup.
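For the pending regression tests, a sketch (assuming pytest-asyncio; run_one_streaming_request is a hypothetical helper that exercises the endpoint end to end) that asserts no background tasks survive a completed request — a cheap leak detector that would have caught the original hang:

```python
import asyncio

import pytest


@pytest.mark.asyncio
async def test_no_tasks_leak_after_stream():
    """Regression guard: a finished request must leave no stray tasks behind."""
    before = asyncio.all_tasks()

    await run_one_streaming_request()  # hypothetical helper hitting the endpoint

    # Give already-cancelled tasks one loop tick to finish reaping.
    await asyncio.sleep(0)

    leaked = asyncio.all_tasks() - before
    assert not leaked, f"leaked tasks: {leaked!r}"
```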
Notes & Follow-Ups
- Reproduction command: document the exact test command used (include a timeout strategy) and the expected failure mode.
- Observed symptoms: record the error text, stack traces, and whether it occurs on client disconnect vs normal completion.
- Hypotheses: list 2–3 candidate causes and rule them out one by one.
- Results log:
- YYYY-MM-DD: attempted fix A → result (pass/fail) + notes
- YYYY-MM-DD: attempted fix B → result (pass/fail) + notes
- Final fix: summarize what was changed, why it works, and the evidence (tests/logs).