Example Tracker — Streaming Endpoint Stability
Started: YYYY-MM-DD
Owner: AI agent / Engineering
Objective: Eliminate a reproducible “hang” in a streaming endpoint so requests always release resources cleanly and tests exit reliably.
Ground Rules for Agents
- Verify every claim against source code and logs; do not rely on assumptions.
- Keep changes minimal and scoped to the streaming/cleanup path unless the lead approves broader refactors.
- Do not introduce migrations or schema changes unless explicitly instructed.
- After each substantive change, rerun the smallest reproducer test and record results in this tracker.
- Update this tracker before ending a work session, including failures and rollbacks.
Workflow
- Reproduce: confirm the hang with a timeout-wrapped test run and collect logs (a reproducer sketch follows this list).
- Locate cleanup paths: identify where streaming generators, background tasks, and DB sessions are created and closed.
- Hypothesize root cause: describe which resource is leaking (task, session, connection, generator) and under what condition (client disconnect, test cancellation, exception).
- Implement smallest fix: cancellation/cleanup in finally, explicit session close/rollback, and defensive task awaiting (see the cleanup sketch after this list).
- Verify: rerun the reproducer plus a small related suite; confirm no regressions in the response contract.
- Document: update progress, record the final approach, and list follow-ups.
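A minimal reproducer sketch for the first step, assuming pytest-asyncio, httpx, and a locally running app; the BASE_URL and the /events path are placeholders for the real endpoint. Wrapping the consuming coroutine in asyncio.wait_for turns the hang into a TimeoutError the suite can report, instead of blocking forever:

```python
import asyncio

import httpx
import pytest

BASE_URL = "http://localhost:8000"  # placeholder; point at the app under test


@pytest.mark.asyncio
async def test_stream_completes_within_timeout():
    """Reproducer: the stream must finish (or fail) within 10s, never hang."""

    async def consume() -> None:
        async with httpx.AsyncClient(base_url=BASE_URL) as client:
            # /events is a placeholder path for the hanging streaming endpoint.
            async with client.stream("GET", "/events") as response:
                async for _chunk in response.aiter_bytes():
                    pass  # drain to completion (or break early to mimic disconnect)

    # wait_for converts a hang into a TimeoutError instead of a stuck test run.
    await asyncio.wait_for(consume(), timeout=10)
```

Breaking out of the loop early (instead of draining) exercises the client-disconnect path, which is worth reproducing separately since it takes a different cleanup route.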
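For the smallest-fix step, a sketch of deterministic cleanup in an async streaming generator. The session and worker objects are hypothetical stand-ins for this app's per-request DB session and background producer task; the pattern, not the names, is the point: every acquired resource is released in one finally block, whatever the exit path.

```python
import asyncio
import contextlib
from typing import AsyncIterator


async def stream_events(session, worker: asyncio.Task) -> AsyncIterator[bytes]:
    """Stream chunks, then release every per-request resource in one place.

    `session` (with assumed fetch_next()/close() methods) and `worker` (a
    background producer task) are hypothetical stand-ins for the real app's
    per-request resources.
    """
    try:
        while True:
            chunk = await session.fetch_next()  # assumed API
            if chunk is None:
                break
            yield chunk
    finally:
        # Runs on normal completion, on client disconnect (GeneratorExit),
        # and on test cancellation (CancelledError) alike.
        worker.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await worker        # defensive await: reap the cancelled task
        await session.close()   # explicit close so the connection is returned
```

Funneling disconnects, cancellations, and normal completion through the same finally block is what makes process exit deterministic: there is exactly one place where resources can fail to be released.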
Progress Tracker
✅ Completed
- Created this tracker with reproducible steps and initial hypotheses.
- Identified the streaming cleanup path and all resources created per request.
🔄 In Progress
- Implement deterministic cleanup (cancel tasks, close sessions, finalize stream) and verify process exit.
⏳ Pending
- Add regression tests to prevent resource leaks in future changes (see the test sketch below).
- Document operational monitoring notes (what logs/metrics indicate a recurrence).
🚫 Out of Scope
- Full redesign of streaming transport or API contract.
- Performance tuning unrelated to resource cleanup.
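For the pending regression tests, a sketch (assuming pytest-asyncio; run_one_streaming_request is a hypothetical helper that exercises the endpoint end to end) that asserts no background tasks survive a completed request — a cheap leak detector that would have caught the original hang:

```python
import asyncio

import pytest


@pytest.mark.asyncio
async def test_no_tasks_leak_after_stream():
    """Regression guard: a finished request must leave no stray tasks behind."""
    before = asyncio.all_tasks()

    await run_one_streaming_request()  # hypothetical helper hitting the endpoint

    # Give already-cancelled tasks one loop tick to finish reaping.
    await asyncio.sleep(0)

    leaked = asyncio.all_tasks() - before
    assert not leaked, f"leaked tasks: {leaked!r}"
```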
Notes & Follow-Ups
- Reproduction command: document the exact test command used (include a timeout strategy) and the expected failure mode.
- Observed symptoms: record the error text, stack traces, and whether it occurs on client disconnect vs normal completion.
- Hypotheses: list 2–3 candidate causes and rule them out one by one.
- Results log:
- YYYY-MM-DD: attempted fix A → result (pass/fail) + notes
- YYYY-MM-DD: attempted fix B → result (pass/fail) + notes
- Final fix: summarize what was changed, why it works, and the evidence (tests/logs).