Add rules from session audit: error handling, testing, debugging

- sessions.md: hard compaction limits, no-polling, file read budget
- debugging.md: root cause first, isolated testing, subprocess logging
- github.md: strengthen MCP-first enforcement
- error-handling.md (packages): no silent swallowing, protect ast.parse
- test-coverage.md (packages): every module needs tests, known gaps
Kevin Turcios 2026-04-21 21:06:15 -05:00
parent cefd625d35
commit c0a72e978d
5 changed files with 102 additions and 4 deletions

debugging.md

@@ -0,0 +1,21 @@
# Debugging and Testing
## Root cause first
When encountering a bug, always investigate the root cause. Don't patch symptoms. If you're about to add a try/except, a fallback default, or a defensive check -- ask whether the real fix is upstream.
The user will tell you "fix the root cause" if you get this wrong. Don't make them say it twice.
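A hypothetical config snippet to make the distinction concrete (the `timeout` key and its default are illustrative):
```python
# Symptom patch -- hides the real bug (the producer never writes "timeout"):
timeout = config.get("timeout", 30)

# Root cause fix -- make the producer always write the key, then read it plainly:
timeout = config["timeout"]
```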
## Isolated testing
Prefer running individual test functions or modules over full E2E suites. Only run the full suite when explicitly asked or before pushing.
- Single function: `uv run pytest tests/test_foo.py::TestBar::test_baz -v`
- Single module: `uv run pytest tests/test_foo.py -v`
- Full suite: only when asked, or before `git push`
When debugging a specific endpoint or integration, test it directly (e.g., `requests.post()` to a local server) instead of running the entire pipeline end-to-end.
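For example, a direct hit against a locally running server (URL, port, and payload shape are assumptions, not the real endpoint contract):
```python
import requests

# Exercise just the endpoint under test; skip the full pipeline.
resp = requests.post(
    "http://localhost:8000/optimize",  # assumed local dev server and port
    json={"code": "def f(x):\n    return x + 1"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```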
## Subprocess failures
When a subprocess fails, always log stdout and stderr. "Exit code 1" with no output is useless. If you see a subprocess failure without output, add logging before moving on.
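A minimal sketch of the expected pattern, assuming a standard `logging` setup:
```python
import logging
import subprocess

log = logging.getLogger(__name__)

cmd = ["uv", "run", "pytest", "tests/test_foo.py", "-v"]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
    # "Exit code 1" alone is useless -- always surface both streams.
    log.error(
        "%s failed with exit code %d\nstdout:\n%s\nstderr:\n%s",
        " ".join(cmd), result.returncode, result.stdout, result.stderr,
    )
```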

github.md

@@ -1,3 +1,5 @@
# GitHub Interactions
-Prefer MCP GitHub tools (`mcp__github__*`) over the `gh` CLI for all GitHub operations. Only fall back to `gh` via Bash when no matching MCP tool exists.
+ALWAYS use MCP GitHub tools (`mcp__github__*`) for GitHub operations. Check for a matching MCP tool first -- only fall back to `gh` via Bash when no MCP tool exists for the operation.
+This also applies to other MCP-connected services (Linear, Granola). MCP first, CLI second.

sessions.md

@@ -6,7 +6,11 @@ One task per session. Don't mix implementation with communication drafting, tran
## Duration
-Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than letting auto-compaction degrade context. If the session has overflowed context once, strongly consider starting a new session.
+Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than letting auto-compaction degrade context.
+- After 1 compaction: consider wrapping up the current task and handing off.
+- After 3 compactions: stop what you're doing, update `status.md`, and tell the user to start a fresh session.
+- Never continue past 5 compactions. Context is too degraded to be productive.
## Context preservation
@@ -14,6 +18,12 @@ Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than let
- When compacting, preserve: modified files list, current branch, VM state, test commands used, key decisions made
- Use subagents for exploration to keep main context clean
-## Avoid polling
+## No polling
-Don't use `/loop` to poll agent status -- it burns context on repetitive status messages. If you need to monitor a long-running agent, check the output file directly.
+Never poll background tasks. No `wc -l`, no `tail -f`, no `sleep` loops checking output files. Use `run_in_background` and wait for the completion notification. One check after notification is fine. Sixty-two checks in a loop is not.
+## File read budget
+If you've read the same file 3+ times in a session, something is wrong. Either:
+- The session is too long and compaction destroyed your context -- write a handoff
+- You're not retaining key information from previous reads -- write it down in your response before it compacts away

error-handling.md

@@ -0,0 +1,41 @@
# Error Handling in Pipeline Code
## No silent swallowing
Never write `except Exception: pass` or `except Exception: return`. Every except block must do one of the following:
- Log at WARNING or higher with `exc_info=True`
- Re-raise after cleanup
- Return a clearly documented sentinel value with at least DEBUG logging
The tracing code (`benchmarking/_tracing.py`) is the worst offender -- 7 bare excepts with zero logging. Don't add more.
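A compliant except block, sketched with a hypothetical `load_trace_metadata` helper:
```python
import json
import logging

log = logging.getLogger(__name__)

def load_trace_metadata(raw: str) -> dict | None:
    try:
        return json.loads(raw)
    except Exception:
        # Log the traceback, then degrade gracefully -- never swallow silently.
        log.warning("Failed to load trace metadata", exc_info=True)
        return None  # documented sentinel: callers treat None as "no metadata"
```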
## Protect ast.parse() calls
`ast.parse()` raises `SyntaxError` on malformed input. Always wrap it:
```python
try:
    tree = ast.parse(source)
except SyntaxError:
    # Per the rule above: WARNING or higher, with the traceback attached.
    log.warning("Failed to parse %s", path, exc_info=True)
    return None  # or skip this file
```
Known unprotected calls: `_state.py:70`, `_ranking.py:33`.
## XML/TOML parsing
Always use `recover=True` for the lxml `XMLParser`. Always wrap `tomlkit.parse()` in try/except. Log parsing failures at WARNING, not DEBUG -- config parsing failures matter.
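Both rules in one sketch, assuming lxml's `etree` and tomlkit are the parsers in play:
```python
import logging

import tomlkit
from lxml import etree
from tomlkit.exceptions import ParseError

log = logging.getLogger(__name__)

# XML: recover=True lets lxml salvage what it can from malformed markup.
parser = etree.XMLParser(recover=True)
tree = etree.fromstring(b"<root><item>ok</item></root>", parser=parser)

# TOML: tomlkit raises ParseError on malformed input; log at WARNING.
try:
    doc = tomlkit.parse('name = "unterminated')  # malformed on purpose
except ParseError:
    log.warning("Failed to parse TOML config", exc_info=True)
    doc = None
```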
## Format consistency: client-server boundary
The AI service returns markdown-fenced code blocks. Every endpoint response must be parsed with `CodeStringsMarkdown.parse_markdown_code()` before using the code. Currently only `/optimize` and `/optimize-line-profiler` do this correctly. The refinement, repair, and adaptive endpoints in `ai/_refinement.py` skip this step.
Pattern to follow (from `pipeline/_candidate_gen.py:83-91`):
```python
parsed = CodeStringsMarkdown.parse_markdown_code(c.code)
if not parsed.code_strings:
    continue
plain_code = "\n\n".join(cs.code for cs in parsed.code_strings)
```

test-coverage.md

@@ -0,0 +1,24 @@
# Test Coverage Requirements
## Every module needs unit tests
If you add or modify a module, it must have a corresponding test file. No exceptions for "infrastructure" or "worker" code -- those break the hardest in E2E.
Modules that currently have zero unit tests and have broken in production:
- `pipeline/_candidate_gen.py` -- candidate generation strategies
- `analysis/_discovery_worker.py` -- pytest test collection subprocess
Other untested modules that need coverage:
- `codegen/_create_pr.py`
- `ai/_tabulate.py`
- `benchmarking/_trace_db.py`
- `benchmarking/_benchmark_worker.py`
## Test the error paths
The happy path usually works. What breaks in E2E is:
- Malformed input (bad XML, raw source where markdown expected, missing config keys)
- Subprocess crashes (exit code != 0 with no output)
- Version mismatches (pytest 9 vs 8, different config section names)
Write tests for these cases, not just the golden path.
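For instance, a test that pins down the malformed-input path (the `parse_module` wrapper is a toy stand-in for the real parsing entry point):
```python
import ast


def parse_module(source: str):
    # Toy stand-in: returns None on malformed input instead of raising.
    try:
        return ast.parse(source)
    except SyntaxError:
        return None


def test_malformed_source_returns_sentinel():
    # The error path is what breaks in E2E -- pin it down explicitly.
    assert parse_module("def broken(:") is None


def test_wellformed_source_parses():
    assert isinstance(parse_module("x = 1"), ast.Module)
```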