diff --git a/.claude/rules/debugging.md b/.claude/rules/debugging.md new file mode 100644 index 0000000..d786dfc --- /dev/null +++ b/.claude/rules/debugging.md @@ -0,0 +1,21 @@ +# Debugging and Testing + +## Root cause first + +When encountering a bug, always investigate the root cause. Don't patch symptoms. If you're about to add a try/except, a fallback default, or a defensive check -- ask whether the real fix is upstream. + +The user will tell you "fix the root cause" if you get this wrong. Don't make them say it twice. + +## Isolated testing + +Prefer running individual test functions or modules over full E2E suites. Only run the full suite when explicitly asked or before pushing. + +- Single function: `uv run pytest tests/test_foo.py::TestBar::test_baz -v` +- Single module: `uv run pytest tests/test_foo.py -v` +- Full suite: only when asked, or before `git push` + +When debugging a specific endpoint or integration, test it directly (e.g., `requests.post()` to a local server) instead of running the entire pipeline end-to-end. + +## Subprocess failures + +When a subprocess fails, always log stdout and stderr. "Exit code 1" with no output is useless. If you see a subprocess failure without output, add logging before moving on. diff --git a/.claude/rules/github.md b/.claude/rules/github.md index be9d7c9..c0f0d7e 100644 --- a/.claude/rules/github.md +++ b/.claude/rules/github.md @@ -1,3 +1,5 @@ # GitHub Interactions -Prefer MCP GitHub tools (`mcp__github__*`) over the `gh` CLI for all GitHub operations. Only fall back to `gh` via Bash when no matching MCP tool exists. +ALWAYS use MCP GitHub tools (`mcp__github__*`) for GitHub operations. Check for a matching MCP tool first -- only fall back to `gh` via Bash when no MCP tool exists for the operation. + +This also applies to other MCP-connected services (Linear, Granola). MCP first, CLI second. diff --git a/.claude/rules/sessions.md b/.claude/rules/sessions.md index a9a5033..9384ea3 100644 --- a/.claude/rules/sessions.md +++ b/.claude/rules/sessions.md @@ -6,7 +6,11 @@ One task per session. Don't mix implementation with communication drafting, tran ## Duration -Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than letting auto-compaction degrade context. If the session has overflowed context once, strongly consider starting a new session. +Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than letting auto-compaction degrade context. + +- After 1 compaction: consider wrapping up the current task and handing off. +- After 3 compactions: stop what you're doing, update `status.md`, and tell the user to start a fresh session. +- Never continue past 5 compactions. Context is too degraded to be productive. ## Context preservation @@ -14,6 +18,12 @@ Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than let - When compacting, preserve: modified files list, current branch, VM state, test commands used, key decisions made - Use subagents for exploration to keep main context clean -## Avoid polling +## No polling -Don't use `/loop` to poll agent status -- it burns context on repetitive status messages. If you need to monitor a long-running agent, check the output file directly. +Never poll background tasks. No `wc -l`, no `tail -f`, no `sleep` loops checking output files. Use `run_in_background` and wait for the completion notification. One check after notification is fine. Sixty-two checks in a loop is not. + +## File read budget + +If you've read the same file 3+ times in a session, something is wrong. Either: +- The session is too long and compaction destroyed your context -- write a handoff +- You're not retaining key information from previous reads -- write it down in your response before it compacts away diff --git a/packages/.claude/rules/error-handling.md b/packages/.claude/rules/error-handling.md new file mode 100644 index 0000000..d55f119 --- /dev/null +++ b/packages/.claude/rules/error-handling.md @@ -0,0 +1,41 @@ +# Error Handling in Pipeline Code + +## No silent swallowing + +Never write `except Exception: pass` or `except Exception: return`. Every except block must either: +- Log at WARNING or higher with `exc_info=True` +- Re-raise after cleanup +- Return a clearly documented sentinel value with at least DEBUG logging + +The tracing code (`benchmarking/_tracing.py`) is the worst offender -- 7 bare excepts with zero logging. Don't add more. + +## Protect ast.parse() calls + +`ast.parse()` raises `SyntaxError` on malformed input. Always wrap it: + +```python +try: + tree = ast.parse(source) +except SyntaxError: + log.warning("Failed to parse %s", path) + return None # or skip this file +``` + +Known unprotected calls: `_state.py:70`, `_ranking.py:33`. + +## XML/TOML parsing + +Always use `recover=True` for lxml XMLParser. Always wrap tomlkit.parse() in try/except. Log parsing failures at WARNING, not DEBUG -- config parsing failures matter. + +## Format consistency: client-server boundary + +The AI service returns markdown-fenced code blocks. Every endpoint response must be parsed with `CodeStringsMarkdown.parse_markdown_code()` before using the code. Currently only `/optimize` and `/optimize-line-profiler` do this correctly. The refinement, repair, and adaptive endpoints in `ai/_refinement.py` skip this step. + +Pattern to follow (from `pipeline/_candidate_gen.py:83-91`): + +```python +parsed = CodeStringsMarkdown.parse_markdown_code(c.code) +if not parsed.code_strings: + continue +plain_code = "\n\n".join(cs.code for cs in parsed.code_strings) +``` diff --git a/packages/.claude/rules/test-coverage.md b/packages/.claude/rules/test-coverage.md new file mode 100644 index 0000000..7f15087 --- /dev/null +++ b/packages/.claude/rules/test-coverage.md @@ -0,0 +1,24 @@ +# Test Coverage Requirements + +## Every module needs unit tests + +If you add or modify a module, it must have a corresponding test file. No exceptions for "infrastructure" or "worker" code -- those break the hardest in E2E. + +Modules that currently have zero unit tests and have broken in production: +- `pipeline/_candidate_gen.py` -- candidate generation strategies +- `analysis/_discovery_worker.py` -- pytest test collection subprocess + +Other untested modules that need coverage: +- `codegen/_create_pr.py` +- `ai/_tabulate.py` +- `benchmarking/_trace_db.py` +- `benchmarking/_benchmark_worker.py` + +## Test the error paths + +The happy path usually works. What breaks in E2E is: +- Malformed input (bad XML, raw source where markdown expected, missing config keys) +- Subprocess crashes (exit code != 0 with no output) +- Version mismatches (pytest 9 vs 8, different config section names) + +Write tests for these cases, not just the golden path.