mirror of
https://github.com/codeflash-ai/codeflash-agent.git
synced 2026-05-04 18:25:19 +00:00
Add rules from session audit: error handling, testing, debugging
- sessions.md: hard compaction limits, no-polling, file read budget
- debugging.md: root cause first, isolated testing, subprocess logging
- github.md: strengthen MCP-first enforcement
- error-handling.md (packages): no silent swallowing, protect ast.parse
- test-coverage.md (packages): every module needs tests, known gaps
parent cefd625d35
commit c0a72e978d
5 changed files with 102 additions and 4 deletions
21
.claude/rules/debugging.md
Normal file
@@ -0,0 +1,21 @@
# Debugging and Testing
## Root cause first
When encountering a bug, always investigate the root cause. Don't patch symptoms. If you're about to add a try/except, a fallback default, or a defensive check -- ask whether the real fix is upstream.
The user will tell you "fix the root cause" if you get this wrong. Don't make them say it twice.
## Isolated testing
Prefer running individual test functions or modules over full E2E suites. Only run the full suite when explicitly asked or before pushing.
- Single function: `uv run pytest tests/test_foo.py::TestBar::test_baz -v`
- Single module: `uv run pytest tests/test_foo.py -v`
- Full suite: only when asked, or before `git push`
When debugging a specific endpoint or integration, test it directly (e.g., `requests.post()` to a local server) instead of running the entire pipeline end-to-end.
## Subprocess failures
When a subprocess fails, always log stdout and stderr. "Exit code 1" with no output is useless. If you see a subprocess failure without output, add logging before moving on.
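The rule above can be sketched as a small wrapper, using only stdlib `subprocess` and `logging` (the helper name is hypothetical, not code from the repo):

```python
import logging
import subprocess

log = logging.getLogger(__name__)

def run_logged(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command; on failure, log stdout/stderr instead of just the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        log.warning(
            "%s exited %d\nstdout:\n%s\nstderr:\n%s",
            " ".join(cmd), result.returncode, result.stdout, result.stderr,
        )
    return result
```

The caller still decides whether to raise; the point is that the output is never lost.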
@@ -1,3 +1,5 @@
# GitHub Interactions
-Prefer MCP GitHub tools (`mcp__github__*`) over the `gh` CLI for all GitHub operations. Only fall back to `gh` via Bash when no matching MCP tool exists.
+ALWAYS use MCP GitHub tools (`mcp__github__*`) for GitHub operations. Check for a matching MCP tool first -- only fall back to `gh` via Bash when no MCP tool exists for the operation.
This also applies to other MCP-connected services (Linear, Granola). MCP first, CLI second.
@@ -6,7 +6,11 @@ One task per session. Don't mix implementation with communication drafting, tran
## Duration
-Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than letting auto-compaction degrade context. If the session has overflowed context once, strongly consider starting a new session.
+Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than letting auto-compaction degrade context.
- After 1 compaction: consider wrapping up the current task and handing off.
- After 3 compactions: stop what you're doing, update `status.md`, and tell the user to start a fresh session.
- Never continue past 5 compactions. Context is too degraded to be productive.
## Context preservation
@@ -14,6 +18,12 @@ Cap sessions at 2-3 hours. Use `/handoff` at natural breakpoints rather than let
- When compacting, preserve: modified files list, current branch, VM state, test commands used, key decisions made
- Use subagents for exploration to keep main context clean
-## Avoid polling
+## No polling
-Don't use `/loop` to poll agent status -- it burns context on repetitive status messages. If you need to monitor a long-running agent, check the output file directly.
+Never poll background tasks. No `wc -l`, no `tail -f`, no `sleep` loops checking output files. Use `run_in_background` and wait for the completion notification. One check after notification is fine. Sixty-two checks in a loop is not.
## File read budget
If you've read the same file 3+ times in a session, something is wrong. Either:
- The session is too long and compaction destroyed your context -- write a handoff
- You're not retaining key information from previous reads -- write it down in your response before it compacts away
41
packages/.claude/rules/error-handling.md
Normal file
@@ -0,0 +1,41 @@
# Error Handling in Pipeline Code
## No silent swallowing
Never write `except Exception: pass` or `except Exception: return`. Every except block must either:
- Log at WARNING or higher with `exc_info=True`
- Re-raise after cleanup
- Return a clearly documented sentinel value with at least DEBUG logging
The tracing code (`benchmarking/_tracing.py`) is the worst offender -- 7 bare excepts with zero logging. Don't add more.
## Protect ast.parse() calls
`ast.parse()` raises `SyntaxError` on malformed input. Always wrap it:
```python
try:
    tree = ast.parse(source)
except SyntaxError:
    log.warning("Failed to parse %s", path)
    return None  # or skip this file
```
Known unprotected calls: `_state.py:70`, `_ranking.py:33`.
## XML/TOML parsing
Always pass `recover=True` to the lxml `XMLParser`. Always wrap `tomlkit.parse()` in try/except. Log parsing failures at WARNING, not DEBUG -- config parsing failures matter.
## Format consistency: client-server boundary
The AI service returns markdown-fenced code blocks. Every endpoint response must be parsed with `CodeStringsMarkdown.parse_markdown_code()` before using the code. Currently only `/optimize` and `/optimize-line-profiler` do this correctly. The refinement, repair, and adaptive endpoints in `ai/_refinement.py` skip this step.
Pattern to follow (from `pipeline/_candidate_gen.py:83-91`):
```python
parsed = CodeStringsMarkdown.parse_markdown_code(c.code)
if not parsed.code_strings:
    continue
plain_code = "\n\n".join(cs.code for cs in parsed.code_strings)
```
24
packages/.claude/rules/test-coverage.md
Normal file
@@ -0,0 +1,24 @@
# Test Coverage Requirements
## Every module needs unit tests
If you add or modify a module, it must have a corresponding test file. No exceptions for "infrastructure" or "worker" code -- those break the hardest in E2E.
Modules that currently have zero unit tests and have broken in production:
- `pipeline/_candidate_gen.py` -- candidate generation strategies
- `analysis/_discovery_worker.py` -- pytest test collection subprocess
Other untested modules that need coverage:
- `codegen/_create_pr.py`
- `ai/_tabulate.py`
- `benchmarking/_trace_db.py`
- `benchmarking/_benchmark_worker.py`
## Test the error paths
The happy path usually works. What breaks in E2E is:
- Malformed input (bad XML, raw source where markdown expected, missing config keys)
- Subprocess crashes (exit code != 0 with no output)
- Version mismatches (pytest 9 vs 8, different config section names)
Write tests for these cases, not just the golden path.