# Cross-Session Learnings
Non-obvious technical discoveries about this codebase. Read at session start to avoid repeating dead ends.
## How to use this file
- **Domain agents**: Add entries after discovering something non-obvious, whether the resulting change was kept or discarded.
- **Router agent**: Read this file at every session start and include it in the domain agent's launch prompt.
- **Entries should be** specific, technical, and evidence-based, not opinions or preferences.
- **Remove entries** when they become outdated (e.g., a library version changes and the workaround no longer applies).
## Template
```markdown
## <Short descriptive title>
**Domain:** memory | cpu | async | structure
**Discovered:** <date>
<1-3 sentences with the specific technical finding. Include evidence: profiler output, version numbers, error messages.>
**Implication:** <What this means for future optimization attempts. What to do or avoid.>
```
## Example entries
```markdown
## pytest-memray measures per-test peak only
**Domain:** memory
**Discovered:** 2026-03-17
pytest-memray's `@pytest.mark.limit_memory` and `--memray` flag measure memory allocated during the test function body only. Import-time allocations (module globals, C extension init) are NOT counted. Verified: 40 MiB english_words list invisible in pytest-memray but visible in `memray run`.
**Implication:** Import-time memory optimizations will show zero improvement in pytest-memray benchmarks. Use `memray run` on the full process to capture import-time.
## Paddle inference engine allocates in 500 MiB arena chunks
**Domain:** memory
**Discovered:** 2026-03-19
PaddleOCR's C++ inference engine allocates memory in 500 MiB arena chunks via `auto_growth` strategy. These are native memory pools, not proportional to data size. `config.memory_pool_init_size_mb()` is read-only (100 MiB default, but pool grows to 500 MiB). `enable_ort_optimization()` requires Paddle compiled with ONNX Runtime support. `rec_batch_num` controls the number of arena chunks allocated during recognition (6 -> 4 chunks, 1 -> 1 chunk).
**Implication:** Cannot cap Paddle arena size directly. Only lever is `rec_batch_num` to reduce number of chunks. Don't waste time on arena configuration APIs.
```
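The pytest-memray entry above can be illustrated with a self-contained sketch using the stdlib `tracemalloc` as a stand-in (the variable names are made up for illustration). An allocation that happens before tracing starts, like a module global created at import time, never appears in the traced peak:

```python
import tracemalloc

# Simulates an import-time allocation (e.g. a module-level word list):
# it exists before tracing starts, so it is never counted.
import_time_blob = [0] * 1_000_000  # roughly 8 MiB of list storage

def test_body():
    # Simulates allocations made inside a test function body.
    local = [0] * 100_000  # roughly 0.8 MiB
    return len(local)

tracemalloc.start()  # tracing begins AFTER the big allocation, like pytest-memray
test_body()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# peak reflects only test_body()'s allocations; the 8 MiB blob is invisible,
# mirroring how pytest-memray misses import-time memory.
print(peak < 4_000_000)
```

To capture import-time memory for real, run memray on the whole process (e.g. `memray run -m pytest ...`) rather than relying on per-test measurement, as the entry recommends.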
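For the Paddle entry, a minimal sketch of the one available lever. This assumes `paddleocr` is installed and that its constructor accepts `rec_batch_num` as described above; the import is deferred so the snippet stays inert without the package:

```python
def build_low_memory_ocr():
    # Deferred import: paddleocr is only needed once the engine is built.
    from paddleocr import PaddleOCR
    # rec_batch_num controls how many arena chunks recognition allocates
    # (per the entry above: 6 -> 4 chunks, 1 -> 1 chunk). It is the only
    # lever; the arena-size config APIs are effectively read-only.
    return PaddleOCR(rec_batch_num=1)
```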