# Rich Optimization: Lessons Learned

Full case study: [rich_org](https://github.com/KRRT7/rich_org)

## Context

pip vendors Rich for progress bars, logging, and error display. `from rich.console import Console` took 79ms on CPython 3.12, a significant chunk of pip's startup time.

## What we did

### Import deferral (PR #12 + #13)
We deferred 15+ imports across Rich's codebase. The pattern:

```python
# Before (module level): `re` is imported and the pattern compiled at import time
import re

RE_COLOR = re.compile(r"...")

# After (lazy): import and compile are deferred to the first call
_RE_COLOR = None


def parse(color):
    global _RE_COLOR
    if _RE_COLOR is None:
        import re

        _RE_COLOR = re.compile(r"...")
    # ... use _RE_COLOR as before
```

**Key insight**: Most regex patterns in Rich are behind LRU-cached methods, so the lazy compile cost is paid once and then amortized across cached calls.
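
To make the amortization concrete, here is a minimal sketch of a lazily compiled pattern sitting behind an `lru_cache`. The names `parse_hex` and `_RE_HEX`, and the hex pattern itself, are hypothetical and chosen only for illustration; this is not Rich's actual code.

```python
from functools import lru_cache

# Hypothetical module-level slot; stays None until the first cache miss.
_RE_HEX = None


@lru_cache(maxsize=1024)
def parse_hex(color: str) -> tuple[int, int, int]:
    """Parse '#rrggbb' into an (r, g, b) tuple (illustrative, not Rich's API)."""
    global _RE_HEX
    if _RE_HEX is None:
        # `re` is imported and the pattern compiled only on the first miss.
        import re

        _RE_HEX = re.compile(r"#([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})")
    match = _RE_HEX.fullmatch(color)
    if match is None:
        raise ValueError(f"not a hex color: {color!r}")
    red, green, blue = (int(part, 16) for part in match.groups())
    return red, green, blue
```

Repeated calls with the same argument are served from the cache and never reach the `_RE_HEX is None` check, so the deferred import adds cost only to the very first miss.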

### Architectural changes (PR #12)

1. **`@dataclass` → `__slots__`**: `ConsoleOptions` and `ConsoleThreadLocals` used `@dataclass`, which pulls in `inspect` (~10ms) at import time. We replaced them with plain classes that define `__slots__`. Memory per instance dropped from 344 to 136 bytes.

2. **Lazy emoji dict**: `_emoji_codes.EMOJI` (3,608 entries) was loaded unconditionally. It is now deferred to first use via a module-level `__getattr__`.
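
Both changes can be sketched in a few lines. Everything named here (`DemoOptions`, `make_lazy_module`, `_load_table`, the two-entry table) is a hypothetical stand-in, not Rich's code. In a real module the `__getattr__` function would simply be defined at the top level of the file (PEP 562); the sketch attaches it to a throwaway module object only so that it runs as a single script.

```python
import types


class DemoOptions:
    """Plain class with __slots__; a stand-in for a small options object."""

    __slots__ = ("width", "height", "markup")

    def __init__(self, width: int, height: int, markup: bool = True) -> None:
        self.width = width
        self.height = height
        self.markup = markup

    def __eq__(self, other: object) -> bool:
        # @dataclass generates this automatically; a plain class writes it by hand.
        if not isinstance(other, DemoOptions):
            return NotImplemented
        return (self.width, self.height, self.markup) == (
            other.width,
            other.height,
            other.markup,
        )


def _load_table() -> dict[str, str]:
    # Stand-in for building a large table with thousands of entries.
    return {"thumbs_up": "\U0001F44D", "sparkles": "\u2728"}


def make_lazy_module() -> types.ModuleType:
    """Build a module whose EMOJI attribute is created on first access."""
    module = types.ModuleType("demo_emoji_codes")

    def __getattr__(name: str):
        # Called only when normal attribute lookup on the module fails.
        if name == "EMOJI":
            table = _load_table()
            module.EMOJI = table  # cache it so __getattr__ is not hit again
            return table
        raise AttributeError(f"module {module.__name__!r} has no attribute {name!r}")

    module.__getattr__ = __getattr__
    return module
```

Instances of `DemoOptions` carry no per-instance `__dict__`, which is where the memory saving of the `__slots__` approach comes from. The table in the lazy module is built the first time `EMOJI` is read and is a plain attribute afterwards.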

### Runtime micro-optimizations (PR #13)

1. `Style.__eq__` identity shortcut: check `is` before comparing hashes (1.84x for the identity case)
2. `Style.combine` / `Style.chain`: call `_add` (LRU-cached) directly instead of going through `sum()` → `__add__` (1.34x)
3. `Segment.simplify`: check `is` before `==` when comparing styles (1.36x)
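
The identity shortcut in items 1 and 3 can be sketched as follows. `DemoStyle` and `merge_runs` are hypothetical names that only mirror the shape of the idea; they are not Rich's `Style` or `Segment.simplify`, and the field-by-field comparison is an assumption made for the example.

```python
class DemoStyle:
    """Immutable-by-convention style object (hypothetical stand-in)."""

    __slots__ = ("color", "bold", "_hash")

    def __init__(self, color: str, bold: bool = False) -> None:
        self.color = color
        self.bold = bold
        self._hash = hash((color, bold))

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        if self is other:
            # Cheap pointer comparison; covers the common cached-object case.
            return True
        if not isinstance(other, DemoStyle):
            return NotImplemented
        return (self.color, self.bold) == (other.color, other.bold)


def merge_runs(runs: list[tuple[str, DemoStyle]]) -> list[tuple[str, DemoStyle]]:
    """Join adjacent (text, style) pairs that share a style."""
    merged: list[tuple[str, DemoStyle]] = []
    for text, style in runs:
        if merged:
            last_text, last_style = merged[-1]
            # `is` first: skips the __eq__ call when the same object repeats.
            if last_style is style or last_style == style:
                merged[-1] = (last_text + text, style)
                continue
        merged.append((text, style))
    return merged
```

The shortcut pays off when the same object is compared against itself often, which is typical when styles come out of a cache.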

## Results

| Import | Before | After | Speedup |
|---|---|---|---|
| Console (3.12) | 79.1ms | 37.5ms | **2.11x** |
| Console (3.13) | 67.9ms | 33.6ms | **2.02x** |
| RichHandler (3.12) | 100.3ms | 39.6ms | **2.53x** |
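
These notes do not say which tool produced the numbers above. One common way to collect comparable import timings is CPython's `-X importtime` flag, run in a fresh interpreter for each sample. The sketch below assumes that approach; `import_time_us` and `median_import_time_ms` are hypothetical helper names.

```python
import statistics
import subprocess
import sys

PREFIX = "import time:"


def import_time_us(module: str) -> int:
    """Cumulative import time of `module` in microseconds, from a fresh interpreter."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes "import time: self | cumulative | name" lines to stderr.
    for line in reversed(proc.stderr.splitlines()):
        if not line.startswith(PREFIX):
            continue
        parts = line[len(PREFIX):].split("|")
        if len(parts) == 3 and parts[2].strip() == module:
            return int(parts[1].strip())
    raise LookupError(f"no import-time record found for {module!r}")


def median_import_time_ms(module: str, runs: int = 5) -> float:
    """Median over several fresh interpreters, to damp run-to-run noise."""
    samples = [import_time_us(module) for _ in range(runs)]
    return statistics.median(samples) / 1000
```

A call such as `median_import_time_ms("rich.console", runs=20)` would give a figure in the same units as the table, provided Rich is installed in the interpreter being measured.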

## Key takeaways

1. **Python version matters**: `typing` imports `re` on 3.12 but not on 3.13, so our `re` deferral was a no-op on 3.12
2. **`from __future__ import annotations`** is the unlock for `TYPE_CHECKING` moves: without it, annotations are evaluated at runtime, so annotation-only names that share import lines with runtime names can't be split off into an `if TYPE_CHECKING:` block
3. **Benchmark on controlled hardware**: Laptop results were noisy; an Azure non-burstable VM gave a consistent ±0.5ms stddev
4. **Maintainer engagement matters**: A direct Discord DM to Will McGugan got "Seems like a clear win. Feel free to open a PR" within 30 minutes
5. **Stack PRs, don't scatter them**: We started with 11 individual PRs and consolidated them into 2 stacked PRs, which were much cleaner to review
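
Takeaway 2 can be illustrated with a small sketch. It uses `decimal` purely as a stand-in for an import that is only needed for annotations, and `describe` is a hypothetical function; neither comes from Rich.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime, so no import cost.
    from decimal import Decimal


def describe(value: Decimal) -> str:
    """The annotation mentions Decimal, yet this file never imports `decimal` at runtime."""
    return f"value={value}"
```

Without the `__future__` import, interpreters that evaluate annotations eagerly would raise `NameError` while defining `describe`, because `Decimal` is not bound at runtime.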

## Applicable to codeflash

- Any Rich imports in codeflash's output/display layer are candidates for the same deferral
- If codeflash vendors or depends on Rich, it benefits from the upstream improvements automatically
- The `@dataclass` → `__slots__` pattern applies to any hot dataclass in codeflash
- The identity shortcut pattern (`is` before `==`) applies to any cached/interned objects