Rich Optimization — Lessons Learned
Full case study: rich_org
Context
pip vendors Rich for progress bars, logging, and error display. `from rich.console import Console` took 79ms on CPython 3.12, a significant chunk of pip's startup.
What we did
Import deferral (PR #12 + #13)
Deferred 15+ imports across Rich's codebase. The pattern:

    # Before (module level)
    import re
    RE_COLOR = re.compile(r"...")

    # After (lazy)
    _RE_COLOR = None

    def parse(color):
        global _RE_COLOR
        if _RE_COLOR is None:
            import re
            _RE_COLOR = re.compile(r"...")

Key insight: Most regex patterns in Rich are behind LRU-cached methods, so the lazy compile cost is paid once and amortized.
Architectural changes (PR #12)
- `@dataclass` → `__slots__`: `ConsoleOptions` and `ConsoleThreadLocals` used `@dataclass`, pulling in `inspect` (~10ms). Replaced with plain classes + `__slots__`. Memory: 344 → 136 bytes per instance.
- Lazy emoji dict: `_emoji_codes.EMOJI` (3,608 entries) loaded unconditionally. Deferred to first use via module-level `__getattr__`.
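The module-level `__getattr__` deferral can be sketched as follows. This is an illustration of the general mechanism (PEP 562), not Rich's actual code: the module `lazy_table_demo`, the loader `_load_table`, and the two-entry table are hypothetical. A module object is built in-process here so the snippet runs as a single file; in a real package the hook would simply be a top-level `def __getattr__(name)` in the module.

```python
import types

# Hypothetical stand-in for a module that owns a large lookup table.
lazy_mod = types.ModuleType("lazy_table_demo")
load_calls = []  # records each time the expensive load runs


def _load_table():
    """Pretend to build an expensive dict."""
    load_calls.append(1)
    return {"smile": "\U0001F604", "heart": "\u2764"}


def _module_getattr(name):
    """PEP 562 hook: invoked only when normal attribute lookup fails."""
    if name == "EMOJI":
        table = _load_table()
        # Store the result on the module so the hook is not hit again.
        setattr(lazy_mod, "EMOJI", table)
        return table
    raise AttributeError(f"module {lazy_mod.__name__!r} has no attribute {name!r}")


# Equivalent to defining `__getattr__` at the top level of the module file.
lazy_mod.__getattr__ = _module_getattr
```

Nothing is loaded until `lazy_mod.EMOJI` is first read; later reads find the cached attribute directly and skip the hook.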
Runtime micro-optimizations (PR #13)
- `Style.__eq__` identity shortcut: `is` before hash comparison (1.84x for identity case)
- `Style.combine`/`chain`: direct `_add` (LRU-cached) instead of `sum()` → `__add__` (1.34x)
- `Segment.simplify`: `is` before `==` for style comparison (1.36x)
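The identity shortcut can be sketched on a small value class. `Tag` and its fields are hypothetical and much simpler than Rich's `Style`; the sketch only shows where the `is` check sits relative to the more expensive comparison.

```python
class Tag:
    """Hypothetical immutable value object, compared by field values."""

    __slots__ = ("name", "weight", "_hash")

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self._hash = hash((name, weight))  # computed once, reused below

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if self is other:  # identity shortcut: same object, skip everything else
            return True
        if not isinstance(other, Tag):
            return NotImplemented
        # Cheap hash check first, then the field comparison to rule out collisions.
        return self._hash == other._hash and (
            (self.name, self.weight) == (other.name, other.weight)
        )
```

The shortcut pays off when the same instances are compared repeatedly, which is the common case when objects come out of a cache.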
Results
| Import | Before | After | Speedup |
|---|---|---|---|
| Console (3.12) | 79.1ms | 37.5ms | 2.11x |
| Console (3.13) | 67.9ms | 33.6ms | 2.02x |
| RichHandler (3.12) | 100.3ms | 39.6ms | 2.53x |
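Import timings like these can be approximated with a small harness that starts a fresh interpreter per sample. This is a sketch under assumptions, not the harness used for the table: `time_import` and `import_cost` are hypothetical helpers, and wall-clock timing of a subprocess includes interpreter startup, which is why a `pass` baseline is subtracted. CPython's built-in `python -X importtime` flag gives a per-module breakdown and is often the better first tool.

```python
import statistics
import subprocess
import sys
import time


def time_import(statement, runs=5):
    """Median wall-clock seconds for a fresh interpreter to run `statement`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def import_cost(statement, runs=5):
    """Estimate import cost by subtracting bare interpreter startup time."""
    return time_import(statement, runs) - time_import("pass", runs)
```

For example, `import_cost("from rich.console import Console")` would give a rough figure on a machine with Rich installed; on a noisy machine the estimate can vary widely between runs.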
Key takeaways
- Python version matters: `typing` imports `re` on 3.12 but not 3.13, which made our `re` deferral a no-op on 3.12
- `from __future__ import annotations` is the unlock for `TYPE_CHECKING` moves: without it, annotation-only names that share import lines with runtime names can't be separated
- Benchmark on controlled hardware: Laptop results were noisy; an Azure non-burstable VM gave consistent ±0.5ms stddev
- Maintainer engagement matters: Direct Discord DM to Will McGugan got "Seems like a clear win. Feel free to open a PR" within 30 minutes
- Stack PRs, not scatter: Started with 11 individual PRs, consolidated to 2 stacked PRs — much cleaner to review
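The `TYPE_CHECKING` takeaway can be illustrated with a short sketch. The function `describe` and the choice of `Decimal` as the annotation-only name are hypothetical. With the `__future__` import, annotations are stored as strings and never evaluated at definition time, so the annotated name does not have to exist at runtime; on 3.12 and 3.13 the same `def` without that import would raise `NameError`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; this import never runs at runtime.
    from decimal import Decimal


def describe(value: Decimal) -> str:
    """The annotation mentions Decimal, but the body never needs the class."""
    return f"value={value}"
```

Names that are used at runtime stay in a normal import; only the annotation-only names move under the `TYPE_CHECKING` guard.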
Applicable to codeflash
- Any Rich imports in codeflash's output/display layer are candidates for the same deferral
- If codeflash vendors or depends on Rich, the upstream improvements benefit automatically
- The `@dataclass` → `__slots__` pattern applies to any hot dataclass in codeflash
- Identity shortcut pattern (`is` before `==`) applies to any cached/interned objects
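As a rough sketch of what the `@dataclass` → `__slots__` conversion looks like, here is a plain class that hand-writes the methods a dataclass would have generated. `Options` and its fields are hypothetical and do not mirror Rich's `ConsoleOptions` or any codeflash class.

```python
class Options:
    """Hypothetical hot object: plain class with __slots__, no dataclasses import."""

    __slots__ = ("width", "height", "ascii_only")

    def __init__(self, width, height, ascii_only=False):
        self.width = width
        self.height = height
        self.ascii_only = ascii_only

    def __eq__(self, other):
        if not isinstance(other, Options):
            return NotImplemented
        return all(getattr(self, n) == getattr(other, n) for n in self.__slots__)

    def __repr__(self):
        fields = ", ".join(f"{n}={getattr(self, n)!r}" for n in self.__slots__)
        return f"Options({fields})"
```

Instances have no per-object `__dict__`, which is where the memory saving comes from, and assigning an undeclared attribute raises `AttributeError`. Like a default dataclass, a class that defines `__eq__` without `__hash__` is unhashable.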