Mirror of https://github.com/codeflash-ai/codeflash-agent.git (synced 2026-05-04 18:25:19 +00:00)

# Rich Performance Optimization

Upstream performance improvements to [Textualize/rich](https://github.com/Textualize/rich), motivated by pip startup time profiling.

## Background

pip vendors Rich for its progress bars, logging, and error display. Profiling `pip --version` revealed Rich as one of the heaviest imports in the startup chain: `from rich.console import Console` alone took ~79 ms on CPython 3.12 (Standard_D2s_v5 VM).

Rather than patching pip's vendored copy, we contributed the changes upstream so that every Rich user benefits.

## Results

### Import Time (hyperfine, 30+ runs, Standard_D2s_v5)

#### CPython 3.12

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 79.1 ± 0.8 ms | 37.5 ± 0.5 ms | **2.11x** |
| `RichHandler` | 100.3 ± 3.6 ms | 39.6 ± 0.5 ms | **2.53x** |

#### CPython 3.13

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 67.9 ± 0.7 ms | 33.6 ± 0.5 ms | **2.02x** |
| `RichHandler` | n/a | 37.5 ± 0.4 ms | n/a |

> On Python 3.13+, `typing` no longer imports `re`, so deferring all `re.compile()` calls eliminates `re` (+ `_sre`, `re._compiler`, `re._parser`, `re._constants`) from the Console import chain entirely.

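
Whether a given import pulls in `re` can be checked directly. The following is a minimal sketch and not part of the benchmark harness: the helper name `module_loaded_by` is hypothetical, and it inspects whichever interpreter runs the script. It executes a statement in a fresh interpreter and reports whether a module ended up in `sys.modules`.

```python
import subprocess
import sys


def module_loaded_by(statement: str, module: str) -> bool:
    """Run `statement` in a fresh interpreter and report whether `module` was imported.

    Hypothetical helper for illustration; not taken from the Rich or pip code bases.
    """
    # The probe executes the statement, then prints True/False as its last line.
    probe = f"{statement}\nimport sys\nprint({module!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # Only the last line matters, in case the statement itself printed something.
    return result.stdout.strip().splitlines()[-1] == "True"
```

With Rich installed, calling `module_loaded_by("from rich.console import Console", "re")` on the optimized branch under Python 3.13+ would be expected to return `False`, per the note above.
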
### Runtime Micro-benchmarks (Python 3.13.13)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `Style.__eq__` (identity) | 114 ns/call | 62 ns/call | **1.84x** |
| `Style.combine` (3 styles) | 579 ns/call | 433 ns/call | **1.34x** |
| `Segment.simplify` (identity) | 1269 ns/call | 931 ns/call | **1.36x** |
| `Style.chain` (3 styles) | 959 ns/call | 878 ns/call | **1.09x** |
| E2E `Console.print` | 173.7 µs/call | 171.6 µs/call | ~1.01x |

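
The Speedup column is the ratio of the "Before" time to the "After" time. As a quick sanity check, this sketch recomputes it from the figures in the table; the names `speedup`, `TIMINGS`, and `speedup_table` are illustrative only.

```python
from typing import Dict, Tuple


def speedup(before: float, after: float) -> float:
    """Ratio of baseline time to optimized time; values above 1.0 mean faster."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# (before, after) pairs copied from the table above; the units cancel in the ratio.
TIMINGS: Dict[str, Tuple[float, float]] = {
    "Style.__eq__ (identity)": (114, 62),
    "Style.combine (3 styles)": (579, 433),
    "Segment.simplify (identity)": (1269, 931),
    "Style.chain (3 styles)": (959, 878),
    "E2E Console.print": (173.7, 171.6),
}


def speedup_table(timings: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Speedup per benchmark, rounded to two decimals as in the table."""
    return {name: round(speedup(b, a), 2) for name, (b, a) in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedup_table(TIMINGS).items():
        print(f"{name}: {ratio:.2f}x")
```
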
## What We Changed

### PR #12: Architectural wins ([KRRT7/rich#12](https://github.com/KRRT7/rich/pull/12))

- **Replace `@dataclass` with `__slots__` classes**: `ConsoleOptions` and `ConsoleThreadLocals` used `@dataclass`, and importing `dataclasses` pulls in `inspect` at module level (~10 ms). Both are now plain classes with `__slots__`. `ConsoleOptions` memory: 344 → 136 bytes (60% reduction).
- **Lazy-load the emoji dictionary**: `_emoji_codes.EMOJI` (3,608 entries) was loaded unconditionally via `text.py → emoji.py`. It is now deferred to first use via a module-level `__getattr__`.
- **Defer imports across 12+ modules**: `inspect`, `pretty`, `scope`, `getpass`, `configparser`, `html.escape`, `zlib`, `traceback`, and `pathlib` are now imported inside the methods that actually use them.
- **`from __future__ import annotations`**: enabled in key modules so that type-only imports can move under `TYPE_CHECKING`.

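
Two of the techniques in this list can be sketched in a few lines. The code below is illustrative only and is not the code from the PR: `Options`, `make_lazy_module`, and the tiny `EMOJI` table are hypothetical stand-ins. The first part shows a plain `__slots__` class in place of a `@dataclass`; the second shows a module-level `__getattr__` (PEP 562) that builds a table on first access.

```python
import types


class Options:
    """Plain class with __slots__, standing in for a former @dataclass."""

    __slots__ = ("width", "height", "no_color")

    def __init__(self, width: int, height: int, no_color: bool = False) -> None:
        self.width = width
        self.height = height
        self.no_color = no_color

    def __eq__(self, other: object) -> bool:
        # @dataclass would generate this method; here it is written by hand.
        if not isinstance(other, Options):
            return NotImplemented
        return all(getattr(self, n) == getattr(other, n) for n in self.__slots__)


def make_lazy_module() -> types.ModuleType:
    """Build a throwaway module whose EMOJI attribute is created on first access."""
    module = types.ModuleType("lazy_emoji_demo")
    module.load_count = 0  # counts how often the table is actually built

    def __getattr__(name: str) -> object:
        # PEP 562: called only when normal attribute lookup on the module fails.
        if name == "EMOJI":
            module.load_count += 1
            table = {"smile": "\U0001F604", "heart": "\u2764"}
            module.EMOJI = table  # cache it so later lookups bypass __getattr__
            return table
        raise AttributeError(f"module {module.__name__!r} has no attribute {name!r}")

    module.__getattr__ = __getattr__
    return module
```

In a real module file the `__getattr__` function would simply be defined at the top level; building the module with `types.ModuleType` here only keeps the sketch in a single file.
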
### PR #13: Import deferral + runtime micro-opts ([KRRT7/rich#13](https://github.com/KRRT7/rich/pull/13))

**Import deferral (7 files):**

- `color.py`: `RE_COLOR` compiled lazily in `Color.parse()` (LRU-cached)
- `text.py`: `_re_whitespace` lazy; inline `import re` in 6 methods
- `markup.py`: `RE_TAGS` via `_compile_tags()`, `RE_HANDLER` and escape regex lazy
- `_emoji_replace.py`: regex default arg → lazy `_EMOJI_SUB` global
- `_wrap.py`: `re_word` → lazy `_re_word`
- `highlighter.py`: `import re` inside `JSONHighlighter.highlight()`
- `default_styles.py`: 3 `rgb(...)` strings → `Color.from_rgb()` to avoid `Color.parse()` regex at import

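
The common pattern behind these changes is to compile a regex on first use rather than at import time. A minimal sketch of that pattern follows; `parse_rgb`, `_rgb_pattern`, and the pattern itself are hypothetical and are not Rich's actual `Color.parse()` implementation.

```python
from functools import lru_cache
from typing import Tuple

_RE_RGB = None  # compiled on first use instead of at import time


def _rgb_pattern():
    """Return the compiled pattern, importing `re` only when first needed."""
    global _RE_RGB
    if _RE_RGB is None:
        import re  # deferred: importing this module alone does not import `re`

        _RE_RGB = re.compile(
            r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)"
        )
    return _RE_RGB


@lru_cache(maxsize=1024)
def parse_rgb(text: str) -> Tuple[int, int, int]:
    """Parse 'rgb(r, g, b)'; repeated inputs are answered from the LRU cache."""
    match = _rgb_pattern().fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"not an rgb() color: {text!r}")
    red, green, blue = (int(group) for group in match.groups())
    if max(red, green, blue) > 255:
        raise ValueError(f"component out of range in {text!r}")
    return red, green, blue
```
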
**Runtime micro-optimizations:**

- `Style.__eq__`/`__ne__`: identity shortcut (`is`) before hash comparison
- `Style.combine`/`chain`: use `_add` (LRU-cached) directly instead of `sum()` → `__add__` → `.copy()` check
- `Segment.simplify`: `is` before `==` for style comparison

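
The identity shortcut is easiest to see on a small class. This is a sketch under assumptions, not Rich's `Style`: the class `Tag`, its fields, and the helper `same_tag` are hypothetical, and the fallback comparison here checks the cached hash and then the fields.

```python
class Tag:
    """Small immutable-style value object with a precomputed hash."""

    __slots__ = ("name", "bold", "_hash")

    def __init__(self, name: str, bold: bool = False) -> None:
        self.name = name
        self.bold = bold
        self._hash = hash((name, bold))

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        # Identity shortcut: the same object is always equal to itself,
        # so the hash and field comparisons below can be skipped.
        if self is other:
            return True
        if not isinstance(other, Tag):
            return NotImplemented
        return self._hash == other._hash and (self.name, self.bold) == (
            other.name,
            other.bold,
        )


def same_tag(a: Tag, b: Tag) -> bool:
    """`is` before `==`: avoids the __eq__ call entirely when objects are shared."""
    return a is b or a == b
```

The shortcut pays off when the same object is compared against itself repeatedly, which is the "(identity)" case in the micro-benchmark table.
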
### Upstream PR

- [Textualize/rich#4070](https://github.com/Textualize/rich/pull/4070): initial import deferral PR (subset of the above)

## Methodology

### Environment

- **VM**: Azure Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable)
- **OS**: Ubuntu 24.04 LTS
- **Region**: westus2
- **Python**: 3.12 and 3.13 via uv
- **Tooling**: hyperfine (warmup 5, min-runs 30), timeit (best of 7)

A non-burstable VM was chosen for consistent CPU performance: no thermal throttling or turbo variability.

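
For the runtime numbers, "best of 7" means taking the fastest of seven timed repetitions. A minimal sketch of such a measurement with the standard-library `timeit` module is shown below; the helper name `best_ns_per_call` and the default loop count are assumptions, not values taken from the benchmark scripts.

```python
import timeit
from typing import Callable


def best_ns_per_call(
    func: Callable[[], object], number: int = 100_000, repeat: int = 7
) -> float:
    """Time `func` in `repeat` batches of `number` calls; return the best ns/call."""
    timer = timeit.Timer(func)
    totals = timer.repeat(repeat=repeat, number=number)  # seconds per batch
    # The minimum is the least disturbed by other activity on the machine.
    return min(totals) / number * 1e9


if __name__ == "__main__":
    print(f"{best_ns_per_call(lambda: (1, 2) == (1, 2)):.1f} ns/call")
```
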
### Benchmark harness

All scripts in [`bench/`](bench/):

| Script | Purpose |
|---|---|
| `bench_import.sh` | Overall `import rich` time via hyperfine |
| `bench_module.sh` | Per-module import time (Console, RichHandler, Traceback, etc.) |
| `bench_e2e.sh` | A/B comparison: master vs optimized branch |
| `bench_compare.sh` | Generic branch comparison wrapper |
| `bench_importtime.py` | `python -X importtime` parser → sorted TSV breakdown |
| `bench_runtime.py` | PR #12 runtime benchmarks (ConsoleOptions, emoji_replace) |
| `bench_runtime2.py` | PR #13 runtime benchmarks (`Style.__eq__`, combine, `Segment.simplify`) |
| `bench_text.py` | Text hot-path benchmarks (construction, copy, divide, render) |
| `test_all_impls.sh` | Run tests across CPython 3.9–3.14 + PyPy 3.10 |

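
The idea behind an importtime parser can be sketched as follows. This is not the contents of `bench_importtime.py`, which are not reproduced here; the function names are hypothetical. It relies on the `import time: <self> | <cumulative> | <module>` lines that CPython writes to stderr under `-X importtime`, and sorts modules by self time.

```python
import re
import subprocess
import sys
from typing import List, Tuple

# Matches data lines such as "import time:       120 |        350 |   json.decoder".
# The header line has no digits in the first column and is skipped.
_LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)\s*$")

Row = Tuple[str, int, int]  # (module, self_us, cumulative_us)


def parse_importtime(stderr_text: str) -> List[Row]:
    """Parse `-X importtime` output into rows sorted by self time, largest first."""
    rows: List[Row] = []
    for line in stderr_text.splitlines():
        match = _LINE.match(line)
        if match:
            self_us, cumulative_us, name = match.groups()
            rows.append((name, int(self_us), int(cumulative_us)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def importtime_rows(statement: str) -> List[Row]:
    """Run `statement` in a fresh interpreter with -X importtime and parse stderr."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(result.stderr)


def to_tsv(rows: List[Row]) -> str:
    """Render rows as tab-separated text with a header line."""
    lines = ["module\tself_us\tcumulative_us"]
    lines += [f"{name}\t{self_us}\t{cum_us}" for name, self_us, cum_us in rows]
    return "\n".join(lines)
```

For example, `print(to_tsv(importtime_rows("import json")))` lists the modules loaded by `import json`, most expensive first.
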
### Raw data

Hyperfine JSON exports in [`data/`](data/).

## Maintainer Engagement

We reached out to Will McGugan (Textualize CEO) via Discord. The conversation is in [`discord-transcript.md`](discord-transcript.md).

Key quotes:

- "Seems like a clear win. Feel free to open a PR."
- "I'd say single PR."

## Repo Structure

```
.
├── README.md                # This file
├── cloud-init.yaml          # VM provisioning (one-shot reproducible setup)
├── discord-transcript.md    # Will McGugan conversation
├── bench/                   # Benchmark scripts (from VM)
│   ├── bench_import.sh
│   ├── bench_module.sh
│   ├── bench_e2e.sh
│   ├── bench_compare.sh
│   ├── bench_importtime.py
│   ├── bench_runtime.py
│   ├── bench_runtime2.py
│   ├── bench_text.py
│   └── test_all_impls.sh
├── data/                    # Raw benchmark data (hyperfine JSON)
│   ├── e2e-3.12/
│   └── runtime/
└── vm-setup.md              # Azure VM provisioning instructions
```