# Rich Performance Optimization
Upstream performance improvements to Textualize/rich, motivated by pip startup time profiling.
## Background
pip vendors Rich for its progress bars, logging, and error display. Profiling `pip --version` revealed Rich as one of the heaviest imports in the startup chain: `from rich.console import Console` alone took ~79 ms on CPython 3.12 (Standard_D2s_v5 VM).
Rather than patching pip's vendored copy, we contributed upstream so everyone benefits.
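This kind of profiling can be roughly reproduced with CPython's `-X importtime` flag, which reports per-module import cost on stderr. A minimal sketch (using `json` as a stand-in import target; substitute `from rich.console import Console` to profile Rich):

```python
import subprocess
import sys

# Run a child interpreter with -X importtime and capture its stderr report.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)

rows = []
for line in result.stderr.splitlines():
    # Report lines look like: "import time:  self [us] | cumulative | name"
    parts = line.split("|")
    if len(parts) == 3 and "cumulative" not in parts[1]:
        rows.append((int(parts[1]), parts[2].strip()))

# Show the five most expensive imports by cumulative time.
for cum_us, name in sorted(rows, reverse=True)[:5]:
    print(f"{cum_us:>8} us  {name}")
```

`bench_importtime.py` in this repo does a fuller version of the same parsing.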
## Results

### Import Time (hyperfine, 30+ runs, Standard_D2s_v5)

#### CPython 3.12

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 79.1 ± 0.8 ms | 37.5 ± 0.5 ms | 2.11x |
| `RichHandler` | 100.3 ± 3.6 ms | 39.6 ± 0.5 ms | 2.53x |
#### CPython 3.13

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 67.9 ± 0.7 ms | 33.6 ± 0.5 ms | 2.02x |
| `RichHandler` | — | 37.5 ± 0.4 ms | — |
On Python 3.13+, `typing` no longer imports `re`, so deferring all `re.compile()` calls eliminates `re` (plus `_sre`, `re._compiler`, `re._parser`, `re._constants`) from the Console import chain entirely.
### Runtime Micro-benchmarks (Python 3.13.13)
| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `Style.__eq__` (identity) | 114 ns/call | 62 ns/call | 1.84x |
| `Style.combine` (3 styles) | 579 ns/call | 433 ns/call | 1.34x |
| `Segment.simplify` (identity) | 1269 ns/call | 931 ns/call | 1.36x |
| `Style.chain` (3 styles) | 959 ns/call | 878 ns/call | 1.09x |
| E2E `Console.print` | 173.7 µs/call | 171.6 µs/call | ~1.01x |
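Per-call figures like these can be gathered with a small `timeit` harness along these lines (best-of-N to filter scheduler noise, matching the "timeit (best of 7)" methodology; the actual scripts live in `bench/`):

```python
import timeit


def ns_per_call(fn, number=1_000_000, repeat=7):
    # Take the minimum across repeats: the fastest run is the least
    # disturbed by background load, so it best reflects the true cost.
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


# Example: baseline cost of a no-op callable.
print(f"{ns_per_call(lambda: None):.0f} ns/call")
```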
## What We Changed
### PR #12 — Architectural wins (KRRT7/rich#12)
- **Replace `@dataclass` with `__slots__` classes**: `ConsoleOptions` and `ConsoleThreadLocals` used `@dataclass`, which imports `inspect` at module level (~10 ms). Replaced with plain classes + `__slots__`. `ConsoleOptions` memory: 344 → 136 bytes (60% reduction).
- **Lazy-load emoji dictionary**: `_emoji_codes.EMOJI` (3,608 entries) loaded unconditionally via `text.py → emoji.py`. Deferred to first use via module-level `__getattr__`.
- **Defer imports across 12+ modules**: `inspect`, `pretty`, `scope`, `getpass`, `configparser`, `html.escape`, `zlib`, `traceback`, `pathlib` deferred to the methods that actually use them.
- **`from __future__ import annotations`**: enabled in key modules to allow moving type-only imports to `TYPE_CHECKING`.
### PR #13 — Import deferral + runtime micro-opts (KRRT7/rich#13)
**Import deferral (7 files):**
- `color.py`: `RE_COLOR` compiled lazily in `Color.parse()` (LRU-cached)
- `text.py`: `_re_whitespace` lazy; inline `import re` in 6 methods
- `markup.py`: `RE_TAGS` via `_compile_tags()`, `RE_HANDLER` and escape regex lazy
- `_emoji_replace.py`: regex default arg → lazy `_EMOJI_SUB` global
- `_wrap.py`: `re_word` → lazy `_re_word`
- `highlighter.py`: `import re` inside `JSONHighlighter.highlight()`
- `default_styles.py`: 3 `rgb(...)` strings → `Color.from_rgb()` to avoid `Color.parse()` regex at import
**Runtime micro-optimizations:**
- `Style.__eq__`/`__ne__`: identity shortcut (`is`) before hash comparison
- `Style.combine`/`chain`: use `_add` (LRU-cached) directly instead of `sum()` → `__add__` → `.copy()` check
- `Segment.simplify`: `is` before `==` for style comparison
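The identity shortcut is a cheap guard in front of the real comparison; since render loops mostly compare a style with itself, the `is` check wins on the common path. A minimal sketch (Rich's actual `Style` compares cached hashes, but the fields here are simplified):

```python
class Style:
    __slots__ = ("_hash",)

    def __init__(self, attrs: tuple):
        self._hash = hash(attrs)

    def __eq__(self, other):
        if self is other:
            # Identity shortcut: same object, no hash lookup needed.
            return True
        if not isinstance(other, Style):
            return NotImplemented
        return self._hash == other._hash

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


bold = Style(("bold",))
print(bold == bold, bold == Style(("dim",)))  # True False
```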
## Upstream PR
- Textualize/rich#4070 — Initial import deferral PR (subset of the above)
## Methodology

### Environment
- VM: Azure Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable)
- OS: Ubuntu 24.04 LTS
- Region: westus2
- Python: 3.12 and 3.13 via uv
- Tooling: hyperfine (warmup 5, min-runs 30), timeit (best of 7)
Non-burstable VM chosen for consistent CPU performance — no thermal throttling or turbo variability.
### Benchmark harness

All scripts in `bench/`:
| Script | Purpose |
|---|---|
| `bench_import.sh` | Overall `import rich` time via hyperfine |
| `bench_module.sh` | Per-module import time (Console, RichHandler, Traceback, etc.) |
| `bench_e2e.sh` | A/B comparison: master vs optimized branch |
| `bench_compare.sh` | Generic branch comparison wrapper |
| `bench_importtime.py` | `python -X importtime` parser → sorted TSV breakdown |
| `bench_runtime.py` | PR #12 runtime benchmarks (ConsoleOptions, emoji_replace) |
| `bench_runtime2.py` | PR #13 runtime benchmarks (Style.eq, combine, Segment.simplify) |
| `bench_text.py` | Text hot-path benchmarks (construction, copy, divide, render) |
| `test_all_impls.sh` | Run tests across CPython 3.9–3.14 + PyPy 3.10 |
### Raw data

Hyperfine JSON exports in `data/`.
## Maintainer Engagement

Reached out to Will McGugan (Textualize CEO) via Discord. Conversation in `discord-transcript.md`.
Key quotes:
- "Seems like a clear win. Feel free to open a PR."
- "I'd say single PR."
## Repo Structure

```
.
├── README.md             # This file
├── cloud-init.yaml       # VM provisioning (one-shot reproducible setup)
├── discord-transcript.md # Will McGugan conversation
├── bench/                # Benchmark scripts (from VM)
│   ├── bench_import.sh
│   ├── bench_module.sh
│   ├── bench_e2e.sh
│   ├── bench_compare.sh
│   ├── bench_importtime.py
│   ├── bench_runtime.py
│   ├── bench_runtime2.py
│   ├── bench_text.py
│   └── test_all_impls.sh
├── data/                 # Raw benchmark data (hyperfine JSON)
│   ├── e2e-3.12/
│   └── runtime/
└── vm-setup.md           # Azure VM provisioning instructions
```