# Rich Performance Optimization

Upstream performance improvements to [Textualize/rich](https://github.com/Textualize/rich), motivated by pip startup time profiling.

## Background

pip vendors Rich for its progress bars, logging, and error display. Profiling `pip --version` revealed Rich as one of the heaviest imports in the startup chain — `from rich.console import Console` alone took ~79ms on CPython 3.12 (Standard_D2s_v5 VM). Rather than patching pip's vendored copy, we contributed upstream so everyone benefits.

## Results

### Import Time (hyperfine, 30+ runs, Standard_D2s_v5)

#### CPython 3.12

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 79.1 ± 0.8ms | 37.5 ± 0.5ms | **2.11x** |
| `RichHandler` | 100.3 ± 3.6ms | 39.6 ± 0.5ms | **2.53x** |

#### CPython 3.13

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 67.9 ± 0.7ms | 33.6 ± 0.5ms | **2.02x** |
| `RichHandler` | — | 37.5 ± 0.4ms | — |

> On Python 3.13+, `typing` no longer imports `re`, so deferring all `re.compile()` calls eliminates `re` (plus `_sre`, `re._compiler`, `re._parser`, and `re._constants`) from the Console import chain entirely.

### Runtime Micro-benchmarks (Python 3.13.13)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `Style.__eq__` (identity) | 114ns/call | 62ns/call | **1.84x** |
| `Style.combine` (3 styles) | 579ns/call | 433ns/call | **1.34x** |
| `Segment.simplify` (identity) | 1269ns/call | 931ns/call | **1.36x** |
| `Style.chain` (3 styles) | 959ns/call | 878ns/call | **1.09x** |
| E2E `Console.print` | 173.7us/call | 171.6us/call | ~1.01x |

## What We Changed

### PR #12 — Architectural wins ([KRRT7/rich#12](https://github.com/KRRT7/rich/pull/12))

- **Replace `@dataclass` with `__slots__` classes** — `ConsoleOptions` and `ConsoleThreadLocals` used `@dataclass`, which imports `inspect` at module level (~10ms). Replaced with plain classes + `__slots__`. `ConsoleOptions` memory: 344 → 136 bytes (60% reduction).
- **Lazy-load emoji dictionary** — `_emoji_codes.EMOJI` (3,608 entries) was loaded unconditionally via `text.py → emoji.py`. Deferred to first use via a module-level `__getattr__`.
- **Defer imports across 12+ modules** — `inspect`, `pretty`, `scope`, `getpass`, `configparser`, `html.escape`, `zlib`, `traceback`, `pathlib` → deferred to the methods that actually use them.
- **`from __future__ import annotations`** — Enabled in key modules to allow moving type-only imports under `TYPE_CHECKING`.

### PR #13 — Import deferral + runtime micro-opts ([KRRT7/rich#13](https://github.com/KRRT7/rich/pull/13))

**Import deferral (7 files):**

- `color.py`: `RE_COLOR` compiled lazily in `Color.parse()` (LRU-cached)
- `text.py`: `_re_whitespace` lazy; inline `import re` in 6 methods
- `markup.py`: `RE_TAGS` via `_compile_tags()`; `RE_HANDLER` and the escape regex lazy
- `_emoji_replace.py`: regex default arg → lazy `_EMOJI_SUB` global
- `_wrap.py`: `re_word` → lazy `_re_word`
- `highlighter.py`: `import re` inside `JSONHighlighter.highlight()`
- `default_styles.py`: 3 `rgb(...)` strings → `Color.from_rgb()` to avoid `Color.parse()` regex at import

**Runtime micro-optimizations:**

- `Style.__eq__`/`__ne__`: identity shortcut (`is`) before hash comparison
- `Style.combine`/`chain`: use `_add` (LRU-cached) directly instead of `sum()` → `__add__` → `.copy()` check
- `Segment.simplify`: `is` check before `==` for style comparison

### Upstream PR

- [Textualize/rich#4070](https://github.com/Textualize/rich/pull/4070) — Initial import deferral PR (a subset of the above)

## Methodology

### Environment

- **VM**: Azure Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable)
- **OS**: Ubuntu 24.04 LTS
- **Region**: westus2
- **Python**: 3.12 and 3.13 via uv
- **Tooling**: hyperfine (warmup 5, min-runs 30), timeit (best of 7)

A non-burstable VM was chosen for consistent CPU performance — no thermal throttling or turbo variability.
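The timeit micro-benchmarks follow a best-of-7 pattern along these lines. This is a minimal, self-contained sketch, not the actual `bench_runtime2.py` script: it uses a hypothetical `ToyStyle` class that mimics PR #13's identity-shortcut `__eq__` so the example runs without Rich installed.

```python
import timeit


class ToyStyle:
    """Toy stand-in for rich.style.Style with PR #13's identity shortcut."""

    __slots__ = ("_attrs", "_hash")

    def __init__(self, **attrs):
        self._attrs = attrs
        self._hash = hash(frozenset(attrs.items()))

    def __eq__(self, other):
        if self is other:  # identity fast path, checked before anything else
            return True
        if not isinstance(other, ToyStyle):
            return NotImplemented
        return self._hash == other._hash and self._attrs == other._attrs

    def __hash__(self):
        return self._hash


def ns_per_call(stmt, number=1_000_000, repeat=7, **ns):
    """Best-of-`repeat` timing of `stmt`, in nanoseconds per call."""
    # timeit.repeat returns total seconds for each run; take the fastest
    # run and divide by the iteration count.
    best = min(timeit.repeat(stmt, number=number, repeat=repeat, globals=ns))
    return best / number * 1e9


if __name__ == "__main__":
    s = ToyStyle(bold=True)
    print(f"__eq__ (identity): {ns_per_call('s == s', s=s):.0f} ns/call")
```

Taking the minimum of the repeats, rather than the mean, discards runs inflated by OS scheduling noise.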
### Benchmark harness

All scripts are in [`bench/`](bench/):

| Script | Purpose |
|---|---|
| `bench_import.sh` | Overall `import rich` time via hyperfine |
| `bench_module.sh` | Per-module import time (Console, RichHandler, Traceback, etc.) |
| `bench_e2e.sh` | A/B comparison: master vs. optimized branch |
| `bench_compare.sh` | Generic branch comparison wrapper |
| `bench_importtime.py` | `python -X importtime` parser → sorted TSV breakdown |
| `bench_runtime.py` | PR #12 runtime benchmarks (ConsoleOptions, emoji_replace) |
| `bench_runtime2.py` | PR #13 runtime benchmarks (`Style.__eq__`, `combine`, `Segment.simplify`) |
| `bench_text.py` | Text hot-path benchmarks (construction, copy, divide, render) |
| `test_all_impls.sh` | Run tests across CPython 3.9–3.14 + PyPy 3.10 |

### Raw data

Hyperfine JSON exports are in [`data/`](data/).

## Maintainer Engagement

Reached out to Will McGugan (Textualize CEO) via Discord. The conversation is in [`discord-transcript.md`](discord-transcript.md). Key quotes:

- "Seems like a clear win. Feel free to open a PR."
- "I'd say single PR."

## Repo Structure

```
.
├── README.md               # This file
├── cloud-init.yaml         # VM provisioning (one-shot reproducible setup)
├── discord-transcript.md   # Will McGugan conversation
├── bench/                  # Benchmark scripts (from VM)
│   ├── bench_import.sh
│   ├── bench_module.sh
│   ├── bench_e2e.sh
│   ├── bench_compare.sh
│   ├── bench_importtime.py
│   ├── bench_runtime.py
│   ├── bench_runtime2.py
│   ├── bench_text.py
│   └── test_all_impls.sh
├── data/                   # Raw benchmark data (hyperfine JSON)
│   ├── e2e-3.12/
│   └── runtime/
└── vm-setup.md             # Azure VM provisioning instructions
```