
# Rich Performance Optimization

Upstream performance improvements to [Textualize/rich](https://github.com/Textualize/rich), motivated by pip startup time profiling.

## Background

pip vendors Rich for its progress bars, logging, and error display. Profiling `pip --version` revealed Rich as one of the heaviest imports in the startup chain — `from rich.console import Console` alone took ~79ms on CPython 3.12 (Standard_D2s_v5 VM).

Rather than patching pip's vendored copy, we contributed upstream so everyone benefits.

## Results

### Import Time (hyperfine, 30+ runs, Standard_D2s_v5)

#### CPython 3.12

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 79.1 ± 0.8ms | 37.5 ± 0.5ms | **2.11x** |
| `RichHandler` | 100.3 ± 3.6ms | 39.6 ± 0.5ms | **2.53x** |

#### CPython 3.13

| Import | master | optimized | Speedup |
|---|---|---|---|
| `Console` | 67.9 ± 0.7ms | 33.6 ± 0.5ms | **2.02x** |
| `RichHandler` | — | 37.5 ± 0.4ms | — |

> On Python 3.13+, `typing` no longer imports `re`, so deferring all `re.compile()` calls eliminates `re` (+ `_sre`, `re._compiler`, `re._parser`, `re._constants`) from the Console import chain entirely.
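
One way to check a claim like this is to run the import in a fresh interpreter and inspect `sys.modules` afterwards. The sketch below is illustrative only: the helper name `modules_loaded_by` is hypothetical and not part of the benchmark harness, and it makes no assumption that Rich is installed.

```python
import subprocess
import sys


def modules_loaded_by(statement, candidates):
    """Run `statement` in a fresh interpreter; return which candidates were imported."""
    probe = (
        "import sys\n"
        f"{statement}\n"
        f"print(','.join(m for m in {list(candidates)!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # The probe's own print is always the last line, even if `statement` prints too.
    lines = result.stdout.splitlines()
    found = lines[-1] if lines else ""
    return set(found.split(",")) if found else set()
```

With Rich installed, a call such as `modules_loaded_by("from rich.console import Console", ["re", "_sre"])` would be expected to return an empty set on the optimized branch under Python 3.13+, assuming nothing else in the environment imports `re` at interpreter startup.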

### Runtime Micro-benchmarks (Python 3.13.13)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `Style.__eq__` (identity) | 114ns/call | 62ns/call | **1.84x** |
| `Style.combine` (3 styles) | 579ns/call | 433ns/call | **1.34x** |
| `Segment.simplify` (identity) | 1269ns/call | 931ns/call | **1.36x** |
| `Style.chain` (3 styles) | 959ns/call | 878ns/call | **1.09x** |
| E2E `Console.print` | 173.7µs/call | 171.6µs/call | ~1.01x |

## What We Changed

### PR #12 — Architectural wins ([KRRT7/rich#12](https://github.com/KRRT7/rich/pull/12))

- **Replace `@dataclass` with `__slots__` classes**: `ConsoleOptions` and `ConsoleThreadLocals` used `@dataclass`, and the `dataclasses` module imports `inspect` at module level (~10ms). Both are now plain classes with `__slots__`. `ConsoleOptions` memory: 344 → 136 bytes (60% reduction).
- **Lazy-load emoji dictionary**: `_emoji_codes.EMOJI` (3,608 entries) was loaded unconditionally via `text.py → emoji.py`. It is now deferred to first use via a module-level `__getattr__`.
- **Defer imports across 12+ modules**: `inspect`, `pretty`, `scope`, `getpass`, `configparser`, `html.escape`, `zlib`, `traceback`, and `pathlib` are now imported inside the methods that actually use them.
- **`from __future__ import annotations`**: enabled in key modules so that type-only imports can move under `TYPE_CHECKING`.

### PR #13 — Import deferral + runtime micro-opts ([KRRT7/rich#13](https://github.com/KRRT7/rich/pull/13))

**Import deferral (7 files):**
- `color.py`: `RE_COLOR` compiled lazily in `Color.parse()` (LRU-cached)
- `text.py`: `_re_whitespace` lazy; inline `import re` in 6 methods
- `markup.py`: `RE_TAGS` via `_compile_tags()`, `RE_HANDLER` and escape regex lazy
- `_emoji_replace.py`: regex default arg → lazy `_EMOJI_SUB` global
- `_wrap.py`: `re_word` → lazy `_re_word`
- `highlighter.py`: `import re` inside `JSONHighlighter.highlight()`
- `default_styles.py`: 3 `rgb(...)` strings → `Color.from_rgb()` to avoid `Color.parse()` regex at import

**Runtime micro-optimizations:**
- `Style.__eq__`/`__ne__`: identity shortcut (`is`) before hash comparison
- `Style.combine`/`chain`: use `_add` (LRU-cached) directly instead of `sum()` → `__add__` → `.copy()` check
- `Segment.simplify`: `is` before `==` for style comparison
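
A minimal sketch of these three ideas on a toy class follows. `MiniStyle`, `_add`, `combine`, and `simplify` here are simplified stand-ins written for illustration, with only two attributes, and are not Rich's implementation.

```python
import functools


class MiniStyle:
    """Toy immutable style with a precomputed hash (illustrative only)."""

    __slots__ = ("color", "bold", "_hash")

    def __init__(self, color=None, bold=None):
        self.color = color
        self.bold = bold
        self._hash = hash((color, bold))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if self is other:  # identity shortcut: skips all attribute access
            return True
        if not isinstance(other, MiniStyle):
            return NotImplemented
        return self._hash == other._hash and (self.color, self.bold) == (
            other.color,
            other.bold,
        )


@functools.lru_cache(maxsize=1024)
def _add(left, right):
    """Merge two styles; right-hand attributes win unless unset (None)."""
    return MiniStyle(
        color=right.color if right.color is not None else left.color,
        bold=right.bold if right.bold is not None else left.bold,
    )


def combine(styles):
    """Fold styles through the cached _add directly instead of sum()/__add__."""
    iterator = iter(styles)
    result = next(iterator, MiniStyle())
    for style in iterator:
        result = _add(result, style)
    return result


def simplify(segments):
    """Merge adjacent (text, style) pairs that share a style."""
    merged = []
    for text, style in segments:
        # `is` is tried first; `==` only runs when the objects differ.
        if merged and (merged[-1][1] is style or merged[-1][1] == style):
            merged[-1] = (merged[-1][0] + text, merged[-1][1])
        else:
            merged.append((text, style))
    return merged
```

The identity check pays off when the same style object is reused across many segments, which is the case the "(identity)" rows in the micro-benchmark table measure.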

### Upstream PR

- [Textualize/rich#4070](https://github.com/Textualize/rich/pull/4070) — Initial import deferral PR (subset of the above)

## Methodology

### Environment

- **VM**: Azure Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable)
- **OS**: Ubuntu 24.04 LTS
- **Region**: westus2
- **Python**: 3.12 and 3.13 via uv
- **Tooling**: hyperfine (warmup 5, min-runs 30), timeit (best of 7)

A non-burstable VM was chosen for consistent CPU performance, with no thermal throttling or turbo variability.
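
A "best of 7" per-call figure can be obtained with the standard library's `timeit.repeat`. The helper below is a sketch of that approach rather than the harness code itself; the name `ns_per_call` and the default loop count are assumptions.

```python
import timeit


def ns_per_call(stmt, setup="pass", repeat=7, number=100_000, globals=None):
    """Return the best (minimum) time per call in nanoseconds across `repeat` runs."""
    timings = timeit.repeat(
        stmt, setup=setup, repeat=repeat, number=number, globals=globals
    )
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings) / number * 1e9
```

For example, `ns_per_call("a == a", setup="a = object()")` times an equality check on a single object. Taking the minimum rather than the mean follows the recommendation in the `timeit` documentation.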

### Benchmark harness

All scripts in [`bench/`](bench/):

| Script | Purpose |
|---|---|
| `bench_import.sh` | Overall `import rich` time via hyperfine |
| `bench_module.sh` | Per-module import time (Console, RichHandler, Traceback, etc.) |
| `bench_e2e.sh` | A/B comparison: master vs optimized branch |
| `bench_compare.sh` | Generic branch comparison wrapper |
| `bench_importtime.py` | `python -X importtime` parser → sorted TSV breakdown |
| `bench_runtime.py` | PR #12 runtime benchmarks (ConsoleOptions, emoji_replace) |
| `bench_runtime2.py` | PR #13 runtime benchmarks (`Style.__eq__`, `combine`, `Segment.simplify`) |
| `bench_text.py` | Text hot-path benchmarks (construction, copy, divide, render) |
| `test_all_impls.sh` | Run tests across CPython 3.9–3.14 + PyPy 3.10 |
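
For readers unfamiliar with `python -X importtime`, the sketch below shows roughly what turning its stderr output into a sorted TSV involves. It is not the contents of `bench_importtime.py`; the function names and the choice to sort by self time are assumptions.

```python
import subprocess
import sys

PREFIX = "import time:"


def parse_importtime(stderr_text):
    """Parse `-X importtime` output into (module, self_us, cumulative_us) rows."""
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        fields = [field.strip() for field in line[len(PREFIX):].split("|")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # header row or unrelated stderr output
        self_us, cumulative_us, module = fields
        rows.append((module, int(self_us), int(cumulative_us)))
    # Largest self time first, so the heaviest modules are at the top.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def importtime_tsv(statement):
    """Run `statement` under -X importtime and return a TSV breakdown."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = ["module\tself_us\tcumulative_us"]
    lines += [f"{m}\t{s}\t{c}" for m, s, c in parse_importtime(result.stderr)]
    return "\n".join(lines)
```

Calling `print(importtime_tsv("import json"))` prints one row per imported module. Stripping the module column discards the indentation that encodes nesting depth, which is acceptable for a flat "heaviest first" view.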

### Raw data

Hyperfine JSON exports in [`data/`](data/).

## Maintainer Engagement

We reached out to Will McGugan (Textualize CEO) via Discord. The conversation is in [`discord-transcript.md`](discord-transcript.md).

Key quotes:
- "Seems like a clear win. Feel free to open a PR."
- "I'd say single PR."

## Repo Structure

```
.
├── README.md              # This file
├── cloud-init.yaml        # VM provisioning (one-shot reproducible setup)
├── discord-transcript.md  # Will McGugan conversation
├── bench/                 # Benchmark scripts (from VM)
│   ├── bench_import.sh
│   ├── bench_module.sh
│   ├── bench_e2e.sh
│   ├── bench_compare.sh
│   ├── bench_importtime.py
│   ├── bench_runtime.py
│   ├── bench_runtime2.py
│   ├── bench_text.py
│   └── test_all_impls.sh
├── data/                  # Raw benchmark data (hyperfine JSON)
│   ├── e2e-3.12/
│   └── runtime/
└── vm-setup.md            # Azure VM provisioning instructions
```