codeflash-agent/.codeflash/krrt7/textualize/rich
Kevin Turcios cc29a27289
Migrate .codeflash/ to {teammember}/{org}/{project}/ format (#15)
Add team member dimension to case study paths so multiple contributors
can track optimization data independently. Derives member from
git config user.name in session-start hooks.

- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references
2026-04-14 23:04:34 -05:00

Rich Performance Optimization

Upstream performance improvements to Textualize/rich, motivated by pip startup time profiling.

Background

pip vendors Rich for its progress bars, logging, and error display. Profiling pip --version revealed Rich as one of the heaviest imports in the startup chain — from rich.console import Console alone took ~79ms on CPython 3.12 (Standard_D2s_v5 VM).

Rather than patching pip's vendored copy, we contributed upstream so everyone benefits.

Results

Import Time (hyperfine, 30+ runs, Standard_D2s_v5)

CPython 3.12

Import master optimized Speedup
Console 79.1 ± 0.8ms 37.5 ± 0.5ms 2.11x
RichHandler 100.3 ± 3.6ms 39.6 ± 0.5ms 2.53x

CPython 3.13

Import master optimized Speedup
Console 67.9 ± 0.7ms 33.6 ± 0.5ms 2.02x
RichHandler 37.5 ± 0.4ms

On Python 3.13+, typing no longer imports re, so deferring all re.compile() calls eliminates re (+ _sre, re._compiler, re._parser, re._constants) from the Console import chain entirely.
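The claim above can be checked from a fresh interpreter. This is a standalone sketch (not one of the bench/ scripts) that probes whether importing typing pulls re into sys.modules as a side effect:

```python
import subprocess
import sys

# In a fresh child interpreter, import `typing` and report whether `re`
# ended up in sys.modules as a side effect. Per the note above, this
# should print False on CPython 3.13+ and True on 3.12 and earlier.
probe = "import typing, sys; print('re' in sys.modules)"
result = subprocess.run(
    [sys.executable, "-c", probe], capture_output=True, text=True
)
pulls_in_re = result.stdout.strip() == "True"
print(f"typing imports re: {pulls_in_re}")
```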

Runtime Micro-benchmarks (Python 3.13.13)

Benchmark Before After Speedup
Style.__eq__ (identity) 114ns/call 62ns/call 1.84x
Style.combine (3 styles) 579ns/call 433ns/call 1.34x
Segment.simplify (identity) 1269ns/call 931ns/call 1.36x
Style.chain (3 styles) 959ns/call 878ns/call 1.09x
E2E Console.print 173.7us/call 171.6us/call ~1.01x

What We Changed

PR #12 — Architectural wins (KRRT7/rich#12)

  • Replace @dataclass with __slots__ classes: ConsoleOptions and ConsoleThreadLocals used @dataclass, which imports inspect at module level (~10ms). Replaced with plain classes + __slots__. ConsoleOptions memory: 344 → 136 bytes (60% reduction).
  • Lazy-load emoji dictionary: _emoji_codes.EMOJI (3,608 entries) was loaded unconditionally via text.py → emoji.py. Deferred to first use via module-level __getattr__.
  • Defer imports across 12+ modules: inspect, pretty, scope, getpass, configparser, html.escape, zlib, traceback, pathlib → deferred to the methods that actually use them.
  • from __future__ import annotations — Enabled in key modules to allow moving type-only imports to TYPE_CHECKING.
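
The shape of the first change can be sketched with a hypothetical cut-down class; the real ConsoleOptions has more fields, and the names here are stand-ins:

```python
class ConsoleOptionsSketch:
    """Hypothetical stand-in for rich's ConsoleOptions.

    A plain class with __slots__ instead of @dataclass: importing the
    module no longer pulls in `dataclasses` (and hence `inspect`), and
    instances carry no per-instance __dict__, which is where the memory
    saving comes from.
    """

    __slots__ = ("max_width", "min_width", "is_terminal")

    def __init__(self, max_width: int, min_width: int = 1,
                 is_terminal: bool = False) -> None:
        self.max_width = max_width
        self.min_width = min_width
        self.is_terminal = is_terminal


opts = ConsoleOptionsSketch(max_width=80)
```

A side effect worth noting: __slots__ instances reject unknown attributes, so typos like `opts.maxwidth = 80` raise AttributeError instead of silently creating a new attribute.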

PR #13 — Import deferral + runtime micro-opts (KRRT7/rich#13)

Import deferral (7 files):

  • color.py: RE_COLOR compiled lazily in Color.parse() (LRU-cached)
  • text.py: _re_whitespace lazy; inline import re in 6 methods
  • markup.py: RE_TAGS via _compile_tags(), RE_HANDLER and escape regex lazy
  • _emoji_replace.py: regex default arg → lazy _EMOJI_SUB global
  • _wrap.py: re_word → lazy _re_word
  • highlighter.py: import re inside JSONHighlighter.highlight()
  • default_styles.py: 3 rgb(...) strings → Color.from_rgb() to avoid Color.parse() regex at import

Runtime micro-optimizations:

  • Style.__eq__/__ne__: identity shortcut (is) before hash comparison
  • Style.combine/chain: call _add (LRU-cached) directly instead of going through sum() → __add__ → copy()
  • Segment.simplify: is before == for style comparison
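
The identity shortcut looks roughly like this; a simplified stand-in for rich's Style, where equality is defined by a precomputed hash:

```python
class StyleSketch:
    """Simplified stand-in: equality compares a precomputed hash, with an
    `is` check first so comparing an object to itself (the common case in
    render loops) never touches attributes at all."""

    __slots__ = ("_hash",)

    def __init__(self, definition: str) -> None:
        self._hash = hash(definition)

    def __eq__(self, other: object) -> bool:
        if self is other:  # identity shortcut: cheapest possible answer
            return True
        if not isinstance(other, StyleSketch):
            return NotImplemented
        return self._hash == other._hash

    def __ne__(self, other: object) -> bool:
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self) -> int:
        return self._hash
```

An `is` comparison is a pointer check, so when the two operands are the same object the method returns before any attribute lookup, which matches the 114ns → 62ns figure's direction even if the exact numbers depend on the machine.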

Upstream PR

Methodology

Environment

  • VM: Azure Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable)
  • OS: Ubuntu 24.04 LTS
  • Region: westus2
  • Python: 3.12 and 3.13 via uv
  • Tooling: hyperfine (warmup 5, min-runs 30), timeit (best of 7)

A non-burstable VM was chosen for consistent CPU performance — no thermal throttling or turbo variability.
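
The "timeit (best of 7)" convention follows the timeit docs: take the minimum over repeats, since higher values usually reflect interference rather than the code under test. A sketch (the statement timed here is illustrative, not one of the actual benchmarks):

```python
import timeit

# Time a trivial identity comparison; repeat 7 times and keep the best
# (minimum) run, which filters out scheduler and cache noise.
number = 100_000
runs = timeit.repeat(
    stmt="x == x", setup="x = object()", repeat=7, number=number
)
best_ns_per_call = min(runs) / number * 1e9
print(f"{best_ns_per_call:.0f} ns/call")
```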

Benchmark harness

All scripts in bench/:

Script Purpose
bench_import.sh Overall import rich time via hyperfine
bench_module.sh Per-module import time (Console, RichHandler, Traceback, etc.)
bench_e2e.sh A/B comparison: master vs optimized branch
bench_compare.sh Generic branch comparison wrapper
bench_importtime.py python -X importtime parser → sorted TSV breakdown
bench_runtime.py PR #12 runtime benchmarks (ConsoleOptions, emoji_replace)
bench_runtime2.py PR #13 runtime benchmarks (Style.eq, combine, Segment.simplify)
bench_text.py Text hot-path benchmarks (construction, copy, divide, render)
test_all_impls.sh Run tests across CPython 3.9-3.14 + PyPy 3.10
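
The approach behind bench_importtime.py (run -X importtime, parse the stderr report, sort by self-time) can be sketched as follows; the parsing details are an assumption about the script, not its actual code:

```python
import subprocess
import sys

# Run a child interpreter with -X importtime and parse the stderr
# report, whose data lines look like:
#   "import time: <self us> | <cumulative us> | <module>"
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)
rows = []
for line in result.stderr.splitlines():
    if not line.startswith("import time:"):
        continue
    self_us, cum_us, name = line[len("import time:"):].split("|")
    if "self" in self_us:  # skip the header line
        continue
    rows.append((int(self_us), int(cum_us), name.strip()))
rows.sort(reverse=True)  # slowest self-time first
for self_us, cum_us, name in rows[:5]:
    print(f"{self_us:>8} {cum_us:>10}  {name}")
```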

Raw data

Hyperfine JSON exports in data/.

Maintainer Engagement

Reached out to Will McGugan (Textualize CEO) via Discord. Conversation in discord-transcript.md.

Key quotes:

  • "Seems like a clear win. Feel free to open a PR."
  • "I'd say single PR."

Repo Structure

.
├── README.md              # This file
├── cloud-init.yaml        # VM provisioning (one-shot reproducible setup)
├── discord-transcript.md  # Will McGugan conversation
├── bench/                 # Benchmark scripts (from VM)
│   ├── bench_import.sh
│   ├── bench_module.sh
│   ├── bench_e2e.sh
│   ├── bench_compare.sh
│   ├── bench_importtime.py
│   ├── bench_runtime.py
│   ├── bench_runtime2.py
│   ├── bench_text.py
│   └── test_all_impls.sh
├── data/                  # Raw benchmark data (hyperfine JSON)
│   ├── e2e-3.12/
│   └── runtime/
└── vm-setup.md            # Azure VM provisioning instructions