Add team member dimension to case study paths so multiple contributors can track optimization data independently. Derives member from git config user.name in session-start hooks.
- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references
Discord Conversation with Will McGugan
April 8–9, 2026
KRRT — Yesterday at 11:01 PM
Hey Will, I'm working on a POC project to convince my boss — part of that is optimizing pip's startup time. pip vendors in Rich, and it's one of the heavier imports in the chain. I've been profiling it and found some quick wins on the import-time side. I could just open a PR against pip's vendored copy, but I'd rather contribute upstream so everyone benefits. I want to skip some of the red tape and wanted to establish a conversation with you on this — I'm aware of the new AI policy. I have full benchmark data from a controlled environment if you're interested.
KRRT — Yesterday at 11:12 PM
https://github.com/KRRT7/rich/pull/1
here's a draft PR on my fork for your reference
Will McGugan — Yesterday at 11:34 PM
Seems like a clear win. Feel free to open a PR.
KRRT — Yesterday at 11:38 PM
thanks, let me clean it up.
KRRT — Yesterday at 11:48 PM
I've got 8 more import-time wins stacked on top of it. Combined E2E results:
Console import: 1.50x faster (77.1ms → 51.5ms)
RichHandler import: 1.75x faster (97.6ms → 55.6ms)
All benchmarked on a dedicated VM with hyperfine, tests pass on CPython 3.9–3.14 and PyPy 3.10. The full breakdown is here: https://github.com/KRRT7/rich/pull/10
The changes are all the same pattern — deferring imports that are only used in specific code paths (inspect, pretty, scope, getpass, configparser, html/zlib for SVG export, plus a dead logging import removal and a pathlib→os.path swap). I have them as individual branches if you want to review separately, but it'd be cleaner to open a single combined PR upstream. What do you prefer?
you can see the other PRs in my fork
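For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of deferring an import to the code path that needs it. It is an illustration only, not code from the PR; the function name export_compressed is hypothetical, and zlib is used simply because it is one of the modules named in the message.

```python
def export_compressed(data: bytes) -> bytes:
    """Compress data, importing zlib only when this path is actually taken."""
    # Deferred import: the module is loaded on the first call,
    # not when the enclosing module is imported.
    import zlib

    return zlib.compress(data)
```

After the first call the module is cached in sys.modules, so repeated calls only pay for a dictionary lookup.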
Will McGugan — Yesterday at 11:50 PM
I'd say single PR. Are they all needed at runtime? Maybe some can go in an if TYPE_CHECKING block.
KRRT — Yesterday at 11:52 PM
yeah, they're all needed at runtime, but not import time, that's why they're deferred to the methods that actually use them rather than put in TYPE_CHECKING
though with future annotations maybe we can do a mix of both to maintain the type checking
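A rough sketch of the "mix of both" idea: the name is imported under TYPE_CHECKING so annotations still resolve for type checkers, and imported again inside the function that needs it at runtime. The function to_decimal and the choice of decimal are assumptions made for illustration. The annotation is quoted so the sketch runs standalone; with from __future__ import annotations at the top of a module, the quotes could be dropped.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; this branch is never executed at runtime.
    from decimal import Decimal


def to_decimal(text: str) -> "Decimal":
    """Parse text into a Decimal, loading the decimal module on first call."""
    # Runtime import, deferred to the function that actually uses it.
    from decimal import Decimal

    return Decimal(text)
```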
KRRT — 12:13 AM
ok, that worked even better than i expected
Updated numbers with everything combined:
Console import: 1.52x faster (78.8ms → 52.0ms)
RichHandler import: 2.0x faster (99.4ms → 50.0ms)
RichHandler is now faster than Console on master — it doesn't import rich.console at all anymore, defers it to first use via get_console().
I've updated the upstream PR with everything in a single commit: https://github.com/Textualize/rich/pull/4070
There are more TYPE_CHECKING opportunities in console.py, syntax.py, panel.py, and table.py too. This is just the initial low-hanging fruit, let me keep going
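One way to check a claim like "importing X no longer pulls in Y" is to import X in a fresh interpreter and look at sys.modules. The helper below is a hypothetical sketch, not part of the PR or of Rich.

```python
import subprocess
import sys


def pulls_in(module: str, dependency: str) -> bool:
    """Return True if `dependency` is loaded after importing `module` in a fresh interpreter."""
    # Note: modules already loaded at interpreter startup report True regardless.
    code = (
        "import sys; "
        f"__import__({module!r}); "
        f"print({dependency!r} in sys.modules)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

With Rich installed, a call such as pulls_in("rich.logging", "rich.console") would be one way to test the statement above; the result depends on the installed version.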
KRRT — 12:34 AM
I also profiled what's left after these changes and found a few bigger architectural wins that would need your input:
Replace @dataclass with plain classes + slots (~10ms import, 60% less memory)
console.py uses @dataclass for ConsoleOptions and ConsoleThreadLocals. The dataclasses module imports inspect at module level, so it's ~10ms to load. Replacing these with plain classes eliminates the entire dataclasses→inspect chain.
Adding slots at the same time gives a runtime win too: ConsoleOptions drops from 344 bytes to 136 bytes per instance (60% reduction). Since ConsoleOptions.update() creates a copy on every renderable, this adds up. The copy() method would change from __dict__.copy() to explicit slot assignment. I benchmarked this and it's about the same speed (27.5ms vs 26.1ms per 100K copies). Style, Text, and Emoji already use slots, so this aligns with existing patterns.
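As a minimal sketch of the proposal, a plain class with slots and a copy() that assigns each slot explicitly might look like this. The class name and its three fields are hypothetical stand-ins, not the real ConsoleOptions definition.

```python
class OptionsSlots:
    """Plain class with __slots__: no dataclasses import, no per-instance __dict__."""

    __slots__ = ("max_width", "is_terminal", "encoding")

    def __init__(self, max_width: int, is_terminal: bool, encoding: str) -> None:
        self.max_width = max_width
        self.is_terminal = is_terminal
        self.encoding = encoding

    def copy(self) -> "OptionsSlots":
        """Return a shallow copy by assigning each slot explicitly."""
        # Bypass __init__ and copy slot by slot, since there is no __dict__ to copy.
        new = OptionsSlots.__new__(OptionsSlots)
        new.max_width = self.max_width
        new.is_terminal = self.is_terminal
        new.encoding = self.encoding
        return new
```

The trade-off is that every new field has to be added in three places (slots, __init__, copy), which a dataclass would otherwise handle automatically.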
Lazy emoji loading (~2ms)
_emoji_codes.py is a 3,608-entry dict that gets loaded unconditionally through text.py → emoji.py and console.py → _emoji_replace.py. Most users never use :emoji_name: syntax. If _emoji_codes.EMOJI were lazily loaded (e.g., a module-level __getattr__ or moving the import inside _emoji_replace()), that's ~2ms back.
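The module-level __getattr__ hook (PEP 562) is only called when normal attribute lookup on a module fails, which makes it usable for lazy loading. The sketch below builds a throwaway module object so it can run in a single file; the module name codes, the tiny table, and the build_calls counter are all assumptions for illustration, not Rich's actual code.

```python
import types

build_calls = []  # Records each time the table is built (for illustration only).


def _build_table() -> dict:
    """Stand-in for constructing a large lookup table."""
    build_calls.append(1)
    return {"thumbs_up": "\U0001F44D", "heart": "\u2764"}


# Hypothetical module standing in for a data module such as _emoji_codes.
codes = types.ModuleType("codes")


def _module_getattr(name: str):
    """PEP 562 hook: invoked only when the attribute is not found normally."""
    if name == "EMOJI":
        table = _build_table()
        # Cache in the module namespace so the hook is not hit again.
        codes.EMOJI = table
        return table
    raise AttributeError(f"module {codes.__name__!r} has no attribute {name!r}")


codes.__getattr__ = _module_getattr
```

In a real module file the hook would be written as a top-level def __getattr__(name) and the result cached with globals()["EMOJI"] = table.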
Remaining inspect imports in protocol.py and repr.py
protocol.py does from inspect import isclass: same pattern I fixed in console.py, replaceable with isinstance(x, type)
repr.py does import inspect for inspect.signature() in one method; could be deferred to that method
These wouldn't save time on their own right now (inspect gets pulled in by dataclasses anyway), but they'd become free wins once #1 (the @dataclass replacement) is done.
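Both substitutions are small. In Python 3, inspect.isclass(obj) is itself defined as isinstance(obj, type), so the check can be written without the import, and a single use of inspect.signature() can import inspect locally. The function names below are hypothetical; this is a sketch of the idea rather than the code in protocol.py or repr.py.

```python
def is_class(obj: object) -> bool:
    """Same result as inspect.isclass, without importing inspect."""
    return isinstance(obj, type)


def parameter_names(func) -> list:
    """List a callable's parameter names, importing inspect only when called."""
    # Deferred import: inspect is loaded the first time this function runs.
    import inspect

    return list(inspect.signature(func).parameters)
```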
Codebase-wide from __future__ import annotations
This is the bigger unlock. Right now, most TYPE_CHECKING wins are blocked because annotation-only names share import lines with runtime names (e.g., from .style import Style, StyleType where Style is runtime but StyleType is annotation-only). With future annotations everywhere, type aliases like StyleType, TextType, AlignMethod, JustifyMethod, VerticalAlignMethod etc. could all move to TYPE_CHECKING. This is a larger change that touches many files but is mechanical and low-risk.
KRRT — 1:32 AM
https://github.com/KRRT7/rich/pull/12
I went ahead and prototyped the bigger architectural changes I mentioned, figured it'd be easier to show than describe.
Replaced @dataclass with plain classes + slots for ConsoleOptions / ConsoleThreadLocals — eliminates the dataclasses→inspect import chain (~10ms). Also cuts ConsoleOptions memory from 344 → 136 bytes per instance (60% less). Style, Text, and Emoji already use slots so it's consistent with the codebase.
Lazy-loaded _emoji_codes.EMOJI: the 3,608-entry dict was loading unconditionally even though most code paths never use emoji markup. Deferred to first use via module-level __getattr__.
the stuff around emoji looks ugly / unpythonic but it's for performance reasons.