pip Performance Optimization

End-to-end performance optimization of pip, the Python package installer. 122 commits across startup, dependency resolution, packaging, import deferral, and vendored Rich.

Results

Environment: Python 3.15.0a7, macOS arm64 (Apple Silicon), ~27 packages installed, HTTP cache warm, hyperfine (5-10 runs, 2-3 warmup runs)

Startup

| Benchmark | main | optimized | Speedup |
|---|---|---|---|
| pip --version | 138ms | 20ms | 7.0x |
| pip --help | 143ms | 121ms | 1.18x |

Dependency Resolution

| Benchmark | main | optimized | Speedup |
|---|---|---|---|
| requests (~5 deps) | 589ms | 516ms | 1.14x |
| flask + django (~15 deps) | 708ms | 599ms | 1.18x |
| flask + django + boto3 + requests (~30 deps) | 1,493ms | 826ms | 1.81x |
| fastapi[standard] (~42 deps) | 13,325ms | 11,664ms | 1.14x |

Package Operations

| Benchmark | main | optimized | Speedup |
|---|---|---|---|
| pip list | 162ms | 146ms | 1.11x |
| pip freeze | 225ms | 211ms | 1.07x |
| pip show pip | 162ms | 148ms | 1.09x |
| install -r requirements.txt (21 pkgs) | 1,344ms | 740ms | 1.82x |

Totals

| | main | optimized | Speedup |
|---|---|---|---|
| All benchmarks | 18,717ms | 15,223ms | 1.23x |
| Excluding fastapi[standard] | 5,392ms | 3,559ms | 1.51x |

What We Optimized (122 commits)

1. Startup

  • Ultra-fast --version path in __main__.py — exits before importing pip._internal (138ms → 20ms)
  • Fast-path --version in cli/main.py — avoids pip._internal.utils.misc import
  • Deferred base_command.py import chain to command creation time
  • Deferred Configuration module loading
  • Deferred autocompletion imports behind PIP_AUTO_COMPLETE check
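
The fast-path idea behind the first two items can be sketched as follows. This is a minimal illustration, not pip's actual code: `main`, `__version__`, and `HEAVY_MODULE` are hypothetical names, and `argparse` merely stands in for an expensive import chain such as `pip._internal`.

```python
import importlib

__version__ = "0.0.0"      # hypothetical constant; a real tool defines its own
HEAVY_MODULE = "argparse"  # stand-in for an expensive import chain (assumption)


def main(argv):
    """Answer --version before touching any heavy import."""
    if argv == ["--version"]:
        # Fast path: only a module-level constant is needed.
        print(f"tool {__version__}")
        return 0

    # Slow path: the costly import happens only when a real command runs.
    cli = importlib.import_module(HEAVY_MODULE)
    parser = cli.ArgumentParser(prog="tool")
    parser.add_argument("command")
    parser.parse_args(argv)
    return 0
```

The design choice is simply to order the checks so that the cheapest possible answer is tried first; everything else pays the import cost only when it is actually needed.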

2. Dependency Resolver — Architecture

  • Speculative metadata prefetch: background thread downloads PEP 658 metadata for the top candidate while the resolver processes other packages
  • Conditional Criterion rebuild: _remove_information_from_criteria skips rebuilding unaffected criteria, eliminating ~95% of allocations
  • __slots__ on Criterion: reduces per-instance memory by ~100 bytes
  • Two-level cache for _iter_found_candidates (specifier merge + candidate infos)
  • Parallel index-page prefetch during dependency resolution
  • Unified shared ThreadPoolExecutor for parallel wheel downloads
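
One way to picture the speculative prefetch described above is a small wrapper around a shared `ThreadPoolExecutor`. The sketch below is an assumption about the general shape, not the code on the branch: `MetadataPrefetcher` and the `fetch` callable are hypothetical, and real metadata downloads are replaced by whatever function the caller passes in.

```python
from concurrent.futures import ThreadPoolExecutor


class MetadataPrefetcher:
    """Hypothetical helper: fetch candidate metadata in the background."""

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch  # callable: candidate name -> metadata
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}   # candidate name -> Future

    def prefetch(self, name):
        # Start the background fetch at most once per candidate.
        if name not in self._futures:
            self._futures[name] = self._pool.submit(self._fetch, name)

    def get(self, name):
        # Reuse a prefetch if one was started; otherwise fetch synchronously.
        future = self._futures.get(name)
        if future is not None:
            return future.result()
        return self._fetch(name)

    def close(self):
        self._pool.shutdown(wait=True)
```

A resolver loop would call `prefetch()` for the top candidate as soon as it is known and `get()` later, when the metadata is actually required. If the guess was wrong, the only cost is one wasted background request.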

3. Dependency Resolver — Micro

  • Cached wheel tag priority dict on TargetPython
  • Pre-extracted requirements tuple on Criterion to avoid re-running generator expressions on every call
  • Cached Marker.evaluate() results for repeated extra lookups
  • Hoisted operator.methodcaller/attrgetter to module-level constants
  • Cached _sort_key results to avoid double evaluation in compute_best_candidate
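
Two of these micro-optimizations, hoisting `operator.attrgetter` to module level and computing each sort key only once, can be illustrated together. The sketch is simplified and hypothetical: `Candidate`, `_parse`, and `best_candidate` are illustrative names, and versions are plain dotted integers rather than full PEP 440 versions.

```python
import operator
from functools import lru_cache

# Built once at import time instead of on every call.
_get_version = operator.attrgetter("version")
_first = operator.itemgetter(0)


class Candidate:
    """Simplified stand-in for a resolver candidate."""
    __slots__ = ("name", "version")

    def __init__(self, name, version):
        self.name = name
        self.version = version


@lru_cache(maxsize=None)
def _parse(version):
    # Dotted-integer versions only; an assumption made for brevity.
    return tuple(int(part) for part in version.split("."))


def best_candidate(candidates):
    # Compute each key exactly once, then reuse it for both the
    # ordering and the selection of the best candidate.
    keyed = [(_parse(_get_version(c)), c) for c in candidates]
    keyed.sort(key=_first)
    ordered = [c for _, c in keyed]
    return ordered[-1], ordered
```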

4. Packaging (vendored pip._vendor.packaging)

  • Replaced _tokenizer dataclass with __slots__ class
  • Deferred Version.__hash__ computation until first call
  • Integer comparison key (_cmp_int) — avoids full _key tuple construction
  • Bisect-based filter_versions for O(log n + k) batch filtering
  • Pre-computed integer bounds on SpecifierSet for fast rejection
  • Cached parsed Version, Requirement, Specifier objects
  • Fast-path tokenizer for simple tokens to bypass regex engine
  • Direct release-tuple prefix comparison in _compare_equal / _compare_compatible
  • Pre-computed Link._is_wheel slot to avoid repeated splitext
  • Cached URL scheme on Link to skip urlsplit for is_vcs/is_file
  • Inlined Link construction in _evaluate_json_page to skip redundant work
  • rsplit instead of three rfind calls for wheel tag extraction
  • Cached parse_tag results to eliminate redundant Tag creation
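
The O(log n + k) bound for bisect-based filtering comes from locating both range ends with binary search and then slicing out the k matches. A minimal sketch, assuming versions are already sorted and comparable (plain tuples here, not `Version` objects); this `filter_versions` is an illustration and does not reproduce the signature used on the branch:

```python
from bisect import bisect_left


def filter_versions(sorted_versions, lower, upper):
    """Return versions v with lower <= v < upper from an ascending list."""
    start = bisect_left(sorted_versions, lower)  # O(log n)
    stop = bisect_left(sorted_versions, upper)   # O(log n)
    return sorted_versions[start:stop]           # O(k) for k matches


versions = [(1, 0), (1, 4), (2, 0), (2, 3), (3, 0)]
in_range = filter_versions(versions, (1, 4), (3, 0))
```

A linear scan would compare every version against the specifier; the binary search only touches the boundaries and the matching slice.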

6. I/O and Caching

  • Replaced pure-Python msgpack with stdlib JSON for cache serialization
  • Increased HTTP connection pool and prefetch concurrency
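
As a rough illustration of the serialization change, cache metadata can be round-tripped with the standard library alone. The entry layout below is hypothetical and covers only JSON-compatible metadata; how response bodies (raw bytes) are stored is not shown here and is not something this sketch claims to reproduce.

```python
import json


def dumps_entry(entry):
    # Compact separators keep the payload small; output is UTF-8 bytes.
    return json.dumps(entry, separators=(",", ":"), sort_keys=True).encode("utf-8")


def loads_entry(data):
    return json.loads(data.decode("utf-8"))


entry = {"status": 200, "headers": {"etag": "abc"}, "vary": {}}
restored = loads_entry(dumps_entry(entry))
```

The `json` module is implemented in C in CPython, which is the usual reason it outperforms a pure-Python serializer on this kind of small, dictionary-shaped payload.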

7. Import Deferral (vendored Rich)

  • Deferred all Rich imports to first use
  • Stripped unused Rich modules from import chain
  • Deferred heavy imports in Rich console.py (pretty/pager/scope/screen/export)
  • Deferred Rich imports in progress_bars.py and self_outdated_check.py
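
Deferring an import to first use can be as simple as moving the `import` statement inside the function that needs it. The proxy below is another option, shown only because it makes the deferral easy to observe. `LazyModule` is a hypothetical helper, and `json` stands in for a heavy dependency such as Rich.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # so _name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


heavy = LazyModule("json")           # nothing is imported yet
loaded_before = heavy._module is not None
text = heavy.dumps([1, 2])           # the import happens here
loaded_after = heavy._module is not None
```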

8. Micro-optimizations

  • Bypassed InstallationCandidate.__init__ with __new__ + direct slot assignment
  • Removed redundant O(n) subset assertion in BestCandidateResult
  • Cached Hashes.__hash__, Constraint.empty() singleton, Requirement.__str__
  • Bypassed email.parser for metadata parsing
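
Bypassing `__init__` with `__new__` plus direct slot assignment looks roughly like this. `CandidateSketch` and `make_candidate_fast` are hypothetical names, and the sketch assumes the caller already holds normalized values, so the work done in `__init__` would be redundant.

```python
class CandidateSketch:
    """Simplified stand-in; not pip's InstallationCandidate."""
    __slots__ = ("name", "version", "link")

    def __init__(self, name, version, link):
        # Regular path: normalize inputs on every construction.
        self.name = str(name)
        self.version = str(version)
        self.link = link


def make_candidate_fast(name, version, link):
    # Fast path: allocate without running __init__, then fill the slots
    # directly. Safe only when the inputs are already normalized.
    obj = CandidateSketch.__new__(CandidateSketch)
    obj.name = name
    obj.version = version
    obj.link = link
    return obj
```

The trade-off is that any invariant enforced in `__init__` is skipped, so this pattern belongs only on hot paths where the inputs are known to be valid.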

Upstream Contributions

Bug fixes (PRs to pypa/pip)

| PR | Status | Description |
|---|---|---|
| pypa/pip#13900 | Open | Fix --report - to use stdlib json instead of Rich for stdout output |
| pypa/pip#13902 | Open | Fix test_trailing_slash_directory_metadata for Python 3.15 |

Bug reports (issues on pypa/pip)

| Issue | Description |
|---|---|
| pypa/pip#13898 | pip install --report - outputs invalid JSON when not combined with --quiet |
| pypa/pip#13901 | test_trailing_slash_directory_metadata fails on Python 3.15.0a8 |

Rich upstream (separate case study)

| PR | Description |
|---|---|
| Textualize/rich#4070 | Import deferral: 2x import speedup |
| KRRT7/rich#12 | Architectural wins (dataclass→slots, lazy emoji) |
| KRRT7/rich#13 | Import deferral + runtime micro-opts |

See rich_org for the full Rich case study.

Methodology

Profiling approach

  1. python -X importtime — Identified the heaviest imports in the startup chain
  2. cProfile / py-spy — Found hot functions in the resolver and packaging layers
  3. Allocation counting — Tracked object creation counts to find redundant work (e.g., 45,301 → 1,559 Tag.__init__ calls with caching)
  4. E2E hyperfine — Validated every change with end-to-end benchmarks
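
Allocation counting (step 3) needs very little tooling: count constructor calls before and after a change. The sketch below uses a simplified stand-in `Tag` class and a toy workload, so the numbers it produces are illustrative and unrelated to the 45,301 → 1,559 figure above; `parse_tag` here is a hypothetical three-way split, not the vendored implementation.

```python
from functools import lru_cache


class Tag:
    """Simplified stand-in for a wheel tag; not pip's actual class."""
    __slots__ = ("interpreter", "abi", "platform")
    created = 0  # class-level counter of constructor calls

    def __init__(self, interpreter, abi, platform):
        Tag.created += 1
        self.interpreter = interpreter
        self.abi = abi
        self.platform = platform


def parse_tag(text):
    interpreter, abi, platform = text.split("-")
    return Tag(interpreter, abi, platform)


cached_parse_tag = lru_cache(maxsize=None)(parse_tag)


def count_creations(parser, tag_strings):
    # Reset the counter, run the workload, and report how many Tags were built.
    Tag.created = 0
    for text in tag_strings:
        parser(text)
    return Tag.created


workload = ["cp312-cp312-manylinux_2_17_x86_64", "py3-none-any"] * 50
uncached = count_creations(parse_tag, workload)
cached = count_creations(cached_parse_tag, workload)
print(uncached, cached)
```

With the cache in place, the count drops from one construction per call to one per distinct tag string, which is the kind of signal that points at redundant work even when wall-clock differences are too small to measure reliably.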

Environment

  • Local: macOS arm64 (Apple Silicon), Python 3.15.0a7, ~27 packages installed
  • CI validation: Azure VM (Standard_D2s_v5, Ubuntu 24.04, Python 3.12), nox test sessions
  • Test suite: 1,690 unit tests + 15 functional tests passing throughout

Branch

All 122 optimization commits are on codeflash/optimize in the KRRT7/pip fork.

Repo Structure

.
├── README.md                       # This file
└── data/
    ├── benchmarks.md               # Full E2E benchmark results table
    ├── results.tsv                 # Per-optimization tracking (target, speedup, status)
    ├── benchmark-analysis.md       # Detailed profiling analysis
    ├── io-analysis.md              # I/O and caching analysis
    ├── coverage-analysis.md        # Test coverage analysis
    ├── learnings.md                # Session learnings and patterns
    └── session-handoff.md          # Optimization session state