# pip Performance Optimization

End-to-end performance optimization of [pip](https://github.com/pypa/pip), the Python package installer. 122 commits across startup, dependency resolution, packaging, import deferral, and vendored Rich.

## Results

**Environment**: Python 3.15.0a7, macOS arm64 (Apple Silicon), ~27 packages installed, HTTP cache warm, hyperfine (5–10 runs, 2–3 warmup)

### Startup

| Benchmark | main | optimized | Speedup |
|---|---:|---:|---:|
| `pip --version` | 138ms | **20ms** | **7.0x** |
| `pip --help` | 143ms | **121ms** | **1.18x** |

### Dependency Resolution

| Benchmark | main | optimized | Speedup |
|---|---:|---:|---:|
| `requests` (~5 deps) | 589ms | **516ms** | **1.14x** |
| `flask + django` (~15 deps) | 708ms | **599ms** | **1.18x** |
| `flask + django + boto3 + requests` (~30 deps) | 1,493ms | **826ms** | **1.81x** |
| `fastapi[standard]` (~42 deps) | 13,325ms | **11,664ms** | **1.14x** |

### Package Operations

| Benchmark | main | optimized | Speedup |
|---|---:|---:|---:|
| `pip list` | 162ms | **146ms** | **1.11x** |
| `pip freeze` | 225ms | **211ms** | **1.07x** |
| `pip show pip` | 162ms | **148ms** | **1.09x** |
| `install -r requirements.txt` (21 pkgs) | 1,344ms | **740ms** | **1.82x** |

### Totals

| | main | optimized | Speedup |
|-|---:|---:|---:|
| **All benchmarks** | 18,717ms | 15,223ms | **1.23x** |
| **Excluding fastapi[standard]** | 5,392ms | 3,559ms | **1.51x** |

## What We Optimized (122 commits)

### 1. Startup

- Ultra-fast `--version` path in `__main__.py` — exits before importing `pip._internal` (138ms → 20ms)
- Fast-path `--version` in `cli/main.py` — avoids the `pip._internal.utils.misc` import
- Deferred the `base_command.py` import chain to command creation time
- Deferred `Configuration` module loading
- Deferred autocompletion imports behind the `PIP_AUTO_COMPLETE` check
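The ultra-fast `--version` path can be sketched roughly as follows. This is a minimal illustration of the "exit before importing `pip._internal`" idea, not pip's actual `__main__.py`; the version string and module path are stand-ins.

```python
# Minimal sketch of an ultra-fast "--version" path: answer the trivial
# invocation before the heavy pip._internal import chain ever runs.
# PIP_VERSION is a stand-in; pip reads this from pip.__version__.
import sys

PIP_VERSION = "25.2"

def main(argv=None):
    args = sys.argv[1:] if argv is None else list(argv)
    if args == ["--version"]:
        # Fast path: nothing from pip._internal (or vendored Rich) is imported.
        print(f"pip {PIP_VERSION}")
        return 0
    # Slow path: defer the expensive import until it is actually needed.
    from importlib import import_module
    return import_module("pip._internal.cli.main").main(args)
```

The win comes entirely from what *doesn't* happen on the fast path: no vendored packages, no CLI framework, no network stack is imported.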
### 2. Dependency Resolver — Architecture

- **Speculative metadata prefetch**: a background thread downloads PEP 658 metadata for the top candidate while the resolver processes other packages
- **Conditional Criterion rebuild**: `_remove_information_from_criteria` skips rebuilding unaffected criteria, eliminating ~95% of allocations
- **`__slots__` on Criterion**: reduces per-instance memory by ~100 bytes
- Two-level cache for `_iter_found_candidates` (specifier merge + candidate infos)
- Parallel index-page prefetch during dependency resolution
- Unified shared `ThreadPoolExecutor` for parallel wheel downloads

### 3. Dependency Resolver — Micro

- Cached wheel tag priority dict on `TargetPython`
- Pre-extracted requirements tuple on `Criterion` to avoid per-call generator expressions
- Cached `Marker.evaluate()` results for repeated extra lookups
- Hoisted `operator.methodcaller`/`attrgetter` to module-level constants
- Cached `_sort_key` results to avoid double evaluation in `compute_best_candidate`

### 4. Packaging (vendored `pip._vendor.packaging`)

- Replaced the `_tokenizer` dataclass with a `__slots__` class
- Deferred `Version.__hash__` computation until first call
- Integer comparison key (`_cmp_int`) — avoids full `_key` tuple construction
- Bisect-based `filter_versions` for O(log n + k) batch filtering
- Pre-computed integer bounds on `SpecifierSet` for fast rejection
- Cached parsed `Version`, `Requirement`, and `Specifier` objects
- Fast-path tokenizer for simple tokens to bypass the regex engine
- Direct release-tuple prefix comparison in `_compare_equal` / `_compare_compatible`

### 5. Link and Wheel Parsing

- Pre-computed `Link._is_wheel` slot to avoid repeated `splitext`
- Cached URL scheme on `Link` to skip `urlsplit` in `is_vcs`/`is_file`
- Inlined `Link` construction in `_evaluate_json_page` to skip redundant work
- Single `rsplit` instead of three `rfind` calls for wheel tag extraction
- Cached `parse_tag` results to eliminate redundant `Tag` creation
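The bisect-based batch filtering mentioned in §4 can be illustrated with a toy version. Sorted tuples stand in for parsed `Version` sort keys here; this is a sketch of the technique, not packaging's actual `filter_versions` signature.

```python
# Illustration of O(log n + k) batch filtering: when candidate versions are
# kept sorted, a half-open range constraint (low <= v < high) is answered
# with two binary searches plus one slice, instead of testing every version.
from bisect import bisect_left

def filter_versions(sorted_versions, low, high):
    """Return versions v with low <= v < high in O(log n + k)."""
    lo = bisect_left(sorted_versions, low)
    hi = bisect_left(sorted_versions, high)
    return sorted_versions[lo:hi]

versions = [(1, 0), (1, 2), (1, 4), (2, 0), (2, 1), (3, 0)]
print(filter_versions(versions, (1, 2), (2, 1)))  # [(1, 2), (1, 4), (2, 0)]
```

The same two-probe pattern generalizes to pre-computed integer bounds on a `SpecifierSet`: candidates outside the bounds are rejected without ever constructing full comparison keys.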
### 6. I/O and Caching

- Replaced pure-Python msgpack with stdlib JSON for cache serialization
- Increased HTTP connection pool size and prefetch concurrency

### 7. Import Deferral (vendored Rich)

- Deferred all Rich imports to first use
- Stripped unused Rich modules from the import chain
- Deferred heavy imports in Rich's `console.py` (pretty/pager/scope/screen/export)
- Deferred Rich imports in `progress_bars.py` and `self_outdated_check.py`

### 8. Micro-optimizations

- Bypassed `InstallationCandidate.__init__` with `__new__` + direct slot assignment
- Removed a redundant O(n) subset assertion in `BestCandidateResult`
- Cached `Hashes.__hash__`, the `Constraint.empty()` singleton, and `Requirement.__str__`
- Bypassed `email.parser` for metadata parsing

## Upstream Contributions

### Bug fixes (PRs to pypa/pip)

| PR | Status | Description |
|---|---|---|
| [pypa/pip#13900](https://github.com/pypa/pip/pull/13900) | Open | Fix `--report -` to use stdlib `json` instead of Rich for stdout output |
| [pypa/pip#13902](https://github.com/pypa/pip/pull/13902) | Open | Fix `test_trailing_slash_directory_metadata` for Python 3.15 |

### Bug reports (issues on pypa/pip)

| Issue | Description |
|---|---|
| [pypa/pip#13898](https://github.com/pypa/pip/issues/13898) | `pip install --report -` outputs invalid JSON when not combined with `--quiet` |
| [pypa/pip#13901](https://github.com/pypa/pip/issues/13901) | `test_trailing_slash_directory_metadata` fails on Python 3.15.0a8 |

### Rich upstream (separate case study)

| PR | Description |
|---|---|
| [Textualize/rich#4070](https://github.com/Textualize/rich/pull/4070) | Import deferral — 2x import speedup |
| [KRRT7/rich#12](https://github.com/KRRT7/rich/pull/12) | Architectural wins (dataclass → `__slots__`, lazy emoji) |
| [KRRT7/rich#13](https://github.com/KRRT7/rich/pull/13) | Import deferral + runtime micro-opts |

See [rich_org](https://github.com/KRRT7/rich_org) for the full Rich case study.
## Methodology

### Profiling approach

1. **`python -X importtime`** — identified the heaviest imports in the startup chain
2. **cProfile / py-spy** — found hot functions in the resolver and packaging layers
3. **Allocation counting** — tracked object creation counts to find redundant work (e.g., 45,301 → 1,559 `Tag.__init__` calls with caching)
4. **E2E hyperfine** — validated every change with end-to-end benchmarks

### Environment

- **Local**: macOS arm64 (Apple Silicon), Python 3.15.0a7, ~27 packages installed
- **CI validation**: Azure VM (Standard_D2s_v5, Ubuntu 24.04, Python 3.12), nox test sessions
- **Test suite**: 1,690 unit tests + 15 functional tests passing throughout

### Branch

All 122 optimization commits are on [`codeflash/optimize`](https://github.com/KRRT7/pip/tree/codeflash/optimize) in the KRRT7/pip fork.

## Repo Structure

```
.
├── README.md                  # This file
└── data/
    ├── benchmarks.md          # Full E2E benchmark results table
    ├── results.tsv            # Per-optimization tracking (target, speedup, status)
    ├── benchmark-analysis.md  # Detailed profiling analysis
    ├── io-analysis.md         # I/O and caching analysis
    ├── coverage-analysis.md   # Test coverage analysis
    ├── learnings.md           # Session learnings and patterns
    └── session-handoff.md     # Optimization session state
```
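The allocation-counting step of the methodology can be reproduced with a simple counting decorator plus an `lru_cache` factory. `Tag` below is a stand-in for `packaging.tags.Tag`, and `parse_tag` is an illustrative cached factory, not pip's exact code.

```python
# Sketch of allocation counting: wrap a constructor so instance creation
# is measurable, then show how a cached factory collapses redundant
# allocations (mirroring the measured Tag.__init__ reduction).
import functools

def count_calls(fn):
    """Wrap fn so each call increments wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

class Tag:  # stand-in for the vendored packaging Tag
    @count_calls
    def __init__(self, interpreter, abi, platform):
        self.interpreter, self.abi, self.platform = interpreter, abi, platform

# Uncached hot loop: every iteration allocates a fresh Tag.
for _ in range(1000):
    Tag("cp312", "cp312", "macosx_11_0_arm64")
print(Tag.__init__.calls)  # 1000

# Cached factory: identical tags are constructed once and reused.
@functools.lru_cache(maxsize=None)
def parse_tag(interpreter, abi, platform):
    return Tag(interpreter, abi, platform)

for _ in range(1000):
    parse_tag("cp312", "cp312", "macosx_11_0_arm64")
print(Tag.__init__.calls)  # 1001 — only one new allocation for 1000 lookups
```

Counting calls before and after a change is often a more stable signal than wall-clock time for micro-optimizations, since it is immune to machine noise.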