pip Performance Optimization
End-to-end performance optimization of pip, the Python package installer. 122 commits across startup, dependency resolution, packaging, import deferral, and vendored Rich.
Results
Environment: Python 3.15.0a7, macOS arm64 (Apple Silicon), ~27 packages installed, warm HTTP cache; measured with hyperfine (5–10 runs, 2–3 warmup runs)
Startup
| Benchmark | main | optimized | Speedup |
| --- | --- | --- | --- |
| pip --version | 138ms | 20ms | 7.0x |
| pip --help | 143ms | 121ms | 1.18x |
Dependency Resolution
| Benchmark | main | optimized | Speedup |
| --- | --- | --- | --- |
| requests (~5 deps) | 589ms | 516ms | 1.14x |
| flask + django (~15 deps) | 708ms | 599ms | 1.18x |
| flask + django + boto3 + requests (~30 deps) | 1,493ms | 826ms | 1.81x |
| fastapi[standard] (~42 deps) | 13,325ms | 11,664ms | 1.14x |
Package Operations
| Benchmark | main | optimized | Speedup |
| --- | --- | --- | --- |
| pip list | 162ms | 146ms | 1.11x |
| pip freeze | 225ms | 211ms | 1.07x |
| pip show pip | 162ms | 148ms | 1.09x |
| install -r requirements.txt (21 pkgs) | 1,344ms | 740ms | 1.82x |
Totals
| | main | optimized | Speedup |
| --- | --- | --- | --- |
| All benchmarks | 18,717ms | 15,223ms | 1.23x |
| Excluding fastapi[standard] | 5,392ms | 3,559ms | 1.51x |
What We Optimized (122 commits)
1. Startup
- Ultra-fast `--version` path in `__main__.py` — exits before importing `pip._internal` (138ms → 20ms)
- Fast-path `--version` in `cli/main.py` — avoids the `pip._internal.utils.misc` import
- Deferred the `base_command.py` import chain to command creation time
- Deferred `Configuration` module loading
- Deferred autocompletion imports behind a `PIP_AUTO_COMPLETE` check
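The ultra-fast `--version` path boils down to inspecting `sys.argv` before any heavy import runs. A minimal sketch, assuming illustrative names throughout (the printed version string and the `argparse` fallback stand in for pip's real code):

```python
import sys

def main(argv=None):
    """Sketch of an early-exit fast path for `--version`."""
    args = sys.argv[1:] if argv is None else argv
    if args == ["--version"]:
        # Fast path: nothing heavy below this line is ever imported.
        print("pip X.Y (fast path)")  # illustrative version string
        return 0
    # Slow path: only now pay for the full CLI import chain.
    import argparse  # stand-in for the real (heavy) CLI imports
    parser = argparse.ArgumentParser(prog="pip")
    parser.add_argument("command", nargs="?")
    parser.parse_args(args)
    return 0
```

The key property is that the fast path executes before the interpreter touches any module in the heavy import chain, so the cost is just starting Python itself.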
2. Dependency Resolver — Architecture
- Speculative metadata prefetch: a background thread downloads PEP 658 metadata for the top candidate while the resolver processes other packages
- Conditional `Criterion` rebuild: `_remove_information_from_criteria` skips rebuilding unaffected criteria, eliminating ~95% of allocations
- `__slots__` on `Criterion`: reduces per-instance memory by ~100 bytes
- Two-level cache for `_iter_found_candidates` (specifier merge + candidate infos)
- Parallel index-page prefetch during dependency resolution
- Unified shared `ThreadPoolExecutor` for parallel wheel downloads
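The speculative prefetch can be sketched with a `ThreadPoolExecutor`; `fetch_metadata` and the candidate handling below are illustrative stand-ins for the real PEP 658 download path, not pip's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_metadata(name):
    # Stand-in for a PEP 658 metadata download over the network.
    return f"metadata-for-{name}"

def resolve(candidates, pool):
    # Kick off a background fetch for the most likely candidate, then do
    # other resolver work; by the time the metadata is needed, the future
    # is ready or at least already in flight.
    top, *rest = candidates
    future = pool.submit(fetch_metadata, top)  # speculative prefetch
    processed = [c.upper() for c in rest]      # stand-in for other work
    return future.result(), processed

with ThreadPoolExecutor(max_workers=2) as pool:
    meta, others = resolve(["requests", "flask", "django"], pool)
```

The win comes from overlapping network latency with CPU-bound resolver work rather than from extra parallelism per se.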
3. Dependency Resolver — Micro
- Cached wheel tag priority dict on `TargetPython`
- Pre-extracted requirements tuple on `Criterion` to avoid per-call generator expressions
- Cached `Marker.evaluate()` results for repeated extra lookups
- Hoisted `operator.methodcaller`/`attrgetter` to module-level constants
- Cached `_sort_key` results to avoid double evaluation in `compute_best_candidate`
4. Packaging (vendored pip._vendor.packaging)
- Replaced the `_tokenizer` dataclass with a `__slots__` class
- Deferred `Version.__hash__` computation until first call
- Integer comparison key (`_cmp_int`) — avoids full `_key` tuple construction
- Bisect-based `filter_versions` for O(log n + k) batch filtering
- Pre-computed integer bounds on `SpecifierSet` for fast rejection
- Cached parsed `Version`, `Requirement`, and `Specifier` objects
- Fast-path tokenizer for simple tokens to bypass the regex engine
- Direct release-tuple prefix comparison in `_compare_equal` / `_compare_compatible`
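Bisect-based batch filtering depends on keeping version keys sorted. A minimal sketch, assuming the tuple keys stand in for `packaging`'s real comparison keys:

```python
from bisect import bisect_left

def filter_versions(sorted_keys, low, high):
    # Given keys sorted ascending, return the slice with low <= key < high
    # in O(log n + k) instead of testing every version against a specifier.
    lo = bisect_left(sorted_keys, low)
    hi = bisect_left(sorted_keys, high)
    return sorted_keys[lo:hi]

keys = [(1, 0), (1, 2), (1, 9), (2, 0), (2, 1)]
selected = filter_versions(keys, (1, 2), (2, 0))  # e.g. ">=1.2,<2.0"
```

For a single containment check the linear scan is fine; the payoff shows up when one `SpecifierSet` is applied to a large, already-sorted candidate list.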
5. Link and Wheel Parsing
- Pre-computed `Link._is_wheel` slot to avoid repeated `splitext` calls
- Cached URL scheme on `Link` to skip `urlsplit` in `is_vcs`/`is_file`
- Inlined `Link` construction in `_evaluate_json_page` to skip redundant work
- `rsplit` instead of three `rfind` calls for wheel tag extraction
- Cached `parse_tag` results to eliminate redundant `Tag` creation
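Caching `parse_tag`-style results is a natural fit for `functools.lru_cache`; the dash-splitting below is a simplified stand-in for `packaging.tags.parse_tag`, not its real grammar:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_tag_cached(tag_str):
    # Wheel filenames repeat the same tag triples constantly, so caching
    # the parse collapses thousands of Tag constructions into a handful
    # (cf. the 45,301 -> 1,559 Tag.__init__ reduction reported above).
    interpreter, abi, platform = tag_str.split("-")
    return (interpreter, abi, platform)

a = parse_tag_cached("cp312-cp312-macosx_11_0_arm64")
b = parse_tag_cached("cp312-cp312-macosx_11_0_arm64")  # cache hit: same object
```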
6. I/O and Caching
- Replaced the pure-Python msgpack encoder with stdlib `json` for cache serialization
- Increased HTTP connection pool size and prefetch concurrency
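Swapping a pure-Python msgpack encoder for the C-accelerated stdlib `json` looks roughly like this; the entry fields are illustrative, and binary bodies need a base64 round-trip since JSON cannot carry raw bytes:

```python
import base64
import json

def serialize_cache_entry(headers, body):
    # Stdlib json (C-accelerated) in place of a pure-Python msgpack
    # encoder; field names here are illustrative, not pip's cache schema.
    return json.dumps({
        "headers": headers,
        "body": base64.b64encode(body).decode("ascii"),
    }).encode("utf-8")

def deserialize_cache_entry(blob):
    data = json.loads(blob)
    return data["headers"], base64.b64decode(data["body"])

blob = serialize_cache_entry({"etag": "abc"}, b"\x00wheel-bytes")
headers, body = deserialize_cache_entry(blob)
```

The base64 step costs some space, but for a pure-Python msgpack baseline the encode/decode speedup dominates.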
7. Import Deferral (vendored Rich)
- Deferred all Rich imports to first use
- Stripped unused Rich modules from the import chain
- Deferred heavy imports in Rich's `console.py` (pretty/pager/scope/screen/export)
- Deferred Rich imports in `progress_bars.py` and `self_outdated_check.py`
8. Micro-optimizations
- Bypassed `InstallationCandidate.__init__` with `__new__` + direct slot assignment
- Removed a redundant O(n) subset assertion in `BestCandidateResult`
- Cached `Hashes.__hash__`, the `Constraint.empty()` singleton, and `Requirement.__str__`
- Bypassed `email.parser` for metadata parsing
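The `__new__` + direct slot assignment trick can be sketched with a minimal stand-in for pip's class (the attributes below are illustrative):

```python
class InstallationCandidate:
    """Minimal stand-in with the same slotted shape as pip's class."""
    __slots__ = ("name", "version", "link")

    def __init__(self, name, version, link):
        # The real __init__ may normalize/validate; the fast path skips it.
        self.name = name
        self.version = version
        self.link = link

def make_candidate_fast(name, version, link):
    # Allocate without running __init__, then assign slots directly --
    # worthwhile only in hot loops where inputs are already known-good.
    obj = InstallationCandidate.__new__(InstallationCandidate)
    obj.name = name
    obj.version = version
    obj.link = link
    return obj

c = make_candidate_fast("requests", "2.32.0", "https://example.invalid/r.whl")
```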
Upstream Contributions
Bug fixes (PRs to pypa/pip)
| PR | Status | Description |
| --- | --- | --- |
| pypa/pip#13900 | Open | Fix `--report -` to use stdlib `json` instead of Rich for stdout output |
| pypa/pip#13902 | Open | Fix `test_trailing_slash_directory_metadata` for Python 3.15 |
Bug reports (issues on pypa/pip)
| Issue | Description |
| --- | --- |
| pypa/pip#13898 | `pip install --report -` outputs invalid JSON when not combined with `--quiet` |
| pypa/pip#13901 | `test_trailing_slash_directory_metadata` fails on Python 3.15.0a8 |
Rich upstream (separate case study)
See rich_org for the full Rich case study.
Methodology
Profiling approach
- `python -X importtime` — identified the heaviest imports in the startup chain
- `cProfile` / `py-spy` — found hot functions in the resolver and packaging layers
- Allocation counting — tracked object creation counts to find redundant work (e.g., 45,301 → 1,559 `Tag.__init__` calls with caching)
- End-to-end `hyperfine` runs — validated every change against real workloads
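A typical `cProfile` pass from the second bullet looks like this; the workload below is a stand-in for a resolver run:

```python
import cProfile
import io
import pstats

def workload():
    # Stand-in for a resolver run; any CPU-bound call works here.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Print the five most expensive functions by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

For sampling without instrumenting the process, `py-spy` attaches externally; the `pstats` report above is what drove the function-level targets listed earlier.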
Environment
- Local: macOS arm64 (Apple Silicon), Python 3.15.0a7, ~27 packages installed
- CI validation: Azure VM (Standard_D2s_v5, Ubuntu 24.04, Python 3.12), nox test sessions
- Test suite: 1,690 unit tests + 15 functional tests passing throughout
Branch
All 122 optimization commits are on the `codeflash/optimize` branch of the `KRRT7/pip` fork.
Repo Structure
```
.
├── README.md                  # This file
└── data/
    ├── benchmarks.md          # Full E2E benchmark results table
    ├── results.tsv            # Per-optimization tracking (target, speedup, status)
    ├── benchmark-analysis.md  # Detailed profiling analysis
    ├── io-analysis.md         # I/O and caching analysis
    ├── coverage-analysis.md   # Test coverage analysis
    ├── learnings.md           # Session learnings and patterns
    └── session-handoff.md     # Optimization session state
```