pip End-to-End Performance: main vs codeflash/optimize
Branch: codeflash/optimize (118 commits ahead of main)
Environment: Python 3.15.0a7 | macOS arm64 (Apple Silicon) | ~27 packages installed | HTTP cache warm
Tool: hyperfine (5-10 runs, 2-3 warmup)
Startup

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `pip --version` | 138 ms | 20 ms | -118 ms | 7.0x |
| `pip --help` | 143 ms | 121 ms | -22 ms | 1.18x |
| `pip install --help` | 207 ms | 208 ms | +1 ms | ~1.0x |

Package Operations

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `pip list` | 162 ms | 146 ms | -16 ms | 1.11x |
| `pip freeze` | 225 ms | 211 ms | -14 ms | 1.07x |
| `pip show pip` | 162 ms | 148 ms | -14 ms | 1.09x |
| `pip check` | 191 ms | 174 ms | -17 ms | 1.10x |

Dependency Resolution
HTTP responses are served from the warm cache; `--dry-run --ignore-installed` forces a full resolution.

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| requests (simple, ~5 deps) | 589 ms | 516 ms | -73 ms | 1.14x |
| flask + django (medium, ~15 deps) | 708 ms | 599 ms | -109 ms | 1.18x |
| flask + django + boto3 + requests (complex, ~30 deps) | 1,493 ms | 826 ms | -667 ms | 1.81x |
| fastapi[standard] (heavy, ~42 deps) | 13,325 ms | 11,664 ms | -1,661 ms | 1.14x |

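
A comparison like the one above could be driven by a command built along the following lines. This is only a sketch: the interpreter paths `./venv-main/bin/python` and `./venv-opt/bin/python` and the helper `hyperfine_command` are hypothetical, and the exact invocation used for this report is not recorded here. `--warmup` and `--runs` are standard hyperfine options; `--dry-run` and `--ignore-installed` are the pip flags named above.

```python
import shlex


def hyperfine_command(pip_args, interpreters, warmup=3, runs=10):
    """Build a hyperfine argv that times the same pip command per interpreter."""
    cmd = ["hyperfine", "--warmup", str(warmup), "--runs", str(runs)]
    for python in interpreters:
        # hyperfine takes each benchmarked command as one shell string.
        cmd.append(shlex.join([python, "-m", "pip", *pip_args]))
    return cmd


# Hypothetical environments: one with main installed, one with the branch.
interpreters = ["./venv-main/bin/python", "./venv-opt/bin/python"]
resolve_args = ["install", "--dry-run", "--ignore-installed", "requests"]

cmd = hyperfine_command(resolve_args, interpreters)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.
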
Parsing

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `install -r requirements.txt` (21 pinned packages, `--no-deps`) | 1,344 ms | 740 ms | -604 ms | 1.82x |

Import Time

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `import pip._internal.cli.main` | 50 ms | 50 ms | 0 ms | 1.0x |

Note: On Python 3.15 the import chain is already fast (50 ms) and is unchanged by this branch. The `--version` fast path bypasses this import entirely, which is why `pip --version` is 7x faster.
Totals

| | Main | Optimized | Speedup |
|---|---|---|---|
| All benchmarks (sum) | 18,717 ms | 15,223 ms | 1.23x (18.7% faster) |
| Excluding fastapi[standard] | 5,392 ms | 3,559 ms | 1.51x (34.0% faster) |

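
The two figures in the Speedup column are related but not interchangeable: the "x" value is the ratio main / optimized, while "% faster" is the time saved relative to main. A short check using the reported all-benchmarks totals (the function names are illustrative):

```python
def speedup(main_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline time to optimized time (higher is better)."""
    return main_ms / optimized_ms


def percent_faster(main_ms: float, optimized_ms: float) -> float:
    """Time saved, expressed as a percentage of the baseline."""
    return (main_ms - optimized_ms) / main_ms * 100


# Totals as reported in the table above.
all_main, all_opt = 18_717, 15_223
summary = f"{speedup(all_main, all_opt):.2f}x, {percent_faster(all_main, all_opt):.1f}% faster"
print(summary)  # 1.23x, 18.7% faster
```
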
Top Improvements

| Rank | Benchmark | Improvement | Time Saved |
|---|---|---|---|
| 1 | resolve: fastapi[standard] | 12.5% | 1,661 ms |
| 2 | resolve: flask+django+boto3+requests | 44.7% | 667 ms |
| 3 | `install -r requirements.txt` | 44.9% | 604 ms |
| 4 | `pip --version` | 85.5% | 118 ms |
| 5 | resolve: flask+django | 15.4% | 109 ms |
| 6 | resolve: requests | 12.4% | 73 ms |

What Was Optimized (118 commits)
1. Startup
- Ultra-fast `--version` path in `__main__.py` that exits before importing `pip._internal`
- Fast-path `--version` in `cli/main.py` that avoids the `pip._internal.utils.misc` import
- Deferred `base_command.py` import chain to command creation time (saves ~22 ms on `--help`)
- Deferred `Configuration` module loading
- Deferred autocompletion imports behind the `PIP_AUTO_COMPLETE` check
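
The `--version` fast path can be illustrated with a minimal sketch. It assumes a hypothetical tool with a `_VERSION` constant and a stand-in `_heavy_main`; it is not the code from the branch, only the shape of the idea: answer the cheap question before any expensive import runs.

```python
import sys

# Hypothetical constant; pip itself exposes its version as pip.__version__.
_VERSION = "0.0.0"


def _heavy_main(argv):
    """Stand-in for the full CLI (in pip, where pip._internal gets imported)."""
    import argparse  # deferred: only imported when the slow path is taken

    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("command")
    return f"dispatch {parser.parse_args(argv).command}"


def main(argv):
    # Fast path: handle --version before touching any heavy import.
    if argv in (["--version"], ["-V"]):
        major, minor = sys.version_info[:2]
        return f"tool {_VERSION} (python {major}.{minor})"
    # Slow path: everything else pays for the full import chain.
    return _heavy_main(argv)


print(main(["--version"]))
print(main(["list"]))
```
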
2. Dependency Resolver -- Architecture
- Speculative metadata prefetch: background thread downloads PEP 658 metadata for the top candidate while the resolver processes other packages
- Conditional `Criterion` rebuild: `_remove_information_from_criteria` now skips rebuilding unaffected criteria, eliminating ~95% of allocations
- `__slots__` on `Criterion`: reduces per-instance memory by ~100 bytes
- Two-level cache for `_iter_found_candidates` (specifier merge cache + candidate infos cache)
- Fail-first preference heuristic (`candidate_count` in resolver preference tuple)
- `ChainMap` delta and plain `dict` in resolvelib state management
- Parallel index-page prefetch during dependency resolution
- Thread-safe `dist` property on candidates for concurrent metadata access
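
The speculative prefetch idea can be sketched roughly as below. `MetadataPrefetcher` and its `fetch` callable are hypothetical names used for illustration; the branch's actual classes and threading details may differ. The sketch assumes `prefetch` and `get` are called from a single resolver thread, while the downloads themselves run in a pool.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class MetadataPrefetcher:
    """Start fetching metadata in the background; block only when it is needed."""

    def __init__(self, fetch, max_workers: int = 4) -> None:
        self._fetch = fetch  # callable: candidate id -> metadata
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: dict[str, Future] = {}

    def prefetch(self, candidate_id: str) -> None:
        # Speculative: submit at most once per candidate and return immediately.
        if candidate_id not in self._pending:
            self._pending[candidate_id] = self._pool.submit(self._fetch, candidate_id)

    def get(self, candidate_id: str):
        # Reuse the in-flight or finished fetch; on a miss, fetch now.
        self.prefetch(candidate_id)
        return self._pending[candidate_id].result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def fake_fetch(candidate_id: str) -> dict:
    """Stand-in for a PEP 658 metadata download."""
    return {"id": candidate_id, "requires": []}


prefetcher = MetadataPrefetcher(fake_fetch)
prefetcher.prefetch("example-1.0")        # returns at once; work runs in the pool
metadata = prefetcher.get("example-1.0")  # waits only if the fetch is unfinished
prefetcher.close()
```

The payoff depends on the resolver having other work to do between `prefetch` and `get`; if it asks for the result immediately, the cost is the same as a synchronous fetch.
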
3. Dependency Resolver -- Micro
- Cached wheel tag priority dict on `TargetPython`
- Pre-extracted requirements tuple on `Criterion` to avoid per-call generator expressions
- Cached specifier merge and candidate infos across resolver backtracking
- Cached `Marker.evaluate()` results for repeated extra lookups
- Cached `_sort_key` results to avoid double evaluation in `compute_best_candidate`
- Hoisted `operator.methodcaller`/`attrgetter` to module-level constants
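
Two of the items above, the cached sort key and the hoisted `attrgetter`, follow a common pattern that can be sketched as follows. `Candidate`, `Evaluator`, and `key_computations` are hypothetical stand-ins, not pip's classes; the counter exists only to make the effect visible.

```python
import operator
from typing import NamedTuple

# Hoisted: the getter object is created once at import time, not per call.
_GET_VERSION = operator.attrgetter("version")


class Candidate(NamedTuple):  # hypothetical stand-in for pip's candidate type
    name: str
    version: tuple[int, ...]


class Evaluator:
    def __init__(self) -> None:
        self._key_cache: dict[Candidate, tuple[int, ...]] = {}
        self.key_computations = 0  # instrumentation for this example only

    def _sort_key(self, candidate: Candidate) -> tuple[int, ...]:
        try:
            return self._key_cache[candidate]
        except KeyError:
            self.key_computations += 1
            key = self._key_cache[candidate] = _GET_VERSION(candidate)
            return key

    def compute_best(self, candidates: list[Candidate]) -> Candidate:
        ordered = sorted(candidates, key=self._sort_key)  # first pass fills the cache
        return max(ordered, key=self._sort_key)           # second pass hits the cache


evaluator = Evaluator()
best = evaluator.compute_best(
    [Candidate("pkg", (1, 0)), Candidate("pkg", (2, 0)), Candidate("pkg", (1, 5))]
)
print(best.version, evaluator.key_computations)  # (2, 0) 3
```

Without the cache the key would be computed six times here, once per candidate in each pass.
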
4. Packaging (vendored pip._vendor.packaging)
- Replaced `_tokenizer` dataclass with `__slots__` class
- Deferred `Version.__hash__` computation until first call
- Integer comparison key (`_cmp_int`) for `Version` and `Specifier` -- avoids full `_key` tuple construction
- Bisect-based `filter_versions` for O(log n + k) batch filtering
- Pre-computed integer bounds on `SpecifierSet` for fast rejection
- Cached parsed `Version` objects in `_coerce_version`
- Cached parsed `Requirement` fields for repeated requirement strings
- Cached parsed `frozenset` of `Specifier`s in `SpecifierSet`
- Fast-path tokenizer for simple tokens to bypass regex engine
- Ultra-fast path in `SpecifierSet.contains` for `prereleases=True`
- Pre-computed `is_prerelease`/`is_postrelease` flags at `Version` init
- Direct release-tuple prefix comparison in `_compare_equal` and `_compare_compatible`
- Cached `Specifier.__str__` and `__hash__`
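
The bisect-based filtering idea can be sketched as follows, assuming the candidate versions are already sorted and the specifier reduces to a half-open range `lower <= v < upper`. Plain tuples stand in for `Version` objects, and `filter_range` is a hypothetical name, not the branch's `filter_versions`. Two binary searches locate the bounds in O(log n), and the slice copies the k matches.

```python
from bisect import bisect_left


def filter_range(sorted_versions, lower, upper):
    """Return versions v with lower <= v < upper from an already-sorted list."""
    lo = bisect_left(sorted_versions, lower)  # first index with v >= lower
    hi = bisect_left(sorted_versions, upper)  # first index with v >= upper
    return sorted_versions[lo:hi]


versions = [(1, 0), (1, 2), (1, 5), (2, 0), (2, 1), (3, 0)]

# Roughly ">=1.2,<3.0" expressed as release tuples.
matches = filter_range(versions, (1, 2), (3, 0))
print(matches)  # [(1, 2), (1, 5), (2, 0), (2, 1)]
```

A linear scan would evaluate the specifier against every version; the bisect approach only pays per match, which matters when a project has hundreds of releases and the range is narrow.
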
5. Link and Wheel Parsing
- Pre-computed `Link._is_wheel` slot to avoid repeated `splitext` comparison
- Cached URL scheme on `Link` to skip `urlsplit` for `is_vcs`/`is_file`
- Deferred URL path extraction in `Link.from_json` when filename exists
- Inlined `Link` construction in `_evaluate_json_page` to skip redundant work
- Direct string extraction replacing `parse_wheel_filename` in sort path
- `rsplit` instead of `rfind` x3 for wheel tag extraction
- Cached `parse_tag` results to eliminate redundant `Tag` creation
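
The `rsplit` item relies on the wheel filename layout, in which the last three dash-separated fields are always the Python, ABI, and platform tags. A minimal sketch of that extraction follows; `wheel_tags` is a hypothetical helper and the filenames are made up for illustration.

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python tag, abi tag, platform tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    stem = filename[: -len(".whl")]
    # One right-to-left split yields all three tags; the name, version, and
    # optional build tag stay together in the first element.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"malformed wheel filename: {filename!r}")
    _, python_tag, abi_tag, platform_tag = parts
    return python_tag, abi_tag, platform_tag


print(wheel_tags("example-1.0-py3-none-any.whl"))
print(wheel_tags("example-1.0-1-cp312-cp312-manylinux_2_17_x86_64.whl"))
```

Splitting from the right means an optional build tag does not need special handling, since it never shifts the position of the last three fields.
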
6. I/O and Caching
- Replaced pure-Python msgpack with C-level stdlib JSON for cache serialization (backward compatible)
- Increased HTTP connection pool and prefetch concurrency
7. Import Deferral
- Deferred `base_command.py` import chain to command creation time
- Deferred all Rich imports to first use
- Stripped unused Rich modules from import chain
- Deferred heavy imports in Rich `console.py` (pretty/pager/scope/screen/export)
- Deferred Rich imports in `progress_bars.py` and `self_outdated_check.py`
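
Import deferral in general amounts to moving an import from module level into the function that needs it. In the sketch below, `textwrap` stands in for a heavy dependency such as Rich, and `wrap_message` is a hypothetical function; code paths that never call it never pay for the import.

```python
def wrap_message(text: str, width: int = 20) -> str:
    # Deferred import: executed on first call rather than at module import.
    # Later calls find the module in sys.modules, so the repeat cost is a
    # dictionary lookup.
    import textwrap

    return textwrap.fill(text, width=width)


print(wrap_message("aaaa bbbb cccc", width=9))
```

The trade-off is that an import error surfaces at first use instead of at startup, which is usually acceptable for optional display code.
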
8. Micro-optimizations
- Bypassed `InstallationCandidate.__init__` with `__new__` + direct slot assignment
- Removed redundant O(n) subset assertion in `BestCandidateResult`
- Replaced `min()` builtins with inline conditionals in `_cmp_int`
- Cached `Hashes.__hash__` to avoid repeated sort+join computation
- Cached `Constraint.empty()` singleton to avoid 169K redundant allocations
- Bypassed `email.parser` for metadata parsing
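
The `__new__` plus direct slot assignment technique can be sketched as below. The `Candidate` class and its `from_trusted` constructor are hypothetical, not pip's `InstallationCandidate`; the sketch assumes the caller already holds normalized values, so the work done in `__init__` would be redundant.

```python
class Candidate:
    __slots__ = ("name", "version", "link")

    def __init__(self, name: str, version: str, link: str) -> None:
        # Regular constructor: normalizes its inputs on every call.
        self.name = name.lower()
        self.version = version.strip()
        self.link = link

    @classmethod
    def from_trusted(cls, name: str, version: str, link: str) -> "Candidate":
        """Build an instance from already-normalized values, skipping __init__."""
        self = cls.__new__(cls)  # allocates the object without running __init__
        self.name = name
        self.version = version
        self.link = link
        return self


fast = Candidate.from_trusted("pkg", "1.0", "https://example.invalid/pkg-1.0.whl")
slow = Candidate("PKG", " 1.0 ", "https://example.invalid/pkg-1.0.whl")
print(fast.name == slow.name, fast.version == slow.version)
```

This only pays off in hot loops that create very many instances, and it shifts the burden of validation to the caller.
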