mirror of https://github.com/codeflash-ai/codeflash-agent.git synced 2026-05-04 18:25:19 +00:00

Kevin Turcios 3b59d97647 squash

2026-04-13 14:12:17 -05:00

pip End-to-End Performance: `main` vs `codeflash/optimize`

Branch: `codeflash/optimize` (118 commits ahead of `main`)

Environment: Python 3.15.0a7 | macOS arm64 (Apple Silicon) | ~27 packages installed | HTTP cache warm

Tool: hyperfine (5-10 runs, 2-3 warmup)

Startup

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `pip --version` | 138 ms | 20 ms | -118 ms | 7.0x |
| `pip --help` | 143 ms | 121 ms | -22 ms | 1.18x |
| `pip install --help` | 207 ms | 208 ms | +1 ms | ~1.0x |

Package Operations

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `pip list` | 162 ms | 146 ms | -16 ms | 1.11x |
| `pip freeze` | 225 ms | 211 ms | -14 ms | 1.07x |
| `pip show pip` | 162 ms | 148 ms | -14 ms | 1.09x |
| `pip check` | 191 ms | 174 ms | -17 ms | 1.10x |

Dependency Resolution

HTTP responses are cached. Each run passes `--dry-run --ignore-installed` to force a full resolution.

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `requests` (simple, ~5 deps) | 589 ms | 516 ms | -73 ms | 1.14x |
| `flask + django` (medium, ~15 deps) | 708 ms | 599 ms | -109 ms | 1.18x |
| `flask + django + boto3 + requests` (complex, ~30 deps) | 1,493 ms | 826 ms | -667 ms | 1.81x |
| `fastapi[standard]` (heavy, ~42 deps) | 13,325 ms | 11,664 ms | -1,661 ms | 1.14x |

Parsing

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `install -r requirements.txt` (21 pinned packages, `--no-deps`) | 1,344 ms | 740 ms | -604 ms | 1.82x |

Import Time

| Benchmark | Main | Optimized | Delta | Speedup |
|---|---|---|---|---|
| `import pip._internal.cli.main` | 50 ms | 50 ms | 0 ms | 1.0x |

Note: On Python 3.15 the import chain is already fast (50 ms) and is the same on both branches. The `--version` fast path skips this import entirely, which is why `pip --version` is 7x faster.

Totals

| | Main | Optimized | Speedup |
|---|---|---|---|
| All benchmarks (sum) | 18,737 ms | 15,423 ms | 1.21x (17.7% faster) |
| Excluding `fastapi[standard]` | 5,412 ms | 3,759 ms | 1.44x (30.5% faster) |

Top Improvements

| Rank | Benchmark | Improvement | Time Saved |
|---|---|---|---|
| 1 | `resolve: fastapi[standard]` | 12.5% | 1,661 ms |
| 2 | `resolve: flask+django+boto3+requests` | 44.7% | 667 ms |
| 3 | `install -r requirements.txt` | 44.9% | 604 ms |
| 4 | `pip --version` | 85.5% | 118 ms |
| 5 | `resolve: flask+django` | 15.4% | 109 ms |
| 6 | `resolve: requests` | 12.4% | 73 ms |
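
The Improvement, Time Saved, and Speedup columns all follow from the two mean timings of a row. As a minimal sketch of that arithmetic, using the `flask + django + boto3 + requests` row (the function name `compare` is ours, for illustration only):

```python
def compare(main_ms: float, optimized_ms: float) -> dict[str, float]:
    """Derive the report's columns from two mean wall-clock times."""
    saved = main_ms - optimized_ms
    return {
        "saved_ms": saved,
        # Share of the original time that was removed.
        "improvement_pct": round(100 * saved / main_ms, 1),
        # How many times faster the optimized branch is.
        "speedup": round(main_ms / optimized_ms, 2),
    }


row = compare(1493, 826)
print(row)
```

Note that ranking by time saved and ranking by percentage give different orders: `fastapi[standard]` saves the most milliseconds but has one of the smallest relative gains.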

What Was Optimized (118 commits)

1. Startup

Ultra-fast --version path in __main__.py that exits before importing pip._internal
Fast-path --version in cli/main.py that avoids pip._internal.utils.misc import
Deferred base_command.py import chain to command creation time (saves ~22ms on --help)
Deferred Configuration module loading
Deferred autocompletion imports behind PIP_AUTO_COMPLETE check

2. Dependency Resolver -- Architecture

Speculative metadata prefetch: background thread downloads PEP 658 metadata for the top candidate while the resolver processes other packages
Conditional Criterion rebuild: _remove_information_from_criteria now skips rebuilding unaffected criteria, eliminating ~95% of allocations
__slots__ on Criterion: reduces per-instance memory by ~100 bytes
Two-level cache for _iter_found_candidates (specifier merge cache + candidate infos cache)
Fail-first preference heuristic (candidate_count in resolver preference tuple)
ChainMap delta and plain dict in resolvelib state management
Parallel index-page prefetch during dependency resolution
Thread-safe dist property on candidates for concurrent metadata access
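
Speculative prefetch can be pictured as a small wrapper around a thread pool: the resolver hints at a package it will probably need, and later either picks up the finished result or falls back to a synchronous fetch. The sketch below assumes a hypothetical `fetch_metadata` function in place of a real PEP 658 download; the `Prefetcher` class is illustrative and is not the branch's implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def fetch_metadata(name: str) -> dict:
    """Hypothetical stand-in for a PEP 658 metadata download."""
    return {"name": name, "requires": []}


class Prefetcher:
    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: dict[str, Future] = {}

    def hint(self, name: str) -> None:
        """Start a background fetch unless one was already requested."""
        if name not in self._pending:
            self._pending[name] = self._pool.submit(fetch_metadata, name)

    def get(self, name: str) -> dict:
        """Return prefetched metadata, or fetch synchronously on a miss."""
        future = self._pending.get(name)
        if future is None:
            return fetch_metadata(name)
        return future.result()  # blocks only if the fetch has not finished

    def close(self) -> None:
        self._pool.shutdown(wait=True)


prefetcher = Prefetcher()
prefetcher.hint("urllib3")        # resolver guesses it will need this soon
meta = prefetcher.get("urllib3")  # later: the result is likely already there
prefetcher.close()
```

A wrong guess wastes one download but never changes the resolution result, which is what makes the prefetch safe to do speculatively.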

3. Dependency Resolver -- Micro

Cached wheel tag priority dict on TargetPython
Pre-extracted requirements tuple on Criterion to avoid per-call generator expressions
Cached specifier merge and candidate infos across resolver backtracking
Cached Marker.evaluate() results for repeated extra lookups
Cached _sort_key results to avoid double evaluation in compute_best_candidate
Hoisted operator.methodcaller/attrgetter to module-level constants
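
Two of the items above, computing each sort key only once and hoisting an `operator` helper to module level, can be combined in one small sketch. The `sort_key` function here is a deliberately trivial, hypothetical stand-in for an expensive key; the real `_sort_key` is not reproduced.

```python
from operator import itemgetter

# Built once at import time instead of on every call.
_BY_KEY = itemgetter(0)


def sort_key(candidate: str) -> tuple[int, str]:
    """Hypothetical stand-in for an expensive sort key."""
    return (len(candidate), candidate)


def sort_and_pick_best(candidates: list[str]) -> tuple[list[str], str]:
    # Evaluate each key exactly once, then reuse it for both results.
    keyed = [(sort_key(c), c) for c in candidates]
    keyed.sort(key=_BY_KEY)
    ordered = [c for _, c in keyed]
    best = keyed[-1][1]  # largest key is last after an ascending sort
    return ordered, best


ordered, best = sort_and_pick_best(["bb", "a", "ccc"])
```

Calling `sorted(..., key=sort_key)` followed by `max(..., key=sort_key)` would evaluate the key twice per candidate; pairing each candidate with its key avoids that.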

4. Packaging (vendored `pip._vendor.packaging`)

Replaced _tokenizer dataclass with __slots__ class
Deferred Version.__hash__ computation until first call
Integer comparison key (_cmp_int) for Version and Specifier -- avoids full _key tuple construction
Bisect-based filter_versions for O(log n + k) batch filtering
Pre-computed integer bounds on SpecifierSet for fast rejection
Cached parsed Version objects in _coerce_version
Cached parsed Requirement fields for repeated requirement strings
Cached parsed frozenset of Specifiers in SpecifierSet
Fast-path tokenizer for simple tokens to bypass regex engine
Ultra-fast path in SpecifierSet.contains for prereleases=True
Pre-computed is_prerelease/is_postrelease flags at Version init
Direct release-tuple prefix comparison in _compare_equal and _compare_compatible
Cached Specifier.__str__ and __hash__
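
The bisect-based filter relies on versions being reducible to sortable integer keys. A minimal sketch of the O(log n + k) range lookup, assuming a simplified `key` encoding of our own in which every version component is below 1000 (this is not the `_cmp_int` encoding used in the branch):

```python
from bisect import bisect_left, bisect_right


def key(major: int, minor: int, patch: int = 0) -> int:
    """Simplified integer key; assumes every component is below 1000."""
    return major * 1_000_000 + minor * 1_000 + patch


def filter_range(sorted_keys: list[int], low: int, high: int) -> list[int]:
    """Return all keys k with low <= k <= high from an ascending list."""
    start = bisect_left(sorted_keys, low)    # O(log n)
    stop = bisect_right(sorted_keys, high)   # O(log n)
    return sorted_keys[start:stop]           # O(k) for k matches


versions = sorted(key(*v) for v in [(1, 0), (1, 4), (2, 0), (2, 3, 1), (3, 0)])
# Roughly ">=1.4,<3" expressed as integer bounds.
matches = filter_range(versions, key(1, 4), key(2, 999, 999))
```

Compared with testing every version against the specifier, the two binary searches find the matching slice without touching the versions outside it.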

5. Link and Wheel Parsing

Pre-computed Link._is_wheel slot to avoid repeated splitext comparison
Cached URL scheme on Link to skip urlsplit for is_vcs/is_file
Deferred URL path extraction in Link.from_json when filename exists
Inlined Link construction in _evaluate_json_page to skip redundant work
Direct string extraction replacing parse_wheel_filename in sort path
rsplit instead of three rfind calls for wheel tag extraction
Cached parse_tag results to eliminate redundant Tag creation
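
Because the last three dash-separated fields of a wheel filename are always the Python, ABI, and platform tags, a single right-anchored split can extract them. A minimal sketch, with a hypothetical helper name `wheel_tags` and a made-up filename:

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return the (python, abi, platform) tags from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    stem = filename[: -len(".whl")]
    # Split from the right so that dashes in the name/version/build part
    # cannot shift the tag fields.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"malformed wheel filename: {filename!r}")
    _, python, abi, platform = parts
    return python, abi, platform


tags = wheel_tags("demo-1.0-py3-none-any.whl")
```

The same call also handles filenames with an optional build tag, since everything left of the three tag fields stays in the first element.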

6. I/O and Caching

Replaced pure-Python msgpack with C-level stdlib JSON for cache serialization (backward compatible)
Increased HTTP connection pool and prefetch concurrency
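
One way a serializer swap can stay backward compatible is to write the new format and, on read, fall back to the old decoder when the bytes are not valid JSON. The sketch below illustrates that pattern only; `dump_entry`, `load_entry`, and the `legacy_decode` callable are hypothetical, and the actual cache layout in the branch may differ.

```python
import json
from typing import Any, Callable


def dump_entry(entry: dict[str, Any]) -> bytes:
    """Serialize new cache entries as compact UTF-8 JSON."""
    return json.dumps(entry, separators=(",", ":")).encode("utf-8")


def load_entry(
    raw: bytes,
    legacy_decode: Callable[[bytes], dict[str, Any]],
) -> dict[str, Any]:
    """Try JSON first; fall back to the legacy decoder for old entries."""
    try:
        return json.loads(raw)
    except ValueError:
        # Covers both JSONDecodeError and UnicodeDecodeError, which is what
        # binary (non-JSON) payloads typically raise.
        return legacy_decode(raw)


def fake_legacy_decode(raw: bytes) -> dict[str, Any]:
    """Placeholder for the old binary decoder (an assumption of this sketch)."""
    return {"legacy": True, "size": len(raw)}


new_style = load_entry(dump_entry({"status": 200}), fake_legacy_decode)
old_style = load_entry(b"\x81\xa1k\xa1v", fake_legacy_decode)
```

Entries written before the change remain readable, and they are gradually replaced by JSON as the cache is refreshed.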

7. Import Deferral

Deferred base_command.py import chain to command creation time
Deferred all Rich imports to first use
Stripped unused Rich modules from import chain
Deferred heavy imports in Rich console.py (pretty/pager/scope/screen/export)
Deferred Rich imports in progress_bars.py and self_outdated_check.py
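
Deferring an import to first use usually means moving it from module level into the function that needs it. A minimal sketch of the pattern, in which `textwrap` stands in for a heavy dependency such as Rich and `render_table` is a hypothetical function:

```python
def render_table(rows: list[tuple[str, str]]) -> str:
    # Imported here rather than at module level, so commands that never
    # render a table never pay for loading this module.
    import textwrap

    body = "\n".join(f"{name}: {value}" for name, value in rows)
    return textwrap.indent(body, "  ")


text = render_table([("pip", "ok")])
```

After the first call Python serves the module from `sys.modules`, so repeated calls only pay for a dictionary lookup.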

8. Micro-optimizations

Bypassed InstallationCandidate.__init__ with __new__ + direct slot assignment
Removed redundant O(n) subset assertion in BestCandidateResult
Replaced min() builtins with inline conditionals in _cmp_int
Cached Hashes.__hash__ to avoid repeated sort+join computation
Cached Constraint.empty() singleton to avoid 169K redundant allocations
Bypassed email.parser for metadata parsing
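
Bypassing `__init__` with `__new__` plus direct slot assignment can be sketched as below. The `Candidate` class is a hypothetical slotted record, not pip's `InstallationCandidate`, and the shortcut is only safe when the caller already holds validated, normalized values.

```python
class Candidate:
    """Hypothetical slotted record used only for this illustration."""

    __slots__ = ("name", "version")

    def __init__(self, name: str, version: str) -> None:
        # Normal path: validation and normalization cost time per instance.
        if not name:
            raise ValueError("name must be non-empty")
        self.name = name.lower()
        self.version = version.strip()


def make_trusted(name: str, version: str) -> Candidate:
    """Build a Candidate from pre-validated fields without calling __init__."""
    obj = Candidate.__new__(Candidate)
    obj.name = name
    obj.version = version
    return obj


slow = Candidate("Demo", " 1.0 ")
fast = make_trusted("demo", "1.0")
```

Both objects end up with identical fields; the fast path simply skips work that the hot loop has already done elsewhere.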
