All dependencies are vendored
Everything in `src/pip/_vendor/` is vendored from upstream: resolvelib, packaging, requests, urllib3, cachecontrol, certifi, distlib, importlib_metadata, pygments, rich, etc. These are copies of external libraries maintained via `tools/vendoring/`. Each vendored library is a candidate for replacement — if pip only uses a subset of a library's API, a focused implementation covering just that subset will be faster (no generalized overhead). If a vendored library is fully replaced and no longer imported anywhere in `src/pip/_internal/`, delete it from `_vendor/` and remove its entry from the vendoring manifest at `src/pip/_vendor/vendor.txt`.
Resolution flow is the primary hot path
The dependency resolution call chain:

    Resolver.resolve()                            [resolution/resolvelib/resolver.py]
      -> resolvelib.Resolver                      (vendored algorithm)
        -> PipProvider.find_matches()             [resolution/resolvelib/provider.py]
          -> Factory._iter_found_candidates()     [resolution/resolvelib/factory.py]
            -> PackageFinder.find_best_candidate()    [index/package_finder.py]
              -> LinkCollector.collect_sources()      [index/collector.py]
              -> LinkEvaluator.evaluate_link()
              -> CandidateEvaluator.compute_best_candidate()
        -> PipProvider.get_dependencies()
          -> Candidate.iter_dependencies()        [resolution/resolvelib/candidates.py]
The vendored resolvelib drives the algorithm; pip's layer (factory, provider, candidates, package_finder) is where the overhead lives.
Existing caching in pip — evaluate and improve
Current caching is a starting point, not a ceiling. Profile each one — the cache sizes, strategies, and data structures may all be suboptimal:
- `functools.lru_cache(maxsize=10000)` on `parse_version` in `utils/packaging.py` — is 10k the right size? Is `lru_cache` the fastest caching strategy here? Would a plain dict be faster (no LRU eviction overhead)? A sketch of the dict alternative follows this list.
- `functools.lru_cache(maxsize=32)` on `get_requirement` in `utils/packaging.py` — only 32 slots. During large resolutions this evicts constantly. Profile whether a larger cache or an unbounded `@functools.cache` is faster.
- `@functools.cache` on Link properties in `models/link.py` — `functools.cache` has per-call overhead for hashing its arguments. If a Link property is called with `self` only, a simple `__dict__`-based cache or `__slots__` with pre-computed values may be faster.
- `@functools.cached_property` on InstallRequirement properties — carried per-instance locking overhead before Python 3.12 (the lock was removed in 3.12), and pip still supports 3.9-3.11. Evaluate whether a simpler lazy pattern is faster.
- `@functools.cache` on candidate creation in `resolution/resolvelib/provider.py` — profile the cache hit rate. If it is low, the hashing overhead is pure waste.
- HTTP caching via vendored `CacheControl` in `network/session.py` — a general-purpose HTTP cache. If pip only needs a subset of its caching semantics, a focused implementation could be faster.
- Wheel cache by URL hash in `cache.py` — uses sha224. Profile whether a faster hash (xxhash via C, or even just a dict keyed on the URL string) would help at scale.
- Lazy wheel loading in `network/lazy_wheel.py` — the TODO says range requests aren't cached. Fix this, and also evaluate whether the lazy loading strategy itself is optimal (e.g., batch range requests, prefetch metadata sections).
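As referenced in the first bullet, a minimal sketch of the plain-dict alternative to `lru_cache`, assuming the vendored `packaging` is the underlying parser (the function name is illustrative, not pip's exact definition):

```python
from pip._vendor.packaging.version import Version

# Unbounded dict cache: the version strings seen during one resolution are
# finite and small, so skipping LRU bookkeeping on every hit is the entire win.
_version_cache: dict[str, Version] = {}

def parse_version_cached(version: str) -> Version:
    try:
        return _version_cache[version]
    except KeyError:
        parsed = _version_cache[version] = Version(version)
        return parsed
```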
Known TODOs from source — verified optimization opportunities
- `resolution/resolvelib/candidates.py` ~line 250: "TODO performance: this means we iterate dependencies at least twice" — dependencies are extracted from metadata, then iterated again during resolution.
- `resolution/resolvelib/factory.py`: "TODO: Check already installed candidate, and use it if the link and hash match" — redundant work when a compatible version is already installed.
- `network/lazy_wheel.py`: "TODO: Get range requests to be correctly cached" — lazy wheel metadata fetches bypass the HTTP cache.
Version parsing happens repeatedly
During candidate evaluation in `package_finder.py`, version strings are parsed from Link URLs multiple times across different stages (link evaluation, candidate evaluation, sorting). The `lru_cache` on `parse_version` helps, but the cache key is the raw string — if the same release is spelled differently across URLs (e.g., `1.0` vs `1.0.0`), each spelling is parsed and cached separately.
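A small illustration of that string-keyed behavior, using the vendored `packaging` directly (the wrapper here is illustrative, not pip's exact definition):

```python
from functools import lru_cache

from pip._vendor.packaging.version import Version

@lru_cache(maxsize=None)
def parse_version(v: str) -> Version:
    return Version(v)

# "1.0" and "1.0.0" compare equal under PEP 440, but they are distinct
# cache keys, so each spelling is parsed and stored separately.
assert parse_version("1.0") == parse_version("1.0.0")
assert parse_version.cache_info().currsize == 2
```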
Link objects are high-volume
`models/link.py` Link objects are created for every candidate from every index page. They use `@functools.cache` on properties, but the sheer volume (hundreds to thousands per resolution) means object creation overhead itself matters.
Test suite structure
- `tests/unit/` — fast, no network, good for profiling feedback
- `tests/unit/resolution_resolvelib/` — resolver-specific unit tests
- `tests/functional/` — slow, needs network, creates real virtualenvs
- Socket access is disabled by default in the pytest config
- `tests/unit/test_finder.py` — tests for PackageFinder
- `tests/unit/test_req.py` — tests for requirement handling
pip targets Python 3.9+ and PyPy
The walrus operator is fine (3.8+), but match/case (3.10+), exception groups (3.11+), and the `type` statement (3.12+) are off-limits. `typing.Self` (3.11+) and `typing.TypeAlias` (3.10+) need to come from `typing_extensions` on 3.9, or be confined to annotation-only uses behind `from __future__ import annotations`.
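A minimal sketch of the guarded-import pattern for 3.9 compatibility (the `TYPE_CHECKING` route shown here is one option; importing from pip's vendored `typing_extensions` instead is an assumption to verify against the project's conventions):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers, so 3.10+/3.11+ names are safe here.
    from typing import Self

class Builder:
    def add(self) -> Self:  # lazy annotation: never evaluated at runtime
        return self
```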
Ruff is the linter
Line length 88, target-version py39. Key ignores: PERF203 is explicitly ignored for `src/pip/_internal/*` (try-except in loop). The isort config has a custom "vendored" section for `pip._vendor` imports.
packse is available for realistic resolver workloads
The sibling repo at `../packse/` contains 148 dependency resolution test scenarios. These can be used to create realistic profiling workloads by building the packse index and running `pip install --index-url <packse-url> <scenario-package>`. Categories with the most resolver stress: fork (32 scenarios), prereleases (20), local versions (16), requires-python (15).
Apr05 session: optimization results
Key findings from the optimization session:
- `get_supported()` is the single most impactful cache target. A single `@functools.lru_cache` on the underlying implementation reduces `Tag.__init__` calls from 45K to 1.5K in resolver test workloads (a 97% reduction). The cache key is (version, platforms_tuple, impl, abis_tuple). The hit rate is high because the same TargetPython parameters are used across a resolution. (Sketch after this list.)
- `canonicalize_name()` has a 92% cache hit rate. Package names are canonicalized repeatedly during resolution — once for each candidate evaluation, each distribution check, and each requirement comparison. An `lru_cache(maxsize=1024)` catches the vast majority of calls. (Same sketch after this list.)
- Test suite wall-clock time is a poor proxy for pip performance. The unit test suite is dominated by fixture creation (0.3s setup per resolver test × 40 tests = 12s), I/O (directory scanning), and subprocess calls (build isolation). Caches provide little benefit there because each test creates fresh state; real pip invocations process a single dependency tree where caches accumulate hits.
- cProfile overhead is higher with `lru_cache`. cProfile tracks every function call, including the `lru_cache` wrapper, so the profiling overhead ratio is ~3.3x with cached functions vs ~1.2x without. This makes the optimized code look slower under cProfile even though real execution is equivalent or faster.
- Python 3.15 is significantly faster than 3.14. The same unit test suite runs in ~37-42s on Py 3.15 vs ~130s on Py 3.14. This comes from general CPython performance improvements, not pip-specific changes.
- E2E profiling reveals completely different targets than unit tests. The unit test suite is dominated by test infrastructure (fixture creation, subprocess calls). A real `pip install --dry-run flask django boto3 requests` with cached metadata shows that Link object creation (12K+), Version operations, URL cleaning, and filename parsing dominate. Always profile real workloads.
- Double urlsplit is a hidden 2.5% cost. `_ensure_quoted_url` does `urlsplit` to check the path, then `Link.__init__` does `urlsplit` again on the same URL. Integrating quoting into `__init__` eliminates this. For HTTP/HTTPS URLs with already-clean paths (99% of URLs from package indices), a regex fast path (`_PATH_ALREADY_QUOTED_RE`) skips `_clean_url_path` entirely. (Sketch after this list.)
- Pre-computing hot properties in `__init__` is the most effective pattern for high-volume objects. Link objects are created 12K+ times. Moving splitext, filename, and hash computation from property access to `__init__` eliminated ~7% of self-time, because these properties were each accessed 2-4x per link during evaluation and sorting. (Combined sketch after this list.)
- Lazy parsing for rarely-used fields saves significant time. `upload_time` (an ISO datetime) was parsed eagerly for all 12K links but is only used when the `--uploaded-prior-to` flag is set (rare). Deferring `parse_iso_datetime` to first property access eliminated 1.4% of self-time. (Same combined sketch after this list.)
- Remaining performance floor after link/version optimizations. The profile is now flat: `Version.__init__` (7%), `Link.__init__` (7%), `evaluate_link` (5%), `from_json` (4%), specifier filtering (3%), version comparison (3%). These are core resolution operations — further gains require algorithmic changes (reducing candidate count) or resolver restructuring.
- The `parse_wheel_filename` cache has a 75% hit rate in single-package installs. In larger resolutions with many candidates from the same package, the hit rate is higher.
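A minimal sketch of the two caching wins above. The vendored `packaging` calls are real; `_get_supported_impl` is a stand-in for the actual implementation in `utils/compatibility_tags.py`, and the signature is illustrative:

```python
import functools

from pip._vendor.packaging.tags import compatible_tags
from pip._vendor.packaging.utils import canonicalize_name as _canonicalize_name

# canonicalize_name: a thin lru_cache wrapper covered ~92% of calls.
canonicalize_name = functools.lru_cache(maxsize=1024)(_canonicalize_name)

def _get_supported_impl(version, platforms, impl, abis):
    # Stand-in for the real tag computation; each call builds many Tag objects.
    return list(compatible_tags())

# get_supported takes list arguments, which are unhashable — so the cache
# lives on an inner function that only ever sees tuples.
@functools.lru_cache(maxsize=64)
def _get_supported_cached(version, platforms, impl, abis):
    return _get_supported_impl(
        version,
        list(platforms) if platforms is not None else None,
        impl,
        list(abis) if abis is not None else None,
    )

def get_supported(version=None, platforms=None, impl=None, abis=None):
    return _get_supported_cached(
        version,
        tuple(platforms) if platforms is not None else None,
        impl,
        tuple(abis) if abis is not None else None,
    )
```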
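A sketch of the quoted-path fast path described above — the regex and the slow-path quoting are illustrative stand-ins, not pip's exact definitions:

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit

# Paths made only of characters that never need percent-quoting (plus '%'
# for already-encoded sequences) can skip cleaning entirely.
_PATH_ALREADY_QUOTED_RE = re.compile(r"^[A-Za-z0-9._~/%+-]*$")

def ensure_quoted_url(url: str) -> str:
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and _PATH_ALREADY_QUOTED_RE.match(parts.path):
        return url  # ~99% of index URLs take this branch
    # Slow path: re-quote the path, preserving existing percent-escapes.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))
```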
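A combined sketch of the precompute and lazy-parse patterns on a stripped-down Link (the real class in `models/link.py` has many more fields; the `upload_time` handling here is illustrative):

```python
import posixpath
from datetime import datetime
from typing import Optional
from urllib.parse import urlsplit

class Link:
    __slots__ = ("url", "path", "filename", "ext",
                 "_upload_time_raw", "_upload_time")

    def __init__(self, url: str, upload_time_raw: Optional[str] = None) -> None:
        self.url = url
        # Precompute: these are each read 2-4x per link during evaluation
        # and sorting, so paying once at construction beats cached properties.
        self.path = urlsplit(url).path
        self.filename = posixpath.basename(self.path)
        self.ext = posixpath.splitext(self.filename)[1]
        # Lazy: upload_time is almost never read, so keep the raw string.
        self._upload_time_raw = upload_time_raw
        self._upload_time: Optional[datetime] = None

    @property
    def upload_time(self) -> Optional[datetime]:
        if self._upload_time is None and self._upload_time_raw is not None:
            self._upload_time = datetime.fromisoformat(self._upload_time_raw)
        return self._upload_time
```

Note that `functools.cached_property` cannot be used with `__slots__`, which is another argument for this manual lazy pattern on high-volume classes.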
Apr 2026 session: optimization floor analysis
- `_evaluate_json_page` is at the Python-level floor. Per-entry processing costs 4.2µs across 13.7K entries. The cost is spread across `dict.get` (65K calls, 0.009s), `str.endswith` (21K, 0.003s), string operations (rsplit/find/split/startswith, ~0.005s total), object construction (15K `__new__`, 0.002s), and `isinstance` (19K, 0.002s). No single operation dominates. A py3-none-any fast path that replaces rsplit + set lookup with a single `endswith` showed <2% improvement, within noise (sketch after this list). The function's self-time is fundamentally the cost of executing ~340 lines of Python bytecode per entry.
- Resolution round counts are much lower than expected after caching. The two-level cache in `_iter_found_candidates` (experiment 18) reduced resolver iterations dramatically. flask+django+boto3+requests: 23 state pushes, 0 backtracks. fastapi[standard]: 48 pushes, 0 backtracks. COW state snapshots, IteratorMapping elimination, and other per-round optimizations have negligible impact at these scales.
- Wall time is I/O-dominated after CPU optimizations. For flask+django+boto3+requests (826ms optimized), HTTP requests account for ~70% of wall time. The 41 HTTP requests (21 for index pages + 20 for metadata) are serialized by the resolver's sequential processing of packages. Our parallel prefetch infrastructure helps, but it can only overlap I/O for packages discovered through dependency traversal, not for the initial set (sketch after this list).
- Benchmark results are highly sensitive to network conditions and cache state. The same benchmark can vary 2-3x depending on HTTP cache warmth and network latency. Always use hyperfine with warmup runs and report median/mean with sigma. "Using cached" lines in pip output indicate cache hits; "Downloading" indicates misses.
- The `make_install_req_from_link` serialize-reparse is wasteful but low-impact. The function serializes a Requirement to a string and then re-parses it through `install_req_from_line` (which does `os.path.normpath`, `os.path.abspath`, URL parsing, etc.). But with only 21 calls at ~0.1ms each (2.2ms total), the absolute impact is negligible; it would only matter for workloads with thousands of direct requirements.
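The attempted fast path from the first bullet, sketched with illustrative names (`SUPPORTED_TAGS` and the tag-extraction logic are simplified stand-ins for the real evaluation):

```python
SUPPORTED_TAGS = {"py3-none-any", "cp313-cp313-manylinux_2_17_x86_64"}

def is_compatible_wheel(filename: str) -> bool:
    # Fast path: one C-level endswith covers pure-Python wheels, which
    # dominate index pages. Measured gain was <2%, within noise.
    if filename.endswith("-py3-none-any.whl"):
        return True
    # Slow path: the original rsplit + set-lookup shape.
    stem = filename.rsplit(".", 1)[0]        # drop ".whl"
    tag = "-".join(stem.split("-")[-3:])     # e.g. "py3-none-any"
    return tag in SUPPORTED_TAGS
```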
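A conceptual sketch of the prefetch overlap limitation: warming the HTTP cache with a thread pool (over a `requests`-style session) only works for URLs already discovered through dependency traversal, so the initial requirement set stays serial:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

def prefetch(session, urls: Iterable[str], max_workers: int = 8) -> None:
    # Fire-and-wait: responses land in the HTTP cache, so the resolver's
    # later sequential fetches become cache hits instead of network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda url: session.get(url).status_code, urls))
```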
Local warehouse (PyPI) is running
A full warehouse instance is running via Docker at http://localhost:80/. The Simple API is at http://localhost:80/simple/. This enables end-to-end profiling of the entire pip → network → warehouse → database stack. The warehouse source at `../warehouse/` is live-reloaded by gunicorn — changes to `warehouse/api/simple.py` (the Simple API endpoint) take effect immediately. Manage it with `cd ../warehouse && docker compose [up -d | down | logs web]`.
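A quick smoke test against the local Simple API (the content type assumes warehouse serves PEP 691 JSON responses, as upstream PyPI does):

```python
from urllib.request import Request, urlopen

req = Request(
    "http://localhost:80/simple/flask/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
)
with urlopen(req) as resp:
    print(resp.status, resp.headers.get("Content-Type"))
```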