Pip I/O Layer Deep Analysis
Investigation date: 2026-04-08
Branch: codeflash/optimize
Investigator: Research agent
1. Request Flow Diagram
User: pip install <pkg>
|
v
Resolver (resolvelib)
|
+-- provider.get_dependencies(candidate)
| +-- prefetch_packages(dep_names) [background threads]
|
+-- provider.find_matches(identifier)
| +-- factory.find_candidates()
| +-- finder.find_best_candidate(name)
| +-- finder.find_all_candidates(name)
| |
| +-- [check _all_candidates cache]
| +-- [check _prefetch_futures]
| +-- _do_fetch_all_candidates(name)
| |
| +-- link_collector.collect_sources(name)
| | +-- search_scope.get_index_urls_locations(name)
| | # => ["https://pypi.org/simple/<name>/"]
| |
| +-- source.page_candidates()
| +-- process_project_url(url)
| |
| +-- link_collector.fetch_response(url)
| | +-- _get_index_content(url)
| | +-- _get_simple_response(url, session)
| | |
| | v
| | session.get(url, headers={
| | Accept: "application/vnd.pypi.simple.v1+json, ...",
| | Cache-Control: "max-age=0"
| | })
| | |
| | v
| | CacheControlAdapter.send()
| | +-- controller.cached_request()
| | | # max-age=0 => ALWAYS bypasses cache
| | | # Adds If-None-Match / If-Modified-Since
| | +-- controller.conditional_headers()
| | +-- HTTPAdapter.send()
| | +-- urllib3.HTTPSConnectionPool.urlopen()
| | +-- _get_conn() [from pool queue]
| | +-- TLS handshake (if new conn)
| | +-- HTTP/1.1 GET request
| | +-- _put_conn() [return to pool]
| | +-- controller.cache_response()
| | # Stores response w/ ETag for
| | # future conditional requests
| |
| +-- [JSON] _evaluate_json_page()
| +-- [HTML] parse_links() + evaluate_links()
|
+-- candidate.dist [triggers metadata fetch]
+-- _prepare()
+-- preparer.prepare_linked_requirement()
+-- _fetch_metadata_only()
| +-- [1] _fetch_metadata_using_link_data_attr()
| | # PEP 658: GET <url>.metadata
| +-- [2] _fetch_metadata_using_lazy_wheel()
| # HTTP Range requests on .whl
| +-- LazyZipOverHTTP(url, session)
| +-- session.head(url) # get Content-Length
| +-- _check_zip() # range-fetch tail
| +-- ZipFile(self) # parse EOCD
+-- [3] Full download as fallback
Request Count Per Package (typical PyPI resolution)
For each unique package name the resolver encounters:
- 1 GET for the index page (`/simple/<name>/`) -- conditional if cached
- 1 GET for metadata (PEP 658 `.metadata` file), OR 1 HEAD + 1-2 Range GETs for lazy wheel metadata, OR 1 GET full wheel download as fallback
- 1 GET for the actual wheel download (after resolution)
For a workload like boto3 with ~40 transitive deps:
- ~40 index page GETs (conditional requests)
- ~40 metadata GETs (PEP 658 when available)
- ~40 wheel download GETs
- Total: ~120 HTTP requests minimum
2. Per-Area Findings
2.1 HTTP Request Flow
How requests are serialized:
- The resolver processes packages sequentially through resolvelib's `resolve()` loop
- Each `find_matches()` call triggers `find_all_candidates()`, which fetches the index page synchronously (unless prefetched)
- Each `get_dependencies()` call triggers `candidate.dist`, which fetches metadata synchronously (unless prefetched)
Existing parallelism (two separate thread pools):
- Index page prefetch (`PackageFinder._prefetch_executor`): 16 worker threads
  - Triggered in `provider.get_dependencies()` for all discovered deps
  - Triggered in `resolver.resolve()` for all root requirements
  - Workers call `_do_fetch_all_candidates()`, which does the full index fetch + evaluate pipeline
- Metadata prefetch (`Factory._metadata_prefetch_executor`): 8 worker threads
  - Triggered in `_iter_found_candidates()` for the top candidate only
  - Workers call `candidate.dist`, which triggers PEP 658 / lazy wheel
Key finding: The two prefetch mechanisms are independent and both effective, but they don't coordinate. The metadata prefetch for package B can't start until B's index page fetch completes. There is no pipelining of "index fetch -> immediately prefetch top candidate metadata."
Redundant requests found:
- `LazyZipOverHTTP.__init__()` sends a HEAD request (line 57 of lazy_wheel.py). If PEP 658 metadata is available, this HEAD is never needed -- the code tries PEP 658 first and only falls back to lazy wheel. This is correct and not redundant.
- However, the HEAD request in `LazyZipOverHTTP` is sent even if the wheel doesn't support range requests, wasting one round trip before discovering this.
- `_get_simple_response()` sends a HEAD before GET only if the URL looks like an archive (lines 120-121 of collector.py). This is a rare case and correctly guarded.
2.2 Connection Reuse & Pooling
Current configuration (session.py lines 388-389):
_pool_connections = 20 # urllib3 PoolManager caches pools for 20 distinct hosts
_pool_maxsize = 16 # Each pool keeps up to 16 idle connections
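For reference, a minimal sketch of how these two values map onto requests/urllib3 pooling, assuming a plain requests.Session rather than pip's actual PipSession wiring:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=20,  # distinct host pools kept by the urllib3 PoolManager
    pool_maxsize=16,      # idle connections retained per host pool
    pool_block=False,     # extra concurrent connections are created, then discarded
)
session.mount("https://", adapter)
session.mount("http://", adapter)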
Analysis:
- The pool is correctly sized for 16 prefetch workers
- `pool_block=False` (the default) means excess connections proceed but aren't returned to the pool
- Connections ARE reused for same-host requests through urllib3's `HTTPSConnectionPool._get_conn()` / `_put_conn()` mechanism
- HTTP/1.1 keep-alive works by default (urllib3 uses persistent connections)
- The connection pool is per-(host, port, scheme), so `pypi.org:443` and `files.pythonhosted.org:443` each get their own pool
- A typical pip install touches only 2-3 hosts: `pypi.org` (index pages), `files.pythonhosted.org` (wheel downloads, metadata), and possibly an extra index. A pool of 20 is more than adequate.
TLS Handshake Analysis:
- A TLS handshake happens once per connection (not per request)
- With pool_maxsize=16, up to 16 connections are kept alive per host
- The 16 prefetch threads can each hold a connection, so in theory all 16 reuse their TLS sessions
- Risk: If more than 16 requests fire concurrently to the same host, excess connections are created and then discarded (not pooled), causing extra TLS handshakes. With `pool_block=False`, they proceed, but each such connection is thrown away after use.
Finding: The pool is sized well for the current prefetch concurrency. No wasted TLS handshakes under normal operation.
2.3 Caching Layer
How CacheControl works with pip:
- Index pages (`/simple/<name>/`): sent with a `Cache-Control: max-age=0` header
  - The CacheControl controller sees `max-age=0` and always bypasses the cache (controller.py lines 184-186)
  - But it adds conditional headers (`If-None-Match`, `If-Modified-Since`) via `conditional_headers()`
  - On 304 Not Modified, the cached response is served (no body transfer)
  - On 200, the response is cached with its ETag for next time
  - This is working as intended -- it ensures freshness while avoiding re-downloading unchanged index pages
- Package downloads (wheels, sdists): sent via `Downloader._http_get()` with `Accept-Encoding: identity`
  - No `Cache-Control: max-age=0` header on these requests
  - CacheControl can serve fully cached responses for packages that haven't changed
  - `SafeFileCache` stores metadata + body as separate files on disk
  - The cache key is the full URL (after normalization)
- PEP 658 metadata files: fetched via `get_http_url()` using the Downloader
  - Same caching behavior as package downloads
  - Small files (~5-50KB), cached effectively
- Lazy wheel range requests: sent with `Cache-Control: no-cache`
  - Explicitly bypasses caching (lazy_wheel.py line 180)
  - This is correct -- range requests for ZIP metadata shouldn't be cached as full responses
Cache efficiency finding:
- The `max-age=0` on index pages means every resolution always incurs at least one conditional round-trip per package. This is the single biggest I/O constraint for warm-cache scenarios.
- For a `pip install --upgrade` with a warm cache, all 40 index page requests still go to the network (as conditional GETs), but most return 304 with no body. Each 304 round-trip costs ~50-100ms (RTT to pypi.org). The revalidation pattern is sketched after this list.
- Total warm-cache overhead: 40 * ~80ms = ~3.2 seconds just in sequential conditional GETs (partially parallelized by prefetch).
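An illustrative sketch of this revalidation pattern, written against plain requests rather than pip's CacheControl integration:

import requests

session = requests.Session()
url = "https://pypi.org/simple/boto3/"
accept = {"Accept": "application/vnd.pypi.simple.v1+json"}

first = session.get(url, headers=accept)
etag = first.headers.get("ETag")

# Revalidation: max-age=0 forces a round trip; If-None-Match lets the server
# answer 304 Not Modified with an empty body when the page is unchanged.
revalidate_headers = dict(accept)
revalidate_headers["Cache-Control"] = "max-age=0"
if etag:
    revalidate_headers["If-None-Match"] = etag
second = session.get(url, headers=revalidate_headers)
print(second.status_code)  # 304 if unchanged, 200 with a fresh body otherwise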
2.4 Metadata Fetching
Fallback chain (prepare.py _fetch_metadata_only()):
1. PEP 658 metadata (`_fetch_metadata_using_link_data_attr()`; illustrated in the sketch after this list):
   - Checks `link.metadata_link()` -- the link must have a `data-dist-info-metadata` or `core-metadata` attribute
   - If present, downloads the separate `.metadata` file (tiny: 5-50KB)
   - PyPI supports PEP 658 for all wheels uploaded after ~2023
   - This is the fastest path: a single small GET
2. Lazy wheel (`_fetch_metadata_using_lazy_wheel()`):
   - Requires the `--use-feature=fast-deps` flag
   - Sends HEAD to get Content-Length and check Accept-Ranges
   - Downloads the tail of the wheel (ZIP end-of-central-directory) via range requests
   - Parses the ZIP to find the METADATA file, then downloads just that range
   - 2-4 HTTP requests per wheel (HEAD + 1-3 range GETs)
   - Has a `_lazy_wheel_cache` to avoid redundant range requests for the same URL
3. Full download (fallback):
   - Downloads the entire wheel/sdist
   - For wheels: extracts metadata from the archive
   - For sdists: runs `setup.py egg_info` or a `pyproject.toml` build
   - Most expensive path
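A rough illustration of the PEP 658 fast path (not pip's implementation; the JSON field names follow PEP 691/714 and should be treated as assumptions):

import requests

session = requests.Session()
index = session.get(
    "https://pypi.org/simple/requests/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
).json()

for entry in index["files"]:
    if entry["filename"].endswith(".whl") and entry.get("core-metadata"):
        # PEP 658: the METADATA file is served at the wheel URL plus ".metadata"
        metadata = session.get(entry["url"] + ".metadata").text
        print(metadata.splitlines()[0])  # e.g. "Metadata-Version: 2.1"
        break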
Key finding: PEP 658 is the dominant path for PyPI packages. The speculative metadata prefetch (factory.py) eagerly builds the top candidate and submits a background thread to fetch its metadata. This overlaps metadata I/O with resolution logic.
Optimization in place: _lazy_wheel_cache (prepare.py line 288) prevents duplicate range requests when a package is evaluated with different extras (e.g., pkg and pkg[extra]).
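For context, a hedged sketch of the lazy-wheel mechanics described in step 2 above, using plain requests and a hypothetical wheel URL rather than pip's LazyZipOverHTTP:

import requests

# Hypothetical placeholder URL -- not a real file.
url = "https://files.pythonhosted.org/packages/example-1.0-py3-none-any.whl"
session = requests.Session()

head = session.head(url, allow_redirects=True)
size = int(head.headers["Content-Length"])

if head.headers.get("Accept-Ranges") == "bytes":
    # Fetch only the tail containing the ZIP end-of-central-directory record.
    start = max(size - 65536, 0)
    tail = session.get(url, headers={"Range": f"bytes={start}-{size - 1}"})
    assert tail.status_code == 206  # Partial Content
    # pip hands bytes like these to ZipFile to locate and read METADATA.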
2.5 DNS & TLS
DNS resolution:
- urllib3 delegates to Python's `socket.create_connection()`, which calls `getaddrinfo()`
- No DNS caching in urllib3 or pip -- relies on the OS-level DNS cache
- However, connection pooling effectively caches DNS results because connections persist
- With 16 pool connections to `pypi.org`, DNS is resolved at most once per connection creation
TLS handshakes:
- One TLS handshake per connection (not per request)
- Connection pooling limits handshakes to pool_maxsize (16) per host
- Python's `ssl` module handles TLS session resumption at the OpenSSL level
- `_SSLContextAdapterMixin` (session.py line 255) properly forwards the SSL context to pools
Finding: DNS and TLS are not significant bottlenecks. The connection pool effectively amortizes both costs. Pre-warming is not needed because the first batch of prefetch requests creates all needed connections.
2.6 HTTP/2 and Protocol
Current state: pip uses HTTP/1.1 exclusively.
- The vendored `urllib3` (1.x/2.x line) does not support HTTP/2
- The vendored `requests` library has no HTTP/2 support
- There are no references to HTTP/2, h2, or hyper anywhere in pip's codebase
Would HTTP/2 help?
- Index page fetches: HTTP/2 multiplexing would allow sending all ~40 index page requests over a single TCP connection to pypi.org. Currently, each of the 16 prefetch threads uses its own connection. With HTTP/2, one connection handles all requests, eliminating 15 TLS handshakes and reducing head-of-line blocking.
- Metadata fetches: Similarly multiplexed over the same connection.
- Package downloads: Less benefit -- these are large sequential downloads.
Estimated benefit: For index-heavy workloads (many small packages), HTTP/2 could reduce the connection setup overhead by ~90% and improve throughput by 20-30% due to multiplexing.
What it would take:
- Replace the vendored `requests`/`urllib3` stack with `httpx` (supports HTTP/2 via `h2`), or add `h2` support to urllib3 (a small httpx sketch follows at the end of this section)
- Major architectural change -- affects all of pip's network layer
- PyPI's CDN (Fastly) already supports HTTP/2
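To make the multiplexing idea concrete, a hedged sketch using httpx with its optional h2 extra (install as `httpx[http2]`); this is not part of pip today:

import asyncio
import httpx

PACKAGES = ["boto3", "botocore", "s3transfer", "urllib3"]

async def fetch_all():
    headers = {"Accept": "application/vnd.pypi.simple.v1+json"}
    async with httpx.AsyncClient(http2=True, headers=headers) as client:
        responses = await asyncio.gather(
            *(client.get(f"https://pypi.org/simple/{name}/") for name in PACKAGES)
        )
    # When the server negotiates HTTP/2, these requests share one TLS connection.
    return {name: r.status_code for name, r in zip(PACKAGES, responses)}

print(asyncio.run(fetch_all()))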
2.7 Parallel I/O Architecture
Index page prefetch (PackageFinder):
# package_finder.py lines 1535-1556
def prefetch_packages(self, project_names):
with self._prefetch_lock:
for name in project_names:
if name in self._all_candidates or name in self._prefetch_futures:
continue
if self._prefetch_executor is None:
self._prefetch_executor = ThreadPoolExecutor(max_workers=16)
self._prefetch_futures[name] = self._prefetch_executor.submit(
self._do_fetch_all_candidates, name
)
- Called from two places:
  - `resolver.resolve()` -- submits all root requirements upfront
  - `provider.get_dependencies()` -- submits all discovered deps
- Workers run `_do_fetch_all_candidates()`, which does the full pipeline: collect_sources -> fetch_response -> parse/evaluate
- Results are cached in the `_all_candidates` dict
- `find_all_candidates()` checks the futures with a 10s timeout (a generic sketch of this pattern follows below)
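A generic sketch of the "reuse the prefetch future, else fetch synchronously" pattern described above; names and signatures are illustrative, not pip's:

from concurrent.futures import TimeoutError as FutureTimeout

def find_all_candidates(name, prefetch_futures, do_fetch, cache):
    if name in cache:
        return cache[name]
    future = prefetch_futures.pop(name, None)
    if future is not None:
        try:
            cache[name] = future.result(timeout=10)  # reuse background work
            return cache[name]
        except FutureTimeout:
            pass  # prefetch too slow: fall through to a synchronous fetch
    cache[name] = do_fetch(name)
    return cache[name]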
Metadata prefetch (Factory):
# factory.py lines 188-245
def _prefetch_top_candidate_metadata(self, name, top_info, extras, template):
# Build top candidate eagerly (cheap: wheel-cache lookup)
candidate = build_func()
# Only prefetch for remote wheels
if link.is_file or not link.is_wheel:
return
def _do_prefetch():
candidate.dist # triggers prepare_linked_requirement()
# Submit to 8-thread pool
self._metadata_prefetch_executor.submit(_do_prefetch)
Serialization points that force sequential I/O:
- resolvelib's main loop is single-threaded. Each round processes one package at a time. Even with prefetching, the resolver can only consume one result at a time.
- `_complete_partial_requirements()` (prepare.py line 474) downloads all "needs more preparation" requirements sequentially via `self._download.batch()` -- which is just a for-loop, NOT actually batched/parallel.
- The `Downloader.batch()` method (download.py lines 179-184) is misleadingly named -- it's a sequential for-loop:

def batch(self, links, location):
    for link in links:
        filepath, content_type = self(link, location)
        yield link, (filepath, content_type)

This is a significant finding. All final wheel downloads happen sequentially.
2.8 Response Compression
Index page requests:
- `_get_simple_response()` in collector.py sets custom `Accept` headers but does NOT set `Accept-Encoding`
- The requests library's default `Accept-Encoding` header is `gzip, deflate` (from urllib3's `ACCEPT_ENCODING = "gzip,deflate"`, applied by requests' `default_headers()`)
- Index pages ARE compressed by PyPI/Fastly with gzip. The requests library transparently decompresses them.
- No brotli support (would require the `brotli` or `brotlicffi` package)
Package downloads:
- `Downloader._http_get()` uses `HEADERS = {"Accept-Encoding": "identity"}` (utils.py line 26)
- Package downloads explicitly disable compression. This is intentional -- packages are already compressed archives (wheels are ZIP files, sdists are .tar.gz). Re-compressing would waste CPU and break hash verification.
- `response_chunks()` uses `decode_content=False` to preserve raw bytes for hash checking.
Finding: Compression is correctly handled. Index pages use gzip (transparent). Packages disable compression (correct). No improvement opportunity here.
2.9 Lazy/Streaming Approaches
Current behavior:
- Index pages: `response.content` (collector.py line 309) reads the entire response into memory before parsing
- JSON index pages can be 50KB-2MB for popular packages (e.g., boto3 has ~12,000 file entries)
- HTML index pages are similar in size
Streaming opportunity:
- JSON index pages COULD be streamed using an incremental JSON parser (e.g., `ijson`)
- However, `json.loads()` on a 1MB string takes ~5ms -- negligible compared to the ~80ms network round-trip (a quick timing check follows this list)
- The real cost is not parsing but candidate evaluation -- the `_evaluate_json_page()` fast path already handles this efficiently with a single-pass fused pipeline
Early abort opportunity:
- When the resolver only needs the "best" (newest compatible) version, we could theoretically abort after finding it
- Problem: The index page must be fully fetched before we know all versions (no streaming API from PyPI)
- The speculative metadata prefetch already handles this by eagerly fetching metadata for the top candidate
Finding: Streaming/early-abort offers negligible benefit for index pages because network latency dominates. The JSON parsing is already fast.
2.10 PyPI-Specific Optimizations
Bulk/batch APIs:
- PyPI has no bulk metadata API (no way to get metadata for 40 packages in one request)
- The Simple Repository API (PEP 503/691) is package-by-package
- There is no "dependency tree" API that would let pip skip index page fetches
CDN-level optimizations already in use:
- `Cache-Control: max-age=0` with conditional requests (ETags/Last-Modified) -- implemented
- PyPI responses include strong ETags
- 304 responses save bandwidth but still cost one RTT each
JSON API:
- pip already prefers the JSON Simple API (`application/vnd.pypi.simple.v1+json`) via Accept header priority
- The JSON path (`_evaluate_json_page()`) is heavily optimized with fused evaluation
- PyPI's JSON API doesn't support partial responses or field selection
Server-push / Link preload:
- PyPI doesn't support HTTP/2 Server Push for metadata files
- Even with HTTP/2, the server can't know which wheel the client will pick
3. Optimization Ideas (Ranked by Expected Impact)
Tier 1: High Impact (10-30% wall-time reduction)
3.1 Parallel Wheel Downloads
What: Replace the sequential Downloader.batch() for-loop with concurrent.futures.ThreadPoolExecutor.
Where: src/pip/_internal/network/download.py lines 179-184 and src/pip/_internal/operations/prepare.py lines 492-493.
Why: After resolution completes, all wheels are downloaded sequentially. For 40 packages, this is 40 sequential HTTP GETs. Parallelizing would overlap download + write for multiple packages.
Expected improvement: 15-25% of total wall time for download-heavy workloads. With 8 parallel downloads, the download phase shrinks from ~40 * avg_time to ~5 * avg_time.
Complexity: Medium. Need to handle progress bar display for parallel downloads and ensure thread safety.
Risk: Low -- downloads are independent operations.
pip-only change: Yes.
3.2 Pipeline Index Fetch + Metadata Prefetch
What: When an index page prefetch completes, immediately trigger metadata prefetch for the top candidate -- don't wait for the resolver to consume the index result.
Where: src/pip/_internal/index/package_finder.py _do_fetch_all_candidates() should call factory._prefetch_top_candidate_metadata() at the end.
Why: Currently, there's a gap between index fetch completion and metadata prefetch submission. The metadata prefetch only fires when the resolver calls _iter_found_candidates(). This gap can be 100ms-2s depending on how fast the resolver processes.
Expected improvement: 5-15% for resolution-heavy workloads. Eliminates the serial gap between "index data ready" and "metadata fetch starts."
Complexity: Medium. Requires threading coordination between PackageFinder and Factory. The PackageFinder would need a reference to the Factory (currently doesn't have one).
Risk: Low-medium -- need to ensure thread safety for candidate cache.
pip-only change: Yes.
3.3 Increase Metadata Prefetch Depth
What: Prefetch metadata for top N candidates (not just the top 1), and prefetch for ALL packages whose index is ready (not just when the resolver asks).
Where: src/pip/_internal/resolution/resolvelib/factory.py _prefetch_top_candidate_metadata().
Why: The resolver sometimes backtracks and needs the 2nd or 3rd candidate. Currently only the top candidate's metadata is prefetched. Prefetching the top 2-3 would prevent serial metadata fetches during backtracking.
Expected improvement: 3-8% for workloads with backtracking.
Complexity: Low.
Risk: Low. Wastes some bandwidth on metadata that may not be needed, but metadata files are tiny (5-50KB).
pip-only change: Yes.
Tier 2: Medium Impact (5-15% wall-time reduction)
3.4 HTTP/2 Support via httpx
What: Replace the requests + urllib3 stack with httpx which supports HTTP/2 multiplexing.
Why: With HTTP/2, all index page requests and metadata fetches to pypi.org can be multiplexed over a single TCP connection. This eliminates 15 extra TLS handshakes and allows the server to interleave responses.
Expected improvement: 10-20% for cold-cache workloads (fewer TLS handshakes, multiplexed requests). Less impact for warm-cache (304 responses are already small).
Complexity: Very high. Fundamental change to pip's network layer. Would affect caching, authentication, proxies, all adapters.
Risk: High -- potential for regressions across pip's extensive networking surface.
pip-only change: Yes, but major architectural change.
3.5 Conditional Request Short-Circuit for Index Pages
What: For warm-cache scenarios, batch all conditional index page requests into concurrent futures BEFORE the resolver starts, rather than lazily.
Where: Before calling resolver.resolve(), pre-submit conditional GETs for ALL packages known from the lock file or previous resolution.
Why: Currently, prefetch only fires as the resolver discovers dependencies. If pip could predict the dependency set (from a lock file or previous run), all ~40 conditional GETs could be fired simultaneously.
Expected improvement: 5-10% for warm-cache repeat installs. Turns 3.2s of serial conditional GETs into <0.5s of parallel ones.
Complexity: Medium. Need a mechanism to predict the package set (lock file, cache of previous resolution result).
Risk: Low -- conditional GETs are safe to fire speculatively.
pip-only change: Yes.
3.6 Connection Pre-warming
What: Open TLS connections to pypi.org and files.pythonhosted.org at session creation time, before any requests.
Where: src/pip/_internal/network/session.py PipSession.__init__().
Why: The first request to each host pays the TCP + TLS handshake cost (~100-200ms). Pre-warming during argument parsing / environment setup overlaps this with CPU work.
Expected improvement: 2-5% (saves ~200ms one-time cost).
Complexity: Low.
Risk: Low -- harmless if the connections go unused (they just time out).
pip-only change: Yes.
Tier 3: Low Impact (1-5% wall-time reduction)
3.7 Cache Index ETags In-Memory Across Packages
What: After the first conditional GET returns an ETag for pypi.org/simple/, cache the server's response pattern in memory. Some CDNs return the same 304 pattern for all resources with the same age.
Expected improvement: Negligible (<1%). The conditional request still requires a round trip.
pip-only change: Yes.
3.8 Brotli Compression for Index Pages
What: Add brotli or brotlicffi as an optional dependency so index page responses can be compressed with brotli (better compression ratio than gzip).
Why: Brotli can compress JSON index pages 20-30% better than gzip, reducing transfer time for large index pages.
Expected improvement: 1-3% for cold-cache scenarios. Index pages are typically 50KB-2MB; brotli saves ~30% of that.
Complexity: Low. Just add the dependency and urllib3/requests will advertise brotli support.
Risk: Low. Optional dependency, gzip fallback.
pip-only change: Yes.
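A quick way to check whether brotli would be advertised; the vendored-module path below is an assumption based on how upstream urllib3 builds its default Accept-Encoding header:

# Upstream urllib3 appends ",br" to its default Accept-Encoding when a brotli
# module is importable; the vendored import path here is an assumption.
from pip._vendor.urllib3.util.request import ACCEPT_ENCODING
print(ACCEPT_ENCODING)  # "gzip,deflate" today; ",br" appears once brotli/brotlicffi is installed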
4. Quick Wins (< 50 lines of code)
QW1: Parallel Wheel Downloads (the biggest quick win)
File: src/pip/_internal/operations/prepare.py _complete_partial_requirements()
Change: Replace the sequential self._download.batch() loop with ThreadPoolExecutor.map():
# Current (sequential):
batch_download = self._download.batch(links_to_fully_download.keys(), temp_dir)
for link, (filepath, _) in batch_download:
...
# Proposed (parallel):
with ThreadPoolExecutor(max_workers=8) as pool:
results = pool.map(
lambda link: (link, self._download(link, temp_dir)),
links_to_fully_download.keys()
)
for link, (filepath, _) in results:
...
Lines: ~15 changed
Impact: 15-25% wall-time reduction on download-heavy workloads
QW2: Pipeline Index + Metadata Prefetch
File: src/pip/_internal/index/package_finder.py _do_fetch_all_candidates()
Change: After building the candidate list, immediately trigger metadata prefetch for the top candidate if a factory callback is registered:
# At the end of _do_fetch_all_candidates:
if self._metadata_prefetch_callback and self._all_candidates[project_name]:
self._metadata_prefetch_callback(project_name, self._all_candidates[project_name])
Lines: ~20 changed (add callback registration + invocation)
Impact: 5-15% for resolution-heavy workloads
QW3: Connection Pre-warming
File: src/pip/_internal/network/session.py
Change: Add a prewarm() method that opens connections to known hosts in background threads:
def prewarm(self, urls: list[str]) -> None:
    """Open TCP+TLS connections in the background to reduce first-request latency."""
    from concurrent.futures import ThreadPoolExecutor

    def _warm(url: str) -> None:
        try:
            self.head(url, timeout=5)
        except Exception:
            pass  # best-effort: a failed warm-up is harmless

    # No `with` block: shutdown(wait=False) lets the warm-up run in the
    # background instead of blocking the caller until the HEADs complete.
    pool = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        pool.submit(_warm, url)
    pool.shutdown(wait=False)
Lines: ~15
Impact: 2-5% (saves ~200ms startup)
QW4: Prefetch Top 2-3 Candidates' Metadata
File: src/pip/_internal/resolution/resolvelib/factory.py
Change: In _iter_found_candidates(), prefetch metadata for top 2-3 candidates instead of just top 1:
# Current: prefetch only infos_list[0]
# Proposed: prefetch infos_list[0:3]
for info in infos_list[:3]:
self._prefetch_top_candidate_metadata(name, info, extras, template)
Lines: ~10 changed
Impact: 3-8% for workloads with backtracking
5. Big Bets (Architectural Changes for 20%+ Improvement)
BB1: Fully Parallel Resolution Pipeline
Description: Replace the sequential resolvelib loop with a resolution architecture where ALL I/O is fully parallel. When the resolver needs data for package X, it doesn't block -- it queues the need and processes another package. When the I/O completes, the resolver is notified.
Mechanism: This is essentially an async resolver. It could be implemented with:
- an asyncio event loop driving the resolver
- `aiohttp` or `httpx` as the async HTTP client
- resolvelib with a coroutine-based provider
Expected improvement: 30-50% for large dependency trees. Eliminates all serial I/O gaps.
Complexity: Very high. Fundamental architectural change to pip's resolver integration.
Risk: High -- resolvelib is synchronous by design.
BB2: HTTP/2 Multiplexing
Description: Replace vendored requests + urllib3 with httpx (which supports HTTP/2 via h2).
Expected improvement: 20-30% for cold-cache workloads. All requests to pypi.org multiplex over one connection. No head-of-line blocking between index page requests.
Complexity: Very high. ~500+ line change touching all network code.
Risk: High.
BB3: Dependency Prediction + Bulk Prefetch
Description: Maintain a local cache of "last resolved dependency tree" per project. On next pip install, immediately fire all index page + metadata prefetch requests for the predicted set BEFORE the resolver starts.
Expected improvement: 20-40% for repeat installs. Instead of discovering dependencies one-by-one through resolution, fire all 40+ conditional GETs simultaneously at startup.
Complexity: Medium-high. Need a prediction cache format, staleness detection, and graceful handling of prediction misses.
Risk: Medium. Wrong predictions waste bandwidth but don't cause correctness issues.
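A hypothetical sketch of such a prediction cache; the file location, format, and the hook into finder.prefetch_packages() are assumptions, not existing pip behavior:

import json
from pathlib import Path

# Hypothetical location for the last-resolution cache.
PREDICTION_CACHE = Path("~/.cache/pip/last-resolution.json").expanduser()

def predicted_packages():
    try:
        return json.loads(PREDICTION_CACHE.read_text())
    except (OSError, ValueError):
        return []  # no prediction available; fall back to normal discovery

def save_resolution(names):
    PREDICTION_CACHE.parent.mkdir(parents=True, exist_ok=True)
    PREDICTION_CACHE.write_text(json.dumps(sorted(names)))

# At install start, before resolver.resolve():
#     finder.prefetch_packages(predicted_packages())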
BB4: Server-Side Dependency Resolution API
Description: Propose a PyPI API extension that accepts a requirements list and returns the resolved dependency tree (with all metadata). One HTTP request replaces 120+ requests.
Expected improvement: 50-80% for cold-cache scenarios. Eliminates all per-package round trips.
Complexity: Very high. Requires PyPI server cooperation, a PEP process, etc.
Risk: High. Requires ecosystem buy-in. A fallback to current behavior is needed.
6. Summary of Key Files
| File | Role |
|---|---|
| src/pip/_internal/index/collector.py | Fetches index pages, parses HTML/JSON |
| src/pip/_internal/index/package_finder.py | Evaluates candidates, manages prefetch pool (16 threads) |
| src/pip/_internal/network/session.py | PipSession, connection pool config (20/16), adapters |
| src/pip/_internal/network/cache.py | SafeFileCache (filesystem-based HTTP cache) |
| src/pip/_internal/network/download.py | Downloader (sequential batch downloads!) |
| src/pip/_internal/network/lazy_wheel.py | LazyZipOverHTTP for range-request metadata |
| src/pip/_internal/network/utils.py | Accept-Encoding: identity for downloads, chunk streaming |
| src/pip/_internal/operations/prepare.py | RequirementPreparer, metadata fetch chain |
| src/pip/_internal/resolution/resolvelib/factory.py | Metadata prefetch pool (8 threads), candidate building |
| src/pip/_internal/resolution/resolvelib/provider.py | Triggers dep prefetch in get_dependencies() |
| src/pip/_internal/resolution/resolvelib/resolver.py | Kicks off root requirement prefetch |
| src/pip/_internal/resolution/resolvelib/candidates.py | Thread-safe dist preparation with _prepare_lock |
| src/pip/_vendor/cachecontrol/adapter.py | CacheControlAdapter -- intercepts requests for caching |
| src/pip/_vendor/cachecontrol/controller.py | Cache logic: max-age=0 bypass, conditional headers, 304 handling |
7. Critical Finding: Sequential Download Is The Biggest Remaining Win
The single most impactful optimization remaining is parallelizing wheel downloads. After resolution completes, _complete_partial_requirements() downloads all wheels sequentially through Downloader.batch(). This is purely sequential I/O with no data dependencies between packages. With 40 packages averaging 500KB each at ~50ms per download, the sequential phase takes ~2 seconds. Parallelizing with 8 workers would reduce this to ~0.25 seconds -- a potential 15-25% total wall-time improvement depending on the workload.