Kevin Turcios 3b59d97647 squash
2026-04-13 14:12:17 -05:00


Pip I/O Layer Deep Analysis

Investigation date: 2026-04-08
Branch: codeflash/optimize
Investigator: Research agent


1. Request Flow Diagram

User: pip install <pkg>
  |
  v
Resolver (resolvelib)
  |
  +-- provider.get_dependencies(candidate)
  |     +-- prefetch_packages(dep_names)  [background threads]
  |
  +-- provider.find_matches(identifier)
  |     +-- factory.find_candidates()
  |           +-- finder.find_best_candidate(name)
  |                 +-- finder.find_all_candidates(name)
  |                       |
  |                       +-- [check _all_candidates cache]
  |                       +-- [check _prefetch_futures]
  |                       +-- _do_fetch_all_candidates(name)
  |                             |
  |                             +-- link_collector.collect_sources(name)
  |                             |     +-- search_scope.get_index_urls_locations(name)
  |                             |           # => ["https://pypi.org/simple/<name>/"]
  |                             |
  |                             +-- source.page_candidates()
  |                                   +-- process_project_url(url)
  |                                         |
  |                                         +-- link_collector.fetch_response(url)
  |                                         |     +-- _get_index_content(url)
  |                                         |           +-- _get_simple_response(url, session)
  |                                         |                 |
  |                                         |                 v
  |                                         |           session.get(url, headers={
  |                                         |             Accept: "application/vnd.pypi.simple.v1+json, ...",
  |                                         |             Cache-Control: "max-age=0"
  |                                         |           })
  |                                         |                 |
  |                                         |                 v
  |                                         |           CacheControlAdapter.send()
  |                                         |             +-- controller.cached_request()
  |                                         |             |     # max-age=0 => ALWAYS bypasses cache
  |                                         |             |     # Adds If-None-Match / If-Modified-Since
  |                                         |             +-- controller.conditional_headers()
  |                                         |             +-- HTTPAdapter.send()
  |                                         |                   +-- urllib3.HTTPSConnectionPool.urlopen()
  |                                         |                         +-- _get_conn() [from pool queue]
  |                                         |                         +-- TLS handshake (if new conn)
  |                                         |                         +-- HTTP/1.1 GET request
  |                                         |                         +-- _put_conn() [return to pool]
  |                                         |             +-- controller.cache_response()
  |                                         |                   # Stores response w/ ETag for
  |                                         |                   # future conditional requests
  |                                         |
  |                                         +-- [JSON] _evaluate_json_page()
  |                                         +-- [HTML] parse_links() + evaluate_links()
  |
  +-- candidate.dist  [triggers metadata fetch]
        +-- _prepare()
              +-- preparer.prepare_linked_requirement()
                    +-- _fetch_metadata_only()
                    |     +-- [1] _fetch_metadata_using_link_data_attr()
                    |     |     # PEP 658: GET <url>.metadata
                    |     +-- [2] _fetch_metadata_using_lazy_wheel()
                    |           # HTTP Range requests on .whl
                    |           +-- LazyZipOverHTTP(url, session)
                    |                 +-- session.head(url)  # get Content-Length
                    |                 +-- _check_zip()       # range-fetch tail
                    |                 +-- ZipFile(self)       # parse EOCD
                    +-- [3] Full download as fallback

Request Count Per Package (typical PyPI resolution)

For each unique package name the resolver encounters:

  1. 1 GET for the index page (/simple/<name>/) -- conditional if cached
  2. 1 GET for metadata (PEP 658 .metadata file) -- OR -- 1 HEAD + 1-2 Range GETs for lazy wheel metadata -- OR -- 1 GET full wheel download as fallback
  3. 1 GET for the actual wheel download (after resolution)

For a workload like boto3 with ~40 transitive deps:

  • ~40 index page GETs (conditional requests)
  • ~40 metadata GETs (PEP 658 when available)
  • ~40 wheel download GETs
  • Total: ~120 HTTP requests minimum

2. Per-Area Findings

2.1 HTTP Request Flow

How requests are serialized:

  • The resolver processes packages sequentially through resolvelib's resolve() loop
  • Each find_matches() call triggers find_all_candidates(), which fetches the index page synchronously (unless prefetched)
  • Each get_dependencies() call triggers candidate.dist, which fetches metadata synchronously (unless prefetched)

Existing parallelism (two separate thread pools):

  1. Index page prefetch (PackageFinder._prefetch_executor): 16 worker threads
    • Triggered in provider.get_dependencies() for all discovered deps
    • Triggered in resolver.resolve() for all root requirements
    • Workers call _do_fetch_all_candidates() which does the full index fetch + evaluate pipeline
  2. Metadata prefetch (Factory._metadata_prefetch_executor): 8 worker threads
    • Triggered in _iter_found_candidates() for the top candidate only
    • Workers call candidate.dist which triggers PEP 658 / lazy wheel

Key finding: The two prefetch mechanisms are independent and both effective, but they don't coordinate. The metadata prefetch for package B can't start until B's index page fetch completes. There is no pipelining of "index fetch -> immediately prefetch top candidate metadata."

Redundant requests found:

  • LazyZipOverHTTP.__init__() sends a HEAD request (line 57 of lazy_wheel.py). If PEP 658 metadata is available, this HEAD is never needed -- the code tries PEP 658 first and only falls back to lazy wheel. This is correct and not redundant.
  • However, the HEAD request in LazyZipOverHTTP is sent even when the server turns out not to support range requests, so one round trip is spent only to discover that and fall back to a full download.
  • _get_simple_response() sends a HEAD before GET only if the URL looks like an archive (line 120-121 of collector.py). This is a rare case and correctly guarded.

2.2 Connection Reuse & Pooling

Current configuration (session.py lines 388-389):

_pool_connections = 20   # urllib3 PoolManager caches pools for 20 distinct hosts
_pool_maxsize = 16       # Each pool keeps up to 16 idle connections

Analysis:

  • Pool is correctly sized for 16 prefetch workers
  • pool_block=False (default) means excess connections proceed but aren't returned to pool
  • Connections ARE reused for same-host requests through urllib3's HTTPSConnectionPool._get_conn() / _put_conn() mechanism
  • HTTP/1.1 keep-alive works by default (urllib3 uses persistent connections)
  • The connection pool is per-(host, port, scheme), so pypi.org:443 and files.pythonhosted.org:443 each get their own pool
  • A typical pip install touches only 2-3 hosts: pypi.org (index pages), files.pythonhosted.org (wheel downloads, metadata), and possibly an extra index. Pool of 20 is more than adequate.

TLS Handshake Analysis:

  • A TLS handshake happens once per connection (not per request)
  • With pool_maxsize=16, up to 16 connections are kept alive per host
  • The 16 prefetch threads can each hold a connection, so in theory all 16 reuse their TLS sessions
  • Risk: If more than 16 requests fire concurrently to the same host, excess connections are created and then discarded (not pooled), causing extra TLS handshakes. With pool_block=False, they proceed but the connection is thrown away after use.

Finding: The pool is sized well for the current prefetch concurrency. No wasted TLS handshakes under normal operation.
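As a point of reference, the 20/16 configuration above corresponds to the standard requests/urllib3 adapter knobs. A minimal sketch with plain requests (pip's real session is a PipSession subclass with cache-aware adapters, so this is illustrative only):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=20,  # PoolManager caches pools for 20 distinct hosts
    pool_maxsize=16,      # each host pool keeps up to 16 idle connections
    pool_block=False,     # overflow requests create throwaway (unpooled) connections
)
session.mount("https://", adapter)
session.mount("http://", adapter)
```

Every request through this session to the same host then reuses a pooled keep-alive connection, which is what amortizes the TLS handshake cost described above.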

2.3 Caching Layer

How CacheControl works with pip:

  1. Index pages (/simple/<name>/): Sent with Cache-Control: max-age=0 header

    • CacheControl controller sees max-age=0 and always bypasses the cache (controller.py line 184-186)
    • But it adds conditional headers (If-None-Match, If-Modified-Since) via conditional_headers()
    • On 304 Not Modified, the cached response is served (no body transfer)
    • On 200, the response is cached with its ETag for next time
    • This is working as intended -- ensures freshness while avoiding re-downloading unchanged index pages
  2. Package downloads (wheels, sdists): Sent via Downloader._http_get() with Accept-Encoding: identity

    • No Cache-Control: max-age=0 header on these requests
    • CacheControl can serve fully cached responses for packages that haven't changed
    • SafeFileCache stores metadata + body as separate files on disk
    • Cache key is the full URL (after normalization)
  3. PEP 658 metadata files: Fetched via get_http_url() using the Downloader

    • Same caching behavior as package downloads
    • Small files (~5-50KB), cached effectively
  4. Lazy wheel range requests: Sent with Cache-Control: no-cache

    • Explicitly bypasses caching (lazy_wheel.py line 180)
    • This is correct -- range requests for ZIP metadata shouldn't be cached as full responses

Cache efficiency finding:

  • The max-age=0 on index pages means every resolution always incurs at least one conditional round-trip per package. This is the single biggest I/O constraint for warm-cache scenarios.
  • For a pip install --upgrade with warm cache, all 40 index page requests still go to the network (as conditional GETs), but most return 304 with no body. Each 304 round-trip costs ~50-100ms (RTT to pypi.org).
  • Total warm-cache overhead: 40 * ~80ms = ~3.2 seconds just in sequential conditional GETs (partially parallelized by prefetch).
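The revalidation logic above can be sketched as a small pure function (a hypothetical helper mirroring what CacheControl's controller does, not its actual code): max-age=0 forces revalidation on every request, while a stored ETag / Last-Modified lets the server answer 304 with no body.

```python
from typing import Optional

def conditional_headers(cached: Optional[dict]) -> dict:
    """Build index-page request headers (illustrative only)."""
    headers = {"Cache-Control": "max-age=0"}  # always revalidate with the server
    if cached:
        if "ETag" in cached:
            headers["If-None-Match"] = cached["ETag"]  # -> 304 if unchanged
        if "Last-Modified" in cached:
            headers["If-Modified-Since"] = cached["Last-Modified"]
    return headers

# Warm cache: the server can reply 304 Not Modified with an empty body.
print(conditional_headers({"ETag": '"abc123"'}))
# Cold cache: a plain revalidating GET; the 200 response is stored with its ETag.
print(conditional_headers(None))
```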

2.4 Metadata Fetching

Fallback chain (prepare.py _fetch_metadata_only()):

  1. PEP 658 metadata (_fetch_metadata_using_link_data_attr()):

    • Checks link.metadata_link() -- the link must have data-dist-info-metadata or core-metadata attribute
    • If present, downloads the separate .metadata file (tiny: 5-50KB)
    • PyPI supports PEP 658 for all wheels uploaded after ~2023
    • This is the fastest path: single small GET
  2. Lazy wheel (_fetch_metadata_using_lazy_wheel()):

    • Requires --use-feature=fast-deps flag
    • Sends HEAD to get Content-Length and check Accept-Ranges
    • Downloads the tail of the wheel (ZIP end-of-central-directory) via range requests
    • Parses the ZIP to find METADATA file, downloads just that range
    • 2-4 HTTP requests per wheel (HEAD + 1-3 range GETs)
    • Has a _lazy_wheel_cache to avoid redundant range requests for same URL
  3. Full download (fallback):

    • Downloads entire wheel/sdist
    • For wheels: extracts metadata from the archive
    • For sdists: runs setup.py egg_info or pyproject.toml build
    • Most expensive path

Key finding: PEP 658 is the dominant path for PyPI packages. The speculative metadata prefetch (factory.py) eagerly builds the top candidate and submits a background thread to fetch its metadata. This overlaps metadata I/O with resolution logic.

Optimization in place: _lazy_wheel_cache (prepare.py line 288) prevents duplicate range requests when a package is evaluated with different extras (e.g., pkg and pkg[extra]).
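The lazy-wheel trick works because ZipFile only needs a seekable file-like object, so over HTTP every seek+read can be served by a Range request. This offline sketch (RangeReader is an illustrative stand-in, not pip's LazyZipOverHTTP) simulates the remote wheel with an in-memory blob and records which byte ranges a metadata read actually touches:

```python
import io
import zipfile

class RangeReader(io.RawIOBase):
    """Simulated remote wheel: each read() would be one HTTP Range GET."""

    def __init__(self, blob: bytes) -> None:
        self._blob = blob
        self._pos = 0
        self.ranges = []  # (start, end) of every read

    def seekable(self) -> bool:
        return True

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        else:  # io.SEEK_END
            self._pos = len(self._blob) + offset
        return self._pos

    def tell(self) -> int:
        return self._pos

    def read(self, size: int = -1) -> bytes:
        end = len(self._blob) if size < 0 else self._pos + size
        chunk = self._blob[self._pos:end]
        self.ranges.append((self._pos, self._pos + len(chunk)))
        self._pos += len(chunk)
        return chunk

# Build a fake "wheel": a large payload member plus a small METADATA file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/data.bin", b"\x00" * 100_000)
    zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\nVersion: 1.0\n")

reader = RangeReader(buf.getvalue())
with zipfile.ZipFile(reader) as zf:
    metadata = zf.read("pkg-1.0.dist-info/METADATA").decode()

fetched = sum(end - start for start, end in reader.ranges)
print(metadata.splitlines()[0])            # Name: pkg
print(fetched, "of", len(buf.getvalue()))  # only a few hundred bytes touched
```

Only the EOCD tail, the central directory, and the METADATA member's range are read -- a tiny fraction of the archive, which is exactly what makes range-request metadata fetching cheap.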

2.5 DNS & TLS

DNS resolution:

  • urllib3 delegates to Python's socket.create_connection() which calls getaddrinfo()
  • No DNS caching in urllib3 or pip -- relies on OS-level DNS cache
  • However, connection pooling effectively caches DNS results because connections persist
  • With 16 pool connections to pypi.org, DNS is resolved at most once per connection creation

TLS handshakes:

  • One TLS handshake per connection (not per request)
  • Connection pooling limits handshakes to pool_maxsize (16) per host
  • Python's ssl module handles TLS session resumption at the OpenSSL level
  • _SSLContextAdapterMixin (session.py line 255) properly forwards the SSL context to pools

Finding: DNS and TLS are not significant bottlenecks. The connection pool effectively amortizes both costs. Pre-warming is not needed because the first batch of prefetch requests creates all needed connections.

2.6 HTTP/2 and Protocol

Current state: pip uses HTTP/1.1 exclusively.

  • The vendored urllib3 (version appears to be 1.x/2.x line) does not support HTTP/2
  • The vendored requests library has no HTTP/2 support
  • There are no references to HTTP/2, h2, or hyper anywhere in pip's codebase

Would HTTP/2 help?

  • Index page fetches: HTTP/2 multiplexing would allow sending all ~40 index page requests over a single TCP connection to pypi.org. Currently, each of the 16 prefetch threads uses its own connection. With HTTP/2, one connection handles all requests, eliminating 15 TLS handshakes and reducing head-of-line blocking.
  • Metadata fetches: Similarly multiplexed over the same connection.
  • Package downloads: Less benefit -- these are large sequential downloads.

Estimated benefit: For index-heavy workloads (many small packages), HTTP/2 could reduce the connection setup overhead by ~90% and improve throughput by 20-30% due to multiplexing.

What it would take:

  • Replace vendored requests/urllib3 with httpx (supports HTTP/2 via h2) or add h2 to urllib3
  • Major architectural change -- affects all of pip's network layer
  • PyPI's CDN (Fastly) already supports HTTP/2

2.7 Parallel I/O Architecture

Index page prefetch (PackageFinder):

# package_finder.py lines 1535-1556
def prefetch_packages(self, project_names):
    with self._prefetch_lock:
        for name in project_names:
            if name in self._all_candidates or name in self._prefetch_futures:
                continue
            if self._prefetch_executor is None:
                self._prefetch_executor = ThreadPoolExecutor(max_workers=16)
            self._prefetch_futures[name] = self._prefetch_executor.submit(
                self._do_fetch_all_candidates, name
            )
  • Called from two places:
    1. resolver.resolve() -- submits all root requirements upfront
    2. provider.get_dependencies() -- submits all discovered deps
  • Workers run _do_fetch_all_candidates() which does the full pipeline: collect_sources -> fetch_response -> parse/evaluate
  • Results cached in _all_candidates dict
  • find_all_candidates() checks futures with 10s timeout

Metadata prefetch (Factory):

# factory.py lines 188-245
def _prefetch_top_candidate_metadata(self, name, top_info, extras, template):
    # Build top candidate eagerly (cheap: wheel-cache lookup)
    candidate = build_func()
    # Only prefetch for remote wheels
    if link.is_file or not link.is_wheel:
        return
    def _do_prefetch():
        candidate.dist  # triggers prepare_linked_requirement()
    # Submit to 8-thread pool
    self._metadata_prefetch_executor.submit(_do_prefetch)

Serialization points that force sequential I/O:

  1. resolvelib's main loop is single-threaded. Each round processes one package at a time. Even with prefetching, the resolver can only consume one result at a time.
  2. _complete_partial_requirements() (prepare.py line 474) downloads all "needs more preparation" requirements sequentially via self._download.batch() -- which is just a for-loop, NOT actually batched/parallel.
  3. The Downloader.batch() method (download.py line 179-184) is misleadingly named -- it's a sequential for-loop:
    def batch(self, links, location):
        for link in links:
            filepath, content_type = self(link, location)
            yield link, (filepath, content_type)
    
    This is a significant finding. All final wheel downloads happen sequentially.
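The cost of that sequential loop is easy to see with a toy model. This sketch fakes a 50 ms "download" (no real network; names and timings are illustrative) and compares the current for-loop against an 8-worker pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(link: str) -> str:
    time.sleep(0.05)  # stand-in for one wheel GET
    return f"{link}.whl"

links = [f"pkg{i}" for i in range(16)]

t0 = time.perf_counter()
sequential = [fake_download(link) for link in links]  # what batch() does today
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:       # proposed parallel batch
    parallel = list(pool.map(fake_download, links))
t_par = time.perf_counter() - t0

print(f"sequential ~{t_seq:.2f}s, parallel ~{t_par:.2f}s")
```

With 16 links the sequential pass costs roughly 16 round trips while the pooled version costs about two, and `pool.map` preserves ordering, so the consuming loop is unchanged.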

2.8 Response Compression

Index page requests:

  • The _get_simple_response() in collector.py sets custom Accept headers but does NOT set Accept-Encoding
  • Requests library's default Accept-Encoding header is gzip, deflate (from urllib3's ACCEPT_ENCODING = "gzip,deflate", applied by requests' default_headers())
  • Index pages ARE compressed by PyPI/Fastly with gzip. The requests library transparently decompresses them.
  • No brotli support (would require brotli or brotlicffi package)

Package downloads:

  • Downloader._http_get() uses HEADERS = {"Accept-Encoding": "identity"} (utils.py line 26)
  • Package downloads explicitly disable compression. This is intentional -- packages are already compressed archives (wheels are ZIP files, sdists are .tar.gz). Re-compressing would waste CPU and break hash verification.
  • response_chunks() uses decode_content=False to preserve raw bytes for hash checking.

Finding: Compression is correctly handled. Index pages use gzip (transparent). Packages disable compression (correct). No improvement opportunity here.

2.9 Lazy/Streaming Approaches

Current behavior:

  • Index pages: response.content (collector.py line 309) reads the entire response into memory before parsing
  • JSON index pages can be 50KB-2MB for popular packages (e.g., boto3 has ~12,000 file entries)
  • HTML index pages are similar in size

Streaming opportunity:

  • JSON index pages COULD be streamed using an incremental JSON parser (e.g., ijson)
  • However, json.loads() on a 1MB string takes ~5ms -- negligible compared to the ~80ms network round-trip
  • The real cost is not parsing but candidate evaluation -- the _evaluate_json_page() fast path already handles this efficiently with a single-pass fused pipeline

Early abort opportunity:

  • When the resolver only needs the "best" (newest compatible) version, we could theoretically abort after finding it
  • Problem: The index page must be fully fetched before we know all versions (no streaming API from PyPI)
  • The speculative metadata prefetch already handles this by eagerly fetching metadata for the top candidate

Finding: Streaming/early-abort offers negligible benefit for index pages because network latency dominates. The JSON parsing is already fast.
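A quick synthetic check of that claim (fabricated payload shaped loosely like a PEP 691 response, not real PyPI data; absolute timings vary by machine):

```python
import json
import time

# Build ~1 MB of index-page-like JSON with fabricated file entries.
files = [
    {
        "filename": f"pkg-{i}.0-py3-none-any.whl",
        "url": f"https://files.pythonhosted.org/.../pkg-{i}.0-py3-none-any.whl",
        "hashes": {"sha256": "0" * 64},
    }
    for i in range(5000)
]
payload = json.dumps({"name": "pkg", "files": files})

t0 = time.perf_counter()
parsed = json.loads(payload)
elapsed_ms = (time.perf_counter() - t0) * 1000

print(f"{len(payload) // 1024} KiB parsed in {elapsed_ms:.1f} ms")
```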

2.10 PyPI-Specific Optimizations

Bulk/batch APIs:

  • PyPI has no bulk metadata API (no way to get metadata for 40 packages in one request)
  • The Simple Repository API (PEP 503/691) is package-by-package
  • There is no "dependency tree" API that would let pip skip index page fetches

CDN-level optimizations already in use:

  • Cache-Control: max-age=0 with conditional requests (ETags/Last-Modified) -- implemented
  • PyPI responses include strong ETags
  • 304 responses save bandwidth but still cost one RTT each

JSON API:

  • pip already prefers the JSON Simple API (application/vnd.pypi.simple.v1+json) via Accept header priority
  • The JSON path (_evaluate_json_page()) is heavily optimized with fused evaluation
  • PyPI's JSON API doesn't support partial responses or field selection

Server-push / Link preload:

  • PyPI doesn't support HTTP/2 Server Push for metadata files
  • Even with HTTP/2, the server can't know which wheel the client will pick

3. Optimization Ideas (Ranked by Expected Impact)

Tier 1: High Impact (10-30% wall-time reduction)

3.1 Parallel Wheel Downloads

What: Replace the sequential Downloader.batch() for-loop with concurrent.futures.ThreadPoolExecutor.
Where: src/pip/_internal/network/download.py lines 179-184 and src/pip/_internal/operations/prepare.py lines 492-493.
Why: After resolution completes, all wheels are downloaded sequentially. For 40 packages, this is 40 sequential HTTP GETs. Parallelizing would overlap download + write for multiple packages.
Expected improvement: 15-25% of total wall time for download-heavy workloads. With 8 parallel downloads, the download phase shrinks from ~40 * avg_time to ~5 * avg_time.
Complexity: Medium. Need to handle progress bar display for parallel downloads and ensure thread safety.
Risk: Low -- downloads are independent operations.
pip-only change: Yes.

3.2 Pipeline Index Fetch + Metadata Prefetch

What: When an index page prefetch completes, immediately trigger metadata prefetch for the top candidate -- don't wait for the resolver to consume the index result.
Where: src/pip/_internal/index/package_finder.py -- _do_fetch_all_candidates() should call factory._prefetch_top_candidate_metadata() at the end.
Why: Currently there is a gap between index fetch completion and metadata prefetch submission. The metadata prefetch only fires when the resolver calls _iter_found_candidates(). This gap can be 100ms-2s depending on how fast the resolver processes.
Expected improvement: 5-15% for resolution-heavy workloads. Eliminates the serial gap between "index data ready" and "metadata fetch starts."
Complexity: Medium. Requires threading coordination between PackageFinder and Factory; the PackageFinder would need a reference to the Factory (it currently has none).
Risk: Low-medium -- need to ensure thread safety for the candidate cache.
pip-only change: Yes.

3.3 Increase Metadata Prefetch Depth

What: Prefetch metadata for the top N candidates (not just the top 1), and prefetch for ALL packages whose index is ready (not just when the resolver asks).
Where: src/pip/_internal/resolution/resolvelib/factory.py _prefetch_top_candidate_metadata().
Why: The resolver sometimes backtracks and needs the 2nd or 3rd candidate. Currently only the top candidate's metadata is prefetched. Prefetching the top 2-3 would prevent serial metadata fetches during backtracking.
Expected improvement: 3-8% for workloads with backtracking.
Complexity: Low.
Risk: Low. Wastes some bandwidth on metadata that may not be needed, but metadata files are tiny (5-50KB).
pip-only change: Yes.

Tier 2: Medium Impact (5-15% wall-time reduction)

3.4 HTTP/2 Support via httpx

What: Replace the requests + urllib3 stack with httpx, which supports HTTP/2 multiplexing.
Why: With HTTP/2, all index page requests and metadata fetches to pypi.org can be multiplexed over a single TCP connection. This eliminates 15 extra TLS handshakes and allows the server to interleave responses.
Expected improvement: 10-20% for cold-cache workloads (fewer TLS handshakes, multiplexed requests). Less impact for warm-cache runs (304 responses are already small).
Complexity: Very high. Fundamental change to pip's network layer; would affect caching, authentication, proxies, and all adapters.
Risk: High -- potential for regressions across pip's extensive networking surface.
pip-only change: Yes, but a major architectural change.

3.5 Conditional Request Short-Circuit for Index Pages

What: For warm-cache scenarios, batch all conditional index page requests into concurrent futures BEFORE the resolver starts, rather than lazily.
Where: Before calling resolver.resolve(), pre-submit conditional GETs for ALL packages known from a lock file or previous resolution.
Why: Currently, prefetch only fires as the resolver discovers dependencies. If pip could predict the dependency set (from a lock file or previous run), all ~40 conditional GETs could be fired simultaneously.
Expected improvement: 5-10% for warm-cache repeat installs. Turns 3.2s of serial conditional GETs into <0.5s of parallel ones.
Complexity: Medium. Needs a mechanism to predict the package set (lock file, cache of the previous resolution result).
Risk: Low -- conditional GETs are safe to fire speculatively.
pip-only change: Yes.
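A sketch of the proposed pre-submission pattern. Here revalidate() is a hypothetical stand-in for one conditional GET that mostly returns 304 (no real network); the actual change would pre-submit these for every package predicted from a lock file:

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def revalidate(name: str) -> str:
    time.sleep(0.02)  # one RTT to the index
    return "304 Not Modified"

predicted = [f"dep{i}" for i in range(40)]
executor = ThreadPoolExecutor(max_workers=16)

# Fire all conditional GETs up front, before the resolver runs.
futures: dict[str, Future] = {
    name: executor.submit(revalidate, name) for name in predicted
}

# Later, when the resolver asks for a package, the result is (usually) ready.
status = futures["dep7"].result(timeout=10)
executor.shutdown(wait=True)
print(status)  # 304 Not Modified
```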

3.6 Connection Pre-warming

What: Open TLS connections to pypi.org and files.pythonhosted.org at session creation time, before any requests.
Where: src/pip/_internal/network/session.py PipSession.__init__().
Why: The first request to each host pays the TCP + TLS handshake cost (~100-200ms). Pre-warming during argument parsing / environment setup overlaps this with CPU work.
Expected improvement: 2-5% (saves a ~200ms one-time cost).
Complexity: Low.
Risk: Low -- harmless if the connections go unused (they just time out).
pip-only change: Yes.

Tier 3: Low Impact (1-5% wall-time reduction)

3.7 Cache Index ETags In-Memory Across Packages

What: After the first conditional GET returns an ETag for pypi.org/simple/, cache the server's response pattern in memory. Some CDNs return the same 304 pattern for all resources of the same age.
Expected improvement: Negligible (<1%). The conditional request still requires a round trip.
pip-only change: Yes.

3.8 Brotli Compression for Index Pages

What: Add brotli or brotlicffi as an optional dependency so index page responses can be compressed with brotli (a better compression ratio than gzip).
Why: Brotli can compress JSON index pages 20-30% better than gzip, reducing transfer time for large index pages.
Expected improvement: 1-3% for cold-cache scenarios. Index pages are typically 50KB-2MB; brotli saves ~30% of that.
Complexity: Low. Just add the dependency and urllib3/requests will advertise brotli support.
Risk: Low. Optional dependency with gzip fallback.
pip-only change: Yes.


4. Quick Wins (< 50 lines of code)

QW1: Parallel Wheel Downloads (the biggest quick win)

File: src/pip/_internal/operations/prepare.py, _complete_partial_requirements()
Change: Replace the sequential self._download.batch() loop with ThreadPoolExecutor.map():

# Current (sequential):
batch_download = self._download.batch(links_to_fully_download.keys(), temp_dir)
for link, (filepath, _) in batch_download:
    ...

# Proposed (parallel):
with ThreadPoolExecutor(max_workers=8) as pool:
    results = pool.map(
        lambda link: (link, self._download(link, temp_dir)),
        links_to_fully_download.keys()
    )
    for link, (filepath, _) in results:
        ...

Lines: ~15 changed
Impact: 15-25% wall-time reduction on download-heavy workloads

QW2: Pipeline Index + Metadata Prefetch

File: src/pip/_internal/index/package_finder.py, _do_fetch_all_candidates()
Change: After building the candidate list, immediately trigger metadata prefetch for the top candidate if a factory callback is registered:

# At the end of _do_fetch_all_candidates:
if self._metadata_prefetch_callback and self._all_candidates[project_name]:
    self._metadata_prefetch_callback(project_name, self._all_candidates[project_name])

Lines: ~20 changed (add callback registration + invocation)
Impact: 5-15% for resolution-heavy workloads

QW3: Connection Pre-warming

File: src/pip/_internal/network/session.py
Change: Add a prewarm() method that opens connections to known hosts in background threads:

def prewarm(self, urls: list[str]) -> None:
    """Open TCP+TLS connections in background threads to reduce first-request latency."""
    from concurrent.futures import ThreadPoolExecutor

    def _warm(url: str) -> None:
        try:
            self.head(url, timeout=5)
        except Exception:
            pass

    # Keep the executor alive instead of using a `with` block: exiting the
    # context would wait for every HEAD to finish, blocking startup rather
    # than warming connections in the background.
    self._prewarm_executor = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        self._prewarm_executor.submit(_warm, url)

Lines: ~15
Impact: 2-5% (saves ~200ms startup)

QW4: Prefetch Top 2-3 Candidates' Metadata

File: src/pip/_internal/resolution/resolvelib/factory.py
Change: In _iter_found_candidates(), prefetch metadata for the top 2-3 candidates instead of just the top 1:

# Current: prefetch only infos_list[0]
# Proposed: prefetch infos_list[0:3]
for info in infos_list[:3]:
    self._prefetch_top_candidate_metadata(name, info, extras, template)

Lines: ~10 changed
Impact: 3-8% for workloads with backtracking


5. Big Bets (Architectural Changes for 20%+ Improvement)

BB1: Fully Parallel Resolution Pipeline

Description: Replace the sequential resolvelib loop with a resolution architecture where ALL I/O is fully parallel. When the resolver needs data for package X, it doesn't block -- it queues the need and processes another package. When the I/O completes, the resolver is notified.
Mechanism: This is essentially an async resolver. Could be implemented with:

  • asyncio event loop driving the resolver
  • aiohttp or httpx async client for HTTP
  • resolvelib with a coroutine-based provider

Expected improvement: 30-50% for large dependency trees. Eliminates all serial I/O gaps.
Complexity: Very high. Fundamental architectural change to pip's resolver integration.
Risk: High -- resolvelib is synchronous by design.
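A minimal sketch of the coroutine-driven shape this describes, under simulated latency (fetch_index and its return values are invented for illustration, not a real resolver API):

```python
import asyncio

async def fetch_index(name: str) -> tuple[str, int]:
    await asyncio.sleep(0.05)  # simulated index-page round trip
    return name, 3             # pretend three candidate versions were found

async def resolve(roots: list[str]) -> dict[str, int]:
    # All index fetches overlap: total time is ~one round trip, not len(roots).
    results = await asyncio.gather(*(fetch_index(name) for name in roots))
    return dict(results)

candidates = asyncio.run(resolve(["boto3", "botocore", "urllib3", "s3transfer"]))
print(candidates)
```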

BB2: HTTP/2 Multiplexing

Description: Replace vendored requests + urllib3 with httpx (which supports HTTP/2 via h2).
Expected improvement: 20-30% for cold-cache workloads. All requests to pypi.org multiplex over one connection, with no head-of-line blocking between index page requests.
Complexity: Very high. A ~500+ line change touching all network code.
Risk: High.

BB3: Dependency Prediction + Bulk Prefetch

Description: Maintain a local cache of the "last resolved dependency tree" per project. On the next pip install, immediately fire all index page + metadata prefetch requests for the predicted set BEFORE the resolver starts.
Expected improvement: 20-40% for repeat installs. Instead of discovering dependencies one-by-one through resolution, fire all 40+ conditional GETs simultaneously at startup.
Complexity: Medium-high. Needs a prediction cache format, staleness detection, and graceful handling of prediction misses.
Risk: Medium. Wrong predictions waste bandwidth but don't cause correctness issues.

BB4: Server-Side Dependency Resolution API

Description: Propose a PyPI API extension that accepts a requirements list and returns the resolved dependency tree (with all metadata). One HTTP request replaces 120+ requests.
Expected improvement: 50-80% for cold-cache scenarios. Eliminates all per-package round trips.
Complexity: Very high. Requires PyPI server cooperation, a PEP process, etc.
Risk: High. Requires ecosystem buy-in; a fallback to current behavior is needed.


6. Summary of Key Files

File -- Role
src/pip/_internal/index/collector.py -- Fetches index pages, parses HTML/JSON
src/pip/_internal/index/package_finder.py -- Evaluates candidates, manages prefetch pool (16 threads)
src/pip/_internal/network/session.py -- PipSession, connection pool config (20/16), adapters
src/pip/_internal/network/cache.py -- SafeFileCache (filesystem-based HTTP cache)
src/pip/_internal/network/download.py -- Downloader (sequential batch downloads!)
src/pip/_internal/network/lazy_wheel.py -- LazyZipOverHTTP for range-request metadata
src/pip/_internal/network/utils.py -- Accept-Encoding: identity for downloads, chunk streaming
src/pip/_internal/operations/prepare.py -- RequirementPreparer, metadata fetch chain
src/pip/_internal/resolution/resolvelib/factory.py -- Metadata prefetch pool (8 threads), candidate building
src/pip/_internal/resolution/resolvelib/provider.py -- Triggers dep prefetch in get_dependencies()
src/pip/_internal/resolution/resolvelib/resolver.py -- Kicks off root requirement prefetch
src/pip/_internal/resolution/resolvelib/candidates.py -- Thread-safe dist preparation with _prepare_lock
src/pip/_vendor/cachecontrol/adapter.py -- CacheControlAdapter: intercepts requests for caching
src/pip/_vendor/cachecontrol/controller.py -- Cache logic: max-age=0 bypass, conditional headers, 304 handling

7. Critical Finding: Sequential Download Is The Biggest Remaining Win

The single most impactful optimization remaining is parallelizing wheel downloads. After resolution completes, _complete_partial_requirements() downloads all wheels sequentially through Downloader.batch(). This is purely sequential I/O with no data dependencies between packages. With 40 packages averaging 500KB each at ~50ms per download, the sequential phase takes ~2 seconds. Parallelizing with 8 workers would reduce this to ~0.25 seconds -- a potential 15-25% total wall-time improvement depending on the workload.