Pip I/O Layer Deep Analysis
Investigation date: 2026-04-08
Branch: codeflash/optimize
Investigator: Research agent
1. Request Flow Diagram
User: pip install <pkg>
|
v
Resolver (resolvelib)
|
+-- provider.get_dependencies(candidate)
| +-- prefetch_packages(dep_names) [background threads]
|
+-- provider.find_matches(identifier)
| +-- factory.find_candidates()
| +-- finder.find_best_candidate(name)
| +-- finder.find_all_candidates(name)
| |
| +-- [check _all_candidates cache]
| +-- [check _prefetch_futures]
| +-- _do_fetch_all_candidates(name)
| |
| +-- link_collector.collect_sources(name)
| | +-- search_scope.get_index_urls_locations(name)
| | # => ["https://pypi.org/simple/<name>/"]
| |
| +-- source.page_candidates()
| +-- process_project_url(url)
| |
| +-- link_collector.fetch_response(url)
| | +-- _get_index_content(url)
| | +-- _get_simple_response(url, session)
| | |
| | v
| | session.get(url, headers={
| | Accept: "application/vnd.pypi.simple.v1+json, ...",
| | Cache-Control: "max-age=0"
| | })
| | |
| | v
| | CacheControlAdapter.send()
| | +-- controller.cached_request()
| | | # max-age=0 => ALWAYS bypasses cache
| | | # Adds If-None-Match / If-Modified-Since
| | +-- controller.conditional_headers()
| | +-- HTTPAdapter.send()
| | +-- urllib3.HTTPSConnectionPool.urlopen()
| | +-- _get_conn() [from pool queue]
| | +-- TLS handshake (if new conn)
| | +-- HTTP/1.1 GET request
| | +-- _put_conn() [return to pool]
| | +-- controller.cache_response()
| | # Stores response w/ ETag for
| | # future conditional requests
| |
| +-- [JSON] _evaluate_json_page()
| +-- [HTML] parse_links() + evaluate_links()
|
+-- candidate.dist [triggers metadata fetch]
+-- _prepare()
+-- preparer.prepare_linked_requirement()
+-- _fetch_metadata_only()
| +-- [1] _fetch_metadata_using_link_data_attr()
| | # PEP 658: GET <url>.metadata
| +-- [2] _fetch_metadata_using_lazy_wheel()
| # HTTP Range requests on .whl
| +-- LazyZipOverHTTP(url, session)
| +-- session.head(url) # get Content-Length
| +-- _check_zip() # range-fetch tail
| +-- ZipFile(self) # parse EOCD
+-- [3] Full download as fallback
Request Count Per Package (typical PyPI resolution)
For each unique package name the resolver encounters:
- 1 GET for the index page (`/simple/<name>/`) -- conditional if cached
- 1 GET for metadata (PEP 658 `.metadata` file), OR 1 HEAD + 1-2 Range GETs for lazy wheel metadata, OR 1 GET full wheel download as fallback
- 1 GET for the actual wheel download (after resolution)
For a workload like boto3 with ~40 transitive deps:
- ~40 index page GETs (conditional requests)
- ~40 metadata GETs (PEP 658 when available)
- ~40 wheel download GETs
- Total: ~120 HTTP requests minimum
2. Per-Area Findings
2.1 HTTP Request Flow
How requests are serialized:
- The resolver processes packages sequentially through resolvelib's `resolve()` loop
- Each `find_matches()` call triggers `find_all_candidates()`, which fetches the index page synchronously (unless prefetched)
- Each `get_dependencies()` call triggers `candidate.dist`, which fetches metadata synchronously (unless prefetched)
Existing parallelism (two separate thread pools):
- Index page prefetch (`PackageFinder._prefetch_executor`): 16 worker threads
  - Triggered in `provider.get_dependencies()` for all discovered deps
  - Triggered in `resolver.resolve()` for all root requirements
  - Workers call `_do_fetch_all_candidates()`, which does the full index fetch + evaluate pipeline
- Metadata prefetch (`Factory._metadata_prefetch_executor`): 8 worker threads
  - Triggered in `_iter_found_candidates()` for the top candidate only
  - Workers call `candidate.dist`, which triggers PEP 658 / lazy wheel
Key finding: The two prefetch mechanisms are independent and both effective, but they don't coordinate. The metadata prefetch for package B can't start until B's index page fetch completes. There is no pipelining of "index fetch -> immediately prefetch top candidate metadata."
Redundant requests found:
- `LazyZipOverHTTP.__init__()` sends a HEAD request (line 57 of lazy_wheel.py). If PEP 658 metadata is available, this HEAD is never needed -- the code tries PEP 658 first and only falls back to lazy wheel. This is correct and not redundant.
- However, the HEAD request in `LazyZipOverHTTP` is sent even if the wheel doesn't support range requests, wasting one round trip before discovering this.
- `_get_simple_response()` sends a HEAD before GET only if the URL looks like an archive (lines 120-121 of collector.py). This is a rare case and correctly guarded.
2.2 Connection Reuse & Pooling
Current configuration (session.py lines 388-389):
_pool_connections = 20 # urllib3 PoolManager caches pools for 20 distinct hosts
_pool_maxsize = 16 # Each pool keeps up to 16 idle connections
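For reference, a minimal sketch of how these two values map onto requests/urllib3 pooling, assuming a plain requests.Session rather than pip's actual PipSession wiring:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=20,  # distinct host pools kept by the urllib3 PoolManager
    pool_maxsize=16,      # idle connections retained per host pool
    pool_block=False,     # extra concurrent connections are created, then discarded
)
session.mount("https://", adapter)
session.mount("http://", adapter)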
Analysis:
- The pool is correctly sized for 16 prefetch workers
- `pool_block=False` (the default) means excess connections proceed but aren't returned to the pool
- Connections ARE reused for same-host requests through urllib3's `HTTPSConnectionPool._get_conn()` / `_put_conn()` mechanism
- HTTP/1.1 keep-alive works by default (urllib3 uses persistent connections)
- The connection pool is per-(host, port, scheme), so `pypi.org:443` and `files.pythonhosted.org:443` each get their own pool
- A typical pip install touches only 2-3 hosts: `pypi.org` (index pages), `files.pythonhosted.org` (wheel downloads, metadata), and possibly an extra index. A pool of 20 is more than adequate.
TLS Handshake Analysis:
- A TLS handshake happens once per connection (not per request)
- With pool_maxsize=16, up to 16 connections are kept alive per host
- The 16 prefetch threads can each hold a connection, so in theory all 16 reuse their TLS sessions
- Risk: If more than 16 requests fire concurrently to the same host, excess connections are created and then discarded (not pooled), causing extra TLS handshakes. With `pool_block=False`, they proceed, but each such connection is thrown away after use.
Finding: The pool is sized well for the current prefetch concurrency. No wasted TLS handshakes under normal operation.
2.3 Caching Layer
How CacheControl works with pip:
- Index pages (`/simple/<name>/`): sent with a `Cache-Control: max-age=0` header
  - The CacheControl controller sees `max-age=0` and always bypasses the cache (controller.py lines 184-186)
  - But it adds conditional headers (`If-None-Match`, `If-Modified-Since`) via `conditional_headers()`
  - On 304 Not Modified, the cached response is served (no body transfer)
  - On 200, the response is cached with its ETag for next time
  - This is working as intended -- it ensures freshness while avoiding re-downloading unchanged index pages
- Package downloads (wheels, sdists): sent via `Downloader._http_get()` with `Accept-Encoding: identity`
  - No `Cache-Control: max-age=0` header on these requests
  - CacheControl can serve fully cached responses for packages that haven't changed
  - `SafeFileCache` stores metadata + body as separate files on disk
  - The cache key is the full URL (after normalization)
- PEP 658 metadata files: fetched via `get_http_url()` using the Downloader
  - Same caching behavior as package downloads
  - Small files (~5-50KB), cached effectively
- Lazy wheel range requests: sent with `Cache-Control: no-cache`
  - Explicitly bypasses caching (lazy_wheel.py line 180)
  - This is correct -- range requests for ZIP metadata shouldn't be cached as full responses
Cache efficiency finding:
- The `max-age=0` on index pages means every resolution always incurs at least one conditional round-trip per package. This is the single biggest I/O constraint for warm-cache scenarios.
- For a `pip install --upgrade` with a warm cache, all 40 index page requests still go to the network (as conditional GETs), but most return 304 with no body. Each 304 round-trip costs ~50-100ms (RTT to pypi.org). The revalidation pattern is sketched after this list.
- Total warm-cache overhead: 40 * ~80ms = ~3.2 seconds just in sequential conditional GETs (partially parallelized by prefetch).
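An illustrative sketch of this revalidation pattern, written against plain requests rather than pip's CacheControl integration:

import requests

session = requests.Session()
url = "https://pypi.org/simple/boto3/"
accept = {"Accept": "application/vnd.pypi.simple.v1+json"}

first = session.get(url, headers=accept)
etag = first.headers.get("ETag")

# Revalidation: max-age=0 forces a round trip; If-None-Match lets the server
# answer 304 Not Modified with an empty body when the page is unchanged.
revalidate_headers = dict(accept)
revalidate_headers["Cache-Control"] = "max-age=0"
if etag:
    revalidate_headers["If-None-Match"] = etag
second = session.get(url, headers=revalidate_headers)
print(second.status_code)  # 304 if unchanged, 200 with a fresh body otherwise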
2.4 Metadata Fetching
Fallback chain (prepare.py _fetch_metadata_only()):
1. PEP 658 metadata (`_fetch_metadata_using_link_data_attr()`; illustrated in the sketch after this list):
   - Checks `link.metadata_link()` -- the link must have a `data-dist-info-metadata` or `core-metadata` attribute
   - If present, downloads the separate `.metadata` file (tiny: 5-50KB)
   - PyPI supports PEP 658 for all wheels uploaded after ~2023
   - This is the fastest path: a single small GET
2. Lazy wheel (`_fetch_metadata_using_lazy_wheel()`):
   - Requires the `--use-feature=fast-deps` flag
   - Sends HEAD to get Content-Length and check Accept-Ranges
   - Downloads the tail of the wheel (ZIP end-of-central-directory) via range requests
   - Parses the ZIP to find the METADATA file, then downloads just that range
   - 2-4 HTTP requests per wheel (HEAD + 1-3 range GETs)
   - Has a `_lazy_wheel_cache` to avoid redundant range requests for the same URL
3. Full download (fallback):
   - Downloads the entire wheel/sdist
   - For wheels: extracts metadata from the archive
   - For sdists: runs `setup.py egg_info` or a `pyproject.toml` build
   - Most expensive path
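A rough illustration of the PEP 658 fast path (not pip's implementation; the JSON field names follow PEP 691/714 and should be treated as assumptions):

import requests

session = requests.Session()
index = session.get(
    "https://pypi.org/simple/requests/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
).json()

for entry in index["files"]:
    if entry["filename"].endswith(".whl") and entry.get("core-metadata"):
        # PEP 658: the METADATA file is served at the wheel URL plus ".metadata"
        metadata = session.get(entry["url"] + ".metadata").text
        print(metadata.splitlines()[0])  # e.g. "Metadata-Version: 2.1"
        break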
Key finding: PEP 658 is the dominant path for PyPI packages. The speculative metadata prefetch (factory.py) eagerly builds the top candidate and submits a background thread to fetch its metadata. This overlaps metadata I/O with resolution logic.
Optimization in place: _lazy_wheel_cache (prepare.py line 288) prevents duplicate range requests when a package is evaluated with different extras (e.g., pkg and pkg[extra]).
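For context, a hedged sketch of the lazy-wheel mechanics described in step 2 above, using plain requests and a hypothetical wheel URL rather than pip's LazyZipOverHTTP:

import requests

# Hypothetical placeholder URL -- not a real file.
url = "https://files.pythonhosted.org/packages/example-1.0-py3-none-any.whl"
session = requests.Session()

head = session.head(url, allow_redirects=True)
size = int(head.headers["Content-Length"])

if head.headers.get("Accept-Ranges") == "bytes":
    # Fetch only the tail containing the ZIP end-of-central-directory record.
    start = max(size - 65536, 0)
    tail = session.get(url, headers={"Range": f"bytes={start}-{size - 1}"})
    assert tail.status_code == 206  # Partial Content
    # pip hands bytes like these to ZipFile to locate and read METADATA.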
2.5 DNS & TLS
DNS resolution:
- urllib3 delegates to Python's `socket.create_connection()`, which calls `getaddrinfo()`
- No DNS caching in urllib3 or pip -- relies on the OS-level DNS cache
- However, connection pooling effectively caches DNS results because connections persist
- With 16 pool connections to `pypi.org`, DNS is resolved at most once per connection creation
TLS handshakes:
- One TLS handshake per connection (not per request)
- Connection pooling limits handshakes to pool_maxsize (16) per host
- Python's `ssl` module handles TLS session resumption at the OpenSSL level
- `_SSLContextAdapterMixin` (session.py line 255) properly forwards the SSL context to pools
Finding: DNS and TLS are not significant bottlenecks. The connection pool effectively amortizes both costs. Pre-warming is not needed because the first batch of prefetch requests creates all needed connections.
2.6 HTTP/2 and Protocol
Current state: pip uses HTTP/1.1 exclusively.
- The vendored `urllib3` (1.x/2.x line) does not support HTTP/2
- The vendored `requests` library has no HTTP/2 support
- There are no references to HTTP/2, h2, or hyper anywhere in pip's codebase
Would HTTP/2 help?
- Index page fetches: HTTP/2 multiplexing would allow sending all ~40 index page requests over a single TCP connection to pypi.org. Currently, each of the 16 prefetch threads uses its own connection. With HTTP/2, one connection handles all requests, eliminating 15 TLS handshakes and reducing head-of-line blocking.
- Metadata fetches: Similarly multiplexed over the same connection.
- Package downloads: Less benefit -- these are large sequential downloads.
Estimated benefit: For index-heavy workloads (many small packages), HTTP/2 could reduce the connection setup overhead by ~90% and improve throughput by 20-30% due to multiplexing.
What it would take:
- Replace the vendored `requests`/`urllib3` stack with `httpx` (supports HTTP/2 via `h2`), or add `h2` support to urllib3 (a small httpx sketch follows at the end of this section)
- Major architectural change -- affects all of pip's network layer
- PyPI's CDN (Fastly) already supports HTTP/2
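To make the multiplexing idea concrete, a hedged sketch using httpx with its optional h2 extra (install as `httpx[http2]`); this is not part of pip today:

import asyncio
import httpx

PACKAGES = ["boto3", "botocore", "s3transfer", "urllib3"]

async def fetch_all():
    headers = {"Accept": "application/vnd.pypi.simple.v1+json"}
    async with httpx.AsyncClient(http2=True, headers=headers) as client:
        responses = await asyncio.gather(
            *(client.get(f"https://pypi.org/simple/{name}/") for name in PACKAGES)
        )
    # When the server negotiates HTTP/2, these requests share one TLS connection.
    return {name: r.status_code for name, r in zip(PACKAGES, responses)}

print(asyncio.run(fetch_all()))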
2.7 Parallel I/O Architecture
Index page prefetch (PackageFinder):
# package_finder.py lines 1535-1556
def prefetch_packages(self, project_names):
with self._prefetch_lock:
for name in project_names:
if name in self._all_candidates or name in self._prefetch_futures:
continue
if self._prefetch_executor is None:
self._prefetch_executor = ThreadPoolExecutor(max_workers=16)
self._prefetch_futures[name] = self._prefetch_executor.submit(
self._do_fetch_all_candidates, name
)
- Called from two places:
  - `resolver.resolve()` -- submits all root requirements upfront
  - `provider.get_dependencies()` -- submits all discovered deps
- Workers run `_do_fetch_all_candidates()`, which does the full pipeline: collect_sources -> fetch_response -> parse/evaluate
- Results are cached in the `_all_candidates` dict
- `find_all_candidates()` checks the futures with a 10s timeout (a generic sketch of this pattern follows below)
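A generic sketch of the "reuse the prefetch future, else fetch synchronously" pattern described above; names and signatures are illustrative, not pip's:

from concurrent.futures import TimeoutError as FutureTimeout

def find_all_candidates(name, prefetch_futures, do_fetch, cache):
    if name in cache:
        return cache[name]
    future = prefetch_futures.pop(name, None)
    if future is not None:
        try:
            cache[name] = future.result(timeout=10)  # reuse background work
            return cache[name]
        except FutureTimeout:
            pass  # prefetch too slow: fall through to a synchronous fetch
    cache[name] = do_fetch(name)
    return cache[name]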
Metadata prefetch (Factory):
# factory.py lines 188-245
def _prefetch_top_candidate_metadata(self, name, top_info, extras, template):
# Build top candidate eagerly (cheap: wheel-cache lookup)
candidate = build_func()
# Only prefetch for remote wheels
if link.is_file or not link.is_wheel:
return
def _do_prefetch():
candidate.dist # triggers prepare_linked_requirement()
# Submit to 8-thread pool
self._metadata_prefetch_executor.submit(_do_prefetch)
Serialization points that force sequential I/O:
- resolvelib's main loop is single-threaded. Each round processes one package at a time. Even with prefetching, the resolver can only consume one result at a time.
- `_complete_partial_requirements()` (prepare.py line 474) downloads all "needs more preparation" requirements sequentially via `self._download.batch()` -- which is just a for-loop, NOT actually batched/parallel.
- The `Downloader.batch()` method (download.py lines 179-184) is misleadingly named -- it's a sequential for-loop:

def batch(self, links, location):
    for link in links:
        filepath, content_type = self(link, location)
        yield link, (filepath, content_type)

This is a significant finding. All final wheel downloads happen sequentially.
2.8 Response Compression
Index page requests:
- `_get_simple_response()` in collector.py sets custom `Accept` headers but does NOT set `Accept-Encoding`
- The requests library's default `Accept-Encoding` header is `gzip, deflate` (from urllib3's `ACCEPT_ENCODING = "gzip,deflate"`, applied by requests' `default_headers()`)
- Index pages ARE compressed by PyPI/Fastly with gzip. The requests library transparently decompresses them.
- No brotli support (would require the `brotli` or `brotlicffi` package)
Package downloads:
- `Downloader._http_get()` uses `HEADERS = {"Accept-Encoding": "identity"}` (utils.py line 26)
- Package downloads explicitly disable compression. This is intentional -- packages are already compressed archives (wheels are ZIP files, sdists are .tar.gz). Re-compressing would waste CPU and break hash verification.
- `response_chunks()` uses `decode_content=False` to preserve raw bytes for hash checking.
Finding: Compression is correctly handled. Index pages use gzip (transparent). Packages disable compression (correct). No improvement opportunity here.
2.9 Lazy/Streaming Approaches
Current behavior:
- Index pages: `response.content` (collector.py line 309) reads the entire response into memory before parsing
- JSON index pages can be 50KB-2MB for popular packages (e.g., boto3 has ~12,000 file entries)
- HTML index pages are similar in size
Streaming opportunity:
- JSON index pages COULD be streamed using an incremental JSON parser (e.g., `ijson`)
- However, `json.loads()` on a 1MB string takes ~5ms -- negligible compared to the ~80ms network round-trip (a quick timing check follows this list)
- The real cost is not parsing but candidate evaluation -- the `_evaluate_json_page()` fast path already handles this efficiently with a single-pass fused pipeline
Early abort opportunity:
- When the resolver only needs the "best" (newest compatible) version, we could theoretically abort after finding it
- Problem: The index page must be fully fetched before we know all versions (no streaming API from PyPI)
- The speculative metadata prefetch already handles this by eagerly fetching metadata for the top candidate
Finding: Streaming/early-abort offers negligible benefit for index pages because network latency dominates. The JSON parsing is already fast.
2.10 PyPI-Specific Optimizations
Bulk/batch APIs:
- PyPI has no bulk metadata API (no way to get metadata for 40 packages in one request)
- The Simple Repository API (PEP 503/691) is package-by-package
- There is no "dependency tree" API that would let pip skip index page fetches
CDN-level optimizations already in use:
- `Cache-Control: max-age=0` with conditional requests (ETags/Last-Modified) -- implemented
- PyPI responses include strong ETags
- 304 responses save bandwidth but still cost one RTT each
JSON API:
- pip already prefers the JSON Simple API (`application/vnd.pypi.simple.v1+json`) via Accept header priority
- The JSON path (`_evaluate_json_page()`) is heavily optimized with fused evaluation
- PyPI's JSON API doesn't support partial responses or field selection
Server-push / Link preload:
- PyPI doesn't support HTTP/2 Server Push for metadata files
- Even with HTTP/2, the server can't know which wheel the client will pick
3. Optimization Ideas (Ranked by Expected Impact)
Tier 1: High Impact (10-30% wall-time reduction)
3.1 Parallel Wheel Downloads
What: Replace the sequential Downloader.batch() for-loop with concurrent.futures.ThreadPoolExecutor.
Where: src/pip/_internal/network/download.py lines 179-184 and src/pip/_internal/operations/prepare.py lines 492-493.
Why: After resolution completes, all wheels are downloaded sequentially. For 40 packages, this is 40 sequential HTTP GETs. Parallelizing would overlap download + write for multiple packages.
Expected improvement: 15-25% of total wall time for download-heavy workloads. With 8 parallel downloads, the download phase shrinks from ~40 * avg_time to ~5 * avg_time.
Complexity: Medium. Need to handle progress bar display for parallel downloads and ensure thread safety.
Risk: Low -- downloads are independent operations.
pip-only change: Yes.
3.2 Pipeline Index Fetch + Metadata Prefetch
What: When an index page prefetch completes, immediately trigger metadata prefetch for the top candidate -- don't wait for the resolver to consume the index result.
Where: src/pip/_internal/index/package_finder.py _do_fetch_all_candidates() should call factory._prefetch_top_candidate_metadata() at the end.
Why: Currently, there's a gap between index fetch completion and metadata prefetch submission. The metadata prefetch only fires when the resolver calls _iter_found_candidates(). This gap can be 100ms-2s depending on how fast the resolver processes.
Expected improvement: 5-15% for resolution-heavy workloads. Eliminates the serial gap between "index data ready" and "metadata fetch starts."
Complexity: Medium. Requires threading coordination between PackageFinder and Factory. The PackageFinder would need a reference to the Factory (currently doesn't have one).
Risk: Low-medium -- need to ensure thread safety for candidate cache.
pip-only change: Yes.
3.3 Increase Metadata Prefetch Depth
What: Prefetch metadata for top N candidates (not just the top 1), and prefetch for ALL packages whose index is ready (not just when the resolver asks).
Where: src/pip/_internal/resolution/resolvelib/factory.py _prefetch_top_candidate_metadata().
Why: The resolver sometimes backtracks and needs the 2nd or 3rd candidate. Currently only the top candidate's metadata is prefetched. Prefetching the top 2-3 would prevent serial metadata fetches during backtracking.
Expected improvement: 3-8% for workloads with backtracking.
Complexity: Low.
Risk: Low. Wastes some bandwidth on metadata that may not be needed, but metadata files are tiny (5-50KB).
pip-only change: Yes.
Tier 2: Medium Impact (5-15% wall-time reduction)
3.4 HTTP/2 Support via httpx
What: Replace the requests + urllib3 stack with httpx which supports HTTP/2 multiplexing.
Why: With HTTP/2, all index page requests and metadata fetches to pypi.org can be multiplexed over a single TCP connection. This eliminates 15 extra TLS handshakes and allows the server to interleave responses.
Expected improvement: 10-20% for cold-cache workloads (fewer TLS handshakes, multiplexed requests). Less impact for warm-cache (304 responses are already small).
Complexity: Very high. Fundamental change to pip's network layer. Would affect caching, authentication, proxies, all adapters.
Risk: High -- potential for regressions across pip's extensive networking surface.
pip-only change: Yes, but major architectural change.
3.5 Conditional Request Short-Circuit for Index Pages
What: For warm-cache scenarios, batch all conditional index page requests into concurrent futures BEFORE the resolver starts, rather than lazily.
Where: Before calling resolver.resolve(), pre-submit conditional GETs for ALL packages known from the lock file or previous resolution.
Why: Currently, prefetch only fires as the resolver discovers dependencies. If pip could predict the dependency set (from a lock file or previous run), all ~40 conditional GETs could be fired simultaneously.
Expected improvement: 5-10% for warm-cache repeat installs. Turns 3.2s of serial conditional GETs into <0.5s of parallel ones.
Complexity: Medium. Need a mechanism to predict the package set (lock file, cache of previous resolution result).
Risk: Low -- conditional GETs are safe to fire speculatively.
pip-only change: Yes.
3.6 Connection Pre-warming
What: Open TLS connections to pypi.org and files.pythonhosted.org at session creation time, before any requests.
Where: src/pip/_internal/network/session.py PipSession.__init__().
Why: The first request to each host pays the TCP + TLS handshake cost (~100-200ms). Pre-warming during argument parsing / environment setup overlaps this with CPU work.
Expected improvement: 2-5% (saves ~200ms one-time cost).
Complexity: Low.
Risk: Low -- harmless if the connections go unused (they just time out).
pip-only change: Yes.
Tier 3: Low Impact (1-5% wall-time reduction)
3.7 Cache Index ETags In-Memory Across Packages
What: After the first conditional GET returns an ETag for pypi.org/simple/, cache the server's response pattern in memory. Some CDNs return the same 304 pattern for all resources with the same age.
Expected improvement: Negligible (<1%). The conditional request still requires a round trip.
pip-only change: Yes.
3.8 Brotli Compression for Index Pages
What: Add brotli or brotlicffi as an optional dependency so index page responses can be compressed with brotli (better compression ratio than gzip).
Why: Brotli can compress JSON index pages 20-30% better than gzip, reducing transfer time for large index pages.
Expected improvement: 1-3% for cold-cache scenarios. Index pages are typically 50KB-2MB; brotli saves ~30% of that.
Complexity: Low. Just add the dependency and urllib3/requests will advertise brotli support.
Risk: Low. Optional dependency, gzip fallback.
pip-only change: Yes.
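A quick way to check whether brotli would be advertised; the vendored-module path below is an assumption based on how upstream urllib3 builds its default Accept-Encoding header:

# Upstream urllib3 appends ",br" to its default Accept-Encoding when a brotli
# module is importable; the vendored import path here is an assumption.
from pip._vendor.urllib3.util.request import ACCEPT_ENCODING
print(ACCEPT_ENCODING)  # "gzip,deflate" today; ",br" appears once brotli/brotlicffi is installed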
4. Quick Wins (< 50 lines of code)
QW1: Parallel Wheel Downloads (the biggest quick win)
File: src/pip/_internal/operations/prepare.py _complete_partial_requirements()
Change: Replace the sequential self._download.batch() loop with ThreadPoolExecutor.map():
# Current (sequential):
batch_download = self._download.batch(links_to_fully_download.keys(), temp_dir)
for link, (filepath, _) in batch_download:
...
# Proposed (parallel):
with ThreadPoolExecutor(max_workers=8) as pool:
results = pool.map(
lambda link: (link, self._download(link, temp_dir)),
links_to_fully_download.keys()
)
for link, (filepath, _) in results:
...
Lines: ~15 changed
Impact: 15-25% wall-time reduction on download-heavy workloads
QW2: Pipeline Index + Metadata Prefetch
File: src/pip/_internal/index/package_finder.py _do_fetch_all_candidates()
Change: After building the candidate list, immediately trigger metadata prefetch for the top candidate if a factory callback is registered:
# At the end of _do_fetch_all_candidates:
if self._metadata_prefetch_callback and self._all_candidates[project_name]:
self._metadata_prefetch_callback(project_name, self._all_candidates[project_name])
Lines: ~20 changed (add callback registration + invocation)
Impact: 5-15% for resolution-heavy workloads
QW3: Connection Pre-warming
File: src/pip/_internal/network/session.py
Change: Add a prewarm() method that opens connections to known hosts in background threads:
def prewarm(self, urls: list[str]) -> None:
    """Open TCP+TLS connections in the background to reduce first-request latency."""
    from concurrent.futures import ThreadPoolExecutor

    def _warm(url: str) -> None:
        try:
            self.head(url, timeout=5)
        except Exception:
            pass  # best-effort: a failed warm-up is harmless

    # No `with` block: shutdown(wait=False) lets the warm-up run in the
    # background instead of blocking the caller until the HEADs complete.
    pool = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        pool.submit(_warm, url)
    pool.shutdown(wait=False)
Lines: ~15
Impact: 2-5% (saves ~200ms startup)
QW4: Prefetch Top 2-3 Candidates' Metadata
File: src/pip/_internal/resolution/resolvelib/factory.py
Change: In _iter_found_candidates(), prefetch metadata for top 2-3 candidates instead of just top 1:
# Current: prefetch only infos_list[0]
# Proposed: prefetch infos_list[0:3]
for info in infos_list[:3]:
self._prefetch_top_candidate_metadata(name, info, extras, template)
Lines: ~10 changed
Impact: 3-8% for workloads with backtracking
5. Big Bets (Architectural Changes for 20%+ Improvement)
BB1: Fully Parallel Resolution Pipeline
Description: Replace the sequential resolvelib loop with a resolution architecture where ALL I/O is fully parallel. When the resolver needs data for package X, it doesn't block -- it queues the need and processes another package. When the I/O completes, the resolver is notified.
Mechanism: This is essentially an async resolver. It could be implemented with:
- an asyncio event loop driving the resolver
- `aiohttp` or `httpx` as the async HTTP client
- resolvelib with a coroutine-based provider
Expected improvement: 30-50% for large dependency trees. Eliminates all serial I/O gaps.
Complexity: Very high. Fundamental architectural change to pip's resolver integration.
Risk: High -- resolvelib is synchronous by design.
BB2: HTTP/2 Multiplexing
Description: Replace vendored requests + urllib3 with httpx (which supports HTTP/2 via h2).
Expected improvement: 20-30% for cold-cache workloads. All requests to pypi.org multiplex over one connection. No head-of-line blocking between index page requests.
Complexity: Very high. ~500+ line change touching all network code.
Risk: High.
BB3: Dependency Prediction + Bulk Prefetch
Description: Maintain a local cache of "last resolved dependency tree" per project. On next pip install, immediately fire all index page + metadata prefetch requests for the predicted set BEFORE the resolver starts.
Expected improvement: 20-40% for repeat installs. Instead of discovering dependencies one-by-one through resolution, fire all 40+ conditional GETs simultaneously at startup.
Complexity: Medium-high. Need a prediction cache format, staleness detection, and graceful handling of prediction misses.
Risk: Medium. Wrong predictions waste bandwidth but don't cause correctness issues.
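A hypothetical sketch of such a prediction cache; the file location, format, and the hook into finder.prefetch_packages() are assumptions, not existing pip behavior:

import json
from pathlib import Path

# Hypothetical location for the last-resolution cache.
PREDICTION_CACHE = Path("~/.cache/pip/last-resolution.json").expanduser()

def predicted_packages():
    try:
        return json.loads(PREDICTION_CACHE.read_text())
    except (OSError, ValueError):
        return []  # no prediction available; fall back to normal discovery

def save_resolution(names):
    PREDICTION_CACHE.parent.mkdir(parents=True, exist_ok=True)
    PREDICTION_CACHE.write_text(json.dumps(sorted(names)))

# At install start, before resolver.resolve():
#     finder.prefetch_packages(predicted_packages())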
BB4: Server-Side Dependency Resolution API
Description: Propose a PyPI API extension that accepts a requirements list and returns the resolved dependency tree (with all metadata). One HTTP request replaces 120+ requests.
Expected improvement: 50-80% for cold-cache scenarios. Eliminates all per-package round trips.
Complexity: Very high. Requires PyPI server cooperation, a PEP process, etc.
Risk: High. Requires ecosystem buy-in. A fallback to current behavior is needed.
6. Summary of Key Files
| File | Role |
|---|---|
| src/pip/_internal/index/collector.py | Fetches index pages, parses HTML/JSON |
| src/pip/_internal/index/package_finder.py | Evaluates candidates, manages prefetch pool (16 threads) |
| src/pip/_internal/network/session.py | PipSession, connection pool config (20/16), adapters |
| src/pip/_internal/network/cache.py | SafeFileCache (filesystem-based HTTP cache) |
| src/pip/_internal/network/download.py | Downloader (sequential batch downloads!) |
| src/pip/_internal/network/lazy_wheel.py | LazyZipOverHTTP for range-request metadata |
| src/pip/_internal/network/utils.py | Accept-Encoding: identity for downloads, chunk streaming |
| src/pip/_internal/operations/prepare.py | RequirementPreparer, metadata fetch chain |
| src/pip/_internal/resolution/resolvelib/factory.py | Metadata prefetch pool (8 threads), candidate building |
| src/pip/_internal/resolution/resolvelib/provider.py | Triggers dep prefetch in get_dependencies() |
| src/pip/_internal/resolution/resolvelib/resolver.py | Kicks off root requirement prefetch |
| src/pip/_internal/resolution/resolvelib/candidates.py | Thread-safe dist preparation with _prepare_lock |
| src/pip/_vendor/cachecontrol/adapter.py | CacheControlAdapter -- intercepts requests for caching |
| src/pip/_vendor/cachecontrol/controller.py | Cache logic: max-age=0 bypass, conditional headers, 304 handling |
7. Critical Finding: Sequential Download Is The Biggest Remaining Win
The single most impactful optimization remaining is parallelizing wheel downloads. After resolution completes, _complete_partial_requirements() downloads all wheels sequentially through Downloader.batch(). This is purely sequential I/O with no data dependencies between packages. With 40 packages averaging 500KB each at ~50ms per download, the sequential phase takes ~2 seconds. Parallelizing with 8 workers would reduce this to ~0.25 seconds -- a potential 15-25% total wall-time improvement depending on the workload.