# Pip I/O Layer Deep Analysis

Investigation date: 2026-04-08
Branch: `codeflash/optimize`
Investigator: Research agent

---

## 1. Request Flow Diagram

```
User: pip install <pkg>
        |
        v
Resolver (resolvelib)
  |
  +-- provider.get_dependencies(candidate)
  |     +-- prefetch_packages(dep_names)  [background threads]
  |
  +-- provider.find_matches(identifier)
  |     +-- factory.find_candidates()
  |           +-- finder.find_best_candidate(name)
  |                 +-- finder.find_all_candidates(name)
  |                       |
  |                       +-- [check _all_candidates cache]
  |                       +-- [check _prefetch_futures]
  |                       +-- _do_fetch_all_candidates(name)
  |                             |
  |                             +-- link_collector.collect_sources(name)
  |                             |     +-- search_scope.get_index_urls_locations(name)
  |                             |           # => ["https://pypi.org/simple/<name>/"]
  |                             |
  |                             +-- source.page_candidates()
  |                                   +-- process_project_url(url)
  |                                         |
  |                                         +-- link_collector.fetch_response(url)
  |                                         |     +-- _get_index_content(url)
  |                                         |           +-- _get_simple_response(url, session)
  |                                         |                 |
  |                                         |                 v
  |                                         |           session.get(url, headers={
  |                                         |               Accept: "application/vnd.pypi.simple.v1+json, ...",
  |                                         |               Cache-Control: "max-age=0"
  |                                         |           })
  |                                         |                 |
  |                                         |                 v
  |                                         |           CacheControlAdapter.send()
  |                                         |             +-- controller.cached_request()
  |                                         |             |     # max-age=0 => ALWAYS bypasses cache
  |                                         |             |     # Adds If-None-Match / If-Modified-Since
  |                                         |             +-- controller.conditional_headers()
  |                                         |             +-- HTTPAdapter.send()
  |                                         |             |     +-- urllib3.HTTPSConnectionPool.urlopen()
  |                                         |             |           +-- _get_conn()  [from pool queue]
  |                                         |             |           +-- TLS handshake (if new conn)
  |                                         |             |           +-- HTTP/1.1 GET request
  |                                         |             |           +-- _put_conn()  [return to pool]
  |                                         |             +-- controller.cache_response()
  |                                         |                   # Stores response w/ ETag for
  |                                         |                   # future conditional requests
  |                                         |
  |                                         +-- [JSON] _evaluate_json_page()
  |                                         +-- [HTML] parse_links() + evaluate_links()
  |
  +-- candidate.dist  [triggers metadata fetch]
        +-- _prepare()
              +-- preparer.prepare_linked_requirement()
                    +-- _fetch_metadata_only()
                          +-- [1] _fetch_metadata_using_link_data_attr()
                          |     # PEP 658: GET <url>.metadata
                          +-- [2] _fetch_metadata_using_lazy_wheel()
                          |     # HTTP Range requests on .whl
                          |     +-- LazyZipOverHTTP(url, session)
                          |           +-- session.head(url)  # get Content-Length
                          |           +-- _check_zip()       # range-fetch tail
                          |           +-- ZipFile(self)      # parse EOCD
                          +-- [3] Full download as fallback
```

### Request Count Per Package (typical PyPI resolution)

For each **unique package name** the resolver encounters:

1. **1 GET** for the index page (`/simple/<name>/`) -- conditional if cached
2. **1 GET** for metadata (PEP 658 `.metadata` file) -- or --
   **1 HEAD + 1-2 range GETs** for lazy-wheel metadata -- or --
   **1 GET** for a full wheel download as a fallback
3. **1 GET** for the actual wheel download (after resolution)

For a workload like `boto3` with ~40 transitive deps:

- ~40 index page GETs (conditional requests)
- ~40 metadata GETs (PEP 658 when available)
- ~40 wheel download GETs
- **Total: ~120 HTTP requests minimum**

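
The arithmetic above can be captured in a back-of-envelope estimator. This is a hypothetical helper, not pip code; `pep658_ratio` (the fraction of wheels that carry PEP 658 metadata) and the 3-request cost of the lazy-wheel path are assumptions taken from the counts in this section:

```python
def estimate_requests(n_packages: int, pep658_ratio: float = 1.0) -> dict:
    """Back-of-envelope request count for one cold-cache resolution.

    Assumes one index GET and one wheel GET per package; metadata costs
    one GET via PEP 658, or HEAD + ~2 range GETs via the lazy-wheel path.
    """
    pep658 = round(n_packages * pep658_ratio)
    lazy = n_packages - pep658
    index_gets = n_packages
    metadata_reqs = pep658 * 1 + lazy * 3  # lazy wheel: HEAD + 2 range GETs
    wheel_gets = n_packages
    return {
        "index": index_gets,
        "metadata": metadata_reqs,
        "wheels": wheel_gets,
        "total": index_gets + metadata_reqs + wheel_gets,
    }

# ~40 transitive deps, all with PEP 658 metadata -> ~120 requests
print(estimate_requests(40))
```

The 120-request floor for `boto3` assumes every wheel has PEP 658 metadata; each wheel that falls back to the lazy-wheel path adds two more round trips.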
---

## 2. Per-Area Findings

### 2.1 HTTP Request Flow

**How requests are serialized:**

- The resolver processes packages **sequentially** through resolvelib's `resolve()` loop
- Each `find_matches()` call triggers `find_all_candidates()`, which fetches the index page **synchronously** (unless prefetched)
- Each `get_dependencies()` call triggers `candidate.dist`, which fetches metadata **synchronously** (unless prefetched)

**Existing parallelism (two separate thread pools):**

1. **Index page prefetch** (`PackageFinder._prefetch_executor`): 16 worker threads
   - Triggered in `provider.get_dependencies()` for all discovered deps
   - Triggered in `resolver.resolve()` for all root requirements
   - Workers call `_do_fetch_all_candidates()`, which runs the full index fetch + evaluate pipeline
2. **Metadata prefetch** (`Factory._metadata_prefetch_executor`): 8 worker threads
   - Triggered in `_iter_found_candidates()` for the top candidate only
   - Workers access `candidate.dist`, which triggers the PEP 658 / lazy-wheel fetch

**Key finding: the two prefetch mechanisms are independent and individually effective, but they don't coordinate.** The metadata prefetch for package B can't start until B's index page fetch completes. There is no pipelining of "index fetch -> immediately prefetch top-candidate metadata."

**Redundant requests found:**

- `LazyZipOverHTTP.__init__()` sends a HEAD request (line 57 of lazy_wheel.py). When PEP 658 metadata is available, this HEAD is never sent -- the code tries PEP 658 first and only falls back to the lazy wheel -- so this path is correct, not redundant.
- However, the HEAD request in `LazyZipOverHTTP` is sent **even if the server doesn't support range requests for the wheel**, wasting one round trip before discovering this.
- `_get_simple_response()` sends a HEAD before the GET only if the URL looks like an archive (lines 120-121 of collector.py). This is a rare case and correctly guarded.

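
The dedupe-and-submit pattern described for the index-page prefetch pool can be modeled in a few lines. This is a toy sketch of the pattern, not pip's actual `PackageFinder` (class and method names here are hypothetical); it shows the two behaviors the text describes: one future per name, and `find_all_candidates()`-style consumers waiting on a pending future instead of refetching:

```python
from concurrent.futures import Future, ThreadPoolExecutor


class IndexPrefetcher:
    """Toy model of the prefetch pattern: one future per project name,
    deduplicated so repeat submissions are no-ops."""

    def __init__(self, fetch, max_workers: int = 16) -> None:
        self._fetch = fetch  # stands in for _do_fetch_all_candidates
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: dict[str, Future] = {}

    def prefetch(self, names) -> None:
        for name in names:
            if name not in self._futures:  # dedupe: first submit wins
                self._futures[name] = self._pool.submit(self._fetch, name)

    def result(self, name: str, timeout: float = 10.0):
        # Mirrors find_all_candidates() waiting on a pending future.
        if name in self._futures:
            return self._futures[name].result(timeout=timeout)
        return self._fetch(name)  # never prefetched: synchronous fallback


pf = IndexPrefetcher(fetch=lambda name: f"candidates-for-{name}")
pf.prefetch(["boto3", "botocore", "boto3"])  # the duplicate is ignored
print(pf.result("botocore"))
```

The real code additionally holds a lock around the dedupe check and creates the executor lazily, as shown in section 2.7.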

### 2.2 Connection Reuse & Pooling

**Current configuration (session.py lines 388-389):**

```python
_pool_connections = 20  # urllib3 PoolManager caches pools for 20 distinct hosts
_pool_maxsize = 16      # each pool keeps up to 16 idle connections
```

**Analysis:**

- The pool is correctly sized for the 16 prefetch workers
- `pool_block=False` (the default) means excess connections proceed but aren't returned to the pool
- **Connections ARE reused** for same-host requests through urllib3's `HTTPSConnectionPool._get_conn()` / `_put_conn()` mechanism
- HTTP/1.1 keep-alive works by default (urllib3 uses persistent connections)
- The connection pool is keyed per (host, port, scheme), so `pypi.org:443` and `files.pythonhosted.org:443` each get their own pool
- **A typical pip install touches only 2-3 hosts**: `pypi.org` (index pages), `files.pythonhosted.org` (wheel downloads, metadata), and possibly an extra index. A pool of 20 is more than adequate.

**TLS handshake analysis:**

- A TLS handshake happens **once per connection** (not per request)
- With `pool_maxsize=16`, up to 16 connections are kept alive per host
- The 16 prefetch threads can each hold a connection, so in theory all 16 reuse their TLS sessions
- **Risk:** if more than 16 requests fire concurrently to the same host, excess connections are created and then **discarded** (not pooled), causing extra TLS handshakes. With `pool_block=False`, the requests proceed, but each excess connection is thrown away after use.

**Finding:** The pool is sized well for the current prefetch concurrency. No wasted TLS handshakes under normal operation.

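
The `_get_conn()` / `_put_conn()` mechanism and the `pool_block=False` overflow behavior can be modeled with a LIFO queue. This is a minimal sketch with hypothetical names, not urllib3's actual implementation; connection creation here stands in for a TCP + TLS handshake:

```python
import queue


class TinyConnectionPool:
    """Minimal model of urllib3-style pooling: a LIFO queue of idle
    connections capped at maxsize; overflow connections are discarded
    rather than pooled (the pool_block=False behavior described above)."""

    def __init__(self, maxsize: int = 16) -> None:
        self._idle = queue.LifoQueue(maxsize=maxsize)
        self.created = 0  # each creation stands in for a TCP+TLS handshake

    def get_conn(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            self.created += 1               # the "handshake" happens here
            return object()

    def put_conn(self, conn) -> None:
        try:
            self._idle.put_nowait(conn)     # return to the pool
        except queue.Full:
            pass                            # overflow: connection dropped


pool = TinyConnectionPool(maxsize=2)
conns = [pool.get_conn() for _ in range(3)]  # 3 concurrent users -> 3 handshakes
for c in conns:
    pool.put_conn(c)                         # only 2 fit back into the pool
reused = pool.get_conn()                     # no new handshake needed
print(pool.created)
```

With `maxsize=2` and three concurrent borrowers, the third connection is created, used, and then dropped at `put_conn()` time, which is exactly the "excess handshakes" risk noted above.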
### 2.3 Caching Layer

**How CacheControl works with pip:**

1. **Index pages (`/simple/<name>/`):** sent with a `Cache-Control: max-age=0` header
   - The CacheControl controller sees `max-age=0` and **always bypasses the cache** (controller.py lines 184-186)
   - But it adds conditional headers (`If-None-Match`, `If-Modified-Since`) via `conditional_headers()`
   - On 304 Not Modified, the cached response is served (no body transfer)
   - On 200, the response is cached with its ETag for next time
   - **This is working as intended** -- it ensures freshness while avoiding re-downloading unchanged index pages
2. **Package downloads (wheels, sdists):** sent via `Downloader._http_get()` with `Accept-Encoding: identity`
   - No `Cache-Control: max-age=0` header on these requests
   - CacheControl can serve fully cached responses for packages that haven't changed
   - `SafeFileCache` stores metadata + body as separate files on disk
   - The cache key is the full URL (after normalization)
3. **PEP 658 metadata files:** fetched via `get_http_url()` using the Downloader
   - Same caching behavior as package downloads
   - Small files (~5-50KB), cached effectively
4. **Lazy-wheel range requests:** sent with `Cache-Control: no-cache`
   - **Explicitly bypasses caching** (lazy_wheel.py line 180)
   - This is correct -- range requests for ZIP metadata shouldn't be cached as full responses

**Cache efficiency finding:**

- The `max-age=0` on index pages means **every resolution always incurs at least one conditional round trip per package**. This is the single biggest I/O constraint for warm-cache scenarios.
- For a `pip install --upgrade` with a warm cache, all 40 index page requests still go to the network (as conditional GETs), but most return 304 with no body. Each 304 round trip costs ~50-100ms (RTT to pypi.org).
- **Total warm-cache overhead: 40 * ~80ms = ~3.2 seconds** if the conditional GETs run serially (partially hidden by prefetch parallelism).

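
The conditional-request flow above (send the stored ETag, serve the cached body on 304, refresh the cache on 200) reduces to a small piece of logic. This is a sketch of the idea with hypothetical helper names and a plain-dict cache, not the vendored CacheControl controller's API:

```python
def conditional_headers(cache: dict, url: str) -> dict:
    """Headers a CacheControl-style revalidation would add for `url`.
    `cache` maps url -> {"etag": ..., "body": ...}."""
    entry = cache.get(url)
    return {"If-None-Match": entry["etag"]} if entry else {}


def handle_response(cache: dict, url: str, status: int,
                    body: bytes = b"", etag: str = "") -> bytes:
    if status == 304:
        # Not modified: serve the cached body; no body was transferred.
        return cache[url]["body"]
    # 200: store the body with its ETag for the next conditional GET.
    cache[url] = {"etag": etag, "body": body}
    return body


cache: dict = {}
url = "https://pypi.org/simple/boto3/"
handle_response(cache, url, 200, b'{"files": []}', etag='"abc123"')
print(conditional_headers(cache, url))   # the next GET carries If-None-Match
print(handle_response(cache, url, 304))  # 304 -> body served from cache
```

Note that even on the cheap 304 path, the client still pays one full round trip per URL, which is the warm-cache overhead quantified above.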
### 2.4 Metadata Fetching

**Fallback chain (prepare.py `_fetch_metadata_only()`):**

1. **PEP 658 metadata** (`_fetch_metadata_using_link_data_attr()`):
   - Checks `link.metadata_link()` -- the link must have a `data-dist-info-metadata` or `core-metadata` attribute
   - If present, downloads the separate `.metadata` file (tiny: 5-50KB)
   - **PyPI supports PEP 658** for wheels uploaded after ~2023
   - This is the fastest path: a single small GET
2. **Lazy wheel** (`_fetch_metadata_using_lazy_wheel()`):
   - Requires the `--use-feature=fast-deps` flag
   - Sends a HEAD to get Content-Length and check Accept-Ranges
   - Downloads the tail of the wheel (the ZIP end-of-central-directory record) via range requests
   - Parses the ZIP to find the METADATA file, then downloads just that range
   - **2-4 HTTP requests per wheel** (HEAD + 1-3 range GETs)
   - Has a `_lazy_wheel_cache` to avoid redundant range requests for the same URL
3. **Full download** (fallback):
   - Downloads the entire wheel/sdist
   - For wheels: extracts metadata from the archive
   - For sdists: runs `setup.py egg_info` or a `pyproject.toml` build
   - **The most expensive path**

**Key finding:** PEP 658 is the dominant path for PyPI packages. The speculative metadata prefetch (factory.py) eagerly builds the top candidate and submits a background thread to fetch its metadata, overlapping metadata I/O with resolution logic.

**Optimization in place:** `_lazy_wheel_cache` (prepare.py line 288) prevents duplicate range requests when a package is evaluated with different extras (e.g., `pkg` and `pkg[extra]`).

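
The lazy-wheel trick works because `ZipFile` only needs a seekable file object: it reads the end-of-central-directory record and the central directory, then seeks straight to the one member it wants. The sketch below demonstrates that with an in-memory stand-in for `LazyZipOverHTTP` (the `CountingFile` class is hypothetical); over HTTP, each `read()` would become a Range request:

```python
import io
import zipfile


class CountingFile(io.BytesIO):
    """Seekable file that counts how many bytes are actually read --
    a stand-in for LazyZipOverHTTP's range-request reads."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, n=-1) -> bytes:
        chunk = super().read(n)
        self.bytes_read += len(chunk)
        return chunk


# Build a "wheel": one large member plus a tiny METADATA file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg/big_module.py", b"\0" * 1_000_000)
    zf.writestr("pkg-1.0.dist-info/METADATA", b"Name: pkg\nVersion: 1.0\n")

remote = CountingFile(buf.getvalue())
with zipfile.ZipFile(remote) as zf:  # parses EOCD + central directory only
    metadata = zf.read("pkg-1.0.dist-info/METADATA")

print(len(buf.getvalue()), remote.bytes_read)  # ~1MB file, only a few KB read
```

The large member is never touched: only the directory structures and the METADATA member are read, which is why a handful of range GETs can replace a full wheel download.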
### 2.5 DNS & TLS

**DNS resolution:**

- urllib3 delegates to Python's `socket.create_connection()`, which calls `getaddrinfo()`
- **No DNS caching in urllib3 or pip** -- both rely on the OS-level DNS cache
- However, connection pooling effectively caches DNS results because connections persist
- With 16 pooled connections to `pypi.org`, DNS is resolved at most once per connection creation

**TLS handshakes:**

- One TLS handshake per connection (not per request)
- Connection pooling limits handshakes to `pool_maxsize` (16) per host
- Python's `ssl` module handles TLS session resumption at the OpenSSL level
- `_SSLContextAdapterMixin` (session.py line 255) properly forwards the SSL context to pools

**Finding:** DNS and TLS are not significant bottlenecks. The connection pool effectively amortizes both costs. Pre-warming is not needed here because the first batch of prefetch requests creates all needed connections.

### 2.6 HTTP/2 and Protocol

**Current state: pip uses HTTP/1.1 exclusively.**

- The vendored `urllib3` (1.x/2.x line) does not support HTTP/2
- The vendored `requests` library has no HTTP/2 support
- There are **no references** to HTTP/2, h2, or hyper anywhere in pip's codebase

**Would HTTP/2 help?**

- **Index page fetches:** HTTP/2 multiplexing would allow sending all ~40 index page requests over a **single TCP connection** to pypi.org. Currently, each of the 16 prefetch threads uses its own connection. With HTTP/2, one connection handles all requests, eliminating 15 TLS handshakes and reducing head-of-line blocking.
- **Metadata fetches:** similarly multiplexed over the same connection.
- **Package downloads:** less benefit -- these are large sequential downloads.

**Estimated benefit:** For index-heavy workloads (many small packages), HTTP/2 could reduce connection-setup overhead by ~90% and improve throughput by 20-30% through multiplexing.

**What it would take:**

- Replace the vendored `requests`/`urllib3` stack with `httpx` (which supports HTTP/2 via `h2`) or add `h2` support to urllib3
- A major architectural change -- it affects all of pip's network layer
- PyPI's CDN (Fastly) already supports HTTP/2

### 2.7 Parallel I/O Architecture

**Index page prefetch (PackageFinder):**

```python
# package_finder.py lines 1535-1556 (abridged)
def prefetch_packages(self, project_names):
    with self._prefetch_lock:
        for name in project_names:
            if name in self._all_candidates or name in self._prefetch_futures:
                continue
            if self._prefetch_executor is None:
                self._prefetch_executor = ThreadPoolExecutor(max_workers=16)
            self._prefetch_futures[name] = self._prefetch_executor.submit(
                self._do_fetch_all_candidates, name
            )
```

- Called from two places:
  1. `resolver.resolve()` -- submits all root requirements upfront
  2. `provider.get_dependencies()` -- submits all discovered deps
- Workers run `_do_fetch_all_candidates()`, which does the full pipeline:
  collect_sources -> fetch_response -> parse/evaluate
- Results are cached in the `_all_candidates` dict
- `find_all_candidates()` checks pending futures with a 10s timeout

**Metadata prefetch (Factory):**

```python
# factory.py lines 188-245 (abridged sketch; build_func and link come
# from the surrounding candidate-construction code)
def _prefetch_top_candidate_metadata(self, name, top_info, extras, template):
    # Build the top candidate eagerly (cheap: wheel-cache lookup)
    candidate = build_func()
    # Only prefetch for remote wheels
    if link.is_file or not link.is_wheel:
        return

    def _do_prefetch():
        candidate.dist  # triggers prepare_linked_requirement()

    # Submit to the 8-thread pool
    self._metadata_prefetch_executor.submit(_do_prefetch)
```

**Serialization points that force sequential I/O:**

1. **resolvelib's main loop is single-threaded.** Each round processes one package at a time. Even with prefetching, the resolver can only consume one result at a time.
2. **`_complete_partial_requirements()`** (prepare.py line 474) downloads all "needs more preparation" requirements **sequentially** via `self._download.batch()` -- which is just a for loop, not actually batched or parallel.
3. **The `Downloader.batch()` method** (download.py lines 179-184) is misleadingly named -- it's a sequential for loop:

```python
def batch(self, links, location):
    for link in links:
        filepath, content_type = self(link, location)
        yield link, (filepath, content_type)
```

**This is a significant finding.** All final wheel downloads happen sequentially.

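
For contrast with the sequential `batch()` above, a parallelized version can keep the same `(link, (filepath, content_type))` generator contract. This is a standalone sketch, not pip code: `download` stands in for `Downloader.__call__`, and `Executor.map` preserves input order, so the pairing with `links` stays correct even when downloads finish out of order:

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_batch(download, links, location, max_workers: int = 8):
    """Drop-in-shaped sketch for Downloader.batch(): same (link, result)
    pairs, fetched concurrently. `download(link, location)` is assumed to
    return (filepath, content_type), as in the sequential version."""
    links = list(links)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map yields results in submission order, so zip stays aligned.
        yield from zip(links, pool.map(lambda link: download(link, location), links))


# Demo with a fake downloader (no network).
fake_download = lambda link, location: (f"{location}/{link}.whl", "application/zip")
results = list(parallel_batch(fake_download, ["a", "b", "c"], "/tmp"))
print(results[0])
```

The real change would also need progress-bar coordination and error propagation, as noted in section 3.1.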
### 2.8 Response Compression

**Index page requests:**

- `_get_simple_response()` in collector.py sets custom `Accept` headers but does NOT set `Accept-Encoding`
- The requests library's default `Accept-Encoding` header is `gzip, deflate` (from urllib3's `ACCEPT_ENCODING = "gzip,deflate"`, applied by requests' `default_headers()`)
- **Index pages ARE compressed** by PyPI/Fastly with gzip; the requests library transparently decompresses them
- No brotli support (that would require the `brotli` or `brotlicffi` package)

**Package downloads:**

- `Downloader._http_get()` uses `HEADERS = {"Accept-Encoding": "identity"}` (utils.py line 26)
- **Package downloads explicitly disable compression.** This is intentional -- packages are already compressed archives (wheels are ZIP files, sdists are .tar.gz). Re-compressing would waste CPU, and transparent decompression would break hash verification.
- `response_chunks()` uses `decode_content=False` to preserve raw bytes for hash checking

**Finding:** Compression is correctly handled. Index pages use gzip (transparent). Packages disable compression (correct). No improvement opportunity here.

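
The payoff of transparent gzip on index pages is easy to demonstrate: Simple-API JSON is highly repetitive (filenames, URLs, hash hex strings), so it compresses very well. The page shape below is synthetic, assumed for illustration only:

```python
import gzip
import json

# A synthetic Simple-API-like JSON index page (shape assumed, not real data).
page = json.dumps({
    "name": "boto3",
    "files": [
        {
            "filename": f"boto3-1.28.{i}-py3-none-any.whl",
            "url": f"https://files.pythonhosted.org/packages/boto3-1.28.{i}-py3-none-any.whl",
            "hashes": {"sha256": "0" * 64},
        }
        for i in range(500)
    ],
}).encode()

compressed = gzip.compress(page)
print(len(page), len(compressed), f"{len(compressed) / len(page):.0%}")
```

Wheels and sdists, by contrast, are already deflate-compressed archives, which is why `Accept-Encoding: identity` on downloads is the right call: there is little left to squeeze, and the raw bytes must survive for hash verification.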
### 2.9 Lazy/Streaming Approaches

**Current behavior:**

- Index pages: `response.content` (collector.py line 309) reads the entire response into memory before parsing
- JSON index pages can be 50KB-2MB for popular packages (e.g., boto3 has ~12,000 file entries)
- HTML index pages are similar in size

**Streaming opportunity:**

- JSON index pages COULD be streamed using an incremental JSON parser (e.g., `ijson`)
- However, `json.loads()` on a 1MB string takes ~5ms -- negligible compared to the ~80ms network round trip
- The real cost is not parsing but **candidate evaluation** -- the `_evaluate_json_page()` fast path already handles this efficiently with a single-pass fused pipeline

**Early-abort opportunity:**

- When the resolver only needs the "best" (newest compatible) version, we could theoretically abort after finding it
- **Problem:** the index page must be fully fetched before all versions are known (PyPI offers no streaming API)
- The speculative metadata prefetch already covers this case by eagerly fetching metadata for the top candidate

**Finding:** Streaming/early abort offers negligible benefit for index pages because network latency dominates. The JSON parsing is already fast.

### 2.10 PyPI-Specific Optimizations

**Bulk/batch APIs:**

- PyPI has no bulk metadata API (no way to get metadata for 40 packages in one request)
- The Simple Repository API (PEP 503/691) is package-by-package
- There is no "dependency tree" API that would let pip skip index page fetches

**CDN-level optimizations already in use:**

- `Cache-Control: max-age=0` with conditional requests (ETags/Last-Modified) -- implemented
- PyPI responses include strong ETags
- 304 responses save bandwidth but still cost one RTT each

**JSON API:**

- pip already prefers the JSON Simple API (`application/vnd.pypi.simple.v1+json`) via Accept header priority
- The JSON path (`_evaluate_json_page()`) is heavily optimized with fused evaluation
- PyPI's JSON API doesn't support partial responses or field selection

**Server push / Link preload:**

- PyPI doesn't support HTTP/2 Server Push for metadata files
- Even with HTTP/2, the server can't know which wheel the client will pick

---

## 3. Optimization Ideas (Ranked by Expected Impact)

### Tier 1: High Impact (10-30% wall-time reduction)

#### 3.1 Parallel Wheel Downloads

**What:** Replace the sequential `Downloader.batch()` for loop with a `concurrent.futures.ThreadPoolExecutor`.

**Where:** `src/pip/_internal/network/download.py` lines 179-184 and `src/pip/_internal/operations/prepare.py` lines 492-493.

**Why:** After resolution completes, all wheels are downloaded sequentially. For 40 packages, this is 40 sequential HTTP GETs. Parallelizing would overlap download + write for multiple packages.

**Expected improvement:** 15-25% of total wall time for download-heavy workloads. With 8 parallel downloads, the download phase shrinks from ~40 * avg_time to ~5 * avg_time.

**Complexity:** Medium. Need to handle progress-bar display for parallel downloads and ensure thread safety.

**Risk:** Low -- downloads are independent operations.

**pip-only change:** Yes.

#### 3.2 Pipeline Index Fetch + Metadata Prefetch

**What:** When an index page prefetch completes, immediately trigger metadata prefetch for the top candidate -- don't wait for the resolver to consume the index result.

**Where:** `src/pip/_internal/index/package_finder.py` -- `_do_fetch_all_candidates()` should call `factory._prefetch_top_candidate_metadata()` at the end.

**Why:** Currently there is a gap between index fetch completion and metadata prefetch submission: the metadata prefetch only fires when the resolver calls `_iter_found_candidates()`. This gap can be 100ms-2s depending on how fast the resolver processes.

**Expected improvement:** 5-15% for resolution-heavy workloads. Eliminates the serial gap between "index data ready" and "metadata fetch starts."

**Complexity:** Medium. Requires threading coordination between PackageFinder and Factory; the PackageFinder would need a reference to the Factory (it currently has none).

**Risk:** Low-medium -- need to ensure thread safety for the candidate cache.

**pip-only change:** Yes.

#### 3.3 Increase Metadata Prefetch Depth

**What:** Prefetch metadata for the top N candidates (not just the top one), and prefetch for ALL packages whose index is ready (not just when the resolver asks).

**Where:** `src/pip/_internal/resolution/resolvelib/factory.py` -- `_prefetch_top_candidate_metadata()`.

**Why:** The resolver sometimes backtracks and needs the 2nd or 3rd candidate. Currently only the top candidate's metadata is prefetched. Prefetching the top 2-3 would prevent serial metadata fetches during backtracking.

**Expected improvement:** 3-8% for workloads with backtracking.

**Complexity:** Low.

**Risk:** Low. Wastes some bandwidth on metadata that may not be needed, but metadata files are tiny (5-50KB).

**pip-only change:** Yes.

### Tier 2: Medium Impact (5-15% wall-time reduction)

#### 3.4 HTTP/2 Support via httpx

**What:** Replace the `requests` + `urllib3` stack with `httpx`, which supports HTTP/2 multiplexing.

**Why:** With HTTP/2, all index page requests and metadata fetches to pypi.org can be multiplexed over a single TCP connection. This eliminates 15 extra TLS handshakes and allows the server to interleave responses.

**Expected improvement:** 10-20% for cold-cache workloads (fewer TLS handshakes, multiplexed requests). Less impact for warm-cache runs (304 responses are already small).

**Complexity:** Very high. A fundamental change to pip's network layer that would affect caching, authentication, proxies, and all adapters.

**Risk:** High -- potential for regressions across pip's extensive networking surface.

**pip-only change:** Yes, but a major architectural change.

#### 3.5 Conditional Request Short-Circuit for Index Pages

**What:** For warm-cache scenarios, batch all conditional index page requests into concurrent futures BEFORE the resolver starts, rather than firing them lazily.

**Where:** Before calling `resolver.resolve()`, pre-submit conditional GETs for ALL packages known from a lock file or a previous resolution.

**Why:** Currently, prefetch only fires as the resolver discovers dependencies. If pip could predict the dependency set (from a lock file or a previous run), all ~40 conditional GETs could be fired simultaneously.

**Expected improvement:** 5-10% for warm-cache repeat installs. Turns ~3.2s of serial conditional GETs into <0.5s of parallel ones.

**Complexity:** Medium. Needs a mechanism to predict the package set (lock file, cache of the previous resolution result).

**Risk:** Low -- conditional GETs are safe to fire speculatively.

**pip-only change:** Yes.

#### 3.6 Connection Pre-warming

**What:** Open TLS connections to pypi.org and files.pythonhosted.org at session creation time, before any requests.

**Where:** `src/pip/_internal/network/session.py` -- `PipSession.__init__()`.

**Why:** The first request to each host pays the TCP + TLS handshake cost (~100-200ms). Pre-warming during argument parsing / environment setup overlaps this with CPU work.

**Expected improvement:** 2-5% (saves a ~200ms one-time cost).

**Complexity:** Low.

**Risk:** Low -- harmless if the connections go unused (they just time out).

**pip-only change:** Yes.

### Tier 3: Low Impact (1-5% wall-time reduction)

#### 3.7 Cache Index ETags In-Memory Across Packages

**What:** After the first conditional GET returns an ETag for `pypi.org/simple/`, cache the server's response pattern in memory. Some CDNs return the same 304 pattern for all resources of the same age.

**Expected improvement:** Negligible (<1%). The conditional request still requires a round trip.

**pip-only change:** Yes.

#### 3.8 Brotli Compression for Index Pages

**What:** Add `brotli` or `brotlicffi` as an optional dependency so index page responses can be compressed with brotli (a better compression ratio than gzip).

**Why:** Brotli can compress JSON index pages 20-30% smaller than gzip, reducing transfer time for large index pages.

**Expected improvement:** 1-3% for cold-cache scenarios. Index pages are typically 50KB-2MB; brotli saves ~30% of that transfer.

**Complexity:** Low. Just add the dependency; urllib3/requests will then advertise brotli support.

**Risk:** Low. Optional dependency with a gzip fallback.

**pip-only change:** Yes.

---

## 4. Quick Wins (< 50 lines of code)

### QW1: Parallel Wheel Downloads (the biggest quick win)

**File:** `src/pip/_internal/operations/prepare.py` -- `_complete_partial_requirements()`

**Change:** Replace the sequential `self._download.batch()` loop with `ThreadPoolExecutor.map()`:

```python
# Current (sequential):
batch_download = self._download.batch(links_to_fully_download.keys(), temp_dir)
for link, (filepath, _) in batch_download:
    ...

# Proposed (parallel):
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as pool:
    results = pool.map(
        lambda link: (link, self._download(link, temp_dir)),
        links_to_fully_download.keys(),
    )
    for link, (filepath, _) in results:
        ...
```

**Lines:** ~15 changed

**Impact:** 15-25% wall-time reduction on download-heavy workloads

### QW2: Pipeline Index + Metadata Prefetch

**File:** `src/pip/_internal/index/package_finder.py` -- `_do_fetch_all_candidates()`

**Change:** After building the candidate list, immediately trigger metadata prefetch for the top candidate if a factory callback is registered:

```python
# At the end of _do_fetch_all_candidates:
if self._metadata_prefetch_callback and self._all_candidates[project_name]:
    self._metadata_prefetch_callback(
        project_name, self._all_candidates[project_name]
    )
```

**Lines:** ~20 changed (add callback registration + invocation)

**Impact:** 5-15% for resolution-heavy workloads

### QW3: Connection Pre-warming

**File:** `src/pip/_internal/network/session.py`

**Change:** Add a `prewarm()` method that opens connections to known hosts in background threads:

```python
def prewarm(self, urls: list[str]) -> None:
    """Open TCP+TLS connections in the background to reduce first-request latency."""
    from concurrent.futures import ThreadPoolExecutor

    def _warm(url):
        try:
            self.head(url, timeout=5)
        except Exception:
            pass

    # Note: no `with` block here -- the context manager would wait for the
    # HEADs to finish, defeating the purpose of warming in the background.
    pool = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        pool.submit(_warm, url)
    pool.shutdown(wait=False)
```

**Lines:** ~15

**Impact:** 2-5% (saves ~200ms of startup)

### QW4: Prefetch Top 2-3 Candidates' Metadata

**File:** `src/pip/_internal/resolution/resolvelib/factory.py`

**Change:** In `_iter_found_candidates()`, prefetch metadata for the top 2-3 candidates instead of just the top one:

```python
# Current: prefetch only infos_list[0]
# Proposed: prefetch infos_list[:3]
for info in infos_list[:3]:
    self._prefetch_top_candidate_metadata(name, info, extras, template)
```

**Lines:** ~10 changed

**Impact:** 3-8% for workloads with backtracking

---

## 5. Big Bets (Architectural Changes for 20%+ Improvement)

### BB1: Fully Parallel Resolution Pipeline

**Description:** Replace the sequential resolvelib loop with a resolution architecture where ALL I/O is fully parallel. When the resolver needs data for package X, it doesn't block -- it queues the need and processes another package; when the I/O completes, the resolver is notified.

**Mechanism:** This is essentially an async resolver. It could be implemented with:

- an asyncio event loop driving the resolver
- an `aiohttp` or `httpx` async client for HTTP
- resolvelib with a coroutine-based provider

**Expected improvement:** 30-50% for large dependency trees. Eliminates all serial I/O gaps.

**Complexity:** Very high. A fundamental architectural change to pip's resolver integration.

**Risk:** High -- resolvelib is synchronous by design.

### BB2: HTTP/2 Multiplexing

**Description:** Replace the vendored `requests` + `urllib3` with `httpx` (which supports HTTP/2 via h2).

**Expected improvement:** 20-30% for cold-cache workloads. All requests to pypi.org multiplex over one connection, with no head-of-line blocking between index page requests.

**Complexity:** Very high. A 500+ line change touching all network code.

**Risk:** High.

### BB3: Dependency Prediction + Bulk Prefetch

**Description:** Maintain a local cache of the "last resolved dependency tree" per project. On the next `pip install`, immediately fire all index page + metadata prefetch requests for the predicted set BEFORE the resolver starts.

**Expected improvement:** 20-40% for repeat installs. Instead of discovering dependencies one by one through resolution, fire all 40+ conditional GETs simultaneously at startup.

**Complexity:** Medium-high. Needs a prediction cache format, staleness detection, and graceful handling of prediction misses.

**Risk:** Medium. Wrong predictions waste bandwidth but don't cause correctness issues.

### BB4: Server-Side Dependency Resolution API

**Description:** Propose a PyPI API extension that accepts a requirements list and returns the resolved dependency tree (with all metadata). One HTTP request replaces 120+.

**Expected improvement:** 50-80% for cold-cache scenarios. Eliminates all per-package round trips.

**Complexity:** Very high. Requires PyPI server cooperation, a PEP process, etc.

**Risk:** High. Requires ecosystem buy-in; a fallback to current behavior is needed.

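
BB3's prediction cache can be sketched concretely. Everything below is hypothetical (class name, file format, key scheme) -- the point is just the shape: key the last resolved set by a hash of the root requirements, return it on the next run so all prefetches can fire upfront, and degrade to normal discovery on a miss:

```python
import hashlib
import json
import tempfile
from pathlib import Path


class PredictionCache:
    """Sketch of BB3: remember the last resolved package set, keyed by a
    hash of the sorted root requirements. Format and paths are hypothetical."""

    def __init__(self, path: Path) -> None:
        self._path = path

    @staticmethod
    def _key(requirements: list[str]) -> str:
        return hashlib.sha256("\n".join(sorted(requirements)).encode()).hexdigest()

    def predict(self, requirements: list[str]) -> list[str]:
        try:
            data = json.loads(self._path.read_text())
        except (OSError, ValueError):
            return []  # cache miss: no prediction, fall back to discovery
        return data.get(self._key(requirements), [])

    def record(self, requirements: list[str], resolved: list[str]) -> None:
        try:
            data = json.loads(self._path.read_text())
        except (OSError, ValueError):
            data = {}
        data[self._key(requirements)] = sorted(resolved)
        self._path.write_text(json.dumps(data))


cache = PredictionCache(Path(tempfile.mkdtemp()) / "predictions.json")
cache.record(["boto3"], ["boto3", "botocore", "s3transfer", "urllib3"])
print(cache.predict(["boto3"]))     # prefetch all four immediately
print(cache.predict(["requests"]))  # [] -> normal one-by-one discovery
```

Since the prediction only drives speculative conditional GETs, a stale or wrong entry costs bandwidth but never correctness, which keeps the risk profile low.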
---

## 6. Summary of Key Files

| File | Role |
|------|------|
| `src/pip/_internal/index/collector.py` | Fetches index pages, parses HTML/JSON |
| `src/pip/_internal/index/package_finder.py` | Evaluates candidates, manages prefetch pool (16 threads) |
| `src/pip/_internal/network/session.py` | PipSession, connection pool config (20/16), adapters |
| `src/pip/_internal/network/cache.py` | SafeFileCache (filesystem-based HTTP cache) |
| `src/pip/_internal/network/download.py` | Downloader (sequential batch downloads!) |
| `src/pip/_internal/network/lazy_wheel.py` | LazyZipOverHTTP for range-request metadata |
| `src/pip/_internal/network/utils.py` | `Accept-Encoding: identity` for downloads, chunk streaming |
| `src/pip/_internal/operations/prepare.py` | RequirementPreparer, metadata fetch chain |
| `src/pip/_internal/resolution/resolvelib/factory.py` | Metadata prefetch pool (8 threads), candidate building |
| `src/pip/_internal/resolution/resolvelib/provider.py` | Triggers dep prefetch in `get_dependencies()` |
| `src/pip/_internal/resolution/resolvelib/resolver.py` | Kicks off root requirement prefetch |
| `src/pip/_internal/resolution/resolvelib/candidates.py` | Thread-safe dist preparation with `_prepare_lock` |
| `src/pip/_vendor/cachecontrol/adapter.py` | CacheControlAdapter -- intercepts requests for caching |
| `src/pip/_vendor/cachecontrol/controller.py` | Cache logic: max-age=0 bypass, conditional headers, 304 handling |

## 7. Critical Finding: Sequential Download Is the Biggest Remaining Win

The single most impactful remaining optimization is **parallelizing wheel downloads**. After resolution completes, `_complete_partial_requirements()` downloads all wheels sequentially through `Downloader.batch()`. This is purely sequential I/O with no data dependencies between packages. With 40 packages averaging 500KB each at ~50ms per download, the sequential phase takes ~2 seconds; parallelizing with 8 workers would cut it to ~0.25 seconds -- a potential 15-25% total wall-time improvement depending on the workload.