# Pip I/O Layer Deep Analysis
- Investigation date: 2026-04-08
- Branch: `codeflash/optimize`
- Investigator: Research agent
---
## 1. Request Flow Diagram
```
User: pip install <pkg>
|
v
Resolver (resolvelib)
|
+-- provider.get_dependencies(candidate)
| +-- prefetch_packages(dep_names) [background threads]
|
+-- provider.find_matches(identifier)
| +-- factory.find_candidates()
| +-- finder.find_best_candidate(name)
| +-- finder.find_all_candidates(name)
| |
| +-- [check _all_candidates cache]
| +-- [check _prefetch_futures]
| +-- _do_fetch_all_candidates(name)
| |
| +-- link_collector.collect_sources(name)
| | +-- search_scope.get_index_urls_locations(name)
| | # => ["https://pypi.org/simple/<name>/"]
| |
| +-- source.page_candidates()
| +-- process_project_url(url)
| |
| +-- link_collector.fetch_response(url)
| | +-- _get_index_content(url)
| | +-- _get_simple_response(url, session)
| | |
| | v
| | session.get(url, headers={
| | Accept: "application/vnd.pypi.simple.v1+json, ...",
| | Cache-Control: "max-age=0"
| | })
| | |
| | v
| | CacheControlAdapter.send()
| | +-- controller.cached_request()
| | | # max-age=0 => ALWAYS bypasses cache
| | | # Adds If-None-Match / If-Modified-Since
| | +-- controller.conditional_headers()
| | +-- HTTPAdapter.send()
| | +-- urllib3.HTTPSConnectionPool.urlopen()
| | +-- _get_conn() [from pool queue]
| | +-- TLS handshake (if new conn)
| | +-- HTTP/1.1 GET request
| | +-- _put_conn() [return to pool]
| | +-- controller.cache_response()
| | # Stores response w/ ETag for
| | # future conditional requests
| |
| +-- [JSON] _evaluate_json_page()
| +-- [HTML] parse_links() + evaluate_links()
|
+-- candidate.dist [triggers metadata fetch]
+-- _prepare()
+-- preparer.prepare_linked_requirement()
+-- _fetch_metadata_only()
| +-- [1] _fetch_metadata_using_link_data_attr()
| | # PEP 658: GET <url>.metadata
| +-- [2] _fetch_metadata_using_lazy_wheel()
| # HTTP Range requests on .whl
| +-- LazyZipOverHTTP(url, session)
| +-- session.head(url) # get Content-Length
| +-- _check_zip() # range-fetch tail
| +-- ZipFile(self) # parse EOCD
+-- [3] Full download as fallback
```
### Request Count Per Package (typical PyPI resolution)
For each **unique package name** the resolver encounters:
1. **1 GET** for the index page (`/simple/<name>/`) -- conditional if cached
2. **1 GET** for metadata (PEP 658 `.metadata` file) -- OR --
**1 HEAD + 1-2 Range GETs** for lazy wheel metadata -- OR --
**1 GET** full wheel download as fallback
3. **1 GET** for the actual wheel download (after resolution)
For a workload like `boto3` with ~40 transitive deps:
- ~40 index page GETs (conditional requests)
- ~40 metadata GETs (PEP 658 when available)
- ~40 wheel download GETs
- **Total: ~120 HTTP requests minimum**
---
## 2. Per-Area Findings
### 2.1 HTTP Request Flow
**How requests are serialized:**
- The resolver processes packages **sequentially** through resolvelib's `resolve()` loop
- Each `find_matches()` call triggers `find_all_candidates()`, which fetches the index page **synchronously** (unless prefetched)
- Each `get_dependencies()` call triggers `candidate.dist`, which fetches metadata **synchronously** (unless prefetched)
**Existing parallelism (two separate thread pools):**
1. **Index page prefetch** (`PackageFinder._prefetch_executor`): 16 worker threads
- Triggered in `provider.get_dependencies()` for all discovered deps
- Triggered in `resolver.resolve()` for all root requirements
- Workers call `_do_fetch_all_candidates()` which does the full index fetch + evaluate pipeline
2. **Metadata prefetch** (`Factory._metadata_prefetch_executor`): 8 worker threads
- Triggered in `_iter_found_candidates()` for the top candidate only
- Workers call `candidate.dist` which triggers PEP 658 / lazy wheel
**Key finding: The two prefetch mechanisms are independent and both effective, but they don't coordinate.** The metadata prefetch for package B can't start until B's index page fetch completes. There is no pipelining of "index fetch -> immediately prefetch top candidate metadata."
**Redundant requests found:**
- `LazyZipOverHTTP.__init__()` sends a HEAD request (line 57 of lazy_wheel.py). Because the code tries PEP 658 first and only falls back to lazy wheel, this HEAD is never sent when PEP 658 metadata is available -- so it is not actually redundant.
- However, when the lazy-wheel path is taken, that HEAD is sent **even if the server doesn't support range requests for the wheel**, spending one round trip just to discover this before falling back to a full download.
- `_get_simple_response()` sends a HEAD before the GET only if the URL looks like an archive (lines 120-121 of collector.py). This is a rare case and correctly guarded.
### 2.2 Connection Reuse & Pooling
**Current configuration (session.py lines 388-389):**
```python
_pool_connections = 20 # urllib3 PoolManager caches pools for 20 distinct hosts
_pool_maxsize = 16 # Each pool keeps up to 16 idle connections
```
**Analysis:**
- Pool is correctly sized for 16 prefetch workers
- `pool_block=False` (the urllib3 default) means requests beyond the pool size create extra connections that proceed normally but are discarded afterwards instead of being returned to the pool
- **Connections ARE reused** for same-host requests through urllib3's `HTTPSConnectionPool._get_conn()` / `_put_conn()` mechanism
- HTTP/1.1 keep-alive works by default (urllib3 uses persistent connections)
- The connection pool is per-(host, port, scheme), so `pypi.org:443` and `files.pythonhosted.org:443` each get their own pool
- **A typical pip install touches only 2-3 hosts**: `pypi.org` (index pages), `files.pythonhosted.org` (wheel downloads, metadata), and possibly an extra index. Pool of 20 is more than adequate.
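The pooling behavior above can be reproduced outside pip with stock `requests`; a minimal sketch mirroring pip's configuration (the URLs are illustrative):
```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Mirror pip's settings: pools for up to 20 distinct hosts,
# up to 16 persistent connections per host, non-blocking on overflow.
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=16, pool_block=False)
session.mount("https://", adapter)

# Both requests hit the same (host, port, scheme) pool, so the second
# reuses the TCP+TLS connection the first one opened.
session.get("https://pypi.org/simple/pip/")
session.get("https://pypi.org/simple/wheel/")
```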
**TLS Handshake Analysis:**
- A TLS handshake happens **once per connection** (not per request)
- With pool_maxsize=16, up to 16 connections are kept alive per host
- The 16 prefetch threads can each hold a connection, so in theory all 16 reuse their TLS sessions
- **Risk:** If more than 16 requests fire concurrently to the same host, excess connections are created and then **discarded** (not pooled), causing extra TLS handshakes. With `pool_block=False`, they proceed but the connection is thrown away after use.
**Finding:** The pool is sized well for the current prefetch concurrency. No wasted TLS handshakes under normal operation.
### 2.3 Caching Layer
**How CacheControl works with pip:**
1. **Index pages (`/simple/<name>/`):** Sent with a `Cache-Control: max-age=0` header
- The CacheControl controller sees `max-age=0` and **always bypasses the cache** (controller.py lines 184-186)
- But it adds conditional headers (`If-None-Match`, `If-Modified-Since`) via `conditional_headers()`
- On 304 Not Modified, the cached response is served (no body transfer)
- On 200, the response is cached with its ETag for next time
- **This is working as intended** -- it ensures freshness while avoiding re-downloading unchanged index pages (a sketch of the pattern follows this list)
2. **Package downloads (wheels, sdists):** Sent via `Downloader._http_get()` with `Accept-Encoding: identity`
- No `Cache-Control: max-age=0` header on these requests
- CacheControl can serve fully cached responses for packages that haven't changed
- `SafeFileCache` stores metadata + body as separate files on disk
- Cache key is the full URL (after normalization)
3. **PEP 658 metadata files:** Fetched via `get_http_url()` using the Downloader
- Same caching behavior as package downloads
- Small files (~5-50KB), cached effectively
4. **Lazy wheel range requests:** Sent with `Cache-Control: no-cache`
- **Explicitly bypasses caching** (lazy_wheel.py line 180)
- This is correct -- range requests for ZIP metadata shouldn't be cached as full responses
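A minimal sketch of the conditional-request pattern described in item 1 above -- note that with plain `requests` the 304 response has an empty body; in pip it is CacheControl that substitutes the cached response:
```python
import requests

session = requests.Session()
url = "https://pypi.org/simple/pip/"

# First fetch: a 200 whose ETag CacheControl stores alongside the body.
first = session.get(url, headers={"Accept": "application/vnd.pypi.simple.v1+json"})
etag = first.headers.get("ETag")

# Revalidation: max-age=0 forces a round trip, but the conditional header
# lets the server answer 304 with no body if the page is unchanged.
second = session.get(
    url,
    headers={
        "Accept": "application/vnd.pypi.simple.v1+json",
        "Cache-Control": "max-age=0",
        "If-None-Match": etag,
    },
)
print(second.status_code)  # 304 when the cached copy is still fresh
```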
**Cache efficiency finding:**
- The `max-age=0` on index pages means **every resolution always incurs at least one conditional round-trip per package**. This is the single biggest I/O constraint for warm-cache scenarios.
- For a `pip install --upgrade` with warm cache, all 40 index page requests still go to the network (as conditional GETs), but most return 304 with no body. Each 304 round-trip costs ~50-100ms (RTT to pypi.org).
- **Total warm-cache overhead: 40 * ~80ms = ~3.2 seconds** just in sequential conditional GETs (partially parallelized by prefetch).
### 2.4 Metadata Fetching
**Fallback chain (prepare.py `_fetch_metadata_only()`):**
1. **PEP 658 metadata** (`_fetch_metadata_using_link_data_attr()`):
- Checks `link.metadata_link()` -- the link must have `data-dist-info-metadata` or `core-metadata` attribute
- If present, downloads the separate `.metadata` file (tiny: 5-50KB)
- **PyPI supports PEP 658** for all wheels uploaded after ~2023
- This is the fastest path: single small GET
2. **Lazy wheel** (`_fetch_metadata_using_lazy_wheel()`):
- Requires `--use-feature=fast-deps` flag
- Sends HEAD to get Content-Length and check Accept-Ranges
- Downloads the tail of the wheel (ZIP end-of-central-directory) via range requests
- Parses the ZIP to find METADATA file, downloads just that range
- **2-4 HTTP requests per wheel** (HEAD + 1-3 range GETs); the underlying range-request pattern is sketched at the end of this subsection
- Has a `_lazy_wheel_cache` to avoid redundant range requests for same URL
3. **Full download** (fallback):
- Downloads entire wheel/sdist
- For wheels: extracts metadata from the archive
- For sdists: runs `setup.py egg_info` or `pyproject.toml` build
- **Most expensive path**
**Key finding:** PEP 658 is the dominant path for PyPI packages. The speculative metadata prefetch (factory.py) eagerly builds the top candidate and submits a background thread to fetch its metadata. This overlaps metadata I/O with resolution logic.
**Optimization in place:** `_lazy_wheel_cache` (prepare.py line 288) prevents duplicate range requests when a package is evaluated with different extras (e.g., `pkg` and `pkg[extra]`).
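As referenced in step 2 above, a minimal sketch of the HEAD-plus-range pattern that `LazyZipOverHTTP` builds on (the wheel URL is a placeholder; real code must also handle servers that ignore `Range`):
```python
import requests

session = requests.Session()
url = "https://files.pythonhosted.org/packages/.../example-1.0-py3-none-any.whl"  # placeholder

# 1. HEAD: learn the total size and whether the server honours Range.
head = session.head(url, allow_redirects=True)
size = int(head.headers["Content-Length"])
assert head.headers.get("Accept-Ranges") == "bytes"

# 2. Range GET for the final 8 KiB -- usually enough to contain the ZIP
#    end-of-central-directory record, from which the METADATA member's
#    offset can be located for a follow-up range request.
tail = session.get(
    url,
    headers={
        "Range": f"bytes={max(size - 8192, 0)}-{size - 1}",
        "Cache-Control": "no-cache",  # matches lazy_wheel.py: don't cache partial bodies
    },
)
assert tail.status_code == 206  # Partial Content
```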
### 2.5 DNS & TLS
**DNS resolution:**
- urllib3 delegates to Python's `socket.create_connection()` which calls `getaddrinfo()`
- **No DNS caching in urllib3 or pip** -- relies on OS-level DNS cache
- However, connection pooling effectively caches DNS results because connections persist
- With 16 pool connections to `pypi.org`, DNS is resolved at most once per connection creation
**TLS handshakes:**
- One TLS handshake per connection (not per request)
- Connection pooling limits handshakes to pool_maxsize (16) per host
- Python's `ssl` module handles TLS session resumption at the OpenSSL level
- `_SSLContextAdapterMixin` (session.py line 255) properly forwards the SSL context to pools
**Finding:** DNS and TLS are not significant bottlenecks. The connection pool effectively amortizes both costs. Pre-warming is not needed because the first batch of prefetch requests creates all needed connections.
### 2.6 HTTP/2 and Protocol
**Current state: pip uses HTTP/1.1 exclusively.**
- The vendored `urllib3` does not support HTTP/2 (true of both its 1.x and 2.x lines)
- The vendored `requests` library has no HTTP/2 support
- There are **no references** to HTTP/2, h2, or hyper anywhere in pip's codebase
**Would HTTP/2 help?**
- **Index page fetches:** HTTP/2 multiplexing would allow sending all ~40 index page requests over a **single TCP connection** to pypi.org. Currently, each of the 16 prefetch threads uses its own connection. With HTTP/2, one connection handles all requests, eliminating 15 TLS handshakes and reducing head-of-line blocking.
- **Metadata fetches:** Similarly multiplexed over the same connection.
- **Package downloads:** Less benefit -- these are large sequential downloads.
**Estimated benefit:** For index-heavy workloads (many small packages), HTTP/2 could reduce the connection setup overhead by ~90% and improve throughput by 20-30% due to multiplexing.
**What it would take:**
- Replace vendored `requests`/`urllib3` with `httpx` (supports HTTP/2 via `h2`) or add `h2` to urllib3
- Major architectural change -- affects all of pip's network layer
- PyPI's CDN (Fastly) already supports HTTP/2
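To make the multiplexing claim concrete, a sketch of index fetches over HTTP/2 -- this assumes the third-party `httpx` package installed with its `h2` extra, which pip does not use today:
```python
import asyncio
import httpx

async def fetch_indexes(names: list[str]) -> dict[str, int]:
    # One AsyncClient with http2=True multiplexes every request to
    # pypi.org over a single TCP+TLS connection.
    async with httpx.AsyncClient(http2=True) as client:
        async def one(name: str) -> tuple[str, int]:
            resp = await client.get(
                f"https://pypi.org/simple/{name}/",
                headers={"Accept": "application/vnd.pypi.simple.v1+json"},
            )
            return name, resp.status_code

        pairs = await asyncio.gather(*(one(n) for n in names))
    return dict(pairs)

print(asyncio.run(fetch_indexes(["boto3", "botocore", "urllib3"])))
```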
### 2.7 Parallel I/O Architecture
**Index page prefetch (PackageFinder):**
```python
# package_finder.py lines 1535-1556
def prefetch_packages(self, project_names):
    with self._prefetch_lock:
        for name in project_names:
            if name in self._all_candidates or name in self._prefetch_futures:
                continue
            if self._prefetch_executor is None:
                self._prefetch_executor = ThreadPoolExecutor(max_workers=16)
            self._prefetch_futures[name] = self._prefetch_executor.submit(
                self._do_fetch_all_candidates, name
            )
```
- Called from two places:
1. `resolver.resolve()` -- submits all root requirements upfront
2. `provider.get_dependencies()` -- submits all discovered deps
- Workers run `_do_fetch_all_candidates()` which does the full pipeline:
collect_sources -> fetch_response -> parse/evaluate
- Results cached in `_all_candidates` dict
- `find_all_candidates()` checks futures with 10s timeout
**Metadata prefetch (Factory):**
```python
# factory.py lines 188-245
def _prefetch_top_candidate_metadata(self, name, top_info, extras, template):
    # Build top candidate eagerly (cheap: wheel-cache lookup)
    candidate = build_func()
    # Only prefetch for remote wheels
    if link.is_file or not link.is_wheel:
        return

    def _do_prefetch():
        candidate.dist  # triggers prepare_linked_requirement()

    # Submit to 8-thread pool
    self._metadata_prefetch_executor.submit(_do_prefetch)
```
**Serialization points that force sequential I/O:**
1. **resolvelib's main loop is single-threaded.** Each round processes one package at a time. Even with prefetching, the resolver can only consume one result at a time.
2. **`_complete_partial_requirements()`** (prepare.py line 474) downloads all "needs more preparation" requirements **sequentially** via `self._download.batch()` -- which is just a for-loop, NOT actually batched/parallel.
3. **The `Downloader.batch()` method** (download.py line 179-184) is misleadingly named -- it's a sequential for-loop:
```python
def batch(self, links, location):
    for link in links:
        filepath, content_type = self(link, location)
        yield link, (filepath, content_type)
```
**This is a significant finding.** All final wheel downloads happen sequentially.
### 2.8 Response Compression
**Index page requests:**
- The `_get_simple_response()` in collector.py sets custom `Accept` headers but does NOT set `Accept-Encoding`
- Requests library's default `Accept-Encoding` header is `gzip, deflate` (from urllib3's `ACCEPT_ENCODING = "gzip,deflate"`, applied by requests' `default_headers()`)
- **Index pages ARE compressed** by PyPI/Fastly with gzip. The requests library transparently decompresses them.
- No brotli support (would require `brotli` or `brotlicffi` package)
**Package downloads:**
- `Downloader._http_get()` uses `HEADERS = {"Accept-Encoding": "identity"}` (utils.py line 26)
- **Package downloads explicitly disable compression.** This is intentional -- packages are already compressed archives (wheels are ZIP files, sdists are .tar.gz). Re-compressing would waste CPU and break hash verification.
- `response_chunks()` uses `decode_content=False` to preserve raw bytes for hash checking.
**Finding:** Compression is correctly handled. Index pages use gzip (transparent). Packages disable compression (correct). No improvement opportunity here.
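The identity-encoding plus raw-bytes pattern in miniature -- pip's `response_chunks()` does essentially this (the URL is a placeholder):
```python
import hashlib
import requests

url = "https://files.pythonhosted.org/packages/.../example-1.0-py3-none-any.whl"  # placeholder
resp = requests.get(url, headers={"Accept-Encoding": "identity"}, stream=True)

sha = hashlib.sha256()
# decode_content=False yields the exact bytes on the wire, so the digest
# matches the published hash even if an intermediary re-encoded the body.
for chunk in resp.raw.stream(65536, decode_content=False):
    sha.update(chunk)
print(sha.hexdigest())
```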
### 2.9 Lazy/Streaming Approaches
**Current behavior:**
- Index pages: `response.content` (collector.py line 309) reads the entire response into memory before parsing
- JSON index pages can be 50KB-2MB for popular packages (e.g., boto3 has ~12,000 file entries)
- HTML index pages are similar in size
**Streaming opportunity:**
- JSON index pages COULD be streamed using an incremental JSON parser (e.g., `ijson`)
- However, `json.loads()` on a 1MB string takes ~5ms -- negligible compared to the ~80ms network round-trip
- The real cost is not parsing but **candidate evaluation** -- the `_evaluate_json_page()` fast path already handles this efficiently with a single-pass fused pipeline
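For completeness, a hedged sketch of what incremental parsing could look like with the third-party `ijson` package (not something pip does today):
```python
import ijson
import requests

resp = requests.get(
    "https://pypi.org/simple/boto3/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
    stream=True,
)
resp.raw.decode_content = True  # let urllib3 gunzip before ijson sees the stream

# Yield each entry of the top-level "files" array as it arrives, instead of
# buffering the whole multi-hundred-KB document before parsing it.
for file_entry in ijson.items(resp.raw, "files.item"):
    print(file_entry["filename"])
```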
**Early abort opportunity:**
- When the resolver only needs the "best" (newest compatible) version, we could theoretically abort after finding it
- **Problem:** The index page must be fully fetched before we know all versions (no streaming API from PyPI)
- The speculative metadata prefetch already handles this by eagerly fetching metadata for the top candidate
**Finding:** Streaming/early-abort offers negligible benefit for index pages because network latency dominates. The JSON parsing is already fast.
### 2.10 PyPI-Specific Optimizations
**Bulk/batch APIs:**
- PyPI has no bulk metadata API (no way to get metadata for 40 packages in one request)
- The Simple Repository API (PEP 503/691) is package-by-package
- There is no "dependency tree" API that would let pip skip index page fetches
**CDN-level optimizations already in use:**
- `Cache-Control: max-age=0` with conditional requests (ETags/Last-Modified) -- implemented
- PyPI responses include strong ETags
- 304 responses save bandwidth but still cost one RTT each
**JSON API:**
- pip already prefers the JSON Simple API (`application/vnd.pypi.simple.v1+json`) via Accept header priority
- The JSON path (`_evaluate_json_page()`) is heavily optimized with fused evaluation
- PyPI's JSON API doesn't support partial responses or field selection
**Server-push / Link preload:**
- PyPI doesn't support HTTP/2 Server Push for metadata files
- Even with HTTP/2, the server can't know which wheel the client will pick
---
## 3. Optimization Ideas (Ranked by Expected Impact)
### Tier 1: High Impact (10-30% wall-time reduction)
#### 3.1 Parallel Wheel Downloads
**What:** Replace the sequential `Downloader.batch()` for-loop with concurrent.futures.ThreadPoolExecutor.
**Where:** `src/pip/_internal/network/download.py` lines 179-184 and `src/pip/_internal/operations/prepare.py` lines 492-493.
**Why:** After resolution completes, all wheels are downloaded sequentially. For 40 packages, this is 40 sequential HTTP GETs. Parallelizing would overlap download + write for multiple packages.
**Expected improvement:** 15-25% of total wall time for download-heavy workloads. With 8 parallel downloads, the download phase shrinks from ~40 * avg_time to ~5 * avg_time.
**Complexity:** Medium. Need to handle progress bar display for parallel downloads and ensure thread safety.
**Risk:** Low -- downloads are independent operations.
**pip-only change:** Yes.
#### 3.2 Pipeline Index Fetch + Metadata Prefetch
**What:** When an index page prefetch completes, immediately trigger metadata prefetch for the top candidate -- don't wait for the resolver to consume the index result.
**Where:** `src/pip/_internal/index/package_finder.py` `_do_fetch_all_candidates()` should call `factory._prefetch_top_candidate_metadata()` at the end.
**Why:** Currently, there's a gap between index fetch completion and metadata prefetch submission. The metadata prefetch only fires when the resolver calls `_iter_found_candidates()`. This gap can be 100ms-2s depending on how fast the resolver processes.
**Expected improvement:** 5-15% for resolution-heavy workloads. Eliminates the serial gap between "index data ready" and "metadata fetch starts."
**Complexity:** Medium. Requires threading coordination between PackageFinder and Factory. The PackageFinder would need a reference to the Factory (currently doesn't have one).
**Risk:** Low-medium -- need to ensure thread safety for candidate cache.
**pip-only change:** Yes.
#### 3.3 Increase Metadata Prefetch Depth
**What:** Prefetch metadata for top N candidates (not just the top 1), and prefetch for ALL packages whose index is ready (not just when the resolver asks).
**Where:** `src/pip/_internal/resolution/resolvelib/factory.py` `_prefetch_top_candidate_metadata()`.
**Why:** The resolver sometimes backtracks and needs the 2nd or 3rd candidate. Currently only the top candidate's metadata is prefetched. Prefetching the top 2-3 would prevent serial metadata fetches during backtracking.
**Expected improvement:** 3-8% for workloads with backtracking.
**Complexity:** Low.
**Risk:** Low. Wastes some bandwidth on metadata that may not be needed, but metadata files are tiny (5-50KB).
**pip-only change:** Yes.
### Tier 2: Medium Impact (5-15% wall-time reduction)
#### 3.4 HTTP/2 Support via httpx
**What:** Replace the `requests` + `urllib3` stack with `httpx` which supports HTTP/2 multiplexing.
**Why:** With HTTP/2, all index page requests and metadata fetches to pypi.org can be multiplexed over a single TCP connection. This eliminates 15 extra TLS handshakes and allows the server to interleave responses.
**Expected improvement:** 10-20% for cold-cache workloads (fewer TLS handshakes, multiplexed requests). Less impact for warm-cache (304 responses are already small).
**Complexity:** Very high. Fundamental change to pip's network layer. Would affect caching, authentication, proxies, all adapters.
**Risk:** High -- potential for regressions across pip's extensive networking surface.
**pip-only change:** Yes, but major architectural change.
#### 3.5 Conditional Request Short-Circuit for Index Pages
**What:** For warm-cache scenarios, batch all conditional index page requests into concurrent futures BEFORE the resolver starts, rather than lazily.
**Where:** Before calling `resolver.resolve()`, pre-submit conditional GETs for ALL packages known from the lock file or previous resolution.
**Why:** Currently, prefetch only fires as the resolver discovers dependencies. If pip could predict the dependency set (from a lock file or previous run), all ~40 conditional GETs could be fired simultaneously.
**Expected improvement:** 5-10% for warm-cache repeat installs. Turns 3.2s of serial conditional GETs into <0.5s of parallel ones.
**Complexity:** Medium. Need a mechanism to predict the package set (lock file, cache of previous resolution result).
**Risk:** Low -- conditional GETs are safe to fire speculatively.
**pip-only change:** Yes.
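A sketch of the batched revalidation idea, assuming the package set is known up front (the names are illustrative, e.g. read from a lock file):
```python
from concurrent.futures import ThreadPoolExecutor

import requests

session = requests.Session()
predicted = ["boto3", "botocore", "s3transfer", "urllib3"]  # e.g. from a lock file

def revalidate(name: str) -> tuple[str, int]:
    resp = session.get(
        f"https://pypi.org/simple/{name}/",
        headers={
            "Accept": "application/vnd.pypi.simple.v1+json",
            "Cache-Control": "max-age=0",  # same conditional semantics pip uses
        },
    )
    return name, resp.status_code

# Fire all conditional GETs at once rather than as the resolver discovers deps.
with ThreadPoolExecutor(max_workers=16) as pool:
    for name, status in pool.map(revalidate, predicted):
        print(name, status)
```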
#### 3.6 Connection Pre-warming
**What:** Open TLS connections to pypi.org and files.pythonhosted.org at session creation time, before any requests.
**Where:** `src/pip/_internal/network/session.py` `PipSession.__init__()`.
**Why:** The first request to each host pays the TCP + TLS handshake cost (~100-200ms). Pre-warming during argument parsing / environment setup overlaps this with CPU work.
**Expected improvement:** 2-5% (saves ~200ms one-time cost).
**Complexity:** Low.
**Risk:** Low -- harmless if the connections go unused (they just time out).
**pip-only change:** Yes.
### Tier 3: Low Impact (1-5% wall-time reduction)
#### 3.7 Cache Index ETags In-Memory Across Packages
**What:** After the first conditional GET returns an ETag for `pypi.org/simple/`, cache the server's response pattern in memory. Some CDNs return the same 304 pattern for all resources with the same age.
**Expected improvement:** Negligible (<1%). The conditional request still requires a round trip.
**pip-only change:** Yes.
#### 3.8 Brotli Compression for Index Pages
**What:** Add `brotli` or `brotlicffi` as an optional dependency so index page responses can be compressed with brotli (better compression ratio than gzip).
**Why:** Brotli can compress JSON index pages 20-30% better than gzip, reducing transfer time for large index pages.
**Expected improvement:** 1-3% for cold-cache scenarios. Index pages are typically 50KB-2MB; brotli saves ~30% of that.
**Complexity:** Low. Just add the dependency and urllib3/requests will advertise brotli support.
**Risk:** Low. Optional dependency, gzip fallback.
**pip-only change:** Yes.
---
## 4. Quick Wins (< 50 lines of code)
### QW1: Parallel Wheel Downloads (the biggest quick win)
**File:** `src/pip/_internal/operations/prepare.py` `_complete_partial_requirements()`
**Change:** Replace the sequential `self._download.batch()` loop with `ThreadPoolExecutor.map()`:
```python
from concurrent.futures import ThreadPoolExecutor

# Current (sequential):
batch_download = self._download.batch(links_to_fully_download.keys(), temp_dir)
for link, (filepath, _) in batch_download:
    ...

# Proposed (parallel):
with ThreadPoolExecutor(max_workers=8) as pool:
    results = pool.map(
        lambda link: (link, self._download(link, temp_dir)),
        links_to_fully_download.keys(),
    )
    for link, (filepath, _) in results:
        ...
```
**Lines:** ~15 changed
**Impact:** 15-25% wall-time reduction on download-heavy workloads
### QW2: Pipeline Index + Metadata Prefetch
**File:** `src/pip/_internal/index/package_finder.py` `_do_fetch_all_candidates()`
**Change:** After building the candidate list, immediately trigger metadata prefetch for the top candidate if a factory callback is registered:
```python
# At the end of _do_fetch_all_candidates:
if self._metadata_prefetch_callback and self._all_candidates[project_name]:
    self._metadata_prefetch_callback(
        project_name, self._all_candidates[project_name]
    )
```
**Lines:** ~20 changed (add callback registration + invocation)
**Impact:** 5-15% for resolution-heavy workloads
### QW3: Connection Pre-warming
**File:** `src/pip/_internal/network/session.py`
**Change:** Add a `prewarm()` method that opens connections to known hosts in background threads:
```python
def prewarm(self, urls: list[str]) -> None:
    """Open TCP+TLS connections in the background to reduce first-request latency."""
    from concurrent.futures import ThreadPoolExecutor

    def _warm(url: str) -> None:
        try:
            self.head(url, timeout=5)
        except Exception:
            pass

    # Deliberately no `with` block: shutting the executor down with
    # wait=True would block on the HEADs, defeating the pre-warming.
    pool = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        pool.submit(_warm, url)
    pool.shutdown(wait=False)
```
**Lines:** ~15
**Impact:** 2-5% (saves ~200ms startup)
### QW4: Prefetch Top 2-3 Candidates' Metadata
**File:** `src/pip/_internal/resolution/resolvelib/factory.py`
**Change:** In `_iter_found_candidates()`, prefetch metadata for top 2-3 candidates instead of just top 1:
```python
# Current: prefetch only infos_list[0]
# Proposed: prefetch infos_list[0:3]
for info in infos_list[:3]:
    self._prefetch_top_candidate_metadata(name, info, extras, template)
```
**Lines:** ~10 changed
**Impact:** 3-8% for workloads with backtracking
---
## 5. Big Bets (Architectural Changes for 20%+ Improvement)
### BB1: Fully Parallel Resolution Pipeline
**Description:** Replace the sequential resolvelib loop with a resolution architecture where ALL I/O is fully parallel. When the resolver needs data for package X, it doesn't block -- it queues the need and processes another package. When I/O completes, the resolver is notified.
**Mechanism:** This is essentially an async resolver. Could be implemented with:
- asyncio event loop driving the resolver
- `aiohttp` or `httpx` async client for HTTP
- resolvelib with a coroutine-based provider
**Expected improvement:** 30-50% for large dependency trees. Eliminates all serial I/O gaps.
**Complexity:** Very high. Fundamental architectural change to pip's resolver integration.
**Risk:** High -- resolvelib is synchronous by design.
### BB2: HTTP/2 Multiplexing
**Description:** Replace vendored `requests` + `urllib3` with `httpx` (which supports HTTP/2 via h2).
**Expected improvement:** 20-30% for cold-cache workloads. All requests to pypi.org multiplex over one connection. No head-of-line blocking between index page requests.
**Complexity:** Very high. ~500+ line change touching all network code.
**Risk:** High.
### BB3: Dependency Prediction + Bulk Prefetch
**Description:** Maintain a local cache of "last resolved dependency tree" per project. On next `pip install`, immediately fire all index page + metadata prefetch requests for the predicted set BEFORE the resolver starts.
**Expected improvement:** 20-40% for repeat installs. Instead of discovering dependencies one-by-one through resolution, fire all 40+ conditional GETs simultaneously at startup.
**Complexity:** Medium-high. Need a prediction cache format, staleness detection, and graceful handling of prediction misses.
**Risk:** Medium. Wrong predictions waste bandwidth but don't cause correctness issues.
### BB4: Server-Side Dependency Resolution API
**Description:** Propose a PyPI API extension that accepts a requirements list and returns the resolved dependency tree (with all metadata). One HTTP request replaces 120+ requests.
**Expected improvement:** 50-80% for cold-cache scenarios. Eliminates all per-package round trips.
**Complexity:** Very high. Requires PyPI server cooperation, PEP process, etc.
**Risk:** High. Requires ecosystem buy-in. Fallback to current behavior needed.
---
## 6. Summary of Key Files
| File | Role |
|------|------|
| `src/pip/_internal/index/collector.py` | Fetches index pages, parses HTML/JSON |
| `src/pip/_internal/index/package_finder.py` | Evaluates candidates, manages prefetch pool (16 threads) |
| `src/pip/_internal/network/session.py` | PipSession, connection pool config (20/16), adapters |
| `src/pip/_internal/network/cache.py` | SafeFileCache (filesystem-based HTTP cache) |
| `src/pip/_internal/network/download.py` | Downloader (sequential batch downloads!) |
| `src/pip/_internal/network/lazy_wheel.py` | LazyZipOverHTTP for range-request metadata |
| `src/pip/_internal/network/utils.py` | Accept-Encoding: identity for downloads, chunk streaming |
| `src/pip/_internal/operations/prepare.py` | RequirementPreparer, metadata fetch chain |
| `src/pip/_internal/resolution/resolvelib/factory.py` | Metadata prefetch pool (8 threads), candidate building |
| `src/pip/_internal/resolution/resolvelib/provider.py` | Triggers dep prefetch in get_dependencies() |
| `src/pip/_internal/resolution/resolvelib/resolver.py` | Kicks off root requirement prefetch |
| `src/pip/_internal/resolution/resolvelib/candidates.py` | Thread-safe dist preparation with _prepare_lock |
| `src/pip/_vendor/cachecontrol/adapter.py` | CacheControlAdapter -- intercepts requests for caching |
| `src/pip/_vendor/cachecontrol/controller.py` | Cache logic: max-age=0 bypass, conditional headers, 304 handling |
## 7. Critical Finding: Sequential Download Is The Biggest Remaining Win
The single most impactful optimization remaining is **parallelizing wheel downloads**. After resolution completes, `_complete_partial_requirements()` downloads all wheels sequentially through `Downloader.batch()`. This is purely sequential I/O with no data dependencies between packages. With 40 packages averaging 500KB each at ~50ms per download, the sequential phase takes ~2 seconds. Parallelizing with 8 workers would reduce this to ~0.25 seconds -- a potential 15-25% total wall-time improvement depending on the workload.