# Pip I/O Layer Deep Analysis

Investigation date: 2026-04-08
Branch: `codeflash/optimize`
Investigator: Research agent

---

## 1. Request Flow Diagram

```
User: pip install <pkg>
        |
        v
Resolver (resolvelib)
  |
  +-- provider.get_dependencies(candidate)
  |     +-- prefetch_packages(dep_names)  [background threads]
  |
  +-- provider.find_matches(identifier)
  |     +-- factory.find_candidates()
  |           +-- finder.find_best_candidate(name)
  |                 +-- finder.find_all_candidates(name)
  |                       |
  |                       +-- [check _all_candidates cache]
  |                       +-- [check _prefetch_futures]
  |                       +-- _do_fetch_all_candidates(name)
  |                             |
  |                             +-- link_collector.collect_sources(name)
  |                             |     +-- search_scope.get_index_urls_locations(name)
  |                             |           # => ["https://pypi.org/simple/<name>/"]
  |                             |
  |                             +-- source.page_candidates()
  |                                   +-- process_project_url(url)
  |                                         |
  |                                         +-- link_collector.fetch_response(url)
  |                                         |     +-- _get_index_content(url)
  |                                         |           +-- _get_simple_response(url, session)
  |                                         |                 |
  |                                         |                 v
  |                                         |           session.get(url, headers={
  |                                         |               Accept: "application/vnd.pypi.simple.v1+json, ...",
  |                                         |               Cache-Control: "max-age=0"
  |                                         |           })
  |                                         |                 |
  |                                         |                 v
  |                                         |           CacheControlAdapter.send()
  |                                         |             +-- controller.cached_request()
  |                                         |             |     # max-age=0 => ALWAYS bypasses cache
  |                                         |             |     # Adds If-None-Match / If-Modified-Since
  |                                         |             +-- controller.conditional_headers()
  |                                         |             +-- HTTPAdapter.send()
  |                                         |             |     +-- urllib3.HTTPSConnectionPool.urlopen()
  |                                         |             |           +-- _get_conn()  [from pool queue]
  |                                         |             |           +-- TLS handshake (if new conn)
  |                                         |             |           +-- HTTP/1.1 GET request
  |                                         |             |           +-- _put_conn()  [return to pool]
  |                                         |             +-- controller.cache_response()
  |                                         |                   # Stores response w/ ETag for
  |                                         |                   # future conditional requests
  |                                         |
  |                                         +-- [JSON] _evaluate_json_page()
  |                                         +-- [HTML] parse_links() + evaluate_links()
  |
  +-- candidate.dist  [triggers metadata fetch]
        +-- _prepare()
              +-- preparer.prepare_linked_requirement()
                    +-- _fetch_metadata_only()
                          +-- [1] _fetch_metadata_using_link_data_attr()
                          |     # PEP 658: GET <url>.metadata
                          +-- [2] _fetch_metadata_using_lazy_wheel()
                          |     # HTTP Range requests on .whl
                          |     +-- LazyZipOverHTTP(url, session)
                          |           +-- session.head(url)  # get Content-Length
                          |           +-- _check_zip()       # range-fetch tail
                          |           +-- ZipFile(self)      # parse EOCD
                          +-- [3] Full download as fallback
```

### Request Count Per Package (typical PyPI resolution)

For each **unique package name** the resolver encounters:

1. **1 GET** for the index page (`/simple/<name>/`) -- conditional if cached
2. **1 GET** for metadata (PEP 658 `.metadata` file) -- or --
   **1 HEAD + 1-2 range GETs** for lazy-wheel metadata -- or --
   **1 GET** for a full wheel download as a fallback
3. **1 GET** for the actual wheel download (after resolution)

For a workload like `boto3` with ~40 transitive deps:

- ~40 index page GETs (conditional requests)
- ~40 metadata GETs (PEP 658 when available)
- ~40 wheel download GETs
- **Total: ~120 HTTP requests minimum**

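
The arithmetic above can be captured in a back-of-envelope estimator. This is a hypothetical helper, not pip code; `pep658_ratio` (the fraction of wheels that carry PEP 658 metadata) and the 3-request cost of the lazy-wheel path are assumptions taken from the counts in this section:

```python
def estimate_requests(n_packages: int, pep658_ratio: float = 1.0) -> dict:
    """Back-of-envelope request count for one cold-cache resolution.

    Assumes one index GET and one wheel GET per package; metadata costs
    one GET via PEP 658, or HEAD + ~2 range GETs via the lazy-wheel path.
    """
    pep658 = round(n_packages * pep658_ratio)
    lazy = n_packages - pep658
    index_gets = n_packages
    metadata_reqs = pep658 * 1 + lazy * 3  # lazy wheel: HEAD + 2 range GETs
    wheel_gets = n_packages
    return {
        "index": index_gets,
        "metadata": metadata_reqs,
        "wheels": wheel_gets,
        "total": index_gets + metadata_reqs + wheel_gets,
    }

# ~40 transitive deps, all with PEP 658 metadata -> ~120 requests
print(estimate_requests(40))
```

The 120-request floor for `boto3` assumes every wheel has PEP 658 metadata; each wheel that falls back to the lazy-wheel path adds two more round trips.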
---

## 2. Per-Area Findings

### 2.1 HTTP Request Flow

**How requests are serialized:**

- The resolver processes packages **sequentially** through resolvelib's `resolve()` loop
- Each `find_matches()` call triggers `find_all_candidates()`, which fetches the index page **synchronously** (unless prefetched)
- Each `get_dependencies()` call triggers `candidate.dist`, which fetches metadata **synchronously** (unless prefetched)

**Existing parallelism (two separate thread pools):**

1. **Index page prefetch** (`PackageFinder._prefetch_executor`): 16 worker threads
   - Triggered in `provider.get_dependencies()` for all discovered deps
   - Triggered in `resolver.resolve()` for all root requirements
   - Workers call `_do_fetch_all_candidates()`, which runs the full index fetch + evaluate pipeline
2. **Metadata prefetch** (`Factory._metadata_prefetch_executor`): 8 worker threads
   - Triggered in `_iter_found_candidates()` for the top candidate only
   - Workers access `candidate.dist`, which triggers the PEP 658 / lazy-wheel fetch

**Key finding: the two prefetch mechanisms are independent and individually effective, but they don't coordinate.** The metadata prefetch for package B can't start until B's index page fetch completes. There is no pipelining of "index fetch -> immediately prefetch top-candidate metadata."

**Redundant requests found:**

- `LazyZipOverHTTP.__init__()` sends a HEAD request (line 57 of lazy_wheel.py). When PEP 658 metadata is available, this HEAD is never sent -- the code tries PEP 658 first and only falls back to the lazy wheel -- so this path is correct, not redundant.
- However, the HEAD request in `LazyZipOverHTTP` is sent **even if the server doesn't support range requests for the wheel**, wasting one round trip before discovering this.
- `_get_simple_response()` sends a HEAD before the GET only if the URL looks like an archive (lines 120-121 of collector.py). This is a rare case and correctly guarded.

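
The dedupe-and-submit pattern described for the index-page prefetch pool can be modeled in a few lines. This is a toy sketch of the pattern, not pip's actual `PackageFinder` (class and method names here are hypothetical); it shows the two behaviors the text describes: one future per name, and `find_all_candidates()`-style consumers waiting on a pending future instead of refetching:

```python
from concurrent.futures import Future, ThreadPoolExecutor


class IndexPrefetcher:
    """Toy model of the prefetch pattern: one future per project name,
    deduplicated so repeat submissions are no-ops."""

    def __init__(self, fetch, max_workers: int = 16) -> None:
        self._fetch = fetch  # stands in for _do_fetch_all_candidates
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: dict[str, Future] = {}

    def prefetch(self, names) -> None:
        for name in names:
            if name not in self._futures:  # dedupe: first submit wins
                self._futures[name] = self._pool.submit(self._fetch, name)

    def result(self, name: str, timeout: float = 10.0):
        # Mirrors find_all_candidates() waiting on a pending future.
        if name in self._futures:
            return self._futures[name].result(timeout=timeout)
        return self._fetch(name)  # never prefetched: synchronous fallback


pf = IndexPrefetcher(fetch=lambda name: f"candidates-for-{name}")
pf.prefetch(["boto3", "botocore", "boto3"])  # the duplicate is ignored
print(pf.result("botocore"))
```

The real code additionally holds a lock around the dedupe check and creates the executor lazily, as shown in section 2.7.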

### 2.2 Connection Reuse & Pooling

**Current configuration (session.py lines 388-389):**

```python
_pool_connections = 20  # urllib3 PoolManager caches pools for 20 distinct hosts
_pool_maxsize = 16      # each pool keeps up to 16 idle connections
```

**Analysis:**

- The pool is correctly sized for the 16 prefetch workers
- `pool_block=False` (the default) means excess connections proceed but aren't returned to the pool
- **Connections ARE reused** for same-host requests through urllib3's `HTTPSConnectionPool._get_conn()` / `_put_conn()` mechanism
- HTTP/1.1 keep-alive works by default (urllib3 uses persistent connections)
- The connection pool is keyed per (host, port, scheme), so `pypi.org:443` and `files.pythonhosted.org:443` each get their own pool
- **A typical pip install touches only 2-3 hosts**: `pypi.org` (index pages), `files.pythonhosted.org` (wheel downloads, metadata), and possibly an extra index. A pool of 20 is more than adequate.

**TLS handshake analysis:**

- A TLS handshake happens **once per connection** (not per request)
- With `pool_maxsize=16`, up to 16 connections are kept alive per host
- The 16 prefetch threads can each hold a connection, so in theory all 16 reuse their TLS sessions
- **Risk:** if more than 16 requests fire concurrently to the same host, excess connections are created and then **discarded** (not pooled), causing extra TLS handshakes. With `pool_block=False`, the requests proceed, but each excess connection is thrown away after use.

**Finding:** The pool is sized well for the current prefetch concurrency. No wasted TLS handshakes under normal operation.

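
The `_get_conn()` / `_put_conn()` mechanism and the `pool_block=False` overflow behavior can be modeled with a LIFO queue. This is a minimal sketch with hypothetical names, not urllib3's actual implementation; connection creation here stands in for a TCP + TLS handshake:

```python
import queue


class TinyConnectionPool:
    """Minimal model of urllib3-style pooling: a LIFO queue of idle
    connections capped at maxsize; overflow connections are discarded
    rather than pooled (the pool_block=False behavior described above)."""

    def __init__(self, maxsize: int = 16) -> None:
        self._idle = queue.LifoQueue(maxsize=maxsize)
        self.created = 0  # each creation stands in for a TCP+TLS handshake

    def get_conn(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            self.created += 1               # the "handshake" happens here
            return object()

    def put_conn(self, conn) -> None:
        try:
            self._idle.put_nowait(conn)     # return to the pool
        except queue.Full:
            pass                            # overflow: connection dropped


pool = TinyConnectionPool(maxsize=2)
conns = [pool.get_conn() for _ in range(3)]  # 3 concurrent users -> 3 handshakes
for c in conns:
    pool.put_conn(c)                         # only 2 fit back into the pool
reused = pool.get_conn()                     # no new handshake needed
print(pool.created)
```

With `maxsize=2` and three concurrent borrowers, the third connection is created, used, and then dropped at `put_conn()` time, which is exactly the "excess handshakes" risk noted above.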
### 2.3 Caching Layer

**How CacheControl works with pip:**

1. **Index pages (`/simple/<name>/`):** sent with a `Cache-Control: max-age=0` header
   - The CacheControl controller sees `max-age=0` and **always bypasses the cache** (controller.py lines 184-186)
   - But it adds conditional headers (`If-None-Match`, `If-Modified-Since`) via `conditional_headers()`
   - On 304 Not Modified, the cached response is served (no body transfer)
   - On 200, the response is cached with its ETag for next time
   - **This is working as intended** -- it ensures freshness while avoiding re-downloading unchanged index pages
2. **Package downloads (wheels, sdists):** sent via `Downloader._http_get()` with `Accept-Encoding: identity`
   - No `Cache-Control: max-age=0` header on these requests
   - CacheControl can serve fully cached responses for packages that haven't changed
   - `SafeFileCache` stores metadata + body as separate files on disk
   - The cache key is the full URL (after normalization)
3. **PEP 658 metadata files:** fetched via `get_http_url()` using the Downloader
   - Same caching behavior as package downloads
   - Small files (~5-50KB), cached effectively
4. **Lazy-wheel range requests:** sent with `Cache-Control: no-cache`
   - **Explicitly bypasses caching** (lazy_wheel.py line 180)
   - This is correct -- range requests for ZIP metadata shouldn't be cached as full responses

**Cache efficiency finding:**

- The `max-age=0` on index pages means **every resolution always incurs at least one conditional round trip per package**. This is the single biggest I/O constraint for warm-cache scenarios.
- For a `pip install --upgrade` with a warm cache, all 40 index page requests still go to the network (as conditional GETs), but most return 304 with no body. Each 304 round trip costs ~50-100ms (RTT to pypi.org).
- **Total warm-cache overhead: 40 * ~80ms = ~3.2 seconds** if the conditional GETs run serially (partially hidden by prefetch parallelism).

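
The conditional-request flow above (send the stored ETag, serve the cached body on 304, refresh the cache on 200) reduces to a small piece of logic. This is a sketch of the idea with hypothetical helper names and a plain-dict cache, not the vendored CacheControl controller's API:

```python
def conditional_headers(cache: dict, url: str) -> dict:
    """Headers a CacheControl-style revalidation would add for `url`.
    `cache` maps url -> {"etag": ..., "body": ...}."""
    entry = cache.get(url)
    return {"If-None-Match": entry["etag"]} if entry else {}


def handle_response(cache: dict, url: str, status: int,
                    body: bytes = b"", etag: str = "") -> bytes:
    if status == 304:
        # Not modified: serve the cached body; no body was transferred.
        return cache[url]["body"]
    # 200: store the body with its ETag for the next conditional GET.
    cache[url] = {"etag": etag, "body": body}
    return body


cache: dict = {}
url = "https://pypi.org/simple/boto3/"
handle_response(cache, url, 200, b'{"files": []}', etag='"abc123"')
print(conditional_headers(cache, url))   # the next GET carries If-None-Match
print(handle_response(cache, url, 304))  # 304 -> body served from cache
```

Note that even on the cheap 304 path, the client still pays one full round trip per URL, which is the warm-cache overhead quantified above.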
### 2.4 Metadata Fetching

**Fallback chain (prepare.py `_fetch_metadata_only()`):**

1. **PEP 658 metadata** (`_fetch_metadata_using_link_data_attr()`):
   - Checks `link.metadata_link()` -- the link must have a `data-dist-info-metadata` or `core-metadata` attribute
   - If present, downloads the separate `.metadata` file (tiny: 5-50KB)
   - **PyPI supports PEP 658** for wheels uploaded after ~2023
   - This is the fastest path: a single small GET
2. **Lazy wheel** (`_fetch_metadata_using_lazy_wheel()`):
   - Requires the `--use-feature=fast-deps` flag
   - Sends a HEAD to get Content-Length and check Accept-Ranges
   - Downloads the tail of the wheel (the ZIP end-of-central-directory record) via range requests
   - Parses the ZIP to find the METADATA file, then downloads just that range
   - **2-4 HTTP requests per wheel** (HEAD + 1-3 range GETs)
   - Has a `_lazy_wheel_cache` to avoid redundant range requests for the same URL
3. **Full download** (fallback):
   - Downloads the entire wheel/sdist
   - For wheels: extracts metadata from the archive
   - For sdists: runs `setup.py egg_info` or a `pyproject.toml` build
   - **The most expensive path**

**Key finding:** PEP 658 is the dominant path for PyPI packages. The speculative metadata prefetch (factory.py) eagerly builds the top candidate and submits a background thread to fetch its metadata, overlapping metadata I/O with resolution logic.

**Optimization in place:** `_lazy_wheel_cache` (prepare.py line 288) prevents duplicate range requests when a package is evaluated with different extras (e.g., `pkg` and `pkg[extra]`).

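
The lazy-wheel trick works because `ZipFile` only needs a seekable file object: it reads the end-of-central-directory record and the central directory, then seeks straight to the one member it wants. The sketch below demonstrates that with an in-memory stand-in for `LazyZipOverHTTP` (the `CountingFile` class is hypothetical); over HTTP, each `read()` would become a Range request:

```python
import io
import zipfile


class CountingFile(io.BytesIO):
    """Seekable file that counts how many bytes are actually read --
    a stand-in for LazyZipOverHTTP's range-request reads."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, n=-1) -> bytes:
        chunk = super().read(n)
        self.bytes_read += len(chunk)
        return chunk


# Build a "wheel": one large member plus a tiny METADATA file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("pkg/big_module.py", b"\0" * 1_000_000)
    zf.writestr("pkg-1.0.dist-info/METADATA", b"Name: pkg\nVersion: 1.0\n")

remote = CountingFile(buf.getvalue())
with zipfile.ZipFile(remote) as zf:  # parses EOCD + central directory only
    metadata = zf.read("pkg-1.0.dist-info/METADATA")

print(len(buf.getvalue()), remote.bytes_read)  # ~1MB file, only a few KB read
```

The large member is never touched: only the directory structures and the METADATA member are read, which is why a handful of range GETs can replace a full wheel download.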
### 2.5 DNS & TLS

**DNS resolution:**

- urllib3 delegates to Python's `socket.create_connection()`, which calls `getaddrinfo()`
- **No DNS caching in urllib3 or pip** -- both rely on the OS-level DNS cache
- However, connection pooling effectively caches DNS results because connections persist
- With 16 pooled connections to `pypi.org`, DNS is resolved at most once per connection creation

**TLS handshakes:**

- One TLS handshake per connection (not per request)
- Connection pooling limits handshakes to `pool_maxsize` (16) per host
- Python's `ssl` module handles TLS session resumption at the OpenSSL level
- `_SSLContextAdapterMixin` (session.py line 255) properly forwards the SSL context to pools

**Finding:** DNS and TLS are not significant bottlenecks. The connection pool effectively amortizes both costs. Pre-warming is not needed here because the first batch of prefetch requests creates all needed connections.

### 2.6 HTTP/2 and Protocol

**Current state: pip uses HTTP/1.1 exclusively.**

- The vendored `urllib3` (1.x/2.x line) does not support HTTP/2
- The vendored `requests` library has no HTTP/2 support
- There are **no references** to HTTP/2, h2, or hyper anywhere in pip's codebase

**Would HTTP/2 help?**

- **Index page fetches:** HTTP/2 multiplexing would allow sending all ~40 index page requests over a **single TCP connection** to pypi.org. Currently, each of the 16 prefetch threads uses its own connection. With HTTP/2, one connection handles all requests, eliminating 15 TLS handshakes and reducing head-of-line blocking.
- **Metadata fetches:** similarly multiplexed over the same connection.
- **Package downloads:** less benefit -- these are large sequential downloads.

**Estimated benefit:** For index-heavy workloads (many small packages), HTTP/2 could reduce connection-setup overhead by ~90% and improve throughput by 20-30% through multiplexing.

**What it would take:**

- Replace the vendored `requests`/`urllib3` stack with `httpx` (which supports HTTP/2 via `h2`) or add `h2` support to urllib3
- A major architectural change -- it affects all of pip's network layer
- PyPI's CDN (Fastly) already supports HTTP/2

### 2.7 Parallel I/O Architecture

**Index page prefetch (PackageFinder):**

```python
# package_finder.py lines 1535-1556 (abridged)
def prefetch_packages(self, project_names):
    with self._prefetch_lock:
        for name in project_names:
            if name in self._all_candidates or name in self._prefetch_futures:
                continue
            if self._prefetch_executor is None:
                self._prefetch_executor = ThreadPoolExecutor(max_workers=16)
            self._prefetch_futures[name] = self._prefetch_executor.submit(
                self._do_fetch_all_candidates, name
            )
```

- Called from two places:
  1. `resolver.resolve()` -- submits all root requirements upfront
  2. `provider.get_dependencies()` -- submits all discovered deps
- Workers run `_do_fetch_all_candidates()`, which does the full pipeline:
  collect_sources -> fetch_response -> parse/evaluate
- Results are cached in the `_all_candidates` dict
- `find_all_candidates()` checks pending futures with a 10s timeout

**Metadata prefetch (Factory):**

```python
# factory.py lines 188-245 (abridged sketch; build_func and link come
# from the surrounding candidate-construction code)
def _prefetch_top_candidate_metadata(self, name, top_info, extras, template):
    # Build the top candidate eagerly (cheap: wheel-cache lookup)
    candidate = build_func()
    # Only prefetch for remote wheels
    if link.is_file or not link.is_wheel:
        return

    def _do_prefetch():
        candidate.dist  # triggers prepare_linked_requirement()

    # Submit to the 8-thread pool
    self._metadata_prefetch_executor.submit(_do_prefetch)
```

**Serialization points that force sequential I/O:**

1. **resolvelib's main loop is single-threaded.** Each round processes one package at a time. Even with prefetching, the resolver can only consume one result at a time.
2. **`_complete_partial_requirements()`** (prepare.py line 474) downloads all "needs more preparation" requirements **sequentially** via `self._download.batch()` -- which is just a for loop, not actually batched or parallel.
3. **The `Downloader.batch()` method** (download.py lines 179-184) is misleadingly named -- it's a sequential for loop:

```python
def batch(self, links, location):
    for link in links:
        filepath, content_type = self(link, location)
        yield link, (filepath, content_type)
```

**This is a significant finding.** All final wheel downloads happen sequentially.

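
For contrast with the sequential `batch()` above, a parallelized version can keep the same `(link, (filepath, content_type))` generator contract. This is a standalone sketch, not pip code: `download` stands in for `Downloader.__call__`, and `Executor.map` preserves input order, so the pairing with `links` stays correct even when downloads finish out of order:

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_batch(download, links, location, max_workers: int = 8):
    """Drop-in-shaped sketch for Downloader.batch(): same (link, result)
    pairs, fetched concurrently. `download(link, location)` is assumed to
    return (filepath, content_type), as in the sequential version."""
    links = list(links)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map yields results in submission order, so zip stays aligned.
        yield from zip(links, pool.map(lambda link: download(link, location), links))


# Demo with a fake downloader (no network).
fake_download = lambda link, location: (f"{location}/{link}.whl", "application/zip")
results = list(parallel_batch(fake_download, ["a", "b", "c"], "/tmp"))
print(results[0])
```

The real change would also need progress-bar coordination and error propagation, as noted in section 3.1.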
### 2.8 Response Compression

**Index page requests:**

- `_get_simple_response()` in collector.py sets custom `Accept` headers but does NOT set `Accept-Encoding`
- The requests library's default `Accept-Encoding` header is `gzip, deflate` (from urllib3's `ACCEPT_ENCODING = "gzip,deflate"`, applied by requests' `default_headers()`)
- **Index pages ARE compressed** by PyPI/Fastly with gzip; the requests library transparently decompresses them
- No brotli support (that would require the `brotli` or `brotlicffi` package)

**Package downloads:**

- `Downloader._http_get()` uses `HEADERS = {"Accept-Encoding": "identity"}` (utils.py line 26)
- **Package downloads explicitly disable compression.** This is intentional -- packages are already compressed archives (wheels are ZIP files, sdists are .tar.gz). Re-compressing would waste CPU, and transparent decompression would break hash verification.
- `response_chunks()` uses `decode_content=False` to preserve raw bytes for hash checking

**Finding:** Compression is correctly handled. Index pages use gzip (transparent). Packages disable compression (correct). No improvement opportunity here.

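
The payoff of transparent gzip on index pages is easy to demonstrate: Simple-API JSON is highly repetitive (filenames, URLs, hash hex strings), so it compresses very well. The page shape below is synthetic, assumed for illustration only:

```python
import gzip
import json

# A synthetic Simple-API-like JSON index page (shape assumed, not real data).
page = json.dumps({
    "name": "boto3",
    "files": [
        {
            "filename": f"boto3-1.28.{i}-py3-none-any.whl",
            "url": f"https://files.pythonhosted.org/packages/boto3-1.28.{i}-py3-none-any.whl",
            "hashes": {"sha256": "0" * 64},
        }
        for i in range(500)
    ],
}).encode()

compressed = gzip.compress(page)
print(len(page), len(compressed), f"{len(compressed) / len(page):.0%}")
```

Wheels and sdists, by contrast, are already deflate-compressed archives, which is why `Accept-Encoding: identity` on downloads is the right call: there is little left to squeeze, and the raw bytes must survive for hash verification.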
### 2.9 Lazy/Streaming Approaches

**Current behavior:**

- Index pages: `response.content` (collector.py line 309) reads the entire response into memory before parsing
- JSON index pages can be 50KB-2MB for popular packages (e.g., boto3 has ~12,000 file entries)
- HTML index pages are similar in size

**Streaming opportunity:**

- JSON index pages COULD be streamed using an incremental JSON parser (e.g., `ijson`)
- However, `json.loads()` on a 1MB string takes ~5ms -- negligible compared to the ~80ms network round trip
- The real cost is not parsing but **candidate evaluation** -- the `_evaluate_json_page()` fast path already handles this efficiently with a single-pass fused pipeline

**Early-abort opportunity:**

- When the resolver only needs the "best" (newest compatible) version, we could theoretically abort after finding it
- **Problem:** the index page must be fully fetched before all versions are known (PyPI offers no streaming API)
- The speculative metadata prefetch already covers this case by eagerly fetching metadata for the top candidate

**Finding:** Streaming/early abort offers negligible benefit for index pages because network latency dominates. The JSON parsing is already fast.

### 2.10 PyPI-Specific Optimizations

**Bulk/batch APIs:**

- PyPI has no bulk metadata API (no way to get metadata for 40 packages in one request)
- The Simple Repository API (PEP 503/691) is package-by-package
- There is no "dependency tree" API that would let pip skip index page fetches

**CDN-level optimizations already in use:**

- `Cache-Control: max-age=0` with conditional requests (ETags/Last-Modified) -- implemented
- PyPI responses include strong ETags
- 304 responses save bandwidth but still cost one RTT each

**JSON API:**

- pip already prefers the JSON Simple API (`application/vnd.pypi.simple.v1+json`) via Accept header priority
- The JSON path (`_evaluate_json_page()`) is heavily optimized with fused evaluation
- PyPI's JSON API doesn't support partial responses or field selection

**Server push / Link preload:**

- PyPI doesn't support HTTP/2 Server Push for metadata files
- Even with HTTP/2, the server can't know which wheel the client will pick

---

## 3. Optimization Ideas (Ranked by Expected Impact)

### Tier 1: High Impact (10-30% wall-time reduction)

#### 3.1 Parallel Wheel Downloads

**What:** Replace the sequential `Downloader.batch()` for loop with a `concurrent.futures.ThreadPoolExecutor`.

**Where:** `src/pip/_internal/network/download.py` lines 179-184 and `src/pip/_internal/operations/prepare.py` lines 492-493.

**Why:** After resolution completes, all wheels are downloaded sequentially. For 40 packages, this is 40 sequential HTTP GETs. Parallelizing would overlap download + write for multiple packages.

**Expected improvement:** 15-25% of total wall time for download-heavy workloads. With 8 parallel downloads, the download phase shrinks from ~40 * avg_time to ~5 * avg_time.

**Complexity:** Medium. Need to handle progress-bar display for parallel downloads and ensure thread safety.

**Risk:** Low -- downloads are independent operations.

**pip-only change:** Yes.

#### 3.2 Pipeline Index Fetch + Metadata Prefetch

**What:** When an index page prefetch completes, immediately trigger metadata prefetch for the top candidate -- don't wait for the resolver to consume the index result.

**Where:** `src/pip/_internal/index/package_finder.py` -- `_do_fetch_all_candidates()` should call `factory._prefetch_top_candidate_metadata()` at the end.

**Why:** Currently there is a gap between index fetch completion and metadata prefetch submission: the metadata prefetch only fires when the resolver calls `_iter_found_candidates()`. This gap can be 100ms-2s depending on how fast the resolver processes.

**Expected improvement:** 5-15% for resolution-heavy workloads. Eliminates the serial gap between "index data ready" and "metadata fetch starts."

**Complexity:** Medium. Requires threading coordination between PackageFinder and Factory; the PackageFinder would need a reference to the Factory (it currently has none).

**Risk:** Low-medium -- need to ensure thread safety for the candidate cache.

**pip-only change:** Yes.

#### 3.3 Increase Metadata Prefetch Depth

**What:** Prefetch metadata for the top N candidates (not just the top one), and prefetch for ALL packages whose index is ready (not just when the resolver asks).

**Where:** `src/pip/_internal/resolution/resolvelib/factory.py` -- `_prefetch_top_candidate_metadata()`.

**Why:** The resolver sometimes backtracks and needs the 2nd or 3rd candidate. Currently only the top candidate's metadata is prefetched. Prefetching the top 2-3 would prevent serial metadata fetches during backtracking.

**Expected improvement:** 3-8% for workloads with backtracking.

**Complexity:** Low.

**Risk:** Low. Wastes some bandwidth on metadata that may not be needed, but metadata files are tiny (5-50KB).

**pip-only change:** Yes.

### Tier 2: Medium Impact (5-15% wall-time reduction)

#### 3.4 HTTP/2 Support via httpx

**What:** Replace the `requests` + `urllib3` stack with `httpx`, which supports HTTP/2 multiplexing.

**Why:** With HTTP/2, all index page requests and metadata fetches to pypi.org can be multiplexed over a single TCP connection. This eliminates 15 extra TLS handshakes and allows the server to interleave responses.

**Expected improvement:** 10-20% for cold-cache workloads (fewer TLS handshakes, multiplexed requests). Less impact for warm-cache runs (304 responses are already small).

**Complexity:** Very high. A fundamental change to pip's network layer that would affect caching, authentication, proxies, and all adapters.

**Risk:** High -- potential for regressions across pip's extensive networking surface.

**pip-only change:** Yes, but a major architectural change.

#### 3.5 Conditional Request Short-Circuit for Index Pages

**What:** For warm-cache scenarios, batch all conditional index page requests into concurrent futures BEFORE the resolver starts, rather than firing them lazily.

**Where:** Before calling `resolver.resolve()`, pre-submit conditional GETs for ALL packages known from a lock file or a previous resolution.

**Why:** Currently, prefetch only fires as the resolver discovers dependencies. If pip could predict the dependency set (from a lock file or a previous run), all ~40 conditional GETs could be fired simultaneously.

**Expected improvement:** 5-10% for warm-cache repeat installs. Turns ~3.2s of serial conditional GETs into <0.5s of parallel ones.

**Complexity:** Medium. Needs a mechanism to predict the package set (lock file, cache of the previous resolution result).

**Risk:** Low -- conditional GETs are safe to fire speculatively.

**pip-only change:** Yes.

#### 3.6 Connection Pre-warming

**What:** Open TLS connections to pypi.org and files.pythonhosted.org at session creation time, before any requests.

**Where:** `src/pip/_internal/network/session.py` -- `PipSession.__init__()`.

**Why:** The first request to each host pays the TCP + TLS handshake cost (~100-200ms). Pre-warming during argument parsing / environment setup overlaps this with CPU work.

**Expected improvement:** 2-5% (saves a ~200ms one-time cost).

**Complexity:** Low.

**Risk:** Low -- harmless if the connections go unused (they just time out).

**pip-only change:** Yes.

### Tier 3: Low Impact (1-5% wall-time reduction)

#### 3.7 Cache Index ETags In-Memory Across Packages

**What:** After the first conditional GET returns an ETag for `pypi.org/simple/`, cache the server's response pattern in memory. Some CDNs return the same 304 pattern for all resources of the same age.

**Expected improvement:** Negligible (<1%). The conditional request still requires a round trip.

**pip-only change:** Yes.

#### 3.8 Brotli Compression for Index Pages

**What:** Add `brotli` or `brotlicffi` as an optional dependency so index page responses can be compressed with brotli (a better compression ratio than gzip).

**Why:** Brotli can compress JSON index pages 20-30% smaller than gzip, reducing transfer time for large index pages.

**Expected improvement:** 1-3% for cold-cache scenarios. Index pages are typically 50KB-2MB; brotli saves ~30% of that transfer.

**Complexity:** Low. Just add the dependency; urllib3/requests will then advertise brotli support.

**Risk:** Low. Optional dependency with a gzip fallback.

**pip-only change:** Yes.

---

## 4. Quick Wins (< 50 lines of code)

### QW1: Parallel Wheel Downloads (the biggest quick win)

**File:** `src/pip/_internal/operations/prepare.py` -- `_complete_partial_requirements()`

**Change:** Replace the sequential `self._download.batch()` loop with `ThreadPoolExecutor.map()`:

```python
# Current (sequential):
batch_download = self._download.batch(links_to_fully_download.keys(), temp_dir)
for link, (filepath, _) in batch_download:
    ...

# Proposed (parallel):
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as pool:
    results = pool.map(
        lambda link: (link, self._download(link, temp_dir)),
        links_to_fully_download.keys(),
    )
    for link, (filepath, _) in results:
        ...
```

**Lines:** ~15 changed

**Impact:** 15-25% wall-time reduction on download-heavy workloads

### QW2: Pipeline Index + Metadata Prefetch

**File:** `src/pip/_internal/index/package_finder.py` -- `_do_fetch_all_candidates()`

**Change:** After building the candidate list, immediately trigger metadata prefetch for the top candidate if a factory callback is registered:

```python
# At the end of _do_fetch_all_candidates:
if self._metadata_prefetch_callback and self._all_candidates[project_name]:
    self._metadata_prefetch_callback(
        project_name, self._all_candidates[project_name]
    )
```

**Lines:** ~20 changed (add callback registration + invocation)

**Impact:** 5-15% for resolution-heavy workloads

### QW3: Connection Pre-warming

**File:** `src/pip/_internal/network/session.py`

**Change:** Add a `prewarm()` method that opens connections to known hosts in background threads:

```python
def prewarm(self, urls: list[str]) -> None:
    """Open TCP+TLS connections in the background to reduce first-request latency."""
    from concurrent.futures import ThreadPoolExecutor

    def _warm(url):
        try:
            self.head(url, timeout=5)
        except Exception:
            pass

    # Note: no `with` block here -- the context manager would wait for the
    # HEADs to finish, defeating the purpose of warming in the background.
    pool = ThreadPoolExecutor(max_workers=2)
    for url in urls:
        pool.submit(_warm, url)
    pool.shutdown(wait=False)
```

**Lines:** ~15

**Impact:** 2-5% (saves ~200ms of startup)

### QW4: Prefetch Top 2-3 Candidates' Metadata

**File:** `src/pip/_internal/resolution/resolvelib/factory.py`

**Change:** In `_iter_found_candidates()`, prefetch metadata for the top 2-3 candidates instead of just the top one:

```python
# Current: prefetch only infos_list[0]
# Proposed: prefetch infos_list[:3]
for info in infos_list[:3]:
    self._prefetch_top_candidate_metadata(name, info, extras, template)
```

**Lines:** ~10 changed

**Impact:** 3-8% for workloads with backtracking

---

## 5. Big Bets (Architectural Changes for 20%+ Improvement)

### BB1: Fully Parallel Resolution Pipeline

**Description:** Replace the sequential resolvelib loop with a resolution architecture where ALL I/O is fully parallel. When the resolver needs data for package X, it doesn't block -- it queues the need and processes another package; when the I/O completes, the resolver is notified.

**Mechanism:** This is essentially an async resolver. It could be implemented with:

- an asyncio event loop driving the resolver
- an `aiohttp` or `httpx` async client for HTTP
- resolvelib with a coroutine-based provider

**Expected improvement:** 30-50% for large dependency trees. Eliminates all serial I/O gaps.

**Complexity:** Very high. A fundamental architectural change to pip's resolver integration.

**Risk:** High -- resolvelib is synchronous by design.

### BB2: HTTP/2 Multiplexing

**Description:** Replace the vendored `requests` + `urllib3` with `httpx` (which supports HTTP/2 via h2).

**Expected improvement:** 20-30% for cold-cache workloads. All requests to pypi.org multiplex over one connection, with no head-of-line blocking between index page requests.

**Complexity:** Very high. A 500+ line change touching all network code.

**Risk:** High.

### BB3: Dependency Prediction + Bulk Prefetch

**Description:** Maintain a local cache of the "last resolved dependency tree" per project. On the next `pip install`, immediately fire all index page + metadata prefetch requests for the predicted set BEFORE the resolver starts.

**Expected improvement:** 20-40% for repeat installs. Instead of discovering dependencies one by one through resolution, fire all 40+ conditional GETs simultaneously at startup.

**Complexity:** Medium-high. Needs a prediction cache format, staleness detection, and graceful handling of prediction misses.

**Risk:** Medium. Wrong predictions waste bandwidth but don't cause correctness issues.

### BB4: Server-Side Dependency Resolution API

**Description:** Propose a PyPI API extension that accepts a requirements list and returns the resolved dependency tree (with all metadata). One HTTP request replaces 120+.

**Expected improvement:** 50-80% for cold-cache scenarios. Eliminates all per-package round trips.

**Complexity:** Very high. Requires PyPI server cooperation, a PEP process, etc.

**Risk:** High. Requires ecosystem buy-in; a fallback to current behavior is needed.

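
BB3's prediction cache can be sketched concretely. Everything below is hypothetical (class name, file format, key scheme) -- the point is just the shape: key the last resolved set by a hash of the root requirements, return it on the next run so all prefetches can fire upfront, and degrade to normal discovery on a miss:

```python
import hashlib
import json
import tempfile
from pathlib import Path


class PredictionCache:
    """Sketch of BB3: remember the last resolved package set, keyed by a
    hash of the sorted root requirements. Format and paths are hypothetical."""

    def __init__(self, path: Path) -> None:
        self._path = path

    @staticmethod
    def _key(requirements: list[str]) -> str:
        return hashlib.sha256("\n".join(sorted(requirements)).encode()).hexdigest()

    def predict(self, requirements: list[str]) -> list[str]:
        try:
            data = json.loads(self._path.read_text())
        except (OSError, ValueError):
            return []  # cache miss: no prediction, fall back to discovery
        return data.get(self._key(requirements), [])

    def record(self, requirements: list[str], resolved: list[str]) -> None:
        try:
            data = json.loads(self._path.read_text())
        except (OSError, ValueError):
            data = {}
        data[self._key(requirements)] = sorted(resolved)
        self._path.write_text(json.dumps(data))


cache = PredictionCache(Path(tempfile.mkdtemp()) / "predictions.json")
cache.record(["boto3"], ["boto3", "botocore", "s3transfer", "urllib3"])
print(cache.predict(["boto3"]))     # prefetch all four immediately
print(cache.predict(["requests"]))  # [] -> normal one-by-one discovery
```

Since the prediction only drives speculative conditional GETs, a stale or wrong entry costs bandwidth but never correctness, which keeps the risk profile low.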
---

## 6. Summary of Key Files

| File | Role |
|------|------|
| `src/pip/_internal/index/collector.py` | Fetches index pages, parses HTML/JSON |
| `src/pip/_internal/index/package_finder.py` | Evaluates candidates, manages prefetch pool (16 threads) |
| `src/pip/_internal/network/session.py` | PipSession, connection pool config (20/16), adapters |
| `src/pip/_internal/network/cache.py` | SafeFileCache (filesystem-based HTTP cache) |
| `src/pip/_internal/network/download.py` | Downloader (sequential batch downloads!) |
| `src/pip/_internal/network/lazy_wheel.py` | LazyZipOverHTTP for range-request metadata |
| `src/pip/_internal/network/utils.py` | `Accept-Encoding: identity` for downloads, chunk streaming |
| `src/pip/_internal/operations/prepare.py` | RequirementPreparer, metadata fetch chain |
| `src/pip/_internal/resolution/resolvelib/factory.py` | Metadata prefetch pool (8 threads), candidate building |
| `src/pip/_internal/resolution/resolvelib/provider.py` | Triggers dep prefetch in `get_dependencies()` |
| `src/pip/_internal/resolution/resolvelib/resolver.py` | Kicks off root requirement prefetch |
| `src/pip/_internal/resolution/resolvelib/candidates.py` | Thread-safe dist preparation with `_prepare_lock` |
| `src/pip/_vendor/cachecontrol/adapter.py` | CacheControlAdapter -- intercepts requests for caching |
| `src/pip/_vendor/cachecontrol/controller.py` | Cache logic: max-age=0 bypass, conditional headers, 304 handling |

## 7. Critical Finding: Sequential Download Is the Biggest Remaining Win

The single most impactful remaining optimization is **parallelizing wheel downloads**. After resolution completes, `_complete_partial_requirements()` downloads all wheels sequentially through `Downloader.batch()`. This is purely sequential I/O with no data dependencies between packages. With 40 packages averaging 500KB each at ~50ms per download, the sequential phase takes ~2 seconds; parallelizing with 8 workers would cut it to ~0.25 seconds -- a potential 15-25% total wall-time improvement depending on the workload.