# pip Performance Optimization

End-to-end performance optimization of [pip](https://github.com/pypa/pip), the Python package installer: 122 commits covering startup, dependency resolution, packaging, import deferral, and vendored Rich.

## Results

**Environment**: Python 3.15.0a7, macOS arm64 (Apple Silicon), ~27 packages installed, HTTP cache warm, timed with hyperfine (5–10 runs, 2–3 warmup runs)

### Startup

| Benchmark | main | optimized | Speedup |
|---|---:|---:|---:|
| `pip --version` | 138ms | **20ms** | **7.0x** |
| `pip --help` | 143ms | **121ms** | **1.18x** |

### Dependency Resolution

| Benchmark | main | optimized | Speedup |
|---|---:|---:|---:|
| `requests` (~5 deps) | 589ms | **516ms** | **1.14x** |
| `flask + django` (~15 deps) | 708ms | **599ms** | **1.18x** |
| `flask + django + boto3 + requests` (~30 deps) | 1,493ms | **826ms** | **1.81x** |
| `fastapi[standard]` (~42 deps) | 13,325ms | **11,664ms** | **1.14x** |

### Package Operations

| Benchmark | main | optimized | Speedup |
|---|---:|---:|---:|
| `pip list` | 162ms | **146ms** | **1.11x** |
| `pip freeze` | 225ms | **211ms** | **1.07x** |
| `pip show pip` | 162ms | **148ms** | **1.09x** |
| `install -r requirements.txt` (21 pkgs) | 1,344ms | **740ms** | **1.82x** |

### Totals

| Scope | main | optimized | Speedup |
|---|---:|---:|---:|
| **All benchmarks** | 18,717ms | 15,223ms | **1.23x** |
| **Excluding fastapi[standard]** | 5,392ms | 3,559ms | **1.51x** |

## What We Optimized (122 commits)

### 1. Startup

- Ultra-fast `--version` path in `__main__.py`: exits before importing `pip._internal` (138ms → 20ms)
- Fast-path `--version` in `cli/main.py`: avoids the `pip._internal.utils.misc` import
- Deferred the `base_command.py` import chain to command creation time
- Deferred `Configuration` module loading
- Deferred autocompletion imports behind the `PIP_AUTO_COMPLETE` check
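
The fast-path idea in the first two bullets can be sketched as follows. This is a minimal illustration, not the fork's actual code: `SKETCH_VERSION`, `fast_version_banner`, and the banner format are hypothetical, and `argparse` stands in for the heavy `pip._internal` import chain.

```python
import sys

# Hypothetical version string; real pip exposes its version as pip.__version__.
SKETCH_VERSION = "0.0.0"


def fast_version_banner(argv):
    """Return a version banner if argv is only a version flag, else None."""
    if len(argv) == 1 and argv[0] in ("--version", "-V"):
        py = f"{sys.version_info.major}.{sys.version_info.minor}"
        return f"pip {SKETCH_VERSION} (python {py})"
    return None


def main(argv):
    banner = fast_version_banner(argv)
    if banner is not None:
        # Fast path: nothing heavy has been imported yet.
        print(banner)
        return 0
    # Slow path: the expensive import happens only when it is needed.
    # argparse stands in for the pip._internal import chain in this sketch.
    import argparse

    parser = argparse.ArgumentParser(prog="pip-sketch")
    parser.add_argument("command")
    args = parser.parse_args(argv)
    print(f"would run: {args.command}")
    return 0
```

The check runs on the raw argument list, so the common `--version` case never pays for argument parsing or command discovery.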

### 2. Dependency Resolver: Architecture

- **Speculative metadata prefetch**: a background thread downloads PEP 658 metadata for the top candidate while the resolver processes other packages
- **Conditional Criterion rebuild**: `_remove_information_from_criteria` skips rebuilding unaffected criteria, eliminating ~95% of allocations
- **`__slots__` on Criterion**: reduces per-instance memory by ~100 bytes
- Two-level cache for `_iter_found_candidates` (specifier merge + candidate infos)
- Parallel index-page prefetch during dependency resolution
- Unified shared `ThreadPoolExecutor` for parallel wheel downloads
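
A rough sketch of the speculative prefetch pattern, assuming a caller-supplied `fetch` callable. `MetadataPrefetcher` and `fake_fetch` are hypothetical names for illustration; the real resolver integration is more involved.

```python
from concurrent.futures import ThreadPoolExecutor


class MetadataPrefetcher:
    """Start metadata fetches early and hand back results on demand."""

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch  # callable: name -> metadata
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}  # name -> Future, touched only by the caller's thread

    def prefetch(self, name):
        """Schedule a background fetch unless one is already in flight or done."""
        if name not in self._futures:
            self._futures[name] = self._pool.submit(self._fetch, name)

    def get(self, name):
        """Return the metadata, blocking only if the fetch has not finished."""
        self.prefetch(name)
        return self._futures[name].result()

    def close(self):
        self._pool.shutdown(wait=True)


calls = []


def fake_fetch(name):
    # Stand-in for a network request for PEP 658 metadata.
    calls.append(name)
    return {"name": name, "requires": []}


prefetcher = MetadataPrefetcher(fake_fetch)
prefetcher.prefetch("flask")  # kicked off while other work proceeds
prefetcher.prefetch("flask")  # duplicate request is a no-op
metadata = prefetcher.get("flask")
prefetcher.close()
```

Keeping one pool for the whole run, rather than one per task, matches the "unified shared `ThreadPoolExecutor`" bullet: worker threads are created once and reused.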

### 3. Dependency Resolver: Micro

- Cached wheel tag priority dict on `TargetPython`
- Pre-extracted requirements tuple on `Criterion`, avoiding a generator expression on every call
- Cached `Marker.evaluate()` results for repeated extra lookups
- Hoisted `operator.methodcaller`/`attrgetter` to module-level constants
- Cached `_sort_key` results to avoid double evaluation in `compute_best_candidate`
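
The last two bullets combine naturally. The sketch below computes each sort key once and reuses it for both ordering and picking the best candidate, with the accessor hoisted to module level. `order_and_pick_best` is a hypothetical helper, not pip's `compute_best_candidate`.

```python
from operator import itemgetter

# Built once at import time instead of on every call.
_BY_KEY_THEN_INDEX = itemgetter(0, 1)


def order_and_pick_best(candidates, sort_key):
    """Sort candidates ascending by sort_key and return (ordered, best).

    sort_key runs exactly once per candidate; the original position breaks
    ties so that candidates themselves are never compared.
    """
    keyed = [(sort_key(c), i, c) for i, c in enumerate(candidates)]
    keyed.sort(key=_BY_KEY_THEN_INDEX)
    ordered = [c for _, _, c in keyed]
    best = ordered[-1] if ordered else None
    return ordered, best


key_calls = []


def version_key(text):
    # Simplified key: dotted integers only, unlike real version parsing.
    key_calls.append(text)
    return tuple(int(part) for part in text.split("."))


ordered, best = order_and_pick_best(["1.10", "1.2", "2.0"], version_key)
```

Calling `sorted(..., key=f)` and then `max(..., key=f)` separately would evaluate `f` twice per candidate, which is the double evaluation this pattern avoids.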

### 4. Packaging (vendored `pip._vendor.packaging`)

- Replaced `_tokenizer` dataclass with a `__slots__` class
- Deferred `Version.__hash__` computation until first call
- Integer comparison key (`_cmp_int`): avoids full `_key` tuple construction
- Bisect-based `filter_versions` for O(log n + k) batch filtering
- Pre-computed integer bounds on `SpecifierSet` for fast rejection
- Cached parsed `Version`, `Requirement`, and `Specifier` objects
- Fast-path tokenizer for simple tokens to bypass the regex engine
- Direct release-tuple prefix comparison in `_compare_equal` / `_compare_compatible`
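
To illustrate the O(log n + k) claim for bisect-based filtering, here is a simplified sketch over versions represented as integer tuples. `filter_sorted_versions` is a hypothetical function; the real `filter_versions` must also handle pre-releases, epochs, and the full specifier grammar.

```python
from bisect import bisect_left


def filter_sorted_versions(sorted_versions, lower=None, upper=None):
    """Return versions v with lower <= v < upper from an ascending list.

    Two binary searches find the slice bounds in O(log n); copying the
    k matching items costs O(k). No per-item comparison loop is needed.
    """
    start = 0 if lower is None else bisect_left(sorted_versions, lower)
    end = len(sorted_versions) if upper is None else bisect_left(sorted_versions, upper)
    return sorted_versions[start:end]


versions = [(1, 0), (1, 2), (2, 0), (2, 1), (3, 0)]
# Roughly ">=1.2,<3.0" in specifier terms.
matching = filter_sorted_versions(versions, lower=(1, 2), upper=(3, 0))
```

The approach only pays off when the candidate list is already sorted and reused across many specifier checks, which is the batch-filtering case the bullet describes.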

### 5. Link and Wheel Parsing

- Pre-computed `Link._is_wheel` slot to avoid repeated `splitext`
- Cached URL scheme on `Link` to skip `urlsplit` for `is_vcs`/`is_file`
- Inlined `Link` construction in `_evaluate_json_page` to skip redundant work
- A single `rsplit` instead of three `rfind` calls for wheel tag extraction
- Cached `parse_tag` results to eliminate redundant `Tag` creation
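
A small sketch of the `rsplit` and caching bullets, relying on the standard wheel filename layout in which the last three dash-separated fields are the Python, ABI, and platform tags. `wheel_tag_triple` is a hypothetical helper and does not expand compressed tag sets such as `py2.py3`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def wheel_tag_triple(filename):
    """Return (python_tag, abi_tag, platform_tag) for a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    # One right-to-left split yields the three trailing tag fields at once;
    # whatever precedes them (name, version, optional build tag) is ignored.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"malformed wheel filename: {filename!r}")
    _, python_tag, abi_tag, platform_tag = parts
    return python_tag, abi_tag, platform_tag


tags = wheel_tag_triple("requests-2.31.0-py3-none-any.whl")
tags_again = wheel_tag_triple("requests-2.31.0-py3-none-any.whl")  # cache hit
```

Because index pages list the same few tag combinations over and over, memoizing on the input string removes most of the repeated parsing.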

### 6. I/O and Caching

- Replaced pure-Python msgpack with stdlib JSON for cache serialization
- Increased HTTP connection pool and prefetch concurrency
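
The source does not describe the on-disk cache format, so the following is only one plausible shape for JSON-based cache serialization. The field names and the base64 encoding of the body are assumptions made for this sketch; JSON cannot hold raw bytes, so some text encoding of the body is needed.

```python
import base64
import json


def dumps_entry(status, headers, body):
    """Serialize a cached HTTP response to compact UTF-8 JSON bytes."""
    doc = {
        "status": status,
        "headers": headers,
        # Assumed encoding: base64 keeps arbitrary bytes JSON-safe.
        "body": base64.b64encode(body).decode("ascii"),
    }
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def loads_entry(raw):
    """Inverse of dumps_entry: return (status, headers, body_bytes)."""
    doc = json.loads(raw)
    return doc["status"], doc["headers"], base64.b64decode(doc["body"])


raw = dumps_entry(200, {"content-type": "application/json"}, b'{"name": "flask"}')
status, headers, body = loads_entry(raw)
```

The appeal of the stdlib `json` module here is that its encoder and decoder are C-accelerated in CPython, whereas a pure-Python msgpack implementation runs entirely in the interpreter.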

### 7. Import Deferral (vendored Rich)

- Deferred all Rich imports to first use
- Stripped unused Rich modules from import chain
- Deferred heavy imports in Rich `console.py` (pretty/pager/scope/screen/export)
- Deferred Rich imports in `progress_bars.py` and `self_outdated_check.py`
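
Import deferral can take several forms; a function-level `import` is the simplest. The sketch below shows a slightly more general proxy that imports on first attribute access. `LazyModule` is a hypothetical helper, and the stdlib `colorsys` module stands in for a heavy dependency such as Rich.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # so _name and _module never trigger a load.
        return getattr(self._load(), attr)


colors = LazyModule("colorsys")       # no import cost paid here
loaded_before = colors._module is not None
hsv = colors.rgb_to_hsv(1.0, 0.0, 0.0)  # first use triggers the import
loaded_after = colors._module is not None
```

Commands that never render rich output then skip the import entirely, which is where the startup savings come from.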

### 8. Micro-optimizations

- Bypassed `InstallationCandidate.__init__` with `__new__` + direct slot assignment
- Removed redundant O(n) subset assertion in `BestCandidateResult`
- Cached `Hashes.__hash__`, `Constraint.empty()` singleton, `Requirement.__str__`
- Bypassed `email.parser` for metadata parsing
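
The `__new__` plus direct slot assignment technique from the first bullet looks roughly like this. `Candidate`, `parse_version`, and `make_candidate_fast` are hypothetical stand-ins for illustration, not pip's `InstallationCandidate`.

```python
def parse_version(text):
    # Simplified stand-in for an expensive version parse.
    return tuple(int(part) for part in text.split("."))


class Candidate:
    __slots__ = ("name", "version", "link")

    def __init__(self, name, version, link):
        self.name = name
        self.version = parse_version(version)  # work done on every construction
        self.link = link


def make_candidate_fast(name, parsed_version, link):
    """Build a Candidate without running __init__.

    Only safe when the caller already holds validated, pre-parsed values.
    """
    obj = Candidate.__new__(Candidate)
    obj.name = name
    obj.version = parsed_version
    obj.link = link
    return obj


slow = Candidate("flask", "3.0.2", "https://example.invalid/flask.whl")
fast = make_candidate_fast("flask", (3, 0, 2), "https://example.invalid/flask.whl")
```

The trade-off is that any invariant enforced in `__init__` is skipped, so this shortcut belongs only in hot loops where the inputs are already known to be well-formed.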

## Upstream Contributions

### Bug fixes (PRs to pypa/pip)

| PR | Status | Description |
|---|---|---|
| [pypa/pip#13900](https://github.com/pypa/pip/pull/13900) | Open | Fix `--report -` to use stdlib `json` instead of Rich for stdout output |
| [pypa/pip#13902](https://github.com/pypa/pip/pull/13902) | Open | Fix `test_trailing_slash_directory_metadata` for Python 3.15 |

### Bug reports (issues on pypa/pip)

| Issue | Description |
|---|---|
| [pypa/pip#13898](https://github.com/pypa/pip/issues/13898) | `pip install --report -` outputs invalid JSON when not combined with `--quiet` |
| [pypa/pip#13901](https://github.com/pypa/pip/issues/13901) | `test_trailing_slash_directory_metadata` fails on Python 3.15.0a8 |

### Rich upstream (separate case study)

| PR | Description |
|---|---|
| [Textualize/rich#4070](https://github.com/Textualize/rich/pull/4070) | Import deferral: 2x import speedup |
| [KRRT7/rich#12](https://github.com/KRRT7/rich/pull/12) | Architectural wins (dataclass → `__slots__`, lazy emoji) |
| [KRRT7/rich#13](https://github.com/KRRT7/rich/pull/13) | Import deferral + runtime micro-opts |

See [rich_org](https://github.com/KRRT7/rich_org) for the full Rich case study.

## Methodology

### Profiling approach

1. **`python -X importtime`**: identified the heaviest imports in the startup chain
2. **cProfile / py-spy**: found hot functions in the resolver and packaging layers
3. **Allocation counting**: tracked object creation counts to find redundant work (e.g., `Tag.__init__` calls dropped from 45,301 to 1,559 with caching)
4. **E2E hyperfine**: validated every change with end-to-end benchmarks
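
As an illustration of step 1, the sketch below runs `python -X importtime` in a subprocess and ranks modules by cumulative import time. The helper names and the sample timings are made up for this example; only the `import time: self [us] | cumulative | imported package` line format comes from CPython.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Return (module, self_us, cumulative_us) rows, slowest cumulative first."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        fields = [field.strip() for field in line[len(prefix):].split("|")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skips the header row and anything unexpected
        self_us, cumulative_us, name = fields
        rows.append((name, int(self_us), int(cumulative_us)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def profile_import(module):
    """Import `module` in a fresh interpreter and return the ranked rows."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(result.stderr)


# Illustrative sample with invented timings.
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   _json
import time:       310 |        430 | json.decoder
import time:       200 |        630 | json
"""
ranked = parse_importtime(SAMPLE)
```

Calling `profile_import("pip")` on a real installation gives a ranked list whose top entries are the natural candidates for deferral.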

### Environment

- **Local**: macOS arm64 (Apple Silicon), Python 3.15.0a7, ~27 packages installed
- **CI validation**: Azure VM (Standard_D2s_v5, Ubuntu 24.04, Python 3.12), nox test sessions
- **Test suite**: 1,690 unit tests + 15 functional tests passing throughout

### Branch

All 122 optimization commits are on [`codeflash/optimize`](https://github.com/KRRT7/pip/tree/codeflash/optimize) in the KRRT7/pip fork.

## Repo Structure

```
.
├── README.md                  # This file
└── data/
    ├── benchmarks.md          # Full E2E benchmark results table
    ├── results.tsv            # Per-optimization tracking (target, speedup, status)
    ├── benchmark-analysis.md  # Detailed profiling analysis
    ├── io-analysis.md         # I/O and caching analysis
    ├── coverage-analysis.md   # Test coverage analysis
    ├── learnings.md           # Session learnings and patterns
    └── session-handoff.md     # Optimization session state
```