Add team member dimension to case study paths so multiple contributors can track optimization data independently. Derives member from git config user.name in session-start hooks.
- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references
Optimization Session — apr05
Environment
- Python 3.15.0a7, macOS arm64
- Branch: codeflash/optimize (off main 8df7b668b)
- Tests: 1690/1690 unit tests passing, 40/40 resolver tests passing
- Lint: ruff E,F clean on all modified files
- Run tag: apr05
Baseline Profile (resolver unit tests, cProfile)
- Wall: ~5-13s (40 tests, high variance from subprocess calls)
- Project self-time: 0.155s
- Top targets by self-time:
- Tag.__init__: 25.1% (0.039s, 45,301 calls)
- cpython_tags: 8.6% (0.013s, 21,630 calls)
- compatible_tags: 5.8% (0.009s, 23,520 calls)
- _version_nodot: 4.9% (0.008s, 18,570 calls)
- find_legacy_editables: 3.6% (0.006s, 160 calls)
- canonicalize_name: 2.4% (0.004s, 3,244 calls)
- parse_name_and_version_from_info_directory: 2.1% (0.003s, 2,734 calls)
Baseline Profile (e2e single install, cProfile)
- Project self-time: 0.027s
- Tag.__init__: 3,014 calls, 5.9% of project CPU
- cpython_tags: 1,442 calls, 2.6%
- _version_nodot: 1,238 calls, 1.4%
Optimized Profile (resolver unit tests, cProfile)
- Project self-time: 0.261s (higher due to cProfile overhead on lru_cache wrappers)
- Tag.__init__: 1,559 calls (from 45,301, a 97% reduction)
- cpython_tags: 721 calls (from 21,630, a 97% reduction)
- Profile is flat — no function above 3.3% except find_legacy_editables at 25.1% (I/O bound)
Optimized Profile (e2e single install, cProfile)
- Project self-time: 0.089s (higher due to cProfile overhead on lru_cache wrappers)
- Tag.__init__: 1,505 calls (from 3,014, a 50% reduction)
- canonicalize_name: 92.4% cache hit rate (327 hits / 27 misses)
- parse_wheel_filename: 75% cache hit rate (3 hits / 1 miss)
Cache Hit Rates (real pip install --dry-run requests)
- get_supported: 50% (1 hit, 1 miss) — saves full 1500-tag generation
- canonicalize_name: 92.4% (327/354) — most impactful cache
- parse_wheel_filename: 75% (3/4)
- Version.parse: 20% (1/5) — higher for large dependency trees
Strategy
- Target 1: packaging.tags — global caching of tag lists via lru_cache
- Target 2: Distribution scanning — pathlib to os.scandir, dict cache for O(1) lookups
- Target 3: canonicalize_name — lru_cache with 92% hit rate
- Target 4: Version/wheel parsing — lru_cache, pre-compiled regex
- Target 5: _version_nodot — dict cache
Experiments
Experiment 1: lru_cache on get_supported()
- File: src/pip/_internal/utils/compatibility_tags.py
- Change: Split get_supported() into public wrapper + @functools.lru_cache(maxsize=32) cached impl
- Result: Tag.__init__ calls 45,301 → 1,559 (97% reduction)
- Status: KEEP
Experiment 2: Pre-compiled regex for wheel name validation
- File: src/pip/_vendor/packaging/utils.py
- Change: Moved inline re.match() to module-level _wheel_name_regex
- Result: Eliminates the per-call pattern-cache lookup inside re (313 calls in resolver tests)
- Status: KEEP
Experiment 3: Pre-compiled regex for project name validation
- File: src/pip/_internal/metadata/base.py
- Change: Moved inline re.match() to module-level _VALID_PROJECT_NAME
- Result: Eliminates the per-call pattern-cache lookup inside re (~700 calls in iter_all_distributions)
- Status: KEEP
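Experiments 2 and 3 follow the same pattern, sketched below with an illustrative pattern (not the actual regex from either file). Note that re.match with a pattern string does not recompile on every call, because the re module keeps an internal cache of compiled patterns; what the module-level constant removes is the per-call cache lookup.

```python
import re

# Compiled once at import time. The pattern is illustrative only.
_VALID_NAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def is_valid_name_inline(name: str) -> bool:
    # Before: the pattern string is passed on every call, so re has to
    # look it up in its internal compiled-pattern cache each time.
    return re.match(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$", name) is not None


def is_valid_name_precompiled(name: str) -> bool:
    # After: a bound method on a module-level compiled pattern.
    return _VALID_NAME.match(name) is not None


samples = ["requests", "zope.interface", "-bad", "a", "trailing-"]
results = [(is_valid_name_inline(s), is_valid_name_precompiled(s)) for s in samples]
```

Both functions return the same answers; only the per-call overhead differs.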
Experiment 4: pathlib to os.scandir + distribution dict cache
- File: src/pip/_internal/metadata/importlib/_envs.py
- Change: Replaced pathlib.Path.iterdir() with os.scandir(); added _distributions_cache dict for O(1) get_distribution lookups
- Result: get_distribution changes from O(n) linear scan to O(1) after first build
- Status: KEEP
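A rough sketch of the scan-once, look-up-by-name idea from Experiment 4. The function name, the name normalisation, and the directory layout are simplified assumptions for illustration; the real `_envs.py` handles more metadata formats. setdefault keeps the first-seen entry when the same project appears on more than one search path.

```python
import os
import tempfile
from typing import Dict, List


def build_distribution_cache(search_paths: List[str]) -> Dict[str, str]:
    """Scan each directory once; map normalised project name -> metadata dir."""
    cache: Dict[str, str] = {}
    for site_dir in search_paths:
        with os.scandir(site_dir) as entries:
            for entry in entries:
                if not entry.name.endswith(".dist-info"):
                    continue
                # "requests-2.32.0.dist-info" -> "requests"
                raw_name = entry.name[: -len(".dist-info")].split("-", 1)[0]
                name = raw_name.lower().replace("_", "-")
                # setdefault: an earlier search path wins over a later one.
                cache.setdefault(name, entry.path)
    return cache


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    os.mkdir(os.path.join(first, "requests-2.32.0.dist-info"))
    os.mkdir(os.path.join(second, "requests-2.0.0.dist-info"))
    os.mkdir(os.path.join(second, "Flask-3.0.0.dist-info"))
    cache = build_distribution_cache([first, second])
    requests_from_first = os.path.dirname(cache["requests"]) == first
    flask_found = "flask" in cache
```

After the cache is built, each lookup is a single dict access instead of a scan over every installed distribution.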
Experiment 5: lru_cache on canonicalize_name, parse_wheel_filename, Version.parse
- Files: src/pip/_vendor/packaging/utils.py, src/pip/_vendor/packaging/version.py
- Change: Added @functools.lru_cache to canonicalize_name (maxsize=1024), parse_wheel_filename (maxsize=512), Version.parse (maxsize=1024)
- Result: canonicalize_name 92.4% hit rate; parse_wheel_filename 75% hit rate
- Status: KEEP
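The hit rates quoted for Experiment 5 are the kind of figure that lru_cache's cache_info() reports. A minimal sketch, assuming the PEP 503 normalisation rule and a made-up sequence of inputs:

```python
import functools
import re

_canonicalize_regex = re.compile(r"[-_.]+")


@functools.lru_cache(maxsize=1024)
def canonicalize_name(name: str) -> str:
    # PEP 503 normalisation: runs of "-", "_", "." collapse to "-", then lowercase.
    return _canonicalize_regex.sub("-", name).lower()


# Made-up workload: four distinct spellings, two repeats of "Flask".
for raw in ["Flask", "flask", "Flask", "zope.interface", "zope_interface", "Flask"]:
    canonicalize_name(raw)

info = canonicalize_name.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
```

The cache keys on the raw input string, so "Flask" and "flask" are separate entries even though they normalise to the same name.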
Experiment 6: _version_nodot dict cache
- File: src/pip/_vendor/packaging/tags.py
- Change: Added module-level dict cache for _version_nodot results
- Result: Avoids repeated "".join(map(str, version)) calls
- Status: KEEP
E2E Profile (pip install --dry-run flask django boto3 requests)
Before this round (cached metadata, cProfile)
- Wall: ~5.9s
- Project self-time: ~1.9s
- Top targets:
- Version.__init__: 6.7%, 12,537 calls
- Link.from_json: 6.6%, 12,567 calls
- evaluate_link: 4.3%, 12,584 calls
- _clean_url_path: 3.0%, 12,567 calls
- _ensure_quoted_url: 2.8%, 12,567 calls
- splitext (link): 2.6%, 23,473 calls
- Version.__str__: 2.6%, 12,442 calls
- splitext (misc): 2.4%, 23,532 calls
- Link.filename: 2.6%, 12,257 calls
- Link.__hash__: 2.0%, 48,203 calls
After this round
- Wall: ~2.6s (median of 3 runs: 2.1s, 2.6s, 3.6s)
- Project self-time: ~0.83s (58% reduction)
- Top targets now:
- Version.__init__: 7.3%, 12,537 calls (fundamental, per-candidate)
- Link.__init__: 6.7%, 12,618 calls (includes pre-computation)
- evaluate_link: 5.0%, 12,584 calls (the evaluation algorithm)
- from_json: 4.4%, 12,567 calls (JSON dict access)
- _sort_key: 2.7% (sorting candidates)
- filter (specifiers): 2.6% (version filtering)
- _key (Version): 2.5%, 183K calls (comparison key, cached)
- Eliminated from hot path: _clean_url_path, _ensure_quoted_url, splitext, Link.filename, Link.__hash__, parse_iso_datetime
Experiments (this round)
Experiment 7: Link URL quoting integrated into __init__
- File: src/pip/_internal/models/link.py
- Change: Moved _ensure_quoted_url logic into Link.__init__, sharing the single urlsplit call. Added a _PATH_ALREADY_QUOTED_RE fast path that lets HTTP/HTTPS URLs with already-quoted paths skip _clean_url_path entirely (99% of package index URLs).
- Impact: Eliminated double urlsplit for every Link. _ensure_quoted_url (2.8%) and _clean_url_path (3.0%) gone from profile.
- Status: KEEP
Experiment 8: Link._splitext pre-computed in __init__
- File: src/pip/_internal/models/link.py
- Change: Pre-compute splitext result during Link construction. splitext() method and ext property return cached values.
- Impact: splitext (link) 2.6% + splitext (misc) 2.4% → eliminated from profile
- Status: KEEP
Experiment 9: Link._filename and Link._hash pre-computed
- File: src/pip/_internal/models/link.py
- Change: Pre-compute filename (posixpath.basename) and hash(url) during construction.
- Impact: Link.filename (2.6%) + Link.__hash__ (2.0%) → eliminated from profile
- Status: KEEP
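Experiments 8 and 9 both trade a little work at construction time for cheap repeated access. A simplified sketch under that assumption follows; `SketchLink` is a hypothetical class, not pip's Link, and its extension handling is deliberately minimal.

```python
import posixpath
from urllib.parse import urlsplit


class SketchLink:
    """Hypothetical, simplified stand-in for a link object."""

    __slots__ = ("url", "_filename", "_ext", "_hash")

    def __init__(self, url: str) -> None:
        self.url = url
        path = urlsplit(url).path  # one urlsplit call, reused below
        self._filename = posixpath.basename(path.rstrip("/"))
        base, ext = posixpath.splitext(self._filename)
        # Treat ".tar.<x>" as a single extension.
        if base.lower().endswith(".tar"):
            ext = base[-4:] + ext
        self._ext = ext
        self._hash = hash(url)  # computed once, returned by __hash__

    @property
    def filename(self) -> str:
        return self._filename

    @property
    def ext(self) -> str:
        return self._ext

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, SketchLink) and self.url == other.url


link = SketchLink("https://files.example.org/packages/pip-24.0.tar.gz#sha256=abc")
wheel = SketchLink("https://files.example.org/packages/six-1.16.0-py2.py3-none-any.whl")
```

Pre-computation only pays off when most links have these attributes read at least once, which the call counts in the profile above suggest is the case here.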
Experiment 10: Version.__str__ caching
- File: src/pip/_vendor/packaging/version.py
- Change: Added a _str_cache slot that caches the string representation on the first __str__ call. Also fixed _TrimmedRelease to initialize the cache.
- Impact: Version.__str__ 2.6% → 2.0% (35% faster per call, cached for repeated access)
- Status: KEEP
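The caching pattern from Experiment 10 can be sketched as below. `SketchVersion` is a hypothetical class, far simpler than the vendored Version. With __slots__, every constructor path has to assign the cache slot, otherwise the first read raises AttributeError; that is presumably why a subclass with its own construction path needed the same initialisation.

```python
from typing import Iterable, Optional, Tuple


class SketchVersion:
    """Hypothetical version object that caches its string form."""

    __slots__ = ("release", "_str_cache")

    def __init__(self, release: Iterable[int]) -> None:
        self.release: Tuple[int, ...] = tuple(release)
        # Must be assigned here: an unset slot raises AttributeError on read.
        self._str_cache: Optional[str] = None

    def __str__(self) -> str:
        cached = self._str_cache
        if cached is None:
            # Build the string once and keep it for later calls.
            cached = self._str_cache = ".".join(map(str, self.release))
        return cached


v = SketchVersion((1, 2, 3))
first_str = str(v)
second_str = str(v)
```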
Experiment 11: Lazy upload_time parsing
- File: src/pip/_internal/models/link.py
- Change: Store raw ISO string in from_json, defer parse_iso_datetime to first access of upload_time property. Only parsed when --uploaded-prior-to is used.
- Impact: parse_iso_datetime (1.4%, 12,568 calls) → eliminated from hot path
- Status: KEEP
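Lazy parsing as in Experiment 11 amounts to storing the raw string and parsing on first property access. In this sketch `LazyUploadTime` is a hypothetical holder class and `parse_iso_datetime` is a stand-in written for illustration, not pip's helper of the same name. It maps a trailing "Z" to an explicit offset because datetime.fromisoformat rejects "Z" before Python 3.11.

```python
import datetime
from typing import Optional


def parse_iso_datetime(value: str) -> datetime.datetime:
    """Illustrative stand-in: accept a trailing "Z" on older Pythons too."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.datetime.fromisoformat(value)


class LazyUploadTime:
    """Hypothetical holder that defers datetime parsing until it is needed."""

    __slots__ = ("_raw", "_parsed")

    def __init__(self, raw: Optional[str]) -> None:
        self._raw = raw  # raw ISO string straight from the JSON page
        self._parsed: Optional[datetime.datetime] = None

    @property
    def upload_time(self) -> Optional[datetime.datetime]:
        if self._parsed is None and self._raw is not None:
            self._parsed = parse_iso_datetime(self._raw)
        return self._parsed


item = LazyUploadTime("2024-05-01T12:00:00Z")
parsed_before_access = item._parsed
stamp = item.upload_time
```

If nothing ever reads upload_time, the parse never happens, which is the common case when no upload-time filter is in use.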
Experiment 12: parse_wheel_filename cache size 512 → 4096
- File: src/pip/_vendor/packaging/utils.py
- Change: Increased lru_cache maxsize from 512 to 4096 to handle large resolutions (10,980 unique filenames observed in multi-package installs).
- Impact: Better cache hit rate for large dependency trees
- Status: KEEP
Round 3: Algorithmic changes to _evaluate_json_page
Experiment 13-16: Tag-first parsing, direct JSON filename, endswith checks
- Files: src/pip/_internal/index/package_finder.py, src/pip/_internal/models/target_python.py
- Changes:
- New _evaluate_json_page() method: single-pass over raw JSON, checks extension via endswith, extracts wheel tags from filename end using rfind, checks tag compatibility via frozenset before name parsing
- Direct use of PEP 691 filename field (avoids URL construction)
- Version interning across platform wheels
- Tag tuples frozenset cached on TargetPython
- Impact: _evaluate_json_page self-time reduced ~33%, from_json calls reduced from ~10,899 to ~200 per page (only surviving candidates)
- Status: KEEP (experiments 13-16 committed as 4 separate commits)
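The tag-first idea (reject incompatible wheels before doing any name or version parsing) can be illustrated as follows. This is a sketch under the standard wheel filename layout, with hypothetical function names; it is not the code in package_finder.py. Three rfind calls locate the python, abi, and platform fields from the end of the filename, and compressed tag sets such as py2.py3 are expanded before the frozenset membership test.

```python
from itertools import product
from typing import FrozenSet, Optional, Tuple

TagTriple = Tuple[str, str, str]


def trailing_tags(filename: str) -> Optional[TagTriple]:
    """Extract (python, abi, platform) from the end of a wheel filename."""
    if not filename.endswith(".whl"):
        return None
    cuts = [len(filename) - len(".whl")]
    for _ in range(3):
        # Search backwards for the next "-" before the previous cut.
        pos = filename.rfind("-", 0, cuts[-1])
        if pos < 0:
            return None
        cuts.append(pos)
    end, plat, abi, py = cuts
    return filename[py + 1:abi], filename[abi + 1:plat], filename[plat + 1:end]


def is_compatible(filename: str, supported: FrozenSet[TagTriple]) -> bool:
    tags = trailing_tags(filename)
    if tags is None:
        return False
    py, abi, plat = tags
    # Compressed tag sets ("py2.py3") expand to every combination.
    combos = product(py.split("."), abi.split("."), plat.split("."))
    return any(combo in supported for combo in combos)


supported = frozenset({
    ("cp315", "cp315", "macosx_11_0_arm64"),
    ("py3", "none", "any"),
})
```

For example, `is_compatible("numpy-2.0.0-cp312-cp312-win_amd64.whl", supported)` is rejected without ever parsing the project name or version.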
Experiment 17: Two-level platform pre-filter
- Tried adding platform-only pre-filter (1 rfind) before full 3-rfind extraction
- Results: Within noise margin (2-5%), code complexity not justified
- Status: DISCARD
Round 4: Resolver backtracking cache (_iter_found_candidates)
Experiment 18: Two-level cache on _iter_found_candidates
- File: src/pip/_internal/resolution/resolvelib/factory.py
- Problem: During fastapi[standard] resolution, _iter_found_candidates is called 134K+ times with only ~120 unique (name, specifier, hashes, extras) tuples. Each call redundantly: merges specifiers (set+update+frozenset), calls find_best_candidate (dict lookup), scans all_yanked, checks is_pinned, allocates functools.partial objects. Total: ~9.4s of 60s wall time.
- Change: Two-level cache:
- Level 1 (merge cache): Maps raw specifier inputs (constraint _specs + each ireq's _specs frozenset) to merged result (specifier, hashes, extras). Uses frozenset VALUES (not id()) for GC safety. A frozenset caches its hash, so hashing the same object again is O(1). Eliminates specifier merge on 99.9% of calls.
- Level 2 (infos cache): Maps merged (name, specs, hashes, extras) to the list of (version, build_func) tuples from find_best_candidate. Eliminates find_best_candidate call, all_yanked scan, is_pinned check, and functools.partial allocation.
- Inlined _get_installed_candidate (runs fresh every call — depends on incompatible_ids which changes during backtracking).
- Correctness note: Initial attempt used id()-based L1 cache for speed. This caused InconsistentCandidate errors because Python reuses memory addresses for gc'd objects during resolver backtracking, producing stale cache hits. Fixed by using frozenset value-based keys.
- Impact:
- _iter_found_candidates: 9.4s → 1.85s (5x faster)
- fastapi[standard] resolution: 37.9s → 15.2s (2.49x faster)
- boto3: 0.65s → 0.33s (1.95x)
- django: 0.24s → 0.18s (1.35x)
- requests: 0.51s → 0.28s (1.84x)
- black: 0.33s → 0.30s (1.12x)
- Status: KEEP
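The two-level structure from Experiment 18 can be sketched with a toy model. Everything here is hypothetical: `CandidateLookup`, the string-based specifiers, and the `_matches` logic stand in for the real factory, SpecifierSet, and find_best_candidate. What the sketch does preserve is the key design point from the correctness note: cache keys are built from frozenset values, so equal inputs hit the cache even when they are different objects, and a recycled memory address can never produce a stale hit.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

SpecSet = FrozenSet[str]


def _matches(version: str, specs: SpecSet) -> bool:
    # Toy specifier logic: only "==X" and "!=X" are understood.
    for spec in specs:
        op, target = spec[:2], spec[2:]
        if op == "==" and version != target:
            return False
        if op == "!=" and version == target:
            return False
    return True


class CandidateLookup:
    """Hypothetical two-level cache keyed by value, never by id()."""

    def __init__(self, index: Dict[str, List[str]]) -> None:
        self._index = index
        self._merge_cache: Dict[Tuple[SpecSet, ...], SpecSet] = {}
        self._infos_cache: Dict[Tuple[str, SpecSet], Tuple[str, ...]] = {}
        self.merges = 0   # how often specifiers were really merged
        self.lookups = 0  # how often the index was really scanned

    def _merge(self, spec_sets: Sequence[SpecSet]) -> SpecSet:
        key = tuple(spec_sets)  # level 1 key: the raw inputs, by value
        merged = self._merge_cache.get(key)
        if merged is None:
            self.merges += 1
            merged = frozenset().union(*spec_sets)
            self._merge_cache[key] = merged
        return merged

    def find(self, name: str, spec_sets: Sequence[SpecSet]) -> Tuple[str, ...]:
        merged = self._merge(spec_sets)
        key = (name, merged)  # level 2 key: the merged result
        infos = self._infos_cache.get(key)
        if infos is None:
            self.lookups += 1
            infos = tuple(v for v in self._index.get(name, []) if _matches(v, merged))
            self._infos_cache[key] = infos
        return infos


lookup = CandidateLookup({"flask": ["3.0.0", "2.3.3", "2.0.0"]})
for _ in range(1000):
    # Fresh frozenset objects every time: equal by value, different by id().
    result = lookup.find("flask", [frozenset({"!=2.3.3"}), frozenset({"!=2.0.0"})])
```

A different set of raw inputs that merges to the same specifier misses level 1 but still hits level 2, so the index scan is not repeated.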
Plateau Analysis (Updated Apr 2026)
- Resolver runs 22-48 rounds for typical workloads (flask+django+boto3+requests: 23 pushes, 0 backtracks; fastapi[standard]: 48 pushes, 0 backtracks). COW state snapshots would save negligible time at these scales.
- _evaluate_json_page: 55.7% of project self-time (0.84s out of 1.5s), but per-entry cost is 4.2 µs, dominated by dict.get (65K calls), Link construction (14 attr assignments), and version interning. No single operation dominates. py3-none-any fast path tested and discarded (<2% improvement within noise).
- install_req_from_line: 21 calls at 0.1ms each = 2.2ms total. Not worth bypassing the serialize-reparse pattern at this call count.
- IteratorMapping: 4-6 objects per round x 48 rounds = ~240 allocations. Each is 3 attribute assignments. Total cost negligible.
- Wall time dominated by HTTP I/O: 41 requests account for ~70% of wall time. Network latency and TLS handshakes are irreducible.
- Profile is flat apart from _evaluate_json_page (55.7%): the next project function is _iter_found_candidates at 5.2%, and TLS (non-project code) accounts for ~12%. No other single function has enough headroom for meaningful improvement.
- Further gains require: (a) moving hot Python loops to C, (b) protocol-level changes (e.g. server-side filtering), or (c) fundamentally different resolution strategies (e.g. SAT solver).
Pre-submit Review Findings
- CRITICAL (fixed):
- CRITICAL (fixed): get_applicable_candidates() sorting was removed in an earlier optimization, breaking the resolver's assumption that applicable_candidates are version-sorted. The resolver iterates reversed(icans) expecting newest-first order. Fixed by sorting in compute_best_candidate while using max() for best-candidate tiebreaker stability.
- F821 lint (fixed): Version type annotation in _evaluate_json_page referenced undefined name. Changed to _BaseVersion.
- Reviewed (safe): InstallationCandidate frozen removal. No code compares candidates by value. Identity-based assertions already updated.
- Reviewed (safe): _lazy_wheel_cache. Bounded by dependency tree size (20-200 packages).
- Reviewed (safe): specifier._specs direct access. Vendored library under our control.
- Reviewed (safe): _prereleases = None in bulk merge. pip never sets non-None prereleases on SpecifierSet in the factory path.
Adversarial Review Findings
Round 1
- HIGH (fixed): Link.from_json query string stripping — signed URLs (?X-Amz-Signature=...) corrupted _path/_filename causing is_wheel=False. Fixed by finding earliest of ? or # to delimit path end.
- HIGH (fixed): _build_distribution_cache dict comprehension kept last-seen instead of first-seen for duplicate names. Fixed with setdefault.
- MEDIUM (safe): factory.py same-version installed candidate reuse. Investigated: FoundCandidates.__iter__ filters by incompatible_ids, and the original code already skips remote candidates for installed versions via versions_found set. No behavior change.
Round 2
- HIGH (fixed): JSON sdist extensions — _evaluate_json_page only checked .tar.gz/.zip/.tar.bz2, missing .tgz/.tar/.tbz/.tar.xz etc. Fixed by adding all SUPPORTED_EXTENSIONS.
- HIGH (fixed): JSON wheel fast path accepted malformed wheel names. Fixed by validating via parse_wheel_filename() (lru_cached).
- MEDIUM (fixed): HTML pages lost _sort_links() dedup/precedence. Restored evaluate_links() call.
- MEDIUM (fixed): datetime.fromisoformat() fails on Python 3.9/3.10 with trailing 'Z'. Replaced with parse_iso_datetime().
Round 3
- HIGH (fixed): Link.from_json derived _filename from URL path, not JSON filename field. Fixed to prefer file_data["filename"].
- MEDIUM (fixed): _log_skipped_link had early return on non-DEBUG that prevented requires-python skip bookkeeping. Fixed to always record. JSON path also records skip reasons in dedicated set.
Round 4
- HIGH (fixed): Reverted factory.py installed-candidate reuse — conflated installed and index artifacts for the same version, blocking resolver backtracking.
- HIGH (fixed): Link.from_json crashes on authority-only URLs (no path). Changed url.index to url.find with fallback.
- MEDIUM (dismissed): Missing filename fallback in JSON path — PEP 691 requires filename field. Non-conformant indexes fall back to standard parse_links path.
- MEDIUM (fixed): _FastMetadata.get_payload() returned empty string, dropping long descriptions from metadata_dict/pip inspect. Now preserves body text.
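The two Link.from_json URL fixes (Round 1: stop the path at the earliest ? or #; Round 4: tolerate authority-only URLs by using find instead of index) can be illustrated with a small sketch. The helper names are hypothetical and the real code may be structured differently; str.find returns -1 where str.index would raise ValueError, which is what makes the fallback possible.

```python
def path_end(url: str, start: int) -> int:
    """Index where the path ends: earliest "?" or "#" at or after start."""
    positions = [i for i in (url.find("?", start), url.find("#", start)) if i != -1]
    return min(positions) if positions else len(url)


def filename_from_url(url: str) -> str:
    """Return the last path segment, ignoring query string and fragment."""
    scheme_end = url.find("://")
    authority_start = scheme_end + 3 if scheme_end != -1 else 0
    end = path_end(url, authority_start)
    # Only look for the path's "/" before the query/fragment begins.
    path_start = url.find("/", authority_start, end)
    if path_start == -1:
        return ""  # authority-only URL: no path, so no filename
    return url[path_start:end].rsplit("/", 1)[-1]


signed = ("https://bucket.example.com/pkgs/foo-1.0-py3-none-any.whl"
          "?X-Amz-Signature=abc#sha256=def")
signed_name = filename_from_url(signed)
bare_name = filename_from_url("https://example.com")
```

Checking for the delimiter before looking for the path separator also keeps a "/" inside a query string from being mistaken for part of the path.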
Refreshed E2E Benchmarks (Apr 2026, Py 3.15.0a7)
All measured with hyperfine (5-10 runs, 2-3 warmup), HTTP cache warm.
| Benchmark | Main | Optimized | Speedup |
|---|---|---|---|
| pip --version | 138ms | 20ms | 7.0x |
| pip --help | 143ms | 121ms | 1.18x |
| pip list | 162ms | 146ms | 1.11x |
| pip freeze | 225ms | 211ms | 1.07x |
| pip show pip | 162ms | 148ms | 1.09x |
| pip check | 191ms | 174ms | 1.10x |
| requests | 589ms | 516ms | 1.14x |
| flask+django | 708ms | 599ms | 1.18x |
| flask+django+boto3+requests | 1493ms | 826ms | 1.81x |
| fastapi[standard] | 13325ms | 11664ms | 1.14x |
| -r requirements.txt (21 pkgs) | 1344ms | 740ms | 1.82x |
Notes:
- fastapi[standard] installs 42 packages including C extensions (uvloop, pydantic_core) that require sdist building. The 11.7s is dominated by build system overhead, not resolution.
- The complex resolution benchmark (flask+django+boto3+requests) shows the largest resolution-specific speedup (1.81x) because it exercises the largest JSON pages (botocore 4692 entries, boto3 4020 entries).
Files Modified
- src/pip/_internal/utils/compatibility_tags.py — lru_cache on get_supported
- src/pip/_vendor/packaging/utils.py — lru_cache on canonicalize_name/parse_wheel_filename (16384), pre-compiled regex
- src/pip/_vendor/packaging/version.py — lru_cache on parse(), str/hash caching
- src/pip/_vendor/packaging/tags.py — dict cache on _version_nodot
- src/pip/_internal/metadata/base.py — pre-compiled project name regex
- src/pip/_internal/metadata/importlib/_envs.py — os.scandir + distribution dict cache
- src/pip/_internal/models/link.py — direct JSON construction, lazy URL parsing, pre-computed splitext/filename/hash, lazy upload_time
- src/pip/_internal/index/package_finder.py — fused _evaluate_json_page, tag-first parsing, version interning, sorted applicable_candidates restoration
- src/pip/_internal/models/target_python.py — tag tuples frozenset, tag priority cache
- src/pip/_internal/resolution/resolvelib/factory.py — bulk specifier merge, hashes fast-path, two-level candidate infos cache
- src/pip/_internal/models/candidate.py — version pass-through, removed frozen dataclass overhead
- src/pip/_vendor/packaging/specifiers.py — canonical_spec cache, str/hash caching, eq fast-path
- src/pip/_vendor/packaging/requirements.py — str caching
- src/pip/_vendor/packaging/markers.py — default_environment caching
- src/pip/_vendor/requests/utils.py — proxy detection memoization
- src/pip/_vendor/resolvelib/resolvers/resolution.py — hoisted method/attrgetter constants
- src/pip/_vendor/resolvelib/structs.py — guard for empty appends
- src/pip/_internal/utils/hashes.py — hash caching, supported_hashes fast-path
- src/pip/_internal/resolution/resolvelib/base.py — Constraint.empty() singleton
- src/pip/_internal/operations/prepare.py — lazy wheel metadata cache