codeflash-agent/.codeflash/krrt7/python/pip/data/session-handoff.md
Kevin Turcios cc29a27289
Migrate .codeflash/ to {teammember}/{org}/{project}/ format (#15)
Add team member dimension to case study paths so multiple contributors
can track optimization data independently. Derives member from
git config user.name in session-start hooks.

- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references
2026-04-14 23:04:34 -05:00


Optimization Session — apr05

Environment

  • Python 3.15.0a7, macOS arm64
  • Branch: codeflash/optimize (off main 8df7b668b)
  • Tests: 1690/1690 unit tests passing, 40/40 resolver tests passing
  • Lint: ruff E,F clean on all modified files
  • Run tag: apr05

Baseline Profile (resolver unit tests, cProfile)

  • Wall: ~5-13s (40 tests, high variance from subprocess calls)
  • Project self-time: 0.155s
  • Top targets by self-time:
    1. Tag.__init__: 25.1% (0.039s, 45,301 calls)
    2. cpython_tags: 8.6% (0.013s, 21,630 calls)
    3. compatible_tags: 5.8% (0.009s, 23,520 calls)
    4. _version_nodot: 4.9% (0.008s, 18,570 calls)
    5. find_legacy_editables: 3.6% (0.006s, 160 calls)
    6. canonicalize_name: 2.4% (0.004s, 3,244 calls)
    7. parse_name_and_version_from_info_directory: 2.1% (0.003s, 2,734 calls)

Baseline Profile (e2e single install, cProfile)

  • Project self-time: 0.027s
  • Tag.__init__: 3,014 calls, 5.9% of project CPU
  • cpython_tags: 1,442 calls, 2.6%
  • _version_nodot: 1,238 calls, 1.4%

Optimized Profile (resolver unit tests, cProfile)

  • Project self-time: 0.261s (higher due to cProfile overhead on lru_cache wrappers)
  • Tag.__init__: 1,559 calls (from 45,301 — 97% reduction)
  • cpython_tags: 721 calls (from 21,630 — 97% reduction)
  • Profile is flat — no function above 3.3% except find_legacy_editables at 25.1% (I/O bound)

Optimized Profile (e2e single install, cProfile)

  • Project self-time: 0.089s (higher due to cProfile overhead on lru_cache wrappers)
  • Tag.__init__: 1,505 calls (from 3,014 — 50% reduction)
  • canonicalize_name: 92.4% cache hit rate (327 hits / 27 misses)
  • parse_wheel_filename: 75% cache hit rate (3 hits / 1 miss)

Cache Hit Rates (real pip install --dry-run requests)

  • get_supported: 50% (1 hit, 1 miss) — saves full 1500-tag generation
  • canonicalize_name: 92.4% (327/354) — most impactful cache
  • parse_wheel_filename: 75% (3/4)
  • Version.parse: 20% (1/5) — higher for large dependency trees

Strategy

  • Target 1: packaging.tags — global caching of tag lists via lru_cache
  • Target 2: Distribution scanning — pathlib to os.scandir, dict cache for O(1) lookups
  • Target 3: canonicalize_name — lru_cache with 92% hit rate
  • Target 4: Version/wheel parsing — lru_cache, pre-compiled regex
  • Target 5: _version_nodot — dict cache

Experiments

Experiment 1: lru_cache on get_supported()

  • File: src/pip/_internal/utils/compatibility_tags.py
  • Change: Split get_supported() into public wrapper + @functools.lru_cache(maxsize=32) cached impl
  • Result: Tag.init calls 45,301 → 1,559 (97% reduction)
  • Status: KEEP

Experiment 2: Pre-compiled regex for wheel name validation

  • File: src/pip/_vendor/packaging/utils.py
  • Change: Moved inline re.match() to module-level _wheel_name_regex
  • Result: Avoids resolving the pattern through re's internal compile cache on every call (313 calls in resolver tests)
  • Status: KEEP

Experiment 3: Pre-compiled regex for project name validation

  • File: src/pip/_internal/metadata/base.py
  • Change: Moved inline re.match() to module-level _VALID_PROJECT_NAME
  • Result: Avoids resolving the pattern through re's internal compile cache on every call (~700 calls in iter_all_distributions)
  • Status: KEEP
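The sketch below shows the hoisting pattern that Experiments 2 and 3 share. The pattern string is the PEP 508 project-name regex; it is assumed, not confirmed, to be what _VALID_PROJECT_NAME checks. Both function names are hypothetical.

```python
import re

# PEP 508 project-name pattern (assumed equivalent to _VALID_PROJECT_NAME).
_NAME_PATTERN = r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$"

# Hoisted: compiled once, at import time.
_VALID_PROJECT_NAME = re.compile(_NAME_PATTERN, re.IGNORECASE)


def is_valid_name_inline(name: str) -> bool:
    # Before: every call passes the pattern string to re.match, which must look
    # it up in re's internal cache before matching.
    return re.match(_NAME_PATTERN, name, re.IGNORECASE) is not None


def is_valid_name(name: str) -> bool:
    # After: match directly on the precompiled pattern object.
    return _VALID_PROJECT_NAME.match(name) is not None
```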

Experiment 4: pathlib to os.scandir + distribution dict cache

  • File: src/pip/_internal/metadata/importlib/_envs.py
  • Change: Replaced pathlib.Path.iterdir() with os.scandir(); added _distributions_cache dict for O(1) get_distribution lookups
  • Result: get_distribution changes from O(n) linear scan to O(1) after first build
  • Status: KEEP
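Here is a rough sketch of the scandir-plus-dict approach, under stated assumptions. Entries are taken to be `<name>-<version>.dist-info` directories in a single site directory, and both function names are hypothetical. It also applies the first-seen `setdefault` rule that the adversarial review later required for duplicate names.

```python
import os
import re
from typing import Dict, Optional

_CANON_RE = re.compile(r"[-_.]+")


def _canonicalize(name: str) -> str:
    # PEP 503-style normalization, so lookups ignore case and -/_/. differences.
    return _CANON_RE.sub("-", name).lower()


def build_distribution_cache(site_dir: str) -> Dict[str, str]:
    """Map canonical project name -> path of its *.dist-info directory."""
    cache: Dict[str, str] = {}
    # os.scandir() yields DirEntry objects whose is_dir() can often reuse
    # file-type data from the directory listing instead of an extra stat().
    with os.scandir(site_dir) as entries:
        for entry in entries:
            if not entry.name.endswith(".dist-info") or not entry.is_dir():
                continue
            # Assumes "<name>-<version>.dist-info", with any "-" in the name escaped as "_".
            name = entry.name[: -len(".dist-info")].partition("-")[0]
            # setdefault keeps the FIRST entry per name, matching what a linear
            # "return the first match" scan would have returned.
            cache.setdefault(_canonicalize(name), entry.path)
    return cache


def get_distribution_path(cache: Dict[str, str], name: str) -> Optional[str]:
    # O(1) after the one-time build, instead of an O(n) scan per lookup.
    return cache.get(_canonicalize(name))
```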

Experiment 5: lru_cache on canonicalize_name, parse_wheel_filename, Version.parse

  • Files: src/pip/_vendor/packaging/utils.py, src/pip/_vendor/packaging/version.py
  • Change: Added @functools.lru_cache to canonicalize_name (maxsize=1024), parse_wheel_filename (maxsize=512), Version.parse (maxsize=1024)
  • Result: canonicalize_name 92.4% hit rate; parse_wheel_filename 75% hit rate
  • Status: KEEP
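The hit rates quoted above can be read directly from `cache_info()`. This simplified `canonicalize_name` keeps the PEP 503 normalization but omits the real function's extra parameters and return type. `hit_rate` is a hypothetical helper.

```python
import functools
import re

_canonicalize_regex = re.compile(r"[-_.]+")


@functools.lru_cache(maxsize=1024)
def canonicalize_name(name: str) -> str:
    # PEP 503: collapse runs of "-", "_" and "." into "-", then lowercase.
    return _canonicalize_regex.sub("-", name).lower()


def hit_rate(func) -> float:
    # hits / (hits + misses), e.g. 327 / 354 ≈ 92.4% in the session above.
    info = func.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0
```

The cache keys on the raw input, so "Flask" and "flask" occupy separate entries even though they normalize to the same name.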

Experiment 6: _version_nodot dict cache

  • File: src/pip/_vendor/packaging/tags.py
  • Change: Added module-level dict cache for _version_nodot results
  • Result: Avoids repeated "".join(map(str, version)) calls
  • Status: KEEP

E2E Profile (pip install --dry-run flask django boto3 requests)

Before this round (cached metadata, cProfile)

  • Wall: ~5.9s
  • Project self-time: ~1.9s
  • Top targets:
    1. Version.__init__: 6.7%, 12,537 calls
    2. Link.from_json: 6.6%, 12,567 calls
    3. evaluate_link: 4.3%, 12,584 calls
    4. _clean_url_path: 3.0%, 12,567 calls
    5. _ensure_quoted_url: 2.8%, 12,567 calls
    6. splitext (link): 2.6%, 23,473 calls
    7. Version.__str__: 2.6%, 12,442 calls
    8. splitext (misc): 2.4%, 23,532 calls
    9. Link.filename: 2.6%, 12,257 calls
    10. Link.__hash__: 2.0%, 48,203 calls

After this round

  • Wall: ~2.6s (median of 3 runs: 2.1s, 2.6s, 3.6s)
  • Project self-time: ~0.83s (58% reduction)
  • Top targets now:
    1. Version.__init__: 7.3%, 12,537 calls — fundamental, per-candidate
    2. Link.__init__: 6.7%, 12,618 calls — includes pre-computation
    3. evaluate_link: 5.0%, 12,584 calls — the evaluation algorithm
    4. from_json: 4.4%, 12,567 calls — JSON dict access
    5. _sort_key: 2.7% — sorting candidates
    6. filter (specifiers): 2.6% — version filtering
    7. _key (Version): 2.5%, 183K calls — comparison key (cached)
  • Eliminated from hot path: _clean_url_path, _ensure_quoted_url, splitext, Link.filename, Link.__hash__, parse_iso_datetime

Experiments (this round)

Experiment 7: Single urlsplit in Link.__init__

  • File: src/pip/_internal/models/link.py
  • Change: Moved the _ensure_quoted_url logic into Link.__init__ so it shares the single urlsplit call. Added a _PATH_ALREADY_QUOTED_RE fast path that lets HTTP/HTTPS URLs (99% of package index URLs) skip _clean_url_path entirely.
  • Impact: Eliminated the double urlsplit for every Link. _ensure_quoted_url (2.5%) and _clean_url_path (3.0%) are gone from the profile.
  • Status: KEEP

Experiment 8: Pre-computed splitext

  • File: src/pip/_internal/models/link.py
  • Change: Pre-compute the splitext result during Link construction. The splitext() method and ext property return the cached values.
  • Impact: splitext (link) 2.6% + splitext (misc) 2.4% → eliminated from profile
  • Status: KEEP

Experiment 9: Pre-computed filename and hash

  • File: src/pip/_internal/models/link.py
  • Change: Pre-compute filename (posixpath.basename) and hash(url) during construction.
  • Impact: Link.filename (2.6%) + Link.__hash__ (2.0%) → eliminated from profile
  • Status: KEEP
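Taken together, these changes make Link derive everything from one parse at construction time. `LinkSketch` is a hypothetical, heavily trimmed class, not pip's Link. `_split_ext` mirrors the idea of keeping ".tar.gz"-style double extensions together; it is an approximation of pip's own splitext helper.

```python
import posixpath
import urllib.parse
from typing import Tuple


def _split_ext(path: str) -> Tuple[str, str]:
    # Like posixpath.splitext, but keeps ".tar.gz"-style double extensions together.
    base, ext = posixpath.splitext(path)
    if base.lower().endswith(".tar"):
        ext = base[-4:] + ext
        base = base[:-4]
    return base, ext


class LinkSketch:
    """Hypothetical, trimmed-down Link that derives everything from one urlsplit()."""

    __slots__ = ("url", "_filename", "_ext_parts", "_hash")

    def __init__(self, url: str) -> None:
        self.url = url
        parts = urllib.parse.urlsplit(url)  # parsed once, reused for every derived field
        path = urllib.parse.unquote(parts.path)
        self._filename = posixpath.basename(path.rstrip("/"))
        self._ext_parts = _split_ext(self._filename)
        self._hash = hash(url)

    @property
    def filename(self) -> str:
        return self._filename

    def splitext(self) -> Tuple[str, str]:
        return self._ext_parts

    @property
    def ext(self) -> str:
        return self._ext_parts[1]

    def __eq__(self, other: object) -> bool:
        return isinstance(other, LinkSketch) and self.url == other.url

    def __hash__(self) -> int:
        return self._hash
```

The trade-off is a little extra work per construction, even for links whose filename is never read. That only pays off when most Links are inspected, as the profile above suggests.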

Experiment 10: Version.__str__ caching

  • File: src/pip/_vendor/packaging/version.py
  • Change: Added a _str_cache slot that stores the string representation on the first __str__ call. Also fixed _TrimmedRelease to initialize the cache.
  • Impact: Version.__str__ 2.6% → 2.0% (35% faster per call, cached for repeated access)
  • Status: KEEP
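A minimal sketch of slot-based `__str__` caching follows. `VersionSketch` is hypothetical and only models the release segment. The constructor must initialize the slot, because reading an unset slot raises AttributeError. Any subclass or alternate constructor needs the same initialization, which is what the _TrimmedRelease fix addressed.

```python
from typing import Optional, Tuple


class VersionSketch:
    """Hypothetical stand-in for packaging's Version, release segment only."""

    __slots__ = ("_release", "_str_cache")

    def __init__(self, version: str) -> None:
        self._release: Tuple[int, ...] = tuple(int(p) for p in version.split("."))
        # Must be set here: reading an unset slot raises AttributeError.
        self._str_cache: Optional[str] = None

    def __str__(self) -> str:
        cached = self._str_cache
        if cached is None:
            # Build the normalized string once, then reuse it.
            cached = ".".join(map(str, self._release))
            self._str_cache = cached
        return cached
```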

Experiment 11: Lazy upload_time parsing

  • File: src/pip/_internal/models/link.py
  • Change: Store raw ISO string in from_json, defer parse_iso_datetime to first access of upload_time property. Only parsed when --uploaded-prior-to is used.
  • Impact: parse_iso_datetime (1.4%, 12,568 calls) → eliminated from hot path
  • Status: KEEP
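This sketch shows the deferred-parsing idea in isolation. `LazyUploadTimeLink` and `parse_iso_z` are hypothetical; `parse_iso_z` stands in for pip's parse_iso_datetime. It rewrites a trailing "Z" because `datetime.fromisoformat()` only accepts that suffix from Python 3.11 onward.

```python
import datetime
from typing import Optional


def parse_iso_z(value: str) -> datetime.datetime:
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so rewrite it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.datetime.fromisoformat(value)


class LazyUploadTimeLink:
    """Hypothetical holder showing only the lazy upload_time pattern."""

    __slots__ = ("_upload_time_raw", "_upload_time")

    def __init__(self, upload_time_raw: Optional[str]) -> None:
        self._upload_time_raw = upload_time_raw  # raw string from the JSON page
        self._upload_time: Optional[datetime.datetime] = None  # parsed on demand

    @property
    def upload_time(self) -> Optional[datetime.datetime]:
        # Parse at most once, and only when someone actually asks.
        if self._upload_time is None and self._upload_time_raw is not None:
            self._upload_time = parse_iso_z(self._upload_time_raw)
        return self._upload_time
```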

Experiment 12: parse_wheel_filename cache size 512 → 4096

  • File: src/pip/_vendor/packaging/utils.py
  • Change: Increased lru_cache maxsize from 512 to 4096 to handle large resolutions (10,980 unique filenames observed in multi-package installs).
  • Impact: Better cache hit rate for large dependency trees
  • Status: KEEP

Round 3: Algorithmic changes to _evaluate_json_page

Experiments 13-16: Tag-first parsing, direct JSON filename, endswith checks

  • Files: src/pip/_internal/index/package_finder.py, src/pip/_internal/models/target_python.py
  • Changes:
    • New _evaluate_json_page() method: single-pass over raw JSON, checks extension via endswith, extracts wheel tags from filename end using rfind, checks tag compatibility via frozenset before name parsing
    • Direct use of PEP 691 filename field (avoids URL construction)
    • Version interning across platform wheels
    • Tag tuples frozenset cached on TargetPython
  • Impact: _evaluate_json_page self-time reduced ~33%, from_json calls reduced from ~10,899 to ~200 per page (only surviving candidates)
  • Status: KEEP (experiments 13-16 committed as 4 separate commits)
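Tag-first filtering can be sketched as follows. The last three dash-separated fields are sliced off the wheel name with `rfind` and checked against a frozenset before any name or version parsing. Names are hypothetical, and tags are simplified to plain string triples. Survivors still need full validation; the adversarial review below adds a parse_wheel_filename() check for exactly this reason.

```python
from typing import FrozenSet, Optional, Tuple

Tag = Tuple[str, str, str]  # (interpreter, abi, platform)


def wheel_tag_fields(filename: str) -> Optional[Tuple[str, str, str]]:
    """Slice the last three dash-separated fields off a .whl name with rfind()."""
    if not filename.endswith(".whl"):
        return None
    stem = filename[:-4]
    plat_dash = stem.rfind("-")
    if plat_dash <= 0:
        return None
    abi_dash = stem.rfind("-", 0, plat_dash)
    if abi_dash <= 0:
        return None
    py_dash = stem.rfind("-", 0, abi_dash)
    if py_dash <= 0:
        return None
    # Working from the right means an optional build tag never shifts the fields.
    return stem[py_dash + 1 : abi_dash], stem[abi_dash + 1 : plat_dash], stem[plat_dash + 1 :]


def is_compatible(filename: str, supported: FrozenSet[Tag]) -> bool:
    fields = wheel_tag_fields(filename)
    if fields is None:
        return False
    pys, abis, plats = (field.split(".") for field in fields)
    # Compressed tag sets such as "py2.py3" expand to their cross product.
    return any((py, abi, plat) in supported for py in pys for abi in abis for plat in plats)
```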

Experiment 17: Two-level platform pre-filter

  • Tried adding platform-only pre-filter (1 rfind) before full 3-rfind extraction
  • Result: Gains within noise margin (2-5%); the added code complexity was not justified
  • Status: DISCARD

Round 4: Resolver backtracking cache (_iter_found_candidates)

Experiment 18: Two-level cache on _iter_found_candidates

  • File: src/pip/_internal/resolution/resolvelib/factory.py
  • Problem: During fastapi[standard] resolution, _iter_found_candidates is called 134K+ times with only ~120 unique (name, specifier, hashes, extras) tuples. Each call redundantly: merges specifiers (set+update+frozenset), calls find_best_candidate (dict lookup), scans all_yanked, checks is_pinned, allocates functools.partial objects. Total: ~9.4s of 60s wall time.
  • Change: Two-level cache:
    • Level 1 (merge cache): Maps raw specifier inputs (constraint _specs + each ireq's _specs frozenset) to merged result (specifier, hashes, extras). Uses frozenset VALUES (not id()) for GC safety. Frozenset hashing is O(1) after first call. Eliminates specifier merge on 99.9% of calls.
    • Level 2 (infos cache): Maps merged (name, specs, hashes, extras) to the list of (version, build_func) tuples from find_best_candidate. Eliminates find_best_candidate call, all_yanked scan, is_pinned check, and functools.partial allocation.
    • Inlined _get_installed_candidate (runs fresh every call — depends on incompatible_ids which changes during backtracking).
  • Correctness note: Initial attempt used id()-based L1 cache for speed. This caused InconsistentCandidate errors because Python reuses memory addresses for gc'd objects during resolver backtracking, producing stale cache hits. Fixed by using frozenset value-based keys.
  • Impact:
    • _iter_found_candidates: 9.4s → 1.85s (5x faster)
    • fastapi[standard] resolution: 37.9s → 15.2s (2.49x faster)
    • boto3: 0.65s → 0.33s (1.95x)
    • django: 0.24s → 0.18s (1.35x)
    • requests: 0.51s → 0.28s (1.84x)
    • black: 0.33s → 0.30s (1.12x)
  • Status: KEEP
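A simplified sketch of the two-level cache follows. Specifiers are modeled as frozensets of strings, and hashes and extras are omitted. The class and its names are hypothetical. Keys are tuples of frozensets, which hash and compare by value. An id()-based key could instead match a new object that reused a collected object's address, the stale-hit failure described above. The installed candidate is deliberately kept out of the cache, since it depends on state that changes during backtracking.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Specs = FrozenSet[str]  # simplified: specifiers as strings like ">=1.0"
Infos = List[Tuple[str, str]]  # simplified (version, build-callable) pairs


class CandidateInfosCache:
    """Hypothetical two-level cache keyed on frozenset VALUES, never on id()."""

    def __init__(self, find_infos: Callable[[str, Specs], Infos]) -> None:
        self._find_infos = find_infos
        # Level 1: raw inputs (constraint specs, per-ireq specs) -> merged specs.
        self._merge_cache: Dict[Tuple[Specs, Tuple[Specs, ...]], Specs] = {}
        # Level 2: (name, merged specs) -> candidate infos.
        self._infos_cache: Dict[Tuple[str, Specs], Infos] = {}

    def get(self, name: str, constraint: Specs, ireq_specs: Tuple[Specs, ...]) -> Infos:
        key1 = (constraint, ireq_specs)
        merged = self._merge_cache.get(key1)
        if merged is None:
            merged = constraint.union(*ireq_specs)  # the merge skipped on later hits
            self._merge_cache[key1] = merged
        key2 = (name, merged)
        infos = self._infos_cache.get(key2)
        if infos is None:
            infos = self._find_infos(name, merged)  # expensive lookup, once per key
            self._infos_cache[key2] = infos
        return infos
```

Callers receive the shared cached list, so they must treat it as read-only.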

Plateau Analysis (Updated Apr 2026)

  • Resolver runs 22-48 rounds for typical workloads (flask+django+boto3+requests: 23 pushes, 0 backtracks; fastapi[standard]: 48 pushes, 0 backtracks). Copy-on-write (COW) state snapshots would save negligible time at these scales.
  • _evaluate_json_page: 55.7% of project self-time (0.84s out of 1.5s), but per-entry cost is 4.2µs dominated by dict.get (65K calls), Link construction (14 attr assignments), and version interning. No single operation dominates. py3-none-any fast path tested and discarded (<2% improvement within noise).
  • install_req_from_line: 21 calls at ~0.1ms each (2.2ms total). Not worth bypassing the serialize-reparse pattern at this call count.
  • IteratorMapping: 4-6 objects per round x 48 rounds = ~240 allocations. Each is 3 attribute assignments. Total cost negligible.
  • Wall time dominated by HTTP I/O: 41 requests account for ~70% of wall time. Network latency and TLS handshakes are irreducible.
  • Profile is genuinely flat: after _evaluate_json_page (55.7%), the next project function is _iter_found_candidates at 5.2%, while TLS (outside project code) accounts for ~12%. No single function has enough headroom for meaningful improvement.
  • Further gains require: (a) moving hot Python loops to C, (b) protocol-level changes (e.g. server-side filtering), or (c) fundamentally different resolution strategies (e.g. SAT solver).

Pre-submit Review Findings

  1. CRITICAL (fixed): get_applicable_candidates() sorting was removed in an earlier optimization, breaking the resolver's assumption that applicable_candidates are version-sorted. The resolver iterates reversed(icans) expecting newest-first order. Fixed by sorting in compute_best_candidate while using max() for best-candidate tiebreaker stability.
  2. F821 lint (fixed): Version type annotation in _evaluate_json_page referenced undefined name. Changed to _BaseVersion.
  3. Reviewed (safe): Removing the frozen dataclass from InstallationCandidate. No code compares candidates by value, and identity-based assertions were already updated.
  4. Reviewed (safe): _lazy_wheel_cache — bounded by dependency tree size (20-200 packages).
  5. Reviewed (safe): specifier._specs direct access — vendored library under our control.
  6. Reviewed (safe): _prereleases = None in bulk merge — pip never sets non-None prereleases on SpecifierSet in the factory path.
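One plausible shape of the sorting fix in finding 1 is shown below; the function name and key are hypothetical. The stable sort restores the version order that the resolver reads through reversed(). `max()` picks the best candidate, and it returns the first maximal element it meets, whereas `applicable[-1]` would return the last of any tied candidates.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def compute_applicable_and_best(
    candidates: List[T], sort_key: Callable[[T], Tuple[int, ...]]
) -> Tuple[List[T], Optional[T]]:
    # sorted() is stable: ascending by key, with ties kept in input order, so
    # reversed(applicable) yields newest-first as the resolver expects.
    applicable = sorted(candidates, key=sort_key)
    # max() returns the FIRST maximal element it meets; applicable[-1] would be
    # the LAST of any tied candidates, a different tiebreak.
    best = max(applicable, key=sort_key) if applicable else None
    return applicable, best
```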

Adversarial Review Findings

Round 1

  1. HIGH (fixed): Link.from_json query string stripping — signed URLs (?X-Amz-Signature=...) corrupted _path/_filename causing is_wheel=False. Fixed by finding earliest of ? or # to delimit path end.
  2. HIGH (fixed): _build_distribution_cache dict comprehension kept last-seen instead of first-seen for duplicate names. Fixed with setdefault.
  3. MEDIUM (safe): factory.py same-version installed candidate reuse. Investigated — FoundCandidates.__iter__ filters by incompatible_ids, and the original code already skips remote candidates for installed versions via versions_found set. No behavior change.
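The fix for finding 1 hinges on delimiting the path at whichever of "?" or "#" comes first. `filename_from_url` is a hypothetical helper, not pip's code. A naive `url.rsplit("/", 1)[-1]` on a signed URL whose query contains "/" would return a fragment of the signature instead of the wheel name.

```python
def filename_from_url(url: str) -> str:
    """Derive a file name from a URL, ignoring any ?query and #fragment."""
    # The path ends at the EARLIEST of '?' and '#'; either may be missing.
    end = len(url)
    for delim in ("?", "#"):
        i = url.find(delim)
        if i != -1 and i < end:
            end = i
    return url[:end].rsplit("/", 1)[-1]
```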

Round 2

  1. HIGH (fixed): JSON sdist extensions — _evaluate_json_page only checked .tar.gz/.zip/.tar.bz2, missing .tgz/.tar/.tbz/.tar.xz etc. Fixed by adding all SUPPORTED_EXTENSIONS.
  2. HIGH (fixed): JSON wheel fast path accepted malformed wheel names. Fixed by validating via parse_wheel_filename() (lru_cached).
  3. MEDIUM (fixed): HTML pages lost _sort_links() dedup/precedence. Restored evaluate_links() call.
  4. MEDIUM (fixed): datetime.fromisoformat() fails on Python 3.9/3.10 with trailing 'Z'. Replaced with parse_iso_datetime().
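Finding 1 is cheap to get right because `str.endswith` accepts a tuple, so a single call covers every archive format. The extension groups below are assumed to approximate pip's own list; pip's filetypes module is the authority. Case-folding the name is also an assumption of this sketch, and `classify` is hypothetical.

```python
# Approximation of pip's archive extension groups (assumption, not authoritative).
WHEEL_EXTENSION = ".whl"
BZ2_EXTENSIONS = (".tar.bz2", ".tbz")
XZ_EXTENSIONS = (".tar.xz", ".txz", ".tlz", ".tar.lz", ".tar.lzma")
ZIP_EXTENSIONS = (".zip", WHEEL_EXTENSION)
TAR_EXTENSIONS = (".tar.gz", ".tgz", ".tar")
SUPPORTED_EXTENSIONS = ZIP_EXTENSIONS + BZ2_EXTENSIONS + TAR_EXTENSIONS + XZ_EXTENSIONS


def classify(filename: str) -> str:
    lower = filename.lower()
    if lower.endswith(WHEEL_EXTENSION):
        return "wheel"
    # One endswith() call checks every supported sdist extension.
    if lower.endswith(SUPPORTED_EXTENSIONS):
        return "sdist"
    return "skip"
```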

Round 3

  1. HIGH (fixed): Link.from_json derived _filename from URL path, not JSON filename field. Fixed to prefer file_data["filename"].
  2. MEDIUM (fixed): _log_skipped_link had early return on non-DEBUG that prevented requires-python skip bookkeeping. Fixed to always record. JSON path also records skip reasons in dedicated set.

Round 4

  1. HIGH (fixed): Reverted factory.py installed-candidate reuse — conflated installed and index artifacts for the same version, blocking resolver backtracking.
  2. HIGH (fixed): Link.from_json crashes on authority-only URLs (no path). Changed url.index to url.find with fallback.
  3. MEDIUM (dismissed): Missing filename fallback in JSON path — PEP 691 requires filename field. Non-conformant indexes fall back to standard parse_links path.
  4. MEDIUM (fixed): _FastMetadata.get_payload() returned empty string, dropping long descriptions from metadata_dict/pip inspect. Now preserves body text.
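Finding 4 concerns keeping the metadata body when a fast parser splits headers from payload. The sketch below is hypothetical, not pip's _FastMetadata, whose internals these notes do not show. It assumes LF line endings and treats the first blank line as the header/body boundary, as core metadata does. Multi-use headers accumulate in lists.

```python
from typing import Dict, List, Optional, Tuple


def parse_metadata(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Split core-metadata text into (headers, body); the body is the long description."""
    head, _, body = text.partition("\n\n")  # assumes LF line endings
    headers: Dict[str, List[str]] = {}
    last_key: Optional[str] = None
    for line in head.splitlines():
        if line[:1] in (" ", "\t") and last_key is not None:
            # Folded continuation line: extend the previous header's value.
            headers[last_key][-1] += "\n" + line.strip()
            continue
        key, _, value = line.partition(":")
        last_key = key.strip()
        # Multi-use fields (Classifier, Requires-Dist, ...) accumulate in a list.
        headers.setdefault(last_key, []).append(value.strip())
    # Return the body instead of discarding it, so the long description survives.
    return headers, body
```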

Refreshed E2E Benchmarks (Apr 2026, Py 3.15.0a7)

All measured with hyperfine (5-10 runs, 2-3 warmup), HTTP cache warm.

Benchmark                        Main (ms)   Optimized (ms)   Speedup
pip --version                          138               20      7.0x
pip --help                             143              121     1.18x
pip list                               162              146     1.11x
pip freeze                             225              211     1.07x
pip show pip                           162              148     1.09x
pip check                              191              174     1.10x
requests                               589              516     1.14x
flask+django                           708              599     1.18x
flask+django+boto3+requests           1493              826     1.81x
fastapi[standard]                    13325            11664     1.14x
-r requirements.txt (21 pkgs)         1344              740     1.82x

Notes:

  • fastapi[standard] installs 42 packages including C extensions (uvloop, pydantic_core) that require sdist building. The 11.7s is dominated by build system overhead, not resolution.
  • The complex resolution benchmark (flask+django+boto3+requests) shows the largest resolution-specific speedup (1.81x) because it exercises the largest JSON pages (botocore 4692 entries, boto3 4020 entries).

Files Modified

  1. src/pip/_internal/utils/compatibility_tags.py — lru_cache on get_supported
  2. src/pip/_vendor/packaging/utils.py — lru_cache on canonicalize_name/parse_wheel_filename (16384), pre-compiled regex
  3. src/pip/_vendor/packaging/version.py — lru_cache on parse(), __str__/__hash__ caching
  4. src/pip/_vendor/packaging/tags.py — dict cache on _version_nodot
  5. src/pip/_internal/metadata/base.py — pre-compiled project name regex
  6. src/pip/_internal/metadata/importlib/_envs.py — os.scandir + distribution dict cache
  7. src/pip/_internal/models/link.py — direct JSON construction, lazy URL parsing, pre-computed splitext/filename/hash, lazy upload_time
  8. src/pip/_internal/index/package_finder.py — fused _evaluate_json_page, tag-first parsing, version interning, sorted applicable_candidates restoration
  9. src/pip/_internal/models/target_python.py — tag tuples frozenset, tag priority cache
  10. src/pip/_internal/resolution/resolvelib/factory.py — bulk specifier merge, hashes fast-path, two-level candidate infos cache
  11. src/pip/_internal/models/candidate.py — version pass-through, removed frozen dataclass overhead
  12. src/pip/_vendor/packaging/specifiers.py — canonical_spec cache, __str__/__hash__ caching, __eq__ fast-path
  13. src/pip/_vendor/packaging/requirements.py — __str__ caching
  14. src/pip/_vendor/packaging/markers.py — default_environment caching
  15. src/pip/_vendor/requests/utils.py — proxy detection memoization
  16. src/pip/_vendor/resolvelib/resolvers/resolution.py — hoisted method/attrgetter constants
  17. src/pip/_vendor/resolvelib/structs.py — guard for empty appends
  18. src/pip/_internal/utils/hashes.py — hash caching, supported_hashes fast-path
  19. src/pip/_internal/resolution/resolvelib/base.py — Constraint.empty() singleton
  20. src/pip/_internal/operations/prepare.py — lazy wheel metadata cache