Migrate .codeflash/ to {teammember}/{org}/{project}/ format (#15)

Add team member dimension to case study paths so multiple contributors
can track optimization data independently. Derives member from
git config user.name in session-start hooks.

- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references

2026-04-14 23:04:34 -05:00


Optimization Session — apr05

Environment

Python 3.15.0a7, macOS arm64
Branch: codeflash/optimize (off main 8df7b668b)
Tests: 1690/1690 unit tests passing, 40/40 resolver tests passing
Lint: ruff E,F clean on all modified files
Run tag: apr05

Baseline Profile (resolver unit tests, cProfile)

Wall: ~5-13s (40 tests, high variance from subprocess calls)
Project self-time: 0.155s
Top targets by self-time:
1. Tag.__init__: 25.1% (0.039s, 45,301 calls)
2. cpython_tags: 8.6% (0.013s, 21,630 calls)
3. compatible_tags: 5.8% (0.009s, 23,520 calls)
4. _version_nodot: 4.9% (0.008s, 18,570 calls)
5. find_legacy_editables: 3.6% (0.006s, 160 calls)
6. canonicalize_name: 2.4% (0.004s, 3,244 calls)
7. parse_name_and_version_from_info_directory: 2.1% (0.003s, 2,734 calls)

Baseline Profile (e2e single install, cProfile)

Project self-time: 0.027s
Tag.__init__: 3,014 calls, 5.9% of project CPU
cpython_tags: 1,442 calls, 2.6%
_version_nodot: 1,238 calls, 1.4%

Optimized Profile (resolver unit tests, cProfile)

Project self-time: 0.261s (higher due to cProfile overhead on lru_cache wrappers)
Tag.__init__: 1,559 calls (down from 45,301; 97% reduction)
cpython_tags: 721 calls (down from 21,630; 97% reduction)
Profile is flat — no function above 3.3% except find_legacy_editables at 25.1% (I/O bound)

Optimized Profile (e2e single install, cProfile)

Project self-time: 0.089s (higher due to cProfile overhead on lru_cache wrappers)
Tag.__init__: 1,505 calls (down from 3,014; 50% reduction)
canonicalize_name: 92.4% cache hit rate (327 hits / 27 misses)
parse_wheel_filename: 75% cache hit rate (3 hits / 1 miss)

Cache Hit Rates (real pip install --dry-run requests)

get_supported: 50% (1 hit, 1 miss) — saves full 1500-tag generation
canonicalize_name: 92.4% (327/354) — most impactful cache
parse_wheel_filename: 75% (3/4)
Version.parse: 20% (1/5) — higher for large dependency trees

Strategy

Target 1: packaging.tags — global caching of tag lists via lru_cache
Target 2: Distribution scanning — pathlib to os.scandir, dict cache for O(1) lookups
Target 3: canonicalize_name — lru_cache with 92% hit rate
Target 4: Version/wheel parsing — lru_cache, pre-compiled regex
Target 5: _version_nodot — dict cache

Experiments

Experiment 1: lru_cache on get_supported()

File: src/pip/_internal/utils/compatibility_tags.py
Change: Split get_supported() into public wrapper + @functools.lru_cache(maxsize=32) cached impl
Result: Tag.__init__ calls 45,301 → 1,559 (97% reduction)
Status: KEEP

Experiment 2: Pre-compiled regex for wheel name validation

File: src/pip/_vendor/packaging/utils.py
Change: Moved inline re.match() to module-level _wheel_name_regex
Result: Eliminates re.compile per call (313 calls in resolver tests)
Status: KEEP

Experiment 3: Pre-compiled regex for project name validation

File: src/pip/_internal/metadata/base.py
Change: Moved inline re.match() to module-level _VALID_PROJECT_NAME
Result: Eliminates re.compile per call (~700 calls in iter_all_distributions)
Status: KEEP
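
Experiments 2 and 3 follow the same pattern, sketched below under assumptions: the pattern text follows the PEP 508 name grammar and is not necessarily the exact regex used in base.py, and is_valid_project_name is a hypothetical helper.

```python
import re

# Compiled once at import time. The pattern follows the PEP 508 name grammar;
# treating it as the regex pip uses is an assumption.
_VALID_PROJECT_NAME = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE
)


def is_valid_project_name(name: str) -> bool:
    """Reuse the module-level pattern instead of passing a string to re.match()."""
    return _VALID_PROJECT_NAME.match(name) is not None
```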

Experiment 4: pathlib to os.scandir + distribution dict cache

File: src/pip/_internal/metadata/importlib/_envs.py
Change: Replaced pathlib.Path.iterdir() with os.scandir(); added _distributions_cache dict for O(1) get_distribution lookups
Result: get_distribution changes from O(n) linear scan to O(1) after first build
Status: KEEP
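
The shape of Experiment 4 can be sketched as follows. The Environment class, the name-splitting rule, and the use of plain path strings as cache values are simplifications for illustration, not the code in _envs.py.

```python
import os
import re
import tempfile
from typing import Dict, Optional


def _canonical(name: str) -> str:
    """PEP 503 style normalization so lookups ignore case and separators."""
    return re.sub(r"[-_.]+", "-", name).lower()


class Environment:
    """Hypothetical stand-in for an installed-distributions environment."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._distributions_cache: Optional[Dict[str, str]] = None

    def _build_cache(self) -> Dict[str, str]:
        cache: Dict[str, str] = {}
        with os.scandir(self._path) as entries:  # one directory pass
            for entry in entries:
                if not entry.name.endswith(".dist-info"):
                    continue
                # "name-version.dist-info" -> "name"
                raw_name = entry.name[: -len(".dist-info")].split("-", 1)[0]
                cache.setdefault(_canonical(raw_name), entry.path)
        return cache

    def get_distribution(self, name: str) -> Optional[str]:
        if self._distributions_cache is None:  # built lazily, exactly once
            self._distributions_cache = self._build_cache()
        return self._distributions_cache.get(_canonical(name))


with tempfile.TemporaryDirectory() as site_packages:
    os.mkdir(os.path.join(site_packages, "Flask-3.0.0.dist-info"))
    os.mkdir(os.path.join(site_packages, "typing_extensions-4.9.0.dist-info"))
    env = Environment(site_packages)
    flask_path = env.get_distribution("flask")
    typing_path = env.get_distribution("Typing.Extensions")
    missing = env.get_distribution("django")
```

After the first lookup, every further get_distribution call is a single dict access.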

Experiment 5: lru_cache on canonicalize_name, parse_wheel_filename, Version.parse

Files: src/pip/_vendor/packaging/utils.py, src/pip/_vendor/packaging/version.py
Change: Added @functools.lru_cache to canonicalize_name (maxsize=1024), parse_wheel_filename (maxsize=512), Version.parse (maxsize=1024)
Result: canonicalize_name 92.4% hit rate; parse_wheel_filename 75% hit rate
Status: KEEP
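
As an illustration of Experiment 5 and of how hit rates like those above can be read back, here is a small sketch. The normalization body follows PEP 503; the hit_rate helper is hypothetical and only wraps the standard cache_info() counters.

```python
import functools
import re

_CANONICALIZE_RE = re.compile(r"[-_.]+")


@functools.lru_cache(maxsize=1024)
def canonicalize_name(name: str) -> str:
    """PEP 503 normalization, memoized per distinct input string."""
    return _CANONICALIZE_RE.sub("-", name).lower()


def hit_rate(cached_func) -> float:
    """Hypothetical helper: hits / (hits + misses) from lru_cache statistics."""
    info = cached_func.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


# Five lookups over three distinct strings: 3 misses, 2 hits.
for raw in ["Flask", "flask", "Flask", "typing_extensions", "Flask"]:
    canonicalize_name(raw)
```

The same arithmetic gives the reported figure: 327 hits out of 327 + 27 = 354 lookups is about 92.4%.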

Experiment 6: _version_nodot dict cache

File: src/pip/_vendor/packaging/tags.py
Change: Added module-level dict cache for _version_nodot results
Result: Avoids repeated "".join(map(str, version)) calls
Status: KEEP

E2E Profile (pip install --dry-run flask django boto3 requests)

Before this round (cached metadata, cProfile)

Wall: ~5.9s
Project self-time: ~1.9s
Top targets:
1. Version.__init__: 6.7%, 12,537 calls
2. Link.from_json: 6.6%, 12,567 calls
3. evaluate_link: 4.3%, 12,584 calls
4. _clean_url_path: 3.0%, 12,567 calls
5. _ensure_quoted_url: 2.8%, 12,567 calls
6. splitext (link): 2.6%, 23,473 calls
7. Version.__str__: 2.6%, 12,442 calls
8. splitext (misc): 2.4%, 23,532 calls
9. Link.filename: 2.6%, 12,257 calls
10. Link.hash: 2.0%, 48,203 calls

After this round

Wall: ~2.6s (median of 3 runs: 2.1s, 2.6s, 3.6s)
Project self-time: ~0.83s (58% reduction)
Top targets now:
1. Version.__init__: 7.3%, 12,537 calls (fundamental, per-candidate)
2. Link.__init__: 6.7%, 12,618 calls (includes pre-computation)
3. evaluate_link: 5.0%, 12,584 calls (the evaluation algorithm)
4. from_json: 4.4%, 12,567 calls (JSON dict access)
5. _sort_key: 2.7% (sorting candidates)
6. filter (specifiers): 2.6% (version filtering)
7. _key (Version): 2.5%, 183K calls (comparison key, cached)
Eliminated from hot path: _clean_url_path, _ensure_quoted_url, splitext, Link.filename, Link.hash, parse_iso_datetime

Experiments (this round)

Experiment 7: Link URL quoting integrated into __init__

File: src/pip/_internal/models/link.py
Change: Moved _ensure_quoted_url logic into Link.__init__, sharing the single urlsplit call. Added a _PATH_ALREADY_QUOTED_RE fast path so that HTTP/HTTPS URLs whose path is already quoted skip _clean_url_path entirely (99% of package index URLs).
Impact: Eliminated double urlsplit for every Link. _ensure_quoted_url (2.5%) and _clean_url_path (3.0%) gone from profile.
Status: KEEP

Experiment 8: Link._splitext pre-computed in __init__

File: src/pip/_internal/models/link.py
Change: Pre-compute splitext result during Link construction. splitext() method and ext property return cached values.
Impact: splitext (link) 2.6% + splitext (misc) 2.4% → eliminated from profile
Status: KEEP

Experiment 9: Link._filename and Link._hash pre-computed

File: src/pip/_internal/models/link.py
Change: Pre-compute filename (posixpath.basename) and hash(url) during construction.
Impact: Link.filename (2.6%) + Link.hash (2.0%) → eliminated from profile
Status: KEEP
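
Experiments 8 and 9 share one idea: derive the values once during construction and have the accessors return stored attributes. The MiniLink class below is a hypothetical reduction, not pip's Link; the .tar handling in _compute_splitext is an assumption about how compound extensions are treated.

```python
import posixpath
from urllib.parse import urlsplit


class MiniLink:
    """Hypothetical reduced Link with everything derived once in __init__."""

    __slots__ = ("url", "_path", "_filename", "_splitext", "_hash")

    def __init__(self, url: str) -> None:
        self.url = url
        self._path = urlsplit(url).path  # single urlsplit call, reused below
        self._filename = posixpath.basename(self._path.rstrip("/"))
        self._splitext = self._compute_splitext(self._filename)
        self._hash = hash(url)

    @staticmethod
    def _compute_splitext(filename: str):
        base, ext = posixpath.splitext(filename)
        if base.lower().endswith(".tar"):  # keep ".tar.gz" together
            ext = base[-4:] + ext
            base = base[:-4]
        return base, ext

    @property
    def filename(self) -> str:
        return self._filename  # no per-access basename call

    @property
    def ext(self) -> str:
        return self._splitext[1]

    def __hash__(self) -> int:
        return self._hash  # no per-call hashing of the URL string

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MiniLink) and self.url == other.url


link = MiniLink("https://files.example.org/packages/demo-1.0.tar.gz#sha256=abc")
```

The trade-off is a slightly more expensive constructor in exchange for constant-time accessors, which pays off when each link is inspected several times.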

Experiment 10: Version.__str__ caching

File: src/pip/_vendor/packaging/version.py
Change: Added a _str_cache slot that stores the string representation on the first __str__ call. Also fixed _TrimmedRelease to initialize the cache.
Impact: Version.__str__ 2.6% → 2.0% (35% faster per call, cached for repeated access)
Status: KEEP
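
A sketch of the slot-based string cache from Experiment 10, using a hypothetical MiniVersion class with only a release tuple.

```python
class MiniVersion:
    """Hypothetical reduced version object with a cached string form."""

    __slots__ = ("release", "_str_cache")

    def __init__(self, release) -> None:
        self.release = tuple(release)
        self._str_cache = None  # filled in on the first __str__ call

    def __str__(self) -> str:
        cached = self._str_cache
        if cached is None:
            cached = self._str_cache = ".".join(map(str, self.release))
        return cached


v = MiniVersion((1, 2, 3))
```

With __slots__, an unset slot raises AttributeError on read, so any subclass or alternate constructor that bypasses __init__ has to set the cache slot itself. That is presumably why the _TrimmedRelease fix was needed.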

Experiment 11: Lazy upload_time parsing

File: src/pip/_internal/models/link.py
Change: Store raw ISO string in from_json, defer parse_iso_datetime to first access of upload_time property. Only parsed when --uploaded-prior-to is used.
Impact: parse_iso_datetime (1.4%, 12,568 calls) → eliminated from hot path
Status: KEEP
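
The deferred parsing in Experiment 11 can be sketched as a property that converts the raw string on first access. LazyLink is hypothetical, and the inline handling of a trailing "Z" is an assumed stand-in for what parse_iso_datetime does.

```python
import datetime
from typing import Optional


class LazyLink:
    """Hypothetical link that keeps the raw timestamp until it is needed."""

    def __init__(self, upload_time_raw: Optional[str]) -> None:
        self._upload_time_raw = upload_time_raw
        self._upload_time: Optional[datetime.datetime] = None

    @property
    def upload_time(self) -> Optional[datetime.datetime]:
        if self._upload_time is None and self._upload_time_raw is not None:
            raw = self._upload_time_raw
            if raw.endswith("Z"):
                # fromisoformat() rejects a trailing "Z" before Python 3.11.
                raw = raw[:-1] + "+00:00"
            self._upload_time = datetime.datetime.fromisoformat(raw)
        return self._upload_time


link = LazyLink("2024-01-02T03:04:05Z")
```

Links whose upload_time is never read pay only for storing the string.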

Experiment 12: parse_wheel_filename cache size 512 → 4096

File: src/pip/_vendor/packaging/utils.py
Change: Increased lru_cache maxsize from 512 to 4096 to handle large resolutions (10,980 unique filenames observed in multi-package installs).
Impact: Better cache hit rate for large dependency trees
Status: KEEP

Round 3: Algorithmic changes to _evaluate_json_page

Experiment 13-16: Tag-first parsing, direct JSON filename, endswith checks

Files: src/pip/_internal/index/package_finder.py, src/pip/_internal/models/target_python.py
Changes:
- New _evaluate_json_page() method: single-pass over raw JSON, checks extension via endswith, extracts wheel tags from filename end using rfind, checks tag compatibility via frozenset before name parsing
- Direct use of PEP 691 filename field (avoids URL construction)
- Version interning across platform wheels
- Tag tuples frozenset cached on TargetPython
Impact: _evaluate_json_page self-time reduced ~33%, from_json calls reduced from ~10,899 to ~200 per page (only surviving candidates)
Status: KEEP (experiments 13-16 committed as 4 separate commits)
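
A sketch of the tag-first check described above: pull the last three dash-separated fields off the wheel filename with rfind and test them against a frozenset of supported tag tuples before doing any name or version parsing. The function names and the supported set are hypothetical; expanding compressed tag sets such as py2.py3 with itertools.product is an assumption about how the real code handles them.

```python
from itertools import product
from typing import FrozenSet, Optional, Tuple

TagTuple = Tuple[str, str, str]


def split_wheel_tags(filename: str) -> Optional[TagTuple]:
    """Return (interpreter, abi, platform) taken from the end of the filename."""
    if not filename.endswith(".whl"):
        return None
    end = len(filename) - len(".whl")
    fields = []
    for _ in range(3):  # platform, then abi, then interpreter
        start = filename.rfind("-", 0, end)
        if start <= 0:
            return None  # too few fields to be a wheel filename
        fields.append(filename[start + 1 : end])
        end = start
    platform, abi, interpreter = fields
    return interpreter, abi, platform


def is_compatible(filename: str, supported: FrozenSet[TagTuple]) -> bool:
    tags = split_wheel_tags(filename)
    if tags is None:
        return False
    interpreters, abis, platforms = (field.split(".") for field in tags)
    return any(
        combo in supported for combo in product(interpreters, abis, platforms)
    )


SUPPORTED = frozenset(
    {("py3", "none", "any"), ("cp315", "cp315", "macosx_11_0_arm64")}
)
```

Incompatible wheels are rejected after three rfind calls and a set lookup, so the more expensive parsing only runs for surviving candidates.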

Experiment 17: Two-level platform pre-filter

Tried adding platform-only pre-filter (1 rfind) before full 3-rfind extraction
Results: Within noise margin (2-5%), code complexity not justified
Status: DISCARD

Round 4: Resolver backtracking cache (_iter_found_candidates)

Experiment 18: Two-level cache on _iter_found_candidates

File: src/pip/_internal/resolution/resolvelib/factory.py
Problem: During fastapi[standard] resolution, _iter_found_candidates is called 134K+ times with only ~120 unique (name, specifier, hashes, extras) tuples. Each call redundantly: merges specifiers (set+update+frozenset), calls find_best_candidate (dict lookup), scans all_yanked, checks is_pinned, allocates functools.partial objects. Total: ~9.4s of 60s wall time.
Change: Two-level cache:
- Level 1 (merge cache): Maps raw specifier inputs (constraint _specs + each ireq's _specs frozenset) to merged result (specifier, hashes, extras). Uses frozenset VALUES (not id()) for GC safety. Frozenset hashing is O(1) after first call. Eliminates specifier merge on 99.9% of calls.
- Level 2 (infos cache): Maps merged (name, specs, hashes, extras) to the list of (version, build_func) tuples from find_best_candidate. Eliminates find_best_candidate call, all_yanked scan, is_pinned check, and functools.partial allocation.
- Inlined _get_installed_candidate (runs fresh every call — depends on incompatible_ids which changes during backtracking).
Correctness note: Initial attempt used id()-based L1 cache for speed. This caused InconsistentCandidate errors because Python reuses memory addresses for gc'd objects during resolver backtracking, producing stale cache hits. Fixed by using frozenset value-based keys.
Impact:
- _iter_found_candidates: 9.4s → 1.85s (5x faster)
- fastapi[standard] resolution: 37.9s → 15.2s (2.49x faster)
- boto3: 0.65s → 0.33s (1.95x)
- django: 0.24s → 0.18s (1.35x)
- requests: 0.51s → 0.28s (1.84x)
- black: 0.33s → 0.30s (1.12x)
Status: KEEP
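
The two cache levels can be sketched as below. CandidateCache, the string-based specifiers, and fake_finder are all hypothetical simplifications; the sketch only shows the structure (raw inputs to merged result, merged result to candidate infos) and the use of frozenset values, not id(), as keys.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

SpecSet = FrozenSet[str]


class CandidateCache:
    """Hypothetical two-level cache keyed by values, never by id()."""

    def __init__(self, find_candidates: Callable[[str, SpecSet], List[str]]) -> None:
        self._find = find_candidates
        self._merge_cache: Dict[Tuple[SpecSet, ...], SpecSet] = {}
        self._infos_cache: Dict[Tuple[str, SpecSet], List[str]] = {}
        self.finder_calls = 0

    def _merged(self, spec_sets: Sequence[SpecSet]) -> SpecSet:
        # Level 1: raw specifier inputs -> merged specifier set.
        key = tuple(spec_sets)
        merged = self._merge_cache.get(key)
        if merged is None:
            merged = self._merge_cache[key] = frozenset().union(*key)
        return merged

    def candidates(self, name: str, spec_sets: Sequence[SpecSet]) -> List[str]:
        # Level 2: (name, merged specifiers) -> candidate infos.
        key = (name, self._merged(spec_sets))
        infos = self._infos_cache.get(key)
        if infos is None:
            self.finder_calls += 1
            infos = self._infos_cache[key] = self._find(*key)
        return infos


def fake_finder(name: str, specs: SpecSet) -> List[str]:
    return [f"{name} {' '.join(sorted(specs))}"]


cache = CandidateCache(fake_finder)
a = cache.candidates("fastapi", [frozenset({">=0.100"}), frozenset({"<1.0"})])
b = cache.candidates("fastapi", [frozenset({">=0.100"}), frozenset({"<1.0"})])
# Same specifiers in a different order: level 1 misses, level 2 still hits.
c = cache.candidates("fastapi", [frozenset({"<1.0"}), frozenset({">=0.100"})])
```

Because the keys are compared by value, a new object that happens to reuse a freed object's memory address cannot produce a stale hit, which is the failure mode described in the correctness note.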

Plateau Analysis (Updated Apr 2026)

Resolver runs 22-48 rounds for typical workloads (flask+django+boto3+requests: 23 pushes, 0 backtracks; fastapi[standard]: 48 pushes, 0 backtracks). COW state snapshots would save negligible time at these scales.
_evaluate_json_page: 55.7% of project self-time (0.84s out of 1.5s), but per-entry cost is 4.2us dominated by dict.get (65K calls), Link construction (14 attr assignments), and version interning. No single operation dominates. py3-none-any fast path tested and discarded (<2% improvement within noise).
install_req_from_line: 21 calls at ~0.1ms each = ~2.2ms total. Not worth bypassing the serialize-reparse pattern at this call count.
IteratorMapping: 4-6 objects per round x 48 rounds = ~240 allocations. Each is 3 attribute assignments. Total cost negligible.
Wall time dominated by HTTP I/O: 41 requests account for ~70% of wall time. Network latency and TLS handshakes are irreducible.
Outside _evaluate_json_page (55.7%), the profile is flat: the next project function is _iter_found_candidates at 5.2%, and non-project TLS work accounts for ~12%. No other single function has enough headroom for meaningful improvement.
Further gains require: (a) moving hot Python loops to C, (b) protocol-level changes (e.g. server-side filtering), or (c) fundamentally different resolution strategies (e.g. SAT solver).

Pre-submit Review Findings

CRITICAL (fixed): get_applicable_candidates() sorting was removed in an earlier optimization, breaking the resolver's assumption that applicable_candidates are version-sorted. The resolver iterates reversed(icans) expecting newest-first order. Fixed by sorting in compute_best_candidate while using max() for best-candidate tiebreaker stability.
F821 lint (fixed): Version type annotation in _evaluate_json_page referenced undefined name. Changed to _BaseVersion.
Reviewed (safe): InstallationCandidate frozen removal — no code compares candidates by value. Identity-based assertions already updated.
Reviewed (safe): _lazy_wheel_cache — bounded by dependency tree size (20-200 packages).
Reviewed (safe): specifier._specs direct access — vendored library under our control.
Reviewed (safe): _prereleases = None in bulk merge — pip never sets non-None prereleases on SpecifierSet in the factory path.
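
One plausible reading of the tiebreaker remark in the CRITICAL finding above, shown on made-up data: sorted() is stable and max() returns the first maximal element, so on equal keys max() and "last element of the sorted list" pick different candidates. The candidate tuples and version_key are hypothetical.

```python
# (version, label) pairs; two candidates share the newest version.
candidates = [("1.0", "sdist"), ("2.0", "wheel-a"), ("2.0", "wheel-b")]


def version_key(candidate):
    """Compare by numeric version components only."""
    return tuple(int(part) for part in candidate[0].split("."))


# Sorted ascending so a consumer can iterate reversed() for newest-first.
ordered = sorted(candidates, key=version_key)
newest_first = list(reversed(ordered))

# max() keeps the first of the tied candidates; ordered[-1] is the last one.
best = max(candidates, key=version_key)
```

Under this reading, sorting restores the order the resolver expects, while max() keeps the previously selected best candidate unchanged when several candidates tie.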

Adversarial Review Findings

Round 1

HIGH (fixed): Link.from_json query string stripping — signed URLs (?X-Amz-Signature=...) corrupted _path/_filename causing is_wheel=False. Fixed by finding earliest of ? or # to delimit path end.
HIGH (fixed): _build_distribution_cache dict comprehension kept last-seen instead of first-seen for duplicate names. Fixed with setdefault.
MEDIUM (safe): factory.py same-version installed candidate reuse. Investigated: FoundCandidates.__iter__ filters by incompatible_ids, and the original code already skips remote candidates for installed versions via the versions_found set. No behavior change.
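
The two HIGH fixes in this round can be illustrated with small sketches. All function names are hypothetical, and the URL is a made-up example of a signed link.

```python
from typing import Dict, Iterable, Tuple


def path_end(url: str) -> int:
    """Index where the path stops: the earliest '?' or '#', else len(url)."""
    ends = [i for i in (url.find("?"), url.find("#")) if i != -1]
    return min(ends) if ends else len(url)


def filename_from_url(url: str) -> str:
    end = path_end(url)
    # Only look for the last "/" inside the path, not in the query string.
    return url[url.rfind("/", 0, end) + 1 : end]


def build_distribution_cache(found: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    cache: Dict[str, str] = {}
    for name, location in found:
        cache.setdefault(name, location)  # first-seen wins
    return cache


signed = (
    "https://bucket.example.com/wheels/demo-1.0-py3-none-any.whl"
    "?X-Amz-Signature=abc/def#sha256=123"
)
found = [("demo", "/site-a"), ("demo", "/site-b")]
last_seen = {name: location for name, location in found}
```

A plain dict comprehension keeps the last duplicate, which is why setdefault is needed to preserve the first-seen entry.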

Round 2

HIGH (fixed): JSON sdist extensions — _evaluate_json_page only checked .tar.gz/.zip/.tar.bz2, missing .tgz/.tar/.tbz/.tar.xz etc. Fixed by adding all SUPPORTED_EXTENSIONS.
HIGH (fixed): JSON wheel fast path accepted malformed wheel names. Fixed by validating via parse_wheel_filename() (lru_cached).
MEDIUM (fixed): HTML pages lost _sort_links() dedup/precedence. Restored evaluate_links() call.
MEDIUM (fixed): datetime.fromisoformat() fails on Python 3.9/3.10 with trailing 'Z'. Replaced with parse_iso_datetime().

Round 3

HIGH (fixed): Link.from_json derived _filename from URL path, not JSON filename field. Fixed to prefer file_data["filename"].
MEDIUM (fixed): _log_skipped_link had early return on non-DEBUG that prevented requires-python skip bookkeeping. Fixed to always record. JSON path also records skip reasons in dedicated set.

Round 4

HIGH (fixed): Reverted factory.py installed-candidate reuse — conflated installed and index artifacts for the same version, blocking resolver backtracking.
HIGH (fixed): Link.from_json crashes on authority-only URLs (no path). Changed url.index to url.find with fallback.
MEDIUM (dismissed): Missing filename fallback in JSON path — PEP 691 requires filename field. Non-conformant indexes fall back to standard parse_links path.
MEDIUM (fixed): _FastMetadata.get_payload() returned empty string, dropping long descriptions from metadata_dict/pip inspect. Now preserves body text.
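
The authority-only URL fix comes down to the difference between str.index, which raises ValueError when nothing is found, and str.find, which returns -1. A hypothetical helper showing the fallback:

```python
def path_start(url: str) -> int:
    """Index of the first '/' after the authority, or len(url) if there is none."""
    scheme_end = url.find("://")
    authority_start = scheme_end + 3 if scheme_end != -1 else 0
    slash = url.find("/", authority_start)  # -1 instead of an exception
    return slash if slash != -1 else len(url)


def url_path(url: str) -> str:
    return url[path_start(url):]
```

With the fallback, a URL such as https://example.com yields an empty path instead of crashing.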

Refreshed E2E Benchmarks (Apr 2026, Py 3.15.0a7)

All measured with hyperfine (5-10 runs, 2-3 warmup), HTTP cache warm.

Benchmark	Main	Optimized	Speedup
pip --version	138ms	20ms	7.0x
pip --help	143ms	121ms	1.18x
pip list	162ms	146ms	1.11x
pip freeze	225ms	211ms	1.07x
pip show pip	162ms	148ms	1.09x
pip check	191ms	174ms	1.10x
requests	589ms	516ms	1.14x
flask+django	708ms	599ms	1.18x
flask+django+boto3+requests	1493ms	826ms	1.81x
fastapi[standard]	13325ms	11664ms	1.14x
-r requirements.txt (21 pkgs)	1344ms	740ms	1.82x

Notes:

fastapi[standard] installs 42 packages including C extensions (uvloop, pydantic_core) that require sdist building. The 11.7s is dominated by build system overhead, not resolution.
The complex resolution benchmark (flask+django+boto3+requests) shows the largest resolution-specific speedup (1.81x) because it exercises the largest JSON pages (botocore 4692 entries, boto3 4020 entries).

Files Modified

src/pip/_internal/utils/compatibility_tags.py — lru_cache on get_supported
src/pip/_vendor/packaging/utils.py — lru_cache on canonicalize_name/parse_wheel_filename (16384), pre-compiled regex
src/pip/_vendor/packaging/version.py — lru_cache on parse(), str/hash caching
src/pip/_vendor/packaging/tags.py — dict cache on _version_nodot
src/pip/_internal/metadata/base.py — pre-compiled project name regex
src/pip/_internal/metadata/importlib/_envs.py — os.scandir + distribution dict cache
src/pip/_internal/models/link.py — direct JSON construction, lazy URL parsing, pre-computed splitext/filename/hash, lazy upload_time
src/pip/_internal/index/package_finder.py — fused _evaluate_json_page, tag-first parsing, version interning, sorted applicable_candidates restoration
src/pip/_internal/models/target_python.py — tag tuples frozenset, tag priority cache
src/pip/_internal/resolution/resolvelib/factory.py — bulk specifier merge, hashes fast-path, two-level candidate infos cache
src/pip/_internal/models/candidate.py — version pass-through, removed frozen dataclass overhead
src/pip/_vendor/packaging/specifiers.py — canonical_spec cache, str/hash caching, eq fast-path
src/pip/_vendor/packaging/requirements.py — str caching
src/pip/_vendor/packaging/markers.py — default_environment caching
src/pip/_vendor/requests/utils.py — proxy detection memoization
src/pip/_vendor/resolvelib/resolvers/resolution.py — hoisted method/attrgetter constants
src/pip/_vendor/resolvelib/structs.py — guard for empty appends
src/pip/_internal/utils/hashes.py — hash caching, supported_hashes fast-path
src/pip/_internal/resolution/resolvelib/base.py — Constraint.empty() singleton
src/pip/_internal/operations/prepare.py — lazy wheel metadata cache
