
metaflow Performance Optimization

Upstream performance improvements to Netflix/metaflow, a human-centric framework for data science and ML workflows.

Background

Metaflow is Netflix's open-source Python framework for building and managing real-world data science projects. It handles workflow orchestration, versioning, and execution across local, cloud, and Kubernetes environments.

Profiling reveals two main optimization surfaces:

  1. Import time (~513ms): Heavy modules (requests, kubernetes, asyncio, yaml) are loaded eagerly even when they are not needed. Plugin resolution alone accounts for 65% of import time.
  2. Runtime hot paths: Every artifact is gzip-compressed twice, SHA1 is used for hashing where a faster non-cryptographic hash would suffice, and the multiprocessing utilities poll with sleep calls.
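
One way to attribute import time to individual modules is CPython's `-X importtime` flag, which writes one timing line per imported module to stderr. The sketch below is only an illustration of that approach and is not the profiling setup used for the numbers above; `parse_importtime` and `slowest_imports` are hypothetical helper names.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Parse `python -X importtime` output into (module, self_us, cumulative_us) tuples."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skips the column-header line
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_imports(module, top=10):
    """Import `module` in a fresh interpreter and return the costliest imports."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(proc.stderr)
    # Sort by cumulative time so packages that pull in many children rank first.
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]
```

Calling `slowest_imports("metaflow")` in an environment where metaflow is installed would list the modules with the largest cumulative cost.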

Optimization Targets

Import Time (Phase 1 — ~200ms savings estimated)

| Target | Current | Savings | Approach |
|---|---|---|---|
| Defer requests in metadata providers | 128ms | ~108ms | Lazy import inside ServiceMetadataProvider |
| Lazy-load Kubernetes clients | 50ms | ~48ms | Conditional import when K8s decorator used |
| Defer asyncio in subprocess_manager | 91ms | ~41ms | Import inside async functions only |
| Defer YAML/cards infrastructure | 52ms | ~37ms | Move YAML import to card render time |
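
The deferral pattern behind these targets can be sketched as follows. This is a minimal illustration and not Metaflow's code: `HypotheticalProvider` is an invented class, and the standard-library `colorsys` module stands in for a heavy dependency such as `requests`.

```python
import sys


class HypotheticalProvider:
    """Illustrative class; the heavy module is imported on first use only."""

    def to_hsv(self, r, g, b):
        # Function-level import: the cost is paid on the first call,
        # not when the module that defines this class is imported.
        import colorsys  # stand-in for a heavy dependency

        return colorsys.rgb_to_hsv(r, g, b)


def is_loaded(module_name):
    """Report whether a module has already been imported in this process."""
    return module_name in sys.modules
```

After the first call Python caches the module in `sys.modules`, so later calls pay only a dictionary lookup. The trade-off is that import errors surface at call time rather than at startup.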

Runtime (Phase 2)

| Target | File | Approach |
|---|---|---|
| Double gzip compression | content_addressed_store.py | Single compression, tune level |
| SHA1 content hashing | content_addressed_store.py | Switch to xxHash/BLAKE3 |
| Sleep-based polling | multicore_utils.py | Event-based waiting |
| Extension loading cache | extension_support/__init__.py | Mtime-based cache |
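
The first two rows can be illustrated with a small standard-library sketch. It does not reproduce the logic in content_addressed_store.py: all function names are hypothetical, the compression level of 3 is an arbitrary choice, and `hashlib.blake2b` is used only as a stand-in because xxHash and BLAKE3 need third-party packages.

```python
import gzip
import hashlib


def pack_twice(blob: bytes, level: int = 3) -> bytes:
    """Compress already-compressed bytes again (the pattern to avoid)."""
    return gzip.compress(gzip.compress(blob, compresslevel=level), compresslevel=level)


def pack_once(blob: bytes, level: int = 3) -> bytes:
    """Compress a single time at a configurable level."""
    return gzip.compress(blob, compresslevel=level)


def unpack(packed: bytes) -> bytes:
    """Inverse of pack_once."""
    return gzip.decompress(packed)


def content_key_sha1(blob: bytes) -> str:
    """40-character hex key derived from SHA1."""
    return hashlib.sha1(blob).hexdigest()


def content_key_blake2(blob: bytes) -> str:
    """40-character hex key from BLAKE2b (stand-in for xxHash/BLAKE3)."""
    return hashlib.blake2b(blob, digest_size=20).hexdigest()
```

Any change of hash function would produce different keys for the same content, so a real migration would have to account for artifacts already stored under the old keys.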

Results

No optimizations applied yet.

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| import metaflow | 513ms | | |
| metaflow --version CLI | ~1.8s | | |
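
A benchmark of this kind could be reproduced with a sketch like the one below, which times a command in a fresh interpreter so that cached modules do not hide the import cost. It is not necessarily how the figures above were measured; `time_command_ms` and `time_import_ms` are hypothetical helpers, and the result includes interpreter startup time.

```python
import statistics
import subprocess
import sys
import time


def time_command_ms(args, runs=5):
    """Return the median wall-clock time of a command in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, check=True, capture_output=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def time_import_ms(module, runs=5):
    """Time `import <module>` in a fresh interpreter, minus an empty-script baseline."""
    baseline = time_command_ms([sys.executable, "-c", "pass"], runs)
    total = time_command_ms([sys.executable, "-c", f"import {module}"], runs)
    # Noise can make the difference slightly negative for very cheap imports.
    return max(total - baseline, 0.0)
```

With metaflow installed, `time_import_ms("metaflow")` would correspond to the first row and `time_command_ms(["metaflow", "--version"])` to the second.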

PRs

None yet.

| PR | Branch | Status | Description |
|---|---|---|---|