mirror of https://github.com/codeflash-ai/codeflash-agent.git synced 2026-05-04 18:25:19 +00:00

Kevin Turcios cc29a27289 Migrate .codeflash/ to {teammember}/{org}/{project}/ format (#15 ) Add team member dimension to case study paths so multiple contributors can track optimization data independently. Derives member from git config user.name in session-start hooks. - Move all case studies under .codeflash/krrt7/ - Rename pypa/pip → python/pip (org grouping) - Update session-start hooks, docs, scripts, and references		2026-04-14 23:04:34 -05:00

README.md

metaflow Performance Optimization

Upstream performance improvements to Netflix/metaflow, a human-centric framework for data science and ML workflows.

Background

Metaflow is Netflix's open-source Python framework for building and managing real-world data science projects. It handles workflow orchestration, versioning, and execution across local, cloud, and Kubernetes environments.

Profiling reveals two main optimization surfaces:

Import time (~513ms): Heavy optional dependencies (requests, kubernetes, asyncio, yaml) are loaded eagerly even when they are not needed. Plugin resolution alone accounts for 65% of import time.
Runtime hot paths: Every artifact is gzip-compressed twice, content hashing uses SHA1 where a faster non-cryptographic hash would suffice, and the multiprocessing utilities poll with sleep calls.
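An import-time breakdown like the one above can be gathered with CPython's built-in `-X importtime` flag. The following is a minimal sketch, not the profiling script used for these numbers; the helper name `import_profile` is hypothetical, and the parsing assumes the three-column `import time: self | cumulative | name` format that CPython writes to stderr.

```python
import subprocess
import sys


def import_profile(module, top=5):
    """Return the `top` slowest imports seen while importing `module`.

    Each row is (cumulative_us, self_us, module_name), sorted slowest first.
    `import_profile` is a hypothetical helper, not part of metaflow.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # CPython writes the import-time report to stderr, one line per module.
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skip the column-header line
        rows.append((int(cumulative_us), int(self_us), name))
    rows.sort(reverse=True)
    return rows[:top]
```

Calling `import_profile("metaflow", top=10)` in an environment where metaflow is installed would list the ten most expensive imports by cumulative time, which is one way to see which optional dependencies dominate.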

Optimization Targets

Import Time (Phase 1 — ~200ms savings estimated)

| Target | Current cost | Estimated savings | Approach |
| --- | --- | --- | --- |
| Defer `requests` in metadata providers | 128ms | ~108ms | Lazy import inside ServiceMetadataProvider |
| Lazy-load Kubernetes clients | 50ms | ~48ms | Conditional import when K8s decorator used |
| Defer `asyncio` in subprocess_manager | 91ms | ~41ms | Import inside async functions only |
| Defer YAML/cards infrastructure | 52ms | ~37ms | Move YAML import to card render time |
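The deferral approach in the table can be illustrated with a small proxy that postpones the real import until the module is first used. This is a sketch under assumptions, not metaflow code: `LazyModule` and `to_hsv` are hypothetical names, and the stdlib module `colorsys` stands in for a heavy optional dependency such as `requests` so the example runs without extra packages.

```python
import importlib


class LazyModule:
    """Proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        if self._module is None:
            # The real import cost is paid here, on first use.
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # so _name and _module never trigger a load.
        return getattr(self._load(), attr)


# Stand-in for a heavy optional dependency (assumption: colorsys plays
# the role that requests or kubernetes would play in the real code).
colorsys = LazyModule("colorsys")


def to_hsv(r, g, b):
    # The first call triggers the import; later calls reuse the cached module.
    return colorsys.rgb_to_hsv(r, g, b)
```

A plain `import` statement placed inside the function that needs the dependency achieves the same effect with less machinery; the proxy is mainly useful when many call sites share one module-level name.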

Runtime (Phase 2)

| Target | File | Approach |
| --- | --- | --- |
| Double gzip compression | `content_addressed_store.py` | Single compression, tune level |
| SHA1 content hashing | `content_addressed_store.py` | Switch to xxHash/BLAKE3 |
| Sleep-based polling | `multicore_utils.py` | Event-based waiting |
| Extension loading cache | `extension_support/__init__.py` | Mtime-based cache |
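The first two rows can be sketched together: hash the raw bytes once, compress them once at a tunable level, and store the result under the digest. This is an illustration only and does not reflect the actual layout of `content_addressed_store.py`. The functions `store_blob` and `load_blob` are hypothetical, a plain dict stands in for the backing store, and stdlib `hashlib.blake2b` is swapped in for xxHash/BLAKE3, which need third-party packages.

```python
import gzip
import hashlib


def store_blob(store, payload, level=3):
    """Save `payload` under its content hash, compressing exactly once.

    `store` is any dict-like mapping (a stand-in for the real backend).
    `level` is the gzip compression level; lower is faster, larger output.
    """
    # blake2b is a stdlib stand-in for the xxHash/BLAKE3 options in the table.
    key = hashlib.blake2b(payload, digest_size=20).hexdigest()
    if key not in store:
        # Identical content hashes to the same key, so it is compressed
        # and written only the first time it is seen.
        store[key] = gzip.compress(payload, compresslevel=level)
    return key


def load_blob(store, key):
    """Return the original bytes for `key`."""
    return gzip.decompress(store[key])
```

Because the key is derived from the hash function, switching algorithms changes every key; a real migration would presumably need to keep existing SHA1-keyed artifacts readable.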

Results

No optimizations applied yet.

| Benchmark | Before | After | Speedup |
| --- | --- | --- | --- |
| `import metaflow` | 513ms | TBD | TBD |
| `metaflow --version` CLI | ~1.8s | TBD | TBD |
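Before/after numbers for this table could be collected by timing a fresh interpreter several times and taking the median. The sketch below is one possible harness, not the method used for the figures above; `time_command` and `import_cost_ms` are hypothetical helpers, and subtracting a bare-interpreter baseline is an assumption about how to isolate import cost from startup cost.

```python
import statistics
import subprocess
import sys
import time


def time_command(args, runs=5):
    """Median wall-clock seconds for running `args` in a fresh process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            args,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def import_cost_ms(module, runs=5):
    """Estimate the import cost of `module` in milliseconds.

    Interpreter startup is measured separately and subtracted, so the
    result approximates the cost of the import alone.
    """
    baseline = time_command([sys.executable, "-c", "pass"], runs)
    loaded = time_command([sys.executable, "-c", f"import {module}"], runs)
    return (loaded - baseline) * 1000.0
```

The same `time_command` helper can time the CLI row by passing the command as a list, for example `["metaflow", "--version"]`, provided the executable is on the path.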

PRs

None yet.

| PR | Branch | Status | Description |
| --- | --- | --- | --- |