codeflash-agent/.codeflash/netflix/metaflow/status.md
Kevin Turcios 3b59d97647 squash
2026-04-13 14:12:17 -05:00

metaflow Status

Last updated: 2026-04-10

Current state

First PR open upstream. Waiting for maintainer feedback.

Target repo

~/Desktop/work/netflix_org/metaflow — fork remote: KRRT7/metaflow

VM

Azure Standard_D2s_v5, IP: 20.112.32.177, RG: metaflow-BENCH-RG (deallocated)

  • Python 3.12, pip editable install, lz4/xxhash/numpy/hyperfine installed
  • Baseline + realistic benchmarks complete in ~/results/

PRs

| PR | Branch | Status | Description |
| --- | --- | --- | --- |
| Netflix/metaflow#3090 | perf/lz4-artifact-compression | Open, waiting for review | Replace gzip with lz4 in CAS: 7-18x on realistic data |
| KRRT7/metaflow#1 | perf/lz4-artifact-compression | Draft (mirror) | Same, on fork |

Key results (realistic artifacts)

| Payload | Pickled size | gzip total | lz4 total | Speedup |
| --- | --- | --- | --- | --- |
| Small dict (config) | 233B | 0.341ms | 0.218ms | 1.6x |
| Metrics dict (feature stats) | 52KB | 2.278ms | 0.327ms | 7.0x |
| Numpy float64 (embeddings) | 800KB | 29.111ms | 1.557ms | 18.7x |
| Numpy float64 (model weights) | 8MB | 289.234ms | 15.792ms | 18.3x |
| Random bytes (opaque model) | 5MB | 118.315ms | 9.646ms | 12.3x |
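
Numbers like these can be gathered by timing a compress-plus-decompress round trip on each pickled payload. The following is a minimal sketch, not the benchmark script from the PR: the helper names (`gzip_round_trip`, `lz4_round_trip`, `time_round_trip`), the repeat count, and the payload are illustrative, and "total" is assumed here to mean compress time plus decompress time. lz4 is imported optionally so the sketch still runs where it is not installed.

```python
import gzip
import pickle
import time

try:
    import lz4.frame as lz4_frame  # third-party package; optional for this sketch
except ImportError:
    lz4_frame = None


def gzip_round_trip(blob: bytes) -> bytes:
    """Compress and decompress with the stdlib gzip module."""
    return gzip.decompress(gzip.compress(blob))


def lz4_round_trip(blob: bytes) -> bytes:
    """Compress and decompress with lz4.frame (requires the lz4 package)."""
    return lz4_frame.decompress(lz4_frame.compress(blob))


def time_round_trip(func, blob: bytes, repeats: int = 20) -> float:
    """Return the best-of-N wall time in milliseconds for func(blob)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(blob)
        elapsed = time.perf_counter() - start
        assert result == blob  # the round trip must be lossless
        best = min(best, elapsed)
    return best * 1000.0


# Illustrative payload: a small config-like dict, pickled.
payload = pickle.dumps({"lr": 0.01, "epochs": 10, "tags": ["a", "b"]})

gzip_ms = time_round_trip(gzip_round_trip, payload)
print(f"gzip total: {gzip_ms:.3f} ms")

if lz4_frame is not None:
    lz4_ms = time_round_trip(lz4_round_trip, payload)
    print(f"lz4 total:  {lz4_ms:.3f} ms")
```

Taking the best of several repeats reduces noise from the scheduler and allocator; a tool such as hyperfine measures whole-process time instead and would include interpreter startup.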

Open questions on PR

  • Hard vs soft dependency for lz4
  • Forward compatibility story (older metaflow releases can't read cas_version=2)
  • Benchmark scripts to be reverted before merge
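
The first two questions interact: with a soft dependency, a reader may meet an lz4-compressed blob without lz4 installed, and a version tag is what lets it fail clearly instead of misreading the data. Below is a minimal sketch under stated assumptions. `pack_blob`, `unpack_blob`, and the integer constants are hypothetical and do not reflect metaflow's actual CAS code; only the idea of a `cas_version` tag comes from the notes above.

```python
import gzip

try:
    import lz4.frame as lz4_frame  # optional (soft) dependency
except ImportError:
    lz4_frame = None

# Hypothetical version tags; the real CAS metadata layout may differ.
CAS_V1_GZIP = 1
CAS_V2_LZ4 = 2


def pack_blob(data: bytes) -> tuple[int, bytes]:
    """Compress data, preferring lz4 when available; return (version, payload)."""
    if lz4_frame is not None:
        return CAS_V2_LZ4, lz4_frame.compress(data)
    return CAS_V1_GZIP, gzip.compress(data)


def unpack_blob(version: int, payload: bytes) -> bytes:
    """Decompress payload according to its version tag."""
    if version == CAS_V1_GZIP:
        return gzip.decompress(payload)
    if version == CAS_V2_LZ4:
        if lz4_frame is None:
            raise RuntimeError(
                "blob uses cas_version=2 (lz4) but the lz4 package is not installed"
            )
        return lz4_frame.decompress(payload)
    raise ValueError(f"unknown cas_version: {version}")


version, packed = pack_blob(b"artifact bytes " * 100)
print(version, len(packed), unpack_blob(version, packed) == b"artifact bytes " * 100)
```

A dispatch like this only helps readers that already know about version 2. Releases that predate the tag cannot be fixed retroactively, which is the forward-compatibility question itself.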

Next steps (pending maintainer response)

  1. If approach accepted: make lz4 optional, revert benchmark scripts, address feedback
  2. If rejected on dependency grounds: explore zlib.compress directly (no new dep, smaller win)
  3. Open SHA1 discussion issue (data in data/sha1-proposal.md)
  4. Multicore polling improvement (low priority, marginal impact)
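
For step 2, a stdlib-only comparison could be sketched as follows. This is an illustration, not a measurement from the benchmarks above: the payload, the compression level, and the helper name `best_ms` are assumptions. Note that `zlib.compress` emits a zlib-framed stream rather than a gzip file, so blobs written this way are not readable with `gzip.decompress` and would presumably need their own version tag.

```python
import gzip
import pickle
import time
import zlib


def best_ms(func, repeats: int = 10) -> float:
    """Return the best-of-N wall time in milliseconds for calling func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# Illustrative payload: a metrics-like dict, pickled.
payload = pickle.dumps(
    {f"feature_{i}": {"mean": i * 0.5, "std": 1.0} for i in range(2000)}
)

LEVEL = 3  # illustrative compression level, applied to both codecs

gz_blob = gzip.compress(payload, compresslevel=LEVEL)
z_blob = zlib.compress(payload, LEVEL)

# Both paths must round-trip losslessly with their own decompressor.
assert gzip.decompress(gz_blob) == payload
assert zlib.decompress(z_blob) == payload

gz_ms = best_ms(lambda: gzip.decompress(gzip.compress(payload, compresslevel=LEVEL)))
z_ms = best_ms(lambda: zlib.decompress(zlib.compress(payload, LEVEL)))

print(f"gzip: {len(gz_blob)} bytes, {gz_ms:.3f} ms round trip")
print(f"zlib: {len(z_blob)} bytes, {z_ms:.3f} ms round trip")
```

Both modules use the same DEFLATE implementation, so any difference comes from framing and wrapper overhead, which is consistent with the "smaller win" expectation noted in step 2.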

Blockers

Waiting on Netflix/metaflow#3090 review.