In 7 weeks, codeflash-agent cut the production bill from ~$10K/month to ~$1.1K/month — with zero regressions across 354 tests.
Unstructured processes billions of documents per month, turning raw PDFs, images, and office files into structured elements for LLM and RAG pipelines. The product ran on Standard_D48s_v5 nodes, RAM-bound at 5 pods per node. Engineers knew performance could improve — they didn't have the bandwidth. Functionality ranked higher. Occasional OOMs pushed defensive over-allocation, which pushed the bill up.
Humans keep missing these because they're layered under each other. Fix one, the next one surfaces.
os.cpu_count() was lying.Returned 48 on a 1-CPU pod. OCR spawned 4 workers each loading a full ONNX model — 4× memory, zero parallelism. Agent replaced with container-aware CPU detection. Worker pool collapsed to one.
Hidden in baseline noise, in PIL preprocessing. Reordered allocation, dropped to 17 MB.
In _patch_current_chars_with_render_mode. Rewritten as a single-pass. 14× speedup on 5000-char pages.
In the OCR pipeline. Switched to BMP format. Removed CPU work that wasn't doing anything.
"We knew we could do better, but we didn't have the bandwidth."
"Operationally, the cost-performance profile of our running instances is in a different place. We're seeing fewer tail events where you might hit an OOM, and faster scale-out."
"A PR comes in, we measure RSS before and after on a running system, and the improvement is 2× or 3×. Real demonstrable progress, not theoretical."
Cut pod cost without touching product surface area. Reproduced baselines on the customer's own VM.
Agent ran autonomously in sandbox for 7 weeks.
Codeflash engineers reviewed every candidate. 24 accepted, shipped as PRs with benchmarks and reproducibility notes.
Same agent now watches every new PR in those repos. The gains compound instead of decaying.
Unstructured extended the engagement. The next phase is a discovery across roughly $1M/year of infrastructure spend, with an end-to-end optimization plan.
Book a 20-minute diagnostic. We'll tell you where the waste is before you commit to anything.
Book a diagnostic