Case study · Unstructured

90% infra cost cut at Unstructured.

In 7 weeks, codeflash-agent cut the production bill from ~$10K/month to ~$1.1K/month — with zero regressions across 354 tests.

90%
infra cost reduction
9.2×
pod density
24
PRs merged across 5 repos
0
regressions

Billions of documents. A bill that kept climbing.

Unstructured processes billions of documents per month, turning raw PDFs, images, and office files into structured elements for LLM and RAG pipelines. The product ran on Standard_D48s_v5 nodes, RAM-bound at 5 pods per node. Engineers knew performance could improve — they didn't have the bandwidth. Functionality ranked higher. Occasional OOMs pushed defensive over-allocation, which pushed the bill up.

Four stacked bottlenecks.

Humans keep missing these because they're layered under each other. Fix one, the next one surfaces.

  1. os.cpu_count() was lying.

    It returned 48 on a 1-CPU pod. OCR spawned 4 workers, each loading a full ONNX model: 4× the memory, zero extra parallelism. The agent replaced it with container-aware CPU detection, and the worker pool collapsed to one.

  2. 24 MB memory creep per request.

    Hidden in baseline noise, inside PIL preprocessing. The agent reordered allocations, dropping the creep from 24 MB to 17 MB per request.

  3. An O(N²) hotspot.

    In _patch_current_chars_with_render_mode. Rewritten as a single pass: a 14× speedup on 5,000-char pages.

  4. Three redundant PNG round-trips.

    In the OCR pipeline. Switched the intermediates to BMP, eliminating encode/decode cycles that added CPU time and nothing else.
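
The first bottleneck is a classic container trap: os.cpu_count() reports the host's cores, not the pod's quota. A minimal sketch of container-aware detection, assuming cgroup v2; the helper name and fallback order are ours, not necessarily what the merged PR does:

```python
import os

def effective_cpu_count() -> int:
    """CPU count that respects container limits, unlike bare os.cpu_count()."""
    # cgroup v2 exposes the quota as "<quota> <period>" (or "max <period>")
    try:
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
        if quota != "max":
            return max(1, int(quota) // int(period))
    except (OSError, ValueError):
        pass  # no cgroup v2 file (bare metal, macOS, Windows): fall through
    # the affinity mask is narrower than cpu_count() under taskset / K8s cpusets
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1
```

Sizing the worker pool with a helper like this, instead of os.cpu_count(), is what lets a multi-worker OCR pool collapse to one on a 1-CPU pod.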
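
The second fix is about allocation ordering: when an intermediate buffer stays alive while its successor is built, peak memory pays for both. A stdlib sketch of the pattern, with plain lists standing in for PIL image buffers and tracemalloc measuring the peak; this is illustrative, not the actual preprocessing code:

```python
import tracemalloc

def transform_keep_both(data):
    # the intermediate list stays alive while the output is built: peak ~2x
    doubled = [x * 2 for x in data]
    return [d + 1 for d in doubled]

def transform_fused(data):
    # fused single pass: the intermediate never materializes as a list
    return [x * 2 + 1 for x in data]

data = list(range(200_000))

tracemalloc.start()
transform_keep_both(data)
_, peak_both = tracemalloc.get_traced_memory()
tracemalloc.reset_peak()
transform_fused(data)
_, peak_fused = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

Same output, lower high-water mark; that is the shape of a fix that hides in baseline noise until someone measures RSS per request.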
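
The third fix follows a standard shape: a per-item backward scan becomes one merged forward sweep over position-sorted data. A generic sketch; the pairing of characters with "render mode" events is our illustration, not the real logic of _patch_current_chars_with_render_mode:

```python
def patch_quadratic(chars, mode_events):
    # O(N*M): for every char, rescan all mode events to find the
    # latest one at or before the char's position
    out = []
    for pos, ch in chars:
        mode = 0
        for ev_pos, ev_mode in mode_events:
            if ev_pos <= pos:
                mode = ev_mode
        out.append((ch, mode))
    return out

def patch_single_pass(chars, mode_events):
    # O(N+M): both sequences are position-sorted, so one merged
    # sweep carries the current mode forward
    out = []
    i, mode = 0, 0
    for pos, ch in chars:
        while i < len(mode_events) and mode_events[i][0] <= pos:
            mode = mode_events[i][1]
            i += 1
        out.append((ch, mode))
    return out
```

Both return the mode in effect at each character's position; only the cost differs, which is where a 14× speedup on long pages comes from.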
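
The fourth fix trades a compressed intermediate for a raw one: PNG pays DEFLATE on every encode and decode, while BMP just writes pixel rows. A stdlib sketch of that tradeoff, using zlib to stand in for PNG's codec; illustrative timing, not the PR's code:

```python
import time
import zlib

pixels = bytes(range(256)) * 4096  # ~1 MB of stand-in pixel data

# "PNG-style" round-trip: every hand-off pays a compress + decompress
t0 = time.perf_counter()
for _ in range(5):
    encoded = zlib.compress(pixels)
    decoded = zlib.decompress(encoded)
t_png_style = time.perf_counter() - t0

# "BMP-style" round-trip: raw pixels are copied, no codec work at all
t0 = time.perf_counter()
for _ in range(5):
    decoded = bytes(bytearray(pixels))
t_bmp_style = time.perf_counter() - t0
```

When the next pipeline stage decodes the bytes milliseconds later, the compression buys nothing; dropping it removes pure overhead.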

Each fix links to its merged PR.

Three quotes from Crag Wolfe, Chief Architect.

"We knew we could do better, but we didn't have the bandwidth."
"Operationally, the cost-performance profile of our running instances is in a different place. We're seeing fewer tail events where you might hit an OOM, and faster scale-out."
"A PR comes in, we measure RSS before and after on a running system, and the improvement is 2× or 3×. Real demonstrable progress, not theoretical."

How the engagement ran.

01

Scope

Cut pod cost without touching product surface area. Reproduced baselines on the customer's own VM.

02

Discover

The agent ran autonomously in a sandbox for 7 weeks.

03

Review

Codeflash engineers reviewed every candidate. 24 accepted, shipped as PRs with benchmarks and reproducibility notes.

04

Stay Optimal

Same agent now watches every new PR in those repos. The gains compound instead of decaying.

~$1M/year of infrastructure under discovery.

Unstructured extended the engagement. The next phase is a discovery pass across roughly $1M/year of infrastructure spend, ending in an end-to-end optimization plan.

If you run ML at scale, we've probably optimized your stack.

Book a 20-minute diagnostic. We'll tell you where the waste is before you commit to anything.

Book a diagnostic