Move heavy module-level imports in cli.py (console, env_utils,
code_utils, config_parser, lsp.helpers, version) into the functions
that actually use them. Split main.py imports so parse_args() is
called before loading the full stack — --help exits via argparse
before any heavy modules load.
Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
--help: 297ms → 39ms (7.7x faster)
--version: 17ms (unchanged)
The profiler's save() was called every 100 hit() calls. With O(n²)
algorithms this produced hundreds of thousands of writeFileSync calls,
each truncating the file to 0 bytes before writing. If the subprocess
timed out (SIGKILL), the file was left at 0 bytes → JSONDecodeError.
Fixes:
- Move require('fs')/require('path') to module scope (not inside save())
- Reduce save-every-N from 100 → 10,000 hits (100x fewer syscalls)
- Pre-create output file with {} before running Jest (safety net)
- Handle empty files gracefully in parse_results
- Fix misleading "file not found" warning → "file empty or no timing data"
The change detection for JS E2E tests was missing the test fixture
directory, so PRs that only modify JS test data (like this one) were
skipped. Java already had its equivalent path included.
The js-ts-class E2E test was flaky because n=100 is too small for
the O(n²)→O(n) optimization to overcome Map/Set per-operation overhead.
At n=100, the LLM correctly generates a Map-based O(n) solution but it
benchmarks as slower (-10.6%) due to constant factor dominance.
Bump to n=10,000 so the algorithmic improvement produces measurable
speedup, making the 30% E2E threshold reliably achievable.
- Remove dead `import shutil` from test_comparator.py
- Rename `requires_java` → `requires_java_runtime` for consistency with test_run_and_parse.py
- Remove redundant `@requires_java_runtime` on test_behavior_return_value_correctness (class already has it)
- Delete standalone java-e2e-tests.yml (duplicate of ci.yaml e2e-java)
- Add npm cache to e2e-js jobs via setup-node cache option
- Consolidate Maven build: mvn clean package + install → single mvn install
- Add .github/workflows/ci.yaml and .github/actions/** to push paths
so CI validates its own changes when merged to main
Apply @requires_java_runtime to TestJavaRunAndParseBehavior and
TestJavaRunAndParsePerformance at the class level. The performance
test was failing on Windows with a flaky 10ms timing assertion
(10.515ms actual, 5% tolerance) — pre-existing issue masked by
continue-on-error.
Same fix as test_comparator.py — uses _find_comparator_jar() to skip
when the codeflash-runtime JAR isn't built. Fixes Windows unit-tests
which don't have Java pre-installed (unlike Linux runners).
Ubuntu runners have Java/Maven pre-installed, so checking for java/mvn
binaries doesn't skip. The actual dependency is the codeflash-runtime
JAR which must be built from codeflash-java-runtime/ via Maven.
The requires_java marker only checked for java binary but the tests
also need mvn to build the codeflash-runtime JAR. These 13 tests
were silently failing in unit-tests (masked by continue-on-error).
- Remove codeflash-java-runtime/ from unit_tests change detection
- Narrow e2e flag from codeflash/ to explicit Python subdirs (excludes java/, javascript/)
- Narrow tests/ in e2e_java/e2e_js to specific test scripts
- Extract duplicated Validate PR step into composite action
- Use fetch-depth: 1 for unit-tests and type-check (no git history needed)
- Remove continue-on-error: true from unit-tests (was masking real failures)
- Change git add -A to git add -u in prek auto-fix (won't stage untracked files)
Expand e2e_java and e2e_js change detection to include shared pipeline
code (optimization/, verification/, languages/base.py) but decouple
from the broad e2e flag. A change to codeflash/version.py now only
triggers Python E2Es, not Java/JS E2Es.
Only 5 of 3,943 unit tests need Java, and they already have
skip_if_maven_not_available() guards. Java execution is validated
by the e2e-java job. Saves ~30-60s per matrix entry (7 entries).
When matrix jobs are skipped, `${{ matrix.name }}` is never expanded,
showing literal "matrix.name" in the checks UI. Removing the `name:`
field lets GitHub use the job ID when skipped and auto-expand matrix
values when running.
Add all non-required-check E2E workflows and prek lint to the
consolidated ci.yaml:
- 4 standard Python E2Es (async, benchmark, coverage, init)
- 3 JS E2Es (cjs-function, esm-async, ts-class)
- 2 Java E2Es (fibonacci-nogit, tracer)
- prek lint
New change detection outputs:
- e2e_js: triggers JS E2Es when packages/ changes
- e2e_java: triggers Java E2Es when java runtime/fixtures change
Total: 17 jobs + determine-changes + gate = 19 jobs in one file.
Down from 22 workflow files to 7 (remaining are non-test: claude,
codeflash self-optimize, label-workflow-changes, publish, java-e2e).
Additional savings per irrelevant PR: ~$0.80 (10 jobs x ~$0.08).
Total per skipped PR: ~$1.85.