Multi-module Gradle projects declare JUnit 5 dependencies in submodule
build files, not the root. The detection function only checked the root
build.gradle, missing JUnit 5 entirely and falling back to JUnit 4.
The fix scans immediate child directories for build files as well, and
changes the default framework from JUnit 4 to JUnit 5 (the standard since 2017).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The flat 90s timeout was too aggressive for LLM-powered endpoints
(/testgen, /optimize, /refinement) under load, causing ReadTimeoutError
and failing the async-optimization E2E test. Split into (10s connect,
300s read) tuple so connections fail fast but LLM inference gets adequate time.
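A hedged sketch of the (connect, read) split, assuming a requests-based client; the constant names and helper are illustrative:

```python
# Sketch of the split timeout. `requests` accepts a (connect, read) tuple
# for its `timeout` argument instead of a single flat value.
CONNECT_TIMEOUT_S = 10    # fail fast when the host is unreachable
READ_TIMEOUT_S = 300      # LLM inference may legitimately take minutes

def request_timeout() -> tuple[int, int]:
    # usage: requests.post(url, json=payload, timeout=request_timeout())
    return (CONNECT_TIMEOUT_S, READ_TIMEOUT_S)
```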
Move the 4 most common return-value types (str, list/tuple, dict) to
`orig_type is T` identity checks at the top of the dispatch chain,
before the frozenset lookup. A single pointer comparison is cheaper
than a frozenset hash, and these types need special handling anyway
(temp-path normalization, recursive comparison, superset support).
Before: dict traversed ~8 isinstance checks before being handled.
After: dict is handled at check #3 via `orig_type is dict`.
The isinstance fallbacks remain as slow-paths for subclasses (deque,
ChainMap, defaultdict, scipy dok_matrix, etc.).
Backported from codeflash-python dispatch ordering.
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
Measures median wall-clock time for --version, --help, auth status,
and compare --help across 30 runs with 3 warmups.
Usage:
codeflash compare main codeflash/optimize \
--script "python benchmarks/bench_cli_startup.py" \
--script-output benchmarks/results.json
Cache the visitor dispatch tables that libcst rebuilds on every
MatcherDecoratableTransformer/Visitor instantiation. The tables
depend only on the class, not the instance, so caching by type is
safe. Saves ~27ms per visitor instantiation (24x faster).
Also fix pre-existing ruff F821 in cli.py (missing exit_with_message
import in process_pyproject_config).
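A minimal sketch of the per-class caching, using a cheap stand-in for libcst's expensive table construction:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def _dispatch_tables_for(cls: type) -> dict:
    # Stand-in for the table build libcst performs; the result depends
    # only on the class, so memoizing by type is safe.
    return {name: getattr(cls, name) for name in dir(cls) if name.startswith("visit_")}

class CachedVisitor:
    def __init__(self):
        # After the first instantiation of each class, this is one cache hit
        # instead of a full rebuild.
        self._dispatch = _dispatch_tables_for(type(self))
```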
Move libcst, rich.tree.Tree, console, comparator, code_utils, registry,
lsp.helpers, and LspMarkdownMessage from module-level to the methods that
use them. Only pydantic and TestType remain at module level (needed for
class definitions).
models.py import: 633ms → 125ms on Azure Standard_D4s_v5.
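The pattern, shown with a stdlib stand-in for the heavy dependency (class and method names are illustrative):

```python
class CodeModel:  # illustrative class, not codeflash's actual model
    def render_tree(self, data):
        # Deferred import: code that imports this module but never calls
        # render_tree pays nothing for the dependency.
        from pprint import pformat  # stand-in for libcst / rich.tree
        return pformat(data)
```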
Defer console, formatter, code_utils, registry, and lsp.helpers imports
from module level into the functions that use them. Inline is_LSP_enabled
(a one-liner env var check) to avoid importing lsp.helpers on the happy
path of get_codeflash_api_key.
auth status: 237ms → 160ms on Azure Standard_D4s_v5.
uv-dynamic-versioning rewrites version.py on every `uv run`, so the
ruff auto-format job was inadvertently committing dev version strings.
Restore version.py files after formatting and revert the ones already
changed on this branch.
Move heavy module-level imports in cli.py (console, env_utils,
code_utils, config_parser, lsp.helpers, version) into the functions
that actually use them. Split main.py imports so parse_args() is
called before loading the full stack; --help exits via argparse
before any heavy modules load.
Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
--help: 297ms → 39ms (7.7x faster)
--version: 17ms (unchanged)
The profiler's save() was called every 100 hit() calls. With O(n²)
algorithms this produced hundreds of thousands of writeFileSync calls,
each truncating the file to 0 bytes before writing. If the subprocess
timed out (SIGKILL), the file was left at 0 bytes → JSONDecodeError.
Fixes:
- Move require('fs')/require('path') to module scope (not inside save())
- Reduce save-every-N from 100 → 10,000 hits (100x fewer syscalls)
- Pre-create output file with {} before running Jest (safety net)
- Handle empty files gracefully in parse_results
- Fix misleading "file not found" warning → "file empty or no timing data"
The change detection for JS E2E tests was missing the test fixture
directory, so PRs that only modify JS test data (like this one) were
skipped. Java already had its equivalent path included.
The js-ts-class E2E test was flaky because n=100 is too small for
the O(n²)→O(n) optimization to overcome Map/Set per-operation overhead.
At n=100, the LLM correctly generates a Map-based O(n) solution but it
benchmarks as slower (-10.6%) due to constant factor dominance.
Bump to n=10,000 so the algorithmic improvement produces measurable
speedup, making the 30% E2E threshold reliably achievable.
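A Python analogue of the constant-factor effect (the JS test uses Map/Set; the shape is the same): the linear version pays hashing and allocation overhead per element, so it only wins once n is large enough.

```python
def has_duplicate_quadratic(items: list) -> bool:
    # O(n^2): nested scan, but near-zero per-step overhead
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if a == b:
                return True
    return False

def has_duplicate_linear(items: list) -> bool:
    # O(n): one set lookup per element; the per-operation constant
    # dominates at small n, which is why n=100 can benchmark slower
    seen = set()
    for a in items:
        if a in seen:
            return True
        seen.add(a)
    return False
```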
- Remove dead `import shutil` from test_comparator.py
- Rename `requires_java` → `requires_java_runtime` for consistency with test_run_and_parse.py
- Remove redundant `@requires_java_runtime` on test_behavior_return_value_correctness (class already has it)
- Delete standalone java-e2e-tests.yml (duplicate of ci.yaml e2e-java)
- Add npm cache to e2e-js jobs via setup-node cache option
- Consolidate Maven build: mvn clean package + install → single mvn install
- Add .github/workflows/ci.yaml and .github/actions/** to push paths
so CI validates its own changes when merged to main
Apply @requires_java_runtime to TestJavaRunAndParseBehavior and
TestJavaRunAndParsePerformance at the class level. The performance
test was failing on Windows with a flaky 10ms timing assertion
(10.515ms actual, 5% tolerance), a pre-existing issue masked by
continue-on-error.
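A sketch of the class-level application, using stdlib unittest.skipUnless as a stand-in (the real suite may use a pytest mark; class and test names mirror the ones above):

```python
import shutil
import unittest

# Skip the whole class when no Java runtime is on PATH.
requires_java_runtime = unittest.skipUnless(
    shutil.which("java") is not None, "Java runtime not installed"
)

@requires_java_runtime
class TestJavaRunAndParseBehavior(unittest.TestCase):
    def test_behavior_return_value_correctness(self):
        # Covered by the class-level decorator; a per-method
        # @requires_java_runtime here would be redundant.
        self.assertTrue(True)
```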