Rich renders the banner panel with box-drawing characters (╭, ╮, │, etc.)
that cp1252 cannot decode. On Windows, subprocess.run(..., text=True) uses
cp1252 by default, so decoding the child stdout raises UnicodeDecodeError
and subprocess sets result.stdout to None — breaking the assertion with a
misleading "argument of type 'NoneType' is not iterable".
Pass encoding="utf-8" explicitly so the test passes on every platform.
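A minimal sketch of the fix, assuming a stand-in child process instead of the real CLI banner: the child writes box-drawing characters as raw UTF-8 bytes, and the parent decodes them with an explicit encoding rather than the platform default. CHILD_CODE and run_child are hypothetical names for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the CLI: writes Rich-style box-drawing characters
# (U+256D, U+2500, U+256E) as raw UTF-8 bytes, independent of the child's stdout encoding.
CHILD_CODE = "import sys; sys.stdout.buffer.write('\\u256d\\u2500 banner \\u2500\\u256e'.encode('utf-8'))"


def run_child() -> str:
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        # Decode as UTF-8 explicitly instead of relying on text=True and the
        # platform default codec (cp1252 on Windows).
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Passing encoding= also switches the pipes to text mode, so text=True becomes redundant.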
The prek mypy hook runs on changed files and bypasses the pyproject.toml
tests/ exclude, surfacing pre-existing errors in both context.py and
test_context.py that block CI for this PR. Fixes applied:
- Import Language from language_enum instead of base (base re-exports are
not explicit; strict mypy flags attr-defined)
- Annotate _extract_class_declaration, _import_to_statement,
get_java_imported_type_skeletons, and resolved_imports
- Guard None start/end_line in _extract_function_source_by_lines and
find_helper_functions; guard None file_path in the import skeleton loop
- Drop unreachable `if not node: continue` in _extract_public_method_signatures
(JavaMethodNode.node is non-nullable)
- Add -> None to every test method and fix an `int | None` comparison in
test_context.py
All 880 Java tests pass after the change.
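The None guards above follow a standard mypy narrowing pattern. A minimal sketch, using a hypothetical helper that mirrors _extract_function_source_by_lines (its real signature may differ): once both bounds are checked against None, mypy narrows them from `int | None` to `int`, so the slice arithmetic type-checks under strict mode.

```python
from __future__ import annotations


def extract_source_by_lines(source: str, start_line: int | None, end_line: int | None) -> str | None:
    """Return 1-based, inclusive lines start_line..end_line, or None if a bound is missing."""
    # Guard first: after this check mypy treats both bounds as plain `int`.
    if start_line is None or end_line is None:
        return None
    lines = source.splitlines(keepends=True)
    return "".join(lines[start_line - 1 : end_line])
```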
Add -> None return annotations and Path / JavaSupport parameter annotations
to every test method and fixture so the prek mypy hook passes when the file
is in the CI diff.
Replaces source-level JavaScript function tracing with Babel AST
transformation via babel-tracer-plugin.js and trace-runner.js. Adds
replay test generation, Python-side tracer runner, and --language
flag to the tracer CLI for explicit JS/TS routing.
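One way explicit routing can sit alongside inference is sketched below. This is an illustration only: the helper name, the accepted language strings, and the extension table are assumptions, not the tracer CLI's actual behavior. An explicit --language value always wins; otherwise the script's suffix decides, with Python as the fallback.

```python
from __future__ import annotations

from pathlib import Path

# Assumed suffix table for illustration; the real CLI may use different rules.
_JS_TS_SUFFIXES = {
    ".js": "javascript",
    ".mjs": "javascript",
    ".cjs": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
}


def resolve_tracer_language(script: str, language: str | None = None) -> str:
    """Hypothetical router: an explicit --language wins, else infer from the file suffix."""
    if language is not None:
        return language.lower()
    return _JS_TS_SUFFIXES.get(Path(script).suffix.lower(), "python")
```

An explicit flag matters for entry points whose suffix says nothing about the language, such as an extensionless launcher script.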
Keep main's single JVM invocation that combines JFR profiling and the tracing
agent, while preserving the fix's intent: raise when trace-db was not created,
and warn when the exit code is non-zero but trace-db exists. Integration tests
rewritten to match the combined-invocation semantics.
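The raise/warn policy above reduces to a small check. This is a minimal sketch with a hypothetical function name; the real code's messages and logger setup may differ.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def check_trace_outcome(returncode: int, trace_db: Path) -> None:
    """Hypothetical post-run check for the combined JFR + tracing-agent invocation."""
    # No trace DB means nothing to build replay tests from: fail hard.
    if not trace_db.exists():
        raise RuntimeError(f"Tracing run exited with code {returncode} and did not create {trace_db}")
    # The DB exists, so a non-zero exit (e.g. a failing test in the traced build) only warrants a warning.
    if returncode != 0:
        logger.warning("Tracing run exited with code %d, but %s was created; continuing", returncode, trace_db)
```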
Gradle evaluates all project configurations during the configuration
phase, even when only one module is targeted. Multi-module projects with
diverse toolchain requirements (e.g., OpenRewrite's rewrite-gradle needs
JDK 8) fail when an unrelated module's toolchain isn't available.
Adds --configure-on-demand to all 8 Gradle command construction sites
so Gradle only configures projects needed for the requested task.
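A minimal sketch of funnelling the construction sites through one builder so the flag cannot be missed at any of them. The helper name and argument shape are hypothetical; --configure-on-demand is the standard Gradle command-line option.

```python
from __future__ import annotations


def build_gradle_command(gradle: str, tasks: list[str], project_path: str | None = None) -> list[str]:
    """Hypothetical shared builder for Gradle invocations."""
    # Only configure the projects the requested tasks actually need.
    cmd = [gradle, "--configure-on-demand"]
    for task in tasks:
        # Qualify the task with the module path (e.g. ":core:test") when one module is targeted.
        cmd.append(f"{project_path}:{task}" if project_path else task)
    return cmd
```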
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_run_java_with_graceful_timeout() discarded the subprocess exit code in
both the no-timeout and timeout paths. If Maven/Gradle failed (compilation
error, OOM, etc.), the tracer silently continued with missing/stale data.
Now returns the exit code. Stage 1 (JFR profiling) warns on failure but
continues. Stage 2 (argument capture) raises RuntimeError since trace
data is essential for replay test generation.
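The essential change is that both paths return the child's exit code. A minimal sketch of a graceful-timeout runner that does this; the function name and grace period are assumptions, not the tracer's actual implementation.

```python
from __future__ import annotations

import subprocess


def run_with_graceful_timeout(cmd: list[str], timeout: float | None = None, grace: float = 10.0) -> int:
    """Run cmd and return its exit code; on timeout, terminate, then kill after `grace` seconds."""
    proc = subprocess.Popen(cmd)
    try:
        # Popen.wait() returns the returncode, so the no-timeout path reports it directly.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()
```

With the code surfaced, each caller picks its own policy: stage 1 can log a warning on a non-zero result, while stage 2 raises RuntimeError.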
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep only repeatString, which reliably produces a 284% improvement.
Drop computeSum (marginal 16%), filterEvens and instanceMethod (no
optimization found). Reduces tracer E2E from ~1h27m to ~21m.
Move codeflash's own benchmarks to .codeflash/benchmarks/. Add
auto-discovery of .codeflash/benchmarks/ in codeflash compare and
benchmark mode -- when benchmarks-root is not explicitly configured,
the CLI checks for .codeflash/benchmarks/ before erroring.
Backwards compatible: users with existing benchmarks-root config
are unaffected. Docs continue to show tests/benchmarks as the
example path.
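The lookup order described above can be sketched as follows. The function name and error message are hypothetical; only the precedence (explicit benchmarks-root first, then .codeflash/benchmarks/, then an error) comes from the change itself.

```python
from __future__ import annotations

from pathlib import Path


def resolve_benchmarks_root(project_root: Path, configured: str | None) -> Path:
    """Hypothetical resolver for benchmarks-root."""
    # An explicit benchmarks-root setting always wins (backwards compatible).
    if configured:
        return (project_root / configured).resolve()
    # Otherwise fall back to the conventional .codeflash/benchmarks/ directory.
    candidate = project_root / ".codeflash" / "benchmarks"
    if candidate.is_dir():
        return candidate.resolve()
    raise FileNotFoundError("benchmarks-root is not configured and .codeflash/benchmarks/ does not exist")
```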
Drop repeatString from the Workload fixture (2→1 function).
computeSum alone exercises the full tracer→optimizer pipeline
(trace → replay tests → optimize → evaluate → rank → explain → review).
The second function added no additional pipeline coverage.
Remove filterEvens and instanceMethod from the Workload fixture (4→2
functions) and reduce main() loop from 1000→100 rounds. The E2E test
only needs to verify the tracer→optimizer pipeline works end-to-end;
it doesn't need 4 functions or 1604 replay tests to prove that.
Expected impact: with ~2 functions × ~8 candidates and far fewer replay
tests, the job should drop from ~75 min to ~10-15 min.
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
- Remove dead `import shutil` from test_comparator.py
- Rename `requires_java` → `requires_java_runtime` for consistency with test_run_and_parse.py
- Remove redundant `@requires_java_runtime` on test_behavior_return_value_correctness (class already has it)
Apply @requires_java_runtime to TestJavaRunAndParseBehavior and
TestJavaRunAndParsePerformance at the class level. The performance
test was failing on Windows with a flaky 10ms timing assertion
(10.515ms actual, 5% tolerance), a pre-existing issue masked by
continue-on-error.
Same fix as test_comparator.py — uses _find_comparator_jar() to skip
when the codeflash-runtime JAR isn't built. Fixes Windows unit-tests,
whose runners don't have Java pre-installed (unlike Linux runners).
Ubuntu runners have Java/Maven pre-installed, so checking for java/mvn
binaries doesn't skip the tests there. The actual dependency is the codeflash-runtime
JAR which must be built from codeflash-java-runtime/ via Maven.
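Skipping on the artifact rather than the toolchain can be sketched as below. The helper is hypothetical and stands in for _find_comparator_jar(); the target/ location and the codeflash-runtime*.jar name pattern are assumptions based on Maven defaults.

```python
from __future__ import annotations

from pathlib import Path


def find_runtime_jar(runtime_dir: Path) -> Path | None:
    """Hypothetical: return the newest codeflash-runtime JAR under target/, or None if not built."""
    # Path.glob on a missing directory yields nothing, so an unbuilt project returns None.
    jars = sorted((runtime_dir / "target").glob("codeflash-runtime*.jar"), key=lambda p: p.stat().st_mtime)
    return jars[-1] if jars else None


# A test module could then skip on the artifact, e.g.:
#   requires_java_runtime = pytest.mark.skipif(find_runtime_jar(RUNTIME_DIR) is None, reason="JAR not built")
```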
The requires_java marker only checked for java binary but the tests
also need mvn to build the codeflash-runtime JAR. These 13 tests
were silently failing in unit-tests (masked by continue-on-error).
Normalize paths to forward slashes in JS/TS code generation and coverage
parsing — backslashes are escape chars in JavaScript strings and cause
silent corruption on Windows. Also relax timing test thresholds for CI.
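A minimal sketch of the normalization, assuming paths are embedded into generated JS as JSON-encoded string literals (the helper name and the use of json.dumps are illustrative choices). A Windows path such as C:\tests\new contains \t and \n, which JavaScript would read as a tab and a newline.

```python
import json
from pathlib import PurePath, PureWindowsPath


def js_string_literal_for_path(path: PurePath) -> str:
    """Hypothetical helper: quote a filesystem path for embedding in generated JS/TS."""
    # as_posix() turns "\" into "/", so no sequence in the path can be read as a JS escape.
    return json.dumps(path.as_posix())


print(js_string_literal_for_path(PureWindowsPath(r"C:\tests\new\util.test.js")))
```

Using PureWindowsPath keeps the example deterministic on any OS.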
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The regex for extracting modules from settings.gradle only matched
single-line include statements. Multi-line includes like eureka's
(include 'a',\n 'b',\n 'c') only captured the first module, causing
test_module to be None and breaking multi-module path resolution
(e.g., classfiles lookup for JaCoCo coverage conversion).
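One way to capture every module of a multi-line include is to match the whole comma-separated list, letting `\s` absorb the newlines, and then pull out each quoted name. This is a sketch of the idea, not the project's actual regex; the helper name is hypothetical.

```python
import re

# `\binclude\b` skips identifiers such as includedProjects or includeBuild.
# `\s` also matches newlines, so list items may continue on following lines.
_INCLUDE_RE = re.compile(
    r"""
    \binclude\b \s* \(? \s*               # keyword, optional "(" (Kotlin DSL style)
    (                                     # group 1: the whole comma-separated list
      (?: ['"] [^'"]+ ['"] \s* , \s* )*   # zero or more quoted items followed by a comma
      ['"] [^'"]+ ['"]                    # final quoted item
    )
    """,
    re.VERBOSE,
)
_QUOTED_RE = re.compile(r"""['"]([^'"]+)['"]""")


def parse_gradle_includes(settings_text: str) -> list[str]:
    """Hypothetical parser: return module names from include statements in settings.gradle(.kts)."""
    modules: list[str] = []
    for match in _INCLUDE_RE.finditer(settings_text):
        for name in _QUOTED_RE.findall(match.group(1)):
            modules.append(name.lstrip(":"))  # ":core" -> "core"
    return modules
```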
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Generalize _find_top_level_dependencies_block() into _find_top_level_block(name)
so it can find any top-level block (dependencies, repositories, etc.)
- Rewrite _ensure_maven_central_repo() to use tree-sitter instead of regex,
preventing false matches inside buildscript/subprojects/allprojects blocks
- Add _update_existing_codeflash_dependency() to replace stale versions or
old files() format with the current Maven Central coordinate
- Wire version update into add_codeflash_dependency() and
add_codeflash_dependency_multimodule() so old entries get updated instead
of silently skipped
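The "top-level" notion is what separates a project's own repositories { } block from one nested in buildscript { }. The sketch below illustrates it with a plain brace-depth scan in place of tree-sitter. That is a deliberately swapped-in technique: it ignores braces inside strings and comments, which is exactly the kind of case a real parser handles. The function name echoes _find_top_level_block, but its signature here is hypothetical.

```python
from __future__ import annotations

import re


def find_top_level_block(source: str, name: str) -> tuple[int, int] | None:
    """Return (start, end) offsets of the first depth-0 `name { ... }` block, else None."""
    header = re.compile(rf"{re.escape(name)}\s*\{{")
    depth = 0
    for i, ch in enumerate(source):
        # Only consider a match at brace depth 0 and at the start of an identifier.
        at_word_start = i == 0 or not (source[i - 1].isalnum() or source[i - 1] == "_")
        if depth == 0 and at_word_start:
            m = header.match(source, i)
            if m:
                # Walk forward from the opening brace to its matching close brace.
                inner = 0
                for j in range(m.end() - 1, len(source)):
                    if source[j] == "{":
                        inner += 1
                    elif source[j] == "}":
                        inner -= 1
                        if inner == 0:
                            return m.start(), j + 1
                return None  # unbalanced braces
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
    return None
```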
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Parse listOf(...) patterns in settings.gradle.kts for projects that
build include lists dynamically (e.g. OpenRewrite)
- Use word boundary in include regex to avoid matching variable names
like 'includedProjects'
- Break module voting ties using codeflash.toml module-root config,
so the function's own module is preferred over cross-module tests
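The tie-break can be sketched as follows, assuming votes are counts of candidate test files per module and that the module-root value from codeflash.toml has already been mapped to a module name. Both the helper and that mapping are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def pick_test_module(votes: Counter[str], configured_module: str | None) -> str | None:
    """Hypothetical: return the most-voted module, preferring the configured module on a tie."""
    if not votes:
        return None
    top = max(votes.values())
    tied = sorted(module for module, count in votes.items() if count == top)
    if configured_module in tied:
        return configured_module
    return tied[0]  # deterministic fallback: alphabetical among tied modules
```

The configured module only breaks ties; a module with strictly more votes still wins.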
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>