Commit graph

7976 commits

Author SHA1 Message Date
Kevin Turcios
986654b7e6 fix: pin PYTHONHASHSEED=0 in test env and enhance diff diagnostics
Set PYTHONHASHSEED=0 in test subprocess environments so original and
candidate runs use identical hash behavior, eliminating a source of
non-deterministic return-value comparisons.

Also upgrade diff logging from debug to info level with actual types
and repr values for DID_PASS, RETURN_VALUE, and STDOUT diffs.
2026-04-10 06:38:08 -05:00
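
The hash-seed pinning described above can be sketched as follows. This is a minimal illustration, not the project's actual test runner: `run_with_fixed_hash_seed` is a hypothetical helper, and the only thing taken from the commit is the idea of setting `PYTHONHASHSEED=0` in the subprocess environment.

```python
import os
import subprocess
import sys


def run_with_fixed_hash_seed(code: str) -> str:
    """Run a Python snippet in a child process with hash randomization disabled."""
    # Copy the parent environment and pin the seed (hypothetical helper,
    # mirroring the env change the commit message describes).
    env = {**os.environ, "PYTHONHASHSEED": "0"}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two separate interpreter processes now agree on str hashes.
first = run_with_fixed_hash_seed("print(hash('codeflash'))")
second = run_with_fixed_hash_seed("print(hash('codeflash'))")
```

Without the pinned seed, each interpreter process would normally pick its own random seed, so `hash()` of a string (and the resulting set iteration order) could differ between the original and candidate runs.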
Kevin Turcios
e191f74aa6 chore: add diagnostic logging to compare_test_results
Temporary instrumentation to debug flaky futurehouse E2E test.
Logs matched/skipped/timed-out counts and did_all_timeout state.
2026-04-10 06:16:39 -05:00
Kevin Turcios
fefccd5935 fix: drop JFR inline event config that breaks JDK 11
The jdk.ExecutionSample#period=1ms syntax in -XX:StartFlightRecording
is only supported on JDK 13+. On JDK 11 (CI), it causes
"Failure when starting JFR on_create_vm_2" and no JFR file is created.
The settings=profile preset still provides 10ms CPU sampling.
2026-04-10 05:28:34 -05:00
Kevin Turcios
bfe6f3a828 Remove debug timing instrumentation from tracer
Strip AtomicLong accumulators, System.nanoTime() timing, and
getTimingSummary() that were added for profiling. No functional change.
2026-04-10 05:16:49 -05:00
Kevin Turcios
01e22152c7 flexing 2026-04-10 05:07:53 -05:00
Kevin Turcios
e81f25f825 fix: remove stale repeatString assertions from integration tests
repeatString was removed from Workload.java in the E2E reduction.
2026-04-10 05:05:17 -05:00
Kevin Turcios
0772398c59 perf: optimize Java tracing agent serialization and writes
- Reuse ThreadLocal Kryo Output buffers (eliminates #1 allocation hotspot)
- Fast-path inline serialization for safe arg types (bypasses executor)
- Skip verification roundtrip for known-safe containers (ArrayList, HashMap, etc.)
- Batch SQLite inserts (256/txn) with permanent autocommit-off
- Switch to ArrayBlockingQueue (no per-element Node allocation)
- Add opt-in in-memory SQLite mode (VACUUM INTO at shutdown), enabled in CI
- Add timing instrumentation (onEntry, serialization, writes, dump)
- Add ProfilingWorkload fixture for benchmarking

Benchmark (50k captures): onEntry 5200ms→1200ms (4.3x), avg/capture
0.43ms→0.02ms (21x), writes 3200ms→900ms (3.5x) with in-memory mode.
2026-04-10 04:55:36 -05:00
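
The batched-insert idea from the list above can be sketched in Python with the standard `sqlite3` module. The tracing agent itself is Java, so this only illustrates the technique; the `captures` table, `insert_batched`, and the row shape are assumptions, and only the batch size of 256 comes from the commit message.

```python
import sqlite3

BATCH_SIZE = 256  # rows per transaction, as stated in the commit message


def insert_batched(conn: sqlite3.Connection, rows: list) -> int:
    """Insert rows using one transaction per batch; return the transaction count."""
    transactions = 0
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        # `with conn` commits once when the block exits, so each batch
        # costs a single commit instead of one commit per row.
        with conn:
            conn.executemany("INSERT INTO captures (payload) VALUES (?)", batch)
        transactions += 1
    return transactions


conn = sqlite3.connect(":memory:")  # hypothetical in-memory database
conn.execute("CREATE TABLE captures (payload BLOB)")
transactions = insert_batched(conn, [(b"x",)] * 1000)
row_count = conn.execute("SELECT COUNT(*) FROM captures").fetchone()[0]
```

Here 1000 rows are written in 4 transactions rather than 1000, which is the kind of saving the batching bullet is after.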
Kevin Turcios
08aa94c54a perf: reduce java-tracer E2E to single function for ~11 min target
Drop repeatString from the Workload fixture (2→1 function).
computeSum alone exercises the full tracer→optimizer pipeline
(trace → replay tests → optimize → evaluate → rank → explain → review).
The second function added no additional pipeline coverage.
2026-04-10 03:44:54 -05:00
Kevin Turcios
46957e190f fix: update java tracer unit tests for reduced Workload fixture
Remove assertions for filterEvens and instanceMethod which were removed
from the Workload fixture. Adjust expected invocation counts accordingly.
2026-04-10 03:17:46 -05:00
Kevin Turcios
21f61ec93d ci: add java_tracer_e2e fixture path to e2e_java change detection
The fixture directory wasn't in the path filter, so changes to
Workload.java didn't trigger the java E2E tests.
2026-04-10 03:08:03 -05:00
Kevin Turcios
2b0f633c0f perf: reduce java-tracer E2E from ~75 min to ~15 min
Remove filterEvens and instanceMethod from the Workload fixture (4→2
functions) and reduce main() loop from 1000→100 rounds. The E2E test
only needs to verify the tracer→optimizer pipeline works end-to-end;
it doesn't need 4 functions or 1604 replay tests to prove that.

Expected impact: ~2 functions × ~8 candidates × fewer replay tests
should bring the job from ~75 min down to ~10-15 min.
2026-04-10 03:04:29 -05:00
Kevin Turcios
5ee642e35e
Merge pull request #2057 from codeflash-ai/fix/api-read-timeout
fix: increase API read timeout to prevent flaky E2E failures
2026-04-10 02:45:31 -05:00
Kevin Turcios
4ac573f10f fix: increase API read timeout from 90s to 300s to prevent flaky E2E failures
The flat 90s timeout was too aggressive for LLM-powered endpoints
(/testgen, /optimize, /refinement) under load, causing ReadTimeoutError
and failing the async-optimization E2E test. Split into (10s connect,
300s read) tuple so connections fail fast but LLM inference gets adequate time.
2026-04-10 02:33:16 -05:00
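
A minimal sketch of the split timeout, assuming the client uses the `requests` library (the commit message does not name it). `post_json` and the constant names are hypothetical; the `(connect, read)` tuple form of `timeout` is standard `requests` behavior.

```python
CONNECT_TIMEOUT_S = 10   # fail fast when the server is unreachable
READ_TIMEOUT_S = 300     # leave room for slow LLM-backed responses
API_TIMEOUT = (CONNECT_TIMEOUT_S, READ_TIMEOUT_S)


def post_json(url: str, payload: dict):
    """POST a JSON payload with separate connect and read timeouts."""
    # Imported here so that defining this sketch does not require the
    # third-party package to be installed.
    import requests

    return requests.post(url, json=payload, timeout=API_TIMEOUT)
```

A single number such as `timeout=90` applies the same limit to both phases, which is why a flat value forces a trade-off between fast connection failures and long inference times.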
Kevin Turcios
72a41a5665
Merge pull request #2055 from codeflash-ai/perf/defer-cli-imports
perf: defer cli.py imports for 7.7x faster --help
2026-04-10 01:59:57 -05:00
Kevin Turcios
93810f8be6
Merge pull request #2056 from codeflash-ai/chore/delete-disabled-workflows
chore: delete disabled codeflash.yaml workflow
2026-04-10 01:52:47 -05:00
Kevin Turcios
79d47e0fae chore: delete disabled codeflash.yaml workflow
JS ESM integration test — disabled and superseded by ci.yaml's e2e-js matrix.
2026-04-10 01:51:52 -05:00
Kevin Turcios
381d1319ea fix: specify utf-8 encoding in benchmark read_text for Windows CI
Windows defaults to cp1252 which can't decode some source file bytes.
2026-04-10 01:48:31 -05:00
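
The encoding fix can be illustrated with a short sketch. `read_source` and the sample file are hypothetical; the point is only that `Path.read_text` uses the platform default encoding unless one is passed explicitly.

```python
import tempfile
from pathlib import Path


def read_source(path: Path) -> str:
    """Read a source file as UTF-8 regardless of the platform default encoding."""
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.py"
    # "я" encodes to the bytes D1 8F in UTF-8; byte 0x8F has no mapping in cp1252.
    sample.write_bytes('greeting = "я"\n'.encode("utf-8"))

    text = read_source(sample)

    # Simulate what a cp1252 default would do with the same bytes.
    try:
        sample.read_bytes().decode("cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
```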
Kevin Turcios
fe39d40e1b perf: add type identity fast-paths for str/list/tuple/dict in comparator
Move the 4 most common return-value types (str, list, tuple, dict) to
`orig_type is T` identity checks at the top of the dispatch chain,
before the frozenset lookup.  A single pointer comparison is cheaper
than a frozenset hash, and these types need special handling anyway
(temp-path normalization, recursive comparison, superset support).

Before: dict traversed ~8 isinstance checks before being handled.
After:  dict is handled at check #3 via `orig_type is dict`.

The isinstance fallbacks remain as slow-paths for subclasses (deque,
ChainMap, defaultdict, scipy dok_matrix, etc.).

Backported from codeflash-python dispatch ordering.
2026-04-10 01:25:05 -05:00
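
The dispatch ordering described above might look roughly like the sketch below. It is a simplified stand-in, not the project's comparator: `compare` is a hypothetical function, and the temp-path normalization and superset support mentioned in the message are left out.

```python
from collections import defaultdict


def compare(orig, new) -> bool:
    """Compare two return values, checking the most common exact types first."""
    orig_type = type(orig)
    if orig_type is not type(new):
        return False

    # Fast paths: a pointer comparison on the exact type.
    if orig_type is str:
        return orig == new
    if orig_type is list or orig_type is tuple:
        return len(orig) == len(new) and all(compare(a, b) for a, b in zip(orig, new))
    if orig_type is dict:
        return orig.keys() == new.keys() and all(compare(orig[k], new[k]) for k in orig)

    # Slow paths: isinstance fallbacks for subclasses such as defaultdict.
    if isinstance(orig, dict):
        return compare(dict(orig), dict(new))
    if isinstance(orig, (list, tuple)):
        return compare(list(orig), list(new))
    return orig == new
```

For example, `compare({"a": [1, (2, "x")]}, {"a": [1, (2, "x")]})` is resolved entirely by the identity checks, while two `defaultdict` values fall through to the `isinstance` branch.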
Kevin Turcios
5a5b6e46ac bench: add dedicated comparator microbenchmark for frozenset fast-path
5 scenarios: primitives, nested dicts, DB rows, deep nesting,
and identity types (frozenset/range/complex/Decimal/OrderedDict).
2026-04-10 01:05:02 -05:00
Kevin Turcios
4c3c6ea167 perf: add frozenset fast-path for comparator type dispatch
Use O(1) frozenset membership test with type identity before falling
through to isinstance MRO traversal. Backported from codeflash-python.
2026-04-10 00:53:55 -05:00
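
A rough sketch of the frozenset fast-path, under the assumption that the set holds types that can be compared with plain `==`. The set contents, `_IDENTITY_TYPES`, and `compare_leaf` are illustrative names, not the project's actual code.

```python
from decimal import Decimal
from enum import IntEnum

# Exact types that can be compared with == once the types match (assumed list).
_IDENTITY_TYPES = frozenset(
    {int, float, bool, complex, bytes, type(None), range, Decimal, frozenset}
)


def compare_leaf(orig, new) -> bool:
    """Compare two leaf values, trying an O(1) exact-type lookup first."""
    orig_type = type(orig)
    if orig_type in _IDENTITY_TYPES:
        # Fast path: one hash lookup, then an exact type match.
        return type(new) is orig_type and orig == new
    # Slow path: isinstance walks the MRO, which handles subclasses.
    for base in (int, float, str):
        if isinstance(orig, base):
            return isinstance(new, base) and orig == new
    return orig == new


class Color(IntEnum):  # hypothetical int subclass that misses the fast path
    RED = 1
```

Note that the exact-type check makes `compare_leaf(1, True)` return `False`, even though `1 == True` in Python, because `bool` and `int` are distinct entries in the set.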
Kevin Turcios
accbab4a16 fix: update test_cmd_auth patches for deferred imports
Imports in cmd_auth.py were moved into function bodies, so mock
patches must target the source modules instead of cmd_auth's namespace.
2026-04-10 00:36:02 -05:00
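
The patching rule in this message can be shown with a small self-contained sketch. `fake_env_utils`, `get_api_key`, and `auth_status` are hypothetical stand-ins for the real modules; the fake module is registered in `sys.modules` only so the example runs without extra files.

```python
import sys
import types
from unittest import mock

# Hypothetical "source" module that owns the function being mocked.
source = types.ModuleType("fake_env_utils")
source.get_api_key = lambda: "real-key"
sys.modules["fake_env_utils"] = source


def auth_status() -> str:
    """Consumer whose import lives in the function body (a deferred import)."""
    from fake_env_utils import get_api_key  # resolved on every call

    return get_api_key()


# The name is looked up on the source module at call time, so that is the
# patch target; the consumer's module namespace never holds the name.
with mock.patch("fake_env_utils.get_api_key", return_value="test-key"):
    patched = auth_status()

unpatched = auth_status()
```

With a module-level `from ... import`, the consumer's namespace holds its own reference and is the right place to patch; once the import moves into the function, that reference no longer exists.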
Kevin Turcios
2e2e19f7ae bench: add libcst visitor benchmarks for multi-file and full pipeline
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
2026-04-10 00:21:45 -05:00
Kevin Turcios
1a25f05e14 fix: remove unnecessary Optimizer from benchmark test
The test only needs project_root, not a full Optimizer (which requires
an API key). Also adds missing __init__.py to tests/benchmarks/.
2026-04-10 00:10:36 -05:00
Kevin Turcios
2208e8ca77 bench: add CLI startup benchmark for codeflash compare --script
Measures median wall-clock time for --version, --help, auth status,
and compare --help across 30 runs with 3 warmups.

Usage:
  codeflash compare main codeflash/optimize \
    --script "python benchmarks/bench_cli_startup.py" \
    --script-output benchmarks/results.json
2026-04-09 23:59:26 -05:00
Kevin Turcios
b533f50bdc perf: backport libcst visitor dispatch cache from codeflash-python
Cache the visitor dispatch tables that libcst rebuilds on every
MatcherDecoratableTransformer/Visitor instantiation. The tables
depend only on the class, not the instance, so caching by type is
safe. Saves ~27ms per visitor instantiation (24x faster).

Also fix pre-existing ruff F821 in cli.py (missing exit_with_message
import in process_pyproject_config).
2026-04-09 23:46:45 -05:00
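
The caching idea can be sketched without libcst. The code below is not libcst's internal structure: `build_dispatch_table`, `get_dispatch_table`, and `NameCollector` are hypothetical, and the sketch only shows why a table that depends on the class alone can be cached by type.

```python
_DISPATCH_CACHE: dict = {}  # maps a visitor class to its dispatch table
build_count = 0             # counts how often a table is actually rebuilt


def build_dispatch_table(cls: type) -> dict:
    """Scan the class for visit_* methods (stand-in for the expensive step)."""
    global build_count
    build_count += 1
    return {
        name[len("visit_"):]: name
        for name in dir(cls)
        if name.startswith("visit_")
    }


def get_dispatch_table(cls: type) -> dict:
    """Return the cached table for cls, building it only on first use."""
    table = _DISPATCH_CACHE.get(cls)
    if table is None:
        table = _DISPATCH_CACHE[cls] = build_dispatch_table(cls)
    return table


class NameCollector:
    def __init__(self) -> None:
        self.dispatch = get_dispatch_table(type(self))
        self.names = []

    def visit_Name(self, value: str) -> None:
        self.names.append(value)

    def visit(self, node_type: str, value: str) -> None:
        method_name = self.dispatch.get(node_type)
        if method_name is not None:
            getattr(self, method_name)(value)


first_visitor = NameCollector()
second_visitor = NameCollector()  # reuses the table built for the first one
second_visitor.visit("Name", "x")
```

The table stores method names rather than bound methods, so it holds no per-instance state and is safe to share between instances of the same class.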
github-actions[bot]
61053be9ce style: auto-format with ruff 2026-04-10 04:39:45 +00:00
Kevin Turcios
436d642847 perf: defer libcst, Rich, comparator imports in models.py
Move libcst, rich.tree.Tree, console, comparator, code_utils, registry,
lsp.helpers, and LspMarkdownMessage from module-level to the methods that
use them. Only pydantic and TestType remain at module level (needed for
class definitions).

models.py import: 633ms → 125ms on Azure Standard_D4s_v5.
2026-04-09 23:38:40 -05:00
github-actions[bot]
88babfef25 style: auto-format with ruff 2026-04-10 04:30:36 +00:00
Kevin Turcios
2fc528ebda perf: defer heavy imports in env_utils and shell_utils
Defer console, formatter, code_utils, registry, and lsp.helpers imports
from module level into the functions that use them. Inline is_LSP_enabled
(a one-liner env var check) to avoid importing lsp.helpers on the happy
path of get_codeflash_api_key.

auth status: 237ms → 160ms on Azure Standard_D4s_v5.
2026-04-09 23:29:31 -05:00
Kevin Turcios
992e91abc7 fix: prevent ruff auto-format from rewriting version.py placeholders
uv-dynamic-versioning rewrites version.py on every `uv run`, so the
ruff auto-format job was inadvertently committing dev version strings.
Restore version.py files after formatting and revert the ones already
changed on this branch.
2026-04-09 23:21:25 -05:00
github-actions[bot]
1e8e5d2cc2 style: auto-format with ruff 2026-04-10 04:14:58 +00:00
Kevin Turcios
a8c004164e perf: skip telemetry/banner for auth and compare commands
Restructure main() command dispatch so auth and compare exit early
without loading telemetry (sentry, posthog), version_check, or the
banner. Defer cmd_auth.py imports into functions.

auth status: ~1000ms → 237ms (4.2x)
compare --help: ~297ms → 38ms (7.9x)
2026-04-09 23:14:03 -05:00
github-actions[bot]
05a7641405 style: auto-format with ruff 2026-04-10 04:09:00 +00:00
Kevin Turcios
70e3ce1a67 perf: defer cli.py imports for 7.7x faster --help
Move heavy module-level imports in cli.py (console, env_utils,
code_utils, config_parser, lsp.helpers, version) into the functions
that actually use them. Split main.py imports so parse_args() is
called before loading the full stack — --help exits via argparse
before any heavy modules load.

Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
  --help: 297ms → 39ms (7.7x faster)
  --version: 17ms (unchanged)
2026-04-09 23:08:22 -05:00
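
A minimal sketch of the parse-before-import ordering, using only the standard library. `load_full_stack` stands in for the heavy imports and `loaded` exists purely to make the ordering observable; neither is taken from the actual `main.py`.

```python
import argparse

loaded = []  # records when the "heavy" stack gets loaded


def load_full_stack() -> None:
    """Stand-in for importing the heavy modules."""
    import decimal  # noqa: F401  (placeholder for an expensive import)
    import json  # noqa: F401

    loaded.append("stack")


def main(argv: list) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--file")
    # --help makes argparse print usage and raise SystemExit right here,
    # before anything below this line runs.
    args = parser.parse_args(argv)
    load_full_stack()
    return args
```

Calling `main(["--help"])` exits inside `parse_args` and leaves `loaded` empty, while `main(["--file", "x"])` goes on to load the full stack.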
Kevin Turcios
7351d0f0ba
Merge pull request #2051 from codeflash-ai/fix/ts-e2e-test-data-size
Increase TS E2E test data size to fix flaky js-ts-class
2026-04-09 22:26:38 -05:00
Kevin Turcios
8ca0f8d2cc Fix JS line profiler empty output file causing JSONDecodeError
The profiler's save() was called every 100 hit() calls. With O(n²)
algorithms this produced hundreds of thousands of writeFileSync calls,
each truncating the file to 0 bytes before writing. If the subprocess
timed out (SIGKILL), the file was left at 0 bytes → JSONDecodeError.

Fixes:
- Move require('fs')/require('path') to module scope (not inside save())
- Reduce save-every-N from 100 → 10,000 hits (100x fewer syscalls)
- Pre-create output file with {} before running Jest (safety net)
- Handle empty files gracefully in parse_results
- Fix misleading "file not found" warning → "file empty or no timing data"
2026-04-09 22:26:23 -05:00
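
The "handle empty files gracefully" fix from the list above could be sketched as follows. `parse_profile` is a hypothetical stand-in for the real `parse_results`, and returning an empty dict is an assumed fallback, not confirmed behavior.

```python
import json
from pathlib import Path


def parse_profile(path: Path) -> dict:
    """Load profiler output, treating a missing or empty file as 'no data'."""
    if not path.exists():
        return {}
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        # A killed subprocess can leave a truncated, 0-byte file behind;
        # json.loads("") would raise JSONDecodeError here.
        return {}
    return json.loads(text)
```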
github-actions[bot]
23d9e73bfa style: auto-format with ruff 2026-04-09 22:26:23 -05:00
Kevin Turcios
b7bcd0fe2e ci: add code_to_optimize/js/ to e2e_js path filter
The change detection for JS E2E tests was missing the test fixture
directory, so PRs that only modify JS test data (like this one) were
skipped. Java already had its equivalent path included.
2026-04-09 22:26:19 -05:00
Kevin Turcios
a73ccca426 Increase test data size for TS findDuplicates benchmark
The js-ts-class E2E test was flaky because n=100 is too small for
the O(n²)→O(n) optimization to overcome Map/Set per-operation overhead.
At n=100, the LLM correctly generates a Map-based O(n) solution but it
benchmarks as slower (-10.6%) due to constant factor dominance.

Bump to n=10,000 so the algorithmic improvement produces measurable
speedup, making the 30% E2E threshold reliably achievable.
2026-04-09 22:26:19 -05:00
Kevin Turcios
477dfa246e
Merge pull request #2049 from codeflash-ai/ci/cleanup-test-markers
Clean up Java test skip markers
2026-04-09 22:26:10 -05:00
github-actions[bot]
41841325e2 style: auto-format with ruff 2026-04-10 03:23:38 +00:00
Kevin Turcios
da536db8a2 Clean up Java test skip markers
- Remove dead `import shutil` from test_comparator.py
- Rename `requires_java` → `requires_java_runtime` for consistency with test_run_and_parse.py
- Remove redundant `@requires_java_runtime` on test_behavior_return_value_correctness (class already has it)
2026-04-09 22:22:39 -05:00
Kevin Turcios
e73492f414
Merge pull request #2053 from codeflash-ai/fix/ci-windows-shell
fix(ci): add shell: bash to conditional install step for Windows
2026-04-09 22:22:29 -05:00
Kevin Turcios
5b6318fcbb fix(ci): add shell: bash to conditional install step for Windows
The bash [[ ]] syntax fails on Windows runners which default to
PowerShell. Explicitly setting shell: bash fixes the ParserError.
2026-04-09 22:22:11 -05:00
Kevin Turcios
145043fdb3
Merge pull request #2052 from codeflash-ai/ci/workflow-upgrades-and-fixes
ci: upgrade action versions, add uv cache, fix broken paths, DRY publish
2026-04-09 22:10:58 -05:00
Kevin Turcios
7c4d98c6e7 ci: restore uv venv --seed in claude.yml
uv venv --seed makes pip available in the venv, which the
Claude Code action may need.
2026-04-09 22:08:59 -05:00
Kevin Turcios
be4c459d01 ci: upgrade action versions, add uv cache, fix broken paths, DRY publish
- Bump actions/checkout v4/v5 → v6, setup-node v4 → v6, setup-java v4 → v5,
  prek-action v1 → v2, github-script v6 → v7, aws-credentials v4 → v6,
  claude-code-action v1.0.89 → v1
- Add enable-cache: true to all astral-sh/setup-uv steps
- Remove redundant uv venv --seed (uv sync creates venvs automatically)
- Merge double uv sync steps in unit-tests into single conditional
- Fix codeflash.yaml: broken path filter and working-directory
- Consolidate duplicate publish jobs into a single matrix job
- Remove generate_release_notes, which was overridden by the manual body
2026-04-09 22:06:41 -05:00
Kevin Turcios
153097b9a3
Merge pull request #2015 from codeflash-ai/fix/gradle-maven-central-dependency
fix: improve multi-module Gradle detection for dynamic settings.gradle.kts
2026-04-09 19:17:01 -05:00
github-actions[bot]
a6ea56bf50 style: auto-format with ruff 2026-04-09 23:44:22 +00:00
Kevin Turcios
2dba3e3849
Merge branch 'main' into fix/gradle-maven-central-dependency 2026-04-09 18:43:25 -05:00