Commit graph

7955 commits

Author SHA1 Message Date
Kevin Turcios
2e2e19f7ae bench: add libcst visitor benchmarks for multi-file and full pipeline
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
2026-04-10 00:21:45 -05:00
Kevin Turcios
1a25f05e14 fix: remove unnecessary Optimizer from benchmark test
The test only needs project_root, not a full Optimizer (which requires
an API key). Also adds missing __init__.py to tests/benchmarks/.
2026-04-10 00:10:36 -05:00
Kevin Turcios
2208e8ca77 bench: add CLI startup benchmark for codeflash compare --script
Measures median wall-clock time for --version, --help, auth status,
and compare --help across 30 runs with 3 warmups.

Usage:
  codeflash compare main codeflash/optimize \
    --script "python benchmarks/bench_cli_startup.py" \
    --script-output benchmarks/results.json
2026-04-09 23:59:26 -05:00
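The measurement described in this commit (median wall-clock time over 30 runs after 3 warmups) could be sketched roughly as follows. The helper name `median_wall_clock` and the stand-in command are assumptions for illustration; the contents of benchmarks/bench_cli_startup.py are not shown in this log.

```python
import statistics
import subprocess
import sys
import time


def median_wall_clock(cmd, runs=30, warmups=3):
    """Return the median wall-clock time in seconds for running cmd."""
    # Warmup runs are executed but not timed.
    for _ in range(warmups):
        subprocess.run(cmd, check=True, capture_output=True)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Hypothetical stand-in command; a real benchmark would invoke the CLI under test.
    seconds = median_wall_clock([sys.executable, "-c", "pass"], runs=5, warmups=1)
    print(f"median: {seconds * 1000:.1f} ms")
```

The median is less sensitive than the mean to a single slow run, which is presumably why the commit reports it.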
Kevin Turcios
b533f50bdc perf: backport libcst visitor dispatch cache from codeflash-python
Cache the visitor dispatch tables that libcst rebuilds on every
MatcherDecoratableTransformer/Visitor instantiation. The tables
depend only on the class, not the instance, so caching by type is
safe. Saves ~27ms per visitor instantiation (24x faster).

Also fix pre-existing ruff F821 in cli.py (missing exit_with_message
import in process_pyproject_config).
2026-04-09 23:46:45 -05:00
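The idea of caching a table that depends only on the class, not the instance, can be illustrated with a minimal sketch. This is not libcst's internal code: `DispatchingVisitor`, `NameCollector`, and the `visit_*` naming scan are hypothetical stand-ins for whatever tables libcst actually builds.

```python
class DispatchingVisitor:
    """Hypothetical visitor that maps node-type names to visit_* methods."""

    # Shared across instances: one table of method names per concrete class.
    _dispatch_cache = {}

    def __init__(self):
        cls = type(self)
        table = DispatchingVisitor._dispatch_cache.get(cls)
        if table is None:
            # The scan depends only on the class, so run it once per type.
            table = {
                name[len("visit_"):]: name
                for name in dir(cls)
                if name.startswith("visit_")
            }
            DispatchingVisitor._dispatch_cache[cls] = table
        self._table = table

    def visit(self, node_type, node):
        method_name = self._table.get(node_type)
        if method_name is None:
            return None
        return getattr(self, method_name)(node)


class NameCollector(DispatchingVisitor):
    def __init__(self):
        super().__init__()
        self.names = []

    def visit_Name(self, node):
        self.names.append(node)
        return node


first = NameCollector()
second = NameCollector()
first.visit("Name", "x")
```

The cache stores method names rather than bound methods, so the shared table holds no per-instance state and the second instantiation reuses the first one's table.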
github-actions[bot]
61053be9ce style: auto-format with ruff 2026-04-10 04:39:45 +00:00
Kevin Turcios
436d642847 perf: defer libcst, Rich, comparator imports in models.py
Move libcst, rich.tree.Tree, console, comparator, code_utils, registry,
lsp.helpers, and LspMarkdownMessage from module-level to the methods that
use them. Only pydantic and TestType remain at module level (needed for
class definitions).

models.py import: 633ms → 125ms on Azure Standard_D4s_v5.
2026-04-09 23:38:40 -05:00
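Moving an import from module level into the method that uses it might look like the sketch below. The `Report` class is hypothetical, and `json` merely stands in for a heavy dependency; the actual models.py changes are not shown in this log.

```python
class Report:
    """Hypothetical model whose rendering needs a dependency its definition does not."""

    def __init__(self, payload):
        self.payload = payload

    def to_json(self):
        # Deferred import: paid on first call, not when this module is imported.
        # json stands in for a heavy dependency such as a rendering library.
        import json

        return json.dumps(self.payload, sort_keys=True)
```

After the first call the module sits in `sys.modules`, so repeated calls only pay for a dictionary lookup.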
github-actions[bot]
88babfef25 style: auto-format with ruff 2026-04-10 04:30:36 +00:00
Kevin Turcios
2fc528ebda perf: defer heavy imports in env_utils and shell_utils
Defer console, formatter, code_utils, registry, and lsp.helpers imports
from module level into the functions that use them. Inline is_LSP_enabled
(a one-liner env var check) to avoid importing lsp.helpers on the happy
path of get_codeflash_api_key.

auth status: 237ms → 160ms on Azure Standard_D4s_v5.
2026-04-09 23:29:31 -05:00
Kevin Turcios
992e91abc7 fix: prevent ruff auto-format from rewriting version.py placeholders
uv-dynamic-versioning rewrites version.py on every `uv run`, so the
ruff auto-format job was inadvertently committing dev version strings.
Restore version.py files after formatting and revert the ones already
changed on this branch.
2026-04-09 23:21:25 -05:00
github-actions[bot]
1e8e5d2cc2 style: auto-format with ruff 2026-04-10 04:14:58 +00:00
Kevin Turcios
a8c004164e perf: skip telemetry/banner for auth and compare commands
Restructure main() command dispatch so auth and compare exit early
without loading telemetry (sentry, posthog), version_check, or the
banner. Defer cmd_auth.py imports into functions.

auth status: ~1000ms → 237ms (4.2x)
compare --help: ~297ms → 38ms (7.9x)
2026-04-09 23:14:03 -05:00
github-actions[bot]
05a7641405 style: auto-format with ruff 2026-04-10 04:09:00 +00:00
Kevin Turcios
70e3ce1a67 perf: defer cli.py imports for 7.7x faster --help
Move heavy module-level imports in cli.py (console, env_utils,
code_utils, config_parser, lsp.helpers, version) into the functions
that actually use them. Split main.py imports so parse_args() is
called before loading the full stack — --help exits via argparse
before any heavy modules load.

Benchmark (Azure Standard_D4s_v5, Python 3.13, hyperfine --min-runs 30):
  --help: 297ms → 39ms (7.7x faster)
  --version: 17ms (unchanged)
2026-04-09 23:08:22 -05:00
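The ordering described here, parsing arguments before loading the full stack, can be sketched as follows. The `tool` program name, the `--amount` flag, and the use of `decimal` as the "heavy" import are all assumptions for illustration, not the actual codeflash CLI.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="tool", description="Hypothetical CLI.")
    parser.add_argument("--amount", default="0")
    return parser.parse_args(argv)


def main(argv=None):
    # Parse first: argparse handles --help by printing usage and raising
    # SystemExit, so nothing below this line runs for that flag.
    args = parse_args(argv)

    # Stand-in for the heavy part of the stack, imported only when needed.
    from decimal import Decimal

    return Decimal(args.amount) * 2
```

With this layout, `main(["--help"])` exits inside `parse_args` and never reaches the deferred import.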
Kevin Turcios
7351d0f0ba
Merge pull request #2051 from codeflash-ai/fix/ts-e2e-test-data-size
Increase TS E2E test data size to fix flaky js-ts-class
2026-04-09 22:26:38 -05:00
Kevin Turcios
8ca0f8d2cc Fix JS line profiler empty output file causing JSONDecodeError
The profiler's save() was called every 100 hit() calls. With O(n²)
algorithms this produced hundreds of thousands of writeFileSync calls,
each truncating the file to 0 bytes before writing. If the subprocess
timed out (SIGKILL), the file was left at 0 bytes → JSONDecodeError.

Fixes:
- Move require('fs')/require('path') to module scope (not inside save())
- Reduce save-every-N from 100 → 10,000 hits (100x fewer syscalls)
- Pre-create output file with {} before running Jest (safety net)
- Handle empty files gracefully in parse_results
- Fix misleading "file not found" warning → "file empty or no timing data"
2026-04-09 22:26:23 -05:00
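The "handle empty files gracefully" fix on the parsing side could look something like this sketch. `load_profile` is a hypothetical helper, not the project's actual parse_results, and returning an empty dict for missing or empty files is an assumed policy.

```python
import json
from pathlib import Path


def load_profile(path):
    """Return parsed profiler output, or {} if the file is missing or empty."""
    path = Path(path)
    if not path.exists():
        return {}
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        # A zero-byte file would make json.loads raise JSONDecodeError.
        return {}
    return json.loads(text)
```

Pre-creating the output file with `{}` and tolerating an empty file are complementary: the first covers a subprocess that never writes, the second covers one killed mid-write.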
github-actions[bot]
23d9e73bfa style: auto-format with ruff 2026-04-09 22:26:23 -05:00
Kevin Turcios
b7bcd0fe2e ci: add code_to_optimize/js/ to e2e_js path filter
The change detection for JS E2E tests was missing the test fixture
directory, so PRs that only modify JS test data (like this one) were
skipped. Java already had its equivalent path included.
2026-04-09 22:26:19 -05:00
Kevin Turcios
a73ccca426 Increase test data size for TS findDuplicates benchmark
The js-ts-class E2E test was flaky because n=100 is too small for
the O(n²)→O(n) optimization to overcome Map/Set per-operation overhead.
At n=100, the LLM correctly generates a Map-based O(n) solution but it
benchmarks as slower (-10.6%) due to constant factor dominance.

Bump to n=10,000 so the algorithmic improvement produces measurable
speedup, making the 30% E2E threshold reliably achievable.
2026-04-09 22:26:19 -05:00
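For reference, the two algorithmic shapes being compared can be sketched in Python. The real benchmark is a TypeScript findDuplicates; these function names and set-returning signatures are assumptions for illustration.

```python
def find_duplicates_quadratic(items):
    """O(n^2): compare every pair of positions."""
    duplicates = set()
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                duplicates.add(items[i])
    return duplicates


def find_duplicates_linear(items):
    """O(n): one pass, tracking values already seen in a hash set."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates
```

The linear version does more work per element (hashing and set insertion), which is the constant-factor overhead the commit message says dominates at small n.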
Kevin Turcios
477dfa246e
Merge pull request #2049 from codeflash-ai/ci/cleanup-test-markers
Clean up Java test skip markers
2026-04-09 22:26:10 -05:00
github-actions[bot]
41841325e2 style: auto-format with ruff 2026-04-10 03:23:38 +00:00
Kevin Turcios
da536db8a2 Clean up Java test skip markers
- Remove dead `import shutil` from test_comparator.py
- Rename `requires_java` → `requires_java_runtime` for consistency with test_run_and_parse.py
- Remove redundant `@requires_java_runtime` on test_behavior_return_value_correctness (class already has it)
2026-04-09 22:22:39 -05:00
Kevin Turcios
e73492f414
Merge pull request #2053 from codeflash-ai/fix/ci-windows-shell
fix(ci): add shell: bash to conditional install step for Windows
2026-04-09 22:22:29 -05:00
Kevin Turcios
5b6318fcbb fix(ci): add shell: bash to conditional install step for Windows
The bash [[ ]] syntax fails on Windows runners which default to
PowerShell. Explicitly setting shell: bash fixes the ParserError.
2026-04-09 22:22:11 -05:00
Kevin Turcios
145043fdb3
Merge pull request #2052 from codeflash-ai/ci/workflow-upgrades-and-fixes
ci: upgrade action versions, add uv cache, fix broken paths, DRY publish
2026-04-09 22:10:58 -05:00
Kevin Turcios
7c4d98c6e7 ci: restore uv venv --seed in claude.yml
uv venv --seed makes pip available in the venv, which the
Claude Code action may need.
2026-04-09 22:08:59 -05:00
Kevin Turcios
be4c459d01 ci: upgrade action versions, add uv cache, fix broken paths, DRY publish
- Bump actions/checkout v4/v5 → v6, setup-node v4 → v6, setup-java v4 → v5,
  prek-action v1 → v2, github-script v6 → v7, aws-credentials v4 → v6,
  claude-code-action v1.0.89 → v1
- Add enable-cache: true to all astral-sh/setup-uv steps
- Remove redundant uv venv --seed (uv sync creates venvs automatically)
- Merge double uv sync steps in unit-tests into single conditional
- Fix codeflash.yaml: broken path filter and working-directory
- Consolidate duplicate publish jobs into a single matrix job
- Remove generate_release_notes, which was overridden by the manual body
2026-04-09 22:06:41 -05:00
Kevin Turcios
153097b9a3
Merge pull request #2015 from codeflash-ai/fix/gradle-maven-central-dependency
fix: improve multi-module Gradle detection for dynamic settings.gradle.kts
2026-04-09 19:17:01 -05:00
github-actions[bot]
a6ea56bf50 style: auto-format with ruff 2026-04-09 23:44:22 +00:00
Kevin Turcios
2dba3e3849
Merge branch 'main' into fix/gradle-maven-central-dependency 2026-04-09 18:43:25 -05:00
github-actions[bot]
11201fe7c6 style: auto-format with ruff 2026-04-09 23:43:15 +00:00
Kevin Turcios
8e60f2526f
Merge pull request #2050 from codeflash-ai/ci/optimize-caching-and-cleanup
Optimize CI: npm cache, Maven consolidation, remove duplicate workflow
2026-04-09 18:43:14 -05:00
Kevin Turcios
44c1bcf458 ci: retrigger CI 2026-04-09 18:42:16 -05:00
Kevin Turcios
e2eb677d18 Optimize CI caching and remove duplicate Java workflow
- Delete standalone java-e2e-tests.yml (duplicate of ci.yaml e2e-java)
- Add npm cache to e2e-js jobs via setup-node cache option
- Consolidate Maven build: mvn clean package + install → single mvn install
- Add .github/workflows/ci.yaml and .github/actions/** to push paths
  so CI validates its own changes when merged to main
2026-04-09 18:40:22 -05:00
github-actions[bot]
64790d5b60 style: auto-format with ruff 2026-04-09 23:13:50 +00:00
Kevin Turcios
3f53309847
Merge branch 'main' into fix/gradle-maven-central-dependency 2026-04-09 18:13:18 -05:00
Kevin Turcios
61f468ab7a
ci: batch CI improvements — paths, validate-pr action, test skips, fetch-depth
ci: consolidate improvements — paths, validate-pr action, fetch-depth, continue-on-error
2026-04-09 18:12:05 -05:00
Kevin Turcios
65b117c8b8 revert: restore version.py (remove CI trigger) 2026-04-09 18:11:43 -05:00
github-actions[bot]
a957a0e6c9 style: auto-format with ruff 2026-04-09 21:02:29 +00:00
Kevin Turcios
5ff38597ef test: skip all Java integration test classes when JAR missing
Apply @requires_java_runtime to TestJavaRunAndParseBehavior and
TestJavaRunAndParsePerformance at the class level. The performance
test was failing on Windows with a flaky 10ms timing assertion
(10.515ms actual, 5% tolerance), a pre-existing issue masked by
continue-on-error.
2026-04-09 16:01:53 -05:00
github-actions[bot]
72cf0e1654 style: auto-format with ruff 2026-04-09 20:47:47 +00:00
Kevin Turcios
78372bfbfb test: skip test_behavior_return_value_correctness when JAR missing
Same fix as test_comparator.py — uses _find_comparator_jar() to skip
when the codeflash-runtime JAR isn't built. Fixes Windows unit-tests
which don't have Java pre-installed (unlike Linux runners).
2026-04-09 15:47:10 -05:00
Mohamed Ashraf
32bb1cb8da Merge remote-tracking branch 'origin/main' into fix/gradle-maven-central-dependency 2026-04-09 17:52:56 +00:00
github-actions[bot]
15811c8c03 style: auto-format with ruff 2026-04-09 17:20:24 +00:00
Kevin Turcios
e5a18feb61 test: fix requires_java to check for runtime JAR, not just binaries
Ubuntu runners have Java/Maven pre-installed, so checking for the java/mvn
binaries never triggers the skip. The actual dependency is the codeflash-runtime
JAR, which must be built from codeflash-java-runtime/ via Maven.
2026-04-09 12:19:16 -05:00
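Checking for the built artifact rather than for the toolchain binaries might be sketched as below. `find_runtime_jar`, the `codeflash-runtime*.jar` glob pattern, and the `target` directory in the comment are assumptions for illustration; the project's own helper is not shown in this log.

```python
from pathlib import Path


def find_runtime_jar(search_dir):
    """Return the first matching JAR under search_dir, or None (hypothetical helper)."""
    root = Path(search_dir)
    if not root.is_dir():
        return None
    # Assumed file-name pattern; adjust to the real artifact name.
    matches = sorted(root.glob("**/codeflash-runtime*.jar"))
    return matches[0] if matches else None


# With pytest installed, the result could drive a skip marker, for example:
#   requires_runtime_jar = pytest.mark.skipif(
#       find_runtime_jar("codeflash-java-runtime/target") is None,
#       reason="codeflash-runtime JAR not built",
#   )
```

A check like this skips on any runner where the JAR has not been built, regardless of whether java or mvn happen to be on the PATH.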
github-actions[bot]
241fd2d59c style: auto-format with ruff 2026-04-09 17:08:10 +00:00
Kevin Turcios
be446cd8de test: skip Java comparator tests when Maven is unavailable
The requires_java marker only checked for the java binary, but the tests
also need mvn to build the codeflash-runtime JAR. These 13 tests
were silently failing in unit-tests (masked by continue-on-error).
2026-04-09 12:06:26 -05:00
github-actions[bot]
9a86e09460 style: auto-format with ruff 2026-04-09 17:03:22 +00:00
Kevin Turcios
619ef3de34 ci: trigger test run (will revert) 2026-04-09 12:02:44 -05:00
Kevin Turcios
d97f372f43 ci: narrow paths, extract validate-pr, remove continue-on-error
- Remove codeflash-java-runtime/ from unit_tests change detection
- Narrow e2e flag from codeflash/ to explicit Python subdirs (excludes java/, javascript/)
- Narrow tests/ in e2e_java/e2e_js to specific test scripts
- Extract duplicated Validate PR step into composite action
- Use fetch-depth: 1 for unit-tests and type-check (no git history needed)
- Remove continue-on-error: true from unit-tests (was masking real failures)
- Change git add -A to git add -u in prek auto-fix (won't stage untracked files)
2026-04-09 12:00:17 -05:00
Kevin Turcios
82249efb4f ci: remove pyproject.toml/uv.lock from Java/JS E2E triggers 2026-04-09 11:52:33 -05:00