codeflash

mirror of https://github.com/codeflash-ai/codeflash.git synced 2026-05-04 18:25:17 +00:00

Author	SHA1	Message	Date
Sarthak Agarwal	3b8a2e5c82	update Docs for Plugin	2026-04-15 00:37:17 +05:30
Kevin Turcios	4d4cb5f517	Merge pull request #2059 from codeflash-ai/refactor/benchmarks-to-dotcodeflash Move benchmarks to .codeflash/benchmarks/	2026-04-13 05:06:00 -05:00
Kevin Turcios	819a56c33e	Merge pull request #2058 from codeflash-ai/perf/reduce-java-tracer-e2e perf: optimize Java tracing agent (E2E reduction + serialization + writes)	2026-04-10 18:43:58 -05:00
Kevin Turcios	b737f71e46	fix: update test assertions to match simplified Workload fixture The Workload.java fixture was trimmed to only repeatString but test files still asserted computeSum, filterEvens, and instanceMethod.	2026-04-10 16:05:27 -05:00
Kevin Turcios	0cb67c1a17	fix: add --no-pr to codeflash optimize workflow to prevent CI-opened PRs	2026-04-10 15:12:48 -05:00
Kevin Turcios	5c778dfad4	perf: trim tracer E2E workload to single function (repeatString) Keep only repeatString which reliably produces 284% improvement. Drop computeSum (marginal 16%), filterEvens and instanceMethod (no optimization found). Reduces tracer E2E from ~1h27m to ~21m.	2026-04-10 15:08:03 -05:00
Kevin Turcios	40f16b565a	ci: add standalone Java E2E workflow for isolated testing	2026-04-10 13:09:36 -05:00
Kevin Turcios	cb87763a2d	fix: skip environment approval gate for trusted users on workflow_dispatch	2026-04-10 12:58:54 -05:00
Kevin Turcios	013c83f5e4	fix: drop jdk.ExecutionSample#period from combined JFR opts (unsupported on Java 11)	2026-04-10 09:11:02 -05:00
Kevin Turcios	0d928f2b49	perf: merge Java tracer into single-pass JVM invocation Combine JFR profiling and argument capture agent into one JAVA_TOOL_OPTIONS string, running the target program once instead of twice. JFR and javaagent are orthogonal JVM features that coexist without conflict. Keeps build_jfr_env/build_agent_env for standalone use.	2026-04-10 09:05:30 -05:00
Kevin Turcios	ecf4e63eca	perf: reduce Java E2E looping time to 5s and cache runtime JAR build Make TOTAL_LOOPING_TIME configurable via CODEFLASH_LOOPING_TIME env var (defaults to 10s). Set to 5s in Java E2E CI jobs to cut verification time per candidate. Also cache the codeflash-runtime JAR keyed on source hash to skip mvn install when unchanged.	2026-04-10 09:02:45 -05:00
Kevin Turcios	8959ead2f9	fix: resolve Windows 8.3 short paths in get_run_tmp_file and fix ruff lint errors Add .resolve() to TemporaryDirectory path to expand Windows 8.3 short paths (e.g. RUNNER~1) to canonical long form, fixing test_pickle_patcher failures on Windows CI. Also add missing return type annotations and noqa suppressions for benchmark test file.	2026-04-10 08:51:10 -05:00
Kevin Turcios	ec14860d29	Move benchmarks to .codeflash/benchmarks/ and auto-discover Move codeflash's own benchmarks to .codeflash/benchmarks/. Add auto-discovery of .codeflash/benchmarks/ in codeflash compare and benchmark mode -- when benchmarks-root is not explicitly configured, the CLI checks for .codeflash/benchmarks/ before erroring. Backwards compatible: users with existing benchmarks-root config are unaffected. Docs continue to show tests/benchmarks as the example path.	2026-04-10 08:39:15 -05:00
Kevin Turcios	151df774a4	perf: use --effort low for java-tracer E2E to reduce CI time	2026-04-10 08:29:46 -05:00
Kevin Turcios	b05561ef9e	chore: replace console.print with logger.info for Java project detection	2026-04-10 07:51:08 -05:00
Kevin Turcios	70260f22b3	fix: ensure language_version is detected before optimization API calls JavaSupport.ensure_runtime_environment() was never called during the optimization flow, so _language_version stayed None and the backend received language_version=null. The LLM had no Java version constraint, causing it to generate Java 16+ APIs (e.g. Stream.toList()) for Java 11 projects.	2026-04-10 07:39:49 -05:00
Kevin Turcios	82ec301fad	chore: remove diagnostic logging from compare_test_results	2026-04-10 06:49:43 -05:00
Kevin Turcios	986654b7e6	fix: pin PYTHONHASHSEED=0 in test env and enhance diff diagnostics Set PYTHONHASHSEED=0 in test subprocess environments so original and candidate runs use identical hash behavior, eliminating a source of non-deterministic return-value comparisons. Also upgrade diff logging from debug to info level with actual types and repr values for DID_PASS, RETURN_VALUE, and STDOUT diffs.	2026-04-10 06:38:08 -05:00
Kevin Turcios	e191f74aa6	chore: add diagnostic logging to compare_test_results Temporary instrumentation to debug flaky futurehouse E2E test. Logs matched/skipped/timed-out counts and did_all_timeout state.	2026-04-10 06:16:39 -05:00
Kevin Turcios	fefccd5935	fix: drop JFR inline event config that breaks JDK 11 The jdk.ExecutionSample#period=1ms syntax in -XX:StartFlightRecording is only supported on JDK 13+. On JDK 11 (CI), it causes "Failure when starting JFR on_create_vm_2" and no JFR file is created. The settings=profile preset still provides 10ms CPU sampling.	2026-04-10 05:28:34 -05:00
Kevin Turcios	bfe6f3a828	Remove debug timing instrumentation from tracer Strip AtomicLong accumulators, System.nanoTime() timing, and getTimingSummary() that were added for profiling. No functional change.	2026-04-10 05:16:49 -05:00
Kevin Turcios	01e22152c7	flexing	2026-04-10 05:07:53 -05:00
Kevin Turcios	e81f25f825	fix: remove stale repeatString assertions from integration tests repeatString was removed from Workload.java in the E2E reduction.	2026-04-10 05:05:17 -05:00
Kevin Turcios	0772398c59	perf: optimize Java tracing agent serialization and writes - Reuse ThreadLocal Kryo Output buffers (eliminates #1 allocation hotspot) - Fast-path inline serialization for safe arg types (bypasses executor) - Skip verification roundtrip for known-safe containers (ArrayList, HashMap, etc.) - Batch SQLite inserts (256/txn) with permanent autocommit-off - Switch to ArrayBlockingQueue (no per-element Node allocation) - Add opt-in in-memory SQLite mode (VACUUM INTO at shutdown), enabled in CI - Add timing instrumentation (onEntry, serialization, writes, dump) - Add ProfilingWorkload fixture for benchmarking Benchmark (50k captures): onEntry 5200ms→1200ms (4.3x), avg/capture 0.43ms→0.02ms (21x), writes 3200ms→900ms (3.5x) with in-memory mode.	2026-04-10 04:55:36 -05:00
Kevin Turcios	08aa94c54a	perf: reduce java-tracer E2E to single function for ~11 min target Drop repeatString from the Workload fixture (2→1 function). computeSum alone exercises the full tracer→optimizer pipeline (trace → replay tests → optimize → evaluate → rank → explain → review). The second function added no additional pipeline coverage.	2026-04-10 03:44:54 -05:00
Kevin Turcios	46957e190f	fix: update java tracer unit tests for reduced Workload fixture Remove assertions for filterEvens and instanceMethod which were removed from the Workload fixture. Adjust expected invocation counts accordingly.	2026-04-10 03:17:46 -05:00
Kevin Turcios	21f61ec93d	ci: add java_tracer_e2e fixture path to e2e_java change detection The fixture directory wasn't in the path filter, so changes to Workload.java didn't trigger the java E2E tests.	2026-04-10 03:08:03 -05:00
Kevin Turcios	2b0f633c0f	perf: reduce java-tracer E2E from ~75 min to ~15 min Remove filterEvens and instanceMethod from the Workload fixture (4→2 functions) and reduce main() loop from 1000→100 rounds. The E2E test only needs to verify the tracer→optimizer pipeline works end-to-end; it doesn't need 4 functions or 1604 replay tests to prove that. Expected impact: ~2 functions × ~8 candidates × fewer replay tests should bring the job from ~75 min down to ~10-15 min.	2026-04-10 03:04:29 -05:00
Kevin Turcios	5ee642e35e	Merge pull request #2057 from codeflash-ai/fix/api-read-timeout fix: increase API read timeout to prevent flaky E2E failures	2026-04-10 02:45:31 -05:00
Kevin Turcios	4ac573f10f	fix: increase API read timeout from 90s to 300s to prevent flaky E2E failures The flat 90s timeout was too aggressive for LLM-powered endpoints (/testgen, /optimize, /refinement) under load, causing ReadTimeoutError and failing the async-optimization E2E test. Split into (10s connect, 300s read) tuple so connections fail fast but LLM inference gets adequate time.	2026-04-10 02:33:16 -05:00
Kevin Turcios	72a41a5665	Merge pull request #2055 from codeflash-ai/perf/defer-cli-imports perf: defer cli.py imports for 7.7x faster --help	2026-04-10 01:59:57 -05:00
Kevin Turcios	93810f8be6	Merge pull request #2056 from codeflash-ai/chore/delete-disabled-workflows chore: delete disabled codeflash.yaml workflow	2026-04-10 01:52:47 -05:00
Kevin Turcios	79d47e0fae	chore: delete disabled codeflash.yaml workflow JS ESM integration test — disabled and superseded by ci.yaml's e2e-js matrix.	2026-04-10 01:51:52 -05:00
Kevin Turcios	381d1319ea	fix: specify utf-8 encoding in benchmark read_text for Windows CI Windows defaults to cp1252 which can't decode some source file bytes.	2026-04-10 01:48:31 -05:00
Kevin Turcios	fe39d40e1b	perf: add type identity fast-paths for str/list/tuple/dict in comparator Move the 4 most common return-value types (str, list/tuple, dict) to `orig_type is T` identity checks at the top of the dispatch chain, before the frozenset lookup. A single pointer comparison is cheaper than a frozenset hash, and these types need special handling anyway (temp-path normalization, recursive comparison, superset support). Before: dict traversed ~8 isinstance checks before being handled. After: dict is handled at check #3 via `orig_type is dict`. The isinstance fallbacks remain as slow-paths for subclasses (deque, ChainMap, defaultdict, scipy dok_matrix, etc.). Backported from codeflash-python dispatch ordering.	2026-04-10 01:25:05 -05:00
Kevin Turcios	5a5b6e46ac	bench: add dedicated comparator microbenchmark for frozenset fast-path 5 scenarios: primitives, nested dicts, DB rows, deep nesting, and identity types (frozenset/range/complex/Decimal/OrderedDict).	2026-04-10 01:05:02 -05:00
Kevin Turcios	4c3c6ea167	perf: add frozenset fast-path for comparator type dispatch Use O(1) frozenset membership test with type identity before falling through to isinstance MRO traversal. Backported from codeflash-python.	2026-04-10 00:53:55 -05:00
Kevin Turcios	accbab4a16	fix: update test_cmd_auth patches for deferred imports Imports in cmd_auth.py were moved into function bodies, so mock patches must target the source modules instead of cmd_auth's namespace.	2026-04-10 00:36:02 -05:00
Kevin Turcios	2e2e19f7ae	bench: add libcst visitor benchmarks for multi-file and full pipeline - test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files - test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file	2026-04-10 00:21:45 -05:00
Kevin Turcios	1a25f05e14	fix: remove unnecessary Optimizer from benchmark test The test only needs project_root, not a full Optimizer (which requires an API key). Also adds missing __init__.py to tests/benchmarks/.	2026-04-10 00:10:36 -05:00
Kevin Turcios	2208e8ca77	bench: add CLI startup benchmark for codeflash compare --script Measures median wall-clock time for --version, --help, auth status, and compare --help across 30 runs with 3 warmups. Usage: codeflash compare main codeflash/optimize \ --script "python benchmarks/bench_cli_startup.py" \ --script-output benchmarks/results.json	2026-04-09 23:59:26 -05:00
Kevin Turcios	b533f50bdc	perf: backport libcst visitor dispatch cache from codeflash-python Cache the visitor dispatch tables that libcst rebuilds on every MatcherDecoratableTransformer/Visitor instantiation. The tables depend only on the class, not the instance, so caching by type is safe. Saves ~27ms per visitor instantiation (24x faster). Also fix pre-existing ruff F821 in cli.py (missing exit_with_message import in process_pyproject_config).	2026-04-09 23:46:45 -05:00
github-actions[bot]	61053be9ce	style: auto-format with ruff	2026-04-10 04:39:45 +00:00
Kevin Turcios	436d642847	perf: defer libcst, Rich, comparator imports in models.py Move libcst, rich.tree.Tree, console, comparator, code_utils, registry, lsp.helpers, and LspMarkdownMessage from module-level to the methods that use them. Only pydantic and TestType remain at module level (needed for class definitions). models.py import: 633ms → 125ms on Azure Standard_D4s_v5.	2026-04-09 23:38:40 -05:00
github-actions[bot]	88babfef25	style: auto-format with ruff	2026-04-10 04:30:36 +00:00
Kevin Turcios	2fc528ebda	perf: defer heavy imports in env_utils and shell_utils Defer console, formatter, code_utils, registry, and lsp.helpers imports from module level into the functions that use them. Inline is_LSP_enabled (a one-liner env var check) to avoid importing lsp.helpers on the happy path of get_codeflash_api_key. auth status: 237ms → 160ms on Azure Standard_D4s_v5.	2026-04-09 23:29:31 -05:00
Kevin Turcios	992e91abc7	fix: prevent ruff auto-format from rewriting version.py placeholders uv-dynamic-versioning rewrites version.py on every `uv run`, so the ruff auto-format job was inadvertently committing dev version strings. Restore version.py files after formatting and revert the ones already changed on this branch.	2026-04-09 23:21:25 -05:00
github-actions[bot]	1e8e5d2cc2	style: auto-format with ruff	2026-04-10 04:14:58 +00:00
Kevin Turcios	a8c004164e	perf: skip telemetry/banner for auth and compare commands Restructure main() command dispatch so auth and compare exit early without loading telemetry (sentry, posthog), version_check, or the banner. Defer cmd_auth.py imports into functions. auth status: ~1000ms → 237ms (4.2x) compare --help: ~297ms → 38ms (7.9x)	2026-04-09 23:14:03 -05:00
github-actions[bot]	05a7641405	style: auto-format with ruff	2026-04-10 04:09:00 +00:00
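
The comparator commits above (4c3c6ea167 and fe39d40e1b) describe a dispatch order: type-identity checks for str, list/tuple, and dict first, then an O(1) frozenset membership test, then isinstance fallbacks for subclasses. The following is a minimal sketch of that ordering, not the repository's actual comparator. The function name `compare`, the set `_EQUALITY_TYPES`, and its contents are assumptions made for illustration, and the temp-path normalization and superset support mentioned in the commit message are omitted.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical set of exact types that can be compared with plain ==.
# The real comparator's set is not shown in the commit log.
_EQUALITY_TYPES = frozenset(
    {int, float, bool, complex, bytes, type(None), range, frozenset, Decimal}
)


def _compare_mapping(orig, new) -> bool:
    # Same keys, and every value compares equal recursively.
    return orig.keys() == new.keys() and all(
        compare(value, new[key]) for key, value in orig.items()
    )


def _compare_sequence(orig, new) -> bool:
    # Same length, and every element compares equal recursively.
    return len(orig) == len(new) and all(compare(a, b) for a, b in zip(orig, new))


def compare(orig, new) -> bool:
    orig_type = type(orig)

    # 1. Identity fast-paths: one pointer comparison per check.
    if orig_type is str:
        return type(new) is str and orig == new
    if orig_type is list or orig_type is tuple:
        return type(new) is orig_type and _compare_sequence(orig, new)
    if orig_type is dict:
        return type(new) is dict and _compare_mapping(orig, new)

    # 2. Frozenset fast-path: O(1) membership test on the exact type.
    if orig_type in _EQUALITY_TYPES:
        return type(new) is orig_type and orig == new

    # 3. Slow paths: isinstance checks for subclasses (e.g. defaultdict).
    if isinstance(orig, dict):
        return isinstance(new, dict) and _compare_mapping(orig, new)
    if isinstance(orig, (list, tuple)):
        return type(new) is orig_type and _compare_sequence(orig, new)

    # Anything else falls back to same-type equality.
    return type(new) is orig_type and orig == new
```

In this sketch a plain dict is handled at the third check, while a `defaultdict` skips the fast-paths and reaches the isinstance fallback.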

7890 commits