- Add PR template with required linked issue/discussion section
- Add check-linked-issue CI job that validates PR body contains a
reference (#123, Closes/Fixes/Relates, GitHub URL, or CF-# ticket)
- Wire into required-checks-passed gate so it blocks merge
- Update CONTRIBUTING.md with the policy and motivation
Closes #522
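The accepted reference forms can be validated with a small regex check; a minimal sketch, assuming the CI job scans the PR body text (the exact patterns beyond #123, Closes/Fixes/Relates, GitHub URLs, and CF-# tickets are illustrative):

```python
import re

# Accepted reference forms from the commit message; exact regexes are assumed.
LINK_PATTERNS = [
    r"#\d+",                                    # bare issue reference, e.g. #123
    r"\b(?:Closes|Fixes|Relates)\b",            # linking keywords
    r"https://github\.com/\S+/(?:issues|discussions)/\d+",  # full GitHub URL
    r"\bCF-\d+\b",                              # CF-# ticket reference
]

def has_linked_issue(pr_body: str) -> bool:
    """Return True if the PR body contains any accepted reference form."""
    return any(re.search(p, pr_body) for p in LINK_PATTERNS)
```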
Covers the two audiences the issue calls out:
1. People contributing changes back to Codeflash - development
environment setup with uv, the single-command verification via
uv run prek, test runner invocation, code-style pointers to
.claude/rules/code-style.md, and the branch / commit / PR
conventions from .claude/rules/git.md and CLAUDE.md.
2. People using Codeflash in editable mode from a source checkout
to optimize their own projects, including the install commands
for uv and pip, when editable mode is appropriate vs installing
the PyPI package, and a pointer to the README Quick Start for
the codeflash init flow.
Keep the combined JFR + tracing agent single JVM invocation from main while
preserving the fix's intent: raise when trace-db was not created, warn when
exit code is non-zero but trace-db exists. Integration tests rewritten to
match the combined-invocation semantics.
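The fix's intent described above can be sketched as follows (function and argument names are illustrative, not the actual helper):

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def check_trace_result(exit_code: int, trace_db: Path) -> None:
    """A missing trace database is fatal; a non-zero exit with an existing
    trace database is only a warning, since the combined JVM run may still
    have produced usable data."""
    if not trace_db.exists():
        raise RuntimeError(f"trace database was not created at {trace_db}")
    if exit_code != 0:
        logger.warning(
            "JVM exited with code %d but trace database exists; continuing",
            exit_code,
        )
```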
Gradle evaluates all project configurations during the configuration
phase, even when only one module is targeted. Multi-module projects with
diverse toolchain requirements (e.g., OpenRewrite's rewrite-gradle needs
JDK 8) fail when an unrelated module's toolchain isn't available.
Adds --configure-on-demand to all 8 Gradle command construction sites
so Gradle only configures projects needed for the requested task.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_run_java_with_graceful_timeout() discarded the subprocess exit code in
both the no-timeout and timeout paths. If Maven/Gradle failed (compilation
error, OOM, etc.), the tracer silently continued with missing/stale data.
Now returns the exit code. Stage 1 (JFR profiling) warns on failure but
continues. Stage 2 (argument capture) raises RuntimeError since trace
data is essential for replay test generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep only repeatString, which reliably produces a 284% improvement.
Drop computeSum (marginal 16%), filterEvens and instanceMethod (no
optimization found). Reduces tracer E2E from ~1h27m to ~21m.
Combine JFR profiling and argument capture agent into one
JAVA_TOOL_OPTIONS string, running the target program once instead of
twice. JFR and javaagent are orthogonal JVM features that coexist
without conflict. Keeps build_jfr_env/build_agent_env for standalone
use.
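The combined environment can be sketched like this (helper name is illustrative; the JFR and -javaagent flag spellings are standard JVM options):

```python
def build_combined_env(jfr_file: str, agent_jar: str, base_env: dict[str, str]) -> dict[str, str]:
    """Merge JFR recording flags and the tracing agent into a single
    JAVA_TOOL_OPTIONS value so the target program runs once."""
    jfr_opts = f"-XX:StartFlightRecording=settings=profile,filename={jfr_file}"
    agent_opts = f"-javaagent:{agent_jar}"
    env = dict(base_env)
    env["JAVA_TOOL_OPTIONS"] = f"{jfr_opts} {agent_opts}"
    return env
```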
Make TOTAL_LOOPING_TIME configurable via CODEFLASH_LOOPING_TIME env var
(defaults to 10s). Set to 5s in Java E2E CI jobs to cut verification
time per candidate. Also cache the codeflash-runtime JAR keyed on
source hash to skip mvn install when unchanged.
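The configurable knob can be read like this (the env var name and 10s default come from the commit; parsing details are assumed):

```python
import os

def total_looping_time() -> float:
    """TOTAL_LOOPING_TIME in seconds, overridable via CODEFLASH_LOOPING_TIME."""
    return float(os.environ.get("CODEFLASH_LOOPING_TIME", "10"))
```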
Add .resolve() to TemporaryDirectory path to expand Windows 8.3 short
paths (e.g. RUNNER~1) to canonical long form, fixing test_pickle_patcher
failures on Windows CI. Also add missing return type annotations and
noqa suppressions for benchmark test file.
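The fix amounts to resolving the temp path before comparison; a sketch (helper name is illustrative):

```python
import tempfile
from pathlib import Path

def canonical_tmpdir() -> tempfile.TemporaryDirectory:
    """Create a TemporaryDirectory and attach its canonical path.
    Path.resolve() expands Windows 8.3 short names like RUNNER~1 to
    their long form, so downstream path comparisons match."""
    tmp = tempfile.TemporaryDirectory()
    tmp.resolved_path = Path(tmp.name).resolve()  # canonical long-form path
    return tmp
```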
Move codeflash's own benchmarks to .codeflash/benchmarks/. Add
auto-discovery of .codeflash/benchmarks/ in codeflash compare and
benchmark mode -- when benchmarks-root is not explicitly configured,
the CLI checks for .codeflash/benchmarks/ before erroring.
Backwards compatible: users with existing benchmarks-root config
are unaffected. Docs continue to show tests/benchmarks as the
example path.
JavaSupport.ensure_runtime_environment() was never called during the
optimization flow, so _language_version stayed None and the backend
received language_version=null. The LLM had no Java version constraint,
causing it to generate Java 16+ APIs (e.g. Stream.toList()) for Java 11
projects.
Set PYTHONHASHSEED=0 in test subprocess environments so original and
candidate runs use identical hash behavior, eliminating a source of
non-deterministic return-value comparisons.
Also upgrade diff logging from debug to info level with actual types
and repr values for DID_PASS, RETURN_VALUE, and STDOUT diffs.
The jdk.ExecutionSample#period=1ms syntax in -XX:StartFlightRecording
is only supported on JDK 13+. On JDK 11 (CI), it causes
"Failure when starting JFR on_create_vm_2" and no JFR file is created.
The settings=profile preset still provides 10ms CPU sampling.
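A version-gated flag builder is one way to express the constraint; a sketch (helper name is an assumption, the option syntax is from the commit):

```python
def jfr_recording_flag(jdk_major: int, jfr_file: str) -> str:
    """Per-event settings like jdk.ExecutionSample#period need JDK 13+;
    on older JDKs fall back to the settings=profile preset, which still
    provides 10ms CPU sampling."""
    base = f"-XX:StartFlightRecording=settings=profile,filename={jfr_file}"
    if jdk_major >= 13:
        return base + ",jdk.ExecutionSample#period=1ms"
    return base
```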
Drop repeatString from the Workload fixture (2→1 function).
computeSum alone exercises the full tracer→optimizer pipeline
(trace → replay tests → optimize → evaluate → rank → explain → review).
The second function added no additional pipeline coverage.
Remove filterEvens and instanceMethod from the Workload fixture (4→2
functions) and reduce main() loop from 1000→100 rounds. The E2E test
only needs to verify the tracer→optimizer pipeline works end-to-end;
it doesn't need 4 functions or 1604 replay tests to prove that.
Expected impact: ~2 functions × ~8 candidates × fewer replay tests
should bring the job from ~75 min down to ~10-15 min.
The flat 90s timeout was too aggressive for LLM-powered endpoints
(/testgen, /optimize, /refinement) under load, causing ReadTimeoutError
and failing the async-optimization E2E test. Split into (10s connect,
300s read) tuple so connections fail fast but LLM inference gets adequate time.
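A minimal sketch of the split timeout, assuming a requests/urllib3-style client that accepts a (connect, read) tuple:

```python
def split_timeout(connect: float = 10.0, read: float = 300.0) -> tuple[float, float]:
    """Replace the flat 90s timeout with a (connect, read) pair: TCP
    connects fail fast while LLM inference gets a generous read window."""
    return (connect, read)
```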
Move the 4 most common return-value types (str, list/tuple, dict) to
`orig_type is T` identity checks at the top of the dispatch chain,
before the frozenset lookup. A single pointer comparison is cheaper
than a frozenset hash, and these types need special handling anyway
(temp-path normalization, recursive comparison, superset support).
Before: dict traversed ~8 isinstance checks before being handled.
After: dict is handled at check #3 via `orig_type is dict`.
The isinstance fallbacks remain as slow-paths for subclasses (deque,
ChainMap, defaultdict, scipy dok_matrix, etc.).
Backported from codeflash-python dispatch ordering.
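The ordering described above can be sketched as follows (a simplified illustration; the real comparator's special-casing for temp paths, supersets, and scipy types is omitted):

```python
def compare_values(orig, candidate) -> bool:
    """Exact-type identity checks first (one pointer comparison each),
    isinstance fallbacks afterwards for subclasses like defaultdict."""
    orig_type = type(orig)
    # Fast path: identity checks for the most common return-value types.
    if orig_type is str:
        return orig == candidate
    if orig_type in (list, tuple):
        return (type(candidate) is orig_type
                and len(orig) == len(candidate)
                and all(compare_values(a, b) for a, b in zip(orig, candidate)))
    if orig_type is dict:
        return (type(candidate) is dict
                and orig.keys() == candidate.keys()
                and all(compare_values(v, candidate[k]) for k, v in orig.items()))
    # Slow path: isinstance fallbacks catch subclasses (deque, ChainMap,
    # defaultdict, etc.) that miss the identity checks above.
    if isinstance(orig, dict):
        return dict(orig) == dict(candidate)
    return orig == candidate
```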
- test_benchmark_libcst_multi_file: discover_functions + get_code_optimization_context across 10 real source files
- test_benchmark_libcst_pipeline: full discover → extract → replace → merge pipeline on one file
Measures median wall-clock time for --version, --help, auth status,
and compare --help across 30 runs with 3 warmups.
Usage:
codeflash compare main codeflash/optimize \
--script "python benchmarks/bench_cli_startup.py" \
--script-output benchmarks/results.json
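The measurement loop inside such a script might look like this sketch (the 30-run / 3-warmup parameters mirror the description; the real script's internals are assumed):

```python
import statistics
import subprocess
import time

def median_wall_clock(cmd: list[str], runs: int = 30, warmups: int = 3) -> float:
    """Median wall-clock time of a command: warmup runs are discarded,
    then each timed run spawns a fresh process so startup cost is included."""
    for _ in range(warmups):
        subprocess.run(cmd, capture_output=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```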