Accept coverage_summary in the review schema and pass it to the prompt.
Add two new review criteria: low coverage detection and constructor/
dependency error patterns. Coverage percentage is shown in the user
prompt so the reviewer can flag tests that don't exercise the function.
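A hedged sketch of the schema change, assuming a Pydantic request model; every name except coverage_summary is illustrative:

```python
from pydantic import BaseModel, Field

class TestGenReviewSchema(BaseModel):  # hypothetical class name
    test_source: str
    coverage_summary: str | None = Field(
        default=None,
        description="Per-function coverage, rendered into the user prompt",
    )
```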
Include runtime error messages from behavioral test failures in the
review request. Failed function verdicts now include the specific error
message. The review prompt shows error details so the AI can see
patterns like type validation failures.
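The verdict side might carry the error like this; model and field names are assumptions:

```python
from pydantic import BaseModel

class FunctionVerdict(BaseModel):  # hypothetical model name
    function_name: str
    passed: bool
    # Populated from behavioral test failures so the review prompt can
    # surface patterns like type validation errors.
    error_message: str | None = None
```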
Instead of replacing the entire test file with the LLM's output, parse
both the original and repaired sources as CSTs, extract only the flagged
function nodes from the repair output, and surgically replace them in
the original. Unflagged functions are preserved exactly as-is.
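A minimal libcst sketch of that splice, assuming top-level test functions; the names are illustrative, not the actual implementation:

```python
import libcst as cst

class _ReplaceFlagged(cst.CSTTransformer):
    def __init__(self, repaired: dict[str, cst.FunctionDef]) -> None:
        self.repaired = repaired

    def leave_FunctionDef(self, original_node, updated_node):
        # Swap in the repaired node only for flagged functions; everything
        # else, including whitespace and comments, passes through untouched.
        return self.repaired.get(original_node.name.value, updated_node)

def splice_repaired_tests(original_src: str, repaired_src: str, flagged: set[str]) -> str:
    repaired_defs = {
        node.name.value: node
        for node in cst.parse_module(repaired_src).body
        if isinstance(node, cst.FunctionDef) and node.name.value in flagged
    }
    return cst.parse_module(original_src).visit(_ReplaceFlagged(repaired_defs)).code
```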
Repaired tests from the LLM now go through the same postprocessing
pipeline as initial generation (import fixing, loop limiting, unused
definition removal) before instrumentation. The endpoint returns the
display version (with asserts) as generated_tests for client-side display.
Split postprocessing_testgen_pipeline to capture the test source before
assert removal — fully cleaned (imports, loops, definitions) but with
original asserts intact. Return it as raw_generated_tests in the
TestGenResponseSchema so the CLI can display the human-readable version.
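A hedged sketch of the split; the stage helpers are stand-ins for the cleanup steps named above:

```python
def fix_imports(src: str) -> str: return src                 # placeholder stages;
def limit_loops(src: str) -> str: return src                 # the real ones do the
def remove_unused_definitions(src: str) -> str: return src   # cleanup named above

def strip_asserts(src: str) -> str:
    # The real code rewrites the AST; dropping assert lines approximates it.
    return "\n".join(l for l in src.splitlines() if not l.lstrip().startswith("assert"))

def postprocessing_testgen_pipeline(source: str) -> tuple[str, str]:
    cleaned = remove_unused_definitions(limit_loops(fix_imports(source)))
    raw_generated_tests = cleaned              # asserts intact, human-readable
    return raw_generated_tests, strip_asserts(cleaned)  # second value feeds instrumentation
```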
Deduplicate the identical Environment(FileSystemLoader, StrictUndefined,
keep_trailing_newline=True) setup across JS testgen, Python testgen, and
Python explanations into core/shared/jinja_utils.py.
Also fix tests/testgen/test_testgen_javascript.py which had a stale
copy of build_javascript_prompt and loaded the now-deleted .md files.
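One possible shape for the shared factory; the function name is an assumption:

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, StrictUndefined

def make_jinja_env(template_dir: Path) -> Environment:
    # Single canonical Environment so JS testgen, Python testgen, and
    # explanations all render with identical semantics.
    return Environment(
        loader=FileSystemLoader(template_dir),
        undefined=StrictUndefined,
        keep_trailing_newline=True,
    )
```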
Split _generate_import_statement into _resolve_import (pure logic:
identifier validation, dot splitting, reserved words) and a js_import
Jinja2 macro (pure formatting: ESM vs CJS syntax). The macro lives in
_macros.md.j2 and is imported by user.md.j2.
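A sketch of the pure-logic half; the validation details (reserved-word set, error type) are assumptions, only the names come from this log:

```python
JS_RESERVED = {"class", "delete", "function", "new", "return", "typeof"}  # illustrative subset

def _resolve_import(dotted: str) -> list[str]:
    """Validate and split a dotted identifier; ESM-vs-CJS formatting is
    left entirely to the js_import macro in _macros.md.j2."""
    parts = dotted.split(".")
    for part in parts:
        # str.isidentifier() is a Python approximation of a JS identifier check.
        if not part.isidentifier() or part in JS_RESERVED:
            raise ValueError(f"cannot import {dotted!r}: bad segment {part!r}")
    return parts
```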
Replace plain .md prompts rendered with str.format() with Jinja2
templates using {% extends %}, {% block %}, and {% if %} branching
(a rendering sketch follows the list):
- model_type branching: XML tags for Anthropic, markdown headers for OpenAI
- module_system support: ESM imports (import { fn } from '...') vs CJS (require)
- Template inheritance: base_system.md.j2 with sync/async overrides
- Unified user.md.j2 with is_async and module_system conditionals
- Add module_system field to TestGenSchema
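A rendering call exercising those branches, reusing the hypothetical make_jinja_env factory sketched earlier; the template directory and context values are illustrative:

```python
from pathlib import Path

from core.shared.jinja_utils import make_jinja_env  # hypothetical name, see sketch above

env = make_jinja_env(Path("testgen/prompts"))
prompt = env.get_template("user.md.j2").render(
    model_type="anthropic",  # XML section tags; "openai" selects markdown headers
    module_system="esm",     # import { fn } from '...'; "cjs" selects require(...)
    is_async=True,           # flips the is_async conditionals in user.md.j2
)
```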
The async testgen prompt was steering the LLM toward generating
timing-dependent and ordering-sensitive tests that produce
non-deterministic results across runs, causing a ~50% E2E failure
rate for the JS ESM async workflow (a deterministic example follows
the list):
- Add determinism requirement: never assert on timing, elapsed
duration, or relative ordering of async side effects
- Remove directive to use Promise.all() for large-scale tests
- Change large-scale objective from "concurrent operations" to
"correctness with larger inputs"
- Replace concurrent execution template example with a simple
large-input correctness test
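The flavor of test the revised prompt steers toward, shown in Python for consistency with the other sketches here: assert on values at scale, never on elapsed time or completion order.

```python
import asyncio

async def double(x: int) -> int:
    return x * 2

async def test_large_input_correctness() -> None:
    # Deterministic large-scale check: gather() returns results in input
    # order, and the assertion inspects values only, not timing or the
    # ordering of side effects.
    results = await asyncio.gather(*(double(i) for i in range(1_000)))
    assert results == [i * 2 for i in range(1_000)]
```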
Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints.
Review accepts per-test data with pre-flagged behavioral failures; the
AI reviews the passing functions for unrealistic patterns and returns
per-function verdicts. Repair takes the flagged functions, has the LLM
rewrite them, re-instruments the result, and returns the repaired test
source. Both endpoints are gated to Python only.
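A minimal FastAPI sketch of the two routes; the payload fields are inferred from this description, not the real schemas:

```python
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(prefix="/ai")

class ReviewRequest(BaseModel):   # hypothetical shapes
    language: str
    tests: list[dict]             # per-test data incl. pre-flagged failures

class RepairRequest(BaseModel):
    language: str
    test_source: str
    flagged_functions: list[str]

def _require_python(language: str) -> None:
    if language != "python":      # the Python-only gate
        raise HTTPException(status_code=400, detail="Python only")

@router.post("/testgen_review")
async def testgen_review(req: ReviewRequest) -> dict:
    _require_python(req.language)
    ...  # AI reviews passing functions, returns per-function verdicts

@router.post("/testgen_repair")
async def testgen_repair(req: RepairRequest) -> dict:
    _require_python(req.language)
    ...  # LLM rewrites flagged functions; output is re-instrumented
```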
Split the 1,734-line instrument_new_tests.py into three modules by concern:
- device_sync.py: GPU/device framework detection and sync AST generation
- wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top
- instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source
Also extract select_model_for_test() from testgen_python() in generate.py to
separate model selection logic from the HTTP handler.
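The extraction makes the policy testable apart from the HTTP handler. Its real criteria stay in generate.py; this placeholder only shows the shape:

```python
def select_model_for_test(schema) -> str:
    # Hypothetical policy: the actual branching is not spelled out in this log.
    if getattr(schema, "is_async", False):
        return "large-model-tier"    # placeholder tier names
    return "default-model-tier"
```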
Replace class hierarchy (BaseTestGenContext → Single/Multi) with
standalone functions that branch on is_multi_context() internally.
Delete context.py, move TestGenContextData to models.py, and
distribute logic to validate.py, preprocess_pipeline.py, and
generate.py.
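Illustrative shape of the replacement; fields and helpers are invented, only is_multi_context and TestGenContextData come from this log:

```python
from dataclasses import dataclass

@dataclass
class TestGenContextData:            # now lives in models.py; fields illustrative
    functions: list[str]

def is_multi_context(data: TestGenContextData) -> bool:
    return len(data.functions) > 1

def validate_context(data: TestGenContextData) -> None:
    # One plain function replaces polymorphic dispatch on the old
    # Base/Single/Multi classes: the branch is now explicit.
    if is_multi_context(data):
        ...  # checks formerly on the Multi subclass
    else:
        ...  # checks formerly on the Single subclass
```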
Use {% extends %} to deduplicate sync/async system templates via
base_system.md.j2, {% include %} for conditional JIT content, and a
compose_user.md.j2 wrapper to replace Python string assembly in
build_prompt().
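With the wrapper in place, build_prompt() can collapse to a single render instead of Python string assembly; a sketch, assuming an Environment like the factory above:

```python
def build_prompt(env, context: dict) -> str:
    # compose_user.md.j2 now owns the assembly that used to be
    # string concatenation here.
    return env.get_template("compose_user.md.j2").render(**context)
```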
Move prompts into prompts/ subdirectory with clearer names, rename
testgen.py to generate.py, extract validate.py and demo_hacks.py,
rename testgen_context.py to context.py, delete unused explain prompts.
- Extract shared content into Jinja2 macros (`section`, `field`,
  `code_field`) that handle Anthropic XML vs OpenAI markdown wrapping,
  eliminating full duplication of every section across both branches
  (a Python-equivalent sketch follows this list)
- Tighten system prompt to enforce concise 3-6 sentence output: trim
bloated per-field context descriptions, add concrete positive example,
explicitly forbid section headers and bullet groups, move output_format
to be the last section so constraints are closest to generation
- Add caveat that original_explanation is for factual reference only (in
both system and user prompts) to prevent the model from mimicking its
verbose multi-section format
- Condense throughput/concurrency/acceptance sections to essentials
- Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`
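For intuition, the `section` macro behaves like this Python equivalent (the real implementation is a Jinja2 macro; `field` and `code_field` wrap analogously):

```python
def section(name: str, body: str, model_type: str) -> str:
    # Anthropic models get XML-tagged sections; OpenAI models get
    # markdown headers. One definition, both branches.
    if model_type == "anthropic":
        return f"<{name}>\n{body}\n</{name}>"
    return f"## {name}\n\n{body}"
```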
Extract inline prompts into .md.j2 templates, move schemas to
models.py, and add model_type branching (XML for Anthropic, markdown
for OpenAI) following the testgen pattern. The Jinja2 environment uses
StrictUndefined, trim_blocks, and lstrip_blocks.
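The environment settings named above, gathered in one place; the template path is a placeholder:

```python
from jinja2 import Environment, FileSystemLoader, StrictUndefined

env = Environment(
    loader=FileSystemLoader("prompts"),  # placeholder path
    undefined=StrictUndefined,  # fail loudly on missing context keys
    trim_blocks=True,           # drop the newline after {% ... %} tags
    lstrip_blocks=True,         # strip indentation before {% ... %} tags
)
```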
Reduce IntersectionObserver thresholds from 6 to 1, remove backdrop-blur
from sticky header, drop opacity/color/maxHeight transitions that fired
on every activeIndex change, and narrow progress bar to transition-[width].