Commit graph

6373 commits

Author SHA1 Message Date
Kevin Turcios
1532a66278 feat: include coverage info in test review and improve review prompt
Accept coverage_summary in the review schema and pass it to the prompt.
Add two new review criteria: low coverage detection and constructor/
dependency error patterns. Coverage percentage is shown in the user
prompt so the reviewer can flag tests that don't exercise the function.
2026-03-04 14:14:19 -05:00
claude[bot]
f31b428a72 style: auto-fix linting issues 2026-03-04 09:15:27 +00:00
Kevin Turcios
ff35883ce6 Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair 2026-03-04 04:13:24 -05:00
Kevin Turcios
644ded986f Merge remote-tracking branch 'origin/main' into testgen-review-repair 2026-03-04 04:10:56 -05:00
Kevin Turcios
c2a67e8137 feat: pass test failure messages to review endpoint for better context
Include runtime error messages from behavioral test failures in the
review request. Failed function verdicts now include the specific error
message. The review prompt shows error details so the AI can see
patterns like type validation failures.
2026-03-04 04:09:27 -05:00
Kevin Turcios
fce866c96f fix: splice only flagged functions from LLM repair into original test source
Instead of replacing the entire test file with the LLM's output, parse
both the original and repaired sources as CST, extract only the flagged
function nodes from the repair output, and surgically replace them in
the original. Unflagged functions are preserved exactly as-is.
2026-03-04 03:26:03 -05:00
Kevin Turcios
33be205d88 feat: run postprocessing pipeline on repaired tests before instrumentation
Repaired tests from the LLM now go through the same postprocessing
pipeline as initial generation (import fixing, loop limiting, unused
definition removal) before instrumentation. Returns the display version
(with asserts) as generated_tests for client-side display.
2026-03-04 03:20:09 -05:00
claude[bot]
8fe3171934 fix: resolve mypy type errors in generate.py and postprocess_pipeline.py 2026-03-04 08:19:57 +00:00
Kevin Turcios
2899eae4da feat: return display-ready test source with asserts in testgen response
Split postprocessing_testgen_pipeline to capture the test source before
assert removal — fully cleaned (imports, loops, definitions) but with
original asserts intact. Return it as raw_generated_tests in the
TestGenResponseSchema so the CLI can display the human-readable version.
2026-03-04 03:16:30 -05:00
Kevin Turcios
96284e4805
Merge pull request #2467 from codeflash-ai/fix-js-async-testgen-flaky-tests
fix: reduce flaky generated tests for JS async functions
2026-03-04 06:37:43 +00:00
Kevin Turcios
40f3236645 refactor: simplify template selection with string composition 2026-03-04 01:13:06 -05:00
Kevin Turcios
c2f9b17969 Merge remote-tracking branch 'origin/main' into fix-js-async-testgen-flaky-tests 2026-03-04 01:09:00 -05:00
Aseem Saxena
56ac044a86
Merge pull request #2364 from codeflash-ai/match-testdiff-schema
bug: mismatch in cli and internal schema for code repair
2026-03-04 05:16:49 +05:30
claude[bot]
38ca8824d6 fix: resolve mypy type errors in code_repair_context 2026-03-03 23:29:13 +00:00
Aseem Saxena
16253b3d63
Merge branch 'main' into match-testdiff-schema 2026-03-04 04:56:29 +05:30
Sarthak Agarwal
cc32654b7f
mocha prompts in backend (#2468)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-03-04 04:09:10 +05:30
HeshamHM28
44fc7dc8e8
feat: Add support for specifying target Java version in test generation (#2445) 2026-03-03 22:03:29 +00:00
Aseem Saxena
29e91e1c3d
Merge branch 'main' into match-testdiff-schema 2026-03-03 07:28:08 +05:30
Aseem Saxena
94fc60bb13
Merge branch 'main' into fix-js-async-testgen-flaky-tests 2026-03-03 07:27:24 +05:30
Saurabh Misra
e8f1589107
Merge pull request #2429 from codeflash-ai/cf-aws-bedrock-claude-workflows
feat: switch Claude workflows from Foundry to AWS Bedrock
2026-03-02 17:48:04 -08:00
aseembits93
76a81b4381 chore: switch CI Claude model to Sonnet 4.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 06:47:20 +05:30
aseembits93
26e4936659 keep the non foundry env vars 2026-03-03 06:03:06 +05:30
Aseem Saxena
9e5e61e53d
Apply suggestion 2026-03-02 16:27:35 -08:00
Aseem Saxena
cc76543732
Merge branch 'main' into cf-aws-bedrock-claude-workflows 2026-03-03 05:56:37 +05:30
aseembits93
4e60026fcc test 2026-03-03 05:53:34 +05:30
Kevin Turcios
04624cc389 relax 2026-03-02 19:22:55 -05:00
Kevin Turcios
5d2ad27d3f refactor: extract shared create_prompt_env Jinja2 factory
Deduplicate the identical Environment(FileSystemLoader, StrictUndefined,
keep_trailing_newline=True) setup across JS testgen, Python testgen, and
Python explanations into core/shared/jinja_utils.py.

Also fix tests/testgen/test_testgen_javascript.py which had a stale
copy of build_javascript_prompt and loaded the now-deleted .md files.
2026-03-02 18:42:57 -05:00
Kevin Turcios
7820fb15e1 refactor: move ESM/CJS import formatting from Python to Jinja2 macro
Split _generate_import_statement into _resolve_import (pure logic:
identifier validation, dot splitting, reserved words) and a js_import
Jinja2 macro (pure formatting: ESM vs CJS syntax). The macro lives in
_macros.md.j2 and is imported by user.md.j2.
2026-03-02 18:28:30 -05:00
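Illustration (not part of the commit): a small sketch of the logic/formatting split described in 7820fb15e1. In the commit the formatting half is a Jinja2 macro (`js_import`); here it is a plain Python function so the example has no third-party dependency. The signatures, the `ResolvedImport` type, the reserved-word subset, and the choice to reject reserved words outright are all assumptions.

```python
import re
from dataclasses import dataclass

# Simplified JavaScript identifier check (ASCII only).
_IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")
# Small illustrative subset of JavaScript reserved words.
_RESERVED = frozenset({"class", "default", "delete", "function", "import", "new", "return"})


@dataclass(frozen=True)
class ResolvedImport:
    binding: str      # the name that is actually imported
    module_path: str  # the module specifier, e.g. "./calc.js"


def resolve_import(qualified_name: str, module_path: str) -> ResolvedImport:
    """Pure logic: dot splitting, identifier validation, reserved-word check."""
    # "Calculator.add" is reached through the "Calculator" export.
    binding = qualified_name.split(".", 1)[0]
    if not _IDENTIFIER.match(binding):
        raise ValueError(f"not a valid identifier: {binding!r}")
    if binding in _RESERVED:
        raise ValueError(f"reserved word cannot be imported by name: {binding!r}")
    return ResolvedImport(binding=binding, module_path=module_path)


def format_import(resolved: ResolvedImport, module_system: str) -> str:
    """Pure formatting: ESM vs CJS syntax."""
    if module_system == "esm":
        return f"import {{ {resolved.binding} }} from '{resolved.module_path}';"
    return f"const {{ {resolved.binding} }} = require('{resolved.module_path}');"
```

Keeping validation separate from formatting means the syntax choice can live in the template layer while the rules about names stay testable in plain Python.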
Kevin Turcios
d00fa99cc5 feat: convert JS/TS testgen prompts to Jinja2 templates with model_type and ESM support
Replace plain .md prompts rendered with str.format() with Jinja2
templates using {% extends %}, {% block %}, and {% if %} branching:

- model_type branching: XML tags for Anthropic, markdown headers for OpenAI
- module_system support: ESM imports (import { fn } from '...') vs CJS (require)
- Template inheritance: base_system.md.j2 with sync/async overrides
- Unified user.md.j2 with is_async and module_system conditionals
- Add module_system field to TestGenSchema
2026-03-02 18:23:30 -05:00
Kevin Turcios
4cdcd57f04 fix: reduce flaky generated tests for JS async functions
The async testgen prompt was steering the LLM toward generating
timing-dependent and ordering-sensitive tests that produce
non-deterministic results across runs. This caused ~50% E2E failure
rate for the JS ESM async workflow.

- Add determinism requirement: never assert on timing, elapsed
  duration, or relative ordering of async side effects
- Remove directive to use Promise.all() for large-scale tests
- Change large-scale objective from "concurrent operations" to
  "correctness with larger inputs"
- Replace concurrent execution template example with a simple
  large-input correctness test
2026-03-02 17:47:20 -05:00
claude[bot]
962edcc595 fix: correct unpacking of validate_request_data return value
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-03-02 16:01:04 +00:00
claude[bot]
3f49aa1b43 fix: resolve mypy type errors in generate.py 2026-03-02 08:58:37 -05:00
Kevin Turcios
87ab144d40 feat: per-function test review + repair endpoints
Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints.
Review accepts per-test data with pre-flagged behavioral failures, AI
reviews passing functions for unrealistic patterns, returns per-function
verdicts. Repair takes flagged functions, LLM rewrites them,
re-instruments, returns repaired test source. Python-only gate.
2026-03-02 08:54:44 -05:00
Kevin Turcios
9d6799a87f
Merge pull request #2464 from codeflash-ai/testgen-review
refactor: reorganize python testgen directory structure
2026-03-02 13:52:43 +00:00
Kevin Turcios
efa29bf452 refactor: split instrument_new_tests.py into focused modules and extract model selection
Split the 1,734-line instrument_new_tests.py into three modules by concern:
- device_sync.py: GPU/device framework detection and sync AST generation
- wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top
- instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source

Also extract select_model_for_test() from testgen_python() in generate.py to
separate model selection logic from the HTTP handler.
2026-03-02 08:21:02 -05:00
Kevin Turcios
e26dd72d7d refactor: remove duplicate replace_definition_with_import from parse_and_validate_llm_output
The call was redundant — the postprocessing pipeline already handles it as
its final step. Move the test coverage to test_postprocessing_pipeline.py.
2026-03-02 07:58:54 -05:00
Kevin Turcios
0541126fc0 refactor: eliminate BaseTestGenContext class hierarchy
Replace class hierarchy (BaseTestGenContext → Single/Multi) with
standalone functions that branch on is_multi_context() internally.
Delete context.py, move TestGenContextData to models.py, and
distribute logic to validate.py, preprocess_pipeline.py, and
generate.py.
2026-03-02 07:38:51 -05:00
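Illustration (not part of the commit): a sketch of the shape described in 0541126fc0, where a Single/Multi class hierarchy gives way to standalone functions that branch on `is_multi_context()`. Only the names `TestGenContextData` and `is_multi_context` come from the commit message; the fields, the function signature, and `build_prompt_source` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TestGenContextData:
    # Hypothetical fields; the real model's fields are not given in the commit.
    source_code: str
    helper_sources: list[str] = field(default_factory=list)


def is_multi_context(ctx: TestGenContextData) -> bool:
    """Assumed rule: a context is 'multi' when it carries extra helper sources."""
    return len(ctx.helper_sources) > 0


def build_prompt_source(ctx: TestGenContextData) -> str:
    """One function with an internal branch instead of two subclass overrides."""
    if is_multi_context(ctx):
        return "\n\n".join([ctx.source_code, *ctx.helper_sources])
    return ctx.source_code
```

With the data in one plain dataclass, each pipeline stage can own its own branch rather than depending on a shared base class.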
Kevin Turcios
a1c0ac6ae4 refactor: leverage Jinja2 includes, extends, and composition in testgen prompts
Use {% extends %} to deduplicate sync/async system templates via
base_system.md.j2, {% include %} for conditional JIT content, and a
compose_user.md.j2 wrapper to replace Python string assembly in
build_prompt().
2026-03-02 07:26:38 -05:00
Kevin Turcios
f191c12438 refactor: reorganize python testgen directory structure
Move prompts into prompts/ subdirectory with clearer names, rename
testgen.py to generate.py, extract validate.py and demo_hacks.py,
rename testgen_context.py to context.py, delete unused explain prompts.
2026-03-02 06:39:07 -05:00
Sarthak Agarwal
4b88fc0cc7
llm call optimization fail error log and small refactoring (#2447) 2026-03-02 12:33:56 +05:30
Kevin Turcios
28da22be35
Merge pull request #2446 from codeflash-ai/allocs
refactor: tighten optimizer and explanation prompts
2026-02-27 20:39:01 +00:00
claude[bot]
3309dcec2c fix: resolve mypy type errors in explanations.py
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 20:10:19 +00:00
claude[bot]
49e11a585a style: auto-fix formatting issues
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 20:08:28 +00:00
Kevin Turcios
ded2240818
Merge branch 'main' into allocs 2026-02-27 20:06:57 +00:00
Kevin Turcios
779fda2b36 slight changes 2026-02-27 15:02:08 -05:00
Kevin Turcios
1fedb8c443 refactor: rewrite explanation prompts with Jinja2 macros and tighter brevity constraints
- Extract shared content into Jinja2 macros (`section`, `field`,
  `code_field`) that handle Anthropic XML vs OpenAI markdown wrapping,
  eliminating full duplication of every section across both branches
- Tighten system prompt to enforce concise 3-6 sentence output: trim
  bloated per-field context descriptions, add concrete positive example,
  explicitly forbid section headers and bullet groups, move output_format
  to be the last section so constraints are closest to generation
- Add caveat that original_explanation is for factual reference only (in
  both system and user prompts) to prevent the model from mimicking its
  verbose multi-section format
- Condense throughput/concurrency/acceptance sections to essentials
- Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`
2026-02-27 14:38:24 -05:00
Kevin Turcios
396d7cc7e8 refactor: modernize explanation prompts with Jinja2 templates
Extract inline prompts into .md.j2 templates, move schemas to
models.py, and add model_type branching (XML for Anthropic, markdown
for OpenAI) following the testgen pattern. Uses StrictUndefined,
trim_blocks, and lstrip_blocks.
2026-02-27 13:45:21 -05:00
Kevin Turcios
09fafeb914 fix: remove scroll-triggered CSS transitions causing jank in observability timeline
Reduce IntersectionObserver thresholds from 6 to 1, remove backdrop-blur
from sticky header, drop opacity/color/maxHeight transitions that fired
on every activeIndex change, and narrow progress bar to transition-[width].
2026-02-27 13:16:22 -05:00
Aseem Saxena
1a30f3eccb
Merge pull request #2440 from codeflash-ai/fix-middleware-llm-perf
fix: eliminate redundant DB queries in middleware and unblock LLM responses
2026-02-27 23:15:14 +05:30
Kevin Turcios
879a22454f
Merge branch 'main' into fix-middleware-llm-perf 2026-02-27 15:08:05 +00:00