# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test Cases Added**: All relevant test cases have been added or updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry Error Capture**: Error handling ensures Sentry can capture and report errors
- [ ] **No Dev/Prisma Logging**: No development-only or Prisma-specific logging remains in production code
## Configuration
- [ ] **New Environment Variables**: Any new environment variables are documented in the `.env.example` file or mentioned in the PR description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
## Summary
- The `/testgen` response's `generated_tests` field contained the
assert-removed version with `codeflash_output` assignments
- When the CLI's testgen review fell back to this field (instead of
`raw_generated_tests`), the review LLM flagged every test as a "no-op
assignment"
- Now returns the display version (asserts kept, no instrumentation) as
`generated_tests`, matching what the repair endpoint already does
- Also applies isort to the display source for consistency
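
A minimal sketch of the two shapes, using a hypothetical function under test named `sorter` (the real instrumentation differs; this only illustrates the `codeflash_output` pattern the review LLM flagged):

```python
def sorter(xs):  # hypothetical stand-in for the function under test
    return sorted(xs)

# Instrumented form (what `generated_tests` used to contain): the assert
# is stripped and the result captured in `codeflash_output`, which reads
# to a reviewer as a no-op assignment.
def test_sorter_instrumented():
    codeflash_output = sorter([3, 1, 2])

# Display form (what `generated_tests` contains now): asserts kept, no
# instrumentation.
def test_sorter_display():
    assert sorter([3, 1, 2]) == [1, 2, 3]
```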
## Summary
- Use greedy code extraction and retry on syntax errors in testgen
repair
- Fix broken `asyncio.to_thread(log_features)` in Java optimizer —
`log_features` is `@sync_to_async` so calling it via `to_thread` created
an unawaited coroutine (`RuntimeWarning: coroutine
'SyncToAsync.__call__' was never awaited`) and silently skipped logging.
Replaced with `await log_features(...)` using correct keyword arguments.
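
A minimal reproduction of the bug and the fix, assuming `log_features` is wrapped with asgiref's `sync_to_async` (the body and keyword arguments here are illustrative):

```python
import asyncio
from asgiref.sync import sync_to_async

@sync_to_async
def log_features(*, language: str) -> None:
    print(f"logging features for {language}")  # stand-in for the DB write

async def broken() -> None:
    # Bug: the decorator already made log_features async, so the worker
    # thread merely *creates* a coroutine that nothing awaits ->
    # "RuntimeWarning: coroutine 'SyncToAsync.__call__' was never
    # awaited", and the logging silently never runs.
    await asyncio.to_thread(log_features, language="java")

async def fixed() -> None:
    # Fix: await the wrapper directly; asgiref dispatches the sync body
    # to a thread pool for us.
    await log_features(language="java")

asyncio.run(fixed())
```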
## Test plan
- [ ] Verify testgen repair handles syntax errors with retry
- [ ] Verify Java optimization requests no longer emit `SyncToAsync`
RuntimeWarning
- [ ] Verify Java optimization features are correctly logged to DB
## Summary
- Switch `extract_code_block_with_context` (non-greedy `.*?`) →
`extract_code_block` (greedy `.*`) for repair code extraction — the
non-greedy regex matched the first closing fence, truncating code when
the LLM included explanatory snippets before the full file (root cause
of 82% of repair failures)
- Add `ast.parse` validation before CST parsing for fast syntax checking
- Retry the LLM once with the specific syntax error appended to the
conversation when validation fails
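
A sketch of both behaviors; the regexes and the `retry_llm` callable below are illustrative, not the exact code in `extract_code_block`:

```python
import ast
import re

reply = (
    "A quick helper first:\n"
    "```python\nHELPER = 1\n```\n"
    "Now the full repaired file:\n"
    "```python\nHELPER = 1\nTESTS = [HELPER]\n```\n"
)

# Non-greedy `.*?` stops at the FIRST closing fence, so only the
# explanatory snippet is captured and the rest of the file is dropped.
truncated = re.search(r"```python\n(.*?)\n```", reply, re.DOTALL).group(1)
assert truncated == "HELPER = 1"

# Greedy `.*` runs to the LAST closing fence instead. It can over-capture
# prose between blocks, which a fast syntax gate catches downstream.
extracted = re.search(r"```python\n(.*)\n```", reply, re.DOTALL).group(1)

def validate_or_retry(source: str, retry_llm) -> str:
    """`retry_llm` is a hypothetical callable that re-asks the model with
    the specific syntax error appended to the conversation."""
    try:
        ast.parse(source)  # cheap syntax check before the slower CST parse
        return source
    except SyntaxError as err:
        retried = retry_llm(f"SyntaxError at line {err.lineno}: {err.msg}")
        ast.parse(retried)  # still invalid after one retry -> raise
        return retried
```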
## Test plan
- [x] Existing tests pass
- [ ] Run end-to-end optimization to verify repairs succeed
## Summary
- Add multiline string literal constraint to testgen and repair prompts
— LLM was consistently generating unterminated string literals by
splitting strings across lines without triple quotes
- Deduplicate anthropic/markdown branches in testgen prompt templates —
single flow with inline `{% if is_xml %}` wrappers instead of duplicated
content
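
For reference, the failure mode the new constraint targets (a hedged sketch; actual LLM output varies):

```python
import ast

# What the LLM kept producing: a string split across lines without
# triple quotes, which Python rejects.
bad = 'msg = "expected\noutput"'
try:
    ast.parse(bad)
except SyntaxError as err:
    print(err.msg)  # e.g. "unterminated string literal (detected at line 1)"

# What the added prompt constraint asks for instead.
good = 'msg = """expected\noutput"""'
ast.parse(good)  # parses cleanly
```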
## Test plan
- [x] Verified templates render correctly for both anthropic and openai
model types (sync and async)
- [x] All block overrides from child templates work with the unified
block names
## Summary
- Pass coverage details (unexecuted lines, threshold) to review and
repair prompts so the LLM can identify low-coverage tests
- Accept previous repair errors in the repair endpoint and include them
in the prompt for retry cycles
- Parallelize per-test review LLM calls with `asyncio.TaskGroup`
- Conditionally include codeflash env var context
(`CODEFLASH_TRACER_DISABLE`, etc.) in repair prompts when the function
under test references them
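
A minimal sketch of the parallelization, with a hypothetical `review_test` standing in for one per-test review LLM call:

```python
import asyncio

async def review_test(test_name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for one review LLM call
    return f"{test_name}: ok"

async def review_all(test_names: list[str]) -> list[str]:
    # TaskGroup (Python 3.11+) runs the per-test reviews concurrently
    # and cancels the remaining tasks if any one of them raises.
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(review_test(name)) for name in test_names]
    return [task.result() for task in tasks]

print(asyncio.run(review_all(["test_a", "test_b", "test_c"])))
```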
## Test plan
- [x] Tested locally with codeflash CLI against `Tracer.__enter__` —
review, repair, and retry cycles all work
- [x] Coverage details and previous errors appear correctly in prompts
- [x] Review parallelization reduces latency: per-test reviews (~60s each) now run concurrently instead of sequentially
## Summary
- Switch testgen repair endpoint from `EXECUTE_MODEL` (GPT-5-Mini) to
`HAIKU_MODEL` (Haiku 4.5)
- Matches the review endpoint which already uses Haiku
- Repair is a structured task (splice functions, fix assertions) that
doesn't need a frontier model
- Should reduce latency (was timing out at 90s in CI) and cost
## Summary
Accept `coverage_summary` in the review schema and pass it to the prompt. Add two new review criteria: low-coverage detection and constructor/dependency error patterns. The coverage percentage is shown in the user prompt so the reviewer can flag tests that don't exercise the function.
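
A hedged sketch of the schema addition; the field and class names below are illustrative, not the repo's actual schema:

```python
from pydantic import BaseModel

class CoverageSummary(BaseModel):
    coverage_percentage: float   # surfaced in the user prompt
    unexecuted_lines: list[int]  # lines the tests never reached

class ReviewRequest(BaseModel):
    generated_tests: str
    coverage_summary: CoverageSummary | None = None
```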
## Summary
Include runtime error messages from behavioral test failures in the review request. Failed function verdicts now include the specific error message, and the review prompt shows those details so the AI can spot patterns like type validation failures.
## Summary
Instead of replacing the entire test file with the LLM's output, parse both the original and repaired sources as CSTs, extract only the flagged function nodes from the repair output, and surgically splice them into the original. Unflagged functions are preserved exactly as-is.
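
A minimal sketch of the splice, assuming libcst is the CST library (function and variable names here are illustrative):

```python
import libcst as cst

def top_level_functions(source: str) -> dict[str, cst.FunctionDef]:
    module = cst.parse_module(source)
    return {
        node.name.value: node
        for node in module.body
        if isinstance(node, cst.FunctionDef)
    }

class SpliceFlagged(cst.CSTTransformer):
    def __init__(self, repaired: dict[str, cst.FunctionDef]) -> None:
        super().__init__()
        self.repaired = repaired

    def leave_FunctionDef(
        self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
    ) -> cst.FunctionDef:
        # Swap in the repaired node only for flagged functions; every
        # other function passes through with its formatting untouched.
        return self.repaired.get(original_node.name.value, updated_node)

def splice(original_src: str, repaired_src: str, flagged: set[str]) -> str:
    repaired = {
        name: node
        for name, node in top_level_functions(repaired_src).items()
        if name in flagged
    }
    return cst.parse_module(original_src).visit(SpliceFlagged(repaired)).code
```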
## Summary
Repaired tests from the LLM now go through the same postprocessing pipeline as initial generation (import fixing, loop limiting, unused definition removal) before instrumentation. The endpoint returns the display version (with asserts) as `generated_tests` for client-side display.
## Summary
Split `postprocessing_testgen_pipeline` to capture the test source before assert removal: fully cleaned (imports, loops, definitions) but with original asserts intact. Return it as `raw_generated_tests` in the `TestGenResponseSchema` so the CLI can display the human-readable version.
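
A hedged sketch of the split, with identity stand-ins for the real pipeline stages (names are illustrative):

```python
def fix_imports(src: str) -> str:
    return src  # stand-in for the real import-fixing stage

def limit_loops(src: str) -> str:
    return src  # stand-in for the loop-limiting stage

def remove_unused_definitions(src: str) -> str:
    return src  # stand-in for the unused-definition removal stage

def remove_asserts(src: str) -> str:
    return src  # stand-in for assert removal before instrumentation

def postprocessing_testgen_pipeline(source: str) -> tuple[str, str]:
    # Shared cleanup runs first for both output variants.
    cleaned = remove_unused_definitions(limit_loops(fix_imports(source)))
    # Captured BEFORE assert removal: fully cleaned, asserts intact.
    raw_generated_tests = cleaned
    # The instrumentation path continues with assert removal.
    instrumented = remove_asserts(cleaned)
    return raw_generated_tests, instrumented
```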