The optimization adds an early-exit check in `calculate_llm_cost` that returns zero immediately when all rate fields (`input_cost`, `cached_input_cost`, `output_cost`) are zero, before extracting token counts via `getattr` calls. Line profiling confirms the hot path: the original spent 70.7% of function time (580 ms) in the final return statement's arithmetic, yet 99.3% of calls (949/956) hit zero-cost models where token extraction was wasted work. The optimized version short-circuits these cases in 1.9 ms total, cutting `calculate_llm_cost` from 821 ms to 29 ms (96.5% reduction). This cascades to `LLMClient.call`, where cost calculation dropped from 50.5% to 4.3% of method time, yielding an 80% throughput gain (6,165 → 11,097 ops/sec) despite a 37% concurrency-ratio regression caused by spending proportionally more time in non-yielding sync code once the async bottleneck was eliminated.
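A minimal sketch of the short-circuit, with hypothetical field and attribute names standing in for the real rate and usage objects:

```python
from dataclasses import dataclass


@dataclass
class ModelRates:
    # Hypothetical rate container; the real model carries per-token prices.
    input_cost: float = 0.0
    cached_input_cost: float = 0.0
    output_cost: float = 0.0


def calculate_llm_cost(rates: ModelRates, usage: object) -> float:
    # Early exit: if every rate is zero the cost is zero regardless of usage,
    # so the getattr-based token extraction is skipped entirely.
    if not (rates.input_cost or rates.cached_input_cost or rates.output_cost):
        return 0.0

    input_tokens = getattr(usage, "input_tokens", 0) or 0
    cached_tokens = getattr(usage, "cached_input_tokens", 0) or 0
    output_tokens = getattr(usage, "output_tokens", 0) or 0
    return (
        input_tokens * rates.input_cost
        + cached_tokens * rates.cached_input_cost
        + output_tokens * rates.output_cost
    )
```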
Replace access to the private `asyncio.Lock._loop` attribute with a check
against `self.client_loop`, which is semantically equivalent and avoids the
unresolved-attribute diagnostic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Issue #11: LLM Client TOCTOU Race Condition**
**Root Cause:**
PR #2575 fixed the client *recreation* race but introduced a Time-Of-Check
to Time-Of-Use (TOCTOU) bug. The lock protects client creation (lines 103-120),
but the actual API call happens *after* the lock is released. This creates a
race window where another thread can close the client between the lock release
and the API call.
**Race Timeline:**
1. Thread A: Acquires lock → checks/creates clients → releases lock
2. Thread A: Calls `await self.call_openai()` (line 155)
3. **Thread B:** Acquires lock → detects loop change → closes clients → recreates
4. Thread A: Uses `self.openai_client.chat.completions.create()` (line 231)
→ **RuntimeError: Cannot send a request, as the client has been closed.**
**Impact:**
- Affects 2/12 optimization runs (16.7% under concurrent load)
- Causes 300-second timeouts when all LLM retries exhaust
- Trace IDs: 146c8968-8264-4755-a852-0bebd4988517, 1dfac2ef-8e74-41da-870c-03b3697badf4
- Error pattern: All LLM calls fail → retry exhaustion → timeout → "NO OPTIMIZATIONS GENERATED"
**Fix:**
1. **Remove client close() calls** (lines 108-121) - Don't close clients when recreating
2. **Capture client references** (lines 123-127) - Snapshot clients after lock for consistency
3. **Pass clients as parameters** - call_openai(client=...) and call_anthropic(client=...)
This prevents the TOCTOU race by:
- Not closing clients that concurrent requests might be using
- Allowing Python's GC to safely clean up clients when no references remain
- Ensuring each request uses a consistent client instance
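A minimal sketch of the snapshot-and-pass pattern, with simplified names (`FakeAsyncClient` stands in for the real SDK clients and most of `LLMClient`'s state is omitted):

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for AsyncAzureOpenAI / AsyncAnthropicBedrock."""

    async def complete(self, prompt: str) -> str:
        return f"response to {prompt!r}"


class LLMClient:
    def __init__(self) -> None:
        self.client_lock = asyncio.Lock()
        self.openai_client: FakeAsyncClient | None = None

    async def call(self, prompt: str) -> str:
        async with self.client_lock:
            if self.openai_client is None:  # or the event loop changed
                # Do NOT close the old client: a concurrent request may still
                # be using it; GC reclaims it once no references remain.
                self.openai_client = FakeAsyncClient()
            # Snapshot the reference while still holding the lock.
            client = self.openai_client

        # The API call runs outside the lock, but against the captured
        # instance, so a later recreation cannot invalidate this request.
        return await self.call_openai(prompt, client=client)

    async def call_openai(self, prompt: str, *, client: FakeAsyncClient) -> str:
        return await client.complete(prompt)
```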
**Tests Added:**
- `test_old_clients_not_closed_on_event_loop_change` - Verifies close() is never called
- Updated existing tests to expect new behavior (no close() calls)
**Verification:**
- All 564 tests pass
- Reproduces the production error before fix
- Fix eliminates the race condition
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Issue**: The `/ai/optimization_review` endpoint was returning 500 errors
when trying to close LLM clients during event loop changes.
**Root Cause**: In `aiservice/llm.py` lines 96-99, the `close()` calls on
OpenAI and Anthropic clients were not wrapped in exception handlers. When
the httpx transport was already closed or in a bad state (e.g., event loop
closure, connection already closed), the exception would propagate and cause
the entire request to fail with a 500 error.
**Fix**: Wrapped both `openai_client.close()` and `anthropic_client.close()`
in try-except blocks that catch and log exceptions at DEBUG level. This
prevents transport errors from crashing requests while still attempting to
clean up resources properly.
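A minimal sketch of the defensive close; the helper name and logger setup are illustrative (the real code wraps the two `close()` calls inline):

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


async def close_client_safely(client: Any, name: str) -> None:
    # Best-effort cleanup: a client whose httpx transport is already closed
    # (or whose event loop is gone) must not turn the request into a 500.
    try:
        await client.close()
    except Exception as exc:  # noqa: BLE001 - deliberately broad
        logger.debug("Ignoring error while closing %s client: %s", name, exc)
```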
**Impact**: Fixes 500 errors on `/ai/optimization_review` and other endpoints
that use the LLM client when event loops change or clients are in bad states.
**Testing**: Added `test_llm_client_close.py` with 2 test cases that verify:
1. Transport errors during close() are handled gracefully
2. Event loop closed errors are handled gracefully
**Traces**: 312d7392, 5bbdf214, a1325051
# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
Co-authored-by: ali <mohammed18200118@gmail.com>
- ruff-format: reformat test file
- fix ty type error: cast mock clients to MagicMock for assert_called_once
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This fixes a critical bug where old AsyncAzureOpenAI and AsyncAnthropicBedrock
clients were not being closed when the event loop changed, causing:
1. Connection pool exhaustion → "couldn't get a connection after 30.00 sec"
2. RuntimeError: Event loop is closed during httpx client cleanup
Root cause:
In LLMClient.call(), when the event loop changed, new clients were created
but old clients were not properly closed, leading to connection leaks.
Fix:
- Added await client.close() for both openai_client and anthropic_client
before creating new instances
- Added comprehensive unit tests to verify proper cleanup
Impact:
- Resolves ~150+ test generation failures (500 errors)
- Fixes event loop closure errors in aiservice logs
Trace IDs affected: 04500fbd-88e0-44e4-8d20-32f6a0dc06cc (and many others)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Summary
- `/api/healthcheck` was returning 401 because `proxy.ts` requires auth
for all `/api/*` routes
- Application Gateway health probe got 401 → marked backend unhealthy →
**502 for all users**
- Adds `/api/healthcheck` to `ignorePaths` so it bypasses auth
- Also removes the erroneously added `middleware.ts` (Next.js 16 uses
`proxy.ts`)
## Test plan
- [ ] `/api/healthcheck` returns 200 without auth
- [ ] Authenticated routes still require login
- [ ] Application Gateway backend health shows Healthy
## Summary
- Auth0 v4 auto-generates middleware that protects all routes when no
`middleware.ts` exists
- This caused `/api/healthcheck` to return 401, making the Application
Gateway mark the backend as unhealthy → **502 for all users**
- Restores explicit middleware with Auth0 v4 API and excludes
`/api/healthcheck` from the matcher
## Test plan
- [ ] `/api/healthcheck` returns 200 without auth
- [ ] Authenticated routes still require login
- [ ] Application Gateway backend health shows Healthy
## Summary
- Convert
`debug_log_sensitive_data(f"...{response.model_dump_json(indent=2)}")`
to `debug_log_sensitive_data_from_callable(lambda: ...)` across 8
endpoint files
- In production, `debug_log_sensitive_data` is a no-op but the f-string
interpolation (including `model_dump_json(indent=2)`) was always
evaluated — serializing the full LLM response to JSON on every call
- The `_from_callable` variant only invokes the lambda when debug
logging is active (non-production)
- **Fix pre-existing bug**: `log_response()` closures in 4 endpoint
files returned `None` instead of a string, causing
`debug_log_sensitive_data_from_callable` to log `None`. Now they return
the concatenated log string as expected by the callable-based API.
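A minimal sketch of the deferred-evaluation pattern; the flag and function bodies are simplified stand-ins for the real helpers:

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)
DEBUG_LOGGING_ENABLED = False  # True outside production in the real service


def debug_log_sensitive_data_from_callable(build_message: Callable[[], str]) -> None:
    # The lambda is only invoked (and the LLM response only serialized)
    # when debug logging is actually active.
    if DEBUG_LOGGING_ENABLED:
        logger.debug(build_message())


# Before: the f-string (and model_dump_json) was evaluated on every call.
# debug_log_sensitive_data(f"LLM response: {response.model_dump_json(indent=2)}")
#
# After: serialization is deferred behind a callable, and the closure must
# return the full log string rather than None.
# debug_log_sensitive_data_from_callable(
#     lambda: f"LLM response: {response.model_dump_json(indent=2)}"
# )
```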
Affected endpoints: Python optimizer, line profiler, jit_rewrite, Java
optimizer, Java line profiler, JS/TS optimizer, JS/TS line profiler,
testgen.
## Test plan
- [x] All 558 unit tests pass
- [x] mypy clean
- [x] ruff clean
- [ ] Verify debug logging still works in non-production environments
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
## Summary
- Replace Pydantic frozen dataclass with stdlib
`@dataclass(frozen=True)` for `CodeExplanationAndID` and
`CodeAndExplanation`, removing `field_validator` that ran `.code` +
`compile()` ~280 times per pipeline run
- Pre-compute `original_module.code` once and pass to pipeline steps
(`clean_extraneous_comments`, `equality_check`) that previously called
it independently
- Replace `ast.dump(annotate_fields=False)` with `ast.unparse` in
`deduplicate_optimizations` (70% faster)
- Skip re-parse in `dedup_and_sort_imports` when isort returns unchanged
code
- Cache comment-stripped original code across candidates in
`clean_extraneous_comments`
**Pipeline median per-run: ~1.5s → 184ms** (4 candidates, controlled
measurement). Saves ~4-5s of CPU per optimization request in production.
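A minimal sketch of the dataclass swap (field names are illustrative):

```python
from dataclasses import dataclass


# Before: Pydantic frozen models whose field_validator re-ran `.code` +
# `compile()` on every construction (~280 times per pipeline run).
# After: plain stdlib frozen dataclasses with no per-instance validation;
# the code is assumed to have been validated once upstream.
@dataclass(frozen=True)
class CodeAndExplanation:
    code: str
    explanation: str


@dataclass(frozen=True)
class CodeExplanationAndID:
    code: str
    explanation: str
    optimization_id: str
```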
## Test plan
- [x] All 558 unit tests pass
- [x] mypy clean
- [x] ruff clean (no new warnings)
- [ ] Verify optimizer endpoints return correct results in staging
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
## Summary
- Move `safe_log_features()` and `update_optimization_cost()` out of
blocking `TaskGroup`s into fire-and-forget background tasks across 4
optimization endpoints (optimizer, optimizer_line_profiler, jit_rewrite,
adaptive_optimizer)
- These DB writes are analytics-only and don't affect response bodies —
waiting for them adds 100-300ms per request unnecessarily
- Add `aiservice/background.py` with `fire_and_forget()` helper using
the same `set` + `add_done_callback` pattern already used in `LLMClient`
- `get_or_create_optimization_event()` remains awaited where the
response needs `event.id`
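A minimal sketch of the helper, following the `set` + `add_done_callback` pattern the summary describes:

```python
import asyncio
from collections.abc import Coroutine
from typing import Any

# Strong references keep the tasks alive until completion; asyncio itself only
# holds weak references to running tasks.
_background_tasks: set[asyncio.Task[Any]] = set()


def fire_and_forget(coro: Coroutine[Any, Any, Any]) -> asyncio.Task[Any]:
    """Schedule an analytics-only coroutine without awaiting its result."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task
```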
## Test plan
- [x] All 550 tests pass locally
- [ ] Verify response latency improvement in production metrics after
deploy
- [ ] Confirm `safe_log_features` and `update_optimization_cost` still
complete successfully in background (check DB records)
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Vitest tests were failing with "Cannot find module" errors because
`vi.mock()` calls retained `.js` extensions while imports had them
stripped, causing mock/import path mismatch in ESM mode.
## Root Cause
The `strip_js_extensions()` function in `testgen.py` only handled
`jest.mock()` but not `vi.mock()`, which is used by Vitest. The pattern
`_JEST_MOCK_EXTENSION_PATTERN` matched Jest mocking functions but not
Vitest's `vi.*` equivalents.
## Fix
Added `_VITEST_MOCK_EXTENSION_PATTERN` regex to match and strip
extensions from:
- `vi.mock()`
- `vi.doMock()`
- `vi.unmock()`
- `vi.requireActual()`
- `vi.requireMock()`
- `vi.importActual()`
- `vi.importMock()`
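A minimal sketch of the kind of pattern involved; this regex is a simplified stand-in, not the exact `_VITEST_MOCK_EXTENSION_PATTERN` in `testgen.py`:

```python
import re

# Matches vi.mock('../x.js'), vi.doMock("./y.jsx"), etc. and drops the extension.
_VITEST_MOCK_EXTENSION_PATTERN = re.compile(
    r"(vi\.(?:mock|doMock|unmock|requireActual|requireMock|importActual|importMock)"
    r"\(\s*['\"])([^'\"]+?)\.(?:js|jsx|mjs|cjs)(['\"])"
)


def strip_vitest_mock_extensions(test_source: str) -> str:
    # Example: vi.mock('../config/paths.js') -> vi.mock('../config/paths')
    return _VITEST_MOCK_EXTENSION_PATTERN.sub(r"\1\2\3", test_source)
```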
## Affected Trace IDs
- `0fe99c9f-b348-4f0a-b051-0ea9455231ba`
- `127cdaec-a343-4918-a86a-b646dd4d79cf`
- `2b6c896e-20d7-4505-8bf4-e4a2f20b37fc`
These trace IDs exhibited the bug where generated tests had
`vi.mock('../config/paths.js')` but imports had `from
'../config/paths'`, causing module resolution failures.
## Test Coverage
- Added 8 new tests in `TestStripJsExtensions` class
- All 31 tests in `test_testgen_javascript.py` pass
- Specific regression test for vi.mock() extension stripping
- Tests cover all vi.mock variants and edge cases
## Files Changed
- `django/aiservice/core/languages/js_ts/testgen.py` (fix)
- `django/aiservice/tests/testgen/test_testgen_javascript.py` (tests)
---------
Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
## Summary
Fixes 500 Internal Server Error when replaying test generation with
`--rerun` flag and database arrays contain `None`/`NULL` values.
## Root Cause
The `rerun_testgen()` function in `core/shared/replay.py` accessed array
elements without checking if they were `None`. When PostgreSQL arrays
contained `NULL` values (e.g., `generated_test = [NULL, 'test2']`), the
function returned a `TestGenResponseSchema` with `None` values, causing
Pydantic validation to fail:
```
pydantic_core._pydantic_core.ValidationError: 2 validation errors for TestGenResponseSchema
generated_tests
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
instrumented_behavior_tests
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
```
## Changes
Added explicit `None` checks before creating `TestGenResponseSchema`:
- If `generated_test[index]` or `instrumented_generated_test[index]` is
`None`, return `None` (skip this test)
- If `instrumented_perf_test[index]` is `None`, default to empty string
(non-critical field)
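A minimal sketch of the guard, using the field names from the validation error above and a plain dict in place of `TestGenResponseSchema` (the perf-test key name is a guess):

```python
from typing import Optional


def build_testgen_response(
    generated_test: list[Optional[str]],
    instrumented_generated_test: list[Optional[str]],
    instrumented_perf_test: list[Optional[str]],
    index: int,
) -> Optional[dict]:
    # PostgreSQL arrays can contain NULLs; skip entries whose required fields
    # are missing instead of letting Pydantic validation fail with a 500.
    if generated_test[index] is None or instrumented_generated_test[index] is None:
        return None
    return {
        "generated_tests": generated_test[index],
        "instrumented_behavior_tests": instrumented_generated_test[index],
        # Non-critical field: default to an empty string when NULL.
        "instrumented_perf_tests": instrumented_perf_test[index] or "",
    }
```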
## Impact
Resolves **10+ replay failures** where test generation produced partial
results stored as `NULL` in database arrays.
## Test Coverage
Added comprehensive test suite for `replay.py`:
- `test_rerun_with_valid_test_data()` - Happy path
- `test_rerun_with_none_values_in_arrays()` - **Primary bug fix test**
- `test_rerun_with_index_out_of_bounds()` - Boundary conditions
- `test_rerun_with_empty_arrays()` - Empty data handling
- `test_rerun_with_none_arrays()` - NULL arrays
- `test_rerun_with_mismatched_array_lengths()` - Length mismatches
- `test_rerun_missing_perf_test()` - Missing perf data
All 7 tests pass.
## Trace IDs
This fix addresses errors seen in traces:
- Primary: `056561cc-94af-4d7b-ac79-85dfd4b7282d`
- And 9 additional trace IDs with the same "500 - Error generating
JavaScript tests" error
## Verification
Tested with original failing trace:
```bash
cd /workspace/target && codeflash --file src/daemon/constants.ts --function formatGatewayServiceDescription --rerun 056561cc-94af-4d7b-ac79-85dfd4b7282d
```
**Before fix:** `ERROR: 500 - Traceback... ValidationError: Input should
be a valid string [type=string_type, input_value=None]`
**After fix:** Gracefully skips None entries, no 500 error ✅
---------
Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- **Memory leak fix**: Added explicit `LOGGING` config in `settings.py`
to prevent unbounded `LogRecord` buffering. Django's `django.request`
logger creates WARNING records for 4xx responses with the full
`ASGIRequest` (headers, body, payload) pinned in `args`. Without
explicit config, Django's default handlers and Sentry's
`enable_logs=True` buffer these indefinitely. Setting `django.request`
to ERROR level + removing `enable_logs=True` eliminated the leak — load
testing showed **84% reduction** in per-request memory growth (7.4 → 1.2
KiB/req).
- **Async event loop fix**: Wrapped
`parse_and_generate_candidate_schema()` in `asyncio.to_thread()` across
all 4 async callers (optimizer, optimizer_line_profiler, jit_rewrite,
adaptive_optimizer). This offloads the synchronous libcst parsing +
8-stage postprocessing pipeline to the thread pool, preventing it from
blocking the event loop during peak traffic.
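A minimal sketch of both changes; the handler names and `LOGGING` handler layout are illustrative, not the exact production config:

```python
# settings.py: explicit logging config so django.request 4xx WARNING records
# (which pin the full ASGIRequest in record.args) are dropped at ERROR level
# instead of being buffered by default or Sentry handlers.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {
        "django.request": {
            "handlers": ["console"],
            "level": "ERROR",
            "propagate": False,
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
```

And the event-loop fix, with a stand-in for the synchronous pipeline:

```python
import asyncio


def parse_and_generate_candidate_schema(raw_response: str) -> list[str]:
    # Stand-in for the real synchronous libcst parsing + 8-stage postprocessing.
    return raw_response.splitlines()


async def handle_candidate(raw_response: str) -> list[str]:
    # Offload the sync pipeline to the default thread pool so it no longer
    # blocks the event loop while other requests are in flight.
    return await asyncio.to_thread(parse_and_generate_candidate_schema, raw_response)
```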
## Test plan
- [x] All 550 tests pass (`uv run pytest tests/ --ignore=tests/profiling
-x -q`)
- [ ] Monitor Azure memory alerts after deploy — expect significant
reduction in memory growth rate
- [ ] Monitor 5xx error rate during peak traffic — expect reduction from
event loop no longer blocked by sync postprocessing
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- **`thread_sensitive=False`** on `sync_to_async` so concurrent
`log_features` calls get their own threads instead of serializing
through one (was `True`, causing a bottleneck)
- **Raised DB pool `max_size` from 10 to 100** — prod Postgres allows
859 connections, giving plenty of headroom
- **Added `safe_log_features` wrapper** that catches errors via Sentry
instead of propagating — used at all 9 TaskGroup and bare-await call
sites so a logging failure can't crash an otherwise successful
optimization endpoint
- **Kept `transaction.atomic` + `select_for_update`** for correctness
(Django doesn't support async transactions yet, and removing these
causes lost-update races on dict-merge fields)
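A minimal sketch of the wrapper; `sentry_sdk.capture_exception` is the actual Sentry API, while the `log_features` body and signature are simplified here:

```python
import sentry_sdk
from asgiref.sync import sync_to_async
from django.db import transaction


@sync_to_async(thread_sensitive=False)  # concurrent calls get their own worker threads
@transaction.atomic
def log_features(**fields) -> None:
    """Placeholder for the real select_for_update + dict-merge write."""


async def safe_log_features(**fields) -> None:
    # Analytics logging must never crash an otherwise successful endpoint:
    # report the failure to Sentry and carry on.
    try:
        await log_features(**fields)
    except Exception as exc:  # noqa: BLE001 - deliberately broad
        sentry_sdk.capture_exception(exc)
```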
## Root cause
`log_features` uses `@sync_to_async` + `@transaction.atomic` because
Django lacks async transaction support. The previous fix for pool
exhaustion changed `thread_sensitive=False` to `True`, which serialized
all calls through a single thread — fixing pool exhaustion but creating
a throughput bottleneck that caused 500s under load. Additionally, 6
call sites used `asyncio.TaskGroup` where any `log_features` exception
would propagate and crash the entire endpoint.
## Test plan
- [x] `tests/log_features/test_log_features_concurrency.py` — verifies
`thread_sensitive=False` and `safe_log_features` is async
- [x] `ruff check` passes on all changed files
- [ ] Deploy to staging and verify no 500s under concurrent optimization
requests
The optimization hoisted the 70-element `reserved_words` set out of `_is_valid_js_identifier` into a module-level `frozenset`, eliminating 1677 repeated set constructions that consumed 1.79 ms per profiler (42% of that function's time). More significantly, `_detect_export_style` previously compiled six regex patterns on every invocation via f-string interpolation with `escaped_id`; the optimized version pre-compiles generic patterns once at module load and uses `finditer` plus manual identifier comparison, cutting the function's runtime from 3.17 s to 14.7 ms across 1146 calls—a 99.5% reduction that accounts for nearly all of the 10× speedup. Test annotations confirm the largest gains occur in the `test_large_scale_many_class_methods_with_alternating_export_styles` case (107 ms → 4.66 ms), where repeated export detection dominated.
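A minimal sketch of the hoisting pattern (the identifier list is abbreviated and the patterns are illustrative):

```python
import re

# Module-level frozenset: built once at import instead of rebuilding a
# 70-element set on every _is_valid_js_identifier call.
_JS_RESERVED_WORDS = frozenset({
    "break", "case", "class", "const", "default", "delete", "export",
    "function", "import", "new", "return", "typeof", "var", "while",
    # ... remaining reserved words elided
})

_IDENTIFIER_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def _is_valid_js_identifier(name: str) -> bool:
    return bool(_IDENTIFIER_RE.match(name)) and name not in _JS_RESERVED_WORDS

# Likewise, _detect_export_style can pre-compile generic export patterns once
# (e.g. re.compile(r"export\s+(default\s+)?class\s+(\w+)")) and compare the
# captured identifier in Python, rather than interpolating escaped_id into six
# f-string patterns and recompiling them on every call.
```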
When generating tests for class methods (e.g., ModulesContainer.getById),
the test generator was incorrectly assuming default import style, generating:
import ModulesContainer from '...'
This caused "Cannot find module" errors when:
1. The class was not exported at all
2. The class used named export (export class X) instead of default export
This fix:
- Adds _detect_export_style() to parse source code and detect actual export style
- Modifies _resolve_import() to use detected export style:
- 'export default class X' → default import
- 'export class X' → named import
- No export → named import (test will fail, surfacing the issue)
- Adds comprehensive unit tests for all scenarios
Affected traces: 12332328-80e8-4bde-bdd6-c76ac373675a, 73ccd4c6-a4f7-467a-8356-5199e9d9b877, 989dcbda-bc27-40b7-aed0-0ab51fd00e6d, and others with ERR_MODULE_NOT_FOUND
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Summary
Parallelize independent DB writes at the end of 4 endpoints using
`asyncio.TaskGroup`. With psycopg3 connection pooling (#2489), each task
gets its own connection from the pool.
### Endpoints optimized
| Endpoint | Before | After |
|----------|--------|-------|
| **Refinement** | `log_features` then `update_optimization_cost` | `TaskGroup` (concurrent) |
| **Explanations** | `update_optimization_cost` inside inner fn | Moved to handler, `TaskGroup` with `log_features` |
| **Optimization review** | `update_optimization_cost` inside inner fn | Moved to handler, `TaskGroup` with `update_optimization_features_review` |
| **Ranker** | `update_optimization_cost` inside inner fn | Moved to handler, `TaskGroup` with `log_features` |
Each endpoint saves ~87ms (one DB round-trip) by overlapping two
independent writes.
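A minimal sketch of the epilogue pattern, with placeholder coroutines standing in for the real DB writers:

```python
import asyncio


async def log_features(**fields) -> None:
    ...  # placeholder for the real analytics write


async def update_optimization_cost(trace_id: str, cost: float) -> None:
    ...  # placeholder for the real cost write


async def endpoint_epilogue(trace_id: str, cost: float, **features) -> None:
    # Both writes are independent; with psycopg3 pooling each task checks out
    # its own connection, so the two round-trips overlap instead of serializing.
    async with asyncio.TaskGroup() as tg:
        tg.create_task(log_features(**features))
        tg.create_task(update_optimization_cost(trace_id, cost))
```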
### Comprehensive audit
All 13 endpoints were audited — no remaining async antipatterns found:
- No blocking calls in async paths
- No `await`-in-loop patterns
- LLM clients already use connection reuse
- All other endpoints have at most 1 DB write in the epilogue
## Test plan
- [x] All 538 tests passing
- [ ] Verify under load in staging
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
## Summary
- Normalize quote style to double quotes for YAML consistency
- Remove redundant `jest-junit` runtime install step (already in
devDependencies)
- Simplify codeflash CLI flags: `--all --verbose --yes` → `--yes`
## Test plan
- [ ] Verify workflow runs successfully on a test PR touching
`js/cf-api/` or `js/cf-webapp/`
- [ ] Confirm `npm ci` installs jest-junit from package-lock without the
extra install step
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
## Summary
The TypeScript validator was rejecting valid JSX/TSX syntax, causing
optimization runs to fail on React components with JSX.
## Problem
The validator was using `tree_sitter_typescript.language_typescript()`
which doesn't parse JSX syntax. This caused validation failures for
`.tsx` files containing JSX elements like:
- `<div className={...} />`
- `{...rest}` (spread props)
- Any JSX tags
## Solution
Changed to use `tree_sitter_typescript.language_tsx()` instead. Since
TSX is a superset of TypeScript, this supports both:
- Plain TypeScript code
- TypeScript with JSX (TSX)
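A minimal sketch of the validator change, assuming py-tree-sitter 0.22+ style bindings (exact parser construction varies by version):

```python
import tree_sitter_typescript
from tree_sitter import Language, Parser

# TSX is a superset of TypeScript, so a single grammar validates both plain
# TypeScript and React components containing JSX.
TSX_LANGUAGE = Language(tree_sitter_typescript.language_tsx())

_parser = Parser()
_parser.language = TSX_LANGUAGE


def is_parseable_tsx(source: str) -> bool:
    tree = _parser.parse(source.encode("utf8"))
    return not tree.root_node.has_error
```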
## Testing
Added three new test cases:
- `test_tsx_simple_jsx` - Tests basic JSX elements
- `test_tsx_nested_jsx` - Tests nested JSX
- `test_tsx_with_props_spread` - Tests spread props in JSX
All existing tests continue to pass.
## Impact
This fixes validation errors for all React/JSX components. Affected
trace IDs from logs:
- 5bedfbb7-ccc0-4fdd-b208-60b8b860750c
- 39892d42-774f-4921-80fc-2ee42ff8ae1c
- 80b818b6-e784-4ff8-abda-c3ce6b25422f
- 9b76943e-1a93-45fa-84b9-aae7d6305f79
- d1bac014-d622-4772-90ea-0f9ff88e32dd
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Bug: PostgreSQL connection pool timeout (30 seconds)
Root cause: log_features uses @sync_to_async(thread_sensitive=False), causing
each call to grab a separate database connection from the pool. When multiple
optimization requests run concurrently, the pool (max_size=10) exhausts.
Error seen: psycopg_pool.PoolTimeout: couldn't get a connection after 30.00 sec
Fix: Change thread_sensitive=False to thread_sensitive=True. This ensures Django
properly reuses connections across async/sync boundaries instead of allocating
a new connection for each call.
Affected trace IDs from logs:
- a0d8dab6-6524-47dc-9c82-5fa92e6390fb
- 62f5c35b-7161-4ab0-958a-4865231f5188
- ddc0e882-f914-49e4-a2ac-2d5f19a17507
- eaeb0cbe-6474-4808-9092-42f837dd52cf
Testing:
- Added test_log_features_concurrency.py to verify thread_sensitive=True
- Verified reproduction script now passes without pool exhaustion
- All existing tests pass
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Summary
- Adds `rerun_trace_id` field to all request schemas (`OptimizeSchema`,
`OptimizeSchemaLP`, `TestGenSchema`, `RefinementRequestSchema`,
`CodeRepairRequestSchema`)
- Creates `core/shared/replay.py` with shared rerun logic that queries
`optimization_features` and returns stored results
- Adds early-return short-circuit to `/optimize`,
`/optimize-line-profiler`, `/testgen`, `/refinement`, `/code_repair` —
bypasses LLM calls when `rerun_trace_id` is provided
- Filters results by `optimizations_origin.source` (OPTIMIZE,
OPTIMIZE_LP, REFINE, REPAIR) and matches by parent optimization ID for
refinement/repair
## Test plan
- [ ] Run optimization normally to populate `optimization_features` with
a trace_id
- [ ] Rerun with `codeflash --rerun <trace_id>` against local server
- [ ] Verify each endpoint returns stored results without LLM calls
- [ ] Verify backward compatibility — requests without `rerun_trace_id`
behave unchanged
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
## Summary
- The `/testgen_repair` endpoint was missed when `language_version`
support was added in #2488
- Clients that stopped sending `python_version`
(codeflash-ai/codeflash#1914) hit `400 - Python version is required`
- Adds `language_version` field and `resolve_python_version` validator
to `TestRepairSchema`, matching the pattern in
`OptimizeSchema`/`TestGenSchema`
- Replaces `python_version=data.python_version` with
`language_version=data.language_version` when constructing
`TestGenSchema` in the repair handler
## Test plan
- [ ] Deploy and verify testgen repair calls no longer return 400
- [ ] Verify old clients sending `python_version` still work (backward
compat via validator)
## Summary
- Fix regexes in `buildBenchmarkInfo` that strip empty improved/degraded
benchmark sections from PR comments
- The regexes didn't match the actual template text: wrong heading level
(`####` vs `###`), wrong emoji (`📉` vs `⚡️`), and wrong wording —
causing `{benchmark_info_degraded}` to render literally when there are
no degraded benchmarks
- Add unit tests for improved-only, degraded-only, and empty benchmark
scenarios
## Test plan
- [x] All 33 tests in `pr-changes-utils.test.ts` pass
- [x] New tests verify template placeholders are fully removed when a
section is empty