Commit graph

6350 commits

Author SHA1 Message Date
Kevin Turcios
12c6113f7e Update context_helpers.py 2026-03-22 03:56:26 -05:00
Kevin Turcios
387c909c9e
fix codeflash optimizing python backend (#2483)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:50:30 -05:00
Kevin Turcios
28c9acc877
refactor: aiservice deep dive — LLM client, dedup, async, cleanup (#2482)
## Summary

Comprehensive refactoring of the aiservice Django backend focusing on
code quality, deduplication, and correctness:

- **LLM client extraction**: Extract `LLMClient` class with lazy client
init, centralized error handling, and event loop detection
- **Centralize retry logic**: `@stamina.retry` on
`call_anthropic`/`call_openai` for transient errors (rate limits,
timeouts, 500s), removing scattered retry decorators from testgen files
- **Deduplicate helpers**: Consolidate `extract_code_and_explanation`
into shared `context_helpers.py`, unify `normalize_*_code` into
`normalize_c_style_code`
- **Eliminate double DB queries**: Auth middleware fetches with `afirst()` and then
updates by PK via `aupdate()`; middleware caches org/subscription
- **Parallelize Java optimizer**: Use `asyncio.TaskGroup` for
independent LLM calls
- **Lazy logging**: Convert all f-string logging to lazy `%s` formatting
across 11 files
- **Cleanup**: Remove unused `PipelineError`/`ValidationError`, fix
`seach_and_replace.py` typo, replace `print()` with `logging.debug()` in
middleware
- **Sentry**: Reduce sampling 1.0 → 0.1/0.01, fix auth `settings.DEBUG`
check, sanitize ranker errors
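
The following is a minimal illustration of why the lazy `%s` logging change matters. It uses only the standard `logging` module. `CountingRepr` and the logger name are hypothetical. When the logger is set above DEBUG, the f-string argument is still rendered, but the `%s` argument is never formatted:

```python
import logging


class CountingRepr:
    """Hypothetical helper that counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "payload"


logger = logging.getLogger("aiservice.example")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG records are filtered out

eager = CountingRepr()
lazy = CountingRepr()

# f-string: the message is built before logging checks the level.
logger.debug(f"request payload: {eager}")
# %s style: formatting is deferred until a record is actually emitted.
logger.debug("request payload: %s", lazy)

print(eager.renders, lazy.renders)  # 1 0
```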

## Test plan

- [x] All existing pytest tests pass (`uv run pytest`)
- [x] Ruff lint/format clean
- [x] No behavioral changes — pure refactoring

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 01:53:32 -05:00
Aseem Saxena
c5e8b56c6f
Merge pull request #2317 from codeflash-ai/codeflash/optimize-checkForValidAPIKey-mkwv868t
Speed up function `checkForValidAPIKey` by 30%
2026-03-18 12:13:42 -07:00
Aseem Saxena
d44ca16d27
Merge branch 'main' into codeflash/optimize-checkForValidAPIKey-mkwv868t 2026-03-18 12:10:27 -07:00
Aseem Saxena
1dde1f0e16
Merge pull request #2323 from codeflash-ai/add/close_pr_end_point
new endpoint for closing PRs
2026-03-18 12:09:13 -07:00
Aseem Saxena
960401e2d4
Merge branch 'main' into add/close_pr_end_point 2026-03-18 12:08:35 -07:00
HeshamHM28
8f74cf42e2
Fix Unauthorized check for CLI login page (#2480)
# Pull Request Checklist

## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
2026-03-17 16:37:18 -07:00
Sarthak Agarwal
8f41556b01
fix mobile view sidebar and login message (#2481)
2026-03-18 04:57:23 +05:30
HeshamHM28
ea23cf06a6
fix: skip Python validation for Java/JS in optimize-line-profiler endpoint (#2478)
## Summary
- Fix `/optimize-line-profiler` endpoint rejecting Java/JS/TS requests
with `"Invalid Python version"` error by moving `parse_python_version()`
and Python syntax validation inside `if is_python:` block
- Fix code extraction regex in Java and JS/TS line profiler optimizers
to handle LLM responses with ```` ```java:FileName.java ```` format
(optional `:filename` suffix)
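
The log does not show the updated regex, so this is only a sketch of the idea. An optional, non-capturing `(?::[^\n]*)?` group lets the language tag be followed by a `:FileName` suffix. `FENCE_RE` and `extract_code` are hypothetical names, and the sketch returns only the first fenced block:

```python
import re
from typing import Optional

# Matches ```java, ```typescript, ... with or without a ":FileName.ext" suffix.
FENCE_RE = re.compile(
    r"```(?:java|javascript|typescript)(?::[^\n]*)?\n(.*?)```",
    re.DOTALL,
)


def extract_code(response: str) -> Optional[str]:
    """Return the body of the first matching fenced block, or None."""
    match = FENCE_RE.search(response)
    return match.group(1) if match else None


print(extract_code("```java:Foo.java\nclass Foo {}\n```"))  # class Foo {}
```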

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: HeshamHM28 <HeshamHM28@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 18:31:08 -07:00
Sarthak Agarwal
7deb16819e
[Fix] Suppress Slack for Codeflash employees (#2466)
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
2026-03-08 02:54:32 +05:30
Kevin Turcios
d74da02e57
fix: return display-version tests as generated_tests in testgen response (#2477)
## Summary
- The `/testgen` response's `generated_tests` field contained the
assert-removed version with `codeflash_output` assignments
- When the CLI's testgen review fell back to this field (instead of
`raw_generated_tests`), the review LLM flagged every test as a "no-op
assignment"
- Now returns the display version (asserts kept, no instrumentation) as
`generated_tests`, matching what the repair endpoint already does
- Also applies isort to the display source for consistency
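
The service's actual assert-removal logic is not shown in this log. The following is a hypothetical sketch of turning a display assertion into a `codeflash_output` assignment, using stdlib `ast` (Python 3.9+ for `ast.unparse`). It shows why the instrumented text reads as a no-op to a reviewer. `AssertToOutput` is an invented name:

```python
import ast


class AssertToOutput(ast.NodeTransformer):
    """Hypothetical: rewrite `assert call(...) == expected` as `codeflash_output = call(...)`."""

    def visit_Assert(self, node: ast.Assert) -> ast.AST:
        test = node.test
        if isinstance(test, ast.Compare) and isinstance(test.left, ast.Call):
            target = ast.Name(id="codeflash_output", ctx=ast.Store())
            return ast.copy_location(ast.Assign(targets=[target], value=test.left), node)
        return node  # leave other assertions untouched


display_version = "def test_add():\n    assert add(1, 2) == 3\n"
tree = ast.fix_missing_locations(AssertToOutput().visit(ast.parse(display_version)))
instrumented = ast.unparse(tree)
print(instrumented)
# def test_add():
#     codeflash_output = add(1, 2)
```

Once the assertion is gone, `codeflash_output` is assigned and never read, which matches the "no-op assignment" pattern the review LLM flagged.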
2026-03-07 13:21:09 -06:00
Kevin Turcios
42b8eed7b4
fix: greedy code extraction with retry & fix unawaited coroutine in Java optimizer (#2476)
## Summary
- Use greedy code extraction and retry on syntax errors in testgen
repair
- Fix broken `asyncio.to_thread(log_features)` in Java optimizer —
`log_features` is `@sync_to_async` so calling it via `to_thread` created
an unawaited coroutine (`RuntimeWarning: coroutine
'SyncToAsync.__call__' was never awaited`) and silently skipped logging.
Replaced with `await log_features(...)` using correct keyword arguments.
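
Here is a standard-library reproduction of the bug. A plain `async def` stands in for the `@sync_to_async`-wrapped `log_features`; calling either one returns a coroutine instead of doing the work. The keyword argument `trace_id` is hypothetical:

```python
import asyncio
import inspect


async def log_features(*, trace_id: str) -> str:
    """Hypothetical stand-in for an async (e.g. @sync_to_async-wrapped) logger."""
    return f"logged {trace_id}"


async def broken() -> object:
    # to_thread calls log_features(...) in a worker thread. Calling an async
    # function only creates a coroutine object, so nothing is ever logged.
    return await asyncio.to_thread(log_features, trace_id="abc")


async def fixed() -> str:
    # Awaiting the coroutine directly actually runs it.
    return await log_features(trace_id="abc")


leaked = asyncio.run(broken())
print(inspect.iscoroutine(leaked))  # True: the body never ran
leaked.close()  # close explicitly to avoid the "never awaited" RuntimeWarning
print(asyncio.run(fixed()))  # logged abc
```

`asyncio.to_thread` is meant for blocking synchronous callables. When a function is already awaitable, the correct call is to await it directly.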

## Test plan
- [ ] Verify testgen repair handles syntax errors with retry
- [ ] Verify Java optimization requests no longer emit `SyncToAsync`
RuntimeWarning
- [ ] Verify Java optimization features are correctly logged to DB
2026-03-07 03:18:43 -05:00
Kevin Turcios
0ca3a2ab07
fix: use greedy code extraction and retry on syntax errors in repair (#2475)
## Summary
- Switch `extract_code_block_with_context` (non-greedy `.*?`) →
`extract_code_block` (greedy `.*`) for repair code extraction — the
non-greedy regex matched the first closing fence, truncating code when
the LLM included explanatory snippets before the full file (root cause
of 82% of repair failures)
- Add `ast.parse` validation before CST parsing for fast syntax checking
- Retry the LLM once with the specific syntax error appended to the
conversation when validation fails

## Test plan
- [x] Existing tests pass
- [ ] Run end-to-end optimization to verify repairs succeed
2026-03-06 06:24:31 -05:00
Kevin Turcios
07edfaa0bd
fix: testgen prompt improvements (#2474)
## Summary

- Add multiline string literal constraint to testgen and repair prompts
— LLM was consistently generating unterminated string literals by
splitting strings across lines without triple quotes
- Deduplicate anthropic/markdown branches in testgen prompt templates —
single flow with inline `{% if is_xml %}` wrappers instead of duplicated
content

## Test plan

- [x] Verified templates render correctly for both anthropic and openai
model types (sync and async)
- [x] All block overrides from child templates work with the unified
block names
2026-03-06 10:54:50 +00:00
Kevin Turcios
434fb7df77
feat: improve testgen review & repair quality (#2473)
## Summary

- Pass coverage details (unexecuted lines, threshold) to review and
repair prompts so the LLM can identify low-coverage tests
- Accept previous repair errors in the repair endpoint and include them
in the prompt for retry cycles
- Parallelize per-test review LLM calls with `asyncio.TaskGroup`
- Conditionally include codeflash env var context
(`CODEFLASH_TRACER_DISABLE`, etc.) in repair prompts when the function
under test references them

## Test plan

- [x] Tested locally with codeflash CLI against `Tracer.__enter__` —
review, repair, and retry cycles all work
- [x] Coverage details and previous errors appear correctly in prompts
- [x] Review parallelization replaces sequential per-test reviews (~60s each)
with concurrent calls
2026-03-06 10:23:55 +00:00
Kevin Turcios
14c0b3acca
fix: handle syntactically invalid LLM output in testgen repair (#2472)
## Summary
- Catch `ParserSyntaxError` when parsing LLM-repaired code instead of
letting it bubble to the generic 500 handler
- Reduces Sentry noise from expected LLM failures
- The CLI already handles non-200 responses gracefully (returns `None`,
continues)
2026-03-06 07:32:30 +00:00
Kevin Turcios
4edd183d82
perf: use Haiku model for testgen repair (#2471)
## Summary
- Switch testgen repair endpoint from `EXECUTE_MODEL` (GPT-5-Mini) to
`HAIKU_MODEL` (Haiku 4.5)
- Matches the review endpoint which already uses Haiku
- Repair is a structured task (splice functions, fix assertions) that
doesn't need a frontier model
- Should reduce latency (was timing out at 90s in CI) and cost
2026-03-06 07:10:44 +00:00
Kevin Turcios
8d1dfd9bdb
Merge pull request #2465 from codeflash-ai/testgen-review-repair
feat: per-function test review + repair endpoints
2026-03-05 22:37:21 +00:00
claude[bot]
6c7377a71f fix: resolve duplicate kwargs and missing HttpError import in testgen_repair 2026-03-05 22:14:28 +00:00
Kevin Turcios
641e609bda
Merge branch 'main' into testgen-review-repair 2026-03-05 22:09:37 +00:00
Kevin Turcios
de109c6e12
Update django/aiservice/core/shared/testgen_review/repair.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-03-05 17:09:28 -05:00
Kevin Turcios
737a270801
Update django/aiservice/core/shared/testgen_review/repair.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-03-05 17:09:17 -05:00
Kevin Turcios
0e9a8a5959
Merge pull request #2469 from codeflash-ai/fix-markdown-code-path-lookup
fix: clarify multi-file prompt to identify target file
2026-03-05 13:13:23 +00:00
Kevin Turcios
d9a963d305 fix: resolve contradicting response format instructions in multi-file prompt 2026-03-05 07:59:39 -05:00
Kevin Turcios
9a979439f1 fix: clarify multi-file prompt to identify target file and reduce context noise
Tell the LLM the first file is the optimization target and remaining
files are context only. Allow omitting unchanged context files from
the response.
2026-03-05 05:59:41 -05:00
Kevin Turcios
8106d53e32 Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair 2026-03-04 14:30:01 -05:00
Kevin Turcios
1532a66278 feat: include coverage info in test review and improve review prompt
Accept coverage_summary in the review schema and pass it to the prompt.
Add two new review criteria: low coverage detection and constructor/
dependency error patterns. Coverage percentage is shown in the user
prompt so the reviewer can flag tests that don't exercise the function.
2026-03-04 14:14:19 -05:00
claude[bot]
f31b428a72 style: auto-fix linting issues 2026-03-04 09:15:27 +00:00
Kevin Turcios
ff35883ce6 Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair 2026-03-04 04:13:24 -05:00
Kevin Turcios
644ded986f Merge remote-tracking branch 'origin/main' into testgen-review-repair 2026-03-04 04:10:56 -05:00
Kevin Turcios
c2a67e8137 feat: pass test failure messages to review endpoint for better context
Include runtime error messages from behavioral test failures in the
review request. Failed function verdicts now include the specific error
message. The review prompt shows error details so the AI can see
patterns like type validation failures.
2026-03-04 04:09:27 -05:00
Kevin Turcios
fce866c96f fix: splice only flagged functions from LLM repair into original test source
Instead of replacing the entire test file with the LLM's output, parse
both the original and repaired sources as CST, extract only the flagged
function nodes from the repair output, and surgically replace them in
the original. Unflagged functions are preserved exactly as-is.
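
The commit does this with a CST, which preserves formatting. The sketch below swaps in the standard-library `ast` module and splices by source line spans instead. This still gives the property that unflagged text is copied byte-for-byte. It only handles top-level test functions, not methods inside test classes, and `splice_flagged`/`_function_spans` are hypothetical names:

```python
import ast


def _function_spans(source: str) -> dict[str, tuple[int, int]]:
    """Map each top-level function name to its (start, end) 0-based line slice, decorators included."""
    spans: dict[str, tuple[int, int]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            spans[node.name] = (start - 1, node.end_lineno)
    return spans


def splice_flagged(original: str, repaired: str, flagged: set[str]) -> str:
    """Copy only the flagged functions from `repaired` into `original`; everything else is untouched."""
    orig_lines = original.splitlines(keepends=True)
    rep_lines = repaired.splitlines(keepends=True)
    orig_spans = _function_spans(original)
    rep_spans = _function_spans(repaired)
    to_replace = flagged & set(orig_spans) & set(rep_spans)
    # Replace bottom-up so spans computed for earlier functions stay valid.
    for name in sorted(to_replace, key=lambda n: orig_spans[n][0], reverse=True):
        o_start, o_end = orig_spans[name]
        r_start, r_end = rep_spans[name]
        orig_lines[o_start:o_end] = rep_lines[r_start:r_end]
    return "".join(orig_lines)


original = (
    "import math\n\n"
    "def test_a():\n"
    "    assert 1 == 1  # unflagged: must survive verbatim\n\n"
    "def test_b():\n"
    "    assert math.pi == 3\n"
)
repaired = (
    "import math\n\n"
    "def test_a():\n"
    "    assert True\n\n"
    "def test_b():\n"
    "    assert round(math.pi) == 3\n"
)
result = splice_flagged(original, repaired, {"test_b"})
print(result)
```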
2026-03-04 03:26:03 -05:00
Kevin Turcios
33be205d88 feat: run postprocessing pipeline on repaired tests before instrumentation
Repaired tests from the LLM now go through the same postprocessing
pipeline as initial generation (import fixing, loop limiting, unused
definition removal) before instrumentation. Returns the display version
(with asserts) as generated_tests for client-side display.
2026-03-04 03:20:09 -05:00
claude[bot]
8fe3171934 fix: resolve mypy type errors in generate.py and postprocess_pipeline.py 2026-03-04 08:19:57 +00:00
Kevin Turcios
2899eae4da feat: return display-ready test source with asserts in testgen response
Split postprocessing_testgen_pipeline to capture the test source before
assert removal — fully cleaned (imports, loops, definitions) but with
original asserts intact. Return it as raw_generated_tests in the
TestGenResponseSchema so the CLI can display the human-readable version.
2026-03-04 03:16:30 -05:00
Kevin Turcios
96284e4805
Merge pull request #2467 from codeflash-ai/fix-js-async-testgen-flaky-tests
fix: reduce flaky generated tests for JS async functions
2026-03-04 06:37:43 +00:00
Kevin Turcios
40f3236645 refactor: simplify template selection with string composition 2026-03-04 01:13:06 -05:00
Kevin Turcios
c2f9b17969 Merge remote-tracking branch 'origin/main' into fix-js-async-testgen-flaky-tests 2026-03-04 01:09:00 -05:00
Aseem Saxena
56ac044a86
Merge pull request #2364 from codeflash-ai/match-testdiff-schema
bug: mismatch between CLI and internal schema for code repair
2026-03-04 05:16:49 +05:30
claude[bot]
38ca8824d6 fix: resolve mypy type errors in code_repair_context 2026-03-03 23:29:13 +00:00
Aseem Saxena
16253b3d63
Merge branch 'main' into match-testdiff-schema 2026-03-04 04:56:29 +05:30
Sarthak Agarwal
cc32654b7f
mocha prompts in backend (#2468)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-03-04 04:09:10 +05:30
HeshamHM28
44fc7dc8e8
feat: Add support for specifying target Java version in test generation (#2445) 2026-03-03 22:03:29 +00:00
Aseem Saxena
29e91e1c3d
Merge branch 'main' into match-testdiff-schema 2026-03-03 07:28:08 +05:30
Aseem Saxena
94fc60bb13
Merge branch 'main' into fix-js-async-testgen-flaky-tests 2026-03-03 07:27:24 +05:30
Saurabh Misra
e8f1589107
Merge pull request #2429 from codeflash-ai/cf-aws-bedrock-claude-workflows
feat: switch Claude workflows from Foundry to AWS Bedrock
2026-03-02 17:48:04 -08:00
aseembits93
76a81b4381 chore: switch CI Claude model to Sonnet 4.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 06:47:20 +05:30
aseembits93
26e4936659 keep the non-Foundry env vars 2026-03-03 06:03:06 +05:30
Aseem Saxena
9e5e61e53d
Apply suggestion 2026-03-02 16:27:35 -08:00