Commit graph

1502 commits

Author SHA1 Message Date
Kevin Turcios
9a979439f1 fix: clarify multi-file prompt to identify target file and reduce context noise
Tell the LLM the first file is the optimization target and remaining
files are context only. Allow omitting unchanged context files from
the response.
2026-03-05 05:59:41 -05:00
Kevin Turcios
8106d53e32 Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair 2026-03-04 14:30:01 -05:00
Kevin Turcios
1532a66278 feat: include coverage info in test review and improve review prompt
Accept coverage_summary in the review schema and pass it to the prompt.
Add two new review criteria: low coverage detection and constructor/
dependency error patterns. Coverage percentage is shown in the user
prompt so the reviewer can flag tests that don't exercise the function.
2026-03-04 14:14:19 -05:00
claude[bot]
f31b428a72 style: auto-fix linting issues 2026-03-04 09:15:27 +00:00
Kevin Turcios
ff35883ce6 Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair 2026-03-04 04:13:24 -05:00
Kevin Turcios
644ded986f Merge remote-tracking branch 'origin/main' into testgen-review-repair 2026-03-04 04:10:56 -05:00
Kevin Turcios
c2a67e8137 feat: pass test failure messages to review endpoint for better context
Include runtime error messages from behavioral test failures in the
review request. Failed function verdicts now include the specific error
message. The review prompt shows error details so the AI can see
patterns like type validation failures.
2026-03-04 04:09:27 -05:00
Kevin Turcios
fce866c96f fix: splice only flagged functions from LLM repair into original test source
Instead of replacing the entire test file with the LLM's output, parse
both the original and repaired sources as CST, extract only the flagged
function nodes from the repair output, and surgically replace them in
the original. Unflagged functions are preserved exactly as-is.
2026-03-04 03:26:03 -05:00
Kevin Turcios
33be205d88 feat: run postprocessing pipeline on repaired tests before instrumentation
Repaired tests from the LLM now go through the same postprocessing
pipeline as initial generation (import fixing, loop limiting, unused
definition removal) before instrumentation. Returns the display version
(with asserts) as generated_tests for client-side display.
2026-03-04 03:20:09 -05:00
claude[bot]
8fe3171934 fix: resolve mypy type errors in generate.py and postprocess_pipeline.py 2026-03-04 08:19:57 +00:00
Kevin Turcios
2899eae4da feat: return display-ready test source with asserts in testgen response
Split postprocessing_testgen_pipeline to capture the test source before
assert removal — fully cleaned (imports, loops, definitions) but with
original asserts intact. Return it as raw_generated_tests in the
TestGenResponseSchema so the CLI can display the human-readable version.
2026-03-04 03:16:30 -05:00
Kevin Turcios
40f3236645 refactor: simplify template selection with string composition 2026-03-04 01:13:06 -05:00
Kevin Turcios
c2f9b17969 Merge remote-tracking branch 'origin/main' into fix-js-async-testgen-flaky-tests 2026-03-04 01:09:00 -05:00
claude[bot]
38ca8824d6 fix: resolve mypy type errors in code_repair_context 2026-03-03 23:29:13 +00:00
Aseem Saxena
16253b3d63
Merge branch 'main' into match-testdiff-schema 2026-03-04 04:56:29 +05:30
Sarthak Agarwal
cc32654b7f
mocha prompts in backend (#2468)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-03-04 04:09:10 +05:30
HeshamHM28
44fc7dc8e8
feat: Add support for specifying target Java version in test generation (#2445) 2026-03-03 22:03:29 +00:00
aseembits93
4e60026fcc test 2026-03-03 05:53:34 +05:30
Kevin Turcios
04624cc389 relax 2026-03-02 19:22:55 -05:00
Kevin Turcios
5d2ad27d3f refactor: extract shared create_prompt_env Jinja2 factory
Deduplicate the identical Environment(FileSystemLoader, StrictUndefined,
keep_trailing_newline=True) setup across JS testgen, Python testgen, and
Python explanations into core/shared/jinja_utils.py.

Also fix tests/testgen/test_testgen_javascript.py which had a stale
copy of build_javascript_prompt and loaded the now-deleted .md files.
2026-03-02 18:42:57 -05:00
Kevin Turcios
7820fb15e1 refactor: move ESM/CJS import formatting from Python to Jinja2 macro
Split _generate_import_statement into _resolve_import (pure logic:
identifier validation, dot splitting, reserved words) and a js_import
Jinja2 macro (pure formatting: ESM vs CJS syntax). The macro lives in
_macros.md.j2 and is imported by user.md.j2.
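
The formatting half of that split is small enough to sketch. The commit implements it as a Jinja2 macro; the Python function below only illustrates the two output shapes, and `format_js_import` and its parameters are hypothetical names, not the macro's actual signature.

```python
def format_js_import(name: str, module_path: str, module_system: str) -> str:
    """Render a named import in ESM or CommonJS syntax (formatting only)."""
    if module_system == "esm":
        return f"import {{ {name} }} from '{module_path}';"
    # Anything else is treated as CommonJS in this sketch.
    return f"const {{ {name} }} = require('{module_path}');"
```

Keeping validation out of this function mirrors the split described above: the caller decides what to import, the formatter only decides how it is spelled.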
2026-03-02 18:28:30 -05:00
Kevin Turcios
d00fa99cc5 feat: convert JS/TS testgen prompts to Jinja2 templates with model_type and ESM support
Replace plain .md prompts rendered with str.format() with Jinja2
templates using {% extends %}, {% block %}, and {% if %} branching:

- model_type branching: XML tags for Anthropic, markdown headers for OpenAI
- module_system support: ESM imports (import { fn } from '...') vs CJS (require)
- Template inheritance: base_system.md.j2 with sync/async overrides
- Unified user.md.j2 with is_async and module_system conditionals
- Add module_system field to TestGenSchema
2026-03-02 18:23:30 -05:00
Kevin Turcios
4cdcd57f04 fix: reduce flaky generated tests for JS async functions
The async testgen prompt was steering the LLM toward generating
timing-dependent and ordering-sensitive tests that produce
non-deterministic results across runs. This caused ~50% E2E failure
rate for the JS ESM async workflow.

- Add determinism requirement: never assert on timing, elapsed
  duration, or relative ordering of async side effects
- Remove directive to use Promise.all() for large-scale tests
- Change large-scale objective from "concurrent operations" to
  "correctness with larger inputs"
- Replace concurrent execution template example with a simple
  large-input correctness test
2026-03-02 17:47:20 -05:00
claude[bot]
962edcc595 fix: correct unpacking of validate_request_data return value
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-03-02 16:01:04 +00:00
claude[bot]
3f49aa1b43 fix: resolve mypy type errors in generate.py 2026-03-02 08:58:37 -05:00
Kevin Turcios
87ab144d40 feat: per-function test review + repair endpoints
Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints.
Review accepts per-test data with pre-flagged behavioral failures, AI
reviews passing functions for unrealistic patterns, returns per-function
verdicts. Repair takes flagged functions, LLM rewrites them,
re-instruments, returns repaired test source. Python-only gate.
2026-03-02 08:54:44 -05:00
Kevin Turcios
efa29bf452 refactor: split instrument_new_tests.py into focused modules and extract model selection
Split the 1,734-line instrument_new_tests.py into three modules by concern:
- device_sync.py: GPU/device framework detection and sync AST generation
- wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top
- instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source

Also extract select_model_for_test() from testgen_python() in generate.py to
separate model selection logic from the HTTP handler.
2026-03-02 08:21:02 -05:00
Kevin Turcios
e26dd72d7d refactor: remove duplicate replace_definition_with_import from parse_and_validate_llm_output
The call was redundant — the postprocessing pipeline already handles it as
its final step. Move the test coverage to test_postprocessing_pipeline.py.
2026-03-02 07:58:54 -05:00
Kevin Turcios
0541126fc0 refactor: eliminate BaseTestGenContext class hierarchy
Replace class hierarchy (BaseTestGenContext → Single/Multi) with
standalone functions that branch on is_multi_context() internally.
Delete context.py, move TestGenContextData to models.py, and
distribute logic to validate.py, preprocess_pipeline.py, and
generate.py.
2026-03-02 07:38:51 -05:00
Kevin Turcios
a1c0ac6ae4 refactor: leverage Jinja2 includes, extends, and composition in testgen prompts
Use {% extends %} to deduplicate sync/async system templates via
base_system.md.j2, {% include %} for conditional JIT content, and a
compose_user.md.j2 wrapper to replace Python string assembly in
build_prompt().
2026-03-02 07:26:38 -05:00
Kevin Turcios
f191c12438 refactor: reorganize python testgen directory structure
Move prompts into prompts/ subdirectory with clearer names, rename
testgen.py to generate.py, extract validate.py and demo_hacks.py,
rename testgen_context.py to context.py, delete unused explain prompts.
2026-03-02 06:39:07 -05:00
Sarthak Agarwal
4b88fc0cc7
llm call optimization fail error log and small refactoring (#2447) 2026-03-02 12:33:56 +05:30
claude[bot]
3309dcec2c fix: resolve mypy type errors in explanations.py
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 20:10:19 +00:00
claude[bot]
49e11a585a style: auto-fix formatting issues
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 20:08:28 +00:00
Kevin Turcios
ded2240818
Merge branch 'main' into allocs 2026-02-27 20:06:57 +00:00
Kevin Turcios
779fda2b36 slight changes 2026-02-27 15:02:08 -05:00
Kevin Turcios
1fedb8c443 refactor: rewrite explanation prompts with Jinja2 macros and tighter brevity constraints
- Extract shared content into Jinja2 macros (`section`, `field`,
  `code_field`) that handle Anthropic XML vs OpenAI markdown wrapping,
  eliminating full duplication of every section across both branches
- Tighten system prompt to enforce concise 3-6 sentence output: trim
  bloated per-field context descriptions, add concrete positive example,
  explicitly forbid section headers and bullet groups, move output_format
  to be the last section so constraints are closest to generation
- Add caveat that original_explanation is for factual reference only (in
  both system and user prompts) to prevent the model from mimicking its
  verbose multi-section format
- Condense throughput/concurrency/acceptance sections to essentials
- Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`
2026-02-27 14:38:24 -05:00
Kevin Turcios
396d7cc7e8 refactor: modernize explanation prompts with Jinja2 templates
Extract inline prompts into .md.j2 templates, move schemas to
models.py, and add model_type branching (XML for Anthropic, markdown
for OpenAI) following the testgen pattern. Uses StrictUndefined,
trim_blocks, and lstrip_blocks.
2026-02-27 13:45:21 -05:00
Kevin Turcios
879a22454f
Merge branch 'main' into fix-middleware-llm-perf 2026-02-27 15:08:05 +00:00
Kevin Turcios
18ed70e031 feat: add adaptive optimization support to observability V2
Display ADAPTIVE source candidates in the timeline with Sparkles icon,
parent candidate linking, and ranking labels. Also fix the backend to
pass call_type, trace_id, and user_id to call_llm for proper
observability logging.
2026-02-27 06:32:28 -05:00
Aseem Saxena
be1480b937
Merge branch 'main' into testgen-jit-iter 2026-02-26 03:54:55 +05:30
claude[bot]
3f11204164 fix: add type parameter to asyncio.Task for mypy 2026-02-25 22:22:26 +00:00
Aseem Saxena
e97ca0d37f
Merge branch 'main' into fix-middleware-llm-perf 2026-02-26 03:49:53 +05:30
HeshamHM28
29011d5cc3
Merge branch 'main' into fix-middleware-llm-perf 2026-02-25 12:32:52 -08:00
mashraf-222
879658cedb
Merge branch 'main' into testgen-jit-iter 2026-02-25 21:51:18 +02:00
mashraf-222
db871c321a
Merge branch 'main' into reduce-recompilations 2026-02-25 21:50:48 +02:00
Aseem Saxena
df6b4ba341
Merge branch 'main' into cf-jit-output-format-prompt 2026-02-25 23:12:54 +05:30
Aseem Saxena
0380f9ad0d
Merge branch 'main' into reduce-recompilations 2026-02-25 02:27:47 +05:30
Aseem Saxena
14feee119f
Merge branch 'main' into testgen-jit-iter 2026-02-25 02:27:41 +05:30
claude[bot]
c6e9fc4530 fix: remove duplicate return statement in _find_error_location
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-24 12:57:02 +00:00
mohammed ahmed
f301be093c
Update django/aiservice/aiservice/validators/javascript_validator.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-02-24 14:54:56 +02:00
ali
c2eb63eb2e
feat: improve JS/TS validator with markdown support and error locations
Add markdown code block parsing, detailed syntax error locations with
line/col info, and structured logging to the JavaScript/TypeScript
validators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 14:50:50 +02:00
claude[bot]
ae7110491c fix: add type ignore for Django ORM field type mismatch
Update type hints for `add_months_safe` and `get_next_subscription_period`
to accept both datetime.datetime and datetime.date, and add ty:ignore
comment for Django ORM field type that ty cannot infer correctly.

Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
2026-02-24 10:37:33 +00:00
aseembits93
7f824ce101 fix: eliminate redundant DB queries in middleware and unblock LLM responses
Auth now attaches fetched organization/subscription to the request so
TrackUsageMiddleware reuses them instead of re-querying. RateLimitMiddleware
caches restricted_paths at init and uses async cache methods. LLM call
recording is fire-and-forget via asyncio.create_task to avoid blocking
responses on DB writes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:43:18 +05:30
aseembits93
d4867ef18e refactor: make line profiler JIT handling consistent with regular optimizer
Move JIT instructions appending from the per-call level
(optimize_python_code_line_profiler_single) to the endpoint level
(optimize endpoint), matching the regular optimizer's pattern.
This removes the is_numerical_code parameter threading through
the call chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:54:03 +05:30
aseembits93
0b523fc367 fix: enforce direct JIT decorator in optimizer prompt for numerical code
When is_numerical_code is true, the LLM sometimes outputs conditional
fallback paths (try/except, if/else) instead of applying the JIT
decorator directly. Add explicit output format instructions to prevent
this behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:49:24 +05:30
Kevin Turcios
033d14ea87
Merge branch 'main' into testgen-jit-iter 2026-02-23 08:56:11 +00:00
Kevin Turcios
f14ff077a6
Merge branch 'main' into reduce-recompilations 2026-02-23 08:55:29 +00:00
claude[bot]
bf4e38c301 fix: add cast to satisfy ty type checker for list covariance
The ty type checker correctly flags that list[str] is not a subtype of list[str | None] due to list invariance. Added explicit cast.
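
For readers unfamiliar with the invariance issue, here is a small self-contained sketch. The function and variable names are hypothetical; only `typing.cast` and `Sequence` are real APIs.

```python
from __future__ import annotations

from typing import Sequence, cast


def count_missing(values: list[str | None]) -> int:
    """Count entries that are None."""
    return sum(value is None for value in values)


def count_missing_readonly(values: Sequence[str | None]) -> int:
    """Same logic, but Sequence is covariant, so list[str] is accepted as-is."""
    return sum(value is None for value in values)


names: list[str] = ["alpha", "beta"]

# list is invariant: passing `names` directly to count_missing is a type error,
# because the callee would be allowed to append None to a list[str].
widened = cast("list[str | None]", names)
missing = count_missing(widened)
```

`cast` has no runtime effect and returns the same object, so it is only safe when the callee does not actually insert `None`. Where the signature can be changed, a read-only `Sequence` parameter avoids the cast.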

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-23 08:42:24 +00:00
Kevin Turcios
16e043883a style: auto-format ranker and test_markdown_utils 2026-02-23 03:39:38 -05:00
Kevin Turcios
85a1c8b183 fix: derive ranker ranking from structured scores instead of LLM array
The JSON parsing path returned the LLM's explicit ranking array,
which sometimes contradicted its own per-dimension scores. Use
_scores_to_ranking() to compute the ranking from weighted scores
when available, falling back to the LLM ranking only when scores
are absent.
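
The fallback logic can be sketched like this. The dimension names, weights, and function signatures are assumptions made for illustration; the commit only names `_scores_to_ranking()` and does not show its inputs.

```python
from __future__ import annotations


def scores_to_ranking(
    scores: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[str]:
    """Order candidate ids by weighted total, best first (ties broken by id)."""
    totals = {
        candidate: sum(weights.get(dim, 0.0) * value for dim, value in dims.items())
        for candidate, dims in scores.items()
    }
    return sorted(totals, key=lambda candidate: (-totals[candidate], candidate))


def resolve_ranking(
    scores: dict[str, dict[str, float]],
    llm_ranking: list[str],
    weights: dict[str, float],
) -> list[str]:
    """Prefer the ranking implied by scores; use the LLM's array only as a fallback."""
    if scores:
        return scores_to_ranking(scores, weights)
    return list(llm_ranking)
```

With this shape, a response whose explicit ranking contradicts its own scores is resolved in favor of the scores.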
2026-02-23 03:37:42 -05:00
Kevin Turcios
20ee6d5b62 fix: penalize local variable caching of globals in ranker prompt
The ranker LLM was rewarding candidates that cache global variables
into locals as a performance win. Add an explicit rule: this is only
relevant on Python ≤3.10; on 3.11+ LOAD_GLOBAL uses adaptive
specialization and is nearly as fast as LOAD_FAST.
2026-02-23 03:37:21 -05:00
Kevin Turcios
c95a36cf38 fix: handle nested code fences in extract_code_block
The non-greedy regex in FIRST_CODE_BLOCK_PATTERN stopped at the first
``` occurrence, even inside triple-quoted strings or nested code fence
blocks. This truncated the extracted code and lost test functions when
LLMs embedded function definitions using ```python:filepath syntax.

Switch to greedy matching and require the closing ``` to be alone on
its line so intermediate backticks are skipped.
2026-02-23 03:36:50 -05:00
Kevin Turcios
ca71d0c8a0 refactor: remove constructor notes preprocessing from testgen pipeline
Full class source is now included in the client-side testgen context,
making the server-side constructor signature extraction redundant.
2026-02-23 03:36:50 -05:00
Kevin Turcios
bfd9f2cd04 fix: respect test_index when creating optimization_features row
The get_or_create defaults passed test lists without positional
indexing, so when a higher test_index created the row first its
content landed at index 0 and was overwritten by the lower index
update, losing a test.
2026-02-23 03:36:50 -05:00
Kevin Turcios
af3185edff fix: handle non-numeric patch suffixes and support Python 3.15 2026-02-23 03:36:50 -05:00
Aseem Saxena
852274e2be
Merge branch 'main' into reduce-recompilations 2026-02-21 00:59:24 +05:30
aseembits93
85c5a2ec82 reduce recompilations in the tests 2026-02-21 00:57:52 +05:30
Aseem Saxena
8f6d1d0602 fix: improve JIT testgen prompt to avoid error-checking tests
Add explicit guidance to avoid generating tests that check for specific
exception types, since JIT compilers (numba, torch.compile) produce
different error types than uncompiled code. This ensures generated tests
work consistently for both compiled and uncompiled versions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-20 18:59:04 +00:00
Aseem Saxena
5553b01bc1
Merge branch 'main' into testgen-jit-iter 2026-02-21 00:06:44 +05:30
claude[bot]
4fa972edd3 refactor: remove unused TORCH_TENSOR_FUNCTIONS constant
Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-20 18:33:41 +00:00
Sarthak Agarwal
eb5f4b460e
Migrate to AWS bedrock (#2430)
The following environment variables are required for boto3 authentication:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1
2026-02-20 23:52:48 +05:30
claude[bot]
46da033b05 style: fix ruff formatting and add mypy type annotation 2026-02-20 18:09:05 +00:00
Aseem Saxena
7e1b2a3ade investigate 2026-02-20 18:03:28 +00:00
claude[bot]
1bb1407c6b fix: resolve type checker errors 2026-02-15 12:33:05 +00:00
Kevin Turcios
d6a3c6254f feat: add constructor notes for non-dataclass classes with __init__
The LLM prompt preprocessing now highlights __init__ signatures for
regular classes, not just @dataclass ones, reducing brute-force
constructor guessing and pytest.skip() fallbacks in generated tests.
2026-02-15 07:29:05 -05:00
Kevin Turcios
e5d70443db fix: use positional insertion in log_features to preserve model attribution
log_features() appended test results in call-completion order, causing
model attribution swaps when LLM responses arrived out of order. Pass
test_index through and use positional insertion instead of append.
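
The difference between appending and positional insertion can be shown with a small sketch. `insert_at_index` is a hypothetical helper, not the actual `log_features()` code.

```python
from __future__ import annotations


def insert_at_index(results: list[str | None], test_index: int, value: str) -> None:
    """Store `value` at `test_index`, padding with None if the list is too short."""
    if test_index >= len(results):
        results.extend([None] * (test_index + 1 - len(results)))
    results[test_index] = value


# Responses arrive out of order: index 1 completes before index 0.
results: list[str | None] = []
insert_at_index(results, 1, "tests from model B")
insert_at_index(results, 0, "tests from model A")
```

With `append`, the same arrival order would have stored model B's tests at index 0 and swapped the attribution.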
2026-02-15 03:58:05 -05:00
Kevin Turcios
c13835963c docs: restructure CLAUDE.md files into modular rules
Slim down CLAUDE.md files and move content into path-scoped
.claude/rules/ files to reduce context bloat.
2026-02-14 19:36:21 -05:00
Kevin Turcios
4c3deeb7b8
Restructure CLAUDE.md files and add path-scoped rules for monorepo (#2417)
## Summary

- Restructure CLAUDE.md hierarchy so Claude Code auto-discovers
project-specific instructions
- Delete dead `AGENTS.md` files (referenced non-existent
`.tessl/RULES.md`)
- Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery
- Create `js/CLAUDE.md` with package commands and gotchas
- Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded
rule)
- Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill)
- Add path-scoped rules for Python and Next.js patterns
- Add domain glossary, service architecture diagram, and per-package
gotchas

## Test plan

- Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/`
- Verify no remaining references to `AGENTS.md` or `.tessl/`
- Verify `.claude/rules/` and `.claude/skills/` files are committed
2026-02-14 17:13:09 -05:00
Kevin Turcios
e26a8ea486
Reorganize top-level feature modules under core/ (#2416)
## Summary

- Move `log_features/` → `core/log_features/` (Django app with
`managed=False` models, no DB impact)
- Move `ranker/`, `workflow_gen/`, `adaptive_optimizer/` →
`core/languages/python/` (Python-focused API modules)
- Update all imports across the codebase (19 files)

## Test plan

- [x] All 548 tests pass
- [x] No stale top-level imports (`from log_features.`, `from ranker.`,
etc.)
- [x] `log_features` AppConfig preserves `label = "log_features"` for
Django app registry compatibility
2026-02-14 17:07:40 -05:00
Kevin Turcios
6caf7469c6
Decouple language modules and remove stale cross-module code (#2415)
## Summary

- Extract testgen and optimizer API routers from
`core/languages/python/` into `core/shared/` with lazy imports,
eliminating cross-module coupling between language modules
- Delete stale JavaScript prompt files left in the Python module after
migration to `js_ts/`
- Remove backward-compat fallback paths for prompt files that already
exist at their new locations
- Remove unused `is_multi_context_any()` and its cross-language imports
- Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO

## Test plan

- [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS,
and Java
- [ ] Verify optimizer endpoint dispatches correctly for all languages
- [ ] Run existing testgen and optimizer tests
2026-02-14 00:09:44 -05:00
Kevin Turcios
2614393793
Add test_index to LLM call context for observability chat (#2414)
## Summary

- Pass test_index through LLM call context so observability chat can
attribute responses to specific test generation calls
- Fix SSE streaming to send keepalive pings from the start

CF-504
2026-02-13 23:49:20 -05:00
Sarthak Agarwal
c721723971
remove demo test loops (#2412) 2026-02-14 00:43:09 +05:30
Saurabh Misra
198c0c1a4e
codeflash-omni-java (#2335)
# Pull Request Checklist

## Description
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
2026-02-13 23:26:55 +05:30
Kevin Turcios
ad26be10b8
Fix JS/TS cross-imports from Python module (#2396)
## Problem

The JS/TS language handler (`core/languages/js_ts/`) was importing
models, schemas, config, prompts, and helpers directly from the Python
language handler. This created a confusing architectural dependency and
risked serving wrong language-specific prompt content.

## What Changed

- Created `core/shared/` for genuinely language-agnostic code (optimizer
schemas, models, config, testgen models, context helpers)
- Moved JS/TS-specific prompts and context helpers into
`core/languages/js_ts/`
- Updated all consumers (20+ files) to import from the correct locations
- Removed backwards-compat re-exports from the Python module

## Result

- **Before:** 11 imports from `core.languages.python` in
`core/languages/js_ts/`
- **After:** 0
2026-02-12 22:34:38 -05:00
Kevin Turcios
0df421eccb
Add chat interface to observability timeline (#2395)
## Summary
- Chat panel on the observability timeline that uses Claude to answer
questions about optimization traces
- Tool-based context retrieval (fetches candidates, tests, errors on
demand instead of stuffing everything upfront)
- Uses `@anthropic-ai/sdk` via Azure AI Foundry
- Strengthened testgen prompts to ban mocks/fakes for test inputs
2026-02-12 20:45:33 -05:00
Kevin Turcios
e28642cf22
Fix FTO display showing wrong function for methods with common names (#2391)
Store qualified function name (e.g., HttpInterface.__init__) and
file_path in testgen metadata instead of bare function_name (__init__).
Update the frontend parser to handle qualified names by splitting into
class + method and searching within the correct class using both
tree-sitter and regex. Prioritize the file matching filePath before
searching all files.
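
The name-splitting step is simple to sketch. The actual parser lives in the frontend; this Python version only illustrates the idea, and `split_qualified_name` is a hypothetical name.

```python
from __future__ import annotations


def split_qualified_name(qualified_name: str) -> tuple[str | None, str]:
    """Split 'Class.method' into (class, method); bare names have no class."""
    class_name, _, method_name = qualified_name.rpartition(".")
    return (class_name or None, method_name)
```

Searching for `__init__` inside a known class, in the file matching `file_path` first, removes the ambiguity that a bare method name has across a codebase.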

2026-02-12 00:30:33 -05:00
Kevin Turcios
db973a0487
fix: relax testgen assertion rule to allow imports from function dependencies (#2388)

The old rule ("NOT in libraries such as numpy, pandas etc.") forced LLMs
to reinvent helpers like np.allclose using slow / inaccurate Python
loops. The new rule allows assertions from packages already imported by
the function under test.

2026-02-09 15:05:19 -05:00
Kevin Turcios
629442cc5e
Restructure aiservice to language-first architecture (#2383)
## Summary
- Reorganizes `django/aiservice/` from feature-first layout (separate
`optimizer/`, `testgen/`, `code_repair/` dirs) to language-first layout
under `core/languages/{python,js_ts}/`
- Adds handler/registry/dispatcher pattern for routing requests to
language-specific implementations
- All existing module code preserved via `git mv` for history tracking;
no logic changes to existing modules
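
A minimal sketch of a registry and dispatcher of this kind. It uses plain functions where the PR uses handler classes, and every name here (`register`, `dispatch`, `UnsupportedLanguageError`, the handler functions) is a hypothetical stand-in.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[str], str]
_REGISTRY: dict[str, Handler] = {}


class UnsupportedLanguageError(LookupError):
    """Raised when no handler is registered for a language."""


def register(language: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler under a language key."""
    def decorator(handler: Handler) -> Handler:
        _REGISTRY[language] = handler
        return handler
    return decorator


@register("python")
def python_testgen(source: str) -> str:
    return f"python testgen for {len(source)} chars"


@register("javascript")
@register("typescript")
def js_ts_testgen(source: str) -> str:
    return f"js/ts testgen for {len(source)} chars"


def dispatch(language: str, source: str) -> str:
    """Route a request to the handler registered for `language`."""
    try:
        handler = _REGISTRY[language]
    except KeyError:
        raise UnsupportedLanguageError(language) from None
    return handler(source)
```

The views only need to call `dispatch`, so adding a language means registering one more handler without touching the routing code.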

## What changed
- New `core/` app with registry, dispatcher, protocols, and error
hierarchy
- `PythonHandler` and `JSTypeScriptHandler` delegate to existing module
functions
- All imports updated across the codebase (views, tests,
adaptive_optimizer, etc.)
- Integration tests for handler registration and dispatch
- 155 files changed, ~880 additions / ~207 deletions (mostly import path
updates and moves)

## Test plan
- [ ] `python manage.py check` passes
- [ ] Integration tests in
`tests/integration/test_handler_integration.py` pass
- [ ] Existing test suite passes with updated import paths
- [ ] Ruff and ty clean on all new infrastructure files

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-02-09 09:15:50 -05:00
Kevin Turcios
b9d318279c
feat: observability improvements and testgen prompt modernization (#2382)
## Summary
- Rewrite testgen system prompts from constraint-heavy to positive-first
structure with chain-of-thought instructions
- Simplify LLM message structure from `[system, user, user, user]` to
`[system, user]` by absorbing plan_content guidelines into system
prompts
- Observability UI: add search to LLM debug dialog, expand timeline view
- Fix data capture: raw LLM responses, all user messages in prompt
column, nested code fences, empty notes handling

## Test plan
- [ ] Verify testgen produces valid test suites with the new prompt
structure
- [ ] Verify observability timeline displays LLM prompts/responses
correctly
- [ ] Check that search works in the LLM debug dialog
2026-02-09 01:20:59 -05:00
Kevin Turcios
752e2504e4
Restructure and improve refinement prompt (#2379)
## Summary
- Restructure the refinement system prompt into clear numbered sections
(Preserve Behavior, Minimize Diff, Revert Anti-Patterns, Maintain
Readability) with an explicit 6-step refinement process
- Extract inline prompt strings into separate markdown files
(`refinement_system_prompt.md`, `refinement_user_prompt.md`), matching
the convention used by other optimizer prompts
- Add `AuthenticatedRequest` type hint to `refine()` endpoint and fix
grammar in tool use section

## Test plan
- [ ] Verify refinement endpoint still works end-to-end with a test
optimization candidate
- [ ] Confirm prompt content is loaded correctly from markdown files at
startup

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-02-08 02:10:20 -05:00
Kevin Turcios
47053591f4
observability v2 toggle (#2378) 2026-02-07 15:50:12 -05:00
Kevin Turcios
f03a06f4e1
Reintroduce enriched obs_context for testgen LLM calls (#2377)
## Summary
- Re-adds the enriched observability context from CF-1041 that was
reverted
- Passes `module_path`, `test_module_path`, `helper_function_names`,
`is_async`, and `function_to_optimize` details to `call_llm` in testgen

## Test plan
- [ ] Verify testgen LLM calls include the enriched context
- [ ] Confirm no regressions in test generation flow
2026-02-07 10:33:13 -05:00
Sarthak Agarwal
98fb2d1579
Revert "CF-1041 observability v2 " need more changes and testing (#2375)
Reverts codeflash-ai/codeflash-internal#2329
2026-02-06 01:18:17 +05:30
Kevin Turcios
07d33edd9f
CF-1041 observability v2 (#2329)
Introduced to address pain points in V1. This is not a complete rewrite;
it builds on V1.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-02-05 14:08:02 -05:00
Sarthak Agarwal
08fd1a8787
adding validation for ts in refiner and testgen (#2372)
1. languages/js_ts/testgen.py:
   - Updated `parse_and_validate_js_output` to accept a `language` parameter
   - Uses `validate_typescript_syntax` when `language="typescript"`,
     otherwise uses `validate_javascript_syntax`
   - Updated `generate_and_validate_js_test_code` to accept and pass the
     `language` parameter
   - Updated the call chain to pass `language` through to the validation
2. optimizer/context_utils/refiner_context.py:
   - Added import for `validate_typescript_syntax`
   - Fixed `is_valid_refinement` method to use the correct validator based
     on language
   - Fixed `validate_code_syntax` in `SingleRefinerContext` class
   - Fixed `validate_code_syntax` in `MultiRefinerContext` class
3. tests/optimizer/test_javascript_validator.py:
   - Added `test_typescript_type_assertion_valid_in_ts`: verifies
     `as unknown as number` is valid TypeScript
   - Added `test_typescript_type_assertion_invalid_in_js`: verifies
     `as unknown as number` is INVALID JavaScript (this would have caught
     the original bug)
   - Added `test_typescript_generic_valid_in_ts`: verifies generics are
     valid TypeScript
   - Added `test_typescript_generic_invalid_in_js`: verifies generics are
     INVALID JavaScript

Files already correct (no changes needed):
- languages/js_ts/optimizer.py: already correctly checks language
- languages/js_ts/optimizer_lp.py: already correctly checks language
- optimizer/optimizer_line_profiler.py: already correctly checks
  language

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-02-04 22:54:44 +00:00
Aseem Saxena
648c95c909
Merge branch 'main' into match-testdiff-schema 2026-02-02 15:07:22 -08:00
Sarthak Agarwal
eb8ad603ff
vitest related changes to prompt (#2366) 2026-02-03 03:29:36 +05:30
Aseem Saxena
90597c52e3
markdown more info 2026-02-02 10:11:44 -08:00
aseembits93
5d0ca8d01b fn var was not used in .format() 2026-02-02 10:00:40 -08:00