codeflash-internal

Author	SHA1	Message	Date
Kevin Turcios	9a979439f1	fix: clarify multi-file prompt to identify target file and reduce context noise Tell the LLM the first file is the optimization target and remaining files are context only. Allow omitting unchanged context files from the response.	2026-03-05 05:59:41 -05:00
Kevin Turcios	8106d53e32	Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair	2026-03-04 14:30:01 -05:00
Kevin Turcios	1532a66278	feat: include coverage info in test review and improve review prompt Accept coverage_summary in the review schema and pass it to the prompt. Add two new review criteria: low coverage detection and constructor/ dependency error patterns. Coverage percentage is shown in the user prompt so the reviewer can flag tests that don't exercise the function.	2026-03-04 14:14:19 -05:00
claude[bot]	f31b428a72	style: auto-fix linting issues	2026-03-04 09:15:27 +00:00
Kevin Turcios	ff35883ce6	Merge remote-tracking branch 'origin/testgen-review-repair' into testgen-review-repair	2026-03-04 04:13:24 -05:00
Kevin Turcios	644ded986f	Merge remote-tracking branch 'origin/main' into testgen-review-repair	2026-03-04 04:10:56 -05:00
Kevin Turcios	c2a67e8137	feat: pass test failure messages to review endpoint for better context Include runtime error messages from behavioral test failures in the review request. Failed function verdicts now include the specific error message. The review prompt shows error details so the AI can see patterns like type validation failures.	2026-03-04 04:09:27 -05:00
Kevin Turcios	fce866c96f	fix: splice only flagged functions from LLM repair into original test source Instead of replacing the entire test file with the LLM's output, parse both the original and repaired sources as CST, extract only the flagged function nodes from the repair output, and surgically replace them in the original. Unflagged functions are preserved exactly as-is.	2026-03-04 03:26:03 -05:00
Kevin Turcios	33be205d88	feat: run postprocessing pipeline on repaired tests before instrumentation Repaired tests from the LLM now go through the same postprocessing pipeline as initial generation (import fixing, loop limiting, unused definition removal) before instrumentation. Returns the display version (with asserts) as generated_tests for client-side display.	2026-03-04 03:20:09 -05:00
claude[bot]	8fe3171934	fix: resolve mypy type errors in generate.py and postprocess_pipeline.py	2026-03-04 08:19:57 +00:00
Kevin Turcios	2899eae4da	feat: return display-ready test source with asserts in testgen response Split postprocessing_testgen_pipeline to capture the test source before assert removal — fully cleaned (imports, loops, definitions) but with original asserts intact. Return it as raw_generated_tests in the TestGenResponseSchema so the CLI can display the human-readable version.	2026-03-04 03:16:30 -05:00
Kevin Turcios	40f3236645	refactor: simplify template selection with string composition	2026-03-04 01:13:06 -05:00
Kevin Turcios	c2f9b17969	Merge remote-tracking branch 'origin/main' into fix-js-async-testgen-flaky-tests	2026-03-04 01:09:00 -05:00
claude[bot]	38ca8824d6	fix: resolve mypy type errors in code_repair_context	2026-03-03 23:29:13 +00:00
Aseem Saxena	16253b3d63	Merge branch 'main' into match-testdiff-schema	2026-03-04 04:56:29 +05:30
Sarthak Agarwal	cc32654b7f	mocha prompts in backend (#2468 ) Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-03-04 04:09:10 +05:30
HeshamHM28	44fc7dc8e8	feat: Add support for specifying target Java version in test generation (#2445 )	2026-03-03 22:03:29 +00:00
aseembits93	4e60026fcc	test	2026-03-03 05:53:34 +05:30
Kevin Turcios	04624cc389	relax	2026-03-02 19:22:55 -05:00
Kevin Turcios	5d2ad27d3f	refactor: extract shared create_prompt_env Jinja2 factory Deduplicate the identical Environment(FileSystemLoader, StrictUndefined, keep_trailing_newline=True) setup across JS testgen, Python testgen, and Python explanations into core/shared/jinja_utils.py. Also fix tests/testgen/test_testgen_javascript.py which had a stale copy of build_javascript_prompt and loaded the now-deleted .md files.	2026-03-02 18:42:57 -05:00
Kevin Turcios	7820fb15e1	refactor: move ESM/CJS import formatting from Python to Jinja2 macro Split _generate_import_statement into _resolve_import (pure logic: identifier validation, dot splitting, reserved words) and a js_import Jinja2 macro (pure formatting: ESM vs CJS syntax). The macro lives in _macros.md.j2 and is imported by user.md.j2.	2026-03-02 18:28:30 -05:00
Kevin Turcios	d00fa99cc5	feat: convert JS/TS testgen prompts to Jinja2 templates with model_type and ESM support Replace plain .md prompts rendered with str.format() with Jinja2 templates using {% extends %}, {% block %}, and {% if %} branching: - model_type branching: XML tags for Anthropic, markdown headers for OpenAI - module_system support: ESM imports (import { fn } from '...') vs CJS (require) - Template inheritance: base_system.md.j2 with sync/async overrides - Unified user.md.j2 with is_async and module_system conditionals - Add module_system field to TestGenSchema	2026-03-02 18:23:30 -05:00
Kevin Turcios	4cdcd57f04	fix: reduce flaky generated tests for JS async functions The async testgen prompt was steering the LLM toward generating timing-dependent and ordering-sensitive tests that produce non-deterministic results across runs. This caused ~50% E2E failure rate for the JS ESM async workflow. - Add determinism requirement: never assert on timing, elapsed duration, or relative ordering of async side effects - Remove directive to use Promise.all() for large-scale tests - Change large-scale objective from "concurrent operations" to "correctness with larger inputs" - Replace concurrent execution template example with a simple large-input correctness test	2026-03-02 17:47:20 -05:00
claude[bot]	962edcc595	fix: correct unpacking of validate_request_data return value Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>	2026-03-02 16:01:04 +00:00
claude[bot]	3f49aa1b43	fix: resolve mypy type errors in generate.py	2026-03-02 08:58:37 -05:00
Kevin Turcios	87ab144d40	feat: per-function test review + repair endpoints Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints. Review accepts per-test data with pre-flagged behavioral failures, AI reviews passing functions for unrealistic patterns, returns per-function verdicts. Repair takes flagged functions, LLM rewrites them, re-instruments, returns repaired test source. Python-only gate.	2026-03-02 08:54:44 -05:00
Kevin Turcios	efa29bf452	refactor: split instrument_new_tests.py into focused modules and extract model selection Split the 1,734-line instrument_new_tests.py into three modules by concern: - device_sync.py: GPU/device framework detection and sync AST generation - wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top - instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source Also extract select_model_for_test() from testgen_python() in generate.py to separate model selection logic from the HTTP handler.	2026-03-02 08:21:02 -05:00
Kevin Turcios	e26dd72d7d	refactor: remove duplicate replace_definition_with_import from parse_and_validate_llm_output The call was redundant — the postprocessing pipeline already handles it as its final step. Move the test coverage to test_postprocessing_pipeline.py.	2026-03-02 07:58:54 -05:00
Kevin Turcios	0541126fc0	refactor: eliminate BaseTestGenContext class hierarchy Replace class hierarchy (BaseTestGenContext → Single/Multi) with standalone functions that branch on is_multi_context() internally. Delete context.py, move TestGenContextData to models.py, and distribute logic to validate.py, preprocess_pipeline.py, and generate.py.	2026-03-02 07:38:51 -05:00
Kevin Turcios	a1c0ac6ae4	refactor: leverage Jinja2 includes, extends, and composition in testgen prompts Use {% extends %} to deduplicate sync/async system templates via base_system.md.j2, {% include %} for conditional JIT content, and a compose_user.md.j2 wrapper to replace Python string assembly in build_prompt().	2026-03-02 07:26:38 -05:00
Kevin Turcios	f191c12438	refactor: reorganize python testgen directory structure Move prompts into prompts/ subdirectory with clearer names, rename testgen.py to generate.py, extract validate.py and demo_hacks.py, rename testgen_context.py to context.py, delete unused explain prompts.	2026-03-02 06:39:07 -05:00
Sarthak Agarwal	4b88fc0cc7	llm call optimization fail error log and small refactoring (#2447 )	2026-03-02 12:33:56 +05:30
claude[bot]	3309dcec2c	fix: resolve mypy type errors in explanations.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-27 20:10:19 +00:00
claude[bot]	49e11a585a	style: auto-fix formatting issues Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-27 20:08:28 +00:00
Kevin Turcios	ded2240818	Merge branch 'main' into allocs	2026-02-27 20:06:57 +00:00
Kevin Turcios	779fda2b36	slight changes	2026-02-27 15:02:08 -05:00
Kevin Turcios	1fedb8c443	refactor: rewrite explanation prompts with Jinja2 macros and tighter brevity constraints - Extract shared content into Jinja2 macros (`section`, `field`, `code_field`) that handle Anthropic XML vs OpenAI markdown wrapping, eliminating full duplication of every section across both branches - Tighten system prompt to enforce concise 3-6 sentence output: trim bloated per-field context descriptions, add concrete positive example, explicitly forbid section headers and bullet groups, move output_format to be the last section so constraints are closest to generation - Add caveat that original_explanation is for factual reference only (in both system and user prompts) to prevent the model from mimicking its verbose multi-section format - Condense throughput/concurrency/acceptance sections to essentials - Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`	2026-02-27 14:38:24 -05:00
Kevin Turcios	396d7cc7e8	refactor: modernize explanation prompts with Jinja2 templates Extract inline prompts into .md.j2 templates, move schemas to models.py, and add model_type branching (XML for Anthropic, markdown for OpenAI) following the testgen pattern. Uses StrictUndefined, trim_blocks, and lstrip_blocks.	2026-02-27 13:45:21 -05:00
Kevin Turcios	879a22454f	Merge branch 'main' into fix-middleware-llm-perf	2026-02-27 15:08:05 +00:00
Kevin Turcios	18ed70e031	feat: add adaptive optimization support to observability V2 Display ADAPTIVE source candidates in the timeline with Sparkles icon, parent candidate linking, and ranking labels. Also fix the backend to pass call_type, trace_id, and user_id to call_llm for proper observability logging.	2026-02-27 06:32:28 -05:00
Aseem Saxena	be1480b937	Merge branch 'main' into testgen-jit-iter	2026-02-26 03:54:55 +05:30
claude[bot]	3f11204164	fix: add type parameter to asyncio.Task for mypy	2026-02-25 22:22:26 +00:00
Aseem Saxena	e97ca0d37f	Merge branch 'main' into fix-middleware-llm-perf	2026-02-26 03:49:53 +05:30
HeshamHM28	29011d5cc3	Merge branch 'main' into fix-middleware-llm-perf	2026-02-25 12:32:52 -08:00
mashraf-222	879658cedb	Merge branch 'main' into testgen-jit-iter	2026-02-25 21:51:18 +02:00
mashraf-222	db871c321a	Merge branch 'main' into reduce-recompilations	2026-02-25 21:50:48 +02:00
Aseem Saxena	df6b4ba341	Merge branch 'main' into cf-jit-output-format-prompt	2026-02-25 23:12:54 +05:30
Aseem Saxena	0380f9ad0d	Merge branch 'main' into reduce-recompilations	2026-02-25 02:27:47 +05:30
Aseem Saxena	14feee119f	Merge branch 'main' into testgen-jit-iter	2026-02-25 02:27:41 +05:30
claude[bot]	c6e9fc4530	fix: remove duplicate return statement in _find_error_location Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-24 12:57:02 +00:00
mohammed ahmed	f301be093c	Update django/aiservice/aiservice/validators/javascript_validator.py Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>	2026-02-24 14:54:56 +02:00
ali	c2eb63eb2e	feat: improve JS/TS validator with markdown support and error locations Add markdown code block parsing, detailed syntax error locations with line/col info, and structured logging to the JavaScript/TypeScript validators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 14:50:50 +02:00
claude[bot]	ae7110491c	fix: add type ignore for Django ORM field type mismatch Update type hints for `add_months_safe` and `get_next_subscription_period` to accept both datetime.datetime and datetime.date, and add ty:ignore comment for Django ORM field type that ty cannot infer correctly. Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>	2026-02-24 10:37:33 +00:00
aseembits93	7f824ce101	fix: eliminate redundant DB queries in middleware and unblock LLM responses Auth now attaches fetched organization/subscription to the request so TrackUsageMiddleware reuses them instead of re-querying. RateLimitMiddleware caches restricted_paths at init and uses async cache methods. LLM call recording is fire-and-forget via asyncio.create_task to avoid blocking responses on DB writes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 20:43:18 +05:30
aseembits93	d4867ef18e	refactor: make line profiler JIT handling consistent with regular optimizer Move JIT instructions appending from the per-call level (optimize_python_code_line_profiler_single) to the endpoint level (optimize endpoint), matching the regular optimizer's pattern. This removes the is_numerical_code parameter threading through the call chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:54:03 +05:30
aseembits93	0b523fc367	fix: enforce direct JIT decorator in optimizer prompt for numerical code When is_numerical_code is true, the LLM sometimes outputs conditional fallback paths (try/except, if/else) instead of applying the JIT decorator directly. Add explicit output format instructions to prevent this behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:49:24 +05:30
Kevin Turcios	033d14ea87	Merge branch 'main' into testgen-jit-iter	2026-02-23 08:56:11 +00:00
Kevin Turcios	f14ff077a6	Merge branch 'main' into reduce-recompilations	2026-02-23 08:55:29 +00:00
claude[bot]	bf4e38c301	fix: add cast to satisfy ty type checker for list covariance The ty type checker correctly flags that list[str] is not a subtype of list[str | None] due to list invariance. Added explicit cast. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-23 08:42:24 +00:00
Kevin Turcios	16e043883a	style: auto-format ranker and test_markdown_utils	2026-02-23 03:39:38 -05:00
Kevin Turcios	85a1c8b183	fix: derive ranker ranking from structured scores instead of LLM array The JSON parsing path returned the LLM's explicit ranking array, which sometimes contradicted its own per-dimension scores. Use _scores_to_ranking() to compute the ranking from weighted scores when available, falling back to the LLM ranking only when scores are absent.	2026-02-23 03:37:42 -05:00
Kevin Turcios	20ee6d5b62	fix: penalize local variable caching of globals in ranker prompt The ranker LLM was rewarding candidates that cache global variables into locals as a performance win. Add an explicit rule: this is only relevant on Python ≤3.10; on 3.11+ LOAD_GLOBAL uses adaptive specialization and is nearly as fast as LOAD_FAST.	2026-02-23 03:37:21 -05:00
Kevin Turcios	c95a36cf38	fix: handle nested code fences in extract_code_block The non-greedy regex in FIRST_CODE_BLOCK_PATTERN stopped at the first ``` occurrence, even inside triple-quoted strings or nested code fence blocks. This truncated the extracted code and lost test functions when LLMs embedded function definitions using ```python:filepath syntax. Switch to greedy matching and require the closing ``` to be alone on its line so intermediate backticks are skipped.	2026-02-23 03:36:50 -05:00
Kevin Turcios	ca71d0c8a0	refactor: remove constructor notes preprocessing from testgen pipeline Full class source is now included in the client-side testgen context, making the server-side constructor signature extraction redundant.	2026-02-23 03:36:50 -05:00
Kevin Turcios	bfd9f2cd04	fix: respect test_index when creating optimization_features row The get_or_create defaults passed test lists without positional indexing, so when a higher test_index created the row first its content landed at index 0 and was overwritten by the lower index update, losing a test.	2026-02-23 03:36:50 -05:00
Kevin Turcios	af3185edff	fix: handle non-numeric patch suffixes and support Python 3.15	2026-02-23 03:36:50 -05:00
Aseem Saxena	852274e2be	Merge branch 'main' into reduce-recompilations	2026-02-21 00:59:24 +05:30
aseembits93	85c5a2ec82	reduce recompilations in the tests	2026-02-21 00:57:52 +05:30
Aseem Saxena	8f6d1d0602	fix: improve JIT testgen prompt to avoid error-checking tests Add explicit guidance to avoid generating tests that check for specific exception types, since JIT compilers (numba, torch.compile) produce different error types than uncompiled code. This ensures generated tests work consistently for both compiled and uncompiled versions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-20 18:59:04 +00:00
Aseem Saxena	5553b01bc1	Merge branch 'main' into testgen-jit-iter	2026-02-21 00:06:44 +05:30
claude[bot]	4fa972edd3	refactor: remove unused TORCH_TENSOR_FUNCTIONS constant Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-20 18:33:41 +00:00
Sarthak Agarwal	eb5f4b460e	Migrate to AWS bedrock (#2430 ) AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY= AWS_REGION=us-east-1 Will require these for boto3 authentication	2026-02-20 23:52:48 +05:30
claude[bot]	46da033b05	style: fix ruff formatting and add mypy type annotation	2026-02-20 18:09:05 +00:00
Aseem Saxena	7e1b2a3ade	investigate	2026-02-20 18:03:28 +00:00
claude[bot]	1bb1407c6b	fix: resolve type checker errors	2026-02-15 12:33:05 +00:00
Kevin Turcios	d6a3c6254f	feat: add constructor notes for non-dataclass classes with __init__ The LLM prompt preprocessing now highlights __init__ signatures for regular classes, not just @dataclass ones, reducing brute-force constructor guessing and pytest.skip() fallbacks in generated tests.	2026-02-15 07:29:05 -05:00
Kevin Turcios	e5d70443db	fix: use positional insertion in log_features to preserve model attribution log_features() appended test results in call-completion order, causing model attribution swaps when LLM responses arrived out of order. Pass test_index through and use positional insertion instead of append.	2026-02-15 03:58:05 -05:00
Kevin Turcios	c13835963c	docs: restructure CLAUDE.md files into modular rules Slim down CLAUDE.md files and move content into path-scoped .claude/rules/ files to reduce context bloat.	2026-02-14 19:36:21 -05:00
Kevin Turcios	4c3deeb7b8	Restructure CLAUDE.md files and add path-scoped rules for monorepo (#2417 ) ## Summary - Restructure CLAUDE.md hierarchy so Claude Code auto-discovers project-specific instructions - Delete dead `AGENTS.md` files (referenced non-existent `.tessl/RULES.md`) - Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery - Create `js/CLAUDE.md` with package commands and gotchas - Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded rule) - Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill) - Add path-scoped rules for Python and Next.js patterns - Add domain glossary, service architecture diagram, and per-package gotchas ## Test plan - Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/` - Verify no remaining references to `AGENTS.md` or `.tessl/` - Verify `.claude/rules/` and `.claude/skills/` files are committed	2026-02-14 17:13:09 -05:00
Kevin Turcios	e26a8ea486	Reorganize top-level feature modules under core/ (#2416 ) ## Summary - Move `log_features/` → `core/log_features/` (Django app with `managed=False` models, no DB impact) - Move `ranker/`, `workflow_gen/`, `adaptive_optimizer/` → `core/languages/python/` (Python-focused API modules) - Update all imports across the codebase (19 files) ## Test plan - [x] All 548 tests pass - [x] No stale top-level imports (`from log_features.`, `from ranker.`, etc.) - [x] `log_features` AppConfig preserves `label = "log_features"` for Django app registry compatibility	2026-02-14 17:07:40 -05:00
Kevin Turcios	6caf7469c6	Decouple language modules and remove stale cross-module code (#2415 ) ## Summary - Extract testgen and optimizer API routers from `core/languages/python/` into `core/shared/` with lazy imports, eliminating cross-module coupling between language modules - Delete stale JavaScript prompt files left in the Python module after migration to `js_ts/` - Remove backward-compat fallback paths for prompt files that already exist at their new locations - Remove unused `is_multi_context_any()` and its cross-language imports - Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO ## Test plan - [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS, and Java - [ ] Verify optimizer endpoint dispatches correctly for all languages - [ ] Run existing testgen and optimizer tests	2026-02-14 00:09:44 -05:00
Kevin Turcios	2614393793	Add test_index to LLM call context for observability chat (#2414 ) ## Summary - Pass test_index through LLM call context so observability chat can attribute responses to specific test generation calls - Fix SSE streaming to send keepalive pings from the start CF-504	2026-02-13 23:49:20 -05:00
Sarthak Agarwal	c721723971	remove demo test loops (#2412 )	2026-02-14 00:43:09 +05:30
Saurabh Misra	198c0c1a4e	codeflash-omni-java (#2335 ) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com> Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>	2026-02-13 23:26:55 +05:30
Kevin Turcios	ad26be10b8	Fix JS/TS cross-imports from Python module (#2396 ) ## Problem The JS/TS language handler (`core/languages/js_ts/`) was importing models, schemas, config, prompts, and helpers directly from the Python language handler. This created a confusing architectural dependency and risked serving wrong language-specific prompt content. ## What Changed - Created `core/shared/` for genuinely language-agnostic code (optimizer schemas, models, config, testgen models, context helpers) - Moved JS/TS-specific prompts and context helpers into `core/languages/js_ts/` - Updated all consumers (20+ files) to import from the correct locations - Removed backwards-compat re-exports from the Python module ## Result - Before: 11 imports from `core.languages.python` in `core/languages/js_ts/` - After: 0	2026-02-12 22:34:38 -05:00
Kevin Turcios	0df421eccb	Add chat interface to observability timeline (#2395 ) ## Summary - Chat panel on the observability timeline that uses Claude to answer questions about optimization traces - Tool-based context retrieval (fetches candidates, tests, errors on demand instead of stuffing everything upfront) - Uses `@anthropic-ai/sdk` via Azure AI Foundry - Strengthened testgen prompts to ban mocks/fakes for test inputs	2026-02-12 20:45:33 -05:00
Kevin Turcios	e28642cf22	Fix FTO display showing wrong function for methods with common names (#2391 ) Store qualified function name (e.g., HttpInterface.__init__) and file_path in testgen metadata instead of bare function_name (__init__). Update the frontend parser to handle qualified names by splitting into class + method and searching within the correct class using both tree-sitter and regex. Prioritize the file matching filePath before searching all files.	2026-02-12 00:30:33 -05:00
Kevin Turcios	db973a0487	fix: relax testgen assertion rule to allow imports from function depe… (#2388 ) …ndencies The old rule ("NOT in libraries such as numpy, pandas etc.") forced LLMs to reinvent helpers like np.allclose using slow / inaccurate Python loops. The new rule allows assertions from packages already imported by the function under test.	2026-02-09 15:05:19 -05:00
Kevin Turcios	629442cc5e	Restructure aiservice to language-first architecture (#2383 ) ## Summary - Reorganizes `django/aiservice/` from feature-first layout (separate `optimizer/`, `testgen/`, `code_repair/` dirs) to language-first layout under `core/languages/{python,js_ts}/` - Adds handler/registry/dispatcher pattern for routing requests to language-specific implementations - All existing module code preserved via `git mv` for history tracking; no logic changes to existing modules ## What changed - New `core/` app with registry, dispatcher, protocols, and error hierarchy - `PythonHandler` and `JSTypeScriptHandler` delegate to existing module functions - All imports updated across the codebase (views, tests, adaptive_optimizer, etc.) - Integration tests for handler registration and dispatch - 155 files changed, ~880 additions / ~207 deletions (mostly import path updates and moves) ## Test plan - [ ] `python manage.py check` passes - [ ] Integration tests in `tests/integration/test_handler_integration.py` pass - [ ] Existing test suite passes with updated import paths - [ ] Ruff and ty clean on all new infrastructure files --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-09 09:15:50 -05:00
Kevin Turcios	b9d318279c	feat: observability improvements and testgen prompt modernization (#2382 ) ## Summary - Rewrite testgen system prompts from constraint-heavy to positive-first structure with chain-of-thought instructions - Simplify LLM message structure from `[system, user, user, user]` to `[system, user]` by absorbing plan_content guidelines into system prompts - Observability UI: add search to LLM debug dialog, expand timeline view - Fix data capture: raw LLM responses, all user messages in prompt column, nested code fences, empty notes handling ## Test plan - [ ] Verify testgen produces valid test suites with the new prompt structure - [ ] Verify observability timeline displays LLM prompts/responses correctly - [ ] Check that search works in the LLM debug dialog	2026-02-09 01:20:59 -05:00
Kevin Turcios	752e2504e4	Restructure and improve refinement prompt (#2379 ) ## Summary - Restructure the refinement system prompt into clear numbered sections (Preserve Behavior, Minimize Diff, Revert Anti-Patterns, Maintain Readability) with an explicit 6-step refinement process - Extract inline prompt strings into separate markdown files (`refinement_system_prompt.md`, `refinement_user_prompt.md`), matching the convention used by other optimizer prompts - Add `AuthenticatedRequest` type hint to `refine()` endpoint and fix grammar in tool use section ## Test plan - [ ] Verify refinement endpoint still works end-to-end with a test optimization candidate - [ ] Confirm prompt content is loaded correctly from markdown files at startup --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-08 02:10:20 -05:00
Kevin Turcios	47053591f4	observability v2 toggle (#2378 )	2026-02-07 15:50:12 -05:00
Kevin Turcios	f03a06f4e1	Reintroduce enriched obs_context for testgen LLM calls (#2377 ) ## Summary - Re-adds the enriched observability context from CF-1041 that was reverted - Passes `module_path`, `test_module_path`, `helper_function_names`, `is_async`, and `function_to_optimize` details to `call_llm` in testgen ## Test plan - [ ] Verify testgen LLM calls include the enriched context - [ ] Confirm no regressions in test generation flow	2026-02-07 10:33:13 -05:00
Sarthak Agarwal	98fb2d1579	Revert "CF-1041 observability v2 " need more changes and testing (#2375 ) Reverts codeflash-ai/codeflash-internal#2329	2026-02-06 01:18:17 +05:30
Kevin Turcios	07d33edd9f	CF-1041 observability v2 (#2329 ) introducing this due to pain points in V1, not a complete rewrite, based off v1 --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>	2026-02-05 14:08:02 -05:00
Sarthak Agarwal	08fd1a8787	adding validation for ts in refiner and testgen (#2372 ) 1. languages/js_ts/testgen.py: - Updated parse_and_validate_js_output to accept a language parameter - Uses validate_typescript_syntax when language="typescript", otherwise uses validate_javascript_syntax - Updated generate_and_validate_js_test_code to accept and pass the language parameter - Updated the call chain to pass language through to the validation 2. optimizer/context_utils/refiner_context.py: - Added import for validate_typescript_syntax - Fixed is_valid_refinement method to use correct validator based on language - Fixed validate_code_syntax in SingleRefinerContext class - Fixed validate_code_syntax in MultiRefinerContext class 3. tests/optimizer/test_javascript_validator.py: - Added test_typescript_type_assertion_valid_in_ts - verifies as unknown as number is valid TypeScript - Added test_typescript_type_assertion_invalid_in_js - verifies as unknown as number is INVALID JavaScript (this would have caught the original bug) - Added test_typescript_generic_valid_in_ts - verifies generics are valid TypeScript - Added test_typescript_generic_invalid_in_js - verifies generics are INVALID JavaScript Files Already Correct (no changes needed): - languages/js_ts/optimizer.py - already correctly checks language - languages/js_ts/optimizer_lp.py - already correctly checks language - optimizer/optimizer_line_profiler.py - already correctly checks language --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-04 22:54:44 +00:00
Aseem Saxena	648c95c909	Merge branch 'main' into match-testdiff-schema	2026-02-02 15:07:22 -08:00
Sarthak Agarwal	eb8ad603ff	vitest related changes to prompt (#2366 )	2026-02-03 03:29:36 +05:30
Aseem Saxena	90597c52e3	markdown more info	2026-02-02 10:11:44 -08:00
aseembits93	5d0ca8d01b	fn var was not used in .format()	2026-02-02 10:00:40 -08:00

1502 commits