The async testgen prompt was steering the LLM toward generating
timing-dependent and ordering-sensitive tests that produce
non-deterministic results across runs. This caused a ~50% E2E failure
rate for the JS ESM async workflow.
- Add determinism requirement: never assert on timing, elapsed
duration, or relative ordering of async side effects (see the sketch
after this list)
- Remove directive to use Promise.all() for large-scale tests
- Change large-scale objective from "concurrent operations" to
"correctness with larger inputs"
- Replace concurrent execution template example with a simple
large-input correctness test
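A minimal illustration of the forbidden vs. required assertion style, in Python for brevity (the failures were in the JS ESM workflow; function names here are hypothetical):

```python
import asyncio
import time


async def fetch_all(items):  # hypothetical function under test
    return await asyncio.gather(*(asyncio.sleep(0, result=i * 2) for i in items))


# Forbidden: asserting on elapsed time makes the verdict depend on
# scheduler load, so it flips between runs.
async def test_fetch_all_timing():
    start = time.monotonic()
    await fetch_all([1, 2, 3])
    assert time.monotonic() - start < 0.05  # flaky


# Required: assert only on returned values, which are stable across runs.
async def test_fetch_all_values():
    assert await fetch_all([1, 2, 3]) == [2, 4, 6]
```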
Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints.
Review accepts per-test data with pre-flagged behavioral failures; the
AI reviews the passing functions for unrealistic patterns and returns
per-function verdicts. Repair takes the flagged functions, has the LLM
rewrite them, re-instruments them, and returns the repaired test
source. Python-only gate.
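A rough sketch of the two handlers, assuming FastAPI and hypothetical request shapes (the real signatures may differ):

```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()


class ReviewRequest(BaseModel):  # hypothetical request shapes
    tests: list[dict]            # per-test data, behavioral failures pre-flagged


class RepairRequest(BaseModel):
    flagged_functions: list[str]
    test_source: str


@router.post("/ai/testgen_review")
async def testgen_review(req: ReviewRequest) -> dict:
    # The real handler has the AI review passing functions for
    # unrealistic patterns; stubbed here with a fixed verdict.
    return {"verdicts": {t["name"]: "realistic" for t in req.tests}}


@router.post("/ai/testgen_repair")
async def testgen_repair(req: RepairRequest) -> dict:
    # The real handler has the LLM rewrite the flagged functions,
    # re-instruments the result, and returns the repaired source.
    return {"repaired_test_source": req.test_source}
```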
Split the 1,734-line instrument_new_tests.py into three modules by concern:
- device_sync.py: GPU/device framework detection and sync AST generation
- wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top
- instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source
Also extract select_model_for_test() from testgen_python() in generate.py to
separate model selection logic from the HTTP handler.
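The shape of the extraction; the selection criteria and model names below are illustrative, not the real logic:

```python
def select_model_for_test(is_async: bool, is_numerical_code: bool) -> str:
    """Model selection pulled out of the HTTP handler so it can be
    unit-tested on its own."""
    if is_numerical_code:
        return "model-numerical"
    return "model-async" if is_async else "model-default"


# Inside testgen_python(), the handler now only orchestrates:
#   model = select_model_for_test(request.is_async, request.is_numerical_code)
```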
Replace class hierarchy (BaseTestGenContext → Single/Multi) with
standalone functions that branch on is_multi_context() internally.
Delete context.py, move TestGenContextData to models.py, and
distribute logic to validate.py, preprocess_pipeline.py, and
generate.py.
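Illustrative after-state; apart from is_multi_context() and TestGenContextData, the names and fields are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class TestGenContextData:  # now lives in models.py
    function_name: str
    helpers: list[str] = field(default_factory=list)  # illustrative fields


def is_multi_context(ctx: TestGenContextData) -> bool:
    return len(ctx.helpers) > 0  # illustrative condition


def build_context_section(ctx: TestGenContextData) -> str:
    # One standalone function replaces BaseTestGenContext and its
    # Single/Multi subclasses; the branch moved inside.
    if is_multi_context(ctx):
        return f"{ctx.function_name} with helpers: {', '.join(ctx.helpers)}"
    return ctx.function_name
```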
Use {% extends %} to deduplicate sync/async system templates via
base_system.md.j2, {% include %} for conditional JIT content, and a
compose_user.md.j2 wrapper to replace Python string assembly in
build_prompt().
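A minimal sketch of the inheritance layout, using an in-memory loader; the block names and template text are illustrative:

```python
from jinja2 import DictLoader, Environment

env = Environment(loader=DictLoader({
    # Shared skeleton; sync/async children override only what differs.
    "base_system.md.j2": (
        "You generate {{ kind }} tests.\n"
        "{% block determinism %}{% endblock %}\n"
        "{% if is_numerical_code %}{% include 'jit.md.j2' %}{% endif %}"
    ),
    "async_system.md.j2": (
        "{% extends 'base_system.md.j2' %}"
        "{% block determinism %}Never assert on timing or ordering.{% endblock %}"
    ),
    "jit.md.j2": "Apply the JIT decorator directly.",
}))

print(env.get_template("async_system.md.j2").render(
    kind="async", is_numerical_code=True))
```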
Move prompts into prompts/ subdirectory with clearer names, rename
testgen.py to generate.py, extract validate.py and demo_hacks.py,
rename testgen_context.py to context.py, delete unused explain prompts.
- Extract shared content into Jinja2 macros (`section`, `field`,
`code_field`) that handle Anthropic XML vs OpenAI markdown wrapping,
eliminating full duplication of every section across both branches
(sketched after this list)
- Tighten system prompt to enforce concise 3-6 sentence output: trim
bloated per-field context descriptions, add concrete positive example,
explicitly forbid section headers and bullet groups, move output_format
to be the last section so constraints are closest to generation
- Add caveat that original_explanation is for factual reference only (in
both system and user prompts) to prevent the model from mimicking its
verbose multi-section format
- Condense throughput/concurrency/acceptance sections to essentials
- Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`
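A stripped-down version of the wrapping macro; the real `section` macro takes more arguments, and the template text is illustrative:

```python
from jinja2 import DictLoader, Environment

env = Environment(loader=DictLoader({
    "macros.j2": (
        # One macro renders the same content as Anthropic XML tags or
        # OpenAI markdown headings, instead of duplicating every section.
        "{% macro section(name, body, model_type) %}"
        "{% if model_type == 'anthropic' %}<{{ name }}>{{ body }}</{{ name }}>"
        "{% else %}## {{ name }}\n{{ body }}{% endif %}"
        "{% endmacro %}"
    ),
    "prompt.j2": (
        "{% from 'macros.j2' import section %}"
        "{{ section('code', code, model_type) }}"
    ),
}))

print(env.get_template("prompt.j2").render(code="def f(): ...", model_type="anthropic"))
print(env.get_template("prompt.j2").render(code="def f(): ...", model_type="openai"))
```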
Extract inline prompts into .md.j2 templates, move schemas to
models.py, and add model_type branching (XML for Anthropic, markdown
for OpenAI) following the testgen pattern. The Jinja2 environment uses
StrictUndefined, trim_blocks, and lstrip_blocks.
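The environment settings named above, in the standard Jinja2 setup (template directory assumed):

```python
from jinja2 import Environment, FileSystemLoader, StrictUndefined

env = Environment(
    loader=FileSystemLoader("prompts"),  # assumed template directory
    undefined=StrictUndefined,  # a missing variable raises instead of rendering ""
    trim_blocks=True,           # drop the newline right after a {% ... %} tag
    lstrip_blocks=True,         # strip indentation before a block tag
)
```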
Reduce the IntersectionObserver threshold list from six entries to one,
remove the backdrop-blur from the sticky header, drop the
opacity/color/maxHeight transitions that fired on every activeIndex
change, and narrow the progress bar to transition-[width].
Display ADAPTIVE source candidates in the timeline with Sparkles icon,
parent candidate linking, and ranking labels. Also fix the backend to
pass call_type, trace_id, and user_id to call_llm for proper
observability logging.
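The backend side of the fix, roughly; call_llm's remaining parameters are omitted and the call_type value is illustrative:

```python
async def rank_source_candidates(messages, trace_id: str, user_id: str):
    # Previously these identifiers were dropped, so the call could not
    # be attributed in observability logs; now they are passed through.
    return await call_llm(
        messages=messages,
        call_type="candidate_ranking",  # illustrative value
        trace_id=trace_id,
        user_id=user_id,
    )
```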
Add markdown code block parsing, detailed syntax error locations with
line/col info, and structured logging to the JavaScript/TypeScript
validators.
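A minimal version of the fence parsing, assuming the LLM wraps its output in ```ts-style markdown fences:

```python
import re

FENCE_RE = re.compile(
    r"```(?:js|jsx|ts|tsx|javascript|typescript)?\s*\n(.*?)```", re.DOTALL
)


def extract_code_blocks(llm_output: str) -> list[str]:
    """Return fenced code block contents, falling back to the raw text
    when the model didn't use fences at all."""
    blocks = [m.group(1) for m in FENCE_RE.finditer(llm_output)]
    return blocks if blocks else [llm_output]
```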
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update type hints for `add_months_safe` and `get_next_subscription_period`
to accept both datetime.datetime and datetime.date, and add a
ty:ignore comment for a Django ORM field type that ty cannot infer
correctly.
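The widened hint, roughly (parameter names are assumptions; datetime.datetime is a subclass of datetime.date, so nothing changes at runtime):

```python
import datetime


def add_months_safe(
    start: datetime.date | datetime.datetime, months: int
) -> datetime.date | datetime.datetime:
    """Hint widened to accept both; the body is elided here."""
    ...
```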
Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
Auth now attaches fetched organization/subscription to the request so
TrackUsageMiddleware reuses them instead of re-querying. RateLimitMiddleware
caches restricted_paths at init and uses async cache methods. LLM call
recording is fire-and-forget via asyncio.create_task to avoid blocking
responses on DB writes.
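The fire-and-forget pattern, sketched with a stub writer; keeping a reference to the task prevents it from being garbage-collected before the write finishes:

```python
import asyncio

_background_tasks: set[asyncio.Task] = set()


async def record_llm_call(call_data: dict) -> None:
    """Stub for the real DB writer."""


def record_llm_call_nowait(call_data: dict) -> None:
    # Schedule the DB write without awaiting it, so the HTTP response
    # does not block on the insert.
    task = asyncio.create_task(record_llm_call(call_data))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
```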
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move JIT instructions appending from the per-call level
(optimize_python_code_line_profiler_single) to the endpoint level
(optimize endpoint), matching the regular optimizer's pattern.
This removes the need to thread the is_numerical_code parameter
through the call chain.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When is_numerical_code is true, the LLM sometimes outputs conditional
fallback paths (try/except, if/else) instead of applying the JIT
decorator directly. Add explicit output format instructions to prevent
this behavior.
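The two output shapes, illustrated with numba (the actual JIT framework and decorator options may differ):

```python
# Undesired: the model wraps the decorator in a guarded fallback path.
try:
    from numba import njit

    @njit(cache=True)
    def compute(xs):
        return sum(xs)
except ImportError:
    def compute(xs):
        return sum(xs)


# Desired: the decorator applied directly, no conditional path.
from numba import njit

@njit(cache=True)
def compute(xs):
    return sum(xs)
```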
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coverage analysis in the Claude pr-review job needs these env vars
to run pytest, matching how django-unit-tests and codeflash-aiservice
workflows configure them.
The ty type checker correctly flags that list[str] is not a subtype of
list[str | None] due to list invariance. Add an explicit cast.
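The shape of the fix; variable and function names are illustrative:

```python
from typing import cast

names: list[str] = ["a", "b"]


def first_non_null(values: list[str | None]) -> str | None:
    # Only reads the list, but the parameter type alone can't promise
    # that: a callee could legally append None to a list[str | None].
    return next((v for v in values if v is not None), None)


# ty correctly rejects first_non_null(names); the cast asserts that
# the read-only usage here is safe.
print(first_non_null(cast(list[str | None], names)))
```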
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The JSON parsing path returned the LLM's explicit ranking array,
which sometimes contradicted its own per-dimension scores. Use
_scores_to_ranking() to compute the ranking from weighted scores
when available, falling back to the LLM ranking only when scores
are absent.
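A plausible shape for the computation; the weighting scheme and field names are assumptions:

```python
def _scores_to_ranking(
    scores: list[dict[str, float]], weights: dict[str, float]
) -> list[int]:
    """Rank candidates by weighted per-dimension scores, best first."""
    totals = [
        sum(weights.get(dim, 1.0) * value for dim, value in s.items())
        for s in scores
    ]
    # argsort, descending: index of the best-scoring candidate first
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)


def final_ranking(llm_ranking: list[int], scores, weights) -> list[int]:
    # Trust the LLM's own ranking array only when scores are absent.
    return _scores_to_ranking(scores, weights) if scores else llm_ranking
```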