Commit graph

6494 commits

Author SHA1 Message Date
Kevin Turcios
4cdcd57f04 fix: reduce flaky generated tests for JS async functions
The async testgen prompt was steering the LLM toward generating
timing-dependent and ordering-sensitive tests that produce
non-deterministic results across runs. This caused ~50% E2E failure
rate for the JS ESM async workflow.

- Add determinism requirement: never assert on timing, elapsed
  duration, or relative ordering of async side effects
- Remove directive to use Promise.all() for large-scale tests
- Change large-scale objective from "concurrent operations" to
  "correctness with larger inputs"
- Replace concurrent execution template example with a simple
  large-input correctness test
2026-03-02 17:47:20 -05:00
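Here is a minimal sketch of the kind of large-input correctness test the revised prompt asks for. It uses Python's asyncio instead of JavaScript, and `double_all` is a hypothetical function under test. What matters is how the assertions are written: they compare returned values for a larger input and never check elapsed time or the order in which async side effects ran.

```python
import asyncio


async def double_all(values: list[int]) -> list[int]:
    """Hypothetical async function under test."""

    async def double(v: int) -> int:
        await asyncio.sleep(0)  # yield to the event loop, as real async work would
        return v * 2

    # gather() returns results in input order, whatever order the tasks finish in.
    return list(await asyncio.gather(*(double(v) for v in values)))


def test_large_input_correctness() -> None:
    # Deterministic: assert only on the returned values for a larger input.
    values = list(range(10_000))
    assert asyncio.run(double_all(values)) == [v * 2 for v in values]


# Flaky pattern the prompt now forbids (timing-dependent; shown, not run):
#   start = time.monotonic()
#   asyncio.run(double_all(values))
#   assert time.monotonic() - start < 0.05
```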
claude[bot]
962edcc595 fix: correct unpacking of validate_request_data return value
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-03-02 16:01:04 +00:00
claude[bot]
3f49aa1b43 fix: resolve mypy type errors in generate.py 2026-03-02 08:58:37 -05:00
Kevin Turcios
87ab144d40 feat: per-function test review + repair endpoints
Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints.
Review accepts per-test data with pre-flagged behavioral failures, AI
reviews passing functions for unrealistic patterns, returns per-function
verdicts. Repair takes flagged functions, LLM rewrites them,
re-instruments, returns repaired test source. Python-only gate.
2026-03-02 08:54:44 -05:00
Kevin Turcios
9d6799a87f
Merge pull request #2464 from codeflash-ai/testgen-review
refactor: reorganize python testgen directory structure
2026-03-02 13:52:43 +00:00
Kevin Turcios
efa29bf452 refactor: split instrument_new_tests.py into focused modules and extract model selection
Split the 1,734-line instrument_new_tests.py into three modules by concern:
- device_sync.py: GPU/device framework detection and sync AST generation
- wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top
- instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source

Also extract select_model_for_test() from testgen_python() in generate.py to
separate model selection logic from the HTTP handler.
2026-03-02 08:21:02 -05:00
Kevin Turcios
e26dd72d7d refactor: remove duplicate replace_definition_with_import from parse_and_validate_llm_output
The call was redundant — the postprocessing pipeline already handles it as
its final step. Move the test coverage to test_postprocessing_pipeline.py.
2026-03-02 07:58:54 -05:00
Kevin Turcios
0541126fc0 refactor: eliminate BaseTestGenContext class hierarchy
Replace class hierarchy (BaseTestGenContext → Single/Multi) with
standalone functions that branch on is_multi_context() internally.
Delete context.py, move TestGenContextData to models.py, and
distribute logic to validate.py, preprocess_pipeline.py, and
generate.py.
2026-03-02 07:38:51 -05:00
Kevin Turcios
a1c0ac6ae4 refactor: leverage Jinja2 includes, extends, and composition in testgen prompts
Use {% extends %} to deduplicate sync/async system templates via
base_system.md.j2, {% include %} for conditional JIT content, and a
compose_user.md.j2 wrapper to replace Python string assembly in
build_prompt().
2026-03-02 07:26:38 -05:00
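Here is a minimal Jinja2 sketch of the `{% extends %}`/`{% include %}` pattern. It assumes Jinja2 is installed. Only the file name `base_system.md.j2` comes from the commit. The other template names, their contents, and the `use_jit` flag are hypothetical. The sync and async templates each override one block of a shared base. The JIT text is included only when requested.

```python
from jinja2 import DictLoader, Environment, StrictUndefined

TEMPLATES = {
    # Shared skeleton: children override only the block that differs.
    "base_system.md.j2": (
        "You are a test generator.\n"
        "{% block mode %}{% endblock %}\n"
        "{% if use_jit %}{% include 'jit_notes.md.j2' %}{% endif %}"
    ),
    "sync_system.md.j2": (
        "{% extends 'base_system.md.j2' %}"
        "{% block mode %}Write synchronous tests.{% endblock %}"
    ),
    "async_system.md.j2": (
        "{% extends 'base_system.md.j2' %}"
        "{% block mode %}Write async tests with await.{% endblock %}"
    ),
    # Conditional content pulled in via {% include %}.
    "jit_notes.md.j2": "Call JIT-compiled functions once before timing them.",
}

env = Environment(loader=DictLoader(TEMPLATES), undefined=StrictUndefined)


def render_system_prompt(is_async: bool, use_jit: bool) -> str:
    name = "async_system.md.j2" if is_async else "sync_system.md.j2"
    return env.get_template(name).render(use_jit=use_jit)
```

Because the environment uses `StrictUndefined`, forgetting to pass `use_jit` raises an error at render time. The template does not silently render as if the flag were false.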
Kevin Turcios
f191c12438 refactor: reorganize python testgen directory structure
Move prompts into prompts/ subdirectory with clearer names, rename
testgen.py to generate.py, extract validate.py and demo_hacks.py,
rename testgen_context.py to context.py, delete unused explain prompts.
2026-03-02 06:39:07 -05:00
Sarthak Agarwal
4b88fc0cc7
llm call optimization fail error log and small refactoring (#2447) 2026-03-02 12:33:56 +05:30
Kevin Turcios
28da22be35
Merge pull request #2446 from codeflash-ai/allocs
refactor: tighten optimizer and explanation prompts
2026-02-27 20:39:01 +00:00
claude[bot]
3309dcec2c fix: resolve mypy type errors in explanations.py
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 20:10:19 +00:00
claude[bot]
49e11a585a style: auto-fix formatting issues
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 20:08:28 +00:00
Kevin Turcios
ded2240818
Merge branch 'main' into allocs 2026-02-27 20:06:57 +00:00
Kevin Turcios
779fda2b36 slight changes 2026-02-27 15:02:08 -05:00
Kevin Turcios
1fedb8c443 refactor: rewrite explanation prompts with Jinja2 macros and tighter brevity constraints
- Extract shared content into Jinja2 macros (`section`, `field`,
  `code_field`) that handle Anthropic XML vs OpenAI markdown wrapping,
  eliminating full duplication of every section across both branches
- Tighten system prompt to enforce concise 3-6 sentence output: trim
  bloated per-field context descriptions, add concrete positive example,
  explicitly forbid section headers and bullet groups, move output_format
  to be the last section so constraints are closest to generation
- Add caveat that original_explanation is for factual reference only (in
  both system and user prompts) to prevent the model from mimicking its
  verbose multi-section format
- Condense throughput/concurrency/acceptance sections to essentials
- Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`
2026-02-27 14:38:24 -05:00
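This sketch shows a macro that hides the XML-versus-markdown choice. It assumes Jinja2 is installed and that `model_type` takes the values `"anthropic"` and `"openai"`. The macro name `section` comes from the commit, but its signature and the template text are illustrative. The environment uses the `StrictUndefined`, `trim_blocks`, and `lstrip_blocks` settings mentioned in 396d7cc7e8, and `output_format` is rendered last.

```python
from jinja2 import Environment, StrictUndefined

TEMPLATE = """\
{% macro section(name, body, model_type) %}
{% if model_type == "anthropic" %}
<{{ name }}>
{{ body }}
</{{ name }}>
{% else %}
## {{ name }}
{{ body }}
{% endif %}
{% endmacro %}
{{ section("code_context", code_context, model_type) }}
{{ section("output_format", "Write 3-6 plain sentences. No headers, no bullets.", model_type) }}
"""

# trim_blocks/lstrip_blocks drop the lines that hold only {% ... %} tags.
env = Environment(undefined=StrictUndefined, trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE)


def render_user_prompt(model_type: str, code_context: str) -> str:
    return template.render(model_type=model_type, code_context=code_context)
```

Passing `model_type` as an argument keeps the macro self-contained. This also matters if the macro is later moved into a separate file and imported, because imported macros do not see the caller's context by default.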
Kevin Turcios
396d7cc7e8 refactor: modernize explanation prompts with Jinja2 templates
Extract inline prompts into .md.j2 templates, move schemas to
models.py, and add model_type branching (XML for Anthropic, markdown
for OpenAI) following the testgen pattern. Uses StrictUndefined,
trim_blocks, and lstrip_blocks.
2026-02-27 13:45:21 -05:00
Kevin Turcios
09fafeb914 fix: remove scroll-triggered CSS transitions causing jank in observability timeline
Reduce IntersectionObserver thresholds from 6 to 1, remove backdrop-blur
from sticky header, drop opacity/color/maxHeight transitions that fired
on every activeIndex change, and narrow progress bar to transition-[width].
2026-02-27 13:16:22 -05:00
Aseem Saxena
1a30f3eccb
Merge pull request #2440 from codeflash-ai/fix-middleware-llm-perf
fix: eliminate redundant DB queries in middleware and unblock LLM responses
2026-02-27 23:15:14 +05:30
Kevin Turcios
879a22454f
Merge branch 'main' into fix-middleware-llm-perf 2026-02-27 15:08:05 +00:00
Kevin Turcios
986688ceca feat: show recent traces on observability V2 landing page
Display the 10 most recent trace IDs on the empty state so users can
quickly jump to recent optimizations without searching.
2026-02-27 06:32:28 -05:00
Kevin Turcios
18ed70e031 feat: add adaptive optimization support to observability V2
Display ADAPTIVE source candidates in the timeline with Sparkles icon,
parent candidate linking, and ranking labels. Also fix the backend to
pass call_type, trace_id, and user_id to call_llm for proper
observability logging.
2026-02-27 06:32:28 -05:00
Aseem Saxena
c3a572b816
Merge pull request #2433 from codeflash-ai/testgen-jit-iter
extend tensor limit to `rand, randn, ones, zeros, empty, full, randint`
2026-02-26 04:12:48 +05:30
Aseem Saxena
be1480b937
Merge branch 'main' into testgen-jit-iter 2026-02-26 03:54:55 +05:30
claude[bot]
3f11204164 fix: add type parameter to asyncio.Task for mypy 2026-02-25 22:22:26 +00:00
Aseem Saxena
e97ca0d37f
Merge branch 'main' into fix-middleware-llm-perf 2026-02-26 03:49:53 +05:30
Aseem Saxena
924808e909
Merge pull request #2435 from codeflash-ai/reduce-recompilations
reduce recompilations in the tests for jit function to test
2026-02-26 03:49:18 +05:30
HeshamHM28
29011d5cc3
Merge branch 'main' into fix-middleware-llm-perf 2026-02-25 12:32:52 -08:00
mashraf-222
879658cedb
Merge branch 'main' into testgen-jit-iter 2026-02-25 21:51:18 +02:00
mashraf-222
db871c321a
Merge branch 'main' into reduce-recompilations 2026-02-25 21:50:48 +02:00
Aseem Saxena
ef072a9b36
Merge pull request #2439 from codeflash-ai/cf-jit-output-format-prompt
fix: enforce direct JIT decorator in optimizer prompt for numerical code
2026-02-25 23:26:26 +05:30
Aseem Saxena
df6b4ba341
Merge branch 'main' into cf-jit-output-format-prompt 2026-02-25 23:12:54 +05:30
Aseem Saxena
0380f9ad0d
Merge branch 'main' into reduce-recompilations 2026-02-25 02:27:47 +05:30
Aseem Saxena
14feee119f
Merge branch 'main' into testgen-jit-iter 2026-02-25 02:27:41 +05:30
mohammed ahmed
3ae091bc5b
Merge pull request #2441 from codeflash-ai/improve-js-ts-validator-errors
feat: improve JS/TS validator with markdown support and error locations
2026-02-24 16:32:56 +02:00
claude[bot]
c6e9fc4530 fix: remove duplicate return statement in _find_error_location
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-24 12:57:02 +00:00
mohammed ahmed
f301be093c
Update django/aiservice/aiservice/validators/javascript_validator.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-02-24 14:54:56 +02:00
ali
c2eb63eb2e
feat: improve JS/TS validator with markdown support and error locations
Add markdown code block parsing, detailed syntax error locations with
line/col info, and structured logging to the JavaScript/TypeScript
validators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 14:50:50 +02:00
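Both features can be sketched with the standard library alone. `extract_code` and `line_col` are hypothetical helpers, not the validator's actual functions. For example, the implementation of `_find_error_location` is not shown here. The first helper pulls the body out of the first fenced JS/TS block in an LLM response. The second converts a parser's character offset into the 1-based line/column pair an error message would report.

```python
import re

# First fenced block tagged as JS/TS (or untagged); `{3} means three backticks.
FENCE_RE = re.compile(
    r"`{3}(?:javascript|typescript|jsx|tsx|js|ts)?[ \t]*\n(.*?)`{3}",
    re.DOTALL,
)


def extract_code(text: str) -> str:
    """Return the body of the first fenced block, or `text` unchanged if none."""
    match = FENCE_RE.search(text)
    return match.group(1) if match else text


def line_col(source: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column)."""
    line = source.count("\n", 0, offset) + 1
    # rfind returns -1 on the first line, which still yields the right column.
    col = offset - source.rfind("\n", 0, offset)
    return line, col
```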
claude[bot]
ae7110491c fix: add type ignore for Django ORM field type mismatch
Update type hints for `add_months_safe` and `get_next_subscription_period`
to accept both datetime.datetime and datetime.date, and add ty:ignore
comment for Django ORM field type that ty cannot infer correctly.

Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
2026-02-24 10:37:33 +00:00
aseembits93
7f824ce101 fix: eliminate redundant DB queries in middleware and unblock LLM responses
Auth now attaches fetched organization/subscription to the request so
TrackUsageMiddleware reuses them instead of re-querying. RateLimitMiddleware
caches restricted_paths at init and uses async cache methods. LLM call
recording is fire-and-forget via asyncio.create_task to avoid blocking
responses on DB writes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:43:18 +05:30
aseembits93
d4867ef18e refactor: make line profiler JIT handling consistent with regular optimizer
Move JIT instructions appending from the per-call level
(optimize_python_code_line_profiler_single) to the endpoint level
(optimize endpoint), matching the regular optimizer's pattern.
This removes the is_numerical_code parameter threading through
the call chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:54:03 +05:30
aseembits93
0b523fc367 fix: enforce direct JIT decorator in optimizer prompt for numerical code
When is_numerical_code is true, the LLM sometimes outputs conditional
fallback paths (try/except, if/else) instead of applying the JIT
decorator directly. Add explicit output format instructions to prevent
this behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:49:24 +05:30
Kevin Turcios
033d14ea87
Merge branch 'main' into testgen-jit-iter 2026-02-23 08:56:11 +00:00
Kevin Turcios
f14ff077a6
Merge branch 'main' into reduce-recompilations 2026-02-23 08:55:29 +00:00
Kevin Turcios
05aecd6fbd
Merge pull request #2437 from codeflash-ai/misc-changes
fix: improve ranker scoring consistency and local-caching bias
2026-02-23 08:55:18 +00:00
Kevin Turcios
40ff909b03 fix: add DATABASE_URL and DJANGO_SETTINGS_MODULE to pr-review workflow
Coverage analysis in the Claude pr-review job needs these env vars
to run pytest, matching how django-unit-tests and codeflash-aiservice
workflows configure them.
2026-02-23 03:43:33 -05:00
claude[bot]
bf4e38c301 fix: add cast to satisfy ty type checker for list covariance
The ty type checker correctly flags that list[str] is not a subtype of list[str | None] due to list invariance. Added explicit cast.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-23 08:42:24 +00:00
Kevin Turcios
16e043883a style: auto-format ranker and test_markdown_utils 2026-02-23 03:39:38 -05:00
Kevin Turcios
85a1c8b183 fix: derive ranker ranking from structured scores instead of LLM array
The JSON parsing path returned the LLM's explicit ranking array,
which sometimes contradicted its own per-dimension scores. Use
_scores_to_ranking() to compute the ranking from weighted scores
when available, falling back to the LLM ranking only when scores
are absent.
2026-02-23 03:37:42 -05:00