Deduplicate the identical Environment(FileSystemLoader, StrictUndefined,
keep_trailing_newline=True) setup across JS testgen, Python testgen, and
Python explanations into core/shared/jinja_utils.py.
Also fix tests/testgen/test_testgen_javascript.py which had a stale
copy of build_javascript_prompt and loaded the now-deleted .md files.
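The shared setup could look roughly like this (a sketch; the function name and module path beyond `core/shared/jinja_utils.py` are assumptions):

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, StrictUndefined


def make_prompt_env(template_dir: Path) -> Environment:
    # StrictUndefined makes a missing template variable raise instead of
    # silently rendering as an empty string; keep_trailing_newline preserves
    # the final newline of .md.j2 prompt files.
    return Environment(
        loader=FileSystemLoader(template_dir),
        undefined=StrictUndefined,
        keep_trailing_newline=True,
    )
```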
Split _generate_import_statement into _resolve_import (pure logic:
identifier validation, dot splitting, reserved words) and a js_import
Jinja2 macro (pure formatting: ESM vs CJS syntax). The macro lives in
_macros.md.j2 and is imported by user.md.j2.
Replace plain .md prompts rendered with str.format() with Jinja2
templates using {% extends %}, {% block %}, and {% if %} branching:
- model_type branching: XML tags for Anthropic, markdown headers for OpenAI
- module_system support: ESM imports (import { fn } from '...') vs CJS (require)
- Template inheritance: base_system.md.j2 with sync/async overrides
- Unified user.md.j2 with is_async and module_system conditionals
- Add module_system field to TestGenSchema
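The module_system branch could look like this (illustrative template text rendered from a string for brevity; in the codebase the formatting lives in the js_import macro in _macros.md.j2):

```python
from jinja2 import Environment, StrictUndefined

TEMPLATE = (
    '{% if module_system == "esm" %}'
    "import { {{ fn }} } from '{{ path }}';"
    "{% else %}"
    "const { {{ fn }} } = require('{{ path }}');"
    "{% endif %}"
)

env = Environment(undefined=StrictUndefined)
esm = env.from_string(TEMPLATE).render(module_system="esm", fn="add", path="./math.js")
cjs = env.from_string(TEMPLATE).render(module_system="cjs", fn="add", path="./math.js")
```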
The async testgen prompt was steering the LLM toward generating
timing-dependent and ordering-sensitive tests that produce
non-deterministic results across runs. This caused ~50% E2E failure
rate for the JS ESM async workflow.
- Add determinism requirement: never assert on timing, elapsed
duration, or relative ordering of async side effects
- Remove directive to use Promise.all() for large-scale tests
- Change large-scale objective from "concurrent operations" to
"correctness with larger inputs"
- Replace concurrent execution template example with a simple
large-input correctness test
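The determinism rule, sketched here in Python rather than the JS the prompt targets: assert on the set of async side effects, never on their relative order or elapsed time.

```python
import asyncio


async def worker(name: str, log: list[str]) -> None:
    await asyncio.sleep(0)  # yield control; completion order is unspecified
    log.append(name)


async def run_all() -> list[str]:
    log: list[str] = []
    await asyncio.gather(*(worker(n, log) for n in ["a", "b", "c"]))
    return log


log = asyncio.run(run_all())
# Order-insensitive assertion: stable across runs regardless of scheduling.
assert set(log) == {"a", "b", "c"}
```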
Add POST /ai/testgen_review and POST /ai/testgen_repair endpoints.
Review accepts per-test data with behavioral failures pre-flagged; the AI
reviews the passing functions for unrealistic patterns and returns
per-function verdicts. Repair takes the flagged functions, has the LLM
rewrite them, re-instruments them, and returns the repaired test source.
Both endpoints are gated to Python only.
Split the 1,734-line instrument_new_tests.py into three modules by concern:
- device_sync.py: GPU/device framework detection and sync AST generation
- wrapper.py: wrapper function generation, unified inject_logging_code, format_and_float_to_top
- instrument_new_tests.py: core AST transformer (InjectPerfAndLogging) and instrument_test_source
Also extract select_model_for_test() from testgen_python() in generate.py to
separate model selection logic from the HTTP handler.
Replace class hierarchy (BaseTestGenContext → Single/Multi) with
standalone functions that branch on is_multi_context() internally.
Delete context.py, move TestGenContextData to models.py, and
distribute logic to validate.py, preprocess_pipeline.py, and
generate.py.
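The shape of the refactor, sketched with assumed fields and a toy function (TestGenContextData and is_multi_context are real names from the commit; everything else here is illustrative):

```python
from dataclasses import dataclass


@dataclass
class TestGenContextData:
    # Illustrative field; the real model in models.py carries more.
    files: list[str]


def is_multi_context(ctx: TestGenContextData) -> bool:
    return len(ctx.files) > 1


def build_context_header(ctx: TestGenContextData) -> str:
    # Standalone function branching internally, replacing the old
    # BaseTestGenContext -> Single/Multi subclass override.
    if is_multi_context(ctx):
        return f"{len(ctx.files)} files: " + ", ".join(ctx.files)
    return f"single file: {ctx.files[0]}"
```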
Use {% extends %} to deduplicate sync/async system templates via
base_system.md.j2, {% include %} for conditional JIT content, and a
compose_user.md.j2 wrapper to replace Python string assembly in
build_prompt().
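The inheritance pattern, shown with in-memory templates (the real files are base_system.md.j2 plus sync/async children; the content here is a toy):

```python
from jinja2 import DictLoader, Environment, StrictUndefined

templates = {
    "base_system.md.j2": "You write {% block mode %}sync{% endblock %} tests.",
    # Child template overrides only the block it needs to change.
    "async_system.md.j2": (
        '{% extends "base_system.md.j2" %}{% block mode %}async{% endblock %}'
    ),
}
env = Environment(loader=DictLoader(templates), undefined=StrictUndefined)
sync_prompt = env.get_template("base_system.md.j2").render()
async_prompt = env.get_template("async_system.md.j2").render()
```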
Move prompts into prompts/ subdirectory with clearer names, rename
testgen.py to generate.py, extract validate.py and demo_hacks.py,
rename testgen_context.py to context.py, delete unused explain prompts.
- Extract shared content into Jinja2 macros (`section`, `field`,
`code_field`) that handle Anthropic XML vs OpenAI markdown wrapping,
eliminating full duplication of every section across both branches
- Tighten system prompt to enforce concise 3-6 sentence output: trim
bloated per-field context descriptions, add concrete positive example,
explicitly forbid section headers and bullet groups, move output_format
to be the last section so constraints are closest to generation
- Add caveat that original_explanation is for factual reference only (in
both system and user prompts) to prevent the model from mimicking its
verbose multi-section format
- Condense throughput/concurrency/acceptance sections to essentials
- Rename misleading `## CRITICAL` heading to `## Acceptance Criteria`
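An illustrative stand-in for the shared macros: `field` wraps a value in an XML tag for Anthropic models and a markdown header for OpenAI models, so each section is written once instead of duplicated per branch (model_type is passed explicitly here for clarity; the macro body is a toy).

```python
from jinja2 import Environment, StrictUndefined

TEMPLATE = (
    "{% macro field(name, value, model_type) %}"
    '{% if model_type == "anthropic" %}'
    "<{{ name }}>{{ value }}</{{ name }}>"
    "{% else %}### {{ name }}\n{{ value }}"
    "{% endif %}"
    "{% endmacro %}"
    '{{ field("goal", "Summarize the change", model_type) }}'
)
env = Environment(undefined=StrictUndefined)
anthropic = env.from_string(TEMPLATE).render(model_type="anthropic")
openai = env.from_string(TEMPLATE).render(model_type="openai")
```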
Extract inline prompts into .md.j2 templates, move schemas to
models.py, and add model_type branching (XML for Anthropic, markdown
for OpenAI) following the testgen pattern. Uses StrictUndefined,
trim_blocks, and lstrip_blocks.
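What those two options buy in a prompt template: trim_blocks drops the newline after a block tag and lstrip_blocks strips the indentation before one, so indented `{% if %}` lines leave no stray blank lines or leading spaces in the rendered prompt (toy template below).

```python
from jinja2 import Environment, StrictUndefined

env = Environment(undefined=StrictUndefined, trim_blocks=True, lstrip_blocks=True)
TEMPLATE = """\
Schema:
  {% if include_notes %}
notes: free text
  {% endif %}
done
"""
rendered = env.from_string(TEMPLATE).render(include_notes=True)
# The indented tag lines vanish cleanly from the output.
```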
Reduce IntersectionObserver thresholds from 6 to 1, remove backdrop-blur
from sticky header, drop opacity/color/maxHeight transitions that fired
on every activeIndex change, and narrow progress bar to transition-[width].
Display ADAPTIVE source candidates in the timeline with Sparkles icon,
parent candidate linking, and ranking labels. Also fix the backend to
pass call_type, trace_id, and user_id to call_llm for proper
observability logging.
Add markdown code block parsing, detailed syntax error locations with
line/col info, and structured logging to the JavaScript/TypeScript
validators.
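A minimal sketch of the fenced-block extraction step (the function name and regex are illustrative; the real validators additionally report line/col locations for syntax errors):

```python
import re

# Matches a fenced code block with an optional JS/TS language tag.
_FENCE_RE = re.compile(r"```(?:javascript|typescript|js|ts)?\n(.*?)```", re.DOTALL)


def extract_code_block(text: str) -> str:
    """Pull the first fenced code block out of an LLM reply; fall back to
    treating the whole reply as code when no fence is present."""
    match = _FENCE_RE.search(text)
    return match.group(1).strip() if match else text.strip()
```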
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update type hints for `add_months_safe` and `get_next_subscription_period`
to accept both datetime.datetime and datetime.date, and add ty:ignore
comment for Django ORM field type that ty cannot infer correctly.
Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
Auth now attaches fetched organization/subscription to the request so
TrackUsageMiddleware reuses them instead of re-querying. RateLimitMiddleware
caches restricted_paths at init and uses async cache methods. LLM call
recording is fire-and-forget via asyncio.create_task to avoid blocking
responses on DB writes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move JIT instructions appending from the per-call level
(optimize_python_code_line_profiler_single) to the endpoint level
(optimize endpoint), matching the regular optimizer's pattern.
This removes the is_numerical_code parameter threading through
the call chain.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When is_numerical_code is true, the LLM sometimes outputs conditional
fallback paths (try/except, if/else) instead of applying the JIT
decorator directly. Add explicit output format instructions to prevent
this behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coverage analysis in the Claude pr-review job needs these env vars
to run pytest, matching how django-unit-tests and codeflash-aiservice
workflows configure them.