6455 commits

Author SHA1 Message Date
claude[bot]
ae7110491c fix: add type ignore for Django ORM field type mismatch
Update type hints for `add_months_safe` and `get_next_subscription_period`
to accept both datetime.datetime and datetime.date, and add ty:ignore
comment for Django ORM field type that ty cannot infer correctly.

Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
2026-02-24 10:37:33 +00:00
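A minimal sketch of what a month-adding helper that accepts both `datetime.date` and `datetime.datetime` might look like. The commit does not show the body of `add_months_safe`, so the clamping behaviour and the constrained `TypeVar` below are assumptions made for illustration, not the repository's implementation.

```python
import calendar
import datetime
from typing import TypeVar

# Constrained TypeVar: the return type matches whichever input type is passed.
# (Assumption: the real code may use a plain union instead.)
D = TypeVar("D", datetime.date, datetime.datetime)


def add_months_safe(value: D, months: int) -> D:
    """Add months, clamping the day to the last valid day of the target month."""
    month_index = value.month - 1 + months
    year = value.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    # date.replace and datetime.replace both accept year/month/day keywords.
    return value.replace(year=year, month=month, day=min(value.day, last_day))
```

Under these assumptions, 31 January plus one month lands on the last day of February instead of raising `ValueError`.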
aseembits93
7f824ce101 fix: eliminate redundant DB queries in middleware and unblock LLM responses
Auth now attaches fetched organization/subscription to the request so
TrackUsageMiddleware reuses them instead of re-querying. RateLimitMiddleware
caches restricted_paths at init and uses async cache methods. LLM call
recording is fire-and-forget via asyncio.create_task to avoid blocking
responses on DB writes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:43:18 +05:30
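The fire-and-forget pattern described in this commit can be sketched roughly as follows. `record_llm_call`, `record_in_background` and `handle_request` are hypothetical names, and the sleep stands in for a DB write; only `asyncio.create_task` comes from the commit message.

```python
import asyncio

# Strong references to in-flight tasks. The event loop keeps only weak
# references, so an unreferenced task can be garbage-collected mid-flight.
_background_tasks: set = set()


async def record_llm_call(store: list, payload: dict) -> None:
    """Hypothetical stand-in for the DB write."""
    await asyncio.sleep(0.01)  # simulate a slow write
    store.append(payload)


def record_in_background(store: list, payload: dict) -> asyncio.Task:
    """Schedule the write without awaiting it."""
    task = asyncio.create_task(record_llm_call(store, payload))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def handle_request(store: list) -> str:
    record_in_background(store, {"model": "example"})
    return "response"  # returned before the write has finished


async def demo() -> tuple:
    store: list = []
    response = await handle_request(store)
    written_before = len(store)
    await asyncio.gather(*_background_tasks)  # only the demo waits for the write
    return response, written_before, len(store)
```

Running `asyncio.run(demo())` shows the response being produced while the store is still empty. One trade-off of this design is that a failed write no longer surfaces to the caller, so a real version would likely log exceptions in the done callback.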
aseembits93
d4867ef18e refactor: make line profiler JIT handling consistent with regular optimizer
Move JIT instructions appending from the per-call level
(optimize_python_code_line_profiler_single) to the endpoint level
(optimize endpoint), matching the regular optimizer's pattern.
This removes the is_numerical_code parameter threading through
the call chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:54:03 +05:30
aseembits93
0b523fc367 fix: enforce direct JIT decorator in optimizer prompt for numerical code
When is_numerical_code is true, the LLM sometimes outputs conditional
fallback paths (try/except, if/else) instead of applying the JIT
decorator directly. Add explicit output format instructions to prevent
this behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:49:24 +05:30
Kevin Turcios
033d14ea87
Merge branch 'main' into testgen-jit-iter 2026-02-23 08:56:11 +00:00
Kevin Turcios
f14ff077a6
Merge branch 'main' into reduce-recompilations 2026-02-23 08:55:29 +00:00
Kevin Turcios
05aecd6fbd
Merge pull request #2437 from codeflash-ai/misc-changes
fix: improve ranker scoring consistency and local-caching bias
2026-02-23 08:55:18 +00:00
Kevin Turcios
40ff909b03 fix: add DATABASE_URL and DJANGO_SETTINGS_MODULE to pr-review workflow
Coverage analysis in the Claude pr-review job needs these env vars
to run pytest, matching how django-unit-tests and codeflash-aiservice
workflows configure them.
2026-02-23 03:43:33 -05:00
claude[bot]
bf4e38c301 fix: add cast to satisfy ty type checker for list invariance
The ty type checker correctly flags that list[str] is not a subtype of list[str | None] due to list invariance. Added explicit cast.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-23 08:42:24 +00:00
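The invariance issue behind this commit can be illustrated with a short sketch. The function and variable names are hypothetical; only the `list[str]` versus `list[str | None]` mismatch and the use of `cast` come from the commit message.

```python
from __future__ import annotations

from typing import cast


def describe(values: list[str | None]) -> list[str]:
    """Replace missing entries with a placeholder."""
    return [v if v is not None else "<missing>" for v in values]


names: list[str] = ["a", "b"]

# Passing `names` directly is rejected by a type checker: list is invariant,
# because describe() could legally append None to a list[str | None].
# cast() does nothing at runtime; it only changes the type the checker sees.
labels = describe(cast("list[str | None]", names))
```

An alternative that avoids the cast is to annotate the parameter as `Sequence[str | None]`, which is covariant because it is read-only. Whether that fits the real call site is not something the commit states.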
Kevin Turcios
16e043883a style: auto-format ranker and test_markdown_utils 2026-02-23 03:39:38 -05:00
Kevin Turcios
85a1c8b183 fix: derive ranker ranking from structured scores instead of LLM array
The JSON parsing path returned the LLM's explicit ranking array,
which sometimes contradicted its own per-dimension scores. Use
_scores_to_ranking() to compute the ranking from weighted scores
when available, falling back to the LLM ranking only when scores
are absent.
2026-02-23 03:37:42 -05:00
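A rough sketch of deriving a ranking from weighted per-dimension scores, with the LLM's explicit ranking as a fallback. The commit names `_scores_to_ranking()` but does not show it. The function names, dimension names, weights and tie-breaking rule here are all assumptions.

```python
def scores_to_ranking(
    scores: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[str]:
    """Order candidate ids by weighted score, best first."""

    def total(candidate_id: str) -> float:
        dims = scores[candidate_id]
        return sum(w * dims.get(dim, 0.0) for dim, w in weights.items())

    # Ties are broken by candidate id so the result is deterministic.
    return sorted(scores, key=lambda cid: (-total(cid), cid))


def final_ranking(
    scores: dict[str, dict[str, float]],
    weights: dict[str, float],
    llm_ranking: list[str],
) -> list[str]:
    """Prefer the score-derived ranking; fall back when scores are absent."""
    if scores:
        return scores_to_ranking(scores, weights)
    return llm_ranking
```

With weights of 0.7 for a hypothetical `speed` dimension and 0.3 for `readability`, a candidate scoring 8 and 5 outranks one scoring 2 and 9, even if the LLM's own array listed them the other way round.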
Kevin Turcios
20ee6d5b62 fix: penalize local variable caching of globals in ranker prompt
The ranker LLM was rewarding candidates that cache global variables
into locals as a performance win. Add an explicit rule: this is only
relevant on Python ≤3.10; on 3.11+ LOAD_GLOBAL uses adaptive
specialization and is nearly as fast as LOAD_FAST.
2026-02-23 03:37:21 -05:00
Kevin Turcios
c95a36cf38 fix: handle nested code fences in extract_code_block
The non-greedy regex in FIRST_CODE_BLOCK_PATTERN stopped at the first
``` occurrence, even inside triple-quoted strings or nested code fence
blocks. This truncated the extracted code and lost test functions when
LLMs embedded function definitions using ```python:filepath syntax.

Switch to greedy matching and require the closing ``` to be alone on
its line so intermediate backticks are skipped.
2026-02-23 03:36:50 -05:00
Kevin Turcios
ca71d0c8a0 refactor: remove constructor notes preprocessing from testgen pipeline
Full class source is now included in the client-side testgen context,
making the server-side constructor signature extraction redundant.
2026-02-23 03:36:50 -05:00
Kevin Turcios
bfd9f2cd04 fix: respect test_index when creating optimization_features row
The get_or_create defaults passed test lists without positional
indexing, so when a higher test_index created the row first its
content landed at index 0 and was overwritten by the lower index
update, losing a test.
2026-02-23 03:36:50 -05:00
Kevin Turcios
6346d0992a chore: rename repo path env vars to match standard names
CODEFLASH_INTERNAL_REPO_PATH → AISERVICE_DIR, CODEFLASH_CLI_REPO_PATH → CODEFLASH_DIR
2026-02-23 03:36:50 -05:00
Kevin Turcios
af3185edff fix: handle non-numeric patch suffixes and support Python 3.15 2026-02-23 03:36:50 -05:00
Sarthak Agarwal
2cb3d51ddb
fix issue with closed and merged PRs raising suggestion (#2436) 2026-02-21 01:23:55 +05:30
Aseem Saxena
852274e2be
Merge branch 'main' into reduce-recompilations 2026-02-21 00:59:24 +05:30
aseembits93
85c5a2ec82 reduce recompilations in the tests 2026-02-21 00:57:52 +05:30
Aseem Saxena
8f6d1d0602 fix: improve JIT testgen prompt to avoid error-checking tests
Add explicit guidance to avoid generating tests that check for specific
exception types, since JIT compilers (numba, torch.compile) produce
different error types than uncompiled code. This ensures generated tests
work consistently for both compiled and uncompiled versions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-20 18:59:04 +00:00
Aseem Saxena
5553b01bc1
Merge branch 'main' into testgen-jit-iter 2026-02-21 00:06:44 +05:30
claude[bot]
4fa972edd3 refactor: remove unused TORCH_TENSOR_FUNCTIONS constant
Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-20 18:33:41 +00:00
Sarthak Agarwal
eb5f4b460e
Migrate to AWS bedrock (#2430)
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1

These environment variables are required for boto3 authentication.
2026-02-20 23:52:48 +05:30
claude[bot]
46da033b05 style: fix ruff formatting and add mypy type annotation 2026-02-20 18:09:05 +00:00
Aseem Saxena
7e1b2a3ade investigate 2026-02-20 18:03:28 +00:00
Kevin Turcios
7005156190
Merge pull request #2427 from codeflash-ai/class-constructor-notes
feat: add constructor notes for non-dataclass classes
2026-02-19 01:46:55 +00:00
Kevin Turcios
b5af1ca353
Merge branch 'main' into class-constructor-notes 2026-02-19 01:46:45 +00:00
Aseem Saxena
e336a91c93
update model id 2026-02-17 07:08:27 -08:00
aseembits93
730c01d047 feat: switch Claude workflows from Foundry to AWS Bedrock
Replace Anthropic Foundry authentication with AWS Bedrock OIDC
in both claude.yml and duplicate-code-detector.yml workflows.

Changes:
- Replace use_foundry with use_bedrock
- Add aws-actions/configure-aws-credentials@v4 OIDC step
- Remove ANTHROPIC_FOUNDRY_API_KEY/BASE_URL env vars
- Update model identifiers to Bedrock format

Requires AWS_ROLE_TO_ASSUME secret to be configured in the repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 19:01:51 +05:30
Kevin Turcios
a69f67f68f
Merge pull request #2428 from codeflash-ai/fix-windows-skill-filenames
fix: rename skill files to be Windows-compatible
2026-02-17 05:06:29 +00:00
Kevin Turcios
a21eb7aba2 fix: rename skill files to be Windows-compatible
Renamed skill files to use dashes instead of colons (e.g., tessl:add-api-endpoint → tessl-add-api-endpoint) to fix checkout failures on Windows filesystems, which do not allow colons in filenames.

Skills will continue to work because the files contain relative paths to the .tessl directory and do not reference their own filenames.
2026-02-17 05:01:30 +00:00
Sarthak Agarwal
e22e5d1f8b
Add codeflash optimization workflow for cf-api and cf-webapp (#2411)
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
2026-02-16 19:48:15 +05:30
claude[bot]
1bb1407c6b fix: resolve type checker errors 2026-02-15 12:33:05 +00:00
Kevin Turcios
d6a3c6254f feat: add constructor notes for non-dataclass classes with __init__
The LLM prompt preprocessing now highlights __init__ signatures for
regular classes, not just @dataclass ones, reducing brute-force
constructor guessing and pytest.skip() fallbacks in generated tests.
2026-02-15 07:29:05 -05:00
Kevin Turcios
38eda0c2d6
Merge pull request #2426 from codeflash-ai/observability-chat-codebase-browsing
feat: observability chat codebase browsing and model attribution fix
2026-02-15 06:50:59 -05:00
Kevin Turcios
e5d70443db fix: use positional insertion in log_features to preserve model attribution
log_features() appended test results in call-completion order, causing
model attribution swaps when LLM responses arrived out of order. Pass
test_index through and use positional insertion instead of append.
2026-02-15 03:58:05 -05:00
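A small sketch of why positional insertion preserves attribution when results arrive out of order. `insert_at` is a hypothetical helper, not the actual `log_features()` code, and padding with `None` is an assumption about how gaps might be handled.

```python
def insert_at(results: list, index: int, value) -> list:
    """Place value at its test_index, padding with None if the slot is missing."""
    if index >= len(results):
        results.extend([None] * (index + 1 - len(results)))
    results[index] = value
    return results


# Responses complete out of order: index 1 finishes before index 0.
arrivals = [(1, "test from model B"), (0, "test from model A")]

appended: list = []
positional: list = []
for test_index, test in arrivals:
    appended.append(test)  # order reflects completion time
    insert_at(positional, test_index, test)  # order reflects test_index
```

After the loop, `appended` holds model B's test at position 0, while `positional` keeps each test at the index its model was assigned.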
Kevin Turcios
496033539e fix: use flex column layout and SSR-safe localStorage for split pane
Replace sticky positioning + ResizeObserver height calc with a flex
column layout (h-screen container, flex-1 panel group) that reliably
fills the viewport. Drop useDefaultLayout hook (not SSR-safe) in favor
of manual localStorage persistence inside useEffect.
2026-02-15 03:58:04 -05:00
Kevin Turcios
bf826a6e0f feat: add resizable split pane and expandable tool results to observability chat
Replace the fixed 480px chat overlay with a draggable split-pane layout
using react-resizable-panels, and make tool rounds expandable to show
the actual data the agent retrieved (code, errors, LLM call details).
2026-02-15 03:58:04 -05:00
Kevin Turcios
a3f9c655f9 fix: guarantee text response when agent loop produces only thinking blocks
Remove MAX_TOOL_ROUNDS cap so the model decides when to stop calling
tools. Add a safety net that makes a final tool-free API call if the
loop ends without emitting any visible text, fixing empty assistant
bubbles. Clean up redundant comments.
2026-02-15 03:58:04 -05:00
Kevin Turcios
870968e7a7 refactor: restructure system prompt for Claude Opus 4.6 best practices
- Move trace data to top of prompt (long-context best practice: data
  before instructions improves quality ~30%)
- Wrap sections in XML tags (<trace_data>, <role>, <domain_knowledge>,
  <guidelines>, <use_parallel_tool_calls>) for better parseability
- Remove aggressive language (MUST, CRITICAL, HARD REQUIREMENT) that
  causes overtriggering on Opus 4.6
- Replace rigid 4-step investigation workflow with general guidelines
  to let adaptive thinking handle reasoning strategy
- Remove duplicate content (tool reference section, two checklists)
- Add <use_parallel_tool_calls> block per Anthropic's recommended pattern
- Tone down tool descriptions from directive to descriptive
- Net reduction: 49 fewer lines in system prompt
2026-02-15 03:58:04 -05:00
Kevin Turcios
ae2bff113d fix: prevent context blowup by redacting thinking blocks between rounds
Thinking blocks from previous tool rounds (10-50KB each) were
accumulating in conversation history, causing Azure AI Foundry to hang
after 4+ rounds. Redact thinking content before each API call while
preserving required block structure. Also adds per-round timeout safety
net and status indicators between rounds.
2026-02-15 03:58:04 -05:00
Kevin Turcios
fbcc283e97 perf: unify agent loop and pre-build lookup maps for O(1) tool calls
Eliminate redundant API call by extracting text from the loop's final
response directly instead of making a separate streaming call. Pre-build
candidatesBySource, candidatesById, and testModelMap in indexTraceData()
to replace repeated O(n) linear searches in tool calls and prompt
building. Combine cost/token aggregation into a single pass.
2026-02-15 03:58:04 -05:00
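The pre-built lookup maps can be sketched as follows. The real code is in the webapp and is presumably TypeScript; this Python version only illustrates the idea. The record fields `id` and `source` are assumptions, and the function is a loose analogue of what `indexTraceData()` is described as doing.

```python
from collections import defaultdict


def index_candidates(candidates: list[dict]) -> tuple[dict, dict]:
    """Build id and source lookup maps in a single pass over the candidates."""
    by_id: dict = {}
    by_source: defaultdict = defaultdict(list)
    for candidate in candidates:
        by_id[candidate["id"]] = candidate
        by_source[candidate["source"]].append(candidate)
    return by_id, dict(by_source)


candidates = [
    {"id": "c1", "source": "optimizer"},
    {"id": "c2", "source": "line_profiler"},
    {"id": "c3", "source": "optimizer"},
]
by_id, by_source = index_candidates(candidates)

# Each tool call becomes a dictionary lookup instead of a linear scan.
found = by_id["c2"]
```

The index costs one O(n) pass up front, which pays off once several tool calls hit the same trace data.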
Kevin Turcios
b09262ccbc feat: add tool activity display and fix streaming timeout in observability chat
Restructure agent loop to use stream()+finalMessage() for all API calls,
fixing the SDK's non-streaming timeout error with max_tokens 32k. Add
parallel tool execution, tool activity bubbles in the frontend, and
restructure the system prompt for better investigation behavior.
2026-02-15 03:58:04 -05:00
Kevin Turcios
51372ca0ad feat: add debugging workflow and response checklist to observability chat prompt
Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS
section with structured guidance for get_llm_call_detail and codebase
browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND),
and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to
cite real file paths before responding.
2026-02-15 03:58:04 -05:00
Kevin Turcios
782ee508de feat: add codebase browsing and LLM call inspection to observability chat
Give the observability chat agent four new tools: get_llm_call_detail
(full prompt/response for any LLM call), read_file, search_code, and
list_directory for navigating the codeflash-internal and codeflash CLI
repos. This lets the agent trace problems end-to-end from trace data
through actual prompts to pipeline source code.

- Add id to IndexedTraceData.llmCalls so the agent can reference calls
- Make resolveToolCall async (Prisma + fs + child_process)
- Make processToolUseResponse async to match
- Bump MAX_TOOL_ROUNDS from 5 to 15 for multi-step code browsing
- Add CODEFLASH_INTERNAL_REPO_PATH / CODEFLASH_CLI_REPO_PATH env vars
- Path traversal protection, file size caps, search result limits
2026-02-15 03:58:04 -05:00
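Path traversal protection for a file-reading tool can be sketched as below. The commit's tools run in the webapp (it mentions Prisma, fs and child_process), so this Python version is an analogue, not the actual code. `resolve_inside` and the example paths are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path under root; reject anything that escapes root."""
    root = root.resolve()
    # resolve() collapses ".." segments and symlinks before the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"path escapes repository root: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    repo_root = Path(tmp)
    inside = resolve_inside(repo_root, "src/app.py")
    try:
        resolve_inside(repo_root, "../secrets.txt")
        escaped = True
    except ValueError:
        escaped = False
```

Absolute inputs such as `/etc/passwd` are rejected by the same check, because joining an absolute path onto `root` discards `root` entirely.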
Kevin Turcios
eecd3ba4ce
Merge pull request #2425 from codeflash-ai/tessl-json-update
chore: update tessl config and add npm tiles
2026-02-15 03:57:09 -05:00
Kevin Turcios
6933fe07ac chore: add npm tessl tiles from tessl install 2026-02-15 03:56:05 -05:00
Kevin Turcios
9ab71ad672 chore: add .next/ to gitignore 2026-02-15 03:55:12 -05:00
Kevin Turcios
b6dc71421a chore: update tessl.json with npm tile entries 2026-02-15 03:54:13 -05:00