Eliminate redundant API call by extracting text from the loop's final
response directly instead of making a separate streaming call. Pre-build
candidatesBySource, candidatesById, and testModelMap in indexTraceData()
to replace repeated O(n) linear searches in tool calls and prompt
building. Combine cost/token aggregation into a single pass.
Restructure agent loop to use stream()+finalMessage() for all API calls,
fixing the SDK's non-streaming timeout error with max_tokens 32k. Add
parallel tool execution, tool activity bubbles in the frontend, and
restructure the system prompt for better investigation behavior.
Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS
section with structured guidance for get_llm_call_detail and codebase
browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND),
and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to
cite real file paths before responding.
Give the observability chat agent four new tools: get_llm_call_detail
(full prompt/response for any LLM call), read_file, search_code, and
list_directory for navigating the codeflash-internal and codeflash CLI
repos. This lets the agent trace problems end-to-end from trace data
through actual prompts to pipeline source code.
- Add id to IndexedTraceData.llmCalls so the agent can reference calls
- Make resolveToolCall async (Prisma + fs + child_process)
- Make processToolUseResponse async to match
- Bump MAX_TOOL_ROUNDS from 5 to 15 for multi-step code browsing
- Add CODEFLASH_INTERNAL_REPO_PATH / CODEFLASH_CLI_REPO_PATH env vars
- Path traversal protection, file size caps, search result limits
## Summary
- Add Foundry env vars (ANTHROPIC_FOUNDRY_API_KEY,
ANTHROPIC_FOUNDRY_BASE_URL) so the workflow authenticates via Azure
Foundry
- Fix Serena language config (javascript -> typescript)
## Summary
- Adds the GitHub Agentic Workflows duplicate code detector, configured
for Python and TypeScript/JavaScript with Serena semantic analysis
- Runs daily, flags patterns spanning 10+ lines or appearing in 3+
locations
- Creates up to 3 issues per run with `[duplicate-code]` prefix
## Notes
- Requires Claude API secret configured in repo Actions secrets
- `code-quality` and `automated-analysis` labels will be auto-created on
first run
## Summary
- Restructure CLAUDE.md hierarchy so Claude Code auto-discovers
project-specific instructions
- Delete dead `AGENTS.md` files (referenced non-existent
`.tessl/RULES.md`)
- Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery
- Create `js/CLAUDE.md` with package commands and gotchas
- Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded
rule)
- Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill)
- Add path-scoped rules for Python and Next.js patterns
- Add domain glossary, service architecture diagram, and per-package
gotchas
## Test plan
- Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/`
- Verify no remaining references to `AGENTS.md` or `.tessl/`
- Verify `.claude/rules/` and `.claude/skills/` files are committed
## Summary
- Extract testgen and optimizer API routers from
`core/languages/python/` into `core/shared/` with lazy imports,
eliminating cross-module coupling between language modules
- Delete stale JavaScript prompt files left in the Python module after
migration to `js_ts/`
- Remove backward-compat fallback paths for prompt files that already
exist at their new locations
- Remove unused `is_multi_context_any()` and its cross-language imports
- Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO
## Test plan
- [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS,
and Java
- [ ] Verify optimizer endpoint dispatches correctly for all languages
- [ ] Run existing testgen and optimizer tests
## Summary
- Pass test_index through LLM call context so observability chat can
attribute responses to specific test generation calls
- Fix SSE streaming to send keepalive pings from the start
CF-504
# Pull Request Checklist
## Description
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
## Problem
The JS/TS language handler (`core/languages/js_ts/`) was importing
models, schemas, config, prompts, and helpers directly from the Python
language handler. This created a confusing architectural dependency and
risked serving wrong language-specific prompt content.
## What Changed
- Created `core/shared/` for genuinely language-agnostic code (optimizer
schemas, models, config, testgen models, context helpers)
- Moved JS/TS-specific prompts and context helpers into
`core/languages/js_ts/`
- Updated all consumers (20+ files) to import from the correct locations
- Removed backwards-compat re-exports from the Python module
## Result
- **Before:** 11 imports from `core.languages.python` in
`core/languages/js_ts/`
- **After:** 0
## Summary
- Chat panel on the observability timeline that uses Claude to answer
questions about optimization traces
- Tool-based context retrieval (fetches candidates, tests, errors on
demand instead of stuffing everything upfront)
- Uses `@anthropic-ai/sdk` via Azure AI Foundry
- Strengthened testgen prompts to ban mocks/fakes for test inputs
## Summary
- Use claude-opus-4-6 model for both pr-review and claude-mention jobs
- Add mypy checks and consolidated summary comment (Steps 1 & 4) from
CLI workflow
- Add Edit tool and extra git/gh tools to allowed tools
Store qualified function name (e.g., HttpInterface.__init__) and
file_path in testgen metadata instead of bare function_name (__init__).
Update the frontend parser to handle qualified names by splitting into
class + method and searching within the correct class using both
tree-sitter and regex. Prioritize the file matching filePath before
searching all files.
# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
…ndencies
The old rule ("NOT in libraries such as numpy, pandas etc.") forced LLMs
to reinvent helpers like np.allclose using slow / inaccurate Python
loops. The new rule allows assertions from packages already imported by
the function under test.
# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
## Summary
- Optimize timeline data fetching/rendering with pre-computed maps and
reduced re-renders
- Split timeline monolith into focused components, lazy-load debug data,
use IntersectionObserver for active section tracking
- Optimize component rendering with `memo`, stable ref callbacks, and
pre-computed sort data
- Fix observability nav toggle not syncing with current URL pathname
- Fix Response button overlapping dialog close button in LLM debug
dialog
## Summary
- Reorganizes `django/aiservice/` from feature-first layout (separate
`optimizer/`, `testgen/`, `code_repair/` dirs) to language-first layout
under `core/languages/{python,js_ts}/`
- Adds handler/registry/dispatcher pattern for routing requests to
language-specific implementations
- All existing module code preserved via `git mv` for history tracking;
no logic changes to existing modules
## What changed
- New `core/` app with registry, dispatcher, protocols, and error
hierarchy
- `PythonHandler` and `JSTypeScriptHandler` delegate to existing module
functions
- All imports updated across the codebase (views, tests,
adaptive_optimizer, etc.)
- Integration tests for handler registration and dispatch
- 155 files changed, ~880 additions / ~207 deletions (mostly import path
updates and moves)
## Test plan
- [ ] `python manage.py check` passes
- [ ] Integration tests in
`tests/integration/test_handler_integration.py` pass
- [ ] Existing test suite passes with updated import paths
- [ ] Ruff and ty clean on all new infrastructure files
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
## Summary
- Add split (side-by-side) diff view to the observability timeline for
comparing original vs optimized code
- Fix scroll handler not updating active section + expand container for
candidates
- Add LLM export route that returns plain text markdown of the full
trace, accessible via button next to search bar
## Test plan
- [ ] Load a trace in observability and verify the split diff view
renders correctly
- [ ] Verify the "LLM Export" button appears next to Search when results
are loaded
- [ ] Click the button and verify the new tab returns raw markdown text
(no HTML chrome)
- [ ] Verify all sections are present: function info, original code,
tests, candidates, ranking, errors, summary, and prompts
## Summary
- Rewrite testgen system prompts from constraint-heavy to positive-first
structure with chain-of-thought instructions
- Simplify LLM message structure from `[system, user, user, user]` to
`[system, user]` by absorbing plan_content guidelines into system
prompts
- Observability UI: add search to LLM debug dialog, expand timeline view
- Fix data capture: raw LLM responses, all user messages in prompt
column, nested code fences, empty notes handling
## Test plan
- [ ] Verify testgen produces valid test suites with the new prompt
structure
- [ ] Verify observability timeline displays LLM prompts/responses
correctly
- [ ] Check that search works in the LLM debug dialog
## Summary
- Published `@codeflash-ai/common@1.0.30` with `dist/` and
`instrumented_perf_test` schema field
- Updated webapp to use the new package so Prisma generates correct
types
- Removed `Record<string, unknown>` type cast workaround in `page.tsx`
The instrumented perf test data was already being stored in the DB but
the webapp's Prisma client didn't have the field in its generated types,
so it was never returned from queries.
## Test plan
- [ ] Search a trace that has perf tests (e.g.
`59a508fb-8d00-4830-992b-fa342e5d6c94`) and verify the `+perf` badge and
"Perf" tab appear in Test Generation
## Summary
- Bump `@codeflash-ai/common` from 1.0.28 to 1.0.29 to include the
`instrumented_perf_test` Prisma schema field in the published package
- This unblocks the observability timeline from displaying performance
tests (currently only generated + behavior tests show)
The field was added to the schema in #2330 but the package version was
never bumped, so the deployed webapp's Prisma client doesn't SELECT
`instrumented_perf_test`.
After merging: publish the package and redeploy the webapp.
## Summary
- Restructure the refinement system prompt into clear numbered sections
(Preserve Behavior, Minimize Diff, Revert Anti-Patterns, Maintain
Readability) with an explicit 6-step refinement process
- Extract inline prompt strings into separate markdown files
(`refinement_system_prompt.md`, `refinement_user_prompt.md`), matching
the convention used by other optimizer prompts
- Add `AuthenticatedRequest` type hint to `refine()` endpoint and fix
grammar in tool use section
## Test plan
- [ ] Verify refinement endpoint still works end-to-end with a test
optimization candidate
- [ ] Confirm prompt content is loaded correctly from markdown files at
startup
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
## Summary
- Re-adds the enriched observability context from CF-1041 that was
reverted
- Passes `module_path`, `test_module_path`, `helper_function_names`,
`is_async`, and `function_to_optimize` details to `call_llm` in testgen
## Test plan
- [ ] Verify testgen LLM calls include the enriched context
- [ ] Confirm no regressions in test generation flow
introducing this due to pain points in V1, not a complete rewrite, based
off v1
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
1. languages/js_ts/testgen.py:
- Updated parse_and_validate_js_output to accept a language parameter
- Uses validate_typescript_syntax when language="typescript", otherwise
uses validate_javascript_syntax
- Updated generate_and_validate_js_test_code to accept and pass the
language parameter
- Updated the call chain to pass language through to the validation
2. optimizer/context_utils/refiner_context.py:
- Added import for validate_typescript_syntax
- Fixed is_valid_refinement method to use correct validator based on
language
- Fixed validate_code_syntax in SingleRefinerContext class
- Fixed validate_code_syntax in MultiRefinerContext class
3. tests/optimizer/test_javascript_validator.py:
- Added test_typescript_type_assertion_valid_in_ts - verifies as unknown
as number is valid TypeScript
- Added test_typescript_type_assertion_invalid_in_js - verifies as
unknown as number is INVALID JavaScript (this would have caught the
original bug)
- Added test_typescript_generic_valid_in_ts - verifies generics are
valid TypeScript
- Added test_typescript_generic_invalid_in_js - verifies generics are
INVALID JavaScript
Files Already Correct (no changes needed):
- languages/js_ts/optimizer.py - already correctly checks language
- languages/js_ts/optimizer_lp.py - already correctly checks language
- optimizer/optimizer_line_profiler.py - already correctly checks
language
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>