Commit graph

6204 commits

Author SHA1 Message Date
claude[bot]
1bb1407c6b fix: resolve type checker errors 2026-02-15 12:33:05 +00:00
Kevin Turcios
d6a3c6254f feat: add constructor notes for non-dataclass classes with __init__
The LLM prompt preprocessing now highlights __init__ signatures for
regular classes, not just @dataclass ones, reducing brute-force
constructor guessing and pytest.skip() fallbacks in generated tests.
2026-02-15 07:29:05 -05:00
Kevin Turcios
38eda0c2d6
Merge pull request #2426 from codeflash-ai/observability-chat-codebase-browsing
feat: observability chat codebase browsing and model attribution fix
2026-02-15 06:50:59 -05:00
Kevin Turcios
e5d70443db fix: use positional insertion in log_features to preserve model attribution
log_features() appended test results in call-completion order, causing
model attribution swaps when LLM responses arrived out of order. Pass
test_index through and use positional insertion instead of append.
2026-02-15 03:58:05 -05:00
Kevin Turcios
496033539e fix: use flex column layout and SSR-safe localStorage for split pane
Replace sticky positioning + ResizeObserver height calc with a flex
column layout (h-screen container, flex-1 panel group) that reliably
fills the viewport. Drop useDefaultLayout hook (not SSR-safe) in favor
of manual localStorage persistence inside useEffect.
2026-02-15 03:58:04 -05:00
Kevin Turcios
bf826a6e0f feat: add resizable split pane and expandable tool results to observability chat
Replace the fixed 480px chat overlay with a draggable split-pane layout
using react-resizable-panels, and make tool rounds expandable to show
the actual data the agent retrieved (code, errors, LLM call details).
2026-02-15 03:58:04 -05:00
Kevin Turcios
a3f9c655f9 fix: guarantee text response when agent loop produces only thinking blocks
Remove MAX_TOOL_ROUNDS cap so the model decides when to stop calling
tools. Add a safety net that makes a final tool-free API call if the
loop ends without emitting any visible text, fixing empty assistant
bubbles. Clean up redundant comments.
2026-02-15 03:58:04 -05:00
Kevin Turcios
870968e7a7 refactor: restructure system prompt for Claude Opus 4.6 best practices
- Move trace data to top of prompt (long-context best practice: data
  before instructions improves quality ~30%)
- Wrap sections in XML tags (<trace_data>, <role>, <domain_knowledge>,
  <guidelines>, <use_parallel_tool_calls>) for better parseability
- Remove aggressive language (MUST, CRITICAL, HARD REQUIREMENT) that
  causes overtriggering on Opus 4.6
- Replace rigid 4-step investigation workflow with general guidelines
  to let adaptive thinking handle reasoning strategy
- Remove duplicate content (tool reference section, two checklists)
- Add <use_parallel_tool_calls> block per Anthropic's recommended pattern
- Tone down tool descriptions from directive to descriptive
- Net reduction: 49 fewer lines in system prompt
2026-02-15 03:58:04 -05:00
Kevin Turcios
ae2bff113d fix: prevent context blowup by redacting thinking blocks between rounds
Thinking blocks from previous tool rounds (10-50KB each) were
accumulating in conversation history, causing Azure AI Foundry to hang
after 4+ rounds. Redact thinking content before each API call while
preserving required block structure. Also adds per-round timeout safety
net and status indicators between rounds.
2026-02-15 03:58:04 -05:00
Kevin Turcios
fbcc283e97 perf: unify agent loop and pre-build lookup maps for O(1) tool calls
Eliminate redundant API call by extracting text from the loop's final
response directly instead of making a separate streaming call. Pre-build
candidatesBySource, candidatesById, and testModelMap in indexTraceData()
to replace repeated O(n) linear searches in tool calls and prompt
building. Combine cost/token aggregation into a single pass.
2026-02-15 03:58:04 -05:00
Kevin Turcios
b09262ccbc feat: add tool activity display and fix streaming timeout in observability chat
Restructure agent loop to use stream()+finalMessage() for all API calls,
fixing the SDK's non-streaming timeout error with max_tokens 32k. Add
parallel tool execution, tool activity bubbles in the frontend, and
restructure the system prompt for better investigation behavior.
2026-02-15 03:58:04 -05:00
Kevin Turcios
51372ca0ad feat: add debugging workflow and response checklist to observability chat prompt
Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS
section with structured guidance for get_llm_call_detail and codebase
browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND),
and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to
cite real file paths before responding.
2026-02-15 03:58:04 -05:00
Kevin Turcios
782ee508de feat: add codebase browsing and LLM call inspection to observability chat
Give the observability chat agent four new tools: get_llm_call_detail
(full prompt/response for any LLM call), read_file, search_code, and
list_directory for navigating the codeflash-internal and codeflash CLI
repos. This lets the agent trace problems end-to-end from trace data
through actual prompts to pipeline source code.

- Add id to IndexedTraceData.llmCalls so the agent can reference calls
- Make resolveToolCall async (Prisma + fs + child_process)
- Make processToolUseResponse async to match
- Bump MAX_TOOL_ROUNDS from 5 to 15 for multi-step code browsing
- Add CODEFLASH_INTERNAL_REPO_PATH / CODEFLASH_CLI_REPO_PATH env vars
- Path traversal protection, file size caps, search result limits
2026-02-15 03:58:04 -05:00
Kevin Turcios
eecd3ba4ce
Merge pull request #2425 from codeflash-ai/tessl-json-update
chore: update tessl config and add npm tiles
2026-02-15 03:57:09 -05:00
Kevin Turcios
6933fe07ac chore: add npm tessl tiles from tessl install 2026-02-15 03:56:05 -05:00
Kevin Turcios
9ab71ad672 chore: add .next/ to gitignore 2026-02-15 03:55:12 -05:00
Kevin Turcios
b6dc71421a chore: update tessl.json with npm tile entries 2026-02-15 03:54:13 -05:00
Kevin Turcios
e4050920d2
Merge pull request #2424 from codeflash-ai/tessl-setup
chore: add tessl tiles, claude skills, and local settings
2026-02-15 00:05:28 -05:00
Kevin Turcios
686fb9d156 chore: remove local claude settings files 2026-02-15 00:05:08 -05:00
Kevin Turcios
70c8df6bd4 chore: remove aiservice local claude settings 2026-02-15 00:04:31 -05:00
Kevin Turcios
aff375ed20 chore: add tessl tiles, claude skills, and local settings 2026-02-15 00:03:01 -05:00
Kevin Turcios
0b52998698
Merge pull request #2423 from codeflash-ai/slim-claude-md-internal
Add Tessl tiles for codeflash-internal
2026-02-14 22:36:40 -05:00
Kevin Turcios
db717aaedc
Merge branch 'main' into slim-claude-md-internal 2026-02-14 22:33:09 -05:00
Kevin Turcios
d02f4a1564 test: add evals for all three Tessl tiles 2026-02-14 22:25:30 -05:00
Kevin Turcios
dfc56f19a0 feat: add Tessl tiles for codeflash-internal (rules, docs, skills)
Three private tiles published to the codeflash workspace:
- codeflash-internal-rules: 6 eager rules (code-style, architecture,
  optimization-patterns, git-conventions, testing-rules, multi-language-handlers)
- codeflash-internal-docs: 8 lazy doc pages (domain-types, optimization-pipeline,
  test-generation-pipeline, context-extraction, aiservice/cf-api endpoints,
  configuration-thresholds, llm-provider-abstraction)
- codeflash-internal-skills: 4 on-demand skills (debug-optimization-failure,
  add-language-support, add-api-endpoint, debug-test-generation)
2026-02-14 22:16:33 -05:00
Kevin Turcios
37ccd953b2
Merge pull request #2422 from codeflash-ai/slim-claude-md-internal
docs: restructure CLAUDE.md into modular rules
2026-02-14 19:37:15 -05:00
Kevin Turcios
c13835963c docs: restructure CLAUDE.md files into modular rules
Slim down CLAUDE.md files and move content into path-scoped
.claude/rules/ files to reduce context bloat.
2026-02-14 19:36:21 -05:00
Kevin Turcios
e75d105b35 docs: add new-branch-from-main rule to git guidelines 2026-02-14 19:02:57 -05:00
Kevin Turcios
a97a3cb4e5 fix: allow bots in duplicate code detector workflow 2026-02-14 19:02:16 -05:00
Kevin Turcios
ee855abd76 fix: use correct secret names for Foundry auth 2026-02-14 18:52:12 -05:00
Kevin Turcios
7c76052c65
chore: replace gh-aw duplicate detector with claude-code-action + Serena (#2420)
## Summary
- Replace gh-aw workflow (incompatible with Azure Foundry) with
claude-code-action + use_foundry
- Add Serena MCP server for semantic duplicate code analysis
- Runs on PR open/sync and manual dispatch
- Targets Python and TypeScript/JavaScript files
2026-02-14 18:50:05 -05:00
Kevin Turcios
ac9f7ad2b5
fix: configure duplicate code detector for Azure Foundry (#2419)
## Summary
- Add Foundry env vars (ANTHROPIC_FOUNDRY_API_KEY,
ANTHROPIC_FOUNDRY_BASE_URL) so the workflow authenticates via Azure
Foundry
- Fix Serena language config (javascript -> typescript)
2026-02-14 18:29:04 -05:00
Kevin Turcios
9c5ad8fe06
chore: add gh-aw duplicate code detector workflow (#2418)
## Summary
- Adds the GitHub Agentic Workflows duplicate code detector, configured
for Python and TypeScript/JavaScript with Serena semantic analysis
- Runs daily, flags patterns spanning 10+ lines or appearing in 3+
locations
- Creates up to 3 issues per run with `[duplicate-code]` prefix

## Notes
- Requires Claude API secret configured in repo Actions secrets
- `code-quality` and `automated-analysis` labels will be auto-created on
first run
2026-02-14 18:14:55 -05:00
Kevin Turcios
4c3deeb7b8
Restructure CLAUDE.md files and add path-scoped rules for monorepo (#2417)
## Summary

- Restructure CLAUDE.md hierarchy so Claude Code auto-discovers
project-specific instructions
- Delete dead `AGENTS.md` files (referenced non-existent
`.tessl/RULES.md`)
- Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery
- Create `js/CLAUDE.md` with package commands and gotchas
- Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded
rule)
- Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill)
- Add path-scoped rules for Python and Next.js patterns
- Add domain glossary, service architecture diagram, and per-package
gotchas

## Test plan

- Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/`
- Verify no remaining references to `AGENTS.md` or `.tessl/`
- Verify `.claude/rules/` and `.claude/skills/` files are committed
2026-02-14 17:13:09 -05:00
Kevin Turcios
e26a8ea486
Reorganize top-level feature modules under core/ (#2416)
## Summary

- Move `log_features/` → `core/log_features/` (Django app with
`managed=False` models, no DB impact)
- Move `ranker/`, `workflow_gen/`, `adaptive_optimizer/` →
`core/languages/python/` (Python-focused API modules)
- Update all imports across the codebase (19 files)

## Test plan

- [x] All 548 tests pass
- [x] No stale top-level imports (`from log_features.`, `from ranker.`,
etc.)
- [x] `log_features` AppConfig preserves `label = "log_features"` for
Django app registry compatibility
2026-02-14 17:07:40 -05:00
Kevin Turcios
6caf7469c6
Decouple language modules and remove stale cross-module code (#2415)
## Summary

- Extract testgen and optimizer API routers from
`core/languages/python/` into `core/shared/` with lazy imports,
eliminating cross-module coupling between language modules
- Delete stale JavaScript prompt files left in the Python module after
migration to `js_ts/`
- Remove backward-compat fallback paths for prompt files that already
exist at their new locations
- Remove unused `is_multi_context_any()` and its cross-language imports
- Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO

## Test plan

- [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS,
and Java
- [ ] Verify optimizer endpoint dispatches correctly for all languages
- [ ] Run existing testgen and optimizer tests
2026-02-14 00:09:44 -05:00
Kevin Turcios
2614393793
Add test_index to LLM call context for observability chat (#2414)
## Summary

- Pass test_index through LLM call context so observability chat can
attribute responses to specific test generation calls
- Fix SSE streaming to send keepalive pings from the start

CF-504
2026-02-13 23:49:20 -05:00
Sarthak Agarwal
c721723971
remove demo test loops (#2412) 2026-02-14 00:43:09 +05:30
Saurabh Misra
198c0c1a4e
codeflash-omni-java (#2335)
# Pull Request Checklist

## Description
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
2026-02-13 23:26:55 +05:30
Kevin Turcios
ad26be10b8
Fix JS/TS cross-imports from Python module (#2396)
## Problem

The JS/TS language handler (`core/languages/js_ts/`) was importing
models, schemas, config, prompts, and helpers directly from the Python
language handler. This created a confusing architectural dependency and
risked serving wrong language-specific prompt content.

## What Changed

- Created `core/shared/` for genuinely language-agnostic code (optimizer
schemas, models, config, testgen models, context helpers)
- Moved JS/TS-specific prompts and context helpers into
`core/languages/js_ts/`
- Updated all consumers (20+ files) to import from the correct locations
- Removed backwards-compat re-exports from the Python module

## Result

- **Before:** 11 imports from `core.languages.python` in
`core/languages/js_ts/`
- **After:** 0
2026-02-12 22:34:38 -05:00
Kevin Turcios
0df421eccb
Add chat interface to observability timeline (#2395)
## Summary
- Chat panel on the observability timeline that uses Claude to answer
questions about optimization traces
- Tool-based context retrieval (fetches candidates, tests, errors on
demand instead of stuffing everything upfront)
- Uses `@anthropic-ai/sdk` via Azure AI Foundry
- Strengthened testgen prompts to ban mocks/fakes for test inputs
2026-02-12 20:45:33 -05:00
Kevin Turcios
8baf828634
chore: sync claude workflow with CLI repo (#2392)
## Summary
- Use claude-opus-4-6 model for both pr-review and claude-mention jobs
- Add mypy checks and consolidated summary comment (Steps 1 & 4) from
CLI workflow
- Add Edit tool and extra git/gh tools to allowed tools
2026-02-12 00:33:51 -05:00
Kevin Turcios
e28642cf22
Fix FTO display showing wrong function for methods with common names (#2391)
Store qualified function name (e.g., HttpInterface.__init__) and
file_path in testgen metadata instead of bare function_name (__init__).
Update the frontend parser to handle qualified names by splitting into
class + method and searching within the correct class using both
tree-sitter and regex. Prioritize the file matching filePath before
searching all files.

# Pull Request Checklist

## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
2026-02-12 00:30:33 -05:00
Sarthak Agarwal
55f0a8b60a
Restoring the ordering of webhook before parsing json (#2389)
# Pull Request Checklist

## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
2026-02-10 04:06:35 +05:30
Sarthak Agarwal
899db4ed56
Fix logging msg on webhook and misc (#2387) 2026-02-10 02:52:10 +05:30
Kevin Turcios
db973a0487
fix: relax testgen assertion rule to allow imports from function depe… (#2388)
…ndencies

The old rule ("NOT in libraries such as numpy, pandas etc.") forced LLMs
to reinvent helpers like np.allclose using slow / inaccurate Python
loops. The new rule allows assertions from packages already imported by
the function under test.

# Pull Request Checklist

## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
2026-02-09 15:05:19 -05:00
HeshamHM28
ad1cb9f032
[Fix] Fallback to staging if we fail to create PR (#2332)
Fixes CF-1037
2026-02-09 20:08:47 +02:00
Kevin Turcios
4f7d1818ac
Optimize observability timeline and fix UI issues (#2386)
## Summary
- Optimize timeline data fetching/rendering with pre-computed maps and
reduced re-renders
- Split timeline monolith into focused components, lazy-load debug data,
use IntersectionObserver for active section tracking
- Optimize component rendering with `memo`, stable ref callbacks, and
pre-computed sort data
- Fix observability nav toggle not syncing with current URL pathname
- Fix Response button overlapping dialog close button in LLM debug
dialog
2026-02-09 12:18:55 -05:00
Kevin Turcios
629442cc5e
Restructure aiservice to language-first architecture (#2383)
## Summary
- Reorganizes `django/aiservice/` from feature-first layout (separate
`optimizer/`, `testgen/`, `code_repair/` dirs) to language-first layout
under `core/languages/{python,js_ts}/`
- Adds handler/registry/dispatcher pattern for routing requests to
language-specific implementations
- All existing module code preserved via `git mv` for history tracking;
no logic changes to existing modules

## What changed
- New `core/` app with registry, dispatcher, protocols, and error
hierarchy
- `PythonHandler` and `JSTypeScriptHandler` delegate to existing module
functions
- All imports updated across the codebase (views, tests,
adaptive_optimizer, etc.)
- Integration tests for handler registration and dispatch
- 155 files changed, ~880 additions / ~207 deletions (mostly import path
updates and moves)

## Test plan
- [ ] `python manage.py check` passes
- [ ] Integration tests in
`tests/integration/test_handler_integration.py` pass
- [ ] Existing test suite passes with updated import paths
- [ ] Ruff and ty clean on all new infrastructure files

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-02-09 09:15:50 -05:00
Kevin Turcios
968946d62d
feat: split diff view and LLM export for observability (#2384)
## Summary
- Add split (side-by-side) diff view to the observability timeline for
comparing original vs optimized code
- Fix scroll handler not updating active section + expand container for
candidates
- Add LLM export route that returns plain text markdown of the full
trace, accessible via button next to search bar

## Test plan
- [ ] Load a trace in observability and verify the split diff view
renders correctly
- [ ] Verify the "LLM Export" button appears next to Search when results
are loaded
- [ ] Click the button and verify the new tab returns raw markdown text
(no HTML chrome)
- [ ] Verify all sections are present: function info, original code,
tests, candidates, ranking, errors, summary, and prompts
2026-02-09 04:21:41 -05:00