Commit graph

6212 commits

Author SHA1 Message Date
Kevin Turcios
af3185edff fix: handle non-numeric patch suffixes and support Python 3.15 2026-02-23 03:36:50 -05:00
Sarthak Agarwal
2cb3d51ddb
fix issue with closed and merged PRs raising suggestion (#2436) 2026-02-21 01:23:55 +05:30
Sarthak Agarwal
eb5f4b460e
Migrate to AWS bedrock (#2430)
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1



Will require these for boto3 authentication
2026-02-20 23:52:48 +05:30
Kevin Turcios
7005156190
Merge pull request #2427 from codeflash-ai/class-constructor-notes
feat: add constructor notes for non-dataclass classes
2026-02-19 01:46:55 +00:00
Kevin Turcios
b5af1ca353
Merge branch 'main' into class-constructor-notes 2026-02-19 01:46:45 +00:00
Kevin Turcios
a69f67f68f
Merge pull request #2428 from codeflash-ai/fix-windows-skill-filenames
fix: rename skill files to be Windows-compatible
2026-02-17 05:06:29 +00:00
Kevin Turcios
a21eb7aba2 fix: rename skill files to be Windows-compatible
Renamed skill files from using colons to dashes (e.g., tessl:add-api-endpoint → tessl-add-api-endpoint) to fix checkout issues on Windows filesystems which don't allow colons in filenames.

Skills will continue to work as the files contain relative paths to .tessl directory and don't reference their own filenames.
2026-02-17 05:01:30 +00:00
Sarthak Agarwal
e22e5d1f8b
Add codeflash optimization workflow for cf-api and cf-webapp (#2411)
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
2026-02-16 19:48:15 +05:30
claude[bot]
1bb1407c6b fix: resolve type checker errors 2026-02-15 12:33:05 +00:00
Kevin Turcios
d6a3c6254f feat: add constructor notes for non-dataclass classes with __init__
The LLM prompt preprocessing now highlights __init__ signatures for
regular classes, not just @dataclass ones, reducing brute-force
constructor guessing and pytest.skip() fallbacks in generated tests.
2026-02-15 07:29:05 -05:00
Kevin Turcios
38eda0c2d6
Merge pull request #2426 from codeflash-ai/observability-chat-codebase-browsing
feat: observability chat codebase browsing and model attribution fix
2026-02-15 06:50:59 -05:00
Kevin Turcios
e5d70443db fix: use positional insertion in log_features to preserve model attribution
log_features() appended test results in call-completion order, causing
model attribution swaps when LLM responses arrived out of order. Pass
test_index through and use positional insertion instead of append.
2026-02-15 03:58:05 -05:00
Kevin Turcios
496033539e fix: use flex column layout and SSR-safe localStorage for split pane
Replace sticky positioning + ResizeObserver height calc with a flex
column layout (h-screen container, flex-1 panel group) that reliably
fills the viewport. Drop useDefaultLayout hook (not SSR-safe) in favor
of manual localStorage persistence inside useEffect.
2026-02-15 03:58:04 -05:00
Kevin Turcios
bf826a6e0f feat: add resizable split pane and expandable tool results to observability chat
Replace the fixed 480px chat overlay with a draggable split-pane layout
using react-resizable-panels, and make tool rounds expandable to show
the actual data the agent retrieved (code, errors, LLM call details).
2026-02-15 03:58:04 -05:00
Kevin Turcios
a3f9c655f9 fix: guarantee text response when agent loop produces only thinking blocks
Remove MAX_TOOL_ROUNDS cap so the model decides when to stop calling
tools. Add a safety net that makes a final tool-free API call if the
loop ends without emitting any visible text, fixing empty assistant
bubbles. Clean up redundant comments.
2026-02-15 03:58:04 -05:00
Kevin Turcios
870968e7a7 refactor: restructure system prompt for Claude Opus 4.6 best practices
- Move trace data to top of prompt (long-context best practice: data
  before instructions improves quality ~30%)
- Wrap sections in XML tags (<trace_data>, <role>, <domain_knowledge>,
  <guidelines>, <use_parallel_tool_calls>) for better parseability
- Remove aggressive language (MUST, CRITICAL, HARD REQUIREMENT) that
  causes overtriggering on Opus 4.6
- Replace rigid 4-step investigation workflow with general guidelines
  to let adaptive thinking handle reasoning strategy
- Remove duplicate content (tool reference section, two checklists)
- Add <use_parallel_tool_calls> block per Anthropic's recommended pattern
- Tone down tool descriptions from directive to descriptive
- Net reduction: 49 fewer lines in system prompt
2026-02-15 03:58:04 -05:00
Kevin Turcios
ae2bff113d fix: prevent context blowup by redacting thinking blocks between rounds
Thinking blocks from previous tool rounds (10-50KB each) were
accumulating in conversation history, causing Azure AI Foundry to hang
after 4+ rounds. Redact thinking content before each API call while
preserving required block structure. Also adds per-round timeout safety
net and status indicators between rounds.
2026-02-15 03:58:04 -05:00
Kevin Turcios
fbcc283e97 perf: unify agent loop and pre-build lookup maps for O(1) tool calls
Eliminate redundant API call by extracting text from the loop's final
response directly instead of making a separate streaming call. Pre-build
candidatesBySource, candidatesById, and testModelMap in indexTraceData()
to replace repeated O(n) linear searches in tool calls and prompt
building. Combine cost/token aggregation into a single pass.
2026-02-15 03:58:04 -05:00
Kevin Turcios
b09262ccbc feat: add tool activity display and fix streaming timeout in observability chat
Restructure agent loop to use stream()+finalMessage() for all API calls,
fixing the SDK's non-streaming timeout error with max_tokens 32k. Add
parallel tool execution, tool activity bubbles in the frontend, and
restructure the system prompt for better investigation behavior.
2026-02-15 03:58:04 -05:00
Kevin Turcios
51372ca0ad feat: add debugging workflow and response checklist to observability chat prompt
Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS
section with structured guidance for get_llm_call_detail and codebase
browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND),
and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to
cite real file paths before responding.
2026-02-15 03:58:04 -05:00
Kevin Turcios
782ee508de feat: add codebase browsing and LLM call inspection to observability chat
Give the observability chat agent four new tools: get_llm_call_detail
(full prompt/response for any LLM call), read_file, search_code, and
list_directory for navigating the codeflash-internal and codeflash CLI
repos. This lets the agent trace problems end-to-end from trace data
through actual prompts to pipeline source code.

- Add id to IndexedTraceData.llmCalls so the agent can reference calls
- Make resolveToolCall async (Prisma + fs + child_process)
- Make processToolUseResponse async to match
- Bump MAX_TOOL_ROUNDS from 5 to 15 for multi-step code browsing
- Add CODEFLASH_INTERNAL_REPO_PATH / CODEFLASH_CLI_REPO_PATH env vars
- Path traversal protection, file size caps, search result limits
2026-02-15 03:58:04 -05:00
Kevin Turcios
eecd3ba4ce
Merge pull request #2425 from codeflash-ai/tessl-json-update
chore: update tessl config and add npm tiles
2026-02-15 03:57:09 -05:00
Kevin Turcios
6933fe07ac chore: add npm tessl tiles from tessl install 2026-02-15 03:56:05 -05:00
Kevin Turcios
9ab71ad672 chore: add .next/ to gitignore 2026-02-15 03:55:12 -05:00
Kevin Turcios
b6dc71421a chore: update tessl.json with npm tile entries 2026-02-15 03:54:13 -05:00
Kevin Turcios
e4050920d2
Merge pull request #2424 from codeflash-ai/tessl-setup
chore: add tessl tiles, claude skills, and local settings
2026-02-15 00:05:28 -05:00
Kevin Turcios
686fb9d156 chore: remove local claude settings files 2026-02-15 00:05:08 -05:00
Kevin Turcios
70c8df6bd4 chore: remove aiservice local claude settings 2026-02-15 00:04:31 -05:00
Kevin Turcios
aff375ed20 chore: add tessl tiles, claude skills, and local settings 2026-02-15 00:03:01 -05:00
Kevin Turcios
0b52998698
Merge pull request #2423 from codeflash-ai/slim-claude-md-internal
Add Tessl tiles for codeflash-internal
2026-02-14 22:36:40 -05:00
Kevin Turcios
db717aaedc
Merge branch 'main' into slim-claude-md-internal 2026-02-14 22:33:09 -05:00
Kevin Turcios
d02f4a1564 test: add evals for all three Tessl tiles 2026-02-14 22:25:30 -05:00
Kevin Turcios
dfc56f19a0 feat: add Tessl tiles for codeflash-internal (rules, docs, skills)
Three private tiles published to the codeflash workspace:
- codeflash-internal-rules: 6 eager rules (code-style, architecture,
  optimization-patterns, git-conventions, testing-rules, multi-language-handlers)
- codeflash-internal-docs: 8 lazy doc pages (domain-types, optimization-pipeline,
  test-generation-pipeline, context-extraction, aiservice/cf-api endpoints,
  configuration-thresholds, llm-provider-abstraction)
- codeflash-internal-skills: 4 on-demand skills (debug-optimization-failure,
  add-language-support, add-api-endpoint, debug-test-generation)
2026-02-14 22:16:33 -05:00
Kevin Turcios
37ccd953b2
Merge pull request #2422 from codeflash-ai/slim-claude-md-internal
docs: restructure CLAUDE.md into modular rules
2026-02-14 19:37:15 -05:00
Kevin Turcios
c13835963c docs: restructure CLAUDE.md files into modular rules
Slim down CLAUDE.md files and move content into path-scoped
.claude/rules/ files to reduce context bloat.
2026-02-14 19:36:21 -05:00
Kevin Turcios
e75d105b35 docs: add new-branch-from-main rule to git guidelines 2026-02-14 19:02:57 -05:00
Kevin Turcios
a97a3cb4e5 fix: allow bots in duplicate code detector workflow 2026-02-14 19:02:16 -05:00
Kevin Turcios
ee855abd76 fix: use correct secret names for Foundry auth 2026-02-14 18:52:12 -05:00
Kevin Turcios
7c76052c65
chore: replace gh-aw duplicate detector with claude-code-action + Serena (#2420)
## Summary
- Replace gh-aw workflow (incompatible with Azure Foundry) with
claude-code-action + use_foundry
- Add Serena MCP server for semantic duplicate code analysis
- Runs on PR open/sync and manual dispatch
- Targets Python and TypeScript/JavaScript files
2026-02-14 18:50:05 -05:00
Kevin Turcios
ac9f7ad2b5
fix: configure duplicate code detector for Azure Foundry (#2419)
## Summary
- Add Foundry env vars (ANTHROPIC_FOUNDRY_API_KEY,
ANTHROPIC_FOUNDRY_BASE_URL) so the workflow authenticates via Azure
Foundry
- Fix Serena language config (javascript -> typescript)
2026-02-14 18:29:04 -05:00
Kevin Turcios
9c5ad8fe06
chore: add gh-aw duplicate code detector workflow (#2418)
## Summary
- Adds the GitHub Agentic Workflows duplicate code detector, configured
for Python and TypeScript/JavaScript with Serena semantic analysis
- Runs daily, flags patterns spanning 10+ lines or appearing in 3+
locations
- Creates up to 3 issues per run with `[duplicate-code]` prefix

## Notes
- Requires Claude API secret configured in repo Actions secrets
- `code-quality` and `automated-analysis` labels will be auto-created on
first run
2026-02-14 18:14:55 -05:00
Kevin Turcios
4c3deeb7b8
Restructure CLAUDE.md files and add path-scoped rules for monorepo (#2417)
## Summary

- Restructure CLAUDE.md hierarchy so Claude Code auto-discovers
project-specific instructions
- Delete dead `AGENTS.md` files (referenced non-existent
`.tessl/RULES.md`)
- Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery
- Create `js/CLAUDE.md` with package commands and gotchas
- Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded
rule)
- Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill)
- Add path-scoped rules for Python and Next.js patterns
- Add domain glossary, service architecture diagram, and per-package
gotchas

## Test plan

- Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/`
- Verify no remaining references to `AGENTS.md` or `.tessl/`
- Verify `.claude/rules/` and `.claude/skills/` files are committed
2026-02-14 17:13:09 -05:00
Kevin Turcios
e26a8ea486
Reorganize top-level feature modules under core/ (#2416)
## Summary

- Move `log_features/` → `core/log_features/` (Django app with
`managed=False` models, no DB impact)
- Move `ranker/`, `workflow_gen/`, `adaptive_optimizer/` →
`core/languages/python/` (Python-focused API modules)
- Update all imports across the codebase (19 files)

## Test plan

- [x] All 548 tests pass
- [x] No stale top-level imports (`from log_features.`, `from ranker.`,
etc.)
- [x] `log_features` AppConfig preserves `label = "log_features"` for
Django app registry compatibility
2026-02-14 17:07:40 -05:00
Kevin Turcios
6caf7469c6
Decouple language modules and remove stale cross-module code (#2415)
## Summary

- Extract testgen and optimizer API routers from
`core/languages/python/` into `core/shared/` with lazy imports,
eliminating cross-module coupling between language modules
- Delete stale JavaScript prompt files left in the Python module after
migration to `js_ts/`
- Remove backward-compat fallback paths for prompt files that already
exist at their new locations
- Remove unused `is_multi_context_any()` and its cross-language imports
- Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO

## Test plan

- [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS,
and Java
- [ ] Verify optimizer endpoint dispatches correctly for all languages
- [ ] Run existing testgen and optimizer tests
2026-02-14 00:09:44 -05:00
Kevin Turcios
2614393793
Add test_index to LLM call context for observability chat (#2414)
## Summary

- Pass test_index through LLM call context so observability chat can
attribute responses to specific test generation calls
- Fix SSE streaming to send keepalive pings from the start

CF-504
2026-02-13 23:49:20 -05:00
Sarthak Agarwal
c721723971
remove demo test loops (#2412) 2026-02-14 00:43:09 +05:30
Saurabh Misra
198c0c1a4e
codeflash-omni-java (#2335)
# Pull Request Checklist

## Description
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
2026-02-13 23:26:55 +05:30
Kevin Turcios
ad26be10b8
Fix JS/TS cross-imports from Python module (#2396)
## Problem

The JS/TS language handler (`core/languages/js_ts/`) was importing
models, schemas, config, prompts, and helpers directly from the Python
language handler. This created a confusing architectural dependency and
risked serving wrong language-specific prompt content.

## What Changed

- Created `core/shared/` for genuinely language-agnostic code (optimizer
schemas, models, config, testgen models, context helpers)
- Moved JS/TS-specific prompts and context helpers into
`core/languages/js_ts/`
- Updated all consumers (20+ files) to import from the correct locations
- Removed backwards-compat re-exports from the Python module

## Result

- **Before:** 11 imports from `core.languages.python` in
`core/languages/js_ts/`
- **After:** 0
2026-02-12 22:34:38 -05:00
Kevin Turcios
0df421eccb
Add chat interface to observability timeline (#2395)
## Summary
- Chat panel on the observability timeline that uses Claude to answer
questions about optimization traces
- Tool-based context retrieval (fetches candidates, tests, errors on
demand instead of stuffing everything upfront)
- Uses `@anthropic-ai/sdk` via Azure AI Foundry
- Strengthened testgen prompts to ban mocks/fakes for test inputs
2026-02-12 20:45:33 -05:00
Kevin Turcios
8baf828634
chore: sync claude workflow with CLI repo (#2392)
## Summary
- Use claude-opus-4-6 model for both pr-review and claude-mention jobs
- Add mypy checks and consolidated summary comment (Steps 1 & 4) from
CLI workflow
- Add Edit tool and extra git/gh tools to allowed tools
2026-02-12 00:33:51 -05:00