codeflash-internal

Author	SHA1	Message	Date
claude[bot]	1bb1407c6b	fix: resolve type checker errors	2026-02-15 12:33:05 +00:00
Kevin Turcios	d6a3c6254f	feat: add constructor notes for non-dataclass classes with __init__ The LLM prompt preprocessing now highlights __init__ signatures for regular classes, not just @dataclass ones, reducing brute-force constructor guessing and pytest.skip() fallbacks in generated tests.	2026-02-15 07:29:05 -05:00
Kevin Turcios	38eda0c2d6	Merge pull request #2426 from codeflash-ai/observability-chat-codebase-browsing feat: observability chat codebase browsing and model attribution fix	2026-02-15 06:50:59 -05:00
Kevin Turcios	e5d70443db	fix: use positional insertion in log_features to preserve model attribution log_features() appended test results in call-completion order, causing model attribution swaps when LLM responses arrived out of order. Pass test_index through and use positional insertion instead of append.	2026-02-15 03:58:05 -05:00
Kevin Turcios	496033539e	fix: use flex column layout and SSR-safe localStorage for split pane Replace sticky positioning + ResizeObserver height calc with a flex column layout (h-screen container, flex-1 panel group) that reliably fills the viewport. Drop useDefaultLayout hook (not SSR-safe) in favor of manual localStorage persistence inside useEffect.	2026-02-15 03:58:04 -05:00
Kevin Turcios	bf826a6e0f	feat: add resizable split pane and expandable tool results to observability chat Replace the fixed 480px chat overlay with a draggable split-pane layout using react-resizable-panels, and make tool rounds expandable to show the actual data the agent retrieved (code, errors, LLM call details).	2026-02-15 03:58:04 -05:00
Kevin Turcios	a3f9c655f9	fix: guarantee text response when agent loop produces only thinking blocks Remove MAX_TOOL_ROUNDS cap so the model decides when to stop calling tools. Add a safety net that makes a final tool-free API call if the loop ends without emitting any visible text, fixing empty assistant bubbles. Clean up redundant comments.	2026-02-15 03:58:04 -05:00
Kevin Turcios	870968e7a7	refactor: restructure system prompt for Claude Opus 4.6 best practices - Move trace data to top of prompt (long-context best practice: data before instructions improves quality ~30%) - Wrap sections in XML tags (<trace_data>, <role>, <domain_knowledge>, <guidelines>, <use_parallel_tool_calls>) for better parseability - Remove aggressive language (MUST, CRITICAL, HARD REQUIREMENT) that causes overtriggering on Opus 4.6 - Replace rigid 4-step investigation workflow with general guidelines to let adaptive thinking handle reasoning strategy - Remove duplicate content (tool reference section, two checklists) - Add <use_parallel_tool_calls> block per Anthropic's recommended pattern - Tone down tool descriptions from directive to descriptive - Net reduction: 49 fewer lines in system prompt	2026-02-15 03:58:04 -05:00
Kevin Turcios	ae2bff113d	fix: prevent context blowup by redacting thinking blocks between rounds Thinking blocks from previous tool rounds (10-50KB each) were accumulating in conversation history, causing Azure AI Foundry to hang after 4+ rounds. Redact thinking content before each API call while preserving required block structure. Also adds per-round timeout safety net and status indicators between rounds.	2026-02-15 03:58:04 -05:00
Kevin Turcios	fbcc283e97	perf: unify agent loop and pre-build lookup maps for O(1) tool calls Eliminate redundant API call by extracting text from the loop's final response directly instead of making a separate streaming call. Pre-build candidatesBySource, candidatesById, and testModelMap in indexTraceData() to replace repeated O(n) linear searches in tool calls and prompt building. Combine cost/token aggregation into a single pass.	2026-02-15 03:58:04 -05:00
Kevin Turcios	b09262ccbc	feat: add tool activity display and fix streaming timeout in observability chat Restructure agent loop to use stream()+finalMessage() for all API calls, fixing the SDK's non-streaming timeout error with max_tokens 32k. Add parallel tool execution, tool activity bubbles in the frontend, and restructure the system prompt for better investigation behavior.	2026-02-15 03:58:04 -05:00
Kevin Turcios	51372ca0ad	feat: add debugging workflow and response checklist to observability chat prompt Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS section with structured guidance for get_llm_call_detail and codebase browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND), and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to cite real file paths before responding.	2026-02-15 03:58:04 -05:00
Kevin Turcios	782ee508de	feat: add codebase browsing and LLM call inspection to observability chat Give the observability chat agent four new tools: get_llm_call_detail (full prompt/response for any LLM call), read_file, search_code, and list_directory for navigating the codeflash-internal and codeflash CLI repos. This lets the agent trace problems end-to-end from trace data through actual prompts to pipeline source code. - Add id to IndexedTraceData.llmCalls so the agent can reference calls - Make resolveToolCall async (Prisma + fs + child_process) - Make processToolUseResponse async to match - Bump MAX_TOOL_ROUNDS from 5 to 15 for multi-step code browsing - Add CODEFLASH_INTERNAL_REPO_PATH / CODEFLASH_CLI_REPO_PATH env vars - Path traversal protection, file size caps, search result limits	2026-02-15 03:58:04 -05:00
Kevin Turcios	eecd3ba4ce	Merge pull request #2425 from codeflash-ai/tessl-json-update chore: update tessl config and add npm tiles	2026-02-15 03:57:09 -05:00
Kevin Turcios	6933fe07ac	chore: add npm tessl tiles from tessl install	2026-02-15 03:56:05 -05:00
Kevin Turcios	9ab71ad672	chore: add .next/ to gitignore	2026-02-15 03:55:12 -05:00
Kevin Turcios	b6dc71421a	chore: update tessl.json with npm tile entries	2026-02-15 03:54:13 -05:00
Kevin Turcios	e4050920d2	Merge pull request #2424 from codeflash-ai/tessl-setup chore: add tessl tiles, claude skills, and local settings	2026-02-15 00:05:28 -05:00
Kevin Turcios	686fb9d156	chore: remove local claude settings files	2026-02-15 00:05:08 -05:00
Kevin Turcios	70c8df6bd4	chore: remove aiservice local claude settings	2026-02-15 00:04:31 -05:00
Kevin Turcios	aff375ed20	chore: add tessl tiles, claude skills, and local settings	2026-02-15 00:03:01 -05:00
Kevin Turcios	0b52998698	Merge pull request #2423 from codeflash-ai/slim-claude-md-internal Add Tessl tiles for codeflash-internal	2026-02-14 22:36:40 -05:00
Kevin Turcios	db717aaedc	Merge branch 'main' into slim-claude-md-internal	2026-02-14 22:33:09 -05:00
Kevin Turcios	d02f4a1564	test: add evals for all three Tessl tiles	2026-02-14 22:25:30 -05:00
Kevin Turcios	dfc56f19a0	feat: add Tessl tiles for codeflash-internal (rules, docs, skills) Three private tiles published to the codeflash workspace: - codeflash-internal-rules: 6 eager rules (code-style, architecture, optimization-patterns, git-conventions, testing-rules, multi-language-handlers) - codeflash-internal-docs: 8 lazy doc pages (domain-types, optimization-pipeline, test-generation-pipeline, context-extraction, aiservice/cf-api endpoints, configuration-thresholds, llm-provider-abstraction) - codeflash-internal-skills: 4 on-demand skills (debug-optimization-failure, add-language-support, add-api-endpoint, debug-test-generation)	2026-02-14 22:16:33 -05:00
Kevin Turcios	37ccd953b2	Merge pull request #2422 from codeflash-ai/slim-claude-md-internal docs: restructure CLAUDE.md into modular rules	2026-02-14 19:37:15 -05:00
Kevin Turcios	c13835963c	docs: restructure CLAUDE.md files into modular rules Slim down CLAUDE.md files and move content into path-scoped .claude/rules/ files to reduce context bloat.	2026-02-14 19:36:21 -05:00
Kevin Turcios	e75d105b35	docs: add new-branch-from-main rule to git guidelines	2026-02-14 19:02:57 -05:00
Kevin Turcios	a97a3cb4e5	fix: allow bots in duplicate code detector workflow	2026-02-14 19:02:16 -05:00
Kevin Turcios	ee855abd76	fix: use correct secret names for Foundry auth	2026-02-14 18:52:12 -05:00
Kevin Turcios	7c76052c65	chore: replace gh-aw duplicate detector with claude-code-action + Serena (#2420) ## Summary - Replace gh-aw workflow (incompatible with Azure Foundry) with claude-code-action + use_foundry - Add Serena MCP server for semantic duplicate code analysis - Runs on PR open/sync and manual dispatch - Targets Python and TypeScript/JavaScript files	2026-02-14 18:50:05 -05:00
Kevin Turcios	ac9f7ad2b5	fix: configure duplicate code detector for Azure Foundry (#2419) ## Summary - Add Foundry env vars (ANTHROPIC_FOUNDRY_API_KEY, ANTHROPIC_FOUNDRY_BASE_URL) so the workflow authenticates via Azure Foundry - Fix Serena language config (javascript -> typescript)	2026-02-14 18:29:04 -05:00
Kevin Turcios	9c5ad8fe06	chore: add gh-aw duplicate code detector workflow (#2418) ## Summary - Adds the GitHub Agentic Workflows duplicate code detector, configured for Python and TypeScript/JavaScript with Serena semantic analysis - Runs daily, flags patterns spanning 10+ lines or appearing in 3+ locations - Creates up to 3 issues per run with `[duplicate-code]` prefix ## Notes - Requires Claude API secret configured in repo Actions secrets - `code-quality` and `automated-analysis` labels will be auto-created on first run	2026-02-14 18:14:55 -05:00
Kevin Turcios	4c3deeb7b8	Restructure CLAUDE.md files and add path-scoped rules for monorepo (#2417) ## Summary - Restructure CLAUDE.md hierarchy so Claude Code auto-discovers project-specific instructions - Delete dead `AGENTS.md` files (referenced non-existent `.tessl/RULES.md`) - Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery - Create `js/CLAUDE.md` with package commands and gotchas - Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded rule) - Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill) - Add path-scoped rules for Python and Next.js patterns - Add domain glossary, service architecture diagram, and per-package gotchas ## Test plan - Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/` - Verify no remaining references to `AGENTS.md` or `.tessl/` - Verify `.claude/rules/` and `.claude/skills/` files are committed	2026-02-14 17:13:09 -05:00
Kevin Turcios	e26a8ea486	Reorganize top-level feature modules under core/ (#2416) ## Summary - Move `log_features/` → `core/log_features/` (Django app with `managed=False` models, no DB impact) - Move `ranker/`, `workflow_gen/`, `adaptive_optimizer/` → `core/languages/python/` (Python-focused API modules) - Update all imports across the codebase (19 files) ## Test plan - [x] All 548 tests pass - [x] No stale top-level imports (`from log_features.`, `from ranker.`, etc.) - [x] `log_features` AppConfig preserves `label = "log_features"` for Django app registry compatibility	2026-02-14 17:07:40 -05:00
Kevin Turcios	6caf7469c6	Decouple language modules and remove stale cross-module code (#2415) ## Summary - Extract testgen and optimizer API routers from `core/languages/python/` into `core/shared/` with lazy imports, eliminating cross-module coupling between language modules - Delete stale JavaScript prompt files left in the Python module after migration to `js_ts/` - Remove backward-compat fallback paths for prompt files that already exist at their new locations - Remove unused `is_multi_context_any()` and its cross-language imports - Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO ## Test plan - [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS, and Java - [ ] Verify optimizer endpoint dispatches correctly for all languages - [ ] Run existing testgen and optimizer tests	2026-02-14 00:09:44 -05:00
Kevin Turcios	2614393793	Add test_index to LLM call context for observability chat (#2414) ## Summary - Pass test_index through LLM call context so observability chat can attribute responses to specific test generation calls - Fix SSE streaming to send keepalive pings from the start CF-504	2026-02-13 23:49:20 -05:00
Sarthak Agarwal	c721723971	remove demo test loops (#2412)	2026-02-14 00:43:09 +05:30
Saurabh Misra	198c0c1a4e	codeflash-omni-java (#2335) # Pull Request Checklist ## Description - [ ] Breaking Changes: Document any breaking changes (if applicable) - [ ] Description of PR: Clear and concise description of what this PR accomplishes - [ ] Related Issues: Link to any related issues or tickets ## Testing - [ ] Test cases Attached: All relevant test cases have been added/updated - [ ] Manual Testing: Manual testing completed for the changes ## Monitoring & Debugging - [ ] Logging in place: Appropriate logging has been added for debugging user issues - [ ] Sentry will be able to catch errors: Error handling ensures Sentry can capture and report errors - [ ] Avoid Dev based/Prisma logging: No development-only or Prisma-specific logging in production code ## Configuration - [ ] Env variables newly added: Any new environment variables are documented in .env.example file or mentioned in description --- ## Additional Notes <!-- Add any additional context, screenshots, or notes for reviewers here --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com> Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>	2026-02-13 23:26:55 +05:30
Kevin Turcios	ad26be10b8	Fix JS/TS cross-imports from Python module (#2396) ## Problem The JS/TS language handler (`core/languages/js_ts/`) was importing models, schemas, config, prompts, and helpers directly from the Python language handler. This created a confusing architectural dependency and risked serving wrong language-specific prompt content. ## What Changed - Created `core/shared/` for genuinely language-agnostic code (optimizer schemas, models, config, testgen models, context helpers) - Moved JS/TS-specific prompts and context helpers into `core/languages/js_ts/` - Updated all consumers (20+ files) to import from the correct locations - Removed backwards-compat re-exports from the Python module ## Result - Before: 11 imports from `core.languages.python` in `core/languages/js_ts/` - After: 0	2026-02-12 22:34:38 -05:00
Kevin Turcios	0df421eccb	Add chat interface to observability timeline (#2395) ## Summary - Chat panel on the observability timeline that uses Claude to answer questions about optimization traces - Tool-based context retrieval (fetches candidates, tests, errors on demand instead of stuffing everything upfront) - Uses `@anthropic-ai/sdk` via Azure AI Foundry - Strengthened testgen prompts to ban mocks/fakes for test inputs	2026-02-12 20:45:33 -05:00
Kevin Turcios	8baf828634	chore: sync claude workflow with CLI repo (#2392) ## Summary - Use claude-opus-4-6 model for both pr-review and claude-mention jobs - Add mypy checks and consolidated summary comment (Steps 1 & 4) from CLI workflow - Add Edit tool and extra git/gh tools to allowed tools	2026-02-12 00:33:51 -05:00
Kevin Turcios	e28642cf22	Fix FTO display showing wrong function for methods with common names (#2391) Store qualified function name (e.g., HttpInterface.__init__) and file_path in testgen metadata instead of bare function_name (__init__). Update the frontend parser to handle qualified names by splitting into class + method and searching within the correct class using both tree-sitter and regex. Prioritize the file matching filePath before searching all files. # Pull Request Checklist ## Description - [ ] Description of PR: Clear and concise description of what this PR accomplishes - [ ] Breaking Changes: Document any breaking changes (if applicable) - [ ] Related Issues: Link to any related issues or tickets ## Testing - [ ] Test cases Attached: All relevant test cases have been added/updated - [ ] Manual Testing: Manual testing completed for the changes ## Monitoring & Debugging - [ ] Logging in place: Appropriate logging has been added for debugging user issues - [ ] Sentry will be able to catch errors: Error handling ensures Sentry can capture and report errors - [ ] Avoid Dev based/Prisma logging: No development-only or Prisma-specific logging in production code ## Configuration - [ ] Env variables newly added: Any new environment variables are documented in .env.example file or mentioned in description --- ## Additional Notes <!-- Add any additional context, screenshots, or notes for reviewers here -->	2026-02-12 00:30:33 -05:00
Sarthak Agarwal	55f0a8b60a	Restoring the ordering of webhook before parsing json (#2389) # Pull Request Checklist ## Description - [ ] Description of PR: Clear and concise description of what this PR accomplishes - [ ] Breaking Changes: Document any breaking changes (if applicable) - [ ] Related Issues: Link to any related issues or tickets ## Testing - [ ] Test cases Attached: All relevant test cases have been added/updated - [ ] Manual Testing: Manual testing completed for the changes ## Monitoring & Debugging - [ ] Logging in place: Appropriate logging has been added for debugging user issues - [ ] Sentry will be able to catch errors: Error handling ensures Sentry can capture and report errors - [ ] Avoid Dev based/Prisma logging: No development-only or Prisma-specific logging in production code ## Configuration - [ ] Env variables newly added: Any new environment variables are documented in .env.example file or mentioned in description --- ## Additional Notes <!-- Add any additional context, screenshots, or notes for reviewers here -->	2026-02-10 04:06:35 +05:30
Sarthak Agarwal	899db4ed56	Fix logging msg on webhook and misc (#2387)	2026-02-10 02:52:10 +05:30
Kevin Turcios	db973a0487	fix: relax testgen assertion rule to allow imports from function dependencies (#2388) The old rule ("NOT in libraries such as numpy, pandas etc.") forced LLMs to reinvent helpers like np.allclose using slow / inaccurate Python loops. The new rule allows assertions from packages already imported by the function under test. # Pull Request Checklist ## Description - [ ] Description of PR: Clear and concise description of what this PR accomplishes - [ ] Breaking Changes: Document any breaking changes (if applicable) - [ ] Related Issues: Link to any related issues or tickets ## Testing - [ ] Test cases Attached: All relevant test cases have been added/updated - [ ] Manual Testing: Manual testing completed for the changes ## Monitoring & Debugging - [ ] Logging in place: Appropriate logging has been added for debugging user issues - [ ] Sentry will be able to catch errors: Error handling ensures Sentry can capture and report errors - [ ] Avoid Dev based/Prisma logging: No development-only or Prisma-specific logging in production code ## Configuration - [ ] Env variables newly added: Any new environment variables are documented in .env.example file or mentioned in description --- ## Additional Notes <!-- Add any additional context, screenshots, or notes for reviewers here -->	2026-02-09 15:05:19 -05:00
HeshamHM28	ad1cb9f032	[Fix] Fallback to staging if we fail to create PR (#2332) Fixes CF-1037	2026-02-09 20:08:47 +02:00
Kevin Turcios	4f7d1818ac	Optimize observability timeline and fix UI issues (#2386) ## Summary - Optimize timeline data fetching/rendering with pre-computed maps and reduced re-renders - Split timeline monolith into focused components, lazy-load debug data, use IntersectionObserver for active section tracking - Optimize component rendering with `memo`, stable ref callbacks, and pre-computed sort data - Fix observability nav toggle not syncing with current URL pathname - Fix Response button overlapping dialog close button in LLM debug dialog	2026-02-09 12:18:55 -05:00
Kevin Turcios	629442cc5e	Restructure aiservice to language-first architecture (#2383) ## Summary - Reorganizes `django/aiservice/` from feature-first layout (separate `optimizer/`, `testgen/`, `code_repair/` dirs) to language-first layout under `core/languages/{python,js_ts}/` - Adds handler/registry/dispatcher pattern for routing requests to language-specific implementations - All existing module code preserved via `git mv` for history tracking; no logic changes to existing modules ## What changed - New `core/` app with registry, dispatcher, protocols, and error hierarchy - `PythonHandler` and `JSTypeScriptHandler` delegate to existing module functions - All imports updated across the codebase (views, tests, adaptive_optimizer, etc.) - Integration tests for handler registration and dispatch - 155 files changed, ~880 additions / ~207 deletions (mostly import path updates and moves) ## Test plan - [ ] `python manage.py check` passes - [ ] Integration tests in `tests/integration/test_handler_integration.py` pass - [ ] Existing test suite passes with updated import paths - [ ] Ruff and ty clean on all new infrastructure files --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-09 09:15:50 -05:00
Kevin Turcios	968946d62d	feat: split diff view and LLM export for observability (#2384) ## Summary - Add split (side-by-side) diff view to the observability timeline for comparing original vs optimized code - Fix scroll handler not updating active section + expand container for candidates - Add LLM export route that returns plain text markdown of the full trace, accessible via button next to search bar ## Test plan - [ ] Load a trace in observability and verify the split diff view renders correctly - [ ] Verify the "LLM Export" button appears next to Search when results are loaded - [ ] Click the button and verify the new tab returns raw markdown text (no HTML chrome) - [ ] Verify all sections are present: function info, original code, tests, candidates, ranking, errors, summary, and prompts	2026-02-09 04:21:41 -05:00

6204 commits