codeflash-internal

Author	SHA1	Message	Date
Kevin Turcios	fbcc283e97	perf: unify agent loop and pre-build lookup maps for O(1) tool calls Eliminate redundant API call by extracting text from the loop's final response directly instead of making a separate streaming call. Pre-build candidatesBySource, candidatesById, and testModelMap in indexTraceData() to replace repeated O(n) linear searches in tool calls and prompt building. Combine cost/token aggregation into a single pass.	2026-02-15 03:58:04 -05:00
Kevin Turcios	b09262ccbc	feat: add tool activity display and fix streaming timeout in observability chat Restructure agent loop to use stream()+finalMessage() for all API calls, fixing the SDK's non-streaming timeout error with max_tokens 32k. Add parallel tool execution, tool activity bubbles in the frontend, and restructure the system prompt for better investigation behavior.	2026-02-15 03:58:04 -05:00
Kevin Turcios	51372ca0ad	feat: add debugging workflow and response checklist to observability chat prompt Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS section with structured guidance for get_llm_call_detail and codebase browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND), and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to cite real file paths before responding.	2026-02-15 03:58:04 -05:00
Kevin Turcios	782ee508de	feat: add codebase browsing and LLM call inspection to observability chat Give the observability chat agent four new tools: get_llm_call_detail (full prompt/response for any LLM call), read_file, search_code, and list_directory for navigating the codeflash-internal and codeflash CLI repos. This lets the agent trace problems end-to-end from trace data through actual prompts to pipeline source code. - Add id to IndexedTraceData.llmCalls so the agent can reference calls - Make resolveToolCall async (Prisma + fs + child_process) - Make processToolUseResponse async to match - Bump MAX_TOOL_ROUNDS from 5 to 15 for multi-step code browsing - Add CODEFLASH_INTERNAL_REPO_PATH / CODEFLASH_CLI_REPO_PATH env vars - Path traversal protection, file size caps, search result limits	2026-02-15 03:58:04 -05:00
Kevin Turcios	eecd3ba4ce	Merge pull request #2425 from codeflash-ai/tessl-json-update chore: update tessl config and add npm tiles	2026-02-15 03:57:09 -05:00
Kevin Turcios	6933fe07ac	chore: add npm tessl tiles from tessl install	2026-02-15 03:56:05 -05:00
Kevin Turcios	9ab71ad672	chore: add .next/ to gitignore	2026-02-15 03:55:12 -05:00
Kevin Turcios	b6dc71421a	chore: update tessl.json with npm tile entries	2026-02-15 03:54:13 -05:00
Kevin Turcios	e4050920d2	Merge pull request #2424 from codeflash-ai/tessl-setup chore: add tessl tiles, claude skills, and local settings	2026-02-15 00:05:28 -05:00
Kevin Turcios	686fb9d156	chore: remove local claude settings files	2026-02-15 00:05:08 -05:00
Kevin Turcios	70c8df6bd4	chore: remove aiservice local claude settings	2026-02-15 00:04:31 -05:00
Kevin Turcios	aff375ed20	chore: add tessl tiles, claude skills, and local settings	2026-02-15 00:03:01 -05:00
Kevin Turcios	0b52998698	Merge pull request #2423 from codeflash-ai/slim-claude-md-internal Add Tessl tiles for codeflash-internal	2026-02-14 22:36:40 -05:00
Kevin Turcios	db717aaedc	Merge branch 'main' into slim-claude-md-internal	2026-02-14 22:33:09 -05:00
Kevin Turcios	d02f4a1564	test: add evals for all three Tessl tiles	2026-02-14 22:25:30 -05:00
Kevin Turcios	dfc56f19a0	feat: add Tessl tiles for codeflash-internal (rules, docs, skills) Three private tiles published to the codeflash workspace: - codeflash-internal-rules: 6 eager rules (code-style, architecture, optimization-patterns, git-conventions, testing-rules, multi-language-handlers) - codeflash-internal-docs: 8 lazy doc pages (domain-types, optimization-pipeline, test-generation-pipeline, context-extraction, aiservice/cf-api endpoints, configuration-thresholds, llm-provider-abstraction) - codeflash-internal-skills: 4 on-demand skills (debug-optimization-failure, add-language-support, add-api-endpoint, debug-test-generation)	2026-02-14 22:16:33 -05:00
Kevin Turcios	37ccd953b2	Merge pull request #2422 from codeflash-ai/slim-claude-md-internal docs: restructure CLAUDE.md into modular rules	2026-02-14 19:37:15 -05:00
Kevin Turcios	c13835963c	docs: restructure CLAUDE.md files into modular rules Slim down CLAUDE.md files and move content into path-scoped .claude/rules/ files to reduce context bloat.	2026-02-14 19:36:21 -05:00
Kevin Turcios	e75d105b35	docs: add new-branch-from-main rule to git guidelines	2026-02-14 19:02:57 -05:00
Kevin Turcios	a97a3cb4e5	fix: allow bots in duplicate code detector workflow	2026-02-14 19:02:16 -05:00
Kevin Turcios	ee855abd76	fix: use correct secret names for Foundry auth	2026-02-14 18:52:12 -05:00
Kevin Turcios	7c76052c65	chore: replace gh-aw duplicate detector with claude-code-action + Serena (#2420) ## Summary - Replace gh-aw workflow (incompatible with Azure Foundry) with claude-code-action + use_foundry - Add Serena MCP server for semantic duplicate code analysis - Runs on PR open/sync and manual dispatch - Targets Python and TypeScript/JavaScript files	2026-02-14 18:50:05 -05:00
Kevin Turcios	ac9f7ad2b5	fix: configure duplicate code detector for Azure Foundry (#2419) ## Summary - Add Foundry env vars (ANTHROPIC_FOUNDRY_API_KEY, ANTHROPIC_FOUNDRY_BASE_URL) so the workflow authenticates via Azure Foundry - Fix Serena language config (javascript -> typescript)	2026-02-14 18:29:04 -05:00
Kevin Turcios	9c5ad8fe06	chore: add gh-aw duplicate code detector workflow (#2418 ) ## Summary - Adds the GitHub Agentic Workflows duplicate code detector, configured for Python and TypeScript/JavaScript with Serena semantic analysis - Runs daily, flags patterns spanning 10+ lines or appearing in 3+ locations - Creates up to 3 issues per run with `[duplicate-code]` prefix ## Notes - Requires Claude API secret configured in repo Actions secrets - `code-quality` and `automated-analysis` labels will be auto-created on first run	2026-02-14 18:14:55 -05:00
Kevin Turcios	4c3deeb7b8	Restructure CLAUDE.md files and add path-scoped rules for monorepo (#2417 ) ## Summary - Restructure CLAUDE.md hierarchy so Claude Code auto-discovers project-specific instructions - Delete dead `AGENTS.md` files (referenced non-existent `.tessl/RULES.md`) - Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery - Create `js/CLAUDE.md` with package commands and gotchas - Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded rule) - Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill) - Add path-scoped rules for Python and Next.js patterns - Add domain glossary, service architecture diagram, and per-package gotchas ## Test plan - Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/` - Verify no remaining references to `AGENTS.md` or `.tessl/` - Verify `.claude/rules/` and `.claude/skills/` files are committed	2026-02-14 17:13:09 -05:00
Kevin Turcios	e26a8ea486	Reorganize top-level feature modules under core/ (#2416 ) ## Summary - Move `log_features/` → `core/log_features/` (Django app with `managed=False` models, no DB impact) - Move `ranker/`, `workflow_gen/`, `adaptive_optimizer/` → `core/languages/python/` (Python-focused API modules) - Update all imports across the codebase (19 files) ## Test plan - [x] All 548 tests pass - [x] No stale top-level imports (`from log_features.`, `from ranker.`, etc.) - [x] `log_features` AppConfig preserves `label = "log_features"` for Django app registry compatibility	2026-02-14 17:07:40 -05:00
Kevin Turcios	6caf7469c6	Decouple language modules and remove stale cross-module code (#2415 ) ## Summary - Extract testgen and optimizer API routers from `core/languages/python/` into `core/shared/` with lazy imports, eliminating cross-module coupling between language modules - Delete stale JavaScript prompt files left in the Python module after migration to `js_ts/` - Remove backward-compat fallback paths for prompt files that already exist at their new locations - Remove unused `is_multi_context_any()` and its cross-language imports - Remove unused `BEGIN_PATCH`/`END_PATCH` constants and stale TODO ## Test plan - [ ] Verify testgen endpoint dispatches correctly for Python, JS/TS, and Java - [ ] Verify optimizer endpoint dispatches correctly for all languages - [ ] Run existing testgen and optimizer tests	2026-02-14 00:09:44 -05:00
Kevin Turcios	2614393793	Add test_index to LLM call context for observability chat (#2414) ## Summary - Pass test_index through LLM call context so observability chat can attribute responses to specific test generation calls - Fix SSE streaming to send keepalive pings from the start CF-504	2026-02-13 23:49:20 -05:00
Sarthak Agarwal	c721723971	remove demo test loops (#2412)	2026-02-14 00:43:09 +05:30
Saurabh Misra	198c0c1a4e	codeflash-omni-java (#2335 ) # Pull Request Checklist ## Description - [ ] Breaking Changes: Document any breaking changes (if applicable) - [ ] Description of PR: Clear and concise description of what this PR accomplishes - [ ] Related Issues: Link to any related issues or tickets ## Testing - [ ] Test cases Attached: All relevant test cases have been added/updated - [ ] Manual Testing: Manual testing completed for the changes ## Monitoring & Debugging - [ ] Logging in place: Appropriate logging has been added for debugging user issues - [ ] Sentry will be able to catch errors: Error handling ensures Sentry can capture and report errors - [ ] Avoid Dev based/Prisma logging: No development-only or Prisma-specific logging in production code ## Configuration - [ ] Env variables newly added: Any new environment variables are documented in .env.example file or mentioned in description --- ## Additional Notes <!-- Add any additional context, screenshots, or notes for reviewers here --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: HeshamHM28 <HeshamMohamedFathy@outlook.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-39-200.ec2.internal> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com> Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>	2026-02-13 23:26:55 +05:30
Kevin Turcios	ad26be10b8	Fix JS/TS cross-imports from Python module (#2396 ) ## Problem The JS/TS language handler (`core/languages/js_ts/`) was importing models, schemas, config, prompts, and helpers directly from the Python language handler. This created a confusing architectural dependency and risked serving wrong language-specific prompt content. ## What Changed - Created `core/shared/` for genuinely language-agnostic code (optimizer schemas, models, config, testgen models, context helpers) - Moved JS/TS-specific prompts and context helpers into `core/languages/js_ts/` - Updated all consumers (20+ files) to import from the correct locations - Removed backwards-compat re-exports from the Python module ## Result - Before: 11 imports from `core.languages.python` in `core/languages/js_ts/` - After: 0	2026-02-12 22:34:38 -05:00
Kevin Turcios	0df421eccb	Add chat interface to observability timeline (#2395) ## Summary - Chat panel on the observability timeline that uses Claude to answer questions about optimization traces - Tool-based context retrieval (fetches candidates, tests, errors on demand instead of stuffing everything upfront) - Uses `@anthropic-ai/sdk` via Azure AI Foundry - Strengthened testgen prompts to ban mocks/fakes for test inputs	2026-02-12 20:45:33 -05:00
Kevin Turcios	8baf828634	chore: sync claude workflow with CLI repo (#2392) ## Summary - Use claude-opus-4-6 model for both pr-review and claude-mention jobs - Add mypy checks and consolidated summary comment (Steps 1 & 4) from CLI workflow - Add Edit tool and extra git/gh tools to allowed tools	2026-02-12 00:33:51 -05:00
Kevin Turcios	e28642cf22	Fix FTO display showing wrong function for methods with common names (#2391) Store qualified function name (e.g., HttpInterface.__init__) and file_path in testgen metadata instead of bare function_name (__init__). Update the frontend parser to handle qualified names by splitting into class + method and searching within the correct class using both tree-sitter and regex. Prioritize the file matching filePath before searching all files.	2026-02-12 00:30:33 -05:00
Sarthak Agarwal	55f0a8b60a	Restoring the ordering of webhook before parsing json (#2389)	2026-02-10 04:06:35 +05:30
Sarthak Agarwal	899db4ed56	Fix logging msg on webhook and misc (#2387)	2026-02-10 02:52:10 +05:30
Kevin Turcios	db973a0487	fix: relax testgen assertion rule to allow imports from function dependencies (#2388) The old rule ("NOT in libraries such as numpy, pandas etc.") forced LLMs to reinvent helpers like np.allclose using slow / inaccurate Python loops. The new rule allows assertions from packages already imported by the function under test.	2026-02-09 15:05:19 -05:00
HeshamHM28	ad1cb9f032	[Fix] Fallback to staging if we fail to create PR (#2332) Fixes CF-1037	2026-02-09 20:08:47 +02:00
Kevin Turcios	4f7d1818ac	Optimize observability timeline and fix UI issues (#2386 ) ## Summary - Optimize timeline data fetching/rendering with pre-computed maps and reduced re-renders - Split timeline monolith into focused components, lazy-load debug data, use IntersectionObserver for active section tracking - Optimize component rendering with `memo`, stable ref callbacks, and pre-computed sort data - Fix observability nav toggle not syncing with current URL pathname - Fix Response button overlapping dialog close button in LLM debug dialog	2026-02-09 12:18:55 -05:00
Kevin Turcios	629442cc5e	Restructure aiservice to language-first architecture (#2383 ) ## Summary - Reorganizes `django/aiservice/` from feature-first layout (separate `optimizer/`, `testgen/`, `code_repair/` dirs) to language-first layout under `core/languages/{python,js_ts}/` - Adds handler/registry/dispatcher pattern for routing requests to language-specific implementations - All existing module code preserved via `git mv` for history tracking; no logic changes to existing modules ## What changed - New `core/` app with registry, dispatcher, protocols, and error hierarchy - `PythonHandler` and `JSTypeScriptHandler` delegate to existing module functions - All imports updated across the codebase (views, tests, adaptive_optimizer, etc.) - Integration tests for handler registration and dispatch - 155 files changed, ~880 additions / ~207 deletions (mostly import path updates and moves) ## Test plan - [ ] `python manage.py check` passes - [ ] Integration tests in `tests/integration/test_handler_integration.py` pass - [ ] Existing test suite passes with updated import paths - [ ] Ruff and ty clean on all new infrastructure files --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-09 09:15:50 -05:00
Kevin Turcios	968946d62d	feat: split diff view and LLM export for observability (#2384 ) ## Summary - Add split (side-by-side) diff view to the observability timeline for comparing original vs optimized code - Fix scroll handler not updating active section + expand container for candidates - Add LLM export route that returns plain text markdown of the full trace, accessible via button next to search bar ## Test plan - [ ] Load a trace in observability and verify the split diff view renders correctly - [ ] Verify the "LLM Export" button appears next to Search when results are loaded - [ ] Click the button and verify the new tab returns raw markdown text (no HTML chrome) - [ ] Verify all sections are present: function info, original code, tests, candidates, ranking, errors, summary, and prompts	2026-02-09 04:21:41 -05:00
Kevin Turcios	b9d318279c	feat: observability improvements and testgen prompt modernization (#2382 ) ## Summary - Rewrite testgen system prompts from constraint-heavy to positive-first structure with chain-of-thought instructions - Simplify LLM message structure from `[system, user, user, user]` to `[system, user]` by absorbing plan_content guidelines into system prompts - Observability UI: add search to LLM debug dialog, expand timeline view - Fix data capture: raw LLM responses, all user messages in prompt column, nested code fences, empty notes handling ## Test plan - [ ] Verify testgen produces valid test suites with the new prompt structure - [ ] Verify observability timeline displays LLM prompts/responses correctly - [ ] Check that search works in the LLM debug dialog	2026-02-09 01:20:59 -05:00
Kevin Turcios	2c56875f83	fix: display instrumented perf tests in observability timeline (#2381 ) ## Summary - Published `@codeflash-ai/common@1.0.30` with `dist/` and `instrumented_perf_test` schema field - Updated webapp to use the new package so Prisma generates correct types - Removed `Record<string, unknown>` type cast workaround in `page.tsx` The instrumented perf test data was already being stored in the DB but the webapp's Prisma client didn't have the field in its generated types, so it was never returned from queries. ## Test plan - [ ] Search a trace that has perf tests (e.g. `59a508fb-8d00-4830-992b-fa342e5d6c94`) and verify the `+perf` badge and "Perf" tab appear in Test Generation	2026-02-08 03:21:57 -05:00
Kevin Turcios	223a730dff	chore: bump @codeflash-ai/common to 1.0.29 (#2380) ## Summary - Bump `@codeflash-ai/common` from 1.0.28 to 1.0.29 to include the `instrumented_perf_test` Prisma schema field in the published package - This unblocks the observability timeline from displaying performance tests (currently only generated + behavior tests show) The field was added to the schema in #2330 but the package version was never bumped, so the deployed webapp's Prisma client doesn't SELECT `instrumented_perf_test`. After merging: publish the package and redeploy the webapp.	2026-02-08 02:31:12 -05:00
Kevin Turcios	752e2504e4	Restructure and improve refinement prompt (#2379 ) ## Summary - Restructure the refinement system prompt into clear numbered sections (Preserve Behavior, Minimize Diff, Revert Anti-Patterns, Maintain Readability) with an explicit 6-step refinement process - Extract inline prompt strings into separate markdown files (`refinement_system_prompt.md`, `refinement_user_prompt.md`), matching the convention used by other optimizer prompts - Add `AuthenticatedRequest` type hint to `refine()` endpoint and fix grammar in tool use section ## Test plan - [ ] Verify refinement endpoint still works end-to-end with a test optimization candidate - [ ] Confirm prompt content is loaded correctly from markdown files at startup --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-08 02:10:20 -05:00
Kevin Turcios	47053591f4	observability v2 toggle (#2378)	2026-02-07 15:50:12 -05:00
Kevin Turcios	f03a06f4e1	Reintroduce enriched obs_context for testgen LLM calls (#2377 ) ## Summary - Re-adds the enriched observability context from CF-1041 that was reverted - Passes `module_path`, `test_module_path`, `helper_function_names`, `is_async`, and `function_to_optimize` details to `call_llm` in testgen ## Test plan - [ ] Verify testgen LLM calls include the enriched context - [ ] Confirm no regressions in test generation flow	2026-02-07 10:33:13 -05:00
Sarthak Agarwal	98fb2d1579	Revert "CF-1041 observability v2 " need more changes and testing (#2375) Reverts codeflash-ai/codeflash-internal#2329	2026-02-06 01:18:17 +05:30
Kevin Turcios	07d33edd9f	CF-1041 observability v2 (#2329) introducing this due to pain points in V1, not a complete rewrite, based off v1 --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>	2026-02-05 14:08:02 -05:00
Sarthak Agarwal	08fd1a8787	adding validation for ts in refiner and testgen (#2372 ) 1. languages/js_ts/testgen.py: - Updated parse_and_validate_js_output to accept a language parameter - Uses validate_typescript_syntax when language="typescript", otherwise uses validate_javascript_syntax - Updated generate_and_validate_js_test_code to accept and pass the language parameter - Updated the call chain to pass language through to the validation 2. optimizer/context_utils/refiner_context.py: - Added import for validate_typescript_syntax - Fixed is_valid_refinement method to use correct validator based on language - Fixed validate_code_syntax in SingleRefinerContext class - Fixed validate_code_syntax in MultiRefinerContext class 3. tests/optimizer/test_javascript_validator.py: - Added test_typescript_type_assertion_valid_in_ts - verifies as unknown as number is valid TypeScript - Added test_typescript_type_assertion_invalid_in_js - verifies as unknown as number is INVALID JavaScript (this would have caught the original bug) - Added test_typescript_generic_valid_in_ts - verifies generics are valid TypeScript - Added test_typescript_generic_invalid_in_js - verifies generics are INVALID JavaScript Files Already Correct (no changes needed): - languages/js_ts/optimizer.py - already correctly checks language - languages/js_ts/optimizer_lp.py - already correctly checks language - optimizer/optimizer_line_profiler.py - already correctly checks language --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>	2026-02-04 22:54:44 +00:00


6195 commits