Commit graph

6373 commits

Author SHA1 Message Date
Kevin Turcios
968946d62d
feat: split diff view and LLM export for observability (#2384)
## Summary
- Add split (side-by-side) diff view to the observability timeline for
comparing original vs optimized code
- Fix scroll handler not updating active section + expand container for
candidates
- Add LLM export route that returns plain text markdown of the full
trace, accessible via button next to search bar

## Test plan
- [ ] Load a trace in observability and verify the split diff view
renders correctly
- [ ] Verify the "LLM Export" button appears next to Search when results
are loaded
- [ ] Click the button and verify the new tab returns raw markdown text
(no HTML chrome)
- [ ] Verify all sections are present: function info, original code,
tests, candidates, ranking, errors, summary, and prompts
2026-02-09 04:21:41 -05:00
Kevin Turcios
b9d318279c
feat: observability improvements and testgen prompt modernization (#2382)
## Summary
- Rewrite testgen system prompts from constraint-heavy to positive-first
structure with chain-of-thought instructions
- Simplify LLM message structure from `[system, user, user, user]` to
`[system, user]` by absorbing plan_content guidelines into system
prompts
- Observability UI: add search to LLM debug dialog, expand timeline view
- Fix data capture: raw LLM responses, all user messages in prompt
column, nested code fences, empty notes handling

## Test plan
- [ ] Verify testgen produces valid test suites with the new prompt
structure
- [ ] Verify observability timeline displays LLM prompts/responses
correctly
- [ ] Check that search works in the LLM debug dialog
2026-02-09 01:20:59 -05:00
Kevin Turcios
2c56875f83
fix: display instrumented perf tests in observability timeline (#2381)
## Summary
- Published `@codeflash-ai/common@1.0.30` with `dist/` and
`instrumented_perf_test` schema field
- Updated webapp to use the new package so Prisma generates correct
types
- Removed `Record<string, unknown>` type cast workaround in `page.tsx`

The instrumented perf test data was already being stored in the DB but
the webapp's Prisma client didn't have the field in its generated types,
so it was never returned from queries.

## Test plan
- [ ] Search a trace that has perf tests (e.g.
`59a508fb-8d00-4830-992b-fa342e5d6c94`) and verify the `+perf` badge and
"Perf" tab appear in Test Generation
2026-02-08 03:21:57 -05:00
Kevin Turcios
223a730dff
chore: bump @codeflash-ai/common to 1.0.29 (#2380)
## Summary
- Bump `@codeflash-ai/common` from 1.0.28 to 1.0.29 to include the
`instrumented_perf_test` Prisma schema field in the published package
- This unblocks the observability timeline from displaying performance
tests (currently only generated + behavior tests show)

The field was added to the schema in #2330 but the package version was
never bumped, so the deployed webapp's Prisma client doesn't SELECT
`instrumented_perf_test`.

After merging: publish the package and redeploy the webapp.
2026-02-08 02:31:12 -05:00
Kevin Turcios
752e2504e4
Restructure and improve refinement prompt (#2379)
## Summary
- Restructure the refinement system prompt into clear numbered sections
(Preserve Behavior, Minimize Diff, Revert Anti-Patterns, Maintain
Readability) with an explicit 6-step refinement process
- Extract inline prompt strings into separate markdown files
(`refinement_system_prompt.md`, `refinement_user_prompt.md`), matching
the convention used by other optimizer prompts
- Add `AuthenticatedRequest` type hint to `refine()` endpoint and fix
grammar in tool use section

## Test plan
- [ ] Verify refinement endpoint still works end-to-end with a test
optimization candidate
- [ ] Confirm prompt content is loaded correctly from markdown files at
startup

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-02-08 02:10:20 -05:00
Kevin Turcios
47053591f4
observability v2 toggle (#2378) 2026-02-07 15:50:12 -05:00
Kevin Turcios
f03a06f4e1
Reintroduce enriched obs_context for testgen LLM calls (#2377)
## Summary
- Re-adds the enriched observability context from CF-1041 that was
reverted
- Passes `module_path`, `test_module_path`, `helper_function_names`,
`is_async`, and `function_to_optimize` details to `call_llm` in testgen

## Test plan
- [ ] Verify testgen LLM calls include the enriched context
- [ ] Confirm no regressions in test generation flow
2026-02-07 10:33:13 -05:00
Sarthak Agarwal
98fb2d1579
Revert "CF-1041 observability v2 " need more changes and testing (#2375)
Reverts codeflash-ai/codeflash-internal#2329
2026-02-06 01:18:17 +05:30
Kevin Turcios
07d33edd9f
CF-1041 observability v2 (#2329)
introducing this due to pain points in V1, not a complete rewrite, based
off v1

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-02-05 14:08:02 -05:00
Sarthak Agarwal
08fd1a8787
adding validation for ts in refiner and testgen (#2372)
1. languages/js_ts/testgen.py:
- Updated parse_and_validate_js_output to accept a language parameter
- Uses validate_typescript_syntax when language="typescript", otherwise
uses validate_javascript_syntax
- Updated generate_and_validate_js_test_code to accept and pass the
language parameter
- Updated the call chain to pass language through to the validation
2. optimizer/context_utils/refiner_context.py:
- Added import for validate_typescript_syntax
- Fixed is_valid_refinement method to use correct validator based on
language
- Fixed validate_code_syntax in SingleRefinerContext class
- Fixed validate_code_syntax in MultiRefinerContext class
3. tests/optimizer/test_javascript_validator.py:
- Added test_typescript_type_assertion_valid_in_ts - verifies as unknown
as number is valid TypeScript
- Added test_typescript_type_assertion_invalid_in_js - verifies as
unknown as number is INVALID JavaScript (this would have caught the
original bug)
- Added test_typescript_generic_valid_in_ts - verifies generics are
valid TypeScript
- Added test_typescript_generic_invalid_in_js - verifies generics are
INVALID JavaScript
Files Already Correct (no changes needed):
- languages/js_ts/optimizer.py - already correctly checks language
- languages/js_ts/optimizer_lp.py - already correctly checks language
- optimizer/optimizer_line_profiler.py - already correctly checks
language

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-02-04 22:54:44 +00:00
Kevin Turcios
e7cf9bf29e
feat: sync Claude workflow with CLI (#2368)
## Summary
- Add prek auto-fix step (format/lint changed files, commit & push)
- Add coverage analysis step (compare PR vs main, enforce 75% for new
code)
- Add uv setup and dependency install to pr-review job
- Change pr-review permissions to allow pushing fixes

Syncs with recent improvements made to the CLI repo.
2026-02-03 23:18:47 -05:00
Aseem Saxena
ee0f2a9d98
Merge branch 'main' into match-testdiff-schema 2026-02-03 15:28:18 -08:00
HeshamHM28
7272b71e6d
[Feat] Allow multi language in staging and line profiler (#2365)
Fixes CF-1051
<img width="2319" height="699" alt="Screenshot 2026-02-02 at 11 03
02 PM"
src="https://github.com/user-attachments/assets/6cf9ec5f-6f3b-461f-ac5c-9fe4cee5ac9f"
/>
<img width="2326" height="1254" alt="Screenshot 2026-02-02 at 10 47
32 PM"
src="https://github.com/user-attachments/assets/0a022ace-a32e-4f4f-9925-7f13f18bf901"
/>
<img width="2308" height="719" alt="Screenshot 2026-02-02 at 10 47
24 PM"
src="https://github.com/user-attachments/assets/3760e7f7-5a3c-430e-9fca-6dc5292f860c"
/>

---------

Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-02-02 16:02:29 -08:00
Aseem Saxena
648c95c909
Merge branch 'main' into match-testdiff-schema 2026-02-02 15:07:22 -08:00
Sarthak Agarwal
eb8ad603ff
vitest related changes to prompt (#2366) 2026-02-03 03:29:36 +05:30
Aseem Saxena
90597c52e3
markdown more info 2026-02-02 10:11:44 -08:00
aseembits93
5d0ca8d01b fn var was not used in .format() 2026-02-02 10:00:40 -08:00
Aseem Saxena
019f220c11
cleaning up 2026-02-02 09:45:48 -08:00
Aseem Saxena
3276f9542e
Update django/aiservice/code_repair/code_repair_context.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-02-02 09:44:36 -08:00
aseembits93
c2cd6e5e72 minor fix 2026-02-02 09:37:34 -08:00
aseembits93
2e523313b5 prek fixes 2026-02-02 09:25:44 -08:00
Aseem Saxena
1ffdee3000
Fix check for empty test source code section
Ensure sections[diff.test_src_code] is not None before assignment.
2026-02-02 09:24:28 -08:00
Aseem Saxena
a39e155a84
bug: mismatch in cli and internal schema for code repair
Change test_src_code to allow None type
2026-02-02 09:08:17 -08:00
Sarthak Agarwal
b48a8d9a43
Add vitest support in backend (#2363) 2026-02-02 20:51:52 +05:30
Saurabh Misra
00fc5708cc
Merge pull request #2343 from codeflash-ai/fix/escape-curly-braces-in-js-testgen-prompt
fix(js-testgen): escape curly braces in prompt template
2026-01-31 18:35:50 -08:00
Sarthak Agarwal
cbfebf8ee4 fix(js-testgen): escape curly braces in prompt template
The JavaScript test generation prompt contained `{fn}` as part of
example code showing import syntax. However, Python's `.format()`
method interprets this as a placeholder and tries to substitute it,
causing a KeyError.

Fixed by escaping the curly braces as `{{fn}}` so they render as
literal `{fn}` in the final prompt.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 03:50:05 +05:30
Saurabh Misra
4dc975c99b
Merge pull request #2341 from codeflash-ai/fix/strip-js-import-extensions
fix: strip file extensions from JS/TS import paths in generated tests
2026-01-30 20:32:36 -08:00
Saurabh Misra
70360436bd fix: strip file extensions from JS/TS import paths in generated tests
LLMs often add .js extensions to TypeScript import paths (e.g.,
`import { func } from '../module.js'`), but TypeScript/Jest module
resolution doesn't require explicit extensions. This causes
"Cannot find module" errors.

This change adds `strip_js_extensions()` function that removes
.js/.ts/.tsx/.jsx/.mjs/.mts extensions from relative import paths
in generated tests. The function handles:
- ES module imports: import { x } from '../path.js'
- CommonJS requires: require('../path.js')
- Jest mocks: jest.mock('../path.js'), jest.doMock(), etc.

External package imports (lodash, react, etc.) are preserved.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 04:22:07 +00:00
Saurabh Misra
6a13aa688a
Merge pull request #2339 from codeflash-ai/fix/class-method-import-syntax
Fix invalid JavaScript import syntax for class methods
2026-01-30 18:35:27 -08:00
Saurabh Misra
b801254d13 fix: strengthen import path extension guidance in prompts
Add more explicit instructions to prevent LLMs from adding .js/.ts
extensions to import paths. The previous guidance was being ignored
by some models.

- Add dedicated "CRITICAL: IMPORT PATH RULES" section with examples
- Show both WRONG and CORRECT patterns explicitly
- Remind to copy the provided import statement exactly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 02:35:21 +00:00
Saurabh Misra
d59c48426e fix: merge prompt extension fixes and LLM client improvements
- Cherry-pick: Remove .js extension guidance from prompts (from fix/js-import-extension-prompt)
- Add get_llm_client() to create fresh clients per request (fixes event loop issues)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 02:32:58 +00:00
Saurabh Misra
8461f71668 fix: use JavaScript identifier regex instead of Python isidentifier()
Python's str.isidentifier() validates Python identifiers, not JavaScript
identifiers. This caused valid JS identifiers like '$handler' to be
rejected (since $ is not valid in Python identifiers).

Changed to use a regex pattern that matches JavaScript identifier rules:
- Can start with letter, underscore, or $
- Can contain letters, digits, underscores, or $

Added tests for $ identifiers to ensure they are correctly handled.
2026-01-31 02:21:02 +00:00
Kevin Turcios
911f3e6c7b Remove wait-for-prek dependencies from CI workflows
Prek checks should not block other workflows from running. This removes
the wait-for-prek jobs entirely so unit tests, e2e tests, and codeflash
optimization can run independently of pre-commit checks.
2026-01-30 20:20:51 -05:00
Kevin Turcios
476bbc2305 for now 2026-01-30 20:10:52 -05:00
Kevin Turcios
bf8d8efd5f Update prek.yaml 2026-01-30 20:04:28 -05:00
Kevin Turcios
a394db3382 formatting 2026-01-30 20:00:11 -05:00
Saurabh Misra
addbaad370
Merge branch 'main' into fix/class-method-import-syntax 2026-01-30 16:36:03 -08:00
Saurabh Misra
09e6a1710f Address review: add validation for edge cases in import generation
- Add _is_valid_js_identifier() to check for reserved words (module, exports, prototype, etc.)
- Only use class import pattern for single-dot names where class name is valid identifier
- Fall back to module import for:
  - Multiple dots (e.g., Constructor.prototype.method)
  - Reserved words (e.g., module.exports)
- Add comprehensive tests for edge cases
2026-01-31 00:35:10 +00:00
Saurabh Misra
b2fb58eba6 Fix invalid JavaScript import syntax for class methods
When generating test imports for class methods like `Validator.validateRequest`,
the previous code produced invalid JavaScript:
  const { Validator.validateRequest } = require('../middlewares/Validator');

This is invalid because dots are not allowed in destructuring patterns.

The fix:
- Add _generate_import_statement() function to detect class methods (names with dots)
- For class methods: generate `const ClassName = require('...')`
- For simple functions: keep destructuring `const { funcName } = require('...')`
- Update prompt templates to use {import_statement} placeholder

Includes unit tests for the new import generation logic.
2026-01-31 00:35:10 +00:00
HeshamHM28
795c157d12
Fix Line Profiler query (#2338)
# Pull Request Checklist

## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
2026-01-31 00:04:29 +00:00
Saurabh Misra
289827e5cb
Merge pull request #2337 from codeflash-ai/fix/improve-typescript-validation-error-messages
fix: improve TypeScript/JavaScript validation error messages
2026-01-30 16:03:02 -08:00
Saurabh Misra
d255a29203
Update django/aiservice/aiservice/validators/javascript_validator.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-01-30 16:00:05 -08:00
Saurabh Misra
8800614d1c Add unit tests for TypeScript/JavaScript validator error reporting
Tests for:
- Error location reporting with line numbers and code snippets
- Markdown code block parsing with various scenarios
- Multiple code blocks with mixed valid/invalid content
- Real-world TypeScript patterns (async, try-catch, template literals)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 23:53:26 +00:00
Saurabh Misra
07ae9db684 fix: improve TypeScript/JavaScript validation error messages
Add better error diagnostics for TypeScript/JavaScript syntax validation:

- Add line numbers and code snippets to error messages
- Log warnings when markdown parsing finds no code blocks
- Show the actual problematic code in error logs
- Help debug "Invalid syntax" errors by showing exact location

This helps diagnose issues where the API rejects code that tree-sitter
parses correctly on the client side by providing more context in the
error messages and logs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 23:47:39 +00:00
Aseem Saxena
f6ec336246
Merge pull request #2320 from codeflash-ai/ranker-multidim-scoring
Multi dimensional scoring and structured json parsing for ranker
2026-01-30 15:00:33 -08:00
aseembits93
af2935f4f2 0-index finally 2026-01-30 14:42:28 -08:00
Kevin Turcios
c1a25b33e5
Merge branch 'main' into ranker-multidim-scoring 2026-01-30 22:16:08 +00:00
mohammed ahmed
391702c986
Merge pull request #2336 from codeflash-ai/fix/js-ts-validator
[FIX] Correctly handle markdown code for validating js/ts
2026-01-30 20:29:17 +02:00
ali
99a7a32b32
safer caching 2026-01-30 19:52:18 +02:00
ali
879aa93967
fix validating js/ts code with markdown syntax 2026-01-30 19:44:31 +02:00