## Summary
- Restructure CLAUDE.md hierarchy so Claude Code auto-discovers
project-specific instructions
- Delete dead `AGENTS.md` files (referenced non-existent
`.tessl/RULES.md`)
- Rename `django/aiservice/AGENTS.md` → `CLAUDE.md` for auto-discovery
- Create `js/CLAUDE.md` with package commands and gotchas
- Move PR review guidelines to `.claude/rules/pr-review.md` (auto-loaded
rule)
- Move prek workflow to `.claude/skills/fix-prek.md` (on-demand skill)
- Add path-scoped rules for Python and Next.js patterns
- Add domain glossary, service architecture diagram, and per-package
gotchas
## Test plan
- Verify `CLAUDE.md` files exist at root, `django/aiservice/`, and `js/`
- Verify no remaining references to `AGENTS.md` or `.tessl/`
- Verify `.claude/rules/` and `.claude/skills/` files are committed
## Summary
- Pass test_index through LLM call context so observability chat can
attribute responses to specific test generation calls
- Fix SSE streaming to send keepalive pings from the start
CF-504
## Summary
- Chat panel on the observability timeline that uses Claude to answer
questions about optimization traces
- Tool-based context retrieval (fetches candidates, tests, errors on
demand instead of stuffing everything upfront)
- Uses `@anthropic-ai/sdk` via Azure AI Foundry
- Strengthened testgen prompts to ban mocks/fakes for test inputs
Store qualified function name (e.g., `HttpInterface.__init__`) and
`file_path` in testgen metadata instead of bare `function_name` (`__init__`).
Update the frontend parser to handle qualified names by splitting into
class + method and searching within the correct class using both
tree-sitter and regex. Prioritize the file matching `filePath` before
searching all files.
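
The split-and-prioritize step described above could look roughly like the sketch below. The `QualifiedTarget` type and both function names are hypothetical and are not taken from the PR; the tree-sitter and regex lookups themselves are left out.

```typescript
// Hypothetical shape: the real metadata type is not shown in the PR description.
interface QualifiedTarget {
  className: string | null; // null for a bare, module-level function
  methodName: string;
}

// Split "HttpInterface.__init__" into class + method. The last dot-separated
// segment is treated as the method, the segment before it as the class.
function splitQualifiedName(qualifiedName: string): QualifiedTarget {
  const parts = qualifiedName.split(".").filter((p) => p.length > 0);
  if (parts.length === 0) {
    throw new Error("qualified name must not be empty");
  }
  const methodName = parts[parts.length - 1];
  const className = parts.length > 1 ? parts[parts.length - 2] : null;
  return { className, methodName };
}

// Order candidate files so the one matching filePath is searched first,
// followed by every other file in its original order.
function prioritizeFiles(files: string[], filePath?: string): string[] {
  if (!filePath) return [...files];
  const preferred = files.filter((f) => f === filePath);
  const rest = files.filter((f) => f !== filePath);
  return [...preferred, ...rest];
}
```

A bare name such as `__init__` yields `className: null`, so older metadata without a class prefix still resolves to a method name.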
# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
## Summary
- Optimize timeline data fetching/rendering with pre-computed maps and
reduced re-renders
- Split timeline monolith into focused components, lazy-load debug data,
use IntersectionObserver for active section tracking
- Optimize component rendering with `memo`, stable ref callbacks, and
pre-computed sort data
- Fix observability nav toggle not syncing with current URL pathname
- Fix Response button overlapping dialog close button in LLM debug
dialog
## Summary
- Add split (side-by-side) diff view to the observability timeline for
comparing original vs optimized code
- Fix scroll handler not updating active section + expand container for
candidates
- Add LLM export route that returns plain text markdown of the full
trace, accessible via button next to search bar
## Test plan
- [ ] Load a trace in observability and verify the split diff view
renders correctly
- [ ] Verify the "LLM Export" button appears next to Search when results
are loaded
- [ ] Click the button and verify the new tab returns raw markdown text
(no HTML chrome)
- [ ] Verify all sections are present: function info, original code,
tests, candidates, ranking, errors, summary, and prompts
## Summary
- Rewrite testgen system prompts from constraint-heavy to positive-first
structure with chain-of-thought instructions
- Simplify LLM message structure from `[system, user, user, user]` to
`[system, user]` by absorbing plan_content guidelines into system
prompts
- Observability UI: add search to LLM debug dialog, expand timeline view
- Fix data capture: raw LLM responses, all user messages in prompt
column, nested code fences, empty notes handling
## Test plan
- [ ] Verify testgen produces valid test suites with the new prompt
structure
- [ ] Verify observability timeline displays LLM prompts/responses
correctly
- [ ] Check that search works in the LLM debug dialog
## Summary
- Published `@codeflash-ai/common@1.0.30` with `dist/` and
`instrumented_perf_test` schema field
- Updated webapp to use the new package so Prisma generates correct
types
- Removed `Record<string, unknown>` type cast workaround in `page.tsx`
The instrumented perf test data was already being stored in the DB but
the webapp's Prisma client didn't have the field in its generated types,
so it was never returned from queries.
## Test plan
- [ ] Search a trace that has perf tests (e.g.
`59a508fb-8d00-4830-992b-fa342e5d6c94`) and verify the `+perf` badge and
"Perf" tab appear in Test Generation
## Summary
- Bump `@codeflash-ai/common` from 1.0.28 to 1.0.29 to include the
`instrumented_perf_test` Prisma schema field in the published package
- This unblocks the observability timeline from displaying performance
tests (currently only generated + behavior tests show)
The field was added to the schema in #2330 but the package version was
never bumped, so the deployed webapp's Prisma client doesn't SELECT
`instrumented_perf_test`.
After merging: publish the package and redeploy the webapp.
Introducing this because of pain points in V1. It is not a complete
rewrite; it builds on V1.
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
## Summary
- Add `instrumented_perf_test` field to `OptimizationFeatures` model
- Update `log_features` function to accept and store performance
instrumented tests
---------
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
## Summary
Adds a line-by-line performance profiler visualization to the webapp,
allowing users to compare execution times between original and optimized
code.
## Changes
### New Line Profiler View
- **`LineProfilerView.tsx`**: Side-by-side comparison component showing:
- Line-by-line execution times with heat map visualization
- Syntax highlighting using `prism-react-renderer`
- Collapsible function blocks
- Light/dark mode support
- Heat legend (cold → hot based on % time)
- **`lineProfilerParser.ts`**: Parser utilities for line profiler data:
- `parseLineProfilerResults()` - parses markdown table output from
Python's line_profiler
- `formatTime()` - converts timer units to human-readable format (ns,
µs, ms, s)
- `getHeatLevel()` - determines heat coloring based on % time
- **`/review-optimizations/[traceId]/profiler/page.tsx`**: New route for
the profiler view
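
As a rough illustration of the two formatting helpers listed above, the sketch below converts raw timer ticks to a readable unit and buckets a line's share of time into a heat level. The function names, the `timerUnit` parameter, and all thresholds are assumptions for illustration, not the values used in `lineProfilerParser.ts`.

```typescript
// Assumption: `ticks` is a raw line_profiler time and `timerUnit` is the
// size of one tick in seconds.
function formatTicks(ticks: number, timerUnit: number): string {
  const seconds = ticks * timerUnit;
  if (seconds >= 1) return `${seconds.toFixed(2)} s`;
  if (seconds >= 1e-3) return `${(seconds * 1e3).toFixed(2)} ms`;
  if (seconds >= 1e-6) return `${(seconds * 1e6).toFixed(2)} µs`;
  return `${(seconds * 1e9).toFixed(0)} ns`;
}

type HeatLevel = "cold" | "cool" | "warm" | "hot";

// Illustrative thresholds only: a line's "% Time" column picks the bucket.
function heatLevelFor(percentTime: number): HeatLevel {
  if (percentTime >= 50) return "hot";
  if (percentTime >= 20) return "warm";
  if (percentTime >= 5) return "cool";
  return "cold";
}
```

Keeping the unit conversion separate from the heat bucketing means both the original and optimized columns can share the same helpers.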
### API Changes
- **`create-pr.ts`**: Adds "📊 Performance Profile" link to PR
description when profiler data exists
- **`github-app.ts`**: Removes line profiler data from metadata when PR
is closed/merged
- **`create-staging.ts`**, **`suggest-pr-changes.ts`**: Handle line
profiler data in staging
- **`staging-storage-strategy.ts`**: Interface updates for line profiler
fields
### Webapp Integration
- **`page.tsx`**: Added "Performance Profile" button (only visible when
profiler data exists)
- **`action.ts`**: Sends line profiler data when creating PR from webapp
Fixes CF-1018
https://codeflash-ai.slack.com/files/U08MSR1UN6L/F0A9YVDJY75/screen_recording_2026-01-21_at_10.03.18___pm.mov
https://github.com/HeshamHM28/my-best-repo/pull/21
linked to https://github.com/codeflash-ai/codeflash/pull/1139
---------
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
## Summary
- Remove the hardcoded check that blocked PR suggestions for roboflow
repositories
- Remove the corresponding unit test that validated the roboflow
restriction
Fixes CF-967
samples https://github.com/codeflash-ai/my-best-repo/pull/461
## Summary
- Fix review comment creation to properly handle single-line changes by
using only the `line` parameter (without `start_line`)
- Fix condition to allow single-line hunks where `oldStart === oldEnd`
(changed from `<` to `<=`)
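
A minimal sketch of the two fixes above, assuming a hypothetical `LineRange` input: the range check uses `<=` so single-line ranges pass, and `start_line` is only set when the range spans more than one line. The `line` and `start_line` field names match GitHub's review comment parameters; everything else here is illustrative.

```typescript
// Hypothetical input: the 1-based first and last line of a change.
interface LineRange {
  start: number;
  end: number;
}

// Subset of GitHub's review comment parameters relevant to positioning.
interface ReviewCommentPosition {
  line: number;
  start_line?: number;
}

function toReviewCommentPosition(range: LineRange): ReviewCommentPosition | null {
  // `<=` admits single-line ranges where start === end; `<` would reject them.
  if (!(range.start <= range.end)) return null;
  // Single-line change: send only `line`, never `start_line`.
  if (range.start === range.end) return { line: range.end };
  // Multi-line change: `start_line` marks the first line, `line` the last.
  return { start_line: range.start, line: range.end };
}
```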
# Observability Platform: UI Enhancements & Bug Fixes
## Overview
Enhances the observability UI, fixes data accuracy issues, and adds
organization filtering. Includes reusable components, performance
optimizations, and improved error handling.
---
## Fixed Issues & Enhancements
### 1. Total Traces Count Incorrect
**Problem**: Total traces count was inaccurate, especially with filters.
**Fix**: Added `getTotalTracesCount()` using `groupBy` for accurate
distinct trace counting.
**Code Changes**:
- `traces/page.tsx`: Added cached `getTotalTracesCount()` function
- Uses `groupBy` with proper filtering before counting
**How to Test**:
1. Go to `/observability/traces`
2. Check "Total Traces" stat card — should match the number of unique
traces
3. Apply organization filter — count should reflect filtered results
4. Compare with table rows — count should be accurate
---
### 2. Organization Filtering
**Problem**: No way to filter traces/LLM calls by organization.
**Fix**: Added organization filter dropdown and database-level filtering
using `IN` clause.
**Code Changes**:
- `traces/page.tsx`: Added organization filter in search form
- `llm-calls/page.tsx`: Added organization filter in filters section
- Both pages: Fetch trace IDs from `optimization_features` and use `IN`
clause for filtering
- Added `getUniqueOrganizations()` cached function
**How to Test**:
1. Go to `/observability/traces` or `/observability/llm-calls`
2. Select an organization from the dropdown
3. Verify only traces/calls for that organization are shown
4. Check that pagination works correctly with filter applied
5. Verify "Total Traces" updates when filter is applied
---
### 3. Cost Column Showing Incorrect Values
**Problem**: Cost aggregation was incorrect on traces page.
**Fix**: Use `safeCostTokens()` utility and proper aggregation logic.
**Code Changes**:
- `traces/page.tsx`: Use `safeCostTokens()` for null-safe cost/token
handling
- Proper aggregation in trace grouping logic
**How to Test**:
1. Go to `/observability/traces`
2. Check "Total Cost" stat card — should sum all trace costs correctly
3. Compare individual trace costs in table with sum
4. Verify cost displays correctly for traces with null values
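
The null-safe aggregation could be sketched as follows. The row shape and both function names are hypothetical; the actual signature of `safeCostTokens()` is not shown in this description.

```typescript
// Hypothetical row shape: cost and tokens may be null or missing in the DB.
interface UsageRow {
  cost: number | null | undefined;
  tokens: number | null | undefined;
}

interface Usage {
  cost: number;
  tokens: number;
}

// `??` only replaces null/undefined, so a legitimate 0 is kept as 0.
function safeCostTokensSketch(row: UsageRow): Usage {
  return { cost: row.cost ?? 0, tokens: row.tokens ?? 0 };
}

// Sum cost and tokens across all calls in a trace.
function sumUsage(rows: UsageRow[]): Usage {
  return rows.reduce<Usage>(
    (total, row) => {
      const safe = safeCostTokensSketch(row);
      return { cost: total.cost + safe.cost, tokens: total.tokens + safe.tokens };
    },
    { cost: 0, tokens: 0 },
  );
}
```

Without the null guard, a single `null` cost would turn the whole sum into `NaN`, which is one plausible way a total can come out wrong.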
---
### 4. Negative Duration Values
**Problem**: Some traces showed negative duration (e.g., "-41.81s").
**Fix**: Use `Math.min`/`Math.max` to handle out-of-order timestamps.
**Code Changes**:
- `trace/[trace_id]/page.tsx`: Calculate duration using
`Math.min`/`Math.max` on timestamps
- `traces/page.tsx`: Use `Math.max(0, ...)` to prevent negative
durations
**How to Test**:
1. Go to `/observability/trace/[any-trace-id]`
2. Check "Duration" in summary — should never be negative
3. Go to `/observability/traces`
4. Check "Duration" column — all values should be positive or zero
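
A small sketch of both duration guards, assuming timestamps are epoch milliseconds; the function names are illustrative and not taken from the pages themselves.

```typescript
// Detail page: take the earliest and latest timestamp regardless of the
// order in which calls were recorded.
function traceDurationMs(timestamps: number[]): number {
  if (timestamps.length === 0) return 0;
  const earliest = Math.min(...timestamps);
  const latest = Math.max(...timestamps);
  return latest - earliest;
}

// List page: clamp a start/end difference so it can never go negative.
function clampedDurationMs(startMs: number, endMs: number): number {
  return Math.max(0, endMs - startMs);
}
```

The min/max form recovers the true span when calls are stored out of order, while the clamp only hides the negative value, so the first is preferable wherever all timestamps are available.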
---
### 5. Missing Source Information
**Problem**: No indication of where LLM calls originated (GitHub Action,
CLI, etc.).
**Fix**: Added source detection and display using `getCallSource()`
utility.
**Code Changes**:
- `lib/observability-utils.ts`: Added `getCallSource()` function
- `llm-calls/page.tsx`: Added "Source" column with icons
- `trace/[trace_id]/page.tsx`: Added source in summary section
- Fetches `optimization_events` to get `event_type`
**How to Test**:
1. Go to `/observability/llm-calls`
2. Check "Source" column — should show "GitHub Action" or "CLI/VSCode"
with icons
3. Go to `/observability/trace/[trace-id]`
4. Check "Source" in summary — should display correctly
---
### 6. Partial Success Status Missing
**Problem**: No distinction between complete success and partial
success.
**Fix**: Added "Partial" status detection and display.
**Code Changes**:
- `trace/[trace_id]/page.tsx`: Check for `partial_success` status in
calls
- `llm-calls/page.tsx`: Added "Partial" option in status filter
- Both pages: Display "Partial" status with yellow/warning styling
**How to Test**:
1. Go to `/observability/traces`
2. Find traces with partial success — should show "Partial" status
(yellow)
3. Go to `/observability/llm-calls`
4. Filter by "Partial" status — should show only partial success calls
5. Check trace detail page — status should reflect partial success
correctly
---
### 7. Missing LLM Call Details Link for Candidates
**Problem**: Generated candidates didn't have links to their LLM call
details.
**Fix**: Added "View LLM Call Details" link for each candidate.
**Code Changes**:
- `trace/[trace_id]/page.tsx`: Map candidates to their LLM calls and add
link
**How to Test**:
1. Go to `/observability/trace/[trace-id]`
2. Expand "Generated Candidates" section
3. Expand any candidate
4. Verify "View LLM Call Details →" link appears at bottom
5. Click link — should navigate to LLM call detail page
---
### 8. Test Failure Details Not Shown
**Problem**: Test failures only showed error message, not detailed
failure info.
**Fix**: Added detailed test failure display with expected/actual
values.
**Code Changes**:
- `trace/[trace_id]/page.tsx`: Parse error context for test failures
- Display test name, failure reason, expected/actual values, test output
**How to Test**:
1. Go to `/observability/trace/[trace-id-with-test-failure]`
2. Scroll to "Errors" section
3. Find a test failure error
4. Verify "Test Failure Details" section shows:
- Test name
- Failure reason
- Expected value
- Actual value
- Test output
---
### 9. UI/UX Improvements
**Problem**: Basic UI without helpful tooltips, copy functionality, or
visual enhancements.
**Fix**: Added reusable components and improved visual design.
**Code Changes**:
- New components: `StatCard`, `CopyButton`, `InfoIcon`, `HelpButton`,
`ColumnHeader`
- All pages: Enhanced with tooltips, copy buttons, better spacing
- `llm-call/[id]/page.tsx`: Complete UI overhaul with visual token
breakdown
**How to Test**:
1. Go to any observability page
2. Hover over info icons (?) — tooltips should appear
3. Click copy buttons — text should copy to clipboard
4. Check stat cards — should have hover effects and icons
5. Go to `/observability/llm-call/[id]`
6. Check token distribution bar — visual breakdown of prompt vs
completion tokens
7. Verify all sections have proper spacing and visual hierarchy
---
### 10. Performance Optimizations
**Problem**: Unconditional fetching of organizations, call types, and
models on every page load.
**Fix**: Added caching for frequently accessed dropdown data.
**Code Changes**:
- `traces/page.tsx`: `getUniqueOrganizations()` with 5-minute cache
- `llm-calls/page.tsx`: `getUniqueOrganizations()`, `getCallTypes()`,
`getModels()` with caching
- All use `unstable_cache` with 300-second revalidation
**How to Test**:
1. Load `/observability/traces` — note load time
2. Reload page — should be faster (cached data)
3. Check browser network tab — fewer database queries on subsequent
loads
4. Wait 5+ minutes and reload — cache should refresh
---
### 11. Database Query Optimization
**Problem**: Organization filtering used inefficient `OR` with
`startsWith` conditions.
**Fix**: Switched to `IN` clause for exact trace ID matches.
**Code Changes**:
- Both pages: Changed from `OR` array to `where.trace_id = { in:
filteredTraceIds }`
- Added comments explaining why Prisma relations can't be used
**How to Test**:
1. Apply organization filter with many traces
2. Check page load time — should be faster
3. Verify results are still accurate
4. Test with organizations having 100+ traces — should handle
efficiently
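
The filter change can be sketched as a small builder. `TraceWhere` is a locally defined stand-in for the relevant part of the Prisma `where` type, and `buildTraceWhere` is a hypothetical name.

```typescript
// Local stand-in for the part of the Prisma `where` input used here.
interface TraceWhere {
  trace_id?: { in: string[] };
}

// `null` means "no organization filter"; an array restricts the query to
// exactly those trace IDs.
function buildTraceWhere(filteredTraceIds: string[] | null): TraceWhere {
  const where: TraceWhere = {};
  if (filteredTraceIds !== null) {
    where.trace_id = { in: filteredTraceIds };
  }
  return where;
}
```

A single `in` list lets the database match exact IDs instead of evaluating one prefix comparison per trace.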
---
## New Components
### StatCard
Reusable stat display with icons, tooltips, and variants.
**Usage**: Used in all observability pages for summary statistics.
### CopyButton
One-click copy to clipboard with visual feedback.
**Usage**: Used for IDs, prompts, responses, error messages.
### InfoIcon
Tooltip helper for inline explanations.
**Usage**: Used throughout tables and forms for field explanations.
### HelpButton
Modal dialog for detailed help content.
**Usage**: Used in page headers for comprehensive guides.
### ColumnHeader
Table header with integrated tooltips.
**Usage**: Used in all data tables for column explanations.
---
## Code Quality Improvements
1. Shared utilities: `lib/observability-utils.ts` with `getCallSource()`
and `safeCostTokens()`
2. Type safety: Proper TypeScript types, nullish coalescing (`??`)
instead of `||`
3. Comments: Added inline comments explaining architectural decisions
4. Error handling: Better error states and empty state messages
---
## Testing Checklist
- [ ] Total traces count is accurate with/without filters
- [ ] Organization filter works on traces and LLM calls pages
- [ ] Cost values are correctly aggregated
- [ ] No negative duration values appear
- [ ] Source column shows correct values with icons
- [ ] Partial success status displays correctly
- [ ] LLM call details links work from candidates
- [ ] Test failure details show complete information
- [ ] Copy buttons work for all copyable content
- [ ] Tooltips appear on hover for info icons
- [ ] Token distribution bar displays correctly
- [ ] Pagination works with all filters
- [ ] Performance is acceptable with large datasets
---------
Co-authored-by: Codeflash Bot <bot@codeflash.ai>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
## Summary
- Fix race condition in `ensureLabelExists` where concurrent requests
both get a 404, then both try to create the label, causing an
"already_exists" validation error
- Add `isLabelAlreadyExistsError` helper function to properly detect
GitHub's 422 validation error for duplicate labels
- Change error handling from throwing to logging to Sentry, preventing
the entire PR creation flow from failing due to label issues
- Update tests to verify new Sentry logging behavior and race condition
handling
## Problem
When multiple concurrent requests call `ensureLabelExists`:
1. Both check if label exists → both get 404
2. Both try to create the label
3. First succeeds, second fails with
`{"resource":"Label","code":"already_exists","field":"name"}`
4. This error was thrown and caused the entire suggest-pr-changes flow
to fail
## Solution
- Catch the `already_exists` error during label creation and silently
ignore it (label exists, which is the desired state)
- Log other errors to Sentry instead of throwing, so label issues don't
block PR creation
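
The handling described above might be sketched as follows. The `LabelClient` interface, the `report` callback (standing in for the Sentry call), and the error shape are assumptions for illustration; only the `already_exists` code and the 404/422 statuses come from the description.

```typescript
// Minimal error shape modeled on the 422 validation payload quoted above.
interface GitHubApiError {
  status: number;
  response?: {
    data?: { errors?: Array<{ resource?: string; code?: string; field?: string }> };
  };
}

function isLabelAlreadyExistsError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as Partial<GitHubApiError>;
  if (e.status !== 422) return false;
  const errors = e.response?.data?.errors ?? [];
  return errors.some((d) => d.resource === "Label" && d.code === "already_exists");
}

// Hypothetical client: getLabel rejects with { status: 404 } when missing.
interface LabelClient {
  getLabel(name: string): Promise<void>;
  createLabel(name: string): Promise<void>;
}

async function ensureLabelExists(
  client: LabelClient,
  name: string,
  report: (err: unknown) => void,
): Promise<void> {
  try {
    await client.getLabel(name);
    return; // label already exists
  } catch (err) {
    const status = (err as { status?: number } | null)?.status;
    if (status !== 404) {
      report(err); // unexpected lookup failure: log, do not throw
      return;
    }
  }
  try {
    await client.createLabel(name);
  } catch (err) {
    if (isLabelAlreadyExistsError(err)) return; // lost the race: label exists
    report(err); // any other failure is logged, not thrown
  }
}
```

Because nothing is rethrown, a caller such as the suggest-pr-changes flow continues even when the label could not be created.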
Fixes CF-1017
- Updated property names from `original_line_profiler_results` and `optimized_line_profiler_results` to `original_line_profiler` and `optimized_line_profiler` for consistency.
- Adjusted related code in `SidebarProvider` and `CommentThreadProvider` to reflect the new property names.
- Ensured that the `OptimizationService` interface and its implementation are aligned with the updated property names.
Fixes CF-998
- Added a function to normalize line endings to LF for consistent patch processing.
- Updated `showPatch` method to always use the native diff editor, removing the webview fallback.
- Improved handling of multi-file patches by opening separate diff editors for each file.
- Enhanced error handling and logging for better debugging and user feedback.
- Removed unused code related to pull request creation and secondary action buttons in the explanation panel.
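
The line-ending normalization mentioned in the first bullet can be as small as the sketch below; the function name is an assumption, not necessarily the one used in the extension.

```typescript
// Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF so
// hunk line counts and context lines match regardless of the source platform.
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}
```

Applying it to both the patch text and the file content before diffing avoids spurious mismatches caused only by line endings.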
### Summary
- Add new "Created by" column to API keys table showing the key
creator's GitHub avatar and name
- Display "Me" for keys owned by the current user, otherwise show
creator's name/email/username
- Restrict API key deletion to only the key owner (users can no longer
delete keys created by other org members)
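
A sketch of the display and permission rules above, assuming a hypothetical `KeyCreator` shape; the `"Unknown"` fallback is an assumption for the case where no identifying field is set.

```typescript
// Hypothetical creator record attached to each API key row.
interface KeyCreator {
  id: string;
  name?: string | null;
  email?: string | null;
  username?: string | null;
}

// "Me" for the current user, otherwise the first non-empty of
// name, email, username. `||` is deliberate so empty strings fall through.
function creatorLabel(creator: KeyCreator, currentUserId: string): string {
  if (creator.id === currentUserId) return "Me";
  return creator.name || creator.email || creator.username || "Unknown";
}

// Only the key's owner may delete it.
function canDeleteKey(creator: KeyCreator, currentUserId: string): boolean {
  return creator.id === currentUserId;
}
```

The same ownership check should also be enforced server-side, since hiding the delete button alone does not prevent the request.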
Fixes CF-1002
<img width="2303" height="1254" alt="Screenshot 2026-01-13 at 7 24
05 PM"
src="https://github.com/user-attachments/assets/3f8eb068-ccca-475f-b379-b8869314a908"
/>
Fixes CF-1001