Commit graph

6473 commits

Author SHA1 Message Date
Kevin Turcios
c8a66b5ec6
fix: stop dashboard sidebar infinite refetch loop (#2564)
## Summary
- Fix infinite refetch loop in the dashboard sidebar that fires hundreds
of POST+GET requests per second
- The `subscriptionFetchRef` was reset in `finally()`, allowing
re-entrancy: fetch → `setSubscription` → re-render → ref is `false` →
fetch again → infinite loop
- Move the ref reset to the effect cleanup function so it only resets
when `mode` actually changes
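The before/after behavior of the guard can be simulated outside React (a minimal sketch; `subscriptionFetchRef` and `mode` are the only names taken from the PR):

```typescript
// Simulation of the fetch guard. Resetting the ref in finally() re-arms it on
// every re-render (the loop); resetting it only in effect cleanup means it
// re-arms only when `mode` changes.
type Ref = { current: boolean };

let fetchCount = 0;
const fetchSubscription = () => { fetchCount++; };

// Effect body: skip the fetch if the guard is already set.
function runEffect(subscriptionFetchRef: Ref): void {
  if (subscriptionFetchRef.current) return;
  subscriptionFetchRef.current = true;
  fetchSubscription();
  // BUG (before): resetting the ref here, in finally(), let the very next
  // render fetch again, producing the infinite loop.
}

// Cleanup: runs only when the effect's dependency (`mode`) changes.
const cleanup = (ref: Ref) => { ref.current = false; };

const ref: Ref = { current: false };
runEffect(ref); // initial render: fetches
runEffect(ref); // re-render after setSubscription: guard holds, no fetch
console.log(fetchCount); // 1

cleanup(ref);   // mode switched
runEffect(ref); // one fetch for the new mode
console.log(fetchCount); // 2
```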

## Note: Auth0 favicon 404
The Auth0 login page at `codeflash-ai.us.auth0.com` returns a 404 for
`/favicon.ico`. This is configured in **Auth0 Dashboard > Branding >
Universal Login**, not in application code. Upload the Codeflash favicon
there to resolve.

## Test plan
- [ ] Navigate to dashboard, open Network tab — confirm no repeated
POST/GET polling
- [ ] Switch between personal/org mode — confirm subscription data still
loads correctly
- [ ] Verify sidebar subscription usage display still renders
2026-04-04 12:46:56 -05:00
Kevin Turcios
f6a7d9b29d
chore: add cf-webapp quality gates CI workflow (#2563)
## Summary
- Adds GitHub Actions workflow that runs on PRs touching
`js/cf-webapp/**`
- Runs type-check (`tsc --noEmit`), tests (`vitest run`), and build
(`next build`)
- Posts a PR comment with results table and collapsible route size
details
- Fails the check if any gate fails

## Evidence
- Proof doc: `js/cf-webapp/proof/20-quality-gates.md`

## Test plan
- [ ] `bash js/cf-webapp/proof/reproducers/20-quality-gates.sh` — 10/10
checks pass
- [ ] Workflow triggers on a PR touching cf-webapp files
- [ ] PR comment appears with quality report
2026-04-04 11:43:02 -05:00
Kevin Turcios
0c37015650
chore: remove unused dependencies and replace react-papaparse with papaparse (#2562)
## Summary
- Removes `@azure/msal-node` (unused — Auth0 is the auth provider)
- Removes `github-markdown-css` (not imported anywhere)
- Replaces `react-papaparse` with `papaparse` (only core parser needed,
not React wrapper)
- Adds `@types/papaparse` for TypeScript types

## Evidence
- No imports of removed packages exist in source
- Proof doc: `js/cf-webapp/proof/18-remove-unused-deps.md`

## Test plan
- [ ] `bash js/cf-webapp/proof/reproducers/18-remove-unused-deps.sh` —
7/7 checks pass
- [ ] `npm run build` succeeds
- [ ] CSV parsing functionality still works
2026-04-04 11:42:18 -05:00
Kevin Turcios
87db8f6026
perf: parallelize LLM call detail and errors queries (#2561)
## Summary
- Runs `llm_calls.findUnique` and `optimization_errors.findMany` in
parallel via `Promise.all`
- Both queries use `params.id` directly — no data dependency between
them
- Page load time reduced from sum to max of both queries
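The pattern in question, sketched with stand-in queries (function names and delays are hypothetical; the real calls are the Prisma `llm_calls.findUnique` and `optimization_errors.findMany`):

```typescript
const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical stand-ins for the two Prisma queries.
async function findLlmCall(id: string) { await delay(50); return { id }; }
async function findOptimizationErrors(id: string) { await delay(50); return [{ id }]; }

async function getDetail(id: string) {
  // Both queries depend only on `id`, so they can start together:
  // wall-clock cost is max(50, 50) instead of 50 + 50 sequential.
  const [call, errors] = await Promise.all([
    findLlmCall(id),
    findOptimizationErrors(id),
  ]);
  return { call, errors };
}

getDetail("abc").then((d) => console.log(d.call.id, d.errors.length)); // abc 1
```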

## Evidence
- Proof doc: `js/cf-webapp/proof/17-parallel-llm-call-detail.md`

## Test plan
- [ ] `bash
js/cf-webapp/proof/reproducers/17-parallel-llm-call-detail.sh` — 5/5
checks pass
- [ ] LLM call detail page renders correctly with error list
2026-04-04 11:41:16 -05:00
Kevin Turcios
94b9de946e
perf: deduplicate trace page Prisma query with React cache() (#2560)
## Summary
- Wraps `optimization_features.findUnique` in `React.cache()` so
`generateMetadata()` and the page component share one DB hit
- Eliminates redundant query — before: 2 identical queries per page
load, after: 1
- Replaces inline type annotation with `Awaited<ReturnType<...>>` for
DRY types
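A minimal stand-in for what `React.cache()` provides here (assumption: plain single-argument memoization; the real `cache()` is additionally scoped to one server request):

```typescript
// Memoize an async fetcher so generateMetadata() and the page body
// share one lookup per key instead of issuing two identical queries.
function cacheFn<A, R>(fn: (arg: A) => Promise<R>): (arg: A) => Promise<R> {
  const memo = new Map<A, Promise<R>>();
  return (arg: A) => {
    if (!memo.has(arg)) memo.set(arg, fn(arg));
    return memo.get(arg)!;
  };
}

let dbHits = 0;
const getFeature = cacheFn(async (traceId: string) => {
  dbHits++; // stands in for optimization_features.findUnique
  return { traceId };
});

async function main() {
  await getFeature("t1"); // generateMetadata()
  await getFeature("t1"); // page component: served from cache
  console.log(dbHits);    // 1 DB hit instead of 2
}
main();
```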

## Evidence
- Proof doc: `js/cf-webapp/proof/16-react-cache-dedup.md`

## Test plan
- [ ] `bash js/cf-webapp/proof/reproducers/16-react-cache-dedup.sh` —
5/5 checks pass
- [ ] Trace detail page renders correctly with metadata title
2026-04-04 11:41:05 -05:00
Kevin Turcios
c7df7bf27c
perf: parallelize event + features queries in getOptimizationEventById (#2559)
## Summary
- Runs `optimization_events.findFirst` and
`optimization_features.findUnique` in parallel via `Promise.all`
- The features query only needs `trace_id` (a parameter), not the event
result, making the queries independent
- Wall-clock time goes from sum of both queries to max of either

## Evidence
- Proof doc: `js/cf-webapp/proof/15-parallel-optimization-event.md`

## Test plan
- [ ] `bash
js/cf-webapp/proof/reproducers/15-parallel-optimization-event.sh` — 6/6
checks pass
- [ ] Optimization review page loads correctly with review quality data
2026-04-04 11:40:10 -05:00
Kevin Turcios
dc684b9a28
perf: use PostHog singleton and replace shutdown() with flush() (#2558)
## Summary
- Converts `PostHogClient()` to singleton pattern — reuses one `PostHog`
instance instead of creating new ones per call
- Replaces `shutdown()` with `flush()` across 5 files (6 call sites) —
flush sends events without destroying the shared client

## Evidence
- Before: each call to `PostHogClient()` creates new HTTP connection +
event queue
- After: single instance reused across all server components/actions in
the same process
- Proof doc: `js/cf-webapp/proof/14-posthog-singleton.md`
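The singleton shape, sketched with a hypothetical `FakePostHog` in place of `posthog-node`'s client:

```typescript
// Each construction would open its own HTTP connection + event queue.
class FakePostHog {
  static instances = 0;
  constructor() { FakePostHog.instances++; }
  capture(_e: { event: string }) {}
  async flush() {} // sends queued events without tearing the client down
}

let client: FakePostHog | undefined;
function PostHogClient(): FakePostHog {
  // Reuse one instance per process instead of constructing per call.
  if (!client) client = new FakePostHog();
  return client;
}

PostHogClient().capture({ event: "a" });
PostHogClient().capture({ event: "b" });
console.log(FakePostHog.instances); // 1
```

Calling `flush()` instead of `shutdown()` is what makes the shared instance safe to keep around between calls.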

## Test plan
- [ ] `bash js/cf-webapp/proof/reproducers/14-posthog-singleton.sh` —
9/9 checks pass
- [ ] PostHog events still appear in PostHog dashboard after deployment
2026-04-04 11:37:05 -05:00
Kevin Turcios
67ec032429
perf: dynamic-import LineProfilerView to defer prism-react-renderer (#2557)
## Summary
- Replaces static import of `LineProfilerView` with `next/dynamic` +
`ssr: false`
- Defers loading of `prism-react-renderer` (~100KB+) until user
navigates to profiler tab
- Adds `<Skeleton>` loading fallback for smooth UX
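The deferral can be sketched without Next.js (hypothetical names; `next/dynamic` layers the React component wrapper and the `ssr: false` handling on top of this idea):

```typescript
// Stand-in for the heavy module: in the app, invoking the loader is where
// prism-react-renderer would be parsed and evaluated.
let heavyLoaded = false;
async function loadLineProfilerView() {
  heavyLoaded = true;
  return { render: () => "LineProfilerView" };
}

// Defer the load until first use, and only ever load once.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let p: Promise<T> | undefined;
  return () => (p ??= loader());
}

const LineProfilerView = lazy(loadLineProfilerView);
console.log(heavyLoaded); // false: nothing loaded at import time
LineProfilerView().then((m) => console.log(heavyLoaded, m.render()));
```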

## Evidence
- Proof doc: `js/cf-webapp/proof/13-dynamic-import-line-profiler.md`

## Test plan
- [ ] `bash
js/cf-webapp/proof/reproducers/13-dynamic-import-line-profiler.sh` — 7/7
checks pass
- [ ] `npm run build` succeeds
- [ ] Profiler page loads and renders LineProfilerView correctly
2026-04-04 11:36:59 -05:00
Kevin Turcios
fbedf0c1ee
perf: migrate framer-motion to motion/react for smaller bundle (#2556)
## Summary
- Replaces `framer-motion` with `motion` (the official tree-shakeable
successor)
- Updates import in onboarding page from `"framer-motion"` to
`"motion/react"`
- Enables better tree-shaking: only `AnimatePresence` + `motion`
component are imported

## Evidence
- `motion` uses ESM-first exports with `sideEffects: false`
- Same API, officially recommended migration path
- Proof doc: `js/cf-webapp/proof/12-framer-motion-migration.md`

## Test plan
- [ ] `bash
js/cf-webapp/proof/reproducers/12-framer-motion-migration.sh` — 6/6
checks pass
- [ ] `npm run build` in cf-webapp succeeds
- [ ] Onboarding page animations work correctly
2026-04-04 11:36:54 -05:00
Kevin Turcios
eed0646f32
fix: use @sentry/nextjs instead of @sentry/node in repository action (#2555)
## Summary

Replace `@sentry/node` import with `@sentry/nextjs` in the repository
action. `@sentry/nextjs` already re-exports all server-side APIs, so
importing `@sentry/node` separately pulls in a duplicate SDK.

## How to Verify

```bash
cd js/cf-webapp
bash proof/reproducers/11-sentry-nextjs-consistency.sh
```

3 checks: no @sentry/node in app/, repository action uses
@sentry/nextjs, all app/ Sentry imports consistent.
2026-04-04 11:36:48 -05:00
Kevin Turcios
6ebff6f079
perf: lazy-load Sentry Replay integration to reduce initial bundle ~600KB (#2554)
## Summary

- Move `replayIntegration` from eager initialization to
`lazyLoadIntegration()`
- Removes ~300KB per copy (two copies were shipped) from the critical
path
- Replay still activates after page is interactive via
`Sentry.addIntegration`

## How to Verify

```bash
cd js/cf-webapp
bash proof/reproducers/10-lazy-sentry-replay.sh
```

6 checks: lazyLoadIntegration used, empty init integrations,
addIntegration for deferred loading, maskAllText/blockAllMedia
preserved.

## Test Plan

- [ ] Run reproducer (6/6 pass)
- [ ] Verify Sentry Replay still works after page load
2026-04-04 11:36:41 -05:00
Kevin Turcios
278aab2b11
test: add test coverage for server actions and withTiming (#2553)
## Summary

- Add 90 unit tests across 4 test files covering server action timing,
members, repository, and review-optimizations
- Add Vitest configuration with `@/` path alias matching Next.js
tsconfig
- Add global mock setup for Prisma, Sentry (nextjs + node), and
analytics

## Test Files

| File | Tests |
|------|-------|
| `server-action-timing.test.ts` | 24 (timing, slow detection, error
handling, Sentry spans) |
| `members/action.test.ts` | 14 (access control, member mapping, error
handling) |
| `repositories/action.test.ts` | 20 (parallel fetch, auth, is_active,
analytics) |
| `review-optimizations/action.test.ts` | 32 (both code paths, N+1 fix,
raw SQL, pagination, search, filter) |

## How to Verify

```bash
cd js/cf-webapp
bash proof/reproducers/09-test-coverage.sh
```

## Test Plan

- [ ] Run reproducer (11/11 pass)
- [ ] Run `npm test` to execute all tests
2026-04-04 11:35:19 -05:00
Kevin Turcios
566424e97f
feat: add server action timing and expand PostHog analytics (#2552)
## Summary

- Add `withTiming()` wrapper for server actions with Sentry span
reporting and slow action warnings (>1s)
- Add centralized `captureEvent()` helper for PostHog tracking
- Add 5 new PostHog tracking events: optimization_reviewed,
repository_connected, api_key_created, member_invited,
billing_page_viewed
- Instrument 4 server actions with `withTiming()`:
getOrganizationMembers, getRepositoryById,
getRepositoriesWithStagingEvents, getAllOptimizationEvents
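A sketch of what such a wrapper can look like (hypothetical implementation; the PR's version additionally opens a Sentry span around the action):

```typescript
// Wrap a server action: time it, and warn when it exceeds the slow threshold.
function withTiming<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
  slowMs = 1000,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const start = Date.now();
    try {
      return await fn(...args); // timing covers success and failure alike
    } finally {
      const elapsed = Date.now() - start;
      if (elapsed > slowMs) console.warn(`[slow action] ${name}: ${elapsed}ms`);
    }
  };
}

const getMembers = withTiming("getOrganizationMembers", async (orgId: string) => [orgId]);
getMembers("org_1").then((m) => console.log(m.length)); // 1
```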

## Proof of Correctness

See
[`js/cf-webapp/proof/08-server-action-timing.md`](js/cf-webapp/proof/08-server-action-timing.md)

## How to Verify

```bash
cd js/cf-webapp
bash proof/reproducers/08-server-action-timing.sh
```

21 checks verify: withTiming utility, 4 instrumented actions,
captureEvent helper, 5 tracking functions, and all tracking calls wired
into action files.

## Test Plan

- [ ] Run reproducer: `bash
proof/reproducers/08-server-action-timing.sh` (21/21 pass)
- [ ] Verify server actions still work correctly
- [ ] Check Sentry for `server.action` spans after deployment
2026-04-04 11:34:53 -05:00
Kevin Turcios
e0d76d4338
feat: add observability stack (OTel, Sentry tuning, Prisma logging, bundle-analyzer) (#2547)
## Summary

- Add OpenTelemetry distributed tracing with Sentry bridge
(`instrumentation.ts`) — dynamic imports so OTel is only loaded when
active
- Reduce Sentry `tracesSampleRate` from 100% to 10% in production
(server + client), cutting event volume ~90%
- Add `skipOpenTelemetrySetup` to prevent duplicate OTel SDK
initialization
- Add `browserTracingIntegration` with long animation frame detection
for Web Vitals
- Add Prisma slow query logging (>500ms) and error forwarding to Sentry
- Add `@next/bundle-analyzer` for on-demand CI bundle tracking
(`ANALYZE=true npm run build`)
- Fix Edge-incompatible OTel exports (`SentryContextManager`,
`validateOpenTelemetrySetup`)

## Proof of Correctness

See
[`js/cf-webapp/proof/07-observability-stack.md`](js/cf-webapp/proof/07-observability-stack.md)
for detailed analysis.

## How to Verify

```bash
cd js/cf-webapp
bash proof/reproducers/07-observability-stack.sh
```

The reproducer verifies (24 checks):
1. OTel SDK configured with Sentry bridge (NodeSDK, SentrySpanProcessor,
SentryPropagator, PrismaInstrumentation)
2. Dynamic imports (4 packages only loaded when tracing active)
3. Noisy instrumentations disabled (fs, dns, net)
4. Sentry 10% production sampling (server + client)
5. `skipOpenTelemetrySetup: true` prevents duplicate OTel
6. Prisma slow query logging + Sentry error forwarding
7. `@next/bundle-analyzer` wired into next.config.mjs
8. All 5 required packages in package.json
9. `browserTracingIntegration` with long animation frame detection

## New Dependencies

| Package | Purpose |
|---------|---------|
| `@opentelemetry/sdk-node` | OTel Node.js SDK |
| `@opentelemetry/auto-instrumentations-node` | Auto-instrumentation for
HTTP, Express, etc. |
| `@prisma/instrumentation` | Prisma query spans |
| `@sentry/opentelemetry` | OTel → Sentry bridge |
| `@next/bundle-analyzer` (dev) | Interactive bundle treemap |

## Test Plan

- [ ] Run reproducer: `bash proof/reproducers/07-observability-stack.sh`
(24/24 pass)
- [ ] Verify `npm run build` succeeds
- [ ] Verify `npm run analyze` generates bundle treemap
- [ ] Confirm no Edge runtime build errors
(SentryContextManager/validateOpenTelemetrySetup removed)
2026-04-04 11:27:10 -05:00
Kevin Turcios
d9faaf4722
perf: parallelize data fetches on repository detail page (#2546)
## Summary

- Parallelize `getRepositoryById` server action: repo fetch + auth check
now run via `Promise.all` instead of sequentially
- Parallelize 6 independent stats queries on the repository detail page
via `Promise.all`: optimization counts, time series data, PR event data,
and leaderboard
- Reduces repository detail page server-side latency from ~350ms (7
sequential round-trips) to ~100ms (2 parallel batches)

## Proof of Correctness

See
[`js/cf-webapp/proof/06-parallel-repo-page.md`](js/cf-webapp/proof/06-parallel-repo-page.md)
for detailed analysis.

## How to Verify

```bash
cd js/cf-webapp
bash proof/reproducers/06-parallel-repo-page.sh
```

The reproducer verifies:
1. `getRepositoryById` uses `Promise.all` for repo + auth
2. At least 5 of 6 stats queries are inside `Promise.all`
3. No sequential `await` patterns remain for stats queries
4. All stats queries are independent (take only `repositoryId`)

## Why This Is Real

1. **All 6 stats queries are independent** — each takes only
`repositoryId` and returns different data
2. **Repo fetch and auth check are independent** — `findFirst` needs
`repoId`, `getRepositoriesForAccountCached` needs `payload`
3. **Latency reduction is significant** — 6 sequential DB round-trips
become 1 parallel batch

## Test Plan

- [ ] Run reproducer script: `bash
proof/reproducers/06-parallel-repo-page.sh`
- [ ] Verify repository detail page loads correctly
- [ ] Confirm all stats widgets render with correct data
2026-04-04 11:26:58 -05:00
Kevin Turcios
025ed0a980
proof: parallelize members page fetches (8a039c52) (#2545)
## Proof of Correctness — Commit 5/22

**Optimization:** Run `getCurrentUserRole` and `getOrganizationMembers`
concurrently via `Promise.all` instead of sequentially.

**Claim:** Saves one round-trip latency. Total latency: `role_time +
members_time` → `max(role_time, members_time)`.

### Evidence

Both calls take `(userId, orgId)` and are independent — neither uses the
other's result. The sequential pattern was unnecessary.

### Reproducer

```bash
cd js/cf-webapp
bash proof/reproducers/05-parallel-members-page.sh
```

Verifies: both calls inside `Promise.all`, no sequential awaits remain,
no cross-dependency.

### Files
- `js/cf-webapp/proof/05-parallel-members-page.md` — proof
- `js/cf-webapp/proof/reproducers/05-parallel-members-page.sh` —
reproducer
2026-04-04 11:26:53 -05:00
Kevin Turcios
385692b3c3
proof: N+1 query elimination in getAllOptimizationEvents (25013adb) (#2544)
## Proof of Correctness — Commit 4/22

**Optimization:** Eliminate N+1 query pattern in
`getAllOptimizationEvents` — the server action powering the
review-optimizations page.

**Claim:** For a page of 10 events: 12–22 queries → 2–3 queries.

### Evidence

Two code paths, both had N+1 patterns:

**Raw SQL path (before → after):**
- Before: events query + count query + N per-event
`repositories.findUnique` = 12 queries
- After: single JOIN query (includes `r.full_name`, `r.name`, `r.id`) +
count query, run in parallel via `Promise.all` = 2 queries

**Prisma path (before → after):**
- Before: events query + N per-event `optimization_features.findUnique`
+ count query = 12 queries
- After: events + count in `Promise.all`, then 1 batch `findMany({
where: { trace_id: { in: traceIds } } })` with `Map` lookup = 3 queries

Also includes the `r.name` fix (originally 5c94f329) since it modifies
the same JOIN query.
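The Prisma-path fix reduces to the classic batch-then-join-in-memory shape (row shapes below are hypothetical):

```typescript
type Event = { trace_id: string };
type Feature = { trace_id: string; score: number };

// Stand-in for one batched findMany({ where: { trace_id: { in: traceIds } } }).
async function findFeaturesByTraceIds(ids: string[]): Promise<Feature[]> {
  return ids.map((id) => ({ trace_id: id, score: 1 }));
}

async function attachFeatures(events: Event[]) {
  // One query for the whole page instead of one per event (the N+1 pattern),
  // then an O(1) Map lookup to join features back onto events.
  const features = await findFeaturesByTraceIds(events.map((e) => e.trace_id));
  const byTrace = new Map(features.map((f) => [f.trace_id, f]));
  return events.map((e) => ({ ...e, feature: byTrace.get(e.trace_id) }));
}

attachFeatures([{ trace_id: "a" }, { trace_id: "b" }])
  .then((rows) => console.log(rows.length, rows[0].feature?.score)); // 2 1
```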

### Reproducer

```bash
cd js/cf-webapp
bash proof/reproducers/04-n-plus-one-benchmark.sh
```

Static analysis that:
1. Confirms no `findUnique` inside `map()` loops remains (the N+1
pattern)
2. Verifies batch `findMany` with `IN` filter on trace_ids
3. Verifies `Promise.all` parallelizes events + count
4. Verifies raw SQL JOIN includes repository fields
5. Prints before/after query count comparison

### Files
- `js/cf-webapp/proof/04-n-plus-one-fix.md` — detailed proof
- `js/cf-webapp/proof/reproducers/04-n-plus-one-benchmark.sh` —
reproducer

### Reference
- Source commits: 25013adb + 5c94f329 from PR #2536
2026-04-04 11:26:47 -05:00
Kevin Turcios
fad39c934d
proof: PrismaClient singleton consolidation (16c5887a) (#2543)
## Proof of Correctness — Commit 3/22

**Optimization:** Replace 5 separate `new PrismaClient()` instances with
a shared singleton at `@/lib/prisma`.

**Claim:** Eliminates 5 independent connection pools → 1 shared pool
with `connection_limit=10`, `pool_timeout=20`. Prevents connection pool
exhaustion under concurrent requests.

### Evidence

Each `new PrismaClient()` creates its own query engine and PostgreSQL
connection pool (default 5 connections). With 5 instances, the app could
hold 25 connections simultaneously — a real risk against PostgreSQL's
hard limit (typically 100, often lower on Azure).

Files that had their own `new PrismaClient()`:
- `src/app/(dashboard)/apikeys/page.tsx`
- `src/app/(dashboard)/apikeys/tokenfuncs.ts`
- `src/app/api/traces/[trace_id]/save-modified-code/route.ts`
- `src/app/trace/[trace_id]/page.tsx`
- `src/lib/modified-code-utils.ts`

The singleton pattern is [Prisma's official recommendation for
Next.js](https://www.prisma.io/docs/orm/more/help-and-troubleshooting/help-articles/nextjs-prisma-client-dev-practices).
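The shape of such a singleton module, with a hypothetical `FakePrismaClient` standing in for `@prisma/client` (the real `src/lib/prisma.ts` also sets `connection_limit`/`pool_timeout` on the datasource URL):

```typescript
// Each instance would spin up its own query engine + connection pool.
class FakePrismaClient {
  static pools = 0;
  constructor() { FakePrismaClient.pools++; }
}

// Cache on globalThis so dev-mode hot reloads reuse the instance too,
// instead of leaking a new pool per reload.
const g = globalThis as { prisma?: FakePrismaClient };
const prisma = g.prisma ?? (g.prisma = new FakePrismaClient());

// Every module importing `prisma` now shares the one pool:
const again = g.prisma;
console.log(FakePrismaClient.pools, prisma === again); // 1 true
```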

### Reproducer

```bash
cd js/cf-webapp
bash proof/reproducers/03-prisma-singleton.sh
```

Verifies:
1. No `new PrismaClient()` outside `src/lib/prisma.ts`
2. All 5 affected files import from `@/lib/prisma`
3. Singleton has connection pooling + globalThis caching
4. TypeScript compiles cleanly

### Files
- `js/cf-webapp/proof/03-prisma-singleton.md` — detailed proof
- `js/cf-webapp/proof/reproducers/03-prisma-singleton.sh` — reproducer

### Reference
- Source commit: 16c5887a from PR #2536
2026-04-04 11:26:33 -05:00
Kevin Turcios
b4d89c5cd3
proof: named diff import + Sentry import fix (36bd47b4) (#2540)
## Proof of Correctness — Commit 2/22

**Optimization:** Replace `import * as Diff` with named `import {
createPatch }` for tree-shaking, and `@sentry/browser` →
`@sentry/nextjs` to eliminate duplicate SDK bundle.

### Evidence

1. **`import *` prevents tree-shaking** — webpack/turbopack must include
all 15+ exports from the `diff` package. Named import allows dead code
elimination of unused functions (`diffChars`, `diffWords`, `diffLines`,
`structuredPatch`, etc.).

2. **`@sentry/browser` duplicates `@sentry/nextjs` core** — the app
already uses `@sentry/nextjs` everywhere. Having one file import
`@sentry/browser` pulls in a second copy of Sentry's core code.

### Reproducer

```bash
cd js/cf-webapp
bash proof/reproducers/02-named-diff-sentry-import.sh
```

Verifies via grep that:
- No `import * as Diff` remains
- No `@sentry/browser` imports remain  
- Only `createPatch` is used from the `diff` package
- All Sentry imports use `@sentry/nextjs`

For bundle size measurement, run with `MEASURE=1`:
```bash
MEASURE=1 bash proof/reproducers/02-named-diff-sentry-import.sh
```

### Files
- `js/cf-webapp/proof/02-named-diff-sentry-import.md` — detailed proof
- `js/cf-webapp/proof/reproducers/02-named-diff-sentry-import.sh` —
reproducer

### Reference
- Source commit: 36bd47b4 from PR #2536
2026-04-04 11:26:18 -05:00
Kevin Turcios
06d824d70e
proof: PrismLight switch benchmark (e249a1cf) (#2539)
## Proof of Correctness — Commit 1/22

**Optimization:** Replace full `react-syntax-highlighter/Prism` build
(300 language grammars via `refractor/all`) with PrismLight registering
only 11 used languages.

**Claim:** Client JS bundle 5,990 KB → 3,146 KB (−47.5%, −2,844 KB)

### Evidence

The default Prism import resolves to `refractor/all.js` — a barrel file
that eagerly imports grammar definitions for 300+ languages (each 3–10
KB). The app only uses 11: python, javascript, typescript, java, json,
css, html, bash, jsx, tsx, markup.

PrismLight uses `refractor/core` and requires explicit
`registerLanguage()` calls, eliminating ~289 unused grammars from the
bundle.

### Reproducer

```bash
cd js/cf-webapp
bash proof/reproducers/01-prismlight-benchmark.sh
```

This runs two real `next build` passes (baseline on main, then with only
the PrismLight diff applied) and prints the route tables side-by-side
for comparison.

### Files
- `js/cf-webapp/proof/01-prismlight-switch.md` — detailed proof document
- `js/cf-webapp/proof/reproducers/01-prismlight-benchmark.sh` —
reproducible benchmark

### Reference
- Source commit: e249a1cf from PR #2536
- [react-syntax-highlighter light build
docs](https://github.com/react-syntax-highlighter/react-syntax-highlighter#light-build)
2026-04-04 11:14:09 -05:00
mohammed ahmed
d10868e05e
Fix: Strip .js extensions from all test outputs in JS/TS testgen (#2551)
## Problem
Generated tests imported TypeScript files with `.js` extensions, causing
"Cannot find module" errors. The AI service was only stripping
extensions from `generated_test_source` but NOT from
`instrumented_behavior_tests` and `instrumented_perf_tests` (which the
CLI actually uses).

## Root Cause
**File:** `core/languages/js_ts/testgen.py:608`

Only `generated_test_source` received `strip_js_extensions()` call. The
CLI uses `instrumented_behavior_tests` from the response, which still
had incorrect `.js` extensions on TypeScript imports.

## Impact
- Affected **15 out of 20 test runs (~75% failure rate)**
- **Severity: HIGH** - systematic bug blocking all TypeScript projects
- **Error:** `Cannot find module '../../google.js'` when source is
`google.ts`

## Fix
Apply `strip_js_extensions()` to all three test output variants:
- `generated_test_source` (already done)
- `instrumented_behavior_tests` **NEW**
- `instrumented_perf_tests` **NEW**

## Testing
- All 32 existing JavaScript testgen tests pass
- Added regression test for extension stripping
- Verified with `--rerun` on trace `03899729-131e-4ff6-8149-c132bd888089`
- No "Cannot find module *.js" errors after fix

## References
**Trace IDs exhibiting this bug:**
- `03899729-131e-4ff6-8149-c132bd888089`
- `19446b34-cd22-4a38-b304-22c16ba86747`
- (and 13 others - see `/workspace/logs`)

**Related:** AI Service Reference doc section 10.1 issue #2

---------

Co-authored-by: mohammed ahmed <mohammedahmed18@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2026-04-04 16:30:41 +05:30
Aseem Saxena
c54904daf9
Merge pull request #2548 from codeflash-ai/fix/llm-close-errors
Fix: Handle LLM client close() errors gracefully
2026-04-03 14:37:35 -07:00
claude[bot]
681e8187ca fix: resolve mypy type errors in test file
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:21:57 +00:00
claude[bot]
2135849f27 style: auto-fix linting issues
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 19:20:57 +00:00
mohammed ahmed
4c4b497d2a Fix: Handle LLM client close() errors gracefully
**Issue**: The `/ai/optimization_review` endpoint was returning 500 errors
when trying to close LLM clients during event loop changes.

**Root Cause**: In `aiservice/llm.py` lines 96-99, the `close()` calls on
OpenAI and Anthropic clients were not wrapped in exception handlers. When
the httpx transport was already closed or in a bad state (e.g., event loop
closure, connection already closed), the exception would propagate and cause
the entire request to fail with a 500 error.

**Fix**: Wrapped both `openai_client.close()` and `anthropic_client.close()`
in try-except blocks that catch and log exceptions at DEBUG level. This
prevents transport errors from crashing requests while still attempting to
clean up resources properly.
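The guarded-close pattern, sketched here in TypeScript for brevity (the actual fix wraps the `close()` calls in try/except blocks in `aiservice/llm.py`):

```typescript
// Best-effort cleanup: a client whose transport is already torn down must not
// turn the whole request into a 500.
async function closeQuietly(name: string, client: { close: () => Promise<void> }) {
  try {
    await client.close();
  } catch (err) {
    // Log at debug level and move on, mirroring the Python fix.
    console.debug(`close() failed for ${name}:`, (err as Error).message);
  }
}

const bad = { close: async () => { throw new Error("transport closed"); } };
closeQuietly("openai", bad).then(() => console.log("request continues"));
```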

**Impact**: Fixes 500 errors on `/ai/optimization_review` and other endpoints
that use the LLM client when event loops change or clients are in bad states.

**Testing**: Added `test_llm_client_close.py` with 2 test cases that verify:
1. Transport errors during close() are handled gracefully
2. Event loop closed errors are handled gracefully

**Traces**: 312d7392, 5bbdf214, a1325051
2026-04-03 19:19:19 +00:00
misrasaurabh1
76aa5db528 commit the initial plan 2026-04-03 12:18:40 -07:00
mohammed ahmed
b814e1e7e6
Merge pull request #2535 from codeflash-ai/fix/llm-client-event-loop-closure
fix: close old LLM clients when event loop changes
2026-04-03 19:00:36 +02:00
mohammed ahmed
b2debb96b7
Merge branch 'main' into fix/llm-client-event-loop-closure 2026-04-03 15:31:36 +02:00
Sarthak Agarwal
9bf81e7418
aiservice logs add and misc fix to track the errors (#2530)
# Pull Request Checklist

## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets

## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes

## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code

## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---

## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->

Co-authored-by: ali <mohammed18200118@gmail.com>
2026-04-03 16:50:45 +05:30
claude[bot]
35519b6e84 fix: resolve mypy type errors in test_llm_client.py 2026-04-03 07:25:08 +00:00
claude[bot]
20b0b01994 style: auto-fix linting issues
- ruff-format: reformat test file
- fix ty type error: cast mock clients to MagicMock for assert_called_once

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 07:23:09 +00:00
Codeflash Bot
322d8736c9 fix: close old LLM clients when event loop changes
This fixes a critical bug where old AsyncAzureOpenAI and AsyncAnthropicBedrock
clients were not being closed when the event loop changed, causing:

1. Connection pool exhaustion → "couldn't get a connection after 30.00 sec"
2. RuntimeError: Event loop is closed during httpx client cleanup

Root cause:
In LLMClient.call(), when the event loop changed, new clients were created
but old clients were not properly closed, leading to connection leaks.

Fix:
- Added await client.close() for both openai_client and anthropic_client
  before creating new instances
- Added comprehensive unit tests to verify proper cleanup

Impact:
- Resolves ~150+ test generation failures (500 errors)
- Fixes event loop closure errors in aiservice logs

Trace IDs affected: 04500fbd-88e0-44e4-8d20-32f6a0dc06cc (and many others)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-03 07:20:37 +00:00
Kevin Turcios
d04d0dbbd2
fix: remove middleware.ts conflicting with proxy.ts (#2534)
Removes middleware.ts added by #2532. Next.js 16 uses proxy.ts — having
both causes build failure.
2026-04-03 00:46:19 -05:00
Kevin Turcios
ba64b92eb4
fix: add /api/healthcheck to proxy.ts ignorePaths (#2533)
## Summary
- `/api/healthcheck` was returning 401 because `proxy.ts` requires auth
for all `/api/*` routes
- Application Gateway health probe got 401 → marked backend unhealthy →
**502 for all users**
- Adds `/api/healthcheck` to `ignorePaths` so it bypasses auth
- Also removes the erroneously added `middleware.ts` (Next.js 16 uses
`proxy.ts`)

## Test plan
- [ ] `/api/healthcheck` returns 200 without auth
- [ ] Authenticated routes still require login
- [ ] Application Gateway backend health shows Healthy
2026-04-03 00:44:28 -05:00
Kevin Turcios
081d0c15dd
fix: restore middleware.ts for Auth0 v4 healthcheck (#2532)
## Summary
- Auth0 v4 auto-generates middleware that protects all routes when no
`middleware.ts` exists
- This caused `/api/healthcheck` to return 401, making the Application
Gateway mark the backend as unhealthy → **502 for all users**
- Restores explicit middleware with Auth0 v4 API and excludes
`/api/healthcheck` from the matcher

## Test plan
- [ ] `/api/healthcheck` returns 200 without auth
- [ ] Authenticated routes still require login
- [ ] Application Gateway backend health shows Healthy
2026-04-02 23:37:13 -05:00
Kevin Turcios
5dca735fc8
Upgrade Next.js 14 → 16, React 18 → 19, and dependencies (#2385)
## Summary
- Upgrade Next.js 14.2 → 16.1, React 18 → 19, React DOM 18 → 19
- Upgrade @sentry/nextjs 9 → 10, @auth0/nextjs-auth0 3 → 4, ESLint 8 → 9
- Migrate all async request APIs (cookies, params, searchParams are now
Promises)
- Migrate middleware.ts → proxy.ts (Next.js 16 convention)
- Rewrite ESLint config for flat config format
- New Auth0Client setup with backward-compatible AUTH0_DOMAIN derivation
- Turbopack browser-only resolveAlias for web-tree-sitter Node.js stubs

## Test plan
- [ ] `npm run build` passes
- [ ] `npm run lint` passes (0 errors, warnings only from React Compiler
rules)
- [ ] `npm run type-check` passes
- [ ] `npm run dev` starts successfully with Turbopack
- [ ] Auth login/logout flow works end-to-end
- [ ] Verify `AUTH0_DOMAIN` or `AUTH0_ISSUER_BASE_URL` env var is set in
deployment
2026-04-02 22:38:01 -05:00
Kevin Turcios
c2feaf91f0
fix: return 422 for operational failures instead of 500 across all endpoints (#2528)
## Summary
- Return **422 Unprocessable Entity** instead of 500 for known
operational failures (LLM output parsing failures, no valid candidates
produced, invalid rankings, etc.) across all aiservice endpoints
- Keeps 500 for genuine internal errors (bare `except Exception`
catch-alls that could include DB/network failures)
- Adds `422` to Django-Ninja response schemas so the framework
serializes responses correctly

## Endpoints changed
| Endpoint | Failure type | Old | New |
|---|---|---|---|
| `/ai/testgen` | `TestGenerationFailedError`, `ParserSyntaxError` | 500
| 422 |
| `/ai/optimize` | No valid candidates generated | 500 | 422 |
| `/ai/optimize-line-profiler` | No optimizations generated | 500 | 422
|
| `/ai/adaptive_optimize` | LLM parse error, no candidate | 500 | 422 |
| `/ai/code_repair` | LLM error, `ParserSyntaxError`, `ValidationError`
| 500 | 422 |
| `/ai/rank` | Invalid ranking from LLM | 500 | 422 |
| `/ai/explain` | LLM failure, XML parse failure | 500 | 422 |
| `/ai/optimization_review` | JSON parse failure, no JSON block | 500 |
422 |

## Why
These endpoints were returning 500 for expected outcomes (e.g., LLM
returning unparseable output), which triggered Azure 5xx alerts and
inflated error metrics. 422 correctly signals that the request was
understood but the server couldn't produce a valid result.

## Test plan
- [x] `uv run pytest -x -q -k "optimizer or rank or explain or
code_repair or review"` — 199 passed
- [ ] Verify Azure 5xx alert rate drops after deploy

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 19:37:27 -05:00
Kevin Turcios
e7f4bb40b3
perf: lazy debug_log_sensitive_data to skip model_dump_json in production (#2527)
## Summary

- Convert
`debug_log_sensitive_data(f"...{response.model_dump_json(indent=2)}")`
to `debug_log_sensitive_data_from_callable(lambda: ...)` across 8
endpoint files
- In production, `debug_log_sensitive_data` is a no-op but the f-string
interpolation (including `model_dump_json(indent=2)`) was always
evaluated — serializing the full LLM response to JSON on every call
- The `_from_callable` variant only invokes the lambda when debug
logging is active (non-production)
- **Fix pre-existing bug**: `log_response()` closures in 4 endpoint
files returned `None` instead of a string, causing
`debug_log_sensitive_data_from_callable` to log `None`. Now they return
the concatenated log string as expected by the callable-based API.

Affected endpoints: Python optimizer, line profiler, jit_rewrite, Java
optimizer, Java line profiler, JS/TS optimizer, JS/TS line profiler,
testgen.
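The eager-vs-lazy distinction can be sketched like this (function names mirror the PR; the `DEBUG_LOGGING_ENABLED` flag and `print` sink are stand-ins, not the service's actual implementation):

```python
# Sketch: the f-string variant always pays for building the message;
# the callable variant defers that work until debug logging is active.
DEBUG_LOGGING_ENABLED = False  # assumed flag: True outside production

def debug_log_sensitive_data(message: str) -> None:
    # No-op in production -- but the caller has already built `message`,
    # including any expensive model_dump_json(indent=2) interpolation.
    if DEBUG_LOGGING_ENABLED:
        print(message)

def debug_log_sensitive_data_from_callable(make_message) -> None:
    # The lambda only runs when debug logging is active, so production
    # skips the serialization entirely.
    if DEBUG_LOGGING_ENABLED:
        print(make_message())
```

A call site then changes from `debug_log_sensitive_data(f"...{response.model_dump_json(indent=2)}")` to `debug_log_sensitive_data_from_callable(lambda: f"...{response.model_dump_json(indent=2)}")`.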

## Test plan

- [x] All 558 unit tests pass
- [x] mypy clean
- [x] ruff clean
- [ ] Verify debug logging still works in non-production environments

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-04-02 19:37:25 -05:00
Kevin Turcios
0029a0e76e
perf: optimize postprocessing pipeline — eliminate redundant CST codegen (#2526)
## Summary

- Replace Pydantic frozen dataclass with stdlib
`@dataclass(frozen=True)` for `CodeExplanationAndID` and
`CodeAndExplanation`, removing `field_validator` that ran `.code` +
`compile()` ~280 times per pipeline run
- Pre-compute `original_module.code` once and pass to pipeline steps
(`clean_extraneous_comments`, `equality_check`) that previously called
it independently
- Replace `ast.dump(annotate_fields=False)` with `ast.unparse` in
`deduplicate_optimizations` (70% faster)
- Skip re-parse in `dedup_and_sort_imports` when isort returns unchanged
code
- Cache comment-stripped original code across candidates in
`clean_extraneous_comments`

**Pipeline median per-run: ~1.5s → 184ms** (4 candidates, controlled
measurement). Saves ~4-5s of CPU per optimization request in production.

## Test plan

- [x] All 558 unit tests pass
- [x] mypy clean
- [x] ruff clean (no new warnings)
- [ ] Verify optimizer endpoints return correct results in staging

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
2026-04-02 19:37:15 -05:00
Kevin Turcios
d0e97992d6
perf: fire-and-forget logging to reduce response latency 100-300ms (#2525)
## Summary

- Move `safe_log_features()` and `update_optimization_cost()` out of
blocking `TaskGroup`s into fire-and-forget background tasks across 4
optimization endpoints (optimizer, optimizer_line_profiler, jit_rewrite,
adaptive_optimizer)
- These DB writes are analytics-only and don't affect response bodies —
waiting for them adds 100-300ms per request unnecessarily
- Add `aiservice/background.py` with `fire_and_forget()` helper using
the same `set` + `add_done_callback` pattern already used in `LLMClient`
- `get_or_create_optimization_event()` remains awaited where the
response needs `event.id`

## Test plan

- [x] All 550 tests pass locally
- [ ] Verify response latency improvement in production metrics after
deploy
- [ ] Confirm `safe_log_features` and `update_optimization_cost` still
complete successfully in background (check DB records)

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 11:37:52 -05:00
mohammed ahmed
de0f30ae15
Fix: Strip .js extensions from vi.mock() calls in Vitest tests (#2524)
## Summary

Vitest tests were failing with "Cannot find module" errors because
`vi.mock()` calls retained `.js` extensions while imports had them
stripped, causing mock/import path mismatch in ESM mode.

## Root Cause

The `strip_js_extensions()` function in `testgen.py` only handled
`jest.mock()` but not `vi.mock()`, which is used by Vitest. The pattern
`_JEST_MOCK_EXTENSION_PATTERN` matched Jest mocking functions but not
Vitest's `vi.*` equivalents.

## Fix

Added `_VITEST_MOCK_EXTENSION_PATTERN` regex to match and strip
extensions from:
- `vi.mock()`
- `vi.doMock()`
- `vi.unmock()`
- `vi.requireActual()`
- `vi.requireMock()`
- `vi.importActual()`
- `vi.importMock()`
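The extension stripping can be sketched with a regex along these lines (the name mirrors the PR's description, but the exact pattern in `testgen.py` may differ):

```python
import re

# Matches vi.mock()/vi.doMock()/... calls whose module path ends in .js,
# capturing the call prefix, the path, and the closing quote.
_VITEST_MOCK_EXTENSION_PATTERN = re.compile(
    r"""(vi\.(?:mock|doMock|unmock|requireActual|requireMock|"""
    r"""importActual|importMock)\(\s*['"])([^'"]+?)\.js(['"])"""
)

def strip_vitest_js_extensions(source: str) -> str:
    # Drop the .js suffix so the mocked path matches the stripped import.
    return _VITEST_MOCK_EXTENSION_PATTERN.sub(r"\1\2\3", source)
```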

## Affected Trace IDs

- `0fe99c9f-b348-4f0a-b051-0ea9455231ba`
- `127cdaec-a343-4918-a86a-b646dd4d79cf`
- `2b6c896e-20d7-4505-8bf4-e4a2f20b37fc`

These trace IDs exhibited the bug where generated tests had
`vi.mock('../config/paths.js')` but imports had `from
'../config/paths'`, causing module resolution failures.

## Test Coverage

- Added 8 new tests in `TestStripJsExtensions` class
- All 31 tests in `test_testgen_javascript.py` pass
- Specific regression test for vi.mock() extension stripping
- Tests cover all vi.mock variants and edge cases

## Files Changed

- `django/aiservice/core/languages/js_ts/testgen.py` (fix)
- `django/aiservice/tests/testgen/test_testgen_javascript.py` (tests)

---------

Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
2026-04-02 21:50:45 +05:30
mohammed ahmed
179302d006
Fix test generation replay 500 error when arrays contain None values (#2521)
## Summary

Fixes 500 Internal Server Error when replaying test generation with
`--rerun` flag and database arrays contain `None`/`NULL` values.

## Root Cause

The `rerun_testgen()` function in `core/shared/replay.py` accessed array
elements without checking if they were `None`. When PostgreSQL arrays
contained `NULL` values (e.g., `generated_test = [NULL, 'test2']`), the
function returned a `TestGenResponseSchema` with `None` values, causing
Pydantic validation to fail:

```
pydantic_core._pydantic_core.ValidationError: 2 validation errors for TestGenResponseSchema
generated_tests
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
instrumented_behavior_tests
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
```

## Changes

Added explicit `None` checks before creating `TestGenResponseSchema`:
- If `generated_test[index]` or `instrumented_generated_test[index]` is
`None`, return `None` (skip this test)
- If `instrumented_perf_test[index]` is `None`, default to empty string
(non-critical field)
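The guard logic amounts to something like the following sketch (the dataclass and function names are illustrative; the real code builds a `TestGenResponseSchema` from PostgreSQL array columns):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReplayedTest:
    generated_test: str
    instrumented_behavior_test: str
    instrumented_perf_test: str

def build_replay_response(
    generated, instrumented_behavior, instrumented_perf, index
) -> Optional[ReplayedTest]:
    gen = generated[index] if index < len(generated) else None
    behavior = (
        instrumented_behavior[index]
        if index < len(instrumented_behavior) else None
    )
    # Critical fields: a NULL array element means this test can't be
    # replayed, so skip it rather than fail Pydantic validation.
    if gen is None or behavior is None:
        return None
    # Non-critical field: default a NULL perf test to an empty string.
    perf = instrumented_perf[index] if index < len(instrumented_perf) else None
    return ReplayedTest(gen, behavior, perf or "")
```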

## Impact

Resolves **10+ replay failures** where test generation produced partial
results stored as `NULL` in database arrays.

## Test Coverage

Added comprehensive test suite for `replay.py`:
- `test_rerun_with_valid_test_data()` - Happy path
- `test_rerun_with_none_values_in_arrays()` - **Primary bug fix test**
- `test_rerun_with_index_out_of_bounds()` - Boundary conditions
- `test_rerun_with_empty_arrays()` - Empty data handling
- `test_rerun_with_none_arrays()` - NULL arrays
- `test_rerun_with_mismatched_array_lengths()` - Length mismatches
- `test_rerun_missing_perf_test()` - Missing perf data

All 7 tests pass.

## Trace IDs

This fix addresses errors seen in traces:
- Primary: `056561cc-94af-4d7b-ac79-85dfd4b7282d`
- And 9 additional trace IDs with the same "500 - Error generating
JavaScript tests" error

## Verification

Tested with original failing trace:
```bash
cd /workspace/target && codeflash --file src/daemon/constants.ts --function formatGatewayServiceDescription --rerun 056561cc-94af-4d7b-ac79-85dfd4b7282d
```

**Before fix:** `ERROR: 500 - Traceback... ValidationError: Input should
be a valid string [type=string_type, input_value=None]`
**After fix:** Gracefully skips None entries, no 500 error

---------

Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 21:49:58 +05:30
Kevin Turcios
d504f111a7
fix: plug memory leak from LogRecord buffering and unblock async event loop (#2523)
## Summary

- **Memory leak fix**: Added explicit `LOGGING` config in `settings.py`
to prevent unbounded `LogRecord` buffering. Django's `django.request`
logger creates WARNING records for 4xx responses with the full
`ASGIRequest` (headers, body, payload) pinned in `args`. Without
explicit config, Django's default handlers and Sentry's
`enable_logs=True` buffer these indefinitely. Setting `django.request`
to ERROR level + removing `enable_logs=True` eliminated the leak — load
testing showed **84% reduction** in per-request memory growth (7.4 → 1.2
KiB/req).

- **Async event loop fix**: Wrapped
`parse_and_generate_candidate_schema()` in `asyncio.to_thread()` across
all 4 async callers (optimizer, optimizer_line_profiler, jit_rewrite,
adaptive_optimizer). This offloads the synchronous libcst parsing +
8-stage postprocessing pipeline to the thread pool, preventing it from
blocking the event loop during peak traffic.
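The offloading pattern itself is small; here is a sketch with a stand-in for the real parsing function (the actual pipeline runs libcst parsing plus eight postprocessing stages):

```python
import asyncio
import time

def parse_and_generate_candidate_schema(raw: str) -> str:
    # Stand-in for the synchronous libcst parse + postprocessing work.
    time.sleep(0.01)  # simulated CPU-bound section
    return raw.upper()

async def handle_request(raw: str) -> str:
    # Runs the sync pipeline in the default thread pool so the event
    # loop keeps serving other requests in the meantime.
    return await asyncio.to_thread(parse_and_generate_candidate_schema, raw)
```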

## Test plan

- [x] All 550 tests pass (`uv run pytest tests/ --ignore=tests/profiling
-x -q`)
- [ ] Monitor Azure memory alerts after deploy — expect significant
reduction in memory growth rate
- [ ] Monitor 5xx error rate during peak traffic — expect reduction from
event loop no longer blocked by sync postprocessing

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:57:58 -05:00
Kevin Turcios
df90110fe8
fix: prevent log_features from 500ing optimization endpoints (#2518)
## Summary

- **`thread_sensitive=False`** on `sync_to_async` so concurrent
`log_features` calls get their own threads instead of serializing
through one (was `True`, causing a bottleneck)
- **Raised DB pool `max_size` from 10 to 100** — prod Postgres allows
859 connections, giving plenty of headroom
- **Added `safe_log_features` wrapper** that catches errors via Sentry
instead of propagating — used at all 9 TaskGroup and bare-await call
sites so a logging failure can't crash an otherwise successful
optimization endpoint
- **Kept `transaction.atomic` + `select_for_update`** for correctness
(Django doesn't support async transactions yet, and removing these
causes lost-update races on dict-merge fields)
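The `safe_log_features` idea can be sketched as a catch-all async wrapper (error reporting shown here as a plain callback; the real wrapper reports to Sentry and wraps the actual `log_features`):

```python
import asyncio
from typing import Awaitable, Callable

async def safe_call(coro_fn: Callable[..., Awaitable[None]],
                    *args, on_error=print) -> None:
    # Swallow and report failures so an analytics-logging error can
    # never crash an otherwise successful optimization endpoint.
    try:
        await coro_fn(*args)
    except Exception as exc:  # intentional catch-all
        on_error(exc)

async def main() -> list[str]:
    reported: list[str] = []

    async def failing_log_features(payload: str) -> None:
        raise RuntimeError(f"db write failed for {payload}")

    await safe_call(failing_log_features, "features",
                    on_error=lambda e: reported.append(str(e)))
    return reported
```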

## Root cause

`log_features` uses `@sync_to_async` + `@transaction.atomic` because
Django lacks async transaction support. The previous fix for pool
exhaustion changed `thread_sensitive=False` to `True`, which serialized
all calls through a single thread — fixing pool exhaustion but creating
a throughput bottleneck that caused 500s under load. Additionally, 6
call sites used `asyncio.TaskGroup` where any `log_features` exception
would propagate and crash the entire endpoint.

## Test plan

- [x] `tests/log_features/test_log_features_concurrency.py` — verifies
`thread_sensitive=False` and `safe_log_features` is async
- [x] `ruff check` passes on all changed files
- [ ] Deploy to staging and verify no 500s under concurrent optimization
requests
2026-04-02 06:51:20 -05:00
mohammed ahmed
c4222a4aeb
Merge pull request #2508 from codeflash-ai/fix/js-import-resolution-detect-export-style
Fix JS/TS import resolution to detect export style from source code
2026-04-02 10:51:10 +02:00
claude[bot]
76c605c6d9 style: auto-fix linting issues 2026-04-01 17:22:46 +00:00
mohammed ahmed
868a9d5d37
Merge pull request #2511 from codeflash-ai/codeflash/optimize-pr2508-2026-04-01T17.18.19
Speed up function `_resolve_import` by 1,027% in PR #2508 (`fix/js-import-resolution-detect-export-style`)
2026-04-01 19:20:42 +02:00
codeflash-ai[bot]
dd518c18aa
Optimize _resolve_import
The optimization hoisted the 70-element `reserved_words` set out of `_is_valid_js_identifier` into a module-level `frozenset`, eliminating 1677 repeated set constructions that consumed 1.79 ms per profiler (42% of that function's time). More significantly, `_detect_export_style` previously compiled six regex patterns on every invocation via f-string interpolation with `escaped_id`; the optimized version pre-compiles generic patterns once at module load and uses `finditer` plus manual identifier comparison, cutting the function's runtime from 3.17 s to 14.7 ms across 1146 calls—a 99.5% reduction that accounts for nearly all of the 10× speedup. Test annotations confirm the largest gains occur in the `test_large_scale_many_class_methods_with_alternating_export_styles` case (107 ms → 4.66 ms), where repeated export detection dominated.
2026-04-01 17:18:23 +00:00
claude[bot]
534d0317b1 style: auto-fix linting issues
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 17:10:38 +00:00
ali
bb747096c8
fix existing unit tests 2026-04-01 19:07:25 +02:00