## Summary
- Fix infinite refetch loop in the dashboard sidebar that fires hundreds
of POST+GET requests per second
- The `subscriptionFetchRef` was reset in `finally()`, allowing
re-entrancy: fetch → `setSubscription` → re-render → ref is `false` →
fetch again → infinite loop
- Move the ref reset to the effect cleanup function so it only resets
when `mode` actually changes
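A minimal sketch of the guard pattern, with a hypothetical hook and endpoint (the real sidebar component's fetch and dependencies differ):

```ts
// Hypothetical sketch: names and fetch shape are illustrative, not the real sidebar code.
import { useEffect, useRef, useState } from "react";

export function useSubscription(mode: "personal" | "org") {
  const [subscription, setSubscription] = useState<unknown>(null);
  const subscriptionFetchRef = useRef(false);

  useEffect(() => {
    if (subscriptionFetchRef.current) return; // a fetch already ran for this mode
    subscriptionFetchRef.current = true;

    fetch(`/api/subscription?mode=${mode}`)
      .then((res) => res.json())
      .then(setSubscription);
    // No .finally() reset here: that is what let the re-render triggered by
    // setSubscription see the ref as false and fetch again.

    return () => {
      // Reset only in cleanup, i.e. when `mode` actually changes (or on unmount).
      subscriptionFetchRef.current = false;
    };
  }, [mode]);

  return subscription;
}
```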
## Note: Auth0 favicon 404
The Auth0 login page at `codeflash-ai.us.auth0.com` returns a 404 for
`/favicon.ico`. This is configured in **Auth0 Dashboard > Branding >
Universal Login**, not in application code. Upload the Codeflash favicon
there to resolve.
## Test plan
- [ ] Navigate to dashboard, open Network tab — confirm no repeated
POST/GET polling
- [ ] Switch between personal/org mode — confirm subscription data still
loads correctly
- [ ] Verify sidebar subscription usage display still renders
## Summary
- Runs `llm_calls.findUnique` and `optimization_errors.findMany` in
parallel via `Promise.all`
- Both queries use `params.id` directly — no data dependency between
them
- Page load time reduced from sum to max of both queries
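A sketch of the parallelized loader, assuming the shared client at `@/lib/prisma`; the error query's filter field is an assumption:

```ts
import { prisma } from "@/lib/prisma";

// Both queries depend only on the route param, so they can start at the same time.
export async function loadLlmCallDetail(id: string) {
  const [llmCall, errors] = await Promise.all([
    prisma.llm_calls.findUnique({ where: { id } }),
    prisma.optimization_errors.findMany({ where: { llm_call_id: id } }), // hypothetical filter field
  ]);
  return { llmCall, errors };
}
```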
## Evidence
- Proof doc: `js/cf-webapp/proof/17-parallel-llm-call-detail.md`
## Test plan
- [ ] `bash
js/cf-webapp/proof/reproducers/17-parallel-llm-call-detail.sh` — 5/5
checks pass
- [ ] LLM call detail page renders correctly with error list
## Summary
- Wraps `optimization_features.findUnique` in `React.cache()` so
`generateMetadata()` and the page component share one DB hit
- Eliminates redundant query — before: 2 identical queries per page
load, after: 1
- Replaces inline type annotation with `Awaited<ReturnType<...>>` for
DRY types
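A sketch of the cached query module; the helper name and the `trace_id` lookup key are assumptions:

```ts
import { cache } from "react";
import { prisma } from "@/lib/prisma";

// One DB hit per request: generateMetadata() and the page component both call
// this, and React.cache() deduplicates the call within the same render pass.
export const getOptimizationFeature = cache(async (traceId: string) =>
  prisma.optimization_features.findUnique({ where: { trace_id: traceId } }),
);

// DRY type derived from the query instead of an inline annotation.
export type OptimizationFeature = Awaited<ReturnType<typeof getOptimizationFeature>>;
```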
## Evidence
- Proof doc: `js/cf-webapp/proof/16-react-cache-dedup.md`
## Test plan
- [ ] `bash js/cf-webapp/proof/reproducers/16-react-cache-dedup.sh` —
5/5 checks pass
- [ ] Trace detail page renders correctly with metadata title
## Summary
- Runs `optimization_events.findFirst` and
`optimization_features.findUnique` in parallel via `Promise.all`
- The features query only needs `trace_id` (a parameter), not the event
result, making the queries independent
- Wall-clock time goes from sum of both queries to max of either
## Evidence
- Proof doc: `js/cf-webapp/proof/15-parallel-optimization-event.md`
## Test plan
- [ ] `bash
js/cf-webapp/proof/reproducers/15-parallel-optimization-event.sh` — 6/6
checks pass
- [ ] Optimization review page loads correctly with review quality data
## Summary
- Converts `PostHogClient()` to singleton pattern — reuses one `PostHog`
instance instead of creating new ones per call
- Replaces `shutdown()` with `flush()` across 5 files (6 call sites) —
flush sends events without destroying the shared client
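A sketch of the singleton using the standard `posthog-node` constructor; the env var names are assumptions:

```ts
import { PostHog } from "posthog-node";

let client: PostHog | null = null;

// Reuse one PostHog instance per server process instead of constructing a new
// client (new HTTP connection + event queue) on every call.
export default function PostHogClient(): PostHog {
  client ??= new PostHog(process.env.NEXT_PUBLIC_POSTHOG_KEY!, {
    host: process.env.NEXT_PUBLIC_POSTHOG_HOST,
  });
  return client;
}

// Call sites then flush() after capturing instead of shutdown(), so the shared
// client stays usable for the next request:
//   await PostHogClient().flush();
```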
## Evidence
- Before: each call to `PostHogClient()` creates new HTTP connection +
event queue
- After: single instance reused across all server components/actions in
the same process
- Proof doc: `js/cf-webapp/proof/14-posthog-singleton.md`
## Test plan
- [ ] `bash js/cf-webapp/proof/reproducers/14-posthog-singleton.sh` —
9/9 checks pass
- [ ] PostHog events still appear in PostHog dashboard after deployment
## Summary
Replace `@sentry/node` import with `@sentry/nextjs` in the repository
action. `@sentry/nextjs` already re-exports all server-side APIs, so
importing `@sentry/node` separately pulls in a duplicate SDK.
## How to Verify
```bash
cd js/cf-webapp
bash proof/reproducers/11-sentry-nextjs-consistency.sh
```
3 checks: no `@sentry/node` imports in `app/`, the repository action uses
`@sentry/nextjs`, and all `app/` Sentry imports are consistent.
## Summary
- Move `replayIntegration` from eager initialization to
`lazyLoadIntegration()`
- Removes ~300KB per copy (two copies were shipped) from the critical
path
- Replay still activates after page is interactive via
`Sentry.addIntegration`
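A sketch of the deferred Replay setup on the client, using `lazyLoadIntegration` from `@sentry/nextjs`; the DSN env var name is an assumption:

```ts
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  integrations: [], // Replay is no longer bundled into the critical path
});

// Load Replay after init, off the critical path, then attach it to the client.
Sentry.lazyLoadIntegration("replayIntegration").then((replayIntegration) => {
  Sentry.addIntegration(
    replayIntegration({ maskAllText: true, blockAllMedia: true }),
  );
});
```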
## How to Verify
```bash
cd js/cf-webapp
bash proof/reproducers/10-lazy-sentry-replay.sh
```
6 checks: `lazyLoadIntegration` is used, the init `integrations` array is
empty, `addIntegration` handles the deferred loading, and
`maskAllText`/`blockAllMedia` are preserved.
## Test Plan
- [ ] Run reproducer (6/6 pass)
- [ ] Verify Sentry Replay still works after page load
## Summary
- Add `withTiming()` wrapper for server actions with Sentry span
reporting and slow action warnings (>1s)
- Add centralized `captureEvent()` helper for PostHog tracking
- Add 5 new PostHog tracking events: optimization_reviewed,
repository_connected, api_key_created, member_invited,
billing_page_viewed
- Instrument 4 server actions with `withTiming()`:
getOrganizationMembers, getRepositoryById,
getRepositoriesWithStagingEvents, getAllOptimizationEvents
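A sketch of what `withTiming()` can look like; the 1s threshold and `server.action` op follow the description above, but the real helper's signature may differ:

```ts
import * as Sentry from "@sentry/nextjs";

// Wrap a server action so every invocation gets a server.action span plus a
// console warning when it runs longer than one second.
export function withTiming<Args extends unknown[], R>(
  name: string,
  action: (...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return (...args: Args) =>
    Sentry.startSpan({ name, op: "server.action" }, async () => {
      const start = Date.now();
      try {
        return await action(...args);
      } finally {
        const ms = Date.now() - start;
        if (ms > 1000) console.warn(`Slow server action ${name}: ${ms}ms`);
      }
    });
}

// Usage (hypothetical): export const getRepositoryById = withTiming("getRepositoryById", fetchRepositoryById);
```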
## Proof of Correctness
See
[`js/cf-webapp/proof/08-server-action-timing.md`](js/cf-webapp/proof/08-server-action-timing.md)
## How to Verify
```bash
cd js/cf-webapp
bash proof/reproducers/08-server-action-timing.sh
```
21 checks verify: the `withTiming` utility, the 4 instrumented actions, the
`captureEvent` helper, the 5 tracking functions, and all tracking calls wired
into the action files.
## Test Plan
- [ ] Run reproducer: `bash
proof/reproducers/08-server-action-timing.sh` (21/21 pass)
- [ ] Verify server actions still work correctly
- [ ] Check Sentry for `server.action` spans after deployment
## Summary
- Parallelize `getRepositoryById` server action: repo fetch + auth check
now run via `Promise.all` instead of sequentially
- Parallelize 6 independent stats queries on the repository detail page
via `Promise.all`: optimization counts, time series data, PR event data,
and leaderboard
- Reduces repository detail page server-side latency from ~350ms (7
sequential round-trips) to ~100ms (2 parallel batches)
## Proof of Correctness
See
[`js/cf-webapp/proof/06-parallel-repo-page.md`](js/cf-webapp/proof/06-parallel-repo-page.md)
for detailed analysis.
## How to Verify
```bash
cd js/cf-webapp
bash proof/reproducers/06-parallel-repo-page.sh
```
The reproducer verifies:
1. `getRepositoryById` uses `Promise.all` for repo + auth
2. At least 5 of 6 stats queries are inside `Promise.all`
3. No sequential `await` patterns remain for stats queries
4. All stats queries are independent (take only `repositoryId`)
## Why This Is Real
1. **All 6 stats queries are independent** — each takes only
`repositoryId` and returns different data
2. **Repo fetch and auth check are independent** — `findFirst` needs
`repoId`, `getRepositoriesForAccountCached` needs `payload`
3. **Latency reduction is significant** — 6 sequential DB round-trips
become 1 parallel batch
## Test Plan
- [ ] Run reproducer script: `bash
proof/reproducers/06-parallel-repo-page.sh`
- [ ] Verify repository detail page loads correctly
- [ ] Confirm all stats widgets render with correct data
## Proof of Correctness — Commit 5/22
**Optimization:** Run `getCurrentUserRole` and `getOrganizationMembers`
concurrently via `Promise.all` instead of sequentially.
**Claim:** Saves one round-trip latency. Total latency: `role_time +
members_time` → `max(role_time, members_time)`.
### Evidence
Both calls take `(userId, orgId)` and are independent — neither uses the
other's result. The sequential pattern was unnecessary.
### Reproducer
```bash
cd js/cf-webapp
bash proof/reproducers/05-parallel-members-page.sh
```
Verifies: both calls inside `Promise.all`, no sequential awaits remain,
no cross-dependency.
### Files
- `js/cf-webapp/proof/05-parallel-members-page.md` — proof
- `js/cf-webapp/proof/reproducers/05-parallel-members-page.sh` —
reproducer
## Proof of Correctness — Commit 3/22
**Optimization:** Replace 5 separate `new PrismaClient()` instances with
a shared singleton at `@/lib/prisma`.
**Claim:** Eliminates 5 independent connection pools → 1 shared pool
with `connection_limit=10`, `pool_timeout=20`. Prevents connection pool
exhaustion under concurrent requests.
### Evidence
Each `new PrismaClient()` creates its own query engine and PostgreSQL
connection pool (default 5 connections). With 5 instances, the app could
hold 25 connections simultaneously — a real risk against PostgreSQL's
hard limit (typically 100, often lower on Azure).
Files that had their own `new PrismaClient()`:
- `src/app/(dashboard)/apikeys/page.tsx`
- `src/app/(dashboard)/apikeys/tokenfuncs.ts`
- `src/app/api/traces/[trace_id]/save-modified-code/route.ts`
- `src/app/trace/[trace_id]/page.tsx`
- `src/lib/modified-code-utils.ts`
The singleton pattern is [Prisma's official recommendation for
Next.js](https://www.prisma.io/docs/orm/more/help-and-troubleshooting/help-articles/nextjs-prisma-client-dev-practices).
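A sketch of `src/lib/prisma.ts` following Prisma's recommended `globalThis` pattern; appending the pool parameters to `DATABASE_URL` is an assumption about how the limits are applied:

```ts
import { PrismaClient } from "@prisma/client";

const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };

// One shared client → one connection pool (connection_limit=10, pool_timeout=20).
export const prisma =
  globalForPrisma.prisma ??
  new PrismaClient({
    datasources: {
      db: { url: `${process.env.DATABASE_URL}?connection_limit=10&pool_timeout=20` },
    },
  });

// Cache on globalThis so dev-mode hot reloads reuse the client instead of
// opening a fresh pool on every reload.
if (process.env.NODE_ENV !== "production") globalForPrisma.prisma = prisma;
```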
### Reproducer
```bash
cd js/cf-webapp
bash proof/reproducers/03-prisma-singleton.sh
```
Verifies:
1. No `new PrismaClient()` outside `src/lib/prisma.ts`
2. All 5 affected files import from `@/lib/prisma`
3. Singleton has connection pooling + globalThis caching
4. TypeScript compiles cleanly
### Files
- `js/cf-webapp/proof/03-prisma-singleton.md` — detailed proof
- `js/cf-webapp/proof/reproducers/03-prisma-singleton.sh` — reproducer
### Reference
- Source commit: 16c5887a from PR #2536
## Proof of Correctness — Commit 2/22
**Optimization:** Replace `import * as Diff` with named `import {
createPatch }` for tree-shaking, and `@sentry/browser` →
`@sentry/nextjs` to eliminate duplicate SDK bundle.
### Evidence
1. **`import *` prevents tree-shaking** — webpack/turbopack must include
all 15+ exports from the `diff` package. Named import allows dead code
elimination of unused functions (`diffChars`, `diffWords`, `diffLines`,
`structuredPatch`, etc.).
2. **`@sentry/browser` duplicates `@sentry/nextjs` core** — the app
already uses `@sentry/nextjs` everywhere. Having one file import
`@sentry/browser` pulls in a second copy of Sentry's core code.
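A sketch of the import change (the helper around it is hypothetical):

```ts
import { createPatch } from "diff";        // named import, so unused diff exports can be tree-shaken
import * as Sentry from "@sentry/nextjs";  // single SDK; no second @sentry/browser copy

// Hypothetical helper showing both imports in use.
export function buildPatch(fileName: string, oldCode: string, newCode: string): string {
  try {
    return createPatch(fileName, oldCode, newCode);
  } catch (err) {
    Sentry.captureException(err);
    throw err;
  }
}
```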
### Reproducer
```bash
cd js/cf-webapp
bash proof/reproducers/02-named-diff-sentry-import.sh
```
Verifies via grep that:
- No `import * as Diff` remains
- No `@sentry/browser` imports remain
- Only `createPatch` is used from the `diff` package
- All Sentry imports use `@sentry/nextjs`
For bundle size measurement, run with `MEASURE=1`:
```bash
MEASURE=1 bash proof/reproducers/02-named-diff-sentry-import.sh
```
### Files
- `js/cf-webapp/proof/02-named-diff-sentry-import.md` — detailed proof
- `js/cf-webapp/proof/reproducers/02-named-diff-sentry-import.sh` —
reproducer
### Reference
- Source commit: 36bd47b4 from PR #2536
## Proof of Correctness — Commit 1/22
**Optimization:** Replace full `react-syntax-highlighter/Prism` build
(300 language grammars via `refractor/all`) with PrismLight registering
only 11 used languages.
**Claim:** Client JS bundle 5,990 KB → 3,146 KB (−47.5%, −2,844 KB)
### Evidence
The default Prism import resolves to `refractor/all.js` — a barrel file
that eagerly imports grammar definitions for 300+ languages (each 3–10
KB). The app only uses 11: python, javascript, typescript, java, json,
css, html, bash, jsx, tsx, markup.
PrismLight uses `refractor/core` and requires explicit
`registerLanguage()` calls, eliminating ~289 unused grammars from the
bundle.
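A sketch of the PrismLight setup using the library's light-build entry points; only a few of the 11 registrations are shown:

```ts
import { PrismLight as SyntaxHighlighter } from "react-syntax-highlighter";
import python from "react-syntax-highlighter/dist/esm/languages/prism/python";
import typescript from "react-syntax-highlighter/dist/esm/languages/prism/typescript";
import tsx from "react-syntax-highlighter/dist/esm/languages/prism/tsx";

// Each registerLanguage call pulls in exactly one grammar; the ~289 unused
// grammars from refractor/all never enter the bundle.
SyntaxHighlighter.registerLanguage("python", python);
SyntaxHighlighter.registerLanguage("typescript", typescript);
SyntaxHighlighter.registerLanguage("tsx", tsx);
// ...and the remaining languages the app uses (javascript, java, json, css, bash, jsx, markup)

export default SyntaxHighlighter;
```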
### Reproducer
```bash
cd js/cf-webapp
bash proof/reproducers/01-prismlight-benchmark.sh
```
This runs two real `next build` passes (baseline on main, then with only
the PrismLight diff applied) and prints the route tables side-by-side
for comparison.
### Files
- `js/cf-webapp/proof/01-prismlight-switch.md` — detailed proof document
- `js/cf-webapp/proof/reproducers/01-prismlight-benchmark.sh` —
reproducible benchmark
### Reference
- Source commit: e249a1cf from PR #2536
- [react-syntax-highlighter light build
docs](https://github.com/react-syntax-highlighter/react-syntax-highlighter#light-build)
## Problem
Generated tests imported TypeScript files with `.js` extensions, causing
"Cannot find module" errors. The AI service was only stripping
extensions from `generated_test_source` but NOT from
`instrumented_behavior_tests` and `instrumented_perf_tests` (which the
CLI actually uses).
## Root Cause
**File:** `core/languages/js_ts/testgen.py:608`
Only `generated_test_source` received the `strip_js_extensions()` call. The
CLI uses `instrumented_behavior_tests` from the response, which still had
incorrect `.js` extensions on TypeScript imports.
## Impact
- Affected **15 out of 20 test runs (75% failure rate)**
- **Severity: HIGH** - systematic bug blocking all TypeScript projects
- **Error:** `Cannot find module '../../google.js'` when source is
`google.ts`
## Fix
Apply `strip_js_extensions()` to all three test output variants:
- `generated_test_source` (already done)
- `instrumented_behavior_tests` ✨ **NEW**
- `instrumented_perf_tests` ✨ **NEW**
## Testing
✅ All 32 existing JavaScript testgen tests pass
✅ Added regression test for extension stripping
✅ Verified with `--rerun` on trace
`03899729-131e-4ff6-8149-c132bd888089`
✅ No "Cannot find module *.js" errors after fix
## References
**Trace IDs exhibiting this bug:**
- `03899729-131e-4ff6-8149-c132bd888089`
- `19446b34-cd22-4a38-b304-22c16ba86747`
- (and 13 others - see `/workspace/logs`)
**Related:** AI Service Reference doc section 10.1 issue #2
---------
Co-authored-by: mohammed ahmed <mohammedahmed18@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
**Issue**: The `/ai/optimization_review` endpoint was returning 500 errors
when trying to close LLM clients during event loop changes.
**Root Cause**: In `aiservice/llm.py` lines 96-99, the `close()` calls on
OpenAI and Anthropic clients were not wrapped in exception handlers. When
the httpx transport was already closed or in a bad state (e.g., event loop
closure, connection already closed), the exception would propagate and cause
the entire request to fail with a 500 error.
**Fix**: Wrapped both `openai_client.close()` and `anthropic_client.close()`
in try-except blocks that catch and log exceptions at DEBUG level. This
prevents transport errors from crashing requests while still attempting to
clean up resources properly.
**Impact**: Fixes 500 errors on `/ai/optimization_review` and other endpoints
that use the LLM client when event loops change or clients are in bad states.
**Testing**: Added `test_llm_client_close.py` with 2 test cases that verify:
1. Transport errors during close() are handled gracefully
2. Event loop closed errors are handled gracefully
**Traces**: 312d7392, 5bbdf214, a1325051
Co-authored-by: ali <mohammed18200118@gmail.com>
- ruff-format: reformat test file
- fix ty type error: cast mock clients to MagicMock for assert_called_once
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This fixes a critical bug where old AsyncAzureOpenAI and AsyncAnthropicBedrock
clients were not being closed when the event loop changed, causing:
1. Connection pool exhaustion → "couldn't get a connection after 30.00 sec"
2. RuntimeError: Event loop is closed during httpx client cleanup
Root cause:
In LLMClient.call(), when the event loop changed, new clients were created
but old clients were not properly closed, leading to connection leaks.
Fix:
- Added await client.close() for both openai_client and anthropic_client
before creating new instances
- Added comprehensive unit tests to verify proper cleanup
Impact:
- Resolves ~150+ test generation failures (500 errors)
- Fixes event loop closure errors in aiservice logs
Trace IDs affected: 04500fbd-88e0-44e4-8d20-32f6a0dc06cc (and many others)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Summary
- `/api/healthcheck` was returning 401 because `proxy.ts` requires auth
for all `/api/*` routes
- Application Gateway health probe got 401 → marked backend unhealthy →
**502 for all users**
- Adds `/api/healthcheck` to `ignorePaths` so it bypasses auth
- Also removes the erroneously added `middleware.ts` (Next.js 16 uses
`proxy.ts`)
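A sketch of the bypass in `proxy.ts`; the export convention and surrounding auth handling shown are assumptions, only the `ignorePaths` check is the point:

```ts
import { NextRequest, NextResponse } from "next/server";

const ignorePaths = ["/api/healthcheck"];

export default async function proxy(request: NextRequest) {
  // Let the Application Gateway health probe through without an auth challenge.
  if (ignorePaths.some((path) => request.nextUrl.pathname.startsWith(path))) {
    return NextResponse.next();
  }
  // ...existing auth handling for every other /api/* route (assumed)
  return NextResponse.next();
}
```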
## Test plan
- [ ] `/api/healthcheck` returns 200 without auth
- [ ] Authenticated routes still require login
- [ ] Application Gateway backend health shows Healthy
## Summary
- Auth0 v4 auto-generates middleware that protects all routes when no
`middleware.ts` exists
- This caused `/api/healthcheck` to return 401, making the Application
Gateway mark the backend as unhealthy → **502 for all users**
- Restores explicit middleware with Auth0 v4 API and excludes
`/api/healthcheck` from the matcher
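A sketch of the restored middleware following the Auth0 v4 SDK pattern; the Auth0 client module path and the rest of the matcher list are assumptions:

```ts
import type { NextRequest } from "next/server";
import { auth0 } from "@/lib/auth0"; // Auth0 v4 client instance (assumed path)

export async function middleware(request: NextRequest) {
  return auth0.middleware(request);
}

export const config = {
  // Exclude the health probe (and static assets) so it never gets a 401.
  matcher: ["/((?!api/healthcheck|_next/static|_next/image|favicon.ico).*)"],
};
```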
## Test plan
- [ ] `/api/healthcheck` returns 200 without auth
- [ ] Authenticated routes still require login
- [ ] Application Gateway backend health shows Healthy
## Summary
- Convert
`debug_log_sensitive_data(f"...{response.model_dump_json(indent=2)}")`
to `debug_log_sensitive_data_from_callable(lambda: ...)` across 8
endpoint files
- In production, `debug_log_sensitive_data` is a no-op but the f-string
interpolation (including `model_dump_json(indent=2)`) was always
evaluated — serializing the full LLM response to JSON on every call
- The `_from_callable` variant only invokes the lambda when debug
logging is active (non-production)
- **Fix pre-existing bug**: `log_response()` closures in 4 endpoint
files returned `None` instead of a string, causing
`debug_log_sensitive_data_from_callable` to log `None`. Now they return
the concatenated log string as expected by the callable-based API.
Affected endpoints: Python optimizer, line profiler, jit_rewrite, Java
optimizer, Java line profiler, JS/TS optimizer, JS/TS line profiler,
testgen.
## Test plan
- [x] All 558 unit tests pass
- [x] mypy clean
- [x] ruff clean
- [ ] Verify debug logging still works in non-production environments
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
## Summary
- Replace the Pydantic frozen dataclasses `CodeExplanationAndID` and
`CodeAndExplanation` with stdlib `@dataclass(frozen=True)`, removing the
`field_validator` that ran `.code` + `compile()` ~280 times per pipeline run
- Pre-compute `original_module.code` once and pass to pipeline steps
(`clean_extraneous_comments`, `equality_check`) that previously called
it independently
- Replace `ast.dump(annotate_fields=False)` with `ast.unparse` in
`deduplicate_optimizations` (70% faster)
- Skip re-parse in `dedup_and_sort_imports` when isort returns unchanged
code
- Cache comment-stripped original code across candidates in
`clean_extraneous_comments`
**Pipeline median per-run: ~1.5s → 184ms** (4 candidates, controlled
measurement). Saves ~4-5s of CPU per optimization request in production.
## Test plan
- [x] All 558 unit tests pass
- [x] mypy clean
- [x] ruff clean (no new warnings)
- [ ] Verify optimizer endpoints return correct results in staging
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
## Summary
- Move `safe_log_features()` and `update_optimization_cost()` out of
blocking `TaskGroup`s into fire-and-forget background tasks across 4
optimization endpoints (optimizer, optimizer_line_profiler, jit_rewrite,
adaptive_optimizer)
- These DB writes are analytics-only and don't affect response bodies —
waiting for them adds 100-300ms per request unnecessarily
- Add `aiservice/background.py` with `fire_and_forget()` helper using
the same `set` + `add_done_callback` pattern already used in `LLMClient`
- `get_or_create_optimization_event()` remains awaited where the
response needs `event.id`
## Test plan
- [x] All 550 tests pass locally
- [ ] Verify response latency improvement in production metrics after
deploy
- [ ] Confirm `safe_log_features` and `update_optimization_cost` still
complete successfully in background (check DB records)
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Vitest tests were failing with "Cannot find module" errors because
`vi.mock()` calls retained `.js` extensions while imports had them
stripped, causing mock/import path mismatch in ESM mode.
## Root Cause
The `strip_js_extensions()` function in `testgen.py` only handled
`jest.mock()` but not `vi.mock()`, which is used by Vitest. The pattern
`_JEST_MOCK_EXTENSION_PATTERN` matched Jest mocking functions but not
Vitest's `vi.*` equivalents.
## Fix
Added `_VITEST_MOCK_EXTENSION_PATTERN` regex to match and strip
extensions from:
- `vi.mock()`
- `vi.doMock()`
- `vi.unmock()`
- `vi.requireActual()`
- `vi.requireMock()`
- `vi.importActual()`
- `vi.importMock()`
## Affected Trace IDs
- `0fe99c9f-b348-4f0a-b051-0ea9455231ba`
- `127cdaec-a343-4918-a86a-b646dd4d79cf`
- `2b6c896e-20d7-4505-8bf4-e4a2f20b37fc`
These trace IDs exhibited the bug where generated tests had
`vi.mock('../config/paths.js')` but imports had `from
'../config/paths'`, causing module resolution failures.
## Test Coverage
- Added 8 new tests in `TestStripJsExtensions` class
- All 31 tests in `test_testgen_javascript.py` pass
- Specific regression test for vi.mock() extension stripping
- Tests cover all vi.mock variants and edge cases
## Files Changed
- `django/aiservice/core/languages/js_ts/testgen.py` (fix)
- `django/aiservice/tests/testgen/test_testgen_javascript.py` (tests)
---------
Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
## Summary
Fixes 500 Internal Server Error when replaying test generation with
`--rerun` flag and database arrays contain `None`/`NULL` values.
## Root Cause
The `rerun_testgen()` function in `core/shared/replay.py` accessed array
elements without checking if they were `None`. When PostgreSQL arrays
contained `NULL` values (e.g., `generated_test = [NULL, 'test2']`), the
function returned a `TestGenResponseSchema` with `None` values, causing
Pydantic validation to fail:
```
pydantic_core._pydantic_core.ValidationError: 2 validation errors for TestGenResponseSchema
generated_tests
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
instrumented_behavior_tests
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
```
## Changes
Added explicit `None` checks before creating `TestGenResponseSchema`:
- If `generated_test[index]` or `instrumented_generated_test[index]` is
`None`, return `None` (skip this test)
- If `instrumented_perf_test[index]` is `None`, default to empty string
(non-critical field)
## Impact
Resolves **10+ replay failures** where test generation produced partial
results stored as `NULL` in database arrays.
## Test Coverage
Added comprehensive test suite for `replay.py`:
- `test_rerun_with_valid_test_data()` - Happy path
- `test_rerun_with_none_values_in_arrays()` - **Primary bug fix test**
- `test_rerun_with_index_out_of_bounds()` - Boundary conditions
- `test_rerun_with_empty_arrays()` - Empty data handling
- `test_rerun_with_none_arrays()` - NULL arrays
- `test_rerun_with_mismatched_array_lengths()` - Length mismatches
- `test_rerun_missing_perf_test()` - Missing perf data
All 7 tests pass.
## Trace IDs
This fix addresses errors seen in traces:
- Primary: `056561cc-94af-4d7b-ac79-85dfd4b7282d`
- And 9 additional trace IDs with the same "500 - Error generating
JavaScript tests" error
## Verification
Tested with original failing trace:
```bash
cd /workspace/target && codeflash --file src/daemon/constants.ts --function formatGatewayServiceDescription --rerun 056561cc-94af-4d7b-ac79-85dfd4b7282d
```
**Before fix:** `ERROR: 500 - Traceback... ValidationError: Input should
be a valid string [type=string_type, input_value=None]`
**After fix:** Gracefully skips None entries, no 500 error ✅
---------
Co-authored-by: Codeflash Bot <codeflash-bot@codeflash.ai>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- **Memory leak fix**: Added explicit `LOGGING` config in `settings.py`
to prevent unbounded `LogRecord` buffering. Django's `django.request`
logger creates WARNING records for 4xx responses with the full
`ASGIRequest` (headers, body, payload) pinned in `args`. Without
explicit config, Django's default handlers and Sentry's
`enable_logs=True` buffer these indefinitely. Setting `django.request`
to ERROR level + removing `enable_logs=True` eliminated the leak — load
testing showed **84% reduction** in per-request memory growth (7.4 → 1.2
KiB/req).
- **Async event loop fix**: Wrapped
`parse_and_generate_candidate_schema()` in `asyncio.to_thread()` across
all 4 async callers (optimizer, optimizer_line_profiler, jit_rewrite,
adaptive_optimizer). This offloads the synchronous libcst parsing +
8-stage postprocessing pipeline to the thread pool, preventing it from
blocking the event loop during peak traffic.
## Test plan
- [x] All 550 tests pass (`uv run pytest tests/ --ignore=tests/profiling
-x -q`)
- [ ] Monitor Azure memory alerts after deploy — expect significant
reduction in memory growth rate
- [ ] Monitor 5xx error rate during peak traffic — expect reduction from
event loop no longer blocked by sync postprocessing
---------
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- **`thread_sensitive=False`** on `sync_to_async` so concurrent
`log_features` calls get their own threads instead of serializing
through one (was `True`, causing a bottleneck)
- **Raised DB pool `max_size` from 10 to 100** — prod Postgres allows
859 connections, giving plenty of headroom
- **Added `safe_log_features` wrapper** that catches errors via Sentry
instead of propagating — used at all 9 TaskGroup and bare-await call
sites so a logging failure can't crash an otherwise successful
optimization endpoint
- **Kept `transaction.atomic` + `select_for_update`** for correctness
(Django doesn't support async transactions yet, and removing these
causes lost-update races on dict-merge fields)
## Root cause
`log_features` uses `@sync_to_async` + `@transaction.atomic` because
Django lacks async transaction support. The previous fix for pool
exhaustion changed `thread_sensitive=False` to `True`, which serialized
all calls through a single thread — fixing pool exhaustion but creating
a throughput bottleneck that caused 500s under load. Additionally, 6
call sites used `asyncio.TaskGroup` where any `log_features` exception
would propagate and crash the entire endpoint.
## Test plan
- [x] `tests/log_features/test_log_features_concurrency.py` — verifies
`thread_sensitive=False` and `safe_log_features` is async
- [x] `ruff check` passes on all changed files
- [ ] Deploy to staging and verify no 500s under concurrent optimization
requests
The optimization hoisted the 70-element `reserved_words` set out of `_is_valid_js_identifier` into a module-level `frozenset`, eliminating 1677 repeated set constructions that consumed 1.79 ms in the profile (42% of that function's time). More significantly, `_detect_export_style` previously compiled six regex patterns on every invocation via f-string interpolation with `escaped_id`; the optimized version pre-compiles generic patterns once at module load and uses `finditer` plus manual identifier comparison, cutting the function's runtime from 3.17 s to 14.7 ms across 1146 calls—a 99.5% reduction that accounts for nearly all of the 10× speedup. Test annotations confirm the largest gains occur in the `test_large_scale_many_class_methods_with_alternating_export_styles` case (107 ms → 4.66 ms), where repeated export detection dominated.