codeflash-internal/.codeflash/learnings.md

# Cross-Session Learnings

## Personal Account Queries Use Bitmap OR Merge

Dashboard statistics and PR data queries for personal accounts (users without an organization) originally used a 3-way OR condition: `WHERE userId = $1 OR orgMember.userId = $1 OR orgAdmin.userId = $1`. PostgreSQL optimized this with a bitmap OR merge scan across multiple indexes, which is less efficient than individual index-backed scans.

**Solution:** Rewrite as UNION queries where each branch uses its own index-backed scan:
```sql
WITH filtered AS (
  -- Branch 1: personal repos
  SELECT id FROM repositories WHERE userId = $1
  UNION
  -- Branch 2: org member repos
  SELECT r.id FROM repositories r JOIN org_members om ON ... WHERE om.userId = $1
  UNION
  -- Branch 3: org admin repos
  SELECT r.id FROM repositories r JOIN org_admins oa ON ... WHERE oa.userId = $1
)
SELECT * FROM repositories WHERE id IN (SELECT id FROM filtered)
```

Each UNION branch hits a specific index cleanly instead of merging bitmaps.

## findFirst with Composite Index Lookup

Many Prisma queries used `findFirst` with a composite unique key (e.g., `{organizationId, userId}`) that could be replaced with `findUnique` for guaranteed single-row index seek.

**Evidence:** `members/action.ts`, `repositories/action.ts`, `members/data.ts`, `privacy-actions.ts` all had patterns like:
```ts
const member = await prisma.organization_members.findFirst({
  where: { organizationId, userId }
})
```

When the schema has a unique constraint `@@unique([organizationId, userId])`, use:
```ts
const member = await prisma.organization_members.findUnique({
  where: { organizationId_userId: { organizationId, userId } }
})
```

This guarantees Prisma uses the unique index for a single-row seek instead of a table scan with LIMIT 1.

## Permission Checks Load All Members

Several functions loaded all organization members into arrays, then used `Array.some()` or `Array.find()` for permission checks:
```ts
const members = await prisma.organizations.findFirst({...}).members
return members.some(m => m.userId === userId)
```

This fetches all N members (O(N) DB transfer), then scans the array (O(N) CPU).

**Solution:** Use indexed Prisma query that exits early:
```ts
const member = await prisma.organization_members.findUnique({
  where: { organizationId_userId: { organizationId, userId } }
})
return member !== null
```

This is O(1) DB query with early exit. For multiple permission checks in parallel, use `Promise.all` with individual indexed queries instead of loading all members once.

## Select Narrowing Skips Large Columns

Many Prisma queries fetched all columns when only a few were consumed in the UI or API response. This is especially wasteful for:
- Large JSON columns: `messages`, `parsed_response`, `context`, `experiment_metadata`, `optimizations_raw`
- Text columns: `stack_trace`
- Unused metadata: `github_org_id`, `auto_add_github_members`, `retry_count`, `python_version`, `is_async`, etc.

**Solution:** Add explicit `select` clause listing only consumed fields:
```ts
const call = await prisma.llm_calls.findUnique({
  where: { id },
  select: {
    id: true, model: true, status: true, // ... only fields used in page
    // Omit: messages, parsed_response, context (large JSON)
  }
})
```

**Evidence:** `llm-call/[id]/page.tsx` reduced from fetching all 30 llm_calls columns to 22 (skipped 3 large JSON blobs + metadata). `get-trace-data.ts` reduced optimization_features from 30+ columns to 12 consumed fields.

## CTE Phase 1: LIMIT Before JOIN

When paginating a query that joins large tables, restructure the CTE to identify the page of IDs first (with LIMIT), then JOIN only those IDs in phase 2.

**Before (inefficient):**
```sql
WITH data AS (
  SELECT e.id, e.created_at, f.*, r.*
  FROM optimization_events e
  LEFT JOIN optimization_features f ON ...
  LEFT JOIN repositories r ON ...
  WHERE <filters>
  ORDER BY e.created_at DESC
  LIMIT 10
)
SELECT * FROM data
```

This creates a large intermediate JOIN set before applying LIMIT.

**After (efficient):**
```sql
WITH page_ids AS (
  SELECT e.id
  FROM optimization_events e
  WHERE EXISTS (SELECT 1 FROM optimization_features f WHERE f.optimization_event_id = e.id)
    AND <filters>
  ORDER BY e.created_at DESC
  LIMIT 10
),
data AS (
  SELECT e.id, e.created_at, f.*, r.*
  FROM optimization_events e
  JOIN page_ids p ON e.id = p.id
  LEFT JOIN optimization_features f ON ...
  LEFT JOIN repositories r ON ...
)
SELECT * FROM data
```

Phase 1 uses EXISTS (index-only check, no full JOIN) to identify ~10 event IDs. Phase 2 joins only those 10 IDs with the large tables.

**Evidence:** `getOptimizationPRs` in `dashboard/action.ts` — both org and personal account paths now use this two-phase CTE structure.

## EXISTS Subqueries vs LEFT JOIN for Filtering

When filtering rows based on the existence of related data, using `LEFT JOIN` with a boolean check is often faster than `EXISTS` subqueries, especially when the subquery would be evaluated row-by-row for many candidate rows.

**Before (slow):**
```sql
SELECT id FROM candidates c
WHERE c.field IS NOT NULL
  OR EXISTS (
    SELECT 1 FROM related_table r
    WHERE r.key = c.key AND r.field IS NOT NULL
  )
```

This evaluates the EXISTS subquery once per row in candidates. If there are 10,000 candidates, that's 10,000 subquery executions.

**After (fast):**
```sql
SELECT c.id, r.field IS NOT NULL AS has_related_field
FROM candidates c
LEFT JOIN related_table r ON c.key = r.key
WHERE c.field IS NOT NULL OR r.field IS NOT NULL
```

The LEFT JOIN is evaluated once with a hash join or index seek, then the filter is applied. Much more efficient for large candidate sets.

**Evidence:** `getOptimizationPRs` in `dashboard/action.ts` — replaced EXISTS checks for `optimization_features.pull_request` with LEFT JOIN in both count and data queries, for both org and personal account paths. Expected 921ms + 1435ms → <800ms combined.

## Pre-existing Failures Masked by Test Runner

Found 3 test failures and 5 type errors that were pre-existing but not caught in previous sessions:
- Missing `_count` field in `getRepositoryById` test mock (test runner didn't fail until accessed)
- Missing `id` and `created_at` in optimization_errors select clause (TypeScript TS2339 errors when accessed in UI)

**Lesson:** Always run full test suite AND type check (`tsc --noEmit`) after each optimization session, even if individual experiments passed their guard checks.
perf: fix CI build + lazy-load heavy libs + parallelize DB queries (#2601) ## Summary - Fix CI build failure: Auth0Client crashes during Next.js prerendering when env vars aren't set. Returns a no-op stub (`getSession → null`) when domain is missing — semantically correct for static generation - Lazy-load markdown libs (~260kb): ReactMarkdown, remarkGfm, and react-syntax-highlighter were eagerly imported in monaco-diff-viewer but only rendered when user expands "Generated Tests". Extracted into a dynamic component - Parallelize repo detail query: `getRepositoryById` ran the activity count sequentially after the repo lookup. Since `repoId` is already available, all three queries now run in parallel ## Test plan - [ ] CI `build` check passes (was failing since #2598) - [ ] Trace page still renders generated tests correctly when expanded - [ ] Repository detail page loads correctly with activity status 2026-04-13 16:03:05 +00:00			`# Cross-Session Learnings`

			`## Personal Account Queries Use Bitmap OR Merge`

			Dashboard statistics and PR data queries for personal accounts (users without an organization) originally used a 3-way OR condition: `WHERE userId = $1 OR orgMember.userId = $1 OR orgAdmin.userId = $1`. PostgreSQL optimized this with a bitmap OR merge scan across multiple indexes, which is less efficient than individual index-backed scans.

			`Solution: Rewrite as UNION queries where each branch uses its own index-backed scan:`
			```sql
			`WITH filtered AS (`
			`-- Branch 1: personal repos`
			`SELECT id FROM repositories WHERE userId = $1`
			`UNION`
			`-- Branch 2: org member repos`
			`SELECT r.id FROM repositories r JOIN org_members om ON ... WHERE om.userId = $1`
			`UNION`
			`-- Branch 3: org admin repos`
			`SELECT r.id FROM repositories r JOIN org_admins oa ON ... WHERE oa.userId = $1`
			`)`
			`SELECT * FROM repositories WHERE id IN (SELECT id FROM filtered)`
			```

			`Each UNION branch hits a specific index cleanly instead of merging bitmaps.`

			`## findFirst with Composite Index Lookup`

			Many Prisma queries used `findFirst` with a composite unique key (e.g., `{organizationId, userId}`) that could be replaced with `findUnique` for guaranteed single-row index seek.

			Evidence: `members/action.ts`, `repositories/action.ts`, `members/data.ts`, `privacy-actions.ts` all had patterns like:
			```ts
			`const member = await prisma.organization_members.findFirst({`
			`where: { organizationId, userId }`
			`})`
			```

			When the schema has a unique constraint `@@unique([organizationId, userId])`, use:
			```ts
			`const member = await prisma.organization_members.findUnique({`
			`where: { organizationId_userId: { organizationId, userId } }`
			`})`
			```

			`This guarantees Prisma uses the unique index for a single-row seek instead of a table scan with LIMIT 1.`

			`## Permission Checks Load All Members`

			Several functions loaded all organization members into arrays, then used `Array.some()` or `Array.find()` for permission checks:
			```ts
			`const members = await prisma.organizations.findFirst({...}).members`
			`return members.some(m => m.userId === userId)`
			```

			`This fetches all N members (O(N) DB transfer), then scans the array (O(N) CPU).`

			`Solution: Use indexed Prisma query that exits early:`
			```ts
			`const member = await prisma.organization_members.findUnique({`
			`where: { organizationId_userId: { organizationId, userId } }`
			`})`
			`return member !== null`
			```

			This is O(1) DB query with early exit. For multiple permission checks in parallel, use `Promise.all` with individual indexed queries instead of loading all members once.

			`## Select Narrowing Skips Large Columns`

			`Many Prisma queries fetched all columns when only a few were consumed in the UI or API response. This is especially wasteful for:`
			- Large JSON columns: `messages`, `parsed_response`, `context`, `experiment_metadata`, `optimizations_raw`
			- Text columns: `stack_trace`
			- Unused metadata: `github_org_id`, `auto_add_github_members`, `retry_count`, `python_version`, `is_async`, etc.

			Solution: Add explicit `select` clause listing only consumed fields:
			```ts
			`const call = await prisma.llm_calls.findUnique({`
			`where: { id },`
			`select: {`
			`id: true, model: true, status: true, // ... only fields used in page`
			`// Omit: messages, parsed_response, context (large JSON)`
			`}`
			`})`
			```

			Evidence: `llm-call/[id]/page.tsx` reduced from fetching all 30 llm_calls columns to 22 (skipped 3 large JSON blobs + metadata). `get-trace-data.ts` reduced optimization_features from 30+ columns to 12 consumed fields.

			`## CTE Phase 1: LIMIT Before JOIN`

			`When paginating a query that joins large tables, restructure the CTE to identify the page of IDs first (with LIMIT), then JOIN only those IDs in phase 2.`

			`Before (inefficient):`
			```sql
			`WITH data AS (`
			`SELECT e.id, e.created_at, f., r.`
			`FROM optimization_events e`
			`LEFT JOIN optimization_features f ON ...`
			`LEFT JOIN repositories r ON ...`
			`WHERE <filters>`
			`ORDER BY e.created_at DESC`
			`LIMIT 10`
			`)`
			`SELECT * FROM data`
			```

			`This creates a large intermediate JOIN set before applying LIMIT.`

			`After (efficient):`
			```sql
			`WITH page_ids AS (`
			`SELECT e.id`
			`FROM optimization_events e`
			`WHERE EXISTS (SELECT 1 FROM optimization_features f WHERE f.optimization_event_id = e.id)`
			`AND <filters>`
			`ORDER BY e.created_at DESC`
			`LIMIT 10`
			`),`
			`data AS (`
			`SELECT e.id, e.created_at, f., r.`
			`FROM optimization_events e`
			`JOIN page_ids p ON e.id = p.id`
			`LEFT JOIN optimization_features f ON ...`
			`LEFT JOIN repositories r ON ...`
			`)`
			`SELECT * FROM data`
			```

			`Phase 1 uses EXISTS (index-only check, no full JOIN) to identify ~10 event IDs. Phase 2 joins only those 10 IDs with the large tables.`

			Evidence: `getOptimizationPRs` in `dashboard/action.ts` — both org and personal account paths now use this two-phase CTE structure.

			`## EXISTS Subqueries vs LEFT JOIN for Filtering`

			When filtering rows based on the existence of related data, using `LEFT JOIN` with a boolean check is often faster than `EXISTS` subqueries, especially when the subquery would be evaluated row-by-row for many candidate rows.

			`Before (slow):`
			```sql
			`SELECT id FROM candidates c`
			`WHERE c.field IS NOT NULL`
			`OR EXISTS (`
			`SELECT 1 FROM related_table r`
			`WHERE r.key = c.key AND r.field IS NOT NULL`
			`)`
			```

			`This evaluates the EXISTS subquery once per row in candidates. If there are 10,000 candidates, that's 10,000 subquery executions.`

			`After (fast):`
			```sql
			`SELECT c.id, r.field IS NOT NULL AS has_related_field`
			`FROM candidates c`
			`LEFT JOIN related_table r ON c.key = r.key`
			`WHERE c.field IS NOT NULL OR r.field IS NOT NULL`
			```

			`The LEFT JOIN is evaluated once with a hash join or index seek, then the filter is applied. Much more efficient for large candidate sets.`

			Evidence: `getOptimizationPRs` in `dashboard/action.ts` — replaced EXISTS checks for `optimization_features.pull_request` with LEFT JOIN in both count and data queries, for both org and personal account paths. Expected 921ms + 1435ms → <800ms combined.

			`## Pre-existing Failures Masked by Test Runner`

			`Found 3 test failures and 5 type errors that were pre-existing but not caught in previous sessions:`
			- Missing `_count` field in `getRepositoryById` test mock (test runner didn't fail until accessed)
			- Missing `id` and `created_at` in optimization_errors select clause (TypeScript TS2339 errors when accessed in UI)

			Lesson: Always run full test suite AND type check (`tsc --noEmit`) after each optimization session, even if individual experiments passed their guard checks.