mirror of
https://github.com/codeflash-ai/codeflash-internal.git
synced 2026-05-04 18:25:18 +00:00
## Summary - **Fix CI build failure**: Auth0Client crashes during Next.js prerendering when env vars aren't set. Returns a no-op stub (`getSession → null`) when domain is missing — semantically correct for static generation - **Lazy-load markdown libs (~260kb)**: ReactMarkdown, remarkGfm, and react-syntax-highlighter were eagerly imported in monaco-diff-viewer but only rendered when user expands "Generated Tests". Extracted into a dynamic component - **Parallelize repo detail query**: `getRepositoryById` ran the activity count sequentially after the repo lookup. Since `repoId` is already available, all three queries now run in parallel ## Test plan - [ ] CI `build` check passes (was failing since #2598) - [ ] Trace page still renders generated tests correctly when expanded - [ ] Repository detail page loads correctly with activity status
162 lines
6.3 KiB
Markdown
162 lines
6.3 KiB
Markdown
# Cross-Session Learnings
|
|
|
|
## Personal Account Queries Use Bitmap OR Merge
|
|
|
|
Dashboard statistics and PR data queries for personal accounts (users without an organization) originally used a 3-way OR condition: `WHERE userId = $1 OR orgMember.userId = $1 OR orgAdmin.userId = $1`. PostgreSQL optimized this with a bitmap OR merge scan across multiple indexes, which is less efficient than individual index-backed scans.
|
|
|
|
**Solution:** Rewrite as UNION queries where each branch uses its own index-backed scan:
|
|
```sql
|
|
WITH filtered AS (
|
|
-- Branch 1: personal repos
|
|
SELECT id FROM repositories WHERE userId = $1
|
|
UNION
|
|
-- Branch 2: org member repos
|
|
SELECT r.id FROM repositories r JOIN org_members om ON ... WHERE om.userId = $1
|
|
UNION
|
|
-- Branch 3: org admin repos
|
|
SELECT r.id FROM repositories r JOIN org_admins oa ON ... WHERE oa.userId = $1
|
|
)
|
|
SELECT * FROM repositories WHERE id IN (SELECT id FROM filtered)
|
|
```
|
|
|
|
Each UNION branch hits a specific index cleanly instead of merging bitmaps.
|
|
|
|
## findFirst with Composite Index Lookup
|
|
|
|
Many Prisma queries used `findFirst` with a composite unique key (e.g., `{organizationId, userId}`) that could be replaced with `findUnique` for guaranteed single-row index seek.
|
|
|
|
**Evidence:** `members/action.ts`, `repositories/action.ts`, `members/data.ts`, `privacy-actions.ts` all had patterns like:
|
|
```ts
|
|
const member = await prisma.organization_members.findFirst({
|
|
where: { organizationId, userId }
|
|
})
|
|
```
|
|
|
|
When the schema has a unique constraint `@@unique([organizationId, userId])`, use:
|
|
```ts
|
|
const member = await prisma.organization_members.findUnique({
|
|
where: { organizationId_userId: { organizationId, userId } }
|
|
})
|
|
```
|
|
|
|
This guarantees Prisma uses the unique index for a single-row seek instead of a table scan with LIMIT 1.
|
|
|
|
## Permission Checks Load All Members
|
|
|
|
Several functions loaded all organization members into arrays, then used `Array.some()` or `Array.find()` for permission checks:
|
|
```ts
|
|
const members = await prisma.organizations.findFirst({...}).members
|
|
return members.some(m => m.userId === userId)
|
|
```
|
|
|
|
This fetches all N members (O(N) DB transfer), then scans the array (O(N) CPU).
|
|
|
|
**Solution:** Use indexed Prisma query that exits early:
|
|
```ts
|
|
const member = await prisma.organization_members.findUnique({
|
|
where: { organizationId_userId: { organizationId, userId } }
|
|
})
|
|
return member !== null
|
|
```
|
|
|
|
This is O(1) DB query with early exit. For multiple permission checks in parallel, use `Promise.all` with individual indexed queries instead of loading all members once.
|
|
|
|
## Select Narrowing Skips Large Columns
|
|
|
|
Many Prisma queries fetched all columns when only a few were consumed in the UI or API response. This is especially wasteful for:
|
|
- Large JSON columns: `messages`, `parsed_response`, `context`, `experiment_metadata`, `optimizations_raw`
|
|
- Text columns: `stack_trace`
|
|
- Unused metadata: `github_org_id`, `auto_add_github_members`, `retry_count`, `python_version`, `is_async`, etc.
|
|
|
|
**Solution:** Add explicit `select` clause listing only consumed fields:
|
|
```ts
|
|
const call = await prisma.llm_calls.findUnique({
|
|
where: { id },
|
|
select: {
|
|
id: true, model: true, status: true, // ... only fields used in page
|
|
// Omit: messages, parsed_response, context (large JSON)
|
|
}
|
|
})
|
|
```
|
|
|
|
**Evidence:** `llm-call/[id]/page.tsx` reduced from fetching all 30 llm_calls columns to 22 (skipped 3 large JSON blobs + metadata). `get-trace-data.ts` reduced optimization_features from 30+ columns to 12 consumed fields.
|
|
|
|
## CTE Phase 1: LIMIT Before JOIN
|
|
|
|
When paginating a query that joins large tables, restructure the CTE to identify the page of IDs first (with LIMIT), then JOIN only those IDs in phase 2.
|
|
|
|
**Before (inefficient):**
|
|
```sql
|
|
WITH data AS (
|
|
SELECT e.id, e.created_at, f.*, r.*
|
|
FROM optimization_events e
|
|
LEFT JOIN optimization_features f ON ...
|
|
LEFT JOIN repositories r ON ...
|
|
WHERE <filters>
|
|
ORDER BY e.created_at DESC
|
|
LIMIT 10
|
|
)
|
|
SELECT * FROM data
|
|
```
|
|
|
|
This creates a large intermediate JOIN set before applying LIMIT.
|
|
|
|
**After (efficient):**
|
|
```sql
|
|
WITH page_ids AS (
|
|
SELECT e.id
|
|
FROM optimization_events e
|
|
WHERE EXISTS (SELECT 1 FROM optimization_features f WHERE f.optimization_event_id = e.id)
|
|
AND <filters>
|
|
ORDER BY e.created_at DESC
|
|
LIMIT 10
|
|
),
|
|
data AS (
|
|
SELECT e.id, e.created_at, f.*, r.*
|
|
FROM optimization_events e
|
|
JOIN page_ids p ON e.id = p.id
|
|
LEFT JOIN optimization_features f ON ...
|
|
LEFT JOIN repositories r ON ...
|
|
)
|
|
SELECT * FROM data
|
|
```
|
|
|
|
Phase 1 uses EXISTS (index-only check, no full JOIN) to identify ~10 event IDs. Phase 2 joins only those 10 IDs with the large tables.
|
|
|
|
**Evidence:** `getOptimizationPRs` in `dashboard/action.ts` — both org and personal account paths now use this two-phase CTE structure.
|
|
|
|
## EXISTS Subqueries vs LEFT JOIN for Filtering
|
|
|
|
When filtering rows based on the existence of related data, using `LEFT JOIN` with a boolean check is often faster than `EXISTS` subqueries, especially when the subquery would be evaluated row-by-row for many candidate rows.
|
|
|
|
**Before (slow):**
|
|
```sql
|
|
SELECT id FROM candidates c
|
|
WHERE c.field IS NOT NULL
|
|
OR EXISTS (
|
|
SELECT 1 FROM related_table r
|
|
WHERE r.key = c.key AND r.field IS NOT NULL
|
|
)
|
|
```
|
|
|
|
This evaluates the EXISTS subquery once per row in candidates. If there are 10,000 candidates, that's 10,000 subquery executions.
|
|
|
|
**After (fast):**
|
|
```sql
|
|
SELECT c.id, r.field IS NOT NULL AS has_related_field
|
|
FROM candidates c
|
|
LEFT JOIN related_table r ON c.key = r.key
|
|
WHERE c.field IS NOT NULL OR r.field IS NOT NULL
|
|
```
|
|
|
|
The LEFT JOIN is evaluated once with a hash join or index seek, then the filter is applied. Much more efficient for large candidate sets.
|
|
|
|
**Evidence:** `getOptimizationPRs` in `dashboard/action.ts` — replaced EXISTS checks for `optimization_features.pull_request` with LEFT JOIN in both count and data queries, for both org and personal account paths. Expected 921ms + 1435ms → <800ms combined.
|
|
|
|
## Pre-existing Failures Masked by Test Runner
|
|
|
|
Found 3 test failures and 5 type errors that were pre-existing but not caught in previous sessions:
|
|
- Missing `_count` field in `getRepositoryById` test mock (test runner didn't fail until accessed)
|
|
- Missing `id` and `created_at` in optimization_errors select clause (TypeScript TS2339 errors when accessed in UI)
|
|
|
|
**Lesson:** Always run full test suite AND type check (`tsc --noEmit`) after each optimization session, even if individual experiments passed their guard checks.
|