Merge pull request #2424 from codeflash-ai/tessl-setup

chore: add tessl tiles, claude skills, and local settings
This commit is contained in:
Kevin Turcios 2026-02-15 00:05:28 -05:00 committed by GitHub
commit e4050920d2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
29 changed files with 1275 additions and 0 deletions

View file

@ -0,0 +1 @@
../../.tessl/tiles/codeflash/codeflash-internal-skills/skills/add-api-endpoint

View file

@ -0,0 +1 @@
../../.tessl/tiles/codeflash/codeflash-internal-skills/skills/add-language-support

View file

@ -0,0 +1 @@
../../.tessl/tiles/codeflash/codeflash-internal-skills/skills/debug-optimization-failure

View file

@ -0,0 +1 @@
../../.tessl/tiles/codeflash/codeflash-internal-skills/skills/debug-test-generation

.tessl/RULES.md Normal file
View file

@ -0,0 +1,31 @@
# Agent Rules
This file is updated when running `tessl install`. If a linked file is missing, run `tessl install` again to download any missing tiles from the registry.
## codeflash/codeflash-internal-rules — code-style
@tiles/codeflash/codeflash-internal-rules/rules/code-style.md [code-style](tiles/codeflash/codeflash-internal-rules/rules/code-style.md)
## codeflash/codeflash-internal-rules — architecture
@tiles/codeflash/codeflash-internal-rules/rules/architecture.md [architecture](tiles/codeflash/codeflash-internal-rules/rules/architecture.md)
## codeflash/codeflash-internal-rules — optimization-patterns
@tiles/codeflash/codeflash-internal-rules/rules/optimization-patterns.md [optimization-patterns](tiles/codeflash/codeflash-internal-rules/rules/optimization-patterns.md)
## codeflash/codeflash-internal-rules — git-conventions
@tiles/codeflash/codeflash-internal-rules/rules/git-conventions.md [git-conventions](tiles/codeflash/codeflash-internal-rules/rules/git-conventions.md)
## codeflash/codeflash-internal-rules — testing-rules
@tiles/codeflash/codeflash-internal-rules/rules/testing-rules.md [testing-rules](tiles/codeflash/codeflash-internal-rules/rules/testing-rules.md)
## codeflash/codeflash-internal-rules — multi-language-handlers
@tiles/codeflash/codeflash-internal-rules/rules/multi-language-handlers.md [multi-language-handlers](tiles/codeflash/codeflash-internal-rules/rules/multi-language-handlers.md)
## tessl/cli-setup — query_library_docs
@tiles/tessl/cli-setup/steering/query_library_docs.md [query_library_docs](tiles/tessl/cli-setup/steering/query_library_docs.md)

View file

@ -0,0 +1,29 @@
# AIService Endpoints
All Django-Ninja API endpoints registered in `aiservice/urls.py`.
## Endpoint Map
| Path | API | Module | Description |
|------|-----|--------|-------------|
| `/ai/optimize` | `optimize_api` | `core.shared.optimizer_router` | Main optimization — dispatches by language |
| `/ai/optimize-line-profiler` | `optimize_line_profiler_api` | `core.languages.python.optimizer.optimizer_line_profiler` | Line profiler optimization |
| `/ai/testgen` | `testgen_api` | `core.shared.testgen_router` | Test generation — dispatches by language |
| `/ai/log_features` | `features_api` | `core.log_features.log_features` | Feature logging |
| `/ai/refinement` | `refinement_api` | `core.languages.python.optimizer.refinement` | Candidate refinement |
| `/ai/explain` | `explanations_api` | `core.languages.python.explanations.explanations` | Code explanations |
| `/ai/rank` | `ranker_api` | `core.shared.ranker.ranker` | Function ranking |
| `/ai/optimization_review` | `optimization_review_api` | `core.languages.python.optimization_review.optimization_review` | Optimization review |
| `/ai/code_repair` | `code_repair_api` | `core.languages.python.code_repair.code_repair` | Code repair for failed candidates |
| `/ai/adaptive_optimize` | `adaptive_optimize_api` | `core.languages.python.adaptive_optimizer.adaptive_optimizer` | Adaptive optimization |
| `/ai/workflow-gen` | `workflow_gen_api` | `core.shared.workflow_gen.workflow_gen` | Workflow generation |
| `/ai/rewrite_jit` | `jit_rewrite_api` | `core.languages.python.jit_rewrite.jit_rewrite` | JIT rewrite |
## Common Patterns
- All endpoints are `async def`
- Authentication via `AuthenticatedRequest` (HttpBearer + django_auth)
- Request schemas use `ninja.Schema` (Pydantic under the hood)
- Response schemas define typed success and error responses: `response={200: SuccessSchema, 400: ErrorSchema, 500: ErrorSchema}`
- Multi-language endpoints (optimize, testgen) dispatch by `data.language` field
- Python-only endpoints import directly from `core.languages.python.*`

View file

@ -0,0 +1,58 @@
# CF-API Endpoints
Express routes in `js/cf-api/routes/`. The cf-api acts as middleware between clients (VSC-Extension, CLI) and the aiservice backend.
## Route Registration Order (`routes/index.ts`)
Registration order matters — webhook routes must be before the body parser:
1. **Webhook routes** — before `express.json()` (raw body for signature verification)
2. **Body parser** — `express.json({ limit: JSON_BODY_LIMIT })`
3. **Public routes** — no authentication required
4. **Protected routes** — require API key (`checkForValidAPIKey` middleware)
## Route Files
### `webhook.routes.ts`
- `POST /github/webhooks` — GitHub App webhook handler (Octokit signature verification)
- `POST /stripe/webhooks` — Stripe webhook handler
- Both need raw body access (before JSON parser)
### `optimization.routes.ts`
Protected optimization endpoints:
- `POST /suggest-pr-changes` — suggest PR changes
- `POST /create-pr` — create optimization PR
- `POST /verify-existing-optimizations` — check existing optimizations
- `POST /is-already-optimized` — check if code was already optimized
- `POST /add-code-hash` — add optimized code context hash
- `POST /mark-as-success` — mark optimization as successful
- `POST /create-staging` — create staging review
- `POST /get-staging-code` — get staged code
- `POST /commit-staging-code` — commit staged code
- `POST /test-repo` — add repository manually
### `github.routes.ts`
GitHub-related endpoints for repository management.
### `subscription.routes.ts`
Subscription management endpoints.
### `user.routes.ts`
User management endpoints.
### `public.routes.ts`
Public endpoints (no authentication): health checks, version info.
## Middleware Stack
- `checkForValidAPIKey` — API key authentication
- `trackEndpointCalls` — PostHog endpoint tracking
- `idLimiter` — rate limiting
- `logAuthEvent` / `logRequestBody` — enhanced logging (dev only)
- `trackUsage` — usage tracking for optimization endpoints

View file

@ -0,0 +1,49 @@
# Configuration & Thresholds
Key constants and configuration values used across the optimization pipeline.
## Model Distribution (`core/shared/optimizer_config.py`)
- `MAX_OPTIMIZER_CALLS = 6` — maximum parallel optimization calls
- `MAX_OPTIMIZER_LP_CALLS = 7` — maximum line profiler optimization calls
- Distribution formula: `claude_calls = (total - 1) // 2`, `gpt_calls = total - claude_calls`
- Example: 6 total → 4 OpenAI + 2 Anthropic
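A minimal sketch of that split, assuming the function clamps to the max and returns `(model, call_count)` pairs as described in the pipeline docs (the clamp and the model-name strings are assumptions):

```python
# Illustrative only; the real constants and model objects live in
# core/shared/optimizer_config.py.
MAX_OPTIMIZER_CALLS = 6

def get_model_distribution(n_candidates: int, max_calls: int = MAX_OPTIMIZER_CALLS) -> list[tuple[str, int]]:
    total = min(n_candidates, max_calls)  # clamp is an assumption
    claude_calls = (total - 1) // 2       # Claude gets the smaller share
    gpt_calls = total - claude_calls
    return [("gpt-5-mini", gpt_calls), ("claude-sonnet-4.5", claude_calls)]

# get_model_distribution(6) -> [("gpt-5-mini", 4), ("claude-sonnet-4.5", 2)]
```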
## Default Request Values
- `OptimizeSchema.n_candidates = 5` — default optimization candidates
- `OptimizeSchemaLP.n_candidates = 6` — default line profiler candidates
- `OptimizeSchema.language = "python"` — default language
## LLM Model Costs (USD per 1M tokens)
| Model | Input | Cached Input | Output |
|-------|-------|-------------|--------|
| GPT-4.1 | $2.00 | $0.50 | $8.00 |
| GPT-5-mini | $0.25 | $0.03 | $2.00 |
| Claude Sonnet 4.5 | $3.00 | — | $15.00 |
| Claude Haiku 4.5 | $1.00 | — | $5.00 |
## Model Assignments
| Purpose | Primary | Fallback |
|---------|---------|----------|
| Optimization | GPT-5-mini | Claude Sonnet 4.5 |
| Explanation | GPT-5-mini | Claude Sonnet 4.5 |
| Planning | GPT-5-mini | Claude Sonnet 4.5 |
| Execution | GPT-5-mini | Claude Sonnet 4.5 |
| Ranking | GPT-5-mini | Claude Sonnet 4.5 |
| Refinement | Claude Sonnet 4.5 | GPT-5-mini |
| Code Repair | Claude Sonnet 4.5 | GPT-5-mini |
| Explanations | Claude Sonnet 4.5 | GPT-5-mini |
| Optimization Review | Claude Sonnet 4.5 | GPT-5-mini |
| Adaptive Optimize | Claude Sonnet 4.5 | GPT-5-mini |
| Cost-effective (testgen) | Claude Haiku 4.5 | GPT-5-mini |
## Ruff Configuration (`pyproject.toml`)
- `line-length = 120`
- `select = ["ALL"]` with specific ignores (see `tool.ruff.lint.ignore`)
- `fix = true`, `show-fixes = true`
- Runtime-evaluated base classes: `pydantic.BaseModel`, `ninja.Schema`, `typing.TypedDict`
- Excluded paths: `core/languages/python/testgen/tests`, `core/languages/python/testgen/sqlalchemy`

View file

@ -0,0 +1,49 @@
# Context Extraction
How code context is extracted and prepared for LLM optimization prompts.
## Context Types
### Single-File Context (`SingleOptimizerContext`)
Used when the function to optimize lives in a single file:
- Extracts the function source code
- Collects helper functions and class definitions
- Formats as system prompt + user prompt
### Multi-File Context (`MultiOptimizerContext`)
Used when the function spans or depends on multiple files:
- Collects code from multiple source files
- Manages file-path-annotated code blocks
## `BaseOptimizerContext` (`optimizer_context.py`)
Abstract base class for all context types:
### Factory Method
`get_dynamic_context()` — dispatches to `SingleOptimizerContext` or `MultiOptimizerContext` based on the input.
### Prompt Construction
- `get_system_prompt(python_version_str)` — builds system prompt with language version
- `get_user_prompt(dependency_code, line_profiler_results)` — builds user prompt with code and optional profiler data
### LLM Response Parsing
- `extract_code_and_explanation_from_llm_res(content)` — parses markdown code blocks from LLM output, extracts code and explanation text
- `parse_and_generate_candidate_schema()` — converts extracted code into `OptimizeResponseItemSchema`
- `is_valid_code()` — validates the extracted code is syntactically correct
## Code Formatting
LLM responses use markdown code blocks with file path annotations:
````
```python:path/to/file.py
# optimized code here
```
````
The context extraction system both generates this format (for prompts) and parses it (from responses).
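A hedged sketch of parsing this format (the regex and the helper name `parse_annotated_blocks` are illustrative, not the actual `extract_code_and_explanation_from_llm_res` implementation):

```python
import re

FENCE = chr(96) * 3  # three-backtick code fence delimiter
CODE_BLOCK_RE = re.compile(
    FENCE + r"(\w+):(?P<path>\S+)\n(?P<code>.*?)" + FENCE,
    re.DOTALL,
)

def parse_annotated_blocks(content: str) -> tuple[dict[str, str], str]:
    """Return {file_path: code} plus the remaining prose as the explanation."""
    blocks = {m.group("path"): m.group("code") for m in CODE_BLOCK_RE.finditer(content)}
    explanation = CODE_BLOCK_RE.sub("", content).strip()
    return blocks, explanation
```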

View file

@ -0,0 +1,59 @@
# Domain Types
Core schemas and models used across the codeflash-internal backend.
## Request/Response Schemas (`core/shared/optimizer_models.py`)
### `OptimizedCandidateSource` enum
How a candidate was generated: `OPTIMIZE`, `OPTIMIZE_LP`, `REFINE`, `REPAIR`, `ADAPTIVE`, `JIT_REWRITE`.
### `OptimizeSchema` (ninja.Schema)
Main optimization request:
- `source_code: str` — code to optimize
- `dependency_code: str | None` — read-only dependency code
- `trace_id: str` — unique request identifier
- `language: str = "python"` — language identifier (python, javascript, typescript)
- `language_version: str | None` — e.g., "ES2022", "Node 20", or Python version
- `n_candidates: int = 5` — number of optimization candidates to generate
- `is_async: bool | None = False` — whether the function is async
- `python_version: str | None` — optional, for multi-language support
- `experiment_metadata`, `codeflash_version`, `current_username`, `repo_owner`, `repo_name` — tracking fields
### `OptimizeSchemaLP` (ninja.Schema)
Line profiler variant — same as `OptimizeSchema` plus:
- `line_profiler_results: str | None` — line profiler output
- `n_candidates: int = 6` — default higher for line profiler optimization
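A condensed sketch of the two schemas (field subset only; whether `OptimizeSchemaLP` actually inherits from `OptimizeSchema` is an assumption, see `core/shared/optimizer_models.py` for the real definitions):

```python
from ninja import Schema

class OptimizeSchema(Schema):
    source_code: str
    dependency_code: str | None = None
    trace_id: str
    language: str = "python"
    language_version: str | None = None
    n_candidates: int = 5
    is_async: bool | None = False

class OptimizeSchemaLP(OptimizeSchema):  # inheritance assumed for brevity
    line_profiler_results: str | None = None
    n_candidates: int = 6
```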
## Domain Models (`core/log_features/models.py`)
### `OptimizationFeatures` (Django Model)
Stores per-optimization data:
- `trace_id` (PK) — unique optimization identifier
- `original_code` — original source code
- `optimizations_raw` / `optimizations_post` — raw and postprocessed candidate code
- `speedup_ratio` — best speedup achieved
- `test_results` — test execution results
- Approval workflow: `approval_status`, `approved_by`, `approved_at`
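A condensed sketch of the model's shape (field types and options are assumptions; the real definitions live in `core/log_features/models.py`):

```python
from django.db import models

class OptimizationFeatures(models.Model):
    trace_id = models.CharField(primary_key=True, max_length=64)
    original_code = models.TextField()
    optimizations_raw = models.JSONField(default=list)
    optimizations_post = models.JSONField(default=list)
    speedup_ratio = models.FloatField(null=True)
    test_results = models.JSONField(null=True)
    approval_status = models.CharField(max_length=32, default="pending")
    approved_by = models.CharField(max_length=128, null=True)
    approved_at = models.DateTimeField(null=True)
```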
### `OptimizationEvents` (Django Model)
Event logging for optimization lifecycle:
- `pr_created`, `pr_merged`, `pr_closed` — PR events
- `llm_cost` — total LLM API cost
- `speedup_x` / `speedup_pct` — performance improvement metrics
### `Repositories` (Django Model)
GitHub repository metadata:
- `installation_id` — GitHub App installation ID
- `is_private` — whether the repo is private
- `optimizations_limit` / `optimizations_used` — usage tracking
## Response Schemas (`core/shared/optimizer_schemas.py`)
- `OptimizeResponseSchema` — successful optimization response with candidates
- `OptimizeErrorResponseSchema` — error response with message

View file

@ -0,0 +1,21 @@
# Codeflash Internal Documentation
CodeFlash's AI-powered code optimization backend. The aiservice (Django-Ninja) receives optimization requests from cf-api, dispatches to language-specific handlers, calls LLMs, postprocesses results, and returns optimized candidates.
## Service Flow
```
VSC-Extension / CLI → cf-api (Express, :3001) → aiservice (Django-Ninja, :8000)
cf-webapp (:3000) reads from the same PostgreSQL DB via Prisma
```
## Documentation Pages
- [Domain Types](domain-types.md) — Core schemas and domain models
- [Optimization Pipeline](optimization-pipeline.md) — End-to-end optimization flow
- [Test Generation Pipeline](test-generation-pipeline.md) — Test generation flow
- [Context Extraction](context-extraction.md) — How code context is extracted for LLM prompts
- [AIService Endpoints](aiservice-endpoints.md) — All Django-Ninja endpoints
- [CF-API Endpoints](cf-api-endpoints.md) — All Express routes
- [Configuration & Thresholds](configuration-thresholds.md) — Model distribution, costs, constants
- [LLM Provider Abstraction](llm-provider-abstraction.md) — llm.py usage, client creation, cost tracking

View file

@ -0,0 +1,63 @@
# LLM Provider Abstraction
Unified LLM interface in `aiservice/llm.py` — all LLM calls go through this module.
## Model Definition (`LLM` dataclass)
```python
@pydantic_dataclass
class LLM:
name: str # deployment name (e.g., "gpt-5-mini")
max_tokens: int # max context window
model_type: Literal["openai", "anthropic", "google"]
input_cost: float # USD per 1M tokens
cached_input_cost: float # USD per 1M cached tokens
output_cost: float # USD per 1M tokens
```
Concrete models: `OpenAI_GPT_4_1`, `OpenAI_GPT_5_Mini`, `Anthropic_Claude_Sonnet_4_5`, `Anthropic_Claude_Haiku_4_5`.
## Client Setup
- `_create_openai_client()` — returns `AsyncAzureOpenAI` (reads `AZURE_OPENAI_*` env vars)
- `_create_anthropic_client()` — returns `AsyncAnthropicFoundry` (reads `ANTHROPIC_FOUNDRY_API_KEY` + `ANTHROPIC_FOUNDRY_BASE_URL`)
- `get_llm_client(model_type)` — creates a fresh client per request to avoid event loop issues with Django dev server
## Calling LLMs
```python
from aiservice.llm import call_llm, LLM
response: LLMResponse = await call_llm(
llm=model, # LLM instance
messages=messages, # OpenAI-format messages
call_type="optimization", # tracking label
trace_id=trace_id, # request identifier
max_tokens=16384, # max output tokens
user_id=user_id, # optional tracking
)
# response.content: str
# response.usage: LLMUsage(input_tokens, output_tokens)
# response.raw_response: ChatCompletion | AnthropicMessage
```
### Provider Handling
- **OpenAI (Azure)**: Uses `client.chat.completions.create()`. GPT-5-mini uses `max_completion_tokens`, older models use `max_tokens`
- **Anthropic (Foundry)**: Extracts system prompt from messages list, passes separately via `system=` kwarg. Concatenates text blocks from response
### Observability
- Every call is recorded to database via `record_llm_call()` in the `finally` block
- Includes: trace_id, call_type, model, messages, result, error, cost, latency
## Cost Calculation
`calculate_llm_cost(response, llm)` accounts for cached vs non-cached input tokens:
- **Anthropic**: `cache_read_input_tokens` and `cache_creation_input_tokens` are additive to `input_tokens`
- **OpenAI**: `cached_tokens` is a subset of `prompt_tokens`
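A hedged sketch of that accounting (attribute names mirror the provider usage objects; the Anthropic branch applies the base input rate to cache tokens since the pricing table lists no cached rate for Claude):

```python
def estimate_llm_cost(usage, llm, provider: str) -> float:
    """Rough per-call cost in USD; llm.*_cost rates are per 1M tokens."""
    if provider == "openai":
        cached = getattr(usage.prompt_tokens_details, "cached_tokens", 0) or 0
        non_cached = usage.prompt_tokens - cached       # cached is a subset
        input_cost = non_cached * llm.input_cost + cached * llm.cached_input_cost
        output_cost = usage.completion_tokens * llm.output_cost
    else:  # anthropic: cache tokens are additive to input_tokens
        total_input = (
            usage.input_tokens
            + getattr(usage, "cache_read_input_tokens", 0)
            + getattr(usage, "cache_creation_input_tokens", 0)
        )
        input_cost = total_input * llm.input_cost       # simplification
        output_cost = usage.output_tokens * llm.output_cost
    return (input_cost + output_cost) / 1_000_000
```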
## Response Types
- `LLMResponse` — wraps `content: str`, `usage: LLMUsage`, `raw_response`
- `LLMUsage` — `input_tokens: int`, `output_tokens: int`

View file

@ -0,0 +1,70 @@
# Optimization Pipeline
End-to-end flow from optimization request to response.
## 1. Request Entry (`core/shared/optimizer_router.py`)
The `optimize_api` NinjaAPI router receives a POST request with `OptimizeSchema`. It dispatches by `data.language`:
- `"javascript"` / `"typescript"` → `core.languages.js_ts.optimizer.optimize_javascript`
- `"java"` → `core.languages.java.optimizer.optimize_java`
- Default → `core.languages.python.optimizer.optimizer.optimize_python`
All imports are lazy (inside the function body) to avoid circular dependencies.
## 2. Python Optimization (`core/languages/python/optimizer/optimizer.py`)
### `optimize_python(request, data) → (status, response)`
Entry point for Python optimization. Extracts user ID from request, builds context, and calls `optimize_python_code()`.
### `optimize_python_code_single()`
Single LLM optimization call:
1. Builds system + user prompts from `BaseOptimizerContext`
2. Calls `call_llm()` with the optimize model
3. Parses response via `ctx.extract_code_and_explanation_from_llm_res()`
4. Validates via `ctx.parse_and_generate_candidate_schema()`
5. Returns `(OptimizeResponseItemSchema, llm_cost, model_name)` or `(None, cost, model)`
### `optimize_python_code()`
Parallel optimization with multiple models:
1. Gets model distribution via `get_model_distribution(n_candidates, MAX_OPTIMIZER_CALLS)`
2. Runs parallel calls using `asyncio.TaskGroup`
3. Each call gets a `call_sequence` number for tracking
4. Collects results, costs, and model info from all parallel calls
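A minimal sketch of the fan-out (the single-call helper's signature and the aggregation are simplified; error handling and cost bookkeeping omitted):

```python
import asyncio

async def run_parallel_optimizations(ctx, data, distribution):
    """One task per planned call; distribution is a list of (model, n_calls) pairs."""
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(optimize_python_code_single(ctx, data, model, call_sequence=seq))
            for model, n_calls in distribution
            for seq in range(n_calls)
        ]
    results = [t.result() for t in tasks]  # (candidate, llm_cost, model_name) tuples
    return [cand for cand, _cost, _model in results if cand is not None]
```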
## 3. Context Extraction (`optimizer_context.py`)
- `BaseOptimizerContext.get_dynamic_context()` — factory dispatching to `SingleOptimizerContext` or `MultiOptimizerContext`
- Handles prompt construction with `get_system_prompt()` / `get_user_prompt()`
- `extract_code_and_explanation_from_llm_res()` — parses markdown code blocks from LLM output
- `parse_and_generate_candidate_schema()` — converts extracted code to response schema
## 4. Postprocessing (`postprocess.py`)
- `deduplicate_optimizations()` — AST-based dedup using `ast.parse()` + `ast.dump()`
- `equality_check()` — filters out candidates identical to original code
- Uses `libcst` for all code transformations
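A hedged sketch of the dedup and no-op check (helper names mirror the ones above; the real versions also handle parse failures and multi-file candidates):

```python
import ast

def _normalize(code: str) -> str:
    return ast.dump(ast.parse(code))

def deduplicate_optimizations(candidates: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for code in candidates:
        key = _normalize(code)
        if key not in seen:
            seen.add(key)
            unique.append(code)
    return unique

def equality_check(candidate: str, original: str) -> bool:
    """True when the candidate is a no-op rewrite of the original."""
    return _normalize(candidate) == _normalize(original)
```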
## 5. Model Distribution (`optimizer_config.py`)
- `MAX_OPTIMIZER_CALLS = 6`, `MAX_OPTIMIZER_LP_CALLS = 7`
- `get_model_distribution()` splits between OpenAI and Anthropic
- Formula: `claude_calls = (total - 1) // 2`, `gpt_calls = total - claude_calls`
- Returns `[(OPENAI_MODEL, gpt_calls), (ANTHROPIC_MODEL, claude_calls)]`
## 6. Response
Returns `(200, OptimizeResponseSchema)` with list of `OptimizeResponseItemSchema` candidates, or `(400/500, OptimizeErrorResponseSchema)` on failure.
## Key Files
| File | Role |
|------|------|
| `core/shared/optimizer_router.py` | Language dispatch |
| `core/languages/python/optimizer/optimizer.py` | Python optimization flow |
| `core/languages/python/optimizer/context_utils/optimizer_context.py` | Context & prompt management |
| `core/languages/python/optimizer/postprocess.py` | Dedup & validation |
| `core/shared/optimizer_config.py` | Model distribution |
| `core/shared/optimizer_models.py` | Request/response schemas |

View file

@ -0,0 +1,55 @@
# Test Generation Pipeline
Flow from test generation request to instrumented test output.
## 1. Request Entry (`core/shared/testgen_router.py`)
The `testgen_api` NinjaAPI router receives a POST request and dispatches by `data.language`:
- `"javascript"` / `"typescript"` → `core.languages.js_ts.testgen.generate_tests_javascript`
- `"java"` → `core.languages.java.testgen.generate_tests_java`
- Default → `core.languages.python.testgen.testgen.generate_tests_python`
## 2. Python Testgen (`core/languages/python/testgen/testgen.py`)
### `build_prompt()`
Jinja2-based prompt builder for test generation:
- Constructs system and user prompts from context
- Handles async/sync function variants
- Includes source code, dependency code, and test context
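A minimal sketch of the Jinja2 rendering step (template filename and variable names are illustrative; the real `build_prompt()` signature may differ):

```python
from pathlib import Path
from jinja2 import Template

def build_prompt(source_code: str, dependency_code: str, *, is_async: bool) -> str:
    """Render the user prompt from a .md template stored beside this module."""
    template_text = Path(__file__).with_name("user_prompt.md").read_text()
    return Template(template_text).render(
        source_code=source_code,
        dependency_code=dependency_code,
        is_async=is_async,
    )
```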
### `instrument_tests()`
Applies instrumentation to generated tests:
- Behavior instrumentation (captures return values, stdout)
- Performance instrumentation (timing loops)
### `LLMOutputParseError`
Custom exception for LLM response parsing failures — includes raw output for debugging.
## 3. Instrumentation (`testgen/instrumentation/instrument_new_tests.py`)
### Framework Detection
- `detect_frameworks_from_code()` — parses imports to identify ML frameworks (PyTorch, TensorFlow, JAX) and their aliases
- Used to add GPU sync calls in timing blocks
### Device Sync
- `_create_device_sync_precompute_statements()` — pre-computes device sync checks (CUDA, MPS) outside timing blocks
- Ensures accurate timing for GPU-accelerated code
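A hedged sketch of both pieces (the helper bodies are illustrative; `torch.cuda.synchronize()` is the real PyTorch call, the surrounding structure is an assumption):

```python
import ast

def detect_frameworks_from_code(test_code: str) -> set[str]:
    """Collect known ML framework names from the test's import statements."""
    frameworks = {"torch", "tensorflow", "jax"}
    found: set[str] = set()
    for node in ast.walk(ast.parse(test_code)):
        if isinstance(node, ast.Import):
            found |= {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & frameworks

# The kind of sync statement precomputed outside the timing loop so queued GPU
# work finishes before the timer stops (PyTorch shown as an example):
DEVICE_SYNC_SNIPPET = (
    "if torch.cuda.is_available():\n"
    "    torch.cuda.synchronize()\n"
)
```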
## 4. Postprocessing (`testgen/postprocessing/`)
- Import management: `add_missing_imports.py` adds `from __future__ import annotations`
- Test validation and cleanup
## Key Files
| File | Role |
|------|------|
| `core/shared/testgen_router.py` | Language dispatch |
| `core/languages/python/testgen/testgen.py` | Testgen flow |
| `core/languages/python/testgen/instrumentation/instrument_new_tests.py` | Test instrumentation |
| `core/languages/python/testgen/postprocessing/` | Import management, cleanup |

View file

@ -0,0 +1,7 @@
{
"name": "codeflash/codeflash-internal-docs",
"version": "0.1.0",
"summary": "Internal documentation for the codeflash-internal aiservice backend",
"private": true,
"docs": "docs/index.md"
}

View file

@ -0,0 +1,55 @@
# Architecture
## Service Flow
```
VSC-Extension / CLI → cf-api (Express, :3001) → aiservice (Django-Ninja, :8000)
cf-webapp (:3000) reads from the same PostgreSQL DB via Prisma
```
## Monorepo Layout
```
codeflash-internal/
├── django/aiservice/ # Python backend (Django-Ninja, ASGI)
│ ├── aiservice/ # Django project: settings, urls.py, llm.py
│ ├── authapp/ # Authentication (HttpBearer + django_auth)
│ ├── core/ # Business logic
│ │ ├── shared/ # Cross-language routers (optimizer_router, testgen_router, ranker)
│ │ ├── languages/ # Per-language handlers (python/, js_ts/, java/)
│ │ ├── protocols/ # Handler protocols (LanguageHandler, OptimizerProtocol, etc.)
│ │ ├── registry.py # Language handler registration
│ │ ├── dispatcher.py # Handler lookup by language + feature
│ │ └── log_features/ # Domain models (OptimizationFeatures, OptimizationEvents)
│ └── tests/ # pytest tests by feature: optimizer/, testgen/, integration/
├── js/
│ ├── cf-api/ # Express middleware layer (:3001)
│ ├── cf-webapp/ # Next.js dashboard (:3000)
│ ├── common/ # Shared Prisma schema + utilities (CommonJS)
│ └── VSC-Extension/ # VS Code extension
├── cli/ # CLI tools
└── deployment/ # Infrastructure configs
```
## Glossary
- **Optimization Candidate** — LLM-generated code that may be faster
- **Read-Write Context** — code the LLM can modify
- **Read-Only Context** — code provided as info only (not modified)
- **Tracer** — collects input args for a Python function at runtime
- **Replay Test** — reruns traced inputs to verify behavior
- **Comparator** — compares two Python objects for equality
## Key Entry Points
| Task | Start here |
|------|------------|
| Optimization dispatch | `core/shared/optimizer_router.py` |
| Testgen dispatch | `core/shared/testgen_router.py` |
| Python optimization | `core/languages/python/optimizer/optimizer.py` |
| LLM provider abstraction | `aiservice/llm.py` |
| Endpoint registration | `aiservice/urls.py` |
| Domain models | `core/log_features/models.py` |
| Request/response schemas | `core/shared/optimizer_models.py` |
| Handler registration | `core/registry.py` + `core/dispatcher.py` |
| cf-api route registration | `js/cf-api/routes/index.ts` |

View file

@ -0,0 +1,24 @@
# Code Style
## Python (aiservice)
- **Python 3.12+** — use modern syntax (type unions `X | Y`, `match` statements)
- **Line length**: 120 characters
- **Tooling**: Ruff for linting/formatting (`ruff check .`, `ruff format .`), mypy strict mode, ty for type checking, prek for pre-commit (`uv run prek run --all-files`)
- **Package management**: Always use `uv` — run commands via `uv run`
- **Comments**: Minimal — only explain "why", not "what"
- **Docstrings**: Do not add unless explicitly requested
- **Source transforms**: Use `libcst` for code modification/transformation (preserves formatting); `ast` is acceptable for read-only analysis
- **Async**: All endpoints are `async def` — runs under ASGI via uvicorn. Use `asyncio.TaskGroup` for concurrent operations
- **Schemas**: Use Pydantic `BaseModel` or `ninja.Schema` for all request/response types
- **LLM calls**: Use `aiservice/llm.py` — never call provider APIs directly
- **Prompts**: Stored as `.md` files alongside their modules, rendered with Jinja2
- **Lazy imports**: Use inside function bodies in routers to avoid circular dependencies (`# noqa: PLC0415`)
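A small libcst sketch in the preferred transform style (an illustrative rename, not code from the repo):

```python
import libcst as cst

class RenameFunction(cst.CSTTransformer):
    """Rename a function definition while preserving formatting and comments."""

    def __init__(self, old: str, new: str) -> None:
        super().__init__()
        self.old, self.new = old, new

    def leave_FunctionDef(
        self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
    ) -> cst.FunctionDef:
        if original_node.name.value == self.old:
            return updated_node.with_changes(name=cst.Name(self.new))
        return updated_node

module = cst.parse_module("def slow_fn():\n    return 1\n")
print(module.visit(RenameFunction("slow_fn", "fast_fn")).code)
```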
## JavaScript/TypeScript (cf-api, cf-webapp, VSC-Extension)
- All JS/TS packages use ESLint + Prettier. Run commands from each package directory
- **cf-api**: Express app. Webhook routes MUST be registered before body parser (raw body needed for signature verification). `instrument.ts` must be imported first in entry point (Sentry). Tests use dependency injection: `setXxxDependencies()` / `resetXxxDependencies()`
- **cf-webapp**: Next.js. Default to server components; `"use client"` only for interactivity. Server actions in `"use server"` files. Prisma queries in server components only. Path alias `@/*` → `./src/*`
- **VSC-Extension**: Different prettier config (80 width + semicolons vs 100/no-semi elsewhere). npm workspaces for local `@codeflash/*` packages. Sidebar is a separate Vite/React app embedded via webview postMessage
- **Prisma**: Schema in `common/prisma/schema.prisma`, shared by cf-api and cf-webapp. `common` is CommonJS — use `require`-style imports

View file

@ -0,0 +1,10 @@
# Git Conventions
- **Always create a new branch from `main`** — never commit directly to `main`
- Use conventional commit format: `fix:`, `feat:`, `refactor:`, `docs:`, `test:`, `chore:`
- Keep commits atomic — one logical change per commit
- Commit message body should be concise (1-2 sentences max)
- PR titles should also use conventional format
- Branch naming: `cf-#-title` (lowercase, hyphenated) where `#` is the Linear issue number
- If related to a Linear issue, include `CF-#` in the PR body
- Pre-commit: `uv run prek run --all-files` from repo root

View file

@ -0,0 +1,32 @@
# Multi-Language Handlers
## Registry Pattern
- `core/registry.py` provides `LanguageRegistry` — a dict mapping `language_id → handler_class`
- Module-level singleton: `registry = LanguageRegistry()`
- Decorator registration: `@register_handler("python")` on handler classes
- Duplicate registration raises `ValueError`
## Dispatcher
- `core/dispatcher.py` provides `get_handler_for_language()` and `get_handler_for_feature()`
- `Feature` enum maps feature names to capability flags: `TESTGEN`, `OPTIMIZER`, `CODE_REPAIR`, `JIT_REWRITE`, `OPTIMIZATION_REVIEW`, `EXPLANATIONS`
- `get_handler_for_feature()` checks `supports_*` attribute before instantiation, raises `HandlerNotImplementedError` if unsupported
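A minimal sketch of the registry/dispatcher pair (simplified; the real modules also define the `Feature` enum and richer error types):

```python
class HandlerNotImplementedError(Exception):
    pass

class LanguageRegistry:
    def __init__(self) -> None:
        self._handlers: dict[str, type] = {}

    def register(self, language_id: str, handler_cls: type) -> None:
        if language_id in self._handlers:
            raise ValueError(f"Handler already registered for {language_id!r}")
        self._handlers[language_id] = handler_cls

    def get_handler(self, language_id: str) -> type:
        return self._handlers[language_id]

registry = LanguageRegistry()

def register_handler(language_id: str):
    def decorator(cls: type) -> type:
        registry.register(language_id, cls)
        return cls
    return decorator

def get_handler_for_feature(language_id: str, feature: str):
    handler_cls = registry.get_handler(language_id)
    if not getattr(handler_cls, f"supports_{feature}", False):
        raise HandlerNotImplementedError(f"{language_id!r} does not support {feature}")
    return handler_cls()
```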
## Protocols
- `core/protocols/` defines runtime-checkable protocols:
- `LanguageHandler` (base): declares `language`, `supports_*` booleans
- `OptimizerProtocol`: `optimizer_optimize(request, data)`
- `TestGenProtocol`: `testgen_generate(request, data)`
- `CodeRepairProtocol`: `code_repair_repair(user_id, optimization_id, ctx)`
## Adding a New Language
1. Create `core/languages/<lang>/` directory
2. Implement handler class with `@register_handler("<lang>")` decorator
3. Set `supports_*` flags for implemented features
4. Implement protocol methods for each supported feature
5. Import the module in `core/languages/__init__.py` so the decorator fires
6. Add language routing in `core/shared/optimizer_router.py` and `core/shared/testgen_router.py`
7. Add tests in `tests/` for the new handler

View file

@ -0,0 +1,39 @@
# Optimization Patterns
## Router Dispatch
- Shared routers in `core/shared/` dispatch by the `language` field on the request schema
- Lazy imports inside the endpoint body to avoid circular dependencies:
```python
if data.language in ("javascript", "typescript"):
from core.languages.js_ts.optimizer import optimize_javascript # noqa: PLC0415
return await optimize_javascript(request, data)
```
- Default language is Python
## Context Extraction
- `BaseOptimizerContext` in `optimizer_context.py` handles prompt management and code extraction
- Two context types: `SingleOptimizerContext` (single-file) and `MultiOptimizerContext` (multi-file)
- `extract_code_and_explanation_from_llm_res()` parses LLM markdown response into code blocks
- `parse_and_generate_candidate_schema()` converts extracted code to `OptimizeResponseItemSchema`
## Model Distribution
- `get_model_distribution()` in `optimizer_config.py` splits calls between OpenAI and Anthropic
- Formula: `claude_calls = (total - 1) // 2`, `gpt_calls = total - claude_calls`
- `MAX_OPTIMIZER_CALLS = 6`, `MAX_OPTIMIZER_LP_CALLS = 7`
- Claude gets fewer calls as it's more expensive
## Postprocessing
- `postprocess.py` handles deduplication and validation of optimization candidates
- Deduplication: normalize with `ast.parse()` + `ast.dump()`, skip duplicates with identical AST
- Equality check: compare optimized code to original to skip no-ops
- Uses `libcst` for all code transformations (preserves formatting)
## Prompt Templates
- Prompts stored as `.md` files alongside their modules
- Rendered with Jinja2 (e.g., `build_prompt()` in testgen)
- System and user prompts are constructed per-context type

View file

@ -0,0 +1,20 @@
# Testing Rules
## Python (aiservice)
- Tests in `tests/` by feature: `optimizer/`, `testgen/`, `testgen_postprocessing/`, `testgen_instrumentation/`, `integration/`, `validators/`
- `@pytest.mark.asyncio` for async tests
- `pytest.ini` sets `DJANGO_SETTINGS_MODULE = aiservice.settings`
- `conftest.py` provides `normalize_code()` helper (AST parse/unparse for quote normalization)
- Test factories: `create_optimizer_context()`, `create_refiner_context()`
- Run tests: `uv run pytest` (all), `uv run pytest tests/path/test_file.py::test_name -v` (single)
## JS/TS (cf-api)
- Tests use dependency injection: `setXxxDependencies()` / `resetXxxDependencies()`
## PR Review
- **Always comment on**: Logic errors, security vulnerabilities, test name typos, breaking changes without migration
- **Never comment on**: Code style/formatting (linters handle it), "consider" suggestions, performance without profiling data
- Limit to 5-7 high-signal comments per review

View file

@ -0,0 +1,26 @@
{
"name": "codeflash/codeflash-internal-rules",
"version": "0.1.0",
"summary": "Coding standards and conventions for the codeflash-internal monorepo",
"private": true,
"rules": {
"code-style": {
"rules": "rules/code-style.md"
},
"architecture": {
"rules": "rules/architecture.md"
},
"optimization-patterns": {
"rules": "rules/optimization-patterns.md"
},
"git-conventions": {
"rules": "rules/git-conventions.md"
},
"testing-rules": {
"rules": "rules/testing-rules.md"
},
"multi-language-handlers": {
"rules": "rules/multi-language-handlers.md"
}
}
}

View file

@ -0,0 +1,170 @@
---
name: add-api-endpoint
description: >
Guide for adding a new API endpoint to the codeflash-internal system.
Use when adding a new endpoint, creating a route, or implementing a new API.
Covers both aiservice (Django-Ninja) and cf-api (Express) endpoints
including schemas, routers, authentication, URL registration, and tests.
---
# Add API Endpoint
Use this workflow when adding a new endpoint to either the aiservice backend or the cf-api middleware. Follow the section that matches your target service.
## Part A: AIService Endpoint (Django-Ninja)
### Step 1: Define Schemas
Create request and response schemas.
1. Create or update schemas in the appropriate module (e.g., `core/shared/` for shared, `core/languages/python/` for Python-specific)
2. Use `ninja.Schema` for all request/response types:
```python
from ninja import Schema
class MyRequestSchema(Schema):
source_code: str
trace_id: str
language: str = "python"
class MyResponseSchema(Schema):
results: list[str]
class MyErrorResponseSchema(Schema):
message: str
```
3. Follow existing patterns in `core/shared/optimizer_models.py`
**Checkpoint**: Schemas should use Pydantic validation. Test with `MyRequestSchema.model_validate(data)`.
### Step 2: Create the Router
Create a NinjaAPI router for the endpoint.
1. Create a new module (e.g., `core/shared/my_feature.py` or `core/languages/python/my_feature/my_feature.py`)
2. Define the router and endpoint:
```python
from ninja import NinjaAPI
from authapp.auth import AuthenticatedRequest
my_feature_api = NinjaAPI(urls_namespace="my_feature")
@my_feature_api.post(
"/", response={200: MyResponseSchema, 400: MyErrorResponseSchema, 500: MyErrorResponseSchema}
)
async def my_endpoint(
request: AuthenticatedRequest, data: MyRequestSchema
) -> tuple[int, MyResponseSchema | MyErrorResponseSchema]:
# Implementation here
return 200, MyResponseSchema(results=[])
```
3. All endpoints must be `async def`
4. Use `AuthenticatedRequest` for authenticated endpoints
5. For multi-language endpoints, add dispatch by `data.language` with lazy imports
**Checkpoint**: The endpoint should handle success and error cases with proper status codes.
### Step 3: Register in urls.py
Add the endpoint to `aiservice/urls.py`.
1. Import the API object:
```python
from core.shared.my_feature import my_feature_api
```
2. Add to `urlpatterns`:
```python
path("ai/my-feature", my_feature_api.urls),
```
3. Follow the naming convention: `ai/<kebab-case-name>`
**Checkpoint**: The endpoint should be accessible at `/ai/my-feature/`.
### Step 4: Add Tests
Write tests for the endpoint.
1. Create test file in `tests/` matching the source structure
2. Test the handler function directly (not via HTTP):
```python
@pytest.mark.asyncio
async def test_my_endpoint():
# Mock request and data
result = await my_endpoint(mock_request, mock_data)
assert result[0] == 200
```
3. Run: `uv run pytest tests/path/test_file.py -v`
**Checkpoint**: Tests pass. Run `uv run prek run --all-files`.
## Part B: CF-API Endpoint (Express)
### Step 1: Create Endpoint Handler
Create the endpoint handler in `js/cf-api/endpoints/`.
1. Create `js/cf-api/endpoints/my-feature.ts`:
```typescript
import { Request, Response } from "express"
export async function myFeature(req: Request, res: Response) {
// Implementation
res.status(200).json({ results: [] })
}
```
2. Follow existing patterns in `endpoints/`
**Checkpoint**: Handler should return appropriate status codes and JSON responses.
### Step 2: Create Route File or Add to Existing
Add the route to the appropriate route file.
1. If it's a new domain, create `js/cf-api/routes/my-feature.routes.ts`:
```typescript
import { Router } from "express"
import { addAsync } from "@awaitjs/express"
import { myFeature } from "../endpoints/my-feature.js"
import { ROUTES } from "../constants/index.js"
const router = addAsync(Router()) as any
router.postAsync(ROUTES.MY_FEATURE, myFeature)
export default router
```
2. If it belongs to an existing domain, add to the corresponding route file
3. Add the route constant to `constants/index.ts`
**Checkpoint**: Route file exports a router with the endpoint registered.
### Step 3: Register in Route Index
Register the route in `js/cf-api/routes/index.ts`.
1. Import the route module
2. Register in the correct section:
- Before body parser: webhook routes only
- Public routes: no auth required
- Protected routes: after `checkForValidAPIKey` middleware
3. Apply middleware as needed (`trackUsage`, `idLimiter`, etc.)
**Checkpoint**: The endpoint should be accessible with proper authentication.
### Step 4: Add Tests
Write tests using the dependency injection pattern.
1. Create test file in `js/cf-api/__tests__/`
2. Use `setXxxDependencies()` / `resetXxxDependencies()` for mocking
3. Follow existing test patterns
**Checkpoint**: Tests pass with `npm test`.
## Key Files Reference
| File | What to modify |
|------|---------------|
| `core/shared/optimizer_models.py` | Schema patterns |
| `aiservice/urls.py` | AIService endpoint registration |
| `js/cf-api/routes/index.ts` | CF-API route registration |
| `js/cf-api/constants/index.ts` | Route constants |
| `authapp/auth.py` | Authentication patterns |

View file

@ -0,0 +1,135 @@
---
name: add-language-support
description: >
Guide for adding a new programming language to the multi-language system.
Use when extending the aiservice to support a new language (e.g., Rust, Go).
Covers directory creation, handler implementation, registry registration,
protocol compliance, prompt templates, router updates, and tests.
---
# Add Language Support
Use this workflow when adding a new programming language to the codeflash-internal optimization system. Follow steps in order — each builds on the previous.
## Step 1: Create Language Directory
Create the directory structure under `core/languages/`.
1. Create `core/languages/<lang>/` with `__init__.py`
2. Create subdirectories for each feature:
```
core/languages/<lang>/
├── __init__.py # Handler class + @register_handler decorator
├── optimizer/ # Optimization handler
│ ├── __init__.py
│ └── optimizer.py
└── testgen/ # Test generation handler (if supported)
├── __init__.py
└── testgen.py
```
3. Follow existing patterns from `core/languages/python/` or `core/languages/js_ts/`
**Checkpoint**: The directory structure should match existing language modules. Verify with `ls core/languages/`.
## Step 2: Implement Handler Class
Create the handler class in `core/languages/<lang>/__init__.py`.
1. Read `core/protocols/base.py` for the `LanguageHandler` protocol
2. Implement the handler:
```python
from core.registry import register_handler
@register_handler("<lang>")
class <Lang>Handler:
language = "<lang>"
supports_testgen = False # Set True if implementing testgen
supports_optimizer = True # Set True if implementing optimizer
supports_code_repair = False
supports_jit_rewrite = False
supports_optimization_review = False
supports_explanations = False
```
3. Set `supports_*` flags for each feature you implement
4. The `@register_handler` decorator registers the class with the global registry
**Checkpoint**: After this step, `registry.list_available()` should include your language ID.
## Step 3: Register the Module
Ensure the handler module is imported so the decorator fires.
1. Read `core/languages/__init__.py` — check how existing languages are imported
2. Add your language import so `@register_handler` fires on startup
3. Import should be at module level or via lazy import pattern
**Checkpoint**: Run `python -c "from core.languages import <lang>"` to verify no import errors.
## Step 4: Implement Protocol Methods
Implement the protocol methods for each supported feature.
1. Read `core/protocols/optimizer.py` — `OptimizerProtocol` requires `optimizer_optimize(request, data)`
2. Read `core/protocols/testgen.py` — `TestGenProtocol` requires `testgen_generate(request, data)`
3. Read `core/protocols/code_repair.py` — `CodeRepairProtocol` requires `code_repair_repair(user_id, optimization_id, ctx)`
4. Follow the pattern from Python/JS handlers:
```python
async def optimizer_optimize(self, request, data):
# 1. Build context from data.source_code
# 2. Call call_llm() with language-specific prompts
# 3. Parse response
# 4. Return (status_code, response_schema)
```
**Checkpoint**: Each method must be `async def` and return the expected response type.
## Step 5: Create Prompt Templates
Create language-specific prompt templates.
1. Create `.md` files alongside the handler modules (e.g., `optimizer/system_prompt.md`)
2. Use Jinja2 templating for dynamic content
3. Prompts should include language-specific context (version, conventions, stdlib)
4. Follow the pattern from `core/languages/python/optimizer/context_utils/`
**Checkpoint**: Prompts should produce valid, non-empty system and user messages.
## Step 6: Update Routers
Add language dispatch to the shared routers.
1. Edit `core/shared/optimizer_router.py` — add a branch for your language:
```python
if data.language == "<lang>":
from core.languages.<lang>.optimizer import optimize_<lang> # noqa: PLC0415
return await optimize_<lang>(request, data)
```
2. Edit `core/shared/testgen_router.py` — add the same pattern if testgen is supported
3. Use lazy imports (inside the function body) to avoid circular dependencies
**Checkpoint**: Both routers should dispatch correctly for the new language.
## Step 7: Add Tests
Write tests for the new language handler.
1. Create `tests/<lang>/` directory mirroring the source structure
2. Test handler registration: verify `registry.get_handler("<lang>")` returns the correct class
3. Test feature dispatch: verify `get_handler_for_feature("<lang>", "optimizer")` works
4. Test optimization flow end-to-end (mock LLM calls)
5. Use `@pytest.mark.asyncio` for async tests
6. Run: `uv run pytest tests/<lang>/ -v`
**Checkpoint**: All tests must pass. Run `uv run prek run --all-files` for formatting/lint.
## Key Files Reference
| File | What to modify |
|------|---------------|
| `core/languages/<lang>/` | New handler directory |
| `core/registry.py` | No changes needed (decorator handles it) |
| `core/dispatcher.py` | No changes needed (dynamic lookup) |
| `core/protocols/` | Reference for protocol methods |
| `core/shared/optimizer_router.py` | Add language dispatch branch |
| `core/shared/testgen_router.py` | Add language dispatch branch (if testgen) |
| `core/languages/__init__.py` | Add import for decorator registration |

View file

@ -0,0 +1,107 @@
---
name: debug-optimization-failure
description: >
Diagnose why an optimization produced no results or failed silently.
Use when an optimization request returns errors, empty results, or all
candidates are rejected. Walks through request validation, router dispatch,
context extraction, LLM calls, postprocessing, and logging stages.
---
# Debug Optimization Failure
Use this workflow when an optimization request fails or produces no results. Work through the stages sequentially — stop at the first failure found.
## Step 1: Validate the Request
Check that the incoming `OptimizeSchema` is well-formed.
1. Read `core/shared/optimizer_models.py` — verify the request matches `OptimizeSchema` fields
2. Check required fields: `source_code`, `trace_id` must be non-empty
3. Check `language` field — must be `"python"`, `"javascript"`, `"typescript"`, or `"java"`
4. Check `n_candidates` — default is 5, must be positive
**Checkpoint**: If the request schema is invalid, the error comes from Pydantic validation. Check the 400 response for field-level errors.
## Step 2: Check Router Dispatch
Verify the correct language handler is invoked.
1. Read `core/shared/optimizer_router.py` — the `optimize()` endpoint dispatches by `data.language`
2. Supported routes:
- `"javascript"` / `"typescript"` → `core.languages.js_ts.optimizer.optimize_javascript`
- `"java"` → `core.languages.java.optimizer.optimize_java`
- Default → `core.languages.python.optimizer.optimizer.optimize_python`
3. Check for import errors — lazy imports inside the function body may fail if a language module is missing
**Checkpoint**: If dispatch fails, you'll see an `ImportError`. Check that the language module exists under `core/languages/`.
## Step 3: Check Context Extraction
Verify the optimization context is built correctly.
1. Read `core/languages/python/optimizer/context_utils/optimizer_context.py`
2. `BaseOptimizerContext.get_dynamic_context()` dispatches to Single or Multi context
3. Check `get_system_prompt()` and `get_user_prompt()` — they should produce non-empty prompts
4. Check `extract_code_and_explanation_from_llm_res()` — this parses markdown code blocks from the LLM response
**Checkpoint**: If context extraction returns empty prompts, check that `source_code` in the request is valid Python/JS code.
## Step 4: Check LLM Calls
Verify the LLM is called correctly and returns valid responses.
1. Read `aiservice/llm.py` — `call_llm()` is the universal call handler
2. Check `get_llm_client(model_type)` returns a valid client (not `None`)
3. Environment variables required:
- OpenAI: `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`
- Anthropic: `ANTHROPIC_FOUNDRY_API_KEY`, `ANTHROPIC_FOUNDRY_BASE_URL`
4. Check `optimizer_config.py` — `get_model_distribution()` determines how many calls per model
5. Look for exceptions: `"LLM client for model type '...' is not available"`
**Checkpoint**: If LLM calls fail, check environment variables and API key validity. Network errors will raise exceptions.
## Step 5: Check Postprocessing
Verify candidates survive postprocessing.
1. Read `core/languages/python/optimizer/postprocess.py`
2. `deduplicate_optimizations()` — removes candidates with identical AST (via `ast.parse()` + `ast.dump()`)
3. `equality_check()` — removes candidates identical to the original code
4. Check if ALL candidates were deduplicated or matched the original
**Checkpoint**: If all candidates are removed by postprocessing, the LLM is generating identical or no-op code. Try increasing `n_candidates` or checking prompt quality.
## Step 6: Check Response Construction
Verify the response is properly constructed.
1. Each successful candidate produces an `OptimizeResponseItemSchema`
2. `parse_and_generate_candidate_schema()` converts extracted code to the schema
3. `is_valid_code()` validates syntax — `cst.ParserSyntaxError` or `ValidationError` means malformed output
4. If parsing fails, the candidate is dropped and a Sentry message is captured
**Checkpoint**: If candidates parse but the response is empty, check the validation step in the optimizer flow.
## Step 7: Check Logging
Verify the optimization was logged for debugging.
1. Read `core/log_features/models.py` — `OptimizationFeatures` stores per-trace-id data
2. Check `optimizations_raw` (before postprocessing) vs `optimizations_post` (after)
3. LLM calls are recorded via `record_llm_call()` in the `finally` block of `call_llm()`
4. PostHog events track `aiservice-optimize-openai-usage`
**Checkpoint**: If logging shows raw candidates but no post candidates, postprocessing removed them all.
## Key Files Reference
| File | What to check |
|------|---------------|
| `core/shared/optimizer_router.py` | Language dispatch |
| `core/shared/optimizer_models.py` | Request validation |
| `core/languages/python/optimizer/optimizer.py` | Optimization flow |
| `core/languages/python/optimizer/context_utils/optimizer_context.py` | Context extraction |
| `core/languages/python/optimizer/postprocess.py` | Dedup and validation |
| `aiservice/llm.py` | LLM calls and client setup |
| `core/shared/optimizer_config.py` | Model distribution |
| `core/log_features/models.py` | Logging and tracking |

View file

@ -0,0 +1,103 @@
---
name: debug-test-generation
description: >
Diagnose why test generation failed or produced invalid tests.
Use when testgen returns errors, empty results, or produces tests
that fail to compile or run. Walks through request validation,
router dispatch, context building, prompt construction, LLM calls,
postprocessing, instrumentation, and output validation.
---
# Debug Test Generation
Use this workflow when test generation fails or produces invalid tests. Work through the stages sequentially — stop at the first failure found.
## Step 1: Validate the Request
Check that the incoming testgen request is well-formed.
1. Read the testgen request schema in the relevant testgen module
2. Verify required fields: `source_code`, `trace_id` must be non-empty
3. Check `language` field — must match a supported language
4. Check for valid code — source code should parse without syntax errors
**Checkpoint**: If the request schema is invalid, the error comes from Pydantic validation. Check the 400 response.
## Step 2: Check Router Dispatch
Verify the correct language handler is invoked.
1. Read `core/shared/testgen_router.py` — the `testgen()` endpoint dispatches by `data.language`
2. Supported routes:
- `"javascript"` / `"typescript"` → `core.languages.js_ts.testgen.generate_tests_javascript`
- `"java"` → `core.languages.java.testgen.generate_tests_java`
- Default → `core.languages.python.testgen.testgen.generate_tests_python`
3. Check for `ImportError` — lazy imports may fail if a language module is broken
**Checkpoint**: If dispatch fails, check that the language module exists and imports cleanly.
## Step 3: Check Context Building
Verify the testgen context is constructed correctly.
1. Read `core/languages/python/testgen/testgen.py` — `build_prompt()` constructs the prompts
2. Check that source code and dependency code are passed correctly
3. Verify the Jinja2 template renders without errors
4. Check for async/sync variants — the prompt builder handles both
**Checkpoint**: If context is empty or malformed, check the input `source_code` and `dependency_code`.
## Step 4: Check Prompt Construction
Verify the LLM prompts are well-formed.
1. The `build_prompt()` function uses Jinja2 templates (`.md` files alongside the module)
2. System prompt sets the role and language context
3. User prompt includes the source code and test context
4. Check that prompts are non-empty and contain the function to test
**Checkpoint**: If prompts are empty, check the template files and Jinja2 rendering.
## Step 5: Check LLM Response
Verify the LLM returns valid test code.
1. Read `aiservice/llm.py` — `call_llm()` handles the API call
2. Check for network errors or API key issues (same as optimization debugging)
3. Look for `LLMOutputParseError` — this means the LLM returned unparseable output
4. Check the raw response content — it should contain markdown code blocks with test code
**Checkpoint**: If the LLM returns malformed output, check the prompt quality and model selection.
## Step 6: Check Postprocessing
Verify generated tests survive postprocessing.
1. Read `core/languages/python/testgen/postprocessing/` directory
2. `add_missing_imports.py` — adds `from __future__ import annotations` if needed
3. Check for syntax errors in generated test code — `cst.ParserSyntaxError` means malformed code
4. Verify imports are resolved correctly
**Checkpoint**: If postprocessing fails, the LLM generated syntactically invalid code. Check the raw output.
## Step 7: Check Instrumentation
Verify tests are properly instrumented.
1. Read `core/languages/python/testgen/instrumentation/instrument_new_tests.py`
2. `instrument_tests()` applies behavior and performance instrumentation
3. `detect_frameworks_from_code()` identifies ML frameworks (PyTorch, TensorFlow, JAX)
4. `_create_device_sync_precompute_statements()` adds GPU sync calls for timing accuracy
5. Check that instrumented tests still compile — instrumentation may introduce syntax errors
**Checkpoint**: If instrumented tests fail to compile, check the instrumentation transforms. The issue is usually in import handling or device sync injection.
## Key Files Reference
| File | What to check |
|------|---------------|
| `core/shared/testgen_router.py` | Language dispatch |
| `core/languages/python/testgen/testgen.py` | Testgen flow, prompt building |
| `core/languages/python/testgen/postprocessing/` | Import management, cleanup |
| `core/languages/python/testgen/instrumentation/instrument_new_tests.py` | Test instrumentation |
| `aiservice/llm.py` | LLM calls and client setup |

View file

@ -0,0 +1,20 @@
{
"name": "codeflash/codeflash-internal-skills",
"version": "0.1.0",
"summary": "Procedural workflows for developing and debugging codeflash-internal",
"private": true,
"skills": {
"debug-optimization-failure": {
"path": "skills/debug-optimization-failure/SKILL.md"
},
"add-language-support": {
"path": "skills/add-language-support/SKILL.md"
},
"add-api-endpoint": {
"path": "skills/add-api-endpoint/SKILL.md"
},
"debug-test-generation": {
"path": "skills/debug-test-generation/SKILL.md"
}
}
}

View file

@ -0,0 +1,28 @@
You are a coding agent with access to an MCP tool called `query_library_docs` provided by `tessl`. This tool allows you to query documentation that may be relevant to your task.
Before you begin working on the user request, you should use the `query_library_docs` tool to search for relevant documentation. This is especially important when:
- The request mentions specific code files, functions, classes, or services
- The request involves debugging errors or investigating issues
- The request asks about how something works or why something behaves a certain way
- The request involves making changes to existing code
- The request references specific frameworks, libraries, or systems
When in doubt, use the tool. It's better to query documentation even if it might not be relevant than to miss important context.
How to use the tool:
1. Extract key terms, file names, service names, error types, or concepts from the user's request
2. Call `query_library_docs` with relevant search terms
3. Use the returned documentation to inform your approach to the task
4. Proceed with completing the user's request using both the documentation context and your own analysis
Important rules:
- Do NOT ask for permission to use the tool - just use it
- Do NOT explain that you're going to use the tool - just use it
- If the tool fails or returns no results, simply continue with the task as you normally would
- You may call the tool multiple times with different search terms if needed
- Use the tool BEFORE reading files or making changes
After using the tool (or if the tool returns nothing useful), proceed directly to completing the user's request. Your response should focus on addressing the user's needs, incorporating any relevant documentation you found.

View file

@ -0,0 +1,11 @@
{
"name": "tessl/cli-setup",
"private": false,
"version": "0.70.0",
"summary": "Tessl CLI MCP tool usage guidelines",
"steering": {
"query_library_docs": {
"rules": "steering/query_library_docs.md"
}
}
}