mirror of
https://github.com/codeflash-ai/codeflash.git
synced 2026-05-04 18:25:17 +00:00
feat: add private tessl tiles for codeflash rules, docs, and skills
Three private tiles in the codeflash workspace:
- codeflash-rules: 6 steering rules (code-style, architecture, optimization-patterns, git-conventions, testing-rules, language-rules)
- codeflash-docs: 7 doc pages (domain-types, optimization-pipeline, context-extraction, verification, ai-service, configuration)
- codeflash-skills: 2 skills (debug-optimization-failure, add-codeflash-feature)
This commit is contained in:
parent
90601c3324
commit
6718e66582
20 changed files with 965 additions and 0 deletions
@ -33,3 +33,5 @@ Discovery → Ranking → Context Extraction → Test Gen + Optimization → Bas

# Agent Rules <!-- tessl-managed -->

@.tessl/RULES.md follow the [instructions](.tessl/RULES.md)

@AGENTS.md
@ -63,6 +63,15 @@

    },
    "tessl/pypi-filelock": {
      "version": "3.19.0"
    },
    "codeflash/codeflash-rules": {
      "version": "0.1.0"
    },
    "codeflash/codeflash-docs": {
      "version": "0.1.0"
    },
    "codeflash/codeflash-skills": {
      "version": "0.1.0"
    }
  }
}
108 tiles/codeflash-docs/docs/ai-service.md Normal file

@ -0,0 +1,108 @@
# AI Service

How codeflash communicates with the AI optimization backend.

## `AiServiceClient` (`api/aiservice.py`)

The client connects to the AI service at `https://app.codeflash.ai` (or `http://localhost:8000` when `CODEFLASH_AIS_SERVER=local`).

Authentication uses a Bearer token from `get_codeflash_api_key()`. All requests go through `make_ai_service_request()`, which handles JSON serialization via a Pydantic encoder.

Timeout: 90s for production, 300s for local.
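The request shape can be sketched as follows. `build_request` and the exact header names are illustrative, not codeflash's actual helper; only the base URL, Bearer auth, JSON body, and POST semantics are taken from the text above.

```python
import json
import urllib.request

AI_SERVICE_URL = "https://app.codeflash.ai"  # http://localhost:8000 when CODEFLASH_AIS_SERVER=local
TIMEOUT_SECONDS = 90  # 300 for local, per the docs above

def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build an authenticated JSON POST request for an AI service endpoint.

    Hypothetical helper for illustration; the real client wraps this in
    make_ai_service_request() with Pydantic serialization."""
    return urllib.request.Request(
        url=AI_SERVICE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```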
## Endpoints

### `/ai/optimize` — Generate Candidates

Method: `optimize_code()`

Sends source code plus dependency context to generate optimization candidates.

Payload:
- `source_code` — The read-writable code (markdown format)
- `dependency_code` — Read-only context code
- `trace_id` — Unique trace ID for the optimization run
- `language` — `"python"`, `"javascript"`, or `"typescript"`
- `n_candidates` — Number of candidates to generate (controlled by effort level)
- `is_async` — Whether the function is async
- `is_numerical_code` — Whether the code is numerical (affects optimization strategy)

Returns: `list[OptimizedCandidate]` with `source=OptimizedCandidateSource.OPTIMIZE`
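Assembled from the field list above, an illustrative payload might look like this; all literal values are invented for the example.

```python
# Illustrative /ai/optimize payload. Field names come from the docs above;
# every value here is made up for the example.
payload = {
    "source_code": "def total(xs):\n    return sum(xs)",  # markdown-formatted in the real client
    "dependency_code": "",
    "trace_id": "trace-0001",          # hypothetical trace ID
    "language": "python",
    "n_candidates": 5,                 # MEDIUM effort, per the Configuration page
    "is_async": False,
    "is_numerical_code": False,
}
```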
### `/ai/optimize_line_profiler` — Line-Profiler-Guided Candidates

Method: `optimize_python_code_line_profiler()`

Like `/optimize`, but includes `line_profiler_results` to guide the LLM toward hot lines.

Returns: candidates with `source=OptimizedCandidateSource.OPTIMIZE_LP`

### `/ai/refine` — Refine Existing Candidate

Method: `refine_code()`

Request type: `AIServiceRefinerRequest`

Sends an existing candidate with runtime data and line profiler results to generate an improved version.

Key fields:
- `original_source_code` / `optimized_source_code` — Before and after
- `original_code_runtime` / `optimized_code_runtime` — Timing data
- `speedup` — Current speedup ratio
- `original_line_profiler_results` / `optimized_line_profiler_results`

Returns: candidates with `source=OptimizedCandidateSource.REFINE` and `parent_id` set to the refined candidate's ID
### `/ai/repair` — Fix Failed Candidate

Method: `repair_code()`

Request type: `AIServiceCodeRepairRequest`

Sends a failed candidate with test diffs showing what went wrong.

Key fields:
- `original_source_code` / `modified_source_code`
- `test_diffs: list[TestDiff]` — Each with `scope` (return_value/stdout/did_pass), original vs candidate values, and test source code

Returns: candidates with `source=OptimizedCandidateSource.REPAIR` and `parent_id` set

### `/ai/adaptive_optimize` — Multi-Candidate Adaptive

Method: `adaptive_optimize()`

Request type: `AIServiceAdaptiveOptimizeRequest`

Sends multiple previous candidates with their speedups for the LLM to learn from and generate better candidates.

Key fields:
- `candidates: list[AdaptiveOptimizedCandidate]` — Previous candidates with source code, explanation, source type, and speedup

Returns: candidates with `source=OptimizedCandidateSource.ADAPTIVE`

### `/ai/rewrite_jit` — JIT Rewrite

Method: `get_jit_rewritten_code()`

Rewrites code to use JIT compilation (e.g., Numba).

Returns: candidates with `source=OptimizedCandidateSource.JIT_REWRITE`
## Candidate Parsing

All endpoints return JSON with an `optimizations` array. Each entry has:
- `source_code` — Markdown-formatted code blocks
- `explanation` — LLM explanation
- `optimization_id` — Unique ID
- `parent_id` — Optional parent reference
- `model` — Which LLM model was used

`_get_valid_candidates()` parses the markdown code via `CodeStringsMarkdown.parse_markdown_code()` and filters out entries with empty code blocks.
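A minimal sketch of that parse-and-filter step, assuming fences of the form described on the Domain Types page (```` ```lang:path ````); `parse_blocks` is a hypothetical stand-in for the real `parse_markdown_code()`:

```python
import re

# Matches fenced blocks of the assumed form ```lang:path\ncode\n```
FENCE = re.compile(r"```(\w+):([^\n]+)\n(.*?)```", re.DOTALL)

def parse_blocks(markdown: str) -> list[tuple[str, str]]:
    """Return (file_path, code) pairs, dropping blocks whose code is empty —
    the same filtering _get_valid_candidates() is described as doing."""
    blocks = []
    for m in FENCE.finditer(markdown):
        code = m.group(3).strip()
        if code:
            blocks.append((m.group(2), code))
    return blocks
```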
## `LocalAiServiceClient`

Used when `CODEFLASH_EXPERIMENT_ID` is set. Mirrors `AiServiceClient` but sends requests to a separate experimental endpoint for A/B testing optimization strategies.

## LLM Call Sequencing

`AiServiceClient` tracks the call sequence via `llm_call_counter` (an `itertools.count`). Each request includes a `call_sequence` number, which the backend uses to maintain conversation context across multiple calls for the same function.
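The sequencing mechanism amounts to something like the following sketch; the counter's start value and the method name are assumptions.

```python
import itertools

class SequencedClient:
    """Sketch: attach a monotonically increasing call_sequence to each request
    so the backend can order all LLM calls made for one function."""

    def __init__(self) -> None:
        self.llm_call_counter = itertools.count(1)  # start value assumed

    def with_sequence(self, body: dict) -> dict:
        return {**body, "call_sequence": next(self.llm_call_counter)}
```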
79 tiles/codeflash-docs/docs/configuration.md Normal file

@ -0,0 +1,79 @@
# Configuration

Key configuration constants, effort levels, and thresholds.

## Constants (`code_utils/config_consts.py`)

### Test Execution

| Constant | Value | Description |
|----------|-------|-------------|
| `MAX_TEST_RUN_ITERATIONS` | 5 | Maximum test loop iterations |
| `INDIVIDUAL_TESTCASE_TIMEOUT` | 15s | Timeout per individual test case |
| `MAX_FUNCTION_TEST_SECONDS` | 60s | Max total time for function testing |
| `MAX_TEST_FUNCTION_RUNS` | 50 | Max test function executions |
| `MAX_CUMULATIVE_TEST_RUNTIME_NANOSECONDS` | 100ms | Max cumulative test runtime |
| `TOTAL_LOOPING_TIME` | 10s | Candidate benchmarking budget |
| `MIN_TESTCASE_PASSED_THRESHOLD` | 6 | Minimum test cases that must pass |
### Performance Thresholds

| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_IMPROVEMENT_THRESHOLD` | 0.05 (5%) | Minimum speedup to accept a candidate |
| `MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD` | 0.10 (10%) | Minimum async throughput improvement |
| `MIN_CONCURRENCY_IMPROVEMENT_THRESHOLD` | 0.20 (20%) | Minimum concurrency ratio improvement |
| `COVERAGE_THRESHOLD` | 60.0% | Minimum test coverage |

### Stability Thresholds

| Constant | Value | Description |
|----------|-------|-------------|
| `STABILITY_WINDOW_SIZE` | 0.35 | 35% of total iteration window |
| `STABILITY_CENTER_TOLERANCE` | 0.0025 | ±0.25% around median |
| `STABILITY_SPREAD_TOLERANCE` | 0.0025 | 0.25% window spread |
### Context Limits

| Constant | Value | Description |
|----------|-------|-------------|
| `OPTIMIZATION_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for optimization context |
| `TESTGEN_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for test generation context |
| `MAX_CONTEXT_LEN_REVIEW` | 1000 | Max context length for optimization review |

### Other

| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_CORRECT_CANDIDATES` | 2 | Min correct candidates before skipping repair |
| `REPEAT_OPTIMIZATION_PROBABILITY` | 0.1 | Probability of re-optimizing a function |
| `DEFAULT_IMPORTANCE_THRESHOLD` | 0.001 | Minimum addressable time to consider a function |
| `CONCURRENCY_FACTOR` | 10 | Number of concurrent executions for the concurrency benchmark |
| `REFINED_CANDIDATE_RANKING_WEIGHTS` | (2, 1) | (runtime, diff) weights — runtime weighted 2x over diff |
## Effort Levels

`EffortLevel` enum: `LOW`, `MEDIUM`, `HIGH`

Effort controls the number of candidates, repairs, and refinements:

| Key | LOW | MEDIUM | HIGH |
|-----|-----|--------|------|
| `N_OPTIMIZER_CANDIDATES` | 3 | 5 | 6 |
| `N_OPTIMIZER_LP_CANDIDATES` | 4 | 6 | 7 |
| `N_GENERATED_TESTS` | 2 | 2 | 2 |
| `MAX_CODE_REPAIRS_PER_TRACE` | 2 | 3 | 5 |
| `REPAIR_UNMATCHED_PERCENTAGE_LIMIT` | 0.2 | 0.3 | 0.4 |
| `TOP_VALID_CANDIDATES_FOR_REFINEMENT` | 2 | 3 | 4 |
| `ADAPTIVE_OPTIMIZATION_THRESHOLD` | 0 | 0 | 2 |
| `MAX_ADAPTIVE_OPTIMIZATIONS_PER_TRACE` | 0 | 0 | 4 |

Use `get_effort_value(EffortKeys.KEY, effort_level)` to retrieve values.
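The lookup can be pictured as a table keyed by effort level. This re-creation uses plain strings instead of the real `EffortKeys` enum and copies three rows from the table above:

```python
from enum import Enum

class EffortLevel(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2

# Hypothetical re-creation of the lookup; the real table lives in codeflash.
_EFFORT_TABLE = {
    "N_OPTIMIZER_CANDIDATES": (3, 5, 6),
    "N_OPTIMIZER_LP_CANDIDATES": (4, 6, 7),
    "MAX_CODE_REPAIRS_PER_TRACE": (2, 3, 5),
}

def get_effort_value(key: str, effort: EffortLevel) -> int:
    """Return the configured value for (key, effort level)."""
    return _EFFORT_TABLE[key][effort.value]
```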
## Project Configuration

Configuration is read from `pyproject.toml` under `[tool.codeflash]`. Key settings are auto-detected by `setup/detector.py`:
- `module-root` — Root of the module to optimize
- `tests-root` — Root of test files
- `test-framework` — pytest, unittest, jest, etc.
- `formatter-cmds` — Code formatting commands
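An illustrative `[tool.codeflash]` section using the keys listed above; the paths, framework, and formatter command are example values, and the exact value shapes are assumptions.

```toml
# Illustrative pyproject.toml fragment — all values are examples.
[tool.codeflash]
module-root = "src/mypackage"
tests-root = "tests"
test-framework = "pytest"
formatter-cmds = ["black $file"]
```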
60 tiles/codeflash-docs/docs/context-extraction.md Normal file

@ -0,0 +1,60 @@
# Context Extraction

How codeflash extracts and limits code context for optimization and test generation.

## Overview

Context extraction (`context/code_context_extractor.py`) builds a `CodeOptimizationContext` containing all the code the LLM needs to understand and optimize a function, split into:

- **Read-writable code** (`CodeContextType.READ_WRITABLE`): The function being optimized plus its helper functions — code the LLM is allowed to modify
- **Read-only context** (`CodeContextType.READ_ONLY`): Dependency code for reference — imports, type definitions, base classes
- **Testgen context** (`CodeContextType.TESTGEN`): Context for test generation; may include imported class definitions and external base class inits
- **Hashing context** (`CodeContextType.HASHING`): Used to deduplicate optimization runs

## Token Limits

Both optimization and test generation contexts are token-limited:
- `OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000` tokens
- `TESTGEN_CONTEXT_TOKEN_LIMIT = 16000` tokens

Token counting uses `encoded_tokens_len()` from `code_utils/code_utils.py`. Functions whose context exceeds these limits are skipped.
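The skip rule reduces to a simple comparison. In this sketch, `encoded_tokens_len` is approximated by a whitespace split; the real implementation uses a proper tokenizer.

```python
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000

def encoded_tokens_len(code: str) -> int:
    """Crude token count stand-in; the real function uses an encoder."""
    return len(code.split())

def within_context_limit(code: str) -> bool:
    """Functions whose context exceeds the limit are skipped."""
    return encoded_tokens_len(code) <= OPTIMIZATION_CONTEXT_TOKEN_LIMIT
```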
## Context Building Process

### 1. Helper Discovery

For the target function (`FunctionToOptimize`), the extractor finds:
- **Helpers of the function**: Functions/classes in the same file that the target function calls
- **Helpers of helpers**: Transitive dependencies of the helper functions

These are organized as `dict[Path, set[FunctionSource]]` — mapping file paths to the set of helper functions found in each file.
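The shape of that helper index can be sketched as follows, with `FunctionSource` simplified to a qualified-name string for illustration:

```python
from pathlib import Path

# file path -> names of helper functions found in that file
helpers: dict[Path, set[str]] = {}

def add_helper(path: Path, qualified_name: str) -> None:
    """Record a discovered helper under the file it was found in."""
    helpers.setdefault(path, set()).add(qualified_name)

add_helper(Path("src/utils.py"), "normalize")
add_helper(Path("src/utils.py"), "validate")
```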
### 2. Code Extraction

`extract_code_markdown_context_from_files()` builds a `CodeStringsMarkdown` from the helper dictionaries. Each file's relevant code is extracted as a `CodeString` with its file path.

### 3. Testgen Context Enrichment

`build_testgen_context()` extends the basic context with:
- Imported class definitions (resolved from imports)
- External base class `__init__` methods
- External class `__init__` methods referenced in the context

### 4. Unused Definition Removal

`detect_unused_helper_functions()` and `remove_unused_definitions_by_function_names()` from `context/unused_definition_remover.py` prune definitions that are not transitively reachable from the target function, reducing token usage.

### 5. Deduplication

The hashing context (`hashing_code_context`) generates a hash (`hashing_code_context_hash`) used to detect when the same function context has already been optimized in a previous run, avoiding redundant work.
## Key Functions

| Function | Location | Purpose |
|----------|----------|---------|
| `build_testgen_context()` | `context/code_context_extractor.py` | Build enriched testgen context |
| `extract_code_markdown_context_from_files()` | `context/code_context_extractor.py` | Convert helper dicts to `CodeStringsMarkdown` |
| `detect_unused_helper_functions()` | `context/unused_definition_remover.py` | Find unused definitions |
| `remove_unused_definitions_by_function_names()` | `context/unused_definition_remover.py` | Remove unused definitions |
| `collect_top_level_defs_with_usages()` | `context/unused_definition_remover.py` | Analyze definition usage |
| `encoded_tokens_len()` | `code_utils/code_utils.py` | Count tokens in code |
153 tiles/codeflash-docs/docs/domain-types.md Normal file

@ -0,0 +1,153 @@
# Domain Types

Core data types used throughout the codeflash optimization pipeline.

## Function Representation

### `FunctionToOptimize` (`models/function_types.py`)

The canonical dataclass representing a function candidate for optimization. Works across Python, JavaScript, and TypeScript.

Key fields:
- `function_name: str` — The function name
- `file_path: Path` — Absolute path of the file where the function is defined
- `parents: list[FunctionParent]` — Parent scopes (classes/functions), each with `name` and `type`
- `starting_line` / `ending_line: Optional[int]` — Line range (1-indexed)
- `is_async: bool` — Whether the function is async
- `is_method: bool` — Whether it belongs to a class
- `language: str` — Programming language (default: `"python"`)

Key properties:
- `qualified_name` — Full dotted name including parent classes (e.g., `MyClass.my_method`)
- `top_level_parent_name` — Name of the outermost parent, or the function name if there are no parents
- `class_name` — Immediate parent class name, or `None`
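The property logic can be sketched from the documented fields alone; this is a simplified stand-in, not codeflash's actual class.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

@dataclass
class FunctionParent:
    name: str   # e.g. "MyClass"
    type: str   # e.g. "ClassDef"

@dataclass
class FunctionToOptimize:
    function_name: str
    file_path: Path
    parents: list = field(default_factory=list)

    @property
    def qualified_name(self) -> str:
        """Dotted name including parent scopes."""
        return ".".join([p.name for p in self.parents] + [self.function_name])

    @property
    def class_name(self) -> Optional[str]:
        """Immediate parent class name, or None for top-level functions."""
        return self.parents[-1].name if self.parents else None
```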
### `FunctionParent` (`models/function_types.py`)

Represents a parent scope: `name: str` (e.g., `"MyClass"`) and `type: str` (e.g., `"ClassDef"`).

### `FunctionSource` (`models/models.py`)

Represents a resolved function with source code. Used for helper functions in context extraction.

Fields: `file_path`, `qualified_name`, `fully_qualified_name`, `only_function_name`, `source_code`, `jedi_definition`.
## Code Representation

### `CodeString` (`models/models.py`)

A single code block with validated syntax:
- `code: str` — The source code
- `file_path: Optional[Path]` — Origin file path
- `language: str` — Language for validation (default: `"python"`)

Validates syntax on construction via a `model_validator`.

### `CodeStringsMarkdown` (`models/models.py`)

A collection of `CodeString` blocks — the primary format for passing code through the pipeline.

Key properties:
- `.flat` — Combined source code with file-path comment prefixes (e.g., `# file: path/to/file.py`)
- `.markdown` — Markdown-formatted with fenced code blocks: `` ```python:filepath\ncode\n``` ``
- `.file_to_path()` — Dict mapping file path strings to code

Static method:
- `parse_markdown_code(markdown_code, expected_language)` — Parses markdown code blocks back into `CodeStringsMarkdown`
## Optimization Context

### `CodeOptimizationContext` (`models/models.py`)

Holds all code context needed for optimization:
- `read_writable_code: CodeStringsMarkdown` — Code the LLM can modify
- `read_only_context_code: str` — Reference-only dependency code
- `testgen_context: CodeStringsMarkdown` — Context for test generation
- `hashing_code_context: str` / `hashing_code_context_hash: str` — For deduplication
- `helper_functions: list[FunctionSource]` — Helper functions in the writable code
- `preexisting_objects: set[tuple[str, tuple[FunctionParent, ...]]]` — Objects that already exist in the code

### `CodeContextType` enum (`models/models.py`)

Defines context categories: `READ_WRITABLE`, `READ_ONLY`, `TESTGEN`, `HASHING`.

## Candidates

### `OptimizedCandidate` (`models/models.py`)

A generated code variant:
- `source_code: CodeStringsMarkdown` — The optimized code
- `explanation: str` — LLM explanation of the optimization
- `optimization_id: str` — Unique identifier
- `source: OptimizedCandidateSource` — How it was generated
- `parent_id: str | None` — ID of the parent candidate (for refinements/repairs)
- `model: str | None` — Which LLM model generated it

### `OptimizedCandidateSource` enum (`models/models.py`)

How a candidate was generated: `OPTIMIZE`, `OPTIMIZE_LP` (line profiler), `REFINE`, `REPAIR`, `ADAPTIVE`, `JIT_REWRITE`.

### `CandidateEvaluationContext` (`models/models.py`)

Tracks state during candidate evaluation:
- `speedup_ratios` / `optimized_runtimes` / `is_correct` — Per-candidate results
- `ast_code_to_id` — Deduplication map (normalized AST → first-seen candidate)
- `valid_optimizations` — Candidates that passed all checks

Key methods: `record_failed_candidate()`, `record_successful_candidate()`, `handle_duplicate_candidate()`, `register_new_candidate()`.
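The AST-keyed dedup map can be sketched with the standard library: parsing and re-dumping collapses formatting and comment differences, so re-submitted variants map to the first-seen candidate. `register` is a hypothetical stand-in for `register_new_candidate()`.

```python
import ast

def normalized(code: str) -> str:
    """Normalize source by round-tripping through the AST."""
    return ast.dump(ast.parse(code))

ast_code_to_id: dict[str, str] = {}

def register(code: str, candidate_id: str) -> str:
    """Return the first-seen candidate ID for this normalized code."""
    return ast_code_to_id.setdefault(normalized(code), candidate_id)
```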
## Baseline & Results

### `OriginalCodeBaseline` (`models/models.py`)

Baseline measurements for the original code:
- `behavior_test_results: TestResults` / `benchmarking_test_results: TestResults`
- `line_profile_results: dict`
- `runtime: int` — Total runtime in nanoseconds
- `coverage_results: Optional[CoverageData]`

### `BestOptimization` (`models/models.py`)

The winning candidate after evaluation:
- `candidate: OptimizedCandidate`
- `helper_functions: list[FunctionSource]`
- `code_context: CodeOptimizationContext`
- `runtime: int`
- `winning_behavior_test_results` / `winning_benchmarking_test_results: TestResults`
## Test Types

### `TestType` enum (`models/test_type.py`)

- `EXISTING_UNIT_TEST` (1) — Pre-existing tests from the codebase
- `INSPIRED_REGRESSION` (2) — Tests inspired by existing tests
- `GENERATED_REGRESSION` (3) — AI-generated regression tests
- `REPLAY_TEST` (4) — Tests from recorded benchmark data
- `CONCOLIC_COVERAGE_TEST` (5) — Coverage-guided tests
- `INIT_STATE_TEST` (6) — Class init state verification

### `TestFile` / `TestFiles` (`models/models.py`)

`TestFile` represents a single test file with `instrumented_behavior_file_path`, an optional `benchmarking_file_path`, `original_file_path`, `test_type`, and `tests_in_file`.

`TestFiles` is a collection with lookup methods: `get_by_type()`, `get_by_original_file_path()`, `get_test_type_by_instrumented_file_path()`.
### `TestResults` (`models/models.py`)

A collection of `FunctionTestInvocation` results with indexed lookup. Key methods:
- `add(invocation)` — Deduplicated insert
- `total_passed_runtime()` — Sum of minimum runtimes per test case (nanoseconds)
- `number_of_loops()` — Max loop index across all results
- `usable_runtime_data_by_test_case()` — Dict of invocation ID → list of runtimes
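The `total_passed_runtime()` aggregation described above — minimum runtime per test case across loops, summed over test cases — reduces to a one-liner over the grouped timing data:

```python
def total_passed_runtime(runtimes_by_case: dict[str, list[int]]) -> int:
    """Per test case, take the fastest observed loop; sum across cases (ns).
    Input shape mirrors usable_runtime_data_by_test_case()."""
    return sum(min(times) for times in runtimes_by_case.values())
```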
## Result Type

### `Result[L, R]` / `Success` / `Failure` (`either.py`)

Functional error-handling type:
- `Success(value)` — Wraps a successful result
- `Failure(error)` — Wraps an error
- `result.is_successful()` / `result.is_failure()` — Check type
- `result.unwrap()` — Get the success value (raises if Failure)
- `result.failure()` — Get the failure value (raises if Success)
- `is_successful(result)` — Module-level helper function
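A minimal sketch of the Success/Failure pattern with the methods listed above; the real types in codeflash's `either.py` are more elaborate (generics, separate classes), but the behavior is the same idea.

```python
class Result:
    """Either-style wrapper: holds a success value or an error."""

    def __init__(self, value, ok: bool) -> None:
        self._value, self._ok = value, ok

    def is_successful(self) -> bool:
        return self._ok

    def is_failure(self) -> bool:
        return not self._ok

    def unwrap(self):
        if not self._ok:
            raise ValueError("unwrap() called on Failure")
        return self._value

    def failure(self):
        if self._ok:
            raise ValueError("failure() called on Success")
        return self._value

def Success(value) -> Result:
    return Result(value, True)

def Failure(error) -> Result:
    return Result(error, False)

def is_successful(result: Result) -> bool:
    """Module-level helper, as in the list above."""
    return result.is_successful()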
41 tiles/codeflash-docs/docs/index.md Normal file

@ -0,0 +1,41 @@
# Codeflash Internal Documentation

CodeFlash is an AI-powered Python code optimizer that automatically improves code performance while maintaining correctness. It uses LLMs to generate optimization candidates, verifies correctness through test execution, and benchmarks performance improvements.

## Pipeline Overview

```
Discovery → Ranking → Context Extraction → Test Gen + Optimization → Baseline → Candidate Evaluation → PR
```

1. **Discovery** (`discovery/`): Find optimizable functions across the codebase using `FunctionVisitor`
2. **Ranking** (`benchmarking/function_ranker.py`): Rank functions by addressable time using trace data
3. **Context** (`context/`): Extract code dependencies — split into read-writable (modifiable) and read-only (reference)
4. **Optimization** (`optimization/`, `api/`): Generate candidates via the AI service; runs concurrently with test generation
5. **Verification** (`verification/`): Run candidates against tests via a custom pytest plugin and compare outputs
6. **Benchmarking** (`benchmarking/`): Measure performance and select the best candidate by speedup
7. **Result** (`result/`, `github/`): Create a PR with the winning optimization
## Key Entry Points

| Task | File |
|------|------|
| CLI arguments & commands | `cli_cmds/cli.py` |
| Optimization orchestration | `optimization/optimizer.py` → `Optimizer.run()` |
| Per-function optimization | `optimization/function_optimizer.py` → `FunctionOptimizer` |
| Function discovery | `discovery/functions_to_optimize.py` |
| Context extraction | `context/code_context_extractor.py` |
| Test execution | `verification/test_runner.py`, `verification/pytest_plugin.py` |
| Performance ranking | `benchmarking/function_ranker.py` |
| Domain types | `models/models.py`, `models/function_types.py` |
| AI service | `api/aiservice.py` → `AiServiceClient` |
| Configuration | `code_utils/config_consts.py` |

## Documentation Pages

- [Domain Types](domain-types.md) — Core data types and their relationships
- [Optimization Pipeline](optimization-pipeline.md) — Step-by-step data flow through the pipeline
- [Context Extraction](context-extraction.md) — How code context is extracted and token-limited
- [Verification](verification.md) — Test execution, pytest plugin, deterministic patches
- [AI Service](ai-service.md) — AI service client endpoints and request types
- [Configuration](configuration.md) — Config schema, effort levels, thresholds
84 tiles/codeflash-docs/docs/optimization-pipeline.md Normal file

@ -0,0 +1,84 @@
# Optimization Pipeline

Step-by-step data flow from function discovery to PR creation.

## 1. Entry Point: `Optimizer.run()` (`optimization/optimizer.py`)

The `Optimizer` class is initialized with CLI args and creates:
- `TestConfig` with test roots, project root, and the pytest command
- `AiServiceClient` for AI service communication
- An optional `LocalAiServiceClient` for experiments

`run()` orchestrates the full pipeline: it discovers functions, optionally ranks them, then optimizes each in turn.

## 2. Function Discovery (`discovery/functions_to_optimize.py`)

`FunctionVisitor` traverses source files to find optimizable functions, producing `FunctionToOptimize` instances. Filters include:
- Skipping functions that are too small or trivial
- Skipping previously optimized functions (via `was_function_previously_optimized()`)
- Applying user-configured include/exclude patterns

## 3. Function Ranking (`benchmarking/function_ranker.py`)

When trace data is available, `FunctionRanker` ranks functions by **addressable time** — the time a function spends that could be optimized (own time + callee time / call count). Functions below `DEFAULT_IMPORTANCE_THRESHOLD=0.001` are skipped.
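Spelling out the formula above: the interpretation of the importance threshold as a fraction of total traced time is an assumption for this sketch, not confirmed by the source.

```python
DEFAULT_IMPORTANCE_THRESHOLD = 0.001

def addressable_time(own_time: float, callee_time: float, call_count: int) -> float:
    """own time + callee time / call count, per the description above."""
    return own_time + callee_time / call_count

def is_worth_optimizing(func_time: float, total_time: float) -> bool:
    """ASSUMED reading: skip functions whose share of total traced time
    falls below the importance threshold."""
    return func_time / total_time >= DEFAULT_IMPORTANCE_THRESHOLD
```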
## 4. Per-Function Optimization: `FunctionOptimizer` (`optimization/function_optimizer.py`)

For each function, `FunctionOptimizer.optimize_function()` runs the full optimization loop:

### 4a. Context Extraction (`context/code_context_extractor.py`)

Extracts a `CodeOptimizationContext` containing:
- `read_writable_code` — Code the LLM can modify (the function + helpers)
- `read_only_context_code` — Dependency code for reference only
- `testgen_context` — Context for test generation (may include imported class definitions)

Token limits are enforced: `OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000` and `TESTGEN_CONTEXT_TOKEN_LIMIT=16000`. Functions exceeding these are rejected.

### 4b. Concurrent Test Generation + LLM Optimization

These run in parallel using `concurrent.futures`:
- **Test generation**: Generates regression tests from the function context
- **LLM optimization**: Sends `read_writable_code.markdown` + `read_only_context_code` to the AI service

The number of candidates depends on the effort level (see the Configuration docs).
### 4c. Candidate Evaluation

For each `OptimizedCandidate`:

1. **Deduplication**: Normalize the code's AST and check against `CandidateEvaluationContext.ast_code_to_id`. If it is a duplicate, copy the results from the previous evaluation.

2. **Code replacement**: Replace the original function with the candidate using `replace_function_definitions_in_module()`.

3. **Behavioral testing**: Run instrumented tests in a subprocess. The custom pytest plugin applies deterministic patches. Compare return values, stdout, and pass/fail status against the original baseline.

4. **Benchmarking**: If behavior matches, run performance tests with looping (`TOTAL_LOOPING_TIME=10s`) and calculate the speedup ratio.

5. **Validation**: The candidate must beat `MIN_IMPROVEMENT_THRESHOLD=0.05` (5% speedup) and pass stability checks.
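The speedup gate can be sketched as follows, assuming the common convention `ratio = (original − optimized) / optimized` (so 0.5 means 50% faster); the source does not spell out the exact formula.

```python
MIN_IMPROVEMENT_THRESHOLD = 0.05  # 5%, from config_consts

def speedup_ratio(original_ns: int, optimized_ns: int) -> float:
    """Relative improvement under the assumed convention."""
    return (original_ns - optimized_ns) / optimized_ns

def beats_threshold(original_ns: int, optimized_ns: int) -> bool:
    """A candidate must beat the minimum improvement threshold."""
    return speedup_ratio(original_ns, optimized_ns) > MIN_IMPROVEMENT_THRESHOLD
```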
### 4d. Refinement & Repair

- **Repair**: If fewer than `MIN_CORRECT_CANDIDATES=2` pass, failed candidates can be repaired via `AIServiceCodeRepairRequest` (sends test diffs to the LLM).
- **Refinement**: Top valid candidates are refined via `AIServiceRefinerRequest` (sends runtime data and line profiler results).
- **Adaptive**: At HIGH effort, additional adaptive optimization rounds run via `AIServiceAdaptiveOptimizeRequest`.

### 4e. Best Candidate Selection

The winning candidate is selected by:
1. Highest speedup ratio
2. For tied speedups, shortest diff length from the original
3. Refinement candidates use weighted ranking: `(2 * runtime_rank + 1 * diff_rank)`

The result is a `BestOptimization` with the candidate, context, test results, and runtime.
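The weighted ranking for refinement candidates reduces to a small scoring function (lower score wins; the weights come from `REFINED_CANDIDATE_RANKING_WEIGHTS = (2, 1)` in the Configuration docs):

```python
RUNTIME_WEIGHT, DIFF_WEIGHT = 2, 1  # runtime weighted 2x over diff size

def weighted_rank(runtime_rank: int, diff_rank: int) -> int:
    """Combined rank: 2 * runtime_rank + 1 * diff_rank, lower is better."""
    return RUNTIME_WEIGHT * runtime_rank + DIFF_WEIGHT * diff_rank

# Hypothetical example: name -> (runtime_rank, diff_rank), 1 = best
candidates = {"cand-a": (1, 3), "cand-b": (2, 2)}
best = min(candidates, key=lambda name: weighted_rank(*candidates[name]))
```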
## 5. PR Creation (`github/`)

If a winning candidate is found, a PR is created with:
- The optimized code diff
- Performance benchmark details
- The explanation from the LLM

## Worktree Mode

When `--worktree` is enabled, optimization runs in an isolated git worktree (`code_utils/git_worktree_utils.py`). This allows parallel optimization without affecting the working tree. Changes are captured as patch files.
93 tiles/codeflash-docs/docs/verification.md Normal file

@ -0,0 +1,93 @@
# Verification

How codeflash verifies candidate correctness and measures performance.

## Test Execution Architecture

Tests are executed in a **subprocess** to isolate the test environment from the main codeflash process. The test runner (`verification/test_runner.py`) invokes pytest (or Jest for JS/TS) with specific plugin configurations.

### Plugin Blocklists

- **Behavioral tests**: Block `benchmark`, `codspeed`, `xdist`, `sugar`
- **Benchmarking tests**: Block `codspeed`, `cov`, `benchmark`, `profiling`, `xdist`, `sugar`

These are defined as `BEHAVIORAL_BLOCKLISTED_PLUGINS` and `BENCHMARKING_BLOCKLISTED_PLUGINS` in `verification/test_runner.py`.
## Custom Pytest Plugin (`verification/pytest_plugin.py`)

The plugin is loaded into the test subprocess and provides:

### Deterministic Patches

`_apply_deterministic_patches()` replaces non-deterministic functions with fixed values to ensure reproducible test output:

| Module | Function | Fixed Value |
|--------|----------|-------------|
| `time` | `time()` | `1761717605.108106` |
| `time` | `perf_counter()` | Incrementing by 1ms per call |
| `datetime` | `datetime.now()` | `2021-01-01 02:05:10 UTC` |
| `datetime` | `datetime.utcnow()` | `2021-01-01 02:05:10 UTC` |
| `uuid` | `uuid4()` / `uuid1()` | `12345678-1234-5678-9abc-123456789012` |
| `random` | `random()` | `0.123456789` (seeded with 42) |
| `os` | `urandom(n)` | `b"\x42" * n` |
| `numpy.random` | seed | `42` |

Patches call the original function first to maintain performance characteristics (same call overhead).
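The call-the-original-then-return-fixed pattern looks roughly like this sketch for `time.time()`; the fixed constant is from the table above, the rest is illustrative.

```python
import time

_original_time = time.time
FIXED_TIME = 1761717605.108106  # fixed value from the table above

def _patched_time() -> float:
    _original_time()  # call the original first to preserve its call overhead
    return FIXED_TIME

time.time = _patched_time
```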
### Timing Markers

Test results include timing markers in stdout: `!######<id>:<duration_ns>######!`

The `_TIMING_MARKER_PATTERN` regex extracts this timing data to calculate the function utilization fraction.
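Extraction of those markers can be sketched with a regex built from the documented format; the exact pattern codeflash uses may differ.

```python
import re

# Matches the documented marker format: !######<id>:<duration_ns>######!
TIMING_MARKER = re.compile(r"!######(.+?):(\d+)######!")

def extract_timings(stdout: str) -> dict[str, int]:
    """Map each marker ID to its duration in nanoseconds."""
    return {m.group(1): int(m.group(2)) for m in TIMING_MARKER.finditer(stdout)}
```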
### Loop Stability

Performance benchmarking uses configurable stability thresholds:
- `STABILITY_WINDOW_SIZE = 0.35` (35% of total iterations)
- `STABILITY_CENTER_TOLERANCE = 0.0025` (±0.25% around the median)
- `STABILITY_SPREAD_TOLERANCE = 0.0025` (0.25% window spread)
### Memory Limits (Linux)

On Linux, the plugin sets `RLIMIT_AS` to 85% of total system memory (RAM + swap) to prevent OOM kills.
## Test Result Processing
|
||||
|
||||
### `TestResults` (`models/models.py`)
|
||||
|
||||
Collects `FunctionTestInvocation` results with:
|
||||
- Deduplicated insertion via `unique_invocation_loop_id`
|
||||
- `total_passed_runtime()` — Sum of minimum runtimes per test case (nanoseconds)
|
||||
- `number_of_loops()` — Max loop index
|
||||
- `usable_runtime_data_by_test_case()` — Grouped timing data
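The "sum of minimum runtimes per test case" rule above can be sketched on hypothetical invocation tuples (the real type stores `FunctionTestInvocation` objects):

```python
from collections import defaultdict

# Hypothetical rows: (test_case_id, loop_index, runtime_ns, did_pass)
invocations = [
    ("test_a", 1, 1200, True),
    ("test_a", 2, 1100, True),
    ("test_b", 1, 5000, True),
    ("test_b", 2, 5300, True),
    ("test_c", 1, 900, False),  # failed invocations contribute nothing
]


def total_passed_runtime(rows) -> int:
    # For each test case, keep only the minimum passing runtime, then sum
    best = defaultdict(lambda: float("inf"))
    for test_id, _loop, runtime_ns, did_pass in rows:
        if did_pass:
            best[test_id] = min(best[test_id], runtime_ns)
    return int(sum(best.values()))
```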
### `FunctionTestInvocation`

Each invocation records:

- `loop_index` — Iteration number (starts at 1)
- `id: InvocationId` — Fully qualified test identifier
- `did_pass: bool` — Pass/fail status
- `runtime: Optional[int]` — Time in nanoseconds
- `return_value: Optional[object]` — Captured return value
- `test_type: TestType` — Which test category

### Behavioral vs Performance Testing

1. **Behavioral**: Runs with `TestingMode.BEHAVIOR`. Compares return values and stdout between original and candidate. Any difference = candidate rejected.
2. **Performance**: Runs with `TestingMode.PERFORMANCE`. Loops for `TOTAL_LOOPING_TIME=10s` to get stable timing. Calculates speedup ratio.
3. **Line Profile**: Runs with `TestingMode.LINE_PROFILE`. Collects per-line timing data for refinement.

## Test Types

| TestType | Value | Description |
|----------|-------|-------------|
| `EXISTING_UNIT_TEST` | 1 | Pre-existing tests from the codebase |
| `INSPIRED_REGRESSION` | 2 | Tests inspired by existing tests |
| `GENERATED_REGRESSION` | 3 | AI-generated regression tests |
| `REPLAY_TEST` | 4 | Tests from recorded benchmark data |
| `CONCOLIC_COVERAGE_TEST` | 5 | Coverage-guided tests |
| `INIT_STATE_TEST` | 6 | Class init state verification |

## Coverage

Coverage is measured via `CoverageData` with a threshold of `COVERAGE_THRESHOLD=60.0%`. Low coverage may affect confidence in the optimization's correctness.
7
tiles/codeflash-docs/tile.json
Normal file
{
  "name": "codeflash/codeflash-docs",
  "version": "0.1.0",
  "summary": "Internal documentation for the codeflash optimization engine",
  "private": true,
  "docs": "docs/index.md"
}
45
tiles/codeflash-rules/rules/architecture.md
Normal file
# Architecture

```
codeflash/
├── main.py                   # CLI entry point
├── cli_cmds/                 # Command handling, console output (Rich)
├── discovery/                # Find optimizable functions
├── context/                  # Extract code dependencies and imports
├── optimization/             # Generate optimized code via AI
│   ├── optimizer.py          # Main optimization orchestration
│   └── function_optimizer.py # Per-function optimization logic
├── verification/             # Run deterministic tests (pytest plugin)
├── benchmarking/             # Performance measurement
├── github/                   # PR creation
├── api/                      # AI service communication
├── code_utils/               # Code parsing, git utilities
├── models/                   # Pydantic models and types
├── languages/                # Multi-language support (Python, JavaScript/TypeScript)
├── setup/                    # Config schema, auto-detection, first-run experience
├── picklepatch/              # Serialization/deserialization utilities
├── tracing/                  # Function call tracing
├── tracer.py                 # Root-level tracer entry point for profiling
├── lsp/                      # IDE integration (Language Server Protocol)
├── telemetry/                # Sentry, PostHog
├── either.py                 # Functional Result type for error handling
├── result/                   # Result types and handling
└── version.py                # Version information
```

## Key Entry Points

| Task | Start here |
|------|------------|
| CLI arguments & commands | `cli_cmds/cli.py` |
| Optimization orchestration | `optimization/optimizer.py` → `Optimizer.run()` |
| Per-function optimization | `optimization/function_optimizer.py` → `FunctionOptimizer` |
| Function discovery | `discovery/functions_to_optimize.py` |
| Context extraction | `context/code_context_extractor.py` |
| Test execution | `verification/test_runner.py`, `verification/pytest_plugin.py` |
| Performance ranking | `benchmarking/function_ranker.py` |
| Domain types | `models/models.py`, `models/function_types.py` |
| Result handling | `either.py` (`Result`, `Success`, `Failure`, `is_successful`) |
| AI service communication | `api/aiservice.py` → `AiServiceClient` |
| Configuration constants | `code_utils/config_consts.py` |
| Language support | `languages/registry.py` → `get_language_support()` |
11
tiles/codeflash-rules/rules/code-style.md
Normal file
# Code Style

- **Line length**: 120 characters
- **Python**: 3.9+ syntax (use `from __future__ import annotations` for type hints)
- **Package management**: Always use `uv`, never `pip` — run commands via `uv run`
- **Tooling**: Ruff for linting/formatting, mypy strict mode, prek for pre-commit checks (`uv run prek run`)
- **Comments**: Minimal — only explain "why", not "what"
- **Docstrings**: Do not add unless explicitly requested
- **Naming**: NEVER use leading underscores (`_function_name`) — Python has no true private functions, use public names
- **Paths**: Always use absolute `Path` objects, handle encoding explicitly (UTF-8)
- **Source transforms**: Use `libcst` for code modification/transformation to preserve formatting; `ast` is acceptable for read-only analysis and parsing
9
tiles/codeflash-rules/rules/git-conventions.md
Normal file
# Git Conventions

- **Always create a new branch from `main`** — never commit directly to `main` or reuse an existing feature branch for unrelated changes
- Use conventional commit format: `fix:`, `feat:`, `refactor:`, `docs:`, `test:`, `chore:`
- Keep commits atomic — one logical change per commit
- Commit message body should be concise (1-2 sentences max)
- PR titles should also use conventional format
- Branch naming: `cf-#-title` (lowercase, hyphenated) where `#` is the Linear issue number
- If related to a Linear issue, include `CF-#` in the PR body
9
tiles/codeflash-rules/rules/language-rules.md
Normal file
# Language Support Rules

- Current language is a module-level singleton in `languages/current.py` — use `set_current_language()` / `current_language()`, never pass language as a parameter through call chains
- Use `get_language_support(identifier)` from `languages/registry.py` to get a `LanguageSupport` instance — accepts `Path`, `Language` enum, or string; never import language classes directly
- New language support classes must use the `@register_language` decorator to register with the extension and language registries
- `languages/__init__.py` uses `__getattr__` for lazy imports to avoid circular dependencies — follow this pattern when adding new exports
- `is_javascript()` returns `True` for both JavaScript and TypeScript
- Language modules are lazily imported on first `get_language_support()` call via `_ensure_languages_registered()` — the `@register_language` decorator fires on import and populates `_EXTENSION_REGISTRY` and `_LANGUAGE_REGISTRY`
- `LanguageSupport` instances are cached in `_SUPPORT_CACHE` — use `clear_cache()` only in tests
11
tiles/codeflash-rules/rules/optimization-patterns.md
Normal file
# Optimization Pipeline Patterns

- All major operations return `Result[SuccessType, ErrorType]` — construct with `Success(value)` / `Failure(error)`, check with `is_successful()` before calling `unwrap()`
- Code context has token limits (`OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000`, `TESTGEN_CONTEXT_TOKEN_LIMIT=16000` in `code_utils/config_consts.py`) — exceeding them rejects the function
- `read_writable_code` (modifiable code) can span multiple files; `read_only_context_code` is reference-only dependency code
- Code is serialized as markdown code blocks: `` ```language:filepath\ncode\n``` `` — see `CodeStringsMarkdown` in `models/models.py`
- Candidates form a forest (DAG): refinements/repairs reference `parent_id` on previous candidates via `OptimizedCandidateSource` (OPTIMIZE, REFINE, REPAIR, ADAPTIVE, JIT_REWRITE)
- Test generation and optimization run concurrently — coordinate through `CandidateEvaluationContext`
- Generated tests are instrumented with `codeflash_capture.py` to record return values and traces
- Minimum improvement threshold is 5% (`MIN_IMPROVEMENT_THRESHOLD=0.05`) — candidates below this are rejected
- Stability thresholds: `STABILITY_WINDOW_SIZE=0.35`, `STABILITY_CENTER_TOLERANCE=0.0025`, `STABILITY_SPREAD_TOLERANCE=0.0025`
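The markdown serialization rule above can be sketched with a hypothetical helper (the real type is `CodeStringsMarkdown`):

```python
def to_markdown(language: str, file_path: str, code: str) -> str:
    # Produces: <fence>language:filepath \n code \n <fence>
    fence = "`" * 3
    return f"{fence}{language}:{file_path}\n{code}\n{fence}"


block = to_markdown("python", "src/app.py", "def f():\n    return 1")
```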
13
tiles/codeflash-rules/rules/testing-rules.md
Normal file
# Testing Rules

- Code context extraction and replacement tests must assert full string equality — no substring matching
- Use pytest's `tmp_path` fixture for temp directories (it's a `Path` object)
- Write temp files inside `tmp_path`, never use `NamedTemporaryFile` (causes Windows file contention)
- Always call `.resolve()` on Path objects to ensure absolute paths and resolve symlinks
- Use `.as_posix()` when converting resolved paths to strings (normalizes to forward slashes)
- Any new feature or bug fix that can be tested automatically must have test cases
- If changes affect existing test expectations, update the tests accordingly — tests must always pass after changes
- The pytest plugin patches `time`, `random`, `uuid`, `datetime`, `os.urandom`, and `numpy.random` for deterministic test execution — never assume real randomness or real time in verification tests
- `conftest.py` uses an autouse fixture that calls `reset_current_language()` — tests always start with Python as the default language
- Test types are defined by the `TestType` enum: `EXISTING_UNIT_TEST`, `INSPIRED_REGRESSION`, `GENERATED_REGRESSION`, `REPLAY_TEST`, `CONCOLIC_COVERAGE_TEST`, `INIT_STATE_TEST`
- Verification runs tests in a subprocess using a custom pytest plugin (`verification/pytest_plugin.py`) — behavioral tests use blocklisted plugins (`benchmark`, `codspeed`, `xdist`, `sugar`), benchmarking tests additionally block `cov` and `profiling`
26
tiles/codeflash-rules/tile.json
Normal file
{
  "name": "codeflash/codeflash-rules",
  "version": "0.1.0",
  "summary": "Coding standards and conventions for the codeflash codebase",
  "private": true,
  "rules": {
    "code-style": {
      "rules": "rules/code-style.md"
    },
    "architecture": {
      "rules": "rules/architecture.md"
    },
    "optimization-patterns": {
      "rules": "rules/optimization-patterns.md"
    },
    "git-conventions": {
      "rules": "rules/git-conventions.md"
    },
    "testing-rules": {
      "rules": "rules/testing-rules.md"
    },
    "language-rules": {
      "rules": "rules/language-rules.md"
    }
  }
}
96
tiles/codeflash-skills/skills/add-codeflash-feature/SKILL.md
Normal file
---
name: add-codeflash-feature
description: Step-by-step workflow for adding a new feature to the codeflash codebase
---

# Add Codeflash Feature

Use this workflow when implementing a new feature in the codeflash codebase.

## Step 1: Identify Target Modules

Determine which module(s) need modification based on the feature:

| Feature area | Primary module | Key files |
|-------------|----------------|-----------|
| New optimization strategy | `optimization/` | `function_optimizer.py`, `optimizer.py` |
| New test type | `verification/`, `models/` | `test_runner.py`, `pytest_plugin.py`, `test_type.py` |
| New AI service endpoint | `api/` | `aiservice.py` |
| New language support | `languages/` | Create new `languages/<lang>/support.py` |
| Context extraction change | `context/` | `code_context_extractor.py` |
| New CLI command | `cli_cmds/` | `cli.py` |
| New config option | `setup/`, `code_utils/` | `config_consts.py`, `setup/detector.py` |
| Discovery filter | `discovery/` | `functions_to_optimize.py` |
| PR/result changes | `github/`, `result/` | Relevant handlers |

## Step 2: Follow Result Type Pattern

Use the `Result[L, R]` type from `either.py` for error handling in pipeline operations:

```python
from codeflash.either import Failure, Result, Success, is_successful

def my_operation() -> Result[str, MyResultType]:
    if error_condition:
        return Failure("descriptive error message")
    return Success(result_value)

# Usage:
result = my_operation()
if not is_successful(result):
    logger.error(result.failure())
    return
value = result.unwrap()
```

## Step 3: Add Configuration Constants

If the feature needs configurable thresholds or limits:

1. Add constants to `code_utils/config_consts.py`
2. If effort-dependent, add to `EFFORT_VALUES` dict with values for `LOW`, `MEDIUM`, `HIGH`
3. Add a corresponding `EffortKeys` enum entry
4. Access via `get_effort_value(EffortKeys.MY_KEY, effort_level)`
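The steps above can be sketched with a self-contained mirror of the pattern (the key name and values here are invented for illustration; real entries live in `code_utils/config_consts.py`):

```python
from enum import Enum


class EffortKeys(Enum):
    MAX_CANDIDATES = "max_candidates"  # hypothetical key


# Hypothetical per-effort values for the new key
EFFORT_VALUES = {
    EffortKeys.MAX_CANDIDATES: {"LOW": 3, "MEDIUM": 5, "HIGH": 8},
}


def get_effort_value(key: EffortKeys, effort_level: str):
    return EFFORT_VALUES[key][effort_level]
```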
## Step 4: Add Domain Types

If new data structures are needed:

1. Add Pydantic models or frozen dataclasses to `models/models.py` or `models/function_types.py`
2. Use `@dataclass(frozen=True)` for immutable data
3. Use `BaseModel` for models that need serialization
4. Keep `function_types.py` dependency-free (no imports from other codeflash modules)
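For example, a frozen dataclass in this style (the type here is hypothetical; use `pydantic.BaseModel` instead when serialization is needed):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTiming:
    candidate_id: str
    runtime_ns: int


timing = CandidateTiming(candidate_id="abc123", runtime_ns=1200)
```

Attempting to assign to a field of a frozen instance raises `FrozenInstanceError`, which is what makes the type safe to share across the pipeline.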
## Step 5: Write Tests

Follow existing test patterns:

1. Create test files in the `tests/` directory mirroring the source structure
2. Use pytest's `tmp_path` fixture for temp directories
3. Always call `.resolve()` on Path objects
4. Assert full string equality for code context tests — no substring matching
5. Remember the pytest plugin patches `time`, `random`, `uuid`, `datetime` — don't rely on real values

## Step 6: Run Quality Checks

Run all validation before committing:

```bash
# Pre-commit checks (ruff format + lint)
uv run prek run

# Type checking
uv run mypy codeflash/

# Run relevant tests
uv run pytest tests/path/to/relevant/tests -x
```

## Step 7: Language Support Considerations

If the feature needs to work across languages:

1. Check if the feature uses language-specific APIs — use `get_language_support(identifier)` from `languages/registry.py`
2. Current language is a singleton: `set_current_language()` / `current_language()` from `languages/current.py`
3. Use `is_python()` / `is_javascript()` guards for language-specific branches
4. New language support classes must use `@register_language` decorator
95
tiles/codeflash-skills/skills/debug-optimization-failure/SKILL.md
Normal file
---
name: debug-optimization-failure
description: Debug why a codeflash optimization failed at any pipeline stage
---

# Debug Optimization Failure

Use this workflow when an optimization run fails or produces no results. Work through the stages sequentially — stop at the first failure found.

## Step 1: Check Function Discovery

Determine if the function was discovered by `FunctionVisitor`.

1. Look at the discovery output or logs for the function name
2. Check `discovery/functions_to_optimize.py` — the `FunctionVisitor` filters out:
   - Functions that are too small or trivial
   - Functions matching exclude patterns in config
   - Functions already optimized (`was_function_previously_optimized()`)
3. Verify the function file is under the configured `module-root`

**If not discovered**: Check config patterns, file location, and function size.

## Step 2: Check Ranking

If trace data is used, check if the function was ranked high enough.

1. Look at `benchmarking/function_ranker.py` output
2. The function's **addressable time** must exceed `DEFAULT_IMPORTANCE_THRESHOLD=0.001`
3. Addressable time = own time + callee time / call count

**If ranked too low**: The function doesn't spend enough time to be worth optimizing.
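As a worked example of the threshold check (this is one possible reading of the formula above; see `benchmarking/function_ranker.py` for the exact computation):

```python
DEFAULT_IMPORTANCE_THRESHOLD = 0.001


def addressable_fraction(own_ns: float, callee_ns: float, calls: int, total_ns: float) -> float:
    # Own time plus per-call callee time, as a fraction of total program time
    addressable_ns = own_ns + callee_ns / calls
    return addressable_ns / total_ns


# 2ms own time, 8ms in callees over 4 calls, in a 1s program run
fraction = addressable_fraction(2e6, 8e6, 4, 1e9)
```

Here `fraction` works out to 0.004, which clears the 0.001 threshold, so the function would be ranked as worth optimizing.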
## Step 3: Check Context Token Limits

Verify the function's context fits within token limits.

1. Check `OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000` and `TESTGEN_CONTEXT_TOKEN_LIMIT=16000` in `code_utils/config_consts.py`
2. Token counting is done by `encoded_tokens_len()` in `code_utils/code_utils.py`
3. Large helper function chains or deep dependency trees can blow the limit

**If context too large**: The function has too many dependencies. Consider refactoring to reduce context size.
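A rough stand-in for the check (a character-based estimate with an assumed ratio; the real `encoded_tokens_len()` uses an actual tokenizer):

```python
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000


def context_fits(context_code: str, tokens_per_char: float = 0.3) -> bool:
    # Crude heuristic: roughly 0.3 tokens per character of source code
    estimated_tokens = int(len(context_code) * tokens_per_char)
    return estimated_tokens <= OPTIMIZATION_CONTEXT_TOKEN_LIMIT
```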
## Step 4: Check AI Service Response

Verify the AI service returned valid candidates.

1. Check logs for `AiServiceClient` request/response
2. Look for HTTP errors (non-200 status codes)
3. Verify `_get_valid_candidates()` parsed the response — empty `code_strings` means invalid markdown code blocks
4. Check if all candidates were filtered out during parsing

**If no candidates returned**: Check API key, network connectivity, and service status.

## Step 5: Check Test Failures

Determine if candidates failed behavioral or benchmark tests.

1. **Behavioral failures**: Compare return values, stdout, pass/fail status between original baseline and candidate
   - Check `TestDiffScope`: `RETURN_VALUE`, `STDOUT`, `DID_PASS`
   - Look at JUnit XML results for specific test failures
2. **Benchmark failures**: Check if candidate met `MIN_IMPROVEMENT_THRESHOLD=0.05` (5% speedup)
3. **Stability failures**: Check if timing was stable within `STABILITY_WINDOW_SIZE=0.35`

**If behavioral failure**: The optimization changed the function's behavior. Check test diffs for specific mismatches.
**If benchmark failure**: The optimization didn't provide enough speedup.

## Step 6: Check Deduplication

Verify candidates weren't deduplicated away.

1. `CandidateEvaluationContext.ast_code_to_id` tracks normalized code → candidate mapping
2. `normalize_code()` from `code_utils/deduplicate_code.py` normalizes AST for comparison
3. If all candidates normalize to the same code, only one is actually tested

**If all duplicates**: The LLM generated the same optimization multiple times. Try higher effort level.
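An AST round-trip illustrates why superficially different candidates can collapse into one (a rough stand-in for `normalize_code()`):

```python
import ast


def normalized(source: str) -> str:
    # Parsing and unparsing drops comments, extra whitespace, and
    # redundant parentheses, so equivalent code compares equal
    return ast.unparse(ast.parse(source))


candidate_a = "def f(x):  return x + 1  # fast path"
candidate_b = "def f(x):\n    return (x + 1)"
```

Both candidates normalize to the same string, so only one would be tested.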
## Step 7: Check Repair/Refinement

If initial candidates failed, check repair and refinement stages.

1. Repair only runs if fewer than `MIN_CORRECT_CANDIDATES=2` passed
2. Repair sends `AIServiceCodeRepairRequest` with test diffs
3. Check `REPAIR_UNMATCHED_PERCENTAGE_LIMIT` — if too many tests failed, repair is skipped
4. Refinement only runs on top valid candidates

**If repair also failed**: The optimization approach may not work for this function.

## Key Files to Check

- `optimization/function_optimizer.py` — Main optimization loop, `determine_best_candidate()`
- `verification/test_runner.py` — Test execution
- `api/aiservice.py` — AI service communication
- `code_utils/config_consts.py` — Thresholds
- `context/code_context_extractor.py` — Context extraction
- `models/models.py` — `CandidateEvaluationContext`, `TestResults`
14
tiles/codeflash-skills/tile.json
Normal file
{
  "name": "codeflash/codeflash-skills",
  "version": "0.1.0",
  "summary": "Procedural workflows for developing and debugging codeflash",
  "private": true,
  "skills": {
    "debug-optimization-failure": {
      "path": "skills/debug-optimization-failure/SKILL.md"
    },
    "add-codeflash-feature": {
      "path": "skills/add-codeflash-feature/SKILL.md"
    }
  }
}