
Optimization Pipeline

Step-by-step data flow from function discovery to PR creation.

1. Entry Point: Optimizer.run() (optimization/optimizer.py)

The Optimizer class is initialized with CLI args and creates:

  • TestConfig with test roots, project root, pytest command
  • AiServiceClient for AI service communication
  • Optional LocalAiServiceClient for experiments

run() orchestrates the full pipeline: discovers functions, optionally ranks them, then optimizes each in turn.
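
As a rough sketch of that control flow (every name below is an illustrative stand-in, not the actual codeflash API):

```python
# Toy skeleton of the run() flow described above; all names are stand-ins.
def discover_functions() -> list[str]:
    return ["pkg.module.slow_function"]          # step 2: function discovery

def rank_functions(functions: list[str]) -> list[str]:
    return functions                             # step 3: ranking (when trace data exists)

def optimize_function(qualified_name: str) -> float | None:
    return 1.25                                  # step 4: best verified speedup, or None

def run() -> None:
    for qualified_name in rank_functions(discover_functions()):
        speedup = optimize_function(qualified_name)
        if speedup is not None:
            print(f"open PR for {qualified_name}: {speedup:.2f}x")  # step 5

run()
```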

2. Function Discovery (discovery/functions_to_optimize.py)

FunctionVisitor traverses source files to find optimizable functions, producing FunctionToOptimize instances. Filters include:

  • Skipping functions that are too small or trivial
  • Skipping previously optimized functions (via was_function_previously_optimized())
  • Applying user-configured include/exclude patterns
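
A minimal sketch of such a discovery pass using Python's ast module; the specific filters below (a minimum body size and a name-based exclude set) are illustrative placeholders for the real checks:

```python
# Simplified stand-in for FunctionVisitor: collect function definitions from a
# source file that pass a couple of illustrative filters.
import ast

def find_candidate_functions(source: str, exclude: frozenset[str] = frozenset()) -> list[str]:
    candidates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            spanned_lines = (node.end_lineno or node.lineno) - node.lineno
            if spanned_lines < 2:        # skip trivial one-line bodies
                continue
            if node.name in exclude:     # user-configured exclude patterns
                continue
            candidates.append(node.name)
    return candidates

print(find_candidate_functions("def f(x):\n    y = x * 2\n    return y\n"))  # ['f']
```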

3. Function Ranking (benchmarking/function_ranker.py)

When trace data is available, FunctionRanker ranks functions by addressable time, i.e. the time a function spends that could be optimized, computed as own time + (callee time / call count). Functions whose importance falls below DEFAULT_IMPORTANCE_THRESHOLD=0.001 are skipped.
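
A small, hedged illustration of that formula, assuming it means own time plus callee time averaged per call; the field names are hypothetical:

```python
# Illustrative "addressable time" computation: own time plus per-call callee time.
from dataclasses import dataclass

@dataclass
class TraceStats:
    own_time_ns: int      # time spent in the function body itself
    callee_time_ns: int   # time spent inside the functions it calls
    call_count: int

def addressable_time_ns(stats: TraceStats) -> float:
    return stats.own_time_ns + stats.callee_time_ns / stats.call_count

stats = TraceStats(own_time_ns=4_000_000, callee_time_ns=10_000_000, call_count=5)
print(addressable_time_ns(stats))  # 4_000_000 + 10_000_000 / 5 = 6_000_000.0 ns
```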

4. Per-Function Optimization: FunctionOptimizer (optimization/function_optimizer.py)

For each function, FunctionOptimizer.optimize_function() runs the full optimization loop:

4a. Context Extraction (context/code_context_extractor.py)

Extracts CodeOptimizationContext containing:

  • read_writable_code — Code the LLM can modify (the function + helpers)
  • read_only_context_code — Dependency code for reference only
  • testgen_context — Context for test generation (may include imported class definitions)

Token limits are enforced: OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000 and TESTGEN_CONTEXT_TOKEN_LIMIT=16000. Functions whose context exceeds these limits are rejected.
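
A hedged sketch of such a limit check; the whitespace-based token count below is only a stand-in for whatever tokenizer the real extractor uses:

```python
# Illustrative context-size guard. The real pipeline uses a proper tokenizer;
# splitting on whitespace here is a rough approximation.
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16_000

def approximate_token_count(code: str) -> int:
    return len(code.split())

def context_fits(read_writable_code: str, read_only_context_code: str) -> bool:
    total = approximate_token_count(read_writable_code) + approximate_token_count(read_only_context_code)
    return total <= OPTIMIZATION_CONTEXT_TOKEN_LIMIT
```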

4b. Concurrent Test Generation + LLM Optimization

These run in parallel using concurrent.futures:

  • Test generation: Generates regression tests from the function context
  • LLM optimization: Sends read_writable_code.markdown + read_only_context_code to the AI service

The number of candidates depends on effort level (see Configuration docs).
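
A minimal, self-contained sketch of running the two tasks in parallel with concurrent.futures; both task bodies are stubs:

```python
# Stubbed illustration of concurrent test generation and candidate generation.
from concurrent.futures import ThreadPoolExecutor

def generate_regression_tests(testgen_context: str) -> list[str]:
    return ["def test_original_behavior(): ..."]                   # stub for test generation

def request_optimization_candidates(code_context: str) -> list[str]:
    return ["def optimized_v1(): ...", "def optimized_v2(): ..."]  # stub for the AI service call

with ThreadPoolExecutor(max_workers=2) as pool:
    tests_future = pool.submit(generate_regression_tests, "testgen context")
    candidates_future = pool.submit(request_optimization_candidates, "code context")
    tests, candidates = tests_future.result(), candidates_future.result()
```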

4c. Candidate Evaluation

For each OptimizedCandidate:

  1. Deduplication: Normalize code AST and check against CandidateEvaluationContext.ast_code_to_id. If duplicate, copy results from previous evaluation.

  2. Code replacement: Replace the original function with the candidate using replace_function_definitions_in_module().

  3. Behavioral testing: Run instrumented tests in a subprocess. The custom pytest plugin applies deterministic patches. Compare return values, stdout, and pass/fail status against the original baseline.

  4. Benchmarking: If behavior matches, run performance tests with looping (TOTAL_LOOPING_TIME=10s). Calculate speedup ratio.

  5. Validation: Candidate must beat MIN_IMPROVEMENT_THRESHOLD=0.05 (5% speedup) and pass stability checks.
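
A hedged sketch of the deduplication idea from step 1: parse each candidate, use its AST dump as a lookup key, and reuse earlier results when two candidates are structurally identical. The dictionary mirrors ast_code_to_id in name only; the real normalization is likely more involved.

```python
# Illustrative AST-based deduplication: candidates that differ only in
# formatting or comments produce the same key.
import ast

ast_code_to_id: dict[str, int] = {}

def candidate_key(code: str) -> str:
    # ast.parse drops comments/formatting; ast.dump yields a canonical string.
    return ast.dump(ast.parse(code))

def register_candidate(code: str, candidate_id: int) -> int | None:
    """Return the id of an earlier structurally-equal candidate, or None if new."""
    key = candidate_key(code)
    if key in ast_code_to_id:
        return ast_code_to_id[key]      # duplicate: reuse the prior evaluation results
    ast_code_to_id[key] = candidate_id
    return None

register_candidate("def f(x):\n    return x + 1\n", 0)
print(register_candidate("def f(x):  # same logic, new formatting\n    return x+1\n", 1))  # 0
```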

4d. Refinement & Repair

  • Repair: If fewer than MIN_CORRECT_CANDIDATES=2 candidates pass, failed candidates can be repaired via AIServiceCodeRepairRequest (which sends test diffs to the LLM).
  • Refinement: Top valid candidates are refined via AIServiceRefinerRequest (which sends runtime data and line profiler results).
  • Adaptive: At HIGH effort, additional adaptive optimization rounds run via AIServiceAdaptiveOptimizeRequest.

4e. Best Candidate Selection

The winning candidate is selected by:

  1. Highest speedup ratio
  2. For tied speedups, shortest diff length from original
  3. Refinement candidates use weighted ranking: (2 * runtime_rank + 1 * diff_rank)

Result is a BestOptimization with the candidate, context, test results, and runtime.
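
A hedged sketch of those ranking rules; the candidate fields and function names are illustrative:

```python
# Illustrative selection logic for the ranking rules above.
from dataclasses import dataclass

@dataclass
class EvaluatedCandidate:
    speedup: float      # e.g. 1.30 means 30% faster than the original
    diff_length: int    # size of the diff against the original code

def pick_best(candidates: list[EvaluatedCandidate]) -> EvaluatedCandidate:
    # Highest speedup wins; among ties, the shortest diff wins.
    return min(candidates, key=lambda c: (-c.speedup, c.diff_length))

def refinement_rank(runtime_rank: int, diff_rank: int) -> int:
    # Lower is better: runtime rank is weighted twice as heavily as diff rank.
    return 2 * runtime_rank + 1 * diff_rank
```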

5. PR Creation (github/)

If a winning candidate is found, a PR is created with:

  • The optimized code diff
  • Performance benchmark details
  • Explanation from the LLM

Worktree Mode

When --worktree is enabled, optimization runs in an isolated git worktree (code_utils/git_worktree_utils.py). This allows parallel optimization without affecting the working tree. Changes are captured as patch files.
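
A minimal sketch of the worktree idea using plain git commands through subprocess; the directory handling and cleanup details are illustrative, not the actual git_worktree_utils implementation:

```python
# Illustrative isolated-worktree flow: create a detached worktree, run the
# optimization against it, capture the result as a patch, then clean up.
import subprocess
import tempfile
from pathlib import Path

def run_in_worktree(repo_root: Path) -> str:
    worktree_dir = Path(tempfile.mkdtemp(prefix="codeflash-")) / "worktree"
    subprocess.run(["git", "worktree", "add", "--detach", str(worktree_dir)],
                   cwd=repo_root, check=True)
    try:
        # ... run the optimization pipeline against files under worktree_dir ...
        diff = subprocess.run(["git", "diff"], cwd=worktree_dir,
                              check=True, capture_output=True, text=True)
        return diff.stdout  # changes captured as a patch
    finally:
        subprocess.run(["git", "worktree", "remove", "--force", str(worktree_dir)],
                       cwd=repo_root, check=True)
```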