
# Context Extraction

How code context is extracted and prepared for LLM optimization prompts.

## Context Types

### Single-File Context (SingleOptimizerContext)

Used when the function to optimize lives in a single file:

- Extracts the function source code
- Collects helper functions and class definitions
- Formats the result as a system prompt plus a user prompt
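
As a rough illustration, single-file extraction can be sketched with the standard `ast` module. The function name and return shape below are hypothetical, not the actual codeflash API:

```python
import ast

def extract_single_file_context(source: str, function_name: str) -> dict:
    """Collect the target function plus sibling helpers/classes from one file.

    Illustrative sketch only: the real SingleOptimizerContext likely tracks
    much more (imports, qualified names, read-only context, etc.).
    """
    tree = ast.parse(source)
    target = None
    helpers = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact source text of the node
            segment = ast.get_source_segment(source, node)
            if node.name == function_name:
                target = segment
            else:
                helpers.append(segment)
    return {"function": target, "helpers": helpers}
```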

### Multi-File Context (MultiOptimizerContext)

Used when the function spans or depends on multiple files:

- Collects code from multiple source files
- Manages file-path-annotated code blocks

## BaseOptimizerContext (optimizer_context.py)

Abstract base class for all context types.

### Factory Method

`get_dynamic_context()` — dispatches to `SingleOptimizerContext` or `MultiOptimizerContext` based on the input.
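
A minimal sketch of that dispatch, assuming the factory keys off the number of source files involved (the class fields and signature here are illustrative, not the real ones):

```python
from dataclasses import dataclass

@dataclass
class SingleOptimizerContext:
    """Stand-in for the single-file context (fields are assumptions)."""
    code: str

@dataclass
class MultiOptimizerContext:
    """Stand-in for the multi-file context (fields are assumptions)."""
    files_to_code: dict

def get_dynamic_context(files_to_code: dict):
    """Pick the context type based on how many files the function touches."""
    if len(files_to_code) == 1:
        (code,) = files_to_code.values()
        return SingleOptimizerContext(code=code)
    return MultiOptimizerContext(files_to_code=files_to_code)
```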

### Prompt Construction

- `get_system_prompt(python_version_str)` — builds the system prompt with the target language version
- `get_user_prompt(dependency_code, line_profiler_results)` — builds the user prompt with code and optional profiler data
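
A hedged sketch of what such builders might produce. The prompt wording, parameter defaults, and assembly order are assumptions, not the real templates:

```python
def get_system_prompt(python_version_str: str) -> str:
    """Build a system prompt parameterized on the language version (illustrative)."""
    return (
        "You are an expert Python performance engineer. "
        f"Target runtime: Python {python_version_str}."
    )

def get_user_prompt(code: str,
                    dependency_code: str = "",
                    line_profiler_results: str = "") -> str:
    """Assemble the user prompt from code plus optional dependency/profiler sections."""
    parts = [f"Optimize this function:\n```python\n{code}\n```"]
    if dependency_code:
        parts.append(f"Dependencies:\n```python\n{dependency_code}\n```")
    if line_profiler_results:
        parts.append(f"Line profiler output:\n{line_profiler_results}")
    return "\n\n".join(parts)
```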

### LLM Response Parsing

- `extract_code_and_explanation_from_llm_res(content)` — parses markdown code blocks from LLM output, separating code from explanation text
- `parse_and_generate_candidate_schema()` — converts extracted code into an `OptimizeResponseItemSchema`
- `is_valid_code()` — validates that the extracted code is syntactically correct
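
The parsing and validation steps above can be approximated with `re` and `ast`. The regex and function names here are simplified stand-ins for the real helpers:

```python
import ast
import re

# Matches fenced blocks like ```python or ```python:path/to/file.py (simplified).
CODE_BLOCK_RE = re.compile(r"```(?:\w+)?(?::\S+)?\n(.*?)```", re.DOTALL)

def extract_code_and_explanation(content: str) -> tuple[str, str]:
    """Split an LLM reply into fenced code and the surrounding prose."""
    blocks = CODE_BLOCK_RE.findall(content)
    code = "\n".join(blocks)
    explanation = CODE_BLOCK_RE.sub("", content).strip()
    return code, explanation

def is_valid_code(code: str) -> bool:
    """Check syntactic validity by attempting to parse the code."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False
```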

## Code Formatting

LLM responses use markdown code blocks with file path annotations:

```python:path/to/file.py
# optimized code here
```

The context extraction system both generates this format (for prompts) and parses it (from responses).
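
A round-trip of that annotated-block format could look like the following sketch (the regex and helper names are illustrative, not the internal implementation):

```python
import re

# Matches ```python:<path> ... ``` fenced blocks (simplified, illustrative).
FENCE_RE = re.compile(r"```python:(?P<path>\S+)\n(?P<code>.*?)```", re.DOTALL)

def format_annotated_block(path: str, code: str) -> str:
    """Render code as a file-path-annotated markdown block for a prompt."""
    return f"```python:{path}\n{code}\n```"

def parse_annotated_blocks(response: str) -> dict[str, str]:
    """Map each annotated file path in an LLM response to its code."""
    return {m["path"]: m["code"].rstrip("\n") for m in FENCE_RE.finditer(response)}
```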