# Context Extraction
How code context is extracted and prepared for LLM optimization prompts.
## Context Types
### Single-File Context (`SingleOptimizerContext`)
Used when the function to optimize lives in a single file:
- Extracts the function source code
- Collects helper functions and class definitions
- Formats as system prompt + user prompt
### Multi-File Context (`MultiOptimizerContext`)
Used when the function spans or depends on multiple files:
- Collects code from multiple source files
- Manages file-path-annotated code blocks
## `BaseOptimizerContext` (`optimizer_context.py`)
Abstract base class for all context types:
### Factory Method
`get_dynamic_context()` — dispatches to `SingleOptimizerContext` or `MultiOptimizerContext` based on the input.
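A minimal sketch of the dispatch, assuming the deciding input is the set of source files the target function touches (the class names come from this page; the exact dispatch criterion and signature are assumptions):

```python
from dataclasses import dataclass


@dataclass
class SingleOptimizerContext:
    files: list


@dataclass
class MultiOptimizerContext:
    files: list


def get_dynamic_context(files):
    # Hypothetical dispatch: a single source file gets the
    # single-file context, anything else the multi-file one.
    if len(files) == 1:
        return SingleOptimizerContext(files)
    return MultiOptimizerContext(files)
```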
### Prompt Construction
- `get_system_prompt(python_version_str)` — builds system prompt with language version
- `get_user_prompt(dependency_code, line_profiler_results)` — builds user prompt with code and optional profiler data
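A rough sketch of how the user prompt might be assembled, assuming a plain string template (the prompt wording and the `function_source` parameter are hypothetical; only the `dependency_code` and `line_profiler_results` inputs come from this page):

````python
from typing import Optional


def get_user_prompt(
    function_source: str,
    dependency_code: str,
    line_profiler_results: Optional[str] = None,
) -> str:
    # Assemble the target function, its dependency code, and
    # optional line-profiler output into one prompt string.
    parts = [
        "Optimize the following function:",
        f"```python\n{function_source}\n```",
    ]
    if dependency_code:
        parts.append(f"Dependencies:\n```python\n{dependency_code}\n```")
    if line_profiler_results:
        parts.append(f"Line profiler results:\n{line_profiler_results}")
    return "\n\n".join(parts)
````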
### LLM Response Parsing
- `extract_code_and_explanation_from_llm_res(content)` — parses markdown code blocks from LLM output, extracts code and explanation text
- `parse_and_generate_candidate_schema()` — converts extracted code into `OptimizeResponseItemSchema`
- `is_valid_code()` — validates the extracted code is syntactically correct
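The parsing steps above can be sketched with a regex over fenced blocks plus an `ast.parse` syntax check; the regex and function bodies here are illustrative assumptions, not the actual implementation:

````python
import ast
import re

# Matches a fenced block with an optional language tag and an
# optional `:path` annotation (e.g. ```python:src/app.py).
CODE_BLOCK_RE = re.compile(r"```(?:\w+)?(?::\S+)?\n(.*?)```", re.DOTALL)


def extract_code_and_explanation(content: str):
    # First fenced block is the code; everything outside the
    # fences is treated as explanation text.
    match = CODE_BLOCK_RE.search(content)
    if not match:
        return "", content.strip()
    code = match.group(1)
    explanation = (content[: match.start()] + content[match.end():]).strip()
    return code, explanation


def is_valid_code(code: str) -> bool:
    # Syntactic validity check via the Python parser.
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False
````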
## Code Formatting
LLM responses use markdown code blocks with file path annotations:
````
```python:path/to/file.py
# optimized code here
```
````
The context extraction system both generates this format (for prompts) and parses it (from responses).
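A minimal sketch of that round trip, assuming a regex over the `python:path` info string (function names and the regex are illustrative, not the actual implementation):

````python
import re

# Captures the path after `python:` and the block body.
ANNOTATED_BLOCK_RE = re.compile(r"```python:(\S+)\n(.*?)```", re.DOTALL)


def format_annotated_block(path: str, code: str) -> str:
    # Generate a file-path-annotated block for a prompt.
    return f"```python:{path}\n{code}\n```"


def parse_annotated_blocks(response: str) -> dict:
    # Map each annotated file path to its code block.
    return {
        path: code.rstrip("\n")
        for path, code in ANNOTATED_BLOCK_RE.findall(response)
    }
````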