mirror of
https://github.com/codeflash-ai/codeflash-internal.git
synced 2026-05-04 18:25:18 +00:00
refactor: modernize explanation prompts with Jinja2 templates
Extract inline prompts into .md.j2 templates, move schemas to models.py, and add model_type branching (XML for Anthropic, markdown for OpenAI) following the testgen pattern. Uses StrictUndefined, trim_blocks, and lstrip_blocks.
This commit is contained in:
parent
09fafeb914
commit
396d7cc7e8
5 changed files with 424 additions and 256 deletions
|
|
@@ -2,9 +2,11 @@ from __future__ import annotations
|
|||
|
||||
import asyncio
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
import sentry_sdk
|
||||
from ninja import NinjaAPI, Schema
|
||||
from jinja2 import Environment, FileSystemLoader, StrictUndefined
|
||||
from ninja import NinjaAPI
|
||||
from openai.types.chat import ChatCompletionSystemMessageParam, ChatCompletionUserMessageParam
|
||||
from packaging import version
|
||||
|
||||
|
|
@@ -14,242 +16,72 @@ from aiservice.common.xml_utils import extract_xml_tag
|
|||
from aiservice.common_utils import validate_trace_id
|
||||
from aiservice.env_specific import debug_log_sensitive_data
|
||||
from aiservice.llm import EXPLANATIONS_MODEL, LLM, calculate_llm_cost, call_llm
|
||||
from core.languages.python.explanations.models import (
|
||||
ExplanationsErrorResponseSchema,
|
||||
ExplanationsResponseSchema,
|
||||
ExplanationsSchema,
|
||||
)
|
||||
from core.log_features.log_event import update_optimization_cost
|
||||
from core.log_features.log_features import log_features
|
||||
|
||||
explanations_api = NinjaAPI(urls_namespace="explanations")
|
||||
|
||||
SYSTEM_PROMPT = """You are an expert software engineer who understands why programs run fast. You have deep expertise in data structures and algorithms.
|
||||
_PROMPT_DIR = Path(__file__).parent / "prompts"
|
||||
_jinja_env = Environment( # noqa: S701 — rendering LLM prompts, not HTML
|
||||
loader=FileSystemLoader(_PROMPT_DIR),
|
||||
undefined=StrictUndefined,
|
||||
keep_trailing_newline=True,
|
||||
trim_blocks=True,
|
||||
lstrip_blocks=True,
|
||||
)
|
||||
|
||||
Your goal is to explain why a piece of code is more performant than a baseline code, to make it easier for a developer to accept and merge the optimized version.
|
||||
|
||||
You are provided the following information to succeed in the explanation process -
|
||||
|
||||
- original_source_code: The baseline implementation of the code being optimized
|
||||
- original_line_profiler_results - The results after running line_profiler on the original_source_code
|
||||
- original_code_runtime - The runtime for the original_source_code
|
||||
- optimized_source_code - This is the suggested optimized version of the original_source_code that you should explain.
|
||||
- optimized_line_profiler_results - The results after running line_profiler on the optimized_source_code
|
||||
- optimized_code_runtime - The runtime for the optimized_source_code
|
||||
- speedup - The relative gain in runtime for the optimized_source_code
|
||||
- annotated_tests - The regression tests that were run to test for performance and correctness, with runtime results annotated next to the respective test case.
|
||||
- read_only_dependency_code - The READ ONLY dependencies for the code provided, to help you better understand the code being provided.
|
||||
- original_explanation - The original explanation generated for the optimized_source_code. Note that the original_explanation may be out of sync as some of the micro-optimizations and irrelevant changes might have been reverted in the optimized_source_code.
|
||||
- python_version - The version of python the code would be executed on.
|
||||
- function_references - Python markdown blocks with filename and references of some functions which call the function being optimized. The filenames and/or references could indicate if the function being optimized is in a hot path. The reference could have the function being called from a place that is important, for example in a loop, which means the effect of optimization might be important.
|
||||
|
||||
Keep explanations **developer-focused and concise**. Focus on:
|
||||
- **What** specific optimizations were applied.
|
||||
- **Key changes** that affect behavior or dependencies.
|
||||
- **Why** the specific optimization leads to a speedup based on your knowledge of performance in Python code.
|
||||
- **How** the optimization could potentially impact existing workloads based on function_references which can help determine whether the function being optimized is called in a hot path or not, and if the context where the function is called may benefit from the optimization.
|
||||
- What kind of test cases are the specific optimizations good for based on the annotated_tests results.
|
||||
- Avoid mentioning obvious preservation details (file structure, imports, signatures) unless they were specifically modified.
|
||||
|
||||
Please provide your explanation in the following format:
|
||||
|
||||
<explain>
|
||||
Your *Brief* explanation of why and how optimized_source_code is faster than original_source_code.
|
||||
</explain>
|
||||
"""
|
||||
|
||||
BASE_USER_PROMPT = """The original_source_code is as follows
|
||||
|
||||
<original_source_code>
|
||||
```python
|
||||
{original_source_code}
|
||||
```
|
||||
</original_source_code>
|
||||
|
||||
The optimized_source_code is as follows
|
||||
|
||||
<optimized_source_code>
|
||||
```python
|
||||
{optimized_source_code}
|
||||
```
|
||||
</optimized_source_code>
|
||||
|
||||
Here is the line profiler information for the original_source_code
|
||||
|
||||
<original_line_profiler_results>
|
||||
{original_line_profiler_results}
|
||||
</original_line_profiler_results>
|
||||
|
||||
Here is the line profiler information for the optimized_source_code
|
||||
|
||||
<optimized_line_profiler_results>
|
||||
{optimized_line_profiler_results}
|
||||
</optimized_line_profiler_results>
|
||||
|
||||
Here is the original_code_runtime
|
||||
<original_code_runtime>
|
||||
{original_code_runtime}
|
||||
</original_code_runtime>
|
||||
|
||||
Here is the optimized_code_runtime
|
||||
<optimized_code_runtime>
|
||||
{optimized_code_runtime}
|
||||
</optimized_code_runtime>
|
||||
|
||||
Here is the speedup
|
||||
<speedup>
|
||||
{speedup}
|
||||
</speedup>
|
||||
|
||||
Here is the test function code with runtime results annotated next to the respective test case.
|
||||
|
||||
<annotated_tests>
|
||||
{annotated_tests}
|
||||
</annotated_tests>
|
||||
|
||||
Here is the read_only_dependency_code
|
||||
|
||||
<read_only_dependency_code>
|
||||
{read_only_dependency_code}
|
||||
</read_only_dependency_code>
|
||||
|
||||
Here is the original_explanation
|
||||
<original_explanation>
|
||||
{original_explanation}
|
||||
</original_explanation>
|
||||
|
||||
Here is the python_version
|
||||
<python_version>
|
||||
{python_version}
|
||||
</python_version>
|
||||
|
||||
Here is the function_references
|
||||
<function_references>
|
||||
{function_references}
|
||||
</function_references>
|
||||
|
||||
"""
|
||||
|
||||
THROUGHPUT_PROMPT_SECTION = """Here is the original_throughput (operations per second)
|
||||
<original_throughput>
|
||||
{original_throughput}
|
||||
</original_throughput>
|
||||
|
||||
Here is the optimized_throughput (operations per second)
|
||||
<optimized_throughput>
|
||||
{optimized_throughput}
|
||||
</optimized_throughput>
|
||||
|
||||
Here is the throughput_improvement
|
||||
<throughput_improvement>
|
||||
{throughput_improvement}
|
||||
</throughput_improvement>
|
||||
|
||||
"""
|
||||
|
||||
THROUGHPUT_SYSTEM_SECTION = """Additional throughput data is provided:
|
||||
- original_throughput - The throughput (operations per second) for the original_source_code
|
||||
- optimized_throughput - The throughput (operations per second) for the optimized_source_code
|
||||
- throughput_improvement - The percentage improvement in throughput
|
||||
|
||||
When explaining optimizations:
|
||||
- **Throughput improvements** - explain how the optimization affects the rate of operations/processing.
|
||||
- When both runtime and throughput data are provided (for async functions), explain both metrics and how they relate to the optimization.
|
||||
"""
|
||||
|
||||
CONCURRENCY_PROMPT_SECTION = """Here is the original_concurrency_ratio (how much faster concurrent execution is vs sequential)
|
||||
<original_concurrency_ratio>
|
||||
{original_concurrency_ratio}
|
||||
</original_concurrency_ratio>
|
||||
|
||||
Here is the optimized_concurrency_ratio
|
||||
<optimized_concurrency_ratio>
|
||||
{optimized_concurrency_ratio}
|
||||
</optimized_concurrency_ratio>
|
||||
|
||||
Here is the concurrency_improvement
|
||||
<concurrency_improvement>
|
||||
{concurrency_improvement}
|
||||
</concurrency_improvement>
|
||||
|
||||
"""
|
||||
|
||||
CONCURRENCY_SYSTEM_SECTION = """Additional concurrency data is provided:
|
||||
- original_concurrency_ratio - How much faster concurrent execution is vs sequential for the original code (e.g., 1.0x means no concurrency benefit, 10.0x means concurrent is 10x faster)
|
||||
- optimized_concurrency_ratio - The concurrency ratio for the optimized code
|
||||
- concurrency_improvement - The percentage improvement in concurrency ratio
|
||||
|
||||
Concurrency ratio measures how well async code scales with concurrent execution:
|
||||
- Blocking code (like `time.sleep()`) has a low ratio (~1.0x) because it blocks the event loop
|
||||
- Non-blocking code (like `asyncio.sleep()`) has a high ratio because it yields control to the event loop
|
||||
|
||||
When the concurrency ratio improves, it means the optimized code better utilizes async patterns and allows other concurrent tasks to run while waiting.
|
||||
"""
|
||||
|
||||
ACCEPTANCE_REASON_SYSTEM_SECTION = """
|
||||
**CRITICAL**: The optimization was accepted because of improvements in **{acceptance_reason}**.
|
||||
|
||||
When explaining this optimization, you MUST:
|
||||
1. Lead with the {acceptance_reason} improvement as the PRIMARY benefit
|
||||
2. Frame the optimization positively around the {acceptance_reason} metric
|
||||
3. If other metrics regressed (e.g., runtime got slower), mention this as a reasonable trade-off for the {acceptance_reason} benefit
|
||||
4. Do NOT frame the optimization as a regression or negative change - it was accepted because it improves {acceptance_reason}
|
||||
|
||||
For example:
|
||||
- If acceptance_reason is "concurrency": Focus on how the code now properly yields to the event loop, improving concurrent execution even if raw runtime increased
|
||||
- If acceptance_reason is "throughput": Focus on increased operations per second
|
||||
- If acceptance_reason is "runtime": Focus on decreased execution time
|
||||
"""
|
||||
SYSTEM_PROMPT_TEMPLATE = _jinja_env.get_template("system_prompt.md.j2")
|
||||
USER_PROMPT_TEMPLATE = _jinja_env.get_template("user_prompt.md.j2")
|
||||
|
||||
|
||||
async def explain_optimizations( # noqa: D417
|
||||
async def explain_optimizations(
|
||||
user_id: str, data: ExplanationsSchema, explanations_model: LLM = EXPLANATIONS_MODEL
|
||||
) -> ExplanationsResponseSchema | ExplanationsErrorResponseSchema:
|
||||
"""Optimize the given python code for performance using the Claude 4 model.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
- source_code (str): The python code to optimize.
|
||||
- n (int): Number of optimization variants to generate. Default is 1.
|
||||
- python_version (tuple[int, int, int]): The python version to use. Default is (3,12,9).
|
||||
|
||||
Returns: - List[Tuple[Union[str, None], Union[str, None]]]: A list of tuples where the first element is the
|
||||
optimized code and the second is the explanation.
|
||||
:param explanations_model:
|
||||
|
||||
"""
|
||||
debug_log_sensitive_data(f"Generating an explanation for {user_id}:\n{data.optimized_code}")
|
||||
if version.parse(data.codeflash_version) <= version.parse("0.18.2") and data.annotated_tests:
|
||||
data.annotated_tests = wrap_code_in_markdown(data.annotated_tests)
|
||||
user_prompt = BASE_USER_PROMPT.format(
|
||||
original_source_code=data.source_code,
|
||||
original_line_profiler_results=data.original_line_profiler_results or "[No profiler results available]",
|
||||
optimized_source_code=data.optimized_code,
|
||||
optimized_line_profiler_results=data.optimized_line_profiler_results or "[No profiler results available]",
|
||||
original_code_runtime=data.original_code_runtime,
|
||||
optimized_code_runtime=data.optimized_code_runtime,
|
||||
speedup=data.speedup,
|
||||
annotated_tests=data.annotated_tests,
|
||||
read_only_dependency_code=data.dependency_code or "[No read only code present]",
|
||||
original_explanation=data.original_explanation,
|
||||
python_version=data.python_version or "Not Available",
|
||||
function_references=data.function_references or "Not Available",
|
||||
)
|
||||
|
||||
system_prompt = SYSTEM_PROMPT
|
||||
if data.original_throughput is not None and data.optimized_throughput is not None:
|
||||
user_prompt += THROUGHPUT_PROMPT_SECTION.format(
|
||||
original_throughput=data.original_throughput,
|
||||
optimized_throughput=data.optimized_throughput,
|
||||
throughput_improvement=data.throughput_improvement or "[Unable to calculate throughput improvement]",
|
||||
include_throughput = data.original_throughput is not None and data.optimized_throughput is not None
|
||||
include_concurrency = data.original_concurrency_ratio is not None and data.optimized_concurrency_ratio is not None
|
||||
|
||||
template_vars = {
|
||||
"model_type": explanations_model.model_type,
|
||||
"original_source_code": data.source_code,
|
||||
"original_line_profiler_results": data.original_line_profiler_results or "[No profiler results available]",
|
||||
"optimized_source_code": data.optimized_code,
|
||||
"optimized_line_profiler_results": data.optimized_line_profiler_results or "[No profiler results available]",
|
||||
"original_code_runtime": data.original_code_runtime,
|
||||
"optimized_code_runtime": data.optimized_code_runtime,
|
||||
"speedup": data.speedup,
|
||||
"annotated_tests": data.annotated_tests,
|
||||
"read_only_dependency_code": data.dependency_code or "[No read only code present]",
|
||||
"original_explanation": data.original_explanation,
|
||||
"python_version": data.python_version or "Not Available",
|
||||
"function_references": data.function_references or "Not Available",
|
||||
"include_throughput": include_throughput,
|
||||
"include_concurrency": include_concurrency,
|
||||
"acceptance_reason": data.acceptance_reason,
|
||||
}
|
||||
if include_throughput:
|
||||
template_vars["original_throughput"] = data.original_throughput
|
||||
template_vars["optimized_throughput"] = data.optimized_throughput
|
||||
template_vars["throughput_improvement"] = (
|
||||
data.throughput_improvement or "[Unable to calculate throughput improvement]"
|
||||
)
|
||||
system_prompt += "\n" + THROUGHPUT_SYSTEM_SECTION
|
||||
|
||||
if data.original_concurrency_ratio is not None and data.optimized_concurrency_ratio is not None:
|
||||
user_prompt += CONCURRENCY_PROMPT_SECTION.format(
|
||||
original_concurrency_ratio=data.original_concurrency_ratio,
|
||||
optimized_concurrency_ratio=data.optimized_concurrency_ratio,
|
||||
concurrency_improvement=data.concurrency_improvement or "[Unable to calculate concurrency improvement]",
|
||||
if include_concurrency:
|
||||
template_vars["original_concurrency_ratio"] = data.original_concurrency_ratio
|
||||
template_vars["optimized_concurrency_ratio"] = data.optimized_concurrency_ratio
|
||||
template_vars["concurrency_improvement"] = (
|
||||
data.concurrency_improvement or "[Unable to calculate concurrency improvement]"
|
||||
)
|
||||
system_prompt += "\n" + CONCURRENCY_SYSTEM_SECTION
|
||||
|
||||
if data.acceptance_reason is not None:
|
||||
system_prompt += "\n" + ACCEPTANCE_REASON_SYSTEM_SECTION.format(acceptance_reason=data.acceptance_reason)
|
||||
system_prompt = SYSTEM_PROMPT_TEMPLATE.render(template_vars)
|
||||
user_prompt = USER_PROMPT_TEMPLATE.render(template_vars)
|
||||
|
||||
system_message = ChatCompletionSystemMessageParam(role="system", content=system_prompt)
|
||||
user_message = ChatCompletionUserMessageParam(role="user", content=user_prompt)
|
||||
|
|
@@ -291,40 +123,6 @@ async def explain_optimizations(  # noqa: D417
|
|||
return ExplanationsResponseSchema(explanation=output.content)
|
||||
|
||||
|
||||
class ExplanationsSchema(Schema):
|
||||
trace_id: str
|
||||
source_code: str
|
||||
optimized_code: str
|
||||
original_line_profiler_results: str
|
||||
optimized_line_profiler_results: str
|
||||
original_code_runtime: str
|
||||
optimized_code_runtime: str
|
||||
speedup: str
|
||||
annotated_tests: str
|
||||
dependency_code: str | None
|
||||
optimization_id: str
|
||||
original_explanation: str
|
||||
original_throughput: str | None = None
|
||||
optimized_throughput: str | None = None
|
||||
throughput_improvement: str | None = None
|
||||
python_version: str | None = None
|
||||
function_references: str | None = None
|
||||
acceptance_reason: str | None = None
|
||||
original_concurrency_ratio: str | None = None
|
||||
optimized_concurrency_ratio: str | None = None
|
||||
concurrency_improvement: str | None = None
|
||||
codeflash_version: str = "0.18.2"
|
||||
call_sequence: int | None = None
|
||||
|
||||
|
||||
class ExplanationsResponseSchema(Schema):
|
||||
explanation: str
|
||||
|
||||
|
||||
class ExplanationsErrorResponseSchema(Schema):
|
||||
error: str
|
||||
|
||||
|
||||
@explanations_api.post(
|
||||
"/",
|
||||
response={
|
||||
|
|
@@ -334,7 +132,8 @@ class ExplanationsErrorResponseSchema(Schema):
|
|||
},
|
||||
)
|
||||
async def explain(
|
||||
request, data: ExplanationsSchema
|
||||
request, # noqa: ANN001
|
||||
data: ExplanationsSchema,
|
||||
) -> tuple[int, ExplanationsResponseSchema | ExplanationsErrorResponseSchema]:
|
||||
await asyncio.to_thread(ph, request.user, "aiservice-explain-called")
|
||||
if not validate_trace_id(data.trace_id):
|
||||
|
|
|
|||
|
|
@@ -0,0 +1,37 @@
|
|||
from __future__ import annotations
|
||||
|
||||
from ninja import Schema
|
||||
|
||||
|
||||
class ExplanationsSchema(Schema):
    """Request payload for the explanations endpoint.

    Carries the original and optimized code plus all profiling/benchmark
    artifacts needed to render the explanation prompt templates.
    """

    # Request trace identifier; validated with validate_trace_id before use.
    trace_id: str
    # Baseline implementation of the code being optimized.
    source_code: str
    # Suggested optimized version of source_code that the explanation covers.
    optimized_code: str
    # line_profiler output for the original source code.
    original_line_profiler_results: str
    # line_profiler output for the optimized source code.
    optimized_line_profiler_results: str
    # Measured runtime of the original code.
    original_code_runtime: str
    # Measured runtime of the optimized code.
    optimized_code_runtime: str
    # Relative runtime gain of the optimized code over the original.
    speedup: str
    # Regression test code with runtime results annotated per test case.
    annotated_tests: str
    # READ-ONLY dependency code supplied for context; None when absent.
    dependency_code: str | None
    # Identifier of the optimization run (presumably for cost/event logging
    # — confirm against update_optimization_cost callers).
    optimization_id: str
    # Previously generated explanation; may be out of sync with optimized_code
    # since some micro-optimizations might have been reverted.
    original_explanation: str
    # Throughput metrics (operations per second); only set for runs that
    # measured throughput — both original and optimized must be present for
    # the throughput prompt sections to be included.
    original_throughput: str | None = None
    optimized_throughput: str | None = None
    throughput_improvement: str | None = None
    # Python version the code targets; rendered as "Not Available" when None.
    python_version: str | None = None
    # Markdown blocks with filenames/references of callers of the optimized
    # function (hot-path hints); rendered as "Not Available" when None.
    function_references: str | None = None
    # Metric that drove acceptance (e.g. "runtime", "throughput",
    # "concurrency"); triggers the acceptance-reason prompt section when set.
    acceptance_reason: str | None = None
    # Concurrency ratios: how much faster concurrent execution is vs
    # sequential (e.g. 1.0x means no concurrency benefit). Both original and
    # optimized must be present for the concurrency prompt sections.
    original_concurrency_ratio: str | None = None
    optimized_concurrency_ratio: str | None = None
    concurrency_improvement: str | None = None
    # Client version; for versions <= 0.18.2 annotated_tests gets wrapped in
    # a markdown code block before prompt rendering.
    codeflash_version: str = "0.18.2"
    # Sequence number of this call within a run — TODO confirm semantics
    # against the caller.
    call_sequence: int | None = None
|
||||
|
||||
|
||||
class ExplanationsResponseSchema(Schema):
    """Successful response: the generated explanation text."""

    # Explanation of why the optimized code is faster than the original.
    explanation: str
|
||||
|
||||
|
||||
class ExplanationsErrorResponseSchema(Schema):
    """Error response: a human-readable failure description."""

    # Description of what went wrong while generating the explanation.
    error: str
|
||||
|
|
@@ -0,0 +1,171 @@
|
|||
{% if model_type == "anthropic" %}
|
||||
<role>
|
||||
You are an expert software engineer who understands why programs run fast. You have deep expertise in data structures and algorithms.
|
||||
|
||||
Your goal is to explain why a piece of code is more performant than a baseline code, to make it easier for a developer to accept and merge the optimized version.
|
||||
</role>
|
||||
|
||||
<context>
|
||||
You are provided the following information to succeed in the explanation process -
|
||||
|
||||
- original_source_code: The baseline implementation of the code being optimized
|
||||
- original_line_profiler_results - The results after running line_profiler on the original_source_code
|
||||
- original_code_runtime - The runtime for the original_source_code
|
||||
- optimized_source_code - This is the suggested optimized version of the original_source_code that you should explain.
|
||||
- optimized_line_profiler_results - The results after running line_profiler on the optimized_source_code
|
||||
- optimized_code_runtime - The runtime for the optimized_source_code
|
||||
- speedup - The relative gain in runtime for the optimized_source_code
|
||||
- annotated_tests - The regression tests that were run to test for performance and correctness, with runtime results annotated next to the respective test case.
|
||||
- read_only_dependency_code - The READ ONLY dependencies for the code provided, to help you better understand the code being provided.
|
||||
- original_explanation - The original explanation generated for the optimized_source_code. Note that the original_explanation may be out of sync as some of the micro-optimizations and irrelevant changes might have been reverted in the optimized_source_code.
|
||||
- python_version - The version of python the code would be executed on.
|
||||
- function_references - Python markdown blocks with filename and references of some functions which call the function being optimized. The filenames and/or references could indicate if the function being optimized is in a hot path. The reference could have the function being called from a place that is important, for example in a loop, which means the effect of optimization might be important.
|
||||
</context>
|
||||
|
||||
<guidelines>
|
||||
Keep explanations **developer-focused and concise**. Focus on:
|
||||
- **What** specific optimizations were applied.
|
||||
- **Key changes** that affect behavior or dependencies.
|
||||
- **Why** the specific optimization leads to a speedup based on your knowledge of performance in Python code.
|
||||
- **How** the optimization could potentially impact existing workloads based on function_references which can help determine whether the function being optimized is called in a hot path or not, and if the context where the function is called may benefit from the optimization.
|
||||
- What kind of test cases are the specific optimizations good for based on the annotated_tests results.
|
||||
- Avoid mentioning obvious preservation details (file structure, imports, signatures) unless they were specifically modified.
|
||||
</guidelines>
|
||||
|
||||
<output_format>
|
||||
Please provide your explanation in the following format:
|
||||
|
||||
<explain>
|
||||
Your *Brief* explanation of why and how optimized_source_code is faster than original_source_code.
|
||||
</explain>
|
||||
</output_format>
|
||||
{% if include_throughput %}
|
||||
|
||||
<throughput_context>
|
||||
Additional throughput data is provided:
|
||||
- original_throughput - The throughput (operations per second) for the original_source_code
|
||||
- optimized_throughput - The throughput (operations per second) for the optimized_source_code
|
||||
- throughput_improvement - The percentage improvement in throughput
|
||||
|
||||
When explaining optimizations:
|
||||
- **Throughput improvements** - explain how the optimization affects the rate of operations/processing.
|
||||
- When both runtime and throughput data are provided (for async functions), explain both metrics and how they relate to the optimization.
|
||||
</throughput_context>
|
||||
{% endif %}
|
||||
{% if include_concurrency %}
|
||||
|
||||
<concurrency_context>
|
||||
Additional concurrency data is provided:
|
||||
- original_concurrency_ratio - How much faster concurrent execution is vs sequential for the original code (e.g., 1.0x means no concurrency benefit, 10.0x means concurrent is 10x faster)
|
||||
- optimized_concurrency_ratio - The concurrency ratio for the optimized code
|
||||
- concurrency_improvement - The percentage improvement in concurrency ratio
|
||||
|
||||
Concurrency ratio measures how well async code scales with concurrent execution:
|
||||
- Blocking code (like `time.sleep()`) has a low ratio (~1.0x) because it blocks the event loop
|
||||
- Non-blocking code (like `asyncio.sleep()`) has a high ratio because it yields control to the event loop
|
||||
|
||||
When the concurrency ratio improves, it means the optimized code better utilizes async patterns and allows other concurrent tasks to run while waiting.
|
||||
</concurrency_context>
|
||||
{% endif %}
|
||||
{% if acceptance_reason %}
|
||||
|
||||
<acceptance_criteria>
|
||||
**CRITICAL**: The optimization was accepted because of improvements in **{{ acceptance_reason }}**.
|
||||
|
||||
When explaining this optimization, you MUST:
|
||||
1. Lead with the {{ acceptance_reason }} improvement as the PRIMARY benefit
|
||||
2. Frame the optimization positively around the {{ acceptance_reason }} metric
|
||||
3. If other metrics regressed (e.g., runtime got slower), mention this as a reasonable trade-off for the {{ acceptance_reason }} benefit
|
||||
4. Do NOT frame the optimization as a regression or negative change - it was accepted because it improves {{ acceptance_reason }}
|
||||
|
||||
For example:
|
||||
- If acceptance_reason is "concurrency": Focus on how the code now properly yields to the event loop, improving concurrent execution even if raw runtime increased
|
||||
- If acceptance_reason is "throughput": Focus on increased operations per second
|
||||
- If acceptance_reason is "runtime": Focus on decreased execution time
|
||||
</acceptance_criteria>
|
||||
{% endif %}
|
||||
{% else %}
|
||||
You are an expert software engineer who understands why programs run fast. You have deep expertise in data structures and algorithms.
|
||||
|
||||
Your goal is to explain why a piece of code is more performant than a baseline code, to make it easier for a developer to accept and merge the optimized version.
|
||||
|
||||
## Context
|
||||
|
||||
You are provided the following information to succeed in the explanation process -
|
||||
|
||||
- original_source_code: The baseline implementation of the code being optimized
|
||||
- original_line_profiler_results - The results after running line_profiler on the original_source_code
|
||||
- original_code_runtime - The runtime for the original_source_code
|
||||
- optimized_source_code - This is the suggested optimized version of the original_source_code that you should explain.
|
||||
- optimized_line_profiler_results - The results after running line_profiler on the optimized_source_code
|
||||
- optimized_code_runtime - The runtime for the optimized_source_code
|
||||
- speedup - The relative gain in runtime for the optimized_source_code
|
||||
- annotated_tests - The regression tests that were run to test for performance and correctness, with runtime results annotated next to the respective test case.
|
||||
- read_only_dependency_code - The READ ONLY dependencies for the code provided, to help you better understand the code being provided.
|
||||
- original_explanation - The original explanation generated for the optimized_source_code. Note that the original_explanation may be out of sync as some of the micro-optimizations and irrelevant changes might have been reverted in the optimized_source_code.
|
||||
- python_version - The version of python the code would be executed on.
|
||||
- function_references - Python markdown blocks with filename and references of some functions which call the function being optimized. The filenames and/or references could indicate if the function being optimized is in a hot path. The reference could have the function being called from a place that is important, for example in a loop, which means the effect of optimization might be important.
|
||||
|
||||
## Guidelines
|
||||
|
||||
Keep explanations **developer-focused and concise**. Focus on:
|
||||
- **What** specific optimizations were applied.
|
||||
- **Key changes** that affect behavior or dependencies.
|
||||
- **Why** the specific optimization leads to a speedup based on your knowledge of performance in Python code.
|
||||
- **How** the optimization could potentially impact existing workloads based on function_references which can help determine whether the function being optimized is called in a hot path or not, and if the context where the function is called may benefit from the optimization.
|
||||
- What kind of test cases are the specific optimizations good for based on the annotated_tests results.
|
||||
- Avoid mentioning obvious preservation details (file structure, imports, signatures) unless they were specifically modified.
|
||||
|
||||
## Output Format
|
||||
|
||||
Please provide your explanation in the following format:
|
||||
|
||||
<explain>
|
||||
Your *Brief* explanation of why and how optimized_source_code is faster than original_source_code.
|
||||
</explain>
|
||||
{% if include_throughput %}
|
||||
|
||||
## Throughput Context
|
||||
|
||||
Additional throughput data is provided:
|
||||
- original_throughput - The throughput (operations per second) for the original_source_code
|
||||
- optimized_throughput - The throughput (operations per second) for the optimized_source_code
|
||||
- throughput_improvement - The percentage improvement in throughput
|
||||
|
||||
When explaining optimizations:
|
||||
- **Throughput improvements** - explain how the optimization affects the rate of operations/processing.
|
||||
- When both runtime and throughput data are provided (for async functions), explain both metrics and how they relate to the optimization.
|
||||
{% endif %}
|
||||
{% if include_concurrency %}
|
||||
|
||||
## Concurrency Context
|
||||
|
||||
Additional concurrency data is provided:
|
||||
- original_concurrency_ratio - How much faster concurrent execution is vs sequential for the original code (e.g., 1.0x means no concurrency benefit, 10.0x means concurrent is 10x faster)
|
||||
- optimized_concurrency_ratio - The concurrency ratio for the optimized code
|
||||
- concurrency_improvement - The percentage improvement in concurrency ratio
|
||||
|
||||
Concurrency ratio measures how well async code scales with concurrent execution:
|
||||
- Blocking code (like `time.sleep()`) has a low ratio (~1.0x) because it blocks the event loop
|
||||
- Non-blocking code (like `asyncio.sleep()`) has a high ratio because it yields control to the event loop
|
||||
|
||||
When the concurrency ratio improves, it means the optimized code better utilizes async patterns and allows other concurrent tasks to run while waiting.
|
||||
{% endif %}
|
||||
{% if acceptance_reason %}
|
||||
|
||||
## CRITICAL
|
||||
|
||||
**CRITICAL**: The optimization was accepted because of improvements in **{{ acceptance_reason }}**.
|
||||
|
||||
When explaining this optimization, you MUST:
|
||||
1. Lead with the {{ acceptance_reason }} improvement as the PRIMARY benefit
|
||||
2. Frame the optimization positively around the {{ acceptance_reason }} metric
|
||||
3. If other metrics regressed (e.g., runtime got slower), mention this as a reasonable trade-off for the {{ acceptance_reason }} benefit
|
||||
4. Do NOT frame the optimization as a regression or negative change - it was accepted because it improves {{ acceptance_reason }}
|
||||
|
||||
For example:
|
||||
- If acceptance_reason is "concurrency": Focus on how the code now properly yields to the event loop, improving concurrent execution even if raw runtime increased
|
||||
- If acceptance_reason is "throughput": Focus on increased operations per second
|
||||
- If acceptance_reason is "runtime": Focus on decreased execution time
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
|
@@ -0,0 +1,161 @@
|
|||
{% if model_type == "anthropic" %}
|
||||
<original_source_code>
|
||||
```python
|
||||
{{ original_source_code }}
|
||||
```
|
||||
</original_source_code>
|
||||
|
||||
<optimized_source_code>
|
||||
```python
|
||||
{{ optimized_source_code }}
|
||||
```
|
||||
</optimized_source_code>
|
||||
|
||||
<original_line_profiler_results>
|
||||
{{ original_line_profiler_results }}
|
||||
</original_line_profiler_results>
|
||||
|
||||
<optimized_line_profiler_results>
|
||||
{{ optimized_line_profiler_results }}
|
||||
</optimized_line_profiler_results>
|
||||
|
||||
<original_code_runtime>
|
||||
{{ original_code_runtime }}
|
||||
</original_code_runtime>
|
||||
|
||||
<optimized_code_runtime>
|
||||
{{ optimized_code_runtime }}
|
||||
</optimized_code_runtime>
|
||||
|
||||
<speedup>
|
||||
{{ speedup }}
|
||||
</speedup>
|
||||
|
||||
<annotated_tests>
|
||||
{{ annotated_tests }}
|
||||
</annotated_tests>
|
||||
|
||||
<read_only_dependency_code>
|
||||
{{ read_only_dependency_code }}
|
||||
</read_only_dependency_code>
|
||||
|
||||
<original_explanation>
|
||||
{{ original_explanation }}
|
||||
</original_explanation>
|
||||
|
||||
<python_version>
|
||||
{{ python_version }}
|
||||
</python_version>
|
||||
|
||||
<function_references>
|
||||
{{ function_references }}
|
||||
</function_references>
|
||||
{% if include_throughput %}
|
||||
|
||||
<original_throughput>
|
||||
{{ original_throughput }}
|
||||
</original_throughput>
|
||||
|
||||
<optimized_throughput>
|
||||
{{ optimized_throughput }}
|
||||
</optimized_throughput>
|
||||
|
||||
<throughput_improvement>
|
||||
{{ throughput_improvement }}
|
||||
</throughput_improvement>
|
||||
{% endif %}
|
||||
{% if include_concurrency %}
|
||||
|
||||
<original_concurrency_ratio>
|
||||
{{ original_concurrency_ratio }}
|
||||
</original_concurrency_ratio>
|
||||
|
||||
<optimized_concurrency_ratio>
|
||||
{{ optimized_concurrency_ratio }}
|
||||
</optimized_concurrency_ratio>
|
||||
|
||||
<concurrency_improvement>
|
||||
{{ concurrency_improvement }}
|
||||
</concurrency_improvement>
|
||||
{% endif %}
|
||||
{% else %}
|
||||
## Original Source Code
|
||||
|
||||
```python
|
||||
{{ original_source_code }}
|
||||
```
|
||||
|
||||
## Optimized Source Code
|
||||
|
||||
```python
|
||||
{{ optimized_source_code }}
|
||||
```
|
||||
|
||||
## Original Line Profiler Results
|
||||
|
||||
{{ original_line_profiler_results }}
|
||||
|
||||
## Optimized Line Profiler Results
|
||||
|
||||
{{ optimized_line_profiler_results }}
|
||||
|
||||
## Original Code Runtime
|
||||
|
||||
{{ original_code_runtime }}
|
||||
|
||||
## Optimized Code Runtime
|
||||
|
||||
{{ optimized_code_runtime }}
|
||||
|
||||
## Speedup
|
||||
|
||||
{{ speedup }}
|
||||
|
||||
## Annotated Tests
|
||||
|
||||
{{ annotated_tests }}
|
||||
|
||||
## Read-Only Dependency Code
|
||||
|
||||
{{ read_only_dependency_code }}
|
||||
|
||||
## Original Explanation
|
||||
|
||||
{{ original_explanation }}
|
||||
|
||||
## Python Version
|
||||
|
||||
{{ python_version }}
|
||||
|
||||
## Function References
|
||||
|
||||
{{ function_references }}
|
||||
{% if include_throughput %}
|
||||
|
||||
## Original Throughput (operations per second)
|
||||
|
||||
{{ original_throughput }}
|
||||
|
||||
## Optimized Throughput (operations per second)
|
||||
|
||||
{{ optimized_throughput }}
|
||||
|
||||
## Throughput Improvement
|
||||
|
||||
{{ throughput_improvement }}
|
||||
{% endif %}
|
||||
{% if include_concurrency %}
|
||||
|
||||
## Original Concurrency Ratio
|
||||
|
||||
{{ original_concurrency_ratio }}
|
||||
|
||||
## Optimized Concurrency Ratio
|
||||
|
||||
{{ optimized_concurrency_ratio }}
|
||||
|
||||
## Concurrency Improvement
|
||||
|
||||
{{ concurrency_improvement }}
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
|
@@ -22,7 +22,7 @@ if TYPE_CHECKING:
|
|||
CodeRepairIntermediateResponseItemschema,
|
||||
)
|
||||
from core.languages.python.code_repair.code_repair_context import CodeRepairContext
|
||||
from core.languages.python.explanations.explanations import (
|
||||
from core.languages.python.explanations.models import (
|
||||
ExplanationsErrorResponseSchema,
|
||||
ExplanationsResponseSchema,
|
||||
ExplanationsSchema,
|
||||
|
|
|
|||
Loading…
Reference in a new issue