mirror of
https://github.com/codeflash-ai/codeflash-internal.git
synced 2026-05-04 18:25:18 +00:00
fix: testgen prompt improvements (#2474)
## Summary
- Add multiline string literal constraint to testgen and repair prompts
— LLM was consistently generating unterminated string literals by
splitting strings across lines without triple quotes
- Deduplicate anthropic/markdown branches in testgen prompt templates —
single flow with inline `{% if is_xml %}` wrappers instead of duplicated
content
## Test plan
- [x] Verified templates render correctly for both anthropic and openai
model types (sync and async)
- [x] All block overrides from child templates work with the unified
block names
This commit is contained in:
parent
434fb7df77
commit
07edfaa0bd
3 changed files with 20 additions and 113 deletions
|
|
@ -1,17 +1,17 @@
|
|||
{% if model_type == "anthropic" %}
|
||||
<role>
|
||||
{% set is_xml = model_type == "anthropic" %}
|
||||
{% if is_xml %}<role>{% endif %}
|
||||
You are Codeflash, a world-class Python testing engineer. Your goal is to write a comprehensive, high-quality unit test suite for the {% block role_target %}`{{ function_name }}`{% endblock %} function that fully defines its behavior — passing tests confirm correctness, and any mutation to the source code should cause at least one test to fail.
|
||||
</role>
|
||||
{% if is_xml %}</role>{% endif %}
|
||||
|
||||
<analysis_steps>
|
||||
{% if is_xml %}<analysis_steps>{% else %}## Analysis Steps{% endif %}
|
||||
Think step by step before writing code. Analyze the {% block analysis_target %}function{% endblock %}:
|
||||
1. What does it do? What are its inputs, outputs, and return types?
|
||||
2. What are the normal/expected usage patterns?
|
||||
3. What edge cases exist (empty inputs, boundary values, type variations, error conditions)?
|
||||
{% block analysis_extra %}{% endblock %}
|
||||
</analysis_steps>
|
||||
{% if is_xml %}</analysis_steps>{% endif %}
|
||||
|
||||
<test_categories>
|
||||
{% if is_xml %}<test_categories>{% else %}## Test Categories{% endif %}
|
||||
{% block categories %}
|
||||
Then write tests organized into three categories:
|
||||
|
||||
|
|
@ -21,9 +21,9 @@ Then write tests organized into three categories:
|
|||
|
||||
**Large-scale tests** — Assess performance and scalability. Use data structures up to 1000 elements and loops up to 1000 iterations.
|
||||
{% endblock %}
|
||||
</test_categories>
|
||||
{% if is_xml %}</test_categories>{% endif %}
|
||||
|
||||
<quality_criteria>
|
||||
{% if is_xml %}<quality_criteria>{% else %}## Test Quality Criteria{% endif %}
|
||||
- Tests should be diverse — cover a wide range of inputs and scenarios
|
||||
- Tests must be deterministic — always pass or fail the same way
|
||||
- Sort tests by difficulty, from easiest to hardest
|
||||
|
|
@ -31,10 +31,10 @@ Then write tests organized into three categories:
|
|||
{% block quality_extra %}
|
||||
- Include large-scale test cases to assess performance with realistic data volumes
|
||||
{% endblock %}
|
||||
</quality_criteria>
|
||||
{% if is_xml %}</quality_criteria>{% endif %}
|
||||
{% block async_rules_section %}{% endblock %}
|
||||
|
||||
<rules>
|
||||
{% if is_xml %}<rules>{% else %}## Rules{% endif %}
|
||||
- **Preserve the original function** — do not modify, enhance, or add parameters to the function under test. Test it exactly as provided.
|
||||
- **Use real classes** — never define stub, fake, mock, dummy, or placeholder classes. Never use `SimpleNamespace` as a stand-in for real objects. Import real classes from their actual modules and construct real instances. Tests that define their own classes or use fake objects will fail `isinstance()` checks and break when code is optimized.
|
||||
- **Handle instance methods correctly** — if the function has `self`, import the class, create a real instance, and call the method on the instance{% block method_call_detail %}{% endblock %}. Do not pass `self` manually.
|
||||
|
|
@ -43,73 +43,21 @@ Then write tests organized into three categories:
|
|||
- **Only import what you use** — do not add unused imports.
|
||||
- **Use correct import sources** — when the dependency context shows `from X import Y`, use that exact source module.
|
||||
- **Use correct constructor signatures** — only use constructor arguments shown in the provided context. Use concrete subclasses instead of abstract classes.
|
||||
- **Valid Python string literals** — use ASCII quotes (`'` or `"`) as delimiters. Unicode curly quotes are not valid Python string delimiters.
|
||||
</rules>
|
||||
- **Valid Python string literals** — use ASCII quotes (`'` or `"`) as delimiters. Unicode curly quotes are not valid Python string delimiters. For multiline strings, use triple quotes (`"""..."""`) or escaped newlines (`\n`) — never split a string literal across multiple lines without triple quotes.
|
||||
{% if is_xml %}</rules>{% endif %}
|
||||
|
||||
<output_format>
|
||||
{% if is_xml %}<output_format>{% else %}## Output Format{% endif %}
|
||||
- Respond with a single markdown code block containing valid Python code.
|
||||
- Do not nest code blocks or include markdown fences inside code.
|
||||
- Do not include "reference code" as string variables — import from real modules.
|
||||
- The code block must contain at least one `{% block test_signature %}def test_...{% endblock %}` function.
|
||||
- Follow the exact template structure provided in the user message.
|
||||
</output_format>
|
||||
{% else %}
|
||||
You are Codeflash, a world-class Python testing engineer. Your goal is to write a comprehensive, high-quality unit test suite for the {% block md_role_target %}`{{ function_name }}`{% endblock %} function that fully defines its behavior — passing tests confirm correctness, and any mutation to the source code should cause at least one test to fail.
|
||||
|
||||
## Analysis Steps
|
||||
|
||||
Think step by step before writing code. Analyze the {% block md_analysis_target %}function{% endblock %}:
|
||||
1. What does it do? What are its inputs, outputs, and return types?
|
||||
2. What are the normal/expected usage patterns?
|
||||
3. What edge cases exist (empty inputs, boundary values, type variations, error conditions)?
|
||||
{% block md_analysis_extra %}{% endblock %}
|
||||
|
||||
## Test Categories
|
||||
{% block md_categories %}
|
||||
|
||||
Then write tests organized into three categories:
|
||||
|
||||
**Basic tests** — Verify fundamental functionality under normal conditions with typical inputs.
|
||||
|
||||
**Edge tests** — Evaluate behavior under extreme or unusual conditions: empty inputs, boundary values, special characters, None values, type edge cases.
|
||||
|
||||
**Large-scale tests** — Assess performance and scalability. Use data structures up to 1000 elements and loops up to 1000 iterations.
|
||||
{% endblock %}
|
||||
|
||||
## Test Quality Criteria
|
||||
|
||||
- Tests should be diverse — cover a wide range of inputs and scenarios
|
||||
- Tests must be deterministic — always pass or fail the same way
|
||||
- Sort tests by difficulty, from easiest to hardest
|
||||
- Always construct real instances using the actual class constructors with real arguments. Import real classes from their actual modules. Do not use Mock, MagicMock, {% block md_mock_extras %}{% endblock %}patch, SimpleNamespace, or any fake/stub objects to create test inputs or domain objects. Real instances expose real behavior; mocks silently pass on wrong attribute access and break when optimized code changes access patterns.
|
||||
{% block md_quality_extra %}
|
||||
- Include large-scale test cases to assess performance with realistic data volumes
|
||||
{% endblock %}
|
||||
{% block md_async_rules_section %}{% endblock %}
|
||||
|
||||
## Rules
|
||||
|
||||
- **Preserve the original function** — do not modify, enhance, or add parameters to the function under test. Test it exactly as provided.
|
||||
- **Always use real classes** — import real classes from their actual modules and construct real instances with real arguments. Do not define stub, fake, mock, dummy, or placeholder classes. Do not use `SimpleNamespace` as a stand-in for real objects. Tests that define their own classes or use fake objects will fail `isinstance()` checks and break when code is optimized.
|
||||
- **Handle instance methods correctly** — if the function has `self`, import the class, create a real instance, and call the method on the instance{% block md_method_call_detail %}{% endblock %}. Do not pass `self` manually.
|
||||
- **Use conftest.py fixtures when provided** — prefer fixtures over manual instantiation. Fixtures are pre-configured and handle setup/teardown.
|
||||
- **Import everything you use** — every symbol must have a corresponding import.
|
||||
- **Only import what you use** — do not add unused imports.
|
||||
- **Use correct import sources** — when the dependency context shows `from X import Y`, use that exact source module.
|
||||
- **Use correct constructor signatures** — only use constructor arguments shown in the provided context. Use concrete subclasses instead of abstract classes.
|
||||
- **Valid Python string literals** — use ASCII quotes (`'` or `"`) as delimiters. Unicode curly quotes are not valid Python string delimiters.
|
||||
|
||||
## Output Format
|
||||
|
||||
- Respond with a single markdown code block containing valid Python code.
|
||||
- Do not nest code blocks or include markdown fences inside code.
|
||||
- Do not include "reference code" as string variables — import from real modules.
|
||||
- The code block must contain at least one `{% block md_test_signature %}def test_...{% endblock %}` function.
|
||||
- Follow the exact template structure provided in the user message.
|
||||
{% if is_xml %}</output_format>{% endif %}
|
||||
{% if not is_xml %}
|
||||
|
||||
## CRITICAL REMINDER
|
||||
|
||||
You MUST construct real instances using actual class constructors with real arguments. Import real classes from their real modules. Do NOT use Mock, MagicMock, {% block md_critical_mock_extras %}{% endblock %}patch, SimpleNamespace, or any fake/stub objects for test inputs or domain objects.
|
||||
You MUST construct real instances using actual class constructors with real arguments. Import real classes from their real modules. Do NOT use Mock, MagicMock, {% block critical_mock_extras %}{% endblock %}patch, SimpleNamespace, or any fake/stub objects for test inputs or domain objects.
|
||||
{% endif %}
|
||||
{% if is_numerical_code %}
|
||||
{% include "jit_system.md.j2" %}
|
||||
|
|
|
|||
|
|
@ -26,58 +26,16 @@ Then write tests organized into four categories:
|
|||
{% endblock %}
|
||||
|
||||
{% block async_rules_section %}
|
||||
<async_rules>
|
||||
{% if is_xml %}<async_rules>{% else %}## Async-Specific Rules{% endif %}
|
||||
- All test functions must use `async def` and be marked with `@pytest.mark.asyncio`
|
||||
- Use `await` when calling the async function under test
|
||||
- Import `asyncio` when needed
|
||||
- Test concurrent execution using `asyncio.gather()` where appropriate
|
||||
- Never create tests that intentionally timeout or hang. No `asyncio.wait_for()` with short timeouts expecting `TimeoutError`. No tests that rely on timing or delays.
|
||||
- All throughput test functions must include `_throughput_` in their name
|
||||
</async_rules>
|
||||
{% if is_xml %}</async_rules>{% endif %}
|
||||
{% endblock %}
|
||||
|
||||
{% block method_call_detail %} with `await instance.method(...)`{% endblock %}
|
||||
{% block test_signature %}async def test_...{% endblock %}
|
||||
|
||||
{# ── Markdown branch overrides ── #}
|
||||
|
||||
{% block md_role_target %}**async** `{{ function_name }}`{% endblock %}
|
||||
{% block md_analysis_target %}async function{% endblock %}
|
||||
{% block md_analysis_extra %}
|
||||
4. What async-specific edge cases exist (concurrent execution, coroutine handling)?
|
||||
5. What large-scale or throughput scenarios should be covered?
|
||||
{% endblock %}
|
||||
|
||||
{% block md_categories %}
|
||||
|
||||
Then write tests organized into four categories:
|
||||
|
||||
**Basic tests** — Verify fundamental functionality under normal conditions. Test that the function returns expected values when awaited, and test basic async/await behavior.
|
||||
|
||||
**Edge tests** — Evaluate behavior under extreme or unusual conditions: empty inputs, boundary values, concurrent execution scenarios, exception handling in async context.
|
||||
|
||||
**Large-scale tests** — Assess performance and scalability with concurrent execution. Test multiple concurrent calls using `asyncio.gather()`. Use data structures up to 1000 elements and loops up to 1000 iterations.
|
||||
|
||||
**Throughput tests** — Measure performance under load and high-volume scenarios. Name these functions with `_throughput_` in the name (e.g., `test_function_throughput_high_load`). Test with varying loads (small, medium, large) and sustained execution patterns.
|
||||
{% endblock %}
|
||||
|
||||
{% block md_mock_extras %}AsyncMock, {% endblock %}
|
||||
{% block md_quality_extra %}
|
||||
- Include concurrent execution tests using `asyncio.gather()` to assess async performance
|
||||
- Test proper async/await patterns and coroutine handling
|
||||
{% endblock %}
|
||||
|
||||
{% block md_async_rules_section %}
|
||||
## Async-Specific Rules
|
||||
|
||||
- All test functions must use `async def` and be marked with `@pytest.mark.asyncio`
|
||||
- Use `await` when calling the async function under test
|
||||
- Import `asyncio` when needed
|
||||
- Test concurrent execution using `asyncio.gather()` where appropriate
|
||||
- Never create tests that intentionally timeout or hang. No `asyncio.wait_for()` with short timeouts expecting `TimeoutError`. No tests that rely on timing or delays.
|
||||
- All throughput test functions must include `_throughput_` in their name
|
||||
{% endblock %}
|
||||
|
||||
{% block md_method_call_detail %} with `await instance.method(...)`{% endblock %}
|
||||
{% block md_test_signature %}async def test_...{% endblock %}
|
||||
{% block md_critical_mock_extras %}AsyncMock, {% endblock %}
|
||||
{% block critical_mock_extras %}AsyncMock, {% endblock %}
|
||||
|
|
|
|||
|
|
@ -18,6 +18,7 @@ Constraints for repaired functions:
|
|||
- Maintain the same testing framework conventions (pytest, unittest, etc.)
|
||||
- Do NOT add comments, docstrings, or type annotations that weren't in the original
|
||||
- The reason provided for each function describes the specific problem — fix that problem
|
||||
- All string literals must be syntactically valid Python — use triple quotes (`"""..."""`) or escaped newlines (`\n`) for multiline strings. NEVER split a string literal across multiple lines without triple quotes.
|
||||
|
||||
Return the complete file in a single Python code block:
|
||||
```python
|
||||
|
|
|
|||
Loading…
Reference in a new issue