codeflash-internal/tiles/codeflash-internal-skills/evals/scenario-5/task.md

53 lines
2.6 KiB
Markdown
Raw Normal View History

# Scenario: Generated tests fail to compile after instrumentation
## Context
Test generation for a Python function that uses PyTorch produces tests that fail to compile. The function under test is:
```python
import torch
def normalize_tensor(t: torch.Tensor) -> torch.Tensor:
return (t - t.mean()) / t.std()
```
The testgen request completes successfully -- the LLM returns valid test code. However, after the instrumentation stage, the tests fail with:
```
SyntaxError: invalid syntax
File "test_normalize_tensor_instrumented.py", line 15
torch.cuda.synchronize()import torch
^^^^^^
```
The raw (pre-instrumentation) test code compiles fine. The issue only appears after instrumentation.
Server logs:
```
INFO 2026-02-14 14:00:00 testgen: build_prompt completed, prompts non-empty
INFO 2026-02-14 14:00:02 llm: call_llm completed for model gpt-4o
INFO 2026-02-14 14:00:03 postprocessing: add_missing_imports completed
INFO 2026-02-14 14:00:03 instrumentation: detect_frameworks_from_code found ['torch']
INFO 2026-02-14 14:00:03 instrumentation: _create_device_sync_precompute_statements injecting cuda.synchronize()
ERROR 2026-02-14 14:00:03 instrumentation: instrumented tests failed to compile - SyntaxError
```
## Task
Diagnose why the instrumented tests fail to compile. Walk through the test generation pipeline, focusing on the instrumentation stage. Provide:
1. The exact stage where the failure occurs.
2. The specific function responsible for the syntax error.
3. An explanation of what `_create_device_sync_precompute_statements()` does and how it can introduce syntax errors.
4. The file to investigate and what to look for.
5. A recommendation for fixing the instrumentation bug.
## Expected Outputs
- Identification that the failure is at the **instrumentation** stage (Step 7 of the debug-test-generation workflow).
- The file responsible is `core/languages/python/testgen/instrumentation/instrument_new_tests.py`.
- Explanation that `detect_frameworks_from_code()` detected PyTorch, which triggered `_create_device_sync_precompute_statements()` to inject `torch.cuda.synchronize()` calls for GPU timing accuracy.
- The syntax error (`torch.cuda.synchronize()import torch`) indicates the sync statement was injected without a newline separator, concatenating with an existing import line.
- Recommendation: check the injection logic in `_create_device_sync_precompute_statements()` to ensure it adds proper newlines/indentation when inserting sync calls, and add a compilation check after instrumentation to catch such issues before returning.