codeflash-internal/tiles/codeflash-internal-skills/evals/scenario-5/task.md
2026-02-14 22:25:30 -05:00

Scenario: Generated tests fail to compile after instrumentation

Context

Test generation for a Python function that uses PyTorch produces tests that fail to compile. The function under test is:

import torch

def normalize_tensor(t: torch.Tensor) -> torch.Tensor:
    return (t - t.mean()) / t.std()

The testgen request completes successfully and the LLM returns valid test code. However, once the instrumentation stage has run, the instrumented tests fail to compile with:

SyntaxError: invalid syntax
  File "test_normalize_tensor_instrumented.py", line 15
    torch.cuda.synchronize()import torch
                            ^^^^^^

The raw (pre-instrumentation) test code compiles fine. The issue only appears after instrumentation.
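
The symptom can be reproduced without PyTorch installed, because `compile()` only parses the source and never executes the import. The following is a minimal sketch, assuming the instrumenter joins source fragments as strings; the variable names are illustrative and the real logic in `_create_device_sync_precompute_statements()` is not shown in this scenario.

```python
# Hypothetical reproduction: these names are illustrative, not the real instrumenter.
sync_stmt = "torch.cuda.synchronize()"   # injected statement with no trailing newline
next_line = "import torch\n"             # line that follows the injection point

broken = sync_stmt + next_line           # "torch.cuda.synchronize()import torch\n"
fixed = sync_stmt + "\n" + next_line     # the newline keeps the two statements apart


def compiles(source: str) -> bool:
    """Return True if the source parses; compile() does not run or import anything."""
    try:
        compile(source, "<instrumented>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(broken))  # False
print(compiles(fixed))   # True
```

The only difference between the two strings is the separator, which matches the observation that the raw tests compile and the instrumented ones do not.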

Server logs:

INFO  2026-02-14 14:00:00 testgen: build_prompt completed, prompts non-empty
INFO  2026-02-14 14:00:02 llm: call_llm completed for model gpt-4o
INFO  2026-02-14 14:00:03 postprocessing: add_missing_imports completed
INFO  2026-02-14 14:00:03 instrumentation: detect_frameworks_from_code found ['torch']
INFO  2026-02-14 14:00:03 instrumentation: _create_device_sync_precompute_statements injecting cuda.synchronize()
ERROR 2026-02-14 14:00:03 instrumentation: instrumented tests failed to compile - SyntaxError

Task

Diagnose why the instrumented tests fail to compile. Walk through the test generation pipeline, focusing on the instrumentation stage. Provide:

  1. The exact stage where the failure occurs.
  2. The specific function responsible for the syntax error.
  3. An explanation of what _create_device_sync_precompute_statements() does and how it can introduce syntax errors.
  4. The file to investigate and what to look for.
  5. A recommendation for fixing the instrumentation bug.

Expected Outputs

  • Identification that the failure is at the instrumentation stage (Step 7 of the debug-test-generation workflow).
  • The file responsible is core/languages/python/testgen/instrumentation/instrument_new_tests.py.
  • Explanation that detect_frameworks_from_code() detected PyTorch, which triggered _create_device_sync_precompute_statements() to inject torch.cuda.synchronize() calls for GPU timing accuracy.
  • The syntax error (torch.cuda.synchronize()import torch) indicates the sync statement was injected without a trailing newline, so it ran together with the import line that follows it.
  • Recommendation: check the injection logic in _create_device_sync_precompute_statements() to ensure each injected sync call ends with a newline and carries the indentation of its insertion point, and add a compilation check after instrumentation so malformed output is caught before it is returned.
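
The recommendation can be sketched as follows. This is an illustration under stated assumptions, not the actual code in instrument_new_tests.py: `inject_statements` and `check_compiles` are hypothetical helpers, and the real instrumenter may operate on an AST rather than on source lines.

```python
def inject_statements(source: str, line_index: int, statements: list[str]) -> str:
    """Insert statements before the given 0-based line, one per line,
    reusing the indentation of the line at the injection point."""
    lines = source.splitlines()
    target = lines[line_index] if line_index < len(lines) else ""
    indent = target[: len(target) - len(target.lstrip())]
    injected = [indent + stmt.strip() for stmt in statements]
    lines[line_index:line_index] = injected
    # Joining with "\n" guarantees every statement sits on its own line.
    return "\n".join(lines) + "\n"


def check_compiles(source: str, filename: str = "<instrumented>") -> None:
    """Raise SyntaxError early if the instrumented source does not parse."""
    compile(source, filename, "exec")


# Hypothetical raw test, used only to exercise the helpers.
raw = (
    "import torch\n"
    "\n"
    "def test_normalize():\n"
    "    t = torch.ones(3)\n"
    "    assert t.sum() == 3\n"
)

instrumented = inject_statements(raw, 3, ["torch.cuda.synchronize()"])
check_compiles(instrumented)  # no exception: the output parses
print(instrumented)
```

Working on a list of lines and joining once at the end makes it hard to emit two statements on the same line, and the compile check turns any remaining formatting mistake into an immediate, attributable failure.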