
---
name: debug-test-generation
description: Diagnose why test generation failed or produced invalid tests. Use when testgen returns errors, empty results, or produces tests that fail to compile or run. Walks through request validation, router dispatch, context building, prompt construction, LLM calls, postprocessing, instrumentation, and output validation.
---

# Debug Test Generation

Use this workflow when test generation fails or produces invalid tests. Work through the stages sequentially — stop at the first failure found.

## Step 1: Validate the Request

Check that the incoming testgen request is well-formed.

1. Read the testgen request schema in the relevant testgen module
2. Verify required fields: `source_code` and `trace_id` must be non-empty
3. Check the `language` field; it must match a supported language
4. Check that the code is valid; `source_code` should parse without syntax errors

**Checkpoint:** If the request schema is invalid, the error comes from Pydantic validation. Check the 400 response.
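
To reproduce this locally, here is a minimal sketch of the validation, assuming a Pydantic v2 model; the field names follow this step, but the real schema lives in the testgen module and may differ:

```python
# Hypothetical request model, assuming Pydantic v2; the real schema may differ.
import ast

from pydantic import BaseModel, field_validator


class TestGenRequest(BaseModel):
    source_code: str
    trace_id: str
    language: str = "python"

    @field_validator("source_code", "trace_id")
    @classmethod
    def non_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("must be non-empty")
        return v


# An empty field raises ValidationError, which surfaces as the 400 response.
req = TestGenRequest(source_code="def f(:", trace_id="abc123")
# Item 4: the source should also parse without syntax errors.
ast.parse(req.source_code)  # raises SyntaxError for "def f(:"
```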

## Step 2: Check Router Dispatch

Verify that the correct language handler is invoked.

1. Read `core/shared/testgen_router.py`: the `testgen()` endpoint dispatches by `data.language`
2. Supported routes:
   - `"javascript"` / `"typescript"` → `core.languages.js_ts.testgen.generate_tests_javascript`
   - `"java"` → `core.languages.java.testgen.generate_tests_java`
   - default → `core.languages.python.testgen.testgen.generate_tests_python`
3. Check for `ImportError`; the lazy imports may fail if a language module is broken

**Checkpoint:** If dispatch fails, check that the language module exists and imports cleanly.
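
As a reading aid, the dispatch can be pictured roughly as below. This is a sketch of the routing table above, not the actual router code:

```python
# Sketch of the routing in core/shared/testgen_router.py; the real endpoint
# is likely structured differently.
def dispatch_testgen(data):
    if data.language in ("javascript", "typescript"):
        # Lazy import: an ImportError raised here means the JS/TS language
        # module is broken, not the router itself.
        from core.languages.js_ts.testgen import generate_tests_javascript
        return generate_tests_javascript(data)
    if data.language == "java":
        from core.languages.java.testgen import generate_tests_java
        return generate_tests_java(data)
    # Default route: Python.
    from core.languages.python.testgen.testgen import generate_tests_python
    return generate_tests_python(data)
```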

## Step 3: Check Context Building

Verify that the testgen context is constructed correctly.

1. Read `core/languages/python/testgen/testgen.py`: `build_prompt()` constructs the prompts
2. Check that the source code and dependency code are passed through correctly
3. Verify that the Jinja2 template renders without errors
4. Check the async/sync variants; the prompt builder handles both

**Checkpoint:** If the context is empty or malformed, check the input `source_code` and `dependency_code`.
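
To rule out the template, reproduce the rendering in isolation. A hedged sketch, assuming the templates are `.md` files next to the module (see Step 4); the template filename and variable names here are illustrative, not the real ones:

```python
# Illustrative rendering check; build_prompt() in testgen.py owns the real
# template and variable names.
from jinja2 import Environment, FileSystemLoader, StrictUndefined

env = Environment(
    loader=FileSystemLoader("core/languages/python/testgen"),
    undefined=StrictUndefined,  # fail loudly instead of rendering blanks
)

template = env.get_template("testgen_prompt.md")  # hypothetical filename
prompt = template.render(
    source_code="def add(a, b):\n    return a + b",
    dependency_code="",
    is_async=False,  # item 4: async/sync variants
)
assert prompt.strip(), "template rendered an empty prompt"
```

`StrictUndefined` matters here: with Jinja2's default undefined handling, a missing variable renders as an empty string, which is exactly how an empty or malformed context can slip through without an error.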

## Step 4: Check Prompt Construction

Verify that the LLM prompts are well-formed.

1. The `build_prompt()` function uses Jinja2 templates (`.md` files alongside the module)
2. The system prompt sets the role and language context
3. The user prompt includes the source code and test context
4. Check that both prompts are non-empty and contain the function under test

**Checkpoint:** If the prompts are empty, check the template files and the Jinja2 rendering.
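
A few quick assertions for this checkpoint; `system_prompt`, `user_prompt`, and `function_name` stand in for whatever you captured from the failing request:

```python
# Substitute the real values captured from the failing request.
system_prompt = "You are an expert Python test engineer."    # captured
user_prompt = "Write pytest tests for:\ndef add(a, b): ..."  # captured
function_name = "add"

assert system_prompt.strip(), "system prompt is empty"
assert user_prompt.strip(), "user prompt is empty"
assert function_name in user_prompt, "function under test missing from user prompt"
```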

## Step 5: Check LLM Response

Verify that the LLM returns valid test code.

1. Read `aiservice/llm.py`: `call_llm()` handles the API call
2. Check for network errors or API-key issues (same as in optimization debugging)
3. Look for `LLMOutputParseError`, which means the LLM returned unparseable output
4. Check the raw response content; it should contain markdown code blocks with test code

**Checkpoint:** If the LLM returns malformed output, check the prompt quality and model selection.
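
The item 3 failure can be reproduced with a standalone extraction check. The exception name comes from this step; the regex-based extraction below is an illustrative stand-in for the real parsing in `aiservice/llm.py`:

```python
# Illustrative stand-in for the response parsing; not the real implementation.
import re

FENCE = "`" * 3  # avoids literal triple backticks inside this example
CODE_BLOCK_RE = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)


class LLMOutputParseError(Exception):
    """Raised when the LLM response contains no usable code block."""


def extract_test_code(raw_response: str) -> str:
    blocks = CODE_BLOCK_RE.findall(raw_response)
    if not blocks:
        raise LLMOutputParseError("no fenced code block in LLM response")
    return blocks[0].strip()


sample = f"Here are the tests:\n{FENCE}python\ndef test_add(): ...\n{FENCE}\n"
print(extract_test_code(sample))  # def test_add(): ...
```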

## Step 6: Check Postprocessing

Verify that the generated tests survive postprocessing.

1. Read the `core/languages/python/testgen/postprocessing/` directory
2. `add_missing_imports.py` adds `from __future__ import annotations` if needed
3. Check for syntax errors in the generated test code; `cst.ParserSyntaxError` means malformed code
4. Verify that imports are resolved correctly

**Checkpoint:** If postprocessing fails, the LLM generated syntactically invalid code. Check the raw output.
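
You can reproduce this directly; `cst` here is libcst, which the `ParserSyntaxError` in item 3 comes from:

```python
# Minimal reproduction of the Step 6 failure mode using libcst.
import libcst as cst

generated = "def test_add()\n    assert add(1, 2) == 3\n"  # missing colon

try:
    cst.parse_module(generated)
except cst.ParserSyntaxError as exc:
    # The LLM emitted code that does not parse; postprocessing cannot fix
    # this, so inspect the raw LLM output instead.
    print(f"generated tests do not parse: {exc}")
```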

## Step 7: Check Instrumentation

Verify that the tests are properly instrumented.

1. Read `core/languages/python/testgen/instrumentation/instrument_new_tests.py`
2. `instrument_tests()` applies behavior and performance instrumentation
3. `detect_frameworks_from_code()` identifies ML frameworks (PyTorch, TensorFlow, JAX)
4. `_create_device_sync_precompute_statements()` adds GPU sync calls for timing accuracy
5. Check that the instrumented tests still compile; instrumentation may introduce syntax errors

**Checkpoint:** If instrumented tests fail to compile, check the instrumentation transforms. The issue is usually in import handling or device-sync injection.
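
The helpers named above live in `instrument_new_tests.py`. While debugging, a simple string scan plus a compile check (both illustrative stand-ins, not the real implementations) catches the usual failure mode:

```python
# Illustrative stand-ins for detect_frameworks_from_code() and the final
# compile check; the real transforms are more involved.
def detect_frameworks_sketch(code: str) -> set[str]:
    markers = {
        "torch": "import torch",
        "tensorflow": "import tensorflow",
        "jax": "import jax",
    }
    return {name for name, marker in markers.items() if marker in code}


def compiles(code: str) -> bool:
    # If the raw tests compile but the instrumented ones do not, the
    # transform (often import handling or device-sync injection) is at fault.
    try:
        compile(code, "<instrumented-test>", "exec")
        return True
    except SyntaxError:
        return False


instrumented = "import torch\ntorch.cuda.synchronize()\ndef test_model(): ...\n"
print(detect_frameworks_sketch(instrumented))  # {'torch'}
print(compiles(instrumented))  # True
```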

## Key Files Reference

| File | What to check |
| --- | --- |
| `core/shared/testgen_router.py` | Language dispatch |
| `core/languages/python/testgen/testgen.py` | Testgen flow, prompt building |
| `core/languages/python/testgen/postprocessing/` | Import management, cleanup |
| `core/languages/python/testgen/instrumentation/instrument_new_tests.py` | Test instrumentation |
| `aiservice/llm.py` | LLM calls and client setup |