Three private tiles published to the codeflash workspace:

- `codeflash-internal-rules`: 6 eager rules (code-style, architecture, optimization-patterns, git-conventions, testing-rules, multi-language-handlers)
- `codeflash-internal-docs`: 8 lazy doc pages (domain-types, optimization-pipeline, test-generation-pipeline, context-extraction, aiservice/cf-api endpoints, configuration-thresholds, llm-provider-abstraction)
- `codeflash-internal-skills`: 4 on-demand skills (debug-optimization-failure, add-language-support, add-api-endpoint, debug-test-generation)
| name | description |
|---|---|
| debug-test-generation | Diagnose why test generation failed or produced invalid tests. Use when testgen returns errors, empty results, or produces tests that fail to compile or run. Walks through request validation, router dispatch, context building, prompt construction, LLM calls, postprocessing, instrumentation, and output validation. |
# Debug Test Generation
Use this workflow when test generation fails or produces invalid tests. Work through the stages sequentially — stop at the first failure found.
## Step 1: Validate the Request
Check that the incoming testgen request is well-formed.
- Read the testgen request schema in the relevant testgen module
- Verify required fields: `source_code` and `trace_id` must be non-empty
- Check the `language` field — must match a supported language
- Check for valid code — source code should parse without syntax errors
**Checkpoint:** If the request schema is invalid, the error comes from Pydantic validation. Check the 400 response.
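For reference, a minimal Pydantic sketch of the validation described above. Only `source_code`, `trace_id`, and `language` are named in this doc; the model name, the default value, and everything else here are assumptions, so treat the real schema module as authoritative.

```python
# Hypothetical sketch of the testgen request schema; the real model
# lives in the testgen module and may declare more fields.
from pydantic import BaseModel, field_validator

class TestGenRequest(BaseModel):
    source_code: str
    trace_id: str
    language: str = "python"  # assumed default; must be a supported language

    @field_validator("source_code", "trace_id")
    @classmethod
    def _non_empty(cls, value: str) -> str:
        if not value.strip():
            raise ValueError("must be non-empty")
        return value
```

A request that violates these constraints raises a `ValidationError`, which is what surfaces as the 400 response.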
## Step 2: Check Router Dispatch
Verify the correct language handler is invoked.
- Read `core/shared/testgen_router.py` — the `testgen()` endpoint dispatches by `data.language`
- Supported routes:
  - `"javascript"` / `"typescript"` → `core.languages.js_ts.testgen.generate_tests_javascript`
  - `"java"` → `core.languages.java.testgen.generate_tests_java`
  - Default → `core.languages.python.testgen.testgen.generate_tests_python`
- Check for `ImportError` — lazy imports may fail if a language module is broken
**Checkpoint:** If dispatch fails, check that the language module exists and imports cleanly.
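The dispatch can be pictured like this; a sketch of the routing described above, not a copy of the real `testgen_router.py`:

```python
# Sketch of the language dispatch. Because the imports are lazy, a broken
# language module only raises ImportError when that language is requested.
def dispatch_testgen(data):
    language = (data.language or "").lower()
    if language in ("javascript", "typescript"):
        from core.languages.js_ts.testgen import generate_tests_javascript as generate
    elif language == "java":
        from core.languages.java.testgen import generate_tests_java as generate
    else:  # default route
        from core.languages.python.testgen.testgen import generate_tests_python as generate
    return generate(data)
```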
## Step 3: Check Context Building
Verify the testgen context is constructed correctly.
- Read `core/languages/python/testgen/testgen.py` — `build_prompt()` constructs the prompts
- Check that source code and dependency code are passed correctly
- Verify the Jinja2 template renders without errors
- Check for async/sync variants — the prompt builder handles both
**Checkpoint:** If context is empty or malformed, check the input `source_code` and `dependency_code`.
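To exercise the rendering step in isolation, a snippet along these lines helps. The template directory and filename are assumptions; substitute whatever paths the module actually loads.

```python
from jinja2 import Environment, FileSystemLoader, StrictUndefined

# StrictUndefined makes a missing template variable raise immediately
# instead of rendering silently as an empty string.
env = Environment(
    loader=FileSystemLoader("core/languages/python/testgen"),
    undefined=StrictUndefined,
)
template = env.get_template("user_prompt.md")  # hypothetical filename
rendered = template.render(
    source_code="def add(a, b):\n    return a + b",  # substitute the failing request's code
    dependency_code="",
)
assert rendered.strip(), "template rendered to an empty prompt"
```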
## Step 4: Check Prompt Construction
Verify the LLM prompts are well-formed.
- The `build_prompt()` function uses Jinja2 templates (`.md` files alongside the module)
- System prompt sets the role and language context
- User prompt includes the source code and test context
- Check that prompts are non-empty and contain the function to test
**Checkpoint:** If prompts are empty, check the template files and Jinja2 rendering.
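A fail-fast helper like the following captures these checks; it is a debugging sketch, not part of the pipeline:

```python
def check_prompts(system_prompt: str, user_prompt: str, function_name: str) -> None:
    """Raise if either prompt is empty or the target function is missing."""
    if not system_prompt.strip():
        raise ValueError("system prompt rendered empty")
    if not user_prompt.strip():
        raise ValueError("user prompt rendered empty")
    if function_name not in user_prompt:
        raise ValueError(f"user prompt does not mention {function_name!r}")
```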
## Step 5: Check LLM Response
Verify the LLM returns valid test code.
- Read `aiservice/llm.py` — `call_llm()` handles the API call
- Check for network errors or API key issues (same as optimization debugging)
- Look for `LLMOutputParseError` — this means the LLM returned unparseable output
- Check the raw response content — it should contain markdown code blocks with test code
**Checkpoint:** If the LLM returns malformed output, check the prompt quality and model selection.
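The parse failure usually comes down to code-block extraction. A rough equivalent of that step (the real parsing in `aiservice/llm.py` may be stricter):

```python
import re

def extract_test_code(raw_response: str) -> str:
    """Pull the first fenced code block out of a raw LLM response."""
    fence = r"`{3}(?:\w+)?\n(.*?)`{3}"  # matches a fenced markdown code block
    match = re.search(fence, raw_response, re.DOTALL)
    if match is None:
        # This is the condition that surfaces as LLMOutputParseError.
        raise ValueError("no fenced code block in LLM output")
    return match.group(1)
```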
## Step 6: Check Postprocessing
Verify generated tests survive postprocessing.
- Read the `core/languages/python/testgen/postprocessing/` directory
- `add_missing_imports.py` — adds `from __future__ import annotations` if needed
- Check for syntax errors in generated test code — `cst.ParserSyntaxError` means malformed code
- Verify imports are resolved correctly
**Checkpoint:** If postprocessing fails, the LLM generated syntactically invalid code. Check the raw output.
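You can reproduce the failure directly by parsing the generated code with libcst, which is what raises `cst.ParserSyntaxError`:

```python
import libcst as cst

def check_parses(generated_test_code: str) -> None:
    """Fail loudly if the generated tests are not valid Python."""
    try:
        cst.parse_module(generated_test_code)
    except cst.ParserSyntaxError as exc:
        # The LLM emitted syntactically invalid code; inspect the raw output.
        raise RuntimeError(f"generated tests do not parse: {exc}") from exc
```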
## Step 7: Check Instrumentation
Verify tests are properly instrumented.
- Read `core/languages/python/testgen/instrumentation/instrument_new_tests.py`
- `instrument_tests()` applies behavior and performance instrumentation
- `detect_frameworks_from_code()` identifies ML frameworks (PyTorch, TensorFlow, JAX)
- `_create_device_sync_precompute_statements()` adds GPU sync calls for timing accuracy
- Check that instrumented tests still compile — instrumentation may introduce syntax errors
**Checkpoint:** If instrumented tests fail to compile, check the instrumentation transforms. The issue is usually in import handling or device sync injection.
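For reference, the device sync the instrumentation injects is conceptually this (the exact generated statements are an assumption; the PyTorch case is shown):

```python
import torch

def sync_devices() -> None:
    """Block until queued GPU kernels finish so wall-clock timings are accurate."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
```

Without the synchronize call, CUDA's asynchronous execution means the timer can stop before the kernel actually finishes, so timings would undercount GPU work.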
## Key Files Reference
| File | What to check |
|---|---|
| `core/shared/testgen_router.py` | Language dispatch |
| `core/languages/python/testgen/testgen.py` | Testgen flow, prompt building |
| `core/languages/python/testgen/postprocessing/` | Import management, cleanup |
| `core/languages/python/testgen/instrumentation/instrument_new_tests.py` | Test instrumentation |
| `aiservice/llm.py` | LLM calls and client setup |