Three private tiles published to the codeflash workspace:

- `codeflash-internal-rules`: 6 eager rules (code-style, architecture, optimization-patterns, git-conventions, testing-rules, multi-language-handlers)
- `codeflash-internal-docs`: 8 lazy doc pages (domain-types, optimization-pipeline, test-generation-pipeline, context-extraction, aiservice/cf-api endpoints, configuration-thresholds, llm-provider-abstraction)
- `codeflash-internal-skills`: 4 on-demand skills (debug-optimization-failure, add-language-support, add-api-endpoint, debug-test-generation)
| name | description |
|---|---|
| debug-test-generation | Diagnose why test generation failed or produced invalid tests. Use when testgen returns errors, empty results, or produces tests that fail to compile or run. Walks through request validation, router dispatch, context building, prompt construction, LLM calls, postprocessing, instrumentation, and output validation. |
# Debug Test Generation
Use this workflow when test generation fails or produces invalid tests. Work through the stages sequentially — stop at the first failure found.
## Step 1: Validate the Request
Check that the incoming testgen request is well-formed.
- Read the testgen request schema in the relevant testgen module
- Verify required fields: `source_code` and `trace_id` must be non-empty
- Check the `language` field — must match a supported language
- Check for valid code — source code should parse without syntax errors
**Checkpoint:** If the request schema is invalid, the error comes from Pydantic validation. Check the 400 response.
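For reference, a minimal Pydantic sketch of the validation described above. Only `source_code`, `trace_id`, and `language` are named in this doc; the model name, the default value, and everything else here are assumptions, so treat the real schema module as authoritative.

```python
# Hypothetical sketch of the testgen request schema; the real model
# lives in the testgen module and may declare more fields.
from pydantic import BaseModel, field_validator

class TestGenRequest(BaseModel):
    source_code: str
    trace_id: str
    language: str = "python"  # assumed default; must be a supported language

    @field_validator("source_code", "trace_id")
    @classmethod
    def _non_empty(cls, value: str) -> str:
        if not value.strip():
            raise ValueError("must be non-empty")
        return value
```

A request that violates these constraints raises a `ValidationError`, which is what surfaces as the 400 response.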
## Step 2: Check Router Dispatch
Verify the correct language handler is invoked.
- Read `core/shared/testgen_router.py` — the `testgen()` endpoint dispatches by `data.language`
- Supported routes:
  - `"javascript"` / `"typescript"` → `core.languages.js_ts.testgen.generate_tests_javascript`
  - `"java"` → `core.languages.java.testgen.generate_tests_java`
  - Default → `core.languages.python.testgen.testgen.generate_tests_python`
- Check for `ImportError` — lazy imports may fail if a language module is broken
**Checkpoint:** If dispatch fails, check that the language module exists and imports cleanly.
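The dispatch can be pictured like this; a sketch of the routing described above, not a copy of the real `testgen_router.py`:

```python
# Sketch of the language dispatch. Because the imports are lazy, a broken
# language module only raises ImportError when that language is requested.
def dispatch_testgen(data):
    language = (data.language or "").lower()
    if language in ("javascript", "typescript"):
        from core.languages.js_ts.testgen import generate_tests_javascript as generate
    elif language == "java":
        from core.languages.java.testgen import generate_tests_java as generate
    else:  # default route
        from core.languages.python.testgen.testgen import generate_tests_python as generate
    return generate(data)
```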
## Step 3: Check Context Building
Verify the testgen context is constructed correctly.
- Read `core/languages/python/testgen/testgen.py` — `build_prompt()` constructs the prompts
- Check that source code and dependency code are passed correctly
- Verify the Jinja2 template renders without errors
- Check for async/sync variants — the prompt builder handles both
**Checkpoint:** If context is empty or malformed, check the input `source_code` and `dependency_code`.
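To exercise the rendering step in isolation, a snippet along these lines helps. The template directory and filename are assumptions; substitute whatever paths the module actually loads.

```python
from jinja2 import Environment, FileSystemLoader, StrictUndefined

# StrictUndefined makes a missing template variable raise immediately
# instead of rendering silently as an empty string.
env = Environment(
    loader=FileSystemLoader("core/languages/python/testgen"),
    undefined=StrictUndefined,
)
template = env.get_template("user_prompt.md")  # hypothetical filename
rendered = template.render(
    source_code="def add(a, b):\n    return a + b",  # substitute the failing request's code
    dependency_code="",
)
assert rendered.strip(), "template rendered to an empty prompt"
```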
## Step 4: Check Prompt Construction
Verify the LLM prompts are well-formed.
- The `build_prompt()` function uses Jinja2 templates (`.md` files alongside the module)
- System prompt sets the role and language context
- User prompt includes the source code and test context
- Check that prompts are non-empty and contain the function to test
**Checkpoint:** If prompts are empty, check the template files and Jinja2 rendering.
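A fail-fast helper like the following captures these checks; it is a debugging sketch, not part of the pipeline:

```python
def check_prompts(system_prompt: str, user_prompt: str, function_name: str) -> None:
    """Raise if either prompt is empty or the target function is missing."""
    if not system_prompt.strip():
        raise ValueError("system prompt rendered empty")
    if not user_prompt.strip():
        raise ValueError("user prompt rendered empty")
    if function_name not in user_prompt:
        raise ValueError(f"user prompt does not mention {function_name!r}")
```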
## Step 5: Check LLM Response
Verify the LLM returns valid test code.
- Read `aiservice/llm.py` — `call_llm()` handles the API call
- Check for network errors or API key issues (same as optimization debugging)
- Look for `LLMOutputParseError` — this means the LLM returned unparseable output
- Check the raw response content — it should contain markdown code blocks with test code
**Checkpoint:** If the LLM returns malformed output, check the prompt quality and model selection.
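The parse failure usually comes down to code-block extraction. A rough equivalent of that step (the real parsing in `aiservice/llm.py` may be stricter):

```python
import re

def extract_test_code(raw_response: str) -> str:
    """Pull the first fenced code block out of a raw LLM response."""
    fence = r"`{3}(?:\w+)?\n(.*?)`{3}"  # matches a fenced markdown code block
    match = re.search(fence, raw_response, re.DOTALL)
    if match is None:
        # This is the condition that surfaces as LLMOutputParseError.
        raise ValueError("no fenced code block in LLM output")
    return match.group(1)
```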
## Step 6: Check Postprocessing
Verify generated tests survive postprocessing.
- Read the `core/languages/python/testgen/postprocessing/` directory
- `add_missing_imports.py` — adds `from __future__ import annotations` if needed
- Check for syntax errors in generated test code — `cst.ParserSyntaxError` means malformed code
- Verify imports are resolved correctly
**Checkpoint:** If postprocessing fails, the LLM generated syntactically invalid code. Check the raw output.
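You can reproduce the failure directly by parsing the generated code with libcst, which is what raises `cst.ParserSyntaxError`:

```python
import libcst as cst

def check_parses(generated_test_code: str) -> None:
    """Fail loudly if the generated tests are not valid Python."""
    try:
        cst.parse_module(generated_test_code)
    except cst.ParserSyntaxError as exc:
        # The LLM emitted syntactically invalid code; inspect the raw output.
        raise RuntimeError(f"generated tests do not parse: {exc}") from exc
```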
## Step 7: Check Instrumentation
Verify tests are properly instrumented.
- Read `core/languages/python/testgen/instrumentation/instrument_new_tests.py`
- `instrument_tests()` applies behavior and performance instrumentation
- `detect_frameworks_from_code()` identifies ML frameworks (PyTorch, TensorFlow, JAX)
- `_create_device_sync_precompute_statements()` adds GPU sync calls for timing accuracy
- Check that instrumented tests still compile — instrumentation may introduce syntax errors
**Checkpoint:** If instrumented tests fail to compile, check the instrumentation transforms. The issue is usually in import handling or device sync injection.
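For reference, the device sync the instrumentation injects is conceptually this (the exact generated statements are an assumption; the PyTorch case is shown):

```python
import torch

def sync_devices() -> None:
    """Block until queued GPU kernels finish so wall-clock timings are accurate."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
```

Without the synchronize call, CUDA's asynchronous execution means the timer can stop before the kernel actually finishes, so timings would undercount GPU work.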
## Key Files Reference
| File | What to check |
|---|---|
| `core/shared/testgen_router.py` | Language dispatch |
| `core/languages/python/testgen/testgen.py` | Testgen flow, prompt building |
| `core/languages/python/testgen/postprocessing/` | Import management, cleanup |
| `core/languages/python/testgen/instrumentation/instrument_new_tests.py` | Test instrumentation |
| `aiservice/llm.py` | LLM calls and client setup |