mirror of
https://github.com/codeflash-ai/codeflash-internal.git
synced 2026-05-04 18:25:18 +00:00
feat: add debugging workflow and response checklist to observability chat prompt
Guide the chat agent to use the new tools proactively: a DEBUGGING TOOLS section with structured guidance for get_llm_call_detail and codebase browsing, a 4-step workflow (OBSERVE → INVESTIGATE → LOCATE → RECOMMEND), and a RESPONSE CHECKLIST at the end of the prompt requiring the agent to cite real file paths before responding.
This commit is contained in:
parent
782ee508de
commit
51372ca0ad
1 changed file with 65 additions and 0 deletions
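The diff below extends `buildSummaryPrompt` using a push-then-join idiom: each prompt section is appended to a `lines` array (with a blank separator line and an `=== … ===` header) and the whole prompt is emitted with `lines.join("\n")`. A minimal sketch of that pattern, hypothetical and not the actual codeflash-internal source:

```typescript
// Minimal sketch of the lines.push(...) / lines.join("\n") idiom the diff
// extends. `buildPromptSketch` and its parameter are illustrative names only.
function buildPromptSketch(sectionTitles: string[]): string {
  const lines: string[] = [];
  for (const title of sectionTitles) {
    lines.push("");                  // blank separator line, as in the diff
    lines.push(`=== ${title} ===`);  // section header in the prompt's style
  }
  return lines.join("\n");
}
```

Keeping each section as a separate `lines.push(...)` call, as the real function does, makes individual sections easy to add or remove in a single commit like this one.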
```diff
@@ -185,6 +185,58 @@ export function buildSummaryPrompt(data: IndexedTraceData): string {
     "navigate to the pipeline code to suggest a concrete fix.",
   )
+
+  lines.push("")
+  lines.push("=== DEBUGGING TOOLS — USE THESE PROACTIVELY ===")
+  lines.push(
+    "You have tools beyond trace data. Your job is not just to describe what happened — it's to " +
+    "investigate WHY it happened and point to the specific code or prompt that needs to change. " +
+    "Always go one level deeper than the surface-level observation.\n\n" +
+    "IMPORTANT: When you identify a problem (bad tests, failed optimizations, parsing errors, etc.), " +
+    "you MUST use get_llm_call_detail to inspect the actual prompts and responses involved. Then, if " +
+    "the issue traces back to a prompt or pipeline bug, use the codebase browsing tools to find the " +
+    "source code and suggest a concrete fix. Do not stop at 'the tests used mocks' — find out what " +
+    "prompt instructions led to that and where to fix them.\n\n" +
+    "=== get_llm_call_detail(call_id) ===\n" +
+    "Fetches the full system prompt, user prompt, raw LLM response, and parsing results for any " +
+    "LLM call in this trace. You SHOULD use this:\n" +
+    "- When analyzing test quality: inspect the testgen prompt to see what instructions the model " +
+    "received. Did the prompt forbid mocks? Did it provide enough context about the classes?\n" +
+    "- When investigating bad optimizations: read the optimizer prompt to check if context was " +
+    "missing or if instructions were unclear\n" +
+    "- When debugging parsing failures: compare raw_response vs parsed_response to find extraction bugs\n" +
+    "- When understanding ranking decisions: read the ranker prompt and response\n\n" +
+    "=== read_file, search_code, list_directory ===\n" +
+    "Browse the codeflash-internal and codeflash (CLI) source repos. You SHOULD use these:\n" +
+    "- After inspecting an LLM call, find the prompt template to suggest a specific fix\n" +
+    "- To understand how a pipeline stage works (postprocessing, deduplication, instrumentation)\n" +
+    "- To trace a code path from an LLM call back to the pipeline logic that invoked it\n" +
+    "- When the user asks 'where does X happen' or 'why does Y behave this way'\n\n" +
+    "Key paths in codeflash-internal:\n" +
+    "- django/aiservice/core/shared/ — optimizer_router, testgen_router, ranker\n" +
+    "- django/aiservice/core/languages/python/optimizer/ — Python optimizer pipeline\n" +
+    "- django/aiservice/core/languages/python/testgen/ — test generation pipeline\n" +
+    "- django/aiservice/aiservice/llm.py — LLM provider abstraction\n" +
+    "- Prompt templates are .md files alongside their modules (rendered with Jinja2)\n\n" +
+    "=== EXPECTED WORKFLOW — YOU MUST COMPLETE ALL STEPS ===\n" +
+    "When you find a problem in a trace, DO NOT stop at describing the symptoms. You MUST complete " +
+    "the full investigation:\n\n" +
+    "1. OBSERVE: Answer the user's question using trace data tools (get_test_code, get_candidate_code, etc.)\n" +
+    "2. INVESTIGATE: Use get_llm_call_detail to read the prompts and responses that caused the problem. " +
+    "Identify whether the issue is a prompt gap, a model failure to follow instructions, or a pipeline bug.\n" +
+    "3. LOCATE: Use search_code to find the prompt template or pipeline code responsible. Read it with " +
+    "read_file. Prompt templates are .md files — search for distinctive phrases from the prompt you found " +
+    "in step 2 to locate the template file.\n" +
+    "4. RECOMMEND: Suggest a concrete fix — name the file, quote the relevant section, and describe " +
+    "what to change. For example: 'In django/aiservice/core/languages/python/testgen/prompt.md, the " +
+    "no-mocks instruction at line 45 should be moved to the system prompt for stronger enforcement.'\n\n" +
+    "If you skip steps 3-4, your response is INCOMPLETE. The user is a developer who wants actionable " +
+    "fixes, not just observations about what went wrong.\n\n" +
+    "HARD REQUIREMENT: When you identify a problem caused by a prompt or pipeline stage, your response " +
+    "MUST include at least one real file path from the codebase that you found via search_code or " +
+    "read_file. Generic advice like 'strengthen the prompt' is not enough — find the actual file, " +
+    "read it, and reference the specific lines that need to change.",
+  )
+
   lines.push("")
   lines.push("=== CODEFLASH TESTING GUIDELINES ===")
   lines.push(
```
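The prompt above tells the agent to "compare raw_response vs parsed_response to find extraction bugs". A sketch of what that comparison could look like, where the `LlmCallDetail` shape is inferred from the prompt text (system prompt, user prompt, raw response, parsing results) and the field names are assumptions, not the actual `get_llm_call_detail` contract:

```typescript
// Hypothetical shape of the data get_llm_call_detail returns; field names are
// assumptions inferred from the prompt, not the real tool's schema.
interface LlmCallDetail {
  systemPrompt: string;
  userPrompt: string;
  rawResponse: string;
  parsedResponse: string | null; // null when extraction produced nothing
}

// Sketch of the raw-vs-parsed comparison: a non-empty raw response with no
// parsed output suggests the extraction step, not the model, is at fault.
function looksLikeParsingFailure(call: LlmCallDetail): boolean {
  return call.rawResponse.trim().length > 0 && call.parsedResponse === null;
}
```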
```diff
@@ -262,6 +314,19 @@ export function buildSummaryPrompt(data: IndexedTraceData): string {
     }
   }
+
+  lines.push("")
+  lines.push("=== RESPONSE CHECKLIST (review before responding) ===")
+  lines.push(
+    "Before you send your response, verify:\n" +
+    "[ ] If you identified a problem (bad tests, failed optimization, parsing error, etc.), did you " +
+    "use get_llm_call_detail to read the actual prompt/response that caused it?\n" +
+    "[ ] If the root cause is in a prompt or pipeline, did you use search_code and read_file to " +
+    "find the actual source file? Your response MUST include at least one real file path from the " +
+    "codebase (e.g., 'django/aiservice/core/languages/python/testgen/system_prompt.md').\n" +
+    "[ ] Are your recommendations grounded in specific code you read, not generic advice?\n\n" +
+    "If any box is unchecked, go back and use the tools before responding.",
+  )

   return lines.join("\n")
 }
```
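The checklist's hard requirement ("your response MUST include at least one real file path from the codebase") can be approximated mechanically. A sketch of such a check, where the path pattern is an assumption based on the example paths quoted in the prompt (e.g. `django/aiservice/core/languages/python/testgen/system_prompt.md`), not a rule enforced anywhere in the real pipeline:

```typescript
// Heuristic check for the checklist's file-path requirement: does the agent's
// response contain at least one slash-separated path ending in a file
// extension? The regex is an illustrative approximation.
function citesFilePath(response: string): boolean {
  return /\b[\w.-]+(?:\/[\w.-]+)+\.\w+\b/.test(response);
}
```

A response like "strengthen the prompt" would fail this check, while one citing `django/aiservice/core/languages/python/testgen/prompt.md` would pass, matching the distinction the checklist draws between generic advice and grounded recommendations.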