5 scenarios testing: sequential debugging, Result type + effort config, test patterns, domain type conventions, and deduplication/repair mechanics. Also adds tessl-labs/tessl-skill-eval-scenarios dev dependency.
825 B
825 B
Investigate Low Candidate Diversity
Context
A codeflash user is optimizing a data processing function at medium effort level. The AI service returns 5 candidates, but the optimization log shows only 1 candidate was actually benchmarked. Of the 5 candidates, 1 passed behavioral tests but didn't meet the performance threshold. The user wants to understand what happened to the other 4 candidates and why no repair attempts were made.
Task
Write an analysis document explaining:
- Why only 1 out of 5 candidates was benchmarked
- How the system determines which candidates to actually test
- Under what conditions the system would have attempted to repair the failing candidates
- What the user could change to get more diverse results
Expected Outputs
A markdown file analysis.md with the explanation.