Add support for v2 evals that clone a real repo at a specific commit instead of using bundled template source. The agent handles setup, diagnosis, and fixing on its own. - run-eval.sh: v1/v2 dispatch, repos/ directory, prompt from manifest - First v2 eval: codeflash-internal psycopg serialization (PR #2489) - EVAL-V2-SKETCH.md: design doc for the v2 eval system - intro.md: repo onboarding guide |
||
|---|---|---|
| .. | ||
| repos/codeflash-internal-psycopg-serialization | ||
| templates | ||
| .gitignore | ||
| run-eval.sh | ||
| score-eval.sh | ||
| score.py | ||