codeflash-agent/evals
Kevin Turcios 0d4fc9d8b7 feat: eval v2 — real-repo evals cloned from git
Add support for v2 evals that clone a real repo at a specific commit
instead of using bundled template source. The agent handles setup,
diagnosis, and fixing on its own.

- run-eval.sh: v1/v2 dispatch, repos/ directory, prompt from manifest
- First v2 eval: codeflash-internal psycopg serialization (PR #2489)
- EVAL-V2-SKETCH.md: design doc for the v2 eval system
- intro.md: repo onboarding guide
2026-03-27 07:25:10 -05:00
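
The v1/v2 dispatch and the "clone a real repo at a specific commit" step described in the commit message could be sketched roughly as follows. This is not the contents of run-eval.sh. It is a minimal Python illustration that assumes v1 evals live under templates/ and v2 evals under repos/, and that the manifest is a JSON file with `repo`, `commit`, and `prompt` keys. The manifest format and all function names here are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def eval_version(evals_dir: Path, name: str) -> str:
    """Return "v2" if the eval lives under repos/, "v1" if under templates/.

    Assumption: the directory an eval lives in determines its version.
    """
    if (evals_dir / "repos" / name).is_dir():
        return "v2"
    if (evals_dir / "templates" / name).is_dir():
        return "v1"
    raise FileNotFoundError(f"no eval named {name!r} under {evals_dir}")


def clone_commands(repo_url: str, commit: str, dest: Path) -> list[list[str]]:
    """Build the git commands that check out repo_url at an exact commit."""
    return [
        ["git", "clone", "--quiet", repo_url, str(dest)],
        # Detached HEAD pins the working tree to the commit under evaluation.
        ["git", "-C", str(dest), "checkout", "--quiet", "--detach", commit],
    ]


def prepare_v2(manifest_path: Path, workdir: Path) -> str:
    """Clone the repo pinned by a (hypothetical) JSON manifest; return its prompt."""
    manifest = json.loads(manifest_path.read_text())
    for cmd in clone_commands(manifest["repo"], manifest["commit"], workdir):
        subprocess.run(cmd, check=True)
    return manifest["prompt"]
```

Keeping command construction separate from execution means the dispatch and the pinned checkout can be inspected without network access. Only `prepare_v2` actually invokes git.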
Name                                            Last commit                                      Date
repos/codeflash-internal-psycopg-serialization  feat: eval v2 — real-repo evals cloned from git  2026-03-27 07:25:10 -05:00
templates                                       Hello World                                      2026-03-24 16:14:04 -05:00
.gitignore                                      Hello World                                      2026-03-24 16:14:04 -05:00
run-eval.sh                                     feat: eval v2 — real-repo evals cloned from git  2026-03-27 07:25:10 -05:00
score-eval.sh                                   Hello World                                      2026-03-24 16:14:04 -05:00
score.py                                        Hello World                                      2026-03-24 16:14:04 -05:00