Commit graph

3 commits

66187bbcc3 fix: v2 eval runner — shallow cached clones + non-interactive prompt
Kevin Turcios, 2026-03-27 07:27:12 -05:00

- Shallow clone (--no-checkout --depth 1 + fetch specific commit) instead
  of full clone — 15s vs 2+ min for large repos like codeflash-internal
- Cache clone in evals/repos/<name>/workspace/, cp -r for each run
- Use gh repo clone for private repo auth
- Fix eval prompt to skip skill's AskUserQuestion step in non-interactive mode
- Gitignore workspace/ dirs
- Update intro.md with v2 eval docs
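The clone-and-cache approach described in 66187bbcc3 can be sketched as follows. The real logic lives in run-eval.sh (shell); this Python version is only an illustration, and the helper names `shallow_clone_commands`, `run_commands`, and `prepare_run` are hypothetical. It assumes that `gh repo clone` forwards the flags after `--` to `git clone`, and that the remote allows fetching a single commit by its SHA, which GitHub generally permits.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


# Hypothetical helpers illustrating the caching scheme; not the actual runner.
def shallow_clone_commands(repo: str, commit: str, dest: Path) -> list[list[str]]:
    """Build the commands for a depth-1 checkout of one pinned commit."""
    return [
        # Clone only the default-branch tip, without populating the working tree.
        # `gh repo clone` passes everything after `--` through to `git clone`.
        ["gh", "repo", "clone", repo, str(dest), "--", "--no-checkout", "--depth", "1"],
        # Fetch just the pinned commit, also with depth 1.
        ["git", "-C", str(dest), "fetch", "--depth", "1", "origin", commit],
        # Populate the working tree at that commit (detached HEAD).
        ["git", "-C", str(dest), "checkout", "--detach", commit],
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


def prepare_run(eval_dir: Path, run_dir: Path, clone: Callable[[Path], None]) -> Path:
    """Clone once into <eval_dir>/workspace, then copy it for each run."""
    workspace = eval_dir / "workspace"
    if not workspace.exists():
        clone(workspace)
    # Equivalent of `cp -r workspace run_dir`; symlinks are copied as links.
    shutil.copytree(workspace, run_dir, symlinks=True)
    return run_dir
```

In use, `prepare_run(Path("evals/repos/<name>"), run_dir, lambda ws: run_commands(shallow_clone_commands(repo, sha, ws)))` clones on the first run and only copies afterwards. One caveat of this sketch: an interrupted clone leaves a partial `workspace/` that later runs would reuse, so a more careful runner might clone into a temporary directory and rename it once the checkout succeeds.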
0d4fc9d8b7 feat: eval v2 — real-repo evals cloned from git
Kevin Turcios, 2026-03-27 07:25:10 -05:00

Add support for v2 evals that clone a real repo at a specific commit
instead of using bundled template source. The agent handles setup,
diagnosis, and fixing on its own.

- run-eval.sh: v1/v2 dispatch, repos/ directory, prompt from manifest
- First v2 eval: codeflash-internal psycopg serialization (PR #2489)
- EVAL-V2-SKETCH.md: design doc for the v2 eval system
- intro.md: repo onboarding guide
64268dd023 Hello World
Kevin Turcios, 2026-03-24 16:14:04 -05:00