codeflash-agent/.github/workflows/validate.yml

name: Plugin Validation

on:
  pull_request:
    types: [opened, synchronize, ready_for_review, reopened]
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  pull_request_review:
    types: [submitted]

jobs:
  validate:
    concurrency:
      group: validate-${{ github.head_ref || github.run_id }}
      cancel-in-progress: true
    if: |
      (
        github.event_name == 'pull_request' &&
        github.event.sender.login != 'claude[bot]' &&
        github.event.pull_request.head.repo.full_name == github.repository
      )
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      pull-requests: write
      issues: read
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: ${{ github.event.pull_request.head.ref }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Run Plugin Validation
        uses: anthropics/claude-code-action@v1
        with:
          use_bedrock: "true"
          use_sticky_comment: true
          track_progress: true
          show_full_output: true
          prompt: |
            You are validating the codeflash-agent Claude Code plugin. This plugin has:
            - 6 agents in `agents/` (router + setup + 4 domain agents)
            - 2 skills in `skills/` (codeflash-optimize, memray-profiling)
            - Eval templates in `codeflash-evals/templates/`
            - Plugin manifest at `.claude-plugin/plugin.json`
            - No hooks directory

            Execute each step in order. If a step finds no issues, state that and continue.

            <step name="triage">
            Assess what changed in this PR:
            1. Run `gh pr diff ${{ github.event.pull_request.number }} --name-only` to get changed files.
            2. Classify changes:
               - AGENTS: files in `agents/`
               - SKILLS: files in `skills/`
               - EVALS: files in `codeflash-evals/`
               - PLUGIN_CONFIG: `.claude-plugin/plugin.json`, or anything under a newly added `hooks/` directory
               - DOCS: `*.md` outside agents/skills, LICENSE
               - OTHER: anything else
            3. Record which categories have changes — later steps only run if relevant.
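
            The classification in item 2 can be sketched as a first-match-wins shell function. This is only an illustration: the name `classify_path` is hypothetical, `hooks/*` assumes hooks would live in a top-level `hooks/` directory, and the rule order is an assumption where categories overlap (for example, a `*.md` file under `codeflash-evals/` is treated as EVALS).

            ```shell
            # Hypothetical helper: print the triage category for one changed path.
            # Rules are checked top to bottom; the first matching pattern wins.
            classify_path() {
              case "$1" in
                agents/*)                           echo AGENTS ;;
                skills/*)                           echo SKILLS ;;
                codeflash-evals/*)                  echo EVALS ;;
                .claude-plugin/plugin.json|hooks/*) echo PLUGIN_CONFIG ;;
                *.md|LICENSE)                       echo DOCS ;;
                *)                                  echo OTHER ;;
              esac
            }
            ```

            For instance, `classify_path skills/memray-profiling/SKILL.md` prints `SKILLS`, and `classify_path README.md` prints `DOCS`.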
            </step>

            <step name="plugin_structure">
            First, use the Agent tool to launch a **claude-code-guide** agent with this prompt:
            "Look up the full Claude Code plugin specification. I need the required and optional fields for:
            1. plugin.json manifest schema
            2. Agent .md frontmatter (YAML between --- markers) — all valid fields
            3. Skill SKILL.md frontmatter — all valid fields
            Return the complete field lists with types and whether each is required."

            Then, using the spec returned by that agent, validate this plugin:
            - Read `.claude-plugin/plugin.json` and check against the plugin.json schema
            - Read each `agents/*.md` and validate frontmatter fields against the agent spec
            - Read each `skills/*/SKILL.md` and validate frontmatter fields against the skill spec
            - Check file cross-references (agents referenced in plugin.json exist, skills referenced in agent frontmatter exist)
            - Report any issues found
            </step>

            <step name="agent_consistency">
            Only run if AGENTS changed.

            The 4 domain agents (codeflash-cpu.md, codeflash-memory.md, codeflash-async.md, codeflash-structure.md)
            must all have these steps in their experiment loops:
            1. A "Review git history" step (step 1) with `git log --oneline -20` and `git diff HEAD~1`
            2. A "Guard" step (if configured in conventions.md) with revert/rework/discard logic
            3. A "Config audit" step (after KEEP) checking for dead/inconsistent config flags

            Check each domain agent:
            1. Read the experiment loop section of each file.
            2. Verify all 3 steps are present.
            3. Verify step numbering is sequential with no gaps.
            4. Verify the Guard step includes "revert, rework (max 2 attempts), then discard".
            5. Verify the Config audit step has domain-specific guidance (not generic).
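
            A rough presence check for item 2 might look like the sketch below. It only confirms that each step title appears somewhere in the file, so numbering, Guard wording, and domain-specific guidance still need to be read by hand. `missing_loop_steps` is a hypothetical name, and matching on the quoted step titles is an assumption about how the agent files label their steps.

            ```shell
            # Hypothetical helper: list the required experiment-loop steps that are
            # absent from one agent file (case-insensitive, fixed-string match).
            missing_loop_steps() {
              for step in "Review git history" "Guard" "Config audit"; do
                grep -qiF -- "$step" "$1" || echo "missing: $step"
              done
            }
            ```

            Calling `missing_loop_steps agents/codeflash-cpu.md` prints one `missing: ...` line per absent step and nothing when all three titles are found.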

            Also check: router agent (codeflash.md) domain detection table matches the 4 domain agents that exist.
            </step>

            <step name="eval_manifests">
            Only run if EVALS changed.

            For each `codeflash-evals/templates/*/manifest.json`:
            1. Verify valid JSON.
            2. Verify required fields: `name`, `eval_type`, `bugs` (array), `rubric` (object with `criteria`).
            3. Verify each bug has: `id`, `file`, `description`, `domain`.
            4. Verify `rubric.criteria` values are positive integers.
            5. Verify `rubric.total` equals the sum of criteria values (if present).
            6. Verify referenced files (`file` in bugs, `test_file`) actually exist in that template directory.
            </step>

            <step name="skill_review">
            Only run if SKILLS changed.

            First, use the Agent tool to launch a **claude-code-guide** agent with this prompt:
            "Look up Claude Code skill best practices. I need:
            1. What makes a good skill description (trigger terms, specificity, completeness)
            2. Best practices for allowed-tools restrictions
            3. Best practices for skill content structure (conciseness, actionability, progressive disclosure)
            Return the complete guidelines."

            Then, using those guidelines, review each skill in `skills/`:
            - Check description quality and trigger term coverage
            - Check allowed-tools restrictions are appropriate
            - Check content follows best practices (concise, actionable, clear workflow)
            - Report any issues found
            </step>

            <step name="summary">
            Post exactly one summary comment with all results:

            ## Plugin Validation

            ### Plugin Structure
            (validation findings or "All checks passed")

            ### Agent Consistency
            (experiment loop check results or "Not applicable — no agent changes")

            ### Eval Manifests
            (manifest validation results or "Not applicable — no eval changes")

            ### Skill Review
            (skill review findings or "Not applicable — no skill changes")

            ---
            *Validated by claude-code-guide + codeflash-agent checks*
            </step>

            <step name="verdict">
            End your summary comment with exactly one of these lines, placed on its own line after the footer (no other text on that line):

            **Verdict: PASS**
            **Verdict: FAIL**

            Use FAIL only if a step found a **major** issue (broken functionality, missing required fields, incorrect cross-references).
            Warnings and minor style suggestions are NOT blocking.
            Use PASS if every step passed or only had minor/warning-level findings.
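
            For reference, the workflow's final step reads the verdict roughly as sketched below. This is a simplified illustration, not the step itself: `extract_verdict` is a hypothetical name, and it uses portable `grep -E` where the step uses `grep -P`. When several verdict lines appear, the last one wins.

            ```shell
            # Hypothetical helper: read a comment body on stdin and print the last
            # PASS/FAIL verdict found, or nothing when no verdict line is present.
            extract_verdict() {
              grep -oE 'Verdict:[[:space:]]*(PASS|FAIL)' | tail -n 1 | grep -oE '(PASS|FAIL)$' || true
            }
            ```

            For example, `printf '%s\n' '**Verdict: PASS**' | extract_verdict` prints `PASS`.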
            </step>
          claude_args: '--model us.anthropic.claude-sonnet-4-6 --allowedTools "Agent,Read,Glob,Grep,Bash(gh pr diff*),Bash(gh pr view*),Bash(gh pr comment*),Bash(gh api*),Bash(git diff*),Bash(git log*),Bash(git status*),Bash(cat *),Bash(python3 *),Bash(jq *)"'

      - name: Check validation verdict
        if: always()
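        env:
          GH_TOKEN: ${{ github.token }}
          # Pass context values through the environment instead of interpolating them into the script.
          REPO: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          # Parse the verdict from the most recent claude[bot] PR comment.
          # --paginate walks every page of comments (the API returns 30 per page by default);
          # the jq filter runs once per page, and tail -1 keeps the newest verdict.
          VERDICT=$(gh api "repos/$REPO/issues/$PR_NUMBER/comments" --paginate \
            --jq '[.[] | select(.user.login == "claude[bot]")] | last | .body' \
            | grep -oP 'Verdict:\s*\K(PASS|FAIL)' | tail -1 || true)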

          if [ -z "$VERDICT" ]; then
            echo "::warning::Could not find verdict in Claude's PR comment"
            exit 0
          fi

          echo "Verdict: $VERDICT"
          if [ "$VERDICT" = "FAIL" ]; then
            echo "::error::Plugin validation found issues that need fixing"
            exit 1
          fi

  claude-mention:
    concurrency:
      group: claude-mention-${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}
      cancel-in-progress: false
    if: |
      (
        github.event_name == 'issue_comment' &&
        github.event.issue.pull_request &&
        contains(github.event.comment.body, '@claude') &&
        (github.event.comment.author_association == 'OWNER' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'COLLABORATOR')
      ) ||
      (
        github.event_name == 'pull_request_review_comment' &&
        contains(github.event.comment.body, '@claude') &&
        (github.event.comment.author_association == 'OWNER' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'COLLABORATOR') &&
        github.event.pull_request.head.repo.full_name == github.repository
      ) ||
      (
        github.event_name == 'pull_request_review' &&
        contains(github.event.review.body, '@claude') &&
        (github.event.review.author_association == 'OWNER' || github.event.review.author_association == 'MEMBER' || github.event.review.author_association == 'COLLABORATOR') &&
        github.event.pull_request.head.repo.full_name == github.repository
      )
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: read
      id-token: write
    steps:
      - name: Get PR head ref
        id: pr-ref
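        env:
          GH_TOKEN: ${{ github.token }}
          # Branch names can contain shell metacharacters, so pass context values
          # through the environment rather than interpolating them into the script.
          EVENT_NAME: ${{ github.event_name }}
          REPO: ${{ github.repository }}
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          HEAD_REF: ${{ github.event.pull_request.head.ref || github.head_ref }}
        run: |
          if [ "$EVENT_NAME" = "issue_comment" ]; then
            # issue_comment payloads carry no PR object, so look the head ref up via the API.
            PR_REF=$(gh api "repos/$REPO/pulls/$ISSUE_NUMBER" --jq '.head.ref')
          else
            PR_REF="$HEAD_REF"
          fi
          echo "ref=$PR_REF" >> "$GITHUB_OUTPUT"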

      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: ${{ steps.pr-ref.outputs.ref }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Run Claude Code
        uses: anthropics/claude-code-action@v1
        with:
          use_bedrock: "true"
          claude_args: '--model us.anthropic.claude-sonnet-4-6 --allowedTools "Agent,Read,Edit,Write,Glob,Grep,Bash(git status*),Bash(git diff*),Bash(git add *),Bash(git commit *),Bash(git push*),Bash(git log*),Bash(gh pr comment*),Bash(gh pr view*),Bash(gh pr diff*)"'