Replace inline determine-changes, prek, and required-checks-passed
jobs with reusable workflows and composite actions from
codeflash-ai/github-workflows. This reduces CI maintenance burden
by centralizing common CI logic.
pytest-cov's trace function conflicts with the Tracer class under test,
causing the Tracer to self-disable in CI. Linux also reports ~1% lower
coverage than macOS due to platform-specific branches.
- Add pytest-cov to dev dependencies
- Add .coveragerc with branch coverage, 60% floor (current baseline),
and source/omit configuration
- Add coverage CI job (ubuntu/py3.13) that runs pytest with --cov,
enforces the floor, and uploads coverage.xml as an artifact
- Wire coverage into the required-checks-passed gate
Closes #2080
- Add PR template with required linked issue/discussion section
- Add check-linked-issue CI job that validates PR body contains a
reference (#123, Closes/Fixes/Relates, GitHub URL, or CF-# ticket)
- Wire into required-checks-passed gate so it blocks merge
- Update CONTRIBUTING.md with the policy and motivation
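The reference check described above could look roughly like the following sketch. The function name and the exact regular expressions are assumptions for illustration; the patterns the CI job actually uses are not shown in this log.

```python
import re
from typing import Optional

# Hypothetical patterns, one per accepted reference style.
LINK_PATTERNS = [
    # "#123", which also covers "Closes #123", "Fixes #123", "Relates to #123".
    re.compile(r"(?<![\w/])#\d+\b"),
    # Full GitHub URL to an issue, discussion, or pull request.
    re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/(?:issues|discussions|pull)/\d+"),
    # CF-# ticket, e.g. "CF-42".
    re.compile(r"\bCF-\d+\b"),
]


def has_linked_reference(pr_body: Optional[str]) -> bool:
    """Return True if the PR body contains at least one accepted reference."""
    text = pr_body or ""  # a PR with no description has a null body
    return any(pattern.search(text) for pattern in LINK_PATTERNS)
```

A PR body of "Closes #2080" or "Tracked in CF-42" passes; an empty body or one with no reference fails, which is what makes the gate block the merge.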
Make TOTAL_LOOPING_TIME configurable via CODEFLASH_LOOPING_TIME env var
(defaults to 10s). Set to 5s in Java E2E CI jobs to cut verification
time per candidate. Also cache the codeflash-runtime JAR keyed on
source hash to skip mvn install when unchanged.
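A minimal sketch of the env-var override, assuming the value is given in seconds. The helper name and the fallback to the default on a malformed or non-positive value are assumptions, not confirmed behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_LOOPING_TIME_SECONDS = 10.0


def resolve_looping_time(env: Optional[Mapping[str, str]] = None) -> float:
    """Read CODEFLASH_LOOPING_TIME, falling back to the 10s default."""
    env = os.environ if env is None else env
    raw = env.get("CODEFLASH_LOOPING_TIME")
    if raw is None or not raw.strip():
        return DEFAULT_LOOPING_TIME_SECONDS
    try:
        value = float(raw)
    except ValueError:
        # Assumption: an unparsable value is ignored rather than raising.
        return DEFAULT_LOOPING_TIME_SECONDS
    # Assumption: zero or negative durations are treated as unset.
    return value if value > 0 else DEFAULT_LOOPING_TIME_SECONDS


TOTAL_LOOPING_TIME = resolve_looping_time()
```

With this shape, a CI job only needs `CODEFLASH_LOOPING_TIME=5` in its environment to halve the looping time, and local runs keep the 10s default.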
uv-dynamic-versioning rewrites version.py on every `uv run`, so the
ruff auto-format job was inadvertently committing dev version strings.
Restore version.py files after formatting and revert the ones already
changed on this branch.
The change detection for JS E2E tests was missing the test fixture
directory, so PRs that only modify JS test data (like this one) were
skipped. Java already had its equivalent path included.
- Delete standalone java-e2e-tests.yml (duplicate of ci.yaml e2e-java)
- Add npm cache to e2e-js jobs via setup-node cache option
- Consolidate Maven build: mvn clean package + install → single mvn install
- Add .github/workflows/ci.yaml and .github/actions/** to push paths
so CI validates its own changes when merged to main
- Remove codeflash-java-runtime/ from unit_tests change detection
- Narrow e2e flag from codeflash/ to explicit Python subdirs (excludes java/, javascript/)
- Narrow tests/ in e2e_java/e2e_js to specific test scripts
- Extract duplicated Validate PR step into composite action
- Use fetch-depth: 1 for unit-tests and type-check (no git history needed)
- Remove continue-on-error: true from unit-tests (was masking real failures)
- Change git add -A to git add -u in prek auto-fix (won't stage untracked files)
Expand e2e_java and e2e_js change detection to include shared pipeline
code (optimization/, verification/, languages/base.py) but decouple
from the broad e2e flag. A change to codeflash/version.py now only
triggers Python E2Es, not Java/JS E2Es.
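The decoupling can be illustrated with a small model of the change detection. The prefix lists below are hypothetical stand-ins (the real ones live in ci.yaml), and `check_paths` here is a sketch with an assumed signature, not the workflow's actual helper.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical prefixes: shared pipeline code that affects every language.
SHARED_PIPELINE: Tuple[str, ...] = (
    "codeflash/optimization/",
    "codeflash/verification/",
    "codeflash/languages/base.py",
)

# Hypothetical flag definitions; e2e stays broad, Java/JS list their own paths.
FLAG_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "e2e": ("codeflash/", "tests/"),
    "e2e_java": SHARED_PIPELINE + ("codeflash-java-runtime/",),
    "e2e_js": SHARED_PIPELINE + ("packages/",),
}


def check_paths(changed: Iterable[str], prefixes: Tuple[str, ...]) -> bool:
    """True if any changed file starts with one of the given prefixes."""
    return any(path.startswith(prefixes) for path in changed)


def determine_changes(changed: Iterable[str]) -> Dict[str, bool]:
    """Compute each flag independently from the list of changed files."""
    files = list(changed)
    return {flag: check_paths(files, prefixes) for flag, prefixes in FLAG_PREFIXES.items()}
```

Under these assumed lists, a change to `codeflash/version.py` sets only `e2e`, while a change under `codeflash/verification/` sets all three flags.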
Only 5 of 3,943 unit tests need Java, and they already have
skip_if_maven_not_available() guards. Java execution is validated
by the e2e-java job. Saves ~30-60s per matrix entry (7 entries).
When matrix jobs are skipped, `${{ matrix.name }}` is never expanded,
showing literal "matrix.name" in the checks UI. Removing the `name:`
field lets GitHub use the job ID when skipped and auto-expand matrix
values when running.
Add all non-required-check E2E workflows and prek lint to the
consolidated ci.yaml:
- 4 standard Python E2Es (async, benchmark, coverage, init)
- 3 JS E2Es (cjs-function, esm-async, ts-class)
- 2 Java E2Es (fibonacci-nogit, tracer)
- prek lint
New change detection outputs:
- e2e_js: triggers JS E2Es when packages/ changes
- e2e_java: triggers Java E2Es when java runtime/fixtures change
Total: 17 jobs + determine-changes + gate = 19 jobs in one file.
Down from 22 workflow files to 7 (remaining are non-test: claude,
codeflash self-optimize, label-workflow-changes, publish, java-e2e).
Additional savings per irrelevant PR: ~$0.80 (10 jobs x ~$0.08).
Total per skipped PR: ~$1.85.
ci.yaml was in all three check_paths calls, so creating/modifying
the workflow itself triggered all test jobs. Workflow-only PRs
should skip tests — the gate job still validates the pattern.
Replace 7 individual required-check workflows (unit-tests, mypy,
5 E2E tests) with a single ci.yaml following the astral-sh/ruff
gate pattern:
- determine-changes job uses native git diff (no third-party deps)
- Each test job skipped at job level when paths don't match
- Single required-checks-passed gate job accepts success + skipped
- E2E security preserved: environment gating, author allowlists
This fixes the long-standing issue where workflow-level path filters
leave required checks "Pending" on PRs that don't touch code paths,
blocking merge without admin override.
Estimated savings: ~$1.05/skipped PR ($0.64 unit-tests + $0.01
type-check + $0.40 E2E), ~$50-100/yr in compute, plus eliminating
all admin-merge workarounds.
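The gate logic amounts to the following sketch. The function names and job names are illustrative; the result strings are the values GitHub Actions reports for `needs.<job>.result` (success, failure, cancelled, skipped).

```python
from typing import Dict, List

# Results the gate treats as passing: a job that ran and succeeded,
# or a job that was skipped because its paths did not match.
ACCEPTED_RESULTS = frozenset({"success", "skipped"})


def failing_jobs(results: Dict[str, str]) -> List[str]:
    """Return the names of jobs whose result blocks the merge, sorted."""
    return sorted(name for name, result in results.items() if result not in ACCEPTED_RESULTS)


def gate_passes(results: Dict[str, str]) -> bool:
    """The single required check passes only if no upstream job blocks."""
    return not failing_jobs(results)
```

Because the gate job itself always runs, a docs-only PR reports every test job as skipped and the required check still resolves to a pass instead of staying "Pending".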
Editing a workflow YAML file should not trigger that same workflow
to run. Removes .github/workflows/<file> from its own paths filter
in mypy.yml, prek.yaml, and unit-tests.yaml.
- codeflash-optimize.yaml: replace paths: ['**'] wildcard with targeted filters
- mypy.yml: add path filters (was firing on every PR/push including docs)
- prek.yaml: add path filters (was firing on every PR)
- unit-tests.yaml: add path filters (was firing on every PR/push)
Docs-only, README, experiment, and LICENSE changes no longer trigger
these workflows. Saves ~20 workflow runs per docs-only PR.
Adds `false &&` guard to the pr-review job condition. The job will
be skipped on all triggers until this is reverted. The @claude mention
job is unaffected.
v1.0.90 broke Bedrock OIDC auth — all Claude Code runs have been
failing with 403 since Apr 8.
Root cause: anthropics/claude-code-action#1196
Pinning to v1.0.89 (last working version) until upstream fix lands.
All 12 E2E workflows used `paths: ['**']` which triggered on every file
change — docs, configs, experiments, etc. This caused ~140-200 min of
compute per push event (18+ parallel workflows).
Now E2E tests only trigger when relevant source code changes:
- Python E2E: codeflash/**, tests/**, pyproject.toml, uv.lock, workflow files
- JS E2E: same + packages/**
- Java E2E: already had proper path filters (no change needed)
Estimated savings: ~$150-200/mo in CI compute.
- Trigger on any codeflash/** or tests/** changes (not just java subset)
- Validate replay test files are discovered per-function
- Already validates: replay test generation, global discovery count,
optimization success, and minimum speedup percentage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Triage now classifies PRs as TRIVIAL/SMALL/LARGE based on lines changed
- SMALL PRs: focused correctness check, quick duplicate scan, skip coverage
- LARGE PRs: full review with design checks, deep duplicate detection, coverage
- Optimization PRs: concise correctness verdict instead of long essays
- Added explicit scope rules: only read files in the diff, don't explore broadly
Instead of immediately closing optimization PRs when CI fails, Claude
now checks out the branch, inspects failures, and attempts to fix them.
Only closes if unfixable, with a specific explanation of the failures.