Replace 7 individual required-check workflows (unit-tests, mypy,
5 E2E tests) with a single ci.yaml following the astral-sh/ruff
gate pattern:
- determine-changes job uses native git diff (no third-party deps)
- Each test job skipped at job level when paths don't match
- Single required-checks-passed gate job accepts success + skipped
- E2E security preserved: environment gating, author allowlists
This fixes the long-standing issue where workflow-level path filters
leave required checks "Pending" on PRs that don't touch code paths,
blocking merge without admin override.
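The gate decision described above can be sketched in a few lines. This is an illustration of the rule "accept success + skipped", not the contents of ci.yaml; the job names in the sample dictionaries and the `gate_passes` helper are hypothetical. The result strings are the values GitHub Actions reports for `needs.<job>.result`.

```python
from typing import Mapping

# Results that the gate treats as acceptable. GitHub Actions reports one of
# "success", "failure", "cancelled", or "skipped" for each needed job.
PASSING_RESULTS = frozenset({"success", "skipped"})


def gate_passes(job_results: Mapping[str, str]) -> bool:
    """Return True when every upstream job either succeeded or was skipped."""
    return all(result in PASSING_RESULTS for result in job_results.values())


# Docs-only PR: every test job is skipped, so the gate still reports success
# and the single required check does not stay "Pending".
docs_only = {"unit-tests": "skipped", "type-check": "skipped", "e2e": "skipped"}

# Code PR with a failing job: the gate must fail.
code_pr = {"unit-tests": "success", "type-check": "success", "e2e": "failure"}

print(gate_passes(docs_only))  # True
print(gate_passes(code_pr))    # False
```

Because only the gate job is marked required, skipped jobs never leave a required check unresolved.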
Estimated savings: ~$1.05/skipped PR ($0.64 unit-tests + $0.01
type-check + $0.40 E2E), ~$50-100/yr in compute, plus eliminating
all admin-merge workarounds.
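The per-PR figure is the sum of the three components quoted above. The sketch below reproduces that sum and, as a derived assumption that is not stated in the commit message, shows how many skipped PRs per year the $50-100 range would imply.

```python
from decimal import Decimal

# Per-PR components quoted in the commit message (USD).
unit_tests = Decimal("0.64")
type_check = Decimal("0.01")
e2e = Decimal("0.40")

per_skipped_pr = unit_tests + type_check + e2e
print(per_skipped_pr)  # 1.05

# Derived, not stated in the source: skipped PRs per year implied by the
# $50-100/yr estimate at $1.05 per skipped PR.
implied_low = round(Decimal("50") / per_skipped_pr)
implied_high = round(Decimal("100") / per_skipped_pr)
print(implied_low, implied_high)  # 48 95
```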
Editing a workflow YAML file should not trigger that same workflow
to run. Removes .github/workflows/<file> from its own paths filter
in mypy.yml, prek.yaml, and unit-tests.yaml.
- codeflash-optimize.yaml: replace paths: ['**'] wildcard with targeted filters
- mypy.yml: add path filters (was firing on every PR/push including docs)
- prek.yaml: add path filters (was firing on every PR)
- unit-tests.yaml: add path filters (was firing on every PR/push)
Docs-only, README, experiment, and LICENSE changes no longer trigger
these workflows. Saves ~20 workflow runs per docs-only PR.
Adds `false &&` guard to the pr-review job condition. The job will
be skipped on all triggers until this is reverted. The @claude mention
job is unaffected.
v1.0.90 broke Bedrock OIDC auth — all Claude Code runs have been
failing with 403 since Apr 8.
Root cause: anthropics/claude-code-action#1196
Pinning to v1.0.89 (last working version) until upstream fix lands.
All 12 E2E workflows used `paths: ['**']` which triggered on every file
change — docs, configs, experiments, etc. This caused ~140-200 min of
compute per push event (18+ parallel workflows).
Now E2E tests only trigger when relevant source code changes:
- Python E2E: codeflash/**, tests/**, pyproject.toml, uv.lock, workflow files
- JS E2E: same + packages/**
- Java E2E: already had proper path filters (no change needed)
Estimated savings: ~$150-200/mo in CI compute.
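As a rough illustration of the Python E2E filter listed above, the sketch below decides whether a set of changed files should trigger a run. It is a simplified matcher, not GitHub's glob implementation: it only handles `dir/**` prefixes and exact file names. The workflow-file entries are omitted because their exact paths are not given here, and the sample file names are hypothetical.

```python
from typing import Iterable, Sequence

# Patterns taken from the Python E2E list above (workflow files omitted).
PYTHON_E2E_PATTERNS = ("codeflash/**", "tests/**", "pyproject.toml", "uv.lock")


def path_matches(path: str, pattern: str) -> bool:
    """Match 'dir/**' as a directory prefix, anything else as an exact path."""
    if pattern.endswith("/**"):
        # Drop the two asterisks but keep the trailing slash, e.g. "codeflash/".
        return path.startswith(pattern[:-2])
    return path == pattern


def should_run(changed_files: Iterable[str], patterns: Sequence[str]) -> bool:
    """Return True when at least one changed file matches at least one pattern."""
    return any(path_matches(f, p) for f in changed_files for p in patterns)


print(should_run(["docs/index.md", "README.md"], PYTHON_E2E_PATTERNS))         # False
print(should_run(["docs/index.md", "codeflash/cli.py"], PYTHON_E2E_PATTERNS))  # True
```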
- Trigger on any codeflash/** or tests/** changes (not just java subset)
- Validate replay test files are discovered per-function
- Already validates: replay test generation, global discovery count,
optimization success, and minimum speedup percentage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Triage now classifies PRs as TRIVIAL/SMALL/LARGE based on lines changed
- SMALL PRs: focused correctness check, quick duplicate scan, skip coverage
- LARGE PRs: full review with design checks, deep duplicate detection, coverage
- Optimization PRs: concise correctness verdict instead of long essays
- Added explicit scope rules: only read files in the diff, don't explore broadly
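The triage rule above could look roughly like the following. The commit message does not state the line-count thresholds, so `trivial_max` and `small_max` are placeholder assumptions, and `classify_pr` is a hypothetical helper. The review steps per class are paraphrased from the bullets above; the TRIVIAL entry is an assumption.

```python
# Review depth per class, paraphrased from the bullets above.
REVIEW_PLAN = {
    "TRIVIAL": [],  # assumption: nothing to review in depth
    "SMALL": ["focused correctness check", "quick duplicate scan"],
    "LARGE": ["full review with design checks", "deep duplicate detection", "coverage"],
}


def classify_pr(lines_changed: int, trivial_max: int = 10, small_max: int = 200) -> str:
    """Classify a PR by lines changed. Both thresholds are assumed values."""
    if lines_changed < 0:
        raise ValueError("lines_changed must be non-negative")
    if lines_changed <= trivial_max:
        return "TRIVIAL"
    if lines_changed <= small_max:
        return "SMALL"
    return "LARGE"


for size in (3, 120, 900):
    label = classify_pr(size)
    print(size, label, REVIEW_PLAN[label])
```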
Instead of immediately closing optimization PRs when CI fails, Claude
now checks out the branch, inspects failures, and attempts to fix them.
Only closes if unfixable, with a specific explanation of the failures.
Merges the omni-main-java branch which synced main into omni-java,
including JavaFunctionOptimizer, removal of is_java()/is_python() guards,
protocol dispatch for parse_test_xml, and deletion of concolic_testing.py.
Drop the /simplify step that caused unprompted refactors and scope
creep in PR reviews. Also add prek pre-commit rule to project config
so the PR bot and all contributors see it.
The unit-tests workflow relied on a pre-committed JAR binary in
resources/ which could become stale when Comparator.java changes.
Now the workflow builds the JAR from source and installs it to the
local Maven repo, matching what java-e2e and fibonacci-nogit already do.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add paths-ignore to skip reviews for docs, config, CI, and non-production files
- Use github.event.sender.login instead of github.actor for reliable bot detection
- Add triage step to early-exit on trivial PRs
Consolidates duplicate-code-detector.yml into claude.yml as a step in
the pr-review job. Adds concurrency groups with cancel-in-progress to
prevent comment spam from racing workflow runs.
- Use XML structure instead of markdown for clearer step boundaries
- Resolve stale review threads via GraphQL instead of leaving them
- Use positive framing instead of negation in instructions
- Replace aggressive language with calm direct instructions
- Add /simplify skill invocation for code quality pass
- Add verification checkpoint at the end
- Auto-close stale codeflash optimization PRs (age, conflicts, CI failures, deleted functions)
- Remove inline comment MCP tool, add Skill tool
Remove redundant windows-unit-tests.yml and add Windows Python 3.13 job
to the main unit-tests.yaml workflow. Add PYTHONIOENCODING env var for
Windows compatibility.
Run unit tests on Windows with Python 3.13, in addition to every Python
version (3.9-3.14) on Ubuntu. This tests cross-platform compatibility
while keeping Windows test duration reasonable.
Extend the publish workflow to handle both codeflash and codeflash-benchmark
releases from a single workflow file, triggered by their respective version
files. Also syncs benchmark __init__.py version to match pyproject.toml.
The cross-region inference profile for Claude Opus 4.6 on Bedrock is
`us.anthropic.claude-opus-4-6-v1`, not `us.anthropic.claude-opus-4-6-v1:0`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Azure Foundry authentication with AWS Bedrock OIDC in all
Claude Code GitHub Actions workflows.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merged omni-java base into PR #1279 to resolve conflicts.
Resolution approach:
1. test_discovery.py: Used refactored method call resolution from base
   - New approach uses sophisticated type tracking (jedi-like "goto")
   - Already includes duplicate checking (line 141)
   - Removed old Strategy 3 (class-based fallback) as it's not needed
     and caused single-function optimization issues
2. test_instrumentation.py: Combined both changes
   - Added API key setup from PR #1279
   - Kept FunctionToOptimize imports from base
The refactored code is more accurate and fixes the single-function
optimization issue that existed in the original PR.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The PR review bot was claiming lint issues were fixed without actually
fixing or committing them. Add a mandatory re-run of prek after fixes
and explicit instructions to report unfixed issues honestly.
- Remove instruction to use gh pr comment for summaries
- Add STEP 4 with explicit single-comment policy
- Include instructions to update existing comments
- Add cleanup step to delete duplicate comments
- Add git merge/fetch/checkout/branch to allowed tools
- Add gh pr merge/close to allowed tools
- Add allowed_bots for claude[bot] to trigger pr-review
- Restrict @claude mentions to maintainers only (OWNER/MEMBER/COLLABORATOR)
- Block fork PRs from triggering pr-review and claude-mention
- Run tests with coverage on changed files
- Compare coverage between PR and main branch
- New files require ≥75% test coverage
- Modified files must have changed lines covered
- Flag coverage regressions in PR comment
- Consolidate claude-code-review.yml into claude.yml with two jobs
- Add auto-fix for safe linting issues (formatting, imports) before review
- Use --from-ref origin/main to only check changed files
- Add smart re-review logic that resolves fixed comments
- Add inline comment support via MCP tool with 5-7 comment limit
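The coverage rules listed earlier (new files need at least 75% coverage, modified files must have their changed lines covered, regressions against main are flagged) can be sketched as a single check. This is a minimal illustration, not the review bot's implementation. The `FileCoverage` record and `coverage_flags` helper are hypothetical, and how the per-line data is collected is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

NEW_FILE_MIN_COVERAGE = 75.0  # threshold stated in the commit message


@dataclass(frozen=True)
class FileCoverage:
    path: str
    is_new: bool
    pr_percent: float                      # coverage on the PR branch
    main_percent: Optional[float] = None   # coverage on main, None for new files
    changed_lines: FrozenSet[int] = frozenset()
    covered_lines: FrozenSet[int] = frozenset()


def coverage_flags(fc: FileCoverage) -> List[str]:
    """Return human-readable problems for one file; an empty list means OK."""
    flags: List[str] = []
    if fc.is_new and fc.pr_percent < NEW_FILE_MIN_COVERAGE:
        flags.append(f"{fc.path}: new file below {NEW_FILE_MIN_COVERAGE:.0f}% coverage")
    if not fc.is_new:
        uncovered = sorted(fc.changed_lines - fc.covered_lines)
        if uncovered:
            flags.append(f"{fc.path}: changed lines not covered: {uncovered}")
    if fc.main_percent is not None and fc.pr_percent < fc.main_percent:
        flags.append(f"{fc.path}: coverage regression vs main")
    return flags


new_file = FileCoverage("codeflash/new_module.py", is_new=True, pr_percent=60.0)
modified = FileCoverage(
    "codeflash/existing.py", is_new=False, pr_percent=80.0, main_percent=85.0,
    changed_lines=frozenset({10, 11, 12}), covered_lines=frozenset({10, 11}),
)
for item in (new_file, modified):
    print(coverage_flags(item))
```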
Replace pre-commit with prek (faster Rust-based alternative) for linting.
- Add prek to dev dependencies
- Replace pre-commit workflow with prek workflow using setup-uv@v6
- Update Claude workflow allowed tools to use prek
Add allowedTools for pre-commit, ruff, pytest, mypy, coverage, and git/gh commands
to enable Claude to run linting and testing. Strengthen naming convention guidance
to explicitly forbid leading underscores on functions.
- Add CODEFLASH_API_KEY for test_instrumentation.py tests that instantiate Optimizer
- Create pom.xml for codeflash-java-runtime with Gson and SQLite JDBC dependencies
- Add CI step to build and install JAR before running tests
- Update .gitignore to allow pom.xml in codeflash-java-runtime
- All 348 Java tests now pass including 5 Comparator JAR integration tests
Add comprehensive e2e tests for the Java optimization pipeline:
- Function discovery (BubbleSort, Calculator)
- Code context extraction
- Code replacement
- Test discovery (JUnit 5)
- Project detection (Maven)
- Compilation and test execution
Also add:
- GitHub Actions workflow for Java e2e tests (java-e2e-tests.yml)
- Maven pom.xml for the Java sample project
- .gitignore exception for pom.xml
The e2e tests verify the full Java pipeline works correctly,
from function discovery through code replacement.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* first pass
restore
restore this too
Revert "first pass"
This reverts commit b507770b2c79cc948b33222d8877fb784bfe108a.
* continue
* Update uv.lock
* refresh lockfile
* bugfix
* temp
* fix these
* pytest changes
* formatting
* set up test env properly here too
* ruff
* make ruff happy
* Update e2e-bubblesort-unittest.yaml
* with pytest
* bugfix
* oops