Merge branch 'main' of github.com:codeflash-ai/codeflash

This commit is contained in:
ali 2026-02-17 23:10:28 +02:00
commit 7308afebc7
No known key found for this signature in database
GPG key ID: 44F9B42770617B9B
109 changed files with 4260 additions and 2254 deletions

View file

@ -26,3 +26,17 @@ codeflash/
├── result/ # Result types and handling
└── version.py # Version information
```
## Key Entry Points
| Task | Start here |
|------|------------|
| CLI arguments & commands | `cli_cmds/cli.py` |
| Optimization orchestration | `optimization/optimizer.py` → `run()` |
| Per-function optimization | `optimization/function_optimizer.py` |
| Function discovery | `discovery/functions_to_optimize.py` |
| Context extraction | `context/code_context_extractor.py` |
| Test execution | `verification/test_runner.py`, `verification/pytest_plugin.py` |
| Performance ranking | `benchmarking/function_ranker.py` |
| Domain types | `models/models.py`, `models/function_types.py` |
| Result handling | `either.py` (`Result`, `Success`, `Failure`, `is_successful`) |

View file

@ -2,6 +2,7 @@
- **Line length**: 120 characters
- **Python**: 3.9+ syntax
- **Package management**: Always use `uv`, never `pip`
- **Tooling**: Ruff for linting/formatting, mypy strict mode, prek for pre-commit checks
- **Comments**: Minimal - only explain "why", not "what"
- **Docstrings**: Do not add unless explicitly requested

View file

@ -1,5 +1,6 @@
# Git Commits & Pull Requests
- **Always create a new branch from `main` before starting any new work** — never commit directly to `main` or reuse an existing feature branch for unrelated changes
- Use conventional commit format: `fix:`, `feat:`, `refactor:`, `docs:`, `test:`, `chore:`
- Keep commits atomic - one logical change per commit
- Commit message body should be concise (1-2 sentences max)
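For example, a commit message following these conventions might look like:
```
feat: generate replay tests from recorded benchmark data

Replay tests reproduce real workloads during verification.
```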

View file

@ -0,0 +1,12 @@
---
paths:
- "codeflash/languages/**/*.py"
---
# Language Support Patterns
- Current language is a module-level singleton in `languages/current.py` — use `set_current_language()` / `current_language()`, never pass language as a parameter through call chains (see the sketch below)
- Use `get_language_support(identifier)` from `languages/registry.py` to get a `LanguageSupport` instance — never import language classes directly
- New language support classes must use the `@register_language` decorator to register with the extension and language registries
- `languages/__init__.py` uses `__getattr__` for lazy imports to avoid circular dependencies — follow this pattern when adding new exports
- `is_javascript()` returns `True` for both JavaScript and TypeScript
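
A short sketch of the singleton and registry patterns above (the identifier type accepted by `get_language_support()` is an assumption here):

```python
# Sketch only: exact signatures in languages/current.py and
# languages/registry.py may differ from this.
from codeflash.languages.base import Language
from codeflash.languages.current import current_language, set_current_language
from codeflash.languages.registry import get_language_support

set_current_language(Language.PYTHON)     # set once, near the entry point
support = get_language_support("python")  # never import PythonSupport directly

def deep_in_the_pipeline() -> None:
    lang = current_language()  # read the singleton; no parameter threading
```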

View file

@ -0,0 +1,17 @@
---
paths:
- "codeflash/optimization/**/*.py"
- "codeflash/verification/**/*.py"
- "codeflash/benchmarking/**/*.py"
- "codeflash/context/**/*.py"
---
# Optimization Pipeline Patterns
- All major operations return `Result[SuccessType, ErrorType]` — construct with `Success(value)` / `Failure(error)`, check with `is_successful()` before calling `unwrap()` (see the sketch below)
- Code context has token limits (`OPTIMIZATION_CONTEXT_TOKEN_LIMIT`, `TESTGEN_CONTEXT_TOKEN_LIMIT` in `config_consts.py`) — functions whose context exceeds them are rejected
- `read_writable_code` can span multiple files; `read_only_context_code` is reference-only
- Code is serialized as markdown code blocks: ` ```language:filepath\ncode\n``` ` (see `CodeStringsMarkdown`)
- Candidates form a forest (DAG): refinements/repairs reference `parent_id` on previous candidates
- Test generation and optimization run concurrently — coordinate through `CandidateEvaluationContext`
- Generated tests are instrumented with `codeflash_capture.py` to record return values and traces
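
A minimal sketch of the `Result` convention from the first bullet (`halve` and its error type are hypothetical; the other names come from `either.py`):

```python
# Illustrative only: the function is made up; the Result API names are real.
from codeflash.either import Failure, Success, is_successful

def halve(n: int):
    if n % 2:
        return Failure("odd input")  # error path
    return Success(n // 2)           # success path

result = halve(10)
if is_successful(result):
    value = result.unwrap()  # unwrap only after the is_successful() check
```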

View file

@ -6,6 +6,3 @@ paths:
# Source Code Rules
- Use `libcst` for code modification/transformation to preserve formatting. `ast` is acceptable for read-only analysis and parsing.
- NEVER use leading underscores for function names (e.g., `_helper`). Python has no true private functions. Always use public names.
- Any new feature or bug fix that can be tested automatically must have test cases.
- If changes affect existing test expectations, update the tests accordingly. Tests must always pass after changes.

View file

@ -13,3 +13,5 @@ paths:
- Use `.as_posix()` when converting resolved paths to strings (normalizes to forward slashes).
- Any new feature or bug fix that can be tested automatically must have test cases.
- If changes affect existing test expectations, update the tests accordingly. Tests must always pass after changes.
- The pytest plugin patches `time`, `random`, `uuid`, and `datetime` for deterministic test execution — never assume real randomness or real time in verification tests.
- `conftest.py` uses an autouse fixture that calls `reset_current_language()` — tests always start with Python as the default language.
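
The autouse pattern in the last bullet might look roughly like this (a sketch; the actual fixture in `conftest.py` and the import path for `reset_current_language()` are assumptions):

```python
# Sketch of an autouse reset fixture; import path assumed.
import pytest

from codeflash.languages.current import reset_current_language

@pytest.fixture(autouse=True)
def reset_language_fixture():
    reset_current_language()  # every test starts from the Python default
    yield
```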

View file

@ -0,0 +1,12 @@
# Fix mypy errors
When modifying code, fix any mypy type errors in the files you changed:
```bash
uv run mypy --non-interactive --config-file pyproject.toml <changed_files>
```
- Fix type annotation issues: missing return types, incorrect types, Optional/None unions, import errors for type hints
- Do NOT add `# type: ignore` comments — always fix the root cause
- Do NOT fix type errors that require logic changes, complex generic type rework, or anything that could change runtime behavior
- Files in `mypy_allowlist.txt` are checked in CI — ensure they remain error-free

View file

@ -0,0 +1,9 @@
# Fix prek failures
When prek (pre-commit) checks fail:
1. Run `uv run prek run` to see failures (local, checks staged files)
2. In CI, the equivalent is `uv run prek run --from-ref origin/main`
3. prek runs ruff format, ruff check, and mypy on changed files
4. Fix issues in order: formatting → lint → type errors
5. Re-run `uv run prek run` to verify all checks pass

2
.codex/skills/.gitignore vendored Normal file
View file

@ -0,0 +1,2 @@
# Managed by Tessl
tessl:*

2
.gemini/skills/.gitignore vendored Normal file
View file

@ -0,0 +1,2 @@
# Managed by Tessl
tessl:*

View file

@ -42,11 +42,17 @@ jobs:
uv venv --seed
uv sync
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Run Claude Code
id: claude
uses: anthropics/claude-code-action@v1
with:
use_foundry: "true"
use_bedrock: "true"
use_sticky_comment: true
allowed_bots: "claude[bot],codeflash-ai[bot]"
prompt: |
@ -173,12 +179,9 @@ jobs:
2. For each optimization PR:
- Check if CI is passing: `gh pr checks <number>`
- If all checks pass, merge it: `gh pr merge <number> --squash --delete-branch`
claude_args: '--model claude-opus-4-6 --allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr list:*),Bash(gh pr checks:*),Bash(gh pr merge:*),Bash(gh issue view:*),Bash(gh issue list:*),Bash(gh api:*),Bash(uv run prek *),Bash(uv run mypy *),Bash(uv run coverage *),Bash(uv run pytest *),Bash(git status*),Bash(git add *),Bash(git commit *),Bash(git push*),Bash(git diff *),Bash(git checkout *),Read,Glob,Grep,Edit"'
claude_args: '--model us.anthropic.claude-opus-4-6-v1 --allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr list:*),Bash(gh pr checks:*),Bash(gh pr merge:*),Bash(gh issue view:*),Bash(gh issue list:*),Bash(gh api:*),Bash(uv run prek *),Bash(uv run mypy *),Bash(uv run coverage *),Bash(uv run pytest *),Bash(git status*),Bash(git add *),Bash(git commit *),Bash(git push*),Bash(git diff *),Bash(git checkout *),Read,Glob,Grep,Edit"'
additional_permissions: |
actions: read
env:
ANTHROPIC_FOUNDRY_API_KEY: ${{ secrets.AZURE_ANTHROPIC_API_KEY }}
ANTHROPIC_FOUNDRY_BASE_URL: ${{ secrets.AZURE_ANTHROPIC_ENDPOINT }}
# @claude mentions (can edit and push) - restricted to maintainers only
claude-mention:
@ -240,14 +243,17 @@ jobs:
uv venv --seed
uv sync
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Run Claude Code
id: claude
uses: anthropics/claude-code-action@v1
with:
use_foundry: "true"
claude_args: '--model claude-opus-4-6 --allowedTools "Read,Edit,Write,Glob,Grep,Bash(git status*),Bash(git diff*),Bash(git add *),Bash(git commit *),Bash(git push*),Bash(git log*),Bash(git merge*),Bash(git fetch*),Bash(git checkout*),Bash(git branch*),Bash(uv run prek *),Bash(prek *),Bash(uv run ruff *),Bash(uv run pytest *),Bash(uv run mypy *),Bash(uv run coverage *),Bash(gh pr comment*),Bash(gh pr view*),Bash(gh pr diff*),Bash(gh pr merge*),Bash(gh pr close*)"'
use_bedrock: "true"
claude_args: '--model us.anthropic.claude-opus-4-6-v1 --allowedTools "Read,Edit,Write,Glob,Grep,Bash(git status*),Bash(git diff*),Bash(git add *),Bash(git commit *),Bash(git push*),Bash(git log*),Bash(git merge*),Bash(git fetch*),Bash(git checkout*),Bash(git branch*),Bash(uv run prek *),Bash(prek *),Bash(uv run ruff *),Bash(uv run pytest *),Bash(uv run mypy *),Bash(uv run coverage *),Bash(gh pr comment*),Bash(gh pr view*),Bash(gh pr diff*),Bash(gh pr merge*),Bash(gh pr close*)"'
additional_permissions: |
actions: read
env:
ANTHROPIC_FOUNDRY_API_KEY: ${{ secrets.AZURE_ANTHROPIC_API_KEY }}
ANTHROPIC_FOUNDRY_BASE_URL: ${{ secrets.AZURE_ANTHROPIC_ENDPOINT }}

View file

@ -0,0 +1,116 @@
name: Duplicate Code Detector
on:
workflow_dispatch:
pull_request:
types: [opened, synchronize]
jobs:
detect-duplicates:
if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
issues: write
id-token: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.ref || github.ref }}
- name: Start Serena MCP server
run: |
docker pull ghcr.io/github/serena-mcp-server:latest
docker run -d --name serena \
--network host \
-v "${{ github.workspace }}:${{ github.workspace }}:rw" \
ghcr.io/github/serena-mcp-server:latest \
serena start-mcp-server --context codex --project "${{ github.workspace }}"
mkdir -p /tmp/mcp-config
cat > /tmp/mcp-config/mcp-servers.json << 'EOF'
{
"mcpServers": {
"serena": {
"command": "docker",
"args": ["exec", "-i", "serena", "serena", "start-mcp-server", "--context", "codex", "--project", "${{ github.workspace }}"]
}
}
}
EOF
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Run Claude Code
uses: anthropics/claude-code-action@v1
with:
use_bedrock: "true"
use_sticky_comment: true
allowed_bots: "claude[bot],codeflash-ai[bot]"
claude_args: '--mcp-config /tmp/mcp-config/mcp-servers.json --allowedTools "Read,Glob,Grep,Bash(git diff:*),Bash(git log:*),Bash(git show:*),Bash(wc *),Bash(find *),mcp__serena__*"'
prompt: |
You are a duplicate code detector with access to Serena semantic code analysis.
## Setup
First activate the project in Serena:
- Use `mcp__serena__activate_project` with the workspace path `${{ github.workspace }}`
## Steps
1. Get the list of changed .py files (excluding tests):
`git diff --name-only origin/main...HEAD -- '*.py' | grep -v -E '(test_|_test\.py|/tests/|/test/)'`
2. Use Serena's semantic analysis on changed files:
- `mcp__serena__get_symbols_overview` to understand file structure
- `mcp__serena__find_symbol` to search for similarly named symbols across the codebase
- `mcp__serena__find_referencing_symbols` to understand usage patterns
- `mcp__serena__search_for_pattern` to find similar code patterns
3. For each changed file, look for:
- **Exact Duplication**: Identical code blocks (>10 lines) in multiple locations
- **Structural Duplication**: Same logic with minor variations (different variable names)
- **Functional Duplication**: Different implementations of the same functionality
- **Copy-Paste Programming**: Similar blocks that could be extracted into shared utilities
4. Cross-reference against the rest of the codebase using Serena:
- Search for similar function signatures and logic patterns
- Check if new code duplicates existing utilities or helpers
- Look for repeated patterns across modules
## What to Report
- Identical or nearly identical functions in different files
- Repeated code blocks that could be extracted to utilities
- Similar classes or modules with overlapping functionality
- Copy-pasted code with minor modifications
- Duplicated business logic across components
## What to Skip
- Standard boilerplate (imports, __init__, etc.)
- Test setup/teardown code
- Configuration with similar structure
- Language-specific patterns (constructors, getters/setters)
- Small snippets (<5 lines) unless highly repetitive
- Workflow files under .github/
## Output
Post a single PR comment with your findings. For each pattern found:
- Severity (High/Medium/Low)
- File locations with line numbers
- Code samples showing the duplication
- Concrete refactoring suggestion
If no significant duplication is found, say so briefly. Do not create issues — just comment on the PR.
- name: Stop Serena
if: always()
run: docker stop serena && docker rm serena || true

View file

@ -1,50 +0,0 @@
name: JavaScript/TypeScript Integration Tests
on:
push:
branches:
- main
pull_request:
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.ref_name }}
cancel-in-progress: true
jobs:
js-integration-tests:
name: JS/TS Integration Tests
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.GITHUB_TOKEN }}
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install uv
uses: astral-sh/setup-uv@v6
- name: Install Python dependencies
run: |
uv venv --seed
uv sync
- name: Install npm dependencies for test projects
run: |
npm install --prefix code_to_optimize/js/code_to_optimize_js
npm install --prefix code_to_optimize/js/code_to_optimize_ts
npm install --prefix code_to_optimize/js/code_to_optimize_vitest
- name: Run JavaScript integration tests
run: |
uv run pytest tests/languages/javascript/ -v
uv run pytest tests/test_languages/test_vitest_e2e.py -v
uv run pytest tests/test_languages/test_javascript_e2e.py -v
uv run pytest tests/test_languages/test_javascript_support.py -v
uv run pytest tests/code_utils/test_config_js.py -v

2
.gitignore vendored
View file

@ -268,3 +268,5 @@ tessl.json
# Tessl auto-generates AGENTS.md on install; ignore to avoid cluttering git status
AGENTS.md
.serena/
.codeflash/

12
.mcp.json Normal file
View file

@ -0,0 +1,12 @@
{
"mcpServers": {
"tessl": {
"type": "stdio",
"command": "tessl",
"args": [
"mcp",
"start"
]
}
}
}

View file

@ -1,54 +1,32 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
CodeFlash is an AI-powered Python code optimizer that automatically improves code performance while maintaining correctness. It uses LLMs to generate optimization candidates, verifies correctness through test execution, and benchmarks performance improvements.
## Common Commands
## Optimization Pipeline
```bash
# Package management (NEVER use pip)
uv sync # Install dependencies
uv sync --group dev # Install dev dependencies
uv add <package> # Add a package
# Running tests
uv run pytest tests/ # Run all tests
uv run pytest tests/test_foo.py # Run specific test file
uv run pytest tests/test_foo.py::test_bar -v # Run single test
# Type checking and linting
uv run mypy codeflash/ # Type check
uv run ruff check codeflash/ # Lint
uv run ruff format codeflash/ # Format
# Linting (run before committing)
uv run prek run --from-ref origin/main
# Mypy type checking (run on changed files before committing)
uv run mypy --non-interactive --config-file pyproject.toml <changed_files>
# Running the CLI
uv run codeflash --help
uv run codeflash init # Initialize in a project
uv run codeflash --all # Optimize entire codebase
```
Discovery → Ranking → Context Extraction → Test Gen + Optimization → Baseline → Candidate Evaluation → PR
```
## Mypy Type Checking
1. **Discovery** (`discovery/`): Find optimizable functions across the codebase
2. **Ranking** (`benchmarking/function_ranker.py`): Rank functions by addressable time using trace data
3. **Context** (`context/`): Extract code dependencies (read-writable code + read-only imports)
4. **Optimization** (`optimization/`, `api/`): Generate candidates via AI service, run in parallel with test generation
5. **Verification** (`verification/`): Run candidates against tests, compare outputs via custom pytest plugin
6. **Benchmarking** (`benchmarking/`): Measure performance, select best candidate by speedup
7. **Result** (`result/`, `github/`): Create PR with winning optimization
When modifying code, fix any mypy type errors in the files you changed. Run mypy on changed files:
## Domain Glossary
```bash
uv run mypy --non-interactive --config-file pyproject.toml <changed_files>
```
Rules:
- Fix type annotation issues: missing return types, incorrect types, Optional/None unions, import errors for type hints
- Do NOT add `# type: ignore` comments — always fix the root cause
- Do NOT fix type errors that require logic changes, complex generic type rework, or anything that could change runtime behavior
- Files in `mypy_allowlist.txt` are checked in CI — ensure they remain error-free
- **Optimization candidate**: A generated code variant that might be faster (`OptimizedCandidate`)
- **Function context**: All code needed for optimization — split into read-writable (modifiable) and read-only (reference)
- **Addressable time**: Time a function spends that could be optimized (own time + callee time / call count)
- **Candidate forest**: DAG of candidates where refinements/repairs build on previous candidates (see the sketch below)
- **Replay test**: Test generated from recorded benchmark data to reproduce real workloads
- **Tracer**: Profiling system that records function call trees and timings (`tracing/`, `tracer.py`)
- **Worktree mode**: Git worktree-based parallel optimization (`--worktree` flag)
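
A rough illustration of the candidate forest (the dataclass is hypothetical; `parent_id` is the field referenced by the pipeline docs):

```python
# Hypothetical sketch: each candidate may reference the candidate it
# refines or repairs via parent_id (None for root candidates).
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Candidate:
    id: str
    code: str
    parent_id: str | None = None  # refinement/repair lineage

def roots(candidates: list[Candidate]) -> list[Candidate]:
    return [c for c in candidates if c.parent_id is None]
```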
<!-- Section below is auto-generated by `tessl install` - do not edit manually -->

98
LICENSE Normal file
View file

@ -0,0 +1,98 @@
Business Source License 1.1
Parameters
Licensor: CodeFlash Inc.
Licensed Work: Codeflash Client version 0.20.x
The Licensed Work is (c) 2024 CodeFlash Inc.
Additional Use Grant: None. Production use of the Licensed Work is only permitted
if you have entered into a separate written agreement
with CodeFlash Inc. for production use in connection
with a subscription to CodeFlash's Code Optimization
Platform. Please visit codeflash.ai for further
information.
Change Date: 2030-01-26
Change License: MIT
Notice
The Business Source License (this document, or the “License”) is not an Open
Source license. However, the Licensed Work will eventually be made available
under an Open Source License, as stated in this License.
License text copyright (c) 2017 MariaDB Corporation Ab, All Rights Reserved.
“Business Source License” is a trademark of MariaDB Corporation Ab.
-----------------------------------------------------------------------------
Business Source License 1.1
Terms
The Licensor hereby grants you the right to copy, modify, create derivative
works, redistribute, and make non-production use of the Licensed Work. The
Licensor may make an Additional Use Grant, above, permitting limited
production use.
Effective on the Change Date, or the fourth anniversary of the first publicly
available distribution of a specific version of the Licensed Work under this
License, whichever comes first, the Licensor hereby grants you rights under
the terms of the Change License, and the rights granted in the paragraph
above terminate.
If your use of the Licensed Work does not comply with the requirements
currently in effect as described in this License, you must purchase a
commercial license from the Licensor, its affiliated entities, or authorized
resellers, or you must refrain from using the Licensed Work.
All copies of the original and modified Licensed Work, and derivative works
of the Licensed Work, are subject to this License. This License applies
separately for each version of the Licensed Work and the Change Date may vary
for each version of the Licensed Work released by Licensor.
You must conspicuously display this License on each original or modified copy
of the Licensed Work. If you receive the Licensed Work in original or
modified form from a third party, the terms and conditions set forth in this
License apply to your use of that work.
Any use of the Licensed Work in violation of this License will automatically
terminate your rights under this License for the current and all other
versions of the Licensed Work.
This License does not grant you any right in any trademark or logo of
Licensor or its affiliates (provided that you may use a trademark or logo of
Licensor as expressly required by this License).
TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE LICENSED WORK IS PROVIDED ON
AN “AS IS” BASIS. LICENSOR HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS,
EXPRESS OR IMPLIED, INCLUDING (WITHOUT LIMITATION) WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, AND
TITLE.
MariaDB hereby grants you permission to use this License’s text to license
your works, and to refer to it using the trademark “Business Source License”,
as long as you comply with the Covenants of Licensor below.
Covenants of Licensor
In consideration of the right to use this License’s text and the “Business
Source License” name and trademark, Licensor covenants to MariaDB, and to all
other recipients of the licensed work to be provided by Licensor:
1. To specify as the Change License the GPL Version 2.0 or any later version,
or a license that is compatible with GPL Version 2.0 or a later version,
where “compatible” means that software provided under the Change License can
be included in a program with software provided under GPL Version 2.0 or a
later version. Licensor may specify additional Change Licenses without
limitation.
2. To either: (a) specify an additional grant of rights to use that does not
impose any additional restriction on the right granted in this License, as
the Additional Use Grant; or (b) insert the text “None”.
3. To specify a Change Date.
4. Not to modify this License in any other way.

View file

@ -0,0 +1,98 @@
Business Source License 1.1
Parameters
Licensor: CodeFlash Inc.
Licensed Work: Codeflash Client version 0.20.x
The Licensed Work is (c) 2024 CodeFlash Inc.
Additional Use Grant: None. Production use of the Licensed Work is only permitted
if you have entered into a separate written agreement
with CodeFlash Inc. for production use in connection
with a subscription to CodeFlash's Code Optimization
Platform. Please visit codeflash.ai for further
information.
Change Date: 2030-01-26
Change License: MIT
Notice
The Business Source License (this document, or the “License”) is not an Open
Source license. However, the Licensed Work will eventually be made available
under an Open Source License, as stated in this License.
License text copyright (c) 2017 MariaDB Corporation Ab, All Rights Reserved.
“Business Source License” is a trademark of MariaDB Corporation Ab.
-----------------------------------------------------------------------------
Business Source License 1.1
Terms
The Licensor hereby grants you the right to copy, modify, create derivative
works, redistribute, and make non-production use of the Licensed Work. The
Licensor may make an Additional Use Grant, above, permitting limited
production use.
Effective on the Change Date, or the fourth anniversary of the first publicly
available distribution of a specific version of the Licensed Work under this
License, whichever comes first, the Licensor hereby grants you rights under
the terms of the Change License, and the rights granted in the paragraph
above terminate.
If your use of the Licensed Work does not comply with the requirements
currently in effect as described in this License, you must purchase a
commercial license from the Licensor, its affiliated entities, or authorized
resellers, or you must refrain from using the Licensed Work.
All copies of the original and modified Licensed Work, and derivative works
of the Licensed Work, are subject to this License. This License applies
separately for each version of the Licensed Work and the Change Date may vary
for each version of the Licensed Work released by Licensor.
You must conspicuously display this License on each original or modified copy
of the Licensed Work. If you receive the Licensed Work in original or
modified form from a third party, the terms and conditions set forth in this
License apply to your use of that work.
Any use of the Licensed Work in violation of this License will automatically
terminate your rights under this License for the current and all other
versions of the Licensed Work.
This License does not grant you any right in any trademark or logo of
Licensor or its affiliates (provided that you may use a trademark or logo of
Licensor as expressly required by this License).
TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE LICENSED WORK IS PROVIDED ON
AN “AS IS” BASIS. LICENSOR HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS,
EXPRESS OR IMPLIED, INCLUDING (WITHOUT LIMITATION) WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, AND
TITLE.
MariaDB hereby grants you permission to use this License’s text to license
your works, and to refer to it using the trademark “Business Source License”,
as long as you comply with the Covenants of Licensor below.
Covenants of Licensor
In consideration of the right to use this License’s text and the “Business
Source License” name and trademark, Licensor covenants to MariaDB, and to all
other recipients of the licensed work to be provided by Licensor:
1. To specify as the Change License the GPL Version 2.0 or any later version,
or a license that is compatible with GPL Version 2.0 or a later version,
where “compatible” means that software provided under the Change License can
be included in a program with software provided under GPL Version 2.0 or a
later version. Licensor may specify additional Change Licenses without
limitation.
2. To either: (a) specify an additional grant of rights to use that does not
impose any additional restriction on the right granted in this License, as
the Additional Use Grant; or (b) insert the text “None”.
3. To specify a Change Date.
4. Not to modify this License in any other way.

View file

@ -0,0 +1,15 @@
# CodeFlash Benchmark
A pytest benchmarking plugin for [CodeFlash](https://codeflash.ai) - automatic code performance optimization.
## Installation
```bash
pip install codeflash-benchmark
```
## Usage
This plugin provides benchmarking capabilities for pytest tests used by CodeFlash's optimization pipeline.
For more information, visit [codeflash.ai](https://codeflash.ai).

View file

@ -1,32 +1,32 @@
[project]
name = "codeflash-benchmark"
version = "0.2.0"
description = "Pytest benchmarking plugin for codeflash.ai - automatic code performance optimization"
authors = [{ name = "CodeFlash Inc.", email = "contact@codeflash.ai" }]
requires-python = ">=3.9"
readme = "README.md"
license = {text = "BSL-1.1"}
keywords = [
"codeflash",
"benchmark",
"pytest",
"performance",
"testing",
]
dependencies = [
"pytest>=7.0.0,!=8.3.4",
]
[project.urls]
Homepage = "https://codeflash.ai"
Repository = "https://github.com/codeflash-ai/codeflash-benchmark"
[project.entry-points.pytest11]
codeflash-benchmark = "codeflash_benchmark.plugin"
[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"
[tool.setuptools]
packages = ["codeflash_benchmark"]
[project]
name = "codeflash-benchmark"
version = "0.2.0"
description = "Pytest benchmarking plugin for codeflash.ai - automatic code performance optimization"
authors = [{ name = "CodeFlash Inc.", email = "contact@codeflash.ai" }]
requires-python = ">=3.9"
readme = "README.md"
license-files = ["LICENSE"]
keywords = [
"codeflash",
"benchmark",
"pytest",
"performance",
"testing",
]
dependencies = [
"pytest>=7.0.0,!=8.3.4",
]
[project.urls]
Homepage = "https://codeflash.ai"
Repository = "https://github.com/codeflash-ai/codeflash-benchmark"
[project.entry-points.pytest11]
codeflash-benchmark = "codeflash_benchmark.plugin"
[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"
[tool.setuptools]
packages = ["codeflash_benchmark"]

View file

@ -4,8 +4,8 @@ from enum import Enum
from typing import Any, Union
MAX_TEST_RUN_ITERATIONS = 5
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000
TESTGEN_CONTEXT_TOKEN_LIMIT = 16000
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 48000
TESTGEN_CONTEXT_TOKEN_LIMIT = 48000
INDIVIDUAL_TESTCASE_TIMEOUT = 15
MAX_FUNCTION_TEST_SECONDS = 60
MIN_IMPROVEMENT_THRESHOLD = 0.05

View file

@ -519,15 +519,6 @@ class LanguageSupport(Protocol):
"""
...
def get_comment_prefix(self) -> str:
"""Get the comment prefix for this language.
Returns:
Comment prefix (e.g., "//" for JS, "#" for Python).
"""
...
def find_test_root(self, project_root: Path) -> Path | None:
"""Find the test root directory for a project.

View file

@ -34,7 +34,7 @@ if TYPE_CHECKING:
from codeflash.languages.base import LanguageSupport
# Module-level singleton for the current language
_current_language: Language | None = None
_current_language: Language = Language.PYTHON
def current_language() -> Language:

View file

@ -1354,12 +1354,10 @@ def fix_jest_mock_paths(test_code: str, test_file_path: Path, source_file_path:
or source_relative_resolved.with_suffix(".jsx").exists()
):
# Calculate the correct relative path from test_dir to source_relative_resolved
new_rel_path = os.path.relpath(str(source_relative_resolved), str(test_dir))
new_rel_path = Path(os.path.relpath(source_relative_resolved, test_dir)).as_posix()
# Ensure it starts with ./ or ../
if not new_rel_path.startswith("../") and not new_rel_path.startswith("./"):
new_rel_path = f"./{new_rel_path}"
# Use forward slashes
new_rel_path = new_rel_path.replace("\\", "/")
logger.debug(f"Fixed jest.mock path: {rel_path} -> {new_rel_path}")
return f"{prefix}{new_rel_path}{suffix}"

View file

@ -527,10 +527,5 @@ def parse_jest_test_xml(
f"[LOOP-SUMMARY] Results loop_index: min={min_idx}, max={max_idx}, "
f"unique_count={len(unique_loop_indices)}, total_results={len(loop_indices)}"
)
if max_idx == 1 and len(loop_indices) > 1:
logger.warning(
f"[LOOP-WARNING] All {len(loop_indices)} results have loop_index=1. "
"Perf test markers may not have been parsed correctly."
)
return test_results

View file

@ -1805,15 +1805,6 @@ class JavaScriptSupport:
"""
return ".test.js"
def get_comment_prefix(self) -> str:
"""Get the comment prefix for JavaScript.
Returns:
JavaScript single-line comment prefix.
"""
return "//"
def find_test_root(self, project_root: Path) -> Path | None:
"""Find the test root directory for a JavaScript project.

View file

@ -803,8 +803,6 @@ def run_jest_behavioral_tests(
wall_clock_ns = time.perf_counter_ns() - start_time_ns
logger.debug(f"Jest behavioral tests completed in {wall_clock_ns / 1e9:.2f}s")
print(result.stdout)
return result_file_path, result, coverage_json_path, None
@ -1046,6 +1044,10 @@ def run_jest_benchmarking_tests(
# Create result with combined stdout
result = subprocess.CompletedProcess(args=result.args, returncode=result.returncode, stdout=stdout, stderr="")
if result.returncode != 0:
logger.info(f"Jest benchmarking failed with return code {result.returncode}")
logger.info(f"Jest benchmarking stdout: {result.stdout}")
logger.info(f"Jest benchmarking stderr: {result.stderr}")
except subprocess.TimeoutExpired:
logger.warning(f"Jest benchmarking timed out after {total_timeout}s")

View file

@ -15,6 +15,8 @@ from codeflash.languages import is_javascript
from codeflash.models.models import CodeString, CodeStringsMarkdown
if TYPE_CHECKING:
from collections.abc import Callable
from codeflash.discovery.functions_to_optimize import FunctionToOptimize
from codeflash.models.models import CodeOptimizationContext, FunctionSource
@ -49,6 +51,69 @@ def extract_names_from_targets(target: cst.CSTNode) -> list[str]:
return names
def is_assignment_used(node: cst.CSTNode, definitions: dict[str, UsageInfo], name_prefix: str = "") -> bool:
if isinstance(node, cst.Assign):
for target in node.targets:
names = extract_names_from_targets(target.target)
for name in names:
lookup = f"{name_prefix}{name}" if name_prefix else name
if lookup in definitions and definitions[lookup].used_by_qualified_function:
return True
return False
if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
names = extract_names_from_targets(node.target)
for name in names:
lookup = f"{name_prefix}{name}" if name_prefix else name
if lookup in definitions and definitions[lookup].used_by_qualified_function:
return True
return False
return False
def recurse_sections(
node: cst.CSTNode,
section_names: list[str],
prune_fn: Callable[[cst.CSTNode], tuple[cst.CSTNode | None, bool]],
keep_non_target_children: bool = False,
) -> tuple[cst.CSTNode | None, bool]:
updates: dict[str, list[cst.CSTNode] | cst.CSTNode] = {}
found_any_target = False
for section in section_names:
original_content = getattr(node, section, None)
if isinstance(original_content, (list, tuple)):
new_children = []
section_found_target = False
for child in original_content:
filtered, found_target = prune_fn(child)
if filtered:
new_children.append(filtered)
section_found_target |= found_target
if keep_non_target_children:
if section_found_target or new_children:
found_any_target |= section_found_target
updates[section] = new_children
elif section_found_target:
found_any_target = True
updates[section] = new_children
elif original_content is not None:
filtered, found_target = prune_fn(original_content)
if keep_non_target_children:
found_any_target |= found_target
if filtered:
updates[section] = filtered
elif found_target:
found_any_target = True
if filtered:
updates[section] = filtered
if keep_non_target_children:
if updates:
return node.with_changes(**updates), found_any_target
return None, False
if not found_any_target:
return None, False
return (node.with_changes(**updates) if updates else node), True
def collect_top_level_definitions(
node: cst.CSTNode, definitions: Optional[dict[str, UsageInfo]] = None
) -> dict[str, UsageInfo]:
@ -423,27 +488,9 @@ def remove_unused_definitions_recursively(
elif isinstance(statement, (cst.Assign, cst.AnnAssign, cst.AugAssign)):
var_used = False
# Check if any variable in this assignment is used
if isinstance(statement, cst.Assign):
for target in statement.targets:
names = extract_names_from_targets(target.target)
for name in names:
class_var_name = f"{class_name}.{name}"
if (
class_var_name in definitions
and definitions[class_var_name].used_by_qualified_function
):
var_used = True
method_or_var_used = True
break
elif isinstance(statement, (cst.AnnAssign, cst.AugAssign)):
names = extract_names_from_targets(statement.target)
for name in names:
class_var_name = f"{class_name}.{name}"
if class_var_name in definitions and definitions[class_var_name].used_by_qualified_function:
var_used = True
method_or_var_used = True
break
if is_assignment_used(statement, definitions, name_prefix=f"{class_name}."):
var_used = True
method_or_var_used = True
if var_used or class_has_dependencies:
new_statements.append(statement)
@ -459,56 +506,19 @@ def remove_unused_definitions_recursively(
return node, method_or_var_used or class_has_dependencies
# Handle assignments (Assign and AnnAssign)
if isinstance(node, cst.Assign):
for target in node.targets:
names = extract_names_from_targets(target.target)
for name in names:
if name in definitions and definitions[name].used_by_qualified_function:
return node, True
return None, False
if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
names = extract_names_from_targets(node.target)
for name in names:
if name in definitions and definitions[name].used_by_qualified_function:
return node, True
# Handle assignments (Assign, AnnAssign, AugAssign)
if isinstance(node, (cst.Assign, cst.AnnAssign, cst.AugAssign)):
if is_assignment_used(node, definitions):
return node, True
return None, False
# For other nodes, recursively process children
section_names = get_section_names(node)
if not section_names:
return node, False
updates = {}
found_used = False
for section in section_names:
original_content = getattr(node, section, None)
if isinstance(original_content, (list, tuple)):
new_children = []
section_found_used = False
for child in original_content:
filtered, used = remove_unused_definitions_recursively(child, definitions)
if filtered:
new_children.append(filtered)
section_found_used |= used
if new_children or section_found_used:
found_used |= section_found_used
updates[section] = new_children
elif original_content is not None:
filtered, used = remove_unused_definitions_recursively(original_content, definitions)
found_used |= used
if filtered:
updates[section] = filtered
if not found_used:
return None, False
if updates:
return node.with_changes(**updates), found_used
return node, False
return recurse_sections(
node, section_names, lambda child: remove_unused_definitions_recursively(child, definitions)
)
def collect_top_level_defs_with_usages(

View file

@ -21,9 +21,25 @@ from codeflash.languages.registry import register_language
if TYPE_CHECKING:
from collections.abc import Sequence
from codeflash.models.models import FunctionSource
logger = logging.getLogger(__name__)
def function_sources_to_helpers(sources: list[FunctionSource]) -> list[HelperFunction]:
return [
HelperFunction(
name=fs.only_function_name,
qualified_name=fs.qualified_name,
file_path=fs.file_path,
source_code=fs.source_code,
start_line=fs.jedi_definition.line if fs.jedi_definition else 1,
end_line=fs.jedi_definition.line if fs.jedi_definition else 1,
)
for fs in sources
]
@register_language
class PythonSupport:
"""Python language support implementation.
@ -171,127 +187,39 @@ class PythonSupport:
# === Code Analysis ===
def extract_code_context(self, function: FunctionToOptimize, project_root: Path, module_root: Path) -> CodeContext:
"""Extract function code and its dependencies.
"""Extract function code and its dependencies via the canonical context pipeline."""
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context
Uses jedi and libcst for Python code analysis.
Args:
function: The function to extract context for.
project_root: Root of the project.
module_root: Root of the module containing the function.
Returns:
CodeContext with target code and dependencies.
"""
try:
source = function.file_path.read_text()
result = get_code_optimization_context(function, project_root)
except Exception as e:
logger.exception("Failed to read %s: %s", function.file_path, e)
logger.warning("Failed to extract code context for %s: %s", function.function_name, e)
return CodeContext(target_code="", target_file=function.file_path, language=Language.PYTHON)
# Extract the function source
lines = source.splitlines(keepends=True)
if function.starting_line and function.ending_line:
target_lines = lines[function.starting_line - 1 : function.ending_line]
target_code = "".join(target_lines)
else:
target_code = ""
# Find helper functions
helpers = self.find_helper_functions(function, project_root)
# Extract imports
import_lines = []
for line in lines:
stripped = line.strip()
if stripped.startswith(("import ", "from ")):
import_lines.append(stripped)
elif stripped and not stripped.startswith("#"):
# Stop at first non-import, non-comment line
break
helpers = function_sources_to_helpers(result.helper_functions)
return CodeContext(
target_code=target_code,
target_code=result.read_writable_code.markdown,
target_file=function.file_path,
helper_functions=helpers,
read_only_context="",
imports=import_lines,
read_only_context=result.read_only_context_code,
imports=[],
language=Language.PYTHON,
)
def find_helper_functions(self, function: FunctionToOptimize, project_root: Path) -> list[HelperFunction]:
"""Find helper functions called by the target function.
Uses jedi for Python code analysis.
Args:
function: The target function to analyze.
project_root: Root of the project.
Returns:
List of HelperFunction objects.
"""
helpers: list[HelperFunction] = []
"""Find helper functions called by the target function via the canonical jedi pipeline."""
from codeflash.languages.python.context.code_context_extractor import get_function_sources_from_jedi
try:
import jedi
from codeflash.code_utils.code_utils import get_qualified_name, path_belongs_to_site_packages
from codeflash.optimization.function_context import belongs_to_function_qualified
script = jedi.Script(path=function.file_path, project=jedi.Project(path=project_root))
file_refs = script.get_names(all_scopes=True, definitions=False, references=True)
qualified_name = function.qualified_name
for ref in file_refs:
if not ref.full_name or not belongs_to_function_qualified(ref, qualified_name):
continue
try:
definitions = ref.goto(follow_imports=True, follow_builtin_imports=False)
except Exception:
continue
for definition in definitions:
definition_path = definition.module_path
if definition_path is None:
continue
# Check if it's a valid helper (in project, not in target function)
is_valid = (
str(definition_path).startswith(str(project_root))
and not path_belongs_to_site_packages(definition_path)
and definition.full_name
and not belongs_to_function_qualified(definition, qualified_name)
and definition.type == "function"
)
if is_valid:
helper_qualified_name = get_qualified_name(definition.module_name, definition.full_name)
# Get source code
try:
helper_source = definition.get_line_code()
except Exception:
helper_source = ""
helpers.append(
HelperFunction(
name=definition.name,
qualified_name=helper_qualified_name,
file_path=definition_path,
source_code=helper_source,
start_line=definition.line or 1,
end_line=definition.line or 1,
)
)
_dict, sources = get_function_sources_from_jedi(
{function.file_path: {function.qualified_name}}, project_root
)
except Exception as e:
logger.warning("Failed to find helpers for %s: %s", function.function_name, e)
return []
return helpers
return function_sources_to_helpers(sources)
def find_references(
self, function: FunctionToOptimize, project_root: Path, tests_root: Path | None = None, max_files: int = 500
@ -728,15 +656,6 @@ class PythonSupport:
"""
return ".py"
def get_comment_prefix(self) -> str:
"""Get the comment prefix for Python.
Returns:
Python single-line comment prefix.
"""
return "#"
def find_test_root(self, project_root: Path) -> Path | None:
"""Find the test root directory for a Python project.

View file

@ -72,8 +72,6 @@ from codeflash.code_utils.line_profile_utils import add_decorator_imports, conta
from codeflash.code_utils.shell_utils import make_env_with_project_root
from codeflash.code_utils.static_analysis import get_first_top_level_function_or_method_ast
from codeflash.code_utils.time_utils import humanize_runtime
from codeflash.context import code_context_extractor
from codeflash.context.unused_definition_remover import detect_unused_helper_functions, revert_unused_helper_functions
from codeflash.discovery.functions_to_optimize import was_function_previously_optimized
from codeflash.either import Failure, Success, is_successful
from codeflash.languages import is_python
@ -81,6 +79,11 @@ from codeflash.languages.base import Language
from codeflash.languages.current import current_language_support, is_typescript
from codeflash.languages.javascript.module_system import detect_module_system
from codeflash.languages.javascript.test_runner import clear_created_config_files, get_created_config_files
from codeflash.languages.python.context import code_context_extractor
from codeflash.languages.python.context.unused_definition_remover import (
detect_unused_helper_functions,
revert_unused_helper_functions,
)
from codeflash.lsp.helpers import is_LSP_enabled, report_to_markdown_table, tree_to_markdown
from codeflash.lsp.lsp_message import LspCodeMessage, LspMarkdownMessage, LSPMessageId
from codeflash.models.ExperimentMetadata import ExperimentMetadata

View file

@ -1,6 +1,5 @@
from __future__ import annotations
import contextlib
import os
import re
import sqlite3
@ -22,6 +21,9 @@ from codeflash.code_utils.code_utils import (
)
from codeflash.discovery.discover_unit_tests import discover_parameters_unittest
from codeflash.languages import is_javascript
# Import Jest-specific parsing from the JavaScript language module
from codeflash.languages.javascript.parse import parse_jest_test_xml as _parse_jest_test_xml
from codeflash.models.models import (
ConcurrencyMetrics,
FunctionTestInvocation,
@ -32,10 +34,6 @@ from codeflash.models.models import (
)
from codeflash.verification.coverage_utils import CoverageUtils, JestCoverageUtils
# Import Jest-specific parsing from the JavaScript language module
from codeflash.languages.javascript.parse import jest_end_pattern, jest_start_pattern
from codeflash.languages.javascript.parse import parse_jest_test_xml as _parse_jest_test_xml
if TYPE_CHECKING:
import subprocess

View file

@ -6,8 +6,8 @@ codeflash/result/explanation.py
codeflash/result/critic.py
codeflash/version.py
codeflash/optimization/__init__.py
codeflash/context/__init__.py
codeflash/context/code_context_extractor.py
codeflash/languages/python/context/__init__.py
codeflash/languages/python/context/code_context_extractor.py
codeflash/discovery/__init__.py
codeflash/__init__.py
codeflash/models/ExperimentMetadata.py

View file

@ -113,21 +113,26 @@ function checkSharedTimeLimit() {
/**
* Get the current loop index for a specific invocation.
* The loop index represents how many times ALL test files have been run through.
* This is the batch count from the loop-runner.
* When using external loop-runner (Jest), returns the batch number directly.
* When using internal looping (Vitest), tracks and returns the invocation count.
*
* @param {string} invocationKey - Unique key for this test invocation
* @returns {number} The current batch number (loop index)
* @returns {number} The loop index for timing markers (1-based)
*/
function getInvocationLoopIndex(invocationKey) {
// Track local loop count for stopping logic (increments on each call)
// When using external loop-runner, use the batch number directly
// This is reliable because Jest resets module state between batches
const currentBatch = process.env.CODEFLASH_PERF_CURRENT_BATCH;
if (currentBatch !== undefined) {
return parseInt(currentBatch, 10);
}
// For internal looping (Vitest), track the count locally
if (!sharedPerfState.invocationLoopCounts[invocationKey]) {
sharedPerfState.invocationLoopCounts[invocationKey] = 0;
}
++sharedPerfState.invocationLoopCounts[invocationKey];
// Return the batch number as the loop index for timing markers
// This represents how many times all test files have been run through
return parseInt(process.env.CODEFLASH_PERF_CURRENT_BATCH || '1', 10);
return sharedPerfState.invocationLoopCounts[invocationKey];
}
/**
@ -693,11 +698,9 @@ function capturePerf(funcName, lineId, fn, ...args) {
// If not set, we're in Vitest mode and need to do all loops internally
const hasExternalLoopRunner = process.env.CODEFLASH_PERF_CURRENT_BATCH !== undefined;
// Batched looping: run BATCH_SIZE loops per capturePerf call when using loop-runner
// When using external loop-runner (Jest), execute only once per call - the loop-runner handles batching
// For Vitest (no loop-runner), do all loops internally in a single call
const batchSize = shouldLoop
? (hasExternalLoopRunner ? getPerfBatchSize() : getPerfLoopCount())
: 1;
const batchSize = hasExternalLoopRunner ? 1 : (shouldLoop ? getPerfLoopCount() : 1);
// Initialize runtime tracking for this invocation if needed
if (!sharedPerfState.invocationRuntimes[invocationKey]) {
@ -719,7 +722,7 @@ function capturePerf(funcName, lineId, fn, ...args) {
break;
}
// Get the loop index (batch number) for timing markers
// Get the loop index for timing markers
const loopIndex = getInvocationLoopIndex(invocationKey);
// Check if we've exceeded max loops for this invocation

View file

@ -35,69 +35,113 @@ const path = require('path');
const fs = require('fs');
/**
* Validates that a jest-runner path is valid by checking for package.json.
* @param {string} jestRunnerPath - Path to check
* @returns {boolean} True if valid jest-runner package
* Recursively find jest-runner package in node_modules.
* Works with any package manager (npm, yarn, pnpm) by searching for
* jest-runner/package.json anywhere in the tree.
*
* @param {string} nodeModulesPath - Path to node_modules directory
* @param {number} maxDepth - Maximum recursion depth (default: 5)
* @returns {string|null} Path to jest-runner or null if not found
*/
function isValidJestRunnerPath(jestRunnerPath) {
if (!fs.existsSync(jestRunnerPath)) {
return false;
function findJestRunnerRecursive(nodeModulesPath, maxDepth = 5) {
function search(dir, depth) {
if (depth > maxDepth || !fs.existsSync(dir)) return null;
try {
let entries = fs.readdirSync(dir, { withFileTypes: true });
// Sort entries: prefer higher versions for jest-runner@X.Y.Z directories
entries = entries.slice().sort((a, b) => {
const aMatch = a.name.match(/^jest-runner@(\d+)/);
const bMatch = b.name.match(/^jest-runner@(\d+)/);
if (aMatch && bMatch) {
return parseInt(bMatch[1], 10) - parseInt(aMatch[1], 10);
}
return a.name.localeCompare(b.name);
});
for (const entry of entries) {
if (!entry.isDirectory()) continue;
const entryPath = path.join(dir, entry.name);
// Found jest-runner directory - check if it's a valid package
if (entry.name === 'jest-runner') {
const pkgJsonPath = path.join(entryPath, 'package.json');
if (fs.existsSync(pkgJsonPath)) {
try {
const pkgJson = JSON.parse(fs.readFileSync(pkgJsonPath, 'utf8'));
if (pkgJson.name === 'jest-runner') {
return entryPath;
}
} catch (e) {
// Ignore JSON parse errors
}
}
}
// Recurse into:
// - node_modules subdirectories
// - scoped packages (@org/pkg)
// - hidden directories (.pnpm, .yarn, etc.)
// - pnpm versioned directories (jest-runner@30.0.5)
const shouldRecurse = entry.name === 'node_modules' ||
entry.name.startsWith('@') ||
entry.name === '.pnpm' || entry.name === '.yarn' ||
entry.name.startsWith('jest-runner@');
if (shouldRecurse) {
const result = search(entryPath, depth + 1);
if (result) return result;
}
}
} catch (e) {
// Ignore permission errors
}
return null;
}
const packageJsonPath = path.join(jestRunnerPath, 'package.json');
return fs.existsSync(packageJsonPath);
return search(nodeModulesPath, 0);
}
/**
* Resolve jest-runner with monorepo support.
* Uses CODEFLASH_MONOREPO_ROOT environment variable if available,
* otherwise walks up the directory tree looking for node_modules/jest-runner.
* Resolve jest-runner from the PROJECT's node_modules (not codeflash's).
*
* Uses recursive search to find jest-runner anywhere in node_modules,
* working with any package manager (npm, yarn, pnpm).
*
* @returns {string} Path to jest-runner package
* @throws {Error} If jest-runner cannot be found
*/
function resolveJestRunner() {
// Try standard resolution first (works in simple projects)
try {
return require.resolve('jest-runner');
} catch (e) {
// Standard resolution failed - try monorepo-aware resolution
}
// If Python detected a monorepo root, check there first
const monorepoRoot = process.env.CODEFLASH_MONOREPO_ROOT;
if (monorepoRoot) {
const jestRunnerPath = path.join(monorepoRoot, 'node_modules', 'jest-runner');
if (isValidJestRunnerPath(jestRunnerPath)) {
return jestRunnerPath;
}
}
// Fallback: Walk up from cwd looking for node_modules/jest-runner
const monorepoMarkers = ['yarn.lock', 'pnpm-workspace.yaml', 'lerna.json', 'package-lock.json'];
// Walk up from cwd to find all potential node_modules locations
let currentDir = process.cwd();
const visitedDirs = new Set();
// If Python detected a monorepo root, check there first
const monorepoRoot = process.env.CODEFLASH_MONOREPO_ROOT;
if (monorepoRoot && !visitedDirs.has(monorepoRoot)) {
visitedDirs.add(monorepoRoot);
const result = findJestRunnerRecursive(path.join(monorepoRoot, 'node_modules'));
if (result) return result;
}
while (currentDir !== path.dirname(currentDir)) {
// Avoid infinite loops
if (visitedDirs.has(currentDir)) break;
visitedDirs.add(currentDir);
// Try node_modules/jest-runner at this level
const jestRunnerPath = path.join(currentDir, 'node_modules', 'jest-runner');
if (isValidJestRunnerPath(jestRunnerPath)) {
return jestRunnerPath;
}
const result = findJestRunnerRecursive(path.join(currentDir, 'node_modules'));
if (result) return result;
// Check if this is a workspace root (has monorepo markers)
// Check if this is a workspace root - stop after this
const isWorkspaceRoot = monorepoMarkers.some(marker =>
fs.existsSync(path.join(currentDir, marker))
);
if (isWorkspaceRoot) {
// Found workspace root but no jest-runner - stop searching
break;
}
if (isWorkspaceRoot) break;
currentDir = path.dirname(currentDir);
}
@ -120,10 +164,15 @@ let jestVersion = 0;
try {
const jestRunnerPath = resolveJestRunner();
const internalRequire = createRequire(jestRunnerPath);
// Try to get the TestRunner class (Jest 30+)
const jestRunner = internalRequire(jestRunnerPath);
// Read the package.json to find the actual entry point and version
const pkgJsonPath = path.join(jestRunnerPath, 'package.json');
const pkgJson = JSON.parse(fs.readFileSync(pkgJsonPath, 'utf8'));
// Require using the full path to the entry point
const entryPoint = path.join(jestRunnerPath, pkgJson.main || 'build/index.js');
const jestRunner = require(entryPoint);
TestRunner = jestRunner.default || jestRunner.TestRunner;
if (TestRunner && TestRunner.prototype && typeof TestRunner.prototype.runTests === 'function') {
@ -131,9 +180,11 @@ try {
jestVersion = 30;
jestRunnerAvailable = true;
} else {
// Try Jest 29 style import
// Try Jest 29 style import - runTest is in build/runTest.js
try {
runTest = internalRequire('./runTest').default;
const runTestPath = path.join(jestRunnerPath, 'build', 'runTest.js');
const runTestModule = require(runTestPath);
runTest = runTestModule.default;
if (typeof runTest === 'function') {
// Jest 29 - use direct runTest function
jestVersion = 29;
@ -141,17 +192,23 @@ try {
}
} catch (e29) {
// Neither Jest 29 nor 30 style import worked
const errorMsg = `Found jest-runner at ${jestRunnerPath} but could not load it. ` +
`This may indicate an unsupported Jest version. ` +
`Supported versions: Jest 29.x and Jest 30.x`;
console.error(errorMsg);
jestRunnerAvailable = false;
}
}
} catch (e) {
// jest-runner not installed - this is expected for Vitest projects
// The runner will throw a helpful error if someone tries to use it without jest-runner
jestRunnerAvailable = false;
// try to directly import jest-runner
try {
const jestRunner = require('jest-runner');
TestRunner = jestRunner.default || jestRunner.TestRunner;
if (TestRunner && TestRunner.prototype && typeof TestRunner.prototype.runTests === 'function') {
jestVersion = 30;
jestRunnerAvailable = true;
} else {
jestRunnerAvailable = false;
}
} catch (e2) {
jestRunnerAvailable = false;
}
}
// Configuration
@ -233,15 +290,12 @@ class CodeflashLoopRunner {
this._context = context || {};
this._eventEmitter = new SimpleEventEmitter();
// For Jest 30+, create an instance of the base TestRunner for delegation
if (jestVersion >= 30) {
if (!TestRunner) {
throw new Error(
`Jest ${jestVersion} detected but TestRunner class not available. ` +
`This indicates an internal error in loop-runner initialization.`
);
}
this._baseRunner = new TestRunner(globalConfig, context);
// For Jest 30+, verify TestRunner is available (we create fresh instances per batch)
if (jestVersion >= 30 && !TestRunner) {
throw new Error(
`Jest ${jestVersion} detected but TestRunner class not available. ` +
`This indicates an internal error in loop-runner initialization.`
);
}
}
@ -270,7 +324,7 @@ class CodeflashLoopRunner {
* @param {Object} options - Jest runner options
* @returns {Promise<void>}
*/
async runTests(tests, watcher, options) {
async runTests(tests, watcher, ...rest) {
const startTime = Date.now();
let batchCount = 0;
let hasFailure = false;
@ -289,13 +343,11 @@ class CodeflashLoopRunner {
// Check time limit BEFORE each batch
if (batchCount > MIN_BATCHES && checkTimeLimit()) {
console.log(`[codeflash] Time limit reached after ${batchCount - 1} batches (${Date.now() - startTime}ms elapsed)`);
break;
}
// Check if interrupted
if (watcher.isInterrupted()) {
console.log(`[codeflash] Watcher is interrupted`)
break;
}
@ -303,57 +355,54 @@ class CodeflashLoopRunner {
process.env.CODEFLASH_PERF_CURRENT_BATCH = String(batchCount);
// Run all test files in this batch
const batchResult = await this._runAllTestsOnce(tests, watcher, ...rest);
allConsoleOutput += batchResult.consoleOutput;
// if (batchResult.hasFailure) {
// hasFailure = true;
// break;
// }
// Check time limit AFTER each batch
if (checkTimeLimit()) {
console.log(`[codeflash] Time limit reached after ${batchCount} batches (${Date.now() - startTime}ms elapsed)`);
break;
}
}
const totalTimeMs = Date.now() - startTime;
console.log(`[codeflash] now: ${Date.now()}`);
// Output all collected console logs - this is critical for timing marker extraction
// The console output contains the !######...######! timing markers from capturePerf
if (allConsoleOutput) {
process.stdout.write(allConsoleOutput);
}
console.log(`[codeflash] Batched runner completed: ${batchCount} batches, ${tests.length} test files, ${totalTimeMs}ms total`);
}
/**
* Run all test files once (one batch).
* Uses different approaches for Jest 29 vs Jest 30.
*/
async _runAllTestsOnce(tests, watcher, ...args) {
if (jestVersion >= 30) {
return this._runAllTestsOnceJest30(tests, watcher, ...args);
} else {
return this._runAllTestsOnceJest29(tests, watcher);
}
}
/**
* Jest 30+ implementation - creates a fresh TestRunner for each batch to avoid
* state corruption issues that occur when reusing runners across batches.
*/
async _runAllTestsOnceJest30(tests, watcher, ...args) {
let hasFailure = false;
let allConsoleOutput = '';
// For Jest 30, we need to collect results through event listeners
const resultsCollector = [];
// Create a FRESH TestRunner instance for each batch
// Jest 30's TestRunner corrupts its internal state after running tests,
// so we cannot reuse the same instance across multiple batches
const batchRunner = new TestRunner(this._globalConfig, this._context);
// Subscribe to events from the batch runner
const unsubscribeSuccess = batchRunner.on('test-file-success', (testData) => {
const [test, result] = testData;
resultsCollector.push({ test, result, success: true });
@ -369,7 +418,7 @@ class CodeflashLoopRunner {
this._eventEmitter.emit('test-file-success', testData);
});
const unsubscribeFailure = batchRunner.on('test-file-failure', (testData) => {
const [test, error] = testData;
resultsCollector.push({ test, error, success: false });
hasFailure = true;
@ -378,14 +427,14 @@ class CodeflashLoopRunner {
this._eventEmitter.emit('test-file-failure', testData);
});
const unsubscribeStart = batchRunner.on('test-file-start', (testData) => {
// Forward to our event emitter
this._eventEmitter.emit('test-file-start', testData);
});
try {
// Run tests using the fresh batch runner (always serial for benchmarking)
await batchRunner.runTests(tests, watcher, ...args);
} finally {
// Cleanup subscriptions
if (typeof unsubscribeSuccess === 'function') unsubscribeSuccess();

@ -1,357 +1,358 @@
[project]
name = "codeflash"
dynamic = ["version"]
description = "Client for codeflash.ai - automatic code performance optimization, powered by AI"
authors = [{ name = "CodeFlash Inc.", email = "contact@codeflash.ai" }]
requires-python = ">=3.9"
readme = "README.md"
license-files = ["LICENSE"]
keywords = [
"codeflash",
"performance",
"optimization",
"ai",
"code",
"machine learning",
"LLM",
]
dependencies = [
"unidiff>=0.7.4",
"pytest>=7.0.0",
"gitpython>=3.1.31",
"libcst>=1.0.1",
"jedi>=0.19.1",
# Tree-sitter for multi-language support
"tree-sitter>=0.23.0",
"tree-sitter-javascript>=0.23.0",
"tree-sitter-typescript>=0.23.0",
"pytest-timeout>=2.1.0",
"tomlkit>=0.11.7",
"junitparser>=3.1.0",
"pydantic>=1.10.1",
"humanize>=4.0.0",
"posthog>=3.0.0",
"click>=8.1.0",
"inquirer>=3.0.0",
"sentry-sdk>=1.40.6,<3.0.0",
"parameterized>=0.9.0",
"isort>=5.11.0",
"dill>=0.3.8",
"rich>=13.8.1",
"lxml>=5.3.0",
"crosshair-tool>=0.0.78",
"coverage>=7.6.4",
"line_profiler>=4.2.0",
"platformdirs>=4.3.7",
"pygls>=2.0.0,<3.0.0",
"codeflash-benchmark",
"filelock",
"pytest-asyncio>=0.18.0",
]
[project.urls]
Homepage = "https://codeflash.ai"
[project.scripts]
codeflash = "codeflash.main:main"
[project.optional-dependencies]
[dependency-groups]
dev = [
"ipython>=8.12.0",
"mypy>=1.13",
"ruff>=0.7.0",
"lxml-stubs>=0.5.1",
"pandas-stubs>=2.2.2.240807, <2.2.3.241009",
"types-Pygments>=2.18.0.20240506",
"types-colorama>=0.4.15.20240311",
"types-decorator>=5.1.8.20240310",
"types-jsonschema>=4.23.0.20240813",
"types-requests>=2.32.0.20241016",
"types-six>=1.16.21.20241009",
"types-cffi>=1.16.0.20240331",
"types-openpyxl>=3.1.5.20241020",
"types-regex>=2024.9.11.20240912",
"types-python-dateutil>=2.9.0.20241003",
"types-gevent>=24.11.0.20241230,<25",
"types-greenlet>=3.1.0.20241221,<4",
"types-pexpect>=4.9.0.20241208,<5",
"types-unidiff>=0.7.0.20240505,<0.8",
"prek>=0.2.25",
"ty>=0.0.14",
"uv>=0.9.29",
]
tests = [
"black>=25.9.0",
"jax>=0.4.30",
"numpy>=2.0.2",
"pandas>=2.3.3",
"pyarrow>=15.0.0",
"pyrsistent>=0.20.0",
"scipy>=1.13.1",
"torch>=2.8.0",
"xarray>=2024.7.0",
"eval_type_backport",
"numba>=0.60.0",
"tensorflow>=2.20.0",
]
[tool.hatch.build.targets.sdist]
include = ["codeflash"]
exclude = [
"docs/*",
"experiments/*",
"tests/*",
"*.pyc",
"__pycache__",
"*.pyo",
"*.pyd",
"*.so",
"*.dylib",
"*.dll",
"*.exe",
"*.log",
"*.tmp",
".env",
".env.*",
"**/.env",
"**/.env.*",
".env.example",
"*.pem",
"*.key",
"secrets.*",
"config.yaml",
"config.json",
".git",
".gitignore",
".gitattributes",
".github",
"Dockerfile",
"docker-compose.yml",
"*.md",
"*.txt",
"*.csv",
"*.db",
"*.sqlite3",
"*.pdf",
"*.docx",
"*.xlsx",
"*.pptx",
"*.iml",
".idea",
".vscode",
".DS_Store",
"Thumbs.db",
"venv",
"env",
]
[tool.hatch.build.targets.wheel]
exclude = [
"docs/*",
"experiments/*",
"tests/*",
"*.pyc",
"__pycache__",
"*.pyo",
"*.pyd",
"*.so",
"*.dylib",
"*.dll",
"*.exe",
"*.log",
"*.tmp",
".env",
".env.*",
"**/.env",
"**/.env.*",
".env.example",
"*.pem",
"*.key",
"secrets.*",
"config.yaml",
"config.json",
".git",
".gitignore",
".gitattributes",
".github",
"Dockerfile",
"docker-compose.yml",
"*.md",
"*.txt",
"*.csv",
"*.db",
"*.sqlite3",
"*.pdf",
"*.docx",
"*.xlsx",
"*.pptx",
"*.iml",
".idea",
".vscode",
".DS_Store",
"Thumbs.db",
"venv",
"env",
]
[tool.mypy]
show_error_code_links = true
pretty = true
show_absolute_path = true
show_error_context = true
show_error_end = true
strict = true
warn_unreachable = true
install_types = true
plugins = ["pydantic.mypy"]
[[tool.mypy.overrides]]
module = ["jedi", "jedi.api.classes", "inquirer", "inquirer.themes", "numba"]
ignore_missing_imports = true
[tool.pydantic-mypy]
init_forbid_extra = true
init_typed = true
warn_required_dynamic_aliases = true
[tool.ruff]
target-version = "py39"
line-length = 120
fix = true
show-fixes = true
extend-exclude = ["code_to_optimize/", "pie_test_set/", "tests/", "experiments/"]
[tool.ruff.lint]
select = ["ALL"]
ignore = [
"N802",
"C901",
"D100",
"D101",
"D102",
"D103",
"D105",
"D107",
"D203", # incorrect-blank-line-before-class (incompatible with D211)
"D213", # multi-line-summary-second-line (incompatible with D212)
"S101",
"S603",
"S607",
"COM812",
"FIX002",
"PLR0912",
"PLR0913",
"PLR0915",
"TD002",
"TD003",
"TD004",
"PLR2004",
"UP007", # remove once we drop 3.9 support.
"E501",
"BLE001",
"ERA001",
"TRY003",
"EM101",
"T201",
"PGH004",
"S301",
"D104",
"PERF203",
"LOG015",
"PLC0415",
"UP045",
"TD007",
"D417",
"D401",
"S110", # try-except-pass - we do this a lot
"ARG002", # Unused method argument
# Added for multi-language branch
"FBT001", # Boolean positional argument
"FBT002", # Boolean default positional argument
"ANN401", # typing.Any disallowed
"ARG001", # Unused function argument (common in abstract/interface methods)
"TRY300", # Consider moving to else block
"FURB110", # if-exp-instead-of-or-operator - we prefer explicit if-else over "or"
"TRY401", # Redundant exception in logging.exception
"PLR0911", # Too many return statements
"PLW0603", # Global statement
"PLW2901", # Loop variable overwritten
"SIM102", # Nested if statements
"SIM103", # Return negated condition
"ANN001", # Missing type annotation
"PLC0206", # Dictionary items
"S314", # XML parsing (acceptable for dev tool)
"S608", # SQL injection (internal use only)
"S112", # try-except-continue
"PERF401", # List comprehension suggestion
"SIM108", # Ternary operator suggestion
"F841", # Unused variable (often intentional)
"ANN202", # Missing return type for private functions
"B009", # getattr-with-constant - needed to avoid mypy [misc] on dunder access
]
[tool.ruff.lint.flake8-type-checking]
strict = true
runtime-evaluated-base-classes = ["pydantic.BaseModel"]
runtime-evaluated-decorators = ["pydantic.validate_call", "pydantic.dataclasses.dataclass"]
[tool.ruff.lint.pep8-naming]
classmethod-decorators = [
# Allow Pydantic's `@validator` decorator to trigger class method treatment.
"pydantic.validator",
]
[tool.ruff.lint.isort]
split-on-trailing-comma = false
[tool.ruff.format]
docstring-code-format = true
skip-magic-trailing-comma = true
[tool.hatch.version]
source = "uv-dynamic-versioning"
[tool.uv]
workspace = { members = ["codeflash-benchmark"] }
[tool.uv.sources]
codeflash-benchmark = { workspace = true }
[tool.uv-dynamic-versioning]
enable = true
style = "pep440"
vcs = "git"
[tool.hatch.build.hooks.version]
path = "codeflash/version.py"
template = """# These version placeholders will be replaced by uv-dynamic-versioning during build.
__version__ = "{version}"
"""
#[tool.hatch.build.hooks.custom]
#path = "codeflash/update_license_version.py"
[tool.codeflash]
# All paths are relative to this pyproject.toml's directory.
module-root = "codeflash"
tests-root = "codeflash"
benchmarks-root = "tests/benchmarks"
ignore-paths = []
formatter-cmds = ["disabled"]
[tool.pytest.ini_options]
filterwarnings = [
"ignore::pytest.PytestCollectionWarning",
]
markers = [
"ci_skip: mark test to skip in CI environment",
]
[build-system]
requires = ["hatchling", "uv-dynamic-versioning"]
build-backend = "hatchling.build"

tessl.json Normal file
@ -0,0 +1,80 @@
{
"name": "codeflash",
"dependencies": {
"tessl/pypi-pytest": {
"version": "8.4.0"
},
"tessl/pypi-gitpython": {
"version": "3.1.0"
},
"tessl/pypi-libcst": {
"version": "1.8.0"
},
"tessl/pypi-jedi": {
"version": "0.19.0"
},
"tessl/pypi-tree-sitter": {
"version": "0.25.0"
},
"tessl/pypi-tomlkit": {
"version": "0.13.0"
},
"tessl/pypi-pydantic": {
"version": "1.10.0"
},
"tessl/pypi-humanize": {
"version": "4.13.0"
},
"tessl/pypi-posthog": {
"version": "6.7.0"
},
"tessl/pypi-click": {
"version": "8.2.0"
},
"tessl/pypi-inquirer": {
"version": "3.4.0"
},
"tessl/pypi-sentry-sdk": {
"version": "1.45.0"
},
"tessl/pypi-parameterized": {
"version": "0.9.0"
},
"tessl/pypi-dill": {
"version": "0.4.0"
},
"tessl/pypi-rich": {
"version": "13.9.0"
},
"tessl/pypi-lxml": {
"version": "5.4.0"
},
"tessl/pypi-crosshair-tool": {
"version": "0.0.0"
},
"tessl/pypi-coverage": {
"version": "7.10.0"
},
"tessl/pypi-platformdirs": {
"version": "4.4.0"
},
"tessl/pypi-pygls": {
"version": "1.3.0"
},
"tessl/pypi-filelock": {
"version": "3.19.0"
},
"codeflash/codeflash-rules": {
"version": "0.1.0"
},
"codeflash/codeflash-docs": {
"version": "0.1.0"
},
"codeflash/codeflash-skills": {
"version": "0.2.0"
},
"tessl-labs/tessl-skill-eval-scenarios": {
"version": "0.0.5"
}
}
}

@ -1,8 +1,8 @@
from argparse import Namespace
from pathlib import Path
from codeflash.discovery.functions_to_optimize import FunctionToOptimize
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context
from codeflash.models.models import FunctionParent
from codeflash.optimization.optimizer import Optimizer

@ -12,7 +12,7 @@ from pathlib import Path
import pytest
from junitparser import JUnitXml
from codeflash.languages.javascript.parse import jest_end_pattern, jest_start_pattern
class TestVitestJunitXmlFormat:

File diff suppressed because it is too large
@ -2,7 +2,7 @@ from textwrap import dedent
import pytest
from codeflash.languages.python.context.code_context_extractor import parse_code_and_prune_cst
from codeflash.models.models import CodeContextType

@ -2,7 +2,7 @@ from textwrap import dedent
import pytest
from codeflash.languages.python.context.code_context_extractor import parse_code_and_prune_cst
from codeflash.models.models import CodeContextType

@ -2,7 +2,7 @@ from textwrap import dedent
import pytest
from codeflash.languages.python.context.code_context_extractor import parse_code_and_prune_cst
from codeflash.models.models import CodeContextType

@ -20,14 +20,12 @@ All assertions use strict string equality to verify exact extraction output.
from __future__ import annotations
from pathlib import Path
import pytest
from codeflash.discovery.functions_to_optimize import FunctionToOptimize
from codeflash.languages.base import Language
from codeflash.languages.javascript.support import JavaScriptSupport, TypeScriptSupport
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context_for_language
@pytest.fixture

@ -106,9 +106,9 @@ class TestJavaScriptCodeContext:
def test_extract_code_context_for_javascript(self, js_project_dir):
"""Test extracting code context for a JavaScript function."""
skip_if_js_not_supported()
from codeflash.discovery.functions_to_optimize import find_all_functions_in_file
from codeflash.languages import current as lang_current
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context
lang_current._current_language = Language.JAVASCRIPT

@ -9,7 +9,6 @@ These tests verify the full optimization pipeline including:
This is the JavaScript equivalent of test_instrument_tests.py for Python.
"""
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
@ -71,9 +70,9 @@ module.exports = { add };
def test_code_context_preserves_language(self, tmp_path):
"""Verify language is preserved in code context extraction."""
skip_if_js_not_supported()
from codeflash.discovery.functions_to_optimize import find_all_functions_in_file
from codeflash.languages import current as lang_current
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context
lang_current._current_language = Language.TYPESCRIPT
@ -164,7 +163,7 @@ export function add(a: number, b: number): number {
# Mock the AI service request
ai_client = AiServiceClient()
with patch.object(ai_client, "make_ai_service_request") as mock_request:
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
@ -191,8 +190,8 @@ export function add(a: number, b: number): number {
# Verify the request was made with correct language
assert mock_request.called, "API request should have been made"
call_args = mock_request.call_args
payload = call_args[1].get("payload", call_args[0][1] if len(call_args[0]) > 1 else {})
assert payload.get("language") == "typescript", \
f"Expected language='typescript', got language='{payload.get('language')}'"
@ -462,7 +461,7 @@ class TestHelperFunctionLanguageAttribute:
"""Verify helper functions have language='javascript' for .js files."""
skip_if_js_not_supported()
from codeflash.discovery.functions_to_optimize import find_all_functions_in_file
from codeflash.languages import current as lang_current
from codeflash.optimization.function_optimizer import FunctionOptimizer
lang_current._current_language = Language.JAVASCRIPT

@ -69,7 +69,7 @@ class TestTypeScriptFunctionDiscovery:
from codeflash.discovery.functions_to_optimize import find_all_functions_in_file
with tempfile.NamedTemporaryFile(suffix=".ts", mode="w", delete=False) as f:
f.write("""
f.write(r"""
export function add(a: number, b: number): number {
return a + b;
}
@ -123,9 +123,9 @@ class TestTypeScriptCodeContext:
def test_extract_code_context_for_typescript(self, ts_project_dir):
"""Test extracting code context for a TypeScript function."""
skip_if_ts_not_supported()
from codeflash.discovery.functions_to_optimize import find_all_functions_in_file
from codeflash.languages import current as lang_current
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context
lang_current._current_language = Language.TYPESCRIPT
@ -201,7 +201,7 @@ function multiply(a: number, b: number): number {
from codeflash.languages import get_language_support
from codeflash.languages.base import FunctionInfo
original_source = """
original_source = r"""
interface Config {
timeout: number;
retries: number;
@ -212,7 +212,7 @@ function processConfig(config: Config): string {
}
"""
new_function = """function processConfig(config: Config): string {
new_function = r"""function processConfig(config: Config): string {
// Optimized with template caching
const { timeout, retries } = config;
return `timeout=\${timeout}, retries=\${retries}`;

@ -117,10 +117,10 @@ class TestVitestCodeContext:
def test_extract_code_context_for_typescript(self, vitest_project_dir):
"""Test extracting code context for a TypeScript function."""
skip_if_js_not_supported()
from codeflash.discovery.functions_to_optimize import find_all_functions_in_file
from codeflash.languages import current as lang_current
from codeflash.languages.base import Language
from codeflash.languages.python.context.code_context_extractor import get_code_optimization_context
lang_current._current_language = Language.TYPESCRIPT

@ -1,6 +1,6 @@
from codeflash.languages.python.context.unused_definition_remover import remove_unused_definitions_by_function_names
def test_variable_removal_only() -> None:

@ -5,8 +5,11 @@ from pathlib import Path
import pytest
from codeflash.discovery.functions_to_optimize import FunctionToOptimize
from codeflash.languages.python.context.unused_definition_remover import (
detect_unused_helper_functions,
revert_unused_helper_functions,
)
from codeflash.models.models import CodeStringsMarkdown
from codeflash.optimization.function_optimizer import FunctionOptimizer
from codeflash.verification.verification_utils import TestConfig

@ -0,0 +1,108 @@
# AI Service
How codeflash communicates with the AI optimization backend.
## `AiServiceClient` (`api/aiservice.py`)
The client connects to the AI service at `https://app.codeflash.ai` (or `http://localhost:8000` when `CODEFLASH_AIS_SERVER=local`).
Authentication uses Bearer token from `get_codeflash_api_key()`. All requests go through `make_ai_service_request()` which handles JSON serialization via Pydantic encoder.
Timeout: 90s for production, 300s for local.
## Endpoints
### `/ai/optimize` — Generate Candidates
Method: `optimize_code()`
Sends source code + dependency context to generate optimization candidates.
Payload:
- `source_code` — The read-writable code (markdown format)
- `dependency_code` — Read-only context code
- `trace_id` — Unique trace ID for the optimization run
- `language``"python"`, `"javascript"`, or `"typescript"`
- `n_candidates` — Number of candidates to generate (controlled by effort level)
- `is_async` — Whether the function is async
- `is_numerical_code` — Whether the code is numerical (affects optimization strategy)
Returns: `list[OptimizedCandidate]` with `source=OptimizedCandidateSource.OPTIMIZE`
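A minimal sketch of the payload shape, using the fields listed above (values are illustrative; `optimize_code()` assembles the real payload internally):

```python
# Hypothetical /ai/optimize payload; field names follow the list above.
payload = {
    "source_code": "```python:src/algo.py\ndef slow(n): ...\n```",  # markdown-formatted
    "dependency_code": "# read-only context code",
    "trace_id": "example-trace-id",  # unique per optimization run
    "language": "python",            # or "javascript" / "typescript"
    "n_candidates": 5,               # MEDIUM effort default (see Configuration)
    "is_async": False,
    "is_numerical_code": False,
}
```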
### `/ai/optimize_line_profiler` — Line-Profiler-Guided Candidates
Method: `optimize_python_code_line_profiler()`
Like `/optimize` but includes `line_profiler_results` to guide the LLM toward hot lines.
Returns: candidates with `source=OptimizedCandidateSource.OPTIMIZE_LP`
### `/ai/refine` — Refine Existing Candidate
Method: `refine_code()`
Request type: `AIServiceRefinerRequest`
Sends an existing candidate with runtime data and line profiler results to generate an improved version.
Key fields:
- `original_source_code` / `optimized_source_code` — Before and after
- `original_code_runtime` / `optimized_code_runtime` — Timing data
- `speedup` — Current speedup ratio
- `original_line_profiler_results` / `optimized_line_profiler_results`
Returns: candidates with `source=OptimizedCandidateSource.REFINE` and `parent_id` set to the refined candidate's ID
### `/ai/repair` — Fix Failed Candidate
Method: `repair_code()`
Request type: `AIServiceCodeRepairRequest`
Sends a failed candidate with test diffs showing what went wrong.
Key fields:
- `original_source_code` / `modified_source_code`
- `test_diffs: list[TestDiff]` — Each with `scope` (return_value/stdout/did_pass), original vs candidate values, and test source code
Returns: candidates with `source=OptimizedCandidateSource.REPAIR` and `parent_id` set
### `/ai/adaptive_optimize` — Multi-Candidate Adaptive
Method: `adaptive_optimize()`
Request type: `AIServiceAdaptiveOptimizeRequest`
Sends multiple previous candidates with their speedups for the LLM to learn from and generate better candidates.
Key fields:
- `candidates: list[AdaptiveOptimizedCandidate]` — Previous candidates with source code, explanation, source type, and speedup
Returns: candidates with `source=OptimizedCandidateSource.ADAPTIVE`
### `/ai/rewrite_jit` — JIT Rewrite
Method: `get_jit_rewritten_code()`
Rewrites code to use JIT compilation (e.g., Numba).
Returns: candidates with `source=OptimizedCandidateSource.JIT_REWRITE`
## Candidate Parsing
All endpoints return JSON with an `optimizations` array. Each entry has:
- `source_code` — Markdown-formatted code blocks
- `explanation` — LLM explanation
- `optimization_id` — Unique ID
- `parent_id` — Optional parent reference
- `model` — Which LLM model was used
`_get_valid_candidates()` parses the markdown code via `CodeStringsMarkdown.parse_markdown_code()` and filters out entries with empty code blocks.
## `LocalAiServiceClient`
Used when `CODEFLASH_EXPERIMENT_ID` is set. Mirrors `AiServiceClient` but sends to a separate experimental endpoint for A/B testing optimization strategies.
## LLM Call Sequencing
`AiServiceClient` tracks call sequence via `llm_call_counter` (itertools.count). Each request includes a `call_sequence` number, used by the backend to maintain conversation context across multiple calls for the same function.
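A sketch of that pattern (the starting value and exact attachment point are assumptions):

```python
import itertools

# Each client keeps one counter; every outgoing request records its position.
llm_call_counter = itertools.count(1)

def next_call_sequence() -> int:
    return next(llm_call_counter)  # 1, 2, 3, ... across calls for one function
```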

@ -0,0 +1,79 @@
# Configuration
Key configuration constants, effort levels, and thresholds.
## Constants (`code_utils/config_consts.py`)
### Test Execution
| Constant | Value | Description |
|----------|-------|-------------|
| `MAX_TEST_RUN_ITERATIONS` | 5 | Maximum test loop iterations |
| `INDIVIDUAL_TESTCASE_TIMEOUT` | 15s | Timeout per individual test case |
| `MAX_FUNCTION_TEST_SECONDS` | 60s | Max total time for function testing |
| `MAX_TEST_FUNCTION_RUNS` | 50 | Max test function executions |
| `MAX_CUMULATIVE_TEST_RUNTIME_NANOSECONDS` | 100ms | Max cumulative test runtime |
| `TOTAL_LOOPING_TIME` | 10s | Candidate benchmarking budget |
| `MIN_TESTCASE_PASSED_THRESHOLD` | 6 | Minimum test cases that must pass |
### Performance Thresholds
| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_IMPROVEMENT_THRESHOLD` | 0.05 (5%) | Minimum speedup to accept a candidate |
| `MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD` | 0.10 (10%) | Minimum async throughput improvement |
| `MIN_CONCURRENCY_IMPROVEMENT_THRESHOLD` | 0.20 (20%) | Minimum concurrency ratio improvement |
| `COVERAGE_THRESHOLD` | 60.0% | Minimum test coverage |
### Stability Thresholds
| Constant | Value | Description |
|----------|-------|-------------|
| `STABILITY_WINDOW_SIZE` | 0.35 | 35% of total iteration window |
| `STABILITY_CENTER_TOLERANCE` | 0.0025 | ±0.25% around median |
| `STABILITY_SPREAD_TOLERANCE` | 0.0025 | 0.25% window spread |
### Context Limits
| Constant | Value | Description |
|----------|-------|-------------|
| `OPTIMIZATION_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for optimization context |
| `TESTGEN_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for test generation context |
| `MAX_CONTEXT_LEN_REVIEW` | 1000 | Max context length for optimization review |
### Other
| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_CORRECT_CANDIDATES` | 2 | Min correct candidates before skipping repair |
| `REPEAT_OPTIMIZATION_PROBABILITY` | 0.1 | Probability of re-optimizing a function |
| `DEFAULT_IMPORTANCE_THRESHOLD` | 0.001 | Minimum addressable time to consider a function |
| `CONCURRENCY_FACTOR` | 10 | Number of concurrent executions for concurrency benchmark |
| `REFINED_CANDIDATE_RANKING_WEIGHTS` | (2, 1) | (runtime, diff) weights — runtime 2x more important |
## Effort Levels
`EffortLevel` enum: `LOW`, `MEDIUM`, `HIGH`
Effort controls the number of candidates, repairs, and refinements:
| Key | LOW | MEDIUM | HIGH |
|-----|-----|--------|------|
| `N_OPTIMIZER_CANDIDATES` | 3 | 5 | 6 |
| `N_OPTIMIZER_LP_CANDIDATES` | 4 | 6 | 7 |
| `N_GENERATED_TESTS` | 2 | 2 | 2 |
| `MAX_CODE_REPAIRS_PER_TRACE` | 2 | 3 | 5 |
| `REPAIR_UNMATCHED_PERCENTAGE_LIMIT` | 0.2 | 0.3 | 0.4 |
| `TOP_VALID_CANDIDATES_FOR_REFINEMENT` | 2 | 3 | 4 |
| `ADAPTIVE_OPTIMIZATION_THRESHOLD` | 0 | 0 | 2 |
| `MAX_ADAPTIVE_OPTIMIZATIONS_PER_TRACE` | 0 | 0 | 4 |
Use `get_effort_value(EffortKeys.KEY, effort_level)` to retrieve values.
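For example (import paths are assumptions; the values come from the table above):

```python
from codeflash.code_utils.config_consts import EffortKeys, EffortLevel, get_effort_value

n_candidates = get_effort_value(EffortKeys.N_OPTIMIZER_CANDIDATES, EffortLevel.HIGH)
assert n_candidates == 6  # HIGH effort generates 6 optimizer candidates
```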
## Project Configuration
Configuration is read from `pyproject.toml` under `[tool.codeflash]`. Key settings are auto-detected by `setup/detector.py`:
- `module-root` — Root of the module to optimize
- `tests-root` — Root of test files
- `test-framework` — pytest, unittest, jest, etc.
- `formatter-cmds` — Code formatting commands

@ -0,0 +1,60 @@
# Context Extraction
How codeflash extracts and limits code context for optimization and test generation.
## Overview
Context extraction (`context/code_context_extractor.py`) builds a `CodeOptimizationContext` containing all code needed for the LLM to understand and optimize a function, split into:
- **Read-writable code** (`CodeContextType.READ_WRITABLE`): The function being optimized plus its helper functions — code the LLM is allowed to modify
- **Read-only context** (`CodeContextType.READ_ONLY`): Dependency code for reference — imports, type definitions, base classes
- **Testgen context** (`CodeContextType.TESTGEN`): Context for test generation, may include imported class definitions and external base class inits
- **Hashing context** (`CodeContextType.HASHING`): Used for deduplication of optimization runs
## Token Limits
Both optimization and test generation contexts are token-limited:
- `OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000` tokens
- `TESTGEN_CONTEXT_TOKEN_LIMIT = 16000` tokens
Token counting uses `encoded_tokens_len()` from `code_utils/code_utils.py`. Functions whose context exceeds these limits are skipped.
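A sketch of the gating check, assuming direct access to the helper and constant named above:

```python
from codeflash.code_utils.code_utils import encoded_tokens_len
from codeflash.code_utils.config_consts import OPTIMIZATION_CONTEXT_TOKEN_LIMIT

def context_fits(context_code: str) -> bool:
    # Functions whose extracted context exceeds the limit are skipped.
    return encoded_tokens_len(context_code) <= OPTIMIZATION_CONTEXT_TOKEN_LIMIT
```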
## Context Building Process
### 1. Helper Discovery
For the target function (`FunctionToOptimize`), the extractor finds:
- **Helpers of the function**: Functions/classes in the same file that the target function calls
- **Helpers of helpers**: Transitive dependencies of the helper functions
These are organized as `dict[Path, set[FunctionSource]]` — mapping file paths to the set of helper functions found in each file.
### 2. Code Extraction
`extract_code_markdown_context_from_files()` builds `CodeStringsMarkdown` from the helper dictionaries. Each file's relevant code is extracted as a `CodeString` with its file path.
### 3. Testgen Context Enrichment
`build_testgen_context()` extends the basic context with:
- Imported class definitions (resolved from imports)
- External base class `__init__` methods
- External class `__init__` methods referenced in the context
### 4. Unused Definition Removal
`detect_unused_helper_functions()` and `remove_unused_definitions_by_function_names()` from `context/unused_definition_remover.py` prune definitions that are not transitively reachable from the target function, reducing token usage.
### 5. Deduplication
The hashing context (`hashing_code_context`) generates a hash (`hashing_code_context_hash`) used to detect when the same function context has already been optimized in a previous run, avoiding redundant work.
## Key Functions
| Function | Location | Purpose |
|----------|----------|---------|
| `build_testgen_context()` | `context/code_context_extractor.py` | Build enriched testgen context |
| `extract_code_markdown_context_from_files()` | `context/code_context_extractor.py` | Convert helper dicts to `CodeStringsMarkdown` |
| `detect_unused_helper_functions()` | `context/unused_definition_remover.py` | Find unused definitions |
| `remove_unused_definitions_by_function_names()` | `context/unused_definition_remover.py` | Remove unused definitions |
| `collect_top_level_defs_with_usages()` | `context/unused_definition_remover.py` | Analyze definition usage |
| `encoded_tokens_len()` | `code_utils/code_utils.py` | Count tokens in code |

@ -0,0 +1,153 @@
# Domain Types
Core data types used throughout the codeflash optimization pipeline.
## Function Representation
### `FunctionToOptimize` (`models/function_types.py`)
The canonical dataclass representing a function candidate for optimization. Works across Python, JavaScript, and TypeScript.
Key fields:
- `function_name: str` — The function name
- `file_path: Path` — Absolute file path where the function is located
- `parents: list[FunctionParent]` — Parent scopes (classes/functions), each with `name` and `type`
- `starting_line / ending_line: Optional[int]` — Line range (1-indexed)
- `is_async: bool` — Whether the function is async
- `is_method: bool` — Whether it belongs to a class
- `language: str` — Programming language (default: `"python"`)
Key properties:
- `qualified_name` — Full dotted name including parent classes (e.g., `MyClass.my_method`)
- `top_level_parent_name` — Name of outermost parent, or function name if no parents
- `class_name` — Immediate parent class name, or `None`
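A sketch tying the fields to the properties above (constructor usage is an assumption; names follow the lists above):

```python
from pathlib import Path

from codeflash.models.function_types import FunctionParent, FunctionToOptimize

fto = FunctionToOptimize(
    function_name="my_method",
    file_path=Path("/repo/src/my_class.py"),
    parents=[FunctionParent(name="MyClass", type="ClassDef")],
)
# fto.qualified_name        -> "MyClass.my_method"
# fto.top_level_parent_name -> "MyClass"
# fto.class_name            -> "MyClass"
```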
### `FunctionParent` (`models/function_types.py`)
Represents a parent scope: `name: str` (e.g., `"MyClass"`) and `type: str` (e.g., `"ClassDef"`).
### `FunctionSource` (`models/models.py`)
Represents a resolved function with source code. Used for helper functions in context extraction.
Fields: `file_path`, `qualified_name`, `fully_qualified_name`, `only_function_name`, `source_code`, `jedi_definition`.
## Code Representation
### `CodeString` (`models/models.py`)
A single code block with validated syntax:
- `code: str` — The source code
- `file_path: Optional[Path]` — Origin file path
- `language: str` — Language for validation (default: `"python"`)
Validates syntax on construction via `model_validator`.
### `CodeStringsMarkdown` (`models/models.py`)
A collection of `CodeString` blocks — the primary format for passing code through the pipeline.
Key properties:
- `.flat` — Combined source code with file-path comment prefixes (e.g., `# file: path/to/file.py`)
- `.markdown` — Markdown-formatted with fenced code blocks: `` ```python:filepath\ncode\n``` ``
- `.file_to_path()` — Dict mapping file path strings to code
Static method:
- `parse_markdown_code(markdown_code, expected_language)` — Parses markdown code blocks back into `CodeStringsMarkdown`
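A sketch of the round trip (file path and code are illustrative; signature per the description above):

```python
from codeflash.models.models import CodeStringsMarkdown

markdown = "```python:src/utils.py\ndef double(x):\n    return x * 2\n```"
blocks = CodeStringsMarkdown.parse_markdown_code(markdown, expected_language="python")
# blocks.markdown reproduces the fenced form; blocks.flat prefixes file-path comments.
```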
## Optimization Context
### `CodeOptimizationContext` (`models/models.py`)
Holds all code context needed for optimization:
- `read_writable_code: CodeStringsMarkdown` — Code the LLM can modify
- `read_only_context_code: str` — Reference-only dependency code
- `testgen_context: CodeStringsMarkdown` — Context for test generation
- `hashing_code_context: str` / `hashing_code_context_hash: str` — For deduplication
- `helper_functions: list[FunctionSource]` — Helper functions in the writable code
- `preexisting_objects: set[tuple[str, tuple[FunctionParent, ...]]]` — Objects that already exist in the code
### `CodeContextType` enum (`models/models.py`)
Defines context categories: `READ_WRITABLE`, `READ_ONLY`, `TESTGEN`, `HASHING`.
## Candidates
### `OptimizedCandidate` (`models/models.py`)
A generated code variant:
- `source_code: CodeStringsMarkdown` — The optimized code
- `explanation: str` — LLM explanation of the optimization
- `optimization_id: str` — Unique identifier
- `source: OptimizedCandidateSource` — How it was generated
- `parent_id: str | None` — ID of parent candidate (for refinements/repairs)
- `model: str | None` — Which LLM model generated it
### `OptimizedCandidateSource` enum (`models/models.py`)
How a candidate was generated: `OPTIMIZE`, `OPTIMIZE_LP` (line profiler), `REFINE`, `REPAIR`, `ADAPTIVE`, `JIT_REWRITE`.
### `CandidateEvaluationContext` (`models/models.py`)
Tracks state during candidate evaluation:
- `speedup_ratios` / `optimized_runtimes` / `is_correct` — Per-candidate results
- `ast_code_to_id` — Deduplication map (normalized AST → first seen candidate)
- `valid_optimizations` — Candidates that passed all checks
Key methods: `record_failed_candidate()`, `record_successful_candidate()`, `handle_duplicate_candidate()`, `register_new_candidate()`.
## Baseline & Results
### `OriginalCodeBaseline` (`models/models.py`)
Baseline measurements for the original code:
- `behavior_test_results: TestResults` / `benchmarking_test_results: TestResults`
- `line_profile_results: dict`
- `runtime: int` — Total runtime in nanoseconds
- `coverage_results: Optional[CoverageData]`
### `BestOptimization` (`models/models.py`)
The winning candidate after evaluation:
- `candidate: OptimizedCandidate`
- `helper_functions: list[FunctionSource]`
- `code_context: CodeOptimizationContext`
- `runtime: int`
- `winning_behavior_test_results` / `winning_benchmarking_test_results: TestResults`
## Test Types
### `TestType` enum (`models/test_type.py`)
- `EXISTING_UNIT_TEST` (1) — Pre-existing tests from the codebase
- `INSPIRED_REGRESSION` (2) — Tests inspired by existing tests
- `GENERATED_REGRESSION` (3) — AI-generated regression tests
- `REPLAY_TEST` (4) — Tests from recorded benchmark data
- `CONCOLIC_COVERAGE_TEST` (5) — Coverage-guided tests
- `INIT_STATE_TEST` (6) — Class init state verification
### `TestFile` / `TestFiles` (`models/models.py`)
`TestFile` represents a single test file with `instrumented_behavior_file_path`, optional `benchmarking_file_path`, `original_file_path`, `test_type`, and `tests_in_file`.
`TestFiles` is a collection with lookup methods: `get_by_type()`, `get_by_original_file_path()`, `get_test_type_by_instrumented_file_path()`.
### `TestResults` (`models/models.py`)
Collection of `FunctionTestInvocation` results with indexed lookup. Key methods:
- `add(invocation)` — Deduplicated insert
- `total_passed_runtime()` — Sum of minimum runtimes per test case (nanoseconds)
- `number_of_loops()` — Max loop index across all results
- `usable_runtime_data_by_test_case()` — Dict of invocation ID → list of runtimes
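A sketch of the `total_passed_runtime()` aggregation (minimum runtime per test case, then summed):

```python
# Nanosecond runtimes grouped by invocation ID (illustrative data).
by_test_case = {"test_a": [1200, 1100, 1150], "test_b": [800, 790]}
total = sum(min(times) for times in by_test_case.values())  # 1100 + 790 = 1890
```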
## Result Type
### `Result[L, R]` / `Success` / `Failure` (`either.py`)
Functional error handling type:
- `Success(value)` — Wraps a successful result
- `Failure(error)` — Wraps an error
- `result.is_successful()` / `result.is_failure()` — Check type
- `result.unwrap()` — Get success value (raises if Failure)
- `result.failure()` — Get failure value (raises if Success)
- `is_successful(result)` — Module-level helper function
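A usage sketch (assuming `either.py` sits at the package root):

```python
from codeflash.either import Failure, Success, is_successful

def safe_div(a: float, b: float):
    if b == 0:
        return Failure("division by zero")
    return Success(a / b)

result = safe_div(10, 4)
if is_successful(result):
    print(result.unwrap())   # 2.5
else:
    print(result.failure())  # error value
```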

@ -0,0 +1,41 @@
# Codeflash Internal Documentation
CodeFlash is an AI-powered code optimizer for Python, JavaScript, and TypeScript that automatically improves code performance while maintaining correctness. It uses LLMs to generate optimization candidates, verifies correctness through test execution, and benchmarks performance improvements.
## Pipeline Overview
```
Discovery → Ranking → Context Extraction → Test Gen + Optimization → Baseline → Candidate Evaluation → PR
```
1. **Discovery** (`discovery/`): Find optimizable functions across the codebase using `FunctionVisitor`
2. **Ranking** (`benchmarking/function_ranker.py`): Rank functions by addressable time using trace data
3. **Context** (`context/`): Extract code dependencies — split into read-writable (modifiable) and read-only (reference)
4. **Optimization** (`optimization/`, `api/`): Generate candidates via AI service, runs concurrently with test generation
5. **Verification** (`verification/`): Run candidates against tests via custom pytest plugin, compare outputs
6. **Benchmarking** (`benchmarking/`): Measure performance, select best candidate by speedup
7. **Result** (`result/`, `github/`): Create PR with winning optimization
## Key Entry Points
| Task | File |
|------|------|
| CLI arguments & commands | `cli_cmds/cli.py` |
| Optimization orchestration | `optimization/optimizer.py``Optimizer.run()` |
| Per-function optimization | `optimization/function_optimizer.py``FunctionOptimizer` |
| Function discovery | `discovery/functions_to_optimize.py` |
| Context extraction | `context/code_context_extractor.py` |
| Test execution | `verification/test_runner.py`, `verification/pytest_plugin.py` |
| Performance ranking | `benchmarking/function_ranker.py` |
| Domain types | `models/models.py`, `models/function_types.py` |
| AI service | `api/aiservice.py``AiServiceClient` |
| Configuration | `code_utils/config_consts.py` |
## Documentation Pages
- [Domain Types](domain-types.md) — Core data types and their relationships
- [Optimization Pipeline](optimization-pipeline.md) — Step-by-step data flow through the pipeline
- [Context Extraction](context-extraction.md) — How code context is extracted and token-limited
- [Verification](verification.md) — Test execution, pytest plugin, deterministic patches
- [AI Service](ai-service.md) — AI service client endpoints and request types
- [Configuration](configuration.md) — Config schema, effort levels, thresholds

@ -0,0 +1,84 @@
# Optimization Pipeline
Step-by-step data flow from function discovery to PR creation.
## 1. Entry Point: `Optimizer.run()` (`optimization/optimizer.py`)
The `Optimizer` class is initialized with CLI args and creates:
- `TestConfig` with test roots, project root, pytest command
- `AiServiceClient` for AI service communication
- Optional `LocalAiServiceClient` for experiments
`run()` orchestrates the full pipeline: discovers functions, optionally ranks them, then optimizes each in turn.
## 2. Function Discovery (`discovery/functions_to_optimize.py`)
`FunctionVisitor` traverses source files to find optimizable functions, producing `FunctionToOptimize` instances. Filters include:
- Skipping functions that are too small or trivial
- Skipping previously optimized functions (via `was_function_previously_optimized()`)
- Applying user-configured include/exclude patterns
## 3. Function Ranking (`benchmarking/function_ranker.py`)
When trace data is available, `FunctionRanker` ranks functions by **addressable time** — the time a function spends that could be optimized (own time + callee time / call count). Functions below `DEFAULT_IMPORTANCE_THRESHOLD=0.001` are skipped.
## 4. Per-Function Optimization: `FunctionOptimizer` (`optimization/function_optimizer.py`)
For each function, `FunctionOptimizer.optimize_function()` runs the full optimization loop:
### 4a. Context Extraction (`context/code_context_extractor.py`)
Extracts `CodeOptimizationContext` containing:
- `read_writable_code` — Code the LLM can modify (the function + helpers)
- `read_only_context_code` — Dependency code for reference only
- `testgen_context` — Context for test generation (may include imported class definitions)
Token limits are enforced: `OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000` and `TESTGEN_CONTEXT_TOKEN_LIMIT=16000`. Functions exceeding these are rejected.
### 4b. Concurrent Test Generation + LLM Optimization
These run in parallel using `concurrent.futures`:
- **Test generation**: Generates regression tests from the function context
- **LLM optimization**: Sends `read_writable_code.markdown` + `read_only_context_code` to the AI service
The number of candidates depends on effort level (see Configuration docs).
### 4c. Candidate Evaluation
For each `OptimizedCandidate`:
1. **Deduplication**: Normalize code AST and check against `CandidateEvaluationContext.ast_code_to_id`. If duplicate, copy results from previous evaluation.
2. **Code replacement**: Replace the original function with the candidate using `replace_function_definitions_in_module()`.
3. **Behavioral testing**: Run instrumented tests in subprocess. The custom pytest plugin applies deterministic patches. Compare return values, stdout, and pass/fail status against the original baseline.
4. **Benchmarking**: If behavior matches, run performance tests with looping (`TOTAL_LOOPING_TIME=10s`). Calculate speedup ratio.
5. **Validation**: Candidate must beat `MIN_IMPROVEMENT_THRESHOLD=0.05` (5% speedup) and pass stability checks.
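In sketch form (helper names are illustrative; the real loop lives in `FunctionOptimizer`):

```python
from typing import Callable, Iterable

MIN_IMPROVEMENT_THRESHOLD = 0.05  # from config_consts

def evaluate_candidates(
    candidates: Iterable,
    dedup_key: Callable,         # step 1: normalized-AST key
    apply_candidate: Callable,   # step 2: code replacement
    behavior_matches: Callable,  # step 3: behavioral testing
    speedup_of: Callable,        # step 4: benchmarking
) -> list:
    seen, valid = set(), []
    for cand in candidates:
        key = dedup_key(cand)
        if key in seen:
            continue  # duplicate: results are copied from the first evaluation
        seen.add(key)
        apply_candidate(cand)
        if not behavior_matches(cand):
            continue  # behavior mismatch rejects the candidate
        if speedup_of(cand) > MIN_IMPROVEMENT_THRESHOLD:
            valid.append(cand)  # step 5: must also pass stability checks
    return valid
```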
### 4d. Refinement & Repair
- **Repair**: If fewer than `MIN_CORRECT_CANDIDATES=2` pass, failed candidates can be repaired via `AIServiceCodeRepairRequest` (sends test diffs to LLM).
- **Refinement**: Top valid candidates are refined via `AIServiceRefinerRequest` (sends runtime data, line profiler results).
- **Adaptive**: At HIGH effort, additional adaptive optimization rounds via `AIServiceAdaptiveOptimizeRequest`.
### 4e. Best Candidate Selection
The winning candidate is selected by:
1. Highest speedup ratio
2. For tied speedups, shortest diff length from original
3. Refinement candidates use weighted ranking: `(2 * runtime_rank + 1 * diff_rank)`
Result is a `BestOptimization` with the candidate, context, test results, and runtime.
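A worked example of the refinement weighting (assuming lower rank is better):

```python
# REFINED_CANDIDATE_RANKING_WEIGHTS = (2, 1): runtime rank counts double.
ranks = {"A": (1, 2), "B": (2, 1)}  # candidate -> (runtime_rank, diff_rank)
scores = {name: 2 * rt + 1 * diff for name, (rt, diff) in ranks.items()}
best = min(scores, key=scores.get)  # A scores 4, B scores 5, so A wins
```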
## 5. PR Creation (`github/`)
If a winning candidate is found, a PR is created with:
- The optimized code diff
- Performance benchmark details
- Explanation from the LLM
## Worktree Mode
When `--worktree` is enabled, optimization runs in an isolated git worktree (`code_utils/git_worktree_utils.py`). This allows parallel optimization without affecting the working tree. Changes are captured as patch files.

@ -0,0 +1,93 @@
# Verification
How codeflash verifies candidate correctness and measures performance.
## Test Execution Architecture
Tests are executed in a **subprocess** to isolate the test environment from the main codeflash process. The test runner (`verification/test_runner.py`) invokes pytest (or Jest for JS/TS) with specific plugin configurations.
### Plugin Blocklists
- **Behavioral tests**: Block `benchmark`, `codspeed`, `xdist`, `sugar`
- **Benchmarking tests**: Block `codspeed`, `cov`, `benchmark`, `profiling`, `xdist`, `sugar`
These are defined as `BEHAVIORAL_BLOCKLISTED_PLUGINS` and `BENCHMARKING_BLOCKLISTED_PLUGINS` in `verification/test_runner.py`.
## Custom Pytest Plugin (`verification/pytest_plugin.py`)
The plugin is loaded into the test subprocess and provides:
### Deterministic Patches
`_apply_deterministic_patches()` replaces non-deterministic functions with fixed values to ensure reproducible test output:
| Module | Function | Fixed Value |
|--------|----------|-------------|
| `time` | `time()` | `1761717605.108106` |
| `time` | `perf_counter()` | Incrementing by 1ms per call |
| `datetime` | `datetime.now()` | `2021-01-01 02:05:10 UTC` |
| `datetime` | `datetime.utcnow()` | `2021-01-01 02:05:10 UTC` |
| `uuid` | `uuid4()` / `uuid1()` | `12345678-1234-5678-9abc-123456789012` |
| `random` | `random()` | `0.123456789` (seeded with 42) |
| `os` | `urandom(n)` | `b"\x42" * n` |
| `numpy.random` | seed | `42` |
Patches call the original function first to maintain performance characteristics (same call overhead).
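A sketch of that pattern for `time.time` (the real patches live in `_apply_deterministic_patches()`):

```python
import time

_original_time = time.time

def _deterministic_time() -> float:
    _original_time()          # keep the original call overhead
    return 1761717605.108106  # fixed value from the table above

time.time = _deterministic_time
```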
### Timing Markers
Test results include timing markers in stdout: `!######<id>:<duration_ns>######!`
The pattern `_TIMING_MARKER_PATTERN` extracts timing data for calculating function utilization fraction.
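An equivalent regex sketch (the plugin's actual `_TIMING_MARKER_PATTERN` may differ):

```python
import re

# Matches markers like "!######test_add:154000######!" in captured stdout.
TIMING_MARKER = re.compile(r"!######(?P<id>.+?):(?P<duration_ns>\d+)######!")

for m in TIMING_MARKER.finditer("noise !######test_add:154000######! noise"):
    print(m["id"], int(m["duration_ns"]))  # test_add 154000
```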
### Loop Stability
Performance benchmarking uses configurable stability thresholds:
- `STABILITY_WINDOW_SIZE = 0.35` (35% of total iterations)
- `STABILITY_CENTER_TOLERANCE = 0.0025` (±0.25% around median)
- `STABILITY_SPREAD_TOLERANCE = 0.0025` (0.25% window spread)
### Memory Limits (Linux)
On Linux, the plugin sets `RLIMIT_AS` to 85% of total system memory (RAM + swap) to prevent OOM kills.
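A sketch of the cap (total-memory detection is left as a parameter; the 85% fraction comes from the text above):

```python
import resource  # POSIX-only, matching the Linux-only behavior

def cap_address_space(total_memory_bytes: int, fraction: float = 0.85) -> None:
    limit = int(total_memory_bytes * fraction)
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
```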
## Test Result Processing
### `TestResults` (`models/models.py`)
Collects `FunctionTestInvocation` results with:
- Deduplicated insertion via `unique_invocation_loop_id`
- `total_passed_runtime()` — Sum of minimum runtimes per test case (nanoseconds)
- `number_of_loops()` — Max loop index
- `usable_runtime_data_by_test_case()` — Grouped timing data
### `FunctionTestInvocation`
Each invocation records:
- `loop_index` — Iteration number (starts at 1)
- `id: InvocationId` — Fully qualified test identifier
- `did_pass: bool` — Pass/fail status
- `runtime: Optional[int]` — Time in nanoseconds
- `return_value: Optional[object]` — Captured return value
- `test_type: TestType` — Which test category
### Behavioral vs Performance Testing
1. **Behavioral**: Runs with `TestingMode.BEHAVIOR`. Compares return values and stdout between original and candidate. Any difference = candidate rejected.
2. **Performance**: Runs with `TestingMode.PERFORMANCE`. Loops for `TOTAL_LOOPING_TIME=10s` to get stable timing. Calculates speedup ratio.
3. **Line Profile**: Runs with `TestingMode.LINE_PROFILE`. Collects per-line timing data for refinement.
## Test Types
| TestType | Value | Description |
|----------|-------|-------------|
| `EXISTING_UNIT_TEST` | 1 | Pre-existing tests from the codebase |
| `INSPIRED_REGRESSION` | 2 | Tests inspired by existing tests |
| `GENERATED_REGRESSION` | 3 | AI-generated regression tests |
| `REPLAY_TEST` | 4 | Tests from recorded benchmark data |
| `CONCOLIC_COVERAGE_TEST` | 5 | Coverage-guided tests |
| `INIT_STATE_TEST` | 6 | Class init state verification |
## Coverage
Coverage is measured via `CoverageData` with a threshold of `COVERAGE_THRESHOLD=60.0%`. Low coverage may affect confidence in the optimization's correctness.

@ -0,0 +1,118 @@
{
"package_name": "codeflash-docs",
"total_capabilities": 16,
"capabilities": [
{
"id": 0,
"name": "pipeline-stage-ordering",
"description": "Know the correct ordering of codeflash pipeline stages: Discovery → Ranking → Context Extraction → Test Gen + Optimization (concurrent) → Baseline → Candidate Evaluation → PR",
"complexity": "basic",
"api_elements": ["Optimizer.run()", "FunctionOptimizer.optimize_function()"]
},
{
"id": 1,
"name": "function-to-optimize-fields",
"description": "Know FunctionToOptimize key fields (function_name, file_path, parents, starting_line/ending_line, is_async, is_method, language) and properties (qualified_name, top_level_parent_name, class_name)",
"complexity": "intermediate",
"api_elements": ["FunctionToOptimize", "FunctionParent", "models/function_types.py"]
},
{
"id": 2,
"name": "code-strings-markdown-format",
"description": "Know that code is serialized as markdown fenced blocks with language:filepath syntax (```python:filepath\\ncode\\n```) and parsed via CodeStringsMarkdown.parse_markdown_code()",
"complexity": "intermediate",
"api_elements": ["CodeStringsMarkdown", "CodeString", ".markdown", ".flat", "parse_markdown_code()"]
},
{
"id": 3,
"name": "read-writable-vs-read-only",
"description": "Distinguish read_writable_code (LLM can modify) from read_only_context_code (reference only) in CodeOptimizationContext",
"complexity": "basic",
"api_elements": ["CodeOptimizationContext", "read_writable_code", "read_only_context_code"]
},
{
"id": 4,
"name": "candidate-source-types",
"description": "Know OptimizedCandidateSource variants: OPTIMIZE, OPTIMIZE_LP, REFINE, REPAIR, ADAPTIVE, JIT_REWRITE and when each is used",
"complexity": "intermediate",
"api_elements": ["OptimizedCandidateSource", "OptimizedCandidate"]
},
{
"id": 5,
"name": "candidate-forest-dag",
"description": "Know that candidates form a forest/DAG via parent_id references where refinements and repairs build on previous candidates",
"complexity": "intermediate",
"api_elements": ["parent_id", "OptimizedCandidate", "CandidateForest"]
},
{
"id": 6,
"name": "concurrent-testgen-optimization",
"description": "Know that test generation and LLM optimization run concurrently using concurrent.futures, not sequentially",
"complexity": "intermediate",
"api_elements": ["concurrent.futures", "FunctionOptimizer.optimize_function()"]
},
{
"id": 7,
"name": "deterministic-patch-values",
"description": "Know the specific fixed values used by deterministic patches: time=1761717605.108106, datetime=2021-01-01 02:05:10 UTC, uuid=12345678-1234-5678-9abc-123456789012, random seeded with 42",
"complexity": "advanced",
"api_elements": ["_apply_deterministic_patches()", "pytest_plugin.py"]
},
{
"id": 8,
"name": "test-type-enum",
"description": "Know the 6 TestType variants: EXISTING_UNIT_TEST, INSPIRED_REGRESSION, GENERATED_REGRESSION, REPLAY_TEST, CONCOLIC_COVERAGE_TEST, INIT_STATE_TEST",
"complexity": "basic",
"api_elements": ["TestType", "models/test_type.py"]
},
{
"id": 9,
"name": "ai-service-endpoints",
"description": "Know the AI service endpoints: /ai/optimize, /ai/optimize_line_profiler, /ai/refine, /ai/repair, /ai/adaptive_optimize, /ai/rewrite_jit",
"complexity": "intermediate",
"api_elements": ["AiServiceClient", "api/aiservice.py"]
},
{
"id": 10,
"name": "repair-request-structure",
"description": "Know that AIServiceCodeRepairRequest includes TestDiff objects with scope (RETURN_VALUE/STDOUT/DID_PASS), original vs candidate values, and test source code",
"complexity": "advanced",
"api_elements": ["AIServiceCodeRepairRequest", "TestDiff", "TestDiffScope"]
},
{
"id": 11,
"name": "effort-level-values",
"description": "Know specific effort level values: LOW gets 3 candidates, MEDIUM gets 5, HIGH gets 6 (N_OPTIMIZER_CANDIDATES)",
"complexity": "intermediate",
"api_elements": ["EffortLevel", "N_OPTIMIZER_CANDIDATES", "EFFORT_VALUES"]
},
{
"id": 12,
"name": "context-token-limits",
"description": "Know OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000 and TESTGEN_CONTEXT_TOKEN_LIMIT=16000 and that encoded_tokens_len() is used for counting",
"complexity": "basic",
"api_elements": ["OPTIMIZATION_CONTEXT_TOKEN_LIMIT", "TESTGEN_CONTEXT_TOKEN_LIMIT", "encoded_tokens_len()"]
},
{
"id": 13,
"name": "best-candidate-selection",
"description": "Know the selection criteria: highest speedup, then shortest diff for ties, and refinement weighted ranking (2*runtime + 1*diff)",
"complexity": "advanced",
"api_elements": ["BestOptimization", "REFINED_CANDIDATE_RANKING_WEIGHTS"]
},
{
"id": 14,
"name": "plugin-blocklists",
"description": "Know behavioral test blocklisted plugins (benchmark, codspeed, xdist, sugar) and benchmarking blocklist (adds cov, profiling)",
"complexity": "intermediate",
"api_elements": ["BEHAVIORAL_BLOCKLISTED_PLUGINS", "BENCHMARKING_BLOCKLISTED_PLUGINS"]
},
{
"id": 15,
"name": "result-type-usage",
"description": "Know that Result[L,R] from either.py uses Success(value)/Failure(error) with is_successful() check before unwrap()",
"complexity": "basic",
"api_elements": ["Result", "Success", "Failure", "is_successful", "either.py"]
}
]
}

View file

@ -0,0 +1 @@
Code serialization format and context splitting

View file

@ -0,0 +1,21 @@
{
"context": "Tests whether the agent knows the CodeStringsMarkdown serialization format and the distinction between read-writable and read-only code context in the codeflash pipeline.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Markdown code block format",
"description": "Uses the correct fenced code block format with language:filepath syntax (```python:path/to/file.py) when constructing code for the AI service, NOT plain code blocks without file paths",
"max_score": 30
},
{
"name": "Read-writable vs read-only split",
"description": "Correctly separates code into read_writable_code (code the LLM can modify) and read_only_context_code (reference-only dependency code), NOT treating all code as modifiable",
"max_score": 35
},
{
"name": "parse_markdown_code usage",
"description": "Uses CodeStringsMarkdown.parse_markdown_code() to parse AI service responses back into structured code, NOT manual string splitting or regex",
"max_score": 35
}
]
}

View file

@ -0,0 +1,35 @@
# Format Code for AI Service Request
## Context
You are working on the codeflash optimization engine. The AI service accepts optimization requests with source code and dependency context. A function `calculate_total` in `analytics/metrics.py` needs to be optimized. It calls a helper `normalize_values` in the same file (both modifiable), and imports `BaseMetric` from `analytics/base.py` (not modifiable, just for reference).
```python
# analytics/metrics.py
from analytics.base import BaseMetric
def normalize_values(data: list[float]) -> list[float]:
max_val = max(data)
return [x / max_val for x in data]
def calculate_total(metrics: list[BaseMetric]) -> float:
values = [m.value for m in metrics]
normalized = normalize_values(values)
return sum(normalized)
```
```python
# analytics/base.py
class BaseMetric:
def __init__(self, name: str, value: float):
self.name = name
self.value = value
```
## Task
Write a Python function `prepare_optimization_payload` that constructs the code payload for an AI service optimization request for `calculate_total`. It should properly format the source code and dependency code, and include a function to parse the AI service response back into structured code objects.
## Expected Outputs
- A Python file `payload_builder.py` with the payload construction and response parsing logic

View file

@ -0,0 +1 @@
Candidate source types and DAG relationships

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent knows the different OptimizedCandidateSource types and how candidates form a DAG via parent_id references in the codeflash pipeline.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Lists source types",
"description": "Identifies at least 4 of the 6 OptimizedCandidateSource variants: OPTIMIZE, OPTIMIZE_LP, REFINE, REPAIR, ADAPTIVE, JIT_REWRITE",
"max_score": 25
},
{
"name": "Parent ID linkage",
"description": "Explains that REFINE and REPAIR candidates reference their parent via parent_id, creating a DAG/forest structure, NOT independent candidates",
"max_score": 25
},
{
"name": "Refinement uses runtime data",
"description": "States that refinement sends runtime data and line profiler results to the AI service (AIServiceRefinerRequest), NOT just the source code",
"max_score": 25
},
{
"name": "Repair uses test diffs",
"description": "States that repair sends test failure diffs (TestDiff with scope: RETURN_VALUE/STDOUT/DID_PASS) to the AI service, NOT just error messages",
"max_score": 25
}
]
}

View file

@ -0,0 +1,13 @@
# Document the Candidate Lifecycle
## Context
A new engineer is joining the codeflash team and needs to understand how optimization candidates are generated, improved, and related to each other throughout the pipeline. They've asked for a clear explanation of the different ways candidates are produced and how the system iterates on them.
## Task
Write a technical document explaining the full lifecycle of an optimization candidate in codeflash — from initial generation through improvement iterations. Cover all the different ways candidates can be created, what data is sent to the AI service for each type, and how candidates relate to each other structurally.
## Expected Outputs
- A markdown file `candidate-lifecycle.md`

View file

@ -0,0 +1 @@
Deterministic patch values and test execution architecture

View file

@ -0,0 +1,31 @@
{
"context": "Tests whether the agent knows the specific deterministic patch values used in codeflash's pytest plugin and the subprocess-based test execution architecture.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Subprocess isolation",
"description": "States that tests run in a subprocess to isolate the test environment from the main codeflash process, NOT in the same process",
"max_score": 20
},
{
"name": "Fixed time value",
"description": "References the specific fixed timestamp 1761717605.108106 for time.time() or the fixed datetime 2021-01-01 02:05:10 UTC for datetime.now()",
"max_score": 20
},
{
"name": "Fixed UUID value",
"description": "References the specific fixed UUID 12345678-1234-5678-9abc-123456789012 for uuid4/uuid1",
"max_score": 20
},
{
"name": "Random seed",
"description": "States that random is seeded with 42 (NOT a different seed value)",
"max_score": 20
},
{
"name": "Plugin blocklists",
"description": "Mentions that behavioral tests block specific pytest plugins (at least 2 of: benchmark, codspeed, xdist, sugar) to ensure deterministic execution",
"max_score": 20
}
]
}

View file

@ -0,0 +1,13 @@
# Explain Test Reproducibility Guarantees
## Context
A codeflash user notices that their optimization candidate passes behavioral tests on one run but fails on the next. They suspect non-determinism in the test execution. They want to understand what guarantees codeflash provides for test reproducibility and how the system ensures consistent results.
## Task
Write a technical explanation of how codeflash ensures deterministic test execution. Cover the execution environment setup, what sources of non-determinism are controlled, and any specific values or configurations used. Also explain the test execution architecture.
## Expected Outputs
- A markdown file `test-reproducibility.md`

View file

@ -0,0 +1 @@
Effort level configuration and candidate selection criteria

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent knows the specific effort level values for candidate generation and the criteria used to select the best optimization candidate.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Candidate counts by effort",
"description": "States correct N_OPTIMIZER_CANDIDATES values: LOW=3, MEDIUM=5, HIGH=6 (at least 2 of 3 correct)",
"max_score": 25
},
{
"name": "Speedup as primary selector",
"description": "States that the winning candidate is selected primarily by highest speedup ratio",
"max_score": 25
},
{
"name": "Diff length as tiebreaker",
"description": "States that for tied speedups, shortest diff length from original is used as tiebreaker",
"max_score": 25
},
{
"name": "Refinement ranking weights",
"description": "States that refinement candidates use weighted ranking with runtime weighted more heavily than diff (2:1 ratio or REFINED_CANDIDATE_RANKING_WEIGHTS=(2,1))",
"max_score": 25
}
]
}

View file

@ -0,0 +1,18 @@
# Design a Candidate Selection Dashboard
## Context
The codeflash team wants to build a dashboard that shows users how optimization candidates were evaluated and why a particular candidate won. The dashboard needs to display the selection process at each stage, from initial candidate pool through to the final winner.
## Task
Write a specification document for the dashboard that explains:
1. How many candidates are generated at each effort level
2. The exact criteria and order of operations used to pick the winning candidate
3. How refinement candidates are ranked differently from initial candidates
Include concrete examples showing how two hypothetical candidates would be compared.
## Expected Outputs
- A markdown file `selection-dashboard-spec.md`

View file

@ -0,0 +1 @@
Pipeline concurrency and FunctionToOptimize structure

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent knows the FunctionToOptimize data structure and the concurrent execution model for test generation and optimization.",
"type": "weighted_checklist",
"checklist": [
{
"name": "FunctionToOptimize fields",
"description": "Includes at least 4 of: function_name, file_path, parents (list of FunctionParent), starting_line, ending_line, is_async, is_method, language",
"max_score": 25
},
{
"name": "Qualified name property",
"description": "Mentions qualified_name as a property that produces the full dotted name including parent classes (e.g., MyClass.my_method)",
"max_score": 25
},
{
"name": "Concurrent execution",
"description": "States that test generation and LLM optimization run concurrently (in parallel), NOT sequentially one after the other",
"max_score": 25
},
{
"name": "Entry point identification",
"description": "Correctly identifies Optimizer.run() as the top-level entry point and FunctionOptimizer.optimize_function() as the per-function entry point",
"max_score": 25
}
]
}

View file

@ -0,0 +1,17 @@
# Implement a Function Optimization Status Tracker
## Context
The codeflash team needs a status tracker that logs what happens to each function during an optimization run. For each function, it should record the function identity, which pipeline stages it passed through, and how long each stage took.
## Task
Write a design document explaining:
1. What data structure represents a function being optimized, including its identity fields and how nested functions (methods inside classes) are represented
2. The full name resolution strategy for identifying functions uniquely
3. Which stages of the pipeline operate on a single function at a time vs. operating on multiple functions
4. Where in the codebase the per-function optimization is orchestrated and what the top-level entry point is
## Expected Outputs
- A markdown file `status-tracker-design.md`

View file

@ -0,0 +1,40 @@
{
"total_scenarios": 5,
"capabilities_coverage": {
"total_capabilities": 16,
"capabilities_tested": 12,
"coverage_percentage": 75.0
},
"complexity_distribution": {
"basic": 1,
"intermediate": 3,
"advanced": 1
},
"scenarios": [
{
"index": 1,
"capability": "code-strings-markdown-format, read-writable-vs-read-only",
"complexity": "intermediate"
},
{
"index": 2,
"capability": "candidate-source-types, candidate-forest-dag, repair-request-structure",
"complexity": "intermediate"
},
{
"index": 3,
"capability": "deterministic-patch-values, plugin-blocklists",
"complexity": "advanced"
},
{
"index": 4,
"capability": "effort-level-values, best-candidate-selection",
"complexity": "intermediate"
},
{
"index": 5,
"capability": "function-to-optimize-fields, concurrent-testgen-optimization, pipeline-stage-ordering",
"complexity": "basic"
}
]
}

View file

@ -0,0 +1,25 @@
{
"total_infeasible": 4,
"infeasible_capabilities": [
{
"capability": "ai-service-endpoints",
"complexity": "intermediate",
"reasoning": "Testing knowledge of specific API endpoints requires actual HTTP requests or mocking that bypasses the capability being tested"
},
{
"capability": "context-token-limits",
"complexity": "basic",
"reasoning": "Already covered by the skills tile eval (scenario-1). Testing token counting requires the actual tokenizer library"
},
{
"capability": "test-type-enum",
"complexity": "basic",
"reasoning": "Simple enum knowledge is better verified through skills that use test types rather than isolated recall"
},
{
"capability": "result-type-usage",
"complexity": "basic",
"reasoning": "Already covered by the skills tile eval (scenario-2). Testing Result type usage is better done through implementation tasks"
}
]
}

View file

@ -0,0 +1,7 @@
{
"name": "codeflash/codeflash-docs",
"version": "0.1.0",
"summary": "Internal documentation for the codeflash optimization engine",
"private": true,
"docs": "docs/index.md"
}

View file

@ -0,0 +1,45 @@
# Architecture
```
codeflash/
├── main.py # CLI entry point
├── cli_cmds/ # Command handling, console output (Rich)
├── discovery/ # Find optimizable functions
├── context/ # Extract code dependencies and imports
├── optimization/ # Generate optimized code via AI
│ ├── optimizer.py # Main optimization orchestration
│ └── function_optimizer.py # Per-function optimization logic
├── verification/ # Run deterministic tests (pytest plugin)
├── benchmarking/ # Performance measurement
├── github/ # PR creation
├── api/ # AI service communication
├── code_utils/ # Code parsing, git utilities
├── models/ # Pydantic models and types
├── languages/ # Multi-language support (Python, JavaScript/TypeScript)
├── setup/ # Config schema, auto-detection, first-run experience
├── picklepatch/ # Serialization/deserialization utilities
├── tracing/ # Function call tracing
├── tracer.py # Root-level tracer entry point for profiling
├── lsp/ # IDE integration (Language Server Protocol)
├── telemetry/ # Sentry, PostHog
├── either.py # Functional Result type for error handling
├── result/ # Result types and handling
└── version.py # Version information
```
## Key Entry Points
| Task | Start here |
|------|------------|
| CLI arguments & commands | `cli_cmds/cli.py` |
| Optimization orchestration | `optimization/optimizer.py` → `Optimizer.run()` |
| Per-function optimization | `optimization/function_optimizer.py` → `FunctionOptimizer` |
| Function discovery | `discovery/functions_to_optimize.py` |
| Context extraction | `context/code_context_extractor.py` |
| Test execution | `verification/test_runner.py`, `verification/pytest_plugin.py` |
| Performance ranking | `benchmarking/function_ranker.py` |
| Domain types | `models/models.py`, `models/function_types.py` |
| Result handling | `either.py` (`Result`, `Success`, `Failure`, `is_successful`) |
| AI service communication | `api/aiservice.py` → `AiServiceClient` |
| Configuration constants | `code_utils/config_consts.py` |
| Language support | `languages/registry.py` → `get_language_support()` |

View file

@ -0,0 +1,11 @@
# Code Style
- **Line length**: 120 characters
- **Python**: 3.9+ syntax (use `from __future__ import annotations` for type hints)
- **Package management**: Always use `uv`, never `pip` — run commands via `uv run`
- **Tooling**: Ruff for linting/formatting, mypy strict mode, prek for pre-commit checks (`uv run prek run`)
- **Comments**: Minimal — only explain "why", not "what"
- **Docstrings**: Do not add unless explicitly requested
- **Naming**: NEVER use leading underscores (`_function_name`) — Python has no true private functions; use public names
- **Paths**: Always use absolute `Path` objects, handle encoding explicitly (UTF-8)
- **Source transforms**: Use `libcst` for code modification/transformation to preserve formatting; `ast` is acceptable for read-only analysis and parsing
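A minimal sketch of the `libcst` transform pattern, using a hypothetical rename as the example:
```python
# Rename `foo` to `bar` while preserving the original formatting and comments.
import libcst as cst


class RenameFoo(cst.CSTTransformer):
    def leave_Name(self, original_node: cst.Name, updated_node: cst.Name) -> cst.Name:
        if updated_node.value == "foo":
            return updated_node.with_changes(value="bar")
        return updated_node


module = cst.parse_module("def foo():\n    return 1\n")
print(module.visit(RenameFoo()).code)  # prints the transformed source
```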

View file

@ -0,0 +1,9 @@
# Git Conventions
- **Always create a new branch from `main`** — never commit directly to `main` or reuse an existing feature branch for unrelated changes
- Use conventional commit format: `fix:`, `feat:`, `refactor:`, `docs:`, `test:`, `chore:`
- Keep commits atomic — one logical change per commit
- Commit message body should be concise (1-2 sentences max)
- PR titles should also use conventional format
- Branch naming: `cf-#-title` (lowercase, hyphenated) where `#` is the Linear issue number
- If related to a Linear issue, include `CF-#` in the PR body

View file

@ -0,0 +1,9 @@
# Language Support Rules
- Current language is a module-level singleton in `languages/current.py` — use `set_current_language()` / `current_language()`, never pass language as a parameter through call chains (see the sketch after this list)
- Use `get_language_support(identifier)` from `languages/registry.py` to get a `LanguageSupport` instance — accepts `Path`, `Language` enum, or string; never import language classes directly
- New language support classes must use the `@register_language` decorator to register with the extension and language registries
- `languages/__init__.py` uses `__getattr__` for lazy imports to avoid circular dependencies — follow this pattern when adding new exports
- `is_javascript()` returns `True` for both JavaScript and TypeScript
- Language modules are lazily imported on first `get_language_support()` call via `_ensure_languages_registered()` — the `@register_language` decorator fires on import and populates `_EXTENSION_REGISTRY` and `_LANGUAGE_REGISTRY`
- `LanguageSupport` instances are cached in `_SUPPORT_CACHE` — use `clear_cache()` only in tests
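A short sketch of the documented lookup and singleton calls; the import paths follow the rules above, while the argument to `set_current_language()` is an assumption.
```python
# Illustrative only: get_language_support() accepts a Path, Language enum,
# or string per the rule above; other details here are assumptions.
from pathlib import Path

from codeflash.languages.current import current_language, set_current_language
from codeflash.languages.registry import get_language_support

support = get_language_support(Path("src/app.ts"))  # lazy-registers languages on first call

# Set the process-wide language once, then read it anywhere instead of
# threading a language parameter through call chains.
set_current_language("typescript")  # hypothetical identifier form
assert current_language() is not None
```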

View file

@ -0,0 +1,11 @@
# Optimization Pipeline Patterns
- All major operations return `Result[SuccessType, ErrorType]` — construct with `Success(value)` / `Failure(error)`, check with `is_successful()` before calling `unwrap()` (see the sketch after this list)
- Code context has token limits (`OPTIMIZATION_CONTEXT_TOKEN_LIMIT=16000`, `TESTGEN_CONTEXT_TOKEN_LIMIT=16000` in `code_utils/config_consts.py`) — exceeding them rejects the function
- `read_writable_code` (modifiable code) can span multiple files; `read_only_context_code` is reference-only dependency code
- Code is serialized as markdown code blocks: `` ```language:filepath\ncode\n``` `` — see `CodeStringsMarkdown` in `models/models.py`
- Candidates form a forest (DAG): refinement and repair candidates reference `parent_id` on previous candidates; each candidate's origin is tracked by `OptimizedCandidateSource` (OPTIMIZE, OPTIMIZE_LP, REFINE, REPAIR, ADAPTIVE, JIT_REWRITE)
- Test generation and optimization run concurrently — coordinate through `CandidateEvaluationContext`
- Generated tests are instrumented with `codeflash_capture.py` to record return values and traces
- Minimum improvement threshold is 5% (`MIN_IMPROVEMENT_THRESHOLD=0.05`) — candidates below this are rejected
- Stability thresholds: `STABILITY_WINDOW_SIZE=0.35`, `STABILITY_CENTER_TOLERANCE=0.0025`, `STABILITY_SPREAD_TOLERANCE=0.0025`
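A minimal sketch of the Result convention, assuming only the names documented above from `either.py` (return-type generics are omitted to avoid assuming their parameter order):
```python
# Sketch: wrap success/failure explicitly instead of raising or returning None.
from codeflash.either import Failure, Success, is_successful


def parse_speedup(raw: str):
    try:
        return Success(float(raw))
    except ValueError as err:
        return Failure(err)


result = parse_speedup("1.25")
if is_successful(result):
    speedup = result.unwrap()  # safe: checked is_successful() first
else:
    ...  # handle the Failure branch without calling unwrap()
```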

View file

@ -0,0 +1,13 @@
# Testing Rules
- Code context extraction and replacement tests must assert full string equality — no substring matching
- Use pytest's `tmp_path` fixture for temp directories (it's a `Path` object)
- Write temp files inside `tmp_path`, never use `NamedTemporaryFile` (causes Windows file contention)
- Always call `.resolve()` on Path objects to ensure absolute paths and resolve symlinks
- Use `.as_posix()` when converting resolved paths to strings (normalizes to forward slashes); these path conventions are sketched after this list
- Any new feature or bug fix that can be tested automatically must have test cases
- If changes affect existing test expectations, update the tests accordingly — tests must always pass after changes
- The pytest plugin patches `time`, `random`, `uuid`, `datetime`, `os.urandom`, and `numpy.random` for deterministic test execution — never assume real randomness or real time in verification tests
- `conftest.py` uses an autouse fixture that calls `reset_current_language()` — tests always start with Python as the default language
- Test types are defined by the `TestType` enum: `EXISTING_UNIT_TEST`, `INSPIRED_REGRESSION`, `GENERATED_REGRESSION`, `REPLAY_TEST`, `CONCOLIC_COVERAGE_TEST`, `INIT_STATE_TEST`
- Verification runs tests in a subprocess using a custom pytest plugin (`verification/pytest_plugin.py`) — behavioral runs blocklist the `benchmark`, `codspeed`, `xdist`, and `sugar` plugins; benchmarking runs additionally block `cov` and `profiling`
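A compact sketch of those path conventions, runnable under pytest:
```python
# Follows the rules above: tmp_path fixture, .resolve(), explicit UTF-8,
# full string equality, and .as_posix() for string comparisons.
from pathlib import Path


def test_writes_module_source(tmp_path: Path) -> None:
    source = "def f():\n    return 1\n"
    module_path = (tmp_path / "module.py").resolve()
    module_path.write_text(source, encoding="utf-8")
    assert module_path.read_text(encoding="utf-8") == source  # no substring checks
    assert module_path.as_posix().endswith("/module.py")
```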

View file

@ -0,0 +1,26 @@
{
"name": "codeflash/codeflash-rules",
"version": "0.1.0",
"summary": "Coding standards and conventions for the codeflash codebase",
"private": true,
"rules": {
"code-style": {
"rules": "rules/code-style.md"
},
"architecture": {
"rules": "rules/architecture.md"
},
"optimization-patterns": {
"rules": "rules/optimization-patterns.md"
},
"git-conventions": {
"rules": "rules/git-conventions.md"
},
"testing-rules": {
"rules": "rules/testing-rules.md"
},
"language-rules": {
"rules": "rules/language-rules.md"
}
}
}

View file

@ -0,0 +1,104 @@
{
"package_name": "codeflash-skills",
"total_capabilities": 14,
"capabilities": [
{
"id": 0,
"name": "sequential-pipeline-debugging",
"description": "Debug optimization failures by walking through pipeline stages sequentially and stopping at the first failure found",
"complexity": "intermediate",
"api_elements": ["discovery", "ranking", "context", "AI service", "verification", "deduplication", "repair"]
},
{
"id": 1,
"name": "token-limit-awareness",
"description": "Know that OPTIMIZATION_CONTEXT_TOKEN_LIMIT and TESTGEN_CONTEXT_TOKEN_LIMIT are both 16000 tokens and that exceeding them causes function rejection",
"complexity": "basic",
"api_elements": ["OPTIMIZATION_CONTEXT_TOKEN_LIMIT", "TESTGEN_CONTEXT_TOKEN_LIMIT", "encoded_tokens_len()"]
},
{
"id": 2,
"name": "improvement-threshold",
"description": "Know that MIN_IMPROVEMENT_THRESHOLD is 0.05 (5%) and candidates below this speedup are rejected",
"complexity": "basic",
"api_elements": ["MIN_IMPROVEMENT_THRESHOLD", "STABILITY_WINDOW_SIZE"]
},
{
"id": 3,
"name": "ast-deduplication",
"description": "Know that candidates are deduplicated via AST normalization using normalize_code() and CandidateEvaluationContext.ast_code_to_id",
"complexity": "intermediate",
"api_elements": ["normalize_code()", "CandidateEvaluationContext.ast_code_to_id", "code_utils/deduplicate_code.py"]
},
{
"id": 4,
"name": "repair-trigger-conditions",
"description": "Know that repair only triggers when fewer than MIN_CORRECT_CANDIDATES=2 pass, and is skipped when REPAIR_UNMATCHED_PERCENTAGE_LIMIT is exceeded",
"complexity": "advanced",
"api_elements": ["MIN_CORRECT_CANDIDATES", "REPAIR_UNMATCHED_PERCENTAGE_LIMIT", "AIServiceCodeRepairRequest"]
},
{
"id": 5,
"name": "ai-service-error-patterns",
"description": "Know specific log patterns to search for when AI service fails: 'Error generating optimized candidates', 'cli-optimize-error-caught', 'cli-optimize-error-response'",
"complexity": "intermediate",
"api_elements": ["AiServiceClient", "api/aiservice.py"]
},
{
"id": 6,
"name": "behavioral-vs-benchmark-failures",
"description": "Distinguish between behavioral test failures (return value/stdout/pass-fail mismatches via TestDiffScope) and benchmark failures (speedup below threshold)",
"complexity": "intermediate",
"api_elements": ["TestDiffScope", "RETURN_VALUE", "STDOUT", "DID_PASS"]
},
{
"id": 7,
"name": "result-type-pattern",
"description": "Use Result[L, R] from either.py with Success/Failure constructors and is_successful() checks before unwrap()",
"complexity": "basic",
"api_elements": ["Result", "Success", "Failure", "is_successful", "unwrap()", "either.py"]
},
{
"id": 8,
"name": "effort-config-pattern",
"description": "Add effort-dependent config via EffortKeys enum, EFFORT_VALUES dict with LOW/MEDIUM/HIGH levels, and get_effort_value()",
"complexity": "intermediate",
"api_elements": ["EffortKeys", "EffortLevel", "EFFORT_VALUES", "get_effort_value()", "config_consts.py"]
},
{
"id": 9,
"name": "module-to-feature-mapping",
"description": "Know which codeflash module to modify for different feature types (optimization/ for strategies, api/ for endpoints, languages/ for language support, etc.)",
"complexity": "basic",
"api_elements": ["MODULE_REFERENCE.md"]
},
{
"id": 10,
"name": "domain-type-conventions",
"description": "Use @dataclass(frozen=True) for immutable data, BaseModel for serializable models, and keep function_types.py dependency-free",
"complexity": "intermediate",
"api_elements": ["@dataclass(frozen=True)", "BaseModel", "models/models.py", "models/function_types.py"]
},
{
"id": 11,
"name": "test-patterns",
"description": "Use tmp_path fixture, .resolve() on Paths, .as_posix() for string conversion, full string equality assertions, and awareness of deterministic patches",
"complexity": "basic",
"api_elements": ["tmp_path", ".resolve()", ".as_posix()", "pytest_plugin.py"]
},
{
"id": 12,
"name": "quality-check-commands",
"description": "Run uv run prek run for formatting/linting, uv run mypy for type checking, and uv run pytest for tests",
"complexity": "basic",
"api_elements": ["uv run prek run", "uv run mypy", "uv run pytest"]
},
{
"id": 13,
"name": "language-support-patterns",
"description": "Use @register_language decorator, get_language_support() for lookup, singleton pattern via set_current_language()/current_language(), and is_python()/is_javascript() guards",
"complexity": "advanced",
"api_elements": ["@register_language", "get_language_support()", "set_current_language()", "is_python()", "is_javascript()"]
}
]
}

View file

@ -0,0 +1 @@
Sequential pipeline debugging with specific thresholds

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent follows the sequential debugging workflow from the skill, checking pipeline stages in order and using correct threshold values when diagnosing an optimization that produced no results.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Sequential stage order",
"description": "Investigates pipeline stages in order: discovery before ranking before context before AI service before test failures. Does NOT jump to later stages without checking earlier ones first.",
"max_score": 25
},
{
"name": "Token limit value",
"description": "References the specific token limit of 16000 for OPTIMIZATION_CONTEXT_TOKEN_LIMIT or TESTGEN_CONTEXT_TOKEN_LIMIT when checking context extraction",
"max_score": 25
},
{
"name": "Importance threshold",
"description": "References DEFAULT_IMPORTANCE_THRESHOLD=0.001 when checking function ranking",
"max_score": 25
},
{
"name": "Stops at failure",
"description": "Identifies the failing stage and focuses investigation there rather than continuing through all remaining stages",
"max_score": 25
}
]
}

View file

@ -0,0 +1,13 @@
# Diagnose Silent Optimization Skip
## Context
A user reports that when running codeflash on their project, a specific function `calculate_metrics` in `analytics/processor.py` never appears in the optimization results. The function exists in the module root, is not in the exclude list, and has not been previously optimized. Trace data shows the function is called frequently but with very short execution times (averaging 0.0005 seconds total addressable time). The function has moderate dependencies.
## Task
Write a diagnostic report explaining why this function is being skipped and at which stage in the pipeline the function is filtered out. Include the specific threshold or condition that causes the skip.
## Expected Outputs
A markdown file `diagnostic-report.md` explaining the root cause.

View file

@ -0,0 +1 @@
Result type pattern and effort-dependent configuration

View file

@ -0,0 +1,31 @@
{
"context": "Tests whether the agent uses the codeflash Result type pattern from either.py and the effort-dependent configuration pattern when implementing a new pipeline feature.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Imports from either.py",
"description": "Imports Success, Failure, and is_successful from codeflash.either (NOT from a different error handling module)",
"max_score": 20
},
{
"name": "Result return type",
"description": "Function returns Result type using Success() for success and Failure() for errors, not exceptions or None",
"max_score": 20
},
{
"name": "is_successful check",
"description": "Calls is_successful() or .is_successful() before calling unwrap() on the result",
"max_score": 20
},
{
"name": "EffortKeys enum entry",
"description": "Adds a new entry to the EffortKeys enum in config_consts.py",
"max_score": 20
},
{
"name": "Three effort levels",
"description": "Adds values for all three EffortLevel variants (LOW, MEDIUM, HIGH) in EFFORT_VALUES dict",
"max_score": 20
}
]
}

View file

@ -0,0 +1,21 @@
# Add Candidate Timeout Feature
## Context
The codeflash optimization engine currently has no per-candidate timeout. Some candidates take too long during verification, wasting the optimization budget. A new feature is needed to skip candidates that exceed a configurable time limit during behavioral testing.
The timeout should vary based on the optimization effort setting — shorter timeouts for low effort runs (to save time) and longer for high effort runs (to allow more complex optimizations).
## Task
Implement a `check_candidate_timeout` function in `codeflash/optimization/function_optimizer.py` that:
1. Takes a candidate runtime and returns whether the candidate should be skipped
2. Uses a configurable timeout threshold that scales with optimization effort
3. Handles the error case where the runtime measurement is unavailable
Also add the necessary configuration constant to `codeflash/code_utils/config_consts.py`.
## Expected Outputs
- Modified `function_optimizer.py` with the new function
- Modified `config_consts.py` with the new configuration

View file

@ -0,0 +1 @@
Test patterns and deterministic patch awareness

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent follows codeflash test conventions when writing tests, including path handling, temp directory patterns, and awareness of the deterministic patching system.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Uses tmp_path fixture",
"description": "Test function uses pytest tmp_path fixture parameter, NOT tempfile.NamedTemporaryFile or tempfile.mkdtemp",
"max_score": 25
},
{
"name": "Calls resolve on paths",
"description": "Calls .resolve() on Path objects before using them in assertions or function calls",
"max_score": 25
},
{
"name": "Full string equality",
"description": "Uses exact equality assertions (== or assert_equal) for code string comparisons, NOT substring checks like 'in' or assertIn or contains",
"max_score": 25
},
{
"name": "No real time dependency",
"description": "Test does NOT depend on real time.time(), datetime.now(), random values, or uuid generation for correctness. Acknowledges or accounts for deterministic patches if time/random values are involved.",
"max_score": 25
}
]
}

View file

@ -0,0 +1,24 @@
# Write Tests for Context Hash Comparison
## Context
The codeflash context extraction module has a function `compare_context_hashes(context_a, context_b)` that takes two `CodeOptimizationContext` objects and returns whether their hashing contexts are identical. This is used to detect when the same function has already been optimized.
```python
# In codeflash/context/code_context_extractor.py
def compare_context_hashes(context_a: CodeOptimizationContext, context_b: CodeOptimizationContext) -> bool:
return context_a.hashing_code_context_hash == context_b.hashing_code_context_hash
```
## Task
Write a test file `tests/test_context/test_hash_comparison.py` with tests for this function. Include tests for:
1. Two contexts with identical code producing the same hash
2. Two contexts with different code producing different hashes
3. A context compared with itself
The tests should create temporary Python source files to build realistic context objects.
## Expected Outputs
- `tests/test_context/test_hash_comparison.py`

View file

@ -0,0 +1 @@
Domain type conventions and module identification

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent follows codeflash domain type conventions and correctly identifies the right module when adding a new data type for the optimization pipeline.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Placed in models/models.py",
"description": "New data type is added to codeflash/models/models.py (NOT models/function_types.py, since it has dependencies on other codeflash modules)",
"max_score": 25
},
{
"name": "Uses frozen dataclass",
"description": "Immutable data type uses @dataclass(frozen=True) decorator, NOT a regular class or unfrozen dataclass",
"max_score": 25
},
{
"name": "BaseModel for serializable",
"description": "If a serializable model is needed, uses Pydantic BaseModel (NOT dataclass or dict)",
"max_score": 25
},
{
"name": "Correct module for feature",
"description": "Places the main logic in the correct module for the feature type (e.g., verification/ for test-related, optimization/ for candidate-related, api/ for service-related)",
"max_score": 25
}
]
}

View file

@ -0,0 +1,21 @@
# Add Optimization Confidence Score
## Context
The codeflash team wants to add a confidence score to each optimization result. The score should capture how confident the system is that an optimization is both correct and beneficial. It combines test coverage percentage, number of passing test cases, and speedup stability into a single metric.
The score needs to be:
- Attached to each candidate during evaluation (immutable once computed)
- Included in the final PR report (needs JSON serialization)
- Computed during the candidate evaluation phase
## Task
1. Define the data types needed for the confidence score
2. Write a `compute_confidence_score` function that takes coverage percentage (float), passing test count (int), and stability ratio (float) and returns the confidence result
3. Place all code in the appropriate codeflash modules
## Expected Outputs
- New/modified type definitions in the appropriate models file
- New function in the appropriate module

View file

@ -0,0 +1 @@
Deduplication mechanics and repair trigger conditions

View file

@ -0,0 +1,26 @@
{
"context": "Tests whether the agent understands codeflash's candidate deduplication via AST normalization and the specific conditions under which code repair is triggered vs skipped.",
"type": "weighted_checklist",
"checklist": [
{
"name": "AST normalization",
"description": "Mentions that deduplication uses AST normalization (normalize_code from code_utils/deduplicate_code.py), NOT simple string comparison",
"max_score": 25
},
{
"name": "Duplicate result copying",
"description": "Explains that duplicate candidates copy results from the first-seen candidate rather than being re-tested",
"max_score": 25
},
{
"name": "Repair trigger threshold",
"description": "States that repair triggers when fewer than 2 candidates pass (MIN_CORRECT_CANDIDATES=2), NOT when zero candidates pass or when any candidate fails",
"max_score": 25
},
{
"name": "Unmatched percentage limit",
"description": "Mentions REPAIR_UNMATCHED_PERCENTAGE_LIMIT as a condition that can cause repair to be skipped entirely, with effort-dependent values (0.2/0.3/0.4)",
"max_score": 25
}
]
}

Some files were not shown because too many files have changed in this diff