Commit graph

6099 commits

Author SHA1 Message Date
HeshamHM28
c3be740417 chore: merge omni-java into feat/java-gradle-support
Resolved conflicts in:
- codeflash/version.py (kept gradle branch version)
- codeflash/languages/java/test_runner.py (kept gradle multi-module logic)
- codeflash/languages/java/support.py (kept java_test_module parameter)
- codeflash/discovery/functions_to_optimize.py (kept enhanced test filtering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 19:45:23 +00:00
HeshamHM28
83ef64ba39
Update codeflash/languages/java/test_runner.py
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
2026-02-04 10:19:37 -08:00
Kevin Turcios
95cc60397d
Merge branch 'main' into omni-java 2026-02-04 03:22:37 -05:00
Kevin Turcios
1c0d8da090
Merge pull request #1339 from codeflash-ai/coverage-no-files
Skip when no gen tests and no existing tests
2026-02-04 00:29:06 -05:00
claude[bot]
daf570b969 style: auto-fix formatting issues 2026-02-04 05:21:45 +00:00
Kevin Turcios
0b13beb9b0
Merge branch 'main' into coverage-no-files 2026-02-04 00:20:39 -05:00
Kevin Turcios
bade48513a chore: fix ruff lint issues after merges 2026-02-04 00:18:15 -05:00
Kevin Turcios
bbbc7ebe63 Merge #1362: Speed up ReferenceFinder._find_reexports by 14% 2026-02-04 00:17:55 -05:00
Kevin Turcios
3dedc59cba Merge #1357: Speed up PrComment.to_json by 46% 2026-02-04 00:17:51 -05:00
Kevin Turcios
4a850d35fe Merge #1353: Speed up extract_imports_for_class by 429% 2026-02-04 00:17:47 -05:00
Kevin Turcios
5a704084db Merge #1352: Speed up get_external_base_class_inits by 344% 2026-02-04 00:17:43 -05:00
Kevin Turcios
60bd77675d Merge #1343: Speed up _collect_numerical_imports by 159% 2026-02-04 00:17:38 -05:00
Kevin Turcios
2b4af2fd06 revert: undo "a or b" pattern changes
Reverts the automatic conversion from "a if a else b" to "a or b"
pattern that was applied by ruff. The FURB110 rule remains disabled
to prevent future automatic conversions.
2026-02-04 00:08:48 -05:00
Kevin Turcios
dfe073a592
Merge pull request #1370 from codeflash-ai/claude-workflow-perms
feat: secure Claude workflow and add merge permissions
2026-02-03 23:57:44 -05:00
claude[bot]
5cb780a890 refactor: revert "or" pattern in files that didn't originally use it
Reverted the "x or y" pattern back to "x if x else y" in 4 files that
didn't originally use the "or" pattern, maintaining consistency with
their original coding style.

Files reverted:
- codeflash/code_utils/codeflash_wrap_decorator.py
- codeflash/github/PrComment.py
- codeflash/result/explanation.py
- codeflash/verification/codeflash_capture.py

The other 20 files already used the "or" pattern and were kept as-is.

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-02-04 04:55:21 +00:00
Kevin Turcios
cb9248e022 feat: add merge/close permissions and secure workflow
- Add git merge/fetch/checkout/branch to allowed tools
- Add gh pr merge/close to allowed tools
- Add allowed_bots for claude[bot] to trigger pr-review
- Restrict @claude mentions to maintainers only (OWNER/MEMBER/COLLABORATOR)
- Block fork PRs from triggering pr-review and claude-mention
2026-02-03 23:54:43 -05:00
Kevin Turcios
e0ec03b3c8
Merge branch 'main' into coverage-no-files 2026-02-03 23:42:21 -05:00
Kevin Turcios
dd0cca94d3
Merge pull request #1369 from codeflash-ai/test-claude-perms
fix: skip pr-review when triggered by claude bot
2026-02-03 23:42:12 -05:00
Kevin Turcios
02b9bcb872
Merge branch 'main' into test-claude-perms 2026-02-03 23:41:59 -05:00
Kevin Turcios
831d296052 fix: skip pr-review when triggered by claude bot 2026-02-03 23:40:45 -05:00
claude[bot]
c0e8a98ca5 chore: disable FURB110 lint rule that enforces 'or' pattern
The codebase prefers explicit 'a if a else b' over 'a or b' pattern.
Disabled FURB110 (if-exp-instead-of-or-operator) rule to prevent
automatic conversion by the linter.

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-02-04 04:39:20 +00:00
Kevin Turcios
07e577f751
Merge branch 'main' into coverage-no-files 2026-02-03 23:23:49 -05:00
Kevin Turcios
7f5b49fd49
Merge pull request #1367 from codeflash-ai/test-claude-perms
agentic Claude for us
2026-02-03 23:21:07 -05:00
Kevin Turcios
8c40b3b099
Merge branch 'main' into test-claude-perms 2026-02-03 23:07:43 -05:00
Kevin Turcios
e1069ea7be chore: update lockfile 2026-02-03 23:02:49 -05:00
Kevin Turcios
d5ec877a78 feat: add coverage analysis to PR review workflow
- Run tests with coverage on changed files
- Compare coverage between PR and main branch
- New files require ≥75% test coverage
- Modified files must have changed lines covered
- Flag coverage regressions in PR comment
2026-02-03 22:57:56 -05:00
Saurabh Misra
610e63c168
Merge pull request #1348 from codeflash-ai/feature/java-verbose-logging
[feat] Java verbose logging
2026-02-03 19:51:42 -08:00
Kevin Turcios
6289c5325a feat: improve Claude PR review workflow
- Consolidate claude-code-review.yml into claude.yml with two jobs
- Add auto-fix for safe linting issues (formatting, imports) before review
- Use --from-ref origin/main to only check changed files
- Add smart re-review logic that resolves fixed comments
- Add inline comment support via MCP tool with 5-7 comment limit
2026-02-03 22:51:32 -05:00
Saurabh Misra
99b8b8e5f0
Merge branch 'omni-java' into feature/java-verbose-logging 2026-02-03 19:51:24 -08:00
Saurabh Misra
dbb26dfee8
Merge pull request #1337 from codeflash-ai/fix/java-test-timeout-issue
fix: increase Java test timeout from 15s to 120s
2026-02-03 19:50:25 -08:00
codeflash-ai[bot]
0b055ccc53
Optimize ReferenceFinder._find_reexports
The optimization achieves a **14% runtime improvement** (702μs → 614μs) by adding an inexpensive pre-check before performing expensive tree-sitter parsing operations.

**Key optimization:**
The code now checks if the `export_name` exists anywhere in the `source_code` string before calling `analyzer.find_exports()`. This simple substring check (`if export_name not in source_code`) acts as a fast filter to skip files that definitely don't contain re-exports of the target function.

**Why this is faster:**
1. **Avoids expensive parsing**: The line profiler shows `analyzer.find_exports()` consumes 45.4% of original runtime (1.37ms of 3.01ms total). The optimized version reduces this to 39.8% of a smaller total (1.02ms of 2.56ms), with 6 out of 25 calls completely avoided.

2. **String containment is O(n)** with a highly optimized C implementation in Python, while tree-sitter parsing involves building and traversing a syntax tree, making it orders of magnitude more expensive.

3. **Cascading savings**: When the pre-check fails, we also skip `source_code.splitlines()` (3.2% of original runtime) and all subsequent loop iterations.

**Impact:**
The profiler shows that in the test dataset, 6 out of 25 files (24%) don't contain the export name and can short-circuit immediately. For codebases with many files that import/re-export from various sources, this ratio could be even higher, making the optimization particularly valuable when searching across large projects.

**Trade-offs:**
This is a purely beneficial optimization with no downsides - the string check has negligible overhead compared to tree parsing, and it only returns early when the result would have been an empty list anyway.
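
A minimal sketch of the pre-check pattern described above. The names `find_reexports` and `toy_parse_exports` are hypothetical stand-ins for illustration; they are not the real `ReferenceFinder` or tree-sitter analyzer API, and the toy parser only understands single-line `export { ... }` statements.

```python
from typing import Callable


def find_reexports(
    export_name: str,
    source_code: str,
    parse_exports: Callable[[str], list[str]],
) -> list[str]:
    """Return exported names equal to export_name, skipping the parse when possible."""
    # Cheap substring filter: if the name never appears in the text,
    # no export statement can mention it, so the expensive parse is skipped.
    if export_name not in source_code:
        return []
    return [name for name in parse_exports(source_code) if name == export_name]


def toy_parse_exports(source_code: str) -> list[str]:
    """Hypothetical stand-in for an expensive parser; counts its own calls."""
    toy_parse_exports.calls += 1
    names: list[str] = []
    for line in source_code.splitlines():
        line = line.strip()
        if line.startswith("export {") and "}" in line:
            inner = line[len("export {"): line.index("}")]
            names.extend(part.strip() for part in inner.split(",") if part.strip())
    return names


toy_parse_exports.calls = 0
```

Calling `find_reexports("foo", "export { bar } from './b';", toy_parse_exports)` returns an empty list without ever invoking the parser, which is the short-circuit the commit relies on.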
2026-02-04 01:44:45 +00:00
claude[bot]
9e81b2be46 style: apply linting and formatting fixes
- Fixed 89 linting issues (imports, type annotations, code style)
- Formatted 22 files with ruff
- Updated auto-generated version.py

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-02-04 01:33:31 +00:00
claude[bot]
2ad731d3d6 style: fix linting and formatting issues in function_optimizer.py
- Fix quote formatting (15 instances)
- Remove unused import
- Prefix unused concolic_tests variable with underscore
- Apply code formatting

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
2026-02-04 01:29:34 +00:00
codeflash-ai[bot]
0459bd340e
Optimize PrComment.to_json
This optimization achieves a **45% runtime improvement** (729μs → 500μs) through strategic performance enhancements focused on eliminating repeated dictionary creation and reducing loop overhead.

## Key Optimizations

### 1. **Module-level Dictionary Caching** (`TestType.to_name()`)
The original code reconstructed a 5-element dictionary on *every* call to `to_name()`. The optimized version uses a module-level `_TO_NAME_MAP` constant that's created once at import time. Line profiler data shows this change reduced `to_name()` execution time from **1.27ms to 184μs** (~85% less time), as the dictionary construction overhead (610μs, 48% of function time) is eliminated entirely.
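
As a rough illustration of the caching change, the sketch below contrasts a per-call dictionary with a module-level one. Only `TestType`, `to_name()`, and `_TO_NAME_MAP` come from the message above; the enum members, labels, and the `to_name_rebuilt` helper are assumptions made for the example.

```python
from enum import Enum


class TestType(Enum):
    # Member names and labels are illustrative, not the project's real ones.
    EXISTING_UNIT_TEST = 1
    GENERATED_REGRESSION = 2
    REPLAY_TEST = 3

    def to_name_rebuilt(self) -> str:
        # Before: a fresh dict is constructed on every call.
        names = {
            TestType.EXISTING_UNIT_TEST: "Existing Unit Tests",
            TestType.GENERATED_REGRESSION: "Generated Regression Tests",
            TestType.REPLAY_TEST: "Replay Tests",
        }
        return names.get(self, "")

    def to_name(self) -> str:
        # After: a lookup in a dict that was built once at import time.
        return _TO_NAME_MAP.get(self, "")


# Defined after the class so the enum members exist; created exactly once.
_TO_NAME_MAP = {
    TestType.EXISTING_UNIT_TEST: "Existing Unit Tests",
    TestType.GENERATED_REGRESSION: "Generated Regression Tests",
    TestType.REPLAY_TEST: "Replay Tests",
}
```

Both methods return the same strings; the second one simply avoids paying the construction cost on each call.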

### 2. **Dict Comprehension Initialization** (`get_test_pass_fail_report_by_type()`)
Replaced a loop that iteratively built the report dictionary with a single dict comprehension: `{tt: {"passed": 0, "failed": 0} for tt in TestType}`. This reduces the loop overhead from iterating over all `TestType` enum members and performing multiple dictionary assignments to a single comprehension operation, cutting function time from **403μs to 376μs**.

### 3. **Early Continue Pattern** (`get_test_pass_fail_report_by_type()`)
Changed nested if-else logic to use early `continue` when `loop_index != 1`, reducing indentation and eliminating redundant condition checks for test results that don't meet the filter criteria.

### 4. **Filtered Report Table Construction** (`PrComment.to_json()`)
Instead of using a dict comprehension with a filter inside, the code now builds `report_table` with an explicit loop that checks `if name:` before insertion. This avoids creating intermediate tuples for the comprehension and provides clearer filtering logic. The profiler shows `to_json()` improved from **5.23ms to 3.36ms** (~36% less time).

## Test Case Performance
The annotated tests demonstrate consistent improvements across all scenarios:
- Simple cases: **74-97% faster** (27.4μs → 15.7μs, 19.9μs → 10.4μs)
- Large data cases: **81-92% faster** (maintaining performance even with 100+ benchmark details)
- Edge cases: **9-15% faster** (even extreme runtime values benefit)

The optimizations are particularly effective for the common use case where `to_name()` is called multiple times during report generation, and `get_test_pass_fail_report_by_type()` initializes its data structures. Since these functions are used in PR comment generation, the speedup directly improves CI/CD feedback loop performance.
2026-02-04 01:28:15 +00:00
HeshamHM28
7a7bf329cf refactor: use DEBUG_MODE from console.py for verbose logging
- Remove duplicate is_verbose_mode() function
- Import and reuse existing DEBUG_MODE from console.py
- Update all verbose logging functions to use DEBUG_MODE consistently
- Make language parameter required in log_instrumented_test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 03:24:14 +02:00
Kevin Turcios
575c760cd9
Merge pull request #1333 from codeflash-ai/test-claude-perms
test Claude perms
2026-02-03 20:19:52 -05:00
codeflash-ai[bot]
c889a8f75d
Optimize extract_imports_for_class
The optimized code achieves a **428% runtime speedup** (2.33ms → 441μs) by replacing the expensive `ast.walk(class_node)` traversal with direct iteration over `class_node.body`.

## Key Optimization

**Original approach**: Used `ast.walk(class_node)` which recursively visits every node in the AST subtree, including all nested function definitions, their arguments, return types, and deeply nested expression nodes. For a typical class with methods, this traverses ~2500 nodes.

**Optimized approach**: Iterates only `class_node.body`, which contains just the direct children of the class (typically 200-400 nodes for the same class). This is sufficient because:
- Type annotations for fields are in `class_node.body` as `ast.AnnAssign` nodes
- Field assignments with `field()` calls are in `class_node.body` as `ast.Assign` nodes
- Base classes and decorators are already extracted separately before the loop

The line profiler confirms this: the original's `ast.walk()` loop consumed **66% of total runtime** (12.76ms out of 19.3ms), while the optimized version's direct iteration takes only **2.3%** (112μs out of 4.96ms).
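
The difference in traversal size can be seen with a small, self-contained sketch. The `Config` class below is a made-up example, not code from the project; it only shows that annotated fields are reachable from `class_node.body` without walking the whole subtree.

```python
import ast

SOURCE = '''
class Config:
    name: str
    retries: int = 3

    def describe(self) -> str:
        parts = [self.name, str(self.retries)]
        return ", ".join(parts)
'''

class_node = ast.parse(SOURCE).body[0]

# Full recursive walk: every node under the class, including method bodies.
walked = sum(1 for _ in ast.walk(class_node))

# Direct iteration: only the top-level statements of the class body.
direct = len(class_node.body)

# Field annotations are ast.AnnAssign statements directly in the body.
annotated_fields = [
    stmt.target.id
    for stmt in class_node.body
    if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
]
```

Here `direct` counts three statements (two fields and one method), while `walked` also counts every expression node inside `describe`, which the field extraction never needs.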

## Additional Refinement

The optimized code also improves the `field()` detection by changing from checking `ast.Call` nodes anywhere in the tree to specifically checking `ast.Assign` nodes where the value is a `Call` with a `Name` func. This more accurately targets dataclass field assignments and uses `elif` to avoid redundant checks.

## Test Case Performance

The optimization excels across all test categories:
- **Simple classes** (2-3 fields): 186-436% faster
- **Complex annotations** (nested generics): 335-591% faster  
- **Large-scale tests** (50+ fields, 200 imports): 495-949% faster

The performance gain scales with class complexity because larger classes have more nested nodes that `ast.walk()` unnecessarily traverses, while the optimized version still only iterates the direct body elements.

## Impact on Workloads

Based on function_references, `extract_imports_for_class` is called from:
1. **Test suite replay tests** - indicating it's in a performance-critical testing path
2. **`get_code_optimization_context`** - suggesting it's used during code analysis/optimization workflows

Since the function extracts context for optimization decisions, the 428% speedup directly reduces latency in code analysis pipelines, making the optimization particularly valuable for CI/CD systems or developer tooling that analyzes many classes.
2026-02-04 01:09:50 +00:00
codeflash-ai[bot]
3ee339f075
Optimize get_external_base_class_inits
This optimization achieves a **343% speedup** (88.7ms → 20.0ms) by eliminating redundant expensive operations through strategic caching and deduplication.

## Key Optimizations

**1. Deduplication of External Base Classes**
- Changed from list to set (`external_bases_set`) to automatically deduplicate base class entries
- Prevents processing the same (base_name, module_name) pair multiple times
- Removed the need for the `extracted` tracking set and subsequent membership checks

**2. Module Project Check Caching**
- Added `is_project_cache` to memoize `_is_project_module()` results per module
- This is critical because the profiler shows `_is_project_module()` consumed **79%** of original runtime (265ms out of 336ms)
- Each call involves expensive `importlib.util.find_spec()` and `path_belongs_to_site_packages()` operations
- In the optimized version, this drops to just **16.7%** (11.9ms) since most modules are checked only once

**3. Module Import Caching**
- Added `imported_module_cache` to avoid repeated `importlib.import_module()` calls
- When multiple classes inherit from the same base, the module is imported only once
- Reduces import overhead from 4.84ms to 2.12ms in the line profiler
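
A simplified sketch of the deduplication and memoization described above, assuming the expensive check can be passed in as a callable. `collect_external_bases` and its parameters are hypothetical; only the `is_project_cache` idea and the set-based deduplication are taken from this message.

```python
from typing import Callable


def collect_external_bases(
    class_bases: list[tuple[str, str]],
    is_project_module: Callable[[str], bool],
) -> set[tuple[str, str]]:
    """Return unique (base_name, module_name) pairs from non-project modules."""
    # Memoize the expensive per-module check so each module is tested once.
    is_project_cache: dict[str, bool] = {}
    # A set deduplicates repeated (base_name, module_name) pairs automatically.
    external: set[tuple[str, str]] = set()
    for base_name, module_name in class_bases:
        if module_name not in is_project_cache:
            is_project_cache[module_name] = is_project_module(module_name)
        if not is_project_cache[module_name]:
            external.add((base_name, module_name))
    return external
```

With four input pairs spread over two modules, the check callable runs only twice, which mirrors why the profiler share of `_is_project_module()` drops once results are cached.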

## Performance Impact by Test Case

The optimization particularly excels when:
- **Multiple classes inherit from the same base**: `test_multiple_classes_same_base_extracted_once` shows 565% speedup (19.0ms → 2.86ms)
- **Large codebases with many classes**: `test_large_single_code_string` (500 classes) shows 1113% speedup (45.7ms → 3.77ms)
- **Many classes sharing a single external base**: `test_many_classes_single_external_base` (100 classes) shows 949% speedup (9.19ms → 875μs)

These improvements directly benefit production workloads since `function_references` shows this function is called from `get_code_optimization_context`, which is part of the code analysis pipeline. When analyzing projects with extensive class hierarchies that inherit from external libraries (like web frameworks, ORMs, or data processing libraries), the optimization prevents redundant module introspection and imports, making the code context extraction phase significantly faster.
2026-02-04 01:03:50 +00:00
Kevin Turcios
9f4776eb2e chore: migrate from pre-commit to prek
Replace pre-commit with prek (faster Rust-based alternative) for linting.
- Add prek to dev dependencies
- Replace pre-commit workflow with prek workflow using setup-uv@v6
- Update Claude workflow allowed tools to use prek
2026-02-03 19:56:58 -05:00
HeshamHM28
cde9709eea fix: preserve existing test filtering behavior while adding edge case support
Refined test file filtering to maintain backward compatibility:

When tests_root overlaps with source (monorepo structure):
- Apply filename patterns (.test., .spec., _test., _spec.)
- Apply directory patterns (test, tests, __tests__, testFixtures)

When tests_root doesn't overlap (separate test directory):
- Check if file is under tests_root
- Apply directory patterns (test, tests, __tests__, testFixtures)
- Do NOT apply filename patterns (maintains original behavior)

This preserves the existing behavior for non-overlapping cases while
adding support for Java/Gradle edge cases like src/main/test/ and
src/testFixtures/.
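
The two filtering branches above could be sketched roughly as follows. This is an assumption-laden illustration: `is_test_file`, its parameters, and the constants are hypothetical names, and the real logic in functions_to_optimize.py may differ in detail.

```python
from pathlib import PurePosixPath

# Patterns taken from the description above.
FILENAME_MARKERS = (".test.", ".spec.", "_test.", "_spec.")
TEST_DIR_NAMES = {"test", "tests", "__tests__", "testFixtures"}


def is_test_file(path: str, tests_root: str, overlaps_source: bool) -> bool:
    """Decide whether a file should be filtered out as a test file."""
    p = PurePosixPath(path)
    # Directory patterns apply in both branches.
    in_test_dir = any(part in TEST_DIR_NAMES for part in p.parent.parts)
    if overlaps_source:
        # Overlapping roots: filename patterns are applied as well.
        has_marker = any(marker in p.name for marker in FILENAME_MARKERS)
        return has_marker or in_test_dir
    # Separate test directory: membership under tests_root, no filename patterns.
    under_root = PurePosixPath(tests_root) in p.parents
    return under_root or in_test_dir
```

For example, `src/main/test/Helper.java` is filtered in the non-overlapping case because of its directory, while `src/main/java/foo_test.java` is not, since filename patterns are skipped there.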
2026-02-04 00:45:24 +00:00
HeshamHM28
e956e31bcd fix: filter test files with test-related directory patterns
Enhanced test file filtering to check for test-related patterns
(test, tests, __tests__, testFixtures) even when tests_root
doesn't overlap with source directories.

This fixes edge cases in Java/Gradle projects where:
- Files in src/main/test/ should be filtered as tests
- Files in src/testFixtures/ should be filtered as test fixtures

The previous logic only checked these patterns when tests_root
overlapped with source, missing these edge cases in multi-module
Java projects where tests are in separate directories.

Fixes test failures in test_filter_java_multimodule.py
2026-02-04 00:38:21 +00:00
HeshamHM28
2c48e9c9a9 feat: Add verbose logging for existing instrumented tests 2026-02-04 02:36:28 +02:00
aseembits93
92c54b7a6f move earlier to avoid more work 2026-02-03 16:34:07 -08:00
HeshamHM28
4ced2fb21a feat: Add verbose logging for Java optimization debugging
Add pretty-printed verbose logging in debug mode for:
- Code after replacement (with syntax highlighting)
- Instrumented behavioral tests
- Instrumented performance tests
- Test run stdout/stderr output

This helps debug the optimization pipeline by showing exactly what code
is being generated and what tests are being run.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 02:28:09 +02:00
Ubuntu
d6a209cd8b Merge branch 'omni-java' into feat/java-gradle-support
Resolve conflict in function_optimizer.py by keeping module-aware test root
detection logic from feat/java-gradle-support branch.
2026-02-04 00:18:53 +00:00
mashraf-222
65318e2de3
Merge pull request #1338 from codeflash-ai/fix/java-instrumented-test-cleanup
fix: add Java patterns to instrumented test file cleanup
2026-02-04 02:13:33 +02:00
codeflash-ai[bot]
3a44a2ff35
Optimize _collect_numerical_imports
The optimized code achieves a **158% speedup** (from 2.13ms to 823μs) by replacing `ast.walk()` with an explicit stack-based traversal using `ast.iter_child_nodes()`.

**What Changed:**
- Replaced the `for node in ast.walk(tree):` generator-based approach with a manual stack (`stack = [tree]`) and `while stack:` loop
- Added `stack.extend(ast.iter_child_nodes(node))` to traverse child nodes only when the current node isn't an Import or ImportFrom statement

**Why It's Faster:**
The key performance gain comes from **early pruning of the AST traversal**. In Python's AST:
- `ast.walk()` is a breadth-first traversal that visits **every single node** in the tree, regardless of whether we need to inspect them
- Import and ImportFrom statements are leaf-like nodes with no relevant children for our purposes
- The optimized version **skips traversing children** of Import/ImportFrom nodes by only calling `stack.extend()` in the `else` branch

Looking at the line profiler data confirms this:
- **Original**: `ast.walk(tree)` took **11.19ms** (77.3% of total runtime) across 1,778 node visits
- **Optimized**: The stack operations are distributed but the critical `stack.extend()` line only executes **204 times** (vs checking 1,778 nodes), taking 2.17ms (39.4% of total runtime)

The optimization reduces the number of nodes visited by **~44%** (from 1,778 to ~992 total iterations, based on the while loop hits), because once we identify an Import/ImportFrom node, we don't spend time visiting its children. The `stack.extend()` call itself runs only 204 times, about 89% fewer than the 1,778 nodes `ast.walk()` visited.
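
A minimal sketch of the pruned traversal, assuming a simplified goal of collecting imported module names. `collect_imports_pruned` is a hypothetical helper, not the real `_collect_numerical_imports`; it only demonstrates the explicit stack with `ast.iter_child_nodes()` and the `else`-branch pruning.

```python
import ast


def collect_imports_pruned(tree: ast.AST) -> tuple[list[str], int]:
    """Return imported module names and the number of nodes visited."""
    modules: list[str] = []
    visited = 0
    stack: list[ast.AST] = [tree]
    while stack:
        node = stack.pop()
        visited += 1
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                modules.append(node.module)
        else:
            # Only non-import nodes are expanded, so the alias children
            # of Import/ImportFrom statements are never pushed or visited.
            stack.extend(ast.iter_child_nodes(node))
    return modules, visited


SOURCE = """
import numpy as np
from math import sqrt, floor

def f(x):
    import scipy
    return np.sqrt(x)
"""

tree = ast.parse(SOURCE)
modules, visited = collect_imports_pruned(tree)
total_nodes = sum(1 for _ in ast.walk(tree))
```

Because the stack is LIFO, the order of `modules` differs from source order; the set of names is the same as a full walk would find, with fewer nodes visited.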

**Test Case Performance:**
The speedup is most dramatic for large-scale scenarios:
- `test_large_scale_many_imports`: **311% faster** (411μs → 100μs) - Many import statements benefit massively from avoiding unnecessary traversal
- `test_large_many_names_from_single_import`: **343% faster** (54.5μs → 12.3μs) - Large single import with many names
- `test_large_complex_submodule_structure`: **261% faster** (231μs → 64.1μs)

Even simple cases show consistent 80-140% improvements, demonstrating the overhead of `ast.walk()` is significant even for small trees.

**Impact on Workloads:**
This function collects numerical library imports, likely used for optimization analysis or dependency tracking in the Codeflash system. Since it processes ASTs of user code, any hot path that analyzes multiple files or large codebases will benefit substantially from this optimization. The stack-based approach is particularly effective because Python codebases typically have import statements at module-level or shallow nesting, making the early pruning strategy highly effective.
2026-02-04 00:11:17 +00:00
Ubuntu
5fe2cdf038 fix: use logging.getLogger instead of importing logger 2026-02-04 00:01:26 +00:00
Ubuntu
f51b3f23fd fix: add missing logger import for debug logging 2026-02-04 00:00:27 +00:00
aseembits93
de7bacabd3 unrelated precommit changes 2026-02-03 15:59:36 -08:00