* chore: add gitignore entries for local eval repos, e2e fixtures, and env files
* fix: restore clean bubble_sort_method.py test fixture
The call-site ID commit re-contaminated this file with instrumentation
decorators, causing tests to fail with missing CODEFLASH_LOOP_INDEX.
* fix: resolve ruff and mypy errors in codeflash-python
- Add import-not-found ignores for optional torch/jax imports
- Extract magic column index to _STDOUT_COLUMN_INDEX constant
- Fix unused variable in _instrument_sync.py
- Cast cpu_time_ns to int for mypy arg-type
* fix: add skip markers for optional deps and apply ruff formatting to tests
Skip torch/jax/tensorflow tests when those packages are not installed.
Move has_module helper to conftest.py for reuse across test files.
Apply ruff format to all test files that drifted.
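
A minimal sketch of the skip-marker helper, assuming `has_module` wraps `importlib.util.find_spec`; the real helper body in conftest.py is not shown in this log, so treat the implementation as illustrative:

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialised packages.
        return False
```

A test module could then guard framework tests with `pytest.mark.skipif(not has_module("torch"), reason="torch not installed")`.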
* fix: resolve remaining ruff format and mypy errors
- Add missing blank line in conftest.py (ruff format)
- Remove unused import-untyped ignore on jax import (mypy unused-ignore)
- Add type: ignore comments for object-typed SQLite row values
* chore: bump codeflash-python to 0.1.1.dev0
Restore the old InjectPerfOnly behavior where call-site identifiers
are the source line number of the instrumented statement. Also fix
the sync integration test to properly apply the decorator and write
the helper file, and remove dead imports from test_instrumentation.
Merge async_results and test_results tables into one 15-column
codeflash_results table with a dedicated cpu_time_ns column.
Consolidate file pattern to codeflash_results_{N}.sqlite and
delete the now-unused _data_parsers.py module.
Replace the old AST-injected codeflash_wrap/InjectPerfOnly sync path
with decorator-based instrumentation matching the async path:
- Add codeflash_performance_sync and codeflash_behavior_sync decorators
with GPU device sync (torch CUDA/MPS, JAX, TensorFlow) via find_spec
- Add sync_devices_before/sync_devices_after with lazy cached detection
- Clean _instrumentation.py to a thin sync/async dispatcher (~47 lines)
- Remove dead code from _instrument_core.py (create_wrapper_function,
create_device_sync_statements, get_call_arguments, etc.)
- Fix all production imports to point at source modules directly
- Drop underscore prefixes on internal helpers (connections, get_async_db,
close_all_connections, detect_device_sync, etc.)
- Rewrite all test files for the new sync path assertions
- Add real-framework GPU device sync tests (torch, jax, tensorflow)
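
The lazy, cached device detection listed above could look roughly like the following. This is a sketch under assumptions: `detect_frameworks`, `sync_devices`, and `timed_sync` are hypothetical names, only the torch CUDA path is shown, and the real decorators also handle MPS, JAX, and TensorFlow.

```python
import functools
import time
from importlib.util import find_spec


@functools.lru_cache(maxsize=None)
def detect_frameworks() -> tuple:
    """Probe once for installed GPU frameworks without importing them."""
    candidates = ("torch", "jax", "tensorflow")
    return tuple(name for name in candidates if find_spec(name) is not None)


def sync_devices() -> None:
    """Block until queued GPU work finishes; a no-op when torch is unavailable."""
    if "torch" not in detect_frameworks():
        return
    try:
        import torch  # imported lazily, only when find_spec located it
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.synchronize()


def timed_sync(func):
    """Hypothetical decorator: sync devices around the call and record wall time."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        sync_devices()  # drain pending work so it is not billed to this call
        start = time.perf_counter_ns()
        result = func(*args, **kwargs)
        sync_devices()  # wait for work the call itself queued
        wrapper.last_wall_ns = time.perf_counter_ns() - start
        return result

    return wrapper
```

Caching the probe keeps the per-call overhead to a tuple lookup, which matters because the decorator sits inside the timed path.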
New _instrument_sync.py mirrors the async instrumentation pattern:
- SyncCallInstrumenter injects _codeflash_call_site.set() before sync calls
- SyncDecoratorAdder applies @codeflash_behavior_sync via libcst
- add_sync_decorator_to_function() decorates source files
- inject_sync_profiling_into_existing_test() instruments test files
Reuses the same helper file (codeflash_async_wrapper.py) since both
sync and async decorators live in _codeflash_async_decorators.py.
Same pattern as the async behavior decorator: decorates the
function-under-test directly, captures return values, timing
(wall + CPU), and stdout into the shared async_results SQLite
table. This is the first step toward replacing the AST-injected
codeflash_wrap approach for sync functions.
The async behavior decorator now captures stdout per invocation via
io.StringIO into a new `stdout` column in the async_results SQLite
table. The result merger prefers data-sourced stdout over XML stdout,
fixing the root cause of empty stdout in merged async results.
Also fixes: duplicate async parse block in _parse_results.py,
CODEFLASH_RUN_TMPDIR propagation to subprocesses, and removes
dead async code from _stdout_parsers.py and _wrap_decorator.py.
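
Per-invocation stdout capture of the kind described above can be sketched with `io.StringIO` and `contextlib.redirect_stdout`. The decorator name `capture_stdout` is hypothetical, and the real decorator writes to SQLite rather than returning a tuple.

```python
import asyncio
import contextlib
import functools
import io


def capture_stdout(func):
    """Hypothetical decorator: return (result, captured_stdout) for one awaited call."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            result = await func(*args, **kwargs)
        return result, buffer.getvalue()

    return wrapper


@capture_stdout
async def greet(name: str) -> int:
    print(f"hello {name}")
    return len(name)


result, stdout = asyncio.run(greet("ada"))
```

Note that `redirect_stdout` swaps `sys.stdout` for the whole process, so output from tasks running concurrently with the decorated call can end up in the same buffer.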
Replace the fragile stdout tag protocol with a unified SQLite table
(async_results) for all 3 async test modes. The new runtime decorators
write behavior, performance, and concurrency results directly to the DB
with zero stdout output. Test-file instrumentation now injects
_codeflash_call_site.set() (contextvar) instead of os.environ assignments,
which is correct for async task isolation.
New modules:
- runtime/_codeflash_async_decorators.py: self-contained decorators
- testing/_async_data_parser.py: SQLite reader replacing stdout parsing
Both at 100% test coverage (42 new tests).
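
To illustrate why a contextvar is the right carrier for the call-site ID in async code, here is a small sketch. `call_site` is a hypothetical stand-in for the injected `_codeflash_call_site` variable.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the injected call-site variable.
call_site = contextvars.ContextVar("call_site", default="")


async def instrumented_call(site_id: str) -> tuple:
    call_site.set(site_id)
    await asyncio.sleep(0)  # yield so the other tasks run and set their own IDs
    return site_id, call_site.get()


async def main() -> list:
    # gather() wraps each coroutine in a Task; each Task runs in its own context copy.
    return await asyncio.gather(*(instrumented_call(str(n)) for n in (10, 20, 30)))


results = asyncio.run(main())
```

Each task reads back the ID it set, even though the others ran in between. An `os.environ` assignment is process-wide, so the last writer would win.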
Remove dead code (unused fields, hasattr guard, duplicate decorator
set), rename _optimized_instrument_statement to _find_awaited_target_call,
simplify AsyncDecoratorAdder init and leave_FunctionDef. Add 21 new
unit tests covering all branches: non-test skipping, attribute calls,
class body recursion, counter independence, decorator deduplication
(name and call form), error handlers, and mode selection.
Replace 218-line ASYNC_HELPER_INLINE_CODE string with shutil.copy2 of the
runtime decorator file. Update remaining test files for 10-column SQLite
schema (cpu_runtime). Add cpu_runtime assertions to async E2E tests.
Measure both wall-clock time (perf_counter_ns) and CPU thread time
(thread_time_ns) in instrumented test code. cpu_runtime is now a required
int field on FunctionTestInvocation, stored in the SQLite test_results
table as a 10th column.
Also fixes the sleeptime.py bug (10e9 → 1e9 divisor) and removes the
binary pickle parser (parse_test_return_values_bin) since no writer
exists in the current codebase — SQLite is the sole data capture path.
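
A minimal sketch of the dual measurement and the divisor fix. The helper names `measure` and `ns_to_seconds` are hypothetical; only `perf_counter_ns` and `thread_time_ns` come from the commit above.

```python
import time


def measure(func, *args, **kwargs):
    """Return (result, wall_ns, cpu_ns) for a single call."""
    wall_start = time.perf_counter_ns()
    cpu_start = time.thread_time_ns()
    result = func(*args, **kwargs)
    cpu_ns = time.thread_time_ns() - cpu_start
    wall_ns = time.perf_counter_ns() - wall_start
    return result, wall_ns, cpu_ns


def ns_to_seconds(ns: int) -> float:
    # One second is 1e9 ns. The literal 10e9 equals 1e10, a factor of 10 off.
    return ns / 1e9


# Sleeping consumes wall-clock time but very little CPU time on this thread.
result, wall_ns, cpu_ns = measure(time.sleep, 0.01)
```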
The busy-wait sleep function can overshoot by 90%+ under CPU contention
(observed 190ms for a 100ms target). The test verifies that
instrumentation produces runtimes in the right order of magnitude,
not that sleep timing is precise. Widen rel_tol from 0.05 to 1.0.
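
For reference, the effect of the tolerance change on the observed overshoot, assuming `rel_tol` here is the `math.isclose` parameter (the test's actual comparison call is not shown in this log):

```python
import math

target_ms = 100.0
observed_ms = 190.0  # overshoot reported under CPU contention

# isclose scales rel_tol by the larger of the two magnitudes.
strict = math.isclose(observed_ms, target_ms, rel_tol=0.05)  # 90 > 0.05 * 190
loose = math.isclose(observed_ms, target_ms, rel_tol=1.0)    # 90 <= 1.0 * 190
```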
- test_danom_result.py: 58 tests for Ok/Err Result monad
- test_danom_stream.py: 65 tests for Stream pipeline operations
- test_model.py: 57 tests for core data models and serialization
- test_pipeline.py: 59 tests for pipeline utilities and candidate evaluation
- test_normalizer.py: 23 tests for code normalization including SyntaxError handling
Error handling:
- Protect ast.parse() in _normalizer.py (returns original on SyntaxError)
- Protect cst.parse_module() in _replacement.py (raises ValueError)
- Narrow except Exception to OSError/SyntaxError in _discovery.py (2 sites)
- Narrow except Exception to sqlite3.Error/OSError in _data_parsers.py
- Narrow pickle except to specific unpickling errors in _data_parsers.py
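
The exception narrowing for the SQLite reader can be sketched as follows. `read_results` and the `results` table name are hypothetical; the point is that only database and I/O failures are swallowed.

```python
import sqlite3
from contextlib import closing


def read_results(db_path: str) -> list:
    """Hypothetical reader: return rows, or [] when the DB or table is unusable."""
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            return conn.execute("SELECT * FROM results").fetchall()
    except (sqlite3.Error, OSError):
        # Expected database and I/O failures only; programming errors
        # such as TypeError or NameError still propagate.
        return []
```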
Missing future annotations:
- Add from __future__ import annotations to 12 __init__.py files
- Extract _parse_candidates helper in _client.py (used by get_candidates and optimize_with_line_profiler)
- Parameterize URL resolution in _http.py (_resolve_url_from_env replaces two near-identical functions)
- Delegate get_repo_owner_and_name to parse_repo_owner_and_name in _git.py
- Simplify _par_apply_fns to delegate to _apply_fns in danom/stream.py
- Remove duplicate performance_gain from _verification.py (use codeflash_core's version)
- Extract _extract_pytest_error helper in _verification.py (replaces duplicated 6-line block)
- Consolidate collect_names_from_annotation into collect_type_names_from_annotation in _ast_helpers.py
- Add ast.Attribute handling and relax BinOp guard in collect_type_names_from_annotation
- Add unit tests for all extracted helpers
Dash app at .codeflash/standups/ for weekly eng meetings. Pulls live PR
data across 4 org repos, renders markdown standup notes, and integrates
the CI audit report with corrected billing numbers from real GitHub API
data. Deployed to Plotly Cloud.
The AsyncMock for wait_for discarded the coroutine from
proc.communicate() without consuming it. Replace with a side_effect
that closes the coroutine before returning the mock result.
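
A self-contained sketch of the side_effect approach. `make_wait_for_mock` and `fake_communicate` are hypothetical names; in the real test the mock would be patched in place of `asyncio.wait_for`.

```python
import asyncio
import inspect
from unittest.mock import AsyncMock


def make_wait_for_mock(result):
    """Build an AsyncMock for wait_for that closes the awaitable it receives."""

    def close_and_return(awaitable, timeout=None):
        if inspect.iscoroutine(awaitable):
            awaitable.close()  # avoids the "was never awaited" RuntimeWarning
        return result

    return AsyncMock(side_effect=close_and_return)


async def fake_communicate():
    return b"real stdout", b""


async def demo():
    wait_for = make_wait_for_mock((b"mocked stdout", b""))
    pending = fake_communicate()
    output = await wait_for(pending, timeout=1)
    return output, inspect.getcoroutinestate(pending)


output, state = asyncio.run(demo())
```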
Pytest's default collection pattern matches any class starting with
"Test". These domain models (TestType, TestResults, TestFiles, etc.)
are not test classes — mark them explicitly so pytest skips them.
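
One common way to mark such classes, assuming the explicit marker is pytest's `__test__ = False` attribute (the log does not name the mechanism), shown on a simplified model with hypothetical fields:

```python
from dataclasses import dataclass, fields


@dataclass
class TestResults:
    """Domain model, not a test class, despite the Test prefix."""

    __test__ = False  # pytest checks this attribute and skips collection

    passed: int = 0
    failed: int = 0


# The unannotated class attribute does not become a dataclass field.
field_names = [f.name for f in fields(TestResults)]
```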
Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README,
status, and raw data (fork activity, PRs merged).
Interactive Dash report in reports/codeflash-ci-audit/ with two tabs:
Executive Summary (hero metrics, cost impact charts, before/after) and
Full Detail (fork breakdown, findings table, PR inventory, methodology).
Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage
savings, 200+ forks disabled, 11 PRs merged across 2 repos.
The 12 DB integration tests in codeflash-api need testcontainers to spin
up a real PostgreSQL instance via Docker. The dependency was already
declared in the package's own dev deps but was missing from the root
workspace.
The package was a workspace member but not listed in the root dev
group, so its tests couldn't import codeflash_api when running
from the monorepo root.
Both packages had tests/__init__.py, creating competing `tests`
packages under --import-mode=importlib. Remove both __init__.py files
and change github-app imports from `from tests.helpers` to
`from helpers` via sys.path insertion in conftest.py.
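
The conftest.py path insertion might look like this sketch. `add_to_sys_path` is a hypothetical helper; the actual conftest may insert the path inline.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> None:
    """Prepend a directory to sys.path once so its modules import by bare name."""
    entry = str(directory.resolve())
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In conftest.py this would typically be called as:
#     add_to_sys_path(Path(__file__).parent)
```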
normalize_code no longer raises SyntaxError (it returns raw code as
fallback), so validate callee source with ast.parse() explicitly
before normalizing. Fixes test_callee_syntax_error_returns_none.
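
A sketch of the explicit validation step. `normalized_callee` is a hypothetical name, and `ast.unparse` stands in for the real `normalize_code` call.

```python
import ast
from typing import Optional


def normalized_callee(source: str) -> Optional[str]:
    """Hypothetical caller: return None for callee source that does not parse."""
    try:
        tree = ast.parse(source)  # explicit check; the normalizer no longer raises
    except SyntaxError:
        return None
    return ast.unparse(tree)  # stand-in for the real normalization step
```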
- _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that
previously silently swallowed errors
- _state.py: Wrap ast.parse() in SyntaxError handler, return None for
malformed files
- _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw
code string for dedup
- _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to
_parse_candidate(), matching the pattern in _candidate_gen.py
- Update error-handling.md rules to reflect resolved issues
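
The logging change in _tracing.py follows a pattern like the one below. `safe_call` is a hypothetical wrapper used only to show the `exc_info=True` call; the real except blocks sit inline in the tracing code.

```python
import logging

log = logging.getLogger(__name__)


def safe_call(func, *args):
    """Run func, logging the full traceback instead of silently swallowing errors."""
    try:
        return func(*args)
    except Exception:
        # exc_info=True attaches the active exception and traceback to the record.
        log.warning("Call to %s failed", getattr(func, "__name__", func), exc_info=True)
        return None
```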
Covers PR number env var parsing, suggest-changes vs create-pr
branching, branch push failure, GitHub App not-installed warning,
and generic API error logging.
Covers sanitize_to_filename edge cases, get_traced_arguments with
class filtering and invalid event types, and get_trace_total_run_time_ns
with missing files/tables/empty tables.