186 commits

Author SHA1 Message Date
Kevin Turcios
ca951dd1f3 Rewrite sync instrumentation to decorator-based approach
Replace the old AST-injected codeflash_wrap/InjectPerfOnly sync path
with decorator-based instrumentation matching the async path:

- Add codeflash_performance_sync and codeflash_behavior_sync decorators
  with GPU device sync (torch CUDA/MPS, JAX, TensorFlow) via find_spec
- Add sync_devices_before/sync_devices_after with lazy cached detection
- Clean _instrumentation.py to a thin sync/async dispatcher (~47 lines)
- Remove dead code from _instrument_core.py (create_wrapper_function,
  create_device_sync_statements, get_call_arguments, etc.)
- Fix all production imports to point at source modules directly
- Drop underscore prefixes on internal helpers (connections, get_async_db,
  close_all_connections, detect_device_sync, etc.)
- Rewrite all test files for the new sync path assertions
- Add real-framework GPU device sync tests (torch, jax, tensorflow)
2026-04-24 05:54:32 -05:00
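Note on ca951dd1f3: the "lazy cached detection" via find_spec can be sketched as below. This is a minimal illustration, not the repository's code. The names `installed_frameworks` and `sync_torch_if_present` are hypothetical, and only the torch branch is shown.

```python
from __future__ import annotations

import functools
import importlib.util


@functools.lru_cache(maxsize=None)
def installed_frameworks(
    candidates: tuple[str, ...] = ("torch", "jax", "tensorflow"),
) -> tuple[str, ...]:
    """Return the candidate packages that are importable, without importing them."""
    return tuple(
        name for name in candidates if importlib.util.find_spec(name) is not None
    )


def sync_torch_if_present() -> bool:
    """Wait for queued CUDA work to finish; return True if a sync happened."""
    if "torch" not in installed_frameworks():
        return False
    import torch  # imported lazily, only once we know it is installed

    if torch.cuda.is_available():
        torch.cuda.synchronize()
        return True
    return False
```

Because the detection result is cached, repeated calls from a decorator's hot path should cost a dictionary lookup rather than a filesystem search.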
Kevin Turcios
918a2a10a4 feat: add sync instrumentation module with decorator-based approach
New _instrument_sync.py mirrors the async instrumentation pattern:
- SyncCallInstrumenter injects _codeflash_call_site.set() before sync calls
- SyncDecoratorAdder applies @codeflash_behavior_sync via libcst
- add_sync_decorator_to_function() decorates source files
- inject_sync_profiling_into_existing_test() instruments test files

Reuses the same helper file (codeflash_async_wrapper.py) since both
sync and async decorators live in _codeflash_async_decorators.py.
2026-04-24 04:54:45 -05:00
Kevin Turcios
8c218038e9 feat: add codeflash_behavior_sync decorator
Same pattern as the async behavior decorator: decorates the
function-under-test directly, captures return values, timing
(wall + CPU), and stdout into the shared async_results SQLite
table. This is the first step toward replacing the AST-injected
codeflash_wrap approach for sync functions.
2026-04-24 04:41:34 -05:00
Kevin Turcios
c9f65aba6b fix: capture stdout in async decorator and fix result merger
The async behavior decorator now captures stdout per invocation via
io.StringIO into a new `stdout` column in the async_results SQLite
table. The result merger prefers data-sourced stdout over XML stdout,
fixing the root cause of empty stdout in merged async results.

Also fixes: duplicate async parse block in _parse_results.py,
CODEFLASH_RUN_TMPDIR propagation to subprocesses, and removes
dead async code from _stdout_parsers.py and _wrap_decorator.py.
2026-04-24 04:35:02 -05:00
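Note on c9f65aba6b: per-invocation stdout capture with io.StringIO might look like the sketch below. It assumes `contextlib.redirect_stdout` as the capture mechanism, which the commit does not confirm, and it stores results in a plain list (`captured`, hypothetical) instead of the async_results SQLite table.

```python
import asyncio
import contextlib
import functools
import io

captured = []  # stand-in for the `stdout` column; hypothetical storage


def capture_stdout(fn):
    """Decorate an async function and record what each call prints."""

    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            result = await fn(*args, **kwargs)
        captured.append((fn.__name__, buffer.getvalue()))
        return result

    return wrapper


@capture_stdout
async def greet(name):
    print(f"hello {name}")
    return len(name)
```

Note that `redirect_stdout` swaps the process-wide `sys.stdout`, so this sketch is only reliable when instrumented calls do not overlap.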
Kevin Turcios
629d7f9f08 feat: rewrite async instrumentation to use SQLite-only data path and contextvars
Replace the fragile stdout tag protocol with a unified SQLite table
(async_results) for all 3 async test modes. The new runtime decorators
write behavior, performance, and concurrency results directly to the DB
with zero stdout output. Test-file instrumentation now injects
_codeflash_call_site.set() (contextvar) instead of os.environ assignments,
which is correct for async task isolation.

New modules:
- runtime/_codeflash_async_decorators.py: self-contained decorators
- testing/_async_data_parser.py: SQLite reader replacing stdout parsing

Both at 100% test coverage (42 new tests).
2026-04-24 03:44:06 -05:00
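Note on 629d7f9f08: the claim that a contextvar is "correct for async task isolation" while os.environ is not can be illustrated with the sketch below. The variable `call_site` and the key `EXAMPLE_CALL_SITE` are hypothetical stand-ins for `_codeflash_call_site` and the old environment assignments.

```python
from __future__ import annotations

import asyncio
import contextvars
import os

call_site: contextvars.ContextVar[str] = contextvars.ContextVar("call_site", default="")


async def via_contextvar(label: str) -> str:
    call_site.set(label)  # visible only inside this task's context
    await asyncio.sleep(0)  # yield so the other task gets to run
    return call_site.get()


async def via_environ(label: str) -> str:
    os.environ["EXAMPLE_CALL_SITE"] = label  # process-wide, shared by all tasks
    await asyncio.sleep(0)
    return os.environ["EXAMPLE_CALL_SITE"]


async def run_pair(worker) -> list[str]:
    # gather wraps each coroutine in its own task with a copied context
    return await asyncio.gather(worker("a"), worker("b"))
```

With `via_contextvar`, each task reads back its own label. With `via_environ`, the second task overwrites the value before the first one resumes, so both tasks observe the same label.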
Kevin Turcios
24199efc63 refactor: remove dead parameters from AsyncCallInstrumenter and inject_async_profiling
Drop unused module_path, mode, tests_project_root parameters and the
module_name_from_file_path import they required. Update all call sites.
2026-04-24 02:49:05 -05:00
Kevin Turcios
c670d637c0 refactor: clean up _instrument_async and add 100% test coverage
Remove dead code (unused fields, hasattr guard, duplicate decorator
set), rename _optimized_instrument_statement to _find_awaited_target_call,
simplify AsyncDecoratorAdder init and leave_FunctionDef. Add 21 new
unit tests covering all branches: non-test skipping, attribute calls,
class body recursion, counter independence, decorator deduplication
(name and call form), error handlers, and mode selection.
2026-04-24 02:45:07 -05:00
Kevin Turcios
2fd9d06e28 refactor: eliminate inline async decorator duplication and fix 10-column test gaps
Replace 218-line ASYNC_HELPER_INLINE_CODE string with shutil.copy2 of the
runtime decorator file. Update remaining test files for 10-column SQLite
schema (cpu_runtime). Add cpu_runtime assertions to async E2E tests.
2026-04-24 02:31:40 -05:00
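Note on 2fd9d06e28: replacing an inline source string with a file copy could be sketched as follows. The function name `install_helper` and the `demo` driver are hypothetical. The target filename is the helper file named in 918a2a10a4.

```python
import shutil
import tempfile
from pathlib import Path


def install_helper(
    source: Path, tests_dir: Path, name: str = "codeflash_async_wrapper.py"
) -> Path:
    """Copy the runtime decorator file next to the tests, keeping metadata."""
    tests_dir.mkdir(parents=True, exist_ok=True)
    target = tests_dir / name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


def demo() -> tuple:
    """Copy a throwaway file and report the target name and contents."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "decorators.py"
        source.write_text("X = 1\n")
        target = install_helper(source, Path(tmp) / "tests")
        return target.name, target.read_text()
```

Copying the real file means the helper used by tests cannot drift from the runtime module, which is the duplication the commit removes.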
Kevin Turcios
eb6a0be717 feat: add dual-clock instrumentation (wall + CPU time) and remove dead binary parser
Measure both wall-clock time (perf_counter_ns) and CPU thread time
(thread_time_ns) in instrumented test code. cpu_runtime is now a required
int field on FunctionTestInvocation, stored in the SQLite test_results
table as a 10th column.

Also fixes the sleeptime.py bug (10e9 → 1e9 divisor) and removes the
binary pickle parser (parse_test_return_values_bin) since no writer
exists in the current codebase — SQLite is the sole data capture path.
2026-04-24 02:21:22 -05:00
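Note on eb6a0be717: a minimal sketch of dual-clock measurement with `time.perf_counter_ns` and `time.thread_time_ns`. The helper name `measure` is hypothetical. The constant also shows why the divisor fix matters: `10e9` is `1e10`, ten times the number of nanoseconds in a second.

```python
import time

NS_PER_SECOND = 1e9  # 10e9 would be 1e10, ten times too large


def measure(fn, *args, **kwargs):
    """Return (result, wall_ns, cpu_ns) for a single call of fn."""
    wall_start = time.perf_counter_ns()
    cpu_start = time.thread_time_ns()
    result = fn(*args, **kwargs)
    cpu_ns = time.thread_time_ns() - cpu_start
    wall_ns = time.perf_counter_ns() - wall_start
    return result, wall_ns, cpu_ns
```

For a call that mostly sleeps or waits on I/O, wall time should be much larger than CPU thread time. For a CPU-bound call the two should be close.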
Kevin Turcios
0c622ac469 fix: loosen timing tolerance in time correction instrumentation tests
The busy-wait sleep function can overshoot by 90%+ under CPU contention
(observed 190ms for a 100ms target). The test verifies that
instrumentation produces runtimes in the right order of magnitude,
not that sleep timing is precise. Widen rel_tol from 0.05 to 1.0.
2026-04-24 01:38:06 -05:00
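Note on 0c622ac469: the effect of widening rel_tol can be checked with the numbers from the message. The sketch assumes the tolerance is measured relative to the target, as `pytest.approx(target, rel=...)` does. The helper `within_rel` is hypothetical, and the commit does not say which comparison the test uses.

```python
def within_rel(observed: float, target: float, rel_tol: float) -> bool:
    """True if observed is within rel_tol * target of target."""
    return abs(observed - target) <= rel_tol * abs(target)


# Observed 190 ms for a 100 ms target, as reported in the commit message.
strict = within_rel(190.0, 100.0, rel_tol=0.05)  # allows 95..105 ms
loose = within_rel(190.0, 100.0, rel_tol=1.0)  # allows 0..200 ms
far_off = within_rel(1000.0, 100.0, rel_tol=1.0)  # an order of magnitude out
```

Under this definition, a tolerance of 1.0 still rejects a runtime ten times the target. The symmetric `math.isclose` would not: with `rel_tol=1.0` it accepts any two values of the same sign.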
Kevin Turcios
fd88580ac8 test: add 262 tests for previously untested core modules
- test_danom_result.py: 58 tests for Ok/Err Result monad
- test_danom_stream.py: 65 tests for Stream pipeline operations
- test_model.py: 57 tests for core data models and serialization
- test_pipeline.py: 59 tests for pipeline utilities and candidate evaluation
- test_normalizer.py: 23 tests for code normalization including SyntaxError handling
2026-04-24 01:36:14 -05:00
Kevin Turcios
90a46d732c fix: harden error handling and add missing future annotations
Error handling:
- Protect ast.parse() in _normalizer.py (returns original on SyntaxError)
- Protect cst.parse_module() in _replacement.py (raises ValueError)
- Narrow except Exception to OSError/SyntaxError in _discovery.py (2 sites)
- Narrow except Exception to sqlite3.Error/OSError in _data_parsers.py
- Narrow pickle except to specific unpickling errors in _data_parsers.py

Missing future annotations:
- Add from __future__ import annotations to 12 __init__.py files
2026-04-24 01:36:04 -05:00
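Note on 90a46d732c: the "returns original on SyntaxError" behavior can be sketched as below. The function name `normalize` is hypothetical, and `ast.unparse` stands in for whatever normalization _normalizer.py actually performs.

```python
import ast


def normalize(code: str) -> str:
    """Return a canonical form of code, or the input unchanged if it does not parse."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return code  # fall back to the raw source instead of raising
    return ast.unparse(tree)
```

Callers that need to reject invalid source must now validate it themselves, since the fallback hides the parse failure.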
Kevin Turcios
6b73b07d15 fix: deduplicate code across codeflash-core and codeflash-python
- Extract _parse_candidates helper in _client.py (used by get_candidates and optimize_with_line_profiler)
- Parameterize URL resolution in _http.py (_resolve_url_from_env replaces two near-identical functions)
- Delegate get_repo_owner_and_name to parse_repo_owner_and_name in _git.py
- Simplify _par_apply_fns to delegate to _apply_fns in danom/stream.py
- Remove duplicate performance_gain from _verification.py (use codeflash_core's version)
- Extract _extract_pytest_error helper in _verification.py (replaces duplicated 6-line block)
- Consolidate collect_names_from_annotation into collect_type_names_from_annotation in _ast_helpers.py
- Add ast.Attribute handling and relax BinOp guard in collect_type_names_from_annotation
- Add unit tests for all extracted helpers
2026-04-23 22:39:50 -05:00
Kevin Turcios
ffadf16147 chore: add standup dashboard with CI audit integration (#36)
Dash app at .codeflash/standups/ for weekly eng meetings. Pulls live PR
data across 4 org repos, renders markdown standup notes, integrates CI
audit report with corrected billing numbers from real GitHub API data.
Deployed to Plotly Cloud.
2026-04-23 18:52:33 -05:00
Kevin Turcios
3ee9c22c8e fix: resolve all ruff lint errors across repo (#38)
* fix: resolve all ruff lint errors across repo

Auto-fixed 31 errors (unused imports, formatting, simplifications).
Manually fixed 14 remaining:
- EXE001: removed shebangs from non-executable bench scripts
- C417: replaced map(lambda) with generator expression
- C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests
- C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts
- RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign)
- FBT002: made boolean params keyword-only in report functions
- E402: moved `import re` to top of file in security reports

* fix: resolve pre-existing mypy errors across packages

- _testgen.py: annotate `generated` as `str` to avoid no-any-return
- _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str),
  remove unused type: ignore on proc.kill()
- _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return
  from lazy-loaded performance_gain
2026-04-23 10:22:42 -05:00
codeflash-ci-bot[bot]
c249bcd0ce chore: update tessl tiles 2026-04-23 (#35)
Co-authored-by: codeflash-ci-bot[bot] <codeflash-ci-bot[bot]@users.noreply.github.com>
2026-04-23 08:15:44 -05:00
Kevin Turcios
9bc9ff2250 chore: add ready-to-merge gate for branch freshness 2026-04-23 08:12:25 -05:00
Kevin Turcios
5d02df9890 Merge pull request #34 from codeflash-ai/fix/tessl-app-token
fix: pass CI bot secrets to tessl update workflow
2026-04-23 08:04:58 -05:00
Kevin Turcios
3851fd4cf9 fix: pass CI bot secrets to tessl update workflow 2026-04-23 08:04:06 -05:00
Kevin Turcios
82f16dc9f0 Merge pull request #33 from codeflash-ai/fix/tessl-caller-permissions
fix: add permissions to tessl update caller workflow
2026-04-23 07:50:43 -05:00
Kevin Turcios
fa1b5ece4e fix: add permissions to tessl update caller workflow 2026-04-23 07:49:56 -05:00
Kevin Turcios
01dc370d09 Merge pull request #32 from codeflash-ai/chore/tessl-vendored-setup
chore: initialize tessl with vendored tiles
2026-04-23 07:44:58 -05:00
Kevin Turcios
f83ece9ad8 chore: initialize tessl with vendored tiles
Install 32 Python tiles in vendored mode, add MCP configs for all
agents, and set up weekly tile update workflow via reusable
github-workflows caller.
2026-04-23 07:43:04 -05:00
Kevin Turcios
616d078ecd Add CI optimization item to roadmap 2026-04-23 05:59:08 -05:00
Kevin Turcios
851bd61017 Add project roadmap 2026-04-23 05:27:58 -05:00
Kevin Turcios
72a8610fcf Add vulture to dev dependencies for dead code detection 2026-04-23 04:57:32 -05:00
Kevin Turcios
57446aad31 Fix unawaited coroutine warning in test_default_timeout_is_600
The AsyncMock for wait_for discarded the coroutine from
proc.communicate() without consuming it. Replace with a side_effect
that closes the coroutine before returning the mock result.
2026-04-23 04:46:32 -05:00
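Note on 57446aad31: a sketch of the side_effect pattern described above, in which the mocked wait_for closes the coroutine it receives. The function under test (`communicate_with_timeout`) and the helper names are hypothetical. Only `asyncio.wait_for`, `AsyncMock`, and `patch` are real APIs.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def communicate_with_timeout(proc, timeout: float = 600):
    """Code under test: wait for the process, bounded by a timeout."""
    return await asyncio.wait_for(proc.communicate(), timeout=timeout)


def close_and_return(coro, timeout):
    """Stand-in for wait_for: close the coroutine so it is never left unawaited."""
    coro.close()
    return (b"out", b"")


def check_default_timeout():
    proc = AsyncMock()  # proc.communicate() returns a coroutine object
    fake_wait_for = AsyncMock(side_effect=close_and_return)
    with patch("asyncio.wait_for", fake_wait_for):
        output = asyncio.run(communicate_with_timeout(proc))
    return output, fake_wait_for.await_args.kwargs["timeout"]
```

A plain `AsyncMock(return_value=...)` would drop the coroutine untouched, which is what triggers the "coroutine was never awaited" warning.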
Kevin Turcios
76a07c7f66 Add __test__ = False to Test*-prefixed domain model classes
Pytest's default collection pattern matches any class starting with
"Test". These domain models (TestType, TestResults, TestFiles, etc.)
are not test classes — mark them explicitly so pytest skips them.
2026-04-23 04:41:18 -05:00
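Note on 76a07c7f66: a minimal sketch of the opt-out. The class body here is hypothetical. Only the class name and the `__test__ = False` marker come from the commit.

```python
import dataclasses


@dataclasses.dataclass
class TestFiles:
    """Domain model whose name happens to start with "Test"."""

    __test__ = False  # tells pytest not to collect this class

    paths: list
```

Since `__test__` has no annotation, the dataclass machinery treats it as a plain class attribute rather than a field, so the constructor signature is unchanged.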
Kevin Turcios
4f98b5421f Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope
These classes represent behavioral verification diffs, not tests. The
Test* prefix caused pytest to attempt collection and emit warnings.
2026-04-23 04:37:24 -05:00
Kevin Turcios
9e893675c9 Add Plotly Cloud deployment config for CI audit report 2026-04-23 03:59:35 -05:00
Kevin Turcios
c492164fbf Add codeflash org CI audit case study and interactive Dash report
Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README,
status, and raw data (fork activity, PRs merged).

Interactive Dash report in reports/codeflash-ci-audit/ with two tabs:
Executive Summary (hero metrics, cost impact charts, before/after) and
Full Detail (fork breakdown, findings table, PR inventory, methodology).

Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage
savings, 200+ forks disabled, 11 PRs merged across 2 repos.
2026-04-23 03:56:04 -05:00
Kevin Turcios
8221ce32a2 Add codeflash-ai/ci-audit to active case studies list 2026-04-23 03:52:10 -05:00
Kevin Turcios
0316d9822a Add testcontainers[postgres] to workspace dev dependencies
The 12 DB integration tests in codeflash-api need testcontainers to spin
up a real PostgreSQL instance via Docker. Was already declared in the
package's own dev deps but missing from the root workspace.
2026-04-23 03:52:07 -05:00
Kevin Turcios
bf8707695f Add codeflash-api to workspace dev dependencies
The package was a workspace member but not listed in the root dev
group, so its tests couldn't import codeflash_api when running
from the monorepo root.
2026-04-23 03:38:30 -05:00
Kevin Turcios
e3e74c3f2e Add missing pyproject.toml for codeflash-ci-audit workspace member 2026-04-23 03:34:03 -05:00
Kevin Turcios
e41a1bf56a Fix conftest collision between codeflash-api and github-app test suites
Both packages had tests/__init__.py, creating competing `tests`
packages under --import-mode=importlib. Remove both __init__.py files
and change github-app imports from `from tests.helpers` to
`from helpers` via sys.path insertion in conftest.py.
2026-04-23 03:33:58 -05:00
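Note on e41a1bf56a: the sys.path insertion could be sketched as below. The helper name `ensure_on_sys_path` is hypothetical. In a conftest.py it would presumably be called with the directory containing helpers.py.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory: Path) -> bool:
    """Put directory at the front of sys.path once; return True if it was added."""
    entry = str(directory.resolve())
    if entry in sys.path:
        return False
    sys.path.insert(0, entry)
    return True


# In conftest.py (assumed layout):
#     ensure_on_sys_path(Path(__file__).parent)
# after which `from helpers import ...` resolves without a `tests` package.
```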
Kevin Turcios
43a4009294 Fix callee syntax validation in prepare_python_module
normalize_code no longer raises SyntaxError (it returns raw code as
fallback), so validate callee source with ast.parse() explicitly
before normalizing. Fixes test_callee_syntax_error_returns_none.
2026-04-23 03:28:58 -05:00
Kevin Turcios
bd5613d22f Update test-coverage.md: remove resolved callouts for covered modules 2026-04-23 03:13:28 -05:00
Kevin Turcios
e1990092e0 Add tests for error handling paths in ranking, refinement, and state
- test_ranking: Update normalize_code test to expect fallback on invalid
  syntax instead of SyntaxError (matches new behavior)
- test_refinement: Add 7 tests for _parse_candidate markdown parsing
  (fenced blocks, file paths, multiple blocks, plain fallback)
- test_state: Add 6 tests for PythonState.module_ast and invalidate_module
  (valid parse, caching, SyntaxError→None, re-parse after fix)
2026-04-23 03:13:23 -05:00
Kevin Turcios
9e679f1c06 Fix error handling: add logging to bare excepts, protect ast.parse(), parse markdown in refinement
- _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that
  previously silently swallowed errors
- _state.py: Wrap ast.parse() in SyntaxError handler, return None for
  malformed files
- _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw
  code string for dedup
- _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to
  _parse_candidate(), matching the pattern in _candidate_gen.py
- Update error-handling.md rules to reflect resolved issues
2026-04-23 03:06:03 -05:00
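Note on 9e679f1c06: the logging change for previously silent handlers follows a pattern like the sketch below. The logger name and the helper `call_or_default` are hypothetical.

```python
import logging

log = logging.getLogger("example.tracing")


def call_or_default(fn, default=None):
    """Run fn; on failure, log the traceback and return default instead of raising."""
    try:
        return fn()
    except Exception:
        # exc_info=True attaches the active exception's traceback to the record
        log.warning("call failed, using default", exc_info=True)
        return default
```

The call still degrades gracefully, but the failure now leaves a traceback in the log instead of disappearing.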
Kevin Turcios
dd7d2db451 Add unit tests for _benchmark_worker subprocess script
5 tests covering module-level argv parsing, project_root derivation,
benchmark plugin and trace decorator imports, and __main__ guard.
2026-04-23 02:31:38 -05:00
Kevin Turcios
e2135e39b2 Add unit tests for vendored _tabulate module
64 tests covering: tabulate() with pipe/simple formats, empty/None
data, dict input, numeric alignment, float formatting, whitespace
preservation, separating lines, firstrow headers. Internal helpers:
type detection, number parsing, ANSI stripping, padding, multiline
detection, pipe segment alignment. Integration test matching the
_create_pr use case.
2026-04-23 02:31:38 -05:00
Kevin Turcios
276c2f36da Add unit tests for _discovery_worker (collection parsing, plugin)
Covers parse_pytest_collection_results with top-level functions,
class methods, and empty input. Tests PytestCollectionPlugin
benchmark skipping, collection_finish pickle output, and item
accumulation. Uses sys.argv patching to handle module-level reads.
2026-04-23 02:28:23 -05:00
Kevin Turcios
957f299243 Add unit tests for _create_pr (PR creation, suggestion, error paths)
Covers PR number env var parsing, suggest-changes vs create-pr
branching, branch push failure, GitHub App not-installed warning,
and generic API error logging.
2026-04-23 02:24:01 -05:00
Kevin Turcios
c31fbc1e43 Add unit tests for _trace_db (sanitize, trace queries, run time)
Covers sanitize_to_filename edge cases, get_traced_arguments with
class filtering and invalid event types, and get_trace_total_run_time_ns
with missing files/tables/empty tables.
2026-04-23 02:23:56 -05:00
Kevin Turcios
cf7cf60936 Add unit tests for _candidate_gen (generate, repair, refinement)
Covers happy paths and error paths for generate_candidates,
repair_failed_candidates, and generate_refinement_candidates.
Tests AI service errors, unparseable markdown, missing runtime
data, and repair failures.
2026-04-23 02:23:52 -05:00
Kevin Turcios
815eba00c0 Fix unawaited coroutine warning in test_skips_ai_test_generation
optimize() is now async, so the test must use async def + await.
2026-04-23 01:47:26 -05:00
Kevin Turcios
92e39d6923 Convert remaining sync test runner callers to async
Replace all sync test runner calls (run_behavioral_tests,
run_benchmarking_tests, run_line_profile_tests) with their async
counterparts throughout the pipeline. This eliminates the
ThreadPoolExecutor in _baseline.py in favor of asyncio.gather(),
and makes _async_bench.py, _candidate_gen.py, and
_function_optimizer.py fully async. Adds async_run_line_profile_tests
and coverage support to async_run_behavioral_tests in _test_runner.py.
2026-04-23 01:46:01 -05:00
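Note on 92e39d6923: replacing a ThreadPoolExecutor with asyncio.gather() can be sketched as below. The coroutine names are hypothetical stand-ins for the async test runners, and the sleeps stand in for subprocess waits.

```python
import asyncio


async def run_behavioral() -> str:
    await asyncio.sleep(0.01)  # placeholder for awaiting a test subprocess
    return "behavioral ok"


async def run_benchmarking() -> str:
    await asyncio.sleep(0.01)
    return "benchmarking ok"


async def establish_baseline() -> list:
    # Both runners make progress concurrently on one event loop, no threads needed.
    return await asyncio.gather(run_behavioral(), run_benchmarking())
```

`gather` returns results in argument order regardless of which coroutine finishes first, so callers can unpack them positionally.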
Kevin Turcios
a292698a1d Add pytest-cov to dev dependencies 2026-04-23 00:41:32 -05:00
Kevin Turcios
f204f8e740 Unify sync/async candidate eval into single async path
Delete the sync evaluate_candidate() and run_tests_and_benchmark()
functions — all callers now use the async versions. Rename
async_run_tests_and_benchmark → run_tests_and_benchmark and
async_evaluate_candidate_isolated → evaluate_candidate_isolated.

The entire optimization pipeline is now async with a single
asyncio.run() entry point in _cli.py:main(). PythonOptimizer.run()
and PythonFunctionOptimizer.optimize() are async coroutines.

Update test_candidate_eval.py and test_parallel_eval_integration.py
to match the unified API.
2026-04-23 00:41:28 -05:00