189 commits

Author SHA1 Message Date
Kevin Turcios
2c9f2ad8de Fix call-site IDs to use source line numbers instead of sequential counter
Restore the old InjectPerfOnly behavior where call-site identifiers
are the source line number of the instrumented statement. Also fix
the sync integration test to properly apply the decorator and write
the helper file, and remove dead imports from test_instrumentation.
2026-04-24 07:12:45 -05:00
Kevin Turcios
5b20981cd4 Unify SQLite schema into single codeflash_results table
Merge async_results and test_results tables into one 15-column
codeflash_results table with a dedicated cpu_time_ns column.
Consolidate file pattern to codeflash_results_{N}.sqlite and
delete the now-unused _data_parsers.py module.
2026-04-24 07:12:34 -05:00
Kevin Turcios
ba001950ee fix: restore clean bubble_sort_method.py test fixture
2026-04-24 05:55:32 -05:00
Kevin Turcios
ca951dd1f3 Rewrite sync instrumentation to decorator-based approach
Replace the old AST-injected codeflash_wrap/InjectPerfOnly sync path
with decorator-based instrumentation matching the async path:

- Add codeflash_performance_sync and codeflash_behavior_sync decorators
  with GPU device sync (torch CUDA/MPS, JAX, TensorFlow) via find_spec
- Add sync_devices_before/sync_devices_after with lazy cached detection
- Clean _instrumentation.py to a thin sync/async dispatcher (~47 lines)
- Remove dead code from _instrument_core.py (create_wrapper_function,
  create_device_sync_statements, get_call_arguments, etc.)
- Fix all production imports to point at source modules directly
- Drop underscore prefixes on internal helpers (connections, get_async_db,
  close_all_connections, detect_device_sync, etc.)
- Rewrite all test files for the new sync path assertions
- Add real-framework GPU device sync tests (torch, jax, tensorflow)
2026-04-24 05:54:32 -05:00
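The lazy, cached framework detection mentioned in ca951dd1f3 could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's code: `available_frameworks` and `CANDIDATE_FRAMEWORKS` are hypothetical names, and only the idea of probing for torch, JAX, and TensorFlow with `find_spec` comes from the commit message.

```python
import functools
import importlib.util

# Hypothetical list; the commit message names torch, JAX, and TensorFlow.
CANDIDATE_FRAMEWORKS = ("torch", "jax", "tensorflow")


@functools.lru_cache(maxsize=None)
def available_frameworks() -> tuple:
    """Return the candidate frameworks that are installed, without importing them."""
    found = []
    for name in CANDIDATE_FRAMEWORKS:
        # find_spec only locates a top-level module; it does not execute it,
        # so heavy frameworks are not loaded just to check for their presence.
        if importlib.util.find_spec(name) is not None:
            found.append(name)
    return tuple(found)
```

Because the result is cached, the filesystem probe runs once per process; later calls return the same tuple.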
Kevin Turcios
918a2a10a4 feat: add sync instrumentation module with decorator-based approach
New _instrument_sync.py mirrors the async instrumentation pattern:
- SyncCallInstrumenter injects _codeflash_call_site.set() before sync calls
- SyncDecoratorAdder applies @codeflash_behavior_sync via libcst
- add_sync_decorator_to_function() decorates source files
- inject_sync_profiling_into_existing_test() instruments test files

Reuses the same helper file (codeflash_async_wrapper.py) since both
sync and async decorators live in _codeflash_async_decorators.py.
2026-04-24 04:54:45 -05:00
Kevin Turcios
8c218038e9 feat: add codeflash_behavior_sync decorator
Same pattern as the async behavior decorator: decorates the
function-under-test directly, captures return values, timing
(wall + CPU), and stdout into the shared async_results SQLite
table. This is the first step toward replacing the AST-injected
codeflash_wrap approach for sync functions.
2026-04-24 04:41:34 -05:00
Kevin Turcios
c9f65aba6b fix: capture stdout in async decorator and fix result merger
The async behavior decorator now captures stdout per invocation via
io.StringIO into a new `stdout` column in the async_results SQLite
table. The result merger prefers data-sourced stdout over XML stdout,
fixing the root cause of empty stdout in merged async results.

Also fixes: duplicate async parse block in _parse_results.py,
CODEFLASH_RUN_TMPDIR propagation to subprocesses, and removes
dead async code from _stdout_parsers.py and _wrap_decorator.py.
2026-04-24 04:35:02 -05:00
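The per-invocation stdout capture described in c9f65aba6b can be sketched as follows. This is a minimal illustration, assuming `contextlib.redirect_stdout` is an acceptable capture mechanism; `capture_stdout` and `greet` are hypothetical names, and the real decorator also writes to SQLite, which is omitted here.

```python
import asyncio
import contextlib
import functools
import io


def capture_stdout(func):
    """Wrap an async function so each call returns (result, captured_stdout)."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        # redirect_stdout swaps sys.stdout process-wide while the block runs,
        # so output from concurrently running tasks could land in this buffer.
        with contextlib.redirect_stdout(buffer):
            result = await func(*args, **kwargs)
        return result, buffer.getvalue()

    return wrapper


@capture_stdout
async def greet(name):
    print(f"hello {name}")
    return name.upper()


result, stdout = asyncio.run(greet("ada"))
```

Here `result` holds the return value and `stdout` holds the text printed during that single call.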
Kevin Turcios
629d7f9f08 feat: rewrite async instrumentation to use SQLite-only data path and contextvars
Replace the fragile stdout tag protocol with a unified SQLite table
(async_results) for all 3 async test modes. The new runtime decorators
write behavior, performance, and concurrency results directly to the DB
with zero stdout output. Test-file instrumentation now injects
_codeflash_call_site.set() (contextvar) instead of os.environ assignments,
which is correct for async task isolation.

New modules:
- runtime/_codeflash_async_decorators.py: self-contained decorators
- testing/_async_data_parser.py: SQLite reader replacing stdout parsing

Both at 100% test coverage (42 new tests).
2026-04-24 03:44:06 -05:00
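The reason 629d7f9f08 gives for preferring a contextvar over `os.environ` can be illustrated with a small sketch. The variable name `call_site` and the coroutine `record` are hypothetical; only the use of `ContextVar.set()` for per-task call-site identity comes from the commit message.

```python
import asyncio
import contextvars

# Hypothetical stand-in for _codeflash_call_site.
call_site = contextvars.ContextVar("call_site", default="")


async def record(site, delay):
    # Each task runs in its own copy of the context, so this set() is
    # invisible to sibling tasks and to the parent coroutine.
    call_site.set(site)
    await asyncio.sleep(delay)
    return call_site.get()


async def main():
    seen = await asyncio.gather(record("12", 0.02), record("34", 0.01))
    # The parent context still holds the default value.
    return seen, call_site.get()


results, parent_value = asyncio.run(main())
```

With a process-wide `os.environ` entry, the second task's assignment would overwrite the first while it is suspended; with a contextvar, each task reads back its own value.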
Kevin Turcios
24199efc63 refactor: remove dead parameters from AsyncCallInstrumenter and inject_async_profiling
Drop unused module_path, mode, tests_project_root parameters and the
module_name_from_file_path import they required. Update all call sites.
2026-04-24 02:49:05 -05:00
Kevin Turcios
c670d637c0 refactor: clean up _instrument_async and add 100% test coverage
Remove dead code (unused fields, hasattr guard, duplicate decorator
set), rename _optimized_instrument_statement to _find_awaited_target_call,
simplify AsyncDecoratorAdder init and leave_FunctionDef. Add 21 new
unit tests covering all branches: non-test skipping, attribute calls,
class body recursion, counter independence, decorator deduplication
(name and call form), error handlers, and mode selection.
2026-04-24 02:45:07 -05:00
Kevin Turcios
2fd9d06e28 refactor: eliminate inline async decorator duplication and fix 10-column test gaps
Replace 218-line ASYNC_HELPER_INLINE_CODE string with shutil.copy2 of the
runtime decorator file. Update remaining test files for 10-column SQLite
schema (cpu_runtime). Add cpu_runtime assertions to async E2E tests.
2026-04-24 02:31:40 -05:00
Kevin Turcios
eb6a0be717 feat: add dual-clock instrumentation (wall + CPU time) and remove dead binary parser
Measure both wall-clock time (perf_counter_ns) and CPU thread time
(thread_time_ns) in instrumented test code. cpu_runtime is now a required
int field on FunctionTestInvocation, stored in the SQLite test_results
table as a 10th column.

Also fixes the sleeptime.py bug (10e9 → 1e9 divisor) and removes the
binary pickle parser (parse_test_return_values_bin) since no writer
exists in the current codebase — SQLite is the sole data capture path.
2026-04-24 02:21:22 -05:00
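The dual-clock measurement and the divisor fix in eb6a0be717 can be sketched as below. `measure` and `NS_PER_SECOND` are hypothetical names used only for illustration; the two clock functions are the ones named in the commit message.

```python
import time

# Nanoseconds per second is 1e9. The literal 10e9 equals 1e10, which is the
# kind of off-by-a-factor-of-ten divisor the commit message describes fixing.
NS_PER_SECOND = 1_000_000_000


def measure(func, *args, **kwargs):
    """Run func once and return (result, wall_ns, cpu_ns)."""
    wall_start = time.perf_counter_ns()   # wall-clock time
    cpu_start = time.thread_time_ns()     # CPU time of the current thread
    result = func(*args, **kwargs)
    cpu_ns = time.thread_time_ns() - cpu_start
    wall_ns = time.perf_counter_ns() - wall_start
    return result, wall_ns, cpu_ns


total, wall_ns, cpu_ns = measure(sum, range(1000))
wall_seconds = wall_ns / NS_PER_SECOND
```

Comparing the two values helps separate time spent computing from time spent sleeping or waiting, since sleep does not count toward thread CPU time.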
Kevin Turcios
0c622ac469 fix: loosen timing tolerance in time correction instrumentation tests
The busy-wait sleep function can overshoot by 90%+ under CPU contention
(observed 190ms for a 100ms target). The test verifies that
instrumentation produces runtimes in the right order of magnitude,
not that sleep timing is precise. Widen rel_tol from 0.05 to 1.0.
2026-04-24 01:38:06 -05:00
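The arithmetic behind the tolerance change in 0c622ac469 can be checked with a short sketch. It assumes the test uses a symmetric relative rule of the kind `math.isclose` documents; the helper `within_rel_tol` is hypothetical and written out by hand so the formula is visible.

```python
def within_rel_tol(measured, target, rel_tol):
    """Symmetric relative check: |a - b| <= rel_tol * max(|a|, |b|)."""
    return abs(measured - target) <= rel_tol * max(abs(measured), abs(target))


target_ms = 100.0
observed_ms = 190.0  # the overshoot reported in the commit message

# Old tolerance: 90 <= 0.05 * 190 = 9.5 is false, so the test fails.
strict = within_rel_tol(observed_ms, target_ms, 0.05)
# New tolerance: 90 <= 1.0 * 190 = 190 is true, so the test passes.
loose = within_rel_tol(observed_ms, target_ms, 1.0)
```

Under this particular rule, a `rel_tol` of 1.0 accepts any pair of non-negative values, so how strict the widened check remains depends on which comparison helper the test actually uses.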
Kevin Turcios
fd88580ac8 test: add 262 tests for previously untested core modules
- test_danom_result.py: 58 tests for Ok/Err Result monad
- test_danom_stream.py: 65 tests for Stream pipeline operations
- test_model.py: 57 tests for core data models and serialization
- test_pipeline.py: 59 tests for pipeline utilities and candidate evaluation
- test_normalizer.py: 23 tests for code normalization including SyntaxError handling
2026-04-24 01:36:14 -05:00
Kevin Turcios
90a46d732c fix: harden error handling and add missing future annotations
Error handling:
- Protect ast.parse() in _normalizer.py (returns original on SyntaxError)
- Protect cst.parse_module() in _replacement.py (raises ValueError)
- Narrow except Exception to OSError/SyntaxError in _discovery.py (2 sites)
- Narrow except Exception to sqlite3.Error/OSError in _data_parsers.py
- Narrow pickle except to specific unpickling errors in _data_parsers.py

Missing future annotations:
- Add from __future__ import annotations to 12 __init__.py files
2026-04-24 01:36:04 -05:00
Kevin Turcios
6b73b07d15 fix: deduplicate code across codeflash-core and codeflash-python
- Extract _parse_candidates helper in _client.py (used by get_candidates and optimize_with_line_profiler)
- Parameterize URL resolution in _http.py (_resolve_url_from_env replaces two near-identical functions)
- Delegate get_repo_owner_and_name to parse_repo_owner_and_name in _git.py
- Simplify _par_apply_fns to delegate to _apply_fns in danom/stream.py
- Remove duplicate performance_gain from _verification.py (use codeflash_core's version)
- Extract _extract_pytest_error helper in _verification.py (replaces duplicated 6-line block)
- Consolidate collect_names_from_annotation into collect_type_names_from_annotation in _ast_helpers.py
- Add ast.Attribute handling and relax BinOp guard in collect_type_names_from_annotation
- Add unit tests for all extracted helpers
2026-04-23 22:39:50 -05:00
Kevin Turcios
ffadf16147 chore: add standup dashboard with CI audit integration (#36)
Dash app at .codeflash/standups/ for weekly eng meetings. Pulls live PR data across 4 org repos, renders markdown standup notes, integrates CI audit report with corrected billing numbers from real GitHub API data. Deployed to Plotly Cloud.
2026-04-23 18:52:33 -05:00
Kevin Turcios
3ee9c22c8e fix: resolve all ruff lint errors across repo (#38)
* fix: resolve all ruff lint errors across repo

Auto-fixed 31 errors (unused imports, formatting, simplifications).
Manually fixed 14 remaining:
- EXE001: removed shebangs from non-executable bench scripts
- C417: replaced map(lambda) with generator expression
- C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests
- C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts
- RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign)
- FBT002: made boolean params keyword-only in report functions
- E402: moved `import re` to top of file in security reports

* fix: resolve pre-existing mypy errors across packages

- _testgen.py: annotate `generated` as `str` to avoid no-any-return
- _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str),
  remove unused type: ignore on proc.kill()
- _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return
  from lazy-loaded performance_gain
2026-04-23 10:22:42 -05:00
codeflash-ci-bot[bot]
c249bcd0ce chore: update tessl tiles 2026-04-23 (#35)
Co-authored-by: codeflash-ci-bot[bot] <codeflash-ci-bot[bot]@users.noreply.github.com>
2026-04-23 08:15:44 -05:00
Kevin Turcios
9bc9ff2250 chore: add ready-to-merge gate for branch freshness
2026-04-23 08:12:25 -05:00
Kevin Turcios
5d02df9890 Merge pull request #34 from codeflash-ai/fix/tessl-app-token
fix: pass CI bot secrets to tessl update workflow
2026-04-23 08:04:58 -05:00
Kevin Turcios
3851fd4cf9 fix: pass CI bot secrets to tessl update workflow
2026-04-23 08:04:06 -05:00
Kevin Turcios
82f16dc9f0 Merge pull request #33 from codeflash-ai/fix/tessl-caller-permissions
fix: add permissions to tessl update caller workflow
2026-04-23 07:50:43 -05:00
Kevin Turcios
fa1b5ece4e fix: add permissions to tessl update caller workflow
2026-04-23 07:49:56 -05:00
Kevin Turcios
01dc370d09 Merge pull request #32 from codeflash-ai/chore/tessl-vendored-setup
chore: initialize tessl with vendored tiles
2026-04-23 07:44:58 -05:00
Kevin Turcios
f83ece9ad8 chore: initialize tessl with vendored tiles
Install 32 Python tiles in vendored mode, add MCP configs for all
agents, and set up weekly tile update workflow via reusable
github-workflows caller.
2026-04-23 07:43:04 -05:00
Kevin Turcios
616d078ecd Add CI optimization item to roadmap
2026-04-23 05:59:08 -05:00
Kevin Turcios
851bd61017 Add project roadmap
2026-04-23 05:27:58 -05:00
Kevin Turcios
72a8610fcf Add vulture to dev dependencies for dead code detection
2026-04-23 04:57:32 -05:00
Kevin Turcios
57446aad31 Fix unawaited coroutine warning in test_default_timeout_is_600
The AsyncMock for wait_for discarded the coroutine from
proc.communicate() without consuming it. Replace with a side_effect
that closes the coroutine before returning the mock result.
2026-04-23 04:46:32 -05:00
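The fix described in 57446aad31 can be sketched with the standard library alone. The names `communicate`, `close_and_return`, and `fake_wait_for` are hypothetical stand-ins; the technique shown is the one named in the commit message, a `side_effect` that closes the pending coroutine before returning the mock result.

```python
import asyncio
import inspect
from unittest.mock import AsyncMock


async def communicate():
    """Stand-in for proc.communicate(); hypothetical for this sketch."""
    return b"real-out", b"real-err"


def close_and_return(awaitable, timeout=None):
    # Closing the coroutine marks it as finished, so Python does not emit
    # "coroutine ... was never awaited" when it is garbage collected.
    awaitable.close()
    return b"mock-out", b""


async def main():
    fake_wait_for = AsyncMock(side_effect=close_and_return)
    pending = communicate()
    result = await fake_wait_for(pending, timeout=600)
    state = inspect.getcoroutinestate(pending)
    return result, state, fake_wait_for.call_args.kwargs


result, state, kwargs = asyncio.run(main())
```

The mock still records the `timeout` argument, so an assertion on the default timeout keeps working while the discarded coroutine is cleaned up explicitly.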
Kevin Turcios
76a07c7f66 Add __test__ = False to Test*-prefixed domain model classes
Pytest's default collection pattern matches any class starting with
"Test". These domain models (TestType, TestResults, TestFiles, etc.)
are not test classes — mark them explicitly so pytest skips them.
2026-04-23 04:41:18 -05:00
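The marker described in 76a07c7f66 is a plain class attribute. A minimal sketch, assuming a dataclass-style model; the fields shown are hypothetical and not the real `TestResults` definition:

```python
from dataclasses import dataclass, fields


@dataclass
class TestResults:
    # Tells pytest not to collect this class even though its name starts
    # with "Test". It has no annotation, so it is not a dataclass field.
    __test__ = False

    passed: int = 0
    failed: int = 0


summary = TestResults(passed=2)
field_names = [f.name for f in fields(TestResults)]
```

The attribute affects only pytest collection; the class behaves as an ordinary data model everywhere else.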
Kevin Turcios
4f98b5421f Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope
These classes represent behavioral verification diffs, not tests. The
Test* prefix caused pytest to attempt collection and emit warnings.
2026-04-23 04:37:24 -05:00
Kevin Turcios
9e893675c9 Add Plotly Cloud deployment config for CI audit report
2026-04-23 03:59:35 -05:00
Kevin Turcios
c492164fbf Add codeflash org CI audit case study and interactive Dash report
Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README,
status, and raw data (fork activity, PRs merged).

Interactive Dash report in reports/codeflash-ci-audit/ with two tabs:
Executive Summary (hero metrics, cost impact charts, before/after) and
Full Detail (fork breakdown, findings table, PR inventory, methodology).

Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage
savings, 200+ forks disabled, 11 PRs merged across 2 repos.
2026-04-23 03:56:04 -05:00
Kevin Turcios
8221ce32a2 Add codeflash-ai/ci-audit to active case studies list
2026-04-23 03:52:10 -05:00
Kevin Turcios
0316d9822a Add testcontainers[postgres] to workspace dev dependencies
The 12 DB integration tests in codeflash-api need testcontainers to spin
up a real PostgreSQL instance via Docker. Was already declared in the
package's own dev deps but missing from the root workspace.
2026-04-23 03:52:07 -05:00
Kevin Turcios
bf8707695f Add codeflash-api to workspace dev dependencies
The package was a workspace member but not listed in the root dev
group, so its tests couldn't import codeflash_api when running
from the monorepo root.
2026-04-23 03:38:30 -05:00
Kevin Turcios
e3e74c3f2e Add missing pyproject.toml for codeflash-ci-audit workspace member
2026-04-23 03:34:03 -05:00
Kevin Turcios
e41a1bf56a Fix conftest collision between codeflash-api and github-app test suites
Both packages had tests/__init__.py, creating competing `tests`
packages under --import-mode=importlib. Remove both __init__.py files
and change github-app imports from `from tests.helpers` to
`from helpers` via sys.path insertion in conftest.py.
2026-04-23 03:33:58 -05:00
Kevin Turcios
43a4009294 Fix callee syntax validation in prepare_python_module
normalize_code no longer raises SyntaxError (it returns raw code as
fallback), so validate callee source with ast.parse() explicitly
before normalizing. Fixes test_callee_syntax_error_returns_none.
2026-04-23 03:28:58 -05:00
Kevin Turcios
bd5613d22f Update test-coverage.md: remove resolved callouts for covered modules
2026-04-23 03:13:28 -05:00
Kevin Turcios
e1990092e0 Add tests for error handling paths in ranking, refinement, and state
- test_ranking: Update normalize_code test to expect fallback on invalid
  syntax instead of SyntaxError (matches new behavior)
- test_refinement: Add 7 tests for _parse_candidate markdown parsing
  (fenced blocks, file paths, multiple blocks, plain fallback)
- test_state: Add 6 tests for PythonState.module_ast and invalidate_module
  (valid parse, caching, SyntaxError→None, re-parse after fix)
2026-04-23 03:13:23 -05:00
Kevin Turcios
9e679f1c06 Fix error handling: add logging to bare excepts, protect ast.parse(), parse markdown in refinement
- _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that
  previously silently swallowed errors
- _state.py: Wrap ast.parse() in SyntaxError handler, return None for
  malformed files
- _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw
  code string for dedup
- _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to
  _parse_candidate(), matching the pattern in _candidate_gen.py
- Update error-handling.md rules to reflect resolved issues
2026-04-23 03:06:03 -05:00
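The two `ast.parse()` guards listed in 9e679f1c06 (return None for a malformed file, fall back to the raw string when normalizing) can be sketched as below. `parse_or_none` and `normalize_or_raw` are hypothetical names, and using `ast.unparse` (Python 3.9+) as the normalization step is an assumption; the repository's own normalizer may do more.

```python
import ast


def parse_or_none(source):
    """Return the parsed module, or None if the source is not valid Python."""
    try:
        return ast.parse(source)
    except SyntaxError:
        return None


def normalize_or_raw(source):
    """Return a canonical rendering of the code, or the raw text on failure."""
    tree = parse_or_none(source)
    if tree is None:
        # Fall back to the unmodified string so callers can still compare it.
        return source
    return ast.unparse(tree)


valid = normalize_or_raw("x   =   1")
broken = normalize_or_raw("def broken(:")
```

One consequence of the fallback, noted in 43a4009294, is that callers who need to reject invalid code can no longer rely on normalization raising and must call `ast.parse()` themselves.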
Kevin Turcios
dd7d2db451 Add unit tests for _benchmark_worker subprocess script
5 tests covering module-level argv parsing, project_root derivation,
benchmark plugin and trace decorator imports, and __main__ guard.
2026-04-23 02:31:38 -05:00
Kevin Turcios
e2135e39b2 Add unit tests for vendored _tabulate module
64 tests covering: tabulate() with pipe/simple formats, empty/None
data, dict input, numeric alignment, float formatting, whitespace
preservation, separating lines, firstrow headers. Internal helpers:
type detection, number parsing, ANSI stripping, padding, multiline
detection, pipe segment alignment. Integration test matching the
_create_pr use case.
2026-04-23 02:31:38 -05:00
Kevin Turcios
276c2f36da Add unit tests for _discovery_worker (collection parsing, plugin)
Covers parse_pytest_collection_results with top-level functions,
class methods, and empty input. Tests PytestCollectionPlugin
benchmark skipping, collection_finish pickle output, and item
accumulation. Uses sys.argv patching to handle module-level reads.
2026-04-23 02:28:23 -05:00
Kevin Turcios
957f299243 Add unit tests for _create_pr (PR creation, suggestion, error paths)
Covers PR number env var parsing, suggest-changes vs create-pr
branching, branch push failure, GitHub App not-installed warning,
and generic API error logging.
2026-04-23 02:24:01 -05:00
Kevin Turcios
c31fbc1e43 Add unit tests for _trace_db (sanitize, trace queries, run time)
Covers sanitize_to_filename edge cases, get_traced_arguments with
class filtering and invalid event types, and get_trace_total_run_time_ns
with missing files/tables/empty tables.
2026-04-23 02:23:56 -05:00
Kevin Turcios
cf7cf60936 Add unit tests for _candidate_gen (generate, repair, refinement)
Covers happy paths and error paths for generate_candidates,
repair_failed_candidates, and generate_refinement_candidates.
Tests AI service errors, unparseable markdown, missing runtime
data, and repair failures.
2026-04-23 02:23:52 -05:00
Kevin Turcios
815eba00c0 Fix unawaited coroutine warning in test_skips_ai_test_generation
optimize() is now async, so the test must use async def + await.
2026-04-23 01:47:26 -05:00