codeflash-agent

Mirror of https://github.com/codeflash-ai/codeflash-agent.git, synced 2026-05-04 18:25:19 +00:00

Author	SHA1	Message	Date
Kevin Turcios	2c9f2ad8de	Fix call-site IDs to use source line numbers instead of sequential counter Restore the old InjectPerfOnly behavior where call-site identifiers are the source line number of the instrumented statement. Also fix the sync integration test to properly apply the decorator and write the helper file, and remove dead imports from test_instrumentation.	2026-04-24 07:12:45 -05:00
Kevin Turcios	5b20981cd4	Unify SQLite schema into single codeflash_results table Merge async_results and test_results tables into one 15-column codeflash_results table with a dedicated cpu_time_ns column. Consolidate file pattern to codeflash_results_{N}.sqlite and delete the now-unused _data_parsers.py module.	2026-04-24 07:12:34 -05:00
Kevin Turcios	ba001950ee	fix: restore clean bubble_sort_method.py test fixture	2026-04-24 05:55:32 -05:00
Kevin Turcios	ca951dd1f3	Rewrite sync instrumentation to decorator-based approach Replace the old AST-injected codeflash_wrap/InjectPerfOnly sync path with decorator-based instrumentation matching the async path: - Add codeflash_performance_sync and codeflash_behavior_sync decorators with GPU device sync (torch CUDA/MPS, JAX, TensorFlow) via find_spec - Add sync_devices_before/sync_devices_after with lazy cached detection - Clean _instrumentation.py to a thin sync/async dispatcher (~47 lines) - Remove dead code from _instrument_core.py (create_wrapper_function, create_device_sync_statements, get_call_arguments, etc.) - Fix all production imports to point at source modules directly - Drop underscore prefixes on internal helpers (connections, get_async_db, close_all_connections, detect_device_sync, etc.) - Rewrite all test files for the new sync path assertions - Add real-framework GPU device sync tests (torch, jax, tensorflow)	2026-04-24 05:54:32 -05:00
Kevin Turcios	918a2a10a4	feat: add sync instrumentation module with decorator-based approach New _instrument_sync.py mirrors the async instrumentation pattern: - SyncCallInstrumenter injects _codeflash_call_site.set() before sync calls - SyncDecoratorAdder applies @codeflash_behavior_sync via libcst - add_sync_decorator_to_function() decorates source files - inject_sync_profiling_into_existing_test() instruments test files Reuses the same helper file (codeflash_async_wrapper.py) since both sync and async decorators live in _codeflash_async_decorators.py.	2026-04-24 04:54:45 -05:00
Kevin Turcios	8c218038e9	feat: add codeflash_behavior_sync decorator Same pattern as the async behavior decorator: decorates the function-under-test directly, captures return values, timing (wall + CPU), and stdout into the shared async_results SQLite table. This is the first step toward replacing the AST-injected codeflash_wrap approach for sync functions.	2026-04-24 04:41:34 -05:00
Kevin Turcios	c9f65aba6b	fix: capture stdout in async decorator and fix result merger The async behavior decorator now captures stdout per invocation via io.StringIO into a new `stdout` column in the async_results SQLite table. The result merger prefers data-sourced stdout over XML stdout, fixing the root cause of empty stdout in merged async results. Also fixes: duplicate async parse block in _parse_results.py, CODEFLASH_RUN_TMPDIR propagation to subprocesses, and removes dead async code from _stdout_parsers.py and _wrap_decorator.py.	2026-04-24 04:35:02 -05:00
Kevin Turcios	629d7f9f08	feat: rewrite async instrumentation to use SQLite-only data path and contextvars Replace the fragile stdout tag protocol with a unified SQLite table (async_results) for all 3 async test modes. The new runtime decorators write behavior, performance, and concurrency results directly to the DB with zero stdout output. Test-file instrumentation now injects _codeflash_call_site.set() (contextvar) instead of os.environ assignments, which is correct for async task isolation. New modules: - runtime/_codeflash_async_decorators.py: self-contained decorators - testing/_async_data_parser.py: SQLite reader replacing stdout parsing Both at 100% test coverage (42 new tests).	2026-04-24 03:44:06 -05:00
Kevin Turcios	24199efc63	refactor: remove dead parameters from AsyncCallInstrumenter and inject_async_profiling Drop unused module_path, mode, tests_project_root parameters and the module_name_from_file_path import they required. Update all call sites.	2026-04-24 02:49:05 -05:00
Kevin Turcios	c670d637c0	refactor: clean up _instrument_async and add 100% test coverage Remove dead code (unused fields, hasattr guard, duplicate decorator set), rename _optimized_instrument_statement to _find_awaited_target_call, simplify AsyncDecoratorAdder init and leave_FunctionDef. Add 21 new unit tests covering all branches: non-test skipping, attribute calls, class body recursion, counter independence, decorator deduplication (name and call form), error handlers, and mode selection.	2026-04-24 02:45:07 -05:00
Kevin Turcios	2fd9d06e28	refactor: eliminate inline async decorator duplication and fix 10-column test gaps Replace 218-line ASYNC_HELPER_INLINE_CODE string with shutil.copy2 of the runtime decorator file. Update remaining test files for 10-column SQLite schema (cpu_runtime). Add cpu_runtime assertions to async E2E tests.	2026-04-24 02:31:40 -05:00
Kevin Turcios	eb6a0be717	feat: add dual-clock instrumentation (wall + CPU time) and remove dead binary parser Measure both wall-clock time (perf_counter_ns) and CPU thread time (thread_time_ns) in instrumented test code. cpu_runtime is now a required int field on FunctionTestInvocation, stored in the SQLite test_results table as a 10th column. Also fixes the sleeptime.py bug (10e9 → 1e9 divisor) and removes the binary pickle parser (parse_test_return_values_bin) since no writer exists in the current codebase — SQLite is the sole data capture path.	2026-04-24 02:21:22 -05:00
Kevin Turcios	0c622ac469	fix: loosen timing tolerance in time correction instrumentation tests The busy-wait sleep function can overshoot by 90%+ under CPU contention (observed 190ms for a 100ms target). The test verifies that instrumentation produces runtimes in the right order of magnitude, not that sleep timing is precise. Widen rel_tol from 0.05 to 1.0.	2026-04-24 01:38:06 -05:00
Kevin Turcios	fd88580ac8	test: add 262 tests for previously untested core modules - test_danom_result.py: 58 tests for Ok/Err Result monad - test_danom_stream.py: 65 tests for Stream pipeline operations - test_model.py: 57 tests for core data models and serialization - test_pipeline.py: 59 tests for pipeline utilities and candidate evaluation - test_normalizer.py: 23 tests for code normalization including SyntaxError handling	2026-04-24 01:36:14 -05:00
Kevin Turcios	90a46d732c	fix: harden error handling and add missing future annotations Error handling: - Protect ast.parse() in _normalizer.py (returns original on SyntaxError) - Protect cst.parse_module() in _replacement.py (raises ValueError) - Narrow except Exception to OSError/SyntaxError in _discovery.py (2 sites) - Narrow except Exception to sqlite3.Error/OSError in _data_parsers.py - Narrow pickle except to specific unpickling errors in _data_parsers.py Missing future annotations: - Add from __future__ import annotations to 12 __init__.py files	2026-04-24 01:36:04 -05:00
Kevin Turcios	6b73b07d15	fix: deduplicate code across codeflash-core and codeflash-python - Extract _parse_candidates helper in _client.py (used by get_candidates and optimize_with_line_profiler) - Parameterize URL resolution in _http.py (_resolve_url_from_env replaces two near-identical functions) - Delegate get_repo_owner_and_name to parse_repo_owner_and_name in _git.py - Simplify _par_apply_fns to delegate to _apply_fns in danom/stream.py - Remove duplicate performance_gain from _verification.py (use codeflash_core's version) - Extract _extract_pytest_error helper in _verification.py (replaces duplicated 6-line block) - Consolidate collect_names_from_annotation into collect_type_names_from_annotation in _ast_helpers.py - Add ast.Attribute handling and relax BinOp guard in collect_type_names_from_annotation - Add unit tests for all extracted helpers	2026-04-23 22:39:50 -05:00
Kevin Turcios	ffadf16147	chore: add standup dashboard with CI audit integration (#36) Dash app at .codeflash/standups/ for weekly eng meetings. Pulls live PR data across 4 org repos, renders markdown standup notes, integrates CI audit report with corrected billing numbers from real GitHub API data. Deployed to Plotly Cloud.	2026-04-23 18:52:33 -05:00
Kevin Turcios	3ee9c22c8e	fix: resolve all ruff lint errors across repo (#38) * fix: resolve all ruff lint errors across repo Auto-fixed 31 errors (unused imports, formatting, simplifications). Manually fixed 14 remaining: - EXE001: removed shebangs from non-executable bench scripts - C417: replaced map(lambda) with generator expression - C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests - C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts - RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign) - FBT002: made boolean params keyword-only in report functions - E402: moved `import re` to top of file in security reports * fix: resolve pre-existing mypy errors across packages - _testgen.py: annotate `generated` as `str` to avoid no-any-return - _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str), remove unused type: ignore on proc.kill() - _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return from lazy-loaded performance_gain	2026-04-23 10:22:42 -05:00
codeflash-ci-bot[bot]	c249bcd0ce	chore: update tessl tiles 2026-04-23 (#35) Co-authored-by: codeflash-ci-bot[bot] <codeflash-ci-bot[bot]@users.noreply.github.com>	2026-04-23 08:15:44 -05:00
Kevin Turcios	9bc9ff2250	chore: add ready-to-merge gate for branch freshness	2026-04-23 08:12:25 -05:00
Kevin Turcios	5d02df9890	Merge pull request #34 from codeflash-ai/fix/tessl-app-token fix: pass CI bot secrets to tessl update workflow	2026-04-23 08:04:58 -05:00
Kevin Turcios	3851fd4cf9	fix: pass CI bot secrets to tessl update workflow	2026-04-23 08:04:06 -05:00
Kevin Turcios	82f16dc9f0	Merge pull request #33 from codeflash-ai/fix/tessl-caller-permissions fix: add permissions to tessl update caller workflow	2026-04-23 07:50:43 -05:00
Kevin Turcios	fa1b5ece4e	fix: add permissions to tessl update caller workflow	2026-04-23 07:49:56 -05:00
Kevin Turcios	01dc370d09	Merge pull request #32 from codeflash-ai/chore/tessl-vendored-setup chore: initialize tessl with vendored tiles	2026-04-23 07:44:58 -05:00
Kevin Turcios	f83ece9ad8	chore: initialize tessl with vendored tiles Install 32 Python tiles in vendored mode, add MCP configs for all agents, and set up weekly tile update workflow via reusable github-workflows caller.	2026-04-23 07:43:04 -05:00
Kevin Turcios	616d078ecd	Add CI optimization item to roadmap	2026-04-23 05:59:08 -05:00
Kevin Turcios	851bd61017	Add project roadmap	2026-04-23 05:27:58 -05:00
Kevin Turcios	72a8610fcf	Add vulture to dev dependencies for dead code detection	2026-04-23 04:57:32 -05:00
Kevin Turcios	57446aad31	Fix unawaited coroutine warning in test_default_timeout_is_600 The AsyncMock for wait_for discarded the coroutine from proc.communicate() without consuming it. Replace with a side_effect that closes the coroutine before returning the mock result.	2026-04-23 04:46:32 -05:00
Kevin Turcios	76a07c7f66	Add __test__ = False to Test*-prefixed domain model classes Pytest's default collection pattern matches any class starting with "Test". These domain models (TestType, TestResults, TestFiles, etc.) are not test classes — mark them explicitly so pytest skips them.	2026-04-23 04:41:18 -05:00
Kevin Turcios	4f98b5421f	Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope These classes represent behavioral verification diffs, not tests. The Test* prefix caused pytest to attempt collection and emit warnings.	2026-04-23 04:37:24 -05:00
Kevin Turcios	9e893675c9	Add Plotly Cloud deployment config for CI audit report	2026-04-23 03:59:35 -05:00
Kevin Turcios	c492164fbf	Add codeflash org CI audit case study and interactive Dash report Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README, status, and raw data (fork activity, PRs merged). Interactive Dash report in reports/codeflash-ci-audit/ with two tabs: Executive Summary (hero metrics, cost impact charts, before/after) and Full Detail (fork breakdown, findings table, PR inventory, methodology). Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage savings, 200+ forks disabled, 11 PRs merged across 2 repos.	2026-04-23 03:56:04 -05:00
Kevin Turcios	8221ce32a2	Add codeflash-ai/ci-audit to active case studies list	2026-04-23 03:52:10 -05:00
Kevin Turcios	0316d9822a	Add testcontainers[postgres] to workspace dev dependencies The 12 DB integration tests in codeflash-api need testcontainers to spin up a real PostgreSQL instance via Docker. Was already declared in the package's own dev deps but missing from the root workspace.	2026-04-23 03:52:07 -05:00
Kevin Turcios	bf8707695f	Add codeflash-api to workspace dev dependencies The package was a workspace member but not listed in the root dev group, so its tests couldn't import codeflash_api when running from the monorepo root.	2026-04-23 03:38:30 -05:00
Kevin Turcios	e3e74c3f2e	Add missing pyproject.toml for codeflash-ci-audit workspace member	2026-04-23 03:34:03 -05:00
Kevin Turcios	e41a1bf56a	Fix conftest collision between codeflash-api and github-app test suites Both packages had tests/__init__.py, creating competing `tests` packages under --import-mode=importlib. Remove both __init__.py files and change github-app imports from `from tests.helpers` to `from helpers` via sys.path insertion in conftest.py.	2026-04-23 03:33:58 -05:00
Kevin Turcios	43a4009294	Fix callee syntax validation in prepare_python_module normalize_code no longer raises SyntaxError (it returns raw code as fallback), so validate callee source with ast.parse() explicitly before normalizing. Fixes test_callee_syntax_error_returns_none.	2026-04-23 03:28:58 -05:00
Kevin Turcios	bd5613d22f	Update test-coverage.md: remove resolved callouts for covered modules	2026-04-23 03:13:28 -05:00
Kevin Turcios	e1990092e0	Add tests for error handling paths in ranking, refinement, and state - test_ranking: Update normalize_code test to expect fallback on invalid syntax instead of SyntaxError (matches new behavior) - test_refinement: Add 7 tests for _parse_candidate markdown parsing (fenced blocks, file paths, multiple blocks, plain fallback) - test_state: Add 6 tests for PythonState.module_ast and invalidate_module (valid parse, caching, SyntaxError→None, re-parse after fix)	2026-04-23 03:13:23 -05:00
Kevin Turcios	9e679f1c06	Fix error handling: add logging to bare excepts, protect ast.parse(), parse markdown in refinement - _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that previously silently swallowed errors - _state.py: Wrap ast.parse() in SyntaxError handler, return None for malformed files - _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw code string for dedup - _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to _parse_candidate(), matching the pattern in _candidate_gen.py - Update error-handling.md rules to reflect resolved issues	2026-04-23 03:06:03 -05:00
Kevin Turcios	dd7d2db451	Add unit tests for _benchmark_worker subprocess script 5 tests covering module-level argv parsing, project_root derivation, benchmark plugin and trace decorator imports, and __main__ guard.	2026-04-23 02:31:38 -05:00
Kevin Turcios	e2135e39b2	Add unit tests for vendored _tabulate module 64 tests covering: tabulate() with pipe/simple formats, empty/None data, dict input, numeric alignment, float formatting, whitespace preservation, separating lines, firstrow headers. Internal helpers: type detection, number parsing, ANSI stripping, padding, multiline detection, pipe segment alignment. Integration test matching the _create_pr use case.	2026-04-23 02:31:38 -05:00
Kevin Turcios	276c2f36da	Add unit tests for _discovery_worker (collection parsing, plugin) Covers parse_pytest_collection_results with top-level functions, class methods, and empty input. Tests PytestCollectionPlugin benchmark skipping, collection_finish pickle output, and item accumulation. Uses sys.argv patching to handle module-level reads.	2026-04-23 02:28:23 -05:00
Kevin Turcios	957f299243	Add unit tests for _create_pr (PR creation, suggestion, error paths) Covers PR number env var parsing, suggest-changes vs create-pr branching, branch push failure, GitHub App not-installed warning, and generic API error logging.	2026-04-23 02:24:01 -05:00
Kevin Turcios	c31fbc1e43	Add unit tests for _trace_db (sanitize, trace queries, run time) Covers sanitize_to_filename edge cases, get_traced_arguments with class filtering and invalid event types, and get_trace_total_run_time_ns with missing files/tables/empty tables.	2026-04-23 02:23:56 -05:00
Kevin Turcios	cf7cf60936	Add unit tests for _candidate_gen (generate, repair, refinement) Covers happy paths and error paths for generate_candidates, repair_failed_candidates, and generate_refinement_candidates. Tests AI service errors, unparseable markdown, missing runtime data, and repair failures.	2026-04-23 02:23:52 -05:00
Kevin Turcios	815eba00c0	Fix unawaited coroutine warning in test_skips_ai_test_generation optimize() is now async, so the test must use async def + await.	2026-04-23 01:47:26 -05:00
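Several commits above (629d7f9f08, 8c218038e9, c9f65aba6b, eb6a0be717, 5b20981cd4) describe one runtime pattern: instrumented test code sets a call-site contextvar before calling the function under test, and a decorator on that function records wall time (perf_counter_ns), CPU thread time (thread_time_ns), captured stdout (io.StringIO) and the return value into SQLite. The following is a minimal sketch of that pattern under stated assumptions, not codeflash's actual decorator: `call_site`, `make_behavior_decorator`, and the six-column `results` table are hypothetical (the log says the real `codeflash_results` table has 15 columns but does not list them).

```python
from __future__ import annotations

import contextvars
import functools
import io
import sqlite3
import time
from contextlib import redirect_stdout
from typing import Any, Callable

# Hypothetical call-site marker. Instrumented test code sets it just before calling
# the function under test. Unlike os.environ, a value set inside one asyncio task
# is not visible to sibling tasks, because each task runs in its own context copy.
call_site: contextvars.ContextVar[str] = contextvars.ContextVar("call_site", default="unknown")


def make_behavior_decorator(conn: sqlite3.Connection) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Build a decorator that records one row per call into a (hypothetical) SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "function TEXT, call_site TEXT, wall_time_ns INTEGER, "
        "cpu_time_ns INTEGER, stdout TEXT, return_value TEXT)"
    )

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            buffer = io.StringIO()
            with redirect_stdout(buffer):  # capture anything the function prints
                wall_start = time.perf_counter_ns()  # wall clock, includes waiting
                cpu_start = time.thread_time_ns()  # CPU time of this thread only
                result = func(*args, **kwargs)
                cpu_ns = time.thread_time_ns() - cpu_start
                wall_ns = time.perf_counter_ns() - wall_start
            # In this sketch an exception simply propagates and no row is written.
            conn.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
                (func.__qualname__, call_site.get(), wall_ns, cpu_ns, buffer.getvalue(), repr(result)),
            )
            conn.commit()
            return result

        return wrapper

    return decorator


conn = sqlite3.connect(":memory:")
behavior = make_behavior_decorator(conn)


@behavior
def sort_values(xs: list[int]) -> list[int]:
    print("sorting")
    return sorted(xs)


call_site.set("42")  # e.g. the source line number of the instrumented call
sort_values([3, 1, 2])
```

After the call, the table holds one row with call site `"42"`, stdout `"sorting\n"` and return value `"[1, 2, 3]"`. Recording both clocks is useful because a wall time far above the CPU time suggests the function was waiting (sleep, I/O, locks) rather than computing.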

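Commit ca951dd1f3 mentions GPU device sync "via find_spec" with "lazy cached detection". A plausible reading, sketched here as an assumption: `importlib.util.find_spec` reports whether a package is installed without importing it, and `functools.lru_cache` makes that check run once per process. The names `installed_frameworks` and `sync_torch_devices` are hypothetical, and only the torch branch (CUDA, then MPS, which needs torch 2.x for `torch.mps.synchronize`) is shown; the JAX and TensorFlow branches are left out.

```python
from __future__ import annotations

import importlib.util
from functools import lru_cache

# Frameworks whose GPU work is queued asynchronously, so timing code may need
# to wait for it to finish before reading the clock.
CANDIDATE_FRAMEWORKS = ("torch", "jax", "tensorflow")


@lru_cache(maxsize=None)
def installed_frameworks() -> tuple[str, ...]:
    """Detect installed frameworks once; find_spec does not import the package."""
    return tuple(name for name in CANDIDATE_FRAMEWORKS if importlib.util.find_spec(name) is not None)


def sync_torch_devices() -> None:
    """Block until queued torch GPU work finishes (no-op without torch or a GPU)."""
    if "torch" not in installed_frameworks():
        return
    import torch  # imported lazily, only when it is actually installed

    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elif torch.backends.mps.is_available():
        torch.mps.synchronize()
```

Calling a sync helper like this before starting and before stopping the timer keeps asynchronous kernel launches from making a GPU function look faster than it is.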
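Commits 76a07c7f66 and 4f98b5421f deal with pytest collecting domain classes whose names start with `Test` (pytest's default `python_classes` pattern), which happens when such classes are defined in or imported into a test module. Setting `__test__ = False` on the class tells pytest to skip it; renaming the class is the other option the log uses. A minimal sketch with a hypothetical two-field `TestResults` dataclass (the real model's fields are not shown in the log):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TestResults:
    """Domain model whose name happens to match pytest's default "Test*" pattern."""

    # No type annotation, so dataclass treats this as a plain class attribute,
    # not a field. pytest skips classes whose __test__ attribute is falsy.
    __test__ = False

    passed: int = 0
    failed: int = 0
```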

189 commits
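Commit 57446aad31 fixes an "unawaited coroutine" warning: when `asyncio.wait_for` is mocked, the coroutine created by `proc.communicate()` is passed in but never awaited. A minimal sketch of the described fix, assuming a hypothetical `run_subprocess` helper in place of the real code under test: give the `AsyncMock` a synchronous `side_effect` that closes the coroutine and then returns the fake result.

```python
from __future__ import annotations

import asyncio
from types import SimpleNamespace
from unittest import mock


async def run_subprocess(proc, timeout: float = 600) -> tuple[bytes, bytes]:
    """Hypothetical stand-in for the code under test."""
    return await asyncio.wait_for(proc.communicate(), timeout=timeout)


def test_default_timeout_is_600() -> None:
    async def communicate() -> tuple[bytes, bytes]:
        return b"out", b""

    proc = SimpleNamespace(communicate=communicate)

    def fake_wait_for(coro, timeout):
        # The real wait_for would await `coro`. The mock closes it instead,
        # so Python does not warn that the coroutine was never awaited.
        coro.close()
        return b"", b""

    wait_for = mock.AsyncMock(side_effect=fake_wait_for)
    with mock.patch("asyncio.wait_for", new=wait_for):
        assert asyncio.run(run_subprocess(proc)) == (b"", b"")
    assert wait_for.await_args.kwargs["timeout"] == 600
```

Because `side_effect` is a plain function, `AsyncMock` returns its result as the awaited value, so the test still sees `(b"", b"")` while the discarded coroutine is cleaned up explicitly.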