codeflash-agent

mirror of https://github.com/codeflash-ai/codeflash-agent.git synced 2026-05-04 18:25:19 +00:00

Author	SHA1	Message	Date
Kevin Turcios	3ee9c22c8e	fix: resolve all ruff lint errors across repo (#38) * fix: resolve all ruff lint errors across repo Auto-fixed 31 errors (unused imports, formatting, simplifications). Manually fixed 14 remaining: - EXE001: removed shebangs from non-executable bench scripts - C417: replaced map(lambda) with generator expression - C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests - C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts - RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign) - FBT002: made boolean params keyword-only in report functions - E402: moved `import re` to top of file in security reports * fix: resolve pre-existing mypy errors across packages - _testgen.py: annotate `generated` as `str` to avoid no-any-return - _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str), remove unused type: ignore on proc.kill() - _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return from lazy-loaded performance_gain	2026-04-23 10:22:42 -05:00
codeflash-ci-bot[bot]	c249bcd0ce	chore: update tessl tiles 2026-04-23 (#35) Co-authored-by: codeflash-ci-bot[bot] <codeflash-ci-bot[bot]@users.noreply.github.com>	2026-04-23 08:15:44 -05:00
Kevin Turcios	9bc9ff2250	chore: add ready-to-merge gate for branch freshness	2026-04-23 08:12:25 -05:00
Kevin Turcios	5d02df9890	Merge pull request #34 from codeflash-ai/fix/tessl-app-token fix: pass CI bot secrets to tessl update workflow	2026-04-23 08:04:58 -05:00
Kevin Turcios	3851fd4cf9	fix: pass CI bot secrets to tessl update workflow	2026-04-23 08:04:06 -05:00
Kevin Turcios	82f16dc9f0	Merge pull request #33 from codeflash-ai/fix/tessl-caller-permissions fix: add permissions to tessl update caller workflow	2026-04-23 07:50:43 -05:00
Kevin Turcios	fa1b5ece4e	fix: add permissions to tessl update caller workflow	2026-04-23 07:49:56 -05:00
Kevin Turcios	01dc370d09	Merge pull request #32 from codeflash-ai/chore/tessl-vendored-setup chore: initialize tessl with vendored tiles	2026-04-23 07:44:58 -05:00
Kevin Turcios	f83ece9ad8	chore: initialize tessl with vendored tiles Install 32 Python tiles in vendored mode, add MCP configs for all agents, and set up weekly tile update workflow via reusable github-workflows caller.	2026-04-23 07:43:04 -05:00
Kevin Turcios	616d078ecd	Add CI optimization item to roadmap	2026-04-23 05:59:08 -05:00
Kevin Turcios	851bd61017	Add project roadmap	2026-04-23 05:27:58 -05:00
Kevin Turcios	72a8610fcf	Add vulture to dev dependencies for dead code detection	2026-04-23 04:57:32 -05:00
Kevin Turcios	57446aad31	Fix unawaited coroutine warning in test_default_timeout_is_600 The AsyncMock for wait_for discarded the coroutine from proc.communicate() without consuming it. Replace with a side_effect that closes the coroutine before returning the mock result.	2026-04-23 04:46:32 -05:00
Kevin Turcios	76a07c7f66	Add __test__ = False to Test*-prefixed domain model classes Pytest's default collection pattern matches any class starting with "Test". These domain models (TestType, TestResults, TestFiles, etc.) are not test classes — mark them explicitly so pytest skips them.	2026-04-23 04:41:18 -05:00
Kevin Turcios	4f98b5421f	Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope These classes represent behavioral verification diffs, not tests. The Test* prefix caused pytest to attempt collection and emit warnings.	2026-04-23 04:37:24 -05:00
Kevin Turcios	9e893675c9	Add Plotly Cloud deployment config for CI audit report	2026-04-23 03:59:35 -05:00
Kevin Turcios	c492164fbf	Add codeflash org CI audit case study and interactive Dash report Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README, status, and raw data (fork activity, PRs merged). Interactive Dash report in reports/codeflash-ci-audit/ with two tabs: Executive Summary (hero metrics, cost impact charts, before/after) and Full Detail (fork breakdown, findings table, PR inventory, methodology). Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage savings, 200+ forks disabled, 11 PRs merged across 2 repos.	2026-04-23 03:56:04 -05:00
Kevin Turcios	8221ce32a2	Add codeflash-ai/ci-audit to active case studies list	2026-04-23 03:52:10 -05:00
Kevin Turcios	0316d9822a	Add testcontainers[postgres] to workspace dev dependencies The 12 DB integration tests in codeflash-api need testcontainers to spin up a real PostgreSQL instance via Docker. Was already declared in the package's own dev deps but missing from the root workspace.	2026-04-23 03:52:07 -05:00
Kevin Turcios	bf8707695f	Add codeflash-api to workspace dev dependencies The package was a workspace member but not listed in the root dev group, so its tests couldn't import codeflash_api when running from the monorepo root.	2026-04-23 03:38:30 -05:00
Kevin Turcios	e3e74c3f2e	Add missing pyproject.toml for codeflash-ci-audit workspace member	2026-04-23 03:34:03 -05:00
Kevin Turcios	e41a1bf56a	Fix conftest collision between codeflash-api and github-app test suites Both packages had tests/__init__.py, creating competing `tests` packages under --import-mode=importlib. Remove both __init__.py files and change github-app imports from `from tests.helpers` to `from helpers` via sys.path insertion in conftest.py.	2026-04-23 03:33:58 -05:00
Kevin Turcios	43a4009294	Fix callee syntax validation in prepare_python_module normalize_code no longer raises SyntaxError (it returns raw code as fallback), so validate callee source with ast.parse() explicitly before normalizing. Fixes test_callee_syntax_error_returns_none.	2026-04-23 03:28:58 -05:00
Kevin Turcios	bd5613d22f	Update test-coverage.md: remove resolved callouts for covered modules	2026-04-23 03:13:28 -05:00
Kevin Turcios	e1990092e0	Add tests for error handling paths in ranking, refinement, and state - test_ranking: Update normalize_code test to expect fallback on invalid syntax instead of SyntaxError (matches new behavior) - test_refinement: Add 7 tests for _parse_candidate markdown parsing (fenced blocks, file paths, multiple blocks, plain fallback) - test_state: Add 6 tests for PythonState.module_ast and invalidate_module (valid parse, caching, SyntaxError→None, re-parse after fix)	2026-04-23 03:13:23 -05:00
Kevin Turcios	9e679f1c06	Fix error handling: add logging to bare excepts, protect ast.parse(), parse markdown in refinement - _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that previously silently swallowed errors - _state.py: Wrap ast.parse() in SyntaxError handler, return None for malformed files - _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw code string for dedup - _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to _parse_candidate(), matching the pattern in _candidate_gen.py - Update error-handling.md rules to reflect resolved issues	2026-04-23 03:06:03 -05:00
Kevin Turcios	dd7d2db451	Add unit tests for _benchmark_worker subprocess script 5 tests covering module-level argv parsing, project_root derivation, benchmark plugin and trace decorator imports, and __main__ guard.	2026-04-23 02:31:38 -05:00
Kevin Turcios	e2135e39b2	Add unit tests for vendored _tabulate module 64 tests covering: tabulate() with pipe/simple formats, empty/None data, dict input, numeric alignment, float formatting, whitespace preservation, separating lines, firstrow headers. Internal helpers: type detection, number parsing, ANSI stripping, padding, multiline detection, pipe segment alignment. Integration test matching the _create_pr use case.	2026-04-23 02:31:38 -05:00
Kevin Turcios	276c2f36da	Add unit tests for _discovery_worker (collection parsing, plugin) Covers parse_pytest_collection_results with top-level functions, class methods, and empty input. Tests PytestCollectionPlugin benchmark skipping, collection_finish pickle output, and item accumulation. Uses sys.argv patching to handle module-level reads.	2026-04-23 02:28:23 -05:00
Kevin Turcios	957f299243	Add unit tests for _create_pr (PR creation, suggestion, error paths) Covers PR number env var parsing, suggest-changes vs create-pr branching, branch push failure, GitHub App not-installed warning, and generic API error logging.	2026-04-23 02:24:01 -05:00
Kevin Turcios	c31fbc1e43	Add unit tests for _trace_db (sanitize, trace queries, run time) Covers sanitize_to_filename edge cases, get_traced_arguments with class filtering and invalid event types, and get_trace_total_run_time_ns with missing files/tables/empty tables.	2026-04-23 02:23:56 -05:00
Kevin Turcios	cf7cf60936	Add unit tests for _candidate_gen (generate, repair, refinement) Covers happy paths and error paths for generate_candidates, repair_failed_candidates, and generate_refinement_candidates. Tests AI service errors, unparseable markdown, missing runtime data, and repair failures.	2026-04-23 02:23:52 -05:00
Kevin Turcios	815eba00c0	Fix unawaited coroutine warning in test_skips_ai_test_generation optimize() is now async, so the test must use async def + await.	2026-04-23 01:47:26 -05:00
Kevin Turcios	92e39d6923	Convert remaining sync test runner callers to async Replace all sync test runner calls (run_behavioral_tests, run_benchmarking_tests, run_line_profile_tests) with their async counterparts throughout the pipeline. This eliminates the ThreadPoolExecutor in _baseline.py in favor of asyncio.gather(), and makes _async_bench.py, _candidate_gen.py, and _function_optimizer.py fully async. Adds async_run_line_profile_tests and coverage support to async_run_behavioral_tests in _test_runner.py.	2026-04-23 01:46:01 -05:00
Kevin Turcios	a292698a1d	Add pytest-cov to dev dependencies	2026-04-23 00:41:32 -05:00
Kevin Turcios	f204f8e740	Unify sync/async candidate eval into single async path Delete the sync evaluate_candidate() and run_tests_and_benchmark() functions — all callers now use the async versions. Rename async_run_tests_and_benchmark → run_tests_and_benchmark and async_evaluate_candidate_isolated → evaluate_candidate_isolated. The entire optimization pipeline is now async with a single asyncio.run() entry point in _cli.py:main(). PythonOptimizer.run() and PythonFunctionOptimizer.optimize() are async coroutines. Update test_candidate_eval.py and test_parallel_eval_integration.py to match the unified API.	2026-04-23 00:41:28 -05:00
Kevin Turcios	8defba8a72	Add unit tests for async test runners and candidate evaluation 29 new tests in test_test_runner.py covering async_execute_test_subprocess, async_run_behavioral_tests, async_run_benchmarking_tests, _base_pytest_args, replay test path, and coverage path. 21 new tests in test_candidate_eval.py covering evaluate_candidate, rank_candidates, build_benchmark_details, log_evaluation_results, and async_run_tests_and_benchmark.	2026-04-23 00:24:59 -05:00
Kevin Turcios	8d308fe8e8	Replace ThreadPoolExecutor with asyncio for parallel candidate evaluation Thread-safety concern with shared EvaluationContext mutations is eliminated by switching to cooperative concurrency — between await points only one coroutine runs, so no locks are needed. Adds async variants of test runners (async_execute_test_subprocess, async_run_behavioral_tests, async_run_benchmarking_tests) and async evaluation functions (async_run_tests_and_benchmark, async_evaluate_candidate_isolated). Rewrites _evaluate_batch_parallel to use asyncio.Semaphore + asyncio.gather instead of ThreadPoolExecutor.	2026-04-23 00:12:53 -05:00
Kevin Turcios	df3538167f	Add .coverage to gitignore	2026-04-22 23:40:22 -05:00
Kevin Turcios	fb76024cfb	Fix CLAUDE.md accuracy: remove nonexistent files, update patterns - Remove _line_profiler.py, observability/models.py, _optimizer.py, _rate_limit.py, _usage.py from tree (never created) - Add _background.py, _markdown.py, _xml.py that actually exist - Mark java/ and js_ts/ as stubs - Update endpoint count from 15 to 14, note log_features stub - Fix Depends() example to use Annotated[] pattern - Add deferred items: optimize-line-profiler, observability DB writes	2026-04-22 23:40:01 -05:00
Kevin Turcios	3a07579bb0	Raise codeflash-api test coverage from 81% to 92% Add 182 new tests across optimize, V4A diff, CST utils, and postprocess modules. Key coverage improvements: - optimize/_pipeline.py: 29% → 97% - optimize/_router.py: 40% → 93% - diff/_v4a.py: 40% → 97% - languages/python/_cst_utils.py: 67% → 96% - languages/python/_postprocess.py: 67% → 87% Also apply ruff format to 5 files that had formatting drift.	2026-04-22 23:39:54 -05:00
Kevin Turcios	2d9fca6b3e	Fix all ruff lint errors in codeflash-core - Replace commented-out code pattern with descriptive comment in __init__.py - Move ModuleType into TYPE_CHECKING block in _git.py - Add noqa: F821 for PEP 562 lazy-loaded git module references - Restore noqa: PLC0415 on reformatted sentry imports in _telemetry.py	2026-04-22 23:39:47 -05:00
Kevin Turcios	fdfade528f	Strengthen testgen test assertions and remove duplicate integration tests Replace weak assertions (len >= 1, bare MagicMock) with exact counts, _stub_llm_response, response body checks, and mock call verification. Remove 6 duplicate TestTestgenIntegration tests already covered in test_testgen.py::TestTestgenEndpoint.	2026-04-22 23:24:36 -05:00
Kevin Turcios	758da2592f	Achieve 100% test coverage for testgen module Add 15 new tests covering all previously uncovered paths: - _validate.py: regex class splitting, trailing blank stripping, repair preamble edge cases (empty during iteration, lineno=None, out-of-range index, max attempts exhausted), AST gap/decorator paths - _generate.py: multi-context ellipsis detection, extract_code_block returning None, no test functions after validation - _review_router.py: non-dict/non-list JSON in review verdicts Mark 2 provably unreachable defensive lines with pragma: no cover.	2026-04-22 23:10:27 -05:00
Kevin Turcios	92c5fd7c74	Remove instrumented_behavior_tests and instrumented_perf_tests from testgen API Instrumentation (behavior/perf AST transformations) moves to the client side. The API now returns raw validated code only via generated_tests.	2026-04-22 23:10:16 -05:00
Kevin Turcios	051317e2dc	Mark all 9 implementation steps complete in architecture docs	2026-04-22 22:11:08 -05:00
Kevin Turcios	4b219907fd	Implement POST /ai/testgen endpoint with full generation pipeline Port test generation from Django reference: prompt templates (Jinja2 with model-type-aware formatting), LLM call orchestration with even/odd model selection, AST-based code validation with regex fallback, preamble repair, and ellipsis detection. Instrumentation and postprocessing are deferred — all four response fields return the same validated code for now.	2026-04-22 22:11:04 -05:00
Kevin Turcios	b3840627bb	Use explicit .strip() assertions in testgen repair tests	2026-04-22 20:36:14 -05:00
Kevin Turcios	6abcc8daa3	Add testgen review and repair endpoints Port /ai/testgen_review and /ai/testgen_repair from Django reference. Review: parallel LLM calls per test source, auto-flags behavioral failures, parses JSON verdicts. Repair: Jinja2 prompt templates, syntax-error retry loop, Python code extraction and validation. Schemas: TestgenReviewRequest/Response, TestRepairRequest/Response, CoverageDetails, FunctionVerdict, TestSourceWithFailures. 23 tests covering: coverage context building, verdict parsing, syntax error detection, endpoint success/error/retry/language paths, and the model validator for python_version resolution.	2026-04-22 20:35:39 -05:00
Kevin Turcios	1d70d65914	Wire observability recording into LLM client Add fire-and-forget background task manager (background.py) and LLM call recording (recording.py). Every LLMClient.call now records trace_id, model, latency, tokens, cost, and errors via fire-and-forget. drain() awaits pending tasks on shutdown. Currently logs only — database persistence deferred until llm_calls table is wired.	2026-04-22 20:30:10 -05:00


172 commits
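
Commit 8d308fe8e8 in the log above describes replacing ThreadPoolExecutor with asyncio.Semaphore + asyncio.gather, on the grounds that only one coroutine runs between await points, so shared state needs no locks. The following is a minimal sketch of that general pattern, not the repository's code: `score_candidate`, `score_batch`, the `max_concurrent` parameter, and the scoring arithmetic are all hypothetical stand-ins chosen for illustration.

```python
import asyncio


async def score_candidate(
    candidate_id: int,
    results: dict[int, float],
    limit: asyncio.Semaphore,
) -> None:
    """Hypothetical worker: score one candidate under a concurrency cap."""
    async with limit:  # at most `max_concurrent` workers pass this point at once
        # Stand-in for awaiting a real test subprocess.
        await asyncio.sleep(0)
        # Mutating the shared dict without a lock is safe here: there is no
        # await between computing the value and storing it, so no other
        # coroutine can interleave with this statement.
        results[candidate_id] = candidate_id * 1.5


async def score_batch(
    candidate_ids: list[int], max_concurrent: int = 2
) -> dict[int, float]:
    """Hypothetical batch runner using Semaphore + gather."""
    limit = asyncio.Semaphore(max_concurrent)
    results: dict[int, float] = {}
    await asyncio.gather(
        *(score_candidate(cid, results, limit) for cid in candidate_ids)
    )
    return results


scores = asyncio.run(score_batch([1, 2, 3, 4]))
```

The semaphore bounds how many workers run concurrently, while gather waits for all of them to finish. The lock-free claim holds only as long as each mutation of shared state completes without an intervening await.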