codeflash-agent

mirror of https://github.com/codeflash-ai/codeflash-agent.git synced 2026-05-04 18:25:19 +00:00

Author	SHA1	Message	Date
Kevin Turcios	4800f35c5c	fix: make standup dashboard deployable to Plotly Cloud - Add server = app.server for WSGI/Gunicorn discovery - Commit data.json so deploy doesn't need gh CLI or GitHub token - Make load_data() resilient when generate.py can't reach GitHub - Add plotly-cloud.toml config	2026-04-23 16:15:11 -05:00
Kevin Turcios	b9556b9ded	fix: replace dcc.Checklist with plain html.Div to fix Dash 4.x layout.props errors dcc.Checklist with html component labels triggers a renderer bug in Dash 4.x ("undefined is not an object evaluating layout.props"). Use plain Div with circle indicator instead.	2026-04-23 16:06:20 -05:00
Kevin Turcios	bb7f53d26b	fix: correct CI savings numbers from real billing export Pulled actual billing data via gh api billing/usage endpoint. Key corrections: - Rate is $0.006/min (was $0.008/min) - Before: 198K min/mo (was 214K) — Feb 2026 billing data - After: 93K min/mo (was 89K) — Apr 2026 billing data - Net cost is $0.00/mo — Enterprise plan fully discounts all usage - Gross savings ~$7.6K/yr (was $12K/yr) - Updated Q&A to acknowledge Saurabh's correct pushback	2026-04-23 15:58:07 -05:00
Kevin Turcios	a26dff7623	refactor: simplify standup dashboard code - Extract _flush_list_items() helper (replaced 5 identical blocks) - Fix code-block flush ignoring is_qa/is_checklist (bug) - Add FLEX_ROW, SECTION_H2, TAB_STYLE, TAB_SELECTED to theme - Replace hardcoded "#f87171" with RED constant - Cache gh_token() with lru_cache (was spawning subprocess per request) - Parallelize GitHub API calls with ThreadPoolExecutor - serve_layout reads data.json directly instead of regenerating - render_tab fallback reads cached file instead of regenerating - Increase auto-refresh interval from 3s to 30s	2026-04-23 15:55:18 -05:00
Kevin Turcios	1671fab2f7	feat: enhance standup dashboard with rich UI and content - Add collapsible sections (Details/Summary) for Strategy, Strategy in Action, and In Summary - Parse bold markdown to html.Strong via _parse_inline() - Render pipe tables as styled html.Table with alternating row backgrounds - Detect ### sub-headers and render as styled H5 with border separators - Add horizontal bar chart for dollar savings by optimization layer - Add Plotly go.Table for CodSpeed vs Codeflash comparison - Strip duplicate numbered list prefixes from ordered list items - Fix title case handling for apostrophes (What's not What'S) - Guard render_tab against non-dict data on initial load - Expand all 5 Open Questions answers with full detail - Add Dollar Impact section with 4 layers, PR references, savings estimates - Add Competitive Landscape section with CodSpeed analysis - Add CI audit findings support in generate.py	2026-04-23 15:25:23 -05:00
Kevin Turcios	4be1a22a21	fix: add future annotations and format standups code Add `from __future__ import annotations` to fix UP007 ruff errors for `X | Y` union syntax. Run ruff format on all three files.	2026-04-23 11:06:29 -05:00
Kevin Turcios	3673dfebea	chore: clean up standup dashboard - Fix double-numbering in notes section (Li inside Ol with manual index) - Clean up notes header, remove verbose parenthetical - Add .gitignore to exclude generated data.json - Fix CI_AUDIT_FILE path to use repo root instead of dir-name assumption - Correct standup note: 60% coverage floor, remove non-existent PRs - Commit uv.lock for reproducible installs	2026-04-23 11:04:09 -05:00
Kevin Turcios	fdfddea297	chore: add standup dashboard with CI audit integration Dash app at .codeflash/standups/ that pulls GitHub PR data across codeflash, codeflash-internal, codeflash-agent, and github-workflows, renders standup notes from markdown, and includes CI audit report.	2026-04-23 10:48:45 -05:00
Kevin Turcios	3ee9c22c8e	fix: resolve all ruff lint errors across repo (#38) * fix: resolve all ruff lint errors across repo Auto-fixed 31 errors (unused imports, formatting, simplifications). Manually fixed 14 remaining: - EXE001: removed shebangs from non-executable bench scripts - C417: replaced map(lambda) with generator expression - C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests - C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts - RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign) - FBT002: made boolean params keyword-only in report functions - E402: moved `import re` to top of file in security reports * fix: resolve pre-existing mypy errors across packages - _testgen.py: annotate `generated` as `str` to avoid no-any-return - _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str), remove unused type: ignore on proc.kill() - _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return from lazy-loaded performance_gain	2026-04-23 10:22:42 -05:00
codeflash-ci-bot[bot]	c249bcd0ce	chore: update tessl tiles 2026-04-23 (#35) Co-authored-by: codeflash-ci-bot[bot] <codeflash-ci-bot[bot]@users.noreply.github.com>	2026-04-23 08:15:44 -05:00
Kevin Turcios	9bc9ff2250	chore: add ready-to-merge gate for branch freshness	2026-04-23 08:12:25 -05:00
Kevin Turcios	5d02df9890	Merge pull request #34 from codeflash-ai/fix/tessl-app-token fix: pass CI bot secrets to tessl update workflow	2026-04-23 08:04:58 -05:00
Kevin Turcios	3851fd4cf9	fix: pass CI bot secrets to tessl update workflow	2026-04-23 08:04:06 -05:00
Kevin Turcios	82f16dc9f0	Merge pull request #33 from codeflash-ai/fix/tessl-caller-permissions fix: add permissions to tessl update caller workflow	2026-04-23 07:50:43 -05:00
Kevin Turcios	fa1b5ece4e	fix: add permissions to tessl update caller workflow	2026-04-23 07:49:56 -05:00
Kevin Turcios	01dc370d09	Merge pull request #32 from codeflash-ai/chore/tessl-vendored-setup chore: initialize tessl with vendored tiles	2026-04-23 07:44:58 -05:00
Kevin Turcios	f83ece9ad8	chore: initialize tessl with vendored tiles Install 32 Python tiles in vendored mode, add MCP configs for all agents, and set up weekly tile update workflow via reusable github-workflows caller.	2026-04-23 07:43:04 -05:00
Kevin Turcios	616d078ecd	Add CI optimization item to roadmap	2026-04-23 05:59:08 -05:00
Kevin Turcios	851bd61017	Add project roadmap	2026-04-23 05:27:58 -05:00
Kevin Turcios	72a8610fcf	Add vulture to dev dependencies for dead code detection	2026-04-23 04:57:32 -05:00
Kevin Turcios	57446aad31	Fix unawaited coroutine warning in test_default_timeout_is_600 The AsyncMock for wait_for discarded the coroutine from proc.communicate() without consuming it. Replace with a side_effect that closes the coroutine before returning the mock result.	2026-04-23 04:46:32 -05:00
Kevin Turcios	76a07c7f66	Add __test__ = False to Test*-prefixed domain model classes Pytest's default collection pattern matches any class starting with "Test". These domain models (TestType, TestResults, TestFiles, etc.) are not test classes — mark them explicitly so pytest skips them.	2026-04-23 04:41:18 -05:00
Kevin Turcios	4f98b5421f	Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope These classes represent behavioral verification diffs, not tests. The Test* prefix caused pytest to attempt collection and emit warnings.	2026-04-23 04:37:24 -05:00
Kevin Turcios	9e893675c9	Add Plotly Cloud deployment config for CI audit report	2026-04-23 03:59:35 -05:00
Kevin Turcios	c492164fbf	Add codeflash org CI audit case study and interactive Dash report Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README, status, and raw data (fork activity, PRs merged). Interactive Dash report in reports/codeflash-ci-audit/ with two tabs: Executive Summary (hero metrics, cost impact charts, before/after) and Full Detail (fork breakdown, findings table, PR inventory, methodology). Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage savings, 200+ forks disabled, 11 PRs merged across 2 repos.	2026-04-23 03:56:04 -05:00
Kevin Turcios	8221ce32a2	Add codeflash-ai/ci-audit to active case studies list	2026-04-23 03:52:10 -05:00
Kevin Turcios	0316d9822a	Add testcontainers[postgres] to workspace dev dependencies The 12 DB integration tests in codeflash-api need testcontainers to spin up a real PostgreSQL instance via Docker. Was already declared in the package's own dev deps but missing from the root workspace.	2026-04-23 03:52:07 -05:00
Kevin Turcios	bf8707695f	Add codeflash-api to workspace dev dependencies The package was a workspace member but not listed in the root dev group, so its tests couldn't import codeflash_api when running from the monorepo root.	2026-04-23 03:38:30 -05:00
Kevin Turcios	e3e74c3f2e	Add missing pyproject.toml for codeflash-ci-audit workspace member	2026-04-23 03:34:03 -05:00
Kevin Turcios	e41a1bf56a	Fix conftest collision between codeflash-api and github-app test suites Both packages had tests/__init__.py, creating competing `tests` packages under --import-mode=importlib. Remove both __init__.py files and change github-app imports from `from tests.helpers` to `from helpers` via sys.path insertion in conftest.py.	2026-04-23 03:33:58 -05:00
Kevin Turcios	43a4009294	Fix callee syntax validation in prepare_python_module normalize_code no longer raises SyntaxError (it returns raw code as fallback), so validate callee source with ast.parse() explicitly before normalizing. Fixes test_callee_syntax_error_returns_none.	2026-04-23 03:28:58 -05:00
Kevin Turcios	bd5613d22f	Update test-coverage.md: remove resolved callouts for covered modules	2026-04-23 03:13:28 -05:00
Kevin Turcios	e1990092e0	Add tests for error handling paths in ranking, refinement, and state - test_ranking: Update normalize_code test to expect fallback on invalid syntax instead of SyntaxError (matches new behavior) - test_refinement: Add 7 tests for _parse_candidate markdown parsing (fenced blocks, file paths, multiple blocks, plain fallback) - test_state: Add 6 tests for PythonState.module_ast and invalidate_module (valid parse, caching, SyntaxError→None, re-parse after fix)	2026-04-23 03:13:23 -05:00
Kevin Turcios	9e679f1c06	Fix error handling: add logging to bare excepts, protect ast.parse(), parse markdown in refinement - _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that previously silently swallowed errors - _state.py: Wrap ast.parse() in SyntaxError handler, return None for malformed files - _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw code string for dedup - _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to _parse_candidate(), matching the pattern in _candidate_gen.py - Update error-handling.md rules to reflect resolved issues	2026-04-23 03:06:03 -05:00
Kevin Turcios	dd7d2db451	Add unit tests for _benchmark_worker subprocess script 5 tests covering module-level argv parsing, project_root derivation, benchmark plugin and trace decorator imports, and __main__ guard.	2026-04-23 02:31:38 -05:00
Kevin Turcios	e2135e39b2	Add unit tests for vendored _tabulate module 64 tests covering: tabulate() with pipe/simple formats, empty/None data, dict input, numeric alignment, float formatting, whitespace preservation, separating lines, firstrow headers. Internal helpers: type detection, number parsing, ANSI stripping, padding, multiline detection, pipe segment alignment. Integration test matching the _create_pr use case.	2026-04-23 02:31:38 -05:00
Kevin Turcios	276c2f36da	Add unit tests for _discovery_worker (collection parsing, plugin) Covers parse_pytest_collection_results with top-level functions, class methods, and empty input. Tests PytestCollectionPlugin benchmark skipping, collection_finish pickle output, and item accumulation. Uses sys.argv patching to handle module-level reads.	2026-04-23 02:28:23 -05:00
Kevin Turcios	957f299243	Add unit tests for _create_pr (PR creation, suggestion, error paths) Covers PR number env var parsing, suggest-changes vs create-pr branching, branch push failure, GitHub App not-installed warning, and generic API error logging.	2026-04-23 02:24:01 -05:00
Kevin Turcios	c31fbc1e43	Add unit tests for _trace_db (sanitize, trace queries, run time) Covers sanitize_to_filename edge cases, get_traced_arguments with class filtering and invalid event types, and get_trace_total_run_time_ns with missing files/tables/empty tables.	2026-04-23 02:23:56 -05:00
Kevin Turcios	cf7cf60936	Add unit tests for _candidate_gen (generate, repair, refinement) Covers happy paths and error paths for generate_candidates, repair_failed_candidates, and generate_refinement_candidates. Tests AI service errors, unparseable markdown, missing runtime data, and repair failures.	2026-04-23 02:23:52 -05:00
Kevin Turcios	815eba00c0	Fix unawaited coroutine warning in test_skips_ai_test_generation optimize() is now async, so the test must use async def + await.	2026-04-23 01:47:26 -05:00
Kevin Turcios	92e39d6923	Convert remaining sync test runner callers to async Replace all sync test runner calls (run_behavioral_tests, run_benchmarking_tests, run_line_profile_tests) with their async counterparts throughout the pipeline. This eliminates the ThreadPoolExecutor in _baseline.py in favor of asyncio.gather(), and makes _async_bench.py, _candidate_gen.py, and _function_optimizer.py fully async. Adds async_run_line_profile_tests and coverage support to async_run_behavioral_tests in _test_runner.py.	2026-04-23 01:46:01 -05:00
Kevin Turcios	a292698a1d	Add pytest-cov to dev dependencies	2026-04-23 00:41:32 -05:00
Kevin Turcios	f204f8e740	Unify sync/async candidate eval into single async path Delete the sync evaluate_candidate() and run_tests_and_benchmark() functions — all callers now use the async versions. Rename async_run_tests_and_benchmark → run_tests_and_benchmark and async_evaluate_candidate_isolated → evaluate_candidate_isolated. The entire optimization pipeline is now async with a single asyncio.run() entry point in _cli.py:main(). PythonOptimizer.run() and PythonFunctionOptimizer.optimize() are async coroutines. Update test_candidate_eval.py and test_parallel_eval_integration.py to match the unified API.	2026-04-23 00:41:28 -05:00
Kevin Turcios	8defba8a72	Add unit tests for async test runners and candidate evaluation 29 new tests in test_test_runner.py covering async_execute_test_subprocess, async_run_behavioral_tests, async_run_benchmarking_tests, _base_pytest_args, replay test path, and coverage path. 21 new tests in test_candidate_eval.py covering evaluate_candidate, rank_candidates, build_benchmark_details, log_evaluation_results, and async_run_tests_and_benchmark.	2026-04-23 00:24:59 -05:00
Kevin Turcios	8d308fe8e8	Replace ThreadPoolExecutor with asyncio for parallel candidate evaluation Thread-safety concern with shared EvaluationContext mutations is eliminated by switching to cooperative concurrency — between await points only one coroutine runs, so no locks are needed. Adds async variants of test runners (async_execute_test_subprocess, async_run_behavioral_tests, async_run_benchmarking_tests) and async evaluation functions (async_run_tests_and_benchmark, async_evaluate_candidate_isolated). Rewrites _evaluate_batch_parallel to use asyncio.Semaphore + asyncio.gather instead of ThreadPoolExecutor.	2026-04-23 00:12:53 -05:00
Kevin Turcios	df3538167f	Add .coverage to gitignore	2026-04-22 23:40:22 -05:00
Kevin Turcios	fb76024cfb	Fix CLAUDE.md accuracy: remove nonexistent files, update patterns - Remove _line_profiler.py, observability/models.py, _optimizer.py, _rate_limit.py, _usage.py from tree (never created) - Add _background.py, _markdown.py, _xml.py that actually exist - Mark java/ and js_ts/ as stubs - Update endpoint count from 15 to 14, note log_features stub - Fix Depends() example to use Annotated[] pattern - Add deferred items: optimize-line-profiler, observability DB writes	2026-04-22 23:40:01 -05:00
Kevin Turcios	3a07579bb0	Raise codeflash-api test coverage from 81% to 92% Add 182 new tests across optimize, V4A diff, CST utils, and postprocess modules. Key coverage improvements: - optimize/_pipeline.py: 29% → 97% - optimize/_router.py: 40% → 93% - diff/_v4a.py: 40% → 97% - languages/python/_cst_utils.py: 67% → 96% - languages/python/_postprocess.py: 67% → 87% Also apply ruff format to 5 files that had formatting drift.	2026-04-22 23:39:54 -05:00
Kevin Turcios	2d9fca6b3e	Fix all ruff lint errors in codeflash-core - Replace commented-out code pattern with descriptive comment in __init__.py - Move ModuleType into TYPE_CHECKING block in _git.py - Add noqa: F821 for PEP 562 lazy-loaded git module references - Restore noqa: PLC0415 on reformatted sentry imports in _telemetry.py	2026-04-22 23:39:47 -05:00
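
Commit 8d308fe8e8 in the log above describes swapping ThreadPoolExecutor for asyncio.Semaphore + asyncio.gather, so that shared state needs no locks. The following is a minimal sketch of that general pattern only, not the repository's code: the names `evaluate`, `evaluate_batch`, and `max_concurrent`, and the doubling "score", are hypothetical stand-ins.

```python
import asyncio


async def evaluate(candidate_id: int, results: dict[int, int], limit: asyncio.Semaphore) -> None:
    """Hypothetical per-candidate evaluation, bounded by a shared semaphore."""
    async with limit:
        # Stand-in for awaiting a test subprocess; control may switch to
        # another coroutine only at an await point like this one.
        await asyncio.sleep(0)
        # No lock is needed: there is no await between computing the value
        # and writing it, so no other coroutine can interleave here.
        results[candidate_id] = candidate_id * 2


async def evaluate_batch(candidate_ids: list[int], max_concurrent: int = 2) -> dict[int, int]:
    """Run all evaluations concurrently, at most `max_concurrent` at a time."""
    results: dict[int, int] = {}
    limit = asyncio.Semaphore(max_concurrent)
    await asyncio.gather(*(evaluate(c, results, limit) for c in candidate_ids))
    return results


if __name__ == "__main__":
    print(asyncio.run(evaluate_batch([1, 2, 3])))
```

The semaphore caps how many evaluations are in flight, while asyncio.gather waits for all of them; both run on a single thread, which is why the shared dictionary can be mutated without locking.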


180 commits