Covers PR number env var parsing, suggest-changes vs create-pr
branching, branch push failure, GitHub App not-installed warning,
and generic API error logging.
Covers sanitize_to_filename edge cases, get_traced_arguments with
class filtering and invalid event types, and get_trace_total_run_time_ns
with missing files/tables/empty tables.
Covers happy paths and error paths for generate_candidates,
repair_failed_candidates, and generate_refinement_candidates.
Tests AI service errors, unparseable markdown, missing runtime
data, and repair failures.
Replace all sync test runner calls (run_behavioral_tests,
run_benchmarking_tests, run_line_profile_tests) with their async
counterparts throughout the pipeline. This eliminates the
ThreadPoolExecutor in _baseline.py in favor of asyncio.gather(),
and makes _async_bench.py, _candidate_gen.py, and
_function_optimizer.py fully async. Add async_run_line_profile_tests
and coverage support to async_run_behavioral_tests in _test_runner.py.
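
A minimal sketch of the gather-based fan-out; the wrapper name and the simplified runner signatures are illustrative, since the real coroutines in _test_runner.py take test files, working directory, and timeouts as arguments.

```python
import asyncio

# Simplified stand-ins for the async runners in _test_runner.py.
async def async_run_behavioral_tests() -> str:
    await asyncio.sleep(0)
    return "behavioral results"

async def async_run_benchmarking_tests() -> str:
    await asyncio.sleep(0)
    return "benchmark results"

async def async_run_line_profile_tests() -> str:
    await asyncio.sleep(0)
    return "line profile results"

async def establish_baseline() -> tuple[str, str, str]:
    # gather() runs all three coroutines concurrently and returns results
    # in call order, replacing the ThreadPoolExecutor fan-out in _baseline.py.
    behavioral, benchmarking, line_profile = await asyncio.gather(
        async_run_behavioral_tests(),
        async_run_benchmarking_tests(),
        async_run_line_profile_tests(),
    )
    return behavioral, benchmarking, line_profile
```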
Delete the sync evaluate_candidate() and run_tests_and_benchmark()
functions — all callers now use the async versions. Rename
async_run_tests_and_benchmark → run_tests_and_benchmark and
async_evaluate_candidate_isolated → evaluate_candidate_isolated.
The entire optimization pipeline is now async with a single
asyncio.run() entry point in _cli.py:main(). PythonOptimizer.run()
and PythonFunctionOptimizer.optimize() are async coroutines.
Update test_candidate_eval.py and test_parallel_eval_integration.py
to match the unified API.
29 new tests in test_test_runner.py covering async_execute_test_subprocess,
async_run_behavioral_tests, async_run_benchmarking_tests, _base_pytest_args,
replay test path, and coverage path.
21 new tests in test_candidate_eval.py covering evaluate_candidate,
rank_candidates, build_benchmark_details, log_evaluation_results, and
async_run_tests_and_benchmark.
The thread-safety concern around shared EvaluationContext mutations is
eliminated by switching to cooperative concurrency: between await
points only one coroutine runs, so no locks are needed.
Adds async variants of test runners (async_execute_test_subprocess,
async_run_behavioral_tests, async_run_benchmarking_tests) and async
evaluation functions (async_run_tests_and_benchmark,
async_evaluate_candidate_isolated). Rewrites _evaluate_batch_parallel
to use asyncio.Semaphore + asyncio.gather instead of ThreadPoolExecutor.
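
A sketch of the bounded fan-out; the candidate type and result shape are assumptions, and only the Semaphore + gather structure reflects the change described above.

```python
import asyncio

async def evaluate_candidate_isolated(candidate: str) -> dict:
    # Stand-in for the real evaluation coroutine.
    await asyncio.sleep(0)
    return {"candidate": candidate, "passed": True}

async def _evaluate_batch_parallel(candidates: list[str], max_parallel: int = 4) -> list[dict]:
    # The semaphore bounds how many candidates are evaluated at once;
    # gather() collects results in input order. No locks are needed because
    # only one coroutine runs between await points.
    sem = asyncio.Semaphore(max_parallel)

    async def _one(candidate: str) -> dict:
        async with sem:
            return await evaluate_candidate_isolated(candidate)

    return await asyncio.gather(*(_one(c) for c in candidates))
```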
- Remove _line_profiler.py, observability/models.py, _optimizer.py,
_rate_limit.py, _usage.py from tree (never created)
- Add _background.py, _markdown.py, _xml.py that actually exist
- Mark java/ and js_ts/ as stubs
- Update endpoint count from 15 to 14, note log_features stub
- Fix Depends() example to use Annotated[] pattern
- Add deferred items: optimize-line-profiler, observability DB writes
- Replace commented-out code pattern with descriptive comment in __init__.py
- Move ModuleType into TYPE_CHECKING block in _git.py
- Add noqa: F821 for PEP 562 lazy-loaded git module references
- Restore noqa: PLC0415 on reformatted sentry imports in _telemetry.py
Port test generation from Django reference: prompt templates (Jinja2
with model-type-aware formatting), LLM call orchestration with
even/odd model selection, AST-based code validation with regex
fallback, preamble repair, and ellipsis detection. Instrumentation
and postprocessing are deferred — all four response fields return
the same validated code for now.
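
One possible shape of the validation step, as a sketch: the helper name and exact checks are illustrative, but the idea is to try a real parse first, fall back to a regex on fragments, and reject bodies the model left as a bare ellipsis.

```python
import ast
import re

_DEF_RE = re.compile(r"^\s*(?:async\s+)?def\s+\w+\s*\(", re.MULTILINE)

def looks_like_valid_code(code: str) -> bool:
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Fragment the parser cannot digest: fall back to a cheap regex.
        return bool(_DEF_RE.search(code))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and len(node.body) == 1:
            stmt = node.body[0]
            if (
                isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and stmt.value.value is Ellipsis
            ):
                # The model left the body as a bare `...` placeholder.
                return False
    return True
```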
Add fire-and-forget background task manager (background.py) and
LLM call recording (recording.py). Every LLMClient.call now records
trace_id, model, latency, tokens, cost, and errors via fire-and-forget.
drain() awaits pending tasks on shutdown. Currently logs only —
database persistence deferred until llm_calls table is wired.
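
A sketch of the fire-and-forget pattern with illustrative names; the real background.py API may differ.

```python
import asyncio

class BackgroundTasks:
    """Schedule coroutines without awaiting them inline; drain on shutdown."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def fire_and_forget(self, coro) -> None:
        task = asyncio.get_running_loop().create_task(coro)
        # Hold a strong reference until the task finishes, then drop it.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        # Called from lifespan shutdown so pending recordings are not lost.
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)
```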
12 tests covering all Queries methods against a real PostgreSQL
instance via testcontainers. Automatically skipped when Docker is
unavailable. Tests: API key lookup, last_used update, organization
fetch, subscription CRUD, usage increment, cumulative increments.
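
A sketch of the Docker guard and a session-scoped container, assuming testcontainers' PostgresContainer; the fixture name and image tag are illustrative.

```python
import shutil
import subprocess

import pytest

def _docker_available() -> bool:
    if shutil.which("docker") is None:
        return False
    try:
        subprocess.run(["docker", "info"], capture_output=True, check=True, timeout=10)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False

# Module-level mark: every test in this file is skipped without Docker.
pytestmark = pytest.mark.skipif(not _docker_available(), reason="Docker is not available")

@pytest.fixture(scope="session")
def postgres_url():
    # Imported lazily so collection works even without testcontainers installed.
    from testcontainers.postgres import PostgresContainer

    with PostgresContainer("postgres:16") as pg:
        yield pg.get_connection_url()
```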
- Narrow search_start guard in search/replace parser
- Type optimizations_limit as int|str instead of object
- Wrap cost calculation return in float()
- Cast binary op result to int in CST evaluator
- Suppress import-untyped for asyncpg (no stubs available)
- Suppress arg-type for OpenAI messages (dict→union mismatch)
- Type isort kwargs as Any, add Coroutine import for refinement
- Narrow feature_version to tuple[int, int] for ast.parse
- Rename shadowed loop variable in annotation walker
- Add mypy strict=true config to pyproject.toml
FastAPI with `from __future__ import annotations` cannot resolve
Annotated[AuthenticatedUser, Depends()] when AuthenticatedUser is
only imported under TYPE_CHECKING — it becomes a query parameter
instead of a dependency. Move to runtime import in all 11 routers
with noqa: TC001 suppression.
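
A hedged sketch of the fix in one router; the module path, dependency name, and endpoint are illustrative, and only the runtime import plus the noqa: TC001 suppression reflect the change.

```python
from __future__ import annotations

from typing import Annotated

from fastapi import APIRouter, Depends

# Runtime import: with postponed annotations, FastAPI must resolve the
# annotation string when it inspects the signature, so the name cannot live
# only under `if TYPE_CHECKING:`.
from app.auth import AuthenticatedUser, require_auth  # noqa: TC001

router = APIRouter()

@router.get("/example")
async def example(user: Annotated[AuthenticatedUser, Depends(require_auth)]) -> dict:
    return {"user_id": user.id}
```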
30 integration tests cover all endpoints (success, invalid trace_id,
LLM failure, edge cases) using httpx ASGITransport with mocked LLM.
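
One such test might look like the sketch below; the app factory import, the async test marker, and the expected status code are assumptions.

```python
import httpx
import pytest

from app.main import create_app  # hypothetical import path for the app factory

@pytest.mark.anyio
async def test_optimize_rejects_invalid_trace_id() -> None:
    app = create_app()
    transport = httpx.ASGITransport(app=app)  # serve the ASGI app in-process
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post("/ai/optimize", json={"trace_id": "not-a-uuid"})
    assert response.status_code == 422  # validation error; actual code may differ
```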
Faithful port of the Python optimization pipeline from Django aiservice:
- schemas.py: Pydantic request/response models (OptimizeRequest, OptimizeResponse)
- _markdown.py: markdown code block extraction, splitting, grouping
  (sketched after this list)
- _context.py: BaseOptimizerContext with Single/Multi variants for
prompt assembly, LLM response extraction, and postprocessing
- _pipeline.py: parallel LLM orchestration with model distribution
(GPT-5-mini + Claude Sonnet 4.5), diversity via line profiler toggling
- _router.py: POST /ai/optimize with auth, rate limiting, usage tracking
- 11 prompt templates copied verbatim from Django reference
- LLM client wired into app lifespan
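
A sketch of the extraction step in _markdown.py; the real module also splits and groups blocks, and its exact regex and signature may differ.

```python
import re

# Matches fenced blocks: three backticks, optional language tag, body.
_FENCE_RE = re.compile(r"`{3}(\w*)[ \t]*\n(.*?)`{3}", re.DOTALL)

def extract_code_blocks(markdown: str, language: str = "python") -> list[str]:
    # Return the body of every block tagged with `language` (or untagged).
    return [
        body.strip()
        for lang, body in _FENCE_RE.findall(markdown)
        if lang in ("", language)
    ]
```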
Dual-provider client (Azure OpenAI + Anthropic Bedrock) behind
a common async interface with cache-aware cost calculation and
event-loop-safe client lifecycle.
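
The shared surface might be as small as a Protocol, as in this sketch; the method name and parameters are illustrative.

```python
from typing import Any, Protocol

class LLMProvider(Protocol):
    # Both the Azure OpenAI and Anthropic Bedrock clients satisfy this shape;
    # cost calculation and per-event-loop lifecycle live behind it.
    async def call(self, model: str, messages: list[dict[str, Any]]) -> str: ...
```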
SHA-384 + base64url key hashing matching the JS client. FastAPI
dependencies for require_auth, check_rate_limit, and track_usage
with Annotated[Depends()] pattern. Per-user per-endpoint rate
limiting with employee bypass. Atomic subscription usage tracking
with enterprise org and employee exemptions. DB queries module
with asyncpg raw SQL for auth tables. 27 new tests covering auth
flow, rate limits, usage enforcement, and edge cases.
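
The hashing step reduces to a few lines; this sketch assumes UTF-8 input, and the exact output format must match whatever the JS client produces.

```python
import base64
import hashlib

def hash_api_key(api_key: str) -> str:
    digest = hashlib.sha384(api_key.encode("utf-8")).digest()
    # A 48-byte SHA-384 digest encodes to exactly 64 base64url characters,
    # so no '=' padding appears in the stored hash.
    return base64.urlsafe_b64encode(digest).decode("ascii")
```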
FastAPI app factory with lifespan, CORS, optional Sentry. Pydantic-settings
config for all env vars. Full directory structure for all 15 endpoints per
the architecture doc. Workspace integration: ruff src paths, isort, pytest
testpaths, per-file ignores. aiohttp for production, httpx for test client.
CLAUDE.md with full package structure, layer boundaries, endpoint map,
implementation order, business logic audit, and design decisions.
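
A minimal sketch of the app factory described above, with Sentry and settings wiring omitted; names are illustrative.

```python
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    # Startup: create shared resources (DB pool, LLM client) and stash them
    # on app.state; shutdown code goes after the yield.
    yield

def create_app() -> FastAPI:
    app = FastAPI(lifespan=lifespan)
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],  # real origins come from pydantic-settings config
        allow_methods=["*"],
        allow_headers=["*"],
    )
    return app
```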
Rules: architecture (layer boundaries, model conventions), testing
(coverage requirements, mocking strategy), porting (reference files,
what to port vs skip).
- Use getattr for rootdir/rootpath in discovery worker (pytest 9 compat)
- Add -o addopts= to all pytest invocations to override project config
- Extract _base_pytest_args helper to eliminate duplication across runners
  (sketched after this list)
- Support [tool.pytest] config section (not just [tool.pytest.ini_options])
- Add --dist, --no-flaky-report, --failed-first to BLACKLIST_ADDOPTS
- Add recover=True to XMLParser for malformed JUnit XML tolerance
- Log subprocess stdout/stderr on baseline and candidate test failures
- Friendly warning when GitHub App not installed instead of raw error
- Upgrade repair failure logging from debug to warning
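
A sketch of the shared helper and the rootdir/rootpath shim; the real helper passes more flags, so treat this as illustrative.

```python
import pytest

def _base_pytest_args() -> list[str]:
    # `-o addopts=` overrides addopts from the project's own pytest config
    # so the runner's flags are not duplicated or contradicted.
    return ["-o", "addopts="]

def _project_root(config: pytest.Config):
    # rootdir vs rootpath availability differs across pytest versions
    # (the pytest 9 note above); getattr avoids hard-coding either attribute.
    return getattr(config, "rootpath", None) or getattr(config, "rootdir", None)
```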
_extract_test_input_examples() now includes EXISTING_UNIT_TEST files
(hand-written assertions) in addition to GENERATED_REGRESSION tests.
Existing tests are prioritized since they represent the developer's
explicit behavioral expectations. Extracted a _collect_test_sources()
helper to keep complexity manageable.
When evaluating candidates in an overlay directory, pytest's
--rootdir was set to the overlay cwd, causing classnames in JUnit
XML to be computed relative to the wrong directory. The XML parser
then failed to resolve test files, producing "0 diffs" instead of
detecting real behavioral failures. Pass tests_project_rootdir as
--rootdir so classnames are always relative to the project root.
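
In concrete terms, the candidate invocation might be built as in this sketch; the helper name and surrounding arguments are illustrative, and the rootdir choice is the point.

```python
def candidate_pytest_cmd(tests_project_rootdir: str, junit_xml_path: str, test_files: list[str]) -> list[str]:
    return [
        "pytest",
        # The project root, not the overlay cwd, so JUnit classnames resolve
        # back to the real test files.
        f"--rootdir={tests_project_rootdir}",
        f"--junit-xml={junit_xml_path}",
        *test_files,
    ]
```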
Send baseline_runtime_ns, loop_count, test_input_examples, and
line_profiler_results from the client to the optimize endpoint so
the AI service can generate better-informed candidates. Restructure
the per-function optimizer to establish baseline before candidate
generation, and alternate line profiler data across calls for
diversity.
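
A hedged sketch of how the extra fields might surface on the request schema; the field names come from the description above, while types and defaults are assumptions.

```python
from pydantic import BaseModel

class OptimizeRequest(BaseModel):
    # Existing fields omitted; only the newly sent ones are shown.
    baseline_runtime_ns: int | None = None
    loop_count: int | None = None
    test_input_examples: list[str] | None = None
    line_profiler_results: str | None = None
```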
- Generate regression tests concurrently with ThreadPoolExecutor
  instead of sequentially, cutting testgen wall time roughly in half
  (see the sketch after this list)
- Increase default AIClient timeout from 120s to 300s to match
typical LLM response times for optimization endpoints
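
A sketch of the concurrent generation pattern from the first item; the worker count and function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_all_regression_tests(functions: list[str], generate_one) -> list[str]:
    # Each call is an I/O-bound LLM request, so threads overlap the network
    # waits; map() returns results in submission order.
    with ThreadPoolExecutor(max_workers=min(8, len(functions) or 1)) as pool:
        return list(pool.map(generate_one, functions))
```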
Three fixes to baseline establishment:
- Include zero-nanosecond runtimes in usable_runtime_data_by_test_case
  (use `is not None` instead of a truthiness check; see the sketch after
  this list)
- Check runtime_data dict instead of total_timing scalar for skip decision
- Use module_root directly in PYTHONPATH (not module_root.parent) so
src-layout projects resolve imports correctly in test subprocesses
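
The first fix is small but easy to get wrong; a sketch with an assumed dict shape:

```python
def usable_runtimes(runtime_data: dict[str, int | None]) -> dict[str, int]:
    # A truthiness check (`if runtime_ns`) would silently drop a legitimate
    # 0 ns measurement; the explicit None check keeps it.
    return {test: ns for test, ns in runtime_data.items() if ns is not None}
```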
Server-side instrumentation wrote return values to .bin files, which
became corrupted under concurrent pytest processes (interleaved records →
UnicodeDecodeError). Client-side instrumentation writes to SQLite,
which handles concurrent access safely.
The client now ignores instrumented_behavior_tests and
instrumented_perf_tests from the AI service response and instruments
the plain generated_tests locally using inject_profiling_into_existing_test,
the same path used for discovered existing tests.