codeflash-agent

Author	SHA1	Message	Date
Kevin Turcios	3ee9c22c8e	fix: resolve all ruff lint errors across repo (#38) * fix: resolve all ruff lint errors across repo Auto-fixed 31 errors (unused imports, formatting, simplifications). Manually fixed 14 remaining: - EXE001: removed shebangs from non-executable bench scripts - C417: replaced map(lambda) with generator expression - C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests - C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts - RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign) - FBT002: made boolean params keyword-only in report functions - E402: moved `import re` to top of file in security reports * fix: resolve pre-existing mypy errors across packages - _testgen.py: annotate `generated` as `str` to avoid no-any-return - _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str), remove unused type: ignore on proc.kill() - _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return from lazy-loaded performance_gain	2026-04-23 10:22:42 -05:00
Kevin Turcios	76a07c7f66	Add __test__ = False to Test*-prefixed domain model classes Pytest's default collection pattern matches any class starting with "Test". These domain models (TestType, TestResults, TestFiles, etc.) are not test classes — mark them explicitly so pytest skips them.	2026-04-23 04:41:18 -05:00
Kevin Turcios	4f98b5421f	Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope These classes represent behavioral verification diffs, not tests. The Test* prefix caused pytest to attempt collection and emit warnings.	2026-04-23 04:37:24 -05:00
Kevin Turcios	e41a1bf56a	Fix conftest collision between codeflash-api and github-app test suites Both packages had tests/__init__.py, creating competing `tests` packages under --import-mode=importlib. Remove both __init__.py files and change github-app imports from `from tests.helpers` to `from helpers` via sys.path insertion in conftest.py.	2026-04-23 03:33:58 -05:00
Kevin Turcios	fb76024cfb	Fix CLAUDE.md accuracy: remove nonexistent files, update patterns - Remove _line_profiler.py, observability/models.py, _optimizer.py, _rate_limit.py, _usage.py from tree (never created) - Add _background.py, _markdown.py, _xml.py that actually exist - Mark java/ and js_ts/ as stubs - Update endpoint count from 15 to 14, note log_features stub - Fix Depends() example to use Annotated[] pattern - Add deferred items: optimize-line-profiler, observability DB writes	2026-04-22 23:40:01 -05:00
Kevin Turcios	3a07579bb0	Raise codeflash-api test coverage from 81% to 92% Add 182 new tests across optimize, V4A diff, CST utils, and postprocess modules. Key coverage improvements: - optimize/_pipeline.py: 29% → 97% - optimize/_router.py: 40% → 93% - diff/_v4a.py: 40% → 97% - languages/python/_cst_utils.py: 67% → 96% - languages/python/_postprocess.py: 67% → 87% Also apply ruff format to 5 files that had formatting drift.	2026-04-22 23:39:54 -05:00
Kevin Turcios	fdfade528f	Strengthen testgen test assertions and remove duplicate integration tests Replace weak assertions (len >= 1, bare MagicMock) with exact counts, _stub_llm_response, response body checks, and mock call verification. Remove 6 duplicate TestTestgenIntegration tests already covered in test_testgen.py::TestTestgenEndpoint.	2026-04-22 23:24:36 -05:00
Kevin Turcios	758da2592f	Achieve 100% test coverage for testgen module Add 15 new tests covering all previously uncovered paths: - _validate.py: regex class splitting, trailing blank stripping, repair preamble edge cases (empty during iteration, lineno=None, out-of-range index, max attempts exhausted), AST gap/decorator paths - _generate.py: multi-context ellipsis detection, extract_code_block returning None, no test functions after validation - _review_router.py: non-dict/non-list JSON in review verdicts Mark 2 provably unreachable defensive lines with pragma: no cover.	2026-04-22 23:10:27 -05:00
Kevin Turcios	92c5fd7c74	Remove instrumented_behavior_tests and instrumented_perf_tests from testgen API Instrumentation (behavior/perf AST transformations) moves to the client side. The API now returns raw validated code only via generated_tests.	2026-04-22 23:10:16 -05:00
Kevin Turcios	051317e2dc	Mark all 9 implementation steps complete in architecture docs	2026-04-22 22:11:08 -05:00
Kevin Turcios	4b219907fd	Implement POST /ai/testgen endpoint with full generation pipeline Port test generation from Django reference: prompt templates (Jinja2 with model-type-aware formatting), LLM call orchestration with even/odd model selection, AST-based code validation with regex fallback, preamble repair, and ellipsis detection. Instrumentation and postprocessing are deferred — all four response fields return the same validated code for now.	2026-04-22 22:11:04 -05:00
Kevin Turcios	b3840627bb	Use explicit .strip() assertions in testgen repair tests	2026-04-22 20:36:14 -05:00
Kevin Turcios	6abcc8daa3	Add testgen review and repair endpoints Port /ai/testgen_review and /ai/testgen_repair from Django reference. Review: parallel LLM calls per test source, auto-flags behavioral failures, parses JSON verdicts. Repair: Jinja2 prompt templates, syntax-error retry loop, Python code extraction and validation. Schemas: TestgenReviewRequest/Response, TestRepairRequest/Response, CoverageDetails, FunctionVerdict, TestSourceWithFailures. 23 tests covering: coverage context building, verdict parsing, syntax error detection, endpoint success/error/retry/language paths, and the model validator for python_version resolution.	2026-04-22 20:35:39 -05:00
Kevin Turcios	1d70d65914	Wire observability recording into LLM client Add fire-and-forget background task manager (background.py) and LLM call recording (recording.py). Every LLMClient.call now records trace_id, model, latency, tokens, cost, and errors via fire-and-forget. drain() awaits pending tasks on shutdown. Currently logs only — database persistence deferred until llm_calls table is wired.	2026-04-22 20:30:10 -05:00
Kevin Turcios	a62f1ecd03	Add real DB integration tests with testcontainers 12 tests covering all Queries methods against a real PostgreSQL instance via testcontainers. Automatically skipped when Docker is unavailable. Tests: api key lookup, last_used update, organization fetch, subscription CRUD, usage increment, cumulative increments.	2026-04-22 20:02:28 -05:00
Kevin Turcios	3e16d44912	Fix all mypy strict errors across codeflash-api - Narrow search_start guard in search/replace parser - Type optimizations_limit as int|str instead of object - Wrap cost calculation return in float() - Cast binary op result to int in CST evaluator - Suppress import-untyped for asyncpg (no stubs available) - Suppress arg-type for OpenAI messages (dict→union mismatch) - Type isort kwargs as Any, add Coroutine import for refinement - Narrow feature_version to tuple[int, int] for ast.parse - Rename shadowed loop variable in annotation walker - Add mypy strict=true config to pyproject.toml	2026-04-22 19:59:42 -05:00
Kevin Turcios	03bb712c65	Add integration tests and fix AuthenticatedUser runtime import FastAPI with `from __future__ import annotations` cannot resolve Annotated[AuthenticatedUser, Depends()] when AuthenticatedUser is only imported under TYPE_CHECKING — it becomes a query parameter instead of a dependency. Move to runtime import in all 11 routers with noqa: TC001 suppression. 30 integration tests cover all endpoints (success, invalid trace_id, LLM failure, edge cases) using httpx ASGITransport with mocked LLM.	2026-04-21 22:48:30 -05:00
Kevin Turcios	935c6f229e	Add remaining endpoints: repair, refinement, adaptive, explain, review, ranking, jit, workflow, testgen, log_features Port all P1 endpoints from the Django aiservice to FastAPI: - repair: 2-attempt LLM retry, SearchAndReplaceDiff patch application - refinement: parallel LLM calls via asyncio.gather, single/multi-file context dispatch, XML explanation extraction, deduplication - adaptive: single LLM call with previous candidate history - explain: conditional throughput/concurrency/acceptance prompt sections, XML <explain> tag extraction - review: 4-dimension scoring, JSON code block extraction, 2-attempt retry - ranking: 4-dimension weighted scoring, JSON extraction with 3 fallbacks (direct parse, markdown block, brace matching), legacy XML fallback - jit: reuses optimize pipeline with JIT-specific prompts - workflow: 3-tier regex YAML extraction, LLM-generated CI steps - testgen: stub returning 501 (language-specific logic deferred) - log_features: trace_id validation, DB write stubbed Also adds: - Task-specific model assignments in llm/_models.py - XML tag extraction utility in languages/python/_xml.py - All 11 routers registered in _app.py 348 tests passing, all lint clean.	2026-04-21 22:36:31 -05:00
Kevin Turcios	6c04324e25	Add optimize endpoint: context, pipeline, router, prompt templates Faithful port of the Python optimization pipeline from Django aiservice: - schemas.py: Pydantic request/response models (OptimizeRequest, OptimizeResponse) - _markdown.py: markdown code block extraction, splitting, grouping - _context.py: BaseOptimizerContext with Single/Multi variants for prompt assembly, LLM response extraction, and postprocessing - _pipeline.py: parallel LLM orchestration with model distribution (GPT-5-mini + Claude Sonnet 4.5), diversity via line profiler toggling - _router.py: POST /ai/optimize with auth, rate limiting, usage tracking - 11 prompt templates copied verbatim from Django reference - LLM client wired into app lifespan	2026-04-21 22:16:22 -05:00
Kevin Turcios	3e62f502e7	Add language layer: CST utils, validator, postprocessing pipeline Faithful port of Python language utilities from Django aiservice: - _cst_utils.py: depth tracking, import extraction, definition removal, ellipsis detection, expression evaluation, module path helpers - _validator.py: dual ast+libcst syntax validation, parse-or-none - _postprocess.py: full optimization postprocessing pipeline including dedup, equality check, docstring restoration, comment cleaning, forward reference fixing, ellipsis filtering, isort	2026-04-21 22:04:39 -05:00
Kevin Turcios	5c6b82050a	Add diff layer: SEARCH/REPLACE and V4A patch application Faithfully ported from Django aiservice. V4A uses 3-tier fuzzy context matching (exact/rstrip/strip) with EOF penalties and scope markers. Per-file lint ignores for ported complexity.	2026-04-21 21:55:28 -05:00
Kevin Turcios	2acebdbf51	Add DB layer: asyncpg pool, engine, row schemas, lifespan wiring Pool creation with min=2/max=100, row schema attrs classes for all 7 tables, and lifespan integration in the app factory for pool startup/shutdown.	2026-04-21 21:55:19 -05:00
Kevin Turcios	fcaac3a9f2	Add LLM layer: client abstraction, cost calculation, retry policy Dual-provider client (Azure OpenAI + Anthropic Bedrock) behind a common async interface with cache-aware cost calculation and event-loop-safe client lifecycle.	2026-04-21 21:55:09 -05:00
Kevin Turcios	d20b82762a	Add auth layer: key hashing, rate limiting, usage tracking SHA-384 + base64url key hashing matching the JS client. FastAPI dependencies for require_auth, check_rate_limit, and track_usage with Annotated[Depends()] pattern. Per-user per-endpoint rate limiting with employee bypass. Atomic subscription usage tracking with enterprise org and employee exemptions. DB queries module with asyncpg raw SQL for auth tables. 27 new tests covering auth flow, rate limits, usage enforcement, and edge cases.	2026-04-21 21:33:02 -05:00
Kevin Turcios	69714f410f	Scaffold codeflash-api package with app factory, config, and healthcheck FastAPI app factory with lifespan, CORS, optional Sentry. Pydantic-settings config for all env vars. Full directory structure for all 15 endpoints per the architecture doc. Workspace integration: ruff src paths, isort, pytest testpaths, per-file ignores. aiohttp for production, httpx for test client.	2026-04-21 21:28:59 -05:00
Kevin Turcios	e34873fb82	Add codeflash-api architecture docs and project-scoped rules CLAUDE.md with full package structure, layer boundaries, endpoint map, implementation order, business logic audit, and design decisions. Rules: architecture (layer boundaries, model conventions), testing (coverage requirements, mocking strategy), porting (reference files, what to port vs skip).	2026-04-21 21:16:32 -05:00
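
Note on commit 76a07c7f66: the message says Test*-prefixed domain models are marked with `__test__ = False` so pytest skips them during collection. A minimal sketch of that pattern follows. The class name `TestResults` appears in the commit message, but its fields here (`passed`, `failed`) are hypothetical and only illustrate the shape, not the repository's actual model.

```python
from dataclasses import dataclass, fields


@dataclass
class TestResults:
    """Domain model whose name happens to start with "Test" (fields are hypothetical)."""

    # No annotation, so this is a plain class attribute rather than a dataclass field.
    # pytest checks this attribute and does not collect the class when it is False.
    __test__ = False

    passed: int = 0
    failed: int = 0


# The marker does not leak into the model's data.
field_names = [f.name for f in fields(TestResults)]
```

Because the attribute is unannotated, the dataclass machinery ignores it, so the constructor signature and equality behaviour of the model are unchanged.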

26 commits
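
Note on commit 935c6f229e: the ranking endpoint is described as using "JSON extraction with 3 fallbacks (direct parse, markdown block, brace matching)". The sketch below shows one way such a three-tier extractor could look. The function names, the regex, and the brace scanner are assumptions for illustration and are not taken from the repository. The legacy XML fallback mentioned in the commit is not covered.

```python
from __future__ import annotations

import json
import re
from typing import Any

FENCE = "`" * 3  # markdown code fence, built at runtime to keep this example fence-safe
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def _try_parse(text: str) -> Any | None:
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def _first_braced_span(text: str) -> str | None:
    """Return the first balanced {...} span, ignoring braces inside JSON strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


def extract_json(text: str) -> Any | None:
    """Try direct parse, then a fenced markdown block, then brace matching."""
    # Tier 1: the whole response is JSON.
    parsed = _try_parse(text.strip())
    if parsed is not None:
        return parsed
    # Tier 2: JSON wrapped in a markdown code block.
    match = _FENCED_BLOCK.search(text)
    if match:
        parsed = _try_parse(match.group(1).strip())
        if parsed is not None:
            return parsed
    # Tier 3: first balanced brace span anywhere in the text.
    span = _first_braced_span(text)
    return _try_parse(span) if span is not None else None
```

One limitation of this sketch: a response that is literally the JSON value `null` is indistinguishable from a parse failure, since both yield `None`.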