Commit graph

26 commits

Author SHA1 Message Date
Kevin Turcios
3ee9c22c8e fix: resolve all ruff lint errors across repo (#38)
* fix: resolve all ruff lint errors across repo

Auto-fixed 31 errors (unused imports, formatting, simplifications).
Manually fixed 14 remaining:
- EXE001: removed shebangs from non-executable bench scripts
- C417: replaced map(lambda) with generator expression
- C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests
- C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts
- RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign)
- FBT002: made boolean params keyword-only in report functions
- E402: moved `import re` to top of file in security reports

* fix: resolve pre-existing mypy errors across packages

- _testgen.py: annotate `generated` as `str` to avoid no-any-return
- _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str),
  remove unused type: ignore on proc.kill()
- _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return
  from lazy-loaded performance_gain
2026-04-23 10:22:42 -05:00
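Two of the manual fixes in 3ee9c22c8e (C417 and FBT002) follow well-known patterns. The sketch below illustrates them with hypothetical functions (`sum_of_squares`, `render_report`) that are not taken from the repository.

```python
def sum_of_squares(values: list[int]) -> int:
    # Before (flagged by C417): sum(map(lambda v: v * v, values))
    return sum(v * v for v in values)


def render_report(title: str, *, verbose: bool = False) -> str:
    # Before (flagged by FBT002): def render_report(title, verbose=False)
    # The bare * makes `verbose` keyword-only, so call sites read
    # render_report("weekly", verbose=True) rather than a bare True.
    return f"{title} (verbose)" if verbose else title
```

Passing the boolean positionally, as in `render_report("weekly", True)`, now raises `TypeError`.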
Kevin Turcios
76a07c7f66 Add __test__ = False to Test*-prefixed domain model classes
Pytest's default collection pattern matches any class starting with
"Test". These domain models (TestType, TestResults, TestFiles, etc.)
are not test classes — mark them explicitly so pytest skips them.
2026-04-23 04:41:18 -05:00
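The marker described in 76a07c7f66 can be sketched as follows. The class name comes from the commit message, but the fields are hypothetical, and a stdlib dataclass stands in for whatever model base the project actually uses.

```python
from dataclasses import dataclass


@dataclass
class TestResults:
    """Domain model, not a pytest test class (fields are illustrative)."""

    # pytest checks this attribute during collection and skips the class.
    # It has no annotation, so the dataclass does not treat it as a field.
    __test__ = False

    passed: int
    failed: int
```

Without the marker, pytest's default `Test*` class pattern would match the name and try to collect the class.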
Kevin Turcios
4f98b5421f Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope
These classes represent behavioral verification diffs, not tests. The
Test* prefix caused pytest to attempt collection and emit warnings.
2026-04-23 04:37:24 -05:00
Kevin Turcios
e41a1bf56a Fix conftest collision between codeflash-api and github-app test suites
Both packages had tests/__init__.py, creating competing `tests`
packages under --import-mode=importlib. Remove both __init__.py files
and change github-app imports from `from tests.helpers` to
`from helpers` via sys.path insertion in conftest.py.
2026-04-23 03:33:58 -05:00
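A minimal sketch of the sys.path insertion mentioned in e41a1bf56a. The helper name `add_to_sys_path` is hypothetical; the actual conftest.py may simply insert the path inline.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> None:
    """Prepend a directory to sys.path once, so its modules import by bare name."""
    entry = str(directory.resolve())
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In a conftest.py this would typically be called as
#     add_to_sys_path(Path(__file__).parent)
# after which `from helpers import ...` resolves to the helpers module
# sitting next to conftest.py, with no `tests` package involved.
```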
Kevin Turcios
fb76024cfb Fix CLAUDE.md accuracy: remove nonexistent files, update patterns
- Remove _line_profiler.py, observability/models.py, _optimizer.py,
  _rate_limit.py, _usage.py from tree (never created)
- Add _background.py, _markdown.py, _xml.py that actually exist
- Mark java/ and js_ts/ as stubs
- Update endpoint count from 15 to 14, note log_features stub
- Fix Depends() example to use Annotated[] pattern
- Add deferred items: optimize-line-profiler, observability DB writes
2026-04-22 23:40:01 -05:00
Kevin Turcios
3a07579bb0 Raise codeflash-api test coverage from 81% to 92%
Add 182 new tests across optimize, V4A diff, CST utils, and postprocess
modules. Key coverage improvements:
- optimize/_pipeline.py: 29% → 97%
- optimize/_router.py: 40% → 93%
- diff/_v4a.py: 40% → 97%
- languages/python/_cst_utils.py: 67% → 96%
- languages/python/_postprocess.py: 67% → 87%

Also apply ruff format to 5 files that had formatting drift.
2026-04-22 23:39:54 -05:00
Kevin Turcios
fdfade528f Strengthen testgen test assertions and remove duplicate integration tests
Replace weak assertions (len >= 1, bare MagicMock) with exact counts,
_stub_llm_response, response body checks, and mock call verification.
Remove 6 duplicate TestTestgenIntegration tests already covered in
test_testgen.py::TestTestgenEndpoint.
2026-04-22 23:24:36 -05:00
Kevin Turcios
758da2592f Achieve 100% test coverage for testgen module
Add 15 new tests covering all previously uncovered paths:
- _validate.py: regex class splitting, trailing blank stripping,
  repair preamble edge cases (empty during iteration, lineno=None,
  out-of-range index, max attempts exhausted), AST gap/decorator paths
- _generate.py: multi-context ellipsis detection, extract_code_block
  returning None, no test functions after validation
- _review_router.py: non-dict/non-list JSON in review verdicts

Mark 2 provably unreachable defensive lines with pragma: no cover.
2026-04-22 23:10:27 -05:00
Kevin Turcios
92c5fd7c74 Remove instrumented_behavior_tests and instrumented_perf_tests from testgen API
Instrumentation (behavior/perf AST transformations) moves to the client
side. The API now returns raw validated code only via generated_tests.
2026-04-22 23:10:16 -05:00
Kevin Turcios
051317e2dc Mark all 9 implementation steps complete in architecture docs
2026-04-22 22:11:08 -05:00
Kevin Turcios
4b219907fd Implement POST /ai/testgen endpoint with full generation pipeline
Port test generation from Django reference: prompt templates (Jinja2
with model-type-aware formatting), LLM call orchestration with
even/odd model selection, AST-based code validation with regex
fallback, preamble repair, and ellipsis detection. Instrumentation
and postprocessing are deferred — all four response fields return
the same validated code for now.
2026-04-22 22:11:04 -05:00
Kevin Turcios
b3840627bb Use explicit .strip() assertions in testgen repair tests
2026-04-22 20:36:14 -05:00
Kevin Turcios
6abcc8daa3 Add testgen review and repair endpoints
Port /ai/testgen_review and /ai/testgen_repair from Django reference.
Review: parallel LLM calls per test source, auto-flags behavioral
failures, parses JSON verdicts. Repair: Jinja2 prompt templates,
syntax-error retry loop, Python code extraction and validation.

Schemas: TestgenReviewRequest/Response, TestRepairRequest/Response,
CoverageDetails, FunctionVerdict, TestSourceWithFailures.

23 tests covering: coverage context building, verdict parsing,
syntax error detection, endpoint success/error/retry/language paths,
and the model validator for python_version resolution.
2026-04-22 20:35:39 -05:00
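The syntax-error retry loop named in 6abcc8daa3 might look roughly like this. It is a sketch under assumptions: `generate` stands in for the LLM call, and the attempt limit and error format are hypothetical.

```python
import ast
from typing import Callable, Optional


def repair_with_retry(
    generate: Callable[[Optional[str]], str],
    max_attempts: int = 2,
) -> Optional[str]:
    """Call generate until its output parses as Python, or give up."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        # The previous syntax error (if any) is fed back into the next attempt.
        code = generate(last_error)
        try:
            ast.parse(code)
        except SyntaxError as exc:
            last_error = f"{exc.msg} (line {exc.lineno})"
            continue
        return code
    return None
```

Feeding the previous error back lets the next prompt mention what went wrong instead of repeating the same request.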
Kevin Turcios
1d70d65914 Wire observability recording into LLM client
Add fire-and-forget background task manager (background.py) and
LLM call recording (recording.py). Every LLMClient.call now records
trace_id, model, latency, tokens, cost, and errors in a background task.
drain() awaits pending tasks on shutdown. Currently logs only —
database persistence deferred until llm_calls table is wired.
2026-04-22 20:30:10 -05:00
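The fire-and-forget pattern with a drain() step described in 1d70d65914 can be sketched with plain asyncio. The class and method names below (`BackgroundTasks`, `spawn`) are hypothetical and not taken from the repository.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any


class BackgroundTasks:
    """Hypothetical fire-and-forget task manager."""

    def __init__(self) -> None:
        # Strong references keep tasks from being garbage-collected mid-flight.
        self._pending: set = set()

    def spawn(self, coro: Coroutine[Any, Any, Any]) -> None:
        task = asyncio.create_task(coro)
        self._pending.add(task)
        # Drop the reference as soon as the task finishes.
        task.add_done_callback(self._pending.discard)

    async def drain(self) -> None:
        # Await whatever is still running; exceptions are collected, not raised.
        if self._pending:
            await asyncio.gather(*self._pending, return_exceptions=True)


async def main() -> list[str]:
    recorded: list[str] = []

    async def record(trace_id: str) -> None:
        await asyncio.sleep(0)  # stand-in for the real recording work
        recorded.append(trace_id)

    tasks = BackgroundTasks()
    tasks.spawn(record("trace-1"))
    tasks.spawn(record("trace-2"))
    await tasks.drain()  # e.g. called from the app's shutdown hook
    return recorded
```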
Kevin Turcios
a62f1ecd03 Add real DB integration tests with testcontainers
12 tests covering all Queries methods against a real PostgreSQL
instance via testcontainers. Automatically skipped when Docker is
unavailable. Tests: api key lookup, last_used update, organization
fetch, subscription CRUD, usage increment, cumulative increments.
2026-04-22 20:02:28 -05:00
Kevin Turcios
3e16d44912 Fix all mypy strict errors across codeflash-api
- Narrow search_start guard in search/replace parser
- Type optimizations_limit as int|str instead of object
- Wrap cost calculation return in float()
- Cast binary op result to int in CST evaluator
- Suppress import-untyped for asyncpg (no stubs available)
- Suppress arg-type for OpenAI messages (dict→union mismatch)
- Type isort kwargs as Any, add Coroutine import for refinement
- Narrow feature_version to tuple[int, int] for ast.parse
- Rename shadowed loop variable in annotation walker
- Add mypy strict=true config to pyproject.toml
2026-04-22 19:59:42 -05:00
Kevin Turcios
03bb712c65 Add integration tests and fix AuthenticatedUser runtime import
FastAPI with `from __future__ import annotations` cannot resolve
Annotated[AuthenticatedUser, Depends()] when AuthenticatedUser is
only imported under TYPE_CHECKING — it becomes a query parameter
instead of a dependency. Move to runtime import in all 11 routers
with noqa: TC001 suppression.

30 integration tests cover all endpoints (success, invalid trace_id,
LLM failure, edge cases) using httpx ASGITransport with mocked LLM.
2026-04-21 22:48:30 -05:00
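The root cause described in 03bb712c65 can be reproduced with the standard library alone. This sketch does not use FastAPI: it only shows that a name imported under TYPE_CHECKING cannot be resolved from a string annotation at runtime. `TypeOnlyUser` and `RuntimeUser` are stand-ins for AuthenticatedUser.

```python
from fractions import Fraction as RuntimeUser  # imported at runtime
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # Seen by type checkers only; this import never executes at runtime.
    from decimal import Decimal as TypeOnlyUser


# String annotations mimic what `from __future__ import annotations` produces.
def handler_type_only(user: "TypeOnlyUser") -> None: ...


def handler_runtime(user: "RuntimeUser") -> None: ...


def can_resolve_annotations(func) -> bool:
    """Report whether every annotation on func resolves to a real object."""
    try:
        get_type_hints(func)
    except NameError:
        return False
    return True
```

Here `can_resolve_annotations(handler_type_only)` is False while `can_resolve_annotations(handler_runtime)` is True, which is why the commit moves the import out of the TYPE_CHECKING block.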
Kevin Turcios
935c6f229e Add remaining endpoints: repair, refinement, adaptive, explain, review, ranking, jit, workflow, testgen, log_features
Port all P1 endpoints from the Django aiservice to FastAPI:

- repair: 2-attempt LLM retry, SearchAndReplaceDiff patch application
- refinement: parallel LLM calls via asyncio.gather, single/multi-file
  context dispatch, XML explanation extraction, deduplication
- adaptive: single LLM call with previous candidate history
- explain: conditional throughput/concurrency/acceptance prompt sections,
  XML <explain> tag extraction
- review: 4-dimension scoring, JSON code block extraction, 2-attempt retry
- ranking: 4-dimension weighted scoring, JSON extraction with 3 fallbacks
  (direct parse, markdown block, brace matching), legacy XML fallback
- jit: reuses optimize pipeline with JIT-specific prompts
- workflow: 3-tier regex YAML extraction, LLM-generated CI steps
- testgen: stub returning 501 (language-specific logic deferred)
- log_features: trace_id validation, DB write stubbed

Also adds:
- Task-specific model assignments in llm/_models.py
- XML tag extraction utility in languages/python/_xml.py
- All 11 routers registered in _app.py

348 tests passing, all lint clean.
2026-04-21 22:36:31 -05:00
Kevin Turcios
6c04324e25 Add optimize endpoint: context, pipeline, router, prompt templates
Faithful port of the Python optimization pipeline from Django aiservice:
- schemas.py: Pydantic request/response models (OptimizeRequest, OptimizeResponse)
- _markdown.py: markdown code block extraction, splitting, grouping
- _context.py: BaseOptimizerContext with Single/Multi variants for
  prompt assembly, LLM response extraction, and postprocessing
- _pipeline.py: parallel LLM orchestration with model distribution
  (GPT-5-mini + Claude Sonnet 4.5), diversity via line profiler toggling
- _router.py: POST /ai/optimize with auth, rate limiting, usage tracking
- 11 prompt templates copied verbatim from Django reference
- LLM client wired into app lifespan
2026-04-21 22:16:22 -05:00
Kevin Turcios
3e62f502e7 Add language layer: CST utils, validator, postprocessing pipeline
Faithful port of Python language utilities from Django aiservice:
- _cst_utils.py: depth tracking, import extraction, definition removal,
  ellipsis detection, expression evaluation, module path helpers
- _validator.py: dual ast+libcst syntax validation, parse-or-none
- _postprocess.py: full optimization postprocessing pipeline including
  dedup, equality check, docstring restoration, comment cleaning,
  forward reference fixing, ellipsis filtering, isort
2026-04-21 22:04:39 -05:00
Kevin Turcios
5c6b82050a Add diff layer: SEARCH/REPLACE and V4A patch application
Faithfully ported from Django aiservice. V4A uses 3-tier fuzzy
context matching (exact/rstrip/strip) with EOF penalties and scope
markers. Per-file lint ignores for ported complexity.
2026-04-21 21:55:28 -05:00
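The 3-tier context matching (exact/rstrip/strip) mentioned in 5c6b82050a can be illustrated as follows. This is a simplified sketch, not the ported implementation: `find_context` is a hypothetical name, the tier index is used as a stand-in fuzz score, and EOF penalties and scope markers are left out.

```python
from typing import Callable, Optional

# Tiers ordered from strictest to loosest.
_NORMALIZERS: list[Callable[[str], str]] = [
    lambda s: s,           # tier 0: exact match
    lambda s: s.rstrip(),  # tier 1: ignore trailing whitespace
    lambda s: s.strip(),   # tier 2: ignore leading and trailing whitespace
]


def find_context(
    lines: list[str], context: list[str], start: int = 0
) -> Optional[tuple[int, int]]:
    """Return (line_index, tier) for the first match of context, else None."""
    if not context:
        return start, 0
    span = len(context)
    for tier, normalize in enumerate(_NORMALIZERS):
        wanted = [normalize(c) for c in context]
        for i in range(start, len(lines) - span + 1):
            if [normalize(line) for line in lines[i : i + span]] == wanted:
                return i, tier
    return None
```

Trying the strict tier across the whole file before loosening means an exact match anywhere wins over a fuzzy match that happens to appear earlier.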
Kevin Turcios
2acebdbf51 Add DB layer: asyncpg pool, engine, row schemas, lifespan wiring
Pool creation with min=2/max=100, row schema attrs classes for all
7 tables, and lifespan integration in the app factory for pool
startup/shutdown.
2026-04-21 21:55:19 -05:00
Kevin Turcios
fcaac3a9f2 Add LLM layer: client abstraction, cost calculation, retry policy
Dual-provider client (Azure OpenAI + Anthropic Bedrock) behind
a common async interface with cache-aware cost calculation and
event-loop-safe client lifecycle.
2026-04-21 21:55:09 -05:00
Kevin Turcios
d20b82762a Add auth layer: key hashing, rate limiting, usage tracking
SHA-384 + base64url key hashing matching the JS client. FastAPI
dependencies for require_auth, check_rate_limit, and track_usage
with Annotated[Depends()] pattern. Per-user per-endpoint rate
limiting with employee bypass. Atomic subscription usage tracking
with enterprise org and employee exemptions. DB queries module
with asyncpg raw SQL for auth tables. 27 new tests covering auth
flow, rate limits, usage enforcement, and edge cases.
2026-04-21 21:33:02 -05:00
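The SHA-384 + base64url key hashing named in d20b82762a can be sketched with the standard library. The input encoding (UTF-8), the absence of a salt, and the function names are assumptions; the commit only states the hash and encoding scheme.

```python
import base64
import hashlib
import hmac


def hash_api_key(api_key: str) -> str:
    """SHA-384 digest of the key, encoded with the URL-safe base64 alphabet."""
    digest = hashlib.sha384(api_key.encode("utf-8")).digest()  # 48 bytes
    # 48 bytes is a multiple of 3, so the 64-character result has no padding.
    return base64.urlsafe_b64encode(digest).decode("ascii")


def key_matches(api_key: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_api_key(api_key), stored_hash)
```

Because a SHA-384 digest encodes to exactly 64 characters with no `=` padding, padded and unpadded base64url implementations produce the same string, which helps when matching another client's output.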
Kevin Turcios
69714f410f Scaffold codeflash-api package with app factory, config, and healthcheck
FastAPI app factory with lifespan, CORS, optional Sentry. Pydantic-settings
config for all env vars. Full directory structure for all 15 endpoints per
the architecture doc. Workspace integration: ruff src paths, isort, pytest
testpaths, per-file ignores. aiohttp for production, httpx for test client.
2026-04-21 21:28:59 -05:00
Kevin Turcios
e34873fb82 Add codeflash-api architecture docs and project-scoped rules
CLAUDE.md with full package structure, layer boundaries, endpoint map,
implementation order, business logic audit, and design decisions.

Rules: architecture (layer boundaries, model conventions), testing
(coverage requirements, mocking strategy), porting (reference files,
what to port vs skip).
2026-04-21 21:16:32 -05:00