172 commits

Author SHA1 Message Date
Kevin Turcios
3ee9c22c8e
fix: resolve all ruff lint errors across repo (#38)
* fix: resolve all ruff lint errors across repo

Auto-fixed 31 errors (unused imports, formatting, simplifications).
Manually fixed 14 remaining:
- EXE001: removed shebangs from non-executable bench scripts
- C417: replaced map(lambda) with generator expression
- C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests
- C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts
- RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign)
- FBT002: made boolean params keyword-only in report functions
- E402: moved `import re` to top of file in security reports

* fix: resolve pre-existing mypy errors across packages

- _testgen.py: annotate `generated` as `str` to avoid no-any-return
- _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str),
  remove unused type: ignore on proc.kill()
- _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return
  from lazy-loaded performance_gain
2026-04-23 10:22:42 -05:00
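
The FBT002 and C417 fixes named in the commit above can be illustrated with a minimal sketch. The function `render_report` and its parameters are hypothetical and do not come from the repository; the sketch only shows the general shape of a keyword-only boolean parameter and of a generator expression used in place of `map(lambda ...)`.

```python
def render_report(rows: list, *, verbose: bool = False) -> str:
    """Hypothetical report function (not from the repository).

    The bare `*` makes `verbose` keyword-only, which is the usual way
    to satisfy ruff's FBT002 rule for boolean parameters.
    """
    # C417-style rewrite: a generator expression instead of
    # ", ".join(map(lambda row: row.upper(), rows))
    body = ", ".join(row.upper() for row in rows)
    return f"[verbose] {body}" if verbose else body
```

Callers then have to write `render_report(rows, verbose=True)`; passing `True` positionally raises `TypeError`, which keeps call sites readable.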
codeflash-ci-bot[bot]
c249bcd0ce
chore: update tessl tiles 2026-04-23 (#35)
Co-authored-by: codeflash-ci-bot[bot] <codeflash-ci-bot[bot]@users.noreply.github.com>
2026-04-23 08:15:44 -05:00
Kevin Turcios
9bc9ff2250 chore: add ready-to-merge gate for branch freshness 2026-04-23 08:12:25 -05:00
Kevin Turcios
5d02df9890
Merge pull request #34 from codeflash-ai/fix/tessl-app-token
fix: pass CI bot secrets to tessl update workflow
2026-04-23 08:04:58 -05:00
Kevin Turcios
3851fd4cf9 fix: pass CI bot secrets to tessl update workflow 2026-04-23 08:04:06 -05:00
Kevin Turcios
82f16dc9f0
Merge pull request #33 from codeflash-ai/fix/tessl-caller-permissions
fix: add permissions to tessl update caller workflow
2026-04-23 07:50:43 -05:00
Kevin Turcios
fa1b5ece4e fix: add permissions to tessl update caller workflow 2026-04-23 07:49:56 -05:00
Kevin Turcios
01dc370d09
Merge pull request #32 from codeflash-ai/chore/tessl-vendored-setup
chore: initialize tessl with vendored tiles
2026-04-23 07:44:58 -05:00
Kevin Turcios
f83ece9ad8 chore: initialize tessl with vendored tiles
Install 32 Python tiles in vendored mode, add MCP configs for all
agents, and set up weekly tile update workflow via reusable
github-workflows caller.
2026-04-23 07:43:04 -05:00
Kevin Turcios
616d078ecd Add CI optimization item to roadmap 2026-04-23 05:59:08 -05:00
Kevin Turcios
851bd61017 Add project roadmap 2026-04-23 05:27:58 -05:00
Kevin Turcios
72a8610fcf Add vulture to dev dependencies for dead code detection 2026-04-23 04:57:32 -05:00
Kevin Turcios
57446aad31 Fix unawaited coroutine warning in test_default_timeout_is_600
The AsyncMock for wait_for discarded the coroutine from
proc.communicate() without consuming it. Replace with a side_effect
that closes the coroutine before returning the mock result.
2026-04-23 04:46:32 -05:00
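
A rough sketch of the technique described in the commit above, assuming the code under test hands a coroutine object to a `wait_for`-style callable. The names `communicate`, `run_with_timeout`, and `close_and_return` are illustrative only, and `wait_for` is injected as a parameter here purely to keep the sketch self-contained; the real test may patch it differently.

```python
import asyncio
from unittest.mock import AsyncMock


async def communicate():
    """Stand-in for proc.communicate() (hypothetical)."""
    return b"out", b"err"


async def run_with_timeout(wait_for=asyncio.wait_for, timeout=600.0):
    """Hypothetical code under test: passes a coroutine object to wait_for."""
    return await wait_for(communicate(), timeout=timeout)


def close_and_return(result):
    """Build a side_effect that consumes the coroutine it is handed."""

    def _side_effect(coro, timeout=None):
        # Closing the coroutine marks it as finished, so Python does not
        # emit "coroutine ... was never awaited" when it is garbage collected.
        coro.close()
        return result

    return _side_effect


def demo():
    mock_wait_for = AsyncMock(side_effect=close_and_return((b"", b"")))
    result = asyncio.run(run_with_timeout(wait_for=mock_wait_for))
    return result, mock_wait_for.call_args.kwargs["timeout"]
```

A plain `AsyncMock(return_value=...)` would return the same result but leave the coroutine from `communicate()` unconsumed, which is what triggers the warning.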
Kevin Turcios
76a07c7f66 Add __test__ = False to Test*-prefixed domain model classes
Pytest's default collection pattern matches any class starting with
"Test". These domain models (TestType, TestResults, TestFiles, etc.)
are not test classes — mark them explicitly so pytest skips them.
2026-04-23 04:41:18 -05:00
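
For illustration, the marker described in the commit above might look like the following on a domain model. The fields of `TestResults` shown here are assumptions, not the repository's actual definition.

```python
from dataclasses import dataclass, fields


@dataclass
class TestResults:
    """Domain model whose name happens to start with "Test" (fields are hypothetical)."""

    # Tells pytest not to collect this class. Without a type annotation this
    # is a plain class attribute, so the dataclass does not treat it as a field.
    __test__ = False

    passed: int = 0
    failed: int = 0


field_names = [f.name for f in fields(TestResults)]
```

Pytest checks the `__test__` attribute during collection, so the class is skipped without needing a rename.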
Kevin Turcios
4f98b5421f Rename TestDiff/TestDiffScope to BehaviorDiff/BehaviorDiffScope
These classes represent behavioral verification diffs, not tests. The
Test* prefix caused pytest to attempt collection and emit warnings.
2026-04-23 04:37:24 -05:00
Kevin Turcios
9e893675c9 Add Plotly Cloud deployment config for CI audit report 2026-04-23 03:59:35 -05:00
Kevin Turcios
c492164fbf Add codeflash org CI audit case study and interactive Dash report
Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README,
status, and raw data (fork activity, PRs merged).

Interactive Dash report in reports/codeflash-ci-audit/ with two tabs:
Executive Summary (hero metrics, cost impact charts, before/after) and
Full Detail (fork breakdown, findings table, PR inventory, methodology).

Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage
savings, 200+ forks disabled, 11 PRs merged across 2 repos.
2026-04-23 03:56:04 -05:00
Kevin Turcios
8221ce32a2 Add codeflash-ai/ci-audit to active case studies list 2026-04-23 03:52:10 -05:00
Kevin Turcios
0316d9822a Add testcontainers[postgres] to workspace dev dependencies
The 12 DB integration tests in codeflash-api need testcontainers to spin
up a real PostgreSQL instance via Docker. Was already declared in the
package's own dev deps but missing from the root workspace.
2026-04-23 03:52:07 -05:00
Kevin Turcios
bf8707695f Add codeflash-api to workspace dev dependencies
The package was a workspace member but not listed in the root dev
group, so its tests couldn't import codeflash_api when running
from the monorepo root.
2026-04-23 03:38:30 -05:00
Kevin Turcios
e3e74c3f2e Add missing pyproject.toml for codeflash-ci-audit workspace member 2026-04-23 03:34:03 -05:00
Kevin Turcios
e41a1bf56a Fix conftest collision between codeflash-api and github-app test suites
Both packages had tests/__init__.py, creating competing `tests`
packages under --import-mode=importlib. Remove both __init__.py files
and change github-app imports from `from tests.helpers` to
`from helpers` via sys.path insertion in conftest.py.
2026-04-23 03:33:58 -05:00
Kevin Turcios
43a4009294 Fix callee syntax validation in prepare_python_module
normalize_code no longer raises SyntaxError (it returns raw code as
fallback), so validate callee source with ast.parse() explicitly
before normalizing. Fixes test_callee_syntax_error_returns_none.
2026-04-23 03:28:58 -05:00
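
The validation order described in the commit above can be sketched as follows. The body of `normalize_code` is an assumption (a round trip through `ast.unparse` with a raw-code fallback), and `prepare_callee` is a hypothetical stand-in for the relevant part of `prepare_python_module`.

```python
import ast
from typing import Optional


def normalize_code(code: str) -> str:
    """Assumed behavior: normalize via the AST, fall back to raw code."""
    try:
        return ast.unparse(ast.parse(code))
    except SyntaxError:
        # Fallback: return the input unchanged instead of raising.
        return code


def prepare_callee(source: str) -> Optional[str]:
    """Hypothetical helper: reject invalid callee source before normalizing."""
    try:
        # Validate explicitly, because normalize_code no longer raises.
        ast.parse(source)
    except SyntaxError:
        return None
    return normalize_code(source)
```

Because the fallback swallows the `SyntaxError`, the caller can no longer rely on normalization to detect bad input, hence the separate `ast.parse()` check.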
Kevin Turcios
bd5613d22f Update test-coverage.md: remove resolved callouts for covered modules 2026-04-23 03:13:28 -05:00
Kevin Turcios
e1990092e0 Add tests for error handling paths in ranking, refinement, and state
- test_ranking: Update normalize_code test to expect fallback on invalid
  syntax instead of SyntaxError (matches new behavior)
- test_refinement: Add 7 tests for _parse_candidate markdown parsing
  (fenced blocks, file paths, multiple blocks, plain fallback)
- test_state: Add 6 tests for PythonState.module_ast and invalidate_module
  (valid parse, caching, SyntaxError→None, re-parse after fix)
2026-04-23 03:13:23 -05:00
Kevin Turcios
9e679f1c06 Fix error handling: add logging to bare excepts, protect ast.parse(), parse markdown in refinement
- _tracing.py: Add log.warning(exc_info=True) to 4 bare except blocks that
  previously silently swallowed errors
- _state.py: Wrap ast.parse() in SyntaxError handler, return None for
  malformed files
- _ranking.py: Wrap ast.parse() in SyntaxError handler, fall back to raw
  code string for dedup
- _refinement.py: Add CodeStringsMarkdown.parse_markdown_code() to
  _parse_candidate(), matching the pattern in _candidate_gen.py
- Update error-handling.md rules to reflect resolved issues
2026-04-23 03:06:03 -05:00
Kevin Turcios
dd7d2db451 Add unit tests for _benchmark_worker subprocess script
5 tests covering module-level argv parsing, project_root derivation,
benchmark plugin and trace decorator imports, and __main__ guard.
2026-04-23 02:31:38 -05:00
Kevin Turcios
e2135e39b2 Add unit tests for vendored _tabulate module
64 tests covering: tabulate() with pipe/simple formats, empty/None
data, dict input, numeric alignment, float formatting, whitespace
preservation, separating lines, firstrow headers. Internal helpers:
type detection, number parsing, ANSI stripping, padding, multiline
detection, pipe segment alignment. Integration test matching the
_create_pr use case.
2026-04-23 02:31:38 -05:00
Kevin Turcios
276c2f36da Add unit tests for _discovery_worker (collection parsing, plugin)
Covers parse_pytest_collection_results with top-level functions,
class methods, and empty input. Tests PytestCollectionPlugin
benchmark skipping, collection_finish pickle output, and item
accumulation. Uses sys.argv patching to handle module-level reads.
2026-04-23 02:28:23 -05:00
Kevin Turcios
957f299243 Add unit tests for _create_pr (PR creation, suggestion, error paths)
Covers PR number env var parsing, suggest-changes vs create-pr
branching, branch push failure, GitHub App not-installed warning,
and generic API error logging.
2026-04-23 02:24:01 -05:00
Kevin Turcios
c31fbc1e43 Add unit tests for _trace_db (sanitize, trace queries, run time)
Covers sanitize_to_filename edge cases, get_traced_arguments with
class filtering and invalid event types, and get_trace_total_run_time_ns
with missing files/tables/empty tables.
2026-04-23 02:23:56 -05:00
Kevin Turcios
cf7cf60936 Add unit tests for _candidate_gen (generate, repair, refinement)
Covers happy paths and error paths for generate_candidates,
repair_failed_candidates, and generate_refinement_candidates.
Tests AI service errors, unparseable markdown, missing runtime
data, and repair failures.
2026-04-23 02:23:52 -05:00
Kevin Turcios
815eba00c0 Fix unawaited coroutine warning in test_skips_ai_test_generation
optimize() is now async, so the test must use async def + await.
2026-04-23 01:47:26 -05:00
Kevin Turcios
92e39d6923 Convert remaining sync test runner callers to async
Replace all sync test runner calls (run_behavioral_tests,
run_benchmarking_tests, run_line_profile_tests) with their async
counterparts throughout the pipeline. This eliminates the
ThreadPoolExecutor in _baseline.py in favor of asyncio.gather(),
and makes _async_bench.py, _candidate_gen.py, and
_function_optimizer.py fully async. Adds async_run_line_profile_tests
and coverage support to async_run_behavioral_tests in _test_runner.py.
2026-04-23 01:46:01 -05:00
Kevin Turcios
a292698a1d Add pytest-cov to dev dependencies 2026-04-23 00:41:32 -05:00
Kevin Turcios
f204f8e740 Unify sync/async candidate eval into single async path
Delete the sync evaluate_candidate() and run_tests_and_benchmark()
functions — all callers now use the async versions. Rename
async_run_tests_and_benchmark → run_tests_and_benchmark and
async_evaluate_candidate_isolated → evaluate_candidate_isolated.

The entire optimization pipeline is now async with a single
asyncio.run() entry point in _cli.py:main(). PythonOptimizer.run()
and PythonFunctionOptimizer.optimize() are async coroutines.

Update test_candidate_eval.py and test_parallel_eval_integration.py
to match the unified API.
2026-04-23 00:41:28 -05:00
Kevin Turcios
8defba8a72 Add unit tests for async test runners and candidate evaluation
29 new tests in test_test_runner.py covering async_execute_test_subprocess,
async_run_behavioral_tests, async_run_benchmarking_tests, _base_pytest_args,
replay test path, and coverage path.

21 new tests in test_candidate_eval.py covering evaluate_candidate,
rank_candidates, build_benchmark_details, log_evaluation_results, and
async_run_tests_and_benchmark.
2026-04-23 00:24:59 -05:00
Kevin Turcios
8d308fe8e8 Replace ThreadPoolExecutor with asyncio for parallel candidate evaluation
Thread-safety concern with shared EvaluationContext mutations is
eliminated by switching to cooperative concurrency — between await
points only one coroutine runs, so no locks are needed.

Adds async variants of test runners (async_execute_test_subprocess,
async_run_behavioral_tests, async_run_benchmarking_tests) and async
evaluation functions (async_run_tests_and_benchmark,
async_evaluate_candidate_isolated). Rewrites _evaluate_batch_parallel
to use asyncio.Semaphore + asyncio.gather instead of ThreadPoolExecutor.
2026-04-23 00:12:53 -05:00
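
A minimal sketch of the `asyncio.Semaphore` + `asyncio.gather` pattern named in the commit above. The functions `evaluate_candidate` and `evaluate_batch`, the `max_concurrent` parameter, and the plain `dict` standing in for the shared `EvaluationContext` are all illustrative assumptions.

```python
import asyncio


async def evaluate_candidate(candidate: str, results: dict) -> None:
    """Hypothetical evaluation step that writes into shared state."""
    await asyncio.sleep(0)  # stand-in for awaiting a test subprocess
    # No lock needed: there is no await between computing and storing the
    # value, so no other coroutine can run in the middle of this update.
    results[candidate] = len(candidate)


async def evaluate_batch(candidates, max_concurrent: int = 2) -> dict:
    """Evaluate all candidates with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results = {}

    async def bounded(candidate: str) -> None:
        async with semaphore:  # caps concurrency
            await evaluate_candidate(candidate, results)

    await asyncio.gather(*(bounded(c) for c in candidates))
    return results
```

Running `asyncio.run(evaluate_batch(["a", "bb", "ccc"]))` evaluates every candidate while never holding more than two semaphore slots at once.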
Kevin Turcios
df3538167f Add .coverage to gitignore 2026-04-22 23:40:22 -05:00
Kevin Turcios
fb76024cfb Fix CLAUDE.md accuracy: remove nonexistent files, update patterns
- Remove _line_profiler.py, observability/models.py, _optimizer.py,
  _rate_limit.py, _usage.py from tree (never created)
- Add _background.py, _markdown.py, _xml.py that actually exist
- Mark java/ and js_ts/ as stubs
- Update endpoint count from 15 to 14, note log_features stub
- Fix Depends() example to use Annotated[] pattern
- Add deferred items: optimize-line-profiler, observability DB writes
2026-04-22 23:40:01 -05:00
Kevin Turcios
3a07579bb0 Raise codeflash-api test coverage from 81% to 92%
Add 182 new tests across optimize, V4A diff, CST utils, and postprocess
modules. Key coverage improvements:
- optimize/_pipeline.py: 29% → 97%
- optimize/_router.py: 40% → 93%
- diff/_v4a.py: 40% → 97%
- languages/python/_cst_utils.py: 67% → 96%
- languages/python/_postprocess.py: 67% → 87%

Also apply ruff format to 5 files that had formatting drift.
2026-04-22 23:39:54 -05:00
Kevin Turcios
2d9fca6b3e Fix all ruff lint errors in codeflash-core
- Replace commented-out code pattern with descriptive comment in __init__.py
- Move ModuleType into TYPE_CHECKING block in _git.py
- Add noqa: F821 for PEP 562 lazy-loaded git module references
- Restore noqa: PLC0415 on reformatted sentry imports in _telemetry.py
2026-04-22 23:39:47 -05:00
Kevin Turcios
fdfade528f Strengthen testgen test assertions and remove duplicate integration tests
Replace weak assertions (len >= 1, bare MagicMock) with exact counts,
_stub_llm_response, response body checks, and mock call verification.
Remove 6 duplicate TestTestgenIntegration tests already covered in
test_testgen.py::TestTestgenEndpoint.
2026-04-22 23:24:36 -05:00
Kevin Turcios
758da2592f Achieve 100% test coverage for testgen module
Add 15 new tests covering all previously uncovered paths:
- _validate.py: regex class splitting, trailing blank stripping,
  repair preamble edge cases (empty during iteration, lineno=None,
  out-of-range index, max attempts exhausted), AST gap/decorator paths
- _generate.py: multi-context ellipsis detection, extract_code_block
  returning None, no test functions after validation
- _review_router.py: non-dict/non-list JSON in review verdicts

Mark 2 provably unreachable defensive lines with pragma: no cover.
2026-04-22 23:10:27 -05:00
Kevin Turcios
92c5fd7c74 Remove instrumented_behavior_tests and instrumented_perf_tests from testgen API
Instrumentation (behavior/perf AST transformations) moves to the client
side. The API now returns raw validated code only via generated_tests.
2026-04-22 23:10:16 -05:00
Kevin Turcios
051317e2dc Mark all 9 implementation steps complete in architecture docs 2026-04-22 22:11:08 -05:00
Kevin Turcios
4b219907fd Implement POST /ai/testgen endpoint with full generation pipeline
Port test generation from Django reference: prompt templates (Jinja2
with model-type-aware formatting), LLM call orchestration with
even/odd model selection, AST-based code validation with regex
fallback, preamble repair, and ellipsis detection. Instrumentation
and postprocessing are deferred — all four response fields return
the same validated code for now.
2026-04-22 22:11:04 -05:00
Kevin Turcios
b3840627bb Use explicit .strip() assertions in testgen repair tests 2026-04-22 20:36:14 -05:00
Kevin Turcios
6abcc8daa3 Add testgen review and repair endpoints
Port /ai/testgen_review and /ai/testgen_repair from Django reference.
Review: parallel LLM calls per test source, auto-flags behavioral
failures, parses JSON verdicts. Repair: Jinja2 prompt templates,
syntax-error retry loop, Python code extraction and validation.

Schemas: TestgenReviewRequest/Response, TestRepairRequest/Response,
CoverageDetails, FunctionVerdict, TestSourceWithFailures.

23 tests covering: coverage context building, verdict parsing,
syntax error detection, endpoint success/error/retry/language paths,
and the model validator for python_version resolution.
2026-04-22 20:35:39 -05:00
Kevin Turcios
1d70d65914 Wire observability recording into LLM client
Add fire-and-forget background task manager (background.py) and
LLM call recording (recording.py). Every LLMClient.call now records
trace_id, model, latency, tokens, cost, and errors via fire-and-forget.
drain() awaits pending tasks on shutdown. Currently logs only —
database persistence deferred until llm_calls table is wired.
2026-04-22 20:30:10 -05:00
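
One possible shape for a fire-and-forget task manager with a `drain()` step, as described in the commit above. The class name `BackgroundTasks`, the method `fire_and_forget`, and the recorded fields are assumptions; the actual `background.py` and `recording.py` may differ.

```python
import asyncio


class BackgroundTasks:
    """Hypothetical fire-and-forget manager (not the repository's code)."""

    def __init__(self):
        self._tasks = set()

    def fire_and_forget(self, coro) -> None:
        task = asyncio.create_task(coro)
        # Keep a strong reference so the task is not garbage collected
        # before it finishes, and drop it once it is done.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        """Await everything still pending, e.g. on shutdown."""
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)


async def demo():
    recorded = []

    async def record(entry):
        await asyncio.sleep(0)
        recorded.append(entry)  # "logs only" stand-in for persistence

    tasks = BackgroundTasks()
    tasks.fire_and_forget(record({"model": "m", "latency_ms": 12}))
    before = list(recorded)  # nothing recorded yet: the task has not run
    await tasks.drain()
    return before, recorded
```

The caller returns immediately after `fire_and_forget`, and `drain()` ensures that pending records are not lost when the process shuts down.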