The optimization defers `.strip()` until after syntax validation and test-function checks (eliminating an early temporary allocation in the 99% of cases where code passes validation) and skips constructing the large `f"LLM testgen response:\n{output.content}"` string in production by checking `IS_PRODUCTION` before calling `debug_log_sensitive_data`. The reduced string-manipulation overhead lifted throughput from 20,808 to 23,256 ops/sec (an 11.8% gain). The concurrency ratio remained at 0.01x because the function is CPU-bound on local parsing/validation rather than I/O.
The optimization replaced the DOTALL regex `GO_CODE_PATTERN.search()` with a manual linear scan (`_extract_go_code_block`) that uses `str.find()` to locate fence delimiters and validate opening/closing structure. Profiler data shows this cut the pattern-matching cost from ~628 µs to ~367 µs (41% reduction), which is the critical path for inputs with large surrounding text or many fence candidates. The manual scanner avoids regex backtracking and match-object allocation overhead while preserving identical semantics (MULTILINE anchor, optional "go" token, mandatory surrounding newlines). Test cases confirm correctness across all edge cases, with large-input tests showing 175–1236% speedup where the old regex scanned kilobytes of text repeatedly.
The regex pattern for matching Go package declarations is now compiled once at module load (`_PACKAGE_RE = re.compile(...)`) instead of being recompiled on every function call via `re.search()`. Line profiler shows the hot line (regex search) dropped from ~7.9 µs per hit to ~1.3 µs per hit, reducing total function time by 51% across 1049 invocations. The function is called during every Go test generation request (`testgen_go` extracts the package name from user source code), so eliminating repeated compilation overhead directly improves request latency.
The optimization wraps `get_user_prompt` with `@lru_cache(maxsize=2)`, caching file reads for the two prompt variants (sync and async). Since the function is called ~2600 times per optimization run with identical arguments (profiler shows 2628 hits, 47% spent in `exists()` checks and 37% in `read_text()`), caching eliminates all redundant I/O after the first call. The 40× speedup (66ms → 1.65ms) comes from avoiding repeated disk access to static prompt files, confirmed by test cases like `test_get_user_prompt_default_vs_explicit_false_500_iterations` showing 12,248% speedup on repeated calls.
The optimization added `@lru_cache(maxsize=2)` to cache the two prompt file reads (sync and async variants), eliminating redundant disk I/O on subsequent calls with the same `is_async` parameter. Line profiler shows that in the original code, `prompt_file.exists()` (46.5% of runtime) and `read_text()` (37.1%) dominated execution, but with caching these operations occur only once per variant instead of on every call. The annotated tests confirm this: `test_get_system_prompt_performance_many_calls` improved 4786% across 100 iterations, and `test_get_system_prompt_repeated_calls_consistent` saw one call improve 11790% (63.9μs → 401ns) as the cache eliminates file system access entirely. The `.exists()` check was also removed since `read_text()` naturally raises `FileNotFoundError`, which is caught and re-raised as `ValueError` to preserve the original error message format.
Fix package declaration to match code under test (prevents build errors), warn
against string(int) Unicode trap, and pass package_name to system prompt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from managed to vendored mode so tiles are committed to git.
Install 55 tiles (Python + JS/TS), add MCP configs, and set up
weekly tile update workflow via reusable github-workflows caller.
## Summary
- Replaces the inline `aiservice-test` job (30 lines of boilerplate)
with a 10-line shared workflow call
- Uses the new `test-secret-env` input on `ci-python-uv.yml` to
dynamically export 7 secrets as masked env vars
- Pattern: caller passes `secrets: inherit` + a JSON map of `{ENV_VAR:
SECRET_NAME}`, shared workflow uses `toJSON(secrets)` + jq to export
them with `::add-mask::`
### Before (inline)
```yaml
aiservice-test:
  runs-on: ubuntu-latest
  env:
    SECRET_KEY: ${{ secrets.SECRET_KEY }}
    DATABASE_URL: ${{ secrets.DATABASE_URL }}
    # ... 5 more hardcoded secret refs
  steps:
    - uses: actions/checkout@v6
    - uses: astral-sh/setup-uv@v8.1.0
    - run: uv sync
    - run: uv run pytest
```
### After (shared workflow)
```yaml
aiservice-test:
  uses: codeflash-ai/github-workflows/.github/workflows/ci-python-uv.yml@main
  secrets: inherit
  with:
    working-directory: "django/aiservice"
    sync-command: "uv sync"
    test-command: "uv run pytest"
    test-secret-env: '{"SECRET_KEY": "SECRET_KEY", "DATABASE_URL": "DATABASE_URL", ...}'
```
First consumer of the `test-secret-env` feature — validates the pattern
for future jobs.
## Test plan
- [ ] CI passes — aiservice-test job runs via shared workflow and
secrets are correctly exported
- [ ] Gate job (required-checks-passed) still works with the new job
structure
- [ ] No regression in other jobs (they're unchanged)
## Summary
- Snyk PR #2305 bumped `diff` from 8.0.2 to 8.0.3 in
`js/VSC-Extension/package.json` without regenerating the lockfile
- This causes `npm ci` to fail because `package.json` and
`package-lock.json` are no longer in sync
- Ran `npm install` to regenerate `package-lock.json` (resolves `diff`
to 8.0.4, the latest matching `^8.0.3`)
## Test plan
- [x] Verified `npm ci` succeeds with the updated lockfile
- [x] Diff is minimal: only the `diff` package version change (4
insertions, 4 deletions)
Delete 7 separate workflow files now replaced by the unified ci.yaml:
aiservice-ci.yml, cf-api-tests.yaml, cf-webapp-quality-gates.yml,
end-to-end-tests.yaml, nextjs-build.yaml, prek.yaml,
vscode-extension-build.yml
Replace 7 separate CI workflow files with a unified ci.yaml that uses
shared workflows from codeflash-ai/github-workflows:
- determine-changes: reusable workflow for path-based change detection
- prek-lint: reusable workflow for pre-commit checks
- ci-python-uv: reusable workflow for Python typecheck
- required-checks-gate: composite action for gate job
All downstream jobs use fromJSON(needs.determine-changes.outputs.flags)
for conditional execution. A single required-checks-passed gate job
replaces per-workflow required checks.
Private repos need explicit permissions on reusable workflow calls
(contents:write for prek) since they don't inherit permissive defaults.
The Python client sends raw source code, not markdown-wrapped blocks.
split_markdown_code() returned {} for raw input, leaving SearchAndReplaceDiff
with nothing to patch, so repairs always returned an empty string.
Now falls back to {"file.py": raw_code} when markdown parsing yields nothing,
and is_valid() handles raw code blocks instead of only markdown-wrapped ones.
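A sketch of the fallback, with an assumed minimal split_markdown_code (the real parser is more involved):

```python
import re

# Assumed minimal markdown splitter for illustration only.
_FENCE_RE = re.compile(r"```\w*\n(.*?)```", re.DOTALL)


def split_markdown_code(text: str) -> dict[str, str]:
    return {f"file_{i}.py": m.group(1)
            for i, m in enumerate(_FENCE_RE.finditer(text))}


def code_blocks_for_repair(payload: str) -> dict[str, str]:
    blocks = split_markdown_code(payload)
    if not blocks:
        # Fallback: the Python client sends raw source, not markdown, so
        # treat the whole payload as one file for the diff-repair step.
        blocks = {"file.py": payload}
    return blocks
```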
System prompt now focuses on repair strategy (identify pattern, compare
code, minimal fix) instead of spending most tokens on SEARCH/REPLACE
format spec. User prompt explicitly frames the task and asks for root
cause analysis. build_test_details() reformatted for clarity: grouped
by test source with clear Expected/Got lines separated by --- dividers.
## Summary
- The Monaco diff editor on `/trace/[id]` pages was not loading because
`@monaco-editor/react` fetches JS, CSS, and font assets from
`cdn.jsdelivr.net` by default
- The Content Security Policy in `next.config.mjs` blocked those
requests (missing from `script-src`, `style-src`, `font-src`)
- Added `https://cdn.jsdelivr.net` to the three relevant CSP directives
## Test plan
- [ ] Open a trace page (e.g.
`/trace/c0668bd3-9321-4082-9c43-3e41bdd9b1c5`) and verify the code diff
renders
- [ ] Check browser console for no remaining CSP violations
- [ ] Verify no regressions on other pages
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
Reworded to highlight that hand-written unit tests encode the developer's
explicit behavioral expectations and that optimizations must produce
identical results for all test cases.
Accept baseline_runtime_ns, loop_count, line_profiler_results, and
test_input_examples on the optimize endpoint. Pass runtime context
and test examples to the user prompt so the LLM can generate
better-informed candidates. Alternate line profiler data across
parallel calls for diversity (odd calls get LP, even calls don't).
When Azure OpenAI or Anthropic returns null/empty content (content
filter, truncation, transient failure), call_openai/call_anthropic now
raise LLMOutputUnparseable instead of returning an empty string that
silently flows through the pipeline and produces 422 "Could not
generate any optimizations." All optimizer callers catch
LLMOutputUnparseable to preserve cost tracking while returning None.
node-linker=hoisted triggers an "Invalid Version" bug in pnpm 10 bin
linking. The standalone output with zip -y (symlink preservation) is
sufficient; Azure SquashFS supports symlinks natively.