codeflash-agent/plugin/ARCHITECTURE.md
Kevin Turcios 3b59d97647 squash
2026-04-13 14:12:17 -05:00

Plugin Architecture & Execution Order

Lifecycle

  1. SessionStart hook — initializes Codex session state
  2. User triggers /codeflash-optimize start (skill)
  3. Language router (codeflash) — detects project language, delegates to language-specific router
  4. Language-specific router (e.g., codeflash-python) — detects domain, asks user questions, launches setup
  5. Setup agent (e.g., codeflash-setup) — detects env, installs deps/profilers, writes .codeflash/setup.md
  6. Router validates setup, runs test suite, researches deps via context7
  7. Router creates team and dispatches optimizer agent
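Steps 3 and 4 amount to a two-level dispatch: pick a language, then hand off to that language's router. Below is a minimal sketch in Python. It assumes detection by marker files. The `MARKERS` table, the `detect_language`/`pick_router` helpers, and the `codeflash-javascript` router name are hypothetical. Only `codeflash-python` comes from this document.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical marker files per language; the real router may use other signals.
MARKERS = {
    "python": ("pyproject.toml", "setup.py", "setup.cfg", "requirements.txt"),
    "javascript": ("package.json",),
}

# codeflash-python is documented; codeflash-javascript is an assumed name.
ROUTERS = {"python": "codeflash-python", "javascript": "codeflash-javascript"}


def detect_language(project_root: Path) -> Optional[str]:
    """Return the first language (in MARKERS order) with a marker file in project_root."""
    for language, markers in MARKERS.items():
        if any((project_root / name).exists() for name in markers):
            return language
    return None


def pick_router(project_root: Path) -> str:
    """Map the detected language to the router agent that takes over."""
    language = detect_language(project_root)
    if language is None:
        raise ValueError(f"no supported language detected in {project_root}")
    return ROUTERS[language]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n")
    print(pick_router(root))  # codeflash-python
```

A project containing both pyproject.toml and package.json resolves to Python here only because of dictionary order. A real router would need an explicit tie-break rule or a question to the user.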

Optimization Loop

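  1. Optimizer (codeflash-deep, or a domain specialist: codeflash-cpu, codeflash-memory, codeflash-async, codeflash-structure): profiles its dimensions (all of them for codeflash-deep, one domain for a specialist) and ranks targets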
  2. Researcher (codeflash-researcher) — launched alongside to analyze targets in parallel, sends findings back to optimizer
  3. Experiment cycle: profile → reason → implement → test → benchmark → keep/discard → commit → re-profile → repeat
  4. Plateau detection (3+ consecutive discards) → optimizer sends [complete]
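The experiment cycle and plateau rule above can be sketched as a small loop. The sketch assumes each experiment is summarized by a test result and a speedup ratio. `Experiment`, `decide`, `run_loop`, and the 1.05 `min_speedup` threshold are hypothetical. Only the rule that three consecutive discards end the loop with [complete] comes from this document. Committing and re-profiling after a KEEP are left as a comment.

```python
from dataclasses import dataclass
from typing import Iterable, List

PLATEAU_DISCARDS = 3  # from the doc: 3+ consecutive discards -> [complete]


@dataclass
class Experiment:
    name: str
    tests_pass: bool
    speedup: float  # baseline_time / candidate_time; > 1.0 means faster


def decide(exp: Experiment, min_speedup: float = 1.05) -> str:
    """KEEP only if tests pass and the gain clears a (hypothetical) threshold."""
    if exp.tests_pass and exp.speedup >= min_speedup:
        return "KEEP"
    return "DISCARD"


def run_loop(experiments: Iterable[Experiment]) -> List[str]:
    """Apply keep/discard decisions until the plateau rule fires."""
    log: List[str] = []
    consecutive_discards = 0
    for exp in experiments:
        verdict = decide(exp)
        log.append(f"{exp.name}:{verdict}")
        if verdict == "KEEP":
            consecutive_discards = 0  # a real agent would commit and re-profile here
        else:
            consecutive_discards += 1
            if consecutive_discards >= PLATEAU_DISCARDS:
                log.append("[complete]")
                break
    return log


runs = [
    Experiment("e1", True, 1.30),   # faster and correct -> KEEP
    Experiment("e2", False, 2.00),  # fast but breaks tests -> DISCARD
    Experiment("e3", True, 1.01),   # within noise -> DISCARD
    Experiment("e4", True, 0.90),   # slower -> DISCARD, third in a row
    Experiment("e5", True, 1.50),   # never reached
]
print(run_loop(runs))
# ['e1:KEEP', 'e2:DISCARD', 'e3:DISCARD', 'e4:DISCARD', '[complete]']
```

A fast candidate that fails tests is still a discard. This reflects the ordering of the cycle: test before benchmark, and only benchmark results from correct code count.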

Review Gate

  1. Review agent (codeflash-review) — 6-pass deep review (comprehension → correctness → safety → benchmark verification → quality → disclosure)
  2. Writes .codeflash/review-report.md with verdict (APPROVE/REQUEST CHANGES/BLOCK)

Cleanup

  1. Router shuts down teammates, deletes team
  2. Preserves learnings.md, results.tsv, changelog.md; deletes temp files
  3. SessionEnd hook — finalizes Codex session
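Cleanup step 2 can be pictured as a keep-list filter over .codeflash/. This is a sketch under a loud assumption: the hypothetical `cleanup_state_dir` helper treats every entry outside the keep set as a temp file. With the default keep set it would also delete HANDOFF.md, setup.md, and the other state files. The real router may preserve more, which callers can express by passing a larger `keep` set.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List

# Files the router is documented to preserve.
PRESERVED = frozenset({"learnings.md", "results.tsv", "changelog.md"})


def cleanup_state_dir(state_dir: Path, keep: Iterable[str] = PRESERVED) -> List[str]:
    """Delete every entry of state_dir not named in `keep`; return the deleted names.

    Assumption: anything outside `keep` is a temp file.
    """
    keep_set = set(keep)
    deleted: List[str] = []
    for entry in sorted(state_dir.iterdir()):
        if entry.name in keep_set:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # temp directories, e.g. scratch benchmark output
        else:
            entry.unlink()
        deleted.append(entry.name)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp)
    # scratch.log and tmp-bench/ are made-up temp entries for the demo.
    for name in ("learnings.md", "results.tsv", "changelog.md", "scratch.log"):
        (state / name).write_text("")
    (state / "tmp-bench").mkdir()
    print(cleanup_state_dir(state))  # ['scratch.log', 'tmp-bench']
```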

Hooks

Hooks are defined in plugin/hooks/hooks.json and fire at session boundaries:

| Hook | When | What |
| --- | --- | --- |
| SessionStart | A new Claude session begins | Initializes Codex session state, records metadata |
| SessionEnd | The session ends | Cleans up Codex jobs, saves final state |
| Stop | Claude finishes responding and is about to stop (900 s hook timeout) | Optionally runs the Codex adversarial review gate before allowing termination |
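To make the table concrete, here is one possible shape for plugin/hooks/hooks.json, embedded in Python so it can be parsed and checked. It assumes Claude Code's hook configuration format. In that format, each event name maps to a list of groups, and each group holds handlers with `type: "command"`, a `command`, and an optional `timeout` in seconds. The script paths under `${CLAUDE_PLUGIN_ROOT}/scripts/` are hypothetical placeholders. Only the three event names and the 900 s Stop timeout come from the table.

```python
import json

# Hypothetical hooks.json content; script paths are placeholders, not the plugin's files.
HOOKS_JSON = """
{
  "hooks": {
    "SessionStart": [
      {"hooks": [{"type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/scripts/session-start.sh"}]}
    ],
    "SessionEnd": [
      {"hooks": [{"type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/scripts/session-end.sh"}]}
    ],
    "Stop": [
      {"hooks": [{"type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/scripts/stop-review-gate.sh", "timeout": 900}]}
    ]
  }
}
"""


def summarize_hooks(raw: str) -> dict:
    """Map each hook event to the timeouts of its command handlers (None if unset)."""
    config = json.loads(raw)
    summary = {}
    for event, groups in config["hooks"].items():
        summary[event] = [
            handler.get("timeout")
            for group in groups
            for handler in group["hooks"]
            if handler.get("type") == "command"
        ]
    return summary


print(summarize_hooks(HOOKS_JSON))
# {'SessionStart': [None], 'SessionEnd': [None], 'Stop': [900]}
```

The long Stop timeout leaves room for a full Codex review to finish before the hook is killed.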

Agents

Language-agnostic (plugin/agents/)

| Agent | Role | Triggered by |
| --- | --- | --- |
| codeflash | Language router: detects the language, delegates to the language-specific router | /codeflash-optimize skill, user request |
| codeflash-researcher | Read-only research teammate | Domain agents, after baseline profiling |
| codeflash-review | Independent 6-pass deep review | /codex-review, post-optimization gate |

Python-specific (plugin/languages/python/agents/)

| Agent | Role | Triggered by |
| --- | --- | --- |
| codeflash-python | Python domain router and team lead; orchestrates Python sessions | Language router, after detecting Python |
| codeflash-setup | Environment detection and preparation | Python router, before the first optimization |
| codeflash-scan | Quick cross-domain diagnosis | /codeflash-optimize scan or router recon |
| codeflash-deep | Primary optimizer (all dimensions) | Python router (default unless a single domain is requested) |
| codeflash-cpu | CPU/runtime specialist | Python router or deep agent dispatch |
| codeflash-memory | Memory specialist | Python router or deep agent dispatch |
| codeflash-async | Async/concurrency specialist | Python router or deep agent dispatch |
| codeflash-structure | Import-time/module structure specialist | Python router or deep agent dispatch |
| codeflash-ci | CI-mode agent for GitHub webhooks | CI service |
| codeflash-pr-prep | PR preparation agent | Post-session |

Commands (plugin/commands/)

User-invocable anytime:

| Command | Purpose |
| --- | --- |
| /codex-review | Manual adversarial review via the Codex companion |
| /codex-setup | Check or install the Codex CLI, configure the review gate |
| /codex-status | Check active and recent Codex jobs |

Skills (plugin/languages/python/skills/)

| Skill | Purpose |
| --- | --- |
| codeflash-optimize | Entry point with subcommands start, resume, status, scan, review |
| memray-profiling | Advanced memory profiling utilities (used by codeflash-memory) |
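The codeflash-optimize entry point accepts exactly one of five subcommands. The hypothetical dispatcher below uses `argparse` `choices` to reject anything else. The one-line action summaries are assumptions. Those for start, scan, and review follow the agent tables. Those for resume and status are inferred from HANDOFF.md and results.tsv. None of this is the skill's actual implementation.

```python
import argparse
from typing import List

# Hypothetical action summaries, not the skill's real behavior.
SUBCOMMANDS = {
    "start": "launch the codeflash language router",
    "resume": "reload .codeflash/HANDOFF.md and continue the session",
    "status": "summarize .codeflash/results.tsv",
    "scan": "run codeflash-scan for a quick cross-domain diagnosis",
    "review": "run codeflash-review on the session's changes",
}


def parse_action(argv: List[str]) -> str:
    """Accept exactly one known subcommand; argparse exits with status 2 otherwise."""
    parser = argparse.ArgumentParser(prog="codeflash-optimize")
    parser.add_argument("action", choices=list(SUBCOMMANDS))
    return parser.parse_args(argv).action


def dispatch(argv: List[str]) -> str:
    action = parse_action(argv)
    return f"{action}: {SUBCOMMANDS[action]}"


print(dispatch(["scan"]))  # scan: run codeflash-scan for a quick cross-domain diagnosis
```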

References

Language-agnostic (plugin/references/shared/)

Methodology, templates, and frameworks that apply to any language:

| File | Purpose |
| --- | --- |
| agent-base-protocol.md | Shared operational rules (experiment discipline, commit rules, stuck recovery) |
| experiment-loop-base.md | Shared experiment loop framework (keep/discard tree, guard, plateau) |
| pre-submit-review.md | Shared pre-submit checklist (resource ownership, concurrency, correctness) |
| e2e-benchmarks.md | Two-phase measurement concept (micro-benchmark → E2E) |
| micro-benchmark.md | A/B pre-screen pattern |
| pr-body-templates.md | Generic PR body structure and writing guidelines |
| pr-preparation.md | PR workflow (inventory, folding, conventions) |
| adversarial-review.md | Codex adversarial review methodology |
| changelog-template.md | Changelog generation structure |
| handoff-template.md | HANDOFF.md template |
| learnings-template.md | Cross-session learnings template |
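The A/B pre-screen and the two-phase idea can be illustrated with the standard-library `timeit` module. The approach is to time the baseline and the candidate on the same input, and to flag the candidate for an end-to-end run only if the micro-benchmark gain clears a threshold. The `ab_prescreen` helper, its 1.10 threshold, and the list-versus-set workload are hypothetical. The actual domain thresholds live in the Python micro-benchmark.md.

```python
import timeit
from typing import Any, Callable, Dict


def ab_prescreen(baseline: Callable[[], Any], candidate: Callable[[], Any],
                 number: int = 20, repeat: int = 3,
                 threshold: float = 1.10) -> Dict[str, Any]:
    """Micro-benchmark baseline vs candidate; flag the candidate for an E2E run
    only if it returns the same result and is at least `threshold` times faster."""
    # Guard: a candidate that changes the output is rejected before timing.
    if baseline() != candidate():
        return {"speedup": 0.0, "promote_to_e2e": False, "reason": "outputs differ"}
    # The minimum over repeats is the estimate least affected by system noise.
    base = min(timeit.repeat(baseline, number=number, repeat=repeat))
    cand = min(timeit.repeat(candidate, number=number, repeat=repeat))
    speedup = base / cand
    return {"speedup": speedup, "promote_to_e2e": speedup >= threshold, "reason": "timed"}


DATA = list(range(2_000))
NEEDLES = list(range(0, 2_000, 7))


def list_lookup():
    # Baseline: each membership test scans the list.
    return [x for x in NEEDLES if x in DATA]


def set_lookup():
    # Candidate: build a set once per call, then do hash lookups.
    index = set(DATA)
    return [x for x in NEEDLES if x in index]


print(ab_prescreen(list_lookup, set_lookup))
```

The candidate rebuilds its set inside the timed call so the comparison stays fair. Only a candidate flagged here would proceed to the slower end-to-end measurement.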

Python-specific (plugin/languages/python/references/)

Python implementations of shared protocols, plus domain-specific deep-dive docs:

| File/Dir | Purpose |
| --- | --- |
| agent-base-protocol.md | Python profilers (cProfile, tracemalloc, memray), test runners, package managers |
| e2e-benchmarks.md | codeflash compare usage, pytest-benchmark, fallback tools |
| micro-benchmark.md | Python A/B template (timeit, memray, asyncio), domain thresholds |
| pre-submit-review.md | Python checks (asyncio, .pyc, os.environ, monkey-patching) |
| pr-body-templates.md | Python PR variants (codeflash compare output, memray memory table) |
| unified-profiling-script.py | CPU + memory + GC profiling script for the deep agent |
| library-replacement.md | Library boundary breaking guide |
| async/ | Async domain: asyncio patterns, blocking detection, concurrency |
| data-structures/ | CPU domain: containers, algorithms, bytecode, stdlib |
| memory/ | Memory domain: tracemalloc, memray, leak detection, framework leaks |
| structure/ | Structure domain: import time, module decomposition, circular deps |
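unified-profiling-script.py collects CPU, memory, and GC data in one run. The sketch below shows the general shape using only standard-library tools: `cProfile`, `pstats`, `tracemalloc`, and `gc`. It is not the actual script. The `profile_all` helper, its report keys, and the `workload` function are hypothetical.

```python
import cProfile
import gc
import io
import pstats
import tracemalloc
from typing import Any, Callable, Dict


def profile_all(func: Callable[[], Any], top: int = 5) -> Dict[str, Any]:
    """Run func once under cProfile and tracemalloc, and count GC collections."""
    gc_before = sum(stat["collections"] for stat in gc.get_stats())
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    gc_after = sum(stat["collections"] for stat in gc.get_stats())

    # Render the hottest functions by cumulative time as text.
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return {
        "result": result,
        "cpu_report": stream.getvalue(),
        "peak_bytes": peak,
        "retained_bytes": current,
        "gc_collections": gc_after - gc_before,
    }


def workload():
    # Allocate many short-lived lists so the memory numbers are non-trivial.
    return sum(len([i] * 10) for i in range(50_000))


report = profile_all(workload)
print(report["result"], report["peak_bytes"] > 0, report["gc_collections"] >= 0)
# 500000 True True
```

Measuring all three dimensions in one pass lets a deep agent rank targets across domains from a single run. The trade-off is that tracemalloc adds overhead, so absolute CPU timings from this run are inflated.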

State Files

Created during execution in .codeflash/:

| File | Created by | Purpose |
| --- | --- | --- |
| setup.md | codeflash-setup | Environment summary |
| scan-report.md | codeflash-scan | Ranked targets + domain recommendations |
| results.tsv | optimizer agents | Experiment log (baseline, speedup, keep/discard) |
| HANDOFF.md | optimizer agents | Session state for resume |
| conventions.md | router | Binding constraints from maintainer feedback |
| learnings.md | router | Cross-session discoveries |
| review-report.md | codeflash-review | 6-pass review findings + verdict |
| changelog.md | router | PR-ready optimization summary |
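results.tsv is a tab-separated experiment log. The sketch below appends and reads rows with the standard `csv` module. The column names (`experiment`, `baseline_s`, `candidate_s`, `speedup`, `decision`) are assumptions derived from the "baseline, speedup, keep/discard" description, not the file's real schema. The two example rows are made up.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical columns; the real results.tsv header may differ.
FIELDS = ["experiment", "baseline_s", "candidate_s", "speedup", "decision"]


def append_result(path: Path, experiment: str, baseline_s: float,
                  candidate_s: float, decision: str) -> None:
    """Append one experiment row, writing the header first if the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow({
            "experiment": experiment,
            "baseline_s": f"{baseline_s:.4f}",
            "candidate_s": f"{candidate_s:.4f}",
            "speedup": f"{baseline_s / candidate_s:.2f}",
            "decision": decision,
        })


def kept_experiments(path: Path) -> List[Dict[str, str]]:
    """Read the log back and return only the rows marked KEEP."""
    with path.open(newline="") as fh:
        return [row for row in csv.DictReader(fh, delimiter="\t")
                if row["decision"] == "KEEP"]


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "results.tsv"
    append_result(log, "cache-parse", 2.0, 1.25, "KEEP")
    append_result(log, "inline-loop", 1.25, 1.24, "DISCARD")
    print([row["speedup"] for row in kept_experiments(log)])  # ['1.60']
```

Append-only writes mean discarded experiments stay in the log. This is what lets a resumed session avoid retrying ideas that already failed.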

Ordering Guarantees

Sequential:

  1. SessionStart hook fires before any agent acts
  2. Language detection before domain routing
  3. Setup agent completes before domain agents start
  4. Baseline profiling before any optimization experiment
  5. Re-profiling after every KEEP to update rankings
  6. Review gate runs after optimizer [complete], before cleanup
  7. SessionEnd hook fires as the session terminates

Parallel allowed:

  • Researcher analyzes targets #2-5 while optimizer works on target #1
  • Multiple domain agents can run in separate worktrees
  • Deep agent can dispatch domain agents while continuing its own profiling
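The first parallel pattern has a researcher reading ahead on targets #2-5 while the optimizer works on #1. It can be pictured with `concurrent.futures`. `research`, `optimize`, and `run_session` are hypothetical stand-ins for agents. The sketch only illustrates the ordering: research for later targets runs concurrently and is awaited just before the optimizer needs it.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def research(target: str) -> str:
    """Hypothetical read-only analysis of one target."""
    return f"notes on {target}"


def optimize(target: str, notes: str = "") -> str:
    """Hypothetical optimizer step; uses research notes when available."""
    return f"optimized {target}" + (f" using {notes}" if notes else "")


def run_session(targets: List[str]) -> List[str]:
    first, rest = targets[0], targets[1:5]  # #1 for the optimizer, #2-5 for research
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The researcher starts on later targets while #1 is being optimized.
        pending: Dict[str, Future] = {t: pool.submit(research, t) for t in rest}
        results = [optimize(first)]
        for target in rest:
            # .result() blocks only if research for this target is still running.
            results.append(optimize(target, pending[target].result()))
    return results


print(run_session(["parse", "tokenize", "render"]))
# ['optimized parse', 'optimized tokenize using notes on tokenize', 'optimized render using notes on render']
```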

Assembly

make build-plugin merges plugin/ (the base, excluding languages/) with plugin/languages/python/ (the overlay) into dist/. Set LANG=javascript to build for JavaScript instead. Because paths differ between the source tree and the assembled output, agent files reference other files through ${CLAUDE_PLUGIN_ROOT}.
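The base-plus-overlay merge can be approximated with `shutil.copytree`. The approach is to copy the base tree without its top-level languages/ directory, then copy the chosen language directory on top so that same-path files from the overlay win. This sketches the merge semantics under that assumption, not the Makefile's actual recipe. The `assemble` helper and the demo file contents are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Set


def assemble(plugin_dir: Path, dist_dir: Path, lang: str = "python") -> None:
    """Copy plugin/ (minus top-level languages/) into dist/, then overlay plugin/languages/<lang>/."""

    def skip_languages(directory: str, names: List[str]) -> Set[str]:
        # Only the top-level languages/ directory is excluded from the base layer.
        return {"languages"} if Path(directory) == plugin_dir else set()

    if dist_dir.exists():
        shutil.rmtree(dist_dir)  # start from a clean output tree
    shutil.copytree(plugin_dir, dist_dir, ignore=skip_languages)
    # Overlay layer: merged into the existing tree; same-path files overwrite the base copy.
    shutil.copytree(plugin_dir / "languages" / lang, dist_dir, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    plugin, dist = Path(tmp) / "plugin", Path(tmp) / "dist"
    for rel, text in {
        "agents/codeflash.md": "base router",
        "languages/python/agents/codeflash-python.md": "python router",
        "languages/python/references/micro-benchmark.md": "python A/B",
    }.items():
        (plugin / rel).parent.mkdir(parents=True, exist_ok=True)
        (plugin / rel).write_text(text)
    assemble(plugin, dist)
    print(sorted(p.relative_to(dist).as_posix() for p in dist.rglob("*") if p.is_file()))
    # ['agents/codeflash-python.md', 'agents/codeflash.md', 'references/micro-benchmark.md']
```

The flattening shown here is why a path such as languages/python/agents/ in the source becomes agents/ in dist/. It is also why agent files cannot hard-code source-tree paths.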