33 KiB
Java Language Support Architecture for CodeFlash
Executive Summary
Adding Java support to CodeFlash requires implementing the LanguageSupport protocol with Java-specific components for parsing, test discovery, context extraction, and test execution. The existing architecture is well-designed for multi-language support, and Java can follow the established patterns from Python and JavaScript/TypeScript.
1. Architecture Overview
Current Language Support Stack
┌─────────────────────────────────────────────────────────────────┐
│ Core Optimization Pipeline │
│ (language-agnostic: optimizer.py, function_optimizer.py) │
└───────────────────────────────┬─────────────────────────────────┘
│
┌───────────▼───────────┐
│ LanguageSupport │
│ Protocol │
└───────────┬───────────┘
│
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PythonSupport │ │JavaScriptSupport│ │ JavaSupport │
│ (mature) │ │ (functional) │ │ (NEW) │
├───────────────┤ ├─────────────────┤ ├─────────────────┤
│ - libcst │ │ - tree-sitter │ │ - tree-sitter │
│ - pytest │ │ - jest │ │ - JUnit 5 │
│ - Jedi │ │ - npm/yarn │ │ - Maven/Gradle │
└───────────────┘ └─────────────────┘ └─────────────────┘
Proposed Java Module Structure
codeflash/languages/java/
├── __init__.py # Module exports, register language
├── support.py # JavaSupport class (main implementation)
├── parser.py # Tree-sitter Java parsing utilities
├── discovery.py # Function/method discovery
├── context_extractor.py # Code context extraction
├── import_resolver.py # Java import/dependency resolution
├── instrument.py # Test instrumentation
├── test_runner.py # JUnit test execution
├── comparator.py # Test result comparison
├── build_tools.py # Maven/Gradle integration
├── formatter.py # Code formatting (google-java-format)
└── line_profiler.py # JProfiler/async-profiler integration
2. Core Components
2.1 Language Registration
# codeflash/languages/java/support.py
from codeflash.languages.base import Language, LanguageSupport
from codeflash.languages.registry import register_language
@register_language
class JavaSupport:
@property
def language(self) -> Language:
return Language.JAVA # Add to Language enum
@property
def file_extensions(self) -> tuple[str, ...]:
return (".java",)
@property
def test_framework(self) -> str:
return "junit"
@property
def comment_prefix(self) -> str:
return "//"
2.2 Language Enum Extension
# codeflash/languages/base.py
class Language(Enum):
PYTHON = "python"
JAVASCRIPT = "javascript"
TYPESCRIPT = "typescript"
JAVA = "java" # NEW
3. Component Implementation Details
3.1 Parsing (tree-sitter-java)
File: codeflash/languages/java/parser.py
Tree-sitter has excellent Java support. Key node types to handle:
| Java Construct | Tree-sitter Node Type |
|---|---|
| Class | class_declaration |
| Interface | interface_declaration |
| Method | method_declaration |
| Constructor | constructor_declaration |
| Static block | static_initializer |
| Lambda | lambda_expression |
| Anonymous class | anonymous_class_body |
| Annotation | annotation |
| Generic type | type_parameters |
class JavaParser:
"""Tree-sitter based Java parser."""
def __init__(self):
self.parser = Parser()
self.parser.set_language(tree_sitter_java.language())
def find_methods(self, source: str) -> list[MethodNode]:
"""Find all method declarations."""
tree = self.parser.parse(source.encode())
return self._walk_for_methods(tree.root_node)
def find_classes(self, source: str) -> list[ClassNode]:
"""Find all class/interface declarations."""
...
def get_method_signature(self, node: Node) -> MethodSignature:
"""Extract method signature including generics."""
...
3.2 Function Discovery
File: codeflash/languages/java/discovery.py
Java-specific considerations:
- Methods are always inside classes (no top-level functions)
- Need to handle: instance methods, static methods, constructors
- Interface default methods
- Annotation processing (
@Override,@Test, etc.) - Inner classes and nested methods
def discover_functions(
file_path: Path,
criteria: FunctionFilterCriteria | None = None
) -> list[FunctionInfo]:
"""
Discover optimizable methods in a Java file.
Returns methods that are:
- Public or protected (can be tested)
- Not abstract
- Not native
- Not in test files
- Not trivial (getters/setters unless specifically requested)
"""
parser = JavaParser()
source = file_path.read_text(encoding="utf-8")
methods = []
for class_node in parser.find_classes(source):
for method in class_node.methods:
if _should_include_method(method, criteria):
methods.append(FunctionInfo(
name=method.name,
file_path=file_path,
start_line=method.start_line,
end_line=method.end_line,
parents=(ParentInfo(
name=class_node.name,
type="ClassDeclaration"
),),
is_async=method.has_annotation("Async"),
is_method=True,
language=Language.JAVA,
))
return methods
3.3 Code Context Extraction
File: codeflash/languages/java/context_extractor.py
Java context extraction must handle:
- Full class context (methods often depend on fields)
- Import statements (crucial for compilation)
- Package declarations
- Type hierarchy (extends/implements)
- Inner classes
- Static imports
def extract_code_context(
function: FunctionInfo,
project_root: Path,
module_root: Path | None = None
) -> CodeContext:
"""
Extract code context for a Java method.
Context includes:
1. Full containing class (target method needs class context)
2. All imports from the file
3. Helper classes from same package
4. Superclass/interface definitions (read-only)
"""
source = function.file_path.read_text(encoding="utf-8")
parser = JavaParser()
# Extract package and imports
package_name = parser.get_package(source)
imports = parser.get_imports(source)
# Get the containing class
class_source = parser.extract_class_containing_method(
source, function.name, function.start_line
)
# Find helper classes (same package, used by target class)
helper_classes = find_helper_classes(
function.file_path.parent,
class_source,
imports
)
return CodeContext(
target_code=class_source,
target_file=function.file_path,
helper_functions=helper_classes,
read_only_context=get_superclass_context(imports, project_root),
imports=imports,
language=Language.JAVA,
)
3.4 Import/Dependency Resolution
File: codeflash/languages/java/import_resolver.py
Java import resolution is more complex:
- Explicit imports (
import com.foo.Bar;) - Wildcard imports (
import com.foo.*;) - Static imports (
import static com.foo.Bar.method;) - Same-package classes (implicit)
- Standard library vs external dependencies
class JavaImportResolver:
"""Resolve Java imports to source files."""
def __init__(self, project_root: Path, build_tool: BuildTool):
self.project_root = project_root
self.build_tool = build_tool
self.source_roots = self._find_source_roots()
self.classpath = build_tool.get_classpath()
def resolve_import(self, import_stmt: str) -> ResolvedImport:
"""
Resolve an import to its source location.
Returns:
- Source file path (if in project)
- JAR location (if external dependency)
- None (if JDK class)
"""
...
def find_same_package_classes(self, package: str) -> list[Path]:
"""Find all classes in the same package."""
...
3.5 Test Discovery
File: codeflash/languages/java/support.py (part of JavaSupport)
Java test discovery for JUnit 5:
def discover_tests(
self,
test_root: Path,
source_functions: list[FunctionInfo]
) -> dict[str, list[TestInfo]]:
"""
Discover JUnit tests that cover target methods.
Strategy:
1. Find test files by naming convention (*Test.java, *Tests.java)
2. Parse test files for @Test annotated methods
3. Analyze test code for method calls to target methods
4. Match tests to source methods
"""
test_files = self._find_test_files(test_root)
test_map: dict[str, list[TestInfo]] = defaultdict(list)
for test_file in test_files:
parser = JavaParser()
source = test_file.read_text()
for test_method in parser.find_test_methods(source):
# Find which source methods this test calls
called_methods = parser.find_method_calls(test_method.body)
for source_func in source_functions:
if source_func.name in called_methods:
test_map[source_func.qualified_name].append(TestInfo(
test_name=test_method.name,
test_file=test_file,
test_class=test_method.class_name,
))
return test_map
3.6 Test Execution
File: codeflash/languages/java/test_runner.py
JUnit test execution with Maven/Gradle:
class JavaTestRunner:
"""Run JUnit tests via Maven or Gradle."""
def __init__(self, project_root: Path):
self.build_tool = detect_build_tool(project_root)
self.project_root = project_root
def run_tests(
self,
test_classes: list[str],
timeout: int = 60,
capture_output: bool = True
) -> TestExecutionResult:
"""
Run specified JUnit tests.
Uses:
- Maven: mvn test -Dtest=ClassName#methodName
- Gradle: ./gradlew test --tests "ClassName.methodName"
"""
if self.build_tool == BuildTool.MAVEN:
return self._run_maven_tests(test_classes, timeout)
else:
return self._run_gradle_tests(test_classes, timeout)
def _run_maven_tests(self, tests: list[str], timeout: int) -> TestExecutionResult:
cmd = [
"mvn", "test",
f"-Dtest={','.join(tests)}",
"-Dmaven.test.failure.ignore=true",
"-DfailIfNoTests=false",
]
result = subprocess.run(cmd, cwd=self.project_root, ...)
return self._parse_surefire_reports()
def _parse_surefire_reports(self) -> TestExecutionResult:
"""Parse target/surefire-reports/*.xml for test results."""
...
3.7 Code Instrumentation
File: codeflash/languages/java/instrument.py
Java instrumentation for behavior capture:
class JavaInstrumenter:
"""Instrument Java code for behavior/performance capture."""
def instrument_for_behavior(
self,
source: str,
target_methods: list[str]
) -> str:
"""
Add instrumentation to capture method inputs/outputs.
Adds:
- CodeFlash.captureInput(args) before method body
- CodeFlash.captureOutput(result) before returns
- Exception capture in catch blocks
"""
parser = JavaParser()
tree = parser.parse(source)
# Insert capture calls using tree-sitter edit operations
edits = []
for method in parser.find_methods_by_name(tree, target_methods):
edits.append(self._create_input_capture(method))
edits.append(self._create_output_capture(method))
return apply_edits(source, edits)
def instrument_for_benchmarking(
self,
test_source: str,
target_method: str,
iterations: int = 1000
) -> str:
"""
Add timing instrumentation to test code.
Wraps test execution in timing loop with warmup.
"""
...
3.8 Build Tool Integration
File: codeflash/languages/java/build_tools.py
Maven and Gradle support:
class BuildTool(Enum):
MAVEN = "maven"
GRADLE = "gradle"
def detect_build_tool(project_root: Path) -> BuildTool:
"""Detect whether project uses Maven or Gradle."""
if (project_root / "pom.xml").exists():
return BuildTool.MAVEN
elif (project_root / "build.gradle").exists() or \
(project_root / "build.gradle.kts").exists():
return BuildTool.GRADLE
raise ValueError("No Maven or Gradle build file found")
class MavenIntegration:
"""Maven build tool integration."""
def __init__(self, project_root: Path):
self.pom_path = project_root / "pom.xml"
self.project_root = project_root
def get_source_roots(self) -> list[Path]:
"""Get configured source directories."""
# Default: src/main/java, src/test/java
...
def get_classpath(self) -> list[Path]:
"""Get full classpath including dependencies."""
result = subprocess.run(
["mvn", "dependency:build-classpath", "-q", "-DincludeScope=test"],
cwd=self.project_root,
capture_output=True
)
return [Path(p) for p in result.stdout.decode().split(":")]
def compile(self, include_tests: bool = True) -> bool:
"""Compile the project."""
cmd = ["mvn", "compile"]
if include_tests:
cmd.append("test-compile")
return subprocess.run(cmd, cwd=self.project_root).returncode == 0
class GradleIntegration:
"""Gradle build tool integration."""
# Similar implementation for Gradle
...
3.9 Code Replacement
File: codeflash/languages/java/support.py
def replace_function(
self,
source: str,
function: FunctionInfo,
new_source: str
) -> str:
"""
Replace a method in Java source code.
Challenges:
- Method might have annotations
- Javadoc comments should be preserved/updated
- Overloaded methods need exact signature matching
"""
parser = JavaParser()
# Find the exact method by line number (handles overloads)
method_node = parser.find_method_at_line(source, function.start_line)
# Include Javadoc if present
start = method_node.javadoc_start or method_node.start
end = method_node.end
# Replace the method
return source[:start] + new_source + source[end:]
3.10 Code Formatting
File: codeflash/languages/java/formatter.py
def format_code(source: str, file_path: Path | None = None) -> str:
"""
Format Java code using google-java-format.
Falls back to built-in formatter if google-java-format not available.
"""
try:
result = subprocess.run(
["google-java-format", "-"],
input=source.encode(),
capture_output=True,
timeout=30
)
if result.returncode == 0:
return result.stdout.decode()
except FileNotFoundError:
pass
# Fallback: basic indentation normalization
return normalize_indentation(source)
4. Test Result Comparison
4.1 Behavior Verification
For Java, test results comparison needs to handle:
- Object equality (
.equals()vs reference equality) - Collection ordering (Lists vs Sets)
- Floating point comparison with epsilon
- Exception messages and types
- Side effects (mocked interactions)
# codeflash/languages/java/comparator.py
def compare_test_results(
original_results: Path,
candidate_results: Path,
project_root: Path
) -> tuple[bool, list[TestDiff]]:
"""
Compare behavior between original and optimized code.
Uses a Java comparison utility (run via the build tool)
that handles Java-specific equality semantics.
"""
# Run Java-based comparison tool
result = subprocess.run([
"java", "-cp", get_comparison_jar(),
"com.codeflash.Comparator",
str(original_results),
str(candidate_results)
], capture_output=True)
diffs = json.loads(result.stdout)
return len(diffs) == 0, [TestDiff(**d) for d in diffs]
5. AI Service Integration
The AI service already supports language parameter. For Java:
# Called from function_optimizer.py
response = ai_service.optimize_code(
source_code=code_context.target_code,
dependency_code=code_context.read_only_context,
trace_id=trace_id,
language="java",
language_version="17", # or "11", "21"
n_candidates=5,
)
Java-specific optimization prompts should consider:
- Stream API optimizations
- Collection choice (ArrayList vs LinkedList, HashMap vs TreeMap)
- Concurrency patterns (CompletableFuture, parallel streams)
- Memory optimization (primitive vs boxed types)
- JIT-friendly patterns
6. Configuration Detection
File: codeflash/languages/java/config.py
def detect_java_version(project_root: Path) -> str:
"""Detect Java version from build configuration."""
build_tool = detect_build_tool(project_root)
if build_tool == BuildTool.MAVEN:
# Check pom.xml for maven.compiler.source
pom = ET.parse(project_root / "pom.xml")
version = pom.find(".//maven.compiler.source")
if version is not None:
return version.text
elif build_tool == BuildTool.GRADLE:
# Check build.gradle for sourceCompatibility
build_file = project_root / "build.gradle"
if build_file.exists():
content = build_file.read_text()
match = re.search(r"sourceCompatibility\s*=\s*['\"]?(\d+)", content)
if match:
return match.group(1)
# Fallback: detect from JAVA_HOME
return detect_jdk_version()
def detect_source_roots(project_root: Path) -> list[Path]:
"""Find source code directories."""
standard_paths = [
project_root / "src" / "main" / "java",
project_root / "src",
]
return [p for p in standard_paths if p.exists()]
def detect_test_roots(project_root: Path) -> list[Path]:
"""Find test code directories."""
standard_paths = [
project_root / "src" / "test" / "java",
project_root / "test",
]
return [p for p in standard_paths if p.exists()]
7. Runtime Library
CodeFlash needs a Java runtime library for instrumentation:
codeflash-runtime-java/
├── pom.xml
├── src/main/java/com/codeflash/
│ ├── CodeFlash.java # Main capture API
│ ├── Capture.java # Input/output capture
│ ├── Comparator.java # Result comparison
│ ├── Timer.java # High-precision timing
│ └── Serializer.java # Object serialization for comparison
// CodeFlash.java
package com.codeflash;
public class CodeFlash {
public static void captureInput(String methodId, Object... args) {
// Serialize and store inputs
}
public static <T> T captureOutput(String methodId, T result) {
// Serialize and store output
return result;
}
public static void captureException(String methodId, Throwable e) {
// Store exception info
}
public static long startTimer() {
return System.nanoTime();
}
public static void recordTime(String methodId, long startTime) {
long elapsed = System.nanoTime() - startTime;
// Store timing
}
}
8. Implementation Phases
Phase 1: Foundation (MVP)
- Add
Language.JAVAto enum - Implement tree-sitter Java parsing
- Basic method discovery (public methods in classes)
- Build tool detection (Maven/Gradle)
- Simple context extraction (single file)
- Test discovery (JUnit 5
@Testmethods) - Test execution via Maven/Gradle
Phase 2: Full Pipeline
- Import resolution and dependency tracking
- Multi-file context extraction
- Test result capture and comparison
- Code instrumentation for behavior verification
- Benchmarking instrumentation
- Code formatting integr.ation
Phase 3: Advanced Features
- Line profiler integration (JProfiler/async-profiler)
- Generics handling in optimization
- Lambda and stream optimization support
- Concurrency-aware benchmarking
- IDE integration (Language Server)
9. Key Challenges & Considerations
9.1 Java-Specific Challenges
| Challenge | Solution |
|---|---|
| No top-level functions | Always include class context |
| Overloaded methods | Use full signature for identification |
| Compilation required | Compile before running tests |
| Build tool complexity | Abstract via BuildTool interface |
| Static typing | Ensure type compatibility in replacements |
| Generics | Preserve type parameters in optimization |
| Checked exceptions | Maintain throws declarations |
| Package visibility | Handle package-private methods |
9.2 Performance Considerations
- JVM Warmup: Java needs JIT warmup before benchmarking
- GC Noise: Account for garbage collection in timing
- Classloading: First run is always slower
def run_benchmark_with_warmup(
test_method: str,
warmup_iterations: int = 100,
benchmark_iterations: int = 1000
) -> BenchmarkResult:
"""Run benchmark with proper JVM warmup."""
# Warmup phase (results discarded)
run_tests(test_method, iterations=warmup_iterations)
# Force GC before measurement
subprocess.run(["jcmd", str(pid), "GC.run"])
# Actual benchmark
return run_tests(test_method, iterations=benchmark_iterations)
9.3 Test Framework Support
| Framework | Priority | Notes |
|---|---|---|
| JUnit 5 | High | Primary target, most modern |
| JUnit 4 | Medium | Still widely used |
| TestNG | Low | Different annotation model |
| Mockito | High | Mocking support needed |
| AssertJ | Medium | Fluent assertions |
10. File Changes Summary
New Files to Create
codeflash/languages/java/
├── __init__.py
├── support.py (~800 lines)
├── parser.py (~400 lines)
├── discovery.py (~300 lines)
├── context_extractor.py (~400 lines)
├── import_resolver.py (~350 lines)
├── instrument.py (~500 lines)
├── test_runner.py (~400 lines)
├── comparator.py (~200 lines)
├── build_tools.py (~350 lines)
├── formatter.py (~100 lines)
├── line_profiler.py (~300 lines)
└── config.py (~150 lines)
Total: ~4,250 lines
Existing Files to Modify
| File | Changes |
|---|---|
codeflash/languages/base.py |
Add JAVA to Language enum |
codeflash/languages/__init__.py |
Import java module |
codeflash/cli_cmds/init.py |
Add Java project detection |
codeflash/api/aiservice.py |
No changes (already supports language param) |
requirements.txt / pyproject.toml |
Add tree-sitter-java |
External Dependencies
# pyproject.toml additions
tree-sitter-java = "^0.21.0"
11. Testing Strategy
Unit Tests
# tests/languages/java/test_parser.py
def test_discover_methods_in_class():
source = '''
public class Calculator {
public int add(int a, int b) {
return a + b;
}
}
'''
methods = JavaParser().find_methods(source)
assert len(methods) == 1
assert methods[0].name == "add"
# tests/languages/java/test_discovery.py
def test_discover_functions_filters_tests():
# Test that test methods are excluded
...
Integration Tests
# tests/languages/java/test_integration.py
def test_full_optimization_pipeline(java_test_project):
"""End-to-end test with a real Java project."""
support = JavaSupport()
functions = support.discover_functions(
java_test_project / "src/main/java/Example.java"
)
context = support.extract_code_context(functions[0], java_test_project)
# Verify context is compilable
assert compile_java(context.target_code)
12. LanguageSupport Protocol Reference
All methods that JavaSupport must implement:
Properties
@property
def language(self) -> Language: ...
@property
def file_extensions(self) -> tuple[str, ...]: ...
@property
def test_framework(self) -> str: ...
@property
def comment_prefix(self) -> str: ...
Discovery Methods
def discover_functions(
self,
file_path: Path,
criteria: FunctionFilterCriteria | None = None
) -> list[FunctionInfo]: ...
def discover_tests(
self,
test_root: Path,
source_functions: list[FunctionInfo]
) -> dict[str, list[TestInfo]]: ...
Code Analysis
def extract_code_context(
self,
function: FunctionInfo,
project_root: Path,
module_root: Path | None = None
) -> CodeContext: ...
def find_helper_functions(
self,
function: FunctionInfo,
project_root: Path
) -> list[HelperFunction]: ...
Code Transformation
def replace_function(
self,
source: str,
function: FunctionInfo,
new_source: str
) -> str: ...
def format_code(
self,
source: str,
file_path: Path | None = None
) -> str: ...
def normalize_code(self, source: str) -> str: ...
Test Execution
def run_behavioral_tests(
self,
test_paths: list[Path],
test_env: dict[str, str],
cwd: Path,
timeout: int,
...
) -> tuple[Path, Any, Path | None, Path | None]: ...
def run_benchmarking_tests(
self,
test_paths: list[Path],
test_env: dict[str, str],
cwd: Path,
timeout: int,
...
) -> tuple[Path, Any]: ...
Instrumentation
def instrument_for_behavior(
self,
source: str,
functions: list[str]
) -> str: ...
def instrument_for_benchmarking(
self,
test_source: str,
target_function: str
) -> str: ...
def instrument_existing_test(
self,
test_path: Path,
call_positions: list[tuple[int, int]],
...
) -> tuple[bool, str | None]: ...
Validation
def validate_syntax(self, source: str) -> bool: ...
Result Comparison
def compare_test_results(
self,
original_path: Path,
candidate_path: Path,
project_root: Path
) -> tuple[bool, list[TestDiff]]: ...
13. Data Flow Diagram
┌──────────────────────────────────────────────────────────────────────────┐
│ Java Optimization Flow │
└──────────────────────────────────────────────────────────────────────────┘
User runs: codeflash optimize Example.java
│
▼
┌───────────────────────────────┐
│ Detect Build Tool │
│ (Maven pom.xml / Gradle) │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Discover Methods │
│ (tree-sitter-java parsing) │
│ Filter: public, non-test │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Extract Code Context │
│ - Full class with imports │
│ - Helper classes (same pkg) │
│ - Superclass definitions │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Discover Tests │
│ - Find *Test.java files │
│ - Parse @Test annotations │
│ - Match to source methods │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Run Baseline │
│ - Compile (mvn/gradle) │
│ - Execute JUnit tests │
│ - Capture behavior + timing │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ AI Optimization │
│ - Send to AI service │
│ - language="java" │
│ - Receive N candidates │
└───────────────┬───────────────┘
│
┌───────────┴───────────┐
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Candidate 1 │ ... │ Candidate N │
└───────┬───────┘ └───────┬───────┘
│ │
└───────────┬───────────┘
│
▼
┌───────────────────────────────┐
│ For Each Candidate: │
│ 1. Replace method in source │
│ 2. Compile project │
│ 3. Run behavior tests │
│ 4. Compare outputs │
│ 5. If correct: benchmark │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Select Best Candidate │
│ - Correctness verified │
│ - Best speedup │
│ - Account for JVM warmup │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Apply Optimization │
│ - Update source file │
│ - Create PR (optional) │
│ - Report results │
└───────────────────────────────┘
14. Conclusion
This architecture provides a comprehensive roadmap for adding Java support to CodeFlash. The modular design mirrors the existing JavaScript/TypeScript implementation pattern, making it straightforward to implement incrementally while maintaining consistency with the rest of the codebase.
Key success factors:
- Leverage tree-sitter for consistent parsing approach
- Abstract build tools to support both Maven and Gradle
- Handle JVM specifics (warmup, GC) in benchmarking
- Reuse existing infrastructure where possible (AI service, result types)
- Implement incrementally following the phased approach