The JavaScript test generation prompt contained `{fn}` as part of
example code showing import syntax. However, Python's `.format()`
method interprets this as a placeholder and tries to substitute it,
causing a KeyError.
Fixed by escaping the curly braces as `{{fn}}` so they render as
literal `{fn}` in the final prompt.
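
A minimal reproduction of the failure and the fix. The template strings and the `render` helper below are hypothetical stand-ins, not the real prompt:

```python
# Hypothetical, shortened stand-ins for the real prompt template.
BROKEN_TEMPLATE = "Example: import {fn} from '{module_path}'"
FIXED_TEMPLATE = "Example: import {{fn}} from '{module_path}'"


def render(template: str, module_path: str) -> str:
    """Fill in the only placeholder the caller actually provides."""
    return template.format(module_path=module_path)


# `{fn}` is parsed as a replacement field, but no `fn` argument is passed.
try:
    render(BROKEN_TEMPLATE, "../utils")
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Doubled braces are emitted as literal single braces.
rendered = render(FIXED_TEMPLATE, "../utils")
```

With the doubled braces, `rendered` contains the literal text `{fn}` while `{module_path}` is still substituted.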
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
LLMs often add .js extensions to TypeScript import paths (e.g.,
`import { func } from '../module.js'`). TypeScript/Jest module
resolution doesn't require explicit extensions, and the added
extension causes "Cannot find module" errors.
This change adds a `strip_js_extensions()` function that removes
.js/.ts/.tsx/.jsx/.mjs/.mts extensions from relative import paths
in generated tests. The function handles:
- ES module imports: `import { x } from '../path.js'`
- CommonJS requires: `require('../path.js')`
- Jest mocks: `jest.mock('../path.js')`, `jest.doMock()`, etc.
External package imports (lodash, react, etc.) are preserved.
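
One way such a pass could look, as a regex-based sketch. The name `strip_js_extensions_sketch` and the pattern are assumptions for illustration, not the actual `strip_js_extensions()` code:

```python
import re

# Assumed pattern: a relative path ('./' or '../') inside `from '...'`,
# `require('...')`, or any `jest.<fn>('...')` call, ending in a known extension.
_IMPORT_PATH_RE = re.compile(
    r"(?P<prefix>\bfrom\s+|\brequire\s*\(\s*|\bjest\.\w+\s*\(\s*)"
    r"(?P<quote>['\"])(?P<path>\.{1,2}/[^'\"]*?)"
    r"\.(?:js|ts|tsx|jsx|mjs|mts)(?P=quote)"
)


def strip_js_extensions_sketch(code: str) -> str:
    """Drop the file extension from relative import paths; leave packages alone."""
    return _IMPORT_PATH_RE.sub(r"\g<prefix>\g<quote>\g<path>\g<quote>", code)


sample = (
    "import { x } from '../path.js';\n"
    'const y = require("./y.ts");\n'
    "jest.mock('../m.tsx');\n"
    "import _ from 'lodash';"
)
cleaned = strip_js_extensions_sketch(sample)
```

Because the pattern requires the path to start with `./` or `../`, bare package specifiers such as `'lodash'` never match and are left untouched.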
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add more explicit instructions to prevent LLMs from adding .js/.ts
extensions to import paths. The previous guidance was being ignored
by some models.
- Add dedicated "CRITICAL: IMPORT PATH RULES" section with examples
- Show both WRONG and CORRECT patterns explicitly
- Remind to copy the provided import statement exactly
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Python's str.isidentifier() validates Python identifiers, not JavaScript
identifiers. This caused valid JS identifiers like '$handler' to be
rejected (since $ is not valid in Python identifiers).
Changed to use a regex pattern that matches JavaScript identifier rules:
- Can start with letter, underscore, or $
- Can contain letters, digits, underscores, or $
Added tests for $ identifiers to ensure they are correctly handled.
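
The two rules above can be sketched as a single pattern. This is an ASCII-only simplification (JavaScript also allows Unicode letters), and `looks_like_js_identifier` is a hypothetical name:

```python
import re

# First character: letter, underscore, or $. Rest: letters, digits, underscore, or $.
_JS_IDENTIFIER_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def looks_like_js_identifier(name: str) -> bool:
    """Return True if `name` fits the simplified JavaScript identifier rules."""
    return _JS_IDENTIFIER_RE.fullmatch(name) is not None


# str.isidentifier() applies Python rules, so it rejects a leading `$`.
python_says = "$handler".isidentifier()
js_says = looks_like_js_identifier("$handler")
```

Here `python_says` is `False` and `js_says` is `True`, which is the mismatch this commit addresses.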
- Add _is_valid_js_identifier() to check for reserved words (module, exports, prototype, etc.)
- Only use class import pattern for single-dot names where the class name is a valid identifier
- Fall back to module import for:
  - Multiple dots (e.g., Constructor.prototype.method)
  - Reserved words (e.g., module.exports)
- Add comprehensive tests for edge cases
When generating test imports for class methods like `Validator.validateRequest`,
the previous code produced invalid JavaScript:
`const { Validator.validateRequest } = require('../middlewares/Validator');`
This is invalid because dots are not allowed in destructuring patterns.
The fix:
- Add _generate_import_statement() function to detect class methods (names with dots)
- For class methods: generate `const ClassName = require('...')`
- For simple functions: keep destructuring `const { funcName } = require('...')`
- Update prompt templates to use {import_statement} placeholder
Includes unit tests for the new import generation logic.
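
A rough sketch of the branching described above. The function name and signature are assumptions; the real `_generate_import_statement()` may differ:

```python
def generate_import_statement_sketch(function_name: str, module_path: str) -> str:
    """Build a CommonJS import line for a plain function or a class method."""
    if "." in function_name:
        # Class method such as `Validator.validateRequest`: import the class itself.
        class_name = function_name.split(".", 1)[0]
        return f"const {class_name} = require('{module_path}');"
    # Plain function: destructuring is valid because the name has no dots.
    return f"const {{ {function_name} }} = require('{module_path}');"


class_import = generate_import_statement_sketch(
    "Validator.validateRequest", "../middlewares/Validator"
)
func_import = generate_import_statement_sketch("funcName", "./helpers")
```

The resulting string is what would be substituted for the `{import_statement}` placeholder in the prompt template.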
Add better error diagnostics for TypeScript/JavaScript syntax validation:
- Add line numbers and code snippets to error messages
- Log warnings when markdown parsing finds no code blocks
- Show the actual problematic code in error logs
- Help debug "Invalid syntax" errors by showing exact location
This helps diagnose issues where the API rejects code that tree-sitter
parses correctly on the client side by providing more context in the
error messages and logs.
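
As an illustration of the kind of message this produces, here is a sketch that attaches a line number and a short code snippet to a syntax error. `format_syntax_error_context` and its parameters are hypothetical, and the error position is assumed to come from the parser:

```python
def format_syntax_error_context(code: str, line: int, column: int, context: int = 2) -> str:
    """Render a 1-based error location with a few surrounding source lines."""
    lines = code.splitlines()
    start = max(line - 1 - context, 0)
    end = min(line + context, len(lines))
    report = [f"Invalid syntax at line {line}, column {column}:"]
    for idx in range(start, end):
        marker = ">" if idx == line - 1 else " "  # point at the offending line
        report.append(f"{marker} {idx + 1:4d} | {lines[idx]}")
    return "\n".join(report)


snippet = "const a = 1;\nconst b = ;\nconst c = 3;"
report = format_syntax_error_context(snippet, line=2, column=11, context=1)
```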
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Add `instrumented_perf_test` field to `OptimizationFeatures` model
- Update `log_features` function to accept and store
performance-instrumented tests
---------
Co-authored-by: Sarthak Agarwal <sarthak.saga@gmail.com>
## ⚡️ This pull request contains optimizations for PR #2247
If you approve this dependent PR, these changes will be merged into the
original PR branch `multi-language`.
>This PR will be automatically closed if the original PR is merged.
----
#### 📄 18% (0.18x) speedup for ***`_has_test_functions` in
`django/aiservice/testgen/testgen_javascript.py`***
⏱️ Runtime : **`740 microseconds`** **→** **`627 microseconds`** (best
of `76` runs)
#### 📝 Explanation and details
The optimized code achieves an **18% runtime improvement** by
compiling the regex once at import time instead of passing the pattern
string to `re.search()` on every call.
**Key optimization:**
- **Precompiled regex pattern**: The pattern
`r"(?:test|it)\s*\(\s*['\"]"` is compiled once at module load time into
`_TEST_FUNC_RE` instead of being passed to `re.search()` as a string on
every call. Python's `re` module caches compiled patterns, so the
per-call cost is mostly the cache lookup and extra function-call
overhead rather than a full recompile; calling `.search()` on the
precompiled pattern skips that work.
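
A sketch of the precompiled form. The pattern and the `_TEST_FUNC_RE` name come from the description above; the function body is an assumption about how the check is written:

```python
import re

# Compiled once at import time and reused by every call.
_TEST_FUNC_RE = re.compile(r"(?:test|it)\s*\(\s*['\"]")


def has_test_functions_sketch(code: str) -> bool:
    """Return True if the code contains a test('...') or it('...') style call."""
    return _TEST_FUNC_RE.search(code) is not None


found = has_test_functions_sketch('test("my test name", () => {});')
```

The pattern is unanchored, so it also matches inside longer identifiers such as `arr.split('a')`, as the generated regression tests document.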
**Performance breakdown from line profiler:**
- Original: 2.70ms spent in `re.search(test_pattern, code)` (96.5% of
total time)
- Optimized: 862μs for the direct pattern search (100% of total time,
but 3.1x faster overall)
- The pattern string assignment overhead (97μs in original) is
eliminated entirely
**Why this matters for the workload:**
Based on `function_references`, this function is called from
`parse_and_validate_js_output()` during LLM response validation. This is
a **hot path** operation that executes on every test generation request.
The validation flow checks multiple conditions including syntax
validation before checking for test functions, meaning this function
runs repeatedly during normal operations.
**Test case performance:**
- **Small inputs** (single test functions): 50-80% faster (e.g., 2.80μs
→ 1.83μs)
- **Empty/minimal strings**: 130-140% faster (e.g., 1.80μs → 750ns)
- **Large inputs** (500-1000 lines): 1-8% faster depending on match
location
- **Early matches** benefit most since regex short-circuits on first
match
The optimization is most effective when processing typical-sized
JavaScript test code (dozens to hundreds of lines), which aligns with
the common use case of validating LLM-generated test functions.
✅ **Correctness verification report:**
| Test | Status |
| --------------------------- | ----------------- |
| ⚙️ Existing Unit Tests | 🔘 **None Found** |
| 🌀 Generated Regression Tests | ✅ **102 Passed** |
| ⏪ Replay Tests | 🔘 **None Found** |
| 🔎 Concolic Coverage Tests | 🔘 **None Found** |
| 📊 Tests Coverage | 100.0% |
<details>
<summary>🌀 Click to see Generated Regression Tests</summary>
```python
from __future__ import annotations

import re

# imports
import pytest  # used for our unit tests
from testgen.testgen_javascript import _has_test_functions


def test_basic_test_call_double_quotes():
    # Basic: a standard Jest test call using double quotes should be detected.
    code = 'test("my test name", () => { expect(true).toBe(true); });'
    codeflash_output = _has_test_functions(code)  # 2.83μs -> 1.78μs (59.3% faster)


def test_basic_it_call_single_quotes():
    # Basic: a standard it() call using single quotes should be detected.
    code = "it('does something', function() { /* ... */ });"
    codeflash_output = _has_test_functions(code)  # 2.80μs -> 1.83μs (53.0% faster)


def test_whitespace_and_newlines_between_name_and_paren():
    # Edge: whitespace/newlines between the function name and '(' and between '(' and the quote
    # The regex allows arbitrary whitespace, so this should still match.
    code = "it \n (\n 'handles newlines'\n )"
    codeflash_output = _has_test_functions(code)  # 2.90μs -> 1.89μs (53.2% faster)


def test_empty_string_returns_false():
    # Edge: empty input must return False (no tests found).
    code = ""
    codeflash_output = _has_test_functions(code)  # 1.80μs -> 750ns (139% faster)


def test_uppercase_function_name_not_matched():
    # Edge: the regex is case-sensitive; 'Test' should NOT match.
    code = "Test('capitalized should not match', () => {});"
    codeflash_output = _has_test_functions(code)  # 3.20μs -> 2.23μs (43.7% faster)


def test_backtick_template_not_matched():
    # Edge: template literals use backticks; pattern looks only for single/double quotes.
    code = "test(`template literal name`, () => {});"
    codeflash_output = _has_test_functions(code)  # 3.34μs -> 2.27μs (47.0% faster)


def test_numeric_first_arg_not_matched():
    # Edge: if the first argument is not a quoted string (e.g., a number), pattern should not match.
    code = "test(123, () => {});"
    codeflash_output = _has_test_functions(code)  # 3.01μs -> 1.86μs (61.7% faster)


def test_test_call_inside_comment_still_matches():
    # Important behavioral note: the function does not ignore comments.
    # A 'test(' occurrence inside a JS comment still matches because the function only does regex search.
    code = "// test('in a single-line comment')\n/* test(\"in block comment\") */"
    # Both comment forms contain test('...') / test("...") which the regex will find.
    codeflash_output = _has_test_functions(code)  # 2.96μs -> 1.84μs (61.1% faster)


def test_substring_in_identifier_matches():
    # The regex is permissive and will match occurrences where 'test' or 'it' appear as suffixes
    # of other identifiers (e.g., 'latesttest(' or 'split('). This test documents that behavior.
    code_latest = "function latesttest(){}\nlatesttest('x')"
    code_split = "const arr = ['a']; arr.split('a');"
    # Both contain the substring "test('..." or "it('...", so they should be considered matches by the implementation.
    codeflash_output = _has_test_functions(
        code_latest
    )  # 3.77μs -> 2.59μs (45.5% faster)
    codeflash_output = _has_test_functions(code_split)  # 1.16μs -> 793ns (46.7% faster)


def test_comment_between_paren_blocks_prevents_match():
    # If there is a non-whitespace token (like a block comment) between '(' and the starting quote,
    # the current regex will not match because it expects only whitespace between '(' and the quote.
    code = "test(/* important note */ 'name in comment')"
    codeflash_output = _has_test_functions(code)  # 3.31μs -> 2.30μs (43.5% faster)


def test_multiple_test_and_it_occurrences():
    # A file with multiple matches should still return True (boolean).
    code = """
    describe('suite', () => {
      it('first case', () => {});
      // some other code
      test("second case", () => {});
    });
    """
    codeflash_output = _has_test_functions(code)  # 3.34μs -> 2.16μs (54.9% faster)


def test_large_scale_no_match_performance():
    # Large-scale: many lines without any test/it(...) occurrences should return False.
    # Keep size under 1000 to respect constraints. We use 900 repeated lines.
    repeated = "const filler = 0;\n" * 900  # 900 lines of filler
    codeflash_output = _has_test_functions(repeated)  # 76.6μs -> 75.5μs (1.47% faster)


def test_large_scale_match_near_end():
    # Large-scale: many lines of filler followed by a single test at the end should return True.
    # This ensures the search scans through large input and finds a late occurrence.
    repeated = "const filler = 0;\n" * 900  # 900 lines of filler
    code = repeated + " // real test follows\n test('final case', () => {});"
    codeflash_output = _has_test_functions(code)  # 77.5μs -> 76.3μs (1.59% faster)


def test_it_with_newline_between_name_and_paren():
    # Verify that a newline immediately after 'it' and before '(' is allowed by the regex (\s* covers newline).
    code = "it\n('newline-allowed')"
    codeflash_output = _has_test_functions(code)  # 3.22μs -> 1.94μs (66.0% faster)


def test_quoted_string_with_escaped_quotes_still_matches():
    # Even if the string contains escaped quotes, the regex only checks the opening quote, so it should match.
    code = r'test("contains an escaped quote: \" here", () => {});'
    codeflash_output = _has_test_functions(code)  # 2.90μs -> 1.83μs (58.2% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
```python
import re

import pytest
from testgen.testgen_javascript import _has_test_functions


def test_simple_test_function_with_single_quotes():
    """Test detection of test() function with single quotes."""
    code = "test('should work', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.78μs -> 2.29μs (65.1% faster)


def test_simple_it_function_with_single_quotes():
    """Test detection of it() function with single quotes."""
    code = "it('should work', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.10μs -> 1.99μs (56.0% faster)


def test_simple_test_function_with_double_quotes():
    """Test detection of test() function with double quotes."""
    code = 'test("should work", () => {})'
    codeflash_output = _has_test_functions(code)  # 3.07μs -> 1.89μs (62.8% faster)


def test_simple_it_function_with_double_quotes():
    """Test detection of it() function with double quotes."""
    code = 'it("should work", () => {})'
    codeflash_output = _has_test_functions(code)  # 3.13μs -> 1.86μs (68.2% faster)


def test_no_test_functions():
    """Test code without any test functions returns False."""
    code = "function myFunction() { return 42; }"
    codeflash_output = _has_test_functions(code)  # 3.08μs -> 1.85μs (66.5% faster)


def test_test_function_with_single_whitespace():
    """Test detection with single space between function name and parenthesis."""
    code = "test ('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.20μs -> 1.96μs (63.0% faster)


def test_it_function_with_single_whitespace():
    """Test detection with single space between function name and parenthesis."""
    code = "it ('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.98μs -> 1.88μs (58.5% faster)


def test_test_function_with_multiple_whitespaces():
    """Test detection with multiple spaces between function name and parenthesis."""
    code = "test   ('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.04μs -> 1.95μs (55.5% faster)


def test_it_function_with_multiple_whitespaces():
    """Test detection with multiple spaces between function name and parenthesis."""
    code = "it   ('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.06μs -> 1.89μs (62.1% faster)


def test_test_function_with_tab_character():
    """Test detection with tab character between function name and parenthesis."""
    code = "test\t('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.05μs -> 1.93μs (57.8% faster)


def test_it_function_with_tab_character():
    """Test detection with tab character between function name and parenthesis."""
    code = "it\t('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.99μs -> 1.85μs (61.6% faster)


def test_test_function_with_newline():
    """Test detection with newline between function name and parenthesis."""
    code = "test\n('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.02μs -> 1.93μs (56.5% faster)


def test_it_function_with_newline():
    """Test detection with newline between function name and parenthesis."""
    code = "it\n('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.95μs -> 1.90μs (55.4% faster)


def test_multiple_test_functions():
    """Test detection with multiple test functions in the same code."""
    code = """
    test('first test', () => {});
    it('second test', () => {});
    """
    codeflash_output = _has_test_functions(code)  # 3.12μs -> 1.92μs (62.3% faster)


def test_test_function_in_multiline_code():
    """Test detection of test function within multiline code."""
    code = """
    const helper = () => {};
    test('actual test', () => {});
    const another = () => {};
    """
    codeflash_output = _has_test_functions(code)  # 3.19μs -> 2.12μs (50.1% faster)


def test_it_function_in_multiline_code():
    """Test detection of it function within multiline code."""
    code = """
    const helper = () => {};
    it('actual test', () => {});
    const another = () => {};
    """
    codeflash_output = _has_test_functions(code)  # 3.46μs -> 2.23μs (55.1% faster)


def test_test_word_in_comment_not_matched():
    """Test that test() in comments is still detected by regex (no comment parsing)."""
    code = "// test('in comment', () => {})"
    # Note: The function uses regex without comment awareness, so it will match
    codeflash_output = _has_test_functions(code)  # 3.10μs -> 1.94μs (60.2% faster)


def test_test_word_in_string_variable():
    """Test that test word in string variable doesn't match pattern."""
    code = 'const description = "this is a test of something";'
    codeflash_output = _has_test_functions(code)  # 3.56μs -> 2.29μs (55.3% faster)


def test_test_as_variable_name_not_matched():
    """Test that 'test' as variable name doesn't match without parenthesis."""
    code = "const test = 5;"
    codeflash_output = _has_test_functions(code)  # 3.11μs -> 2.03μs (53.1% faster)


def test_testing_as_word_not_matched():
    """Test that 'testing' word doesn't match."""
    code = "const testing = 'some value';"
    codeflash_output = _has_test_functions(code)  # 3.17μs -> 2.04μs (55.4% faster)


def test_it_as_pronoun_not_matched():
    """Test that 'it' as pronoun doesn't match without proper pattern."""
    code = "// it is a good day"
    codeflash_output = _has_test_functions(code)  # 3.15μs -> 1.98μs (59.1% faster)


def test_it_as_variable_not_matched():
    """Test that 'it' as variable name doesn't match without parenthesis."""
    code = "const it = 5;"
    codeflash_output = _has_test_functions(code)  # 3.13μs -> 1.92μs (63.7% faster)


def test_empty_string():
    """Test with empty string input."""
    codeflash_output = _has_test_functions("")  # 1.74μs -> 757ns (129% faster)


def test_only_whitespace():
    """Test with only whitespace."""
    codeflash_output = _has_test_functions(" \n\t ")  # 2.05μs -> 890ns (131% faster)


def test_test_function_with_special_test_name():
    """Test detection with special characters in test name."""
    code = "test('test-name_123!@#', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.95μs -> 1.90μs (55.1% faster)


def test_it_function_with_special_test_name():
    """Test detection with special characters in test name."""
    code = "it('it-name_123!@#', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.04μs -> 1.88μs (62.0% faster)


def test_test_function_with_empty_string_name():
    """Test detection with empty string as test name."""
    code = "test('', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.90μs -> 1.84μs (58.0% faster)


def test_it_function_with_empty_string_name():
    """Test detection with empty string as test name."""
    code = "it('', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.97μs -> 1.86μs (59.7% faster)


def test_test_with_carriage_return():
    """Test detection with carriage return character."""
    code = "test\r('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.02μs -> 1.89μs (60.2% faster)


def test_it_with_carriage_return():
    """Test detection with carriage return character."""
    code = "it\r('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.12μs -> 1.91μs (63.8% faster)


def test_test_with_form_feed():
    """Test detection with form feed character."""
    code = "test\f('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.98μs -> 1.85μs (60.7% faster)


def test_it_with_form_feed():
    """Test detection with form feed character."""
    code = "it\f('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.13μs -> 1.91μs (64.4% faster)


def test_test_with_vertical_tab():
    """Test detection with vertical tab character."""
    code = "test\v('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.08μs -> 1.87μs (65.0% faster)


def test_it_with_vertical_tab():
    """Test detection with vertical tab character."""
    code = "it\v('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.13μs -> 1.83μs (71.4% faster)


def test_test_with_non_breaking_space():
    """Test that non-breaking space might not work depending on whitespace regex."""
    code = "test\u00a0('my test', () => {})"
    # Non-breaking space might not be treated as \s in regex
    codeflash_output = _has_test_functions(code)
    result = codeflash_output  # 3.25μs -> 2.07μs (56.6% faster)


def test_test_with_zero_width_space():
    """Test with zero-width space."""
    code = "test\u200b('my test', () => {})"
    codeflash_output = _has_test_functions(code)
    result = codeflash_output  # 3.82μs -> 2.74μs (39.4% faster)


def test_only_test_keyword():
    """Test with only the word 'test' without parenthesis."""
    code = "test"
    codeflash_output = _has_test_functions(code)  # 2.85μs -> 1.83μs (55.3% faster)


def test_only_it_keyword():
    """Test with only the word 'it' without parenthesis."""
    code = "it"
    codeflash_output = _has_test_functions(code)  # 1.89μs -> 793ns (138% faster)


def test_test_with_parenthesis_but_no_quote():
    """Test function call without string argument."""
    code = "test(variable)"
    codeflash_output = _has_test_functions(code)  # 3.39μs -> 2.12μs (59.8% faster)


def test_it_with_parenthesis_but_no_quote():
    """Test it function call without string argument."""
    code = "it(variable)"
    codeflash_output = _has_test_functions(code)  # 3.24μs -> 2.07μs (56.5% faster)


def test_test_followed_by_string_literal_without_parenthesis():
    """Test with string literal but missing parenthesis."""
    code = "test 'string'"
    codeflash_output = _has_test_functions(code)  # 3.17μs -> 1.93μs (64.5% faster)


def test_it_followed_by_string_literal_without_parenthesis():
    """Test with string literal but missing parenthesis."""
    code = "it 'string'"
    codeflash_output = _has_test_functions(code)  # 3.24μs -> 2.02μs (59.9% faster)


def test_test_with_backtick_quotes():
    """Test with backtick quotes (template literals)."""
    code = "test(`my test`, () => {})"
    codeflash_output = _has_test_functions(code)  # 3.49μs -> 2.34μs (49.5% faster)


def test_it_with_backtick_quotes():
    """Test it with backtick quotes (template literals)."""
    code = "it(`my test`, () => {})"
    codeflash_output = _has_test_functions(code)  # 3.38μs -> 2.23μs (51.4% faster)


def test_describe_function_not_matched():
    """Test that describe() function is not matched."""
    code = "describe('suite', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.13μs -> 1.94μs (61.2% faster)


def test_beforeEach_function_not_matched():
    """Test that beforeEach() function is not matched."""
    code = "beforeEach(() => {})"
    codeflash_output = _has_test_functions(code)  # 1.98μs -> 901ns (119% faster)


def test_afterEach_function_not_matched():
    """Test that afterEach() function is not matched."""
    code = "afterEach(() => {})"
    codeflash_output = _has_test_functions(code)  # 2.68μs -> 1.50μs (78.6% faster)


def test_test_method_on_object():
    """Test with test as method call on object."""
    code = "obj.test('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.12μs -> 1.94μs (61.4% faster)


def test_it_method_on_object():
    """Test with it as method call on object."""
    code = "obj.it('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.10μs -> 2.00μs (55.0% faster)


def test_test_substring_in_longer_identifier():
    """Test when test is part of longer identifier."""
    code = "mytest('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.00μs -> 1.90μs (58.0% faster)


def test_it_substring_in_longer_identifier():
    """Test when it is part of longer identifier."""
    code = "unit('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.07μs -> 1.94μs (58.2% faster)


def test_test_with_unicode_test_name():
    """Test detection with unicode characters in test name."""
    code = "test('\u4e2d\u6587\u6d4b\u8bd5', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.57μs -> 2.31μs (54.4% faster)


def test_it_with_unicode_test_name():
    """Test detection with unicode characters in test name."""
    code = "it('\u4e2d\u6587\u6d4b\u8bd5', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.09μs -> 2.08μs (48.8% faster)


def test_test_with_emoji():
    """Test detection with emoji in test name."""
    code = "test('\u263a emoji test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.14μs -> 2.02μs (55.2% faster)


def test_it_with_emoji():
    """Test detection with emoji in test name."""
    code = "it('\u263a emoji test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.09μs -> 2.00μs (54.0% faster)


def test_very_long_test_name():
    """Test detection with very long test name."""
    long_name = "a" * 5000
    code = f"test('{long_name}', () => {{}})"
    codeflash_output = _has_test_functions(code)  # 3.00μs -> 1.77μs (69.5% faster)


def test_very_long_code_without_tests():
    """Test with very long code but no test functions."""
    code = "const x = 1;\n" * 500
    codeflash_output = _has_test_functions(code)  # 29.7μs -> 28.5μs (4.29% faster)


def test_test_with_escaped_quote():
    """Test with escaped quote in test name."""
    code = "test('test\\'s name', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.15μs -> 2.03μs (55.2% faster)


def test_it_with_escaped_quote():
    """Test with escaped quote in test name."""
    code = "it('it\\'s name', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.97μs -> 1.87μs (59.0% faster)


def test_test_with_double_quote_in_single_quote():
    """Test with double quote inside single quoted test name."""
    code = "test('has \"double\" quotes', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.93μs -> 1.82μs (60.7% faster)


def test_it_with_double_quote_in_single_quote():
    """Test with double quote inside single quoted test name."""
    code = "it('has \"double\" quotes', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.97μs -> 1.83μs (62.6% faster)


def test_test_with_single_quote_in_double_quote():
    """Test with single quote inside double quoted test name."""
    code = "test(\"has 'single' quotes\", () => {})"
    codeflash_output = _has_test_functions(code)  # 3.00μs -> 1.90μs (57.8% faster)


def test_it_with_single_quote_in_double_quote():
    """Test with single quote inside double quoted test name."""
    code = "it(\"has 'single' quotes\", () => {})"
    codeflash_output = _has_test_functions(code)  # 2.97μs -> 1.78μs (66.7% faster)


def test_test_case_sensitive():
    """Test that TEST (uppercase) is not matched."""
    code = "TEST('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.03μs -> 1.81μs (67.1% faster)


def test_it_case_sensitive():
    """Test that IT (uppercase) is not matched."""
    code = "IT('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.81μs -> 1.84μs (52.8% faster)


def test_test_with_mixed_case():
    """Test that TeSt (mixed case) is not matched."""
    code = "TeSt('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 3.05μs -> 1.85μs (65.2% faster)


def test_it_with_mixed_case():
    """Test that It (mixed case) is not matched."""
    code = "It('my test', () => {})"
    codeflash_output = _has_test_functions(code)  # 2.94μs -> 1.91μs (54.3% faster)


def test_code_with_many_non_test_functions():
    """Test performance with many non-test functions."""
    # Build code with 500 non-test function definitions
    code_lines = [f"function func{i}() {{ return {i}; }}" for i in range(500)]
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 77.0μs -> 75.9μs (1.52% faster)


def test_code_with_many_functions_and_one_test():
    """Test detection of single test among many non-test functions."""
    # Build code with 500 non-test functions and 1 test function
    code_lines = [f"function func{i}() {{ return {i}; }}" for i in range(250)]
    code_lines.append("test('the actual test', () => {})")
    code_lines.extend(
        [f"function func{i}() {{ return {i}; }}" for i in range(250, 500)]
    )
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 40.3μs -> 38.9μs (3.41% faster)


def test_code_with_many_test_functions():
    """Test detection with many test functions."""
    # Build code with 100 test functions
    code_lines = [f"test('test {i}', () => {{}})" for i in range(100)]
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 2.94μs -> 1.74μs (69.4% faster)


def test_code_with_many_it_functions():
    """Test detection with many it functions."""
    # Build code with 100 it functions
    code_lines = [f"it('test {i}', () => {{}})" for i in range(100)]
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 2.91μs -> 1.61μs (80.4% faster)


def test_code_with_alternating_test_and_it_functions():
    """Test detection with alternating test and it functions."""
    # Build code with 100 alternating test and it functions
    code_lines = []
    for i in range(50):
        code_lines.append(f"test('test {i}', () => {{}})")
        code_lines.append(f"it('it {i}', () => {{}})")
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 2.93μs -> 1.64μs (78.7% faster)


def test_code_with_many_non_matching_similar_patterns():
    """Test performance with many similar but non-matching patterns."""
    # Build code with 500 similar patterns that don't match
    code_lines = [f"test{i}('name', () => {{}})" for i in range(500)]
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 64.3μs -> 62.9μs (2.27% faster)


def test_large_code_with_test_at_end():
    """Test detection when test function is at end of large code."""
    # Build code with 500 lines and test at the end
    code_lines = [f"const var{i} = {i};" for i in range(500)]
    code_lines.append("test('test at end', () => {})")
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 41.4μs -> 40.3μs (2.65% faster)


def test_large_code_with_it_at_end():
    """Test detection when it function is at end of large code."""
    # Build code with 500 lines and it at the end
    code_lines = [f"const var{i} = {i};" for i in range(500)]
    code_lines.append("it('it at end', () => {})")
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 41.5μs -> 40.3μs (3.00% faster)


def test_large_code_with_test_at_beginning():
    """Test detection when test function is at beginning of large code."""
    # Build code with test at beginning and 500 lines after
    code_lines = ["test('test at beginning', () => {})"]
    code_lines.extend([f"const var{i} = {i};" for i in range(500)])
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 3.09μs -> 1.90μs (62.2% faster)


def test_large_code_with_it_at_beginning():
    """Test detection when it function is at beginning of large code."""
    # Build code with it at beginning and 500 lines after
    code_lines = ["it('it at beginning', () => {})"]
    code_lines.extend([f"const var{i} = {i};" for i in range(500)])
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 3.02μs -> 1.81μs (67.1% faster)


def test_code_with_multiple_tests_scattered():
    """Test detection with multiple test functions scattered throughout large code."""
    # Build code with 20 test functions scattered among 480 non-test lines
    code_lines = []
    for i in range(500):
        if i % 25 == 0:
            code_lines.append(f"test('scattered test {i}', () => {{}})")
        else:
            code_lines.append(f"const var{i} = {i};")
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 2.86μs -> 1.79μs (59.8% faster)


def test_code_with_very_large_test_name():
    """Test performance with very long test name."""
    # Create a test with name of 10000 characters
    long_name = "x" * 10000
    code = f"test('{long_name}', () => {{}})"
    codeflash_output = _has_test_functions(code)  # 2.99μs -> 1.84μs (62.1% faster)


def test_code_with_deeply_nested_structures():
    """Test detection in deeply nested code structures."""
    # Build nested structure with test at bottom
    code = "const nested = { level1: { level2: { level3: { level4: { " * 50
    code += "test('nested test', () => {})"
    code += " } } } } };" * 50
    codeflash_output = _has_test_functions(code)  # 14.6μs -> 13.5μs (8.56% faster)


def test_code_with_many_whitespace_variations():
    """Test detection with many different whitespace patterns."""
    code_lines = []
    for i in range(100):
        if i % 4 == 0:
            code_lines.append(f"test('test {i}', () => {{}})")
        elif i % 4 == 1:
            code_lines.append(f"test ('test {i}', () => {{}})")
        elif i % 4 == 2:
            code_lines.append(f"test  ('test {i}', () => {{}})")
        else:
            code_lines.append(f"test\t('test {i}', () => {{}})")
    code = "\n".join(code_lines)
    codeflash_output = _has_test_functions(code)  # 2.92μs -> 1.71μs (70.7% faster)


def test_code_return_type_is_boolean():
    """Test that return value is always boolean regardless of input size."""
    # Various test inputs
    test_inputs = [
        "",
        "test",
        "test('name', () => {})",
        "const x = 1;" * 100,
        "test('name', () => {})" + "const x = 1;" * 100,
    ]
    for test_input in test_inputs:
        codeflash_output = _has_test_functions(test_input)
        result = codeflash_output  # 11.9μs -> 9.34μs (27.0% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
</details>
To edit these changes, run `git checkout codeflash/optimize-pr2247-2026-01-25T08.57.25` and push.

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <106575910+KRRT7@users.noreply.github.com>
## Summary
- Fix CST corruption issues that caused `'NoneType' object has no
attribute 'visit'` errors
- Consolidate testgen postprocessing into a single pipeline with
tuple-based pattern
- Improve markdown code extraction to prefer filepath-annotated blocks
- Add diagnostic context to optimization failure logs
## Changes
- Handle empty `SimpleStatementLine` and `StatementHandler` body to
prevent malformed CST
- Add trace_id logging to optimization and import failure paths
- Refactor testgen postprocessing into consolidated pipeline
- Fix code extraction for LLM responses with multiple code blocks
## Test plan
- [x] Added integration tests for full testgen pipeline
- [x] Added tests for markdown extraction with filepath preference
- [x] Existing tests pass
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
- Add language parameter to split_markdown_code and group_code for JS/TS support
- Fix callable type annotation in instrument_javascript.py
- Update testgen_javascript.py to use ChatCompletionMessageParam types
- Add None checks before parse_python_version calls
- Add missing None assertions in test files
- Apply ruff auto-fixes for formatting and unused imports
- Accept consolidated markdown utilities from common module
- Use wrap_code_in_markdown with language parameter for language support
- Remove duplicate split_markdown_code implementation
- Add validation for python_version before parsing
## Summary
- Removes `profanity_regex` and `profanity_words` from
`postprocess_constants.py`
- Removes `remove_profanity_from_explanation` from the optimization
pipeline
- Removes associated test
## Summary
- Add forward reference detection and automatic fix with `from
__future__ import annotations`
- Handle aliased imports and chained calls in test instrumentation
- Fix import resolution from correct module in multi-context testgen
- Allow ellipsis in Protocol/abstract method bodies
- Add dataclass constructor notes for LLM about required/positional
arguments
- Add logging to silent exception handlers
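A minimal sketch of the forward-reference fix, assuming detection is done with the standard `ast` module. The helper names `has_forward_reference` and `ensure_future_annotations` are hypothetical and not taken from the codebase; the check is deliberately conservative and ignores `*args`/`**kwargs` annotations and module docstrings.

```python
import ast

FUTURE_IMPORT = "from __future__ import annotations"


def has_forward_reference(source: str) -> bool:
    """Return True if an annotation names a top-level class before that class is fully defined."""
    tree = ast.parse(source)
    # Last line of each top-level class; the name is only bound after that line.
    class_end = {n.name: n.end_lineno for n in tree.body if isinstance(n, ast.ClassDef)}
    for node in ast.walk(tree):
        annotations = []
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            for arg in args.posonlyargs + args.args + args.kwonlyargs:
                if arg.annotation is not None:
                    annotations.append(arg.annotation)
            if node.returns is not None:
                annotations.append(node.returns)
        elif isinstance(node, ast.AnnAssign):
            annotations.append(node.annotation)
        for annotation in annotations:
            for name in ast.walk(annotation):
                if isinstance(name, ast.Name) and name.lineno <= class_end.get(name.id, 0):
                    return True
    return False


def ensure_future_annotations(source: str) -> str:
    """Prepend the future import only when a forward reference is detected."""
    if FUTURE_IMPORT in source or not has_forward_reference(source):
        return source
    return FUTURE_IMPORT + "\n" + source
```

With the future import in place, annotations are no longer evaluated at definition time, so a function annotated with a class defined further down the file no longer raises `NameError` on import.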
## Test plan
- [x] Unit tests added for forward reference detection
- [x] Unit tests added for dataclass constructor notes
- [x] Unit tests added for ellipsis handling in AST
- [x] Unit tests added for chained call instrumentation
- [x] Unit tests extended for add_missing_imports
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
# Pull Request Checklist
## Description
- [ ] **Description of PR**: Clear and concise description of what this
PR accomplishes
- [ ] **Breaking Changes**: Document any breaking changes (if
applicable)
- [ ] **Related Issues**: Link to any related issues or tickets
## Testing
- [ ] **Test cases Attached**: All relevant test cases have been
added/updated
- [ ] **Manual Testing**: Manual testing completed for the changes
## Monitoring & Debugging
- [ ] **Logging in place**: Appropriate logging has been added for
debugging user issues
- [ ] **Sentry will be able to catch errors**: Error handling ensures
Sentry can capture and report errors
- [ ] **Avoid Dev based/Prisma logging**: No development-only or
Prisma-specific logging in production code
## Configuration
- [ ] **Env variables newly added**: Any new environment variables are
documented in .env.example file or mentioned in description
---
## Additional Notes
<!-- Add any additional context, screenshots, or notes for reviewers
here -->
## Summary
- Remove unicode quote sanitization from test code validation
- Rely on individual test validation to filter out tests with syntax
errors (including unicode characters)
## Test plan
- [x] Existing tests pass
- [x] Tests with unicode quote syntax errors are correctly filtered out
during individual validation
Make get_referenced_names_from_source scope-aware by reusing
UndefinedNameCollector, preventing invalid imports like `i` and `v`
from loop variables in AI-generated tests.
## Summary
- Add `normalize_code` helper in `tests/conftest.py` for comparing code
while ignoring quote style differences
- Update test assertions to use `normalize_code()` wrapper
- Add unit tests for comprehension instrumentation cases
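One way such a helper could work, shown only as an assumption (the actual `normalize_code` in `tests/conftest.py` may be implemented differently): round-trip the code through `ast.parse` and `ast.unparse` (Python 3.9+), which emits string literals in one canonical quote style.

```python
import ast


def normalize_code(code: str) -> str:
    """Return a canonical rendering of `code` so quote style and spacing do not affect equality."""
    return ast.unparse(ast.parse(code))


# Usage in an assertion: both sides are normalized before comparison.
assert normalize_code('x = "a"') == normalize_code("x = 'a'")
```

Because the comparison goes through the AST, comments and formatting differences are ignored as well, which may or may not be desirable for a given test.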
## Summary
- Skip instrumentation for target function calls inside list/set/dict
comprehensions and generator expressions
- Fixes NameError when AI-generated tests use comprehensions like
`[func(x) for x in items]`
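The skip rule can be sketched with an `ast.NodeVisitor` that tracks comprehension depth. The class name `TargetCallFinder` and its attributes are hypothetical, and the sketch only matches calls made through a bare name; it is not the project's instrumentation code.

```python
import ast


class TargetCallFinder(ast.NodeVisitor):
    """Separate calls to `target` into instrumentable ones and ones inside comprehensions."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.instrumentable = []  # line numbers of calls that are safe to instrument
        self.skipped = []  # line numbers of calls inside comprehensions
        self._depth = 0  # current comprehension nesting depth

    def _visit_comprehension(self, node: ast.AST) -> None:
        self._depth += 1
        self.generic_visit(node)
        self._depth -= 1

    # List/set/dict comprehensions and generator expressions share one handler.
    visit_ListComp = _visit_comprehension
    visit_SetComp = _visit_comprehension
    visit_DictComp = _visit_comprehension
    visit_GeneratorExp = _visit_comprehension

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id == self.target:
            (self.skipped if self._depth else self.instrumentable).append(node.lineno)
        self.generic_visit(node)
```

For example, visiting `a = func(1)` followed by `b = [func(x) for x in items]` records line 1 as instrumentable and line 2 as skipped.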
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
## Summary
- Preserve nested functions in test instrumentation (skip instrumenting
calls inside nested function definitions)
- Prevent LLM from embedding markdown code fences in generated tests
(updated system prompts)
- Handle optional filename in python code blocks (e.g.,
`` ```python:filename.py ``)
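As an illustration of the optional-filename case, the following sketch uses a regular expression with an optional `:filename` group after the `python` info string. The names `BLOCK_RE` and `extract_python_blocks` are hypothetical and the real parser may differ.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

BLOCK_RE = re.compile(
    FENCE + r"python(?::(?P<filename>[^\n]+))?\n(?P<code>.*?)" + FENCE,
    re.DOTALL,
)


def extract_python_blocks(markdown: str):
    """Return (filename or None, code) for every python block in `markdown`."""
    return [(m.group("filename"), m.group("code")) for m in BLOCK_RE.finditer(markdown)]
```

A block opened with `python:pkg/mod.py` yields `("pkg/mod.py", code)`, while a plain `python` block yields `(None, code)`.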
## Test plan
- [x] Unit tests added for nested function preservation
- [x] Unit test for optional filename parsing
- [ ] Manual testing with generated tests
## Summary
- Include private symbols when resolving undefined names in generated
tests
- Resolve symbols imported by source module in generated tests
- Add fallback for symbols referenced but not defined in source snippet
- Ensure LocalDefinitionRemover only removes top-level definitions
(preserves nested classes/functions)
- Performance optimization for `get_symbols_from_source_code` function
## Test plan
- [x] Unit tests added for all new functionality
- [ ] Manual testing with generated tests
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
gpt-5-mini is misinterpreting the template format in execute_user_prompt.md. When it sees:
# function to test
{function_code}
it emits a nested code block with markdown path syntax (`python:unstructured/staging/label_box.py`) instead of the function code inline, so it never reaches the `# unit tests` section where it should generate `def test_*` functions.
Fixes CF-690
Pull Request Checklist
Description
- Description of PR:
Privacy Mode Feature for Pro Users
This PR introduces a Privacy Mode feature that allows paid users
(Pro/Enterprise) to control how their code is stored during the
optimization review process.
Key Changes:
1. Privacy Mode Toggle in Sidebar
- Added a new toggle in the dashboard sidebar for Privacy Mode
- Only available for Pro/Enterprise users (disabled with upgrade prompt
for free users)
- Persists user preference in database with localStorage fallback for
fast UI
2. Storage Strategy Based on Privacy Setting
- Privacy Mode ON: Code is stored exclusively in GitHub branches (never
cached in database)
- Privacy Mode OFF: Code is temporarily cached in database for faster
loading, cleaned up when PR is created
3. Database Changes
- Added privacy_mode boolean field to users table (default: false)
- Added staging_storage_type field to track storage method
4. API Updates
- cf-api now checks user's privacy mode when determining storage
strategy
- StagingStorageStrategyFactory considers privacy mode alongside tier
eligibility
- Added getUserPrivacyMode, setUserPrivacyMode, and isUserPaid functions
User Experience
- Free users see the toggle disabled with "Upgrade to Pro" messaging
- Tooltip explains the trade-off: privacy vs. loading speed
- Toggle state syncs between localStorage (for instant UI) and database
(for persistence)
Storage Flow
User Request → Check Privacy Mode
├─ Privacy ON + Paid + Valid Repo → Git Branch Storage (GitHub only)
└─ Privacy OFF or Free → Plain Text Storage (Database cache)
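The flow above can be sketched as a small decision function. This is written in Python purely for illustration; the names `StagingStorage` and `choose_staging_storage` are hypothetical and do not reflect the actual `StagingStorageStrategyFactory` code. The flow does not say what happens when Privacy Mode is on for a paid user without a valid repo, so the fallback to plain text in that case is an assumption.

```python
from enum import Enum


class StagingStorage(Enum):
    GIT_BRANCH = "git_branch"  # code lives only in a GitHub branch
    PLAIN_TEXT = "plain_text"  # code is cached in the database


def choose_staging_storage(privacy_mode: bool, is_paid: bool, has_valid_repo: bool) -> StagingStorage:
    """Pick the storage strategy from the user's privacy setting and tier."""
    if privacy_mode and is_paid and has_valid_repo:
        return StagingStorage.GIT_BRANCH
    # Assumption: every other combination falls back to the database cache.
    return StagingStorage.PLAIN_TEXT
```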
---------
Co-authored-by: ali <mohammed18200118@gmail.com>
Co-authored-by: Mohamed Ashraf <ashraf@codeflash.ai>
- Do not use Mock objects for domain classes (pickling issues)
- Use correct constructor signatures from context
- Use concrete subclasses when base class is abstract
The optimization eliminates a significant performance bottleneck by moving the computation of `set(dir(builtins))` from runtime to module load time.
**What changed:**
- Introduced a module-level constant `_BUILTIN_NAMES = frozenset(dir(builtins))`
- Replaced the repeated `builtin_names = set(dir(builtins))` call with direct reference to `_BUILTIN_NAMES`
- Used `frozenset` instead of `set` for the constant (immutable, so the shared constant cannot be modified by accident; membership tests cost the same as with a `set`)
**Why this is faster:**
The original code calls `dir(builtins)` and creates a new set from it on every invocation of `get_undefined_names()`. The line profiler shows this single line consumed 73.3% of the function's execution time (1.68ms out of 2.29ms total).
Python's `dir(builtins)` returns a list of ~150 builtin names, and converting this to a set requires:
1. Calling `dir()` to introspect the builtins module
2. Iterating through the list
3. Allocating a new set and hashing each name
Since builtin names are constant for a Python process, this work is entirely redundant. By computing it once at module import time, each call to `get_undefined_names()` saves ~21.8μs per invocation (from the line profiler: 21799.7ns per hit).
**Performance impact:**
- **184% speedup** overall (1.09ms → 383μs)
- The optimized version shows 100% of time now spent on the actual set operations (name filtering), not builtin lookup
- Test results show 1350-2450% improvement for simple cases with few names, and 22-126% improvement even for large-scale tests with 1000+ names
- The optimization is particularly effective when `get_undefined_names()` is called repeatedly (as shown by the 77 hits in the profiler and multiple test invocations)
**Test case effectiveness:**
- Most effective for tests with repeated calls to `get_undefined_names()` on the same or different collector instances
- Benefits all test patterns since the builtin check happens on every call regardless of the number of user-defined names
- Even large-scale tests with 1000+ names show 22-35% improvements because the builtin lookup overhead is eliminated
Instead of just adding missing imports, now also:
1. Detects classes/functions defined locally that exist in source module
2. Removes those local redefinitions
3. Adds imports for them
This ensures tests use the real classes from the source module.
When the LLM generates test code, it sometimes redefines classes locally
(like Element, ChunkingOptions) but forgets to import or define others
(like PreChunk), causing NameError at runtime.
This adds a postprocessing step that:
1. Detects undefined names in the generated test code
2. Checks if those names are defined in the source module
3. Adds the missing imports automatically
This is more reliable than relying on prompt instructions, which the LLM
sometimes ignores.
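A rough sketch of this postprocessing step, assuming an `ast` walk over the generated test. The name `add_missing_imports` appears elsewhere in this log, but the body below is illustrative only: it is not scope-aware and treats any name bound anywhere in the test as defined.

```python
import ast
import builtins


def add_missing_imports(test_code: str, source_code: str, module_name: str) -> str:
    """Prepend an import for names the test uses, never binds, and the source module defines."""
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(ast.parse(test_code)):
        if isinstance(node, ast.Name):
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            bound.update((alias.asname or alias.name).split(".")[0] for alias in node.names)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)

    # Top-level classes and functions defined in the source module.
    exported = {
        node.name
        for node in ast.parse(source_code).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    missing = sorted((used - bound) & exported)
    if not missing:
        return test_code
    return f"from {module_name} import {', '.join(missing)}\n" + test_code
```

Running the function a second time on its own output is a no-op, because the prepended import now binds the previously missing names.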
Replace logger.error() and debug_log_sensitive_data() with
logging.exception() in exception handlers to ensure full stack traces
are logged to console/logs, not just to Sentry.
Affected endpoints:
- /optimize: Added exception logging
- /rank: Added logging import and exception logging
- /explanations: Added logging import and exception logging
- /workflow-gen: Changed logger.error to logger.exception
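The difference is easiest to see in a handler. The sketch below uses a hypothetical `handle_request` function rather than any of the real endpoints; `logger.exception` logs at ERROR level and appends the active traceback, whereas `logger.error` records only the message unless `exc_info=True` is passed.

```python
import logging

logger = logging.getLogger(__name__)


def handle_request(payload: dict) -> dict:
    """Toy endpoint body that logs the full traceback when processing fails."""
    try:
        return {"status": "ok", "value": 1 / payload["divisor"]}
    except Exception:
        # Must be called from inside the except block so the traceback is available.
        logger.exception("Request failed")
        return {"status": "error"}
```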
Remove the print_messages function and its usage since observability
tooling is now in place. This eliminates verbose system prompt output
during test generation in non-production environments.
Generated tests were failing isinstance() checks because LLM created
mock class definitions instead of importing real ones.
Added prompt instruction to import classes from their actual modules
when file paths are shown in the context.