Codeflash RL Environment — Batch Validation Report
Summary
| Metric | Count | % |
|---|---|---|
| Total tasks | 13 | 100% |
| Solve passes | 0 | 0% |
| Eval correct (all behavioral tests pass) | 8 | 61% |
| Faster than original (speedup > 1.0) | 6 | 46% |
| All test cases pass | 11 | 84% |
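A quick arithmetic check of the percentage column, as a minimal sketch: the counts are copied from the table above, and the helper name `percent_floor` is purely illustrative. The published figures match integer truncation rather than rounding (8/13 is 61.5%, shown as 61%; 11/13 is 84.6%, shown as 84%).

```python
# Counts copied from the summary table; TOTAL_TASKS is the "Total tasks" row.
TOTAL_TASKS = 13
counts = {
    "Solve passes": 0,
    "Eval correct": 8,
    "Faster than original": 6,
    "All test cases pass": 11,
}


def percent_floor(count: int, total: int) -> int:
    """Integer percentage, truncated (not rounded)."""
    return count * 100 // total


percentages = {name: percent_floor(n, TOTAL_TASKS) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]}/{TOTAL_TASKS} = {pct}%")
```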
Speedup Distribution (correct tasks only)
- 1-1.5x: 5 tasks
- 1.5-2x: 1 task
- 2-5x: 1 task
- >100x: 1 task
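The bucket counts can be reproduced from the Speedup column of the Successful Tasks table. The sketch below assumes half-open bucket edges (lower bound inclusive), which the report does not state; `bucket_for` is an illustrative helper. The two 1.0000x tasks fall in the 1-1.5x bucket, which is why that bucket holds 5 tasks while only 6 of the 8 correct tasks count as faster than the original.

```python
from collections import Counter

# Speedups of the eight correct tasks, copied from the Successful Tasks table.
speedups = [2023.8593, 2.4668, 1.8884, 1.4730, 1.0228, 1.0194, 1.0000, 1.0000]

# (label, inclusive lower bound, exclusive upper bound); the edges are assumed.
BUCKETS = [
    ("1-1.5x", 1.0, 1.5),
    ("1.5-2x", 1.5, 2.0),
    ("2-5x", 2.0, 5.0),
    (">100x", 100.0, float("inf")),
]


def bucket_for(speedup: float) -> str:
    """Return the label of the first bucket containing the speedup."""
    for label, low, high in BUCKETS:
        if low <= speedup < high:
            return label
    return "other"  # values outside the listed buckets, e.g. 5x to 100x


distribution = Counter(bucket_for(s) for s in speedups)
faster_than_original = sum(1 for s in speedups if s > 1.0)

print(dict(distribution))
print(faster_than_original)
```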
Successful Tasks (correct=1.0)
| Task | Function | Speedup | Tests | Coverage | Quality | DB Speedup |
|---|---|---|---|---|---|---|
| decorators-withfixedsizecache-memory_pressure_detected | memory_pressure_detected | 2023.8593x | 132/132 | 37.8% | | 917.09x |
| handlers-handle_describe_workflows_blocks_request | handle_describe_workflows_blocks_request | 2.4668x | 153/153 | N/A | low | 2.50x |
| enterprise_blocks-load_enterprise_blocks | load_enterprise_blocks | 1.8884x | 1936/1936 | 32.2% | medium | 1.45x |
| common-deserialize_image_kind | deserialize_image_kind | 1.4730x | 1506/1506 | 7.4% | medium | 1.42x |
| dataset_upload-execute_registration | execute_registration | 1.0228x | 1005/1005 | 38.1% | low | 1.17x |
| detection_event_log-detectioneventlogblockv1-run | run | 1.0194x | 4426/4426 | 97.3% | low | 3.40x |
| halo-halovisualizationblockv1-getannotator | getAnnotator | 1.0000x | 2/2 | 34.2% | low | 11.57x |
| managers-customcollector-_fetch_stream_metrics | _fetch_stream_metrics | 1.0000x | 41/41 | 7.2% | low | 1.19x |
Failed Tasks (5)
core_steps-_should_filter_block
- Function: _should_filter_block
- File: inference/core/workflows/core_steps/loader.py
- Commit: HEAD
- Method: db_code_only
- DB Speedup: 4.93x
- Solve OK: False
- Duration: 36.6s
- Reward: correct=0.0, speedup=0.0, tests=41/41
Key errors
_ ERROR collecting tests/codeflash_generated/test__should_filter_block__behaviorinstrumented_1.py _
ImportError while importing test module '/workspace/inference/tests/codeflash_generated/test__should_filter_block__behaviorinstrumented_1.py'.
E ImportError: cannot import name 'WORKFLOW_SELECTIVE_BLOCKS_DISABLE' from 'inference.core.env' (/workspace/inference/inference/core/env.py)
/usr/local/lib/python3.12/site-packages/pydantic/fields.py:1093: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'optional'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
ERROR tests/codeflash_generated/test__should_filter_block__behaviorinstrumented_0.py
ERROR tests/codeflash_generated/test__should_filter_block__behaviorinstrumented_1.py
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
1 warning, 2 errors in 0.32s
INFO: INCORRECT: 41/41 passed, 0 diffs
Reproduce: bash docker_e2e_test.sh core_steps-_should_filter_block --debug
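The collection error above comes down to a name that the generated test module expects but `inference.core.env` does not export at this commit. A minimal sketch of checking for such names before running a suite follows. `missing_names` is a hypothetical helper, not part of Codeflash or the repository, and the standard-library `os` module stands in for `inference.core.env` so that the example runs anywhere.

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the names that the given module does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Stand-in module: `os` defines getcwd but not the flag from the error log.
result = missing_names("os", ["getcwd", "WORKFLOW_SELECTIVE_BLOCKS_DISABLE"])
print(result)
```

Inside the task container, the same call with `"inference.core.env"` as the module name would be expected to list `WORKFLOW_SELECTIVE_BLOCKS_DISABLE`, given the ImportError above.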
execution_data_manager-prepare_parameters
- Function: prepare_parameters
- File: inference/core/workflows/execution_engine/v1/executor/execution_data_manager/step_input_assembler.py
- Commit: HEAD
- Method: db_code_only
- DB Speedup: 1.12x
- Solve OK: False
- Duration: 31.8s
- Reward: correct=0.0, speedup=0.0, tests=1/1
Key errors
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_empty_runtime_parameters[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_step_execution_dimensionality_zero[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_large_dimensionality[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_special_step_names[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_unicode_step_names[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_many_input_parameters[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_large_batch_size[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_deeply_nested_compound_inputs[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_many_masks[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_many_auto_batch_casting_configs[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_iteration_performance[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_complex_data_structures[ 1 ]
FAILED tests/codeflash_generated/test_prepare_parameters__behaviorinstrumented_1.py::test_prepare_parameters_with_mixed_parameter_types[ 1 ]
INFO: INCORRECT: 1/1 passed, 0 diffs
Reproduce: bash docker_e2e_test.sh execution_data_manager-prepare_parameters --debug
ocsort-ocsortblockv1-run
- Function: run
- File: inference/core/workflows/core_steps/trackers/ocsort/v1.py
- Commit: HEAD
- Method: db_code_only
- DB Speedup: 1.60x
- Solve OK: False
- Duration: 28.2s
- Reward: correct=0.0, speedup=0.0, tests=408/408
Key errors
@field_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
PydanticDeprecatedSince20: `allow_reuse` is deprecated and will be ignored; it should no longer be necessary. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
PydanticDeprecatedSince20: `min_items` is deprecated and will be removed, use `min_length` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
ERROR tests/codeflash_generated/test_run__behaviorinstrumented_0.py
ERROR tests/codeflash_generated/test_run__behaviorinstrumented_1.py
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
25 warnings, 2 errors in 0.97s
INFO: INCORRECT: 408/408 passed, 0 diffs
Reproduce: bash docker_e2e_test.sh ocsort-ocsortblockv1-run --debug
perception_encoder-inferencemodelsperceptionencoderadapter-preprocess
- Function: preprocess
- File: inference/models/perception_encoder/perception_encoder_inference_models.py
- Commit: 7648e452a70ff1aad09f017a0eb2ea4022b7e177
- Method: db_code_match
- DB Speedup: 2.47x
- Solve OK: False
- Duration: 64.0s
- Reward: correct=0.0, speedup=0.0, tests=2031/2235
Key errors
PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'optional'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_returns_tuple_with_correct_types[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_calls_preproc_image[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_metadata_is_empty_dict[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_preserves_image_dimensions[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_with_kwargs[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_multiple_calls_independence[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_with_1000_rapid_calls[ 1 ]
FAILED tests/codeflash_generated/test_preprocess__behaviorinstrumented_0.py::test_preprocess_with_varying_channel_counts[ 1 ]
INFO: INCORRECT: 2031/2235 passed, 204 diffs
Reproduce: bash docker_e2e_test.sh perception_encoder-inferencemodelsperceptionencoderadapter-preprocess --debug
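The verdict line in this section is internally consistent: 2235 total minus 2031 passed equals the 204 reported diffs. A minimal sketch of parsing such lines follows, assuming the `INFO: <STATUS>: <passed>/<total> passed, <n> diffs` shape seen in this report is stable. The regular expression and the `Verdict` type are illustrative, not part of the harness. The status cannot be derived from the counts alone: the collection-error tasks earlier in this report show `41/41 passed, 0 diffs` and are still marked INCORRECT.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    status: str
    passed: int
    total: int
    diffs: int


# Pattern inferred from the log lines in this report; the real format may vary.
VERDICT_RE = re.compile(r"INFO: (\w+): (\d+)/(\d+) passed, (\d+) diffs")


def parse_verdict(line: str) -> Optional[Verdict]:
    """Parse one verdict line, or return None if the line does not match."""
    match = VERDICT_RE.search(line)
    if match is None:
        return None
    status, passed, total, diffs = match.groups()
    return Verdict(status, int(passed), int(total), int(diffs))


verdict = parse_verdict("INFO: INCORRECT: 2031/2235 passed, 204 diffs")
print(verdict)
print(verdict.total - verdict.passed == verdict.diffs)
```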
s3-s3sinkblockv1-_upload_separate_file
- Function: _upload_separate_file
- File: inference/core/workflows/core_steps/sinks/s3/v1.py
- Commit: 639c8e77ab90d6a43f32fe55a355373ae74e0924
- Method: db_code_match
- DB Speedup: 1.15x
- Solve OK: False
- Duration: 60.3s
- Reward: correct=0.0, speedup=0.0, tests=1249/1252
Key errors
.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
/workspace/inference/inference/core/workflows/execution_engine/entities/types.py:1267: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
/workspace/inference/inference/core/workflows/execution_engine/entities/types.py:1280: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
/workspace/inference/inference/core/workflows/execution_engine/entities/types.py:1296: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
/workspace/inference/inference/core/workflows/execution_engine/entities/types.py:1311: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
INFO: INCORRECT: 1249/1252 passed, 3 diffs
INFO: [stdout] WARNING Non-retryable S3 error (NoSuchBucket): An error occurred (NoSuchBucket) vs WARNING Could not upload to S3: An error occurred (NoSuchBucket) when calling
INFO: [stdout] WARNING S3 connection error on attempt 1/4: An unspecified error occurred vs WARNING Could not upload to S3: An unspecified error occurred
Reproduce: bash docker_e2e_test.sh s3-s3sinkblockv1-_upload_separate_file --debug