codeflash-internal/experiments/rl_env/validation_report_ship.md
2026-04-16 16:31:25 -07:00

Codeflash RL Environment — Batch Validation Report

Summary

| Metric | Count | % |
|---|---|---|
| Total tasks | 106 | 100% |
| Solve passes | 0 | 0% |
| Eval correct (all behavioral tests pass) | 106 | 100% |
| Faster than original (speedup > 1.0) | 105 | 99% |
| All test cases pass | 106 | 100% |
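
The counts and percentages in the summary can be reproduced from per-task results along these lines. This is a minimal sketch, not the report's actual tooling: `TaskResult`, its field names, and the whole-number rounding are assumptions chosen to match the figures shown (105 of 106 rounds to 99%). "Solve passes" and "All test cases pass" are left out because the report does not define how they are derived.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are assumptions,
    # not the report's real schema.
    correct: float  # 1.0 when every behavioral test passes
    speedup: float  # optimized vs. original runtime, e.g. 1.25 for 1.25x


def summarize(results):
    """Return {metric: (count, whole-number percent)} for a batch of results."""
    total = len(results)

    def row(count):
        # Percent of all tasks, rounded to a whole number (assumed rounding).
        pct = round(100 * count / total) if total else 0
        return count, pct

    correct = [r for r in results if r.correct == 1.0]
    # Only correct tasks with speedup strictly above 1.0 count as faster.
    faster = [r for r in correct if r.speedup > 1.0]
    return {
        "Total tasks": row(total),
        "Eval correct": row(len(correct)),
        "Faster than original": row(len(faster)),
    }
```

Counting "faster" only among correct tasks mirrors the report's framing, where speedups are reported for correct tasks only.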

Speedup Distribution (correct tasks only)

  • Slower (< 1x): 1 task
  • 1-1.5x: 52 tasks
  • 1.5-2x: 16 tasks
  • 2-5x: 14 tasks
  • 5-100x: 17 tasks
  • >100x: 6 tasks
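
The distribution above can be rebuilt from the per-task speedups with a small bucketing helper. The sketch below is illustrative only. The report does not say which bucket a value exactly on a boundary (1.0, 1.5, 2.0, 5.0, 100.0) falls into, so placing it in the higher bucket is an assumption, and the names `EDGES`, `LABELS`, `bucket`, and `distribution` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bucket boundaries and labels, in the order the report lists them.
EDGES = [1.0, 1.5, 2.0, 5.0, 100.0]
LABELS = ["Slower (< 1x)", "1-1.5x", "1.5-2x", "2-5x", "5-100x", ">100x"]


def bucket(speedup: float) -> str:
    """Map a speedup ratio to its distribution label.

    bisect_right places a value equal to an edge in the higher bucket
    (an assumption; the report does not specify boundary handling).
    """
    return LABELS[bisect_right(EDGES, speedup)]


def distribution(speedups):
    """Count tasks per bucket, keeping empty buckets and the report's order."""
    counts = Counter(bucket(s) for s in speedups)
    return {label: counts.get(label, 0) for label in LABELS}
```

For example, `bucket(0.7728)` gives `"Slower (< 1x)"` and `bucket(388.9574)` gives `">100x"`, matching where those two tasks sit in the results table.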

Successful Tasks (correct=1.0)

| Task | Function | Speedup | Tests | Coverage | Quality | DB Speedup |
|---|---|---|---|---|---|---|
| models-prepare_multi_label_classification_response | prepare_multi_label_classification_response | 62109.3293x | 32/32 | 7.7% | low | 68465.46x |
| introspection-prepare_operators_descriptions | prepare_operators_descriptions | 15264.3773x | 1057/1057 | 35.0% | | 14150.75x |
| decorators-withfixedsizecache-memory_pressure_detected | memory_pressure_detected | 1789.6793x | 132/132 | 37.8% | | 917.09x |
| depth_anything_v3-inferencemodelsdepthanythingv3adapter-predict | predict | 388.9574x | 36/36 | 23.2% | | 391.52x |
| detection_event_log-detectioneventlogblockv1-_evict_oldest_video | _evict_oldest_video | 337.1908x | 170/170 | 46.4% | low | 15.92x |
| camera-_generate_grid_colors | _generate_grid_colors | 283.7794x | 1901/1901 | 9.0% | | 218.30x |
| workflow_caller-_check_workflow_for_circular_references | _check_workflow_for_circular_references | 34.9018x | 41/41 | 31.1% | low | 11.84x |
| semantic_segmentation-blockmanifest-describe_outputs | describe_outputs | 22.3384x | 2539/2539 | 38.9% | high | 21.92x |
| dynamic_blocks-build_traceback_string | build_traceback_string | 20.1609x | 2047/2047 | 16.0% | low | 13.38x |
| bytetrack-bytetrackmanifest-describe_outputs | describe_outputs | 18.8987x | 3033/3033 | 90.2% | low | 17.56x |
| workflow_caller-_describe_outputs_from_spec | _describe_outputs_from_spec | 16.9712x | 25/25 | 23.1% | low | 12.94x |
| event_writer-_extract_detail | _extract_detail | 15.8018x | 43/43 | 31.9% | | 13.46x |
| managers-try_releasing_cuda_memory | try_releasing_cuda_memory | 15.0190x | 1006/1006 | 10.8% | | 1.22x |
| s3-deduct_csv_header | deduct_csv_header | 13.9646x | 54/54 | 38.6% | | 8.90x |
| cache-_slugify_model_id | _slugify_model_id | 13.1907x | 1050/1050 | 100.0% | | 11.21x |
| sort-sortmanifest-describe_outputs | describe_outputs | 12.9542x | 8036/8036 | 89.7% | low | 17.16x |
| dynamic_blocks-create_dynamic_module | create_dynamic_module | 10.7336x | 142/142 | 27.4% | | 12.31x |
| dataset_upload-roboflowdatasetuploadblockv2-run | run | 9.8378x | 13/13 | 57.1% | | 9.14x |
| glm_ocr-blockmanifest-describe_outputs | describe_outputs | 8.2526x | 1035/1035 | 51.9% | medium | 9.33x |
| qwen3_5vl-blockmanifest-describe_outputs | describe_outputs | 7.6162x | 3228/3228 | 48.6% | medium | 7.30x |
| http-with_route_exceptions | with_route_exceptions | 6.3600x | 1297/1297 | 8.1% | low | 6.89x |
| introspection-prepare_operations_descriptions | prepare_operations_descriptions | 6.3557x | 147/147 | 82.5% | high | 6.26x |
| qwen3_5vl-qwen35vlblockv1-run_remotely | run_remotely | 5.0724x | 23/23 | 69.4% | | 5.64x |
| core_steps-load_kinds | load_kinds | 4.7680x | 1153/1153 | 42.0% | | 3.68x |
| depth_anything_v2-inferencemodelsdepthanythingv2adapter-predict | predict | 4.0435x | 38/38 | 60.2% | medium | 5.03x |
| qwen3_5vl-inferencemodelsqwen35vladapter-predict | predict | 3.5654x | 2275/2275 | 70.3% | low | 3.72x |
| core-_prepare_workflow_response_cache_key | _prepare_workflow_response_cache_key | 2.9861x | 7539/7539 | 2.7% | medium | 2.39x |
| event_writer-_detections_to_v2_instance_segmentations | _detections_to_v2_instance_segmentations | 2.7234x | 36/36 | 41.2% | high | 2.18x |
| managers-modelmanager-_dispose_model_lock | _dispose_model_lock | 2.7203x | 2784/2784 | 14.7% | | 3.24x |
| compiler-establish_step_execution_dimensionality | establish_step_execution_dimensionality | 2.6967x | 47/47 | 23.2% | | 2.37x |
| semantic_segmentation-roboflowsemanticsegmentationmodelblockv1-_convert_to_sv_de | _convert_to_sv_detections | 2.6344x | 13/13 | 71.7% | | 2.22x |
| qwen3vl-inferencemodelsqwen3vladapter-map_inference_kwargs | map_inference_kwargs | 2.6323x | 1125/1125 | 26.8% | medium | 2.39x |
| models-baseinference-infer | infer | 2.2840x | 1037/1037 | 2.8% | low | 2.32x |
| clip_comparison-blockmanifest-get_required_cache_artifacts | get_required_cache_artifacts | 2.2470x | 130/130 | 26.6% | | 2.04x |
| text_display-clamp_box | clamp_box | 2.1986x | 1210/1210 | 15.0% | high | 2.80x |
| compiler-verify_compatibility_of_input_data_lineage_with_control_flow_lineage | verify_compatibility_of_input_data_lineage_with_control_flow_lineage | 2.0117x | 39/39 | 26.4% | low | 2.11x |
| introspection-_get_property_name_options | _get_property_name_options | 2.0115x | 1053/1053 | 57.5% | | 1.52x |
| execution_data_manager-executiondatamanager-_register_control_flow_output_for_no | _register_control_flow_output_for_non_simd_step | 1.9572x | 32/32 | 20.2% | high | 2.65x |
| compiler-_collect_unique_control_flow_lineages_with_step_mapping | _collect_unique_control_flow_lineages_with_step_mapping | 1.9380x | 33/33 | 24.3% | | 1.95x |
| mask_area_measurement-maskareameasurementblockv1-run | run | 1.9282x | 39/39 | 93.0% | | 1.65x |
| entities-workflowimagedata-copy_and_replace | copy_and_replace | 1.9097x | 2336/2336 | 72.1% | medium | 2.04x |
| compiler-separate_control_flow_predecessors_from_data_providers | separate_control_flow_predecessors_from_data_providers | 1.8826x | 34/34 | 23.1% | high | 1.87x |
| core-_forcetracerootsampler-get_description | get_description | 1.8642x | 3244/3244 | 1.6% | medium | 2.03x |
| enterprise_blocks-load_enterprise_blocks | load_enterprise_blocks | 1.8464x | 1936/1936 | 32.2% | low | 1.45x |
| event_writer-_build_event_data | _build_event_data | 1.8219x | 4732/4732 | 34.7% | medium | 1.74x |
| cache-get_cached_foundation_models | get_cached_foundation_models | 1.7699x | 32/32 | 34.7% | low | 1.46x |
| compiler-step_definition_allows_control_flow_references | step_definition_allows_control_flow_references | 1.7359x | 27/27 | 22.5% | medium | 1.86x |
| dataset_upload-maybe_register_datapoint_at_roboflow | maybe_register_datapoint_at_roboflow | 1.7163x | 1039/1039 | 55.6% | | 1.47x |
| introspection-retrieve_selectors_from_union_definition | retrieve_selectors_from_union_definition | 1.6884x | 36/36 | 22.2% | medium | 1.98x |
| introspection-_ref_to_def_name | _ref_to_def_name | 1.6153x | 1344/1344 | 27.5% | medium | 1.51x |
| mask_area_measurement-compute_detection_areas | compute_detection_areas | 1.6137x | 24/24 | 83.0% | | 1.46x |
| managers-list_files | list_files | 1.5955x | 99/99 | 8.9% | | 1.66x |
| dynamic_blocks-assembly_custom_python_block | assembly_custom_python_block | 1.5618x | 135/135 | 36.7% | low | 1.61x |
| compiler-is_control_flow_step | is_control_flow_step | 1.4868x | 1830/1830 | 15.3% | medium | 1.34x |
| qwen3_5vl-inferencemodelsqwen35vladapter-map_inference_kwargs | map_inference_kwargs | 1.4798x | 1549/1549 | 64.9% | low | 1.53x |
| core-_url_for_safe_logging | _url_for_safe_logging | 1.4670x | 1055/1055 | 2.8% | | 1.47x |
| usage_tracking-usagecollector-_compute_execution_duration | _compute_execution_duration | 1.4621x | 2017/2017 | 27.5% | | 1.55x |
| execution_data_manager-construct_mask_for_all_inputs_dimensionalities | construct_mask_for_all_inputs_dimensionalities | 1.4471x | 31/31 | 19.0% | | 1.51x |
| execution_data_manager-construct_simd_step_input | construct_simd_step_input | 1.4210x | 26/26 | 28.3% | low | 1.37x |
| qwen3_5vl-qwen35vlblockv1-run | run | 1.4114x | 28/28 | 93.1% | low | 1.69x |
| common-add_inference_keypoints_to_sv_detections | add_inference_keypoints_to_sv_detections | 1.4070x | 30/30 | 4.1% | | 1.56x |
| core-get_workflow_specification | get_workflow_specification | 1.4033x | 1157/1157 | 3.6% | low | 1.56x |
| common-deserialize_image_kind | deserialize_image_kind | 1.3860x | 1506/1506 | 7.4% | | 1.42x |
| cache-is_block_cached | is_block_cached | 1.3696x | 53/53 | 27.9% | high | 1.36x |
| email_notification-format_email_message | format_email_message | 1.3680x | 56/56 | 31.7% | high | 1.35x |
| managers-modelmanager-infer_from_request_sync | infer_from_request_sync | 1.3592x | 3041/3041 | 13.7% | low | 1.46x |
| sequences-sequence_apply | sequence_apply | 1.3493x | 58/58 | 30.2% | high | 1.48x |
| cache-get_task_type_to_block_mapping | get_task_type_to_block_mapping | 1.3274x | 30/30 | 29.6% | low | 1.39x |
| execution_data_manager-filter_to_valid_prefix_chains | filter_to_valid_prefix_chains | 1.3229x | 32/32 | 15.3% | high | 1.32x |
| entities-batch-remove_by_indices | remove_by_indices | 1.3209x | 44/44 | 65.4% | high | 1.26x |
| dataset_upload-is_prediction_registration_forbidden | is_prediction_registration_forbidden | 1.3204x | 2043/2043 | 31.7% | | 1.44x |
| webrtc_worker-videoframeprocessor-_check_termination | _check_termination | 1.3132x | 2029/2029 | 16.1% | | 1.36x |
| cache-_is_model_cached | _is_model_cached | 1.3058x | 45/45 | 27.0% | | 1.24x |
| core-load_cached_workflow_response | load_cached_workflow_response | 1.2964x | 12126/12126 | 2.8% | low | 1.38x |
| workflow_caller-_extract_workflow_caller_ids_from_spec | _extract_workflow_caller_ids_from_spec | 1.2938x | 44/44 | 25.8% | medium | 1.34x |
| workflow_caller-_fetch_workflow_spec_for_validation | _fetch_workflow_spec_for_validation | 1.2808x | 1547/1547 | 23.1% | low | 1.33x |
| cache-is_model_cached | is_model_cached | 1.2649x | 55/55 | 28.7% | medium | 1.22x |
| execution_data_manager-intersect_masks_per_dimension | intersect_masks_per_dimension | 1.2627x | 40/40 | 13.5% | | 1.66x |
| dataset_upload-register_datapoint_at_roboflow | register_datapoint_at_roboflow | 1.2611x | 2037/2037 | 38.6% | medium | 1.32x |
| webrtc_worker-videoframeprocessor-serialize_outputs_sync | serialize_outputs_sync | 1.2545x | 48/48 | 17.9% | | 1.37x |
| anthropic_claude-blockmanifest-get_air_gapped_availability | get_air_gapped_availability | 1.2463x | 2243/2243 | 16.4% | low | 1.45x |
| executor-_run_workflow | _run_workflow | 1.2057x | 130/130 | 21.6% | low | 1.22x |
| http-_build_step_execution_error_response | _build_step_execution_error_response | 1.2001x | 1029/1029 | 1.0% | low | 1.19x |
| detection_event_log-detectioneventlogblockv1-_get_relative_time | _get_relative_time | 1.1992x | 41/41 | 43.0% | | 1.19x |
| dataset_upload-roboflowdatasetuploadblockv1-run | run | 1.1971x | 41/41 | 38.6% | medium | 1.26x |
| managers-rank_for_deletion | rank_for_deletion | 1.1867x | 106/106 | 7.3% | | 1.88x |
| models-inferencemodelsobjectdetectionadapter-postprocess | postprocess | 1.1644x | 33/33 | 8.7% | | 1.23x |
| compiler-get_lineage_derived_from_control_flow | get_lineage_derived_from_control_flow | 1.1586x | 33/33 | 23.8% | low | 1.25x |
| text_display-draw_background_with_alpha | draw_background_with_alpha | 1.1567x | 176/176 | 29.5% | | 1.18x |
| core-record_inference | record_inference | 1.1509x | 3033/3033 | 1.6% | low | 1.22x |
| execution_data_manager-get_masks_intersection_for_dimensions | get_masks_intersection_for_dimensions | 1.1479x | 36/36 | 16.9% | low | 1.23x |
| email_notification-apply_operations_to_message_parameters | apply_operations_to_message_parameters | 1.1431x | 44/44 | 29.5% | low | 1.15x |
| easy_ocr-blockmanifest-get_supported_model_variants | get_supported_model_variants | 1.1429x | 2039/2039 | 57.5% | medium | 1.31x |
| mask_area_measurement-get_detection_area | get_detection_area | 1.1349x | 129/129 | 83.7% | medium | 1.19x |
| dataset_upload-register_datapoint | register_datapoint | 1.1344x | 1138/1138 | 42.5% | low | 1.16x |
| event_writer-_build_image_entry | _build_image_entry | 1.1270x | 1337/1337 | 60.6% | low | 1.10x |
| moondream2-inferencemodelsmoondream2adapter-caption | caption | 1.1264x | 185/185 | 45.1% | medium | 1.11x |
| yolo_world-blockmanifest-get_supported_model_variants | get_supported_model_variants | 1.1187x | 2232/2232 | 50.0% | medium | 1.29x |
| trackers-instancecache-record_instance | record_instance | 1.1115x | 14857/14857 | 17.3% | | 1.14x |
| workflow_caller-workflowcallerblockv1-run | run | 1.1059x | 59/59 | 48.9% | | 1.13x |
| webrtc_worker-default_encoder | default_encoder | 1.0985x | 4071/4071 | 17.0% | | 1.12x |
| cache-_get_block_type_identifier | _get_block_type_identifier | 1.0949x | 34/34 | 26.5% | low | 1.11x |
| workflow_caller-_convert_output_descriptions_to_kinds | _convert_output_descriptions_to_kinds | 1.0865x | 37/37 | 24.6% | medium | 1.19x |
| common-serialise_sv_detections | serialise_sv_detections | 1.0764x | 149/149 | 5.1% | | 1.19x |
| openai-execute_gpt_4v_request | execute_gpt_4v_request | 1.0642x | 37/37 | 25.8% | high | 2.00x |
| notification-blockmanifest-get_air_gapped_availability | get_air_gapped_availability | 0.7728x | 1535/1535 | 43.0% | low | 1.14x |