From ec3eed6b8ad4065b3f2eb8d83bb269611f420c73 Mon Sep 17 00:00:00 2001
From: aseembits93
Date: Mon, 26 Jan 2026 16:54:04 -0800
Subject: [PATCH] ready to review

---
 docs/codeflash-concepts/benchmarking-gpu-code.mdx | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/codeflash-concepts/benchmarking-gpu-code.mdx b/docs/codeflash-concepts/benchmarking-gpu-code.mdx
index 5353af745..41d4f1d89 100644
--- a/docs/codeflash-concepts/benchmarking-gpu-code.mdx
+++ b/docs/codeflash-concepts/benchmarking-gpu-code.mdx
@@ -110,8 +110,8 @@ With synchronize: 152.277 ms
 
 # How Codeflash measures execution time involving GPUs
 
-Codeflash automatically inserts synchronization barriers before measuring performance. It currently supports GPU code written in `Pytorch`, `Tensorflow` and `JAX` for NVIDIA GPUs (CUDA) and MacOS Metal Performance Shaders (MPS).
+Codeflash automatically inserts synchronization barriers before measuring performance. It currently supports GPU code written in `PyTorch`, `TensorFlow`, and `JAX` for NVIDIA GPUs (`CUDA`) and macOS Metal Performance Shaders (`MPS`).
 
-- **PyTorch**: Uses `torch.cuda.synchronize()` (CUDA) or `torch.mps.synchronize()` (MPS) depending on the device.
-- **JAX**: Uses `jax.block_until_ready()` to wait for computation to complete. It works for both CUDA and MPS devices.
-- **TensorFlow**: Uses `tf.test.experimental.sync_devices()` for device synchronization. It works for both CUDA and MPS devices.
+- **PyTorch**: Uses `torch.cuda.synchronize()` (`CUDA`) or `torch.mps.synchronize()` (`MPS`) depending on the device.
+- **JAX**: Uses `jax.block_until_ready()` to wait for computation to complete. It works for both `CUDA` and `MPS` devices.
+- **TensorFlow**: Uses `tf.test.experimental.sync_devices()` for device synchronization. It works for both `CUDA` and `MPS` devices.
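
The synchronization pitfall the patched section documents can be illustrated without a GPU. The sketch below is a minimal CPU-only analogy: a single-worker thread pool stands in for the asynchronous device queue, `launch_kernel` is a hypothetical helper (not part of any of the libraries above), and `future.result()` plays the role of a barrier like `torch.cuda.synchronize()`. Timing without the barrier captures only launch overhead, not the work itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# CPU-only analogy for asynchronous GPU dispatch: submit() returns
# immediately (like a CUDA kernel launch), while the work runs in
# the background on a worker thread.
executor = ThreadPoolExecutor(max_workers=1)

def launch_kernel():
    # Simulated 50 ms "kernel"; the call returns before it finishes.
    return executor.submit(time.sleep, 0.05)

# Naive timing: the clock stops before the work completes, so this
# measures only the dispatch overhead.
start = time.perf_counter()
future = launch_kernel()
naive_ms = (time.perf_counter() - start) * 1000
future.result()  # drain the queue before the next measurement

# Correct timing: wait for completion (the "synchronization barrier")
# before stopping the clock.
start = time.perf_counter()
future = launch_kernel()
future.result()
synced_ms = (time.perf_counter() - start) * 1000

print(f"Without synchronize: {naive_ms:.3f} ms")
print(f"With synchronize: {synced_ms:.3f} ms")
executor.shutdown()
```

The gap between the two numbers mirrors the "Without synchronize" vs. "With synchronize" output shown earlier in the doc: the naive measurement is near zero, while the synchronized one reflects the full runtime of the work.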