mirror of
https://github.com/codeflash-ai/codeflash.git
synced 2026-05-04 18:25:17 +00:00
ready to review
This commit is contained in:
parent
754eb6cc5e
commit
ec3eed6b8a
1 changed file with 4 additions and 4 deletions
@@ -110,8 +110,8 @@ With synchronize: 152.277 ms

# How Codeflash measures execution time involving GPUs

-Codeflash automatically inserts synchronization barriers before measuring performance. It currently supports GPU code written in `Pytorch`, `Tensorflow` and `JAX` for NVIDIA GPUs (CUDA) and MacOS Metal Performance Shaders (MPS).
+Codeflash automatically inserts synchronization barriers before measuring performance. It currently supports GPU code written in `PyTorch`, `TensorFlow` and `JAX` for NVIDIA GPUs (`CUDA`) and macOS Metal Performance Shaders (`MPS`).

-- **PyTorch**: Uses `torch.cuda.synchronize()` (CUDA) or `torch.mps.synchronize()` (MPS) depending on the device.
-- **JAX**: Uses `jax.block_until_ready()` to wait for computation to complete. It works for both CUDA and MPS devices.
-- **TensorFlow**: Uses `tf.test.experimental.sync_devices()` for device synchronization. It works for both CUDA and MPS devices.
+- **PyTorch**: Uses `torch.cuda.synchronize()` (`CUDA`) or `torch.mps.synchronize()` (`MPS`) depending on the device.
+- **JAX**: Uses `jax.block_until_ready()` to wait for computation to complete. It works for both `CUDA` and `MPS` devices.
+- **TensorFlow**: Uses `tf.test.experimental.sync_devices()` for device synchronization. It works for both `CUDA` and `MPS` devices.
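The barrier pattern these lines describe can be sketched generically. Below, `measure_ms` is a hypothetical helper (not Codeflash's actual instrumentation), and `synchronize` is a stand-in for whichever framework call applies, e.g. `torch.cuda.synchronize()` on CUDA; without the second barrier, the timer would stop while GPU kernels are still running asynchronously, under-reporting the true execution time.

```python
import time

def measure_ms(fn, synchronize=lambda: None):
    """Time fn(), flushing pending device work before and after.

    `synchronize` stands in for the framework's barrier, e.g.
    torch.cuda.synchronize() / torch.mps.synchronize() for PyTorch,
    jax.block_until_ready(result) for JAX (it takes the output array),
    or tf.test.experimental.sync_devices() for TensorFlow.
    """
    synchronize()  # drain any work queued before the measurement starts
    start = time.perf_counter()
    result = fn()
    synchronize()  # wait for asynchronously dispatched kernels to finish
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# CPU-only usage example (no GPU framework required):
value, ms = measure_ms(lambda: sum(range(100_000)))
```

With the default no-op `synchronize`, this degenerates to plain wall-clock timing, which is exactly the mistake the inserted barriers prevent on GPU devices.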