Profiling Tools
In-code performance profiling
The onnxruntime_perf_test.exe tool (available from the build drop) can be used to test various knobs. See the usage instructions by running onnxruntime_perf_test.exe -h. The perf_view tool can also be used to render the statistics as a summarized view in the browser.
You can enable ONNX Runtime latency profiling in code:
import onnxruntime as rt
sess_options = rt.SessionOptions()
sess_options.enable_profiling = True
session = rt.InferenceSession("model.onnx", sess_options)  # "model.onnx" is a placeholder for your model path
# ... run inference with session.run(...) ...
profile_file = session.end_profiling()  # returns the path of the generated JSON profile
If you are using the onnxruntime_perf_test.exe tool, you can add -p [profile_file] to enable performance profiling.
In both cases, you will get a JSON file containing the detailed performance data (threading, latency of each operator, etc.). The file uses a standard performance tracing format, so you can view it in a user-friendly way with any of the following tools:
- (Windows) Use the WPA GUI to open the trace using the Perfetto OSS plugin - Microsoft-Performance-Tools-Linux-Android
- Perfetto UI - Successor to Chrome Tracing UI
- chrome://tracing:
  - Open a Chromium-based browser such as Edge or Chrome
  - Type chrome://tracing in the address bar
  - Load the generated JSON file
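If you prefer to inspect the trace programmatically instead of in a viewer, any JSON library will do. Below is a minimal sketch that ranks operators by latency; it assumes the profile file is a JSON array of event objects with the cat, name, and dur fields shown in the examples later on this page, and the file name is a placeholder.

```python
import json

# "onnxruntime_profile.json" is a placeholder; use the path returned by
# end_profiling() or the file passed to -p.
with open("onnxruntime_profile.json") as f:
    events = json.load(f)

# Keep per-operator ("Node") events and rank them by duration (microseconds).
node_events = [e for e in events if e.get("cat") == "Node"]
for e in sorted(node_events, key=lambda e: e.get("dur", 0), reverse=True)[:10]:
    print(f'{e["name"]}: {e.get("dur", 0)} us')
```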
To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with --enable_cuda_profiling. To profile ROCm kernels, please add the roctracer library to your PATH and use the onnxruntime binary built from source with --enable_rocm_profiling.
Performance numbers from the device will then be attached to those from the host. For example:
{"cat":"Node", "name":"Add_1234", "dur":17, ...}
{"cat":"Kernel", "name":"ort_add_cuda_kernel", dur:33, ...}
Here, the “Add” operator from the host initiated a CUDA kernel on the device named “ort_add_cuda_kernel”, which lasted for 33 microseconds. If an operator calls multiple kernels during execution, the performance numbers of those kernels will all be listed following the call sequence:
{"cat":"Node", "name":<name of the node>, ...}
{"cat":"Kernel", "name":<name of the kernel called first>, ...}
{"cat":"Kernel", "name":<name of the kernel called next>, ...}