The QNN Execution Provider for ONNX Runtime enables hardware-accelerated execution on Qualcomm chipsets. It uses the Qualcomm AI Engine Direct SDK (QNN SDK) to construct a QNN graph from an ONNX model, which can then be executed by a supported accelerator backend library.
The QNN Execution Provider supports a number of configuration options, supplied as `provider_options_keys` / `provider_options_values` pairs when the provider is registered. Each `provider_options_keys` accepts the values shown below:
provider_options_values for provider_options_keys = "backend_path"
Description
‘libQnnCpu.so’ or ‘QnnCpu.dll’
Enable CPU backend. Useful for integration testing. CPU backend is a reference implementation of QNN operators
‘libQnnHtp.so’ or ‘QnnHtp.dll’
Enable Htp backend. Offloads compute to NPU.
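The backend library name is platform dependent. As a quick sketch (the model path `model.onnx` is a placeholder), selecting the CPU reference backend for integration testing might look like:

```python
import platform
import onnxruntime as ort

# CPU reference backend: 'QnnCpu.dll' on Windows, 'libQnnCpu.so' elsewhere.
cpu_backend = 'QnnCpu.dll' if platform.system() == 'Windows' else 'libQnnCpu.so'

# 'model.onnx' is a placeholder path for illustration.
session = ort.InferenceSession(
    'model.onnx',
    providers=['QNNExecutionProvider'],
    provider_options=[{'backend_path': cpu_backend}],
)
```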
provider_options_values for provider_options_keys = "profiling_level"
Description
‘off’
‘basic’
‘detailed’
provider_options_values for provider_options_keys = "rpc_control_latency"
Description
microseconds (string)
allows client to set up RPC control latency in microseconds
provider_options_values for provider_options_keys = "htp_performance_mode"
Description
‘burst’
‘balanced’
‘default’
‘high_performance’
‘high_power_saver’
‘low_balanced’
‘low_power_saver’
‘power_saver’
‘sustained_high_performance’
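Putting the tuning options above together, a minimal sketch (the model path is a placeholder, and all option values are passed as strings):

```python
import onnxruntime as ort

# All provider option values are strings.
qnn_options = {
    'backend_path': 'QnnHtp.dll',      # offload compute to the NPU
    'profiling_level': 'basic',        # 'off' | 'basic' | 'detailed'
    'rpc_control_latency': '100',      # RPC control latency in microseconds
    'htp_performance_mode': 'burst',   # one of the modes listed above
}

# 'model.onnx' is a placeholder path for illustration.
session = ort.InferenceSession(
    'model.onnx',
    providers=['QNNExecutionProvider'],
    provider_options=[qnn_options],
)
```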
provider_options_values for provider_options_keys = "qnn_context_cache_enable"
Description
‘0’
disabled (default)
‘1’
enable qnn context cache. write out prepared Htp Context Binary to disk to save initialization costs.
provider_options_values for provider_options_keys = "qnn_context_cache_path"
Description
‘/path/to/context/cache’
string path to context cache binary
provider_options_values for provider_options_keys = "qnn_context_embed_mode"
Description
‘0’
generate the QNN context binary into separate file, set path in ONNX file specified by qnn_context_cache_path.
‘1’
generate the QNN context binary into the ONNX file specified by qnn_context_cache_path (default).
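The context-cache options work together: on the first run the prepared HTP context binary is written out, and subsequent session creations load it instead of re-preparing the QNN graph. A minimal sketch (paths are illustrative):

```python
import onnxruntime as ort

qnn_options = {
    'backend_path': 'QnnHtp.dll',
    'qnn_context_cache_enable': '1',               # write/load the context binary
    'qnn_context_cache_path': './model_ctx.onnx',  # illustrative cache path
    'qnn_context_embed_mode': '1',                 # embed the binary in the ONNX file (default)
}

# 'model.onnx' is a placeholder path for illustration.
session = ort.InferenceSession(
    'model.onnx',
    providers=['QNNExecutionProvider'],
    provider_options=[qnn_options],
)
```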
provider_options_values for provider_options_keys = "qnn_context_priority"
```python
import onnxruntime as ort

# Create a session with QNN EP using the HTP (NPU) backend.
sess = ort.InferenceSession(
    model_path,
    providers=['QNNExecutionProvider'],
    provider_options=[{'backend_path': 'QnnHtp.dll'}],
)
```
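The options can equivalently be passed as `(name, options)` tuples, e.g. `providers=[('QNNExecutionProvider', {'backend_path': 'QnnHtp.dll'})]`. When the separate `provider_options` list is used, it must have the same length as `providers`, with each dict applying to the provider at the same index.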