Thread Settings When Using ONNX Runtime Generate() API #981
-
When running an LLM with the ONNX Runtime Generate() API, what is the default thread-affinity setting? How can I check and configure the thread settings? I am currently running the phi3.cpp example from this repository and the example from the onnxruntime-inference-examples repository on Android devices, and I am noticing a performance difference between the two that I would like to understand. I suspect it might be due to different thread settings, but it would be great if you could let me know whether other factors could be involved.
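For reference, my understanding is that thread counts can be overridden per model through the session_options block in genai_config.json; a minimal sketch, assuming the schema version in use accepts these fields (the counts are placeholders, not recommendations):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "intra_op_num_threads": 4,
        "inter_op_num_threads": 1
      }
    }
  }
}
```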
Replies: 1 comment 1 reply
-
The current defaults are the same as onnxruntime's: we just use the defaults when creating the onnxruntime session. I think it uses at most half the available cores, but this is all handled outside of our library.
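For comparison, explicit thread configuration on a plain onnxruntime session looks roughly like this (a minimal C++ sketch; "model.onnx" is a placeholder path and the thread counts are illustrative, not recommendations):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "thread-settings-demo");

  // Left untouched, onnxruntime chooses the thread counts itself;
  // setting them explicitly makes the behavior reproducible.
  Ort::SessionOptions options;
  options.SetIntraOpNumThreads(4);  // threads parallelizing a single operator
  options.SetInterOpNumThreads(1);  // threads running independent operators concurrently

  // "model.onnx" is a placeholder for illustration.
  Ort::Session session(env, "model.onnx", options);
  return 0;
}
```

Since onnxruntime-genai just forwards the defaults, overriding the counts would presumably go through the model's genai_config.json (as sketched in the question) rather than through the Generate() API itself.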