Thread Settings When Using ONNX Runtime Generate() API #981
-
When running an LLM with the ONNX Runtime Generate() API, what is the default thread-affinity setting? How can I check and configure the thread settings? I am currently running the phi3.cpp example from this repository and the example from the onnxruntime-inference-examples repository on Android devices, and I am noticing a performance difference between the two that I would like to understand. I suspect it might be due to different thread settings, but it would be great if you could let me know whether other factors could be involved.
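For reference, my understanding is that thread counts can be overridden per model through the session_options block in genai_config.json; a minimal sketch, assuming the schema version in use accepts these fields (the counts are placeholders, not recommendations):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "intra_op_num_threads": 4,
        "inter_op_num_threads": 1
      }
    }
  }
}
```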
Replies: 1 comment 1 reply
-
The current defaults are the same as onnxruntime's: we just use the defaults when creating the onnxruntime session. I think it uses at most half the available cores, but this is all handled outside of our library.
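For comparison, explicit thread configuration on a plain onnxruntime session looks roughly like this (a minimal C++ sketch; "model.onnx" is a placeholder path and the thread counts are illustrative, not recommendations):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "thread-settings-demo");

  // Left untouched, onnxruntime chooses the thread counts itself;
  // setting them explicitly makes the behavior reproducible.
  Ort::SessionOptions options;
  options.SetIntraOpNumThreads(4);  // threads parallelizing a single operator
  options.SetInterOpNumThreads(1);  // threads running independent operators concurrently

  // "model.onnx" is a placeholder for illustration.
  Ort::Session session(env, "model.onnx", options);
  return 0;
}
```

Since onnxruntime-genai just forwards the defaults, overriding the counts would presumably go through the model's genai_config.json (as sketched in the question) rather than through the Generate() API itself.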