Summary:
SYCL kernel compilation enables kernel introspection, which can be used to select a work-group size that fits the available resources (e.g. shared local memory, SLM), but the compilation may negatively impact performance. It is not clear when to use it or whether it can generally be avoided.
Problem Statement:
It is not clear when the kernels should be compiled, or whether compilation can generally be avoided.
There appears to be an empirical rule for GPU devices: use no more than half of the SLM. For example, the scan, reduce, find, merge-sort and radix-sort patterns rely on this finding. Below is an example from the reduce pattern:
https://github.com/oneapi-src/oneDPL/blob/4898de274ed46526a1dae3e32fbdc525dd2e0291/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h#L460-L464
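To make the "half of SLM" rule concrete, here is a minimal sketch in plain C++. It is not the oneDPL code at the link above: the helper `half_slm_work_group_size` and its parameters are hypothetical names chosen for illustration. In a real SYCL program the inputs would presumably come from device queries such as `sycl::info::device::local_mem_size` and `sycl::info::device::max_work_group_size`, which require no kernel compilation.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of oneDPL): picks a work-group size such that
// the group's local-memory footprint stays within half of the device SLM.
//   slm_bytes            - total SLM available to a work-group, in bytes
//   bytes_per_work_item  - SLM consumed by each work-item, in bytes
//   max_work_group_size  - upper bound reported by the device
std::size_t half_slm_work_group_size(std::size_t slm_bytes,
                                     std::size_t bytes_per_work_item,
                                     std::size_t max_work_group_size)
{
    // A kernel that uses no SLM is limited only by the device maximum.
    if (bytes_per_work_item == 0)
        return max_work_group_size;

    // Empirical rule from the issue text: budget at most half of the SLM.
    const std::size_t budget = slm_bytes / 2;

    // Number of work-items whose SLM usage fits into that budget.
    const std::size_t fit = budget / bytes_per_work_item;

    return std::min(max_work_group_size, fit);
}
```

For instance, with 64 KiB of SLM, 64 bytes per work-item and a device maximum of 1024, the sketch yields 512 work-items. Note that it covers only the SLM constraint; limits that depend on the compiled kernel itself (for example, register pressure) are exactly what kernel introspection would report, and they are not modeled here.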
The ultimate question: can/should it be applied to other devices?
Preferred Solution:
Clarify the strategy for using compiled kernels, or do not use them at all.
Additional Context:
There is an internal knob to control kernel compilation (_ONEDPL_COMPILE_KERNEL), but its uses are not well-defined due to missing reasoning. That question was also raised here: #1881 (comment)