SYCL kernel compilation: establish a guideline or avoid #1938

dmitriy-sobolev · 2024-11-18T17:58:49Z

Summary:

Compiling a SYCL kernel ahead of submission enables kernel introspection, which lets an algorithm select a work-group size according to the available resources (e.g. shared local memory, SLM), but it may negatively impact performance. It is not clear when it should be used, or whether it can generally be avoided.

Problem Statement:
It is not clear when kernels should be compiled ahead of submission, or whether compilation can generally be avoided.

There appears to be an empirical rule for GPU devices: use no more than half of the SLM. For example, scan, reduce, find, merge-sort and radix-sort rely on this finding. Below is an example for the reduce pattern:
https://github.com/oneapi-src/oneDPL/blob/4898de274ed46526a1dae3e32fbdc525dd2e0291/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h#L460-L464
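
To make the rule concrete, here is a minimal sketch of how a work-group size could be capped without compiling the kernel. It is not the oneDPL implementation: `limit_work_group_size` and its parameter names are hypothetical, and the sketch assumes that the caller already queried the device limits (in SYCL these would typically come from `sycl::info::device::max_work_group_size` and `sycl::info::device::local_mem_size`) and knows how many SLM bytes each work-item needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of oneDPL: caps the work-group size so that
// the local-memory footprint of one work-group stays within half of the SLM.
//   max_work_group_size      device limit on work-items per work-group
//   slm_bytes                total SLM size reported by the device, in bytes
//   slm_bytes_per_work_item  SLM bytes the kernel needs per work-item
inline std::size_t limit_work_group_size(std::size_t max_work_group_size,
                                         std::size_t slm_bytes,
                                         std::size_t slm_bytes_per_work_item)
{
    assert(slm_bytes_per_work_item > 0);
    // Empirical rule from the issue text: budget at most half of the SLM.
    const std::size_t slm_budget = slm_bytes / 2;
    // Largest number of work-items whose SLM usage fits into the budget.
    const std::size_t fit = slm_budget / slm_bytes_per_work_item;
    // Never exceed the device limit; the result is 0 if not even one
    // work-item fits into the budget.
    return std::min(max_work_group_size, fit);
}
```

With a 64 KiB SLM and 64 bytes per work-item, for instance, the budget of 32 KiB allows 512 work-items, so a device limit of 1024 would be reduced to 512. A size chosen this way relies only on device queries, whereas kernel introspection additionally accounts for the resources the compiled kernel actually uses.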

The ultimate question: can/should it be applied to other devices?

Preferred Solution:

Clarify the strategy of using the compiled kernels, or do not use them at all.

Additional Context:

There is an internal knob to control kernel compilation (_ONEDPL_COMPILE_KERNEL), but its intended use is not well defined because the reasoning behind it is missing.

That question was also raised here: #1881 (comment)


dmitriy-sobolev added the enhancement and question labels on Nov 18, 2024

dmitriy-sobolev mentioned this issue Nov 20, 2024

How to limit number of threads per group in algorithms? (2024.11.18.) #1936

Open
