Add support for building a cuda + dml package #1600


Open · baijumeswani wants to merge 2 commits into main

Conversation

baijumeswani (Collaborator)

Add support for a CUDA + DML package. The Python package will still be called onnxruntime-genai-cuda, but if --use_dml was passed as a build-time flag, DML will also be available.

natke self-requested a review on July 1, 2025.

natke (Contributor) left a comment:

Does this mean we could have packages called onnxruntime-genai-cuda that contain different binaries?

Can we create a different package name for the combined binary?

I think this could be a source of confusion.

baijumeswani (Collaborator, Author) commented Jul 1, 2025:

I agree it can cause some confusion.

If we build with the command:

python build.py --use_cuda --use_dml

It would build a package called onnxruntime-genai-cuda with support for DML as well as CUDA. I think adding yet another package name is hard to maintain, considering that we may not publish this package. ONNX Runtime built with similar flags produces a package called onnxruntime-gpu (not onnxruntime-directml or some other name). If we keep creating a new package name for every combination of supported device types, that will become difficult to maintain.

For Python, in the medium term, I think we should combine all our packages into onnxruntime-genai and add the dependencies via pip install onnxruntime-genai[dml], pip install onnxruntime-genai[cuda], and so on; users who want to install their own onnxruntime dependencies would just pip install onnxruntime-genai.
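(For illustration only: a minimal sketch of how such extras could be declared with setuptools. The extras names, the version, and the mapping to ONNX Runtime wheels are placeholders, not the project's actual packaging configuration.)

```python
# setup.py: illustrative sketch, not the project's actual packaging config
from setuptools import setup

setup(
    name="onnxruntime-genai",
    version="0.0.0",  # placeholder version
    # The base package carries no ONNX Runtime dependency; each extra pulls
    # in the execution-provider-specific ONNX Runtime wheel instead.
    extras_require={
        "cuda": ["onnxruntime-gpu"],
        "dml": ["onnxruntime-directml"],
        "qnn": ["onnxruntime-qnn"],
    },
)
```

With something like this, pip install onnxruntime-genai[cuda] would pull in onnxruntime-gpu, while a plain pip install onnxruntime-genai would leave the ONNX Runtime dependency up to the user.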

natke (Contributor) commented Jul 1, 2025:

> For Python, in the medium term, I think we should combine all our packages into onnxruntime-genai and add the dependencies via pip install onnxruntime-genai[dml], pip install onnxruntime-genai[cuda], and so on; users who want to install their own onnxruntime dependencies would just pip install onnxruntime-genai.

I think that's a great option for Python, but we would still have the confusion for NuGet packages. How does a user know whether their package has DML in it?

baijumeswani (Collaborator, Author):

As long as we don't publish these kinds of packages (with support for multiple compile-time providers), it should be OK. If we do publish them, then we should change the name to something more meaningful.

Or maybe we can bubble this up to ORT itself. We could curate a list of EPs we want to support in the default package, build both ORT and ORT GenAI with support for all of those EPs, and simply call the packages onnxruntime and onnxruntime-genai (or Microsoft.ML.OnnxRuntime and Microsoft.ML.OnnxRuntimeGenAI). That would solve a lot of the problems.

ajindal1 (Collaborator) commented Jul 1, 2025:

> For Python, in the medium term, I think we should combine all our packages into onnxruntime-genai and add the dependencies via pip install onnxruntime-genai[dml], pip install onnxruntime-genai[cuda], and so on; users who want to install their own onnxruntime dependencies would just pip install onnxruntime-genai.

I like this idea, but one downside is that it adds an extra step and a slightly worse user experience. One option is to keep the same package name (onnxruntime-genai) but host the wheels at different locations for the different variants: the standard version lives on PyPI, and the CUDA/DML versions are hosted elsewhere. This is the approach PyTorch uses.
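(For illustration only: what that PyTorch-style install flow could look like. The variant index URL below is a hypothetical placeholder, not a real index.)

pip install onnxruntime-genai

pip install onnxruntime-genai --index-url https://example.com/whl/cuda

The first command installs the standard build from PyPI; the second resolves the CUDA variant from a separate, variant-specific index, analogous to PyTorch's download.pytorch.org wheel indexes.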

baijumeswani (Collaborator, Author):

> I like this idea, but one downside is that it adds an extra step and a slightly worse user experience. One option is to keep the same package name (onnxruntime-genai) but host the wheels at different locations for the different variants: the standard version lives on PyPI, and the CUDA/DML versions are hosted elsewhere. This is the approach PyTorch uses.

Ideally, one package would support multiple device types (DML + CUDA + WebGPU + CPU + others); then we wouldn't need multiple packages. We can technically do this right now. The only problem is that we list specific ORT packages as dependencies (onnxruntime-gpu for CUDA, onnxruntime-directml for DML, and onnxruntime-qnn for QNN). Since this is only a dependency issue, it can be resolved with pip install onnxruntime-genai[gpu] or pip install onnxruntime-genai[qnn]. Hopefully this process can be simplified in the future.

But since we are not publishing any new package through this PR, I think we should discuss this in our scrum and decouple it from this PR for now.

natke (Contributor) commented Jul 2, 2025:

> Ideally, one package would support multiple device types (DML + CUDA + WebGPU + CPU + others); then we wouldn't need multiple packages. We can technically do this right now. The only problem is that we list specific ORT packages as dependencies (onnxruntime-gpu for CUDA, onnxruntime-directml for DML, and onnxruntime-qnn for QNN). Since this is only a dependency issue, it can be resolved with pip install onnxruntime-genai[gpu] or pip install onnxruntime-genai[qnn]. Hopefully this process can be simplified in the future.
>
> But since we are not publishing any new package through this PR, I think we should discuss this in our scrum and decouple it from this PR for now.

I would like to see a different name for the package if it bundles different binaries. We will be building and shipping integration drops to IHVs, and I can see this getting very confusing and causing a lot of churn. For each deliverable I would like to have a manifest that shows dependencies and versions, from which it would be clear which binaries shipped (i.e., it's not just about packaging).
