Builder output with quantization enabled yields incorrect negative scales. #1051

Open
aendk opened this issue Nov 8, 2024 · 1 comment · May be fixed by #1053

Comments

@aendk

aendk commented Nov 8, 2024

Describe the bug
Quantization scales are defined to always be positive in the ONNX documentation.
Creating a QDQ-enabled ONNX representation with builder.py produces many negative scales, contrary to what the documentation states.
These negative scales also cause exceptions in various execution providers.

Steps to reproduce the behavior:

  1. Download a model from Hugging Face, in our case Phi-3.5-mini-instruct.
  2. Create a quantized ONNX representation from it, with a call similar to this:
     python builder.py -m /path/to/downloaded/model -e cuda -p int4 -o /desired/path/to/result --extra_options use_qdq=1
  3. Open the new ONNX representation in Netron.
  4. Click on any DequantizeLinear node.
  5. In the sidebar that opens, find Inputs, and under it x_scale. Click the + sign to its right.
  6. See that there are many negative values.
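
The Netron inspection can also be done with a short script. The following is a minimal sketch, not part of builder.py: the helper names `negative_scale_counts` and `check_model` are hypothetical, and it assumes the scales are stored as graph initializers and that the `onnx` Python package is installed.

```python
def negative_scale_counts(graph, to_list):
    """Map each DequantizeLinear scale initializer name to its count of negative values.

    `graph` needs `.node` and `.initializer` (as an onnx GraphProto has);
    `to_list` turns one initializer into a flat list of floats.
    """
    initializers = {init.name: init for init in graph.initializer}
    counts = {}
    for node in graph.node:
        if node.op_type != "DequantizeLinear" or len(node.input) < 2:
            continue
        scale = initializers.get(node.input[1])  # input 1 is x_scale
        if scale is None:
            continue  # scale is not stored as a constant in the graph
        negatives = sum(1 for v in to_list(scale) if v < 0)
        if negatives:
            counts[scale.name] = negatives
    return counts


def check_model(path):
    # Imported lazily so the helper above stays usable without onnx installed.
    import onnx
    from onnx import numpy_helper

    model = onnx.load(path)
    return negative_scale_counts(
        model.graph, lambda t: numpy_helper.to_array(t).ravel().tolist()
    )
```

Calling `check_model` on the generated ONNX file returns a dictionary of scale tensor names with the number of negative entries in each; an empty dictionary means no negative scales were found.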

Expected behavior
In Step 6, only 0 or positive values should be visible.

Desktop (please complete the following information):
Ubuntu

Screenshots
Image

@kunal-vaishnavi
Contributor

When quantizing to INT4 precision, the model builder assumes that the scales are symmetric. When is_symmetric = True, the scales can be positive or negative. If is_symmetric = False, the scales will be non-negative.

def to_int4(self, model):
    quant = MatMul4BitsQuantizer(
        model=model,
        block_size=self.quant_attrs["int4"]["block_size"],
        is_symmetric=True,
        accuracy_level=self.quant_attrs["int4"]["accuracy_level"],
        nodes_to_exclude=[],
        quant_format=QuantFormat.QDQ if self.quant_attrs["use_qdq"] else QuantFormat.QOperator,
        op_types_to_quantize=self.quant_attrs["int4"]["op_types_to_quantize"],
    )
    quant.process()
    return quant.model.model

This can be fixed by adding the ability to set is_symmetric using the --extra_options. Here's the PR to enable this.
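
As background on how a symmetric scheme can end up with negative scales, here is a minimal sketch of one common blockwise 4-bit approach: the signed value with the largest magnitude in each block is mapped to -8, the end of the int4 range [-8, 7] with the most room. This illustrates the general technique only; it is an assumption that the quantizer behaves similarly, and `symmetric_int4_block` and `dequantize` are hypothetical names, not onnxruntime functions.

```python
def symmetric_int4_block(block):
    """Quantize one block symmetrically; returns (scale, quantized values)."""
    extreme = max(block, key=abs)  # signed value with the largest magnitude
    scale = extreme / -8.0         # maps `extreme` to -8; negative if extreme > 0
    if scale == 0.0:
        return 0.0, [0 for _ in block]
    quantized = [max(-8, min(7, round(v / scale))) for v in block]
    return scale, quantized


def dequantize(scale, quantized):
    # Symmetric quantization has no zero point, so this is a plain multiply.
    return [scale * q for q in quantized]


scale, q = symmetric_int4_block([0.5, -0.25, 1.0])
print(scale, q, dequantize(scale, q))
```

In this sketch a block whose largest-magnitude value is positive gets a negative scale, and a block whose largest-magnitude value is negative gets a positive one. Dequantization still recovers the right signs because the quantized integers are flipped along with the scale.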
