Builder output with quantization enabled yields incorrect negative scales. #1051

aendk · 2024-11-08T13:50:48Z

Describe the bug
The ONNX documentation defines quantization scales as always positive.
Creating a QDQ-enabled ONNX representation with builder.py produces many negative scales, which contradicts the documentation.
The negative scales also cause exceptions in various execution providers.

Steps to reproduce the behavior:

1. Download a model from Hugging Face, in our case Phi-3.5-mini-instruct.
2. Create a quantized ONNX representation from it, with a call similar to this:

    python builder.py -m /path/to/downloaded/model -e cuda -p int4 -o /desired/path/to/result --extra_options use_qdq=1

3. Open the new ONNX representation in Netron.
4. Click on any DequantizeLinear node.
5. In the sidebar that opens, find Inputs and, under it, x_scale. Click the associated + sign on the right.
6. See that there are many negative values.

Expected behavior
In Step 6, only 0 or positive values should be visible.
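
The manual Netron check in steps 3 to 6 can also be scripted. The following is a minimal sketch, not part of the builder: the helper names `load_dequantize_scales` and `summarize_negative_scales` are made up for illustration, and it assumes the scales are stored as graph initializers (scales produced by Constant nodes would be skipped).

```python
from typing import Dict, Iterable, List, Tuple


def summarize_negative_scales(
    scales_by_node: Dict[str, Iterable[float]]
) -> Dict[str, Tuple[int, int]]:
    """Return {node name: (negative count, total count)} for nodes with any negative scale."""
    report = {}
    for name, values in scales_by_node.items():
        values = list(values)
        negatives = sum(1 for v in values if v < 0)
        if negatives:
            report[name] = (negatives, len(values))
    return report


def load_dequantize_scales(model_path: str) -> Dict[str, List[float]]:
    """Collect the x_scale values of every DequantizeLinear node in an ONNX file."""
    # Imported lazily so the summary helper above works without onnx installed.
    import onnx
    from onnx import numpy_helper

    model = onnx.load(model_path)
    initializers = {init.name: init for init in model.graph.initializer}
    scales = {}
    for node in model.graph.node:
        # x_scale is the second input of DequantizeLinear.
        if node.op_type != "DequantizeLinear" or len(node.input) < 2:
            continue
        init = initializers.get(node.input[1])
        if init is None:
            continue  # assumption: scale is not an initializer, so skip it
        key = node.name or node.output[0]
        scales[key] = numpy_helper.to_array(init).ravel().tolist()
    return scales


# Example usage (the path is a placeholder):
# report = summarize_negative_scales(load_dequantize_scales("/desired/path/to/result/model.onnx"))
# for name, (neg, total) in report.items():
#     print(f"{name}: {neg} of {total} scales are negative")
```

An empty report would correspond to the expected behavior described above.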

Desktop (please complete the following information):
ubuntu


kunal-vaishnavi · 2024-11-08T21:43:43Z

When quantizing to INT4 precision, the model builder assumes that the scales are symmetric. When is_symmetric = True, the scales can be positive or negative. If is_symmetric = False, the scales will be non-negative.
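
To illustrate why a symmetric scheme can yield negative scales, here is a small sketch in plain Python. It assumes one common convention, in which the block element with the largest magnitude is mapped to the int4 extreme -8 while keeping its sign; this is an illustrative assumption and not a copy of the quantizer's actual implementation. The function names are hypothetical.

```python
def symmetric_int4_scale(block):
    """Scale for a symmetric scheme that maps the largest-magnitude element to -8."""
    signed_absmax = max(block, key=abs)  # keeps the sign of the extreme element
    return signed_absmax / -8.0          # negative whenever that element is positive


def asymmetric_uint4_scale(block):
    """Scale for an asymmetric scheme over [0, 15]; a zero point absorbs the offset."""
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    return (hi - lo) / 15.0              # a range divided by 15 is never negative


def quantize_dequantize(block, scale):
    """Round-trip a block through signed int4 using the given scale."""
    if scale == 0:
        return [0.0 for _ in block]
    quantized = [max(-8, min(7, round(v / scale))) for v in block]
    return [q * scale for q in quantized]


block = [0.5, -0.1, 0.2]
scale = symmetric_int4_scale(block)      # extreme element is +0.5, so the scale is negative
restored = quantize_dequantize(block, scale)
```

Under this convention the sign of the scale only records which side of zero the extreme element lies on, so dequantization still recovers the right values even though the scale itself is negative.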

onnxruntime-genai/src/python/py/models/builder.py

Lines 463 to 474 in 2f2686f

    
    def to_int4(self, model):
        quant = MatMul4BitsQuantizer(
            model=model,
            block_size=self.quant_attrs["int4"]["block_size"],
            is_symmetric=True,
            accuracy_level=self.quant_attrs["int4"]["accuracy_level"],
            nodes_to_exclude=[],
            quant_format=QuantFormat.QDQ if self.quant_attrs["use_qdq"] else QuantFormat.QOperator,
            op_types_to_quantize=self.quant_attrs["int4"]["op_types_to_quantize"],
        )
        quant.process()
        return quant.model.model

This can be fixed by adding the ability to set is_symmetric using the --extra_options. Here's the PR to enable this.
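
As a rough sketch of what such an option could look like, the snippet below parses `key=value` pairs in the style of `--extra_options` and derives a flag that could be passed as `is_symmetric`. The key name `int4_is_symmetric` and both helper functions are hypothetical; the name and parsing used in the linked PR may differ.

```python
def parse_extra_options(pairs):
    """Turn ["use_qdq=1", "key=value", ...] into a dict of strings."""
    options = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        options[key.strip()] = value.strip()
    return options


def resolve_is_symmetric(options, default=True):
    """Read the hypothetical int4_is_symmetric key, defaulting to the current behavior."""
    raw = options.get("int4_is_symmetric")  # hypothetical key name
    if raw is None:
        return default
    return raw.lower() in {"1", "true"}


options = parse_extra_options(["use_qdq=1", "int4_is_symmetric=0"])
is_symmetric = resolve_is_symmetric(options)  # False: request asymmetric quantization
```

Defaulting to `True` keeps existing builds unchanged unless the option is set explicitly.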

microsoft-github-policy-service bot added the quantization label Nov 8, 2024

kunal-vaishnavi linked a pull request Nov 8, 2024 that will close this issue

Add option to disable symmetric INT4 quantization #1053

Open
