Is it really good to convert linear to conv2d when using qualcomm backend? #9568

fa-ina-tic · 2025-03-25T01:47:39Z

Hi, I have a question regarding the convert_linear_to_conv2d function in executorch.examples.qualcomm.oss_scripts.llama.py.

I was curious about the impact of this conversion, so I ran some tests comparing the performance of a Linear module and its converted Conv2d counterpart. Surprisingly, I found that using Conv2d resulted in worse performance. This was unexpected, as I initially assumed that using Conv2d with HMX would lead to better performance.

I'm wondering whether converting Linear to Conv2d is actually a beneficial optimization for the Qualcomm backend, or if there may have been an issue in how I conducted the experiment.
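For context on what the conversion relies on: a bias-free Linear computes the same values as a 1x1 Conv2d applied over the feature (channel) dimension, with a layout change on each side. The sketch below shows that equivalence using plain Python lists so it runs without PyTorch. The function names (`linear`, `conv2d_1x1`, `linear_via_conv2d`) are hypothetical and this is not the ExecuTorch implementation of convert_linear_to_conv2d; it only illustrates the view/permute steps that surround the convolution.

```python
def linear(x, weight):
    """y[t][o] = sum_c x[t][c] * weight[o][c] (a Linear layer without bias)."""
    return [[sum(xc * wc for xc, wc in zip(row, w_row)) for w_row in weight]
            for row in x]


def conv2d_1x1(x_nchw, weight):
    """1x1 convolution on a single sample laid out as x_nchw[c][h][w]."""
    channels = len(x_nchw)
    height, width = len(x_nchw[0]), len(x_nchw[0][0])
    return [[[sum(weight[o][c] * x_nchw[c][h][w] for c in range(channels))
              for w in range(width)]
             for h in range(height)]
            for o in range(len(weight))]


def linear_via_conv2d(x, weight):
    """Route a [tokens, features] input through the 1x1 convolution."""
    tokens, features = len(x), len(x[0])
    # view + permute: [tokens, features] -> [features, tokens, 1]
    x_nchw = [[[x[t][c]] for t in range(tokens)] for c in range(features)]
    y_nchw = conv2d_1x1(x_nchw, weight)
    # permute + view: [out_features, tokens, 1] -> [tokens, out_features]
    return [[y_nchw[o][t][0] for o in range(len(weight))]
            for t in range(tokens)]


x = [[1, 2, 3], [4, 5, 6]]        # 2 tokens, 3 input features
weight = [[1, 0, -1], [2, 1, 0]]  # 2 output features, 3 input features
same = linear(x, weight) == linear_via_conv2d(x, weight)
print(same)
```

Since both paths compute the same sums, any difference in cycle counts would have to come from how the backend executes the op and the extra layout changes, not from the arithmetic.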

I'm currently testing on an SM8650 device. Here's the code I used for the comparison:

class TestQNNQuantizedModel(TestQNN):
    ...
    def test_qnn_backend_linear_and_conv2d(self):
        # Baseline: a 4096x4096 projection without bias.
        linear = torch.nn.Linear(
            in_features=4096,
            out_features=4096,
            bias=False,
        )
        from executorch.backends.qualcomm.utils.utils import convert_linear_to_conv2d
        # Conv2d counterpart produced by the Qualcomm helper.
        conv2d = convert_linear_to_conv2d(linear)
        sample_input = (torch.randn([127, 4096]),)
        instances = [
            linear,
            conv2d,
        ]
        # Quantize, lower, and run each variant with the same input.
        for instance in instances:
            module = self.get_qdq_module(instance, sample_input)
            self.lower_module_and_test_output(module, sample_input)

I ran the test with the following command:

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedModel.test_qnn_backend_linear_and_conv2d -s 96698334 -m SM8650 -P -b build-android/

Below are the relevant inspector outputs for both cases.

Linear

inspector output

╒════╤════════════════════╤════════════════════════════════════════════════════════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤═════════════════════════╤═══════════════════╤═════════════════════════╕
│    │ event_block_name   │ event_name                                                         │     p10 (cycles) │     p50 (cycles) │     p90 (cycles) │     avg (cycles) │     min (cycles) │     max (cycles) │ op_types                │ is_delegated_op   │ delegate_backend_name   │
╞════╪════════════════════╪════════════════════════════════════════════════════════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪═════════════════════════╪═══════════════════╪═════════════════════════╡
│  0 │ Default            │ Method::init                                                       │      1.8941e+08  │      1.8941e+08  │      1.8941e+08  │      1.8941e+08  │      1.8941e+08  │      1.8941e+08  │ []                      │ False             │                         │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  1 │ Default            │ Program::load_method                                               │      1.89462e+08 │      1.89462e+08 │      1.89462e+08 │      1.89462e+08 │      1.89462e+08 │      1.89462e+08 │ []                      │ False             │                         │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  2 │ Execute            │ Input OpId_2 (cycles)                                              │      0           │      0           │      0           │      0           │      0           │      0           │ []                      │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  3 │ Execute            │ quantized_decomposed.quantize_per_tensor.default:OpId_16 (cycles)  │ 661570           │ 661570           │ 661570           │ 661570           │ 661570           │ 661570           │ []                      │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  4 │ Execute            │ aten_linear_default:OpId_18 (cycles)                               │ 663176           │ 663176           │ 663176           │ 663176           │ 663176           │ 663176           │ []                      │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  5 │ Execute            │ quantized_decomposed.dequantize_per_tensor.tensor:OpId_21 (cycles) │      1.75887e+06 │      1.75887e+06 │      1.75887e+06 │      1.75887e+06 │      1.75887e+06 │      1.75887e+06 │ []                      │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  6 │ Execute            │ Output OpId_3 (cycles)                                             │  90040           │  90040           │  90040           │  90040           │  90040           │  90040           │ []                      │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  7 │ Execute            │ DELEGATE_CALL                                                      │      2.03604e+06 │      2.03604e+06 │      2.03604e+06 │      2.03604e+06 │      2.03604e+06 │      2.03604e+06 │ ['aten.linear.default'] │ False             │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────────────┼───────────────────┼─────────────────────────┤
│  8 │ Execute            │ Method::execute                                                    │      2.04151e+06 │      2.04151e+06 │      2.04151e+06 │      2.04151e+06 │      2.04151e+06 │      2.04151e+06 │ []                      │ False             │                         │
╘════╧════════════════════╧════════════════════════════════════════════════════════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧═════════════════════════╧═══════════════════╧═════════════════════════╛

Conv2d

inspector output

╒════╤════════════════════╤════════════════════════════════════════════════════════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════╤═══════════════════╤═════════════════════════╕
│    │ event_block_name   │ event_name                                                         │     p10 (cycles) │     p50 (cycles) │     p90 (cycles) │     avg (cycles) │     min (cycles) │     max (cycles) │ op_types                                                                                       │ is_delegated_op   │ delegate_backend_name   │
╞════╪════════════════════╪════════════════════════════════════════════════════════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪════════════════════════════════════════════════════════════════════════════════════════════════╪═══════════════════╪═════════════════════════╡
│  0 │ Default            │ Method::init                                                       │      1.75865e+08 │      1.75865e+08 │      1.75865e+08 │      1.75865e+08 │      1.75865e+08 │      1.75865e+08 │ []                                                                                             │ False             │                         │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  1 │ Default            │ Program::load_method                                               │      1.75895e+08 │      1.75895e+08 │      1.75895e+08 │      1.75895e+08 │      1.75895e+08 │      1.75895e+08 │ []                                                                                             │ False             │                         │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  2 │ Execute            │ Input OpId_2 (cycles)                                              │      0           │      0           │      0           │      0           │      0           │      0           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  3 │ Execute            │ quantized_decomposed.quantize_per_tensor.default:OpId_16 (cycles)  │      2.26209e+06 │      2.26209e+06 │      2.26209e+06 │      2.26209e+06 │      2.26209e+06 │      2.26209e+06 │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  4 │ Execute            │ aten_view_copy_default:OpId_17 (cycles)                            │      0           │      0           │      0           │      0           │      0           │      0           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  5 │ Execute            │ aten_permute_copy_default_4:OpId_19 (cycles)                       │ 928693           │ 928693           │ 928693           │ 928693           │ 928693           │ 928693           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  6 │ Execute            │ aten_convolution_default:OpId_25 (cycles)                          │      1.8603e+06  │      1.8603e+06  │      1.8603e+06  │      1.8603e+06  │      1.8603e+06  │      1.8603e+06  │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  7 │ Execute            │ aten_permute_copy_default_5:OpId_29 (cycles)                       │ 199233           │ 199233           │ 199233           │ 199233           │ 199233           │ 199233           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  8 │ Execute            │ aten_view_copy_default_1:OpId_30 (cycles)                          │      0           │      0           │      0           │      0           │      0           │      0           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│  9 │ Execute            │ quantized_decomposed.dequantize_per_tensor.tensor:OpId_31 (cycles) │ 425073           │ 425073           │ 425073           │ 425073           │ 425073           │ 425073           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│ 10 │ Execute            │ Output OpId_3 (cycles)                                             │ 102169           │ 102169           │ 102169           │ 102169           │ 102169           │ 102169           │ []                                                                                             │ True              │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│ 11 │ Execute            │ DELEGATE_CALL                                                      │      5.36333e+06 │      5.36333e+06 │      5.36333e+06 │      5.36333e+06 │      5.36333e+06 │      5.36333e+06 │ ['aten.view_copy.default', 'aten.permute_copy.default' ... 'aten.view_copy.default'] (5 total) │ False             │ QnnBackend              │
├────┼────────────────────┼────────────────────────────────────────────────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼─────────────────────────┤
│ 12 │ Execute            │ Method::execute                                                    │      5.37047e+06 │      5.37047e+06 │      5.37047e+06 │      5.37047e+06 │      5.37047e+06 │      5.37047e+06 │ []                                                                                             │ False             │                         │
╘════╧════════════════════╧════════════════════════════════════════════════════════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧════════════════════════════════════════════════════════════════════════════════════════════════╧═══════════════════╧═════════════════════════╛

As shown above, the overall delegate call (DELEGATE_CALL) takes about 5.36e6 cycles for Conv2d versus about 2.04e6 for Linear, roughly 2.6x more.
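To put numbers on the comparison, the arithmetic below uses only values copied from the two inspector tables above (all percentile columns are identical, so one value per row is enough). The variable names are illustrative. Note that the per-op rows do not add up to DELEGATE_CALL in either table (about 3.17e6 versus 2.04e6 for Linear), so they may be reported on a different scale; the sketch therefore compares per-op rows only with per-op rows.

```python
# Per-op cycle counts copied from the inspector tables.
linear_ops = {
    "quantize_per_tensor": 661570,
    "aten_linear_default": 663176,
    "dequantize_per_tensor": 1.75887e6,
    "Output": 90040,
}
conv2d_ops = {
    "quantize_per_tensor": 2.26209e6,
    "aten_permute_copy_default_4": 928693,
    "aten_convolution_default": 1.8603e6,
    "aten_permute_copy_default_5": 199233,
    "dequantize_per_tensor": 425073,
    "Output": 102169,
}
# DELEGATE_CALL rows from the same tables.
linear_delegate_call = 2.03604e6
conv2d_delegate_call = 5.36333e6

# End-to-end delegate cost, Conv2d relative to Linear.
delegate_ratio = conv2d_delegate_call / linear_delegate_call

# Core op only: convolution versus linear.
core_op_ratio = (
    conv2d_ops["aten_convolution_default"] / linear_ops["aten_linear_default"]
)

# Cycles spent in the two permutes that exist only in the Conv2d graph.
permute_cycles = (
    conv2d_ops["aten_permute_copy_default_4"]
    + conv2d_ops["aten_permute_copy_default_5"]
)
permute_share = permute_cycles / sum(conv2d_ops.values())

print(f"DELEGATE_CALL ratio (conv2d / linear): {delegate_ratio:.2f}")
print(f"convolution / linear op ratio:         {core_op_ratio:.2f}")
print(f"permute share of conv2d per-op cycles: {permute_share:.1%}")
```

On these figures the permutes account for roughly a fifth of the Conv2d per-op cycles, while the convolution itself is also well above the linear op, so the layout changes alone do not explain the gap.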

Could you please share your thoughts on this? I'd appreciate any insights into whether this conversion is intended to provide performance benefits on Qualcomm backends, or if there might be something I’m missing in my test setup.

Thanks in advance!
