Is it really good to convert linear to conv2d when using qualcomm backend? #9568
fa-ina-tic started this conversation in General
Replies: 0 comments
Hi, I have a question regarding the convert_linear_to_conv2d function in executorch.examples.qualcomm.oss_scripts.llama.py.
I was curious about the impact of this conversion, so I ran some tests comparing the performance of a Linear module and its converted Conv2d counterpart. Surprisingly, I found that using Conv2d resulted in worse performance. This was unexpected, as I initially assumed that using Conv2d with HMX would lead to better performance.
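For context, here is a minimal pure-Python sketch of why the two modules are interchangeable mathematically: a Linear layer applied to each token gives the same numbers as a 1x1 Conv2d applied to the same data laid out as channels. This is not the benchmark code from my test and not the implementation in the script. The helper names (`linear`, `conv2d_1x1`, `tokens_to_nchw`, `check_equivalence`) and the [C_in][1][T] layout are assumptions chosen only for illustration.

```python
import random


def linear(x, weight, bias):
    """Linear layer on one token: y[o] = sum_i weight[o][i] * x[i] + bias[o]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def conv2d_1x1(x, weight, bias):
    """1x1 convolution on an input laid out as [C_in][H][W].

    weight is [C_out][C_in]; the trailing 1x1 kernel dims are dropped
    because a 1x1 kernel only mixes channels at each spatial position.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in weight]
    for o, (row, b) in enumerate(zip(weight, bias)):
        for i in range(h):
            for j in range(w):
                out[o][i][j] = sum(row[c] * x[c][i][j] for c in range(c_in)) + b
    return out


def tokens_to_nchw(tokens):
    """Reshape [T][C_in] tokens into [C_in][1][T] (hypothetical layout)."""
    c_in = len(tokens[0])
    return [[[tok[c] for tok in tokens]] for c in range(c_in)]


def check_equivalence(num_tokens=4, c_in=3, c_out=2, seed=0):
    """Return the largest absolute difference between the two outputs."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(c_out)]
    bias = [rng.uniform(-1, 1) for _ in range(c_out)]
    tokens = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(num_tokens)]

    lin_out = [linear(t, weight, bias) for t in tokens]           # [T][C_out]
    conv_out = conv2d_1x1(tokens_to_nchw(tokens), weight, bias)   # [C_out][1][T]

    return max(
        abs(lin_out[t][o] - conv_out[o][0][t])
        for t in range(num_tokens)
        for o in range(c_out)
    )


if __name__ == "__main__":
    print("max abs difference:", check_equivalence())
```

Since the arithmetic is the same, any performance gap between the two should come from how the backend lowers and schedules each op (including any extra reshape or permute around the Conv2d), not from the math itself.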
I'm wondering whether converting Linear to Conv2d is actually a beneficial optimization for the Qualcomm backend, or if there may have been an issue in how I conducted the experiment.
I'm currently testing on an SM8650 device. Here's the code I used for the comparison:
I ran the test with the following command:
Below are the relevant inspector outputs for both cases.
Linear
inspector output
Conv2d
inspector output
As shown above, the overall delegate call for Conv2d takes significantly more cycles than for Linear.
Could you please share your thoughts on this? I'd appreciate any insights into whether this conversion is intended to provide performance benefits on Qualcomm backends, or if there might be something I’m missing in my test setup.
Thanks in advance!