[AArch64] Bad codegen for widen followed by vdupq_n_* #128349

Open
nikic opened this issue Feb 22, 2025 · 2 comments
nikic (Contributor) commented Feb 22, 2025

From rust-lang/rust#137407:

VectorCombine(+InstCombine) perform this transform (https://llvm.godbolt.org/z/veW5oG6Gx):

define void @src(ptr %ptr, i16 %x) {
  %ext = zext i16 %x to i32
  %ins = insertelement <1 x i32> poison, i32 %ext, i64 0
  %shuf = shufflevector <1 x i32> %ins, <1 x i32> poison, <4 x i32> zeroinitializer
  %bc = bitcast <4 x i32> %shuf to <8 x i16>
  %add = add <8 x i16> %bc, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  store <8 x i16> %add, ptr %ptr, align 16
  ret void
}

define void @tgt(ptr %ptr, i16 %x) {
  %1 = insertelement <2 x i16> <i16 poison, i16 0>, i16 %x, i64 0
  %bc = shufflevector <2 x i16> %1, <2 x i16> poison, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
  %add = add <8 x i16> %bc, splat (i16 1)
  store <8 x i16> %add, ptr %ptr, align 16
  ret void
}

Resulting in this codegen (https://llvm.godbolt.org/z/Px83GGq7Y):

src:                                    // @src
        movi    v0.8h, #1
        and     w8, w1, #0xffff
        dup     v1.4s, w8
        add     v0.8h, v1.8h, v0.8h
        str     q0, [x0]
        ret
tgt:                                    // @tgt
        movi    v0.2d, #0000000000000000
        movi    v1.8h, #1
        mov     v0.h[0], w1
        mov     v0.h[2], w1
        mov     v0.h[4], w1
        mov     v0.h[6], w1
        add     v0.8h, v0.8h, v1.8h
        str     q0, [x0]
        ret

The single dup in src has been replaced in tgt by a zeroing movi followed by four element-wise movs.
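For reference, the IR transform itself looks value-preserving on a little-endian target, so the regression is only in how the backend lowers the new shuffle. Below is a minimal Python sketch of the lane arithmetic, assuming little-endian lane order for the bitcast. The helper names src_lanes, tgt_lanes and equivalent_for_all_inputs are made up for illustration and simply mirror @src and @tgt above; this is a model of the IR semantics, not of the generated code.

```python
import struct


def src_lanes(x):
    """Model of @src: zext, splat to <4 x i32>, bitcast to <8 x i16>, add 1."""
    ext = x & 0xFFFF                   # zext i16 %x to i32
    splat = [ext] * 4                  # shufflevector with zeroinitializer mask
    raw = struct.pack("<4I", *splat)   # assumed little-endian layout of <4 x i32>
    bc = struct.unpack("<8H", raw)     # bitcast <4 x i32> to <8 x i16>
    return [(lane + 1) & 0xFFFF for lane in bc]


def tgt_lanes(x):
    """Model of @tgt: build <x, 0>, repeat it with mask 0,1,0,1,..., add 1."""
    pair = [x & 0xFFFF, 0]             # insertelement into <i16 poison, i16 0> at index 0
    mask = [0, 1, 0, 1, 0, 1, 0, 1]    # shufflevector mask from @tgt
    bc = [pair[i] for i in mask]
    return [(lane + 1) & 0xFFFF for lane in bc]


def equivalent_for_all_inputs():
    """Exhaustively compare both models over every i16 bit pattern."""
    return all(src_lanes(x) == tgt_lanes(x) for x in range(1 << 16))
```

Both models produce the repeating pattern x+1, 1, x+1, 1, ... (with 16-bit wraparound), which is exactly what a single dup v.4s of the zero-extended value followed by the add would give.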

llvmbot (Member) commented Feb 22, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Nikita Popov (nikic)


davemgreen (Collaborator) commented

Do you know where the shufflevector <1 x i32> %ins, <1 x i32> poison, <4 x i32> zeroinitializer comes from? I don't think I would have expected the 1x vector types.

Shuffles that change type have historically not been well supported in the cost model, though they are getting better over time.
