AMDGPU is missing simplify demanded bits optimizations of readfirstlane and similar operations #128390

arsenm · 2025-02-23T03:08:54Z

We are missing known-bits and demanded-bits optimizations that look through readfirstlane, readlane, and DPP operations. To produce a legal type we have to insert extensions, and these imply instructions that set the high bits of the input value. Those high bits are never needed, so starting from the use context of the trunc we should be able to delete the extension.

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s

target triple = "amdgcn-amd-amdhsa"

; 	v_and_b32_e32 v0, 0xff, v0  ; Should be able to delete this
;	v_readfirstlane_b32 s4, v0
;	v_mov_b32_e32 v0, s4
define i8 @readfirstlane_demanded_i8_zext(i8 %src) {
  %zext = zext i8 %src to i32
  %readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %zext)
  %trunc = trunc i32 %readfirstlane to i8
  ret i8 %trunc
}

; 	v_bfe_i32 v0, v0, 0, 8 ; Should be able to delete this
; 	v_readfirstlane_b32 s4, v0
;	v_mov_b32_e32 v0, s4
define i8 @readfirstlane_demanded_i8_sext(i8 %src) {
  %sext = sext i8 %src to i32
  %readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %sext)
  %trunc = trunc i32 %readfirstlane to i8
  ret i8 %trunc
}
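
As a sanity check on the claim that the extension is dead, here is a small host-side C++ model. It is only a sketch: `Wave`, `kLanes`, `readFirstLane`, and `extensionIsDead` are hypothetical names invented for illustration, the wave is shrunk to 4 lanes, and nothing here is LLVM or hardware code. It assumes readfirstlane simply returns the value held by the first active lane, which makes it bit-for-bit transparent.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model (assumption): a wave of 4 lanes, each holding one 32-bit value.
constexpr int kLanes = 4;
using Wave = std::array<uint32_t, kLanes>;

// Model of readfirstlane: return the value of the first active lane.
// Bit i of execMask is set when lane i is active.
uint32_t readFirstLane(const Wave &v, uint32_t execMask) {
  assert((execMask & ((1u << kLanes) - 1)) != 0 && "model needs an active lane");
  for (int lane = 0; lane < kLanes; ++lane)
    if (execMask & (1u << lane))
      return v[lane];
  return v[0]; // unreachable when the assertion holds
}

// trunc i32 -> i8
uint8_t truncToI8(uint32_t x) { return static_cast<uint8_t>(x); }

// zext / sext of the low 8 bits back to 32 bits.
uint32_t zext8(uint32_t x) { return x & 0xffu; }
uint32_t sext8(uint32_t x) {
  return static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int8_t>(x & 0xffu)));
}

// True when the truncated result is the same whether the per-lane input is
// zero-extended, sign-extended, or left with arbitrary high bits.
bool extensionIsDead(const Wave &anyHighBits, uint32_t execMask) {
  Wave z{}, s{};
  for (int i = 0; i < kLanes; ++i) {
    z[i] = zext8(anyHighBits[i]);
    s[i] = sext8(anyHighBits[i]);
  }
  const uint8_t raw = truncToI8(readFirstLane(anyHighBits, execMask));
  return raw == truncToI8(readFirstLane(z, execMask)) &&
         raw == truncToI8(readFirstLane(s, execMask));
}
```

In this model `extensionIsDead` holds for any lane contents and any non-empty exec mask, because the trunc only observes bits 0 to 7 and the lane read never mixes bit positions.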

Deleting the extension should be accomplished by implementing SimplifyDemandedBitsForTargetNode in SITargetLowering. The implementation should handle INTRINSIC_WO_CHAIN operations, with Intrinsic::amdgcn_readfirstlane as the base example.
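
To make the intended rewrite concrete, here is a toy C++ sketch of demanded-bits propagation over a tiny expression tree. It is not the SITargetLowering code: the real hook works on SelectionDAG nodes, while `Node`, `Op`, `make`, `simplify`, and `show` below are hypothetical names made up for this illustration. The one idea it tries to show is that a lane read forwards the demanded mask unchanged to its source, which is what lets the mask or sign-extension underneath be dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Toy IR (assumption): single-operand nodes over a 32-bit value.
enum class Op { Input, AndMask, SignExtend8, ReadFirstLane };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Op op = Op::Input;
  uint32_t mask = 0; // only meaningful for AndMask
  NodePtr src;       // null for Input
};

NodePtr make(Op op, NodePtr src = nullptr, uint32_t mask = 0) {
  auto n = std::make_shared<Node>();
  n->op = op;
  n->mask = mask;
  n->src = std::move(src);
  return n;
}

// Return a tree that agrees with `n` on every bit set in `demanded`.
NodePtr simplify(const NodePtr &n, uint32_t demanded) {
  assert(n && "null node");
  switch (n->op) {
  case Op::Input:
    return n;
  case Op::AndMask:
    // The mask keeps every demanded bit, so the user cannot see the 'and'.
    if ((demanded & ~n->mask) == 0)
      return simplify(n->src, demanded);
    return make(Op::AndMask, simplify(n->src, demanded & n->mask), n->mask);
  case Op::SignExtend8:
    // Sign extension from bit 7 only rewrites bits 8..31.
    if ((demanded & ~0xffu) == 0)
      return simplify(n->src, demanded);
    // High bits are copies of bit 7, so bit 7 becomes demanded too.
    return make(Op::SignExtend8,
                simplify(n->src, (demanded & 0xffu) | 0x80u));
  case Op::ReadFirstLane:
    // Bitwise transparent: result bit i depends only on source bit i.
    return make(Op::ReadFirstLane, simplify(n->src, demanded));
  }
  return n;
}

// Render a tree as text so the effect of simplify() can be inspected.
std::string show(const NodePtr &n) {
  switch (n->op) {
  case Op::Input:         return "input";
  case Op::AndMask:       return "and(" + show(n->src) + ")";
  case Op::SignExtend8:   return "sext8(" + show(n->src) + ")";
  case Op::ReadFirstLane: return "readfirstlane(" + show(n->src) + ")";
  }
  return "?";
}
```

With only the low 8 bits demanded (the trunc to i8), both `readfirstlane(and(input))` with mask 0xff and `readfirstlane(sext8(input))` simplify to `readfirstlane(input)`. If a user demands bits above bit 7, the extension is kept.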

As an example, these appear in the tests from #128388.

llvmbot · 2025-02-23T03:09:10Z

@llvm/issue-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

llvmbot · 2025-02-23T03:09:13Z

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
3. Fix the issue locally.
4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
5. Create a Git commit.
6. Run git clang-format HEAD~1 to format your changes.
7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

llvmbot · 2025-02-23T03:09:15Z

@llvm/issue-subscribers-good-first-issue

ethan0150 · 2025-02-23T09:34:49Z

I'd like to work on this. Could I get assigned?

arsenm added the backend:AMDGPU, good first issue, and missed-optimization labels Feb 23, 2025

dtcxzyw assigned ethan0150 Feb 23, 2025
