-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARCH/X86: Use UCS function to count leading zeros #10514
base: master
Are you sure you want to change the base?
Conversation
@tvegas1 __builtin_clz is sligtly different when compared to _lzcnt_u32(). One problem with __builtin_clz() is that it is undefined if the input is zero. LZCNT ( _lzcnt_u32()) is a better instruction for counting the number of leading zeros. How do you think we should proceed? Can we add _lzcnt_u32() support in NVC? |
We can file a bug for nvc, this will take time to fix. And we can handle nvc case in |
__m256i y0, y1, y2, y3; | ||
/* Handle lengths that fall usually within eager short range */ | ||
switch (_lzcnt_u32(len)) { | ||
switch (ucs_count_leading_zero_bits(len)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the len is zero the code expects '32' as output, is ucs_count_leading_zero_bits(0) return 32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you're right, should be fixed now
What?
Use common UCS count leading zeros function (implemented with
__builtin_clz()
) to address #10509.Why?
nvc 24.9 seems to have issue with
_lzcnt_u32()
. It defines it as__builtin_ia32_lzcnt_u32()
but fails to find it at link time. But the_lzcnt_u32()
function seems to be likely defined as__builtin_clz()
in gcc, and is also properly linked by nvc.