Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARCH/X86: Use UCS function to count leading zeros #10514

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tvegas1
Copy link
Contributor

@tvegas1 tvegas1 commented Feb 24, 2025

What?

Use common UCS count leading zeros function (implemented with __builtin_clz()) to address #10509.

Why?

nvc 24.9 seems to have issue with _lzcnt_u32(). It defines it as __builtin_ia32_lzcnt_u32() but fails to find it at link time. But the _lzcnt_u32() function seems to be likely defined as __builtin_clz() in gcc, and is also properly linked by nvc.

@tvegas1
Copy link
Contributor Author

tvegas1 commented Feb 25, 2025

@arun-chandran-edarath

yosefe
yosefe previously approved these changes Feb 25, 2025
@arun-chandran-edarath
Copy link
Contributor

@tvegas1 __builtin_clz is sligtly different when compared to _lzcnt_u32().

One problem with __builtin_clz() is that it is undefined if the input is zero.
Ref:https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

LZCNT ( _lzcnt_u32()) is a better instruction for counting the number of leading zeros.

How do you think we should proceed? Can we add _lzcnt_u32() support in NVC?

@tvegas1
Copy link
Contributor Author

tvegas1 commented Feb 25, 2025

How do you think we should proceed? Can we add _lzcnt_u32() support in NVC?

We can file a bug for nvc, this will take time to fix. And we can handle nvc case in ucs_count_leading_zero_bits()?

__m256i y0, y1, y2, y3;
/* Handle lengths that fall usually within eager short range */
switch (_lzcnt_u32(len)) {
switch (ucs_count_leading_zero_bits(len)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the len is zero the code expects '32' as output, is ucs_count_leading_zero_bits(0) return 32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right, should be fixed now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants