Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARCH/X86: Use UCS function to count leading zeros #10514

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 1 addition & 5 deletions src/ucs/arch/x86_64/cpu.c
Original file line number Diff line number Diff line change
Expand Up @@ -1063,10 +1063,9 @@ size_t ucs_x86_nt_src_buffer_transfer(void *dst, const void *src, size_t len)
static UCS_F_ALWAYS_INLINE
void ucs_x86_copy_bytes_le_128(void *dst, const void *src, size_t len)
{
#if defined (__LZCNT__)
__m256i y0, y1, y2, y3;
/* Handle lengths that fall usually within eager short range */
switch (_lzcnt_u32(len)) {
switch (ucs_count_leading_zero_bits(len)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the len is zero the code expects '32' as output, is ucs_count_leading_zero_bits(0) return 32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right, should be fixed now

Copy link
Contributor

@arun-chandran-edarath arun-chandran-edarath Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one issue; _lzcnt_u32(len) is expected to count the leading zeros from a 32bit operand.

_lzcnt_u32(0) should produce 32.

But now ucs_count_leading_zero_bits(len) will use _lzcnt_u64() because 'size_t len' will be 8 byte and the outputs produced makes the switch_cases wrong. [_lzcnt_u64(0) gives 64]

It should be ucs_count_leading_zero_bits((uint32_t) len) right?

/* 0 */
case 32:
break;
Expand Down Expand Up @@ -1121,9 +1120,6 @@ void ucs_x86_copy_bytes_le_128(void *dst, const void *src, size_t len)
_mm256_storeu_si256(UCS_PTR_BYTE_OFFSET(dst, len - 32), y3);
break;
}
#else
memcpy(dst, src, len);
#endif
}

/* This is an adaptation of the memcpy code from https://github.com/amd/aocl-libmem
Expand Down
Loading