prov/xpmem: Fix xpmem memory corruption
The offset into the XPMEM memory to be copied is calculated on the
receiving end, which introduces a constraint on caching. The XPMEM code
rounds the start address down to the beginning of its page and extends
the length to the end of the last page. It then caches that memory
region and attaches to it if necessary. The offset is the difference
between the original address that was sent and the page-aligned address
calculated for attaching. After getting the mapped address, the copy is
done from the mapped address plus the offset.
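
The page rounding and offset calculation described above can be sketched
as follows. This is a minimal illustration, not the libfabric code:
`attach_range`, `attach_range_for`, the 4 KiB `PAGE_SIZE`, and the treatment
of the length as a plain byte count are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumption: 4 KiB pages */

/* Hypothetical helper type; not part of libfabric. */
struct attach_range {
	uint64_t base;   /* start of the page containing the address */
	uint64_t len;    /* bytes from base to the end of the last page */
	uint64_t offset; /* original address minus base */
};

static struct attach_range attach_range_for(uint64_t addr, uint64_t len)
{
	const uint64_t mask = ~((uint64_t)PAGE_SIZE - 1);
	struct attach_range r;
	uint64_t end;

	assert(len > 0);

	end = addr + len;                           /* one past the last byte */
	r.base = addr & mask;                       /* round start down */
	r.len = ((end + PAGE_SIZE - 1) & mask) - r.base; /* round end up */
	r.offset = addr - r.base;
	return r;
}
```

The receiver would then copy from the mapped address plus `offset`, which is
only correct if the mapped region really starts at `base`.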

When searching the cache, a hit is reported if the address being
searched for lies within a memory region that has already been cached.
However, this introduces an issue for XPMEM. Here is an example to
illustrate the issue:

Address being looked up: 0x7fffc8463000
Length: 0x5fff
Ending address: 0x7fffc8468fff
Offset: 0x1000

Address cached: 0x7fffc8462000
Length: 0x6fff
Ending address: 0x7fffc8468fff

As shown in the example the address being looked up in the cache is
within the cached memory region.

However, if the cached memory region's address is returned and the
calculated offset is then used to copy the data, the copy will be off by
a page, leading to memory corruption. In the above example the copy
will start from 0x7fffc8462000 + 0x1000 instead of from
0x7fffc8463000 + 0x1000.
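
The discrepancy can be reproduced with the numbers from the example. The
sketch below is illustrative only: `region`, `region_contains`, and
`copy_discrepancy` are hypothetical names, and the remote addresses stand in
for the locally mapped ones to keep the arithmetic visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached or requested memory region. */
struct region {
	uint64_t base;
	uint64_t len;
};

/* Containment test: the kind of check that reports a cache hit. */
static bool region_contains(const struct region *cached,
			    const struct region *req)
{
	return req->base >= cached->base &&
	       req->base + req->len <= cached->base + cached->len;
}

/* Distance between where the copy should start and where it does start
 * when the cached base is combined with the receiver's offset. */
static uint64_t copy_discrepancy(const struct region *cached,
				 const struct region *req, uint64_t offset)
{
	assert(region_contains(cached, req)); /* only meaningful on a hit */
	return (req->base + offset) - (cached->base + offset);
}
```

With the cached region `{0x7fffc8462000, 0x6fff}`, the requested region
`{0x7fffc8463000, 0x5fff}`, and an offset of `0x1000`, the containment test
succeeds and the discrepancy is `0x1000`, i.e. one page under the 4 KiB
assumption.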

Other memory operations, such as ROCR or ZE, are not affected by this
issue. The difference is that in those cases the offset is
pre-calculated on the sending path, which avoids the problem.

It is conceivable to calculate the XPMEM offset on the sending path as
well. However, the current code structure doesn't allow for it: the
sending path is shared with CMA, which doesn't have the same
requirements. The changes needed would be more extensive than this
patch, with no clear advantage.

The caching infrastructure already provides a memory monitor mechanism
to validate memory. This can be used to ensure that the memory region
being looked up starts at the same address as the cached memory region,
in addition to lying within it.
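
How a validity callback turns a containment hit into a miss can be sketched
with a toy cache. This is not the libfabric cache: `region`, `valid_fn`,
`same_base`, `always_valid`, and `cache_find` are hypothetical, and a
rejected entry is simply treated as a miss here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t len;
};

/* Hypothetical validity callback, consulted after containment matches. */
typedef bool (*valid_fn)(const struct region *req,
			 const struct region *cached);

/* Models the old behavior: every containment hit is accepted. */
static bool always_valid(const struct region *req,
			 const struct region *cached)
{
	(void)req;
	(void)cached;
	return true;
}

/* Models the new behavior: the start addresses must also match. */
static bool same_base(const struct region *req,
		      const struct region *cached)
{
	return req->base == cached->base;
}

/* Linear search over a toy cache; returns NULL on a miss. */
static const struct region *cache_find(const struct region *cache, size_t n,
				       const struct region *req,
				       valid_fn valid)
{
	assert(valid != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct region *c = &cache[i];
		bool inside = req->base >= c->base &&
			      req->base + req->len <= c->base + c->len;

		if (inside && valid(req, c))
			return c;
	}
	return NULL;
}
```

With the example regions, `always_valid` returns the cached entry at
`0x7fffc8462000` for a request starting at `0x7fffc8463000`, while
`same_base` rejects it, so the mismatched base is never combined with the
receiver's offset.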

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
amirshehataornl committed Mar 11, 2024
1 parent 263ce57 commit 5645e80
Showing 1 changed file with 1 addition and 1 deletion.
--- a/prov/util/src/xpmem_monitor.c
+++ b/prov/util/src/xpmem_monitor.c
@@ -38,7 +38,7 @@ static bool xpmem_monitor_valid(struct ofi_mem_monitor *monitor,
 				const struct ofi_mr_info *info,
 				struct ofi_mr_entry *entry)
 {
-	return true;
+	return (info->iov.iov_base == entry->info.iov.iov_base);
 }
 
 #else
