prov/xpmem: Fix xpmem memory corruption
The offset into the XPMEM memory to be copied is calculated on the receiving end, which places a constraint on caching. The XPMEM code rounds the start address of the memory to map down to the beginning of its page, and extends the length to the end of the last page. It then caches that memory region and attaches to it if necessary. The offset is calculated from the original address sent and the page-aligned address computed for attaching. After getting the mapped address, the copy is done from the mapped address plus the offset.

When searching the cache, a hit is found if the address being searched lies within a memory region that has already been cached. However, this introduces an issue for XPMEM. The following example illustrates it:

  Address being looked up: 0x7fffc8463000
  Length:                  0x5FFF
  Ending address:          0x7FFFC8468FFF
  Offset:                  0x1000

  Address cached:          0x7fffc8462000
  Length:                  0x6FFF
  Ending address:          0x7FFFC8468FFF

As shown in the example, the address being looked up falls within the cached memory region. However, if the cached memory region's address is returned and the calculated offset is then applied to it, the copy is off by one page, leading to memory corruption. In the above example, the copy starts from 0x7fffc8462000 + 0x1000 instead of from 0x7fffc8463000 + 0x1000.

Other memory types, such as ROCR or ZE, do not have this issue: in those cases the offset is pre-calculated on the sending path, which avoids the problem. It is conceivable to calculate the XPMEM offset on the sending path as well; however, the current code structure doesn't allow for it. The sending path is shared with CMA, which doesn't have the same requirements, so the necessary changes would be more extensive than this patch, with no clear advantage.

The caching infrastructure already provides a memory monitor mechanism to validate memory.
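
A minimal C sketch of the two cache-hit predicates, replaying the numbers from the example above. It assumes a 64-bit address space and, for simplicity, treats the mapped address as identical to the remote address, whereas the real code copies from a locally attached mapping. The names struct region, page_floor, attach_region, hit_contained, hit_same_start and copy_source are hypothetical and are not libfabric's cache API; attach_region additionally assumes the offset is the distance from the page-aligned start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region descriptor; end is inclusive, like the
 * "Ending address" values in the example. Assumes 64-bit addresses. */
struct region {
	uintptr_t start;
	uintptr_t end;
};

/* Round addr down to the start of its page (page_size: power of two). */
static uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/* Widen [addr, addr + len) to whole pages and report where addr sits
 * inside the widened region. Assumption: offset = addr - aligned start.
 * len must be non-zero. */
static struct region attach_region(uintptr_t addr, uintptr_t len,
				   uintptr_t page_size, uintptr_t *offset)
{
	struct region r;

	r.start = page_floor(addr, page_size);
	r.end = page_floor(addr + len - 1, page_size) + page_size - 1;
	*offset = addr - r.start;
	return r;
}

/* Containment-only test: any lookup inside a cached region is a hit. */
static bool hit_contained(const struct region *cached,
			  const struct region *lookup)
{
	return lookup->start >= cached->start && lookup->end <= cached->end;
}

/* Stricter test: the cached region must also start at the lookup address. */
static bool hit_same_start(const struct region *cached,
			   const struct region *lookup)
{
	return lookup->start == cached->start && lookup->end <= cached->end;
}

/* Source of the copy: base of the region returned by the cache plus the
 * receiver-side offset (mapped address simplified to the remote address). */
static uintptr_t copy_source(uintptr_t returned_base, uintptr_t offset)
{
	return returned_base + offset;
}
```

With the example values, hit_contained() accepts the cached region, and copy_source(0x7fffc8462000, 0x1000) yields 0x7fffc8463000, one page short of the 0x7fffc8464000 obtained from the looked-up address. hit_same_start() rejects that entry, so the lookup is treated as a miss.
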
The memory monitor can be used to ensure that the memory region being looked up starts at the same address as the cached memory region and lies within that region.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>