prov/xpmem: Fix xpmem memory corruption #9874

Merged: merged 1 commit into ofiwg:main on Mar 18, 2024

Conversation

amirshehataornl (Contributor)

The offset into the XPMEM memory to be copied is calculated on the receiving end. This introduces a constraint on caching. The XPMEM code rounds the start address of the memory to map down to the nearest page boundary and extends the length out to the end of the last page. It then caches that memory region and attaches to it if necessary. The offset is calculated as the difference between the original address sent and the page-aligned address used for attaching. After getting the mapped address, the copy is done from the mapped address plus the offset.
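
A minimal sketch of that receive-side derivation, using the ofi_get_page_start/ofi_get_page_end helpers from the xpmem_copy snippet quoted later in the thread (remote_base and remote_len are illustrative names for the original request):

    /* Round the requested region out to page boundaries. */
    iov.iov_base = (void *) ofi_get_page_start(remote_base, page_size);
    iov.iov_len = (uintptr_t) ofi_get_page_end(remote_base + remote_len,
                                               page_size) -
                  (uintptr_t) iov.iov_base;
    /* How far the original address sits past the page start. */
    offset = (uintptr_t) remote_base - (uintptr_t) iov.iov_base;
    /* After attaching, the copy reads from mapped_addr + offset. */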

When searching the cache, a cache hit is found if the address being searched is within a memory region that has already been cached. However, this introduces an issue for XPMEM. Here is an example to illustrate the issue:

Address being looked up: 0x7fffc8463000
Length: 0x5FFF
Ending address: 0x7FFFC8468FFF
Offset: 0x1000

Address cached: 0x7fffc8462000
Length: 0x6FFF
Ending address: 0x7FFFC8468FFF

As shown in the example, the address being looked up in the cache falls within the cached memory region.

However, if the cached memory region address is returned and subsequently the calculated offset is used to copy the data, there will be a discrepancy of a page, leading to memory corruption. In the above example, the copy will start from 0x7fffc8462000 + 0x1000 instead of from 0x7fffc8463000 + 0x1000.

This issue does not occur with other memory operations, such as ROCR or ZE, because in those cases the offset is pre-calculated on the sending path, which avoids the problem.

It is conceivable to calculate the XPMEM offset on the sending path as well; however, the current code structure doesn't allow for it. The sending path is shared with CMA, which doesn't have the same requirements. The changes needed would be more extensive than this patch, with no clear advantage.

The caching infrastructure already provides a memory monitor mechanism to validate memory. This can be used to ensure that the memory region being looked up starts at the same address as the cached memory region and falls entirely within it.
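
A hypothetical sketch of such a check (the helper name and shape are illustrative; the actual patch wires the validation through the cache's memory monitor):

    #include <stdbool.h>
    #include <sys/uio.h>

    /* Hypothetical check: accept a cache hit only when the requested
     * region starts at the cached region's base address and fits
     * entirely inside it; otherwise the entry is treated as stale and
     * must be evicted and re-attached. */
    static bool xpmem_region_valid(const struct iovec *cached,
                                   const struct iovec *wanted)
    {
        return cached->iov_base == wanted->iov_base &&
               wanted->iov_len <= cached->iov_len;
    }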

@j-xiong (Contributor) commented Mar 14, 2024

The ofi_mr_entry returned from ofi_mr_cache_search or ofi_mr_cache_find should have information about the original range the mr was registered for. With that, the correct offset can be calculated. For example, in the current implementation of xpmem_copy shown below, the offset used on line 190 should be calculated using the base from mr_entry.info.iov.iov_base:

174         for (i = 0; i < remote_cnt; i++) {
175                 iov.iov_base = (void *) ofi_get_page_start(remote[i].iov_base,
176                                                            page_size);
177                 iov.iov_len =
178                         (uintptr_t) ofi_get_page_end(remote[i].iov_base +
179                                         remote[i].iov_len, page_size) -
180                                         (uintptr_t)iov.iov_base;
181                 offset = (uintptr_t)((uintptr_t) remote[i].iov_base -
182                                      (uintptr_t) iov.iov_base);
183
184                 ret = ofi_xpmem_cache_search(xpmem_cache, &iov, pid, &mr_entry,
185                                              (struct xpmem_client *)user_data);
186                 if (ret)
187                         return ret;
188
189                 mapped_addr = (char*) (uintptr_t)mr_entry->info.mapped_addr +
190                                 offset;
191
192                 copy_len = (local[i].iov_len <= iov.iov_len - offset) ?
193                                 local[i].iov_len : iov.iov_len - offset;
194
195                 if (write)
196                         xpmem_memcpy(mapped_addr, local[i].iov_base,
197                                      copy_len);
198                 else
199                         xpmem_memcpy(local[i].iov_base, mapped_addr,
200                                      copy_len);
201
202                 ofi_mr_cache_delete(xpmem_cache, mr_entry);
203         }

@amirshehataornl (Contributor, Author)

This isn't going to work. You can't calculate the offset based on the original value; you have to calculate it based on the new information. Basically, what I'm doing here is adding an extra requirement: if the cached value doesn't match the new request, the cached value is stale and needs to be cleared and re-attached.

@j-xiong (Contributor) commented Mar 14, 2024

I understand what this patch tries to do, but that excludes valid cache reuse.

By "using the mr_entry info to calculate the offset" I meant to use the base from the mr_entry to adjust the offset. Basically, the offset should be adjusted by the difference between the address searched for and the base of the mr was registered with, with the sample data you gave, that would be 0x7fffc8463000 - 0x7fffc8462000 = 0x1000. So the offset added to the mapped address would be 0x1000 + 0x1000 = 0x2000 and the copy would use starting address at 0x7fffc8462000 + 0x2000.

@zachdworkin (Contributor)

@amirshehataornl please rebase to get the latest Intel CI changes. It will allow your test to pass the cuda stage. I have aborted it early to save resources, and it will auto-run again when you rebase.

@zachdworkin (Contributor)

@amirshehataornl I'm very sorry, but can you rebase one more time? We just had another CI infrastructure change that caused our ze-shm stage to fail. Rebasing will pull in changes from #9898 and allow the stage to pass.

The merged commit's message reads:

The offset into the XPMEM memory to be copied is calculated on the
receiving end.

XPMEM code calculates the memory to map to in the following manner:

memory_region_start_address = ofi_get_page_start(original_address)
memory_region_length = ofi_get_page_end(original_address + original_length) -
                       memory_region_start_address

It then caches that memory region and attaches to it if necessary. The
offset is calculated as follows:

offset = original_address - memory_region_start_address

The cache search is provided the following parameters:
- memory_region_start_address
- memory_region_length

After getting the mapped address, the copy is done from the mapped
address plus the offset.

When searching the cache, a cache hit is found if the region being
searched is within a memory region which has already been cached.
However, this exposes a bug in XPMEM. Here is an example to
illustrate the issue:

Address being looked up: 0x7fffc8463000
Length: 0x5FFF
Ending address: 0x7FFFC8468FFF
Offset: 0x1000

Address cached: 0x7fffc8462000
Length: 0x6FFF
Ending address: 0x7FFFC8468FFF

As shown in the example, the memory region being looked up in the cache is
within the cached memory region.

However, if the cached memory region address is returned and subsequently the
calculated offset is used to copy the data, there will be a discrepancy
of a page, leading to memory corruption. In the above example, the copy
will start from 0x7fffc8462000 + 0x1000 instead of from
0x7fffc8463000 + 0x1000.

To resolve the issue, calculate the delta between the remote address
returned in the cache result and the remote address used in the
operation. Add that delta to the offset to obtain the starting
address of the memory operation.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
j-xiong merged commit 16385db into ofiwg:main on Mar 18, 2024
10 of 11 checks passed