prov/xpmem: Fix xpmem memory corruption #9874

Merged: merged 1 commit into ofiwg:main on Mar 18, 2024

Conversation

amirshehataornl (Contributor)

The offset into the XPMEM memory to be copied is calculated on the receiving end. This introduces a constraint on caching. The XPMEM code rounds the start address of the memory to map down to the nearest page boundary and extends the length out to the end of the last page. It then caches that memory region and attaches to it if necessary. The offset is calculated as the difference between the original address sent and the page-aligned address used for attaching. After getting the mapped address, the copy is done from the mapped address plus the offset.
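
A minimal sketch of that receive-side derivation, using the ofi_get_page_start/ofi_get_page_end helpers from the xpmem_copy snippet quoted later in the thread (remote_base and remote_len are illustrative names for the original request):

    /* Round the requested region out to page boundaries. */
    iov.iov_base = (void *) ofi_get_page_start(remote_base, page_size);
    iov.iov_len = (uintptr_t) ofi_get_page_end(remote_base + remote_len,
                                               page_size) -
                  (uintptr_t) iov.iov_base;
    /* How far the original address sits past the page start. */
    offset = (uintptr_t) remote_base - (uintptr_t) iov.iov_base;
    /* After attaching, the copy reads from mapped_addr + offset. */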

When searching the cache, a cache hit is found if the address being searched is within a memory region that has already been cached. However, this introduces an issue for XPMEM. Here is an example to illustrate the issue:

Address being looked up: 0x7fffc8463000
Length: 0x5FFF
Ending address: 0x7FFFC8468FFF
Offset: 0x1000

Address cached: 0x7fffc8462000
Length: 0x6FFF
Ending address: 0x7FFFC8468FFF

As shown in the example, the address being looked up in the cache falls within the cached memory region.

However, if the cached memory region address is returned and subsequently the calculated offset is used to copy the data, there will be a discrepancy of a page, leading to memory corruption. In the above example, the copy will start from 0x7fffc8462000 + 0x1000 instead of from 0x7fffc8463000 + 0x1000.

This issue does not occur with other memory operations, such as ROCR or ZE, because in those cases the offset is pre-calculated on the sending path, which avoids the problem.

It is conceivable to calculate the XPMEM offset on the sending path as well; however, the current code structure doesn't allow for it. The sending path is shared with CMA, which doesn't have the same requirements. The changes needed would be more extensive than this patch, with no clear advantage.

The caching infrastructure already provides a memory monitor mechanism to validate memory. This can be used to ensure that the memory region being looked up starts at the same address as the cached memory region and falls entirely within it.
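
A hypothetical sketch of such a check (the helper name and shape are illustrative; the actual patch wires the validation through the cache's memory monitor):

    #include <stdbool.h>
    #include <sys/uio.h>

    /* Hypothetical check: accept a cache hit only when the requested
     * region starts at the cached region's base address and fits
     * entirely inside it; otherwise the entry is treated as stale and
     * must be evicted and re-attached. */
    static bool xpmem_region_valid(const struct iovec *cached,
                                   const struct iovec *wanted)
    {
        return cached->iov_base == wanted->iov_base &&
               wanted->iov_len <= cached->iov_len;
    }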

@j-xiong (Contributor) commented Mar 14, 2024

The ofi_mr_entry returned from ofi_mr_cache_search or ofi_mr_cache_find should have information about the original range the mr was registered for. With that, the correct offset can be calculated. For example, in the current implementation of xpmem_copy shown below, the offset used on line 190 should be calculated using the base from mr_entry.info.iov.iov_base:

174         for (i = 0; i < remote_cnt; i++) {
175                 iov.iov_base = (void *) ofi_get_page_start(remote[i].iov_base,
176                                                            page_size);
177                 iov.iov_len =
178                         (uintptr_t) ofi_get_page_end(remote[i].iov_base +
179                                         remote[i].iov_len, page_size) -
180                                         (uintptr_t)iov.iov_base;
181                 offset = (uintptr_t)((uintptr_t) remote[i].iov_base -
182                                      (uintptr_t) iov.iov_base);
183
184                 ret = ofi_xpmem_cache_search(xpmem_cache, &iov, pid, &mr_entry,
185                                              (struct xpmem_client *)user_data);
186                 if (ret)
187                         return ret;
188
189                 mapped_addr = (char*) (uintptr_t)mr_entry->info.mapped_addr +
190                                 offset;
191
192                 copy_len = (local[i].iov_len <= iov.iov_len - offset) ?
193                                 local[i].iov_len : iov.iov_len - offset;
194
195                 if (write)
196                         xpmem_memcpy(mapped_addr, local[i].iov_base,
197                                      copy_len);
198                 else
199                         xpmem_memcpy(local[i].iov_base, mapped_addr,
200                                      copy_len);
201
202                 ofi_mr_cache_delete(xpmem_cache, mr_entry);
203         }

@amirshehataornl (Contributor, Author)

This isn't going to work. You can't calculate the offset based on the original value; you have to calculate it based on the new information. Basically, what I'm doing here is adding an extra requirement: if the cached value doesn't match the new request, the cached value is stale and needs to be cleared and re-attached.

@j-xiong (Contributor) commented Mar 14, 2024

I understand what this patch tries to do, but that excludes valid cache reuse.

By "using the mr_entry info to calculate the offset" I meant to use the base from the mr_entry to adjust the offset. Basically, the offset should be adjusted by the difference between the address searched for and the base of the mr was registered with, with the sample data you gave, that would be 0x7fffc8463000 - 0x7fffc8462000 = 0x1000. So the offset added to the mapped address would be 0x1000 + 0x1000 = 0x2000 and the copy would use starting address at 0x7fffc8462000 + 0x2000.

@zachdworkin (Contributor)

@amirshehataornl please rebase to get the latest Intel CI changes. It will allow your test to pass the cuda stage. I have aborted it early to save resources, and it will auto-run again when you rebase.

@zachdworkin (Contributor)

@amirshehataornl I'm very sorry, but can you rebase one more time? We just had another CI infrastructure change that caused our ze-shm stage to fail. Rebasing will pull in changes from #9898 and allow the stage to pass.

The merged commit's message reads:

The offset into the XPMEM memory to be copied is calculated on the
receiving end.

XPMEM code calculates the memory to map to in the following manner:

memory_region_start_address = ofi_get_page_start(original_address)
memory_region_length = ofi_get_page_end(original_address + original_length) -
                       memory_region_start_address

It then caches that memory region and attaches to it if necessary. The
offset is calculated as follows:

offset = original_address - memory_region_start_address

The cache search is provided the following parameters:
- memory_region_start_address
- memory_region_length

After getting the mapped address, the copy is done from the mapped
address plus the offset.

When searching the cache, a cache hit is found if the region being
searched is within a memory region which has already been cached.
However, this exposes a bug in XPMEM. Here is an example to
illustrate the issue:

Address being looked up: 0x7fffc8463000
Length: 0x5FFF
Ending address: 0x7FFFC8468FFF
Offset: 0x1000

Address cached: 0x7fffc8462000
Length: 0x6FFF
Ending address: 0x7FFFC8468FFF

As shown in the example, the memory region being looked up in the cache is
within the cached memory region.

However, if the cached memory region address is returned and subsequently the
calculated offset is used to copy the data, there will be a discrepancy
of a page, leading to memory corruption. In the above example, the copy
will start from 0x7fffc8462000 + 0x1000 instead of from
0x7fffc8463000 + 0x1000.

To resolve the issue, calculate the delta between the remote address
returned in the cache result and the remote address used in the
operation. Add that delta to the offset to obtain the starting
address of the memory operation.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
j-xiong merged commit 16385db into ofiwg:main on Mar 18, 2024
10 of 11 checks passed