Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix when USE_C10D_XCCL define in pytorch will not correct update cache #1441

Merged
merged 2 commits into from
Mar 10, 2025

Conversation

Chao1Han
Copy link
Contributor

@Chao1Han Chao1Han commented Mar 7, 2025

This pr for fix use_xccl and use_c10_xccl define in pytorch raise build error
pytorch/pytorch#147593
After pytorch pr land, we will remove torch-xpu-ops defination.

@Chao1Han Chao1Han requested a review from guangyey March 7, 2025 07:08
CMakeLists.txt Outdated
@@ -68,6 +68,8 @@ endif()
if(USE_DISTRIBUTED AND USE_XCCL AND USE_C10D_XCCL)
caffe2_update_option(USE_C10D_XCCL ON)
else()
# WA for USE_C10D_XCCL duplicated definition, will remove it after USE_C10D_XCCL and USE_XCCL define in PyTorch
set(USE_C10D_XCCL OFF CACHE BOOL "Build with XCCL support for C10D" FORCE)
Copy link
Contributor

@guangyey guangyey Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to only change line 43 to cmake_dependent_option(USE_C10D_XCCL "USE C10D XCCL" ON "USE_DISTRIBUTED;USE_XCCL" OFF)
It seems that it is legal to use cmake_dependent_option multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is a better approach.

Copy link
Contributor

@guangyey guangyey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a try on local machine to make sure this is compatible with pytorch/pytorch#147593 in the following scenarios:

  • python setup.py install
  • USE_CCL=ON python setup.py install with souring oneccl env
  • USE_CCL=ON python setup.py install without sourcing oneccl env

@Chao1Han
Copy link
Contributor Author

Chao1Han commented Mar 7, 2025

Please have a try on local machine to make sure this is compatible with pytorch/pytorch#147593 in the following scenarios:

  • python setup.py install
  • USE_CCL=ON python setup.py install with souring oneccl env
  • USE_CCL=ON python setup.py install without sourcing oneccl env

It works well

  • python setup.py install will build without xccl
  • USE_CCL=ON python setup.py install with souring oneccl env will build with xccl
  • USE_CCL=ON python setup.py install without sourcing oneccl env will without xccl and raise warning OneCCL library not found!!

@chuanqi129 chuanqi129 merged commit 56e7dda into main Mar 10, 2025
8 of 9 checks passed
@chuanqi129 chuanqi129 deleted the xccl/fix_cmak branch March 10, 2025 01:44
guangyey pushed a commit that referenced this pull request Mar 10, 2025
#1441)

This pr for fix use_xccl and use_c10_xccl define in pytorch raise build
error
pytorch/pytorch#147593
After pytorch pr land, we will remove torch-xpu-ops defination.

(cherry picked from commit 56e7dda)
CuiYifeng added a commit that referenced this pull request Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants