IPEX XPU deepspeed #507

Closed
wants to merge 11 commits into from

sramakintel (Contributor) commented Nov 4, 2024

Description

This PR adds DeepSpeed XPU container layers for distributed training. It supports single-node, multi-tile configurations.

Related Issue

Changes Made

  • The code follows the project's coding standards.
  • No Intel Internal IP is present within the changes.
  • The documentation has been updated to reflect any changes in functionality.

Validation

  • I have tested any changes in container groups locally with test_runner.py with all existing tests passing, and I have added new tests where applicable.
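
As a rough illustration of what an import-style smoke check could look like, here is a minimal sketch. It is not the PR's test code and does not use test_runner.py. The module names in `ASSUMED_MODULES` are assumptions, since the PR text does not list what its import tests load, and `check_importable` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed top-level module names; the PR text does not say which
# modules its import tests actually load.
ASSUMED_MODULES = ("torch", "intel_extension_for_pytorch", "deepspeed")


def check_importable(modules=ASSUMED_MODULES):
    """Map each top-level module name to whether it can be located.

    find_spec only locates the module; it does not execute it, so this
    says nothing about whether an XPU device is usable at runtime.
    """
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_importable().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run inside the container, this only reports whether each package is installed; a real test would also need to exercise the device.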

Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
sramakintel added the WIP (Work in Progress) label Nov 4, 2024
sramakintel self-assigned this Nov 4, 2024

github-actions bot commented Nov 4, 2024

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

Package | Version | Score | Details
pip/deepspeed | 0.15.3 | 🟢 6.5
Details
Check | Score | Reason
Code-Review | 🟢 9 | Found 28/30 approved changesets -- score normalized to 9
Maintained | 🟢 10 | 30 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | 🟢 5 | badge detected: Passing
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Security-Policy | 🟢 10 | security policy file detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Binary-Artifacts | 🟢 10 | no binaries found in the repo
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Fuzzing | ⚠️ 0 | project is not fuzzed
Packaging | 🟢 10 | packaging workflow detected
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Vulnerabilities | 🟢 7 | 3 existing vulnerabilities detected
pip/neural-compressor | 3.1.1 | 🟢 7.3
Details
Check | Score | Reason
Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9
Maintained | 🟢 10 | 30 commit(s) and 7 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 2 | badge detected: InProgress
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Packaging | ⚠️ -1 | packaging workflow not detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Security-Policy | 🟢 10 | security policy file detected
Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege
Branch-Protection | 🟢 8 | branch protection is not maximal on development and all release branches
SAST | 🟢 10 | SAST tool is run on all commits
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Fuzzing | ⚠️ 0 | project is not fuzzed
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Vulnerabilities | ⚠️ 0 | 93 existing vulnerabilities detected
pip/py-cpuinfo | 9.0.0 | 🟢 3.8
Details
Check | Score | Reason
Maintained | ⚠️ 0 | 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Code-Review | 🟢 4 | Found 7/17 approved changesets -- score normalized to 4
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Packaging | ⚠️ -1 | packaging workflow not detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Security-Policy | ⚠️ 0 | security policy file not detected
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Fuzzing | ⚠️ 0 | project is not fuzzed
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
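
The three Code-Review rows above (28/30, 29/30 and 7/17 approved changesets, scored 9, 9 and 4) are consistent with taking the approved fraction on a 0 to 10 scale and rounding down. The sketch below checks that arithmetic. The rounding rule is inferred from these three rows only, not taken from the Scorecard implementation, and `normalized_score` is a hypothetical helper name.

```python
def normalized_score(approved: int, total: int, max_score: int = 10) -> int:
    """Scale approved/total to 0..max_score, rounding down.

    Assumption: floor rounding, inferred from the rows in this report.
    """
    if total <= 0:
        return 0
    return min((max_score * approved) // total, max_score)


# (approved, total) pairs copied from the Code-Review rows above.
for approved, total in [(28, 30), (29, 30), (7, 17)]:
    print(f"{approved}/{total} -> {normalized_score(approved, total)}")
```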

Scanned Files

  • pytorch/multinode/requirements.txt

Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

github-actions bot commented Nov 5, 2024

Integration Test Results

Groups Tested: pytorch/serving, pytorch/tests

Results
Test-Group Test Status
pytorch/serving ipex-serving-cpu-model-archive PASS
pytorch/serving ipex-serving-cpu-workflow-archive PASS
pytorch/serving ipex-serving-cpu-rest-workflow PASS
pytorch/serving ipex-serving-cpu-rest-inference PASS
pytorch/serving ipex-serving-cpu-grpc-inference PASS
pytorch/serving ipex-serving-xpu-model-archive PASS
pytorch/serving ipex-serving-xpu-rest-inference PASS
pytorch/tests import-xpu-deepspeed-idp PASS
pytorch/tests import-xpu-deepspeed-pip PASS
pytorch/tests deepspeed-xpu-idp FAIL
pytorch/tests deepspeed-xpu-pip PASS

Overall Result: FAIL ❌
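
The overall result above follows from a single failing row. A minimal sketch of that aggregation, with the rows copied from the table; the rule (any FAIL makes the run FAIL) is an assumption about how the bot summarizes results, and `overall_result` and `failing_tests` are hypothetical helper names.

```python
# Rows copied from the Results table: (test group, test name, status).
RESULTS = [
    ("pytorch/serving", "ipex-serving-cpu-model-archive", "PASS"),
    ("pytorch/serving", "ipex-serving-cpu-workflow-archive", "PASS"),
    ("pytorch/serving", "ipex-serving-cpu-rest-workflow", "PASS"),
    ("pytorch/serving", "ipex-serving-cpu-rest-inference", "PASS"),
    ("pytorch/serving", "ipex-serving-cpu-grpc-inference", "PASS"),
    ("pytorch/serving", "ipex-serving-xpu-model-archive", "PASS"),
    ("pytorch/serving", "ipex-serving-xpu-rest-inference", "PASS"),
    ("pytorch/tests", "import-xpu-deepspeed-idp", "PASS"),
    ("pytorch/tests", "import-xpu-deepspeed-pip", "PASS"),
    ("pytorch/tests", "deepspeed-xpu-idp", "FAIL"),
    ("pytorch/tests", "deepspeed-xpu-pip", "PASS"),
]


def failing_tests(rows):
    """Return the names of all tests whose status is not PASS."""
    return [name for _, name, status in rows if status != "PASS"]


def overall_result(rows):
    """Assumed rule: the run passes only if every test passed."""
    return "FAIL" if failing_tests(rows) else "PASS"


if __name__ == "__main__":
    print("Overall:", overall_result(RESULTS))
    print("Failing:", failing_tests(RESULTS))
```

Here the only failing entry is deepspeed-xpu-idp, while its pip counterpart deepspeed-xpu-pip passes.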

Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
Signed-off-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
sramakintel closed this Nov 6, 2024