Skip certain unit tests on NAVI #1950

Merged

Skipped unit tests in test_fsdp_sharded_grad_scaler.py

eb38990

ROCm Repo Management API / Jenkins failed Mar 10, 2025 in 4m 16s

Build PyTorch/Build PyTorch/Build PyTorch: error in 'archiveArtifacts' step

Build PyTorch / Build PyTorch / Build PyTorch / Build PyTorch / Shell Script

Error in sh step, with arguments #!/usr/bin/bash set -o pipefail ./build_pytorch.sh 2>&1 | tee build_pytorch.log .

script returned exit code 1
Build log
[2025-03-10T17:33:22.525Z] + source prepare_docker_env.sh -e 'PYTORCH_ROCM_ARCH=gfx90a;gfx908;gfx942' -e CUSTOM_TEST_ARTIFACT_BUILD_DIR=build/custom_test_artifacts -e CUSTOM_TEST_ARTIFACTS_FILE=test_artifacts.zip
[2025-03-10T17:33:22.525Z] ++ set -ex
[2025-03-10T17:33:22.525Z] ++ copy_whls=0
[2025-03-10T17:33:22.525Z] ++ envvars=()
[2025-03-10T17:33:22.525Z] ++ [[ 6 -gt 0 ]]
[2025-03-10T17:33:22.525Z] ++ key=-e
[2025-03-10T17:33:22.525Z] ++ case $key in
[2025-03-10T17:33:22.525Z] ++ envvars+=("-e $2")
[2025-03-10T17:33:22.525Z] ++ shift
[2025-03-10T17:33:22.525Z] ++ shift
[2025-03-10T17:33:22.525Z] ++ [[ 4 -gt 0 ]]
[2025-03-10T17:33:22.525Z] ++ key=-e
[2025-03-10T17:33:22.525Z] ++ case $key in
[2025-03-10T17:33:22.525Z] ++ envvars+=("-e $2")
[2025-03-10T17:33:22.525Z] ++ shift
[2025-03-10T17:33:22.525Z] ++ shift
[2025-03-10T17:33:22.525Z] ++ [[ 2 -gt 0 ]]
[2025-03-10T17:33:22.525Z] ++ key=-e
[2025-03-10T17:33:22.525Z] ++ case $key in
[2025-03-10T17:33:22.525Z] ++ envvars+=("-e $2")
[2025-03-10T17:33:22.525Z] ++ shift
[2025-03-10T17:33:22.525Z] ++ shift
[2025-03-10T17:33:22.525Z] ++ [[ 0 -gt 0 ]]
[2025-03-10T17:33:22.525Z] ++ [[ '' == \t\r\u\e ]]
[2025-03-10T17:33:22.525Z] ++ [[ true == \t\r\u\e ]]
[2025-03-10T17:33:22.525Z] ++ envvars+=("-e TEST_CORE=${TEST_CORE}")
[2025-03-10T17:33:22.525Z] ++ docker ps -a
[2025-03-10T17:33:22.525Z] CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
[2025-03-10T17:33:22.525Z] +++ docker ps -q
[2025-03-10T17:33:22.525Z] ++ docker kill
[2025-03-10T17:33:22.525Z] docker: 'docker kill' requires at least 1 argument
[2025-03-10T17:33:22.525Z] 
[2025-03-10T17:33:22.525Z] Usage:  docker kill [OPTIONS] CONTAINER [CONTAINER...]
[2025-03-10T17:33:22.525Z] 
[2025-03-10T17:33:22.525Z] See 'docker kill --help' for more information
[2025-03-10T17:33:22.525Z] ++ true
[2025-03-10T17:33:22.525Z] +++ docker ps -a -q
[2025-03-10T17:33:22.525Z] ++ docker rm
[2025-03-10T17:33:22.525Z] docker: 'docker rm' requires at least 1 argument
[2025-03-10T17:33:22.525Z] 
[2025-03-10T17:33:22.525Z] Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]
[2025-03-10T17:33:22.525Z] 
[2025-03-10T17:33:22.525Z] See 'docker rm --help' for more information
[2025-03-10T17:33:22.525Z] ++ true
[2025-03-10T17:33:22.525Z] ++ docker pull rocm/pytorch-ci-private:pytorch-linux-jammy-rocm6.3.4-py3.10-184f1d217023ec8306d62d2ace629066b8426a8a-gfx908_gfx90a_gfx942
[2025-03-10T17:33:23.252Z] pytorch-linux-jammy-rocm6.3.4-py3.10-184f1d217023ec8306d62d2ace629066b8426a8a-gfx908_gfx90a_gfx942: Pulling from rocm/pytorch-ci-private
[2025-03-10T17:33:23.252Z] Digest: sha256:f97d86445daa1cbe1075fe01934c7613b3e04bfe7ed78049b82fc9c1bb62668a
[2025-03-10T17:33:23.252Z] Status: Image is up to date for rocm/pytorch-ci-private:pytorch-linux-jammy-rocm6.3.4-py3.10-184f1d217023ec8306d62d2ace629066b8426a8a-gfx908_gfx90a_gfx942
[2025-03-10T17:33:23.252Z] docker.io/rocm/pytorch-ci-private:pytorch-linux-jammy-rocm6.3.4-py3.10-184f1d217023ec8306d62d2ace629066b8426a8a-gfx908_gfx90a_gfx942
[2025-03-10T17:33:23.252Z] ++++ dirname prepare_docker_env.sh
[2025-03-10T17:33:23.252Z] +++ cd .
[2025-03-10T17:33:23.252Z] +++ pwd
[2025-03-10T17:33:23.252Z] ++ script_dir=/var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline
[2025-03-10T17:33:23.252Z] ++ docker run --name pytorch-ci-container -e 'PYTORCH_ROCM_ARCH=gfx90a;gfx908;gfx942' -e CUSTOM_TEST_ARTIFACT_BUILD_DIR=build/custom_test_artifacts -e CUSTOM_TEST_ARTIFACTS_FILE=test_artifacts.zip -e TEST_CORE=true -v /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline:/host_workspace -v /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline:/scripts -t -d --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --pid=host --shm-size 8G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/pytorch-ci-private:pytorch-linux-jammy-rocm6.3.4-py3.10-184f1d217023ec8306d62d2ace629066b8426a8a-gfx908_gfx90a_gfx942 /bin/cat
[2025-03-10T17:33:23.252Z] 480fe4425f3ab7c7d0f95e111fb8bf5de1a3f1dc700ee942c6be788d32e989c0
[2025-03-10T17:33:23.252Z] ++ env -i python3 /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/tools/stats/export_test_times.py
[2025-03-10T17:33:23.968Z] Exporting test times from test-infra
[2025-03-10T17:33:23.968Z] Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/test-times.json to /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/.additional_ci_files/test-times.json
[2025-03-10T17:33:23.968Z] Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/test-class-times.json to /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/.additional_ci_files/test-class-times.json
[2025-03-10T17:33:23.968Z] ++ python3 /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/tools/stats/export_test_times.py
[2025-03-10T17:33:23.968Z] Exporting test times from test-infra
[2025-03-10T17:33:23.968Z] Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/test-times.json to /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/.additional_ci_files/test-times.json
[2025-03-10T17:33:23.968Z] Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/test-class-times.json to /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/.additional_ci_files/test-class-times.json
[2025-03-10T17:33:23.968Z] ++ [[ -d .additional_ci_files ]]
[2025-03-10T17:33:23.968Z] ++ sed -i -E 's/"\S+rocm\S+"/""/g' /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/.pytorch-test-times.json
[2025-03-10T17:33:23.968Z] sed: can't read /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch/.pytorch-test-times.json: No such file or directory
[2025-03-10T17:33:23.968Z] ++ true
[2025-03-10T17:33:23.968Z] ++ [[ '' != \t\r\u\e ]]
[2025-03-10T17:33:23.968Z] ++ docker exec -u root pytorch-ci-container bash -c 'rm -rf /var/lib/jenkins/pytorch'
[2025-03-10T17:33:23.968Z] ++ docker cp /var/z1_miciadmin/internal/workspace/pytorch/pytorch-ci-pipeline/pytorch pytorch-ci-container:/var/lib/jenkins/pytorch
[2025-03-10T17:33:26.617Z] ++ docker exec -u root pytorch-ci-container bash -c 'chown root:root -R /var/lib/jenkins/pytorch'
[2025-03-10T17:33:26.617Z] ++ [[ 0 -eq 1 ]]
[2025-03-10T17:33:26.617Z] ++ commands=()
[2025-03-10T17:33:26.617Z] ++ PYTORCH_DIRECTORY=/var/lib/jenkins/pytorch
[2025-03-10T17:33:26.617Z] ++ [[ -n '' ]]
[2025-03-10T17:33:26.617Z] + source exec_inside_docker.sh --work-dir /var/lib/jenkins/pytorch
[2025-03-10T17:33:26.617Z] ++ set -ex
[2025-03-10T17:33:26.617Z] +++ caller
[2025-03-10T17:33:26.617Z] +++ awk '{print $2}'
[2025-03-10T17:33:26.617Z] ++ script_to_exec=./build_pytorch.sh
[2025-03-10T17:33:26.617Z] ++ work_dir=/var/lib/jenkins
[2025-03-10T17:33:26.617Z] ++ [[ 2 -gt 1 ]]
[2025-03-10T17:33:26.617Z] ++ key=--work-dir
[2025-03-10T17:33:26.617Z] ++ case $key in
[2025-03-10T17:33:26.617Z] ++ work_dir=/var/lib/jenkins/pytorch
[2025-03-10T17:33:26.617Z] ++ shift
[2025-03-10T17:33:26.617Z] ++ shift
[2025-03-10T17:33:26.617Z] ++ [[ 0 -gt 1 ]]
[2025-03-10T17:33:26.617Z] ++ echo 'cd /var/lib/jenkins/pytorch'
[2025-03-10T17:33:26.617Z] ++ sed -n '/^#### / { s///; :a; n; p; ba; }' ./build_pytorch.sh
[2025-03-10T17:33:26.617Z] ++ docker exec -u root -i pytorch-ci-container bash
[2025-03-10T17:33:26.617Z] + pip3 install pytest-xdist pytest-rerunfailures xdoctest==1.0.2
[2025-03-10T17:33:27.184Z] Requirement already satisfied: pytest-xdist in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (3.3.1)
[2025-03-10T17:33:27.184Z] Requirement already satisfied: pytest-rerunfailures in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (14.0)
[2025-03-10T17:33:33.917Z] WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f752e0d8790>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/xdoctest/
[2025-03-10T17:33:47.700Z] WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f752e0d8ac0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/xdoctest/
[2025-03-10T17:33:59.410Z] WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f752e0d8c70>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/xdoctest/
[2025-03-10T17:34:13.216Z] WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f752e0d8e20>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/xdoctest/
[2025-03-10T17:34:27.003Z] WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f752e0d8fd0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/xdoctest/
[2025-03-10T17:34:32.532Z] ERROR: Could not find a version that satisfies the requirement xdoctest==1.0.2 (from versions: none)
[2025-03-10T17:34:32.532Z] ERROR: No matching distribution found for xdoctest==1.0.2
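
The log above stops at `pip3 install`, after five retries that each report `[Errno 101] Network is unreachable`. That points at connectivity on the build runner rather than at the change under test. As a rough triage aid, a sketch like the following could flag such logs. The function name `classify_failure` and its two labels are hypothetical and are not part of the CI scripts shown here; the matched strings are taken from this log only.

```shell
# Hypothetical triage helper (not part of the pipeline shown above).
# Reads a build log on stdin and prints "network" if it contains the
# connection errors seen in this log, otherwise "other".
classify_failure() {
  if grep -q -e 'Network is unreachable' \
             -e 'Failed to establish a new connection'; then
    echo network
  else
    echo other
  fi
}
```

It could be run against the file written by the build step, for example `classify_failure < build_pytorch.log`. A "network" result suggests re-running the job before looking at the code change.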

Build PyTorch / Build PyTorch / Build PyTorch / Build PyTorch / Error signal

Error in error step, with arguments PyTorch build failed hudson.AbortException: script returned exit code 1.

PyTorch build failed hudson.AbortException: script returned exit code 1

Build PyTorch / Build PyTorch / Build PyTorch / Build PyTorch / Archive the artifacts

Error in archiveArtifacts step.

No artifacts found that match the file pattern "*.whl, test_artifacts.zip". Configuration error?
Build log
[2025-03-10T17:34:33.651Z] Archiving artifacts
[2025-03-10T17:34:34.262Z] ‘*.whl’ doesn’t match anything
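
The archive step fails only as a consequence of the earlier build failure: no wheel was produced, so nothing matches the pattern. A minimal sketch of a check that could run before archiving, assuming a POSIX shell, is shown below. `has_artifacts` is a hypothetical helper and not part of the pipeline; the patterns in the usage line are the ones quoted in the error message.

```shell
# Hypothetical pre-archive check (not part of the pipeline shown above).
# Usage: has_artifacts DIR PATTERN...
# Returns 0 if at least one existing file in DIR matches any PATTERN,
# and 1 otherwise.
has_artifacts() {
  dir=$1
  shift
  for pattern in "$@"; do
    # $pattern is left unquoted on purpose so the shell expands the glob.
    # If nothing matches, the literal pattern remains and the -e test fails.
    for f in "$dir"/$pattern; do
      if [ -e "$f" ]; then
        return 0
      fi
    done
  done
  return 1
}
```

For example, `has_artifacts . '*.whl' test_artifacts.zip || echo "nothing to archive"` would report the missing wheel explicitly instead of leaving it to the archive step.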

Details

  • Initialize (47 sec)
    • Initialize (46 sec)
      • Initialize (44 sec)
        • Node info (25 sec)
        • Download CI scripts (9.5 sec)
  • Build PyTorch (3 min 8 sec)
    • Build PyTorch (3 min 7 sec)
      • Build PyTorch (3 min 5 sec)
        • Node info (27 sec)
        • Checkout Pytorch (33 sec)
        • Check base Docker image existence (8.8 sec)
        • Pull Docker Image (2.7 sec)
        • Build Docker image (1 sec)
        • Build PyTorch (1 min 14 sec)
          Error: script returned exit code 1
          Error: PyTorch build failed hudson.AbortException: script returned exit code 1
          Error: No artifacts found that match the file pattern "*.whl, test_artifacts.zip". Configuration error?
  • Tests (12 sec)
    • Test PyTorch (6 ms)
      • Test PyTorch (5.3 sec)
    • Test Distributed (7 ms)
      • Test Distributed (5.3 sec)
    • Test Inductor (6 ms)
      • Test Inductor (5.4 sec)
    • Test PyTorch Slow (7 ms)
      • Test PyTorch Slow (5.4 sec)
    • Microbenchmark (11 sec)
      • Microbenchmark (5.4 sec)
  • Post Build (1 sec)
  • Declarative: Post Actions (2.7 sec)