Commit ed557c2

sharvil10, pre-commit-ci[bot], tylertitsworth and sramakintel authored
Fix SSH port by default in multinode container (#214)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Signed-off-by: Tyler Titsworth <tyler.titsworth@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: Srikanth Ramakrishna <srikanth.ramakrishna@intel.com>
1 parent e46f52e commit ed557c2

8 files changed, +113 -63 lines changed

pytorch/Dockerfile

+8 -29
@@ -85,9 +85,10 @@ RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missin
 ENV SIGOPT_PROJECT=.
 
 WORKDIR /
-COPY multinode-requirements.txt .
+COPY multinode/requirements.txt requirements.txt
 
-RUN python -m pip install --no-cache-dir -r multinode-requirements.txt
+RUN python -m pip install --no-cache-dir -r requirements.txt && \
+rm -rf requirements.txt
 
 ENV LD_LIBRARY_PATH="/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}:/usr/local/lib/python${PYTHON_VERSION}/dist-packages/oneccl_bindings_for_pytorch/opt/mpi/libfabric/lib:/usr/local/lib/python${PYTHON_VERSION}/dist-packages/oneccl_bindings_for_pytorch/lib"
 
@@ -99,16 +100,11 @@ RUN apt-get install -y --no-install-recommends --fix-missing \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*
 
-# Allow OpenSSH to talk to containers without asking for confirmation
-# hadolint global ignore=SC2002
-RUN mkdir -p /var/run/sshd && \
-cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \
-echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \
-mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config
+RUN mkdir -p /var/run/sshd
 
 ARG PYTHON_VERSION
 
-COPY generate_ssh_keys.sh .
+COPY multinode/generate_ssh_keys.sh /generate_ssh_keys.sh
 
 # modify generate_ssh_keys to be a helper script
 # print how to use helper script on bash startup
@@ -117,26 +113,9 @@ RUN echo "source /usr/local/lib/python${PYTHON_VERSION}/dist-packages/oneccl_bin
 cat '/generate_ssh_keys.sh' >> ~/.startup && \
 rm -rf /generate_ssh_keys.sh
 
-# hadolint global ignore=SC3037
-RUN echo -e "#!/bin/bash \n\
-set -e \n\
-set -a \n\
-source ~/.startup \n\
-set +a \n\
-eval \"\$@\"" >> /usr/local/bin/dockerd-entrypoint.sh && \
-chmod +x /usr/local/bin/dockerd-entrypoint.sh
-
-RUN echo 'HostKey /etc/ssh/ssh_host_dsa_key' > /var/run/sshd_config && \
-echo 'HostKey /etc/ssh/ssh_host_rsa_key' > /var/run/sshd_config && \
-echo 'HostKey /etc/ssh/ssh_host_ecdsa_key' > /var/run/sshd_config && \
-echo 'HostKey /etc/ssh/ssh_host_ed25519_key' > /var/run/sshd_config && \
-echo 'AuthorizedKeysFile /etc/ssh/authorized_keys' > /var/run/sshd_config && \
-echo '## Enable DEBUG log. You can ignore this but this may help you debug any issue while enabling SSHD for the first time' > /var/run/sshd_config && \
-echo 'LogLevel DEBUG3' > /var/run/sshd_config && \
-echo 'UsePAM yes' > /var/run/sshd_config && \
-echo 'LoginGraceTime 0' >> /var/run/sshd_config && \
-echo 'LoginGraceTime 0' >> /etc/ssh/sshd_config && \
-echo 'Subsystem sftp /usr/lib/openssh/sftp-server' > /var/run/sshd_config
+COPY multinode/dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
+COPY multinode/sshd_config /etc/ssh/sshd_config
+COPY multinode/ssh_config /etc/ssh/ssh_config
 
 RUN mkdir -p /licensing
 
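
In the removed `RUN` block, every `echo` except the two `LoginGraceTime` lines writes with `>`, which truncates `/var/run/sshd_config` before writing. Only the final `Subsystem sftp ...` line would therefore remain in the file that the old README passed to `sshd -f`. The minimal shell sketch below reproduces the effect in a temporary directory (the file name is a stand-in, not the real path) and contrasts it with appending every line, which yields the complete file, as the static `multinode/sshd_config` now copied into the image does.

```bash
#!/usr/bin/env bash
# Sketch: why chained `echo ... > file` keeps only the last line.
# "$cfg" is a stand-in for /var/run/sshd_config; only a temp dir is touched.
set -euo pipefail

demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cfg="$demo_dir/sshd_config"

# Old pattern: each `>` truncates the file before writing its line.
echo 'HostKey /etc/ssh/ssh_host_rsa_key' > "$cfg"
echo 'UsePAM yes' > "$cfg"
echo 'LoginGraceTime 0' >> "$cfg"
echo 'Subsystem sftp /usr/lib/openssh/sftp-server' > "$cfg"
truncated_lines=$(( $(wc -l < "$cfg") ))
last_line="$(tail -n 1 "$cfg")"

# Appending every line with `>>` keeps all of them.
: > "$cfg"
for line in 'HostKey /etc/ssh/ssh_host_rsa_key' 'UsePAM yes' \
            'LoginGraceTime 0' 'Subsystem sftp /usr/lib/openssh/sftp-server'; do
  echo "$line" >> "$cfg"
done
appended_lines=$(( $(wc -l < "$cfg") ))

echo "with '>':  ${truncated_lines} line(s), last: ${last_line}"
echo "with '>>': ${appended_lines} line(s)"
```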

pytorch/README.md

+67 -33
@@ -114,12 +114,8 @@ The images below additionally include [Intel® oneAPI Collective Communications
 | `2.0.0-pip-multinode` | [v2.0.0] | [v2.0.0+cpu] | [v2.0.0][ccl-v2.0.0] | [v2.1.1] | [v0.1.0] |
 
 > **Note:** Passwordless SSH connection is also enabled in the image.
-> The container does not contain the SSH ID keys. The user needs to mount those keys at `/root/.ssh/id_rsa` and `/root/.ssh/id_rsa.pub`.
-> User also need to append content of id_rsa.pub in `/etc/ssh/authorized_keys` in the SSH server container.
-> Since the SSH key is not owned by default user account in docker, please also do "chmod 644 id_rsa.pub; chmod 644 id_rsa" to grant read access for default user account.
-> Users could also use "/usr/bin/ssh-keygen -t rsa -b 4096 -N '' -f ~/mnt/ssh_key/id_rsa" to generate a new SSH Key inside the container.
-> Users need to mount a config file to list all hostnames at location `/root/.ssh/config` on the SSH client container.
-> Once all files are added
+> The container does not contain the SSH ID keys. The user needs to mount those keys at `/root/.ssh/id_rsa` and `/etc/ssh/authorized_keys`.
+> Since the SSH key is not owned by default user account in docker, please also do "chmod 600 authorized_keys; chmod 600 id_rsa" to grant read access for default user account.
 
 #### Setup and Run IPEX Multi-Node Container
 
@@ -131,8 +127,7 @@ SSH Server (Worker)
 
 SSH Client (Launcher)
 
-1. *Config File with Host IPs* : `/root/.ssh/config`
-2. *Private User Key* : `/root/.ssh/id_rsa`
+1. *Private User Key* : `/root/.ssh/id_rsa`
 
 To add these files correctly please follow the steps described below.
 
@@ -146,47 +141,33 @@ To add these files correctly please follow the steps described below.
 cat id_rsa.pub >> authorized_keys
 ```
 
-2. Add hosts to config
-
-The launcher container needs to have the a config file with all hostnames and ports specified. An example of a hostfile is provided below.
+2. Configure the permissions and ownership for all of the files you have created so far.
 
 ```bash
-touch config
+chmod 600 id_rsa config authorized_keys
+chown root:root id_rsa.pub id_rsa config authorized_keys
 ```
 
+3. Setup hostfile. The hostfile is needed for running torch distributed using `ipexrun` utility. If you're not using `ipexrun` you can skip this step.
+
 ```txt
-Host host1
-HostName <Hostname of host1>
-IdentitiesOnly yes
-Port <SSH Port>
-Host host2
-HostName <Hostname of host2>
-IdentitiesOnly yes
-Port <SSH Port>
+<Host 1 IP/Hostname>
+<Host 2 IP/Hostname>
 ...
 ```
 
-3. Configure the permissions and ownership for all of the files you have created so far.
-
-```bash
-chmod 600 id_rsa.pub id_rsa config authorized_keys
-chown root:root id_rsa.pub id_rsa config authorized_keys
-```
-
 4. Now start the workers and execute DDP on the launcher.
 
 1. Worker run command:
 
 ```bash
-export SSH_PORT=<SSH Port>
 docker run -it --rm \
 --net=host \
--v $PWD/authorized_keys:/root/.ssh/authorized_keys \
+-v $PWD/authorized_keys:/etc/ssh/authorized_keys \
 -v $PWD/tests:/workspace/tests \
 -w /workspace \
--e SSH_PORT=${SSH_PORT} \
 intel/intel-extension-for-pytorch:2.3.0-pip-multinode \
-bash -c '/usr/sbin/sshd -D -p ${SSH_PORT} -f /var/run/sshd_config'
+bash -c '/usr/sbin/sshd -D'
 ```
 
 2. Launcher run command:
@@ -195,12 +176,65 @@ To add these files correctly please follow the steps described below.
 docker run -it --rm \
 --net=host \
 -v $PWD/id_rsa:/root/.ssh/id_rsa \
--v $PWD/config:/root/.ssh/config \
 -v $PWD/tests:/workspace/tests \
+-v $PWD/hostfile:/workspace/hostfile \
 -w /workspace \
+intel/intel-extension-for-pytorch:2.3.0-pip-multinode \
+bash -c 'ipexrun cpu --nnodes 2 --nprocs-per-node 1 --master-addr 127.0.0.1 --master-port 3022 /workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl'
+```
+
+5. Start SSH server with a custom port.
+If the user wants to define their own port to start the SSH server, it can be done so using the commands described below.
+
+1. Worker command:
+
+```bash
+export SSH_PORT=<User SSH Port>
+docker run -it --rm \
+--net=host \
+-v $PWD/authorized_keys:/etc/ssh/authorized_keys \
+-v $PWD/tests:/workspace/tests \
 -e SSH_PORT=${SSH_PORT} \
+-w /workspace \
+intel/intel-extension-for-pytorch:2.3.0-pip-multinode \
+bash -c '/usr/sbin/sshd -D -p ${SSH_PORT}'
+```
+
+2. Add hosts to config. (**Note:** This is an optional step)
+
+User can optionally mount their own custom client config file to define a list of hosts and ports where the SSH server is running inside the container. An example of a hostfile is provided below. This file is supposed to be mounted in the launcher container at `/etc/ssh/ssh_config`.
+
+```bash
+touch config
+```
+
+```txt
+Host host1
+HostName <Hostname of host1>
+IdentitiesOnly yes
+IdentityFile ~/.root/id_rsa
+Port <SSH Port>
+Host host2
+HostName <Hostname of host2>
+IdentitiesOnly yes
+IdentityFile ~/.root/id_rsa
+Port <SSH Port>
+...
+```
+
+3. Launcher run command:
+
+```bash
+docker run -it --rm \
+--net=host \
+-v $PWD/id_rsa:/root/.ssh/id_rsa \
+-v $PWD/config:/etc/ssh/ssh_config \
+-v $PWD/hostfile:/workspace/hostfile \
+-v $PWD/tests:/workspace/tests \
+-e SSH_PORT=${SSH_PORT} \
+-w /workspace \
 intel/intel-extension-for-pytorch:2.3.0-pip-multinode \
-bash -c 'ipexrun cpu /workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl'
+bash -c 'ipexrun cpu --nnodes 2 --nprocs-per-node 1 --master-addr 127.0.0.1 --master-port ${SSH_PORT} /workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl'
 ```
 
 > [!NOTE]
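
The file preparation described in the README above (append the public key to `authorized_keys`, restrict permissions to `600`, write a hostfile with one host per line) can be sketched as a small script. This is a minimal sketch, assuming a key pair named `id_rsa`/`id_rsa.pub` already exists; the previous README revision suggested `/usr/bin/ssh-keygen -t rsa -b 4096 -N '' -f ~/mnt/ssh_key/id_rsa` for that. The helper names, the placeholder key contents and the `192.0.2.x` documentation addresses are illustrative and not part of the image.

```bash
#!/usr/bin/env bash
# Sketch of the README's file preparation, run in a temporary directory.
# prepare_ssh_files and write_hostfile are hypothetical helper names.
set -euo pipefail

# Append the public key to authorized_keys and restrict both secrets to 0600,
# mirroring `cat id_rsa.pub >> authorized_keys` and the README's chmod step.
prepare_ssh_files() {
  local dir="$1"
  cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  chmod 600 "$dir/id_rsa" "$dir/authorized_keys"
}

# Write one IP or hostname per line, the hostfile layout shown for ipexrun.
write_hostfile() {
  local out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Placeholder key material for the demo only; use a real key pair in practice.
printf 'placeholder-private-key\n' > "$work/id_rsa"
printf 'ssh-rsa AAAAplaceholder demo@example\n' > "$work/id_rsa.pub"

prepare_ssh_files "$work"
write_hostfile "$work/hostfile" 192.0.2.10 192.0.2.11
cat "$work/hostfile"
```

The README also runs `chown root:root` on these files; that needs root privileges and is left out of the sketch. Per the run commands, `authorized_keys` is mounted at `/etc/ssh/authorized_keys` on each worker, and `id_rsa` and `hostfile` at `/root/.ssh/id_rsa` and `/workspace/hostfile` on the launcher.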

pytorch/docker-compose.yaml

+1 -1
@@ -77,7 +77,7 @@ services:
 dependency.apt.libglib2: true
 dependency.apt.python3-dev: true
 dependency.pip.apt.virtualenv: true
-dependency.python.pip: multinode-requirements.txt
+dependency.python.pip: multinode/requirements.txt
 org.opencontainers.base.name: "intel/intel-optimized-pytorch:${IPEX_VERSION:-2.2.0}-${PACKAGE_OPTION:-pip}-base"
 org.opencontainers.image.title: "Intel® Extension for PyTorch MultiNode Image"
 org.opencontainers.image.version: ${IPEX_VERSION:-2.2.0}-${PACKAGE_OPTION:-pip}-multinode

pytorch/multinode/dockerd-entrypoint.sh

+21
@@ -0,0 +1,21 @@
+#!/bin/bash
+# Copyright (c) 2024 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -e
+set -a
+# shellcheck disable=SC1091
+source "$HOME/.startup"
+set +a
+"$@"
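
The new entrypoint sources `~/.startup` between `set -a` and `set +a`, so every variable assigned in that file is exported to the command passed as arguments. It then runs that command as `"$@"` rather than the `eval "$@"` of the removed inline version, which keeps each argument intact instead of re-parsing them as one string. A minimal sketch of the pattern, run in a temporary `HOME` so no Docker is needed (`DEMO_VAR` and the `.startup` contents are made up for the demo):

```bash
#!/usr/bin/env bash
# Sketch: reproduce the entrypoint's `set -a` / `source` / "$@" pattern
# in a throwaway HOME so its effect can be observed without Docker.
set -euo pipefail

demo_home="$(mktemp -d)"
trap 'rm -rf "$demo_home"' EXIT

# Illustrative ~/.startup; DEMO_VAR is a made-up variable for this demo.
printf 'DEMO_VAR=exported-by-startup\n' > "$demo_home/.startup"

# Same body as the new entrypoint (license header omitted).
cat > "$demo_home/entrypoint.sh" <<'EOF'
#!/bin/bash
set -e
set -a
source "$HOME/.startup"
set +a
"$@"
EOF

# Variant without `set -a`, for comparison.
sed '/^set [-+]a$/d' "$demo_home/entrypoint.sh" > "$demo_home/no_export.sh"

# The child `bash -c` is a separate process: it sees DEMO_VAR only if exported.
probe='echo "${DEMO_VAR:-unset}"'
with_export="$(HOME="$demo_home" bash "$demo_home/entrypoint.sh" bash -c "$probe")"
without_export="$(HOME="$demo_home" bash "$demo_home/no_export.sh" bash -c "$probe")"

echo "with set -a:    $with_export"
echo "without set -a: $without_export"
```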
File renamed without changes.
File renamed without changes.

pytorch/multinode/ssh_config

+4
@@ -0,0 +1,4 @@
+Host *
+Port 3022
+IdentityFile ~/.ssh/id_rsa
+StrictHostKeyChecking no
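
This `ssh_config` applies `Port 3022`, `IdentityFile ~/.ssh/id_rsa` and `StrictHostKeyChecking no` to every host. When the README's optional custom client config is mounted at `/etc/ssh/ssh_config`, the bind mount replaces this file, so those defaults no longer apply unless the custom file repeats them. The sketch below generates one `Host` block per worker from `alias,hostname,port` triples. `write_ssh_config`, the triple format and the `192.0.2.x` addresses are assumptions for illustration, and `IdentityFile` points at `~/.ssh/id_rsa` because that is where the launcher command mounts the key.

```bash
#!/usr/bin/env bash
# Sketch: build an ssh client config with one Host block per worker.
# write_ssh_config and the alias,hostname,port argument format are hypothetical.
set -euo pipefail

write_ssh_config() {
  local out="$1"
  shift
  : > "$out"
  local entry name host port
  for entry in "$@"; do
    IFS=, read -r name host port <<< "$entry"
    # Repeat the image's `Host *` defaults, since a mounted file replaces them.
    printf 'Host %s\n  HostName %s\n  Port %s\n  IdentityFile ~/.ssh/id_rsa\n  IdentitiesOnly yes\n  StrictHostKeyChecking no\n' \
      "$name" "$host" "$port" >> "$out"
  done
}

cfg_dir="$(mktemp -d)"
trap 'rm -rf "$cfg_dir"' EXIT
write_ssh_config "$cfg_dir/config" host1,192.0.2.10,2222 host2,192.0.2.11,2222
cat "$cfg_dir/config"
```

After mounting the result in the launcher, `ssh -G host1` prints the settings OpenSSH resolves for that alias, which is a quick way to confirm the port before running `ipexrun`.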

pytorch/multinode/sshd_config

+12
@@ -0,0 +1,12 @@
+HostKey /etc/ssh/ssh_host_dsa_key
+HostKey /etc/ssh/ssh_host_rsa_key
+HostKey /etc/ssh/ssh_host_ecdsa_key
+HostKey /etc/ssh/ssh_host_ed25519_key
+AuthorizedKeysFile /etc/ssh/authorized_keys
+## Enable DEBUG log. You can ignore this but this may help you debug any issue while enabling SSHD for the first time
+LogLevel DEBUG3
+Port 3022
+UsePAM yes
+Subsystem sftp /usr/lib/openssh/sftp-server
+# https://ubuntu.com/security/CVE-2024-6387
+LoginGraceTime 0
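
With `Port 3022` now set in the image's `sshd_config`, the README's plain worker command `sshd -D` listens on 3022, while the custom-port variant adds `-p ${SSH_PORT}`. The sshd(8) manual states that ports given with the `Port` option in the configuration file are ignored when a port is specified on the command line. The sketch below mirrors that decision without starting a daemon. `build_sshd_cmd` and `effective_port` are hypothetical helpers, `2222` is an arbitrary example port, and the lookup assumes a single uncommented `Port` line.

```bash
#!/usr/bin/env bash
# Sketch: which port the worker's sshd would listen on, without starting it.
# build_sshd_cmd and effective_port are hypothetical helpers.
set -euo pipefail

# Mirrors the README: plain `sshd -D`, or `sshd -D -p $SSH_PORT` when SSH_PORT is set.
build_sshd_cmd() {
  local cmd=(/usr/sbin/sshd -D)
  if [[ -n "${SSH_PORT:-}" ]]; then
    cmd+=(-p "$SSH_PORT")
  fi
  echo "${cmd[*]}"
}

# A command-line -p port makes sshd ignore Port lines in the config;
# otherwise the config's Port applies. Assumes a single `Port` line.
effective_port() {
  local config="$1"
  if [[ -n "${SSH_PORT:-}" ]]; then
    echo "$SSH_PORT"
  else
    awk '$1 == "Port" { print $2 }' "$config"
  fi
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
printf 'AuthorizedKeysFile /etc/ssh/authorized_keys\nPort 3022\nUsePAM yes\n' > "$tmp/sshd_config"

default_cmd="$(unset SSH_PORT; build_sshd_cmd)"
default_port="$(unset SSH_PORT; effective_port "$tmp/sshd_config")"
custom_cmd="$(SSH_PORT=2222 build_sshd_cmd)"
custom_port="$(SSH_PORT=2222 effective_port "$tmp/sshd_config")"

echo "$default_cmd -> port $default_port"
echo "$custom_cmd -> port $custom_port"
```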
