Commit 9650481

Kayobe Automation authored and assumptionsandg committed
Add DOCA install playbook
1 parent e26298b commit 9650481

4 files changed (+69 -26 lines)


doc/source/contributor/ofed.rst

+38 -23
@@ -4,19 +4,17 @@ OFED
 
 Warning: Experimental workflow subject to change
 
-This section documents the workflow for building OFED packages for Release train integration.
-
-The workflow builds the OFED kernel modules against the latest available kernel in Release train
-(as configured in SKC) and compiles them into RPM packages to be uploaded to Ark. Additionally,
-this workflow downloads the userspace OFED packages from the Nvidia repository and uploads these
-to Ark.
+The Nvidia DOCA framework is distributed as part of StackHPC Release Train for OFED driver support.
+This repository is synced into Ark as part of the Release Train workflows; however, to ensure
+compatibility with Release Train packages, we are required to build OFED modules with support for
+the latest Release Train kernel.
 
 Workflow
 ========
 
 The workflow uses workflow_dispatch to manually request an OFED build, which will deploy a builder
 VM, apply kayobe config to the builder, upgrade the kernel, reboot, then run two Ansible playbooks
-for building and uploading OFED to Ark.
+for building and uploading OFED modules to Ark.
 
 Pre-requisites
 --------------
@@ -25,31 +23,48 @@ Before building OFED packages, the workflow will ensure that:
 
 * A full distro-sync has taken place, ensuring the kernel is upgraded.
 
-* The bootloader has been configured to use the latest kernel
+* The bootloader has been configured to use the latest kernel (reset-bls-entries.yml)
 
 * noexec is disabled in the temporary logical volume.
 
 build-ofed
 ----------
 
-Currently we only support building Rocky Linux 9 OFED packages.
-
-In order to set up OFED, we're required to build kernel modules for the OFED drivers as
-the kernels we provide in release train are unsupported by OFED. To accomplish this we
-will need to use the doca-kernel-support from the doca-extra repository.
+Currently we only support building Rocky Linux 9 OFED kernel module packages.
 
-We will need to install dependencies in order to build the OFED kernel modules, and these
-are installed at the beginning of the build playbook. We also install base and appstream
-dependencies of userspace OFED packages here, this is intended to stop these dependencies
-being pulled in later when we download the OFED packages from the doca-host repository.
+The Build OFED module workflow will check that the filesystem is configured (noexec disabled)
+to allow the DOCA build script to run. The workflow will also install any necessary dependencies
+for the module build.
 
-At the end of the playbook following the kernel module build, the OFED userspace packages
-are downloaded from the upstream repository in order to upload these to Ark.
+The build script will output a ``doca-kernel-repo`` RPM which contains all kernel modules built
+as part of the workflow. When this RPM is installed, the repofile is created pointing to the
+modules in `/usr/share/doca-host-<doca-version>/Modules/<kernel-version>/` on the host.
 
 push-ofed
 ---------
 
-As we're not syncing OFED from any upstream source, and are instead creating our own
-repository of custom packages, we will be required to setup the Pulp distribution/publication
-and upload the content directly to Ark. This playbook uses the Pulp CLI to upload the RPMs
-to Ark.
+As mentioned above, the DOCA repository is synced into the `doca` repository in Ark. This workflow
+will upload the ``doca-kernel-repo`` RPM to a separate repository named `doca-modules`. The version
+for this repository is set in `pulp-repo-versions.yml` and is disabled for local pulp syncs by
+default.
+
+Install process
+===============
+
+Pre-requisites
+--------------
+
+* Ensure the OFED hosts are upgraded with the latest packages in the point release.
+
+* The bootloader has been configured to use the latest kernel (reset-bls-entries.yml)
+
+install-doca
+------------
+
+A playbook is provided to install DOCA on hosts in the `mlnx` group. Ensure this group
+is configured to include the hosts you wish to install DOCA on. To run the install
+playbook:
+
+.. code-block:: console
+
+   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/install-doca.yml
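
The new install-doca section relies on an `mlnx` inventory group that this commit does not define. A minimal sketch of how such a group might be declared, assuming a YAML-format inventory file is acceptable in the deployment and that the target hosts already belong to a group named `compute`. Both the file name and the `compute` parent group are illustrative assumptions, not part of this commit:

```yaml
# Hypothetical file: $KAYOBE_CONFIG_PATH/inventory/mlnx.yml
# Assumption: the hosts that need DOCA are already members of "compute".
# Adjust the child group (or list hosts directly) to match the deployment.
mlnx:
  children:
    compute: {}
```

Making `mlnx` a parent of an existing group avoids listing hosts twice; a deployment where only some hosts carry Mellanox/Nvidia NICs would list those hosts under `hosts:` instead.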

etc/kayobe/ansible/install-doca.yml

+28
@@ -0,0 +1,28 @@
+---
+- name: Install DOCA
+  become: true
+  hosts: mlnx
+  gather_facts: true
+  tasks:
+    - name: Get running kernel
+      ansible.builtin.command:
+        cmd: "uname -r"
+      register: kernel
+
+    - name: Install kernel repo
+      ansible.builtin.dnf:
+        name: doca-kernel-repo
+        state: latest
+        update_cache: true
+
+    - name: Ensure correct priority for DOCA modules
+      ansible.builtin.lineinfile:
+        line: "priority=-2"
+        insertafter: EOF
+        path: "/etc/yum.repos.d/doca-kernel-{{ kernel.stdout }}.repo"
+
+    - name: Install DOCA OFED
+      ansible.builtin.dnf:
+        name: doca-ofed
+        state: latest
+        update_cache: true
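
Because the play sets `gather_facts: true`, the running kernel release is also exposed as the `ansible_facts.kernel` fact, so the separate `uname -r` task is not strictly required. The sketch below shows the priority task rewritten on that basis. It is a possible simplification for discussion, not what the commit does:

```yaml
# Sketch only: the same lineinfile task as in install-doca.yml, but reading
# the kernel release from gathered facts rather than a registered command.
- name: Ensure correct priority for DOCA modules
  ansible.builtin.lineinfile:
    line: "priority=-2"
    insertafter: EOF
    path: "/etc/yum.repos.d/doca-kernel-{{ ansible_facts.kernel }}.repo"
```

If the command task is kept as committed, adding `changed_when: false` to it would stop the read-only `uname -r` call from being reported as a change on every run.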

etc/kayobe/dnf.yml

+2 -2
@@ -62,9 +62,9 @@ dnf_custom_repos_doca:
     password: "{{ stackhpc_repo_mirror_password | default(omit, true) }}"
   doca-modules:
     baseurl: "{{ stackhpc_repo_rhel9_doca_modules_url }}"
-    description: "OFED Kernel modules for DOCA {{ stackhpc_pulp_doca_version }} - RHEL $releasever"
+    description: "OFED Kernel module repository for DOCA {{ stackhpc_pulp_doca_version }} - RHEL $releasever"
     enabled: "{{ dnf_enable_doca_modules | bool | default(false) }}"
-    priority: -2
+    priority: -1
     file: doca
     gpgcheck: no
     username: "{{ stackhpc_repo_mirror_username | default(omit, true) }}"
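
The `doca-modules` repository entry above is gated on `dnf_enable_doca_modules`. Hosts that should pull the `doca-kernel-repo` RPM from it would presumably need that variable set to true. A minimal sketch of such an override, where the file location is an assumption (any Kayobe configuration file that is loaded for the target hosts would do):

```yaml
# Assumed location: etc/kayobe/dnf.yml or an environment-specific override.
# Turns on the doca-modules entry defined under dnf_custom_repos_doca.
dnf_enable_doca_modules: true
```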

etc/kayobe/pulp-repo-versions.yml

+1 -1
@@ -52,4 +52,4 @@ stackhpc_pulp_repo_ubuntu_jammy_version: 20240924T064114
 stackhpc_pulp_repo_rhel_9_4_doca_version: 20241211T153620
 stackhpc_pulp_repo_rhel_9_4_doca_modules_version: 20241213T112245
 stackhpc_pulp_repo_rhel_9_5_doca_version: 20241211T171301
-stackhpc_pulp_repo_rhel_9_5_doca_modules_version: 20241213T112245
+stackhpc_pulp_repo_rhel_9_5_doca_modules_version: 20250115T150314
