
Add ramalama image built on Fedora using Fedora's rocm packages #596

Open · wants to merge 5 commits into base: main

Conversation

maxamillion (Collaborator)

Add a ramalama image built on Fedora using Fedora's ROCm packages. This enables more of the embedded GPU models in the Ryzen APUs (more gfx series) than are available in the official ROCm packages from AMD that are used in the UBI-based default images.

@rhatdan (Member) commented Jan 16, 2025

Do we actually need a separate Containerfile, or could we just do this with a podman build --from fedora ...? Or, I guess, to make it more container-engine agnostic, use BUILDARGS.

@@ -0,0 +1,8 @@
FROM registry.fedoraproject.org/fedora:latest

COPY ../scripts /scripts
@ericcurtin (Collaborator) Jan 16, 2025

I know it's not obvious, apologies, but "ramalama" means use a generic driver, aka Vulkan/Kompute. I think what we could do here is:

/scripts/build_llama_and_whisper.sh "rocm" "fedora"

with the first parameter being the GPU/AI driver (rocm in this case) and the second parameter could be fedora, ubi, etc.
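A minimal sketch of what that two-parameter interface could look like. This is hypothetical: the function name and the Fedora-specific branch are illustrative, and the package list is taken from the diff elsewhere in this PR, not from the actual script.

```shell
#!/bin/bash
# Hypothetical sketch of the proposed interface:
#   build_llama_and_whisper.sh <driver> <distro>
# It echoes the install commands rather than running them.
dnf_install_gpu() {
    local driver="$1"
    local distro="${2:-ubi}"
    case "$driver" in
    rocm)
        echo "dnf install -y rocm-dev hipblas-devel rocblas-devel"
        if [ "$distro" = "fedora" ]; then
            echo "# plus any Fedora-specific ROCm setup here"
        fi
        ;;
    ramalama)
        # generic driver, aka vulkan/kompute
        echo "dnf install -y mesa-vulkan-drivers"
        ;;
    *)
        echo "unknown driver: $driver" >&2
        return 1
        ;;
    esac
}

dnf_install_gpu rocm fedora
```

The real script would run dnf directly; echoing here just makes the dispatch logic visible.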

Member

SGTM

@maxamillion (Collaborator, Author)

> Do we actually need a separate Containerfile or could we just do this with a podman build --from fedora ... or I guess to make it more container engine agnostic use BUILDARGS.

That's probably a better option, I'll go that route.

@@ -36,7 +36,11 @@ dnf_install() {
dnf install -y asahi-repos
dnf install -y mesa-vulkan-drivers "${vulkan_rpms[@]}" "${rpm_list[@]}"
elif [ "$containerfile" = "rocm" ]; then
dnf install -y rocm-dev hipblas-devel rocblas-devel
if [ "${ID}" = "fedora" ]; then
Collaborator

Much nicer :)

@@ -1,8 +1,10 @@
FROM registry.fedoraproject.org/fedora:latest

RUN dnf update -y --refresh && dnf clean all
Collaborator

This step doesn't seem unique to this Containerfile or anything, no big deal though

@ericcurtin (Collaborator) left a comment

Merging on green build

@maxamillion (Collaborator, Author)

@ericcurtin would you prefer I converge the two rocm Containerfiles and use --build-args as @rhatdan suggested or would that be a separate PR?

@ericcurtin (Collaborator)

@maxamillion I don't mind, if it works, you'll probably have to edit container_build.sh for that technique

@ericcurtin (Collaborator) commented Jan 16, 2025

The build failed because the container image is too large. We probably want to remove some older GPUs from it; for example, on the UBI container images we rm'd the *gfx9* files to trim the image. These are GPUs from before roughly 2018.

@ericcurtin (Collaborator) commented Jan 16, 2025

@debarshiray and @owtaylor I remember you guys wanted some Fedora based RamaLama images. Well... It's coming it seems 😄 Thanks to @maxamillion

@@ -115,7 +124,8 @@ main() {
dnf clean all
rm -rf /var/cache/*dnf* /opt/rocm-*/lib/llvm \
/opt/rocm-*/lib/rocblas/library/*gfx9* \
/opt/rocm-*/lib/hipblaslt/library/*gfx9* llama.cpp whisper.cpp
/opt/rocm-*/lib/hipblaslt/library/*gfx9* llama.cpp whisper.cpp \
/usr/lib64/rocm/gfx9*
@ericcurtin (Collaborator) Jan 17, 2025

This might make it small enough :)

It's actually a restriction on the storage space available on our GitHub runner, but it's a welcome restriction; we don't want 20GB container images.

One could always contribute a rocm-legacy-fedora type image for gfx9 if they had the motivation to do so.

@ericcurtin (Collaborator) Jan 17, 2025

We should actually simplify the above line to:

/opt/rocm-*/lib/*/library/*gfx9*

not necessary in this PR or anything.

Collaborator Author

we'll find out if it's small enough soon! 😄

Collaborator Author

@ericcurtin it appears that the build is still angry about it 🤔

@ericcurtin (Collaborator) Jan 17, 2025

Yeah, I guess you just gotta "podman run --rm -it <image> bash" into it and look around to see what's big; removing gfx9 was enough for the UBI versions. Trim, test to make sure you didn't break the ROCm support, then add the "rm -fr" here.

Collaborator Author

@ericcurtin question - what's the size goal I'm trying to hit?

@ericcurtin (Collaborator) Jan 17, 2025

I can't remember exactly. The ones pushed here are built in this CI and are within the limit (https://quay.io/repository/ramalama/rocm?tab=tags). The ROCm images are the only images that suffer from this; I've never had to trim a CUDA one, for example.

@ericcurtin (Collaborator) Jan 17, 2025

It's arguably worth logging an issue in ROCm upstream. It's unlikely all those *gfx* files we trim are completely unique; I'm sure there's loads of duplicate data in those files that could be shared via some common files or some other form of de-duplication.
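As a quick sanity check of that hunch (a sketch, not part of this PR), one could hash the library files and list byte-identical ones; GNU uniq's -w limits the comparison to the 64-hex-digit sha256, so grouped duplicates print together. The function name and example path are illustrative.

```shell
# List groups of byte-identical files under a directory tree.
# Whole-file duplicates are the crudest form of shareable data;
# partial/chunk-level duplication would need a finer-grained scan.
duplicate_files() {
    find "$1" -type f -exec sha256sum {} + | sort | uniq -w64 -D
}

# e.g. duplicate_files /opt/rocm-*/lib/rocblas/library
```

If this prints nothing, the duplication (if any) is inside the files rather than across them.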

Collaborator

It leaves a bad first impression of ramalama if users have to pull huge container images just to get going; some people won't wait around for hours for a large image to be pulled.

Collaborator

We could also do container images like rocm-gfx9-fedora, rocm-gfx10-fedora, etc. We may be able to detect what generation we are pre-pull

@ericcurtin (Collaborator) commented Jan 17, 2025

UBI has nothing lower than gfx9. @maxamillion I see you found gfx8; if there's gfx7 or lower, it's likely worth removing also.

@maxamillion (Collaborator, Author)

@ericcurtin nothing older than gfx8, but it's still a lot of space (granted, it was 14G before I pulled both gfx9* and gfx8* out)

[root@a854647ecf79 /]# ls /usr/lib64/rocm
gfx10  gfx11  gfx1100  gfx1103
[root@a854647ecf79 /]# du -sh /usr/lib64/rocm/
5.3G    /usr/lib64/rocm/

@ericcurtin (Collaborator)

Looks like you are almost there. If only we could rank the GPU models by units sold or something and trim based on least units sold.

@ericcurtin (Collaborator) commented Jan 17, 2025

Could also be interesting to see if there is further compression we can do beyond whatever podman build does by default. @rhatdan, any compression techniques we could try off the top of your head?

@ericcurtin (Collaborator)

The first gfx10 models seem to have been released in 2019, FWIW.

@ericcurtin (Collaborator)

Maybe we will be lucky and this will be small enough to go green

@ericcurtin (Collaborator) commented Jan 17, 2025

Separate images for gfx8, gfx9, gfx10, and gfx11 would be a welcome addition to be honest; it would improve UX with faster pulls. We may be able to detect which image we need on a given machine by querying /proc, /sys, /dev, etc. before pulling. Also, any GPU with < 1GB VRAM has incredibly low value in enabling; CPU inferencing is often better when VRAM is this low (maybe even < 2GB, 4GB, 8GB VRAM). An 8 GB VRAM GPU is considered small these days.

@maxamillion (Collaborator, Author)

well bummer ... even splitting them all out the builds are too large 🤔

@ericcurtin (Collaborator)

We could just use the exact same list of GPUs as the UBI9 images and enable additional ones on demand as people request.

@maxamillion maxamillion changed the title Add ramalama image built on Fedora using Fedora's rocm packages WIP: Add ramalama image built on Fedora using Fedora's rocm packages Jan 21, 2025
@maxamillion (Collaborator, Author)

@ericcurtin I don't see a way to trim this down further, I'm open to suggestion though.

This is some sample output from the rocm-fedora-gfx11 image.

[root@167cf24d485c /]# du -sh /usr/* | grep G
8.6G    /usr/lib64

[root@167cf24d485c /]# du -sh /usr/lib64/* | grep G
1.1G    /usr/lib64/librocblas.so.4.2
1.9G    /usr/lib64/librocsparse.so.1.0
3.0G    /usr/lib64/rocm

[root@167cf24d485c /]# du -sh /usr/lib64/rocm/* | grep G
2.3G    /usr/lib64/rocm/gfx11

[root@167cf24d485c /]# du -sh /usr/lib64/rocm/gfx11/* | grep G
2.3G    /usr/lib64/rocm/gfx11/lib

[root@167cf24d485c /]# du -sh /usr/lib64/rocm/gfx11/lib/*
32K     /usr/lib64/rocm/gfx11/lib/cmake
4.0K    /usr/lib64/rocm/gfx11/lib/libhipblas.so
4.0K    /usr/lib64/rocm/gfx11/lib/libhipblas.so.2
996K    /usr/lib64/rocm/gfx11/lib/libhipblas.so.2.2
4.0K    /usr/lib64/rocm/gfx11/lib/librocblas.so
4.0K    /usr/lib64/rocm/gfx11/lib/librocblas.so.4
486M    /usr/lib64/rocm/gfx11/lib/librocblas.so.4.2
4.0K    /usr/lib64/rocm/gfx11/lib/librocsolver.so.0
359M    /usr/lib64/rocm/gfx11/lib/librocsolver.so.0.2
4.0K    /usr/lib64/rocm/gfx11/lib/librocsparse.so.1
830M    /usr/lib64/rocm/gfx11/lib/librocsparse.so.1.0
589M    /usr/lib64/rocm/gfx11/lib/rocblas

Based on the output, there's not really anything in there we can rip out based on specific GPU models. It's all just bundled into giant shared object files 😕

@maxamillion (Collaborator, Author)

@ericcurtin in a shocking turn of events, if I set rawhide as the base image the rocm-fedora-gfx11 image is about 4 GB smaller than if I build with Fedora 41 🤷 .... any reservations with switching everything to rawhide for now?

@maxamillion (Collaborator, Author)

> @ericcurtin in a shocking turn of events, if I set rawhide as the base image the rocm-fedora-gfx11 image is about 4 GB smaller than if I build with Fedora 41 🤷 .... any reservations with switching everything to rawhide for now?

lol nvm .. it's smaller because cmake failed 🤦

@rhatdan (Member) commented Jan 22, 2025

I am fine with that, do you know why it is smaller?

@maxamillion (Collaborator, Author)

> I am fine with that, do you know why it is smaller?

because cmake failed 🤦

@ericcurtin (Collaborator)

No issue with rawhide, eventually we can call it Fedora 42

@ericcurtin (Collaborator)

This is an isolated image, so that helps

@ericcurtin (Collaborator) commented Feb 4, 2025

$ grep gfx /sys/class/kfd/kfd/topology/nodes/*/properties
/sys/class/kfd/kfd/topology/nodes/0/properties:gfx_target_version 0
/sys/class/kfd/kfd/topology/nodes/1/properties:gfx_target_version 110002
/sys/class/kfd/kfd/topology/nodes/2/properties:gfx_target_version 90012

there's a little decimal to hex conversion to get "02" and "0c"

Yes, it's:

major_ver = int((device_id / 10000) % 100)
minor_ver = int((device_id / 100) % 100)
stepping_ver = int(device_id % 100)

... and then:

"gfx" + format(major_ver, 'd') + format(minor_ver, 'x') + format(stepping_ver, 'x')

> Did you figure out how to filter out your smaller GPU?

Yes it's code elsewhere in RamaLama:

    # ROCm/AMD case (relies on "import glob"): pick the GPU with the
    # most VRAM, skipping anything with 1 GiB of VRAM or less
    i = 0
    gpu_num = 0
    gpu_bytes = 0
    for fp in sorted(glob.glob('/sys/bus/pci/devices/*/mem_info_vram_total')):
        with open(fp, 'r') as file:
            content = int(file.read())
            if content > 1073741824 and content > gpu_bytes:
                gpu_bytes = content
                gpu_num = i

        i += 1

basically: ignore every GPU with 1 GiB of VRAM or less and choose the one with the most VRAM.

dnf install -y "$url"
crb enable # this is in epel-release, can only install epel-release via url
dnf --enablerepo=ubi-9-appstream-rpms install -y "${rpm_list[@]}"
if [[ "${ID}" == "rhel" || "${ID}" == "redhat" || "${ID}" == "centos" ]]; then
Collaborator

I think we'd catch all these with:

PLATFORM_ID

Collaborator

Or even just checking that ID is not "fedora".

@@ -1,4 +1,5 @@
#!/bin/bash
source /etc/os-release
Collaborator

Normally I like to keep everything in the main() function.

@ericcurtin (Collaborator) commented Feb 14, 2025

I'd merge this, only it still has "WIP:" in the title. I haven't checked the CI logs yet to see if all of these were actually built.

@ericcurtin (Collaborator)

But LGTM in general

@ericcurtin (Collaborator)

My suspicions were right: the builds for the container images were turned off. We should turn them on again.

@@ -0,0 +1,11 @@
FROM registry.fedoraproject.org/fedora:latest
Member

Rather than have 4 Containerfiles that we need to manage, can we just consolidate this into one ROCm Containerfile and then manage it with container_build.sh and build_llama_and_whisper.sh?

These Containerfiles have very little difference that I can see, other than which files are being removed for size.

If users say make build IMAGE=rocm-fedora-gfx10, then that should be enough to change the rm -fr ... line and potentially the FROM line.
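A sketch of how container_build.sh could derive build arguments from such an IMAGE name. Everything here is hypothetical: the function name, the GFX_KEEP build arg, and the base-image choices are illustrative, not the real script.

```shell
#!/bin/bash
# Hypothetical: map an IMAGE name like rocm-fedora-gfx10 to the
# build args a single consolidated ROCm Containerfile would need.
build_args_for_image() {
    local image="$1"
    case "$image" in
    rocm-fedora-gfx*)
        local series="${image##*-}"   # e.g. gfx10
        echo "--build-arg BASE=registry.fedoraproject.org/fedora:latest --build-arg GFX_KEEP=$series"
        ;;
    rocm)
        echo "--build-arg BASE=registry.access.redhat.com/ubi9/ubi"
        ;;
    *)
        echo ""
        ;;
    esac
}

build_args_for_image rocm-fedora-gfx10
```

The Containerfile would then consume GFX_KEEP to decide which /usr/lib64/rocm/gfx* trees to rm, and BASE to pick the FROM image.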

@ericcurtin (Collaborator) Feb 16, 2025

I see what you mean @rhatdan. I'd just add another argument to the build_llama_and_whisper.sh script if it were me; it would remove these lines:

RUN dnf update -y --refresh && dnf clean all
...
ENV WHISPER_CPP_SHA=${WHISPER_CPP_SHA}
ENV LLAMA_CPP_SHA=${LLAMA_CPP_SHA}

@ericcurtin (Collaborator)

PR to build all container image builds in CI:

#831

@@ -0,0 +1,11 @@
FROM registry.fedoraproject.org/fedora:latest
Collaborator

Would prefer to pin :latest to a fixed Fedora version and let renovate update it.

Signed-off-by: Adam Miller <admiller@redhat.com>
@maxamillion maxamillion changed the title WIP: Add ramalama image built on Fedora using Fedora's rocm packages Add ramalama image built on Fedora using Fedora's rocm packages Feb 26, 2025
@maxamillion (Collaborator, Author)

@ericcurtin @rhatdan sorry for the delay. I finally had time to refactor this based on feedback, and it should be ready for review now.

@rhatdan (Member) commented Feb 26, 2025

@maxamillion Are you thinking we will have 4 or more ROCM images?

;;
gfx11)
rm -fr /usr/lib64/rocm/gfx8* /usr/lib64/rocm/gfx9* /usr/lib64/rocm/gfx10* && \
ln -s /usr/lib64/rocm/gfx1103/lib/rocblas/library/TensileLibrary_lazy_gfx1103.dat \
Collaborator

Interesting that you figured out we need to do this; worth alerting Tom Rix, as it might be better to fix this in the RPM package. Worth checking whether this is needed for UBI9 also.

Collaborator Author

@ericcurtin that isn't needed in the Rawhide and Fedora 42 nightlies of ROCm, but support for gfx1103 is considered experimental in Fedora 41, so I'm not sure where the fix is appropriate.


Yes. gfx1103 was experimental in F41.
gfx1035,gfx1103,gfx1150,gfx1151,gfx1152,gfx1200,gfx1201 were added in F42.

@maxamillion (Collaborator, Author)

> @maxamillion Are you thinking we will have 4 or more ROCM images?

Unfortunately, I'm not sure. I wanted it to be all-in-one, but that made the container image massive, and it was suggested that we break it out, so I did. I don't know how many Fedora ROCm images we'll need over time, but I suspect at least one per gfx major release series unless we can find an alternative way of making the ROCm stack smaller in the Fedora builds.

@rhatdan (Member) commented Feb 27, 2025

OK, we can start by building multiple images, but we need a naming standard, and we need to make the GPU check pull the right image. Also, we should make the shared base layer as big as possible.

@ericcurtin (Collaborator)

I dunno by how much or which one, but it appears one of the images is still too big.
