Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(docker): remove artifacts from base CUDA image #5830

Merged
merged 1 commit into from
Mar 6, 2025

Conversation

amadeuszsz
Copy link
Contributor

Description

Relax about 1.7 GB space.

How was this PR tested?

Notes for reviewers

Due to type of CI runner, there is no unit tests which can use ML artifacts.

Effects on system behavior

None.

Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Copy link

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@amadeuszsz amadeuszsz self-assigned this Feb 28, 2025
@amadeuszsz amadeuszsz added the type:containers Docker containers, containerization of components, or container orchestration. label Feb 28, 2025
@amadeuszsz
Copy link
Contributor Author

@YoshiRi @kminoda
Please confirm if we can remove artifacts from our Docker images.

@xmfcx
Copy link
Contributor

xmfcx commented Mar 5, 2025

Wow I didn't even know they were a part of the images :o

@amadeuszsz amadeuszsz marked this pull request as ready for review March 5, 2025 08:36
@amadeuszsz amadeuszsz added the tag:run-health-check Run health-check label Mar 5, 2025
Copy link
Member

@youtalk youtalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only does the learning models disappear from the development containers, but also from the runtime containers. Are we sure that’s okay?

@xmfcx https://github.com/orgs/autowarefoundation/discussions/5007#discussioncomment-10086717

@xmfcx
Copy link
Contributor

xmfcx commented Mar 5, 2025

@youtalk -san, I remember that discussion, but I thought they were already removed since then 😕

Not only does the learning models disappear from the development containers, but also from the runtime containers. Are we sure that’s okay?

I'm OK with adding them back to the runtime containers, either in this PR or a new PR.

I'm not sure if this would fix it but currently the docker-build workflow is not working so we need a solution soon:

@amadeuszsz
Copy link
Contributor Author

We discussed it in [TIER IV INTERNAL LINK] and seems there is no need to deploy OSS images with artifacts. I would like to keep artifacts for unit tests purposes, but AFAIK, we don't plan to use CI runners with CUDA runtime support.

Copy link
Member

@youtalk youtalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then LGTM

@youtalk youtalk enabled auto-merge (squash) March 6, 2025 00:31
@youtalk
Copy link
Member

youtalk commented Mar 6, 2025

We need to update autoware-base first, so I’m working on it now.
https://github.com/autowarefoundation/autoware/actions/runs/13689030397
After that, once the health-check passes, we’ll merge this PR.

@youtalk
Copy link
Member

youtalk commented Mar 6, 2025

@youtalk youtalk merged commit a95f45f into autowarefoundation:main Mar 6, 2025
34 of 38 checks passed
@mitsudome-r
Copy link
Member

@amadeuszsz Did you check where the artifacts were installed in the old image? I couldn't find any.

@amadeuszsz
Copy link
Contributor Author

@amadeuszsz Did you check where the artifacts were installed in the old image? I couldn't find any.

@mitsudome-r
Please check /root/autoware_data in universe-cuda image.

@xmfcx
Copy link
Contributor

xmfcx commented Mar 6, 2025

To check what occupies most size easily, you can run the following in the docker container:

sudo apt update && sudo apt install ncdu -y
cd /
ncdu

And you can navigate with arrow keys and enter and backspace keys. It will sort all the folders from large to small. This should make it easier to find big blobs.

@xmfcx
Copy link
Contributor

xmfcx commented Mar 7, 2025

So there has been a lot of misunderstandings from my part and some miscommunications.

Until yesterday, I thought:
We had artifacts in both devel and runtime images.
This PR removes them from the devel image.

But in reality:
We had artifacts in only runtime images.
This PR removes them from the runtime image.

So this PR doesn't change anything from the autoware core/universe CI perspective. It only effects the docker image building workflow size requirements.

So this could be reverted. But I don't mind either way.

@amadeuszsz
Copy link
Contributor Author

amadeuszsz commented Mar 7, 2025

@xmfcx
For me it could be reverted if we have unit tests possible. Other than that, I don't see any reason to store artifacts in docker images, but correct me if I'm wrong. We can only expect the size of artifacts will grow.

@xmfcx
Copy link
Contributor

xmfcx commented Mar 7, 2025

Thing is, the image that you've modified universe-base-cuda, which influences the universe-cuda was not meant for CI in the first place, it was a runtime image, just for the users to pull and run the image.

Universe CI uses universe-devel-cuda which doesn't have the artifacts anyways.

@amadeuszsz
Copy link
Contributor Author

I see. I also considered that each artifacts update forces autoware-base:cuda-latest update, which consumes CI time and gives us nothing. If:

  • we have still relatively huge space margin for artifacts,
  • frequent artifacts update are welcome,

we can revert this PR.

@xmfcx
Copy link
Contributor

xmfcx commented Mar 7, 2025

we have still relatively huge space margin for artifacts

We have about 6-7GB to spare for cache.

I've marked some summary here on the dockerfile.svg:

images

There are multiple solutions we could consider:

  • Separate out the building of the final image with artifacts to another job/machine?
    • autoware-base:cuda-latest, universe-sensing-perception-cuda, universe-cuda are the only images that contain the artifacts.
    • Maybe these could be built on a separate action.
    • This would add a lot of work and complexity just to overcome a couple gigabytes of storage shortage.
  • Just revert the PR and add a mechanism to keep the cache size low (still, we don't have much margin, just 6-7GB).
  • Rent proper machines for this job and not worry about storage concerns. (@mitsudome-r should we consider OVH again? The machine we got is working well so far.)
  • Keep this PR for now until someone complains? I don't know who are actually using Autoware's runtime PRs.

This could be also discussed in the OpenADKit Working Group meeting next week.
Related: https://discord.com/channels/953808765935816715/953877118415151205/1347463585474805801 (from autoware discord)

@amadeuszsz
Copy link
Contributor Author

Keep this PR for now until someone complains? I don't know who are actually using Autoware's runtime PRs.

We already mount autoware_map. From now on, users will do same way with autoware_data.

@xmfcx
Copy link
Contributor

xmfcx commented Mar 7, 2025

Difference is that autoware_map folder is a simple manual download process.
But for autoware_data, you need to clone autoware, install ansible, run the download artifacts playboook to get it.

And you can only download the latest artifacts. If an artifact is removed in a new version, old images cannot be used anymore.

@amadeuszsz
Copy link
Contributor Author

Good point, that kind of link between software and artifacts disappeared now from our image.
By the way, could you clarify how we deploy image (universe-cuda before this PR) to GitHub registry with respect to repositories versioning? Do we use:

  1. latest autoware and nigthly repos,
  2. latest autoware and base repos?

If this is the option 1, it will be beneficial to keep artifacts in Docker images.
If this is the option 2, there might be discrepancy in case of updating artifacts, which requires updated software as well. That being said, we can't assure compatibility between stored artifacts and built libraries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag:run-health-check Run health-check type:containers Docker containers, containerization of components, or container orchestration.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants