Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copr error retries #1579

Merged

Conversation

lbarcziova
Copy link
Member

@lbarcziova lbarcziova commented Jul 13, 2022

TODO:

  • agree on the interval (and create a constant for it) and types of exceptions

Fixes #1549


RELEASE NOTES BEGIN
During creating Copr builds, on Copr errors, Packit will retry the task multiple times
in case there is a Copr outage.
RELEASE NOTES END

@lbarcziova lbarcziova changed the title Copr error retries WIP: Copr error retries Jul 13, 2022
@lbarcziova
Copy link
Member Author

@lachmanfrantisek the issue says that if there is a permission error, the task should be put back in the queue as well, but since users are easily able to retrigger the build (as opposed to bodhi updates) after resolving the permissions issue, I implemented it in the way that for these exceptions the task will not be retried, WDYT? Also, I was thinking what should be the interval for retrying if there is a Copr error, now it is set to 1 minute.

@softwarefactory-project-zuul

This comment was marked as outdated.

Copy link

@FrNecas FrNecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes, a few suggestions for improvements

@@ -209,6 +211,49 @@ def get_packit_commands_from_comment(
return []


class CeleryTask:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was to put the relevant methods into JobHandler but I like your solution (new class wrapping celery's Task) more 👍

@lbarcziova lbarcziova force-pushed the copr-error-retries branch from 9d5d4b8 to e916465 Compare July 13, 2022 14:44
@softwarefactory-project-zuul

This comment was marked as outdated.

@lbarcziova lbarcziova force-pushed the copr-error-retries branch from e916465 to 608d656 Compare July 14, 2022 11:03
@softwarefactory-project-zuul

This comment was marked as outdated.

@lachmanfrantisek
Copy link
Member

@lachmanfrantisek the issue says that if there is a permission error, the task should be put back in the queue as well, but since users are easily able to retrigger the build (as opposed to bodhi updates) after resolving the permissions issue, I implemented it in the way that for these exceptions the task will not be retried, WDYT? Also, I was thinking what should be the interval for retrying if there is a Copr error, now it is set to 1 minute.

Is the retrigger easy also for the commit or release trigger? (Is the re-run button reliable and easy to use?)

I would say if we can be clever, why not do it. Do you see any problem in the implementation or performance?

Copy link
Member

@lachmanfrantisek lachmanfrantisek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! I would maybe go with the retries for permission issues as well
(just with bigger intervals like with Bodhi).

Comment on lines +214 to +217
class CeleryTask:
"""
Class wrapping the Celery task object with methods related to retrying.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for that!

@lbarcziova
Copy link
Member Author

I would say if we can be clever, why not do it. Do you see any problem in the implementation or performance?

@lachmanfrantisek My point was more that via check or comment retries users can trigger the builds when they are done with resolving the permission issue, which e.g. may be sooner than our interval is set. From the implementation POV I don't see a problem.

In that case, can the interval for the permission issues where a user action is required 10 minutes (the same as for Bodhi) and for others the one minute with exponential backoff?

@lbarcziova lbarcziova force-pushed the copr-error-retries branch from 608d656 to 05ab86a Compare July 14, 2022 16:29
@lbarcziova lbarcziova changed the title WIP: Copr error retries Copr error retries Jul 14, 2022
@lbarcziova
Copy link
Member Author

I was looking into the Copr exceptions and also our Sentry issues and implemented retrying for CoprRequestException with the message "Unable to connect" and CoprTimeoutException (and the mentioned permission-related issues). Let me know if you think the exceptions we retry on should be extended, but my point was not to retry for errors where it may not make sense, such as CoprNoResultException, CoprNoConfigException,..

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

✔️ pre-commit SUCCESS in 1m 50s
✔️ packit-service-tests SUCCESS in 1m 56s
✔️ packit-service-tests-openshift SUCCESS in 13m 46s

The functionality for retrying the tasks is now used in more places,
therefore create a new class that will wrap the celery task object
and add functionality on top of it.
@lbarcziova lbarcziova force-pushed the copr-error-retries branch from 05ab86a to 71fdbec Compare July 19, 2022 10:04
Copy link
Member

@lachmanfrantisek lachmanfrantisek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

This can help in case there is an Copr outage. If retrying, set the status so
that user knows the task will be retried.
@lbarcziova lbarcziova force-pushed the copr-error-retries branch from 71fdbec to 3240ce5 Compare July 19, 2022 10:13
@lbarcziova lbarcziova added the mergeit When set, zuul wil gate and merge the PR. label Jul 19, 2022
@softwarefactory-project-zuul
Copy link
Contributor

Build failed.

✔️ pre-commit SUCCESS in 2m 05s
✔️ packit-service-tests SUCCESS in 2m 24s
packit-service-tests-openshift FAILURE in 17m 25s

@lbarcziova
Copy link
Member Author

recheck

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

✔️ pre-commit SUCCESS in 2m 01s
✔️ packit-service-tests SUCCESS in 2m 08s
✔️ packit-service-tests-openshift SUCCESS in 13m 37s

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded (gate pipeline).

✔️ pre-commit SUCCESS in 1m 58s

@softwarefactory-project-zuul softwarefactory-project-zuul bot merged commit da4a797 into packit:main Jul 19, 2022
@lbarcziova lbarcziova deleted the copr-error-retries branch August 24, 2022 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergeit When set, zuul wil gate and merge the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry job in case of Copr error/outage
3 participants