Listener Pod Session Creation Fails with RunnerScaleSetNotFoundException During GitHub API Outage #3942

Open
ali-kafel opened this issue Feb 21, 2025 · 0 comments
Labels
bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)

Controller Version

0.10.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

- This error was observed during an unplanned GitHub API outage; we have not been able to reproduce it with deliberate steps.
- We suspect that GitHub API availability plays a critical role in triggering the bug, so simulating an outage (e.g., by blocking network access to GitHub API endpoints) might help reproduce it.
- As it stands, the bug appears to be a timing/availability issue rather than a deterministic code defect.

Describe the bug

We encountered an issue in the listener pod (pod name: xlarge-dind-spot-6566dbd7-listener) of the GitHub Actions controller during a GitHub API/Actions outage. The listener pod’s logs indicate that while the pod initializes correctly, it fails when attempting to create a session with GitHub Actions.

Error Log Snippet from the Listener Pod:

stream logs failed container "listener" in pod "xlarge-dind-spot-6566dbd7-listener" is waiting to start: ContainerCreating for gha-runner-controller/xlarge-dind-spot-6566dbd7-listener (listener)
2025-02-17T08:37:14Z    INFO    listener-app    app initialized
2025-02-17T08:37:14Z    INFO    listener-app    Starting metrics server
2025-02-17T08:37:14Z    INFO    listener-app    Starting listener
2025-02-17T08:37:14Z    INFO    listener-app    refreshing token    {"githubConfigUrl": "https://github.com/****"}
2025-02-17T08:37:14Z    INFO    listener-app    getting access token for GitHub App auth    {"accessTokenURL": "https://api.github.com/app/installations/****/access_tokens"}
2025-02-17T08:37:14Z    INFO    listener-app    getting runner registration token    {"registrationTokenURL": "https://api.github.com/orgs/****/actions/runners/registration-token"}
2025-02-17T08:37:15Z    INFO    listener-app    getting Actions tenant URL and JWT    {"registrationURL": "https://api.github.com/actions/runner-registration"}
2025/02/17 08:37:15 Application returned an error: createSession failed: failed to create session: actions error: StatusCode 404, AcivityId "71b5512b-3bcb-4def-943c-23c4362fe8f3": GitHub.Actions.Runtime.WebApi.RunnerScaleSetNotFoundException, GitHub.Actions.Runtime.WebApi: No runner scale set found with identifier 108.

Analysis:

  • The listener pod successfully initializes and begins the token refresh and registration process.
  • Shortly after, the pod attempts to create a session with GitHub Actions via several API calls.
  • The session creation fails with a 404 status code alongside an exception (RunnerScaleSetNotFoundException) indicating that no runner scale set was found with the provided identifier (108).
  • The failure coincided with an outage of the GitHub API and Actions service, which we believe caused the downstream errors in the listener’s session creation process.

Describe the expected behavior

The listener pod should ideally handle temporary GitHub API/Actions outages more gracefully by either:

  • Retrying the session creation process a set number of times with backoff.
  • Logging a clear error and marking the pod for restart in a controlled manner.

Additional Context

N/A

Controller Logs

N/A

Runner Pod Logs

N/A