Onboarding the federated learning into multi-cluster environment(ACM) #1
TODO:
Jan 5, 2025
Feb 11, 2025
Note: Currently, we lack a secure connection between the client and server. If we allow external nodes or devices to run the client or collaborator, we’ll need to enable a TLS connection to secure the communication.
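As a rough illustration of what enabling TLS could involve, here is a minimal sketch using Python's standard `ssl` module. The transport actually used between the client and server is not stated in this issue, so the function names `make_server_context` and `make_client_context` and the certificate file parameters are hypothetical, not part of the project.

```python
import ssl


def make_server_context(cert_file=None, key_file=None):
    """Build a TLS context for the server (aggregator) side.

    cert_file and key_file are hypothetical paths to the server
    certificate and private key; they are loaded only when both are given.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_client_context(ca_file=None):
    """Build a TLS context for the client (collaborator) side.

    ca_file is a hypothetical path to the CA bundle that signed the
    server certificate; when omitted, the system trust store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The client context verifies the server certificate and hostname by default, which is the property that matters once external nodes or devices are allowed to connect.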
Containerize the Collaborator and Aggregator Client
Training a neural network often involves third-party packages such as torch, which significantly increase the image size. In many cases, the image exceeds 1 GB.
For example:
To simplify the initial setup, I used the `sklearn` package to build a lightweight Logistic Regression model for the startup.
Other issues when containerizing the application:
Federated Learning
`Namespaced` or `Cluster`
Layer configuration is disabled
Clients -> Client: Use a single "client" configuration instead of defining individual details for each client. This ensures scalability and flexibility by avoiding repetitive definitions.
Remove client replicas:
Use placement to schedule the client to specific clusters (by the `key` of the `Label` or `ClusterClaim` from the managed cluster)
Use the `value` of the `Label` or `ClusterClaim` from the managed cluster to locate the data metadata for the client
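The client scheduling described above can be sketched in plain Python: a single client template is expanded once per managed cluster that carries a given key, and the value under that key tells the client where its data lives. This is only an illustration under assumptions. `ManagedCluster`, `lookup`, `plan_clients`, and the `data-location` key are hypothetical names and do not reflect the actual API or controller code.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedCluster:
    """Hypothetical stand-in for a managed cluster's metadata."""
    name: str
    labels: dict = field(default_factory=dict)
    cluster_claims: dict = field(default_factory=dict)


def lookup(cluster, key):
    """Return the value stored under key: labels first, then cluster claims."""
    if key in cluster.labels:
        return cluster.labels[key]
    return cluster.cluster_claims.get(key)


def plan_clients(clusters, client_template, key):
    """Expand one shared client template into one plan per matching cluster.

    A cluster matches when it has `key` as a label or cluster claim.
    The value found under that key is recorded as the client's data location.
    """
    plans = []
    for cluster in clusters:
        value = lookup(cluster, key)
        if value is None:
            continue  # cluster does not carry the key, so no client is scheduled
        plan = dict(client_template)  # copy so the shared template is not mutated
        plan["cluster"] = cluster.name
        plan["data"] = value
        plans.append(plan)
    return plans


# Usage: one template, no per-client definitions and no replica count.
clusters = [
    ManagedCluster("cluster1", labels={"data-location": "/data/cluster1"}),
    ManagedCluster("cluster2", cluster_claims={"data-location": "/data/cluster2"}),
    ManagedCluster("cluster3"),
]
plans = plan_clients(clusters, {"image": "client:latest"}, "data-location")
```

Because the number of clients follows from the clusters that match the key, no separate replica setting is needed in this sketch.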