Update Fusion Snapshots docs with recommended EC2 instance types and sizing guidance #488

Merged (14 commits) on Feb 25, 2025
37 changes: 30 additions & 7 deletions fusion_docs/guide/snapshots.mdx
@@ -18,7 +18,7 @@ More specifically, the first use case for this feature is for Seqera Platform us
Fusion Snapshots v1.0.0 requires the following [Seqera compute environment](https://docs.seqera.io/platform/latest/compute-envs/aws-batch) configuration:

- **Provider**: AWS Batch
- **Pipeline work directory**: An S3 bucket in the same region as the compute environment
- **Pipeline work directory**: An S3 bucket located in the same region as your AWS Batch compute resources
- **Enable Wave containers**
- **Enable Fusion v2**
- **Enable fast instance storage**
@@ -43,18 +43,41 @@ fusion.containerConfigUrl = '<CUSTOM_CONTAINER_URL>'

`maxSpotAttempts` must be a value higher than `0`.

### Recommended instance sizes
### EC2 instance selection guidelines

Fusion Snapshots require EC2 Spot instances with enough memory and network bandwidth to dump the cache of task intermediate files to S3 storage before AWS terminates an instance. When AWS issues a Spot instance reclamation notice, Fusion has two minutes to complete this transfer.
- Choose EC2 Spot instances with sufficient memory and network bandwidth to dump the cache of task intermediate files to S3 storage before AWS terminates an instance.
- Select instances with guaranteed network bandwidth (not instances with bandwidth "up to" a maximum value).
- Maintain a 5:1 ratio between memory (GiB) and network bandwidth (Gbps).
- Prefer the `c6id`, `r6id`, or `m6id` instance families, which work optimally with Fusion fast instance storage.

It is recommended to select instances with guaranteed network bandwidth (as opposed to bandwidth _up to_ a maximum value) and maintain a ratio of 5:1 between memory and network bandwidth.
:::info "Example"
A c6id.8xlarge instance provides 64 GiB memory and 12.5 Gbps guaranteed network bandwidth. This configuration can transfer the entire memory contents to S3 in approximately 70 seconds, well within the 2-minute reclamation window.
:::

:::note
Instances with lower network-to-memory ratios may not complete transfers before termination, potentially resulting in task failures.
:::

For example, taking into account the bandwidth and compute necessary to create a snapshot, a `c6i.8xlarge` instance with 64 GiB memory and a guaranteed network bandwidth of 12.5 Gbps can take approximately 70 seconds to dump the entire instance to S3 storage before instance reclamation occurs.
#### Recommended instance types

### Amazon Linux 2023 ECS-optimized AMI
| Instance type  | Memory (GiB) | Network bandwidth (Gbps) | Memory:bandwidth ratio | Est. snapshot time |
|----------------|--------------|--------------------------|------------------------|-------------------|
| c6id.4xlarge | 32 | 12.5 | 2.56:1 | ~45 seconds |
| c6id.8xlarge | 64 | 12.5 | 5.12:1 | ~70 seconds |
| r6id.2xlarge | 64 | 12.5 | 5.12:1 | ~70 seconds |
| m6id.4xlarge | 64 | 12.5 | 5.12:1 | ~70 seconds |
| c6id.12xlarge | 96 | 18.75 | 5.12:1 | ~70 seconds |
| r6id.4xlarge | 128 | 12.5 | 10.24:1 | ~105 seconds |
| m6id.8xlarge | 128 | 25 | 5.12:1 | ~70 seconds |
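
The ratio column above is memory (GiB) divided by network bandwidth (Gbps). As a rough sanity check, the sketch below recomputes that ratio and a line-rate lower bound on the time needed to push the full memory contents over the network. The function names and the 120-second default are illustrative assumptions, not part of Fusion. The bound ignores snapshot creation and S3 overhead, so real snapshot times (such as the ~70 seconds quoted for 64 GiB at 12.5 Gbps) will be higher.

```python
# Illustrative helpers only (hypothetical names, not a Fusion API).
GIB_IN_BITS = 8 * 2**30  # number of bits in one GiB


def memory_bandwidth_ratio(memory_gib: float, bandwidth_gbps: float) -> float:
    """Memory (GiB) per Gbps of guaranteed network bandwidth."""
    return memory_gib / bandwidth_gbps


def min_transfer_seconds(memory_gib: float, bandwidth_gbps: float) -> float:
    """Optimistic lower bound: time to send all memory at full line rate."""
    return memory_gib * GIB_IN_BITS / (bandwidth_gbps * 1e9)


def fits_reclamation_window(memory_gib: float, bandwidth_gbps: float,
                            window_seconds: float = 120.0) -> bool:
    """True if even the optimistic bound fits the two-minute Spot notice."""
    return min_transfer_seconds(memory_gib, bandwidth_gbps) < window_seconds


if __name__ == "__main__":
    # (memory GiB, bandwidth Gbps) pairs taken from the table above.
    for mem, bw in [(32, 12.5), (64, 12.5), (96, 18.75), (128, 12.5)]:
        ratio = memory_bandwidth_ratio(mem, bw)
        seconds = min_transfer_seconds(mem, bw)
        print(f"{mem} GiB @ {bw} Gbps: ratio {ratio:.2f}:1, "
              f"line-rate minimum {seconds:.0f} s")
```

A configuration that fails this optimistic check cannot finish within the reclamation window. One that passes still needs headroom for overhead, which is why the guidance targets a ratio near 5:1.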

### (Seqera Enterprise only) Select an Amazon Linux 2023 ECS-optimized AMI

To obtain sufficient performance, Fusion Snapshots require instances that run Amazon Linux 2023 (which ships with Linux kernel 6.1) from an ECS-optimized AMI.

:::note
Selecting an Amazon Linux 2023 ECS-optimized AMI is only required for compute environments in Seqera Enterprise deployments. Seqera Cloud AWS Batch compute environments use this AMI by default.
:::

To find the recommended AL2023 ECS-optimized AMI for your region, run the following (replace `eu-central-1` with your AWS region):

```bash
@@ -77,4 +100,4 @@ The result for the `eu-central-1` region is similar to the following:
}
```

Note the `image_id` in your result (in this example, `ami-0281c9a5cd9de63bd`). Specify this ID in the **AMI ID** field under **Advanced options** when you create your Seqera compute environment.