*: test command v1.2 update #451

Merged
merged 22 commits into from
Dec 5, 2024
161 changes: 161 additions & 0 deletions docs/adv/troubleshooting/test_command.md
@@ -0,0 +1,161 @@
---
sidebar_position: 3
description: Troubleshoot issues spotted by the test command
---

# Test Commands

This page explains the likely causes of failed tests or low test scores reported by the [Charon Test commands](../../run/prepare/test-command.mdx), and suggests ways to troubleshoot or improve them.

## Peers

### Charon Peers

#### Ping

- Peers might not have started their nodes, or their nodes are not reachable.

#### PingMeasure

- The peer might be too far away (geographically) from you.
- If the connection to the peer is indirect, traffic travels from your node to the relay, and then from the relay to the peer (your node -> relay -> peer), so the measurement covers both legs. Even if your peer's node is right next to yours, a connection routed through a distant relay might have latency too high to be effective.
- Your general network latency to the public internet might be high. Verify with the [`charon test infra`](../../run/prepare/test-command.mdx#test-machine-and-network-performance) tests.
- If the connection to the peer is indirect, the relay might be overloaded or under-resourced. Consider adding [alternative relays](../../adv/security/risks.md#risk-obol-hosting-the-relay-infrastructure), or preferably [opening charon's p2p port](../../learn/charon/networking.mdx#libp2p-relays-and-peer-discovery) to the internet to establish direct peer-to-peer connections.
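
To illustrate why an indirect connection can be slow even between neighbouring nodes, here is a minimal sketch of the latency arithmetic. The function name and the sample figures are hypothetical and are not taken from Charon; the sketch only adds the two legs described above and ignores any processing time on the relay itself.

```python
def indirect_latency_ms(node_to_relay_ms: float, relay_to_peer_ms: float) -> float:
    """Approximate latency of a relayed connection as the sum of its two legs."""
    return node_to_relay_ms + relay_to_peer_ms


# Hypothetical figures: two nodes in the same data centre, relay on another continent.
direct_ms = 2.0
relayed_ms = indirect_latency_ms(node_to_relay_ms=90.0, relay_to_peer_ms=95.0)
print(f"direct: {direct_ms} ms, via relay: {relayed_ms} ms")
```

With numbers like these, opening the p2p port to obtain a direct connection would matter far more than moving the nodes closer together.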

#### PingLoad

The same causes as for the PingMeasure test apply here.

#### DirectConn

- Your or your peer's port might not be publicly exposed.
- Your or your peer's port might be behind a firewall.
- Your or your peer's port might be behind a strict NAT gateway.
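
One rough way to check the causes above is to attempt a plain TCP connection to the port from a machine outside the network in question. The sketch below uses only Python's standard library; the function name is hypothetical, and a successful TCP handshake only shows that the port is reachable, not that a libp2p connection would succeed.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `tcp_port_reachable("<public IP of the node>", 3610)` run from another network returning `False` would be consistent with a closed port, a firewall, or a strict NAT gateway (3610 is the default port mentioned in this page; adjust it if yours differs).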

### Charon Relays

#### PingRelay

- The relay might be down or unreachable for other reasons.

#### PingMeasureRelay

- Relay might be under heavy load.
- Your network latency might be high. Verify with the `charon test infra` tests.

### Self

#### Libp2pTCPPortOpenTest

- There might be another process running on the designated port (tcp/3610 by default).
- The Charon process might have died.
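
To see whether anything is currently bound to the port on the local machine, one option is to try binding to it yourself. This is a minimal sketch using Python's standard library, with a hypothetical function name; it reports whether the port is taken, but not which process holds it (a tool such as `ss` or `lsof` is needed for that).

```python
import socket


def tcp_port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if nothing is bound to host:port, by attempting to bind to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True
```

If `tcp_port_is_free(3610)` returns `True` while Charon is supposed to be running, the process has probably stopped. If it returns `False` while Charon is stopped, another process is occupying the port.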

## Beacon

#### Ping

- Beacon node might not be started or is not reachable.

#### PingMeasure

- Beacon node might be too far away (geographically) from you.
- Your network latency might be high. Verify with the `charon test infra` tests.

#### Version

- The beacon node version is not compatible with charon.

#### IsSynced

- Beacon node is not synced to the network.

#### PeerCount

- Beacon node does not have enough peers. This may result in slower fetching and broadcasting of slots and duties.
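
The sync status and peer count can also be inspected by hand through the standard Beacon API endpoints `/eth/v1/node/syncing` and `/eth/v1/node/peer_count`. The sketch below parses the JSON bodies those endpoints return; the function name and the sample values are hypothetical, and this is not necessarily how Charon performs its own checks.

```python
import json


def summarize_beacon_status(syncing_body: str, peer_count_body: str) -> dict:
    """Extract sync state and connected peer count from Beacon API JSON responses."""
    syncing = json.loads(syncing_body)["data"]
    peers = json.loads(peer_count_body)["data"]
    return {
        "is_syncing": bool(syncing["is_syncing"]),
        # The Beacon API encodes integers as strings, so convert them explicitly.
        "sync_distance": int(syncing["sync_distance"]),
        "connected_peers": int(peers["connected"]),
    }


# Sample bodies in the shape of the Beacon API responses (values are made up).
sample_syncing = '{"data": {"head_slot": "100", "sync_distance": "0", "is_syncing": false}}'
sample_peers = '{"data": {"connected": "56", "connecting": "3", "disconnected": "12", "disconnecting": "1"}}'
print(summarize_beacon_status(sample_syncing, sample_peers))
```

The bodies themselves can be fetched with any HTTP client, for example `curl http://<beacon-node>/eth/v1/node/syncing`.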

#### PingLoad

This is a load test. To enable it, add the `--load-test` flag.

The same causes as for the PingMeasure test apply here.

#### Simulation

This is a load test. To enable it, add the `--load-test` flag.

The same causes as for the PingMeasure test apply here, and additionally:

- The infrastructure on which the beacon node runs (amount of RAM, disk IOPS) might not be enough to handle the number of simulated validators supplied in this test.

## Validator

#### Ping

- Validator client might not be started or is not reachable.

#### PingMeasure

- Validator client might be too far away (geographically) from the charon client. Generally a low latency between a validator client and its charon client is important for timely signing.

#### PingLoad

The same causes as for the PingMeasure test apply here.

## MEV

#### Ping

- MEV relay might not be started or is not reachable.

#### PingMeasure

- MEV relay might be too far away (geographically) from you.
- Your network latency might be high. Verify with the `charon test infra` tests.

#### CreateBlock

The same causes as for the PingMeasure test apply here, and additionally:

- MEV relay might be too slow in block production.

#### CreateMultipleBlocks

The same causes as for the CreateBlock test apply here.

## Infra

#### DiskWriteSpeed

- Read more in our [Deployment Best Practices](../../run/prepare/deployment-best-practices#hardware-specifications).

#### DiskWriteIOPS

- Read more in our [Deployment Best Practices](../../run/prepare/deployment-best-practices#hardware-specifications).

#### DiskReadSpeed

- Read more in our [Deployment Best Practices](../../run/prepare/deployment-best-practices#hardware-specifications).

#### DiskReadIOPS

- Read more in our [Deployment Best Practices](../../run/prepare/deployment-best-practices#hardware-specifications).

#### AvailableMemory

- Your available memory (RAM) is not enough to run Charon. The minimum available memory should be 2GB, and the recommended available memory is 4GB. Note that this test is a best estimate, as memory availability can be hard to predict, particularly if the command is run in a virtualised environment (e.g. a Docker container).

#### TotalMemory

- Your total memory (RAM) may not be enough to run a full validating node. The recommended minimum total memory is 16GB. Specialised or optimised deployments can use less RAM than the recommended minimum, but may require some monitoring to ensure sufficient stability and performance. Read more in our [Deployment Best Practices](../../run/prepare/deployment-best-practices#hardware-specifications).
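
On Linux, the figures behind the two memory tests can be cross-checked against `/proc/meminfo`. The sketch below parses the `MemTotal` and `MemAvailable` fields from that file's text; the function name is hypothetical, and Charon may derive its own numbers differently. Note that the kernel reports these values in KiB, so a machine sold as "16GB" usually shows slightly less than 16 GiB.

```python
def meminfo_gib(meminfo_text: str) -> dict:
    """Return MemTotal and MemAvailable in GiB, parsed from /proc/meminfo text."""
    wanted = {"MemTotal", "MemAvailable"}
    result = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            kib = int(rest.split()[0])  # lines look like "MemTotal:  16777216 kB"
            result[key] = kib / (1024 * 1024)
    return result


# Made-up sample in the /proc/meminfo format.
sample = "MemTotal:       16777216 kB\nMemFree:         1048576 kB\nMemAvailable:    4194304 kB\n"
print(meminfo_gib(sample))
```

On a real host, pass the file contents instead of the sample: `meminfo_gib(open("/proc/meminfo").read())`. Inside a container the values generally reflect the host rather than the container's limit, which is one reason the test result is only an estimate.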

#### InternetLatency

- Your internet latency to the nearest test server is too high. Latency is expected to be below 50ms at a minimum, and ideally below 20ms.

#### InternetDownloadSpeed

- Your internet download speed from the nearest test server is too low. Download speed is expected to be above 10Mb/s at a minimum, and ideally above 50Mb/s.

#### InternetUploadSpeed

- Your internet upload speed to the nearest test server is too low. Upload speed is expected to be above 10Mb/s at a minimum, and ideally above 50Mb/s.
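
The internet thresholds above can be summarised as a small classifier. This is only a sketch of the stated limits: the function names and the labels "good", "acceptable", and "poor" are hypothetical and are not the wording Charon uses in its output.

```python
def rate_latency(latency_ms: float) -> str:
    """Rate latency against the limits above: below 20ms is ideal, below 50ms is the minimum."""
    if latency_ms < 20:
        return "good"
    if latency_ms < 50:
        return "acceptable"
    return "poor"


def rate_bandwidth(speed_mbps: float) -> str:
    """Rate download or upload speed: above 50Mb/s is ideal, above 10Mb/s is the minimum."""
    if speed_mbps > 50:
        return "good"
    if speed_mbps > 10:
        return "acceptable"
    return "poor"


print(rate_latency(35.0), rate_bandwidth(8.0))
```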
6 changes: 5 additions & 1 deletion docs/run/prepare/deployment-best-practices.md
@@ -17,13 +17,17 @@ The following specifications are recommended for bare metal machines for cluster
- A CPU with 4+ cores, favouring high clock speed over more cores. ( >3.0GHz and higher or a cpubenchmark [single thread](https://www.cpubenchmark.net/singleThread.html) score of >2,500)
- 16GB of RAM
- 2TB+ free SSD disk space (for mainnet)
- 1000 read/write SSD IOPS
- 500MB/s read/write SSD speed
- 10mb/s internet bandwidth

### Recommended Specs for extremely large clusters

- A CPU with 8+ physical cores, with clock speeds >3.5Ghz
- 32GB+ RAM (depending on the EL+CL clients)
- 4TB+ NVMe storage
- 2000 read/write SSD IOPS
- 1000MB/s read/write SSD speed
- 25mb/s internet bandwidth

An NVMe storage device is **highly recommended for optimal performance**, offering nearly 10x more random read/writes per second than a standard SSD.
@@ -68,7 +72,7 @@ Cluster sizes that allow for Byzantine Fault Tolerance are recommended as they a

MEV relays are configured at the Consensus Layer or MEV-boost client level. Refer to our [guide](../../run/start/quickstart-builder-api.mdx) to ensure all necessary configuration has been applied to your clients. As with all validators, low latency during proposal opportunities is extremely important. By default, MEV-Boost waits for all configured relays to return a bid, or will timeout if any have not returned a bid within 950ms. This default timeout is generally too slow for a distributed cluster (think of this time as additive to the time it takes the cluster to come to consensus, both of which need to happen within a 2 second window for optimal proposal broadcasting). It is likely better to only list relays that are located geographically near your node, so that once all relays respond (e.g. in < 50ms) your cluster will move forward with the proposal.

Use Charon's [`test mev` command](../../run/prepare/test-command.md#test-mev-relay) to test a number of your preferred relays, and select the two or three relays with the lowest latency to your node(s), you do not need to have the same relays on each node in a cluster.
Use Charon's [`test mev` command](../../run/prepare/test-command.mdx#test-mev-relay) to test a number of your preferred relays, and select the two or three relays with the lowest latency to your node(s), you do not need to have the same relays on each node in a cluster.

## Client Diversity

145 changes: 0 additions & 145 deletions docs/run/prepare/test-command.md

This file was deleted.
