Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add state to the validator-network and fix some network issues #2216

Merged
merged 4 commits into from
Feb 27, 2024

Conversation

jsdanielh
Copy link
Member

  • lib: Go back to use 3 as the default network quorum value
    • Change the client code to use 3 as the default network quorum value.
      Also make the devnet scripts use this value for configuring the
      nodes.
  • validator-network: Do less validator PeerId cache clears
    • Change the function that clears the validator PeerId to receive the request error such that it doesn't clear the cache if InboundRequestError::NoReceiver error was received. This error is an indicative that the requested peer has no receiver for such request which could happen when the peer is no running the specified aggregation and this is not an indicative of the PeerId cache entry being wrong. Thus in case of receiving this error, the network won't invalidate the cache entry for this validator.
  • network-libp2p: Fix mishandling of FinishedWithNoAdditionalRecord
    • Fix the mishandling of FinishedWithNoAdditionalRecord since we could get such event even if the query.finish isn't called (which the network is calling once Quorum records have been received).
      This mishandling was causing some of the queries to hang indefinitely since no error nor result was being sent to the application layer.
      Now the network always pushes the result to the application layer once the FinishedWithNoAdditionalRecord is received.
  • validator-network: Add some state to the PeerId cache
    • Add some state to the validator PeerId cache to make less queries as possible to the DHT.
      This also makes the validator-network able to quickly return if the DHT query is still in progress allowing upper layers to continue their process without blocking them until the query finishes or errors out.

Pull request checklist

  • All tests pass. The project builds and runs.
  • I have resolved any merge conflicts.
  • I have resolved all clippy and rustfmt warnings.

@jsdanielh jsdanielh requested review from hrxi and nibhar February 15, 2024 04:48
@jsdanielh jsdanielh marked this pull request as ready for review February 15, 2024 15:05
@jsdanielh jsdanielh force-pushed the jsdanielh/validator-network branch 2 times, most recently from 54835e8 to 6eaa7b3 Compare February 16, 2024 21:41
Copy link
Contributor

@hrxi hrxi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one potential issue, but the whole thing looks very good. :)

@jsdanielh jsdanielh force-pushed the jsdanielh/validator-network branch from 6eaa7b3 to 5006dd9 Compare February 27, 2024 00:59
@jsdanielh jsdanielh force-pushed the jsdanielh/validator-network branch from 5006dd9 to 6a64ff1 Compare February 27, 2024 17:00
Add some state to the validator `PeerId` cache to make less queries
as possible to the DHT.
This also makes the `validator-network` able to quickly return if the
DHT query is still in progress allowing upper layers to continue
their process without blocking them until the query finishes or
errors out.
Fix the mishandling of `FinishedWithNoAdditionalRecord` since we
could get such event even if the `query.finish` isn't called (which
the network is calling once `Quorum` records have been received).
This mishandling was causing some of the queries to hang indefinitely
since no error nor result was being sent to the application layer.
Now the network always pushes the result to the application layer
once the `FinishedWithNoAdditionalRecord` is received.
Change the function that clears the validator `PeerId` to receive the
request error such that it doesn't clear the cache if
`InboundRequestError::NoReceiver` error was received. This error is
an indicative that the requested peer has no receiver for such
request which could happen when the peer is no running the specified
aggregation and this is not an indicative of the `PeerId` cache entry
being wrong. Thus in case of receiving this error, the network won't
invalidate the cache entry for this validator.
Change the client code to use `3` as the default network quorum
value.
Also make the `devnet` scripts use this value for configuring the
nodes.
@hrxi hrxi force-pushed the jsdanielh/validator-network branch from 6a64ff1 to 2522b42 Compare February 27, 2024 17:29
@hrxi hrxi merged commit 2522b42 into albatross Feb 27, 2024
12 of 13 checks passed
@hrxi hrxi deleted the jsdanielh/validator-network branch February 27, 2024 20:41
@jsdanielh jsdanielh added this to the Nimiq PoS Mainnet milestone Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants