
Overhaul stats: Refactor events #1341

Closed
josecelano opened this issue Mar 3, 2025 · 3 comments
Assignees
Labels
- Developer - (Torrust Improvement Experience), Code Cleanup / Refactoring (Tidying and Making Neat), EPIC (Contains several subissues)

Comments

@josecelano
Member

There are three types of statistics events (one for the HTTP tracker core, one for the UDP tracker core, and one for the UDP tracker server):

bittorrent_http_tracker_core::statistics::event::Event:

pub enum Event {
    Tcp4Announce,
    Tcp4Scrape,
    Tcp6Announce,
    Tcp6Scrape,
}

bittorrent_udp_tracker_core::statistics::event::Event

pub enum Event {
    Udp4Connect,
    Udp4Announce,
    Udp4Scrape,
    Udp6Connect,
    Udp6Announce,
    Udp6Scrape,
}

torrust_udp_tracker_server::statistics::event::Event:

pub enum Event {
    UdpRequestAborted,
    UdpRequestBanned,

    // UDP4
    Udp4IncomingRequest,
    Udp4Request {
        kind: UdpResponseKind,
    },
    Udp4Response {
        kind: UdpResponseKind,
        req_processing_time: Duration,
    },
    Udp4Error,

    // UDP6
    Udp6IncomingRequest,
    Udp6Request {
        kind: UdpResponseKind,
    },
    Udp6Response {
        kind: UdpResponseKind,
        req_processing_time: Duration,
    },
    Udp6Error,
}

I think the code would be simpler and the events would be more useful if we enriched them. Something like this:

bittorrent_http_tracker_core::statistics::event::Event:

pub enum Event {
    Announce { ip: IpAddr },
    Scrape { ip: IpAddr },
}

bittorrent_udp_tracker_core::statistics::event::Event:

pub enum Event {
    Connect { ip: IpAddr },
    Announce { ip: IpAddr },
    Scrape { ip: IpAddr },
}

torrust_udp_tracker_server::statistics::event::Event:

pub enum Event {
    UdpRequestAborted { ip: IpAddr },
    UdpRequestBanned { ip: IpAddr },
    UdpRequest { ip: IpAddr },
    UdpResponse {
        ip: IpAddr,
        req_processing_time: Duration,
    },
    UdpError { ip: IpAddr },
}

That would also simplify the way we send them, from this:

if let Some(udp_stats_event_sender) = self.udp_tracker_container.udp_stats_event_sender.as_deref() {
    match target.ip() {
        IpAddr::V4(_) => {
            udp_stats_event_sender
                .send_event(statistics::event::Event::Udp4Response {
                    kind: udp_response_kind,
                    req_processing_time,
                })
                .await;
        }
        IpAddr::V6(_) => {
            udp_stats_event_sender
                .send_event(statistics::event::Event::Udp6Response {
                    kind: udp_response_kind,
                    req_processing_time,
                })
                .await;
        }
    }
}

To this:

if let Some(udp_stats_event_sender) = self.udp_tracker_container.udp_stats_event_sender.as_deref() {
    udp_stats_event_sender
        .send_event(statistics::event::Event::UdpResponse {
            ip: target.ip(),
            req_processing_time,
        })
        .await;
}

These events would be more useful for external consumers. For example, adding the IP would allow external event consumers to extract more information and build their own aggregate data. This relates to #1307.
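For instance, a consumer of the enriched stream could rebuild the old per-IP-family counters on its own. A minimal sketch, assuming the enriched UDP core Event above (the Aggregates type and its counter names are illustrative, not part of the tracker):

use std::net::IpAddr;

use bittorrent_udp_tracker_core::statistics::event::Event;

// Hypothetical consumer-side aggregates; names are illustrative.
#[derive(Default)]
struct Aggregates {
    udp4_announces: u64,
    udp6_announces: u64,
}

impl Aggregates {
    // The old Udp4Announce/Udp6Announce split can be derived from the IP.
    fn handle(&mut self, event: &Event) {
        if let Event::Announce { ip } = event {
            match ip {
                IpAddr::V4(_) => self.udp4_announces += 1,
                IpAddr::V6(_) => self.udp6_announces += 1,
            }
        }
    }
}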

We might need to add more info like the server socket address of the UDP server that handled the request.

pub enum Event {
    Connect { client_ip: IpAddr, server_socket_addr: SocketAddr },
    Announce { client_ip: IpAddr, server_socket_addr: SocketAddr },
    Scrape { client_ip: IpAddr, server_socket_addr: SocketAddr },
}

See #1263 (comment)

@josecelano josecelano added the - Developer - (Torrust Improvement Experience) and Code Cleanup / Refactoring (Tidying and Making Neat) labels Mar 3, 2025
@josecelano josecelano self-assigned this Mar 10, 2025
@josecelano
Member Author

Hi @da2ce7, I'm planning to start refactoring the HTTP tracker events. I would like to have something like:

pub enum Event {
    Tcp4Announce { connection: ConnectionContext },
    Tcp4Scrape { connection: ConnectionContext },
    Tcp6Announce { connection: ConnectionContext },
    Tcp6Scrape { connection: ConnectionContext },
}

pub struct ConnectionContext {
    pub client_ip: IpAddr,
    pub server_socket_addr: SocketAddr,
}   

I don't have the server_socket_addr until the service is started, because the port can be 0 in the configuration. On the other hand, in order to split metrics per running service (binding metrics to a socket address), we also need an http_stats_repository for each HTTP tracker server.
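That's the usual port-0 situation: the operating system picks a free port, so the real address only exists after binding. A minimal sketch with the standard library:

use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port, so the configuration alone
    // cannot tell us the server socket address.
    let listener = TcpListener::bind("127.0.0.1:0")?;

    // Only after binding do we know the address to put into the events.
    let server_socket_addr = listener.local_addr()?;
    println!("bound to {server_socket_addr}");

    Ok(())
}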

The easy solution would be to add new fields (services or configuration) to the HttpTrackerCoreContainer.

This is the current HttpTrackerCoreContainer:

pub struct HttpTrackerCoreContainer {
    // todo: replace with TrackerCoreContainer
    pub core_config: Arc<Core>,
    pub announce_handler: Arc<AnnounceHandler>,
    pub scrape_handler: Arc<ScrapeHandler>,
    pub whitelist_authorization: Arc<whitelist::authorization::WhitelistAuthorization>,
    pub authentication_service: Arc<AuthenticationService>,

    pub http_tracker_config: Arc<HttpTracker>,
    pub http_stats_event_sender: Arc<Option<Box<dyn statistics::event::sender::Sender>>>,
    pub http_stats_repository: Arc<statistics::repository::Repository>,
    pub announce_service: Arc<AnnounceService>,
    pub scrape_service: Arc<ScrapeService>,
}

And this is how it would look after the change:

pub struct HttpTrackerCoreContainer {
    // todo: replace with TrackerCoreContainer
    pub core_config: Arc<Core>,
    pub announce_handler: Arc<AnnounceHandler>,
    pub scrape_handler: Arc<ScrapeHandler>,
    pub whitelist_authorization: Arc<whitelist::authorization::WhitelistAuthorization>,
    pub authentication_service: Arc<AuthenticationService>,

    pub http_tracker_config: Arc<HttpTracker>,
    pub http_stats_event_sender: Arc<Option<Box<dyn statistics::event::sender::Sender>>>,

    pub announce_service: Arc<AnnounceService>,
    pub scrape_service: Arc<ScrapeService>,

    pub running_http_trackers: Arc<HashMap<SocketAddr, RunningHttpTrackerCoreContainer>>,
}

pub struct RunningHttpTrackerCoreContainer {
    pub server_socket_addr: SocketAddr,
    pub http_stats_repository: Arc<statistics::repository::Repository>,
}

However, I think we should not put that inside the HttpTrackerCoreContainer. I think this should be an independent service in the AppContainer:

pub struct AppContainer {
    // ... other services

    pub running_http_trackers: RunningHttpTrackers,
}

pub struct RunningHttpTrackers {
    pub running_http_trackers: Arc<HashMap<SocketAddr, RunningHttpTrackerCoreContainer>>,
}

pub struct RunningHttpTrackerCoreContainer {
    pub server_socket_addr: SocketAddr,
    pub http_stats_repository: Arc<statistics::repository::Repository>,
}

The main reason is that I want to be able to run a single HTTP tracker with a new binary (if needed, or for other projects), and in that case the new field would not make sense. The "multiple trackers" feature belongs only to the Torrust Tracker binary.

@josecelano
Member Author

I have changed my mind. The HttpTrackerCoreContainer already contains some fields that are specific to one running HTTP tracker, like this one:

pub http_tracker_config: Arc<HttpTracker>,

For the time being, I'm going to pass the server socket address from top to bottom as a new parameter without adding a "wrapper".
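In practice that just means threading the address through the call chain until it reaches the point where the events are built. A hypothetical sketch (handle_announce and its parameters are illustrative, not the real signatures):

use std::net::{IpAddr, SocketAddr};

pub struct ConnectionContext {
    pub client_ip: IpAddr,
    pub server_socket_addr: SocketAddr,
}

// Hypothetical handler: the server socket address is just one more
// parameter passed down from the layer that bound the listener.
async fn handle_announce(client_ip: IpAddr, server_socket_addr: SocketAddr) {
    let connection = ConnectionContext {
        client_ip,
        server_socket_addr,
    };
    // ... build the announce response and send the enriched event
    // using `connection` ...
    let _ = connection;
}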

If we need to build services that are specific to one tracker server bound to a socket, we should build them in the AppContainer and inject them into the HttpTrackerCoreContainer.

I will give an example. We can run two HTTP trackers on different ports (7070 and 7171).

In the future we want to collect independent metrics for each tracker. That means we could have:

  1. Independent event senders and metrics repositories for each tracker.
  2. One global event sender and one global repository.
     • The repository has to split metrics per socket.
  3. One global event sender and independent metrics repositories for each tracker.
     • The statistics factory has to return both the sender and the receiver.
     • There will be one Keeper per socket. We need one listener for each tracker.

I think option 3 is the right one (sketched below) because:

  • It would be convenient to have only one event sender, because that way you can add a listener for all events coming from all HTTP trackers. The listener can decide whether to filter by server socket or not.
  • Having multiple repositories for metrics should scale better. Since we have to create repositories dynamically (we don't know the number of HTTP trackers at compilation time), we can increase the parallel writes.
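A minimal sketch of the shape option 3 could take, assuming a registry owned by the app (StatsRegistry and Repository are illustrative names, not existing types):

use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::{Arc, RwLock};

// Illustrative stand-in for the per-socket metrics repository.
pub struct Repository;

// One global event sender feeds all listeners; repositories are created
// dynamically, one per running HTTP tracker (socket address).
pub struct StatsRegistry {
    repositories: RwLock<HashMap<SocketAddr, Arc<Repository>>>,
}

impl StatsRegistry {
    // Called once a tracker has bound its listener and knows its address.
    // Each Keeper (listener) updates only its own repository, so writes
    // for different sockets can proceed in parallel.
    pub fn register(&self, addr: SocketAddr) -> Arc<Repository> {
        let repo = Arc::new(Repository);
        self.repositories
            .write()
            .unwrap()
            .insert(addr, repo.clone());
        repo
    }
}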

In order to implement these changes (progressively and in parallel) I would recommend:

  • Keeping the current event sender as the global sender for all HTTP trackers.
  • Keeping the current global repository as the global stats repository for aggregate data (it will be removed in a major version, since that's a breaking change).
  • Adding new keepers (listeners) and repositories dynamically per server (socket address).
  • Exposing the new separated metrics via the API, extending the current endpoint.

The final stats endpoint could be like:

{
  "torrents": 553615,
  "seeders": 168577,
  "completed": 408060,
  "leechers": 306594,
  "tcp4_connections_handled": 345,
  "tcp4_announces_handled": 345,
  "tcp4_scrapes_handled": 0,
  "tcp6_connections_handled": 0,
  "tcp6_announces_handled": 0,
  "tcp6_scrapes_handled": 0,
  "udp_requests_aborted": 2641676,
  "udp_requests_banned": 161506719,
  "udp_banned_ips_total": 30643,
  "udp_avg_connect_processing_time_ns": 810080,
  "udp_avg_announce_processing_time_ns": 16004125,
  "udp_avg_scrape_processing_time_ns": 547930,
  "udp4_requests": 1481376340,
  "udp4_connections_handled": 409952927,
  "udp4_announces_handled": 843408690,
  "udp4_scrapes_handled": 22520384,
  "udp4_responses": 1319866945,
  "udp4_errors_handled": 43628699,
  "udp6_requests": 0,
  "udp6_connections_handled": 0,
  "udp6_announces_handled": 0,
  "udp6_scrapes_handled": 0,
  "udp6_responses": 0,
  "udp6_errors_handled": 0,
  "http_trackers": [
    {
      socker_adress: "127.0.0.1:7070",
      swarm_aggregate_stats: {
        "torrents": 553615,
        "seeders": 168577,
        "completed": 408060,
        "leechers": 306594,
      }
      metrics: {
        "tcp4": {
          announces_handled_total: 345,
          scrapes_handled_total: 345,
        },
        "tcp6": {
          announces_handled_total: 0,
          scrapes_handled_total: 0,
        }
      }
    }
  ],
  "udp_trackers": [
  ]
}

We could also add some other metrics suggested here.

I would not join the HTTP and UDP trackers' metrics in the same array, because their metrics could have different fields in the future.

cc @da2ce7

@josecelano josecelano added the EPIC (Contains several subissues) label Mar 11, 2025
@josecelano josecelano changed the title Overhaul stats events: Refactor events Overhaul stats: Refactor events Mar 12, 2025
@josecelano
Member Author

Sub-issues implemented.
