Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It seems persisted data about torrents is not loaded from database #1264

Closed
josecelano opened this issue Feb 12, 2025 · 5 comments · Fixed by #1356
Closed

It seems persisted data about torrents is not loaded from database #1264

josecelano opened this issue Feb 12, 2025 · 5 comments · Fixed by #1356
Assignees
Labels
Bug Incorrect Behavior

Comments

@josecelano
Copy link
Member

It seems persisted data about torrents (number of downloads) is not loaded from database when the tracker stars. The start function loads the authentication keys and the whitelist:

// Load peer keys
if config.core.private {
    app_container
        .keys_handler
        .load_peer_keys_from_database()
        .await
        .expect("Could not retrieve keys from database.");
}

// Load whitelisted torrents
if config.core.listed {
    app_container
        .whitelist_manager
        .load_whitelist_from_database()
        .await
        .expect("Could not load whitelist from database.");
}

However I don't see where we load torrent data. The method TorrentsManager::load_torrents_from_database is only used in test code.

Maybe that was the expected behavior. However that means the number of downloaded torrents in the database is reset every time the tracker is restarted because the persist_stats function only overwrite the in-memory counter. That in-memory counter is not preloaded with the database value when the tracker starts.

/// Persists torrent statistics to the database if persistence is enabled.
fn persist_stats(&self, info_hash: &InfoHash, swarm_metadata: &SwarmMetadata) {
    if self.config.tracker_policy.persistent_torrent_completed_stat {
        let completed = swarm_metadata.downloaded;
        let info_hash = *info_hash;

        drop(self.db_torrent_repository.save(&info_hash, completed));
    }
}

cc @da2ce7

@josecelano josecelano added the Bug Incorrect Behavior label Feb 12, 2025
@josecelano josecelano self-assigned this Mar 4, 2025
@josecelano
Copy link
Member Author

josecelano commented Mar 4, 2025

How to reproduce

  1. Run the tracker with:
TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__TRACKER_POLICY__PERSISTENT_TORRENT_COMPLETED_STAT=true cargo run
  1. Make sure the output show the configuration with
"persistent_torrent_completed_stat": true,
  1. Run the tracker client to add a new torrent
cd console/tracker-client/   
cargo run --bin udp_tracker_client announce udp://127.0.0.1:6969 9c38422213e30bff212b30c360d26f9a02136422 | jq

You should see something like:

    Finished `dev` profile [optimized + debuginfo] target(s) in 0.10s
     Running `/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker/target/debug/udp_tracker_client announce 'udp://127.0.0.1:6969' 9c38422213e30bff212b30c360d26f9a02136422`
{
  "AnnounceIpv4": {
    "transaction_id": -888840697,
    "announce_interval": 120,
    "leechers": 0,
    "seeders": 1,
    "peers": []
  }
}
  1. Check with the sqlite3 client that the torrent was added
sqlite3 storage/tracker/lib/database/sqlite3.db 
SQLite version 3.46.1 2024-08-13 09:16:08
Enter ".help" for usage hints.
sqlite> select * from torrents;
1|9c38422213e30bff212b30c360d26f9a02136422|0
sqlite> 

Update the number of downloads manually:

sqlite> update `torrents` set completed = 1 where `info_hash` = "9c38422213e30bff212b30c360d26f9a02136422";
sqlite> select * from torrents;
1|9c38422213e30bff212b30c360d26f9a02136422|1
sqlite> 
  1. Restart the tracker

  2. Check with sqlite that the counter has been reset

You can also use the tracker API:

http://0.0.0.0:1212/api/v1/torrent/9c38422213e30bff212b30c360d26f9a02136422?token=MyAccessToken

{
  "info_hash": "9c38422213e30bff212b30c360d26f9a02136422",
  "seeders": 1,
  "completed": 2,
  "leechers": 0,
  "peers": [
    {
      "peer_id": {
        "id": "0x2d71423030303030303030303030303030303031",
        "client": null
      },
      "peer_addr": "0.0.0.0:17548",
      "updated": 1741179730781,
      "updated_milliseconds_ago": 1741179730781,
      "uploaded": 0,
      "downloaded": 0,
      "left": 0,
      "event": "Completed"
    }
  ]
}

The completed value should be the one the database.

@josecelano
Copy link
Member Author

I think the naive implementation would be to load the persistent torrents initially when the tracker starts but that has some problems:

  • Even if we load all torrents initially they would be peerless torrents. If removing peerless torrents is enabled in the configuration those entries will be removed from the torrent repository and we will loose the counter value again.
  • After starting the tracker even if the counter value is right for a while the counter is going to be reset if the torrent is removed from the torrent repository because it does not have peers.

I have to think a good solution.

cc @da2ce7

josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 4, 2025
… after upserting a peer

When you upsert a new peer (announce requests) the swarm stats migth
change if the peer announced that has completed the download. If that
case we return `true` given the caller the opportunity to know if stats
have changed.

In the current implementation we get stats before and after upserintg
the peer, and we compare them to check if they have changed. However
stats migth have change due to a parallel (race condition) announce from
another peer.

This is the only way to know exactly if this announce request has
altered the stats (incremented the number of downloads).
@josecelano josecelano linked a pull request Mar 4, 2025 that will close this issue
josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 4, 2025
Now the `upsert_peer` fn returns `true`if the stats have changed.

IN the other hand, stats migth have changed ebcuase there was a
different paralell announce request (race condtions) so it migth not be
needed. This removes unnecessary database queries to persits the stats.
josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 4, 2025
…ted downloads for one torrent

This will be used to preset the counter from the database when a noew
torrent entry is added to the torrent repo after begin removing for a
while (peerless torrents are removed).

The process is:

- A new torrent is announced and added to the torrent repo.
- The downloads counter is 0.
- One of the peers finishes downloading and the counter increases to 1.
- The torrent is removed from the repo becuase none of the peer announce
  for a while.
- A new peer announce the torrent. We have to preset the counter to 1
  even if that peer has not completed downloading yet.
josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 4, 2025
@da2ce7
Copy link
Contributor

da2ce7 commented Mar 5, 2025

Keep the "loaded" torrents in a separate structure, move them into the normal structure if they get announced.

The "loaded" structure can optionally expire after a configurable time. 😊

@josecelano
Copy link
Member Author

Keep the "loaded" torrents in a separate structure, move them into the normal structure if they get announced.

The "loaded" structure can optionally expire after a configurable time. 😊

Hi @da2ce7 there are two problems with that approach:

  • When a new download is completed I have to update:
    • The number of downloads inside the in-memory repository (what it's doing now).
    • The number of downloads in the database (it's also implemented in the current version).
    • The number of downloads in the new cache for the database data.

What I'm implementing in the PR is:

  • Load counter from the database when the torrent is added to the repository (no matter if the tracker was restarted or the torrents was removed and added later again).
  • Increase the counter immediately in the DB when increases in the in-memory repo.

All of that it's done only if persisting stats is enabled (config.tracker_policy.persistent_torrent_completed_stat).

I think I will merge the PR once that patch is implemented. However I think we can implement a better solution (I will open a new issue for that):

  • Trigger an event from the in-memory repository with info about the announce event. Actually it's like re-triggering the event we have received from the peer (the announce event). The event can contain info like if the number of downloads was increased or not.
  • We add a listener and increase the counter in the database wen needed.
  • The announce handler does not have the responsibility to persist the stats (this counter).

josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 5, 2025
josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 5, 2025
By using the new database trait method `increase_number_of_downloads`.
josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 5, 2025
…sited

However, we still have to load the value counter from the database the
first time the otrrent is added to the repository.
josecelano added a commit to josecelano/torrust-tracker that referenced this issue Mar 5, 2025
This fixed a bug: the number of donwloads for a torrent is not loaded
from the database when the torrent is added to the in-memory torrent
repository. It should be done when stats are enabled with the
configuration option:

persistent_torrent_completed_stat = true

The patch is applied only to the torrent repository implementation we
are using in prodcution (`CrossbeamSkipList<EntryMutexStd>`).

The other implementations have this comment:

```
// todo: load persistent torrent data if provided
```

It was not implemented for the others becuase I'm considering taking the
counter (for the number of downloads) out of the in-memory repository.
And increase it by suing events triggered from the core tracker.

I will open a new issue for that. If that's implemented we will need to
remove this patch for the repository, meaning reverting cahnges in this
commit.
@josecelano
Copy link
Member Author

josecelano commented Mar 5, 2025

Hi @da2ce7 I've fixed the problem with a temporary path. I will open a new issue with a proposal for a better solution. In the mean time at least the bug it's fixed.

I have made two main changes:

  1. Preset the counter for number of downloads every time the torrent is added to the in-memory torrent repository:
pub async fn announce(
    &self,
    info_hash: &InfoHash,
    peer: &mut peer::Peer,
    remote_client_ip: &IpAddr,
    peers_wanted: &PeersWanted,
) -> Result<AnnounceData, AnnounceError> {
    self.whitelist_authorization.authorize(info_hash).await?;

    let opt_persistent_torrent = if self.config.tracker_policy.persistent_torrent_completed_stat {
        self.db_torrent_repository.load(info_hash)?
    } else {
        None
    };

    peer.change_ip(&assign_ip_address_to_peer(remote_client_ip, self.config.net.external_ip));

    let number_of_downloads_increased =
        self.in_memory_torrent_repository
            .upsert_peer(info_hash, peer, opt_persistent_torrent);

    if self.config.tracker_policy.persistent_torrent_completed_stat && number_of_downloads_increased {
        self.db_torrent_repository.increase_number_of_downloads(info_hash)?;
    }

    Ok(self.build_announce_data(info_hash, peer, peers_wanted))
}

In the core announce handler we check if the torrent is new:

  • If it's new and persistent torrents stats are enabled we load the counter value from the database and we pass it to the repository.
  • If it's not new or persistent stats are disable we just ignore stats.
  1. The in-memory repo has a new argument to preset the counter when it's provided.

Only if the torrent is new.

impl Repository<EntryMutexStd> for CrossbeamSkipList<EntryMutexStd>
where
    EntryMutexStd: EntrySync,
    EntrySingle: Entry,
{
    /// Upsert a peer into the swarm of a torrent.
    ///
    /// Optionally, it can also preset the number of downloads of the torrent
    /// only if it's the first time the torrent is being inserted.
    ///
    /// # Arguments
    ///
    /// * `info_hash` - The info hash of the torrent.
    /// * `peer` - The peer to upsert.
    /// * `opt_persistent_torrent` - The optional persisted data about a torrent
    ///   (number of downloads for the torrent).
    ///
    /// # Returns
    ///
    /// Returns `true` if the number of downloads was increased because the peer
    /// completed the download.
    fn upsert_peer(&self, info_hash: &InfoHash, peer: &peer::Peer, opt_persistent_torrent: Option<PersistentTorrent>) -> bool {
        if let Some(existing_entry) = self.torrents.get(info_hash) {
            existing_entry.value().upsert_peer(peer)
        } else {
            let new_entry = if let Some(number_of_downloads) = opt_persistent_torrent {
                EntryMutexStd::new(
                    EntrySingle {
                        swarm: PeerList::default(),
                        downloaded: number_of_downloads,
                    }
                    .into(),
                )
            } else {
                EntryMutexStd::default()
            };

            let inserted_entry = self.torrents.get_or_insert(*info_hash, new_entry);

            inserted_entry.value().upsert_peer(peer)
        }
    }
}

NOTES

  1. In my first solution I tried to created the torrent in the AnnounceHadler if it does not exist with the preset value from the DB, but that does not work becuase race conditions migth happen with the job to clean peerless torrents.
  • If the in-memory repo does not contain the torrent we add it with the preset counter value from the DB.
  • If the clean job is executed before upserting the peer the torrent would be removed from the repo and added again later during the upsert but with a 0 value for the counter.
  1. I have not implemented the patch for all the repository implementations because I'm considering a new solution described in point 3.2 I think the problems came from having the persisted counter in the in-memory repo. I think the code would be much easier if we remove that responsibility from the torrent repository.
  • The repo trigger an event including what type of event the peer announce and whether the events affects the number of downloads or not (it we should increase the counter).
  • There is a new listener that listens to that event and increases the counter in a different repo.
  • The API takes the value of the counter from the new repo.
  • The core AnnounceHandler does not have to deal with persisting stats.

With the new implementation the handler would be:

pub async fn announce(
    &self,
    info_hash: &InfoHash,
    peer: &mut peer::Peer,
    remote_client_ip: &IpAddr,
    peers_wanted: &PeersWanted,
) -> Result<AnnounceData, AnnounceError> {
    self.whitelist_authorization.authorize(info_hash).await?;

    peer.change_ip(&assign_ip_address_to_peer(remote_client_ip, self.config.net.external_ip));

    self.in_memory_torrent_repository.upsert_peer(info_hash, peer);

    Ok(self.build_announce_data(info_hash, peer, peers_wanted))
}

Notice there is no code related to stats at all and upsert_peer return unity.

josecelano added a commit that referenced this issue Mar 5, 2025
06ef1d5 fix: [#1264] number of downloads preset when torrent is persisted (Jose Celano)
8f67f12 fix: [#1264] partially. The correct number of downloads is persited (Jose Celano)
6beec3a refactor: [#1264] add a database method to increase the number of downloads for a torrent (Jose Celano)
8a4dba3 refactor: [#1264] rename variables (Jose Celano)
2fb1c6f reafactor: [#1264] add a DB methog to get the number of completed downloads for one torrent (Jose Celano)
c5db71a refactor: [#1264] remove unnecessary fn call (Jose Celano)
820329b refactor: [#1264] return wether swarm stats have change or not after upserting a peer (Jose Celano)

Pull request description:

  ### Problem

  **The number of downloads for a torrent is not persisted correctly**.

  Even when persistence is enabled, it does not work. There are two cases when the counter is reset:

  - When you restart the tracker.
  - When a torrent entry is removed from the torrents repository because the last peer was removed (peerless torrents). This only happens if "removing peerless torrents" is enabled in the configuration.

  Check the [issue](#1264) to know how to reproduce the bug.

  ### Solution

  Every time a new torrent entry is added in the torrent repository we need to load stats (also called persistent torrent, which only includes number of downloads currently) from the database. If persistence for stats is enabled.

  ### Other considerations

  **1. Unused code**

  There is a method to load the counters for all torrents but it's not used in production code.

  ```rust
  fn import_persistent(&self, persistent_torrents: &PersistentTorrents) {
      for (info_hash, completed) in persistent_torrents {
          if self.torrents.contains_key(info_hash) {
              continue;
          }

          let entry = EntryMutexStd::new(
              EntrySingle {
                  swarm: PeerList::default(),
                  downloaded: *completed,
              }
              .into(),
          );

          self.torrents.insert(*info_hash, entry);
      }
  }
  ```

  I guess it was added to load the information from the database when the tracker is restarted. However that's not enough because torrents are removed from the repository when they do not have peers (peerless torrents). That would work only if removing peerless torrents is disabled.

  **2. Better solution**

  I think there is a better solution than the one this PR applies. This PR is a temporary patch to fix the bug asap.

  I would differentiate to different counters:

  1. number of downloads per torrent while it was active in the swarm (since the tracker was restarted)
  2. historical number of downloads per torrent per web service (since the service was put into production)

  The counter 1 is what we have before merging this PR. The counter is reset every time the torrent is removed from the swarm. We can keep like it's now before merging this PR. This counter is handled at the in-memory torrent repository.

  The new counter (counter 2) would be handled by the announce handler. After processing the announce request if the number of downloads has increased the handler also increases the historical counter. This new value should be the one exposed in the API.

  ```json
  {
    "torrents": 379508,
    "seeders": 150440,
    "completed": 58992,  <- this should be the historical
    "leechers": 264865,
    "tcp4_connections_handled": 15,
    "tcp4_announces_handled": 15,
    "tcp4_scrapes_handled": 0,
    "tcp6_connections_handled": 0,
    "tcp6_announces_handled": 0,
    "tcp6_scrapes_handled": 0,
    "udp_requests_aborted": 556458,
    "udp_requests_banned": 24773536,
    "udp_banned_ips_total": 29868,
    "udp_avg_connect_processing_time_ns": 478212,
    "udp_avg_announce_processing_time_ns": 8724637,
    "udp_avg_scrape_processing_time_ns": 296950,
    "udp4_requests": 298728340,
    "udp4_connections_handled": 104369821,
    "udp4_announces_handled": 158354656,
    "udp4_scrapes_handled": 3664492,
    "udp4_responses": 273954123,
    "udp4_errors_handled": 7540716,
    "udp6_requests": 0,
    "udp6_connections_handled": 0,
    "udp6_announces_handled": 0,
    "udp6_scrapes_handled": 0,
    "udp6_responses": 0,
    "udp6_errors_handled": 0
  }
  ```

  This patches solves the problem but pre-loading the historical data every time the torrent is added to the repository.

ACKs for top commit:
  josecelano:
    ACK 06ef1d5

Tree-SHA512: 2d1ad14d26efa981d24b1264640b5e2deaca621092ec61638e85e21dc83f92ad53a4041cb97e4e798611f3bfed610c7794b68027e43a847751c8ec022abc6b19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Incorrect Behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants