Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

artifact store: keep track of in-progress artifacts in memory #7860

Closed
wants to merge 1 commit into from

Conversation

iliana
Copy link
Contributor

@iliana iliana commented Mar 24, 2025

The artifact store in Sled Agent writes in-progress files to a known filename (tmp/{sha256}), then after the file is complete and the hash is correct, persists the file to the parent directory.

Evidently using OpenOptions::create_new is not enough to avoid a TOCTOU problem, as described in #7796. We are seeing a situation where:

  • request A creates a file at tmp/{sha256} and starts writing to it
  • request B (near-simultaneously?) creates a different file at tmp/{sha256} and starts writing to it
  • request A finishes and moves tmp/{sha256} to {sha256}, which is actually request B's incomplete file
  • request B finishes and fails to move tmp/{sha256} to {sha256}

Instead of using the filesystem to keep track of which artifacts are in progress, let's keep that information in memory. The implementation creates a oneshot channel, moving the sender into the write task and keeping the receiver in a map inside a mutex. When the write task finishes, the sender is dropped along with the rest of the ArtifactWriter; we can see this on the receiver side to track whether the write is still in progress.

The implementation now avoids writing to a specific temporary file, instead using a randomized filename within the temporary directory. The filename is still prepended with the artifact sha256 to help with debugging.

@iliana iliana requested a review from davepacheco March 24, 2025 20:15
@sunshowers
Copy link
Contributor

Thanks -- is the oneshot channel required given that temp files are now involved?

let moved_temp_dir = temp_dir.clone();
let file = match tokio::task::spawn_blocking(move || {
camino_tempfile::Builder::new()
.prefix(&format!("{sha256}."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on using https://github.com/untitaker/rust-atomicwrites? It would also do an fsync at the end.

@iliana
Copy link
Contributor Author

iliana commented Mar 25, 2025

Closing in favor of #7866

@iliana iliana closed this Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants