TUF artifact distribution errors in sled agent log #7796

Open · davepacheco opened this issue Mar 13, 2025 · 10 comments · May be fixed by #7866

@davepacheco (Collaborator)

I set up a4x2 (based on main + #7781), applied the workaround so that Nexus trusts root.json, and uploaded a TUF repo. This mostly worked. But I later found a bunch of errors in the sled agent log:

root@g0:~# grep -i failed $(svcs -L sled-agent) | tail -n 100 | looker -o long 
2025-03-13 21:44:58.325Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:45:24.322Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:45:30.815Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:45:43.498Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:45:57.070Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:46:34.271Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:47:18.864Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000
2025-03-13 21:47:18.865Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000
2025-03-13 21:47:18.865Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Failed to rename `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000` to `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000`: No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:47:59.724Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02, actual 545157dda0d40c614e9e302a36575f498d96c28e4ea865952faa2ac9054126b7
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:49:57.302Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:50:09.585Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee, actual 506b0a19cbbf9d7204d9646b4b80361cf65a1aa9e837129ca9b982464d3f7957
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:50:57.591Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136, actual 92891472f8e492f8f038e639cc38178b767e3b0b16b7afea28b04b8ede514f29
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:51:56.820Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:52:57.017Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143, actual a94307aac6939043f242fa3280e8dfba5301a0298e2be9d46f10fe72ec886f75
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:53:32.438Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:53:39.293Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:56:27.735Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee, actual 506b0a19cbbf9d7204d9646b4b80361cf65a1aa9e837129ca9b982464d3f7957
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 21:56:38.576Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:00:48.556Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:01:41.664Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee, actual 506b0a19cbbf9d7204d9646b4b80361cf65a1aa9e837129ca9b982464d3f7957
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:02:26.222Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02, actual 545157dda0d40c614e9e302a36575f498d96c28e4ea865952faa2ac9054126b7
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:02:48.447Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143, actual a94307aac6939043f242fa3280e8dfba5301a0298e2be9d46f10fe72ec886f75
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:03:27.102Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:05:26.954Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:05:49.968Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee, actual 506b0a19cbbf9d7204d9646b4b80361cf65a1aa9e837129ca9b982464d3f7957
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:06:17.243Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:07:12.698Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:07:30.301Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:07:31.841Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:08:02.230Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:08:33.736Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:10:04.733Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
2025-03-13 22:11:00.324Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:12:39.457Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:13:14.804Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
2025-03-13 22:13:14.805Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
2025-03-13 22:13:14.805Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Failed to rename `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143` to `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143`: No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:14:25.905Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:15:14.729Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
2025-03-13 22:15:14.733Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
2025-03-13 22:15:14.733Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Failed to rename `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee` to `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee`: No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:15:30.679Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:17:13.366Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:19:17.509Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143, actual a94307aac6939043f242fa3280e8dfba5301a0298e2be9d46f10fe72ec886f75
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:19:30.579Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:20:06.057Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:21:47.895Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:23:25.908Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143, actual a94307aac6939043f242fa3280e8dfba5301a0298e2be9d46f10fe72ec886f75
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:26:14.006Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:26:18.358Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:27:10.871Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:29:11.255Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:30:47.714Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:31:01.511Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
2025-03-13 22:32:44.930Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:34:28.696Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:34:42.442Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
2025-03-13 22:34:42.443Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
2025-03-13 22:34:42.443Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Failed to rename `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453` to `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453`: No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:35:05.495Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:36:00.351Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:37:08.495Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
2025-03-13 22:38:11.306Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:39:26.619Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:39:36.850Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:42:24.328Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:42:50.594Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:44:32.603Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143, actual a94307aac6939043f242fa3280e8dfba5301a0298e2be9d46f10fe72ec886f75
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:44:38.556Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/tmp/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/8080dab8-371f-404a-8ed3-a8b0a6301674/update/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
2025-03-13 22:44:38.557Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
    to = /pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
2025-03-13 22:44:38.557Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Failed to rename `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/tmp/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143` to `/pool/int/f537faa0-baf5-489a-a734-d7bd3f7e7cbf/update/4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143`: No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:45:28.160Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:103::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:46:15.398Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:48:47.296Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143, actual a94307aac6939043f242fa3280e8dfba5301a0298e2be9d46f10fe72ec886f75
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:51:25.312Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:52:47.559Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:53:00.156Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:54:34.175Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:55:08.282Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 22:57:52.513Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:01:21.485Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:01:32.753Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:02:31.747Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:03:16.652Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 26.945955ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:16.978Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 124.388761ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:17.127Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 186.807582ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:17.347Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 506.384474ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:17.874Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 980.528637ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:18.980Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 980.727519ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:20.157Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 979.673622ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:21.224Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 826.615676ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:22.127Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 1.153512765s
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:23.374Z WARN SledAgent/1322 (ServiceManager) on g0: wait for service svc:/milestone/single-user:default in zone Some("oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559") failed: Property not found. retry in 827.63921ms
    file = illumos-utils/src/svc.rs:42
    zone = oxz_crucible_pantry_6ce681b4-565a-4a04-b3c8-040baaa28559
2025-03-13 23:03:31.785Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:05:50.261Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143 from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:08:12.833Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:09:11.478Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Digest mismatch: expected 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453, actual 49e79473c5777cb802f28f75a29a555291e6e7592aeb526c1abb1890f516760b
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:09:23.376Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee
2025-03-13 23:10:27.727Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = 99264cf7-16cf-48a4-b965-ceb8d565fdee

So far I've only looked into the first one. It's timing out trying to talk to sled g1:

root@oxz_switch:~# omdb db sleds     
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:102::4]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:102::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (129.0.0)
note: listing all commissioned sleds (use -F to filter, e.g. -F in-service)
 SERIAL  IP                             ROLE      POLICY      STATE   ID
 g1      [fd00:1122:3344:102::1]:12345  -         in service  active  4c7db15a-3c2c-4487-8d55-959c33141129
 g0      [fd00:1122:3344:101::1]:12345  scrimlet  in service  active  99264cf7-16cf-48a4-b965-ceb8d565fdee
 g3      [fd00:1122:3344:103::1]:12345  scrimlet  in service  active  efa473c5-d076-4842-9b79-0fbf3d6effe2

That sled is up. Neither the sled nor its sled agent has restarted since well before this:

root@g1:~# uptime
23:36:01    up  6:12,  1 user,  load average: 0.82, 0.92, 0.97
root@g1:~# last reboot
reboot    system boot                   Thu Mar 13 17:23

wtmp begins Thu Mar 13 17:23
root@g1:~# svcs sled-agent
STATE          STIME    FMRI
online         17:27:14 svc:/oxide/sled-agent:default
root@g1:~# date
March 13, 2025 at 11:36:12 PM UTC

grep'ing for that first artifact id in g1's sled agent logs, I see lots of requests for it that seem normal. The first one after the time of that error is:

21:49:43.825Z INFO SledAgent: Retrieved artifact
    file = sled-agent/src/artifact_store.rs:183
    path = /pool/int/6b56d446-12b4-4c22-a31d-84d6472e6585/update/558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02
    sha256 = 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02
    sled_id = 4c7db15a-3c2c-4487-8d55-959c33141129
21:49:43.825Z INFO SledAgent: request completed
    file = /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 1916
    local_addr = [fd00:1122:3344:102::1]:12348
    method = GET
    remote_addr = [fd00:1122:3344:101::1]:44043
    req_id = 2f2a0ffc-a983-4104-b056-f09094ff9544
    response_code = 200
    sled_id = 4c7db15a-3c2c-4487-8d55-959c33141129
    uri = /artifact/sha256/558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02

That is from g0, based on the IP. It finished 5 minutes after the error reported on g0, but from the latency we can see that the request itself was quick. I'm not sure it's the same request at all; if it is, it was queued somewhere for 5 minutes.

I'm not sure what to make of the "Failed to rename temporary file" errors caused by "No such file or directory (os error 2)".

I was doing a bunch of other stuff here, but I don't think it was relevant, and most of it happened after these early errors.

@davepacheco changed the title from "TUF artifact distribution errors" to "TUF artifact distribution errors in sled agent log" on Mar 13, 2025
@davepacheco (Collaborator, Author)

I later found out that this system ran out of disk space. It's conceivable that this could cause a bunch of these issues:

  • running low on pool space (not filesystem space) can result in very slow writes, which could cause requests that are trying to write data to disk to time out
  • maybe the failure to rename the temporary file resulted from having failed to create that temporary file in the first place due to the disk space problem? But that would seem like a second bug: we should have failed on the earlier problem instead.

@iliana (Contributor) commented Mar 20, 2025

The failure to rename the temporary file is weird. We create the files here:

let mut files = Vec::new();
let mut last_error = None;
let mut datasets = 0;
for mountpoint in self.storage.artifact_storage_paths().await? {
    datasets += 1;
    let temp_dir = mountpoint.join(TEMP_SUBDIR);
    if let Err(err) = tokio::fs::create_dir(&temp_dir).await {
        if err.kind() != ErrorKind::AlreadyExists {
            log_and_store!(
                last_error, &self.log, "create", temp_dir, err
            );
            continue;
        }
    }
    let temp_path =
        Utf8TempPath::from_path(temp_dir.join(sha256.to_string()));
    let file = match OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&temp_path)
        .await
    {
        Ok(file) => file,
        Err(err) => {
            if err.kind() == ErrorKind::AlreadyExists {
                return Err(Error::AlreadyInProgress { sha256 });
            } else {
                let path = temp_path.to_path_buf();
                log_and_store!(
                    last_error, &self.log, "create", path, err
                );
                continue;
            }
        }
    };
    let file = NamedUtf8TempFile::from_parts(file, temp_path);
    files.push(Some((file, mountpoint)));
}

Of course we only get back a Utf8TempFile if creating the file succeeded. files gets moved into an ArtifactWriter, and each chunk is written with:

async fn write(&mut self, chunk: impl AsRef<[u8]>) -> Result<(), Error> {
    self.hasher.update(&chunk);
    let mut last_error = None;
    for option in &mut self.files {
        if let Some((mut file, mountpoint)) = option.take() {
            match file.as_file_mut().write_all(chunk.as_ref()).await {
                Ok(()) => {
                    *option = Some((file, mountpoint));
                }
                Err(err) => {
                    let path = file.path().to_owned();
                    log_and_store!(
                        last_error, &self.log, "write to", path, err
                    );
                    // `file` and `final_path` are dropped here, cleaning up
                    // the file
                }
            }
        }
    }
    self.files.retain(Option::is_some);
    if self.files.is_empty() {
        Err(last_error.unwrap_or(Error::NoUpdateDataset))
    } else {
        Ok(())
    }
}

So any errors in writing to the file will drop the Utf8TempFile, which is supposed to delete the file, but then that file is no longer in the list that we attempt to persist:

let mut last_error = None;
let mut successful_writes = 0;
for (mut file, mountpoint) in self.files.into_iter().flatten() {
    // 1. fsync the temporary file.
    if let Err(err) = file.as_file_mut().sync_all().await {
        let path = file.path().to_owned();
        log_and_store!(last_error, &self.log, "sync", path, err);
        continue;
    }
    // 2. Open the parent directory so we can fsync it.
    let parent_dir = match File::open(&mountpoint).await {
        Ok(dir) => dir,
        Err(err) => {
            log_and_store!(
                last_error, &self.log, "open", mountpoint, err
            );
            continue;
        }
    };
    // 3. Rename the temporary file.
    let final_path = mountpoint.join(self.sha256.to_string());
    let moved_final_path = final_path.clone();
    if let Err(err) = tokio::task::spawn_blocking(move || {
        file.persist(&moved_final_path)
    })
    .await?
    {
        error!(
            &self.log,
            "Failed to rename temporary file";
            "error" => &err.error,
            "from" => err.file.path().as_str(),
            "to" => final_path.as_str(),
        );
        last_error = Some(Error::FileRename {
            from: err.file.path().to_owned(),
            to: final_path,
            err: err.error,
        });
        continue;
    }
    // 4. fsync the parent directory.
    if let Err(err) = parent_dir.sync_all().await {
        log_and_store!(last_error, &self.log, "sync", mountpoint, err);
        continue;
    }
    successful_writes += 1;
}
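
To make that cleanup behavior concrete, here is a minimal sketch using the plain tempfile crate (assuming the NamedUtf8TempFile above behaves the same way, since dropping it is what "cleans up the file"): dropping the handle unlinks the file on disk, so any later rename of that path fails with exactly the "No such file or directory (os error 2)" seen in the log.

use std::fs;
use std::io::ErrorKind;
use tempfile::NamedTempFile;

fn main() -> std::io::Result<()> {
    // Create a named temporary file and remember its path.
    let temp = NamedTempFile::new()?;
    let temp_path = temp.path().to_owned();
    let final_path = temp_path.with_extension("final");

    // Dropping the handle unlinks the file on disk; this is the cleanup the
    // write loop above relies on when a write to one dataset fails.
    drop(temp);

    // Renaming the now-missing path fails with ENOENT.
    let err = fs::rename(&temp_path, &final_path).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::NotFound);
    Ok(())
}

Which is what makes the log puzzling: a write error both deletes the temp file and removes its entry from files, so the persist loop should never have been handed a path that no longer exists.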

@davepacheco (Collaborator, Author)

I have since torn down this a4x2, started another one, and uploaded a TUF repo. This one has not run out of disk space. I'm seeing these errors:

2025-03-21 15:44:38.250Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
2025-03-21 15:46:34.051Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
2025-03-21 15:49:24.640Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
2025-03-21 15:49:53.725Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
2025-03-21 15:53:38.074Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169

They're all for that one artifact, d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f, and that one isn't in /pool/int/*/update. So the sled seems to persistently have trouble with that one artifact. The only problem is: I don't see that shasum in my TUF repo at all. There's no file in "targets" with that name, no file has that shasum, and a grep -r of the unpacked repo doesn't show it.

So what is it?

root@[fd00:1122:3344:101::3]:32221/omicron> \x
root@[fd00:1122:3344:101::3]:32221/omicron> select * from tuf_artifact where sha256 = 'd7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f';
-[ RECORD 1 ]
id            | 71ef5e44-9051-45be-8ad0-b9fecf28f17b
name          | host
version       | 00000013.00000000.00000000-0.ci+git3802ae4736a
kind          | host_phase_2
time_created  | 2025-03-19 16:02:45.874441+00
sha256        | d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
artifact_size | 1048580096


Time: 1ms total (execution 1ms / network 1ms)

Knowing that, I found it inside my TUF repo, inside targets/2cd3225eff353a38355b46967d7c974bb4c71e2ee0c2bd5309b0ce040dbea65f.host-host-13.0.0-0.ci+git3802ae4736a.tar.gz: it's the shasum of image/zfs.img.

That does appear to be the largest individual artifact:

root@[fd00:1122:3344:101::3]:32221/omicron> select artifact_size,name,version,kind from tuf_artifact ORDER BY artifact_size DESC LIMIT 5;
  artifact_size |       name        |                    version                     |        kind
----------------+-------------------+------------------------------------------------+---------------------
     1048580096 | host              | 00000013.00000000.00000000-0.ci+git3802ae4736a | host_phase_2
      311880120 | clickhouse        | 00000013.00000000.00000000-0.ci+git3802ae4736a | zone
      297421539 | clickhouse_keeper | 00000013.00000000.00000000-0.ci+git3802ae4736a | zone
      147303806 | cockroachdb       | 00000013.00000000.00000000-0.ci+git3802ae4736a | zone
      129882273 | trampoline        | 00000013.00000000.00000000-0.ci+git3802ae4736a | trampoline_phase_2
(5 rows)


Time: 2ms total (execution 1ms / network 1ms)

I wonder if we're using a timeout that's not taking into account data transfer activity (e.g., an end-to-end request timeout) and isn't big enough for an artifact this size?
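
For reference, this is roughly what an end-to-end request timeout looks like with a plain reqwest client. This is only a sketch: the depot client in sled agent is generated code, and its actual timeout configuration is an assumption here. reqwest's ClientBuilder::timeout bounds the whole request, including streaming the response body, so a fixed budget will abort a large, slowly-transferring artifact mid-download even though bytes are still arriving.

use std::time::Duration;

// Hypothetical client setup, not the sled agent's actual code.
async fn fetch_artifact(url: &str) -> reqwest::Result<usize> {
    let client = reqwest::Client::builder()
        // End-to-end budget: connect + headers + the entire body.
        .timeout(Duration::from_secs(60))
        // Only bounds TCP connection establishment.
        .connect_timeout(Duration::from_secs(15))
        .build()?;

    let mut resp = client.get(url).send().await?.error_for_status()?;
    let mut total = 0;
    // Body reads count against the 60-second total above, so at ~230 KiB/s
    // a ~1 GiB artifact would fail partway through with a timeout error.
    while let Some(chunk) = resp.chunk().await? {
        total += chunk.len();
    }
    Ok(total)
}

A timeout that only bounds the time between reads, rather than the whole transfer, would not trip on a download that is slow but still making progress.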

(If it's trying and failing repeatedly, that would also explain why my a4x2 filled up in the first place. In oxidecomputer/testbed#100 I had found that the M.2 zpools were much smaller than the files used to back them, which would make sense if something was rewriting them constantly, but I couldn't figure out what was doing that.)

@davepacheco (Collaborator, Author)

I spotted Sled Agent writing the temporary file for this artifact and started watching it:

root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# while sleep 1; do date +%T; ls -l tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f; done
tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f: No such file or directory
16:52:02
tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f: No such file or directory
16:52:03
-rw-r--r--   1 root     root       13002 Mar 21 16:52 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
16:52:04
-rw-r--r--   1 root     root      450560 Mar 21 16:52 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
...
-rw-r--r--   1 root     root     14049280 Mar 21 16:53 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
16:53:03
-rw-r--r--   1 root     root     14172160 Mar 21 16:53 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
16:53:04
tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f: No such file or directory
16:53:05
tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f: No such file or directory
16:53:06

along with what you'd expect in the log around 16:53:03:

# grep d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f $(svcs -L sled-agent) | tail | looker
...
16:53:04.054Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = error streaming request body: error reading a body from connection
    error_message_internal = error streaming request body: error reading a body from connection
    file = /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 60575998
    local_addr = [fd00:1122:3344:101::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:102::5]:42636
    req_id = 07503e18-7841-47cb-a228-010c2cfc3f53
    response_code = 400
    uri = /artifacts/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
16:53:13.023Z ERRO SledAgent: Failed to write artifact
    err = Error fetching artifact d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f from depot at http://[fd00:1122:3344:102::1]:12348: Invalid Response Body Bytes: error decoding response body: error decoding response body: request or response body error: operation timed out
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169

From this I think we can infer:

  • it was trying to download it for around 60 seconds
  • in that time it had downloaded just over 14172160 bytes (230 KiBps)
  • the full file would have taken about 74 minutes at that rate. That's pretty slow, but a4x2 is known to be slow, especially around networking, so I'm not sure this is a bug. (A quick check of this arithmetic follows the list.)
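
Back-of-the-envelope, in plain illustrative Rust (numbers taken from the log above; nothing here comes from the codebase):

fn main() {
    let bytes = 14_172_160_f64;       // downloaded before the failure
    let secs = 60.575998;             // latency_us = 60575998
    let rate = bytes / secs;          // ~234 KB/s
    let host = 1_048_580_096_f64;     // host_phase_2 artifact
    let clickhouse = 311_880_120_f64; // next-largest artifact
    println!("rate       ~ {:.0} KiB/s", rate / 1024.0);          // ~228
    println!("host       ~ {:.1} min", host / rate / 60.0);       // ~74.7
    println!("clickhouse ~ {:.1} min", clickhouse / rate / 60.0); // ~22.2
}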

What's surprising then is that the next largest artifact, 311880120 bytes, ever works in 60 seconds. It seems like that should take 22 minutes, too. I wonder if I can see how long it took?

That was the clickhouse artifact, which is e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136:

root@[fd00:1122:3344:101::3]:32221/omicron> select artifact_size,name,version,kind,sha256 from tuf_artifact ORDER BY artifact_size DESC LIMIT 5;
  artifact_size |       name        |                    version                     |        kind        |                              sha256
----------------+-------------------+------------------------------------------------+--------------------+-------------------------------------------------------------------
     1048580096 | host              | 00000013.00000000.00000000-0.ci+git3802ae4736a | host_phase_2       | d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
      311880120 | clickhouse        | 00000013.00000000.00000000-0.ci+git3802ae4736a | zone               | e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
      297421539 | clickhouse_keeper | 00000013.00000000.00000000-0.ci+git3802ae4736a | zone               | 558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02
      147303806 | cockroachdb       | 00000013.00000000.00000000-0.ci+git3802ae4736a | zone               | 84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
      129882273 | trampoline        | 00000013.00000000.00000000-0.ci+git3802ae4736a | trampoline_phase_2 | 4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
(5 rows)

That's on disk here but much smaller than artifact_size in the database:

root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# ls -l e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
-rw-r--r--   1 root     root     67069360 Mar 19 20:57 e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136

Comparing its creation and modification timestamps:

root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# ls -lE e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
-rw-r--r--   1 root     root     67069360 2025-03-19 20:57:28.981367395 +0000 e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# ls -lE%crtime e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
-rw-r--r--   1 root     root     67069360 2025-03-19 20:54:01.150904007 +0000 e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136

It looks like it maybe took over 3 minutes (unless something modified it a few minutes after it was transferred).

I looked for log messages about this one and did see one of the rename errors, for what it's worth:

2025-03-19 21:06:02.309Z ERRO SledAgent/1322 on g0: Failed to rename temporary file
    error = No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:556
    from = /pool/int/e8357cff-2e9e-43bc-a8de-827cd1997d1b/update/tmp/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
    to = /pool/int/e8357cff-2e9e-43bc-a8de-827cd1997d1b/update/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
2025-03-19 21:06:02.310Z ERRO SledAgent/1322 on g0: Failed to write artifact
    err = Failed to rename `/pool/int/e8357cff-2e9e-43bc-a8de-827cd1997d1b/update/tmp/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136` to `/pool/int/e8357cff-2e9e-43bc-a8de-827cd1997d1b/update/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136`: No such file or directory (os error 2)
    file = sled-agent/src/artifact_store.rs:382
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169

along with a bunch of:

2025-03-19 22:12:20.573Z INFO SledAgent/1322 on g0: request completed
    file = /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 639
    local_addr = [fd00:1122:3344:101::1]:12348
    method = GET
    remote_addr = [fd00:1122:3344:103::1]:45785
    req_id = 423285c3-b93f-46d8-b2bc-a2fee8abad2c
    response_code = 200
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
    uri = /artifact/sha256/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
2025-03-19 22:22:21.484Z INFO SledAgent/1322 on g0: Retrieved artifact
    file = sled-agent/src/artifact_store.rs:183
    path = /pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
    sha256 = e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
2025-03-19 22:22:21.484Z INFO SledAgent/1322 on g0: request completed
    file = /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 516
    local_addr = [fd00:1122:3344:101::1]:12348
    method = GET
    remote_addr = [fd00:1122:3344:103::1]:63314
    req_id = 40105fc4-2fd9-42e3-8190-804582ae802e
    response_code = 200
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
    uri = /artifact/sha256/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
2025-03-19 22:39:04.881Z INFO SledAgent/1322 on g0: Retrieved artifact
    file = sled-agent/src/artifact_store.rs:183
    path = /pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
    sha256 = e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
2025-03-19 22:39:04.881Z INFO SledAgent/1322 on g0: request completed
    file = /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 442
    local_addr = [fd00:1122:3344:101::1]:12348
    method = GET
    remote_addr = [fd00:1122:3344:103::1]:34730
    req_id = 8b8c497c-11f1-4219-a3e8-2d05260a1470
    response_code = 200
    sled_id = ade87960-6c43-4000-bb0d-e3d624663169
    uri = /artifact/sha256/e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136

I'm not sure what's going on here: why we made these requests multiple times, why the rename errors happened, and, if there really is a 60-second timeout as it looked like above, how this file managed to get transferred at all.

@davepacheco
Collaborator Author

I don't see any 60-second timeout but there does appear to be a 15-second read timeout in the artifact store. I wonder if this is bursty and we hit a 15-second gap with no reads? Here's a full cycle:

-rw-r--r--   1 root     root       49152 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:52
-rw-r--r--   1 root     root      157826 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:53
-rw-r--r--   1 root     root      472762 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:54
-rw-r--r--   1 root     root      831026 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:55
-rw-r--r--   1 root     root      936794 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:56
-rw-r--r--   1 root     root     1394096 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:57
-rw-r--r--   1 root     root     1828264 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:58
-rw-r--r--   1 root     root     2303402 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:13:59
-rw-r--r--   1 root     root     2321242 Mar 21 17:13 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:00
-rw-r--r--   1 root     root     2665304 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:01
-rw-r--r--   1 root     root     2835154 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:02
-rw-r--r--   1 root     root     2885040 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:03
-rw-r--r--   1 root     root     2885040 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:05
-rw-r--r--   1 root     root     2999722 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:06
-rw-r--r--   1 root     root     3281168 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:07
-rw-r--r--   1 root     root     3335594 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:08
-rw-r--r--   1 root     root     3769776 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:09
-rw-r--r--   1 root     root     3805456 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:10
-rw-r--r--   1 root     root     3805456 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:11
-rw-r--r--   1 root     root     4112384 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:12
-rw-r--r--   1 root     root     4571136 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:13
-rw-r--r--   1 root     root     5219032 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:14
-rw-r--r--   1 root     root     5342634 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:15
-rw-r--r--   1 root     root     5796112 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:16
-rw-r--r--   1 root     root     6481322 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:17
-rw-r--r--   1 root     root     6784426 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:18
-rw-r--r--   1 root     root     6829026 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:19
-rw-r--r--   1 root     root     6829026 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:20
-rw-r--r--   1 root     root     7256472 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:21
-rw-r--r--   1 root     root     7646770 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:22
-rw-r--r--   1 root     root     7876146 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:23
-rw-r--r--   1 root     root     8110808 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:24
-rw-r--r--   1 root     root     8164328 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:25
-rw-r--r--   1 root     root     8398984 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:26
-rw-r--r--   1 root     root     8850992 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:27
-rw-r--r--   1 root     root     9275522 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:28
-rw-r--r--   1 root     root     9705874 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:29
-rw-r--r--   1 root     root     10081976 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:30
-rw-r--r--   1 root     root     10267480 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:31
-rw-r--r--   1 root     root     10513240 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:32
-rw-r--r--   1 root     root     10897536 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:33
-rw-r--r--   1 root     root     10956338 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:34
-rw-r--r--   1 root     root     11305682 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:35
-rw-r--r--   1 root     root     11689984 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:36
-rw-r--r--   1 root     root     12116690 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:37
-rw-r--r--   1 root     root     12698320 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:38
-rw-r--r--   1 root     root     12698320 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:39
-rw-r--r--   1 root     root     13022904 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:40
-rw-r--r--   1 root     root     13329106 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:41
-rw-r--r--   1 root     root     13382626 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:42
-rw-r--r--   1 root     root     13710312 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:43
-rw-r--r--   1 root     root     14306866 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:44
-rw-r--r--   1 root     root     14674776 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:45
-rw-r--r--   1 root     root     14697898 Mar 21 17:14 tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f
17:14:46
tmp/d7cd26474af78aeef326f294ad0bb9238d07598262929022168ed8324488984f: No such file or directory
17:14:47
tmp/d

Here's the delta in each second:

awk '/d7cd/{ print $5 - last; last = $5; }'
49152
108674
314936
358264
105768
457302
434168
475138
17840
344062
169850
49886
0
114682
281446
54426
434182
35680
0
306928
458752
647896
123602
453478
685210
303104
44600
0
427446
390298
229376
234662
53520
234656
452008
424530
430352
376102
185504
245760
384296
58802
349344
384302
426706
581630
0
324584
306202
53520
327686
596554
367910
23122
-14697898

It's definitely bursty but hard to say if it's enough to be a problem here.
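
For what it's worth, here is a rough sketch of how a per-read timeout over a streaming body typically behaves (assuming reqwest with its stream feature, plus tokio; this is not the actual artifact-store code): the timer resets on every chunk, so a slow-but-steady transfer survives, and it only fails if the peer goes silent for the full window.

use std::time::Duration;

use futures::StreamExt;
use tokio::time::timeout;

async fn fetch_with_read_timeout(
    url: &str,
    window: Duration,
) -> anyhow::Result<Vec<u8>> {
    let resp = reqwest::get(url).await?;
    let mut stream = resp.bytes_stream();
    let mut buf = Vec::new();
    loop {
        // The window restarts for every chunk; bursty-but-continuous data is fine.
        match timeout(window, stream.next()).await {
            Ok(Some(chunk)) => buf.extend_from_slice(&chunk?),
            Ok(None) => break, // end of body
            Err(_) => anyhow::bail!("no data received for {window:?}"),
        }
    }
    Ok(buf)
}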

@iliana
Contributor

iliana commented Mar 21, 2025

Hmmmmm.

The fact that e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136 (clickhouse) persisted with a size of 67069360 when the database says 311880120 is fairly concerning. I don't think we actually check the content length before persisting, so I would be interested to know whether it matches the hash.

I don't know if we want to make the timeout longer in order to deal with softnpu being slow? I guess I could add a configuration knob for the timeout. The goal of the timeout is to prevent a single broken sled from stopping artifact replication to all the other sleds, since Nexus only sends N requests at a time.
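
For reference, a hypothetical shape for such a knob (names invented for illustration; not claiming this is how the sled-agent config is actually laid out):

use serde::Deserialize;

#[derive(Deserialize)]
struct ArtifactStoreConfig {
    /// How long a depot fetch may go without receiving any data before the
    /// sled agent gives up and lets replication retry elsewhere.
    #[serde(default = "default_read_timeout_secs")]
    read_timeout_secs: u64,
}

fn default_read_timeout_secs() -> u64 {
    15
}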

@iliana
Contributor

iliana commented Mar 21, 2025

> why we made these requests multiple times

This could be a race with the filesystem. I attempted to make it so that only one task could have a file in tmp/{hash} at once:

let temp_path =
    Utf8TempPath::from_path(temp_dir.join(sha256.to_string()));
let file = match OpenOptions::new()
    .write(true)
    .create_new(true)
    .open(&temp_path)
    .await
{
    Ok(file) => file,
    Err(err) => {
        if err.kind() == ErrorKind::AlreadyExists {
            return Err(Error::AlreadyInProgress { sha256 });

but maybe there are consistency issues I'm not aware of. It's possible that task A started writing a file, task B then started writing the same artifact and overwrote it, and then task A finished and persisted task B's file to the real location.

Maybe we should have an in-memory list of in-progress files, and not rely on the filesystem as a "lock"?
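
A minimal sketch of that idea (illustrative names only, not a proposal for the actual implementation): keep a shared set of in-progress hashes and hand out a guard that releases the claim on drop, instead of leaning on create_new() as a lock.

use std::collections::HashSet;
use std::sync::{Arc, Mutex};

#[derive(Clone, Default)]
struct InProgress(Arc<Mutex<HashSet<String>>>);

impl InProgress {
    /// Returns a guard if no other task is currently writing this artifact.
    fn try_claim(&self, sha256: &str) -> Option<InProgressGuard> {
        let mut set = self.0.lock().unwrap();
        if set.insert(sha256.to_owned()) {
            Some(InProgressGuard { all: self.clone(), sha256: sha256.to_owned() })
        } else {
            None // another writer already claimed this hash
        }
    }
}

/// Dropping the guard releases the claim, whether the write succeeded or not.
struct InProgressGuard {
    all: InProgress,
    sha256: String,
}

impl Drop for InProgressGuard {
    fn drop(&mut self) {
        self.all.0.lock().unwrap().remove(&self.sha256);
    }
}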

@davepacheco
Collaborator Author

Yikes, you're right -- not all of these files have the right shasum:

root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# shasum -a 256 *
01c74800132ec8a567b8ecd876a247683ae607b5eeb168092e2da4b8b50c0548  01c74800132ec8a567b8ecd876a247683ae607b5eeb168092e2da4b8b50c0548
05d45d18ff802c79a340c59cbad3100efee93c74ae1d807db3b7cbff8ed0f4c1  05d45d18ff802c79a340c59cbad3100efee93c74ae1d807db3b7cbff8ed0f4c1
17d6f6f6221614638df5f4f694eb4d9d15aa8bca8affb29e507afaa04e79ad63  17d6f6f6221614638df5f4f694eb4d9d15aa8bca8affb29e507afaa04e79ad63
1d2be23df37418cb0362aa66e3f70604f5697c5d08c017423015b7f2c2255ebb  1d2be23df37418cb0362aa66e3f70604f5697c5d08c017423015b7f2c2255ebb
20faf5a2a8591e1a90bb6062d6934a83478b5a80184aed4b10a121017af658ac  20faf5a2a8591e1a90bb6062d6934a83478b5a80184aed4b10a121017af658ac
2456c2f999905d4c809b640ae7327a8e893a86dc536c0288a1aa2aff19bb431b  2456c2f999905d4c809b640ae7327a8e893a86dc536c0288a1aa2aff19bb431b
2e139c4eba5e13df5fccbb2716a49741d693f23ef71db9e3cb0e31153488e7ed  2e139c4eba5e13df5fccbb2716a49741d693f23ef71db9e3cb0e31153488e7ed
33862176f4c1c4d3abd446b82a69af09ff17bbe8b2719dcf763ebbeedd0bd466  33862176f4c1c4d3abd446b82a69af09ff17bbe8b2719dcf763ebbeedd0bd466
346036b708083872c8afc279dcae79b1c012046909cbaa34e3a75326a891dd1c  346036b708083872c8afc279dcae79b1c012046909cbaa34e3a75326a891dd1c
46afc89a8e0602c5c153fc67bd5b67ed1e35a238db5759fd800e105bf8a7bed6  46afc89a8e0602c5c153fc67bd5b67ed1e35a238db5759fd800e105bf8a7bed6
47266ede81e13f5f1e36623ea8dd963842606b783397e4809a9a5f0bda0f8170  47266ede81e13f5f1e36623ea8dd963842606b783397e4809a9a5f0bda0f8170
48442eae71fb575ba20775c8b44ce189d93343c92cf818a51c7d291f4589b5eb  48442eae71fb575ba20775c8b44ce189d93343c92cf818a51c7d291f4589b5eb
4b3e218ddbab57d2015448d12725bbea2bbb3801351059f1fa6e865ec50dce36  4b3e218ddbab57d2015448d12725bbea2bbb3801351059f1fa6e865ec50dce36
4b489fc6c5cf98f88ba252361da68e138c02f3f8b47031f66becddd0996e3a2b  4b489fc6c5cf98f88ba252361da68e138c02f3f8b47031f66becddd0996e3a2b
4d1b6e39516a4b2ff089cca30519af190e4705bc4b248980d33550bb4a367d2e  4d1b6e39516a4b2ff089cca30519af190e4705bc4b248980d33550bb4a367d2e
75b12d0aa814597e02535f54eaadcba6f523ebbcdec143ac2cd8ed8231b63dc6  4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
51678ddf5bafb1f877d28a46fa55518db6db9725836dc7cbd4c74ffea2233cb2  51678ddf5bafb1f877d28a46fa55518db6db9725836dc7cbd4c74ffea2233cb2
554468a322342f6e4157fa086cae6e0a6358ebcfacd564b8d870d71ffbe5e1c4  554468a322342f6e4157fa086cae6e0a6358ebcfacd564b8d870d71ffbe5e1c4
a34c7786ef899398e633a05a086707746633101fb8a50fece8ca70154a9543eb  558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02
5d7368712bbd8c65bca9eb8815b2eb801164fdba9a6a9876f6f88aacfc8dcc98  5d7368712bbd8c65bca9eb8815b2eb801164fdba9a6a9876f6f88aacfc8dcc98
607369d9afa1e1da040f299732f4ac9323d823809ad0e09970c06af791b6ce97  607369d9afa1e1da040f299732f4ac9323d823809ad0e09970c06af791b6ce97
6412baf9c3570bcc25585547550bea6f339c365679451d02f05031c08e6f3e3e  6412baf9c3570bcc25585547550bea6f339c365679451d02f05031c08e6f3e3e
66e480d0a051d413e8c6bcf324df0df2f75235da31f8857a6ac8336c30ec703a  66e480d0a051d413e8c6bcf324df0df2f75235da31f8857a6ac8336c30ec703a
68d9f36406cdc412d8d0bdaeab8942577ac91d4de285404b081cdbd8bcaa595c  68d9f36406cdc412d8d0bdaeab8942577ac91d4de285404b081cdbd8bcaa595c
6d5839de7781f4ee0e1b22a3d8cc6f659619bfff47ccc77b7c172c3d69aa3eea  6d5839de7781f4ee0e1b22a3d8cc6f659619bfff47ccc77b7c172c3d69aa3eea
76c281582ca77970a0a975a3602ea199afec24963403c2d0c7563808a5f356a1  76c281582ca77970a0a975a3602ea199afec24963403c2d0c7563808a5f356a1
7a29912b5916ae21653c48418a853cc1fa3181ab22726bfc3f7af9e72ee59100  7a29912b5916ae21653c48418a853cc1fa3181ab22726bfc3f7af9e72ee59100
7cf7d9402b3fa7f33a8f38acd7bd3a2e914ec9c0ce8ba4936319903e61c21bf2  7cf7d9402b3fa7f33a8f38acd7bd3a2e914ec9c0ce8ba4936319903e61c21bf2
800bce80d0c34231ae2ce5adb4ee7906a145910f82b995a2545651694ea31ecc  800bce80d0c34231ae2ce5adb4ee7906a145910f82b995a2545651694ea31ecc
4a0e0aeda8268d102e07e76fd957cdfa58eeb3915667755147015dbb5cbb0cdc  84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
86a405027e89bfb59a454357dc1e636f9adab834604984be93b365aeaa0fbb47  86a405027e89bfb59a454357dc1e636f9adab834604984be93b365aeaa0fbb47
957dd23fa8669f97ed9be8205b74fef49fe1b1b16b214a6de4d6d766c4bce82d  957dd23fa8669f97ed9be8205b74fef49fe1b1b16b214a6de4d6d766c4bce82d
9b48657d83e47d22286d47711c21489cc6d33e916684e53814687ef82bebbca0  9b48657d83e47d22286d47711c21489cc6d33e916684e53814687ef82bebbca0
9f4ae74f32343a62ce6e113b1a58d7e71724ad642929b1eb722359d6575877e8  9f4ae74f32343a62ce6e113b1a58d7e71724ad642929b1eb722359d6575877e8
a85fbe6a22637be452d7c4217ed8f80f46322eb30e3cd2c90a2c8c8f6a0eb33d  a85fbe6a22637be452d7c4217ed8f80f46322eb30e3cd2c90a2c8c8f6a0eb33d
af1f8df022e17f6305a5ae861af4efef687660958b006eda031ab98501a91319  af1f8df022e17f6305a5ae861af4efef687660958b006eda031ab98501a91319
b026592c22bd97de12a4d1d951f7273795373f12f51ca0de7bdfa1b0097dc635  b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000
bcc76cf4be5c88588a6665d1357c028a121118da9c057bfba64455cd348a799a  bcc76cf4be5c88588a6665d1357c028a121118da9c057bfba64455cd348a799a
c2f06c89e9d9f257e7a2aef8f5a2c5c5b42cd6e965f1c04cdd29224f880412a6  c2f06c89e9d9f257e7a2aef8f5a2c5c5b42cd6e965f1c04cdd29224f880412a6
c7794a0b7d7a13f903fe0b84a948c0f13300a264fdbb2693fe7b0940182499c5  c7794a0b7d7a13f903fe0b84a948c0f13300a264fdbb2693fe7b0940182499c5
c7fdab7f3f36f60c2aa433b9383a5e1df723f605f4a7d09c17662a9de4e39b41  c7fdab7f3f36f60c2aa433b9383a5e1df723f605f4a7d09c17662a9de4e39b41
c9c627baf01a04b0e9f0af75f3e596b1399563b8e092c73f9f265224192410a5  c9c627baf01a04b0e9f0af75f3e596b1399563b8e092c73f9f265224192410a5
d3d280f9f1279fbe19709da0d99b89be9f8c2264a06f3347c2e91f74934cf2c2  d3d280f9f1279fbe19709da0d99b89be9f8c2264a06f3347c2e91f74934cf2c2
dd3dcce349ebcf64474771f39249abe846545f1921f336336d39d3bbb4f66865  dd3dcce349ebcf64474771f39249abe846545f1921f336336d39d3bbb4f66865
e8362534dea966e25de054cf8b962761b0fa00326d55ec17e913352ec1b0d6d0  e8362534dea966e25de054cf8b962761b0fa00326d55ec17e913352ec1b0d6d0
0dd9e2106fd965de0c712a4876fe859a015c86bc692d12134ac7028fc033820e  e92d4006d1fdad17821c996c95a7082c420163f1c5da828b93f35feae6f6a956
a441ee4db430b045202acd21aba3fb4f876e792ea82cd85e8481e604e09dd234  e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
19de3101c753b9a942853e68274e915cc59f78850ebb0fc796bf9024974bae2c  f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
f716fa221d85491d2a71b55165951aaf9f9e4318cf67e88fce80367b2cd99e13  f716fa221d85491d2a71b55165951aaf9f9e4318cf67e88fce80367b2cd99e13
f8e8d62638b0cb82c566c0abfea306961fd8372024127daa87d7105aa86279cd  f8e8d62638b0cb82c566c0abfea306961fd8372024127daa87d7105aa86279cd
f90371c0d51c58820eaf2ba14ef4da313113a049ab37dfbd8652e2ca2c3174c4  f90371c0d51c58820eaf2ba14ef4da313113a049ab37dfbd8652e2ca2c3174c4
shasum: tmp: Is a directory

These are wrong:

root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# shasum -a 256 * | awk '$1 != $2'
75b12d0aa814597e02535f54eaadcba6f523ebbcdec143ac2cd8ed8231b63dc6  4de5a57451feb11ccff6dd319a7529fc16826d4169279048eab5ee8678e68143
a34c7786ef899398e633a05a086707746633101fb8a50fece8ca70154a9543eb  558077b9a0089f2fb721de58eeafd45cbf241226fee377726e73bf1dcb501c02
4a0e0aeda8268d102e07e76fd957cdfa58eeb3915667755147015dbb5cbb0cdc  84fab1f98720d3106b19a4aa79a28d12e7401e704495a0060d2cc68e5c743453
b026592c22bd97de12a4d1d951f7273795373f12f51ca0de7bdfa1b0097dc635  b89eb69870022eb2611a5f4de1263391b1b63930e11329828a1a018ed9d4f000
0dd9e2106fd965de0c712a4876fe859a015c86bc692d12134ac7028fc033820e  e92d4006d1fdad17821c996c95a7082c420163f1c5da828b93f35feae6f6a956
a441ee4db430b045202acd21aba3fb4f876e792ea82cd85e8481e604e09dd234  e9cbe9cd9b4cbb246d64c50f88dd385a58b19a9749d704f57045b10184c95136
19de3101c753b9a942853e68274e915cc59f78850ebb0fc796bf9024974bae2c  f5dd8f8afe80cc8029c3c27fcd5aba5b8352eb611ab64fb6772ea0233f9202ee
shasum: tmp: Is a directory
root@g0:/pool/int/b1f224b0-08ed-4125-9f4a-79170fd1a559/update# 

@iliana
Contributor

iliana commented Mar 21, 2025

Yeahhhhh that's a bug alright!

I guess the question is whether I'm holding POSIX filesystem semantics wrong or if we're implementing those wrong? I'm guessing it's the former.

@iliana
Contributor

iliana commented Mar 25, 2025

In chat today @jgallagher pointed out the likely culprit for why we're getting truncated files in the artifact store: we're creating this Utf8TempPath before checking to see if the file already existed. We return the error and the Utf8TempPath is dropped, deleting another task's temporary files.

let temp_path =
    Utf8TempPath::from_path(temp_dir.join(sha256.to_string()));
let file = match OpenOptions::new()
    .write(true)
    .create_new(true)
    .open(&temp_path)
    .await
{
    Ok(file) => file,
    Err(err) => {
        if err.kind() == ErrorKind::AlreadyExists {
            return Err(Error::AlreadyInProgress { sha256 });
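
One narrow way to close that hole, sketched from the analysis above (the PR described below goes further and removes the AlreadyInProgress check entirely, so this is illustration rather than the actual fix): construct the self-deleting Utf8TempPath only after create_new() has succeeded, so the AlreadyExists error path has nothing to drop and therefore nothing to unlink.

let path = temp_dir.join(sha256.to_string());
let file = match OpenOptions::new()
    .write(true)
    .create_new(true)
    .open(&path)
    .await
{
    Ok(file) => file,
    Err(err) if err.kind() == ErrorKind::AlreadyExists => {
        // Another writer owns tmp/{hash}; leave its file alone.
        return Err(Error::AlreadyInProgress { sha256 });
    }
    // Other errors would be mapped to the store's error type here
    // (map_io_error is a hypothetical stand-in for that handling).
    Err(err) => return Err(map_io_error(err)),
};
// Only now take ownership of the path for cleanup-on-drop.
let temp_path = Utf8TempPath::from_path(path);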

I was able to write a regression test for this to help me figure out the scenario where we'd hit this. It's pretty weird, as it requires four attempts to write an artifact:

  1. Writer A is created and starts writing.
  2. Writer B fails to create, returning AlreadyInProgress. This triggers a logic error where a Utf8TempPath is dropped, unlinking one of writer A's temporary files.
  3. Writer C fails to create, returning AlreadyInProgress. Similarly to writer B, this unlinks the other of writer A's temporary files.
  4. Writer D is created successfully because writer A's files are no longer present.
  5. Writer A finishes and incorrectly persists writer D's incomplete files.
  6. Writer D finishes and fails because its files have already been moved.

Why would there be four writers with only three Nexus instances? The artifact replication task (in theory) does not attempt to write an artifact to a sled more than once; either it is PUT directly by Nexus or it is told to get the artifact from another sled. I suppose that the severely poor network conditions of an a4x2 could result in:

  1. Nexus A has a repo uploaded to it, and it begins pushing artifacts out. This is very slow.
  2. Nexus B's and C's replication tasks wake up, notice that some sleds have artifacts that other sleds don't have, and start sending all the copy requests they can. The two requests combine to delete both temporary files on a sled.
  3. Five minutes (?!) later, one of Nexus B or C wakes up again and sends a copy request to the same sled for the same artifact. Both temporary files are gone, so it sends the HTTP request and starts writing to new temporary files.
  4. Either the PUT or the third copy request finishes. If the PUT finishes first, incomplete files are persisted to the store.
  5. The other request finishes and logs that it can't persist its files because they're already gone.

Since I have a regression test and a fix, I'll put a PR up for that. In that PR I'm also getting rid of the AlreadyInProgress check altogether, so I think it's safe to remove the read timeout; we only persist files if the data written matches the sha256 checksum.
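
That last invariant (only persist when the digest matches) is what makes it reasonable to lean less on timeouts. A tiny illustration of the idea using the sha2 and hex crates (not the actual artifact-store code):

use sha2::{Digest, Sha256};

/// Returns true only if the streamed chunks hash to `expected_hex`; a writer
/// would rename its temporary file into place only when this holds.
fn digest_matches<'a>(
    expected_hex: &str,
    chunks: impl IntoIterator<Item = &'a [u8]>,
) -> bool {
    let mut hasher = Sha256::new();
    for chunk in chunks {
        hasher.update(chunk);
    }
    hex::encode(hasher.finalize()) == expected_hex
}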
