-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monitor children and write exit paths #80
Conversation
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
so we catch runtime exits Signed-off-by: Peter Hunt <pehunt@redhat.com>
Codecov Report
@@ Coverage Diff @@
## main #80 +/- ##
=====================================
Coverage 6.89% 6.89%
=====================================
Files 2 2
Lines 29 29
Branches 17 17
=====================================
Hits 2 2
Misses 27 27 |
conmon-rs/server/src/child_reaper.rs
Outdated
// Immediately spawn a thread to reduce risk of dropping | ||
// a SIGCHLD. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah...wow yeah, I think using pidfd would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one day..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conmon-rs/server/src/lib.rs
Outdated
// TODO FIXME Ideally we would drop after socket file is removed, | ||
// but the removal is taking longer than 10 seconds, indicating someone | ||
// is keeping it open... | ||
reaper.kill_children(handled_sig)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW do we need this versus PRCTL_PR_SET_PDEATHSIG
? See e.g. containers/containers-image-proxy-rs#16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pdeathsignal just kills conmon-rs and I'm not sure why. maybe something interacting with tokio?
034c220
to
f62218d
Compare
test failures were similar to kata-containers/kata-containers#1826 -- thanks kata folks for an inspiration for the fix |
35192a5
to
2f0908e
Compare
waitpid(Pid::from_raw(-1), None) | ||
}; | ||
|
||
let grandchildren = Arc::clone(&grandchildren); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This additional clone can be removed.
let grandchildren = Arc::clone(&grandchildren); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I don't have it I get
error[E0382]: use of moved value: `grandchildren`
--> conmon-rs/server/src/child_reaper.rs:23:43
|
23 | ... thread::spawn(move || {
| ^^^^^^^ value moved into closure here, in previous iteration of loop
24 | ... if let Err(e) = Self::forget_grandchild(grandchildren, grandchild_pid, exit_status) {
| ------------- use occurs due to use in closure
|
= note: move occurs because `grandchildren` has type `Arc<std::sync::Mutex<HashMap<i32, ReapableChild>>>`, which does not implement the `Copy` trait
conmon-rs/server/src/child_reaper.rs
Outdated
// Inspired by | ||
// https://github.com/kata-containers/kata-containers/blob/828a304883a4/src/agent/src/signal.rs#L27 | ||
let wait_status = { | ||
let wait_lock = Arc::clone(&wait_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this clone:
let wait_lock = Arc::clone(&wait_lock); |
conmon-rs/server/src/child_reaper.rs
Outdated
pub fn create_child<P, I, S>( | ||
&self, | ||
cmd: P, | ||
args: I, | ||
) -> Result<std::process::ExitStatus, std::io::Error> | ||
where | ||
P: AsRef<std::ffi::OsStr>, | ||
I: IntoIterator<Item = S>, | ||
S: AsRef<std::ffi::OsStr>, | ||
{ | ||
let mut cmd = std::process::Command::new(cmd); | ||
cmd.args(args); | ||
|
||
// To prevent an error: "No child processes (os error 10)", | ||
// we prevent our reaper thread from calling waitpid until this child has completed. | ||
// Inspired by | ||
// https://github.com/kata-containers/kata-containers/blob/828a304883a4/src/agent/src/signal.rs#L27 | ||
let wait_lock = Arc::clone(&self.wait_lock); | ||
let _lock = wait_lock.lock().unwrap(); | ||
|
||
cmd.spawn()?.wait() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be changed to avoid the unwrap
. Also, I'm not sure if we really need the let wait_lock = Arc::clone(&self.wait_lock);
:
diff --git a/conmon-rs/server/src/child_reaper.rs b/conmon-rs/server/src/child_reaper.rs
index d44756c9..4264ea7f 100644
--- a/conmon-rs/server/src/child_reaper.rs
+++ b/conmon-rs/server/src/child_reaper.rs
@@ -1,6 +1,6 @@
//! Child process reaping and management.
-use anyhow::{bail, Result};
+use anyhow::{bail, format_err, Context, Result};
use getset::Getters;
use log::{debug, error};
use nix::sys::signal::{kill, Signal};
@@ -60,11 +60,7 @@ impl ChildReaper {
Ok(())
}
- pub fn create_child<P, I, S>(
- &self,
- cmd: P,
- args: I,
- ) -> Result<std::process::ExitStatus, std::io::Error>
+ pub fn create_child<P, I, S>(&self, cmd: P, args: I) -> Result<std::process::ExitStatus>
where
P: AsRef<std::ffi::OsStr>,
I: IntoIterator<Item = S>,
@@ -77,10 +73,12 @@ impl ChildReaper {
// we prevent our reaper thread from calling waitpid until this child has completed.
// Inspired by
// https://github.com/kata-containers/kata-containers/blob/828a304883a4/src/agent/src/signal.rs#L27
- let wait_lock = Arc::clone(&self.wait_lock);
- let _lock = wait_lock.lock().unwrap();
+ let _ = &self
+ .wait_lock
+ .lock()
+ .map_err(|e| format_err!("take wait lock: {}", e))?;
- cmd.spawn()?.wait()
+ Ok(cmd.spawn()?.wait().context("wait for child")?)
}
pub fn watch_grandchild(&self, id: String, pid: i32, exit_paths: Vec<PathBuf>) -> Result<()> {
conmon-rs/server/src/child_reaper.rs
Outdated
) -> Result<()> { | ||
debug!("caught signal for pid {}", grandchild_pid); | ||
|
||
let mut map = locked_map.lock().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do not unwrap:
let mut map = locked_map.lock().unwrap(); | |
let mut map = locked_grandchildren | |
.lock() | |
.map_err(|e| format_err!("lock grandchildren: {}", e))?; |
to be able to catch children exiting Signed-off-by: Peter Hunt <pehunt@redhat.com>
do so by moving runtime creation into child reaper and keeping a lock to synchronize Signed-off-by: Peter Hunt <pehunt@redhat.com>
2f0908e
to
2d5e937
Compare
This set of patches encompasses child handling in conmon-rs. specifically it:
you know, like a normal parent does 🙂