Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

monitor children and write exit paths #80

Merged
merged 7 commits into from
Dec 7, 2021

Conversation

haircommander
Copy link
Collaborator

@haircommander haircommander commented Dec 2, 2021

This set of patches encompasses child handling in conmon-rs. specifically it:

  • adds state to keep track of the grandchildren container processes
  • sets self as subreaper to catch grandchildren exiting
  • catches the sigchld from the children/grandchildren and writes their exit codes down
  • kills the children its tracking upon exiting

you know, like a normal parent does 🙂

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
so we catch runtime exits

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2021

Codecov Report

Merging #80 (2d5e937) into main (80b5595) will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff          @@
##            main     #80   +/-   ##
=====================================
  Coverage   6.89%   6.89%           
=====================================
  Files          2       2           
  Lines         29      29           
  Branches      17      17           
=====================================
  Hits           2       2           
  Misses        27      27           

Comment on lines 31 to 32
// Immediately spawn a thread to reduce risk of dropping
// a SIGCHLD.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...wow yeah, I think using pidfd would be cleaner.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one day..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TODO FIXME Ideally we would drop after socket file is removed,
// but the removal is taking longer than 10 seconds, indicating someone
// is keeping it open...
reaper.kill_children(handled_sig)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do we need this versus PRCTL_PR_SET_PDEATHSIG? See e.g. containers/containers-image-proxy-rs#16

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pdeathsignal just kills conmon-rs and I'm not sure why. maybe something interacting with tokio?

@haircommander haircommander force-pushed the exit-monitor-2 branch 3 times, most recently from 034c220 to f62218d Compare December 3, 2021 17:53
@haircommander
Copy link
Collaborator Author

test failures were similar to kata-containers/kata-containers#1826 -- thanks kata folks for an inspiration for the fix

@haircommander haircommander force-pushed the exit-monitor-2 branch 4 times, most recently from 35192a5 to 2f0908e Compare December 3, 2021 19:15
waitpid(Pid::from_raw(-1), None)
};

let grandchildren = Arc::clone(&grandchildren);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This additional clone can be removed.

Suggested change
let grandchildren = Arc::clone(&grandchildren);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I don't have it I get

error[E0382]: use of moved value: `grandchildren`
  --> conmon-rs/server/src/child_reaper.rs:23:43
   |
23 | ...                   thread::spawn(move || {
   |                                     ^^^^^^^ value moved into closure here, in previous iteration of loop
24 | ...                       if let Err(e) = Self::forget_grandchild(grandchildren, grandchild_pid, exit_status) {
   |                                                                   ------------- use occurs due to use in closure
   |
   = note: move occurs because `grandchildren` has type `Arc<std::sync::Mutex<HashMap<i32, ReapableChild>>>`, which does not implement the `Copy` trait

// Inspired by
// https://github.com/kata-containers/kata-containers/blob/828a304883a4/src/agent/src/signal.rs#L27
let wait_status = {
let wait_lock = Arc::clone(&wait_lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this clone:

Suggested change
let wait_lock = Arc::clone(&wait_lock);

Comment on lines 63 to 84
pub fn create_child<P, I, S>(
&self,
cmd: P,
args: I,
) -> Result<std::process::ExitStatus, std::io::Error>
where
P: AsRef<std::ffi::OsStr>,
I: IntoIterator<Item = S>,
S: AsRef<std::ffi::OsStr>,
{
let mut cmd = std::process::Command::new(cmd);
cmd.args(args);

// To prevent an error: "No child processes (os error 10)",
// we prevent our reaper thread from calling waitpid until this child has completed.
// Inspired by
// https://github.com/kata-containers/kata-containers/blob/828a304883a4/src/agent/src/signal.rs#L27
let wait_lock = Arc::clone(&self.wait_lock);
let _lock = wait_lock.lock().unwrap();

cmd.spawn()?.wait()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be changed to avoid the unwrap. Also, I'm not sure if we really need the let wait_lock = Arc::clone(&self.wait_lock);:

diff --git a/conmon-rs/server/src/child_reaper.rs b/conmon-rs/server/src/child_reaper.rs
index d44756c9..4264ea7f 100644
--- a/conmon-rs/server/src/child_reaper.rs
+++ b/conmon-rs/server/src/child_reaper.rs
@@ -1,6 +1,6 @@
 //! Child process reaping and management.
 
-use anyhow::{bail, Result};
+use anyhow::{bail, format_err, Context, Result};
 use getset::Getters;
 use log::{debug, error};
 use nix::sys::signal::{kill, Signal};
@@ -60,11 +60,7 @@ impl ChildReaper {
         Ok(())
     }
 
-    pub fn create_child<P, I, S>(
-        &self,
-        cmd: P,
-        args: I,
-    ) -> Result<std::process::ExitStatus, std::io::Error>
+    pub fn create_child<P, I, S>(&self, cmd: P, args: I) -> Result<std::process::ExitStatus>
     where
         P: AsRef<std::ffi::OsStr>,
         I: IntoIterator<Item = S>,
@@ -77,10 +73,12 @@ impl ChildReaper {
         // we prevent our reaper thread from calling waitpid until this child has completed.
         // Inspired by
         // https://github.com/kata-containers/kata-containers/blob/828a304883a4/src/agent/src/signal.rs#L27
-        let wait_lock = Arc::clone(&self.wait_lock);
-        let _lock = wait_lock.lock().unwrap();
+        let _ = &self
+            .wait_lock
+            .lock()
+            .map_err(|e| format_err!("take wait lock: {}", e))?;
 
-        cmd.spawn()?.wait()
+        Ok(cmd.spawn()?.wait().context("wait for child")?)
     }
 
     pub fn watch_grandchild(&self, id: String, pid: i32, exit_paths: Vec<PathBuf>) -> Result<()> {

) -> Result<()> {
debug!("caught signal for pid {}", grandchild_pid);

let mut map = locked_map.lock().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do not unwrap:

Suggested change
let mut map = locked_map.lock().unwrap();
let mut map = locked_grandchildren
.lock()
.map_err(|e| format_err!("lock grandchildren: {}", e))?;

to be able to catch children exiting

Signed-off-by: Peter Hunt <pehunt@redhat.com>
do so by moving runtime creation into child reaper and keeping a lock to synchronize

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@saschagrunert saschagrunert merged commit a19f44e into containers:main Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants