Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pal] Enhance device "dev:tty" checking #1096

Closed
wants to merge 1 commit into from

Conversation

llly
Copy link
Contributor

@llly llly commented Jan 4, 2023

Description of the changes

After fork, fd in Pal for stdin/stdout is not 0/1.
During fork, PalSendHandle uses syscall sendmsg to send internal file handle(including stdin and stdout) from parent to child, SCM_RIGHTS is specified to pass file descriptors directly.

memcpy(CMSG_DATA(control_hdr), &cargo->generic.fd, sizeof(int));

memcpy(&handle->generic.fd, CMSG_DATA(control_hdr), sizeof(int));

The file descriptor numbers in the received message are reassigned.

A flag is_tty is added to mark the fd is tty. Use this flag instead of fd numbers to check tty.


This change is Reviewable

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly)

a discussion (no related file):
This PR is a draft because it depends on #827 ? (That PR assigns handle->dev.realpath for "non-stdin/stdout" devices.)


a discussion (no related file):

After fork and exec, fd in Pal for stdin/stdout is not 0/1.

Actually, why is this the case? Why the stdin/stdout FDs are reassigned?


Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This PR is a draft because it depends on #827 ? (That PR assigns handle->dev.realpath for "non-stdin/stdout" devices.)

Yes. But because realpath field is added to handle->dev structure in the PR.


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

After fork and exec, fd in Pal for stdin/stdout is not 0/1.

Actually, why is this the case? Why the stdin/stdout FDs are reassigned?

I'll have a look. Now I can see after fork, stdin and stdout fd in Pal are changed.


Copy link
Contributor

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, llly (Li Xun) wrote…

I'll have a look. Now I can see after fork, stdin and stdout fd in Pal are changed.

It seems more like a hack than a valid approach (or maybe I don't understand some part of it).
I think we should have a cleaner indication of this.


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @oshogbo)

a discussion (no related file):

Previously, oshogbo (Mariusz Zaborski) wrote…

It seems more like a hack than a valid approach (or maybe I don't understand some part of it).
I think we should have a cleaner indication of this.

@oshogbo What exactly seems like a hack? The check for fd == 0 or fd == 1, or the check for realpath == NULL, or both of them? :)


Copy link
Contributor

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@oshogbo What exactly seems like a hack? The check for fd == 0 or fd == 1, or the check for realpath == NULL, or both of them? :)

Sorry for not being precise. I meant realpath == NULL.


Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Sorry for not being precise. I meant realpath == NULL.

I checked our code. fork uses sendmsg/recmsg with SCM_RIGHTS to pass file descriptors. When parent sends 7,11,14,4,0,1 fds, child receives 4,5,6,7,8,9. So 8/9 becomes stdin/stdout of child process.
So realpath == NULL is better than fd == 0 or fd == 1.


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @oshogbo)

a discussion (no related file):

Previously, llly (Li Xun) wrote…

I checked our code. fork uses sendmsg/recmsg with SCM_RIGHTS to pass file descriptors. When parent sends 7,11,14,4,0,1 fds, child receives 4,5,6,7,8,9. So 8/9 becomes stdin/stdout of child process.
So realpath == NULL is better than fd == 0 or fd == 1.

Thanks @llly. That was my assumption too.

So we definitely need to go away from fd == 0/1. If @oshogbo doesn't like the use of realpath check, then we can introduce a new field like bool handle->dev.is_tty in pal_host.h:

struct {
PAL_IDX fd;
bool nonblocking;
} dev;

And then we can have checks like if (handle->dev.is_tty) { ... special case of "dev:tty" ... } else { ... other devices ... }

@oshogbo Would that be a better solution for you?


Copy link
Contributor

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks @llly. That was my assumption too.

So we definitely need to go away from fd == 0/1. If @oshogbo doesn't like the use of realpath check, then we can introduce a new field like bool handle->dev.is_tty in pal_host.h:

struct {
PAL_IDX fd;
bool nonblocking;
} dev;

And then we can have checks like if (handle->dev.is_tty) { ... special case of "dev:tty" ... } else { ... other devices ... }

@oshogbo Would that be a better solution for you?

IMHO much better approach.
It can be is_tty or we can have a type variable up to you.
For me checking realpath is slippery as some other fds also might not have realpath.


Add a flag `is_tty` to `PAL_HANDLE` for device. Device oprations check it
instead of fd to determine whether the handle is tty device because fd is
changed after fork.

Signed-off-by: Li, Xun <xun.li@intel.com>
@llly llly marked this pull request as ready for review September 1, 2023 07:36
Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)

a discussion (no related file):

Previously, oshogbo (Mariusz Zaborski) wrote…

IMHO much better approach.
It can be is_tty or we can have a type variable up to you.
For me checking realpath is slippery as some other fds also might not have realpath.

Done. Update to use is_tty. Now this PR is for master and ready for review.


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly)


-- commits line 4 at r2:
typo operations. Can be fixed during final rebase.


-- commits line 6 at r2:
... because fd is changed after fork -> Previously, Gramine used the fd field for this, which is incorrect because fd numbers may change after fork. Can be fixed during final rebase.

@dimakuv
Copy link

dimakuv commented Sep 3, 2023

Jenkins, test this please

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly)


-- commits line 2 at r2:

Suggestion:

[PAL]

pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

        if (access == PAL_ACCESS_RDONLY) {
            hdl->flags |= PAL_HANDLE_FD_READABLE;
            hdl->dev.fd = 0; /* host stdin */

Hmm, how does this work if LibOS opens dev:tty in the child?
ditto for SGX PAL


pal/src/host/linux/pal_devices.c line 124 at r2 (raw file):

        ret = DO_SYSCALL(close, handle->dev.fd);
    }
    handle->dev.fd = PAL_IDX_POISON;

Isn't this whole logic still broken? Aren't we leaking an FD here in the child? (the stdin/stdout FD there is not 0/1, and the only way to reference it was this handle here?
ditto for SGX PAL

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Hmm, how does this work if LibOS opens dev:tty in the child?
ditto for SGX PAL

This should actually work as-is.

Gramine creates the child process using normal vfork() + execve(). The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).

So looks like 0 and 1 will always refer to stdin and stdout.

Plus additionally, the child process receives via SCM_RIGHTS the duplicates of 0 and 1 (e.g. 8 and 9). So stdin and stdout now can be accessed by both 0 and 1 and 8 and 9.

This is confusing, but this seems to work (I guess that's why we don't see any tests breaking). The only problem here is that Gramine indeed leaks stdin/stdout FDs.

Proposed solutions:

  1. Keep as-is, since this is low priority, works and hopefully nobody messes with stdin/stdout too much.
  2. Upon child startup, always rewire the dev:tty Gramine device to 0 and 1 (ignore received 8 and 9).
  3. Upon child startup, close FDs 0 and 1 (then 8 and 9 are the only stdin/stdout FDs).

I would go with 2, but also I don't care much. This wasn't a show stopper for us, and the current PR should also work.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly and @mkow)


pal/src/host/linux/pal_devices.c line 124 at r2 (raw file):

Aren't we leaking an FD here in the child?

Yes, we are leaking FDs here. On the other hand, this is rather benign, as programs don't tend to close and open std streams as crazy.

...and the only way to reference it was this handle here?

No, this is not the only way. See my other comment. FDs 0 and 1 are always present in the child.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @llly)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

This is confusing, but this seems to work (I guess that's why we don't see any tests breaking). The only problem here is that Gramine indeed leaks stdin/stdout FDs.

Do I understand correctly, that each time you fork+execve we leak two additional FDs?

The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).

Then what is this PR actually doing? I'm confused what's happening here :) Seems like the old code is wrong, but the new code is also wrong, just in a different way ;)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is confusing, but this seems to work (I guess that's why we don't see any tests breaking). The only problem here is that Gramine indeed leaks stdin/stdout FDs.

Do I understand correctly, that each time you fork+execve we leak two additional FDs?

The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).

Then what is this PR actually doing? I'm confused what's happening here :) Seems like the old code is wrong, but the new code is also wrong, just in a different way ;)

Yes, I think what @mkow wrote is correct -- we're fixing a wrong thing by doing another wrong thing.

@llly What was the exact problem in this code? Did some test or app break, or you just found it during manual review and debugging?

Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, I think what @mkow wrote is correct -- we're fixing a wrong thing by doing another wrong thing.

@llly What was the exact problem in this code? Did some test or app break, or you just found it during manual review and debugging?

We can check whether a non-0/1 fd is tty. I'll add dev_setlength and dev_map in shared memory PR, which should not be applied to tty device.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

We can check whether a non-0/1 fd is tty.

But that's not the essence of this whole logic in Gramine here. What the code tried to do (I think) was to preserve stdin/stdout FDs across all processes inside Gramine, regardless of what the app does. It doesn't matter if it's a tty or not, it should work the same if they are redirected to a file.

I'll add dev_setlength and dev_map in shared memory PR, which should not be applied to tty device.

Why in the shared mem PR? It seems unrelated?

Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):
Currently only tty fd can reach these operations, it doesn't matter even if we don't check anything.
Checking fd is tty or not is prerequisite for adding other device fd to pal dev. tty doesn't support pread/pwrite, mmap, etc. and typical devices does.

Why in the shared mem PR? It seems unrelated?

shared memory device file in /dev/shm/ need support for ftruncate and mmap. It also use same pal dev ops as tty.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

Currently only tty fd can reach these operations, it doesn't matter even if we don't check anything.
Checking fd is tty or not is prerequisite for adding other device fd to pal dev. tty doesn't support pread/pwrite, mmap, etc. and typical devices does.

I'm sorry, but I don't understand this answer. Do you understand why and how Gramine is preserving the stdin/stdout descriptors on the host? Please also see my previous message from this discussion.

Why in the shared mem PR? It seems unrelated?

shared memory device file in /dev/shm/ need support for ftruncate and mmap. It also use same pal dev ops as tty.

But how's that related to what this PR does? I don't see why one would depend on the other.


pal/src/host/linux/pal_devices.c line 124 at r2 (raw file):

Aren't we leaking an FD here in the child?

Yes, we are leaking FDs here. On the other hand, this is rather benign, as programs don't tend to close and open std streams as crazy.

Don't they leak on every fork or execve?

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)


pal/src/host/linux/pal_devices.c line 124 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Aren't we leaking an FD here in the child?

Yes, we are leaking FDs here. On the other hand, this is rather benign, as programs don't tend to close and open std streams as crazy.

Don't they leak on every fork or execve?

Ok, seems it actually needs close+open. But still, it's a leak, just rare.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

Gramine creates the child process using normal vfork() + execve(). The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).

UPDATE: I was wrong.

In the child process, Gramine will not re-wire in-Gramine FDs 0 and 1 to the host FDs 0 and 1. This is because we have these checks in the code:

So these checks will be true in the parent (first) process, and Gramine will wire in-Gramine's FD 0 to host FD 0; same for 1 and 2.

In the child processes, Gramine will not wire in-Gramine's FD 0 to host FD 0. Instead, the checkpoint-restore logic will kick in: in-Gramine's FD 0 will be wired to e.g. the host FD 8 (the one that was sent via sendmsg(SCM_RIGHTS)).

Actually, the current PR is not bad. Yes, we seem to leak host FDs 0, 1, and 2, but this leak is bounded.

Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

I'm sorry, but I don't understand this answer. Do you understand why and how Gramine is preserving the stdin/stdout descriptors on the host? Please also see my previous message from this discussion.

I understand your comment now. There are two different kinds of checking, one is stdin/stdout opened/sent by LibOS need to be preserved or not, the other is tty or other device. This PR should focus on the first one.

But how's that related to what this PR does

Right. Not related.

So these checks will be true in the parent (first) process, and Gramine will wire in-Gramine's FD 0 to host FD 0; same for 1 and 2.

Wont go so far. Another check returns
if (thread->handle_map)
return 0;

Copy link
Contributor Author

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)


pal/src/host/linux/pal_devices.c line 46 at r2 (raw file):

Previously, llly (Li Xun) wrote…

I'm sorry, but I don't understand this answer. Do you understand why and how Gramine is preserving the stdin/stdout descriptors on the host? Please also see my previous message from this discussion.

I understand your comment now. There are two different kinds of checking, one is stdin/stdout opened/sent by LibOS need to be preserved or not, the other is tty or other device. This PR should focus on the first one.

But how's that related to what this PR does

Right. Not related.

So these checks will be true in the parent (first) process, and Gramine will wire in-Gramine's FD 0 to host FD 0; same for 1 and 2.

Wont go so far. Another check returns
if (thread->handle_map)
return 0;

if (thread->handle_map)
return 0;


pal/src/host/linux/pal_devices.c line 124 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ok, seems it actually needs close+open. But still, it's a leak, just rare.

I had a try to remove the fd 0/1 or tty checking in dev_close. It works well.
LibOS has handled stdin/stdout correctly. They are closed only when LibOS exit, but not when user calls close(1). return values and prints of fprintf(stdout/stderr) are also correct after user calls close(1/2).

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @mkow)

a discussion (no related file):
Ok, I refactored the whole /dev/tty aka dev:tty aka stdin/stdout/stderr in this PR: #1563.

We should review my new PR, and this PR can be closed.


Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)


pal/src/host/linux/pal_host.h line 54 at r2 (raw file):

        struct {
            PAL_IDX fd;
            bool is_tty;

Ahh, wait, I think I got confused by this variable. It's not saying whether the handle points to a tty, but whether it points to the stdio of the first Gramine process. This is name is IMO super confusing.

Anyways, I hope #1563 will clean up this whole thing in a better way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants