-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Pal] Enhance device "dev:tty" checking #1096
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly)
a discussion (no related file):
This PR is a draft because it depends on #827 ? (That PR assigns handle->dev.realpath
for "non-stdin/stdout" devices.)
a discussion (no related file):
After fork and exec, fd in Pal for stdin/stdout is not 0/1.
Actually, why is this the case? Why the stdin/stdout FDs are reassigned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This PR is a draft because it depends on #827 ? (That PR assigns
handle->dev.realpath
for "non-stdin/stdout" devices.)
Yes. But because realpath
field is added to handle->dev
structure in the PR.
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
After fork and exec, fd in Pal for stdin/stdout is not 0/1.
Actually, why is this the case? Why the stdin/stdout FDs are reassigned?
I'll have a look. Now I can see after fork
, stdin and stdout fd in Pal are changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)
a discussion (no related file):
Previously, llly (Li Xun) wrote…
I'll have a look. Now I can see after
fork
, stdin and stdout fd in Pal are changed.
It seems more like a hack than a valid approach (or maybe I don't understand some part of it).
I think we should have a cleaner indication of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @oshogbo)
a discussion (no related file):
Previously, oshogbo (Mariusz Zaborski) wrote…
It seems more like a hack than a valid approach (or maybe I don't understand some part of it).
I think we should have a cleaner indication of this.
@oshogbo What exactly seems like a hack? The check for fd == 0 or fd == 1
, or the check for realpath == NULL
, or both of them? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@oshogbo What exactly seems like a hack? The check for
fd == 0 or fd == 1
, or the check forrealpath == NULL
, or both of them? :)
Sorry for not being precise. I meant realpath == NULL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)
a discussion (no related file):
Previously, oshogbo (Mariusz Zaborski) wrote…
Sorry for not being precise. I meant
realpath == NULL
.
I checked our code. fork
uses sendmsg/recmsg
with SCM_RIGHTS
to pass file descriptors. When parent sends 7,11,14,4,0,1 fds, child receives 4,5,6,7,8,9. So 8/9 becomes stdin/stdout of child process.
So realpath == NULL
is better than fd == 0
or fd == 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @oshogbo)
a discussion (no related file):
Previously, llly (Li Xun) wrote…
I checked our code.
fork
usessendmsg/recmsg
withSCM_RIGHTS
to pass file descriptors. When parent sends 7,11,14,4,0,1 fds, child receives 4,5,6,7,8,9. So 8/9 becomes stdin/stdout of child process.
Sorealpath == NULL
is better thanfd == 0
orfd == 1
.
Thanks @llly. That was my assumption too.
So we definitely need to go away from fd == 0/1
. If @oshogbo doesn't like the use of realpath
check, then we can introduce a new field like bool handle->dev.is_tty
in pal_host.h
:
gramine/pal/src/host/linux/pal_host.h
Lines 52 to 55 in be3331f
struct { | |
PAL_IDX fd; | |
bool nonblocking; | |
} dev; |
And then we can have checks like if (handle->dev.is_tty) { ... special case of "dev:tty" ... } else { ... other devices ... }
@oshogbo Would that be a better solution for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Thanks @llly. That was my assumption too.
So we definitely need to go away from
fd == 0/1
. If @oshogbo doesn't like the use ofrealpath
check, then we can introduce a new field likebool handle->dev.is_tty
inpal_host.h
:gramine/pal/src/host/linux/pal_host.h
Lines 52 to 55 in be3331f
struct { PAL_IDX fd; bool nonblocking; } dev; And then we can have checks like
if (handle->dev.is_tty) { ... special case of "dev:tty" ... } else { ... other devices ... }
@oshogbo Would that be a better solution for you?
IMHO much better approach.
It can be is_tty
or we can have a type variable up to you.
For me checking realpath
is slippery as some other fds also might not have realpath.
Add a flag `is_tty` to `PAL_HANDLE` for device. Device oprations check it instead of fd to determine whether the handle is tty device because fd is changed after fork. Signed-off-by: Li, Xun <xun.li@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv)
a discussion (no related file):
Previously, oshogbo (Mariusz Zaborski) wrote…
IMHO much better approach.
It can beis_tty
or we can have a type variable up to you.
For me checkingrealpath
is slippery as some other fds also might not have realpath.
Done. Update to use is_tty
. Now this PR is for master and ready for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly)
-- commits
line 4 at r2:
typo operations
. Can be fixed during final rebase.
-- commits
line 6 at r2:
... because fd is changed after fork
-> Previously, Gramine used the fd field for this, which is incorrect because fd numbers may change after fork.
Can be fixed during final rebase.
Jenkins, test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly)
Suggestion:
[PAL]
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
if (access == PAL_ACCESS_RDONLY) { hdl->flags |= PAL_HANDLE_FD_READABLE; hdl->dev.fd = 0; /* host stdin */
Hmm, how does this work if LibOS opens dev:tty
in the child?
ditto for SGX PAL
pal/src/host/linux/pal_devices.c
line 124 at r2 (raw file):
ret = DO_SYSCALL(close, handle->dev.fd); } handle->dev.fd = PAL_IDX_POISON;
Isn't this whole logic still broken? Aren't we leaking an FD here in the child? (the stdin/stdout FD there is not 0/1, and the only way to reference it was this handle here?
ditto for SGX PAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, how does this work if LibOS opens
dev:tty
in the child?
ditto for SGX PAL
This should actually work as-is.
Gramine creates the child process using normal vfork() + execve()
. The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).
So looks like 0 and 1 will always refer to stdin and stdout.
Plus additionally, the child process receives via SCM_RIGHTS
the duplicates of 0 and 1 (e.g. 8 and 9). So stdin and stdout now can be accessed by both 0 and 1 and 8 and 9.
This is confusing, but this seems to work (I guess that's why we don't see any tests breaking). The only problem here is that Gramine indeed leaks stdin/stdout FDs.
Proposed solutions:
- Keep as-is, since this is low priority, works and hopefully nobody messes with stdin/stdout too much.
- Upon child startup, always rewire the
dev:tty
Gramine device to 0 and 1 (ignore received 8 and 9). - Upon child startup, close FDs 0 and 1 (then 8 and 9 are the only stdin/stdout FDs).
I would go with 2, but also I don't care much. This wasn't a show stopper for us, and the current PR should also work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @llly and @mkow)
pal/src/host/linux/pal_devices.c
line 124 at r2 (raw file):
Aren't we leaking an FD here in the child?
Yes, we are leaking FDs here. On the other hand, this is rather benign, as programs don't tend to close and open std streams as crazy.
...and the only way to reference it was this handle here?
No, this is not the only way. See my other comment. FDs 0 and 1 are always present in the child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @llly)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
This is confusing, but this seems to work (I guess that's why we don't see any tests breaking). The only problem here is that Gramine indeed leaks stdin/stdout FDs.
Do I understand correctly, that each time you fork+execve we leak two additional FDs?
The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).
Then what is this PR actually doing? I'm confused what's happening here :) Seems like the old code is wrong, but the new code is also wrong, just in a different way ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This is confusing, but this seems to work (I guess that's why we don't see any tests breaking). The only problem here is that Gramine indeed leaks stdin/stdout FDs.
Do I understand correctly, that each time you fork+execve we leak two additional FDs?
The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).
Then what is this PR actually doing? I'm confused what's happening here :) Seems like the old code is wrong, but the new code is also wrong, just in a different way ;)
Yes, I think what @mkow wrote is correct -- we're fixing a wrong thing by doing another wrong thing.
@llly What was the exact problem in this code? Did some test or app break, or you just found it during manual review and debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Yes, I think what @mkow wrote is correct -- we're fixing a wrong thing by doing another wrong thing.
@llly What was the exact problem in this code? Did some test or app break, or you just found it during manual review and debugging?
We can check whether a non-0/1 fd is tty. I'll add dev_setlength
and dev_map
in shared memory PR, which should not be applied to tty device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
We can check whether a non-0/1 fd is tty.
But that's not the essence of this whole logic in Gramine here. What the code tried to do (I think) was to preserve stdin/stdout FDs across all processes inside Gramine, regardless of what the app does. It doesn't matter if it's a tty or not, it should work the same if they are redirected to a file.
I'll add
dev_setlength
anddev_map
in shared memory PR, which should not be applied to tty device.
Why in the shared mem PR? It seems unrelated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Currently only tty fd can reach these operations, it doesn't matter even if we don't check anything.
Checking fd is tty or not is prerequisite for adding other device fd to pal dev. tty doesn't support pread/pwrite, mmap, etc. and typical devices does.
Why in the shared mem PR? It seems unrelated?
shared memory device file in /dev/shm/
need support for ftruncate and mmap. It also use same pal dev ops as tty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Currently only tty fd can reach these operations, it doesn't matter even if we don't check anything.
Checking fd is tty or not is prerequisite for adding other device fd to pal dev. tty doesn't support pread/pwrite, mmap, etc. and typical devices does.
I'm sorry, but I don't understand this answer. Do you understand why and how Gramine is preserving the stdin/stdout descriptors on the host? Please also see my previous message from this discussion.
Why in the shared mem PR? It seems unrelated?
shared memory device file in
/dev/shm/
need support for ftruncate and mmap. It also use same pal dev ops as tty.
But how's that related to what this PR does? I don't see why one would depend on the other.
pal/src/host/linux/pal_devices.c
line 124 at r2 (raw file):
Aren't we leaking an FD here in the child?
Yes, we are leaking FDs here. On the other hand, this is rather benign, as programs don't tend to close and open std streams as crazy.
Don't they leak on every fork or execve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)
pal/src/host/linux/pal_devices.c
line 124 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Aren't we leaking an FD here in the child?
Yes, we are leaking FDs here. On the other hand, this is rather benign, as programs don't tend to close and open std streams as crazy.
Don't they leak on every fork or execve?
Ok, seems it actually needs close+open. But still, it's a leak, just rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Gramine creates the child process using normal
vfork() + execve()
. The FDs 0 and 1 are kept in the child (because there is no reason why they wouldn't survive across fork/execve).
UPDATE: I was wrong.
In the child process, Gramine will not re-wire in-Gramine FDs 0 and 1 to the host FDs 0 and 1. This is because we have these checks in the code:
gramine/libos/src/bookkeep/libos_handle.c
Line 182 in eed050c
if (!HANDLE_ALLOCATED(handle_map->map[0])) { gramine/libos/src/bookkeep/libos_handle.c
Line 200 in eed050c
if (!HANDLE_ALLOCATED(handle_map->map[1])) { gramine/libos/src/bookkeep/libos_handle.c
Line 218 in eed050c
if (!HANDLE_ALLOCATED(handle_map->map[2])) {
So these checks will be true in the parent (first) process, and Gramine will wire in-Gramine's FD 0 to host FD 0; same for 1 and 2.
In the child processes, Gramine will not wire in-Gramine's FD 0 to host FD 0. Instead, the checkpoint-restore logic will kick in: in-Gramine's FD 0 will be wired to e.g. the host FD 8 (the one that was sent via sendmsg(SCM_RIGHTS)
).
Actually, the current PR is not bad. Yes, we seem to leak host FDs 0, 1, and 2, but this leak is bounded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
I'm sorry, but I don't understand this answer. Do you understand why and how Gramine is preserving the stdin/stdout descriptors on the host? Please also see my previous message from this discussion.
I understand your comment now. There are two different kinds of checking, one is stdin/stdout opened/sent by LibOS need to be preserved or not, the other is tty or other device. This PR should focus on the first one.
But how's that related to what this PR does
Right. Not related.
So these checks will be true in the parent (first) process, and Gramine will wire in-Gramine's FD 0 to host FD 0; same for 1 and 2.
Wont go so far. Another check returns
if (thread->handle_map)
return 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @mkow)
pal/src/host/linux/pal_devices.c
line 46 at r2 (raw file):
Previously, llly (Li Xun) wrote…
I'm sorry, but I don't understand this answer. Do you understand why and how Gramine is preserving the stdin/stdout descriptors on the host? Please also see my previous message from this discussion.
I understand your comment now. There are two different kinds of checking, one is stdin/stdout opened/sent by LibOS need to be preserved or not, the other is tty or other device. This PR should focus on the first one.
But how's that related to what this PR does
Right. Not related.
So these checks will be true in the parent (first) process, and Gramine will wire in-Gramine's FD 0 to host FD 0; same for 1 and 2.
Wont go so far. Another check returns
if (thread->handle_map)
return 0;
gramine/libos/src/bookkeep/libos_handle.c
Lines 155 to 156 in eed050c
if (thread->handle_map) | |
return 0; |
pal/src/host/linux/pal_devices.c
line 124 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Ok, seems it actually needs close+open. But still, it's a leak, just rare.
I had a try to remove the fd 0/1 or tty checking in dev_close
. It works well.
LibOS has handled stdin/stdout correctly. They are closed only when LibOS exit, but not when user calls close(1)
. return values and prints of fprintf(stdout/stderr)
are also correct after user calls close(1/2)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @llly and @mkow)
a discussion (no related file):
Ok, I refactored the whole /dev/tty
aka dev:tty
aka stdin/stdout/stderr in this PR: #1563.
We should review my new PR, and this PR can be closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv and @llly)
pal/src/host/linux/pal_host.h
line 54 at r2 (raw file):
struct { PAL_IDX fd; bool is_tty;
Ahh, wait, I think I got confused by this variable. It's not saying whether the handle points to a tty, but whether it points to the stdio of the first Gramine process. This is name is IMO super confusing.
Anyways, I hope #1563 will clean up this whole thing in a better way.
Description of the changes
After
fork
,fd
in Pal for stdin/stdout is not0
/1
.During
fork
,PalSendHandle
uses syscallsendmsg
to send internal file handle(including stdin and stdout) from parent to child,SCM_RIGHTS
is specified to pass file descriptors directly.gramine/pal/src/host/linux/pal_streams.c
Line 210 in be3331f
gramine/pal/src/host/linux/pal_streams.c
Line 285 in be3331f
The file descriptor numbers in the received message are reassigned.
A flag
is_tty
is added to mark the fd is tty. Use this flag instead of fd numbers to check tty.This change is