Fix bugs in async code #753
base: master
This PR fixes two bugs in async code.

1. Insert `-` in `echo -nE "$suggestion"`. This is necessary to prevent `"$suggestion"` from being treated as an option for `echo`.
2. Close file descriptors only in `_zsh_autosuggest_async_response` to ensure that each file descriptor is closed only once.

It's the second bug that prompted the fix. The original code in some cases could close the same file descriptor twice. The code relied on an invalid assumption that `_zsh_autosuggest_async_response` cannot fire after the file descriptor is closed. Here's a demo that shows this assumption being violated:

    () {
      emulate -L zsh

      function callback1() {
        zle -I
        emulate -L zsh -o xtrace
        : "$@"
        zle -F $fd1
        exec {fd1}>&-
        zle -F $fd2
        exec {fd2}>&-
      }

      function callback2() {
        zle -I
        emulate -L zsh -o xtrace
        : "$@"
      }

      exec {fd1}</dev/null
      exec {fd2}</dev/null
      zle -F $fd1 callback1
      zle -F $fd2 callback2
    }

And here's the output I get if the code is pasted into an interactive zsh:

    +callback1:3> : 12
    +callback1:4> zle -F 12
    +callback1:6> zle -F 13
    +callback2:3> : 13

Note that `callback2` fires after its file descriptor has been closed by `callback1`.

This bug was the culprit of several issues filed against powerlevel10k. In a nutshell:

1. `_zsh_autosuggest_async_request` opens a file.
2. `_zsh_autosuggest_async_request` closes the file descriptor.
3. powerlevel10k opens a file and gets the same file descriptor as above.
4. `_zsh_autosuggest_async_response` fires and closes the same file descriptor.
5. powerlevel10k encounters errors when trying to read from the file descriptor.
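Steps 1 to 3 of the scenario rest on descriptor numbers being reused once they are closed. The following is a minimal sketch of that reuse, assuming zsh or bash 4.1+ (both support the `{var}` redirection form). The names `plugin_fd` and `other_fd` are hypothetical stand-ins, not identifiers from zsh-autosuggestions or powerlevel10k.

```shell
# Hypothetical stand-ins: plugin_fd plays the descriptor opened by the async
# request, other_fd plays a file opened later by unrelated code.
exec {plugin_fd}</dev/null    # step 1: open a file, the shell picks a free fd
exec {plugin_fd}<&-           # step 2: close it, the number becomes free again
exec {other_fd}</dev/null     # step 3: unrelated code opens a file

# The two numbers are typically identical, because the lowest free
# descriptor in the shell's range is handed out again.
echo "plugin_fd=$plugin_fd other_fd=$other_fd"

# Step 4 would be a late `exec {plugin_fd}<&-`, which would close the file
# that other_fd now refers to. It is skipped here; clean up properly instead.
exec {other_fd}<&-
```

A second close through the stale number is what breaks the unrelated reader in step 5, which is why closing in exactly one place matters.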
    # have been forked by the suggestion strategy
    kill -TERM -$_ZSH_AUTOSUGGEST_CHILD_PID 2>/dev/null
    if (( _ZSH_AUTOSUGGEST_CHILD_PID )); then
    kill -TERM -- $_ZSH_AUTOSUGGEST_CHILD_PID 2>/dev/null
I see you've also removed the check of `[[ -o MONITOR ]]`. Was that intentional, and if so, can you give some reasoning for it? I had never fully tested the different cases there.
The check for `MONITOR` is still there. It used to be performed before invoking `kill`, which isn't quite right. I moved it to the proper point (when forking). This makes a difference if the option is changed between forking and killing. I should've mentioned this change in the description.
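A rough sketch of the idea in the reply above: decide at fork time whether the child will lead its own process group, and store a ready-to-use `kill` target. This assumes the usual job-control behavior (a background job gets its own process group only when `MONITOR` is on). The variable names and the `sleep` worker are hypothetical, and the patch itself may record this differently.

```shell
# Decide at fork time. With job control (MONITOR) the child leads its own
# process group, so a negative pid addresses the whole group.
if [[ -o monitor ]]; then
  use_group=1
else
  use_group=0
fi

sleep 30 &                    # hypothetical stand-in for the suggestion worker
child_pid=$!

if (( use_group )); then
  kill_target=-$child_pid     # signal the process group
else
  kill_target=$child_pid      # signal the single process
fi

# ... MONITOR may be toggled here without affecting kill_target ...

kill -TERM -- $kill_target 2>/dev/null
wait $child_pid 2>/dev/null
worker_status=$?              # 128 + 15 when the worker died from SIGTERM
echo "worker exit status: $worker_status"
```

Because the target is computed once, a later change of the option cannot make `kill` address a process group that was never created.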
Friendly ping.
    local suggestion
    _zsh_autosuggest_fetch_suggestion "$1"
    echo -nE - "$suggestion"
    ) || return
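To illustrate why the extra `-` in the hunk above matters: zsh's `echo` stops option parsing at a lone `-`, so whatever follows is printed as data. The sketch below shows the failure mode without it, and, for comparison only, a `printf` form that sidesteps the problem. The `printf` variant is a different technique from the one used in the patch, and `out_echo` and `out_printf` are hypothetical names.

```shell
suggestion='-n'   # a suggestion that happens to look like an echo option

# The builtin echo of zsh (and bash) parses this value as another -n flag,
# so the captured output is empty.
out_echo=$(echo -nE "$suggestion")

# printf with a fixed format string treats the value purely as data.
out_printf=$(printf '%s' "$suggestion")

printf 'echo: [%s]  printf: [%s]\n' "$out_echo" "$out_printf"
```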
@romkatv Can you elaborate on the addition of the `|| return` added here and after `read` and `echo`? In what situations can `exec`/`read`/`echo` have non-zero exit status?
I suppose one could read the source code of the builtins `exec`/`read`/`echo` and find out which syscalls they rely on. Then read the source code or docs of all operating systems to figure out when those system calls can fail.
I think a better question is what the code should do when `exec`/`read`/`echo` fail. Which variant of this code do you think behaves better in case of errors?
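As a small illustration of the trade-off being discussed, here is a sketch of a function that stops at the first failing step instead of carrying on with a value that was never read. `first_line` is a hypothetical helper, not code from the patch.

```shell
# Hypothetical helper: print the first line of the file named by $1.
first_line() {
  local line
  # read fails if the file cannot be opened or contains no complete line.
  # Returning its status avoids printing an empty, never-assigned value.
  IFS= read -r line <"$1" || return
  printf '%s\n' "$line"
}

tmp=$(mktemp)
printf 'hello\nworld\n' >"$tmp"
first_line "$tmp"                                   # prints the first line
first_line "$tmp.missing" 2>/dev/null || echo "first_line failed"
rm -f "$tmp"
```

Without the `|| return`, the second call would print an empty line and report success, which hides the error from the caller.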
I have been trying to wrap my head around this today. I'm having trouble with the specifics of the situation you've outlined, but I do think I've found a (different?) way that this stuff can get borked stemming from As to the scenario you've outlined:
I'm not understanding how step 3 can occur here. The scenario I'm imagining from these steps is a user typing two characters in relatively quick succession. The first character fires off a child process to fetch a suggestion ( But then I don't understand how p10k could get the same fd that Would you be able to provide a more in-depth explanation? While I was looking into this, as mentioned above, I did see a problem that could arise in a different but related scenario. I was able to trigger it by enabling
I've broken out some of your changes into a separate branch here: develop...fixes/romkatv-async-fixes I tacked the
It might help if you take a look at the self-contained code snippet I posted together with its output. It basically shows that the main zle loop works as follows (pseudo code):

    while (true) {
      // Get a list of all file descriptors that are ready for reading.
      // This includes STDIN and all file descriptors registered via `zle -F`.
      ready_fds = select();
      if (ready_fds contains STDIN) {
        // STDIN has input. Read it and invoke bound widgets.
        read_keys_and_invoke_widgets();
      }
      // Invoke `zle -F` handlers for all ready file descriptors.
      watches = get_watches_for_fds(ready_fds);
      foreach (watch in watches) {
        invoke_fd_watch(watch);
      }
    }

If several file descriptors become ready for reading simultaneously, and the handler for the first of them closes the second file descriptor, the second handler will still get called. That's what the code snippet in the PR description demonstrates. Thus, when I took a cursory look at your changes and I don't think they fix this bug, or at least it's difficult for me to see that they do. In my code the correctness is easier to assess because 1) whenever a file descriptor is opened, a handler for it is registered; 2) the file descriptor is closed only from the handler. There is no room for a double close or fd leak: one function opens an fd and ensures that another function is eventually invoked; the second function closes the fd.
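The dispatch behavior described by the pseudocode can be imitated in plain shell. This is a simulation only: ordinary functions stand in for `zle -F` handlers, a string of numbers stands in for the table of open descriptors, and every name is hypothetical. It makes no claim about how zsh implements the loop internally.

```shell
# Simulated table of open descriptors, kept as a space-delimited string.
open_fds=" 12 13 "
log=""

close_fd() { open_fds="${open_fds/ $1 / }"; }   # drop fd $1 from the table

is_open() {
  case "$open_fds" in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

handler_12() {
  log="$log handler_12"
  close_fd 12
  close_fd 13          # closes an fd that another handler is still watching
}

handler_13() {
  if is_open 13; then
    log="$log handler_13:open"
  else
    log="$log handler_13:stale"
  fi
}

# Both fds were reported ready by the same simulated select() call, so the
# loop walks a snapshot that was taken before any handler ran.
for fd in 12 13; do
  "handler_$fd"
done

echo "$log"
```

In the simulation `handler_13` still runs although its descriptor is already gone. If each handler closed only its own descriptor, the stale case could not arise.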
I just realized that #630 contained this Please let me know if you see more issues and if so, it would help if you could provide some specific reproduction steps. Another thing that would help me in the future is to break the PR changes into more commits to make it clearer which changes are addressing which problems.
I am positive that #630 does not fix the bug. I understand that it would be easier for you if I gave you a reproducible test case but it would take me a long time to build one given that we are talking about a race condition. Can you see that the existing code does not provide a guarantee that each fd is closed only once? Do you see that the same fd can be first closed by this line and then again by this line? For that to happen,
Ok, thanks. I'm still trying to get my head around this. Is this an example of what you're thinking? Please correct/extend this example to illustrate the scenario you're thinking of and help me check my own assumptions: Some keystroke invokes async_request forking pid 1000 and opening fd 12 And then some time later, simultaneously:
Then in one iteration of the main zle loop:
I think I see your point here, and something should be done to fix this 👍 Though I still don't quite see how something else could open fd 12 in between points 1 and 2 above. I wonder if we have pivoted from the originally reported bug to something slightly different?
Spot on! Arbitrary code can be executed between 1 and 2 by a widget wrapper. If you look at the code of my version rather than the diff, it might be easier to see its correctness. It should be possible to convince yourself that each fd is always closed exactly once.
@ericfreese Wondering whether this will be merged, definitely still hitting this bug periodically.
I can also confirm that the problem still exists.
This happens to me specifically.