Turbo daemon creates / leaves a ton of `<defunct>` processes, accumulating enough sometimes to breach the OS-wide process limit, preventing the creation of any new processes
#9455
Comments
We've seen this on other developer machines at my company as well.
If either of you could share daemon logs (…), that would help.
Here is what I got:

```text
❯ pnpm turbo daemon status
# ...
✓ daemon is running
log file: <repo>/.turbo/daemon/e224a4a441d772ef-turbo.log.2024-11-19
uptime: 16m 6s 566ms
pid file: /var/folders/wk/w99lck4x7_5930c7gj65r3s40000gp/T/turbod/e224a4a441d772ef/turbod.pid
socket file: /var/folders/wk/w99lck4x7_5930c7gj65r3s40000gp/T/turbod/e224a4a441d772ef/turbod.sock
```

ope, big file, there is a lot of text

oops 🙈 here is a file tho. As I was poking around in here, I noticed there was a lot of activity from watchman cookies.
It seems this is happening nearly daily for me -- I can't really pinpoint what is causing the defunct processes to show up. In Activity Monitor, I do occasionally see > 20 `git` processes spawn and then go away -- maybe related? idk.
We are setting [`daemon`](https://turbo.build/repo/docs/reference/configuration#daemon) to `false` for the time being. 🤞
### Description

In the case of an error when parsing `git` output, we would drop a `Child` without `wait`ing on it, which results in a zombie process, as the pid is never reaped. From the [Rust docs](https://doc.rust-lang.org/std/process/struct.Child.html#warning):

> On some systems, calling [wait](https://doc.rust-lang.org/std/process/struct.Child.html#method.wait) or similar is necessary for the OS to release resources. A process that terminated but has not been waited on is still around as a "zombie". Leaving too many zombies around may exhaust global resources (for example process IDs).
>
> The standard library does not automatically wait on child processes (not even if the Child is dropped), it is up to the application developer to do so. As a consequence, dropping Child handles without waiting on them first is not recommended in long-running applications.

When there was a parse error we would `kill` the child process, but never reap the pid. This PR ensures we make a best effort to do just that. The way I'm calling wait is probably overkill, but I wanted to ensure that we don't introduce any accidental waiting on a process that didn't receive the kill signal.

Sources for comments:

- [unix](https://man7.org/linux/man-pages/man2/kill.2.html)
- [windows](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminateprocess#return-value)

### Testing Instructions

I have done some manual confirmation that this works for a command like `bash -c "sleep 100"` where it will […]

Hoping to get someone from #9455 to test this out in a canary and confirm this helps.
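For readers following along, here is a minimal sketch of the kill-then-wait pattern the PR describes. This is not the actual turborepo patch; the helper name and the `sleep 100` stand-in for the `git` subprocess are illustrative.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Best-effort kill-and-reap. Sending the kill signal alone is not enough:
/// the parent must still `wait` on the child so the OS can release the pid.
/// Dropping a `Child` without waiting leaves a <defunct> zombie until the
/// parent itself exits.
fn kill_and_reap(child: &mut Child) -> io::Result<ExitStatus> {
    // Even if kill fails (e.g. the process already exited), the child still
    // has to be reaped, so we wait unconditionally.
    let _ = child.kill();
    child.wait()
}

fn main() -> io::Result<()> {
    // Stand-in for the long-running `git` subprocess from the issue.
    let mut child = Command::new("sleep").arg("100").spawn()?;
    let status = kill_and_reap(&mut child)?;
    // On Unix the status reports termination by signal; either way the pid
    // has been reaped and no zombie is left behind.
    println!("child reaped with status: {status}");
    Ok(())
}
```

Without the `wait` call, the killed child would linger as `<defunct>` for the lifetime of the daemon, which is exactly the accumulation reported in this issue.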
Thank you so much for this comment! I didn't realize the daemon shelled out to `git`. We should be correctly reaping child processes now.
Thanks, @chris-olszewski! I've tested with 2.3.4-canary.2. So far, I've not seen any defunct processes spawn from canary.2.
### Results

Baseline env
Note that total process count will fluctuate a bit, because the OS does do things. 🙈 The following tables will use the format `[starting process count, ending process count]`.

control branch with `turbo`
| Run | All Processes | Defunct Processes |
|---|---|---|
| 1 | [657, ] | [6, ] |
| 2 | [657, ] | [6, ] |
| 3 | [657, ] | [6, ] |
| 4 | [657, ] | [6, ] |
branch with `turbo` @ 2.3.4-canary.2

expected outcome: the defunct process count does not grow at all for the entire duration of the command
| Run | All Processes | Defunct Processes |
|---|---|---|
| 1 | [657, ] | [6, ] |
| 2 | [657, ] | [6, ] |
| 3 | [657, ] | [6, ] |
| 4 | [657, ] | [6, ] |
I need to wait for one of my worktrees to reproduce the issue before I collect data.
I've been trying to re-create the situation manually, but it's clear I still don't know the right order of operations to reproduce the defunct spawning problem.
Awesome work, folks. Thank you, @NullVoxPopuli, for your thoroughness.
### Verify canary release
### Link to code that reproduces this issue

I think: any turbo project running `turbo` while in an interactive rebase.
This is a pretty bad bug, because macOS only has a limit of ~5600 processes, and once you hit that you can't spawn terminals, open apps, create new browser tabs, or even run `ps`. You have to have already had Activity Monitor (or similar) open so that you can kill the `turbo` daemon process; otherwise you may be forced to reboot.

### Which canary version will you have in your reproduction?
2.3.1-canary.0
### Environment information
Setup, check processes:
Normally, an OS should be running fewer than ~1000 processes:
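The exact counting commands weren't captured above, so here is a minimal sketch of how one might check; the bracketed `[d]efunct` pattern is a common trick to keep `grep` from matching its own entry in the `ps` output.

```shell
# Count all processes vs. defunct (zombie) processes.
# '|| true' guards against grep's non-zero exit when there are no matches.
total=$(ps -ef | wc -l)
defunct=$(ps -ef | grep -c '[d]efunct' || true)
echo "All: $total, Defunct: $defunct"
```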
Scenario A (inconsistent)
(I'm splitting commits into more commits)
Scenario B (inconsistent)
Test:
Test after upgrading to the latest canary (noting that we run `build` in postinstall):

I have an ongoing monitor for this running every second in a terminal that I just leave up all the time:
```shell
watch -n 1 "echo \"All: \$(ps -ef | wc -l), Defunct: \$(ps -ef | grep defunct | wc -l)\""
```
And with `pstree` we can see that these all come from `turbo`.
### Expected behavior

No defunct processes should exist, ever; the OS will not clean them up on its own.

### Actual behavior

Defunct processes are left lying around.
### To Reproduce
It's possible this is reproducible in these OSS repos:
I somewhat regularly have to kill the top-level `turbo` daemon on Linux due to CPU usage -- it's possible that the root cause there is the same one behind the behavior I'm reporting here for macOS.

In both cases -- Linux (where I do most of my OSS work) and macOS (where I do my closed-source, employer-owned work) -- killing the turbo daemon processes immediately makes my machines happier: cleaning up defunct processes (macOS) or freeing up CPU cycles (Linux).
### Additional context
No response