Turbo daemon creates / leaves a ton of `<defunct>` processes, accumulating enough sometimes to breach the OS-wide process limit, preventing the creation of any new processes. #9455

NullVoxPopuli · 2024-11-18T17:06:22Z

Verify canary release

I verified that the issue exists in the latest Turborepo canary release.

Link to code that reproduces this issue

I think: all turbo projects running turbo while in interactive-rebase.

This is a pretty bad bug, because macOS only has a limit of ~5600 processes, and once you hit that, you can't spawn terminals, open apps, create new browser tabs, or even run ps.

You need to already have Activity Monitor (or similar) open so that you can kill the turbo daemon process. Otherwise you may be forced to reboot.

Which canary version will you have in your reproduction?

2.3.1-canary.0

Environment information

❯ pnpm turbo info
turbo 2.3.1-canary.0

CLI:
   Version: 2.3.1-canary.0
   Path to executable: <.pnpm>/[email protected]/node_modules/turbo-darwin-arm64/bin/turbo
   Daemon status: Running
   Package manager: pnpm9

Platform:
   Architecture: aarch64
   Operating system: macos
   WSL: false
   Available memory (MB): 10455
   Available CPU cores: 12

Environment:
   CI: None
   Terminal (TERM): alacritty
   Terminal program (TERM_PROGRAM): unknown
   Terminal program version (TERM_PROGRAM_VERSION): unknown
   Shell (SHELL): /opt/homebrew/Cellar/bash/5.2.32/bin/bash
   stdin: false

Setup, check processes:

ps -ef | grep defunct | wc -l
# 1 or 2

Normally, a machine should be running fewer than ~1000 processes:

ps -ef | wc -l
# I usually hover around 600 to 800
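A note on the counts above: `grep defunct` can match its own entry in the `ps` listing, which may be where a baseline of 1 or 2 comes from. A minimal sketch of a stricter count, assuming `ps` rows in the order PID, PPID, STAT, COMMAND and a zombie state that starts with `Z`; the function name `count_zombies` is made up for illustration:

```shell
# Count zombies from ps-style rows of the form: PID PPID STAT COMMAND...
# Reads stdin, so it works on saved output as well as on a live listing.
count_zombies() {
  # Field 3 is the process state; "n + 0" prints 0 when nothing matched.
  awk '$3 ~ /^Z/ { n++ } END { print n + 0 }'
}
```

On a live system this could be fed with `ps -A -o pid=,ppid=,stat=,comm= | count_zombies`.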

Scenario A (inconsistent)

be in interactive rebase
(I'm splitting commits into more commits)
have prepare or postinstall trigger turbo's build
run turbo again (maybe for lint, or whatever)

Scenario B (inconsistent)

after changing a dependency of a package

Test:

ps -ef | grep defunct | wc -l
# 807

Test after upgrading to latest canary (noting that we run build in postinstall):

❯ ps -ef | grep defunct | wc -l
#    1435

I have an ongoing monitor for this running every second in a terminal that I just leave up all the time.

❯ watch -n 1 "echo \"All: \$(ps -ef | wc -l), Defunct: \$(ps -ef | grep defunct | wc -l)\""

And with pstree we can see that these all come from turbo

# get a list of all unique parent processes for each defunct process
❯ ps -ef | grep defunct | awk '{print $3}' | sort -u

# pass each of these to pstree
while IFS= read -r pid; do
    pstree -p "$pid"
done <<< "$(ps -ef | grep defunct | awk '{print $3}' | sort -u)"
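If only the parent with the most defunct children matters, the zombies can be tallied per parent PID before reaching for pstree. This is a rough sketch under the same assumed column order (PID, PPID, STAT, COMMAND); `zombies_by_parent` is a hypothetical helper name:

```shell
# Input rows:  PID PPID STAT COMMAND...
# Output rows: "<zombie count> <parent pid>", largest count first.
zombies_by_parent() {
  awk '$3 ~ /^Z/ { c[$2]++ } END { for (p in c) print c[p], p }' | sort -rn
}
```

For example, `ps -A -o pid=,ppid=,stat=,comm= | zombies_by_parent | head -n 5` would list the five parents holding the most zombies.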

Which will print something like this:

-+= 00001 root /sbin/launchd
 \-+= 11557 $USER /opt/homebrew/opt/borders/bin/borders
   \--- 11558 $USER <defunct>
-+= 00001 root /sbin/launchd
 \-+= 43271 $USER <.pnpm>/[email protected]/node_modules/turbo-darwin-arm64/bin/turbo --skip-infer daemon
   |--- 43359 $USER <defunct>
   |--- 43361 $USER <defunct>
   # and a few many hundred more
   \--- 57042 $USER <defunct>

Expected behavior

No defunct processes should ever accumulate, because the OS will not clean them up while their parent (the daemon) is still running.

Actual behavior

Defunct processes are left lying around.

To Reproduce

It's possible this is reproducible in these OSS repos:

I somewhat regularly have to kill the top-level turbo daemon on Linux due to CPU usage. It's possible that this has the same root cause as the behavior I'm reporting here for macOS.

In both cases, Linux (where I do most of my OSS) and Mac (where I do my closed-source, employer-owned work), killing the turbo daemon processes immediately makes my machines happier: it cleans up defunct processes (macOS) or frees up CPU cycles (Linux).

Additional context

No response


wagenet · 2024-11-18T17:28:12Z

We've seen this on other developer machines at my company as well.

chris-olszewski · 2024-11-18T18:13:27Z

If either of you could share daemon logs (turbo daemon status should display the logfile) that would be helpful. We should not be spawning child processes from the daemon.

NullVoxPopuli · 2024-11-19T21:42:05Z

Here is what I got:

❯ pnpm turbo daemon status
# ...
✓ daemon is running
log file: <repo>/.turbo/daemon/e224a4a441d772ef-turbo.log.2024-11-19
uptime: 16m 6s 566mss
pid file: /var/folders/wk/w99lck4x7_5930c7gj65r3s40000gp/T/turbod/e224a4a441d772ef/turbod.pid
socket file: /var/folders/wk/w99lck4x7_5930c7gj65r3s40000gp/T/turbod/e224a4a441d772ef/turbod.sock

ope, big file

there is a lot of text

There was a problem saving your comment. 
Your comment is too long (maximum is 65536 characters). 
Please try again.

oops 🙈

here is a file tho

output.txt

as I was poking around in here, I noticed there was a lot of activity from watchman cookies.

NullVoxPopuli · 2024-11-22T17:00:32Z

It seems this is happening nearly daily for me -- can't really pinpoint what is causing the defunct processes to show up. In Activity Monitor, I do occasionally see > 20 git processes spawn, and then go away -- maybe related? idk.

NullVoxPopuli · 2024-12-04T22:44:31Z

We are trying setting https://turbo.build/repo/docs/reference/configuration#daemon to false for the time being. 🤞

### Description

In the case of an error when parsing `git` output, we would drop a `Child` without `wait`ing on it, which results in a zombie process as the pid is never reaped.

From [Rust docs](https://doc.rust-lang.org/std/process/struct.Child.html#warning)

> On some systems, calling [wait](https://doc.rust-lang.org/std/process/struct.Child.html#method.wait) or similar is necessary for the OS to release resources. A process that terminated but has not been waited on is still around as a “zombie”. Leaving too many zombies around may exhaust global resources (for example process IDs).
>
> The standard library does not automatically wait on child processes (not even if the Child is dropped), it is up to the application developer to do so. As a consequence, dropping Child handles without waiting on them first is not recommended in long-running applications.

When there was a parse error we would `kill` the child process, but never reap the pid. This PR ensures we make a best effort to do just that.

The way I'm calling wait is probably overkill, but I wanted to ensure that we don't introduce any accidental waiting on a process that didn't receive the kill signal.

Sources for comments:
- [unix](https://man7.org/linux/man-pages/man2/kill.2.html)
- [windows](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminateprocess#return-value)

### Testing Instructions

I have done some manual confirmation that this works for a command like `bash -c "sleep 100"` where it will

Hoping to get someone from #9455 to test this out in a canary and confirm this helps.
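The kill-then-reap order described in the PR can be illustrated in shell, where the `wait` builtin plays roughly the role of Rust's `Child::wait`. This is only an analogy, not the PR's code; `kill_and_reap` is a made-up helper name and `sleep 100` stands in for the `git` child:

```shell
# Analogy only: the shell's `wait` stands in for Rust's `Child::wait`.
kill_and_reap() {
  kr_pid=$1
  kill "$kr_pid" 2>/dev/null   # ask the child to terminate (SIGTERM by default)
  wait "$kr_pid" 2>/dev/null   # collect its exit status so the pid is released
}

sleep 100 &                    # stand-in for a long-running child process
child=$!
kill_and_reap "$child"
status=$?                      # a child ended by SIGTERM normally reports 128 + 15
```

Skipping the `wait` step is the shell-level equivalent of dropping the `Child`: the process is dead, but its entry stays in the process table until the parent collects it or exits.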

chris-olszewski · 2024-12-05T13:44:21Z

> It seems this is happening nearly daily for me -- can't really pinpoint what is causing the defunct processes to show up. In Activity Monitor, I do occasionally see > 20 git processes spawn, and then go away -- maybe related? idk.

Thank you so much for this comment! I didn't realize the daemon shelled out to git and there was in fact a bug where those git processes weren't getting reaped.

We should be correctly reaping child git processes with #9564 which is being released in 2.3.4-canary.1 which I will cut today. I would greatly appreciate if you could test it out.

NullVoxPopuli · 2024-12-06T19:44:54Z

Thanks, @chris-olszewski !

I've tested with 2.3.4-canary.2
and had a control as well to verify defunct processes were still getting created (yay git worktrees!)

So far, I've not seen any defunct processes spawn from the canary.2

pending

Process

I'm watching total process count vs defunct count via

watch -n 1 "echo \"All: \$(ps -ef | wc -l), Defunct: \$(ps -ef | grep defunct | wc -l)\""

looks like this:

The command I'm running is pnpm build --no-cache --force so turbo actually does stuff 😉 (too efficient otherwise!). We have a wrapper CLI that mixes in some environment variables, flags, and handles whether or not to reach out to the remote cache with a custom AWS S3 SSO

# We use a _: prefix because we need to define "build" in the package.json, but also want `build` in each package to go through turbo
pnpm turbo --color --no-update-notifier \
  --env-mode=loose --summarize=true --output-logs=new-only \
  _:build \
  --filter=./libraries/**/* --no-cache --force

In my two branches, I've removed "daemon": false from the turbo.json at the root of the repo
I'm running the pnpm build --no-cache --force command 4 times to make sure behavior is consistent. Each time I run it, I make note of the total processes before and after, as well as defunct processes.

Starting with a fresh rebase on the main branch so I don't have any local caches, deleted node_modules, etc

# once 
killall turbo
git fetch origin

# each branch / worktree
git rebase origin/default-branch-name
nuke # local recursive clean script here:  https://github.com/NullVoxPopuli/dotfiles/blob/323173c6042882a17079bccca7149985038dd1b6/home/scripts/bash-support/aliases.sh#L8
pnpm install # runs an initial build via postinstall

Results

Baseline env

| All Processes | Defunct Processes |
| --- | --- |
| 659 | 6 |

Note that total process count will fluctuate a bit, because the OS does do things. 🙈

The following tables use the format [starting process count, ending process count].
Example: [659, 656] would mean that before I ran the build command, we started with 659 total processes and ended up with 656.
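Collecting the before/after pairs by hand is easy to get wrong, so here is one way it could be scripted. This is a sketch, not what was actually used for the tables; `count_around` and `all_processes` are hypothetical helpers, and the counter is assumed to print a single integer:

```shell
# Print "[before, after]" for any counter, sampled around a command.
# $1 is a command or function printing one integer; the rest is the command to run.
count_around() {
  ca_counter=$1
  shift
  ca_before=$("$ca_counter")
  "$@" >&2                     # keep the command's own output off stdout
  ca_after=$("$ca_counter")
  printf '[%s, %s]\n' "$ca_before" "$ca_after"
}

# One possible counter: all processes, similar to the `ps -ef | wc -l` check.
all_processes() {
  ps -A -o pid= | wc -l | tr -d ' '
}
```

Usage would look like `count_around all_processes pnpm build --no-cache --force`, with a second counter for defunct processes giving the other column.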

control branch with `turbo` @ `2.3.3`

expected outcome: defunct processes spawn

| Run | All Processes | Defunct Processes |
| --- | --- | --- |
| 1 | [657, ] | [6, ] |
| 1 | [657, ] | [6, ] |
| 1 | [657, ] | [6, ] |
| 1 | [657, ] | [6, ] |

branch with `turbo` @ `2.3.4-canary.2`

expected outcome: defunct process count does not grow at all, for the entirety of the duration of the command

| Run | All Processes | Defunct Processes |
| --- | --- | --- |
| 1 | [657, ] | [6, ] |
| 1 | [657, ] | [6, ] |
| 1 | [657, ] | [6, ] |
| 1 | [657, ] | [6, ] |

I need to wait for one of my worktrees to reproduce the issue before I collect data.

I've been trying to re-create the situation manually, but it's clear I still don't know the right order of operations to reproduce the defunct spawning problem.

anthonyshew · 2024-12-06T21:49:49Z

Awesome work, folks. Thank you, @NullVoxPopuli, for your thoroughness.

NullVoxPopuli added the kind: bug and needs: triage labels Nov 18, 2024

chris-olszewski removed the needs: triage label Nov 18, 2024

anthonyshew mentioned this issue Nov 23, 2024

Turbo daemon uses 100% CPU even when no tasks are running #8122

Open

1 task

chris-olszewski mentioned this issue Dec 5, 2024

chore(scm): avoid dropping child before wait #9564

Merged

anthonyshew closed this as completed Dec 6, 2024
