In https://github.com/golang/go/issues/74187#issuecomment-2992284502, @n2vi reported a large number of idle `go tool trace .../flightrecordNNNN` processes piling up on their openbsd-ppc64 builder.
In https://groups.google.com/g/golang-dev/c/p5eBs5k-ZXU/m/-bZVcRVDAgAJ, @mengzhuo reports the same thing on their riscv64 builders.
In a one-off check, @dmitshur sees the same thing on linux-arm builders.
From a quick search, the only place I see files created with a name like `flightrecord` is in gopls: https://cs.opensource.google/go/x/tools/+/master:gopls/internal/debug/flight.go;l=50;drc=0d2de46602636e75ef9960a2b53202dc34827ca8. So I suspect that gopls is the source of these leftover processes.
This code intentionally avoids waiting on the subprocess; instead, it says it relies on the child receiving SIGPIPE when the parent exits. Perhaps this isn't working in some cases? Or perhaps the child is not actually exiting when it receives SIGPIPE?
cc @golang/tools-team @golang/release
Comment From: prattmic
I haven't double-checked the actual behavior, but from https://man7.org/linux/man-pages/man7/pipe.7.html it sounds like SIGPIPE is only generated if you actually try to write to the pipe. `go tool trace` won't write to stdout/stderr after initialization, so perhaps this is the problem?
Comment From: gopherbot
Change https://go.dev/cl/689195 mentions this issue: gopls/internal/test/integration/web: kill "go tool trace" processes
Comment From: prattmic
@mengzhuo @n2vi Alan's CL will hopefully resolve this going forward, but the old processes left over from before still need to be manually killed.
Comment From: n2vi
I'm still seeing new processes accumulate on my builder. I'm not sure how long before Alan's fix percolates through.
Comment From: adonovan
It should have taken effect by now. If you're still seeing new processes, I think my test has other problems. I could suppress it on platforms other than darwin and linux, since that's all we develop on, and this feature is just for our use.
Sorry for the churn.
Comment From: n2vi
Up to you; I'd be inclined to nail down the exact cause so we can learn from it, but I'm way too far from understanding the context or how to run an instance for debugging. I can give you a login on the builder machine if that helps, though we have indications that the problem is generic.
The latest stale processes from my perspective on openbsd-ppc64, all times UTC:
Mon08PM 0:01.68 /home/swarming/.swarming/w/ir/x/t/go-build1577034934/b001/exe/
Mon10PM 0:02.45 /home/swarming/.swarming/w/ir/x/t/go-build1618394692/b001/exe/
Tue01AM 0:03.01 /home/swarming/.swarming/w/ir/x/t/go-build945231067/b001/exe/t
10:40AM 0:02.15 /home/swarming/.swarming/w/ir/x/t/go-build2954316331/b001/exe/
3:16PM 0:01.13 /home/swarming/.swarming/w/ir/x/t/go-build1987332523/b001/exe/
4:20PM 0:01.53 /home/swarming/.swarming/w/ir/x/t/go-build205669808/b001/exe/t
5:55PM 0:00.29 /home/swarming/.swarming/w/ir/x/t/go-build3224626174/b001/exe/
7:16PM 0:00.70 /home/swarming/.swarming/w/ir/x/t/go-build3806395625/b001/exe/
9:16PM 0:00.30 /home/swarming/.swarming/w/ir/x/t/go-build888657233/b001/exe/t
9:34PM 0:00.04 /home/swarming/.swarming/w/ir/x/t/go-build2245301038/b001/exe/
Comment From: n2vi
Trying again with "ps auxww" to get the full command...
Mon08PM 0:01.70 /home/swarming/.swarming/w/ir/x/t/go-build1577034934/b001/exe/trace /home/swarming/.swarming/w/ir/x/t/flightrecord3144583419
Comment From: adonovan
> Up to you; I'd be inclined to nail down the exact cause so we can learn from it, but I'm way too far from understanding the context or how to run an instance for debugging. I can give you a login on the builder machine if that helps, though we have indications that the problem is generic.
>
> The latest stale processes from my perspective on openbsd-ppc64, all times UTC: ... 9:34PM 0:00.04 /home/swarming/.swarming/w/ir/x/t/go-build2245301038/b001/exe/
Thanks. Definitely still a problem.
I think what is happening is that the new os.Process.Kill logic is successfully killing the `go tool trace` process, but not killing its cmd/trace child. (Aside: it saddens me that we still haven't come up with a clean portable solution to this problem in the thirty-odd years since I first encountered it.) I suppose we could send a signal to its process group (on UNIX). There's no equivalent on Windows (without the horror of job objects), but this test is already skipped on Windows for other reasons.
CL pending.
Comment From: gopherbot
Change https://go.dev/cl/689476 mentions this issue: gopls/internal/debug: KillTraceViewers: kill process group on UNIX
Comment From: n2vi
crazy suggestion: Maybe take a cue from the rocket launch people, and keep sending a keepalive message to the child in some portable way like touching a file. If time goes by without an update, the child self destructs.
Comment From: adonovan
> crazy suggestion: Maybe take a cue from the rocket launch people, and keep sending a keepalive message to the child in some portable way like touching a file. If time goes by without an update, the child self destructs.
That would require changing the logic of the child process. If you're allowed to do that, then the completely foolproof (and portable) mechanism is to have the child read a byte from stdin (or some other agreed pipe fd) and have the child terminate itself when the read returns, which indicates that the parent process has either sent it a byte, or itself died for any reason.