The lack of parallelization for compilations within a single package doesn't matter too much when building go files, but does affect cgo compilation. I have a package with about 32k lines of (generated) C++. Compiling it serially takes ~10.5s. A small hack to cmd/go/build.go to compile C/C++/Obj-C files in parallel brings this down to ~3.7s.
The patch I hacked together to parallelize cgo compilation work is ~50 lines of deltas and is mostly contained in builder.cgo().
PS The compilation times above were measured using go 1.4.1 on Mac OS X, but this code hasn't changed at head and similar compilation times can be seen there.
Comment From: robpike
Seems reasonable to me. Why not send the patch?
Comment From: petermattis
Done: https://go-review.googlesource.com/#/c/4931/
Comment From: gopherbot
CL https://golang.org/cl/4931 mentions this issue.
Comment From: petermattis
I'm curious if there is any plan to parallelize cgo compilation for go1.7. The finer-grained work scheduling for go tool actions mentioned in https://go-review.googlesource.com/#/c/4931/ would be great if it ever happens, though it doesn't look like there is any movement on that front.
Comment From: bcmills
Contrast #21605. Are there any similar ordering issues for other languages?
Comment From: ianlancetaylor
Currently we only support C, C++, and Fortran, and C and C++ have no ordering issues.
I'm not aware of ordering issues for any other language.
Comment From: r0l1
Any updates on this? Compiling one of our packages takes around 30 seconds... It would be great if cgo compilation could be parallelized.
Comment From: bcmills
@r0l1, this issue is milestoned to Unplanned, meaning that we do not currently plan to work on it.
The previous CL was closed in favor of “implementation of finer-grained actions”, which I think may be #8893.
Comment From: mvdan
I had a brief look at this last week, because this is causing an annoying build time bottleneck in a project of mine. I assume the file we're interested in is src/cmd/go/internal/work/exec.go.
In particular, the Builder.cgo method makes calls to Builder.gcc synchronously in a loop.
It's unclear to me how we'd fix this, though. Can anyone point me to a doc that explains how work.Action is designed at a high level?
For example, I can imagine these ways to fix the issue, but it's unclear which would be the right one:
- In Builder.cgo, create a child Action for each gcc call, and ensure they're all run before we continue (is this possible?)
- Before the entire cgo method is called via build, separate the work into a tree of actions (how? cfiles is constructed as part of cgo, for example)
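To make the first option more concrete, here is a minimal sketch of running per-file compiler calls concurrently and waiting for all of them before continuing. It is not the cmd/go code: compileFunc and compileAll are hypothetical names, and the compile callback merely stands in for something like Builder.gcc.

```go
package main

import (
	"fmt"
	"sync"
)

// compileFunc stands in for a single C compiler invocation
// (hypothetical; in cmd/go this role is played by Builder.gcc).
type compileFunc func(file string) error

// compileAll runs compile for every file with at most limit invocations
// in flight, waits for all of them, and returns the first error in file order.
func compileAll(files []string, limit int, compile compileFunc) error {
	if limit < 1 {
		limit = 1
	}
	errs := make([]error, len(files))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i, f := range files {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit compilations are running
		go func(i int, f string) {
			defer wg.Done()
			defer func() { <-sem }()
			errs[i] = compile(f)
		}(i, f)
	}
	wg.Wait() // nothing continues until every file has been compiled
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	files := []string{"a.c", "b.c", "c.cc"}
	err := compileAll(files, 2, func(f string) error {
		fmt.Println("compiling", f) // a real build would exec the compiler here
		return nil
	})
	fmt.Println("error:", err)
}
```

Collecting errors by index keeps the reported error deterministic even though the compilations finish in an arbitrary order.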
Comment From: mvdan
I had a sudden realisation: one of the packages that are really slow to compile with cgo, https://github.com/olebedev/go-duktape/tree/v3, simply bundles a ton of C code as a single 3.5MiB file. I presume that parallelizing cgo compilation per-file wouldn't help at all in that case, and I also assume that there is no way to make that any faster.
Luckily, that's not the norm. For example, https://github.com/ipsn/go-libtor builds tor from its C source, directly building its thousands of C files. That seems like it would get a big win from a fix for this issue.
Comment From: diamondburned
> I had a brief look at this last week, because this is causing an annoying build time bottleneck in a project of mine. I assume the file we're interested in is src/cmd/go/internal/work/exec.go.
> In particular, the Builder.cgo method makes calls to Builder.gcc synchronously in a loop.
From my observation, both cmd/go/internal/work and cmd/cgo invoke gcc sequentially a lot of times, but cmd/cgo takes up the bulk of the time.
I was able to write a prototype that attempts to parallelize both of them, however the CL for cmd/go/internal/work was denied in favor of another unplanned refactor. I still believe that parallelizing just cmd/cgo will significantly improve compile time, though.
If anybody wants to test how much faster the changes are, they can be found at diamondburned/go/work-gcc-parallelism.
When using this branch, use ./make.bash to compile instead, as the change currently fails a cgo test for a reason I haven't figured out yet.
I'm currently daily driving Go 1.17 with these 2 patches applied on top, and I find compile time to be roughly 3-4x faster. The implementation caps the number of workers at 4 per package regardless of the thread count, so that number is to be expected.
The branch doesn't account for Fortran, however, so it might not work at all for codebases that need it.
Comment From: r0l1
@diamondburned thanks for the info. I am currently testing your patch (cmd/go/internal/work) on latest go master and all tests pass. Could you tell me more about the first patch (cmd/cgo)? I didn't find it. Is it already merged into upstream?
Edit:
I tested the patch with our cgo-based project and the build acceleration is enormous. Thanks!
* unpatched: 66.91 sec
* patch: 28.34 sec
* patch with 8 routines on an 8 core: 26.99 sec
@bcmills The linked issue has been deprioritized by @rsc.
@rsc, would you be willing to include the patch from @diamondburned?
Edit 2: @diamondburned, I found the second patch. I didn't see that you had split it into multiple branches. With the second patch the build times are 23.6 sec.
Comment From: diamondburned
Just as a small update, branch go-cgo-gcc-parallelism-1.18 now tracks the 2 patches for Go 1.18.
For Nix users, the drop-in overlay is
(self: super: {
  go = super.go.overrideAttrs (old: {
    version = "1.18";
    src = builtins.fetchurl {
      url = "https://golang.org/dl/go1.18.linux-amd64.tar.gz";
      sha256 = "0kr6h1ddaazibxfkmw7b7jqyqhskvzjyc2c4zr8b3kapizlphlp8";
    };
    doCheck = false;
    patches = [
      # cmd/go/internal/work: concurrent ccompile routines
      (builtins.fetchurl "https://github.com/diamondburned/go/commit/ec3e1c9471f170187b6a7c83ab0364253f895c28.patch")
      # cmd/cgo: concurrent file generation
      (builtins.fetchurl "https://github.com/diamondburned/go/commit/50e04befeca9ae63296a73c8d5d2870b904971b4.patch")
    ];
  });
})
Comment From: Evengard
@diamondburned Unfortunately your code has problems. When attempting to compile my project with your version, it failed with a rather cryptic error:
cgo: duplicate pkg.Name nox_xxx_objGetTeamByNetCode_418C80, &{nox_xxx_objGetTeamByNetCode_418C80 _Cfunc_nox_xxx_objGetTeamByNetCode_418C80 nox_xxx_objGetTeamByNetCode_418C80 func <nil> 0xc001b12870 false } == &{nox_xxx_objGetTeamByNetCode_418C80 _Cfunc_nox_xxx_objGetTeamByNetCode_418C80 nox_xxx_objGetTeamByNetCode_418C80 func <nil> 0xc0007171d0 false }
nox_xxx_objGetTeamByNetCode_418C80 is a function imported from C via cgo.
The project compiles fine without multithreaded cgo, though (i.e. with a regular Go build).
The project in question is https://github.com/noxworld-dev/opennox, in case you need to reproduce the bug.
I used this branch: https://github.com/diamondburned/go/tree/go-cgo-gcc-parallelism-1.18
Comment From: diamondburned
> Unfortunately your code has problems.
The modifications are definitely a band-aid. It works for my purposes so I've stopped there, because cmd/cgo's code is very complicated and weird for me to work with.
If you can't get the compiler to work with just the cmd/cgo patch, then you might need to investigate deeper yourself.
Comment From: diamondburned
I've made go-cgo-gcc-parallelism-1.19 to track Go 1.19.
Comment From: Evengard
https://github.com/Evengard/go/tree/go-cgo-gcc-parallelism-1.20.3 tracks the Go 1.20.3 release for anyone who needs that.
Comment From: podtserkovskiy
It's been a while since this issue was created, and two CLs were abandoned in favor of properly redesigning cmd/go/internal/work to make the actions API more fine-grained.
This is definitely the right thing to do, but nobody has had time to implement it. I anticipate that this is not an easy project, and personally I don't have enough time to invest in it at the moment.
Are we happy to do a tiny fix to just parallelize the work inside Builder.cgo as a temporary solution?
Comment From: gopherbot
Change https://go.dev/cl/579815 mentions this issue: cmd/go/internal/work: parallelize C/C++ code compilation
Comment From: podtserkovskiy
I've tried to keep it as minimally invasive as possible. Let me know if it works for you: https://go-review.googlesource.com/c/go/+/579815
Comment From: matloob
It would be ideal to try to fix this by making actions more fine-grained, especially since there aren't any other cases where we do work concurrently within an action. Having two layers of concurrency complicates things: one layer being the actions that are running and the other being the C compilations within the action.
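One way to picture the complication: if each layer had its own limit, the number of compiler processes could reach the product of the two. A hedged sketch of one possible mitigation follows, in which both layers draw from a single shared token budget. This is purely illustrative and is not how cmd/go schedules work; tokens, runAll, and peakConcurrency are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// tokens is a shared budget of concurrently running jobs (hypothetical).
type tokens chan struct{}

// runAll runs every job. The caller is assumed to already hold one token,
// so a job runs inline unless a spare token can be taken without blocking.
func (t tokens) runAll(jobs []func()) {
	var wg sync.WaitGroup
	for _, job := range jobs {
		select {
		case t <- struct{}{}: // spare capacity: run in the background
			wg.Add(1)
			go func(job func()) {
				defer wg.Done()
				defer func() { <-t }()
				job()
			}(job)
		default: // budget exhausted: run on the caller's own token
			job()
		}
	}
	wg.Wait()
}

// peakConcurrency runs the given number of actions, each with jobsPerAction
// inner jobs, and reports the highest number of jobs observed running at once.
// budget must be at least 1.
func peakConcurrency(budget, actions, jobsPerAction int) int32 {
	pool := make(tokens, budget)
	var running, peak int32
	job := func() {
		n := atomic.AddInt32(&running, 1)
		for {
			p := atomic.LoadInt32(&peak)
			if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
				break
			}
		}
		time.Sleep(time.Millisecond) // stand-in for a compiler invocation
		atomic.AddInt32(&running, -1)
	}
	jobs := make([]func(), jobsPerAction)
	for i := range jobs {
		jobs[i] = job
	}
	var wg sync.WaitGroup
	for a := 0; a < actions; a++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pool <- struct{}{} // outer layer: each action holds one token
			defer func() { <-pool }()
			pool.runAll(jobs) // inner layer: borrows spare tokens only
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak running jobs with a budget of 4:", peakConcurrency(4, 3, 5))
}
```

Because an action never blocks waiting for an extra token, the inner layer cannot deadlock against the outer one, and the number of running jobs never exceeds the shared budget.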
Comment From: diamondburned
> It would be ideal to try to fix this by making actions more fine-grained, especially since there aren't any other cases where we do work concurrently within an action. Having two layers of concurrency complicates things: one layer being the actions that are running and the other being the C compilations within the action.
AFAIK, this was one of the reasons for not merging the CLs. I definitely agree that that's the ideal solution, but as @podtserkovskiy pointed out, it wasn't really being worked on, unfortunately.
Comment From: podtserkovskiy
> It would be ideal to try to fix this by making actions more fine grained
I fully agree that the redesign of the actions API would be ideal, but this is not a trivial change and it requires much more time for proper implementation than this fix.
> there aren't any other cases where we do work concurrently within an action
I can recall one more place with the potential to execute things concurrently within an action: gcToolchain.asm.
Comment From: matloob
> > there aren't any other cases where we do work concurrently within an action
> I can recall one more place with a potential to execute things concurrently gcToolchain.asm within an action.
Sorry, I meant that there aren't any other places where we currently do work concurrently within an action.
Comment From: gucio321
Hi @podtserkovskiy, I've tried out your change on the cimgui-go repository, but my results are not really good:
[examples (130) ]$ time ~/git/go/bin/go build .
real 3m19.036s
user 3m39.633s
sys 0m4.831s
[examples (0) ]$ time go build -a .
real 3m30.830s
user 3m46.801s
sys 0m6.414s
[examples (0) ]$
[examples (0) ]$ ~/git/go/bin/go version # with applied patch
go version devel go1.23-fe5487327a Tue Apr 23 08:28:17 2024 +0200 linux/amd64
Comment From: diamondburned
Yeah, unfortunately, cmd/go/internal/work is only the slow part in some cases. Most of the bottleneck is in cmd/cgo, which is its own beast.
Comment From: podtserkovskiy
cmd/go/internal/work can potentially run cmd/cgo for all packages concurrently (even before any dependencies were compiled), because this action doesn't need -importcfg or any .a files from dependencies.
However, this requires a significant redesign of cmd/go/internal/work.
> Most of the bottleneck is in cmd/cgo which is its own beast.
BTW, I've tried a quick experiment with cmd/cgo: I've parallelized the gccDefines calls.
It didn't fix my particular issue, but it may be useful for you: https://go-review.googlesource.com/c/go/+/581336
Comment From: podtserkovskiy
I've published the parallelisation of gccDefines calls in cmd/cgo.
There are more CC calls to make concurrent.
But even this change makes processing of the "net" package (5 cgo files) 25% faster. It won't improve the performance of packages with a single cgo file; the more cgo files a package has, the higher the performance increase.
CL: https://go-review.googlesource.com/c/go/+/581336
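As a rough illustration of why the gain depends on the number of files, here is a simplified model that assumes every file costs the same and ignores all other overhead. The function estimateWall and the one-second per-file cost are assumptions made for the example, not measurements.

```go
package main

import "fmt"

// estimateWall returns a rough wall-clock estimate, in seconds, for
// processing numFiles files that each take perFile seconds, with up to
// workers files handled at once. Assumes equal cost per file and no
// scheduling or linking overhead.
func estimateWall(numFiles, workers int, perFile float64) float64 {
	if workers < 1 {
		workers = 1
	}
	batches := (numFiles + workers - 1) / workers // ceil(numFiles / workers)
	return float64(batches) * perFile
}

func main() {
	const perFile = 1.0 // hypothetical cost per file, in seconds
	for _, n := range []int{1, 5, 340} {
		serial := estimateWall(n, 1, perFile)
		parallel := estimateWall(n, 4, perFile)
		fmt.Printf("%d files: serial %.0fs, 4 workers %.0fs\n", n, serial, parallel)
	}
}
```

Under this model a single-file package sees no change at all, while packages with many files approach a speedup equal to the worker count.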
Comment From: mappu
Hi, I have a very large CGO module. The /qt/ package alone has 340 .cpp files plus 340 .go files using CGO.
Some tests on a 4-core Neoverse N1:
Branch | Performance |
---|---|
go1.19.8 | About 14 minutes |
go1.23.3 | About 14 minutes |
CL 581336 (go1.24-d555358bf8) | About 14 minutes |
Today's gotip (go1.24-e67c0f0) | About 14 minutes |
CL 579815 (go1.23-b0a2790fb08) | Much faster (about 8 minutes) |
Comment From: leaanthony
Hello from the future. Are there any plans for this to go ahead? The main pain point here is the need to use CGO on non-Windows machines for some large libraries (QT & GTK). These take a ridiculous amount of time to compile. To be clear, I'd much prefer we had something like purego in the standard library, but if that's not going to happen, parallel CGO compiles would go an awful long way toward easing some serious pain points. Thanks all 🙏
Comment From: matloob
Hi, we're trying to determine whether to prioritize the work to fix this properly (with fine grained actions) for Go 1.26. Please let me know if your workflow is affected by this, and if you'd be able to test out the changes to see if it improves your workflow.
Thanks!
Comment From: ilius
@matloob The miqt library is a good example, because it takes about 10 minutes to compile on an Intel i5-12400.
I can test it on my ayandict app.
Comment From: DeedleFake
@matloob, my workflow on Trayscale is affected by this. Compilations without the cache can take 20 to 30 minutes, which makes testing small changes incredibly annoying. I don't know how much something like this would help, but if it could at all that would be very much appreciated.
And yes, I'd be able to test it.