The lack of parallelization for compilations within a single package doesn't matter too much when building Go files, but it does affect cgo compilation. I have a package with about 32k lines of (generated) C++. Compiling it serially takes ~10.5s. A small hack to cmd/go/build.go to compile C/C++/Obj-C files in parallel brings this down to ~3.7s.

The patch I hacked together to parallelize cgo compilation work is ~50 lines of deltas and is mostly contained in builder.cgo().

PS: The compilation times above were measured using Go 1.4.1 on Mac OS X, but this code hasn't changed at head, and similar compilation times can be seen there.

Comment From: robpike

Seems reasonable to me. Why not send the patch?

Comment From: petermattis

Done: https://go-review.googlesource.com/#/c/4931/

Comment From: gopherbot

CL https://golang.org/cl/4931 mentions this issue.

Comment From: petermattis

I'm curious if there is any plan to parallelize cgo compilation for go1.7. The finer-grained work scheduling for go tool actions mentioned in https://go-review.googlesource.com/#/c/4931/ would be great if it ever happens, though it doesn't look like there is any movement on that front.

Comment From: bcmills

Contrast #21605. Are there any similar ordering issues for other languages?

Comment From: ianlancetaylor

Currently we only support C, C++, and Fortran, and C and C++ have no ordering issues.

I'm not aware of ordering issues for any other language.

Comment From: r0l1

Any updates on this? Compiling one of our packages takes around 30 seconds. It would be great if cgo compilation could be parallelized.

Comment From: bcmills

@r0l1, this issue is milestoned to Unplanned, meaning that we do not currently plan to work on it.

The previous CL was closed in favor of “implementation of finer-grained actions”, which I think may be #8893.

Comment From: mvdan

I had a brief look at this last week, because this is causing an annoying build time bottleneck in a project of mine. I assume the file we're interested in is src/cmd/go/internal/work/exec.go.

In particular, the Builder.cgo method makes calls to Builder.gcc synchronously in a loop.

It's unclear to me how we'd fix this, though. Can anyone point me to a doc that explains how work.Action is designed at a high level?

For example, I can imagine these ways to fix the issue, but it's unclear which would be the right one:

  • In Builder.cgo, create a child Action for each gcc call, and ensure they're all run before we continue (is this possible?)
  • Before the entire cgo method is called via build, separate the work into a tree of actions (how? cfiles is constructed as part of cgo, for example)

Comment From: mvdan

I had a sudden realisation: one of the packages that are really slow to compile with cgo, https://github.com/olebedev/go-duktape/tree/v3, simply bundles a ton of C code as a single 3.5MiB file. I presume that parallelizing cgo compilation per-file wouldn't help at all in that case, and I also assume that there is no way to make that any faster.

Luckily, that's not the norm. For example, https://github.com/ipsn/go-libtor builds tor from its C source, directly building its thousands of C files. That seems like it would get a big win from a fix for this issue.

Comment From: diamondburned

I had a brief look at this last week, because this is causing an annoying build time bottleneck in a project of mine. I assume the file we're interested in is src/cmd/go/internal/work/exec.go.

In particular, the Builder.cgo method makes calls to Builder.gcc synchronously in a loop.

From my observation, both cmd/go/internal/work and cmd/cgo invoke gcc sequentially a lot of times, but cmd/cgo takes up the bulk of the time.

I was able to write a prototype that attempts to parallelize both of them, however the CL for cmd/go/internal/work was denied in favor of another unplanned refactor. I still believe that parallelizing just cmd/cgo will significantly improve compile time, though.

If anybody wants to test how much faster the changes are, they can be found at diamondburned/go/work-gcc-parallelism. When using this branch, use ./make.bash to compile instead, as the change currently fails a cgo test for a reason I haven't figured out yet.

I'm currently daily driving Go 1.17 with these 2 patches applied on top, and I find compile times to be roughly 3-4x faster. The implementation caps the number of workers at 4 per package regardless of the thread count, so that figure is to be expected.
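A back-of-the-envelope model shows why a cap of 4 workers gives at most a 4x speedup. The sketch assumes every file takes the same time to compile and ignores all other build overhead. Both are simplifications, and idealSpeedup is a hypothetical helper, not part of the patch.

```go
package main

import "fmt"

// idealSpeedup estimates the best-case speedup from compiling `files`
// equally sized files on `workers` parallel workers.
func idealSpeedup(files, workers int) float64 {
	if files <= 0 || workers <= 0 {
		return 1
	}
	rounds := (files + workers - 1) / workers // ceil(files / workers)
	return float64(files) / float64(rounds)
}

func main() {
	for _, n := range []int{1, 3, 4, 10, 100} {
		fmt.Printf("%3d files, 4 workers: %.2fx\n", n, idealSpeedup(n, 4))
	}
}
```

Under this model, a package with a single large file sees no benefit, while packages with many files approach the 4x ceiling.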

The branch doesn't account for Fortran, however, so it might not work at all for codebases that need those.

Comment From: r0l1

@diamondburned thanks for the info. I am currently testing your patch (cmd/go/internal/work) on latest go master and all tests pass. Could you tell me more about the first patch (cmd/cgo)? I didn't find it. Is it already merged into upstream?

Edit:

I tested the patch with our cgo-based project and the build acceleration is enormous. Thanks!

  • unpatched: 66.91 sec
  • patch: 28.34 sec
  • patch with 8 routines on an 8 core: 26.99 sec

@bcmills The linked issue has been deprioritized by @rsc

@rsc, would you be willing to include the patch from @diamondburned?

Edit 2: @diamondburned, I found the second patch. I hadn't noticed that you split it into multiple branches. With the second patch, the build time is 23.6 sec.
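For reference, the timings reported in this comment work out to speedups of roughly 2.4x to 2.8x over the unpatched build. The small sketch below only redoes that arithmetic. The speedup helper and the row labels are illustrative.

```go
package main

import "fmt"

// speedup returns how many times faster `after` is than `before`.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	const unpatched = 66.91 // seconds, as reported above
	runs := []struct {
		name    string
		seconds float64
	}{
		{"patch", 28.34},
		{"patch, 8 routines", 26.99},
		{"with the second patch", 23.6},
	}
	for _, r := range runs {
		fmt.Printf("%-22s %.2fx\n", r.name, speedup(unpatched, r.seconds))
	}
}
```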

Comment From: diamondburned

Just as a small update, branch go-cgo-gcc-parallelism-1.18 now tracks the 2 patches for Go 1.18.

For Nix users, the drop-in overlay is

(self: super: {
    go = super.go.overrideAttrs (old: {
        version = "1.18";
        src = builtins.fetchurl {
            url    = "https://golang.org/dl/go1.18.linux-amd64.tar.gz";
            sha256 = "0kr6h1ddaazibxfkmw7b7jqyqhskvzjyc2c4zr8b3kapizlphlp8";
        };
        doCheck = false;
        patches = [
            # cmd/go/internal/work: concurrent ccompile routines
            (builtins.fetchurl "https://github.com/diamondburned/go/commit/ec3e1c9471f170187b6a7c83ab0364253f895c28.patch")
            # cmd/cgo: concurrent file generation
            (builtins.fetchurl "https://github.com/diamondburned/go/commit/50e04befeca9ae63296a73c8d5d2870b904971b4.patch")
        ];
    });
})

Comment From: Evengard

@diamondburned Unfortunately, your code has problems. When I attempted to compile my project with your version, it failed with a rather cryptic error:

cgo: duplicate pkg.Name nox_xxx_objGetTeamByNetCode_418C80, &{nox_xxx_objGetTeamByNetCode_418C80 _Cfunc_nox_xxx_objGetTeamByNetCode_418C80 nox_xxx_objGetTeamByNetCode_418C80  func <nil> 0xc001b12870 false } == &{nox_xxx_objGetTeamByNetCode_418C80 _Cfunc_nox_xxx_objGetTeamByNetCode_418C80 nox_xxx_objGetTeamByNetCode_418C80  func <nil> 0xc0007171d0 false }

nox_xxx_objGetTeamByNetCode_418C80 is a function imported from C via cgo. The project compiles fine without multithreaded cgo, though (i.e. with a regular Go build). The project in question is https://github.com/noxworld-dev/opennox, if you need to check the bug out.

I used this branch: https://github.com/diamondburned/go/tree/go-cgo-gcc-parallelism-1.18

Comment From: diamondburned

Unfortunately, your code has problems.

The modifications are definitely a band-aid. They work for my purposes, so I stopped there, because cmd/cgo's code is very complicated and hard for me to work with.

If you can't get the compiler to work with just the cmd/cgo patch, then you might need to investigate deeper yourself.

Comment From: diamondburned

I've made go-cgo-gcc-parallelism-1.19 to track Go 1.19.

Comment From: Evengard

https://github.com/Evengard/go/tree/go-cgo-gcc-parallelism-1.20.3 tracks the Go 1.20.3 release for anyone who needs that.

Comment From: podtserkovskiy

It's been a while since this issue was created, and two CLs were abandoned in favor of properly redesigning cmd/go/internal/work to make the actions API more fine-grained.

This is definitely the right thing to do, but nobody has had time to implement it. I anticipate that this is not an easy project, and personally I don't have enough time to invest in it at the moment.

Are we happy to do a tiny fix to just parallelize the work inside Builder.cgo as a temporary solution?

Comment From: gopherbot

Change https://go.dev/cl/579815 mentions this issue: cmd/go/internal/work: parallelize C/C++ code compilation

Comment From: podtserkovskiy

I've tried to keep it as minimally invasive as possible. Let me know if it works for you: https://go-review.googlesource.com/c/go/+/579815

Comment From: matloob

It would be ideal to try to fix this by making actions more fine grained, especially since there aren't any other cases where we do work concurrently within an action. Having two layers of concurrency complicates things: one layer is the set of actions that are running, and the other is the C compilations within an action.

Comment From: diamondburned

It would be ideal to try to fix this by making actions more fine grained, especially since there aren't any other cases where we do work concurrently within an action. Having two layers of concurrency complicates things: one layer is the set of actions that are running, and the other is the C compilations within an action.

AFAIK, this was one of the reasons for not merging the CLs. I definitely agree that that's the ideal solution, but as @podtserkovskiy pointed out, it wasn't really being worked on, unfortunately.

Comment From: podtserkovskiy

It would be ideal to try to fix this by making actions more fine grained

I fully agree that the redesign of the actions API would be ideal, but this is not a trivial change and it requires much more time for proper implementation than this fix.

there aren't any other cases where we do work concurrently within an action

I can recall one more place with the potential to execute things concurrently within an action: gcToolchain.asm.

Comment From: matloob

there aren't any other cases where we do work concurrently within an action

I can recall one more place with the potential to execute things concurrently within an action: gcToolchain.asm.

Sorry, I meant that there aren't any other places currently where we do work concurrently within an action.

Comment From: gucio321

Hi @podtserkovskiy, I've tried out your change on the cimgui-go repository, but my results are not really good:

[examples (130) ]$ time ~/git/go/bin/go build .

real    3m19.036s
user    3m39.633s
sys     0m4.831s
[examples (0) ]$ time go build -a .

real    3m30.830s
user    3m46.801s
sys     0m6.414s
[examples (0) ]$ 
[examples (0) ]$ ~/git/go/bin/go version # with applied patch
go version devel go1.23-fe5487327a Tue Apr 23 08:28:17 2024 +0200 linux/amd64

Comment From: diamondburned

Yeah, unfortunately, cmd/go/internal/work is the slow part only in some cases. Most of the bottleneck is in cmd/cgo which is its own beast.

Comment From: podtserkovskiy

cmd/go/internal/work can potentially run cmd/cgo for all packages concurrently (even before any dependencies are compiled), because this action doesn't need -importcfg or any .a files from dependencies. However, this requires a significant redesign of cmd/go/internal/work.

Most of the bottleneck is in cmd/cgo which is its own beast.

BTW, I've tried a quick experiment with cmd/cgo: I've parallelized the gccDefines calls. It didn't fix my particular issue, but it may be useful for you: https://go-review.googlesource.com/c/go/+/581336

Comment From: podtserkovskiy

I've published parallelisation of gccDefines calls in cmd/cgo.

There are more CC calls to make concurrent.

But even this change makes processing of the "net" package (5 CGo files) 25% faster. It won't improve the performance of packages with a single CGo file; the more CGo files a package has, the greater the speedup.

CL: https://go-review.googlesource.com/c/go/+/581336

Comment From: mappu

Hi, I have a very large CGO module. The /qt/ package alone has 340 .cpp files plus 340 .go files using CGO.

Some tests on a 4-core Neoverse N1:

Branch                            Performance
go1.19.8                          About 14 minutes
go1.23.3                          About 14 minutes
CL 581336 (go1.24-d555358bf8)     About 14 minutes
Today's gotip (go1.24-e67c0f0)    About 14 minutes
CL 579815 (go1.23-b0a2790fb08)    Much faster (about 8 minutes)

Comment From: leaanthony

Hello from the future. Are there any plans for this to go ahead? The main pain point here is the need to use CGO on non-Windows machines for some large libraries (QT & GTK). These take a ridiculous amount of time to compile. To be clear, I'd much prefer we had something like purego in the standard library, but if that's not going to happen, parallel CGO compiles would go an awfully long way toward easing some serious pain points. Thanks all 🙏

Comment From: matloob

Hi, we're trying to determine whether to prioritize the work to fix this properly (with fine grained actions) for Go 1.26. Please let me know if your workflow is affected by this, and if you'd be able to test out the changes to see if it improves your workflow.

Thanks!

Comment From: ilius

@matloob The miqt library is a good example, because it takes about 10 minutes to compile on an Intel i5-12400. I can test it with my ayandict app.

Comment From: DeedleFake

@matloob, my workflow on Trayscale is affected by this. Compilations without the cache can take 20 to 30 minutes, which makes testing small changes incredibly annoying. I don't know how much something like this would help, but if it could at all that would be very much appreciated.

And yes, I'd be able to test it.