(It is unclear to me if this is an issue with the test, cmd/go, the compiler/linker, or the builder itself)

Example failure: https://ci.chromium.org/ui/inv/build-8759926960216361809/test-results?sortby=&groupby=

Both tests are failing because they aren't getting a reproducible build.

    script_test.go:156: FAIL: testdata/script/build_issue48319.txt:29: cmp -q main.exe main1.exe: main.exe and main1.exe differ
    script_test.go:156: FAIL: testdata/script/build_plugin_reproducible.txt:6: cmp -q a.so b.so: a.so and b.so differ

I haven't yet been able to reproduce on a gomote because the LUCI gomote setup doesn't currently set up Xcode properly, so cgo doesn't work (which these tests require).

cc @bcmills @dmitshur @mknyszek @cagedmantis

Comment From: prattmic

Workaround to get Xcode on a gomote:

Note: Depending on which machine you get, the mac_toolchain binary referenced below may be at either /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain or /Volumes/Work/s/w/ir/tools/bin/mac_toolchain.

$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /bin/mkdir /tmp/xcode
$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain install -xcode-version 15a240d -output-dir /tmp/xcode/Xcode.app
$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /usr/bin/sudo xcode-select --switch /tmp/xcode/Xcode.app

Comment From: bcmills

Might have something to do with code-signing? (But then why aren't those tests failing on the darwin-amd64-longtest legacy TryBots too?)

Comment From: prattmic

With Xcode installed, this (thankfully) does reproduce (no pun intended):

$ gomote run mpratt-gotip-darwin-amd64-longtest-0 ./go/bin/go test -run=TestScript/build_plugin_reproducible -v cmd/go  
# Streaming results from "mpratt-gotip-darwin-amd64-longtest-0" to "/tmp/gomote2019819704/mpratt-gotip-darwin-amd64-longtest-0.stdout"...
=== RUN   TestScript
vcs-test.golang.org rerouted to http://127.0.0.1:50941
https://vcs-test.golang.org rerouted to https://127.0.0.1:50942
go test proxy running at GOPROXY=http://127.0.0.1:50943/mod
=== RUN   TestScript/build_plugin_reproducible
=== PAUSE TestScript/build_plugin_reproducible
=== CONT  TestScript/build_plugin_reproducible
    script_test.go:132: 2024-01-03T19:36:00Z
    script_test.go:134: $WORK=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168
    script_test.go:156: 
        PATH=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/testbin:/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go/bin:/Users/swarming/.swarming/w/ir/tools/bin:/Users/swarming/.swarming/cipd_cache/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
        HOME=/no-home
        CCACHE_DISABLE=1
        GOARCH=amd64
        TESTGO_GOHOSTARCH=amd64
        GOCACHE=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/gocache
        GOCOVERDIR=
        GODEBUG=
        GOEXE=
        GOEXPERIMENT=
        GOOS=darwin
        TESTGO_GOHOSTOS=darwin
        GOPROXY=http://127.0.0.1:50943/mod
        GOPRIVATE=
        GOROOT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go
        GOROOT_FINAL=
        GOTRACEBACK=system
        TESTGO_GOROOT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go
        TESTGO_EXE=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/testbin/go
        TESTGO_VCSTEST_HOST=127.0.0.1:50941
        TESTGO_VCSTEST_TLS_HOST=127.0.0.1:50942
        TESTGO_VCSTEST_CERT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/vcstest63272679/cert.pem
        TESTGONETWORK=panic
        GOSUMDB=localhost.localdev/sumdb+00000c67+AcTrnkbUA+TU4heY3hkjiSES/DSQniBqIeQ/YppAUtK6
        GONOPROXY=
        GONOSUMDB=
        GOVCS=*:all
        devnull=/dev/null
        goversion=1.22
        CMDGO_TEST_RUN_MAIN=true
        HGRCPATH=
        GOTOOLCHAIN=auto
        newline=

        WORK=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168
        TMPDIR=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/tmp
        GOPATH=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/gopath
        PWD=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/gopath/src

        > [!buildmode:plugin] skip
        [condition not met]
        > [short] skip
        [condition not met]
        > go build -trimpath -buildvcs=false -buildmode=plugin -o a.so main.go
        > go build -trimpath -buildvcs=false -buildmode=plugin -o b.so main.go
        > cmp -q a.so b.so
    script_test.go:156: FAIL: testdata/script/build_plugin_reproducible.txt:6: cmp -q a.so b.so: a.so and b.so differ
--- FAIL: TestScript (0.10s)
    --- FAIL: TestScript/build_plugin_reproducible (8.78s)
FAIL
FAIL    cmd/go  9.089s
FAIL
# Wrote results from "mpratt-gotip-darwin-amd64-longtest-0" to "/tmp/gomote2019819704/mpratt-gotip-darwin-amd64-longtest-0.stdout".
Error running run: unable to execute ./go/bin/go: rpc error: code = Unknown desc = command execution failed: exit status 1

Comment From: prattmic

Complete recipe:

Note: Depending on which machine you get, the mac_toolchain binary referenced below may be at either /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain or /Volumes/Work/s/w/ir/tools/bin/mac_toolchain.

$ export GOROOT=/home/prattmic/src/go/ # set to your GOROOT
$ export GOMOTELUCI=true
$ gomote create gotip-darwin-amd64-longtest
mpratt-gotip-darwin-amd64-longtest-1
$ export INSTANCE=mpratt-gotip-darwin-amd64-longtest-1
$ gomote run ${INSTANCE} /bin/mkdir /tmp/xcode
$ gomote run ${INSTANCE} /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain install -xcode-version 15a240d -output-dir /tmp/xcode/Xcode.app
$ gomote run ${INSTANCE} /usr/bin/sudo xcode-select --switch /tmp/xcode/Xcode.app
$ gomote push ${INSTANCE}
$ gomote run ${INSTANCE} ./go/src/make.bash
$ gomote run ${INSTANCE} ./go/bin/go test -run=TestScript/build_plugin_reproducible -v cmd/go

Comment From: prattmic

The only differences between a.so and b.so are something near the beginning of the file (still investigating) and the Go Build ID:

diff -C 5 a.hex b.hex
*** a.hex       Wed Jan  3 12:03:42 2024
--- b.hex       Wed Jan  3 12:03:47 2024
***************
*** 117,128 ****
  00000740: 0b00 0000 5000 0000 0000 0000 a70b 0000  ....P...........
  00000750: a70b 0000 5603 0000 fd0e 0000 3300 0000  ....V.......3...
  00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000770: 0000 0000 0000 0000 6048 1700 5801 0000  ........`H..X...
  00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
! 00000790: 1b00 0000 1800 0000 edfb 2d7d ab6e 374d  ..........-}.n7M
! 000007a0: 8eba 7c75 012c c264 3200 0000 2000 0000  ..|u.,.d2... ...
  000007b0: 0100 0000 0000 0e00 0000 0e00 0100 0000  ................
  000007c0: 0300 0000 0007 f703 2a00 0000 1000 0000  ........*.......
  000007d0: 0000 0000 0000 0000 0c00 0000 3800 0000  ............8...
  000007e0: 1800 0000 0200 0000 0000 3805 0000 0100  ..........8.....
  000007f0: 2f75 7372 2f6c 6962 2f6c 6962 5379 7374  /usr/lib/libSyst
--- 117,128 ----
  00000740: 0b00 0000 5000 0000 0000 0000 a70b 0000  ....P...........
  00000750: a70b 0000 5603 0000 fd0e 0000 3300 0000  ....V.......3...
  00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000770: 0000 0000 0000 0000 6048 1700 5801 0000  ........`H..X...
  00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
! 00000790: 1b00 0000 1800 0000 f0cb 7393 b5bd 3b76  ..........s...;v
! 000007a0: 9fb5 5f03 dd32 4c8a 3200 0000 2000 0000  .._..2L.2... ...
  000007b0: 0100 0000 0000 0e00 0000 0e00 0100 0000  ................
  000007c0: 0300 0000 0007 f703 2a00 0000 1000 0000  ........*.......
  000007d0: 0000 0000 0000 0000 0c00 0000 3800 0000  ............8...
  000007e0: 1800 0000 0200 0000 0000 3805 0000 0100  ..........8.....
  000007f0: 2f75 7372 2f6c 6962 2f6c 6962 5379 7374  /usr/lib/libSyst
***************
*** 916,928 ****
  00003930: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  00003940: ff20 476f 2062 7569 6c64 2049 443a 2022  . Go build ID: "
  00003950: 4c42 3648 7a64 376b 6c31 6258 726e 7948  LB6Hzd7kl1bXrnyH
  00003960: 697a 5859 2f70 2d7a 3839 4146 354e 6136  izXY/p-z89AF5Na6
  00003970: 6f31 736e 4466 704a 682f 3644 745f 4f44  o1snDfpJh/6Dt_OD
! 00003980: 4769 7571 452d 7652 4e52 7831 5878 2f66  GiuqE-vRNRx1Xx/f
! 00003990: 4735 6f64 7563 424e 6d4f 7053 6455 4e51  G5oducBNmOpSdUNQ
! 000039a0: 7861 5522 0a20 ffcc cccc cccc cccc cccc  xaU". ..........
  000039b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  000039c0: 5548 89e5 4883 ec10 4c8b 3dd1 4607 0049  UH..H...L.=.F..I
  000039d0: 8b4f 084c 8b3d c646 0700 498b 170f 1f00  .O.L.=.F..I.....
  000039e0: 4839 c87d 1b73 3948 c1e0 0448 8b0c 0248  H9.}.s9H...H...H
  000039f0: 8b5c 0208 4889 c848 83c4 105d c30f 1f00  .\..H..H...]....
--- 916,928 ----
  00003930: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  00003940: ff20 476f 2062 7569 6c64 2049 443a 2022  . Go build ID: "
  00003950: 4c42 3648 7a64 376b 6c31 6258 726e 7948  LB6Hzd7kl1bXrnyH
  00003960: 697a 5859 2f70 2d7a 3839 4146 354e 6136  izXY/p-z89AF5Na6
  00003970: 6f31 736e 4466 704a 682f 3644 745f 4f44  o1snDfpJh/6Dt_OD
! 00003980: 4769 7571 452d 7652 4e52 7831 5878 2f6d  GiuqE-vRNRx1Xx/m
! 00003990: 436f 5971 6470 5854 386a 7a54 6e64 6d4f  CoYqdpXT8jzTndmO
! 000039a0: 3038 5022 0a20 ffcc cccc cccc cccc cccc  08P". ..........
  000039b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  000039c0: 5548 89e5 4883 ec10 4c8b 3dd1 4607 0049  UH..H...L.=.F..I
  000039d0: 8b4f 084c 8b3d c646 0700 498b 170f 1f00  .O.L.=.F..I.....
  000039e0: 4839 c87d 1b73 3948 c1e0 0448 8b0c 0248  H9.}.s9H...H...H
  000039f0: 8b5c 0208 4889 c848 83c4 105d c30f 1f00  .\..H..H...]....

Comment From: prattmic

Based on the otool output, it looks like this other component is the LC_UUID value: EDFB2D7D-AB6E-374D-8EBA-7C75012CC264 vs F0CB7393-B5BD-3B76-9FB5-5F03DD324C8A.

I don't know Mach-O very well, but it seems that this is just another build ID...
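
For readers who want to locate that field without otool, here is a minimal Go sketch that walks the load commands of a 64-bit little-endian Mach-O image and prints the LC_UUID payload. The constants match the hex dump in this thread (command 0x1b, size 0x18). The helper names `findUUID`, `formatUUID`, and `sample` are made up for illustration, and the input is a synthetic image rather than a real a.so.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	magic64   = 0xfeedfacf // MH_MAGIC_64 as read from a little-endian file
	headerLen = 32         // size of mach_header_64
	lcUUID    = 0x1b       // LC_UUID load command
)

// findUUID returns the file offset of the 16-byte LC_UUID payload in a
// 64-bit little-endian Mach-O image.
func findUUID(img []byte) (int, error) {
	if len(img) < headerLen || binary.LittleEndian.Uint32(img) != magic64 {
		return 0, errors.New("not a 64-bit little-endian Mach-O image")
	}
	ncmds := binary.LittleEndian.Uint32(img[16:])
	off := headerLen
	for i := uint32(0); i < ncmds; i++ {
		if off+8 > len(img) {
			return 0, errors.New("truncated load command")
		}
		cmd := binary.LittleEndian.Uint32(img[off:])
		size := int(binary.LittleEndian.Uint32(img[off+4:]))
		if size < 8 || off+size > len(img) {
			return 0, errors.New("bad load command size")
		}
		if cmd == lcUUID && size >= 24 {
			return off + 8, nil // skip the cmd and cmdsize words
		}
		off += size
	}
	return 0, errors.New("no LC_UUID load command")
}

// formatUUID renders 16 bytes in the grouping otool uses.
func formatUUID(u []byte) string {
	return fmt.Sprintf("%X-%X-%X-%X-%X", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// sample builds a synthetic image: a header followed by one LC_UUID command.
func sample(uuid [16]byte) []byte {
	img := make([]byte, headerLen+24)
	binary.LittleEndian.PutUint32(img[0:], magic64)
	binary.LittleEndian.PutUint32(img[16:], 1)  // ncmds
	binary.LittleEndian.PutUint32(img[20:], 24) // sizeofcmds
	binary.LittleEndian.PutUint32(img[32:], lcUUID)
	binary.LittleEndian.PutUint32(img[36:], 24) // cmdsize
	copy(img[40:], uuid[:])
	return img
}

func main() {
	// UUID bytes copied from offset 0x798 of a.hex in this thread.
	img := sample([16]byte{0xed, 0xfb, 0x2d, 0x7d, 0xab, 0x6e, 0x37, 0x4d,
		0x8e, 0xba, 0x7c, 0x75, 0x01, 0x2c, 0xc2, 0x64})
	off, err := findUUID(img)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(formatUUID(img[off : off+16])) // EDFB2D7D-AB6E-374D-8EBA-7C75012CC264
}
```

To try it on a real binary, read the file with os.ReadFile and pass the bytes to `findUUID`. Fat (universal) binaries and big-endian images are not handled by this sketch.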

Comment From: bcmills

Huh. See also https://bugs.chromium.org/p/chromium/issues/detail?id=1068970. 😵‍💫

Comment From: prattmic

The output of GODEBUG=gocachehash=1 is identical for both builds.

Comment From: bcmills

Yeah, looks like the LC_UUID depends on at least the last component of the output file path: https://github.com/apple-opensource/ld64/blame/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/OutputFile.cpp#L3724-L3733

I'm not sure what _options.buildContextName() is derived from. Looks like maybe the RC_RELEASE environment variable? (https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/Options.cpp#L4529-L4530C30)

Comment From: prattmic

Thanks for the reference! Looking at the go build -x output, the last few steps are:

GOROOT_FINAL='$GOROOT' /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
/Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/buildid -w $WORK/b001/exe/a.out.so # internal
mv $WORK/b001/exe/a.out.so b.so

Running the link step (first line) multiple times, even without changing the path, yields a .so with different LC_UUID each time:

$ /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
$ shasum5.30 a.out.so 
9029867749bbecd942c0037526ababa6f0d83932  a.out.so
$ /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
$ shasum5.30 a.out.so
d76b438421660ffd24d4ca06dc30b3150b0b9fee  a.out.so

Diffing the binary shows that the UUID is the only difference (Go Build ID is identical presumably because I'm not running the buildid command).
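
A small Go sketch of that kind of binary diff, assuming both files fit in memory: it reports the byte ranges at which two inputs differ, which makes it easy to confirm that only the 16 UUID bytes changed. The `span` type and `diffSpans` function are illustrative names, not part of any Go tool.

```go
package main

import "fmt"

// span is a half-open byte range [Start, End) in which two inputs differ.
type span struct{ Start, End int }

// diffSpans returns the maximal runs of offsets at which a and b differ.
// If the lengths differ, the tail of the longer input is part of the last run.
func diffSpans(a, b []byte) []span {
	n, longer := len(a), len(b)
	if len(b) < n {
		n, longer = len(b), len(a)
	}
	var out []span
	start := -1
	for i := 0; i < n; i++ {
		switch {
		case a[i] != b[i] && start < 0:
			start = i // a new differing run begins
		case a[i] == b[i] && start >= 0:
			out = append(out, span{start, i})
			start = -1
		}
	}
	end := n
	if longer != n {
		end = longer
		if start < 0 {
			start = n // only the tail differs
		}
	}
	if start >= 0 {
		out = append(out, span{start, end})
	}
	return out
}

func main() {
	// Two toy inputs: an identical 4-byte prefix, then two differing bytes.
	a := []byte{0x1b, 0x00, 0x00, 0x00, 0xed, 0xfb, 0x32}
	b := []byte{0x1b, 0x00, 0x00, 0x00, 0xf0, 0xcb, 0x32}
	for _, s := range diffSpans(a, b) {
		fmt.Printf("differ at [%#x, %#x)\n", s.Start, s.End)
	}
}
```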

Comment From: prattmic

It doesn't seem to be related to the file paths. cmd/link invokes clang like so:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/go.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000000.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000001.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000002.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000003.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000004.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000005.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000006.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000007.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000008.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000009.o" "-O2" "-g" "-lpthread"

The go-link-202179431 path component changes each iteration, but this can be forced to be the same with -tmpdir /tmp/tmp:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/tmp/tmp/go.o" "/tmp/tmp/000000.o" "/tmp/tmp/000001.o" "/tmp/tmp/000002.o" "/tmp/tmp/000003.o" "/tmp/tmp/000004.o" "/tmp/tmp/000005.o" "/tmp/tmp/000006.o" "/tmp/tmp/000007.o" "/tmp/tmp/000008.o" "/tmp/tmp/000009.o" "-O2" "-g" "-lpthread"

Even with identical paths each time we get different output.

Comment From: bcmills

Maybe we can just set the -no_uuid flag if it exists? https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/Options.cpp#L3412-L3415

Comment From: prattmic

Perhaps, but I'd like to better understand what is happening. Plus it seems like some users may want the UUID, as Chrome did.

FWIW, the clang command consistently generates identical output from the same inputs. It seems it is the output of dsymutil that is differing:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/tmp/tmp2/go.o" "/tmp/tmp2/000000.o" "/tmp/tmp2/000001.o" "/tmp/tmp2/000002.o" "/tmp/tmp2/000003.o" "/tmp/tmp2/000004.o" "/tmp/tmp2/000005.o" "/tmp/tmp2/000006.o" "/tmp/tmp2/000007.o" "/tmp/tmp2/000008.o" "/tmp/tmp2/000009.o" "-O2" "-g" "-lpthread"
host link dsymutil: "/tmp/xcode/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/dsymutil" "-f" "a.out.so" "-o" "/tmp/tmp2/go.dwarf"
host link strip: "/tmp/xcode/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/strip" "-S" "a.out.so"
$ for f in /tmp/tmp/*; do shasum5.30 $f; done
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp/000000.o
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp/000001.o
c85aae746d1a1f270a1cea350b377d1b5f9ff376  /tmp/tmp/000002.o
a4741bd785ce981348a908880917a43074de02a7  /tmp/tmp/000003.o
2050abddb774ad2600414caa5596608d5260424c  /tmp/tmp/000004.o
160fbd77666d9949ef8b8fa502ad1e665388dad3  /tmp/tmp/000005.o
0c4db5947508c4a8c035484c4ef97182efb572e1  /tmp/tmp/000006.o
e2724d0e3997897da123884a7dd2496d094bf45e  /tmp/tmp/000007.o
311978e69c428fbfc059f92eb48ae0a8e3e19d80  /tmp/tmp/000008.o
5fc69d54547370cba80b9d5272b62db3126ff85f  /tmp/tmp/000009.o
aa2a253f8abd8c84c3ef27ddfc9c12bc1481f277  /tmp/tmp/go.dwarf
480e9721586f5e764d59826677acff5d6bbe3588  /tmp/tmp/go.o
556b5a818027717b0399b1e94ba268ff147c932e  /tmp/tmp/trivial.c
$ for f in /tmp/tmp2/*; do shasum5.30 $f; done
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp2/000000.o
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp2/000001.o
c85aae746d1a1f270a1cea350b377d1b5f9ff376  /tmp/tmp2/000002.o
a4741bd785ce981348a908880917a43074de02a7  /tmp/tmp2/000003.o
2050abddb774ad2600414caa5596608d5260424c  /tmp/tmp2/000004.o
160fbd77666d9949ef8b8fa502ad1e665388dad3  /tmp/tmp2/000005.o
0c4db5947508c4a8c035484c4ef97182efb572e1  /tmp/tmp2/000006.o
e2724d0e3997897da123884a7dd2496d094bf45e  /tmp/tmp2/000007.o
311978e69c428fbfc059f92eb48ae0a8e3e19d80  /tmp/tmp2/000008.o
5fc69d54547370cba80b9d5272b62db3126ff85f  /tmp/tmp2/000009.o
3c3cf42192f5f6427f05b2dceca7c5733e6f1721  /tmp/tmp2/go.dwarf
480e9721586f5e764d59826677acff5d6bbe3588  /tmp/tmp2/go.o
556b5a818027717b0399b1e94ba268ff147c932e  /tmp/tmp2/trivial.c

(go.dwarf differs)

Edit: I'm not 100% certain about dsymutil being at fault here, as I can't seem to reproduce the non-reproducibility when running clang + dsymutil manually.
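
The two shasum loops can be folded into one comparison. The following Go sketch hashes every regular file in two linker temp directories and lists the names whose contents differ. It uses SHA-256 rather than the SHA-1 that shasum printed, and `hashDir`, `differing`, and `demo` are hypothetical helpers. With no arguments it runs against two throwaway directories it creates itself.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// hashDir maps each regular file directly inside dir to the SHA-256 of its contents.
func hashDir(dir string) (map[string][sha256.Size]byte, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	sums := make(map[string][sha256.Size]byte)
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		sums[e.Name()] = sha256.Sum256(data)
	}
	return sums, nil
}

// differing lists, in sorted order, names that hash differently or exist on one side only.
func differing(dirA, dirB string) ([]string, error) {
	a, err := hashDir(dirA)
	if err != nil {
		return nil, err
	}
	b, err := hashDir(dirB)
	if err != nil {
		return nil, err
	}
	var out []string
	for name, sum := range a {
		if other, ok := b[name]; !ok || other != sum {
			out = append(out, name)
		}
	}
	for name := range b {
		if _, ok := a[name]; !ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out, nil
}

// demo builds two temp directories in which only go.dwarf differs.
func demo() ([]string, error) {
	var dirs [2]string
	for i := range dirs {
		d, err := os.MkdirTemp("", "linktmp")
		if err != nil {
			return nil, err
		}
		defer os.RemoveAll(d)
		dirs[i] = d
		files := map[string]string{"go.o": "same", "go.dwarf": fmt.Sprintf("uuid-%d", i)}
		for name, body := range files {
			if err := os.WriteFile(filepath.Join(d, name), []byte(body), 0o644); err != nil {
				return nil, err
			}
		}
	}
	return differing(dirs[0], dirs[1])
}

func main() {
	var names []string
	var err error
	if len(os.Args) == 3 {
		names, err = differing(os.Args[1], os.Args[2])
	} else {
		names, err = demo()
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```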

Comment From: prattmic

cc @thanm see https://github.com/golang/go/issues/64947#issuecomment-1875889679 for reproducer instructions

Comment From: thanm

I spent a little while looking at this. What's weird is that the actual DWARF in the two go.dwarf files is identical-- what is different is (again) the uuid. E.g.

$  llvm-dwarfdump-16 xxx/tmpdir1/go.dwarf > dw1.txt
$  llvm-dwarfdump-16 xxx/tmpdir2/go.dwarf > dw2.txt
$ diff dw1.txt dw2.txt
1c1
< xxx/tmpdir1/go.dwarf: file format Mach-O 64-bit x86-64
---
> xxx/tmpdir2/go.dwarf: file format Mach-O 64-bit x86-64
$
$ llvm-objdump-16 --macho --all-headers xxx/tmpdir1/go.dwarf > h1.txt
$ llvm-objdump-16 --macho --all-headers xxx/tmpdir2/go.dwarf > h2.txt
$ diff h1.txt h2.txt
1c1
< xxx/tmpdir1/go.dwarf:
---
> xxx/tmpdir2/go.dwarf:
3880c3880
<     uuid 3BA8085B-DD85-312C-B9AD-2CEDAE928E62
---
>     uuid E559C1A0-DDFF-3BD3-8CD8-7652DC367F9F
$

So basically what seems to be happening is that dsymutil is generating a different uuid each time and embedding it into the go.dwarf file, in spite of the fact that the dwarf is the same, hmm.

I will spend a little time digging into the dsymutil source code, maybe I can find out more.

Comment From: prattmic

FWIW, the version of Xcode we're installing is 15.0.0. I peeked at the release notes for 15.0.1 and 15.1 and nothing stood out as a fix for this kind of issue, but I'll see if we can get a different version to test.

Comment From: thanm

OK (duh) in fact dsymutil is just faithfully copying the uuid from its input, so the problem here is that clang is generating a different uuid. I'll look into the clang source code instead.

Comment From: prattmic

FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:

  • mac_toolchain install -xcode-version 15a240d: 15.0
  • mac_toolchain install -xcode-version 15A507: 15.0.1
  • mac_toolchain install -xcode-version 15C65: 15.1
  • mac_toolchain install -xcode-version 15C5500c: 15.2 (beta, I guess)

Comment From: thanm

OK, I think I am making some progress here. For a while I thought this might be an ld-prime problem, but that turned out to be a red herring. In fact it looks like it is a bit simpler than that.

Running the link with -ldflags=-v -tmpdir=/tmp/tmp I see

# command-line-arguments
HEADER = -H1 -T0x1001000 -R0x1000
host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load"
 "-dynamiclib" "-o" "/Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/tmp/go-build2833294421/b001/exe/a.out.so" "-Qunused-arguments" "/tmp/tmp/go.o" "/tmp/tmp/000000.o" 
"/tmp/tmp/000001.o" "/tmp/tmp/000002.o" "/tmp/tmp/000003.o" "/tmp/tmp/000004.o" "/tmp/tmp/000005.o" 
"/tmp/tmp/000006.o" "/tmp/tmp/000007.o" "/tmp/tmp/000008.o" "-O2" "-g" "-lpthread" "-ld64"

Note the "-o" output, which incorporates the go build dir go-build2833294421, which is going to vary from build to build. The problem is that this is being incorporated into the dynamic info in the a.out.so output, e.g. from the output of llvm-objdump-16 --macho --all-headers I see:

Load command 4
          cmd LC_ID_DYLIB
      cmdsize 128
         name /Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/tmp/go-build1516003297/b001/exe/a.out.so (offset 24)
   time stamp 1 Thu Jan  1 00:00:01 1970
      current version 0.0.0
compatibility version 0.0.0

and the external linker is almost certainly going to hash this section when creating the build ID.

Not sure what the best approach is to fix this. Also not sure why we aren't seeing similar problems with the older gomotes (I will spin one up and compare).

Comment From: bcmills

@thanm, that sounds very similar to an existing reproducibility workaround here: https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/work/gc.go;l=649-663;drc=66b8107a26e515bbe19855d358bdf12bd6326347

Perhaps we need to extend that workaround to more build modes, or take a similar approach when running other commands?

Comment From: thanm

Well phooey, I am afraid I've had a Homer Simpson moment here.

My gomote expired, and I created a new one, but when I started using the new one I didn't update the PATH setting in my script, so it wasn't picking up the correct version of Go. It looks like with LUCI gomotes the location of GOROOT is slightly different each time:

bindir from my first gomote: "/Users/swarming/.swarming/w/ituz4dfd04/workdir-swarming-task/go/bin"
bindir from my second gomote: "/Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/go/bin"

Oh well, a learning experience I suppose.

That explains why I was not picking up Cherry's fix (https://go-review.googlesource.com/c/go/+/478196, which extends the workaround that you mention, Bryan).

Now I'm back to seeing only a difference in the UUID.

Comment From: thanm

One more important bit of info: the problem goes away if I build with -extldflags=-ld_classic, meaning that this may be another thing we can add to the long list of problems that crop up with "ld-prime" (e.g. issue #61229).

Looking at the setup we have on our old-style gomotes I see:

$ gomote run `cat mote.txt` softwareupdate --history

Display Name                                       Version    Date                  
------------                                       -------    ----                  
Command Line Tools for Xcode                       14.0       11/07/2022, 16:16:24  
Command Line Tools for Xcode                       14.1       11/07/2022, 16:16:24

i.e. command line tools, not a complete Xcode installation. On the new LUCI gomotes we obviously have a full Xcode install, and we're using version 15, which defaults to ld-prime.

Comment From: thanm

> FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:
>
>   • mac_toolchain install -xcode-version 15a240d: 15.0
>   • mac_toolchain install -xcode-version 15A507: 15.0.1
>   • mac_toolchain install -xcode-version 15C65: 15.1
>   • mac_toolchain install -xcode-version 15C5500c: 15.2 (beta, I guess)

I just tested the most recent one (15.2) and it appears to have the same problem. Hmph.

Comment From: thanm

OK, one more update. I can reproduce the problem with just the C compiler, and what I think must be going on is that the name of the output file is being incorporated into the UUID. If I do:

$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o b.so example.cpp
$ llvm-objdump-16 --macho --all-headers a.so > ash.txt
$ llvm-objdump-16 --macho --all-headers b.so > bsh.txt

then I see a difference, whereas if I instead do

$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ mv a.so b.so
$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ llvm-objdump-16 --macho --all-headers a.so > ash.txt
$ llvm-objdump-16 --macho --all-headers b.so > bsh.txt

The UUIDs are the same (the only thing different in the second example is that both builds target a.so).

How would we feel about changing the test in question to target the same filename? Or does the current ld-prime behavior not really meet our criteria for reproducible builds?

Comment From: bcmills

Huh. I guess it would be ok for the tests to mv the output file so that they can go build -o to the same filename, although that seems a bit subtle.

Does the LC_UUID depend only on the output file's basename, or on the directory path as well? I think it's probably ok for it to depend on the basename, but (especially if the user is building with -trimpath) we should ensure that it doesn't depend on the current working directory.

Comment From: thanm

> Does the LC_UUID depend only on the output file's basename

I checked just now and it looks like it is just the output file basename, not the directory. If I run the C compiler building example.cpp once in directory xxx, then do the same compile in directory yyy, I get identical binaries. I'll send a CL, although I agree it is a bit weird.

Comment From: thanm

I poked a bit at the other failure (TestScript/build_issue48319). That one looks like it will require another Go command fix -- since this is not a shared-mode build the "-o" argument being passed to the external linker is a full path. Hence the difference in build IDs.

@bcmills would it make sense to take the code you mentioned before (https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/work/gc.go;l=649-663;drc=66b8107a26e515bbe19855d358bdf12bd6326347) and extend it even further (e.g. to any link being done on Darwin)?

Comment From: gopherbot

Change https://go.dev/cl/554059 mentions this issue: cmd/go/testdata: tweak build_plugin_reproducible test for Xcode 15

Comment From: cherrymui

If it is only a test issue, and the user's normal "go build" (by default, without any extra weird flags) is still reproducible, I think it is okay to just update the test. Another option is that we (over)write the LC_UUID in the Go linker after C linking, based on the file content (or the Go build ID). (We overwrite the binary for DWARF combining anyway, but that may be changed with #62577.)

For another issue, if it is not a shared object there would be no LC_ID_DYLIB, so it is still the UUID that is affected by the output file path?

Comment From: bcmills

> If it is only a test issue, and the user's normal "go build" (by default, without any extra weird flags) is still reproducible, I think it is okay to just update the test.

If I understand correctly, it's in an awkward grey area: the build is “reproducible” but only if you specify an output filename with the same basename at each invocation.

That is:

$ go build -trimpath -o foo
$ mv foo bar
$ go build -trimpath -o foo

will produce a foo identical to bar, but

$ go build -trimpath -o foo
$ go build -trimpath -o bar

will not.

Since -trimpath is supposed to redact local filenames, and the name of the output file is arguably a local filename, that technically fails reproducibility. On the other hand, it is still the case that repeating exactly the same command — provided that the -o flag is also the same — should continue to produce the same (reproducible) output bytes independent of the working directory.
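
To make the grey area concrete, here is a toy Go model of the behavior described at this point in the thread. It is not ld's actual algorithm: `toyUUID` is a hypothetical function that hashes the link inputs together with the output file's basename, which is enough to reproduce the "same basename, same bytes" property while ignoring the directory.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"path/filepath"
)

// toyUUID is a stand-in for the external linker's UUID computation.
// Assumption for illustration only: the UUID depends on the linked content
// and on the basename of the -o path, but not on its directory.
func toyUUID(outputPath string, content []byte) string {
	h := md5.New()
	h.Write(content)
	h.Write([]byte(filepath.Base(outputPath)))
	return fmt.Sprintf("%X", h.Sum(nil))
}

func main() {
	content := []byte("identical object code")

	// Same basename in different directories: the toy UUIDs match.
	fmt.Println(toyUUID("/work/x/foo", content) == toyUUID("/work/y/foo", content))

	// Different basenames in the same directory: the toy UUIDs differ.
	fmt.Println(toyUUID("/work/x/foo", content) == toyUUID("/work/x/bar", content))
}
```

Under this model, building to foo and renaming to bar is reproducible, while building directly to bar is not, which is the distinction drawn in the two command sequences in this comment.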

Comment From: cherrymui

We could have the Go linker always pass, say, a.out to the C linker, and rename the file afterward.

Comment From: cherrymui

Perhaps the most reliable way is to just write the UUID ourselves. For the short term (and possibly a backport), we can work around it in the test.

Comment From: thanm

Circling back on this: when I ran the experiment using clang (described in https://github.com/golang/go/issues/64947#issuecomment-1877722584) I assumed that the same fix would work for "go build", but in fact it looks like the UUIDs are still different, so I think there is still some work to do here. Apologies, I have been busy working on other bugs, I will spend some more time on it later this afternoon.

Comment From: thanm

Tiny bit more progress:

I hacked the Go linker code to save off a copy of the original "a.out.so" produced by the Apple linker before it gets run through dsymutil and then strip. When I compare the two instances of the original a.out.so (e.g. before being stripped) I can see differences in the symbol table output:

+ diff xxx/asvsh.txt xxx/bsvsh.txt
1c1
< xxx/tmpdir1/a.out.so.save:
---
> xxx/tmpdir2/a.out.so.save:
3311c3311
< 00000000659ee137      d  *UND* /private/tmp/tmp/go.o
---
> 00000000659ee145      d  *UND* /private/tmp/tmp/go.o
12706c12706
< 00000000659ee137      d  *UND* /private/tmp/tmp/000004.o
---
> 00000000659ee145      d  *UND* /private/tmp/tmp/000004.o
12714c12714
< 00000000659ee137      d  *UND* /private/tmp/tmp/000005.o
---
> 00000000659ee145      d  *UND* /private/tmp/tmp/000005.o

So this explains why we have this bizarre situation where the only thing different in the final binary is the build ID.

These phantom symbols are really weird, too: their values don't seem to be meaningful at all (normally an undefined symbol has a zero value?) and don't correspond to anything in the section table. I will see if I can reproduce this same weirdness with a pure C example; if that works, we can file a bug against the Apple linker.

Comment From: thanm

OK, I have successfully reproduced the problem with a C program via:

$ rm -rf /tmp/tmp
$ mkdir /tmp/tmp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj1.o a.cpp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj2.o b.cpp
$ clang -arch x86_64 -m64 -dynamiclib "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" -o a.out.so /tmp/tmp/obj1.o /tmp/tmp/obj2.o
$ mv a.out.so a.so
$
$ rm -rf /tmp/tmp
$ mkdir /tmp/tmp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj1.o a.cpp
$ clang -arch x86_64 -m64 -dynamiclib -O2 -g -c -o /tmp/tmp/obj2.o b.cpp
$ clang -arch x86_64 -m64 -dynamiclib "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" -o a.out.so /tmp/tmp/obj1.o /tmp/tmp/obj2.o
$ mv a.out.so b.so

Diffing the objdump output from a.so and b.so above produces

1c1
< a.so:
---
> b.so:
17c17
< 00000000659ee84c      d  *UND* /private/tmp/tmp/obj1.o
---
> 00000000659ee84d      d  *UND* /private/tmp/tmp/obj1.o
31c31
< 00000000659ee84c      d  *UND* /private/tmp/tmp/obj2.o
---
> 00000000659ee84d      d  *UND* /private/tmp/tmp/obj2.o
234c234
<     uuid 78168A31-07CA-3F90-B84D-A352B00AD9AE
---
>     uuid 6D77FC7A-5C55-3D9E-BCBF-46C334F27478

I think Cherry's idea of rewriting the build ID is looking a bit more attractive at this point.

Comment From: cherrymui

Well, it seems the values of those symbols aren't really arbitrary:

$ objdump -t a.out | grep a.o
a.out:  file format mach-o 64-bit x86-64
0000000065a02667      d  *UND* /private/tmp/pp/a.o
$ date -r 0x0000000065a02667
Thu Jan 11 12:33:27 EST 2024
$ stat a.o
16777220 270066454 -rw-r--r-- 1 cherryyz wheel 0 1856 "Jan 11 12:37:37 2024" "Jan 11 12:33:27 2024" "Jan 11 12:33:27 2024" "Jan 11 12:33:27 2024" 4096 8 0 a.o

It is actually the timestamp of the .o file (!)...
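
The same decoding can be done in Go. This minimal sketch parses the hex symbol value printed by objdump as seconds since the Unix epoch. `stampTime` is an illustrative helper name. The first value comes from the objdump output in this comment, and the other two are the values that differed in the earlier a.out.so.save diff.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// stampTime interprets an objdump symbol value (hex, no 0x prefix) as
// seconds since the Unix epoch and returns the corresponding UTC time.
func stampTime(hexValue string) (time.Time, error) {
	secs, err := strconv.ParseInt(hexValue, 16, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(secs, 0).UTC(), nil
}

func main() {
	ts, err := stampTime("0000000065a02667")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.Format(time.RFC3339)) // 2024-01-11T17:33:27Z, i.e. 12:33:27 EST

	// The two values from the earlier diff are simply 14 seconds apart,
	// consistent with two link runs whose .o files were written 14s apart.
	first, _ := stampTime("00000000659ee137")
	second, _ := stampTime("00000000659ee145")
	fmt.Println(second.Sub(first)) // 14s
}
```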

I guess one way to work around it is to zero the timestamps of the .o files before feeding them to the C linker...
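
A minimal sketch of that workaround, assuming the object files all sit directly inside one temp directory: set the access and modification times of every regular file to a fixed instant before invoking the external linker. `pinTimes`, `fixedStamp`, and `demo` are hypothetical names. This is not the code of any Go CL, and the choice of the Unix epoch as the fixed value is arbitrary.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// fixedStamp is the instant every object file is pinned to. Any constant
// would do; the Unix epoch is an arbitrary choice for this sketch.
var fixedStamp = time.Unix(0, 0)

// pinTimes sets atime and mtime of each regular file directly inside dir
// to fixedStamp, so the linker sees the same timestamps on every run.
func pinTimes(dir string) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		if err := os.Chtimes(filepath.Join(dir, e.Name()), fixedStamp, fixedStamp); err != nil {
			return err
		}
	}
	return nil
}

// demo writes two placeholder object files, pins their times, and reports
// whether every mtime now equals fixedStamp.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "pin")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"go.o", "000000.o"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("obj"), 0o644); err != nil {
			return false, err
		}
	}
	if err := pinTimes(dir); err != nil {
		return false, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return false, err
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return false, err
		}
		if !info.ModTime().Equal(fixedStamp) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("all mtimes pinned:", ok)
}
```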

Comment From: bcmills

Maybe try setting ZERO_AR_DATE=1 in the environment? (Just a guess based on https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/Options.cpp#L4513.)

Comment From: prattmic

Should we add skips for these tests while we figure out how to work around this?

Comment From: thanm

> Should we add skips for these tests while we figure out how to work around this?

Probably a good idea. I sent a CL (565376).

Comment From: gopherbot

Change https://go.dev/cl/565376 mentions this issue: cmd/go/testdata/script: add darwin skips for selected buildrepro tests

Comment From: bcmills

See previously: #58557

Comment From: gopherbot

Change https://go.dev/cl/584937 mentions this issue: cmd/go/testdata/script: disable build_plugin_reproducible on darwin

Comment From: matloob

@thanm Should we backport CL 565376 to Go 1.22? The Go 1.22 darwin/amd64 builders seem to have been broken for a while.

Comment From: dmitshur

Please do. Given there isn't a better fix available, let's do that to get GOOS=darwin builders passing on release branches. Thanks.

Comment From: matloob

@gopherbot please create a backport issue for Go 1.22

Comment From: gopherbot

Backport issue(s) opened: #67314 (for 1.22).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

Comment From: dmitshur

Comment https://github.com/golang/go/issues/64947#issuecomment-1887778630 suggested trying to set ZERO_AR_DATE=1 in the environment. I gave it a shot, and it in fact worked for me locally with the Xcode version I had installed: the test was passing with that change, and failing without it. However, it turned out not to be enough to get the test passing on the LUCI builder (perhaps because of the difference in Xcode version); see CL 574895 and the trybot run on it.

So ZERO_AR_DATE may still be relevant, but it appears not to be enough on its own.

Comment From: gopherbot

Change https://go.dev/cl/584238 mentions this issue: [release-branch.go1.22] cmd/go/testdata/script: add darwin skips for selected buildrepro tests

Comment From: gopherbot

Change https://go.dev/cl/585356 mentions this issue: cmd/go/testdata/script: turn back on build_plugin_reproducible script test

Comment From: gopherbot

Change https://go.dev/cl/585355 mentions this issue: cmd/link/internal/ld: clean tmpdir obj timestamps

Comment From: thanm

OK, latest go-round: following CL 585355 if I run this sequence of commands (essentially the guts of the plugin repro test) on the gotip-darwin-amd64-longtest gomote (with Xcode 15E204a):

rm -f a.so b.so

# First build
rm -rf /tmp/tmp
mkdir /tmp/tmp
go build -trimpath -buildvcs=false -buildmode=plugin -o a.so \
      -ldflags=-tmpdir=/tmp/tmp main.go 1> err1.txt 2>&1

# Second build
rm -rf /tmp/tmp
mkdir /tmp/tmp
go build -trimpath -buildvcs=false -buildmode=plugin -o b.so \
   -ldflags=-tmpdir=/tmp/tmp main.go 1> err2.txt 2>&1

# Compare
cmp a.so b.so

I can execute this 50, 100, 200 times and it works every time. If I remove the -ldflags=-tmpdir=/tmp/tmp flag from the build, then the compare works most of the time but fails maybe once every 10 or 20 iterations.

So, working theory: the external linker sometimes (not always, who knows why) seems to take into account the paths of the object files being fed to it when it generates the UUID.

At this point I am now leaning towards falling back on Cherry's second suggestion: run the external linker and then just stomp on the UUID in the generated binary.
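
As a rough illustration of what "stomping on the UUID" could look like, the Go sketch below finds the LC_UUID payload in a 64-bit little-endian Mach-O image, zeroes it, hashes the whole image, and writes the first 16 bytes of the hash back. This is only a sketch under stated assumptions and not the code from any CL: `uuidOffset`, `stampUUID`, and `sample` are made-up names, SHA-256 is an arbitrary choice of hash, and the version nibble is set to 3 simply because every UUID quoted in this thread has that form.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	magic64   = 0xfeedfacf // MH_MAGIC_64 as read from a little-endian file
	headerLen = 32         // size of mach_header_64
	lcUUID    = 0x1b       // LC_UUID load command
)

// uuidOffset returns the offset of the 16-byte payload of the first LC_UUID.
func uuidOffset(img []byte) (int, error) {
	if len(img) < headerLen || binary.LittleEndian.Uint32(img) != magic64 {
		return 0, errors.New("not a 64-bit little-endian Mach-O image")
	}
	ncmds := binary.LittleEndian.Uint32(img[16:])
	off := headerLen
	for i := uint32(0); i < ncmds; i++ {
		if off+8 > len(img) {
			return 0, errors.New("truncated load command")
		}
		cmd := binary.LittleEndian.Uint32(img[off:])
		size := int(binary.LittleEndian.Uint32(img[off+4:]))
		if size < 8 || off+size > len(img) {
			return 0, errors.New("bad load command size")
		}
		if cmd == lcUUID && size >= 24 {
			return off + 8, nil
		}
		off += size
	}
	return 0, errors.New("no LC_UUID load command")
}

// stampUUID overwrites the LC_UUID payload in place with a value derived
// from the image contents. The field is zeroed before hashing so the result
// does not depend on whatever the external linker wrote there.
func stampUUID(img []byte) error {
	off, err := uuidOffset(img)
	if err != nil {
		return err
	}
	field := img[off : off+16]
	for i := range field {
		field[i] = 0
	}
	sum := sha256.Sum256(img)
	copy(field, sum[:])
	field[6] = field[6]&0x0f | 0x30 // version nibble 3, as in the UUIDs quoted in this thread
	field[8] = field[8]&0x3f | 0x80 // RFC 4122 variant bits
	return nil
}

// sample builds a synthetic image: header, one LC_UUID command, then body.
func sample(uuid [16]byte, body string) []byte {
	img := make([]byte, headerLen+24)
	binary.LittleEndian.PutUint32(img[0:], magic64)
	binary.LittleEndian.PutUint32(img[16:], 1)  // ncmds
	binary.LittleEndian.PutUint32(img[20:], 24) // sizeofcmds
	binary.LittleEndian.PutUint32(img[32:], lcUUID)
	binary.LittleEndian.PutUint32(img[36:], 24) // cmdsize
	copy(img[40:], uuid[:])
	return append(img, body...)
}

func main() {
	// Two "links" of the same code that differ only in the linker's UUID.
	a := sample([16]byte{0xed, 0xfb}, "same code")
	b := sample([16]byte{0xf0, 0xcb}, "same code")
	fmt.Println("before:", bytes.Equal(a, b))
	if err := stampUUID(a); err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := stampUUID(b); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("after: ", bytes.Equal(a, b))
}
```

A real implementation would also need to consider code signatures and dSYM files that record the UUID, which this sketch ignores.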

Comment From: gopherbot

Change https://go.dev/cl/586079 mentions this issue: cmd/link/internal/ld: rewrite LC_UUID for darwin external links