The Go runtime passes cgoSymbolizer a pointer to a structure to write results. This struct is usually stack allocated. When calling in a loop from a single goroutine (e.g., via runtime.CallersFrames
), the pointer is usually the same across calls. If the goroutine migrates threads, then calls from different threads will pass the same pointer argument.
When the cgo build is using TSAN (-fsanitize=thread
, not Go's -race
), these calls from different threads will report a false race similar to:
==================
WARNING: ThreadSanitizer: data race (pid=31770)
Read of size 8 at 0x00c00005dc60 by thread T5:
#0 tcSymbolizer <null> (tsan_tracebackctxt+0x569499) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#1 runtime.asmcgocall /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:921 (tsan_tracebackctxt+0x543b63) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
Previous write of size 8 at 0x00c00005dc60 by thread T3:
#0 tcSymbolizer <null> (tsan_tracebackctxt+0x5694d2) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#1 runtime.asmcgocall /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:921 (tsan_tracebackctxt+0x543b63) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
Thread T5 (tid=31776, running) created by main thread at:
#0 pthread_create <null> (tsan_tracebackctxt+0x423d3b) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#1 _cgo_try_pthread_create <null> (tsan_tracebackctxt+0x569d67) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#2 _cgo_sys_thread_start <null> (tsan_tracebackctxt+0x56a08a) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#3 x_cgo_thread_start <null> (tsan_tracebackctxt+0x56ab7e) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#4 runtime.asmcgocall /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:949 (tsan_tracebackctxt+0x543b9c) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
Thread T3 (tid=31774, running) created by main thread at:
#0 pthread_create <null> (tsan_tracebackctxt+0x423d3b) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#1 _cgo_try_pthread_create <null> (tsan_tracebackctxt+0x569d67) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#2 _cgo_sys_thread_start <null> (tsan_tracebackctxt+0x56a08a) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#3 x_cgo_thread_start <null> (tsan_tracebackctxt+0x56ab7e) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
#4 runtime.asmcgocall /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:949 (tsan_tracebackctxt+0x543b9c) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a)
SUMMARY: ThreadSanitizer: data race (/tmp/TestTSANtsan_tracebackctxt4073566884/001/tsan_tracebackctxt+0x569499) (BuildId: cb4109606cb33da489c0ec2a2bd9e97e6d4b3b6a) in tcSymbolizer
==================
https://go.dev/cl/677955 contains a reproducer test case.
To avoid similar races, cmd/cgo wraps normal cgo calls in _cgo_tsan_acquire
/ _cgo_tsan_release
. The runtime does the same when calling cgoTraceback
.
We should do the same for cgoSymbolizer
.
Comment From: gabyhelp
Related Issues
- cmd/cgo: ThreadSanitizer barks about data races in all.bash #14602 (closed)
- runtime: GC from cgo callbacks is broken #5899 (closed)
- gccgo: TestShared/tsan_shared fails with GCC 5.5.0 #24046
- misc/cgo/testsanitizers: ThreadSantitizer test fails #14902 (closed)
- race: Thread Sanitizer breaks sequential consistency semantics #58020 (closed)
- runtime/race: sigsegv when using cgo callbacks with the race detector enabled #10874 (closed)
- runtime/race: document and fix "trap when race detected" #23577
Related Code Changes
- runtime/cgo: add TSAN locks around mmap call
- runtime: don't inspect the stack for delayed signals from TSAN
- runtime: never call into race detector with retaken P
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: prattmic
The runtime does the same when calling
cgoTraceback
.
This is the call path for the signal handler.
Standard traceback also calls cgoTraceback
via cgoContextPCs
.
I believe this case has the same race problem as cgoSymbolizer
/ callCgoSymbolizer
. The big difference is that cgoContextPCs
appears to always be called on the system stack. That means that in order for two threads to see the same argument pointer, you would need to do:
- M1 calls
cgoTraceback
, passing an address from its system stack. - M1 exits, freeing the system stack.
- M2 starts, allocates the exact same system stack top that M1 had.
- M2 calls
cgoTraceback
, passing the same address that M1 passed.
This is going to be much more difficult to run into randomly.
Comment From: prattmic
I think the cleanest approach forward is for runtime/cgo
to register _cgo_tsan_acquire
/ _cgo_tsan_release
with the runtime (e.g., runtime.x_cgo_tsan_acquire
), which would allow the runtime to directly call acquire/release when necessary in callCgoSymbolizer
, etc.
I believe doing this may even allow us to remove the TSAN wrappers from cmd/cgo
and move that logic directly into runtime.cgocall
. This wouldn't be necessary, it may just be a nice cleanup. That said, this would not work if the user cgo code was built with TSAN, but runtime/cgo
was not. It's not clear to me whether this is supported or not. It seems like it shouldn't work (I think runtime/cgo
would report false races), but I'm not seeing immediate issues in my one-off tests that drop -installsuffix=tsan
from go build
. @ianlancetaylor do you happen to know if this is supported?
Comment From: gopherbot
Change https://go.dev/cl/677955 mentions this issue: runtime: acquire/release C TSAN lock when calling cgo symbolizer/tracebacker
Comment From: ianlancetaylor
When people are using C TSAN, we expect them to build with -fsanitize=thread
in CGO_CFLAGS
, so runtime/cgo should be built with that option. So I agree that both user code and runtime/cgo will be built with TSAN, and we don't need to worry about the case where that does not happen. And I agree that it should be possible to move the TSAN calls into runtime.cgocall
and runtime.cgocallback
.