Consider the following program:

package main

var f func(int)

//go:nosplit
func x(y int) { f(y+1) }

func main() {
  f = x
  f(0)
}

When run, this program segfaults. This is because taking the address of a nosplit function completely avoids the linker's nosplit stack size check, meaning that this program very quickly exausts g0's stack and does a hard segfault.

All that this program prints is signal: segmentation fault, and occasionally random debugging spew before the runtime kills itself.

I think this is a good case for making using nosplit require import "unsafe", because I can't see another way to fix this without violating the "calling a nosplit functions does not preempt" invariant.

Comment From: Jorropo

We can keep nosplit as is and make grabbing function pointers of nosplit functions disallowed, or make that require unsafe if it can be used in some way.

Comment From: mcy

If you want to go that direction it also needs to cover making a value with a nosplit method into an interface, because that will silently turn all of its methods into function pointers, no matter the interface. And that seems like it will be very messy.

No matter the fix it's a breaking change, and I think it's hard to argue that this isn't covered by compatibility requirements... so probably means a GOEXPERIMENT?

Comment From: randall77

Then don't use //go:nosplit? Problem solved.

(It is a directive intended for runtime internal functions that can't be preempted - unfortunately it wasn't implemented as runtime-only at the start, so now it has escaped to user code and we get problems like this...)

Comment From: Jorropo

If we solve this by requiring users to import unsafe I feel we are missing out on the compiler pointing out mistakes. My limited understanding is that virtual nosplit doesn't work, and I don't see why the compiler should let you do that even if you import unsafe.

so probably means a GOEXPERIMENT?

To disable it ? To turn it back on ?

Comment From: mcy

Then don't use //go:nosplit? Problem solved.

Sure, but it's an attractive nuisance (because it can measurably improve performance) that's documented in go help, so it's kind of too late to just tell people not to hit themselves at this point... :(

My limited understanding is that virtual nosplit doesn't work,

Calling a virtual nosplit function from a regular function is always safe. The problem is calling enough of them to blow up the nosplit stack.

Note also that leaf functions with a small enough stack frame are automatically marked as nosplit, and those can be turned into function pointers too.

Comment From: ianlancetaylor

What if we just give a compiler error when converting a nosplit function value to a split function value? Or, we could permit that in the runtime package if necessary, but give an error elsewhere.

Comment From: mcy

Causing existing code that currently builds to stop building seems not great, especially because this also affects nosplit methods if the receiver is cast to an interface.

If we wanted to go that route, we could simply make nosplit not do anything if -+ isn't set. Which, honestly, I wouldn't mind, because I would just build my package with -+, which you already have to do if you want to do other naughty performance things. Seems better to just put it behind the "if it breaks you can keep both pieces" flag.

The only purpose nosplit really serves in user code is to avoid evicting cache lines due to needing to load from the g pointer. If people are doing it to lock the current p they're already playing with fire. While testing, I discovered that async preemption considers sync/atomic.AddInt64's instructions a safe point! (Seems like a bug?)

Comment From: ianlancetaylor

Do you have examples of existing user code that is correctly using nosplit and that would fail to build with that compiler error?

I'll note that another thing we can reasonably do is nothing. The code in the original post on this bug is clearly misusing nosplit, and the behavior is consistent with the documentation. I don't see why that suggests that we should require import "unsafe". Perhaps we should just document that nosplit breaks stack copying.

Comment From: mcy

Do you have examples of existing user code that is correctly using nosplit and that would fail to build with that compiler error?

Not open-source, but yes, it is correct. This bug is only possible when a nosplit function calls a virtual function which happens to be nosplit and the compiler does not devirtualize the call (so that the linker can do its stack size check); I only call virtual nosplits from a splitable function, which guarantees it has enough stack. I do care about keeping the nosplit, because loading the g pointer actually has a harmful cache impact for me, but not enough that I wouldn't build with -+ and accept the consequences.

and the behavior is consistent with the documentation

Go is a memory safe language. I have smashed the stack by using a publicly-documented language feature that does not explicitly say it will produce memory errors. Therefore the documentation is wrong or Go is as memory-safe as C++. I think the ideal amendment to the documentation (if we accept that nosplit is unsafe) should mention:

  1. A best-effort stack size check occurs in the linker.
  2. Code using nosplit may stop building in future versions due to the aforementioned check.
  3. Calling a nosplit function can livelock the runtime. (This is not related to this issue, but I do have a reliable repro of livelock by getting every P into an unpremptable spinloop.)

I assume go puts a guard page before every stack, but a carefully crafted version of this repro can definitely be used to decrement sp past the guard page.

Comment From: ianlancetaylor

Thanks. I think it's simpler to document that nosplit is not safe. We don't want to promise anything about nosplit behavior, as that would unnecessarily constrain future runtime development. It's unfortunate that we ever permitted functions outside of the runtime to use nosplit. I sent https://go.dev/cl/686115.

Comment From: gopherbot

Change https://go.dev/cl/686115 mentions this issue: cmd/compile: document that nosplit directive is unsafe

Comment From: randall77

because loading the g pointer actually has a harmful cache impact for me

What platform are you on? We haven't loaded G from memory for the stack check for amd64 and arm64 for quite a while.

Comment From: mcy

@randall77 x86_64-unknown-linux and aarch64-apple-darwin. Yes, you do load from the g pointer, you load g.stackguard0. This is enough to trigger a cache miss.

Tbh it may be worthwhile to burn a register in the ABI to hold the stack limit, because that would make virtually all code faster.

There is a recent post on my blog (mcyoung.xyz) with all my observations about nosplit. It includes benchmarks.

Comment From: randall77

@randall77 x86_64-unknown-linux and aarch64-apple-darwin. Yes, you do load from the g pointer, you load g.stackguard0. This is enough to trigger a cache miss.

Ah, you meant load from the G pointer, not loading the G pointer itself.

Tbh it may be worthwhile to burn a register in the ABI to hold the stack limit, because that would make virtually all code faster.

That would make synchronous preemption harder, because we need to update the stack guard to preempt goroutines. Updating memory is easier than updating the register of another thread. We have async preemption now also, so maybe that's not a showstopper.

I'm surprised you get lots of cache misses there. It should have crazy-good locality.

Comment From: mcy

@randall77 crazy good until all of your utilization for a long-running operation is inside of a function that only calls nosplits, at which point *g gets evicted from L1. It's a pretty rare case, but it happens.

Updating memory is easier than updating the register of another thread.

Oh, that's unfortunate. I didn't realize that the stackguard was (ab)used like that to force early preemption. I assumed it was only modified while a G was not running. I'm actually surprised that's still a thing given that async preemption seems to work so well.

Comment From: cherrymui

Making the stack bounds check faster (and shorter) has been discussed, including the idea of reserving a register for the stack guard, instead of the G. If we want to discuss more on this, it should probably be a seperate issue.

Another option for making nosplit safer while not breaking (most) existing code is that when the code takes the address of a nosplit function, the compiler generates a wrapper that does the stack bounds check, and has the function pointer point to the wrapper. This way, direct call of a nosplit function is unaffected. Indirect calls go through the wrapper, which checks the bounds before growing the stack. Not sure it is worth the complexity.