Conservative scanning filters out invalid pointers by relying on the span lookup mechanics (spanOf), and double-checking that the pointer points within that span's bounds, since the span table can contain stale entries.

Unfortunately, the upper bound on the span is set very late for heap spans. It is not set in allocSpan before the atomic write to the span state, thereby publishing it to the garbage collector, but is instead written far later, either in mcache.allocLarge or mcentral.grow. It's also not cleared by mspan.init which would also be fine.

If any one of these things were true, the bug would have been avoided: 1. mspan.init sets or clears limit. 2. mspan.allocSpan sets limit explicitly in the heap span case, just like the other cases. 3. On span free, we clear the span table.

I propose we do (1) and (2). My proposed semantics are that mspan.init always sets limit to a safe value, but this value is later adjusted down by allocSpan and allocLarge respectively. mcentral.grow should not be setting limit because it's way too late. mspan.init does not have enough information on its own to set right size, but for small object spans we need to set the limit before returning from allocSpan, since part of the span is not valid space for objects, but is instead only for heap metadata. However, we do need to allow allocLarge to adjust the value, because that's the only place we have the true size of the object. Luckily, it's fairly safe if a lookup tries to check the 'dead' part of a large object span.

Once upon a time I had a change for (3), which was necessary for some experimental performance changes, and it didn't seem to cost much extra to clear the span table entries, if anything at all. But perhaps let's leave that alone for now, since there is a performance risk.

AFAICT, this bug has existed since Go 1.14 when conservative scanning was introduced. mspan.limit was essentially always just used for debug checks before then, but in Go 1.14 it became load-bearing.

This bug is filed on behalf of a crash found within Google.

Comment From: gopherbot

Change https://go.dev/cl/682655 mentions this issue: runtime: set mspan limit field early and eagerly

Comment From: gabyhelp

Related Issues

Related Code Changes

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: mknyszek

@gopherbot Please open backport issues for Go 1.23 and Go 1.24.

This problem can cause rare random crashes in the runtime with no workaround.

Comment From: gopherbot

Backport issue(s) opened: #74289 (for 1.23), #74290 (for 1.24).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.