Proposal Details

The issue

The Go compiler/linker packs global variables tightly, so multiple global variables can end up in a single cache line. This can cause unexpected false sharing issues in production, as in this case.
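
A minimal sketch for checking whether two globals landed on the same cache line, assuming a 64-byte line (typical on amd64, but CPU-specific). The variable names and the `sameCacheLine` helper are hypothetical. The printed result depends on how the linker happened to lay out these globals, so it may differ between builds and toolchain versions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cacheLineSize is an assumption: 64 bytes is typical on amd64,
// but the real value depends on the CPU.
const cacheLineSize = 64

// Two independent counters that different goroutines might update.
var requestsServed uint64
var bytesWritten uint64

// sameCacheLine reports whether two addresses fall into the same
// cacheLineSize-aligned block of memory.
func sameCacheLine(a, b uintptr) bool {
	return a/cacheLineSize == b/cacheLineSize
}

func main() {
	a := uintptr(unsafe.Pointer(&requestsServed))
	b := uintptr(unsafe.Pointer(&bytesWritten))
	// Layout-dependent: true means writes to one counter can invalidate
	// the cache line holding the other (false sharing).
	fmt.Println(a, b, sameCacheLine(a, b))
}
```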

The solution

Put each global variable into its own cache line by aligning global vars to the CPU cache line size.

This may significantly increase the size of the memory region when there is a large number of global vars. For example, if the application has 1000 distinct global variables of type uint64, then aligning them to the CPU cache line size on GOARCH=amd64 (usually 64 bytes) will increase the global vars memory region from 8KB to 64KB. This is OK in most cases, since typical programs have fewer than 1000 global vars (because every global var needs to be manually defined in the source code, and this is not what software engineers like to do).
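
The size arithmetic can be sketched as follows. The `alignUp` and `regionSize` helpers are hypothetical and model an idealized layout (all variables the same size, placed back to back), not the Go linker's actual section layout. With 8-byte alignment, 1000 uint64 globals take 8000 bytes; with 64-byte alignment each one occupies a full line, giving 64000 bytes.

```go
package main

import "fmt"

// alignUp rounds off up to the next multiple of align.
// align must be a power of two.
func alignUp(off, align uintptr) uintptr {
	return (off + align - 1) &^ (align - 1)
}

// regionSize returns the bytes needed to place n variables of the given
// size one after another, each starting at a multiple of align, including
// tail padding so whatever follows the region also starts on a boundary.
func regionSize(n int, size, align uintptr) uintptr {
	var off uintptr
	for i := 0; i < n; i++ {
		off = alignUp(off, align) + size
	}
	return alignUp(off, align)
}

func main() {
	fmt.Println(regionSize(1000, 8, 8))  // prints 8000: natural 8-byte alignment
	fmt.Println(regionSize(1000, 8, 64)) // prints 64000: one 64-byte line per variable
}
```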

The workaround

Manually add padding to global variables that are affected by false sharing. This is a fragile approach, since newly added global variables may introduce new false sharing issues, and this cannot be controlled easily in third-party dependencies.

Comment From: rittneje

This is OK in most cases, since typical programs have fewer than 1000 global vars (because every global var needs to be manually defined in the source code, and this is not what software engineers like to do).

Global variables may also be defined via code generation templates. (For example, the gRPC code generator results in 6 global variables per proto file.)

In addition, our own moderately sized code base does in fact contain 1000+ global variables, mostly because we have things that are essentially constants but cannot be declared as const.

Thus I don't think this line of reasoning alone is sufficient.

I wonder if this is something PGO (profile-guided optimization) could help with?

Comment From: adonovan

I wonder whether restricting this to just variables that are (or contain) sync.Mutex, atomic.Int64, or a handful of other types that invite concurrent mutation might deliver most of the benefit.

Comment From: randall77

Or maybe just global variables that are written outside of init functions?

Comment From: adonovan

Or maybe just global variables that are written outside of init functions?

That's a good metric too, as long as you include "address-taken" as a form of write.

Comment From: database64128

Or you can just embed x/sys/cpu.CacheLinePad in contended global variables?