Golang runtime: some possible SwissTable map benchmark improvements

The current runtime map benchmarks (imported from the CockroachDB SwissTable repo and used for recent runtime map work) have clearly been useful, especially for comparing the optimizations that have happened so far, but it might be worth tweaking those benchmarks.

For example, for the current benchmarks:

Most might be overly friendly to the branch predictor with repetitive key patterns, especially at lower elem counts.
Most of the elem sizes are power-of-two, which translates to a relatively low load factor and might not make the tables "sweat" as much as they would in practice.
Most of the benchmarks use a mod operation in the benchmark code, which might be around 1/3 of the CPU time for the benchmarks that fit in cache.
- This is fine if someone has this in mind when comparing results, but might be cleaner without.
Could probably benefit from some cold map cases (e.g., so many small or medium maps that they become cold) to better assess cache miss impact.
- The original SwissTable folks and Folly F14 folks had collaborated on C++ benchmarks to help them compare approaches, with a sample of one of their cold map C++ benchmarks here. (A couple of years ago, I had a WIP set of cold map Go benchmarks here, partially based on that).

Part of my immediate interest is the current benchmarks might not capture some of the benefits / costs of some adjustments to the current implementation, like: * Trying 16-element group sizes (instead of the current 8), or * Alternative memory layouts (like putting all the control bytes for a single table together, rather than current interspersing of control bytes with the slots)

In particular, I have a draft change to the runtime SwissTable for 16-element group sizes (currently SWAR-only, not SIMD), but I suspect some of the benefit would be masked by the current benchmarks (maybe -- I don't know without tweaking the benchmarks 😅).

Separately, the benchmarks as they are might not be showing the advantages of the new implementation as clearly as they could compared to the old runtime map (including because "easier" benchmarks are probably making the prior map implementation look artificially better, vs. "easy" benchmarks not helping the SwissTable implementation as much).

All that said, maybe some of those concerns don't matter in practice or maybe I'm mistaken.

Filing this issue to help with any discussion (including perhaps any feedback of "not worth it").

I plan on sending some benchmark CLs to attempt to improve or mitigate above issues.

CC @prattmic

Comment From: gabyhelp

Related Issues

runtime: use SwissTable #54766

Related Code Changes

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

Comment From: gopherbot

Change https://go.dev/cl/634396 mentions this issue: internal/runtime/maps: speed up small map lookups ~1.7x for unpredictable keys

Comment From: gopherbot

Change https://go.dev/cl/634698 mentions this issue: runtime: remove somewhat expensive mod from map benchmarks

Comment From: gopherbot

Change https://go.dev/cl/634700 mentions this issue: runtime: add hot key lookup benchmark for maps

Comment From: gopherbot

Change https://go.dev/cl/634699 mentions this issue: runtime: shuffle keys to reduce impact of branch predictor in map benchmarks