The current runtime map benchmarks (imported from the CockroachDB SwissTable repo and used for recent runtime map work) have clearly been useful, especially for comparing the optimizations that have happened so far, but it might be worth tweaking those benchmarks.
For example, for the current benchmarks:
- Most might be overly friendly to the branch predictor with repetitive key patterns, especially at lower elem counts.
- Most of the elem sizes are power-of-two, which translates to a relatively low load factor and might not make the tables "sweat" as much as they would in practice.
- Most of the benchmarks use a mod operation in the benchmark code, which might be around 1/3 of the CPU time for the benchmarks that fit in cache.
- This is fine if someone has this in mind when comparing results, but might be cleaner without.
- Could probably benefit from some cold map cases (e.g., so many small or medium maps that they become cold) to better assess cache miss impact.
Part of my immediate interest is the current benchmarks might not capture some of the benefits / costs of some adjustments to the current implementation, like: * Trying 16-element group sizes (instead of the current 8), or * Alternative memory layouts (like putting all the control bytes for a single table together, rather than current interspersing of control bytes with the slots)
In particular, I have a draft change to the runtime SwissTable for 16-element group sizes (currently SWAR-only, not SIMD), but I suspect some of the benefit would be masked by the current benchmarks (maybe -- I don't know without tweaking the benchmarks 😅).
Separately, the benchmarks as they are might not be showing the advantages of the new implementation as clearly as they could compared to the old runtime map (including because "easier" benchmarks are probably making the prior map implementation look artificially better, vs. "easy" benchmarks not helping the SwissTable implementation as much).
All that said, maybe some of those concerns don't matter in practice or maybe I'm mistaken.
Filing this issue to help with any discussion (including perhaps any feedback of "not worth it").
I plan on sending some benchmark CLs to attempt to improve or mitigate above issues.
CC @prattmic
Comment From: gabyhelp
Related Issues
Related Code Changes
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: gopherbot
Change https://go.dev/cl/634396 mentions this issue: internal/runtime/maps: speed up small map lookups ~1.7x for unpredictable keys
Comment From: gopherbot
Change https://go.dev/cl/634698 mentions this issue: runtime: remove somewhat expensive mod from map benchmarks
Comment From: gopherbot
Change https://go.dev/cl/634700 mentions this issue: runtime: add hot key lookup benchmark for maps
Comment From: gopherbot
Change https://go.dev/cl/634699 mentions this issue: runtime: shuffle keys to reduce impact of branch predictor in map benchmarks