Proposal Details
Background
Today, the Go standard library does not have a clear boundary between what is part of the gc-specific runtime and what is portable code. Some parts clearly are (the runtime
package itself), and many parts clearly aren’t (encoding
), but other packages are in a gray area.
The maphash
package exemplifies this gray area. Currently, maphash has two implementations: one that is deeply tied to the Go "gc" runtime and uses the exact hash functions constructed by the compiler for use in built-in maps, and another "purego" version that reimplements hashing using reflection and a Go implementation of wyhash. We created both so that maphash would work unmodified with non-gc runtimes such as TinyGo and GopherJS.
Maphash is conceptually clearly part of the runtime, since it exports map hashing. But it is technically possible albeit awkward and slow to implement the letter of its spec in portable code.
Unfortunately, the portable implementation causes problems with the dependency graph. The non-portable version lies very low in the package dependency graph and depends on only 8 packages (go list -deps hash/maphash
), all of which are extremely low level. The portable version depends on 87 packages (go list -tags purego -deps hash/maphash
), mostly because it depends on crypto/rand
to generate seeds and reflect
to walk types.
As a result, the presence of the purego maphash means we cannot use maphash in other low-level packages such as internal/sync
and unique
. Even in higher level packages, this introduces tricky dependency issues.
Proposal
I propose we declare maphash to be "part of the runtime" and drop the purego implementation.
As part of this, I propose we work to be more disciplined about what parts of std are part of the runtime. For maphash, we would maintain the current separation between the pure logic in one file and the gc-specific glue code in another file. The gc-specific file would be build tagged with the “gc” build tag to clearly indicate which parts are specific to gc. This would hopefully make it easy for other Go implementations to use the package as-is and simply layer in their own file of glue code.
This would be a reversal of #47342.
Effect on other Go implementations
I believe TinyGo currently uses the purego implementation of maphash, but it has the necessary runtime functions for the gc version (rand and memhash). It uses a different type layout, which makes Comparable
tricky, but does have the basic mechanism to support this.
GopherJS has its own implementation of maphash based on the Go 1.19 gc version. At the time, it needed to make significant modifications to avoid non-portable uses of unsafe. Since then, all of this code has been factored out of the main logic. It does note that this version could be dropped in favor of the purego version of maphash, but GopherJS does not require the full strictness of purego. Notably, it can and does use the GopherJS runtime random generator.
Alternatives
We could reduce the pressure on the dependency graph by rewriting the purego maphash to use math/rand/v2
(which itself is "part of the runtime" since it uses runtime.rand
) and internal/reflectlite
(also clearly "part of the runtime"). I believe that would reduce the number of packages to 31 (go list -deps math/rand/v2 internal/reflectlite hash/maphash
). However, I believe math/rand/v2
is not currently usable in TinyGo or GopherJS, so this would still cause churn for them. I believe both implement internal/reflectlite
. This approach may still limit the use of maphash in low-level packages, though to a far lesser extent, and would not address the more general problem of clearly isolating toolchain-specific code.
Comment From: gabyhelp
Related Issues
- hash/maphash: Provide a `purego` implementation #47342 (closed)
- hash/maphash: purego implementation is not updated #69940 (closed)
- hash/maphash: Document whether used hash algorithm is stable across future versions #37040 (closed)
Related Code Changes
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: aclements
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — aclements for the proposal review group
Comment From: bjorndm
Apart from maphash. I think it is a great idea to give the Go standard library a clear boundary between what is part of the gc-specific runtime and what is portable code. This should make non gc-ports a lot easier to build and maintain.
Comment From: aclements
CC @nevkontakte and @flimzy for GopherJS and @aykevl and @deadprogram for TinyGo.
Comment From: aykevl
Thank you @aclements for tagging me!
I believe TinyGo currently uses the purego implementation of maphash, but it has the necessary runtime functions for the gc version (rand and memhash).
We use the portable version only because we have to set the purego build tag to get crypto packages to build (build tags can't be specified per package). I don't see anything in the package that wouldn't work if we used maphash_runtime.go instead (so if maphash_portable.go was removed and the //go:build !purego
line was removed from maphash_runtime.go).
For maphash, we would maintain the current separation between the pure logic in one file and the gc-specific glue code in another file. The gc-specific file would be build tagged with the “gc” build tag to clearly indicate which parts are specific to gc.
Actually, for TinyGo, it would be much easier to support maphash by using the //go:linkname
lines like those in maphash_runtime.go. In fact, it would likely mean we don't need to do anything to keep supporting maphash. In that case, the contract/ABI between the maphash package and the runtime package would be that //go:linkname
line and that is fine by me.
(Also, it looks like we already set the gc
build tag. I think because that's the default in go list
).
This would hopefully make it easy for other Go implementations to use the package as-is and simply layer in their own file of glue code.
I know that GopherJS does this, but in TinyGo we have been using packages as a whole. So either it's from the standard library, or we replace the package entirely with our own version. This makes changes across Go versions easier to manage since meddling in the internals of a package is much more fragile than relying on what is explicitly exported from a package. So I'd prefer if we didn't have to start adding a specific file just for this package - a //go:linkname
to be implemented in the runtime is a much smaller interface. (I don't know about //go:linkname
support in GopherJS though).
Comment From: nevkontakte
Hi! Speaking on behalf of GopherJS, I believe the use of unsafe was the original problem for us, yes. We support //go:linkname
but, for example, we can't support free-form pointer arithmetic or assembly, simply because JS runtime wouldn't let us. For that reason, the purego version was important to us.
So the impact of this proposal really depends on what would be considered "runtime glue". If it is just a matter of runtime.rand
— that's not a problem we can substitute it easily enough. If a large chunk of the algorithm would be declared "runtime", that's a bigger problem. So if you could provide more detail, it would be easier to assess the impact.
In the hindsight, I suspect that the original mistake was making maphash
a part of the public API. It seems like something that's inherently runtime-specific (different runtimes can use different map implementations), and even gc
's implementation can change over time. Alas, this isn't something we can take back at this point.
Comment From: aclements
@aykevl , thanks!
@nevkontakte , to be concrete, this is what I'm proposing: https://go.dev/cl/689876
Comment From: gopherbot
Change https://go.dev/cl/689876 mentions this issue: hash/maphash: drop purego implementation
Comment From: nevkontakte
@aclements thanks, looking at the CL, it pretty much drops purego implementation without any changes to the interface.
As a result, the presence of the purego maphash means we cannot use maphash in other low-level packages such as internal/sync and unique. Even in higher level packages, this introduces tricky dependency issues.
Reading between the lines, it means you are planning to use maphash in low-level packages. For GopherJS it means that we can't just copy the purego implementation into our source tree and maintain it internally, because that would only push the dependency issue on our side, with much less we could do about it.
So, if we were to proceed in this direction, the only options for GopherJS would be to implement the hashing logic in pure JS, which is our own counterpart to the unsafe. We could do it, but it would significantly increase complexity of the port and maintenance cost.
Comment From: nevkontakte
I think there may be an alternative that may be a reasonable compromise for both sides.
If you look at https://go-review.googlesource.com/c/go/+/468795 that originally added a purego implementation, its additional dependencies were pretty minimal. In fact, both could be reduced to zero in a way that wouldn't be a problem for today's GopherJS:
- Replace
crypto/rand
with a go:linkname toruntime.rand
. - Copy
bits.Mul64(a, b)
implementation into the maphash package.
Looking at the modern day implementation, it grew a few more imports, most of them due to the generic https://pkg.go.dev/hash/maphash#Comparable function. I suspect that reflect
is the most problematic of them, and hardest to get rid of.
So, the compromise option could be the introduction of internal/maphash
package, which would provide the minimal functionality required by low-level packages, and be used by hash/maphash
for the core algorithm, in purego and unsafe variants. Then hash/maphash
would provide all the bells and whistles one might desire, without having to worry about the dependencies too much.
As a side benefit, that would establish a more formal interface that isolates runtime-specific hash algorithm from the public API built on top of it. Would that work for you?
Comment From: aclements
looking at the CL, it pretty much drops purego implementation without any changes to the interface. Reading between the lines, it means you are planning to use maphash in low-level packages.
Both of these are correct.
So, if we were to proceed in this direction, the only options for GopherJS would be to implement the hashing logic in pure JS, which is our own counterpart to the unsafe.
How does GopherJS implement Go maps today? Doesn't it already have to have a hashing implementation that's compatible with Go's definition of equality?
Replace crypto/rand with a go:linkname to runtime.rand. Copy bits.Mul64(a, b) implementation into the maphash package. So, the compromise option could be the introduction of internal/maphash package, which would provide the minimal functionality required by low-level packages, and be used by hash/maphash for the core algorithm, in purego and unsafe variants.
@nevkontakte , thanks for looking into the history. I'd be fine with these changes. The reflect
dependency is harder, though we may be able to switch to internal/reflectlite
, which is much lower in the package DAG. The linkname to runtime.rand
would mean this would still be tied to the runtime, though presumably in a way that's much easier to port.
Comment From: nevkontakte
How does GopherJS implement Go maps today? Doesn't it already have to have a hashing implementation that's compatible with Go's definition of equality?
We use JS Map under the hood for better performance. We do a little bit of work to generate key strings, but we don't hash them explicitly.
The linkname to runtime.rand would mean this would still be tied to the runtime, though presumably in a way that's much easier to port.
Yes. JS runtime provides access to random generators, so implementing runtime.rand
is pretty easy. On the other hand, it doesn't expose any analog for maphash
, so we would have to maintain our own.
though we may be able to switch to
internal/reflectlite
GopherJS implements reflectlite, so using it should be ok for us.
Comment From: aclements
Thanks @nevkontakte ! I hadn't appreciated that GopherJS didn't implement its own hashing at all.
Given that GopherJS is, as far as I can tell, the sole consumer of the reflection-based pure Go hashing implementation, how about if GopherJS just takes that over? We can arrange the maphash package so it's easy for GopherJS to swap in the reflection-based hasher without that having to live in the gc implementation.