As the number of Go implementations continues to increase, the number of cases where the unsafe package is unlikely to work properly also rises. Currently, there are appengine, gopherjs, and possibly wasm, where pointer arithmetic is not allowed.
Currently, protobuf and other packages special-case build tags for appengine and js and may need to add others in the near future. Blacklisting specific known Go implementations where unsafe does not work does not scale very well.
My proposal is to document safe as a community agreed upon tag meaning that unsafe should not be used. It is conceivable that this concept be extended to the compiler rejecting programs that use unsafe when the safe tag is set, but I'm currently more interested as a library owner in knowing whether to avoid unsafe in my own packages.
/cc @zombiezen @dneil @neelance @shurcooL
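To illustrate the scaling problem described above, here is a minimal sketch that evaluates both styles of guard with the standard go/build/constraint package. The tag sets are assumptions made for illustration: newimpl is a hypothetical future toolchain, and the sketch assumes restricted toolchains would set safe as proposed.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// allowsUnsafe reports whether a file guarded by the given //go:build line
// would be compiled when exactly the listed tags are set.
func allowsUnsafe(line string, tags ...string) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	const perImpl = "//go:build !appengine && !js" // per-implementation guard used today
	const shared = "//go:build !safe"              // guard under the proposed convention

	// Hypothetical tag sets; "newimpl" is an invented toolchain name.
	cases := []struct {
		name string
		tags []string
	}{
		{"gc", nil},
		{"appengine", []string{"appengine", "safe"}},
		{"gopherjs", []string{"js", "safe"}},
		{"newimpl", []string{"newimpl", "safe"}},
	}
	for _, c := range cases {
		fmt.Printf("%-10s per-impl guard allows unsafe: %-5v shared guard allows unsafe: %v\n",
			c.name, allowsUnsafe(perImpl, c.tags...), allowsUnsafe(shared, c.tags...))
	}
}
```

The per-implementation guard still selects the unsafe file for the hypothetical newimpl toolchain, because nobody has added its tag to the list yet, while the shared guard excludes it without any change to the library.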
Comment From: dsnet
The code history of protobuf seems to indicate that this very same concept was discussed but not pursued further. I'd like to push this more since I see this distinction in at least 2 packages I own.
https://github.com/golang/protobuf/issues/154
Comment From: mdempsky
I think standardizing a build tag to indicate whether package unsafe is available makes sense.
> It is conceivable that this concept be extended to the compiler rejecting programs that use unsafe when the safe tag is set
I disagree. Currently, build tags are strictly a build-system concept. I'd argue the compiler should remain ignorant of them. cmd/compile already has a -u flag that prevents importing package unsafe, and the build system can arrange to pass -u as appropriate.
Comment From: dmitshur
/cc @bradfitz who also ran into this with go4.org/reflectutil, and seemed to like "safe" at the time.
Comment From: bradfitz
@shurcooL, I'm still fine with "safe", as long as it's defined (i.e. "code that doesn't import unsafe").
But does it also mean no assembly?
Those are the sorts of things that should be clarified, if this is to be blessed somehow. (our own use, wiki page, etc)
Comment From: dmitshur
@dsnet Can you clarify if your proposal is about documenting safe to have a very specific meaning and applied to all packages?
Or is it about documenting the fact that safe is a commonly used build tag for a given purpose, but individual projects still get the final say on the exact meaning of the safe build tag for their own needs?
Comment From: flimzy
Would this proposal be codified in the standard library somehow, perhaps by adding a !safe build tag to the unsafe package? Or would it live purely in documentation?
Comment From: davecb
In a previous life, we had to identify versions of libraries, and rapidly found out that that was too coarse a measure. We eventually attached a label via the linker to each entry point*, and could tell if, for example, a call to memcpy allowed overlap or not.
We also used it in migration work, to identify parts of programs that could not be supported on a different OS or hardware platform.
You arguably should consider labelling individual parts of the unsafe library as supported or unsupported by target OS, language or whatever, rather than the whole library when only one operation is unavailable.
--dave [* a description of using per-entry-point labels for a different purpose is at https://leaflessca.wordpress.com/2017/02/12/dll-hell-and-avoiding-an-np-complete-problem/ ]
Comment From: andlabs
Would this unsafe build tag also affect code that uses cgo? SWIG?
How would this build tag interact with the standard library, where both are used a lot? Does no unsafe mean no reflect as well?
What is the unsafe policy on nacl? Is there anything in nacl that we could use for this?
@shurcooL it sounds like the latter at minimum, the former ideally.
Comment From: dsnet
I propose that "safe" be a soft signal that a library should have memory safety (i.e., makes no assumptions about how objects are laid out in memory, the architecture endianness, semantics regarding registers, etc).
Thus, the "safe" tag has the following properties:
* This is just a hint for library authors who want to write code that is highly portable. There is no logic in the compiler or the build tool to enforce this.
* Thus, the standard library doesn't have to use it. Portability is achieved by either requiring a fork of the standard library (as is the case for gopherjs) or via a series of build tags as the mainline standard library does for various architectures.
* Use of reflect is allowed since it doesn't allow you to violate memory safety.
* Use of cgo is allowed. We already have a build tag for that, which is cgo. In practice, safe implies that cgo is not used since it is difficult to use cgo without pointers (for which you need unsafe).
* Use of assembly is forbidden. Any reasonable use of assembly makes assumptions about how memory is laid out on the stack and/or heap.
Thus, appengine and gopherjs are example toolchains that would always set the safe tag.
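As a rough sketch of what a library author might do with this hint, the two functions below stand in for the two halves of a file pair. The function names are hypothetical, and in a real package each would sit in its own file behind the constraint named in its comment; they are shown side by side here only so the sketch runs as one program.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// float32bitsUnsafe would live in a file guarded by "//go:build !safe".
// It reinterprets the float's memory directly, which assumes a float32 and
// a uint32 have the same size and representation in memory.
func float32bitsUnsafe(f float32) uint32 {
	return *(*uint32)(unsafe.Pointer(&f))
}

// float32bitsSafe would live in a file guarded by "//go:build safe".
// It relies only on an exported standard library API.
func float32bitsSafe(f float32) uint32 {
	return math.Float32bits(f)
}

func main() {
	for _, f := range []float32{0, 1, -2.5} {
		fmt.Printf("%v: unsafe=%#08x safe=%#08x\n", f, float32bitsUnsafe(f), float32bitsSafe(f))
	}
}
```

The safe variant makes no layout assumption of its own; how the standard library implements math.Float32bits is left to each toolchain, which is consistent with the point above that the standard library does not have to honor the tag.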
Comment From: rsc
Based on discussion with proposal-review:
- It seems reasonable for appengine, gopherjs, and wasm, all of which declare their own "restricted build" tag, to agree on a common one.
- It should mean no asm, no cgo, no unsafe. (It's impossible to use cgo without unsafe.)
- Nothing in the standard distribution would care; this is really about coordination between these other non-standard environments. Where do you propose to document this?
- A better name than "safe" would be nice. "purego"?
Comment From: neelance
On WebAssembly: The wasm backend that I'm working on uses a linear memory, so unsafe is fully supported. It also has its own asm instructions, just like other architectures. Cgo is not supported (unless someone wants to do a crazy integration with emscripten).
Comment From: cznic
> A better name than "safe" would be nice. "purego"?
:+1: for purego, already using it in my projects for some years.
Comment From: dsnet
I support purego as well. Even if WebAssembly supports unsafe (which is great to hear!), there is always still the use case where someone wants to compile with a pure-Go version for a variety of reasons.
I don't have any great suggestions for where to document this, but perhaps the godoc for go/build?
Comment From: dmitshur
I wanted to point out that in colloquial usage, I've seen "pure Go" most commonly refer to packages that don't use cgo (but can use unsafe, assembly). Seeing it mean "no unsafe and no assembly as well" would require some calibration. But maybe it's fine.
The math/big package contains some precedent on this: it defines a math_big_pure_go tag, which is being used as proposed here (no assembly, no unsafe, no cgo).
Comment From: rsc
OK, purego it is.
Comment From: gopherbot
Change https://golang.org/cl/103239 mentions this issue: go/build: document purego convention
Comment From: cristaloleg
Looks like the patch is still not merged, there is 1 small suggestion, kindly ping @dsnet 👀
Comment From: cristaloleg
@rsc can this be accepted and merged with #41184? That way the new //go:build lines will consolidate the community on one safety-oriented tag (which is purego, based on this issue). Thanks.
Comment From: cespare
It's unfortunate that the best reference for this convention is still this issue, since in the intervening four years no documentation change has been merged.
Additionally, I have found during this time that, in practice, I cannot use the purego tag for its intended purpose in our internal codebase at my company. The reason is that we use go-cmp. Some of go-cmp's core functionality is unsafe. That functionality is now behind a purego build tag. Enabling the purego tag makes most of our tests panic. We have packages with asm as well as pure-Go implementations, and we often want to run automated tests of both code paths, but we cannot run the tests with purego, so we end up using a different build tag to indicate "Go rather than assembly".
I'm not even sure what the right fix is. Maybe the ideal outcome would be that something in the Go standard library (testing? reflect?) would allow go-cmp to do what it needs to do without unsafe. But for now, the existence and popularity of go-cmp kind of "infects" the purego tag and makes it a not-very-useful convention.
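One possible way to exercise both code paths without depending on a tag that dependencies also react to is to keep the generic implementation compiled unconditionally and compare it against the fast path in the same run. This is only a sketch of that idea, not something the thread settles on. All names are hypothetical, and xorFast is ordinary Go standing in for an assembly-backed routine so that the example runs anywhere.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// xorGeneric is the portable byte-at-a-time path. Because it is always
// compiled, tests can reach it without setting any build tag.
// a and b must be at least as long as dst.
func xorGeneric(dst, a, b []byte) {
	for i := range dst {
		dst[i] = a[i] ^ b[i]
	}
}

// xorFast stands in for an assembly-backed path (hypothetical). It processes
// 8-byte words and finishes the tail with the generic loop.
func xorFast(dst, a, b []byte) {
	n := len(dst) &^ 7 // largest multiple of 8 not exceeding len(dst)
	for i := 0; i < n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x)
	}
	xorGeneric(dst[n:], a[n:], b[n:])
}

// agree reports whether both paths produce identical output for
// deterministic inputs of length n.
func agree(n int) bool {
	a, b := make([]byte, n), make([]byte, n)
	for i := range a {
		a[i], b[i] = byte(i*7+1), byte(i*13+5)
	}
	g, f := make([]byte, n), make([]byte, n)
	xorGeneric(g, a, b)
	xorFast(f, a, b)
	return bytes.Equal(g, f)
}

func main() {
	for _, n := range []int{0, 1, 7, 8, 9, 64, 100} {
		fmt.Printf("len=%d paths agree: %v\n", n, agree(n))
	}
}
```

The trade-off is that the generic code is always part of the build, so this checks its correctness but does not prove that a purego build of the whole program links and runs.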
Comment From: zephyrtronium
> I'm not even sure what the right fix is. Maybe the ideal outcome would be that something in the Go standard library (testing? reflect?) would allow go-cmp to do what it needs to do without unsafe.
Perhaps also relevant: #45200
Comment From: kortschak
I'd like to clarify, based on discussion here, whether this should even be a thing. I know that people want it, but it looks like in that discussion it isn't being considered as having any great importance.
Comment From: gopherbot
Change https://go.dev/cl/561935 mentions this issue: crypto: use and test purego tag consistently
Comment From: aykevl
I'm working on TinyGo, and combining these three concepts together seems like a bad idea to me (no assembly, no cgo, and no unsafe).
TinyGo:
* Supports unsafe, though it has a slightly different memory layout for some types. It should still be possible to write portable unsafe code in most cases (e.g. unsafe.String was a great addition for this).
* Supports CGo, though it has some features missing compared to the main Go toolchain. These missing features can probably be implemented when needed.
* Does not support Go assembly. I've tried it, and the only way I got it to work was one giant hack.
Furthermore, there is already a perfectly good tag for CGo support: the cgo build tag. I don't think we need a new build tag that also says something about CGo support.
I see there are some systems (like appengine) where unsafe is not allowed. I would suggest using a different build tag for that than the one that controls Go assembly support.
Right now we set the purego build tag by default to get crypto packages to work, but I don't really like it because it's not very clearly defined right now and I'd rather not limit things like unsafe. A build tag like noasm and a separate build tag like nounsafe (or whatever) would be much better in my opinion.
Comment From: gopherbot
Change https://go.dev/cl/660136 mentions this issue: cmd/go: document purego convention
Comment From: FiloSottile
I think we might have decided this wrong. Banning all uses of unsafe under the same build tag as assembly is overly broad.
There are really at least three classes of unsafe: linknames, type conversions (especially now with unsafe.String), and pointer arithmetic. AFAICT only the last one is really non-portable.
On top of https://github.com/golang/go/issues/23172#issuecomment-1000544013 and https://github.com/golang/go/issues/23172#issuecomment-2000390548, which make compelling arguments, CL 657297 made me notice that a low-level package (hash/maphash) depends on crypto/rand under purego because it wanted to avoid a linkname to runtime.rand, while the ask in #47342 was to avoid pointer arithmetic.
Maybe we should rescope purego to only banning assembly and "non-portable" unsafe, leaving the cgo tag and CGO_ENABLED for cgo. What's non-portable unsafe is fuzzy, but after all non-gc implementations are always balancing downstream patches and upstream conveniences, so they will let us know like in #47342. (We certainly can't make the whole standard library build without unsafe.)
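To make the distinction between two of the classes listed above concrete, here is a small sketch contrasting a type conversion with pointer arithmetic. The function names are hypothetical. The linkname class is left out because it needs a runtime symbol and toolchain cooperation to run. The sketch assumes Go 1.20 or later for unsafe.String and unsafe.SliceData.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString is the "type conversion" class: it reinterprets a byte slice
// as a string without copying. The caller must not modify b afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// third is the "pointer arithmetic" class: it computes the address of
// element 2 by hand, which bakes in the element size and assumes that the
// array occupies contiguous, byte-addressable memory.
func third(xs *[4]uint32) uint32 {
	p := unsafe.Add(unsafe.Pointer(xs), 2*unsafe.Sizeof(xs[0]))
	return *(*uint32)(p)
}

func main() {
	fmt.Println(bytesToString([]byte("purego")))
	fmt.Println(third(&[4]uint32{10, 20, 30, 40}))
}
```

The first function never computes an address itself, so an implementation can in principle support it however it represents strings; the second only works where raw addresses can be offset, which is the property that the restricted environments mentioned earlier in the thread lack.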
Comment From: seankhliao
If it's about documenting convention, there are a lot more examples of:
//go:build !purego + import "unsafe": 108: https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork++language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.%21purego%2F+%2F%22unsafe%22%2F&type=code
vs
//go:build purego + import "unsafe": 14: https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork+language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.%5B%5E%21%5Dpurego%2F+%2F%22unsafe%22%2F&type=code
Related, there seems to be a trend of using some other build tag to guard unsafe, though I don't think there's a strong consensus on safe, unsafe, nounsafe, or something else
71: https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork+language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.safe%2F+%2F%22unsafe%22%2F&type=code
however it doesn't seem to combine often with purego:
4: https://github.com/search?q=++NOT+is%3Aarchived+NOT+is%3Afork++language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.safe.purego%2F+%2F%22unsafe%22%2F&type=code + 2: https://github.com/search?q=++NOT+is%3Aarchived+NOT+is%3Afork++language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.purego.*safe%2F+%2F%22unsafe%22%2F&type=code
Total hits for purego (excluding !purego): 345: https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork+language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.*%5B%5E%21%5Dpurego%2F&type=code
Comment From: aclements
Maybe part of the problem here is that we don't have a well-defined boundary of what packages are or are not portable. E.g., this comes up in maphash, which I would argue is tightly coupled with the runtime. Another Go implementation with a different runtime would also have to define its own maphash package. But a lot of the packages in std are portable and only make use of exported APIs. Today we don't draw that boundary.
Comment From: aclements
> On top of https://github.com/golang/go/issues/23172#issuecomment-1000544013 and https://github.com/golang/go/issues/23172#issuecomment-2000390548, which make compelling arguments, CL 657297 made me notice that a low-level package (hash/maphash) depends on crypto/rand under purego because it wanted to avoid a linkname to runtime.rand, while the ask in https://github.com/golang/go/issues/47342 was to avoid pointer arithmetic.
IMO, maphash is part of the runtime, and therefore does not need to have a purego implementation, in the same way that the runtime package itself clearly cannot have a purego implementation.
My understanding is that GopherJS depends on the purego implementation of maphash, but I think they could easily provide runtime_rand and runtime_memhash, just as the current purego version of maphash does, and otherwise continue to use the existing maphash package. We could then drop the purego implementation of maphash.
Comment From: rolandshoemaker
It sounds like what we want (correct me if I'm wrong) is for purego to mean "portable Go". That probably means something along the lines of: you cannot use assembly, and probably not cgo either.
What from unsafe you can use is complicated. Sizeof is maybe fine (although I wonder about host specific alignment/padding stuff for structs)? I think clearly Alignof and Offsetof are probably out the window. The others I'm not really sure of either way.
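A short sketch of why these three functions are a portability question: their results depend on the target's alignment rules, so the same struct can report different numbers on different platforms. The record type is hypothetical, and no output is shown because it varies; for example, gc aligns uint64 to 8 bytes on amd64 but to 4 bytes on 386.

```go
package main

import (
	"fmt"
	"unsafe"
)

// record is a hypothetical struct whose padding depends on the target.
type record struct {
	flag  uint8
	count uint64
	id    uint16
}

// layout returns the size of record and the offsets of its count and id
// fields as seen by the current compiler and target.
func layout() (size, countOff, idOff uintptr) {
	var r record
	return unsafe.Sizeof(r), unsafe.Offsetof(r.count), unsafe.Offsetof(r.id)
}

func main() {
	size, countOff, idOff := layout()
	var r record
	fmt.Println("Sizeof(record):   ", size)
	fmt.Println("Alignof(r.count): ", unsafe.Alignof(r.count))
	fmt.Println("Offsetof(r.count):", countOff)
	fmt.Println("Offsetof(r.id):   ", idOff)
}
```

Code that only uses such values to size a buffer stays correct wherever it compiles, whereas code that hard-codes the numbers, or feeds the offsets into pointer arithmetic, ties itself to one layout.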
Comment From: aclements
I filed a separate proposal for dealing with the problems caused by purego maphash: #74285.
Comment From: cherrymui
Based on the discussion above, it still seems unclear what meaning people expect purego to have, and how the tag would be used. Sometimes it is meant for other (non-gc) implementations of the Go distribution (note that a non-gc Go distribution could support cgo and unsafe, e.g. gccgo). Sometimes it might mean "safe"? And sometimes it could mean "portable Go" as @rolandshoemaker mentioned above.
In the standard library, besides hash/maphash, the purego tag is used in crypto packages mostly to provide a generic fallback, to support platforms that don't have the assembly implementation. According to @rolandshoemaker, it is unclear how/whether the tag is used on a platform that does have assembly support, in which case the GOARCH build tag would just do the same thing.
Could someone who actively uses the purego tag in their code comment on what the intention is? Thanks.
Comment From: FiloSottile
The systematic use of purego in crypto is for two purposes: TinyGo (which previously had a number of crypto packages unnecessarily marked as broken), and testing generic fallbacks on dev machines (which are always arm64 or amd64, both of which generally have assembly).