Golang doc: mention "purego" build tag convention somewhere

As the number of Go implementations continues to increase, so does the number of cases where the unsafe package is unlikely to work properly. Currently, there are appengine, gopherjs, and possibly wasm, where pointer arithmetic is not allowed.

Currently, protobuf and other packages special-case build tags for appengine and js, and may need to add others in the near future. Blacklisting specific known Go implementations where unsafe does not work does not scale very well.
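To make the scaling problem concrete, here is a small sketch that evaluates two //go:build lines with the standard go/build/constraint package. The tag name safe follows this proposal, and newenv is a hypothetical future restricted environment. The denylist line keeps selecting the unsafe file there until someone edits it, while the single-tag line excludes it as soon as the environment sets safe.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// selected reports whether a file carrying the given //go:build line
// would be compiled when exactly the tags in set are satisfied.
func selected(line string, set map[string]bool) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	// Denylist style: every restricted environment must be named explicitly.
	denylist := "//go:build !appengine && !js"
	// Single-tag style: a restricted environment only needs to set "safe".
	single := "//go:build !safe"

	// Hypothetical new restricted environment that sets the safe tag.
	newEnv := map[string]bool{"newenv": true, "safe": true}

	fmt.Println(selected(denylist, newEnv)) // true: the unsafe file is still compiled
	fmt.Println(selected(single, newEnv))   // false: the unsafe file is excluded
}
```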

My proposal is to document safe as a community-agreed-upon tag meaning that unsafe should not be used. It is conceivable that this concept be extended to the compiler rejecting programs that use unsafe when the safe tag is set, but as a library owner I'm currently more interested in knowing whether to avoid unsafe in my own packages.
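As an illustration of what such a tag would switch between, here is a hedged sketch of a hypothetical byte-slice-to-string helper. In a real package the two functions would share one name and live in sibling files guarded by //go:build !safe and //go:build safe; they have different names here only so both fit in one runnable file. unsafe.String and unsafe.SliceData assume Go 1.20 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// In a real package this would live in a file starting with
//
//	//go:build !safe
//
// It reinterprets the slice's memory as a string without copying,
// so the caller must not modify b afterwards.
func bytesToStringUnsafe(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// ...and this would live in a sibling file starting with
//
//	//go:build safe
//
// It copies the bytes, which works on any Go implementation.
func bytesToStringSafe(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("hello")
	fmt.Println(bytesToStringUnsafe(b) == bytesToStringSafe(b)) // true
}
```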

/cc @zombiezen @dneil @neelance @shurcooL

Comment From: dsnet

The code history of protobuf seems to indicate that this very same concept was discussed but not pursued further. I'd like to push this more since I see this distinction in at least 2 packages I own.

https://github.com/golang/protobuf/issues/154

Comment From: mdempsky

I think standardizing a build tag to indicate whether package unsafe is available makes sense.

It is conceivable that this concept be extended to the compiler rejecting programs that use unsafe when the safe tag is set

I disagree. Currently, build tags are strictly a build-system concept. I'd argue the compiler should remain ignorant of them. cmd/compile already has a -u flag that prevents importing package unsafe, and the build system can arrange to pass -u as appropriate.

Comment From: dmitshur

/cc @bradfitz who also ran into this with go4.org/reflectutil, and seemed to like "safe" at the time.

Comment From: bradfitz

@shurcooL, I'm still fine with "safe", as long as it's defined (i.e., "code that doesn't import unsafe").

But does it also mean no assembly?

Those are the sorts of things that should be clarified, if this is to be blessed somehow (our own use, wiki page, etc.).

Comment From: dmitshur

@dsnet Can you clarify if your proposal is about documenting safe to have a very specific meaning and applied to all packages?

Or is it about documenting the fact that safe is a commonly used build tag for a given purpose, but individual projects still get the final say on the exact meaning of the safe build tag for their own needs?

Comment From: flimzy

Would this proposal be codified in the standard library somehow, perhaps by adding a !safe build tag to the unsafe package? Or would it live purely in documentation?

Comment From: davecb

In a previous life, we had to identify versions of libraries, and rapidly found out that that was too coarse a measure. We eventually attached a label via the linker to each entry point*, and could tell if, for example, a call to memcpy allowed overlap or not.

We also used it in migration work, to identify parts of programs that could not be supported on a different OS or hardware platform.

You arguably should consider labelling parts of the unsafe library with supported and unsupported by target OS, language or whatever, not the whole library if only one operation is unavailable.

--dave [* a description of using per-entry-point labels for a different purpose is at https://leaflessca.wordpress.com/2017/02/12/dll-hell-and-avoiding-an-np-complete-problem/ ]

Comment From: andlabs

Would this unsafe build tag also affect code that uses cgo? SWIG?

How would this build tag interact with the standard library, where both are used a lot? Does no unsafe mean no reflect as well?

What is the unsafe policy on nacl? Is there anything in nacl that we could use for this?

@shurcooL it sounds like the latter at minimum, the former ideally.

Comment From: dsnet

I propose that "safe" be a soft signal that a library should have memory safety (i.e., makes no assumptions about how objects are laid out in memory, the architecture endianness, semantics regarding registers, etc.).

Thus, the "safe" tag has the following properties:
* This is just a hint for library authors who want to write code that is highly portable. There is no logic in the compiler or the build tool to enforce this.
* Thus, the standard library doesn't have to use it. Portability is achieved either by requiring a fork of the standard library (as is the case for gopherjs) or via a series of build tags, as the mainline standard library does for various architectures.
* Use of reflect is allowed since it doesn't allow you to violate memory safety.
* Use of cgo is allowed. We already have a build tag for that, which is cgo. In practice, safe implies that cgo is not used, since it is difficult to use cgo without pointers (for which you need unsafe).
* Use of assembly is forbidden. Any reasonable use of assembly makes assumptions about how memory is laid out on the stack and/or heap.

Thus, appengine and gopherjs are example toolchains that would always set the safe tag.

Comment From: rsc

Based on discussion with proposal-review:

It seems reasonable for appengine, gopherjs, and wasm, all of which declare their own "restricted build" tag, to agree on a common one.
It should mean no asm, no cgo, no unsafe. (It's impossible to use cgo without unsafe.)
Nothing in the standard distribution would care; this is really about coordination between these other non-standard environments. Where do you propose to document this?
A better name than "safe" would be nice. "purego"?

Comment From: neelance

On WebAssembly: The wasm backend that I'm working on uses a linear memory, so unsafe is fully supported. It also has its own asm instructions, just like other architectures. Cgo is not supported (unless someone wants to do a crazy integration with emscripten).

Comment From: cznic

A better name than "safe" would be nice. "purego"?

:+1: for purego, already using it in my projects for some years.

Comment From: dsnet

I support purego as well. Even if WebAssembly supports unsafe (which is great to hear!), there is always still the use case where someone wants to compile with a pure-Go version for a variety of reasons.

I don't have any great suggestions for where to document this, but perhaps the godoc for go/build?

Comment From: dmitshur

I wanted to point out that in colloquial usage, I've seen "pure Go" most commonly refer to packages that don't use cgo (but can use unsafe, assembly). Seeing it mean "no unsafe and no assembly as well" would require some calibration. But maybe it's fine.

The math/big package contains some precedent on this: it defines a math_big_pure_go tag, which is being used as proposed here (no assembly, no unsafe, no cgo).

Comment From: rsc

OK, purego it is.

Comment From: gopherbot

Change https://golang.org/cl/103239 mentions this issue: go/build: document purego convention

Comment From: cristaloleg

It looks like the patch is still not merged, and there is one small suggestion on it; kindly pinging @dsnet 👀

Comment From: cristaloleg

@rsc can this be accepted and merged with #41184? Then the new //go:build syntax would consolidate the community on one safety-oriented tag (which is purego, based on this issue). Thanks.

Comment From: cespare

It's unfortunate that the best reference for this convention is still this issue, since in the intervening four years no documentation change has been merged.

Additionally, I have found during this time that, in practice, I cannot use the purego tag for its intended purpose in our internal codebase at my company. The reason is that we use go-cmp. Some of go-cmp's core functionality is unsafe. That functionality is now behind a purego build tag. Enabling the purego tag makes most of our tests panic. We have packages with asm as well as pure-Go implementations, and we often want to run automated tests of both code paths, but we cannot run the tests with purego, so we end up using a different build tag to indicate "Go rather than assembly".

I'm not even sure what the right fix is. Maybe the ideal outcome would be that something in the Go standard library (testing? reflect?) would allow go-cmp to do what it needs to do without unsafe. But for now, the existence and popularity of go-cmp kind of "infects" the purego tag and makes it a not-very-useful convention.

Comment From: zephyrtronium

I'm not even sure what the right fix is. Maybe the ideal outcome would be that something in the Go standard library (testing? reflect?) would allow go-cmp to do what it needs to do without unsafe.

Perhaps also relevant: #45200

Comment From: kortschak

I'd like to clarify, based on discussion here, whether this should even be a thing. I know that people want it, but it looks like in that discussion it isn't being considered as having any great importance.

Comment From: gopherbot

Change https://go.dev/cl/561935 mentions this issue: crypto: use and test purego tag consistently

Comment From: aykevl

I'm working on TinyGo, and combining these three concepts together (no assembly, no cgo, and no unsafe) seems like a bad idea to me. TinyGo:
* Supports unsafe, though it has a slightly different memory layout for some types. It should still be possible to write portable unsafe code in most cases (e.g. unsafe.String was a great addition for this).
* Supports CGo, though it has some features missing compared to the main Go toolchain. These missing features can probably be implemented when needed.
* Does not support Go assembly. I've tried it, and the only way I got it to work was one giant hack.

Furthermore, there is already a perfectly good tag for CGo support: the cgo build tag. I don't think we need a new build tag that also says something about CGo support. I see there are some systems (like appengine) where unsafe is not allowed, I would suggest using a different build tag for that than the one that controls Go assembly support.

Right now we set the purego build tag by default to get crypto packages to work, but I don't really like it because it's not very clearly defined right now and I'd rather not limit things like unsafe. A build tag like noasm and a separate build tag like nounsafe (or whatever) would be much better in my opinion.

Comment From: gopherbot

Change https://go.dev/cl/660136 mentions this issue: cmd/go: document purego convention

Comment From: FiloSottile

I think we might have decided this wrong. Banning all uses of unsafe under the same build tag as assembly is overly broad.

There are really at least three classes of unsafe: linknames, type conversions (especially now with unsafe.String), and pointer arithmetic. AFAICT only the last one is really non-portable.

On top of https://github.com/golang/go/issues/23172#issuecomment-1000544013 and https://github.com/golang/go/issues/23172#issuecomment-2000390548, which make compelling arguments, CL 657297 made me notice that a low-level package (hash/maphash) depends on crypto/rand under purego because it wanted to avoid a linkname to runtime.rand, while the ask in #47342 was to avoid pointer arithmetic.

Maybe we should rescope purego to only banning assembly and "non-portable" unsafe, leaving the cgo tag and CGO_ENABLED for cgo. What's non-portable unsafe is fuzzy, but after all non-gc implementations are always balancing downstream patches and upstream conveniences, so they will let us know like in #47342. (We certainly can't make the whole standard library build without unsafe.)

Comment From: seankhliao

If it's about documenting convention, there are a lot more examples of //go:build !purego with import "unsafe" than of //go:build purego with import "unsafe":
* //go:build !purego with import "unsafe": 108 results, https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork++language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.%21purego%2F+%2F%22unsafe%22%2F&type=code
* //go:build purego with import "unsafe": 14 results, https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork+language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.%5B%5E%21%5Dpurego%2F+%2F%22unsafe%22%2F&type=code

Relatedly, there seems to be a trend of using some other build tag to guard unsafe, though I don't think there's a strong consensus on safe, unsafe, nounsafe, or something else. However, it doesn't seem to combine often with purego:
* a safe-style tag with import "unsafe": 71 results, https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork+language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.safe%2F+%2F%22unsafe%22%2F&type=code
* combined with purego: 4 results, https://github.com/search?q=++NOT+is%3Aarchived+NOT+is%3Afork++language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.safe.purego%2F+%2F%22unsafe%22%2F&type=code plus 2 results, https://github.com/search?q=++NOT+is%3Aarchived+NOT+is%3Afork++language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.purego.*safe%2F+%2F%22unsafe%22%2F&type=code

Total hits for purego (excluding !purego): 345 results, https://github.com/search?q=NOT+is%3Aarchived+NOT+is%3Afork+language%3Ago+%2F%5C%2F%5C%2Fgo%3Abuild+.*%5B%5E%21%5Dpurego%2F&type=code

Comment From: aclements

Maybe part of the problem here is that we don't have a well-defined boundary of what packages are or are not portable. E.g., this comes up in maphash, which I would argue is tightly coupled with the runtime. Another Go implementation with a different runtime would also have to define its own maphash package. But a lot of the packages in std are portable and only make use of exported APIs. Today we don't draw that boundary.

Comment From: aclements

On top of https://github.com/golang/go/issues/23172#issuecomment-1000544013 and https://github.com/golang/go/issues/23172#issuecomment-2000390548, which make compelling arguments, CL 657297 made me notice that a low-level package (hash/maphash) depends on crypto/rand under purego because it wanted to avoid a linkname to runtime.rand, while the ask in https://github.com/golang/go/issues/47342 was to avoid pointer arithmetic.

IMO, maphash is part of the runtime, and therefore does not need to have a purego implementation, in the same way that the runtime package itself clearly cannot have a purego implementation.

My understanding is that GopherJS depends on the purego implementation of maphash, but I think they could easily provide runtime_rand and runtime_memhash, just as the current purego version of maphash does, and otherwise continue to use the existing maphash package. We could then drop the purego implementation of maphash.

Comment From: rolandshoemaker

It sounds like what we want (correct me if I'm wrong) is for purego to mean "portable Go". That probably means something along the lines of you cannot use assembly, nor probably cgo.

What from unsafe you can use is complicated. Sizeof is maybe fine (although I wonder about host-specific alignment/padding stuff for structs)? I think Alignof and Offsetof are probably out the window. The others I'm not really sure of either way.

Comment From: aclements

I filed a separate proposal for dealing with the problems caused by purego maphash: #74285.

Comment From: cherrymui

Based on the discussion above, it still seems unclear what meaning people expect purego to have, and how the tag would be used. Sometimes it is meant for other (non-gc) implementations of the Go distribution (note that a non-gc Go distribution could support cgo and unsafe, e.g. gccgo). Sometimes it might mean "safe"? And sometimes it could mean "portable Go", as @rolandshoemaker mentioned above.

In the standard library, besides hash/maphash, the purego tag is used in crypto packages mostly to provide a generic fallback, to support platforms that don't have an assembly implementation. According to @rolandshoemaker, it is unclear how, or whether, the tag is used on a platform that does have assembly support; if it is not, the GOARCH build tags would just do the same thing.

Could someone who actively uses the purego tag in their code comment on what the intention is? Thanks.

Comment From: FiloSottile

The systematic use of purego in crypto serves two purposes: TinyGo (which previously had a number of crypto packages unnecessarily marked as broken), and testing generic fallbacks on dev machines (which are always arm64 or amd64, where assembly implementations generally exist).

Comment From: ianlancetaylor

I think the first mention of "purego" was https://go-review.googlesource.com/c/crypto/+/17962/7#message-ec4785364af48054dc28c681dd83e7124ed35384. There @bradfitz suggested using it for AppEngine, which at the time, as far as I can recall, did not permit using assembly code and did not permit importing the unsafe package.

Today I think we can drop the restriction on the unsafe package, and say that "purego" means "no assembly language, no code written in C or any other non-Go language."

It would be interesting to hear of any cases where "purego" means something either less or more restrictive.

Comment From: FiloSottile

Today I think we can drop the restriction on the unsafe package, and say that "purego" means "no assembly language, no code written in C or any other non-Go language."

It would be interesting to hear of any cases where "purego" means something either less or more restrictive.

TinyGo mostly supports cgo (https://github.com/golang/go/issues/23172#issuecomment-2000390548), so it would benefit more from a "purego" that means "no assembly language". We already have the !cgo build tag for cgo fallbacks.

Comment From: dsnet

Perhaps the problem with purego is that it means slightly different things to different people. Maybe we should decompose it down into tags that target specific dimensions such as nocgo (which I guess is equivalent to !cgo), nounsafe, or noasm? That way the meaning is explicitly clear and people can select exactly the set that is relevant in their particular situation.

Comment From: kortschak

In Gonum we used (continue to use) safe (synonym for the — probably better — nounsafe above) and noasm together to avoid conflating the two concepts. safe means no import of unsafe and noasm means no use of assembly.

Comment From: FiloSottile

The problem is that nounsafe also would need decomposing into nolinkname (which I suspect no one needs), notypecasts (which I don't know of any users of but could imagine some), and nopointerarithmetic (which TinyGo needs due to different layouts). See https://github.com/golang/go/issues/23172#issuecomment-2762810054.

Then after all this decomposition, we'd probably only see any actual use for nopointerarithmetic+noasm by TinyGo, and noasm by us when developing. (I'm curious, what does Gonum use safe for?)

Feels like this conversation is going a bit in circles, with the long timespan certainly complicit, so in the interest of making progress, I propose we rescope purego to just mean "no assembly language". There's !cgo for cgo, and I have honestly seen so little non-portable usage of unsafe that I am not convinced we need to go down the rabbit hole of defining what is and isn't portable; we can handle it on a case-by-case basis. Are there use cases this doesn't address?

Comment From: cherrymui

If we are talking about different Go implementations (gc, gccgo, GopherJS, TinyGo, etc.), currently there are gc and gccgo build tags that are recognized by the go command, and set by the corresponding Go distribution. We could consider introducing gopherjs and tinygo (and others, if any). Or we could just use !gc for things that are meant for a non-gc implementation, which would imply no Go assembly, and no dependencies on runtime internal details.

Comment From: aykevl

Feels like this conversation is going a bit in circles, with the long timespan certainly complicit, so in the interest of making progress, I propose we rescope purego to just mean "no assembly language".

This would work well for TinyGo. The name noasm would have been clearer but changing all instances of purego to noasm will probably be difficult. I'd be happy with purego meaning "no assembly".

nopointerarithmetic (which TinyGo needs due to different layouts).

True, though in practice they line up the same in many cases and I'm pretty sure edge cases can be fixed by using unsafe.Offsetof and the like. So this might not need a specific build tag at all. The only cases where this wouldn't work that I can think of would be questionable anyway, such as trying to inspect interface, func, or chan types.
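A minimal sketch of what this could look like, assuming a hypothetical header type: pointer arithmetic that derives the field offset from unsafe.Offsetof adapts to whatever layout the compiler chose, whereas a hard-coded offset would only be right on some targets. unsafe.Add assumes Go 1.17 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// header is a hypothetical type; its layout may differ between gc and
// TinyGo, and between GOARCH values (int is 4 bytes on 32-bit targets).
type header struct {
	kind uint8
	size int
	tag  uint16
}

// tagPtr returns a pointer to h.tag via pointer arithmetic. Taking the
// offset from unsafe.Offsetof lets each implementation supply its own
// layout; a hard-coded offset (say, 16) would only match some targets.
func tagPtr(h *header) *uint16 {
	return (*uint16)(unsafe.Add(unsafe.Pointer(h), unsafe.Offsetof(h.tag)))
}

func main() {
	h := header{kind: 1, size: 42, tag: 7}
	*tagPtr(&h) = 9
	fmt.Println(h.tag) // 9
}
```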

We could consider introducing gopherjs and tinygo (and others, if any). Or we could just use !gc for things that are meant for a non-gc implementation, which would imply no Go assembly, and no dependencies on runtime internal details.

I'd prefer to avoid this. It smells too much like UA sniffing in browsers, and all associated problems. Any new compiler will need to convince the rest of the ecosystem to accept their new build tag, or fall back to slow/safe implementations even when that's not necessary. Or, probably, just reuse an existing compiler build tag and add a new one (which is what TinyGo does, basically). Specific build tags (noasm etc) are much more explicit.

Comment From: aclements

I'd prefer to avoid this. It smells too much like UA sniffing in browsers, and all associated problems.

My hope would be that tags like gc (or tinygo or gopherjs) are exceedingly rare and used only for things that are highly specific to their respective runtimes, such as actual runtime code or code that depends deeply on their runtime implementation details.

safe/nounsafe etc

Unsafe has two rings: "portable" usage and "non-portable" usage. What's specified in the unsafe package docs is portable across different implementations of Go and different platforms (modulo where it says it isn't, like where rule 1 says "Provided that T2 is no larger than T1 and that the two share an equivalent memory layout"). The reason it's still "unsafe" is because the compiler can't statically check if your usage is portable or not, so it's up to the programmer.

Practically speaking, there are no Go platforms these days that don't support "portable" unsafe. Therefore, it seems unnecessary to have a standard tag for "no unsafe usage". The "non-portable" usage of unsafe is non-portable because it depends on architecture, OS, and/or Go toolchain, and we already have build tags to limit it in whichever of these is appropriate.

Comment From: cherrymui

"purego" literally sounds like purely Go code, with no other language (including C). Allowing cgo in "purego" sounds confusing. "noasm" would be much better.

Is there a place where one actually wants to use cgo but avoid assembly? (In the context of package implementation or uses, not toolchain implementation.)

At least in the standard library, there is no code that has a purego tag and uses cgo. So if we (continue to) disallow cgo in purego files, it won't break. And we can gradually shift to noasm.

Comment From: aykevl

Is there a place where one actually wants to use cgo but avoid assembly? (In the context of package implementation or uses, not toolchain implementation.)

Yes, in TinyGo. Cgo mostly works in TinyGo (and we can fix the cases where it doesn't), while Go assembly is not supported for multiple reasons.

My hope would be that tags like gc (or tinygo or gopherjs) are exceedingly rare and used only for things that are highly specific to their respective runtimes, such as actual runtime code or code that depends deeply on their runtime implementation details.

Agreed. That may not be how people will actually use them, but yeah in my opinion they really should only be used when depending on runtime/compiler specific behaviors.

Comment From: eihigh

Based on the discussion, it's probably impossible to classify Go code into just two categories of purego or not purego.

Go code can have various types of dependencies including whether it contains assembly code, whether it uses Cgo, whether it depends on special packages like unsafe or runtime, and whether it uses special compiler directives.

While we don't usually need to be conscious of these when using only gc, Go code can depend on a wide variety of things in combination.

Would it be helpful to have a tool that can easily extract what a piece of Go code depends on? This would seem to make it easier to perform pre-checks before building with toolchains other than gc.

Comment From: FiloSottile

I think we're letting the perfect be the enemy of the good.

We have one major use case (TinyGo) and one minor use case (debugging generic implementations). They both essentially need a tag to disable assembly and nothing else (https://github.com/golang/go/issues/23172#issuecomment-2989965004).

We have a widely used convention, the purego tag, to disable assembly. It's used both in the standard library (where it's easy to change) and in the ecosystem (where it is not).

We caused trouble by trying to expand its definition, because it led to "we can't use linkname under purego" which turns out is unnecessary.

purego is a bad name for "no assembly" but it's the one we have and it does the thing we need, so I think we should keep it and document it as "no assembly" and nothing else. (GODEBUG is also a bad name for feature flags, but it's the one we had.)

(If we had a do-over, maybe we should have had the gc toolchain set a default asm tag (like the cgo tag) which TinyGo wouldn't set, and use !asm (like !cgo) in fallback files. It would have the benefit of not having to tag assembly files !purego, but it also doesn't solve—on its own—the debugging use case, because you can't specify negative tags with -tags.)
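The -tags limitation can be shown with a small model. The sketch assumes a hypothetical asm tag that the gc toolchain would set by default and TinyGo would not; because -tags can only add tags, a developer on gc cannot select a file guarded by //go:build !asm, while today's purego fallback can be selected with -tags purego. The default tag lists are illustrative, not what either toolchain actually sets.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// activeTags models how a toolchain's default tags combine with -tags:
// user-supplied tags are only ever added, never removed.
func activeTags(defaults, user []string) map[string]bool {
	set := map[string]bool{}
	for _, t := range defaults {
		set[t] = true
	}
	for _, t := range user {
		set[t] = true
	}
	return set
}

// builds reports whether a file with the given //go:build line is compiled.
func builds(line string, set map[string]bool) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	// Hypothetical defaults: gc would set "asm", TinyGo would not.
	gc := []string{"gc", "amd64", "asm"}
	tinygo := []string{"tinygo", "amd64"}

	hypothetical := "//go:build !asm" // fallback file under a default "asm" tag
	today := "//go:build purego"      // fallback file under the purego convention

	fmt.Println(builds(hypothetical, activeTags(tinygo, nil)))            // true
	fmt.Println(builds(hypothetical, activeTags(gc, []string{"purego"}))) // false: "asm" cannot be removed
	fmt.Println(builds(today, activeTags(gc, []string{"purego"})))        // true
}
```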

Comment From: aclements

It seems like we're reaching consensus:

Document that purego conventionally disables assembly code. It's not the best name, but it's what we have and captures how purego is generally used in the ecosystem. It's just a convention, though, and the toolchain won't enforce that there's no assembly code (we could explore this separately, but it may break a lot of things).
purego is orthogonal to cgo. The cgo tag controls cgo usage.
Any code that's truly gc-specific should use the gc tag (but hopefully that's very rare!), and any code that's OS- or architecture-specific should use the corresponding tags.

Comment From: aclements

Have all remaining concerns about this proposal been addressed?

The proposal is to document in go help buildconstraint that the purego build tag conventionally disables the use of assembly code in packages, and does not conventionally affect the use of cgo (use the cgo tag for that).

Comment From: aclements

Based on the discussion above, this proposal seems like a likely accept. — aclements for the proposal review group