Golang Protobuf question: .proto packaging / repository layout

Hi Trying to summarize the question is something like How to write a pure reusable distributed proto packages with arbitrary depth of transitive dependencies. E.g A, B->A, C->(A,B) : where A,B,C - are all separated repos with a proto declarations. The same time in a way it works for google/protobuf/empty.proto.

I'm not sure how better to structure the entire question. Instead there's a batch of tiny ones that looks contradicting the subject intention or just confusing the hole picture vision.

All the examples shows a package naming as "foo.bar.baz". Still google/protobuf/empty.proto is imported as some url(fs) path.
When referring import "some/other/mod.proto" does it really mean the referring (linking) on an initial photo-source level or at the compiled to target language one. As we can see google/protobuf/empty.proto as a source reference. Still the result compilation refers the real golang "google.golang.org/protobuf/types/known/emptypb" module.
The most resources say the only (best) composition for protos is a subtree/submodule linking 3.1 So that how to keep code clean and consistent that case. As I can see every level of repo reference adds additional fs tree nesting to a target E.g.

repoA
   a.proto

repoB
   A(->repoA)
     a.proto
   B
     b.proto

repoC
  A(->repoA)
       a.proto
  B(->repoB)
     A(->repoA)
       a.proto
     B
       b.proto
   C
      c.proto

3.2 As well. How to avoid a code duplication in case of C->(B, A) if B is just a black box that doesn't leak it's dependancies and to compose all the reps within C as flat set. 3.3 Anyway in case of google/protobuf/empty.proto we don't subtree/fetch/declare whatever manually at all. So how to achieve. the same with custom repos. 4. How to make it even fluent and language agnostic in different packaging managers, e.g. go mod, npm etc. as the mentioned google/protobuf/empty.proto does.

Thank you

Comment From: stapelberg

Hey @kudla

Your question is long and has many parts, not all of which I understand.

I’ll try to help as best as I can:

The package name (e.g. google.protobuf) is separate from the file system path (e.g. google/protobuf/empty.proto) of a .proto file. While they are separate, a common naming strategy is for the package to match the file system directory (as seen here).
The path google/protobuf/empty.proto refers to a “well-known proto”, i.e. one which is shipped with protoc itself.
If, during proto compilation (running protoc), you want to import other .proto files, you need to adjust protoc’s import search path by using the -I flag.
- The well-known protos are found the same way: the import search path by default points to the location where the well-known protos can be found, which is the include/ directory next to the bin/ directory in which protoc lives:
% unzip -l protoc-29.1-linux-x86_64.zip Archive: protoc-29.1-linux-x86_64.zip Length Date Time Name --------- ---------- ----- ---- 0 1980-01-01 00:00 bin/ 9623664 1980-01-01 00:00 bin/protoc 0 1980-01-01 00:00 include/ 0 1980-01-01 00:00 include/google/ 0 1980-01-01 00:00 include/google/protobuf/ 6154 1980-01-01 00:00 include/google/protobuf/any.proto 7729 1980-01-01 00:00 include/google/protobuf/api.proto 0 1980-01-01 00:00 include/google/protobuf/compiler/ 8556 1980-01-01 00:00 include/google/protobuf/compiler/plugin.proto 2185 1980-01-01 00:00 include/google/protobuf/cpp_features.proto 52531 1980-01-01 00:00 include/google/protobuf/descriptor.proto 4892 1980-01-01 00:00 include/google/protobuf/duration.proto 2363 1980-01-01 00:00 include/google/protobuf/empty.proto […] 1. For repository b to work with a service that is implemented in repository a, it is typically sufficient to import the generated protobuf Go code (.pb.go files). You typically don’t need the .proto files from a in b. 1. (I don’t know enough about npm or other package managers to say anything about them.)

Comment From: kudla

Hi @stapelberg . Thank you a lot for a such elaborate answer.

not all of which I understand.

I bet this is due to my initial lack of understanding either.

Looks I can see my crucial misunderstanding know. I though having any deps we should provide per each one both .proto sources (to statically match typings) and compiled libs to link the target lang deps.

Please confirm I see it the right way now. Do we have two SEPARATE options to resolve every single deps * to provide a path with -I flag if we have the deps as a .proto sources so that to be compiled in place * or just to provide a go mod url with -M in case we have compiled package to be linked straight on a go mod level?

Comment From: kudla

Ah, looks it is still not))

Without deps .proto sources the actual references other.package.Message are stoped to be resolved.

Anyway the separation of terms package name vs file system path looks very helpful as well. Need to rethink the whole story once again.

Comment From: stapelberg

Please confirm I see it the right way now. Do we have two SEPARATE options to resolve every single deps

to provide a path with -I flag if we have the deps as a .proto sources so that to be compiled in place

or just to provide a go mod url with -M in case we have compiled package to be linked straight on a go mod level?

Both -I and --go_opt=M… are command-line flags for protoc (the protobuf compiler): * The -I flag extends the import search path, making protoc find .proto files on disk, i.e. translating from relative paths like google/protobuf/duration.proto to absolute file system paths like /opt/protobuf/include/google/protobuf/duration.proto. * The M option overrides the Go package import path for generated code (only needed if you do not specify option go_package = "…"; in your .proto file), see https://protobuf.dev/reference/go/go-generated-opaque/#package

I would not call this “two separate options” at a high level — yes, these are two flags, and they are related, but they are not replacements for each other.

In general, protoc is only needed if you want to compile .proto files into Go code. Most often, you would do that compilation within one repository and then only export the generated Go code, because downstream users often do not need the .proto files.

Without deps .proto sources the actual references other.package.Message are stoped to be resolved.

Yes. If the repositories are not supposed to depend on each other’s .proto files directly, you probably need to reach for proto extensions: https://protobuf.dev/programming-guides/editions/#extensions

Comment From: kudla

After some rethinking of a topic and several new challenges I think I’me ready to form the final question :)

First of all I think I found my main misandestanding. Or even I just overvalued the term import from the .proto instruction. So that it maybe more about link from another file instead of import from another lib

Anyway to make it all clear my question is what is the practical way of using import and —import-path to refer declaration from the .proto declarations defined in some another repository. Mainly focusing in golang flow now.

So is it really possible. E.g. having

github.com/definition-scope/raw-protos.git - a repo with the basic protos declarations only bitbucket.org/lib-scope/go-mod-lib.git - repo with pure golang module with reference to raw-protos (e.g. over submodule) and compiled sources in /protos folder gitlab.com/consumer-scope/app.git which is endpoint application with own proto declaration which should refer some declarations from raw-protos definitions

So here's mainly to equations 1. Is it really possible to refer from the app protos scope to import the declarations from raw-protos or go-mod-lib repos. I.e. without manually prefetching the lib protos to some wellnown path. E.g. as we talkning about golang env is it possible to use any magic just something as

go get bitbucket.org/lib-scope/go-mod-lib.git 
# or 
go get github.com/definition-scope/raw-protos.git # still how this should work for non go mod repos any way
protoc —proto_path (-I) “$GOHOME/pkg” app/protos/*.proto

so that to take a profit from a go mod loader flow it self

Taking into account as you @stapelberg mentioned that no-one cares on .proto files so is it really possible to make the very same described in 1. just having only compiled sources as bitbucket.org/lib-scope/go-mod-lib without raw proto definitions from github.com/definition-scope/raw-protos.git at all.

To summarize. As in basics it looks preatty clear the —proto_path idea so in practice is it really possible (intended) to run imports from the external repositories (.protos or compiled .go sources)

Thank you

Comment From: stapelberg

I recommend thinking about the workflow in terms of two separate, independent compilations: protoc compilation vs. Go compilation.

When you invoke protoc, it is your responsibility to arrange for all necessary .proto files to be present. protoc does not care about your repository structure, it only sees .proto files.

The go tool only deals with Go code, it does not know what .proto files are. So, it is your responsibility to place the output of protoc (generated .pb.go code) in the correct place for the go tool to pick it up.

In Go, the typical way to run external code generator tools is to use go generate (see https://go.dev/blog/generate if you are unfamiliar).

For example, in this repository I’m running protoc via go generate: https://github.com/robustirc/robustirc/blob/64321cf1809f36db5c0c700dcb0ef73bb72bc9a4/internal/proto/generate.go#L5

Note that Go expects module authors to submit any generated code to the repository, so that module users don’t need to be able to re-generate the code. In the example above, you can see snapshot.pb.go submitted in the repository, and users of this code never need to do anything with snapshot.proto — in fact, I could delete snapshot.proto and the program would work the same.

So, from the Go module user perspective: Yes, importing generated protobuf code from another repository is totally possible.

But from the Go module author perspective: You are expected to supply all .go code files, and it’s up to you to structure this process.

So, specifically for your situation, here’s a strawman suggestion: maybe write a shell script that checks out all needed repos in the right way and runs protoc?

I hope this helps. If not, maybe ask around within your company for opinions on how to structure multi-repository build processes in your environment.

Comment From: kudla

So, from the Go module user perspective: Yes, importing generated protobuf code from another repository is totally possible.

Sure. Still my point was to import definition on a protofile level. Similar as we do with google.predefined.types

import "some/custom/lib"

message Definition {
     lib.Primitive property = 1;
}

For my self I ended up with linking proto definitions with git submodule directly to target go module which looks the most simple solution.

Anyway go generate you mentioned is quite interesting new tool I've discovered for my self. Thank you. Still it's not for my case as submodules looks more explicit here for me.

Thank you for sharing your vision.