Proposal Details

The goal of this proposal is to simplify implementing fs.FS on top of archive/tar.

There were 2 "soft" rejected requests to implement fs.FS:
- https://github.com/golang/go/issues/61232
- https://github.com/golang/go/issues/74041

I could find 4 implementations that could benefit from that:
- https://pkg.go.dev/github.com/nlepage/go-tarfs: it finds the header offset by taking how much has already been read and subtracting 512 (https://github.com/nlepage/go-tarfs/blob/5be978f25f2e456c2e72636c61eb3af3f155989f/fs.go#L70), so it assumes no internal buffering
- https://pkg.go.dev/github.com/quay/claircore/pkg/tarfs: it re-implements its own parsing (https://github.com/quay/claircore/blob/v1.5.37/pkg/tarfs/parse.go)
- https://github.com/containers/image/blob/9822b6ffa5c1dc56088cbb82a2276d4f3f872a68/docker/internal/tarfile/reader.go#L242C6-L242C22: it just rescans the whole archive each time it wants to find a file
- https://github.com/mholt/archives

The idea would be to add

func (tr *Reader) NextOffset() (*Header, int, error)

which also returns the offset in the underlying io.Reader at which the Header was found, so that if you Seek() to this offset and call Next() you end up with the same Header and data.

Comment From: mholt

I've implemented exposing tar (and zip, and any other archive file) as an fs.FS in https://github.com/mholt/archives -- however, it is indeed very inefficient for tar files due to the lack of an index, so random access is almost unbearable in many cases.

Having a way to get and use a header's offset would be phenomenal for performance, and I'm told other ecosystems like Python have this ability.

Comment From: mholt

It may have been proposed elsewhere before, but I'd also be Very Okay with a new field in the tar.Header struct called Offset.

Comment From: changeling

For reference, Python's tarfile library offers the following:

tarfile.TarInfo

TarInfo.offset: int
The tar header starts here.
TarInfo.offset_data: int
The file’s data starts here.