Proposal Details
The goal of this proposal is to make it simpler to implement fs.FS on top of archive/tar.
There were two "soft" rejected requests to implement fs.FS:
- https://github.com/golang/go/issues/61232
- https://github.com/golang/go/issues/74041
I could find four implementations that could benefit from this:
- https://pkg.go.dev/github.com/nlepage/go-tarfs: it finds the header offset by taking the number of bytes already read and subtracting 512 (https://github.com/nlepage/go-tarfs/blob/5be978f25f2e456c2e72636c61eb3af3f155989f/fs.go#L70), so it assumes no internal buffering
- https://pkg.go.dev/github.com/quay/claircore/pkg/tarfs: it re-implements its own parsing (https://github.com/quay/claircore/blob/v1.5.37/pkg/tarfs/parse.go)
- https://github.com/containers/image/blob/9822b6ffa5c1dc56088cbb82a2276d4f3f872a68/docker/internal/tarfile/reader.go#L242C6-L242C22, which just rescans the whole archive each time it wants to find a file
- https://github.com/mholt/archives
The idea would be to add:
func (tr *Reader) NextOffset() (*Header, int, error)
It works like Next() but also returns the offset in the underlying io.Reader at which the Header was found, so that if you Seek() to this offset and call Next(), you end up with the same Header and data.
Comment From: gabyhelp
Related Issues
- proposal: archive/tar: add Reader FS() #74041 (closed)
- proposal: archive/tar: implement fs.FS #61232 (closed)
- archive/tar: add Reader.NextRaw method to read only one raw header #17657
- proposal: archive/tar: add iterator form of (&Reader).Next() #68062
- proposal: archive/tar: export Reader.handleRegularFile() #45122 (closed)
Comment From: mholt
I've implemented support for using tar (and zip, and any other archive file) as an fs.FS in https://github.com/mholt/archives. However, it is indeed very inefficient for tar files because there is no index, so random access is almost unbearable in many cases.
Having a way to get and use a header's offset would be phenomenal for performance, and I'm told other ecosystems like Python have this ability.
Comment From: mholt
It may have been proposed elsewhere before, but I'd also be Very Okay with a new field in the tar.Header struct called Offset.
Comment From: changeling
For reference, Python's tarfile library offers the following:
TarInfo.offset: int
The tar header starts here.
TarInfo.offset_data: int
The file’s data starts here.