Proposal Details

Context

When using the compress/gzip package to decompress gzip files, receiving a gzip: invalid header error can indicate two distinct possibilities.

  1. The file is not a valid gzip file.

    Example test case (click to expand)

    ```go import ( "compress/gzip" "io" "os" "testing" )

    func TestNotAValidGzipFile(t *testing.T) { f, err := os.Open("/tmp/python-3.12.3-amd64.exe") // Arbitrary example, any non-gzip file will suffice. if err != nil { t.Fatal(err) } defer f.Close()

    rc, err := gzip.NewReader(f)
    if err != nil {
        t.Fatal(err)
    }
    defer rc.Close()
    
    _, err = io.ReadAll(rc)
    if err != nil {
        t.Fatal(err)
    }
    

    }

    // === RUN TestNotAValidGzipFile // gzip_test.go:19: gzip: invalid header ```

  2. The file is a valid gzip file, but contains invalid trailing data.

    Example test case (click to expand)

    ```go func TestValidGzipFileWithTrailingData(t *testing.T) { // Reproducer file. There are many examples of this. // https://github.com/udacity/self-driving-car-sim/blob/4b1f739ebda9ed4920fe895ee3677bd4ccb79218/Assets/Standard%20Assets/Environment/SpeedTree/Conifer/Conifer_Desktop.spm f, err := os.Open("/tmp/Conifer_Desktop.spm") if err != nil { t.Fatal(err) } defer f.Close()

    rc, err := gzip.NewReader(f) if err != nil { t.Fatal(err) } defer rc.Close()

    _, err = io.ReadAll(rc) if err != nil { t.Fatal(err) } }

    // === RUN TestValidGzipFileWithTrailingData // gzip_test.go:19: gzip: invalid header ```

Scenario 2 can be especially confusing because Go's implementation of compress/gzip rejects invalid trailing data, while many popular applications and languages do not. Hence, the ambiguity of this error can lead people to believe that Go is rejecting a valid gzip file.

$ file -i Conifer_Desktop.spm
Conifer_Desktop.spm: application/gzip; charset=binary
$ gzip -S .spm -d /tmp/Conifer_Desktop.spm
gzip: stdin: decompression OK, trailing garbage ignored. 

Side-note: https://github.com/golang/go/issues/47809#issuecomment-902057103 explains why this choice was made and how to handle invalid trailing data using gzip.Reader.Multistream.

Proposal Details

I propose adding a distinct error for when subsequent inputs in a data stream don't have valid headers (i.e., trailing garbage). This would provide clarity for users who many not realize that they need to disable gzip.Reader.Multistream. Additionally, it could be used to programmatically call Multistream(false) for specific files with trailing garbage.

I will leave the specific error message and behave for future discussion, should there be interest in implementing this proposal.


Caveat: I am not familiar with the implementation of gzip.Reader. This change may not be possible or desirable due to technical limitations.

Comment From: gabyhelp

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: rgmz

Similar Issues

https://github.com/golang/go/issues/47809
https://github.com/golang/go/issues/61797

Drats, I spent quite a bit of time searching existing issues. Not sure how I missed the second one.

Comment From: rsc

Duplicate of #61797.