Go version

go1.22-20240109-RC01 linux/amd64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/usr/local/google/home/jba/.cache/go-build'
GOENV='/usr/local/google/home/jba/.config/go/env'
GOEXE=''
GOEXPERIMENT='fieldtrack,boringcrypto'
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/usr/local/google/home/jba/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/usr/local/google/home/jba/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org'
GOROOT='/usr/lib/google-golang'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/google-golang/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22-20240109-RC01 cl/597041403 +dcbe772469 X:fieldtrack,boringcrypto'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/usr/local/google/home/jba/repos/work/go.mod'
GOWORK='/usr/local/google/home/jba/repos/work/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1055306189=/tmp/go-build -gno-record-gcc-switches'

What did you do?

The following program is zipmem.go:

package main

import (
    "archive/zip"
    "bytes"
    "io"
    "log"
    "os"
    "path/filepath"
    "runtime"
    "runtime/pprof"
)

func main() {
    file := filepath.Join(runtime.GOROOT(), "lib/time/zoneinfo.zip")
    data, err := os.ReadFile(file)
    if err != nil {
        log.Fatal(err)
    }
    zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
    if err != nil {
        log.Fatal(err)
    }
    zw := zip.NewWriter(io.Discard)
    if err := zw.Copy(zr.File[0]); err != nil {
        log.Fatal(err)
    }
    writeHeapProfile()
    if err := zw.Close(); err != nil {
        log.Fatal(err)
    }
}

func writeHeapProfile() {
    const filename = "heap.prof"
    f, err := os.Create(filename)
    if err != nil {
        log.Fatal(err)
    }
    runtime.GC()
    if err := pprof.WriteHeapProfile(f); err != nil {
        log.Fatal(err)
    }
    if err := f.Close(); err != nil {
        log.Fatal(err)
    }
}

Shell command:

> go build zipmem.go && ./zipmem && go tool pprof zipmem heap.prof

Inside pprof:

(pprof) top

What did you see happen?

Showing nodes accounting for 1761.95kB, 100% of 1761.95kB total
      flat  flat%   sum%        cum   cum%
 1024.17kB 58.13% 58.13%  1024.17kB 58.13%  archive/zip.(*Reader).init
  737.78kB 41.87%   100%   737.78kB 41.87%  os.ReadFile
         0     0%   100%  1024.17kB 58.13%  archive/zip.NewReader
         0     0%   100%  1761.95kB   100%  main.main
         0     0%   100%  1761.95kB   100%  runtime.main

What did you expect to see?

The memory profile is written after a GC should have freed up the zip.Reader. So it's surprising to see the memory allocated by os.ReadFile is still there.

I believe the reason is this line in Writer.Copy:

fw, err := w.CreateRaw(&f.FileHeader)

CreateRaw stores the FileHeader pointer in the Writer w. The code that populates the FileHeader takes care to ensure that it shares no memory with the zip.Reader from which it came. Unfortunately, FileHeader is embedded in File, which does refer to the underlying reader:

type File struct {
    FileHeader
    ...
    zipr         io.ReaderAt // the argument to NewReader
    ...
}

I believe the pointer to the embedded FileHeader causes the entire File struct to be retained, including the bytes used for the reader.


The problem can be worked around by copying the embedded FileHeader. Luckily, Writer.Copy is a convenience method whose functionality can be written outside the standard library:

func zipWriterCopy(w *zip.Writer, f *zip.File) error {
    r, err := f.OpenRaw()
    if err != nil {
        return err
    }
    fh := f.FileHeader
    fw, err := w.CreateRaw(&fh)
    if err != nil {
        return err
    }
    _, err = io.Copy(fw, r)
    return err
}

I believe Writer.Copy should be fixed in this way. I don't think the fix should be in Writer.CreateRaw because users may depend on the identity of its argument. But its documentation should be updated to alert users to the problem.

Comment From: gopherbot

Change https://go.dev/cl/560238 mentions this issue: archive/zip: reduce memory held by Writer.Copy

Comment From: andrewmbenton

It looks like this was merged.

Comment From: ianlancetaylor

Thanks.