The pprof format specifies that profiles must be gzip compressed on disk. Go implements this by unconditionally applying gzip compression (level 1) to all pprof profiles it produces.
This is problematic because gzip is no longer competitive with modern compression algorithms; see the accepted proposal for adding compress/zstd to the stdlib. Also see the compression comparison below, which shows that zstd-3 can produce profiles that are 18% smaller than gzip-1 while compressing 13% faster.
Data volumes are directly correlated with cost (egress, ingress, load balancers), so continuous profiling tools face an unpleasant tradeoff: they can decompress the profiles produced by the runtime and recompress them as zstd, accepting increased CPU/memory overhead; or they can leave the gzip-1 compression as-is and accept increased network overhead.
Possible Solutions
1. Provide an API to disable the compression
2. Provide an API to make the compression algorithm configurable
3. Switch to zstd compression by default (would depend on #62513 and might require pprof to support zstd as well)
Initial discussions at yesterday's runtime: performance and diagnostics meeting seemed to hint at rough consensus for option 1 (meeting notes should be available soon). This would also be aligned with runtime/trace, which produces uncompressed data. However, for CPU profiles this will probably depend on the implementation of https://github.com/golang/go/issues/42502. For the other profile types, the `debug` argument to `Profile.WriteTo` could be used.
If that sounds roughly right, I can turn this issue into a proposal for option 1.
Compression Comparison
Below is a somewhat haphazard but illustrative comparison of a few different compression algorithms on pprof data. The source code is available.
- file: A random CPU profile that is 2.4 MiB before compression (not supplied here)
- algorithm: An algorithm-level tuple:
  - zstd is `github.com/klauspost/compress/zstd`
  - kgzip is `github.com/klauspost/compress/gzip`
  - lz4 is `github.com/pierrec/lz4/v4`
  - gzip is `compress/gzip`
- compression_ratio: `uncompressed bytes / compressed bytes`
- speed_mb_per_sec: `uncompressed bytes / duration` (median of 10 runs)
- utility: `compression_ratio * speed_mb_per_sec` (suggested by this blog post)
| file | algorithm | compression_ratio | speed_mb_per_sec | utility |
|---|---|---|---|---|
| cpu.pprof | zstd-1 | 2.93 | 304 | 889.06 |
| cpu.pprof | zstd-2 | 3.13 | 224 | 700.85 |
| cpu.pprof | lz4-0 | 2.04 | 292 | 593.92 |
| cpu.pprof | kgzip-1 | 2.69 | 190 | 510.83 |
| cpu.pprof | zstd-3 | 3.27 | 141 | 460.03 |
| cpu.pprof | kgzip-6 | 2.92 | 121 | 351.93 |
| cpu.pprof | gzip-1 | 2.68 | 123 | 328.17 |
| cpu.pprof | lz4-1 | 2.53 | 56 | 141.02 |
| cpu.pprof | lz4-9 | 2.53 | 51 | 127.88 |
| cpu.pprof | lz4-4 | 2.53 | 51 | 127.86 |
| cpu.pprof | gzip-6 | 3.02 | 39 | 117.89 |
| cpu.pprof | zstd-4 | 3.43 | 26 | 90.29 |
| cpu.pprof | gzip-9 | 3.03 | 16 | 48.9 |
| cpu.pprof | kgzip-9 | 3.05 | 15 | 46.34 |
Conclusion: For this profile, zstd-3 produces profiles that are 18% (`1 - 2.68/3.27`) smaller while compressing 13% (`1 - 123/141`) faster than gzip-1.
cc @mknyszek @prattmic @nsrip-dd
Comment From: prattmic
I think an uncompressed option gives users the most flexibility to do whatever works best for them. Plus, passing in a compressing io.Writer would be quite natural in Go.
It's unfortunate that the `debug` argument to `WriteTo` is so opaque. Perhaps we should also provide some named constants for the various values of `debug`?