The pprof format specifies that profiles must be gzip compressed on disk. Go implements this by unconditionally applying gzip compression (level 1) to all pprof profiles it produces.
This is problematic because gzip is no longer considered competitive among modern compression algorithms; see the accepted proposal for adding compress/zstd to the stdlib. Also see the compression comparison below, which shows that zstd-3 can produce profiles that are 18% smaller than gzip-1 while compressing 13% faster.
Data volume directly correlates with cost (egress, ingress, load balancers), so continuous profiling tools face an unpleasant tradeoff: they can either decompress the profiles produced by the runtime and recompress them with zstd, accepting increased CPU/memory overhead, or leave the gzip-1 compression as-is and accept increased network overhead.
Possible Solutions
- Provide an API to disable the compression
- Provide an API to make the compression algorithm configurable
- Switch to zstd compression by default (would depend on #62513 and might require pprof to support zstd as well)
Initial discussions at yesterday's runtime: performance and diagnostics meeting seemed to hint at rough consensus for option 1 (meeting notes should be available soon). This would also be aligned with runtime/trace, which produces uncompressed data. However, for CPU profiles this will probably depend on the implementation of https://github.com/golang/go/issues/42502. For the other profile types, the debug argument to Profile.WriteTo could be used.
If that sounds roughly right, I can turn this issue into a proposal for option 1.
Compression Comparison
Below is a somewhat haphazard, but illustrative, comparison of a few different compression algorithms applied to pprof data. The source code is available.
- file: A random cpu profile that is 2.4 MiB before compression (not supplied here)
- algorithm: An algorithm-level tuple.
  - zstd is github.com/klauspost/compress/zstd
  - kgzip is github.com/klauspost/compress/gzip
  - lz4 is github.com/pierrec/lz4/v4
  - gzip is compress/gzip
- compression_ratio: uncompressed bytes / compressed bytes
- speed_mb_per_sec: uncompressed bytes / duration (median of 10 runs)
- utility: compression_ratio * speed_mb_per_sec (suggested by this blog post)
| file | algorithm | compression_ratio | speed_mb_per_sec | utility |
|---|---|---|---|---|
| cpu.pprof | zstd-1 | 2.93 | 304 | 889.06 |
| cpu.pprof | zstd-2 | 3.13 | 224 | 700.85 |
| cpu.pprof | lz4-0 | 2.04 | 292 | 593.92 |
| cpu.pprof | kgzip-1 | 2.69 | 190 | 510.83 |
| cpu.pprof | zstd-3 | 3.27 | 141 | 460.03 |
| cpu.pprof | kgzip-6 | 2.92 | 121 | 351.93 |
| cpu.pprof | gzip-1 | 2.68 | 123 | 328.17 |
| cpu.pprof | lz4-1 | 2.53 | 56 | 141.02 |
| cpu.pprof | lz4-9 | 2.53 | 51 | 127.88 |
| cpu.pprof | lz4-4 | 2.53 | 51 | 127.86 |
| cpu.pprof | gzip-6 | 3.02 | 39 | 117.89 |
| cpu.pprof | zstd-4 | 3.43 | 26 | 90.29 |
| cpu.pprof | gzip-9 | 3.03 | 16 | 48.9 |
| cpu.pprof | kgzip-9 | 3.05 | 15 | 46.34 |
Conclusion: for this profile, zstd-3 produces output that is 18% smaller (1 - 2.68/3.27) while compressing 13% faster (1 - 123/141) than gzip-1.
cc @mknyszek @prattmic @nsrip-dd
Comment From: prattmic
I think providing an uncompressed option provides the most flexibility for users to do whatever works best for them. Plus, passing in a compression io.Writer would be quite natural in Go.
It's unfortunate that the debug argument to WriteTo is so opaque. Perhaps we should also provide some named constants for the various values of debug?
Comment From: mknyszek
I think no matter what this is going to need a proposal, but I agree with @prattmic. @prattmic and I have been brainstorming improvements to the runtime/pprof package that would give us a clear place to add disabling compression as a configuration parameter. We'll share more soon.
Comment From: rockdaboot
@felixge Is it possible to provide the data used for the benchmark, and the script that generates the table? With different data my results look very different (e.g. lz4-0 has by far the best utility value, and zstd-1 is far from the best). I'd like to find out whether this is because of:
- different data
- different CPU arch
- different Go version
I also added brotli (from "github.com/andybalholm/brotli") to the benchmark, and it is always 2nd after lz4-0.
Comment From: felixge
I can't share this particular CPU profile right away. The one thing that might be a bit unusual about it is that it uses a lot of pprof labels for trace<->profile correlation (labels for span id, trace id).
What kind of data have you been using for your tests?
I agree that picking the optimal compression algorithm will require more disciplined testing. But I think your comment provides further evidence that flexibility around compression would be valuable, i.e. allowing application/library code to decide which compression to use.