What version of Go are you using (go version)?

I have tested on:

go version go1.9.2 darwin/amd64
go version go1.9.2 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

darwin: GOARCH=amd64 GOOS=darwin
linux: GOARCH=amd64 GOOS=linux

What did you do?

pprof labels that I add using pprof.Do() do not appear in the goroutine profile.

Steps:

1. Compile and start: https://play.golang.org/p/SgYgnDqaVKB
2. Start "go tool pprof localhost:5555/debug/pprof/goroutine"
3. Run the "tags" command
4. See no tags, but I expect to see a tag for the label a-label=a-value

I also downloaded the file "localhost:5555/debug/pprof/goroutine", gunzipped it, and did not see either the label key or the label value in the protobuf file.

When I run "go tool pprof localhost:5555/debug/pprof/goroutine" twice and, in the second run, run "tags", I see

(pprof) tags
bytes: Total 3
         2 (66.67%): 325.31kB
         1 (33.33%): 902.59kB

This shows that labels can work. (I expect no output on the first run, since it is reasonable for no heap memory to have been allocated.)

What did you expect to see?

I expect to see the tags command output the label key-value pair in the program.

What did you see instead?

The tags command reports an empty value:

(pprof) tags
(pprof)

Comment From: hyangah

I believe currently only the CPU profile utilizes tag information. I am quite surprised that pprof ever reports tags on a goroutine profile (tags 1 and 2? kB? maybe a pprof bug)

When I run "go tool pprof localhost:5555/debug/pprof/goroutine" twice and, in the second run, run "tags", I see...

pprof code for goroutine profile (https://golang.org/src/runtime/pprof/pprof.go?s=6343:6376#L599) depends on runtime.GoroutineProfile, and that runtime function doesn't handle tags. But I agree that it would be nice if profiles other than CPU profile can generate tagged profile samples.

Comment From: ccfrost

Thank you, @hyangah. It would have helped me if the documentation for pprof labels had noted this. Can I ask for that change to the documentation? If it'll help, I'm happy to draft it.

I think that I would find tags helpful in heap and goroutine profiles. If someone would lend some guidance, I may be able to make time to work on the changes to add this. How does this sound, and is anyone interested enough to discuss the high-level changes and review the work?

Comment From: hyangah

@ccfrost Who would not love well-written documentation? Send a CL!

I don't think it's too hard to extend the goroutine profile to include the labels. Probably not difficult for any custom pprof.Profile (returned from pprof.NewProfile) that reports the 'current' status either.

But I am afraid it's challenging to extend profiles such as heap, block, or mutex because they involve never-decremented counters (e.g. the heap profile's data to support --alloc_* options).

@rsc and @aclements , what do you think about adding tags to other profile types?

Comment From: ianlancetaylor

CC @matloob

Comment From: aclements

Hi Chris!

I agree labels would be good to add to "bounded" profiles. Unfortunately, most of the profile types are "monotonic", so we'd have to hold on to labels created from the beginning of the program. Even the --inuse_* heap profiles have this problem because they can report allocations from early in the program. Maybe for heap profiles we could treat labels as garbage-collectable, so we could still report labels in inuse profiles without having to hold on to all labels forever, even if we can't report them for alloc profiles.

@matloob, I remember there was some question about whether/how C++ pprof handles this for heap profiles. Did we ever figure out the answer?

Comment From: hyangah

@aclements

Does runtime keep track of the labels at the time of the object allocation already? An object can be freed by a different goroutine so the labels for the profiled block should be stored somewhere for use in free.

When using the experimental API in cl/102755 (background profiling support), we kept track of a mapping from a profiled block to the labels when the object was created, in user space. One drawback is that, if the reader is slow and a record gets lost due to overflow, that will lead to a leak in the map. If the runtime actively maintains the mapping, the experimental API can be changed to take advantage of it.

Comment From: aclements

Does runtime keep track of the labels at the time of the object allocation already? An object can be freed by a different goroutine so the labels for the profiled block should be stored somewhere for use in free.

It doesn't currently. Right now, we just have a hash table from allocation stack to bucket+memRecord, and profiled allocations have a link to the right bucket. The memRecord is just a few cumulative stats. Adding labels would probably require adding a second level to the memRecord that mapped from label set to per-label-set stats. Alternatively, we could key the main hash table by both stack and label set, but then it's unclear how to clean things out of that hash table when the last object with a given label set has been freed.

When using the experimental API in cl/102755 (background profiling support), we kept track of a mapping from a profiled block to the labels when the object was created, in user space. One drawback is that, if the reader is slow and a record gets lost due to overflow, that will lead to a leak in the map. If the runtime actively maintains the mapping, the experimental API can be changed to take advantage of it.

Yes, if the runtime was tracking the label set it could be reported on both the alloc and the free.

But isn't there a more fundamental problem here? With a log-based heap profile, if you drop records, won't that permanently skew the profile? Unlike a CPU profile, a log-based heap profile needs to match up the allocation and free records; it's okay to lose both, but not one or the other. I would think the system would have to guarantee that it can report the free record if it reports the allocation record, for example, by chaining overflow free records through the heap specials until the reader catches up.

(FWIW, I'm a big fan of log-based profiling systems because it gets aggregation out of the runtime, which both simplifies the runtime and makes it possible to plug in more sophisticated aggregation. This is something I think Linux perf got really right.)

Comment From: hyangah

The memRecord is just a few cumulative stats. Adding labels would probably require adding a second level to the memRecord that mapped from label set to per-label-set stats. Alternatively, we could key the main hash table by both stack and label set, but then it's unclear how to clean things out of that hash table when the last object with a given label set has been freed.

The first one (two-level memRecord) seems more feasible and simpler for mProf_Flush*.

But isn't there a more fundamental problem here? With a log-based heap profile, if you drop records, won't that permanently skew the profile?

Agree the runtime must not drop the free records of sampled allocations.

Regarding the log-based heap profile: inspired by cl/102755, I experimented with generating labeled heap profiles using it, but encountered a couple of challenges in addition to dealing with dropped free-event records.

1) Dropping the allocation record is not ideal either, if the allocated object turns out to be a long-lived, large object that users may be interested in. We need to start the log reader as early as possible. If the log reader lives outside the runtime, how can we ensure it 'starts' and 'starts as early as possible'?

2) Implementation of the complex 3-stage algorithm (implemented in mprof.go). The goal of the algorithm is to avoid bias towards malloc and provide more stable info - currently the inuse_* stat is the stat from the 'latest' GC. So I think the runtime should log records about GC stages if what this 3-stage algorithm offers is really important.

Comment From: aclements

Dropping the allocation record is not ideal either, if the allocated object turns out to be a long-lived, large object that users may be interested in.

That's true, though this can always happen because it's statistical anyway (albeit more likely to sample large objects). Overflow allocation records could also be threaded through the heap specials.

We need to start the log reader as early as possible. If the log reader lives outside the runtime, how can we ensure it 'starts' and 'starts as early as possible'?

If there's a canonical log reader (say, in runtime/pprof), we can always special-case this. If there isn't, then things get trickier. One could imagine feeding in synthetic records for the current state of the heap. Obviously you'd miss out on objects that had already been freed before starting the log reader, but you wouldn't miss out on long-lived objects that had already been allocated.

Implementation of the complex 3-stage algorithm (implemented in mprof.go). The goal of the algorithm is to avoid bias towards malloc and provide more stable info - currently the inuse_* stat is the stat from the 'latest' GC. So I think the runtime should log records about GC stages if what this 3-stage algorithm offers is really important.

Yes. It seems quite reasonable to feed GC records into the log, and that should make it possible to implement the 3 stage algorithm.

Comment From: komuw

I spent half an hour today wondering what I was doing wrong, since my tags/labels were not showing up in my profiles - that is, till I came across this issue. I have sent out https://go-review.googlesource.com/c/go/+/131275, hoping to address the documentation bit.

Comment From: gopherbot

Change https://golang.org/cl/131275 mentions this issue: runtime/pprof: document labels bug

Comment From: gopherbot

Change https://golang.org/cl/189317 mentions this issue: runtime/pprof: Mention goroutine label heritability

Comment From: gopherbot

Change https://golang.org/cl/189318 mentions this issue: runtime/pprof: Plumb labels for goroutine profiles

Comment From: anacrolix

Will this be merged soon?

Comment From: jared2501

@ianlancetaylor @andybons - any chance we could revive the CR above that adds labels to goroutine profiles? I have an annoying goroutine leak that I'm debugging, for which this would be absolutely invaluable!

Comment From: ianlancetaylor

I'm not sure I understand that CL. Does "go tool pprof" understand the labels that it emits? I don't see the code for that, but I could easily be missing it.

CC @martisch who has looked at some of this code.

Comment From: jared2501

Scanning through the CL and surrounding code, it appears to add labels to debug=0 and debug=1 outputs. The debug=0 format is protobuf and is used by go tool pprof, and the protobufs already have "labels" modeled (see profile.proto#L120).

Looking through the google/pprof library, it appears to have had support for labels since its OSS release (i.e. here they are in profile.go#L75). Also, I know that labels are already displayed in go tool pprof's output for CPU profiles. So I do believe that go tool pprof will understand the labels the CL emits.

Also, since debug=1 is text, it would still be super helpful to have something like this CL, since I could still access this information using curl!

Comment From: ianlancetaylor

Thanks. Can you point me to the code where the labels: syntax is parsed?

Comment From: jared2501

@ianlancetaylor So I don't believe anything actually parses the labels: syntax, since I believe that the text output (i.e. debug=1) is there only for debugging using curl. The go tool pprof only uses the binary protobuf format, which is changed in pprof.go#L477 of that PR.

Comment From: jared2501

I.e. go tool pprof vendors google/pprof, and the code to decode labels is here: google/pprof/profile/encode.go#L434

Comment From: ianlancetaylor

Thanks.

CC @hyangah @pjweinbgo @matloob

Comment From: martisch

go tool pprof imports https://github.com/google/pprof

These samples in proto profile format generally allow labels: Proto definition.

The parsing happens with proto unmarshalling: link

The sample struct in go then contains Labels: https://github.com/google/pprof/blob/d6d2dc122c202476d1eacc56b587de6a95d4fae5/profile/profile.go#L72

"labels: " seems to be used to construct a key to count profiles here, but the key seems only to be printed in the text debug output.

So I would agree with @jared2501 that the "labels: " part isn't parsed but is purely for debug output and for generating unique keys for grouping samples with the same labels.

Comment From: dfinkel

Hmm, it seems that I forgot to subscribe to this issue before sending out the CL under discussion.

Now I know why my CL (https://golang.org/cl/189318) suddenly woke up last week. :)

Anyway, go tool pprof has supported tags and labels for a while, and uses them when rendering CPU profiles, so that CL only needs to set the appropriate fields in the protobuf when rendering.

The assessment of the labels: line on the goroutines dump (visible by default with the net/http/pprof package as /debug/pprof/goroutine?debug=1) is correct. It's not supposed to be parsed by anything other than human eyes, although it is used for deduping/counting.

I generally consume goroutine dumps in text form, so having labels there is rather useful, IMO.

Based on @hyangah's comments, it looks like the CL will be submitted soon if no one voices strong objections.

Comment From: jared2501

Super cool to see this merge! Thanks everyone for the work on it :)

Comment From: jared2501

My next FR in this area would be to get the labels into the goroutines debug=2 traces ;) ... but at least this will help out enormously with debugging!

Comment From: dfinkel

I wanted to add them to the runtime stacktraces that are produced with debug=2, but there are a few complications that I didn't want to address in the first CL:

  • those traces are produced entirely within the runtime package, using the same code as the stack traces dumped for panics
  • the labelMap type lives in the runtime/pprof package, and we wouldn't want the runtime package to depend on its subpackage
  • that code runs during panics and other critical failures, so it really shouldn't allocate, which makes simple Go map iteration a bit iffy

38c2c12 was the low-hanging fruit of plumbing. I don't have time to get anything further in before the 1.15 freeze hits us later this week, but I want to loop back and push on some of the other traces (including the debug=2 dumps) eventually. I'll probably post some ideas for the necessary plumbing/accounting on this issue before writing anything more, though.

Comment From: hyangah

Can we split this into (at least) a couple of sub issues?

  • in_use (live) heap profile, and possibly other custom profiles: @aclements proposed log-based profiling inspired by the background CPU profiling. https://github.com/golang/go/issues/23458#issuecomment-378034420 Need to think about how the labels can differ between alloc/free or Profile.Add/Remove.

  • allocs/mutex/block profiles: no good, scalable solution has been proposed yet. They are monotonic, sampled profiles, so their labels are supposed to live forever. Maybe with the recent support for delta profiles we can consider adopting a log-based-profiling-like approach, but I am not sure.

  • goroutine profile with debug=2: as described in @dfinkel's recent comment, currently it is done by calling runtime.Stack. A separate unexported function (visible only in runtime/pprof) is a possibility.

  • threadcreates profile: broken. (https://github.com/golang/go/issues/6104)

Comment From: dfinkel

@hyangah thanks for enumerating the topics being discussed here.

WRT goroutine profiles with debug=2, we can use an approach to visibility similar to https://golang.org/cl/189318, with a private function moved into runtime/pprof via a go:linkname comment. What do you think about that taking a callback argument with the signature func(p unsafe.Pointer) string to decode the label-map associated with that goroutine?

Comment From: gopherbot

Change https://go.dev/cl/694119 mentions this issue: runtime: print goroutine labels in tracebacks