#!watchflakes
default <- pkg == "runtime:cpu4" && test == "TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1"

Issue created automatically to collect these failures.

Example (log):

=== RUN   TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1
    metrics_test.go:1065: lock contention growth in runtime/pprof's view  (0.067019s)
    metrics_test.go:1066: lock contention growth in runtime/metrics' view (0.067012s)
    metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=56713089
    metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200)
--- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.10s)

watchflakes

Comment From: gopherbot

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime:cpu4" && test == "TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1"
2024-07-07 19:43 gotip-linux-ppc64le_power9 go@5565462a runtime:cpu4.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log) === RUN TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 metrics_test.go:1065: lock contention growth in runtime/pprof's view (0.067019s) metrics_test.go:1066: lock contention growth in runtime/metrics' view (0.067012s) metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=56713089 metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200) --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.10s)

watchflakes

Comment From: mknyszek

@rhysh Any chance this is an old flake that has probably been resolved? Does it appear before any of your fix CLs? Thanks.

Comment From: rhysh

Hmm, no, that commit is between go1.23rc1 and go1.23rc2.

It's powerpc64, which has had problems with this test in terms of its nanotime and cputicks clocks not lining up. But the part of the test that failed here is in counting the contention events.

Maybe cputicks returned 0 at the start of lock2, and we don't think profiling is enabled: https://github.com/golang/go/blob/go1.23rc2/src/runtime/mprof.go#L664

Maybe cputicks went backwards, and we dropped the event: https://github.com/golang/go/blob/go1.23rc2/src/runtime/mprof.go#L680

That seems more likely — cputicks is not guaranteed to be monotonic: https://github.com/golang/go/blob/go1.23rc2/src/runtime/cputicks.go#L9

But this test relies on it being monotonic, and (between the two calls a thread makes in lock2) strictly increasing.

Comment From: gopherbot

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime:cpu4" && test == "TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1"
2024-07-26 20:11 gotip-openbsd-ppc64 go@77a3c3b9 runtime:cpu4.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log) === RUN TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 metrics_test.go:1065: lock contention growth in runtime/pprof's view (6.160510s) metrics_test.go:1066: lock contention growth in runtime/metrics' view (6.160458s) metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=6098006161 metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200) --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (9.25s)

watchflakes

Comment From: gopherbot

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime:cpu4" && test == "TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1"
2024-08-07 16:07 gotip-openbsd-ppc64 go@a4a130f6 runtime:cpu4.TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (log) === RUN TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 metrics_test.go:1065: lock contention growth in runtime/pprof's view (4.179351s) metrics_test.go:1066: lock contention growth in runtime/metrics' view (4.179299s) metrics_test.go:1104: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=199 value=4145036493 metrics_test.go:1192: mutex profile reported contention count different from the known true count (199 != 200) --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (4.23s)

watchflakes

Comment From: mauri870

Similar failure, this time on Linux Arm https://github.com/golang/go/issues/68781

Comment From: rhysh

The block and (user-space) mutex profiles have a minimum number of cycles for the events they record ... and they clamp any non-conforming events to that minimum. The block profile has a minimum of 1, and the mutex profile has a minimum of 0.

The runtime-internal mutex profile has a minimum number of cycles for the events it records ... but it drops any non-conforming events.

I haven't been able to reproduce the failures, so I don't know if this is the explanation. But it's a relevant-looking difference, so I'll send a patch to clamp instead of drop.

https://github.com/golang/go/blob/go1.23rc2/src/runtime/mprof.go#L505-L508 https://github.com/golang/go/blob/go1.23rc2/src/runtime/mprof.go#L851-L854 https://github.com/golang/go/blob/go1.23rc2/src/runtime/mprof.go#L679-L682

Comment From: gopherbot

Change https://go.dev/cl/604355 mentions this issue: runtime: record all sampled mutex profile events