@rsc says that watchflakes has code to detect broken builders. It would be nice to turn that awareness into issues, especially for breakages on x/ repos, where problems tend to go unnoticed for a long time.

cc @cherrymui, @bcmills

Comment From: bcmills

I suggest:

- If the main repo is broken, watchflakes should automatically file a single issue for all affected builders, and automatically close the issue if/when all of those builders pass again.
- If only a specific repo is broken, watchflakes should automatically file a single issue for all affected builders, and automatically close it if/when they pass again.
- If watchflakes posts more than N (5?) failures for a single builder, GOOS, or GOARCH in a 24h period, it should file an issue with a default rule for the affected builder, GOOS, or GOARCH (to avoid spamming the issue tracker with duplicates). (#59379 is an example of such an issue.)

Comment From: bcmills

I just filed #61891 for a runtime/pprof test failure introduced about a week ago. It would be nice not to have to file issues like that manually.

Comment From: bcmills

Some more examples of test failures that had to be reported manually:

- Subrepo failures in x/mobile and x/pkgsite-metrics due to the changes for #61035.
- Long-standing test failures in x/telemetry.
- #62137
- #62138

As far as I can tell those failures were only noticed because I happened to look at the dashboard this morning.

Comment From: bcmills

More examples this week:

- https://go.dev/cl/527758 was broken on js and wasip1 since last Wednesday (now fixed).
- The -longtest builders have been broken on x/tools since last Tuesday (#62703).
- The runtime package has been failing two tests on the ios builder since Sep. 8 (#62700, #62671).
- x/telemetry has been persistently broken on js/wasm (now fixed) and wasip1/wasm (#62704) since Aug. 21.

Comment From: bcmills

  • x/build has been broken on windows/386 since June (#63243).

Comment From: bcmills

  • #63794 also had to be reported manually, although fortunately it was only ~1 day after the breakage occurred.

Comment From: cherrymui

I found the failure shortly after I submitted the CL, and sent a CL to fix it. I didn't file an issue.

Comment From: findleyr

Right now, there are several broken builders on the dashboard, some going undetected for over a week. It would be good to prioritize this issue for the next friction fixit.

Comment From: gopherbot

Change https://go.dev/cl/601439 mentions this issue: cmd/watchflakes: report consistent failures at top

Comment From: gopherbot

Change https://go.dev/cl/602036 mentions this issue: cmd/watchflakes: add the ability to query for broken bots

Comment From: dmitshur

There's a recent case where watchflakes doesn't seem to be reporting a consistent failure. Filed #68753 for it.