On August 14, 2025, Ted Unangst published a blog post, “what is the go proxy even doing?,” questioning the amount of traffic he had observed on his server humungus.tedunangst.com from proxy.golang.org. This issue tracks work to understand and fix the reported traffic.

Action Items

  • [x] disable refresh traffic for humungus
  • [x] add a proxy.golang.org FAQ entry for where to send reports of excessive traffic
  • [x] look for a newer example of the “thundering herd” / “flood” mentioned in the blog post
  • [ ] #75119
  • [ ] #75191

Go module mirror background

The Go module mirror caches modules for the Go ecosystem and is used by default in the go command, for increased availability and to reduce load on individual servers. To reduce latency as well as improve availability, the module mirror refreshes its cached information on a regular basis, as long as users have requested that information recently.

Relevant details about the module mirror’s cache operation and refresh policies include:

  • Module zip files containing recognized LICENSE files that allow redistribution are cached indefinitely and never refreshed.
  • Module zip files without recognized LICENSE files allowing redistribution are only held in cache for 30 days and then expire. (These module versions can be recognized by having no recognized license and no displayed docs on pkg.go.dev.) If a user request arrives for a zip file not in cache, the zip file is obtained on demand, cached, and then served back; the user sees this as high latency, as well as reduced availability if the upstream server is not available at that moment. To improve both latency and availability, the module mirror aims to refresh these module zip files once they are 25 days old, but only if they have been accessed recently: module versions that stop being accessed stop being refreshed.
  • The module mirror also caches version lists, as displayed by “go list -m -versions” and also used by “go get -u”, “govulncheck”, and other tools. Cached version lists expire after 30 minutes, and the module mirror refreshes them when they reach 25 minutes old, but again only if they have been accessed recently: version lists that stop being accessed stop being refreshed. The much shorter expiry reduces how long it takes for “go get -u” to see the latest version.
  • The module mirror also caches the result of branch and latest queries, used by “go get module@main” and by “go get module@latest” for modules with no tagged versions. These cached results also expire after 30 minutes and are refreshed at 25 minutes, again only if they have been accessed recently: query results that stop being accessed stop being refreshed. And again, the much shorter expiry reduces the amount of time that users see stale information.
  • For scaling purposes as well as failure containment, all the individual pieces of cached module information are stored and refreshed as independent units. That is, each module version zip file is handled separately, as is the version list and the result of each version query.
  • The module mirror does not store cached copies of entire upstream version control repositories. It only caches the output of “go mod download” and “go list -json -m --versions”. The repository cache is internal to the go command, and the module mirror does not break through that abstraction. This does mean that each refresh must re-download what it needs from the upstream repository, but the implementation limits the size of that information in a few key ways, especially when using Git.
  • When using Git, a fetch of a specific module version uses “git clone --depth=1”, to avoid downloading any repo history.
  • When using Git, a fetch of the version lists does need repo history, to understand where the go.mod files are in the repository and when they existed.
  • When using Git, both of those fetches are further optimized by saving a hash of a subset of the information available from “git ls-remote”. Refreshes run “git ls-remote” and recompute that hash. If it matches the cached hash, then the cached information is still up-to-date and can have its expiry extended, without downloading any extra data from the repository. The “go mod download -reuse” and “go list -reuse” flags implement this lightweight check, but only for Git.
  • When using Mercurial, these refreshes do full repository clones, because Mercurial does not provide anything like “git ls-remote” (“hg identify” has “--tags” and “--branches” flags, but they are disabled for remote repositories).
  • When using Subversion, refreshes are cheap since Subversion never downloads full repository history.
  • When using Fossil or Bazaar, refreshes must download full repository information.
  • In the past, we have disabled refresh entirely for domains that reported receiving too much refresh traffic. Their modules are still cached, but refreshes only happen on demand, during actual requests when the cached information is determined to be too old. For example, discussion on #44577 (before we implemented the lightweight Git refreshes) led to adding git.lubar.me and git.sr.ht to the no-refresh list.
  • The module mirror keeps logs related to module fetches and refreshes for only a small number of weeks, for privacy and operational reasons.

Initial Investigation

The blog post reported traffic sent to humungus.tedunangst.com. Unfortunately, the post, published on 2025-08-14, reports traffic from 2025-05-19, three months earlier. We no longer have logs from then, which makes it hard to piece together the exact root causes of the observed requests.

One important detail is that humungus responds with HTTP 429 (StatusTooManyRequests) to an “hg clone” of a given repo from a Google IP address unless at least 24 hours have passed since the last clone of that repo. I assume the server has done this since around 2025-03-04, when this code was committed.

Traffic to humungus has never been reported to us as problematic, except for the blog post. That is perhaps our fault, as there is not an obvious answer to where to send such a report. Others have used this issue tracker, but we could add a FAQ entry to proxy.golang.org offering that as an explicit option, perhaps with an email option for people who do not want to post publicly. In any event, humungus was not on the no-refresh list mentioned in the previous section. After preliminary investigation, we added humungus to the no-refresh list on August 21, but we continued to investigate root causes.

Our investigation has shown the following:

  • Some modules on humungus have LICENSE files, while others do not. For example gozstd has a LICENSE, meaning it displays docs and does not use module zip file refreshes, while webs does not have a LICENSE, meaning it does not display docs and does use module zip refreshes.
  • Humungus hosts 32 repos (not all Go modules) totaling 12 MB. The webs repo has 131 tags; miniwebproxy has 11; and the others all have fewer than 10.
  • For 24 hours on 2025-08-19, our logs show 904 total module-related fetches to humungus: 756 list fetches and 148 version fetches. Of these, 133 failed with I/O timeouts while fetching the HTML redirect page that would contain the hg server information, 762 were rejected during hg clone, and 9 succeeded. The total size of the rejected clones (size unpacked on disk, possibly smaller on the wire) would have been about 380 MB, or an average of roughly 35 kbit/s over the course of the day. That’s not a humongous amount of bandwidth but still more than we’d like to cause a small site operator.
  • In the 12 hours after we added humungus to the no-refresh list, our logs show 21 module-related fetches: 7 list fetches and 14 version fetches. Of these, 11 had rejected “hg clone” operations. This confirms that the refresh traffic is the problem.
  • Mercurial support for -reuse would essentially eliminate all refresh clones and is not as impossible as we originally believed. Issue #75119 tracks implementing it.

Update, 2025-08-25

See comment below. Added link to #75191 in Action Items above.

Comment From: ALTree

about 380 GB, or an average 40kbit/s over the course of the day

This is off by ~1000x: 380 GB/day is 35 Mb/s (either that, or it's 380 MB).

Comment From: rsc

Thanks @ALTree, yes, that was a typo (MB not GB). The humungus repos total 12MB so even cloning all of them 1000 times would only be 12 GB. Fixed.

Comment From: thediveo

I highly appreciate the detailed analysis giving better insight into the "Gopher's Workings", which will make diagnosing future problems much easier. Would you mind also writing a blog post so this wisdom is more readily accessible and visible?

Comment From: rsc

@thediveo I think it could make sense to do a blog post once the Mercurial changes and any other mitigations have gone live on the proxy, which would be after Go 1.26.0 in February.

Comment From: rsc

Update, 2025-08-29

Ted Unangst was able to find a more recent batch of coordinated requests to humungus and sent his logs. We checked the Go module mirror's fetch logs and proxy.golang.org request logs. The coordinated requests in the recent batch were ultimately the result of many coordinated incoming HTTP requests to proxy.golang.org for different tagged versions, not background refresh traffic. The proxy.golang.org traffic was all being sent from one IP address that was also fetching the corresponding sum.golang.org entries and using a Python user-agent string, suggesting a custom module crawler of some kind.

(As a reminder to any module crawler authors, whenever possible, please use https://proxy.golang.org/cached-only instead of https://proxy.golang.org, as documented on https://proxy.golang.org. It will be faster for you and for us!)
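
A crawler that follows this advice only needs to change its base URL. The sketch below builds request URLs using the GOPROXY protocol's `$module/@v/list` and `$module/@v/$version.info` paths, assuming the cached-only endpoint accepts the same paths as the main proxy (see proxy.golang.org for the authoritative description). The escape helper is a simplified stand-in for golang.org/x/mod/module.EscapePath and EscapeVersion: it applies the same case-encoding (each uppercase letter becomes "!" plus its lowercase form) but does no validation.

```go
package main

import (
	"fmt"
	"strings"
)

// cachedOnlyBase is the endpoint recommended for module crawlers.
const cachedOnlyBase = "https://proxy.golang.org/cached-only"

// escape applies GOPROXY case-encoding: each uppercase ASCII letter becomes
// '!' followed by its lowercase form. Unlike module.EscapePath, it does not
// validate the input.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// listURL returns the URL of the known-versions list for a module.
func listURL(base, modPath string) string {
	return base + "/" + escape(modPath) + "/@v/list"
}

// infoURL returns the URL of the .info metadata for one module version.
func infoURL(base, modPath, version string) string {
	return base + "/" + escape(modPath) + "/@v/" + escape(version) + ".info"
}

func main() {
	fmt.Println(listURL(cachedOnlyBase, "github.com/BurntSushi/toml"))
	// https://proxy.golang.org/cached-only/github.com/!burnt!sushi/toml/@v/list
	fmt.Println(infoURL(cachedOnlyBase, "golang.org/x/text", "v0.3.0"))
	// https://proxy.golang.org/cached-only/golang.org/x/text/@v/v0.3.0.info
}
```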

The row of dominoes that had to line up to cause the recent batch on humungus was:

  • Many HTTP requests to proxy.golang.org for different versions of the same module (from this Python crawler).
  • Module versions expired because of not having a LICENSE file.
  • Module versions not prefetched, because the background refresh clones had been rejected with 429s.

If not for the 429s, humungus would have seen background version refresh traffic of about one hg clone per tagged version every 25 days, and proxy.golang.org would have served the module version scan from its cache without contacting humungus at all.

As for what happened in May, it is hard to say why the coordinated batch hit humungus so quickly after Ted made the mirror aware of the new tagged version. When a new module version is indexed, it is added immediately to the index.golang.org feed. Ted says he was not himself running any programs that would have fetched different versions at that moment. Perhaps some crawler was watching the feed and reacted by trying to fetch a handful of versions from proxy.golang.org, which fell through to humungus for all the same reasons as above. That explanation is not entirely satisfying, but it is plausible and perhaps the best explanation we will ever have.

The module version scans explain the traffic bursts on humungus. The background refresh traffic, which contributed more to the daily average than any traffic bursts, was also too high. Investigating the background refresh logic, we have realized that the refresh of version lists and version queries is not optimally paced. See #75191 for those details.