go 1.8.4
Random sampling is a common way to draw items from a collection, like random.sample(population, k) in Python.
I hope there is a plan to add it to Go.
Comment From: ALTree
The reason such a method is useful in Python is that you can pass it different kinds of collections, so it provides a nice unified interface.
In Go you can only draw a sample from slices and maps, but there's no overloading, so you'd have to add two different functions (SampleSlice and SampleMap), losing the advantage of a nice unified interface for random sampling.
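To make the two-function point concrete, here is a rough sketch of what such a pair could look like. Only the names SampleSlice and SampleMap come from the comment above; the signatures, the use of generics (which assumes Go 1.18 or later), and the decision to have SampleMap return keys are assumptions made purely for illustration, not a proposed API.

```go
package main

import (
	"fmt"
	"math/rand"
)

// SampleSlice returns k elements of s chosen uniformly without replacement.
// The signature is an assumption for illustration.
func SampleSlice[T any](s []T, k int) []T {
	if k < 0 || k > len(s) {
		panic("SampleSlice: k out of range")
	}
	out := make([]T, k)
	// Take the first k indices of a random permutation of 0..len(s)-1.
	for i, j := range rand.Perm(len(s))[:k] {
		out[i] = s[j]
	}
	return out
}

// SampleMap returns k keys of m chosen uniformly without replacement.
// It needs its own signature because a map is not a slice.
func SampleMap[K comparable, V any](m map[K]V, k int) []K {
	// Collect the keys first; their order here does not matter because
	// SampleSlice picks positions uniformly at random.
	keys := make([]K, 0, len(m))
	for key := range m {
		keys = append(keys, key)
	}
	return SampleSlice(keys, k)
}

func main() {
	fmt.Println(SampleSlice([]string{"a", "b", "c", "d"}, 2))
	fmt.Println(SampleMap(map[string]int{"x": 1, "y": 2, "z": 3}, 2))
}
```

Even with type parameters, the slice and map cases still end up as two separately named functions, which is the loss of a unified interface described above.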
Comment From: pciet
The range random map access implementation detail does invite using it to get random elements.
A built-in like this would be useful for me:
// pick returns a slice (len = n) of pseudorandom elements
// in unspecified order from c, which is an array, slice, or map.
for i, e := range pick(c, n) {
	// use i and e
}
Comment From: ALTree
The range random map access implementation detail does invite using it to get random elements.
This is a very bad idea if you need the distribution to be uniform. It won't be. We should discourage doing that.
Comment From: uluyol
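This is easy enough today:
// Take the first k indices of a random permutation of the population.
idx := rand.Perm(len(population))
sample := make([]T, k)
for i := 0; i < k; i++ {
	sample[i] = population[idx[i]]
}
or with go1.10:
// Shuffle a copy of the population and keep the first k elements.
sample := append([]T(nil), population...)
rand.Shuffle(len(sample), func(i, j int) { sample[i], sample[j] = sample[j], sample[i] })
sample = sample[:k]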
Comment From: dongweigogo
@uluyol LGTM
Comment From: josharian
Sounds like this can be closed. Please comment if that’s not right.
Comment From: znkr
Not trying to reopen this, but I just ran into this problem: I wanted a small sample from a very large slice. I implemented sampling myself because rand.Shuffle would have shuffled the whole slice, which is a bit inefficient. I ended up using this code:
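// rand is math/rand/v2. Requires n <= len(population), otherwise the loop never ends.
picked := make(map[int]struct{}, n) // indices already chosen
sample := make([]string, 0, n)
for len(sample) < n {
	i := rand.IntN(len(population))
	if _, ok := picked[i]; ok {
		continue // index already used, draw again
	}
	sample = append(sample, population[i])
	picked[i] = struct{}{}
}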
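For reference, the same retry-on-collision loop can be wrapped in a function. This is only a sketch: the name sampleSparse, the generic signature (Go 1.18 or later), and the use of rand.Intn from math/rand rather than rand.IntN from math/rand/v2 are assumptions made for illustration. The range check is there because the loop would never terminate if n exceeded len(population).

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleSparse (hypothetical name) draws n distinct elements from population
// by picking random indices and retrying whenever an index was already used.
// It is meant for n much smaller than len(population).
func sampleSparse[T any](population []T, n int) []T {
	if n < 0 || n > len(population) {
		panic("sampleSparse: n out of range")
	}
	picked := make(map[int]struct{}, n) // indices already chosen
	sample := make([]T, 0, n)
	for len(sample) < n {
		i := rand.Intn(len(population))
		if _, ok := picked[i]; ok {
			continue // collision, draw again
		}
		picked[i] = struct{}{}
		sample = append(sample, population[i])
	}
	return sample
}

func main() {
	population := make([]int, 1000000)
	for i := range population {
		population[i] = i
	}
	fmt.Println(sampleSparse(population, 5))
}
```

Memory and expected time depend on n rather than on the population size as long as n stays small relative to len(population). As n approaches len(population), collisions become frequent and a shuffle-based approach is likely the better fit.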
Comment From: josharian
https://github.com/josharian/vitter might be helpful. (I had it just sitting around on my disk, so I tossed it into a module just now.)
Comment From: josharian
(Incidentally, the use case of dealing with sparse sampling from very large populations is exactly where Vitter's Algorithm D shines, and why I implemented it.)
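For readers unfamiliar with sequential sampling, the sketch below shows the simpler selection sampling technique (Knuth's Algorithm S), not Vitter's Algorithm D and not the code in the linked repository. It visits every index once and keeps index i with probability (still needed) / (still remaining), so it yields k indices in increasing order. Algorithm D produces the same kind of output but skips ahead instead of testing every index, which is what makes it attractive for sparse samples. The function name selectionSample and its signature are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// selectionSample (hypothetical name) returns k distinct indices from
// 0..n-1 in increasing order, each k-subset being equally likely.
// This is selection sampling (Algorithm S), which runs in O(n) time.
func selectionSample(n, k int) []int {
	if k < 0 || k > n {
		panic("selectionSample: k out of range")
	}
	picked := make([]int, 0, k)
	for i := 0; i < n && len(picked) < k; i++ {
		remaining := n - i       // indices not yet examined, including i
		needed := k - len(picked) // indices still to be chosen
		// Keep i with probability needed/remaining. When needed equals
		// remaining, every remaining index is kept.
		if rand.Intn(remaining) < needed {
			picked = append(picked, i)
		}
	}
	return picked
}

func main() {
	population := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	for _, i := range selectionSample(len(population), 3) {
		fmt.Println(i, population[i])
	}
}
```

Because it returns indices rather than elements, the same function works for any indexable collection without copying it.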