Back to the email
M

That's a really elegant algorithm, which I hadn't heard of before. Thanks for sharing it!

I wondered if it's commonly used, so I checked the implementation of Python's random.sample() method, but found it does something different: it either performs the first k iterations of a Fisher-Yates shuffle (when k is close to n) [1], or it repeats draws (when k is small relative to n) [2].

The second algorithm looks suboptimal, in that it generates more than k random numbers, and at first glance it seems like it might be possible to fix this by changing while j in selected: j = randbelow(n) into if j in selected: j = n to turn it into Floyd's algorithm.

Unfortunately this is not permissible, because random.sample() returns a list, not a set, and the documentation species the result is a uniformly random permutation [3], which Floyd's algorithm as presented doesn't provide.

Separately, I found a discussion of the algorithm in ACM's Programming Pearls [4], which gives some background on its invention. You might find it interesting because it gives a third type of proof, which is based on a more general algorithm that returns a randomly permuted array (rather than a set) of k distinct elements. note that this provides exactly the behavior required by the Python documentation, which was lost in the more succinct version returning a set.

Just for fun I implemented the general algorithm in Python [5]. It's easy to see how the two versions are related, though the additional bookkeeping required makes the second algorithm not quite as elegant as the first. In theory, floyd2() could replace the Python standard library implementation, since it always produces the result in O(k) time and space and exactly k calls to randbelow(), which is slightly better than the existing implementation, in theory. However, I suspect it's slightly slower in practice. Still, it was fun to explore the possibilities.

  1. https://github.com/python/cpython/blob/95cbd4a232d9a68680059a0cb42e3f0d2994e206/Lib/random.py#L442-L448
  2. https://github.com/python/cpython/blob/95cbd4a232d9a68680059a0cb42e3f0d2994e206/Lib/random.py#L450-L457
  3. https://docs.python.org/3/library/random.html#random.sample
  4. https://dl.acm.org/doi/pdf/10.1145/30401.315746
  5. https://gist.github.com/maksverver/e1fefb559da2133fdea3c5804683c725