Recipe 18.4. Generating Random Samples Without ReplacementCredit: Raymond Hettinger ProblemYou need to generate random samples without replacement out of a "population" (the integers between included and some n excluded), and you want better memory consumption characteristics than random.sample provides. SolutionA generator for this purpose requires only constant memory and makes a small number of calls to random.random: import random def sample(n, r): " Generate r randomly chosen, sorted integers from [0,n) " rand = random.random pop = n for samp in xrange(r, 0, -1): cumprob = 1.0 x = rand( ) while x < cumprob: cumprob -= cumprob * samp / pop pop -= 1 yield n-pop-1 Discussionrandom.sample(xrange(10), 3) produces output statistically equal to list(sample(10, 3)) using this recipe's sample. Differently from random.sample(xrange(n), r), this recipe's sample(n, r) requires a bounded amount of memory (which does not grow with either r or n) and is guaranteed to make only r calls to random.random. Moreover, this recipe's sample yields the r numbers of the sample in sorted order, while random.sample returns them in random orderwhich may be insignificant or a crucial advantage one way or the other, depending on your application's needs. A definite advantage of random.sample is that its running time is O(r), while this recipe's sample function's running time is O(n). This recipe was inspired by Knuth's Algorithm S in Donald E. Knuth, The Art of Computer Programming, Volume 3, Seminumerical Algorithms, in section 3.4.2. However, this recipe has one improvement over Knuth's algorithm: by tracking a cumulative probability for each selection, this recipe eliminates the need to make n calls to random.random. A potential major improvement would be to find a direct formula giving the same result as the inner loop: given x, samp, and pop, compute the index of the first sample. Finding this formula would reduce the running time to O(r). See AlsoLibrary Reference and Python in a Nutshell docs about random. |