Recipe10.2.Generating Easily Remembered Somewhat-Random Passwords

Recipe 10.2. Generating Easily Remembered Somewhat-Random Passwords

Credit: Luther Blissett

Problem

You need to create new passwords randomlyfor example, to assign them automatically to new user accounts. You want the passwords to be somewhat feasible to remember for typical users, so they won't be written down.

Solution

We can use a pastiche approach for this, mimicking letter n-grams in actual English words. A grander way to look at the same approach is to call it a Markov Chain Simulation of English:

import random, string class password(object):     # Any substantial file of English words will do just as well: we     # just need self.data to be a big string, the text we'll pastiche     data = open("/usr/share/dict/words").read( ).lower( )     def renew(self, n, maxmem=3):         ''' accumulate into self.chars `n' random characters, with a             maximum-memory "history" of `maxmem` characters back. '''         self.chars = [  ]         for i in range(n):             # Randomly "rotate" self.data             randspot = random.randrange(len(self.data))             self.data = self.data[randspot:] + self.data[:randspot]             # Get the n-gram             where = -1             # start by trying to locate the last maxmem characters in             # self.chars.  If i<maxmem, we actually only get the last             # i, i.e., all of self.chars -- but that's OK: slicing             # is quite tolerant in this way, and it fits the algorithm             locate = ''.join(self.chars[-maxmem:])             while where<0 and locate:                 # Locate the n-gram in the data                 where = self.data.find(locate)                 # Back off to a shorter n-gram if necessary                 locate = locate[1:]             # if where==-1 and locate='', we just pick self.data[0] --             # it's a random item within self.data, tx to the rotation             c = self.data[where+len(locate)+1]             # we only want lowercase letters, so, if we picked another             # kind of character, we just choose a random letter instead             if not c.islower( ): c = random.choice(string.lowercase)             # and finally we record the character into self.chars             self.chars.append(c)     def _ _str_ _(self):         return ''.join(self.chars) if _ _name_ _ == '_ _main_ _':     "Usage: pastiche [passwords [length [memory]]]"     import sys     if len(sys.argv)>1: dopass = int(sys.argv[1])     else: dopass = 8     if len(sys.argv)>2: length = int(sys.argv[2])     else: length = 10     if len(sys.argv)>3: memory = int(sys.argv[3])     else: memory = 3     onepass = password( )     for i in range(dopass):         onepass.renew(length, memory)         print onepass

Discussion

This recipe is useful when creating new user accounts and assigning each user a different, random password: it uses passwords that a typical user will find it feasible to remember, hopefully so they won't get written down. See Recipe 10.1 if you prefer totally random passwords.

The recipe's idea is based on the good old pastiche concept. Each letter (always lowercase) in the password is chosen pseudo-randomly from data that is a collection of words in a natural language familiar to the users. This recipe uses the file that is /usr/share/dict/words supplied with Linux systems (on my machine, a file of over 45,000 words), but any large document in plain text will do just as well. The trick that makes the passwords sort of memorable, and not fully random, is that each letter is chosen based on the last few letters already picked for the password as it stands so far. Thus, letter transitions will tend to be "repetitive" according to patterns that are familiar to the user.

The code in the recipe takes some care to locate each time a random occurrence, in the text being pastiched, of the last maxmem characters picked so far. Since it's easy to find the first occurrence of a substring, the code "rotates" the text string randomly, to ensure that the first occurrence is a random one from the point of view of the original text. If the substring made up with the last maxmem characters picked is not found in the text, the code "backs down" to search for just the last maxmem-1, and so on, backing down until, worst case, it just picks the first character in the rotated text (which is a random character from the point of view of the original text).

A break in this Markov Chain process occurs when this picking procedure chooses a character that is not a lowercase letter, in which case, a random lowercase letter is chosen instead (any lowercase letter is picked with equal probability).

Here are a couple of typical sample runs of this pastiche.py password-generation script:

[situ@tioni cooker]$ python pastiche.py yjackjaceh ackjavagef aldsstordb dingtonous stictlyoke cvaiwandga lidmanneck olexnarinl [situ@tioni cooker]$ python pastiche.py ptiontingt punchankin cypresneyf sennemedwa iningrated fancejacev sroofcased nryjackman

As you can see, some of these are definitely word-like, others less so, but for a typical human being, none are more problematic to remember than a sequence of even fewer totally random, uncorrelated letters. No doubt some theoretician will complain (justifiably, in a way) that they aren't as random as all that. Well, tough. My point is that they had better not be, if some poor fellow is going to have to remember them! You can compensate for this limitation by making them a bit longer. If said theoretician demonstrates how to compute the entropy per character of this method of password generation (versus the obvious 4.7 bits/character, the base-2 logarithm of 26, for passwords made up of totally random lowercase letters), now that would be a useful contribution indeed. Meanwhile, I'll keep generating passwords this way, rather than in a totally random way. If nothing else, it's the closest thing I've found to a useful application for the lovely pastiche concept.

The concept of passwords that are not totally random, but rather a bit more memorable, goes back a long wayat least to the 1960s and to works by Morrie Gasser and Daniel Edwards. A Federal Information Processing Standard (FIPS), FIPS 181, specifies in detail how "pronounceable" passwords are to be generated; see http://www.itl.nist.gov/fipspubs/fip181.htm.

Recipe10.2.Generating Easily Remembered Somewhat-Random Passwords