Section 8.4. Designing a Challenge Question Authentication System | Security and Usability: Designing Secure Systems That People Can Use

8.4. Designing a Challenge Question Authentication System

Previous sections presented options for question and answer types and criteria upon which to evaluate a challenge question system design. This section discusses the design of a complete authentication system.

8.4.1. Determining the Number of Questions to Use

Usability tends toward requiring fewer questions and answers. This lessens the recall requirements for an individual, and also introduces fewer repeatability mistakes. For reasons of security, however, it is often necessary that more than one question-answer pair be registered by a user. This is to ensure a sufficient difficulty for either guessing or observing the answer. To ensure a sufficient level of protection against guessing, the entropy for the answers should provide a level of security similar to that for routine authentication (that may be performed with a password). In situations in which no complementary security measures are used (see the later section, "Complementary Security Techniques"), the entropy for the answers should be at least that for the routine authentication. In terms of guessability considerations, the strength of a challenge question system can be measured explicitly against password-based authentication. For example, an 8-character password constructed from the set of 52 upper- and lowercase characters, 10 numbers, and 32 punctuation characters, has approximately 2⁵² possible passwords. An 8-character answer to a question that uses only lowercase characters has only 2³⁸ possibilities.

Unfortunately, this number is misleading. Answers to questions cannot be expected to conform to the same, strict rules as for passwords; otherwise, the answers effectively become passwords. Instead, we would expect many answers to be dictionary words. This can be a problem, as most dictionaries have only between 2¹⁶ and 2²⁰ words, while studies show that many adults have vocabularies between 2¹⁵ and 2¹⁷ words. Thus, even with these extremely optimistic values, at least two questions would need to be asked to ensure security similar to that provided by an 8-character password. In addition, as discussed earlier under "Security Criteria," observability is important, but unfortunately is less quantifiable.
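The arithmetic behind these comparisons can be checked with a short calculation. The following is a minimal sketch, assuming that every password character and every answer is chosen uniformly and independently (an optimistic assumption for real users); the function names are illustrative only.

```python
import math

def bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)

def questions_needed(target_bits: float, bits_per_answer: float) -> int:
    """Smallest number of independent answers whose combined entropy meets the target."""
    return math.ceil(target_bits / bits_per_answer)

# 8-character password over 52 letters, 10 digits, and 32 punctuation characters.
password_bits = bits(52 + 10 + 32, 8)    # about 52.4 bits
# 8-character answer restricted to lowercase letters.
lowercase_bits = bits(26, 8)             # about 37.6 bits

# Answers drawn from a vocabulary or dictionary of 2^15, 2^17, or 2^20 words.
for word_bits in (15, 17, 20):
    print(word_bits, questions_needed(password_bits, word_bits))
```

Under these assumptions the sketch asks for three answers at 20 bits each and four at 15 or 17 bits each to match the password fully, which is consistent with the lower bound of two questions given above.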

When more than one question is asked, both the interface and the administrative storage for the answers should ensure that multiple answer attempts are all validated before an indication of success or failure is given. Through the interface, for example, if two questions are asked, an indication of success or failure should be given only when both questions have been answered. If not, an attacker can guess answers to one question at a time, even though the entropy level of only one question might not provide a sufficient level of security. Similarly, as with the storage of passwords, answers should be obscured (hashed), but additionally, when multiple answers are used, they should be obscured in such a way that if the obscured answers are compromised, an attacker would have to guess all answers before determining the success of his guess. This can be achieved, for example, by inputting all answers to a single hash rather than by separately hashing each answer.
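Hashing all answers together might look like the following. This is a sketch under stated assumptions rather than a prescribed design: the use of a random salt, PBKDF2 with SHA-256, the iteration count, and the length-prefixed encoding are all choices made here for illustration and are not taken from the text.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment

def _encode(answers: list[str]) -> bytes:
    # Length-prefix each answer so that ["ab", "c"] and ["a", "bc"]
    # produce different inputs to the hash.
    parts = []
    for answer in answers:
        raw = answer.encode("utf-8")
        parts.append(len(raw).to_bytes(4, "big") + raw)
    return b"".join(parts)

def register(answers: list[str]) -> tuple[bytes, bytes]:
    """Return (salt, digest) computed over all answers at once."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _encode(answers), salt, ITERATIONS)
    return salt, digest

def verify(answers: list[str], salt: bytes, digest: bytes) -> bool:
    """Succeed only if every answer is correct; no per-answer feedback exists."""
    candidate = hashlib.pbkdf2_hmac("sha256", _encode(answers), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because only one digest is stored, an attacker who obtains it cannot confirm a guess for one answer without also guessing the others. The same property means that a system storing answers this way cannot report which individual answer was wrong.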

Variations exist where the number of questions presented at recovery is less than the number of questions registered. There are at least two models:

The user registers n questions, but is presented only t < n questions upon recovery. All t questions must be answered properly in order for the recovery process to continue.

The user registers n questions, and is presented t ≤ n questions upon recovery. Differing from the previous model, only some of the presented questions (fewer than t) need to be answered properly in order for the recovery process to continue.

The first model is an attempt to offer a level of security equivalent to that of n questions, but to provide a usability benefit at the time of recovery, with fewer questions posed to the user. However, the usability benefits appear only to reduce the time required for recovery and do not affect the arguably more important concerns of memorability and repeatability (the user still has to remember the answers for n questions, as it is not known what questions will be posed at recovery). Yet there is some benefit for users who register n questions, but after a period of time happen to forget the answers to some of these questions. The purpose of the second option is to tolerate mistakes upon answer presentation. However, it seems that an additional question is being used to tolerate such mistakes whereas a more usable system might attempt to reduce the number of questions used.
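The two models differ only in how many of the presented answers must be correct. A minimal sketch, with hypothetical function names and the assumption that all answers have already been collected before any result is evaluated:

```python
import secrets

def pick_questions(registered_ids: list[str], t: int) -> list[str]:
    """Choose t of the n registered questions to present at recovery."""
    if not 0 < t <= len(registered_ids):
        raise ValueError("t must be between 1 and n")
    return secrets.SystemRandom().sample(registered_ids, t)

def recovery_allowed(results: list[bool], required: int) -> bool:
    """results[i] is True if the i-th presented answer matched.

    First model:  required == len(results), so every answer must be right.
    Second model: required <  len(results), so some mistakes are tolerated.
    """
    return sum(results) >= required
```

Note that the second model needs a per-answer comparison, so it does not combine directly with storing all answers under a single hash.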

For a set of candidate questions, some form of question grouping might be beneficial. For example, supposing that three questions are to be registered, it may be advantageous to require that one fixed and two controlled questions be selected, and for these questions, that a combination of fact-based and opinion-based questions be used. Alternatively, questions might be classified based on their topic so that users might have to select one question that required them to enter a "date" response, while the second might require a numeric response and the third, an alphabetic response. Finally, if one can classify users' questions based on their security strength, the system could offer multiple classes where a user must select one question from each class as part of registration.

8.4.2. Determining the Types of Questions and Answers to Use

The types of questions and answers used contribute to both the security and the usability of the challenge question authentication system.

8.4.2.1 Determining the appropriate question type

With fixed questions, individuals are not required to conceive of their own questions at registration, perhaps offering an advantage to some. An open question has potential for improved memorability and improved applicability for individuals who are better able to recognize information that is more memorable to them, and to construct an appropriate question from this information. However, asking for a completely open question might require too much novelty. A controlled question seems to support a reasonable compromise whereby only part of the question development is delegated to the individual. For example, a question may be as simple as "Enter a number that is memorable for you" (giving some content control and guidance for the individual), and the individual can provide the hint, "Grade 8 locker," thereby providing some equivalence to an open question. However, controlled questions also share the weaknesses of open questions, as the question or hint entered can be insecure by providing too much guidance for the answer to an attacker. Notice, however, that the repeatability and the memorability of the hint are not a concern because the hint is shown to the user upon answer presentation.

With a fixed question, individuals are prevented from a potentially insecure question selection (e.g., "What color are my eyes?") whose answer space is exhausted easily, thereby providing a security advantage to an attacker. With an open question, individuals might select a question that is potentially insecure, although capable individuals are able to select more secure questions; for example, individuals are able to customize questions directly related and meaningful to their childhood. In addition, with open questions, individuals can form associative word pairs (e.g., the word "cat" might associate with the word "my pet," or possibly with "shedding").

When developing questions for a challenge question system, further distinctions can be made. One such distinction is that of fact-based versus opinion-based questions.^[2] Fact-based questions relate to factual statements regarding an individual. Such questions might be expected to have less varying answers over time, and can be constructed as such (e.g., by asking for the first place the individual lived rather than his most recent residence). Care must be taken, though, as the answers to such questions (involving factual information about a user) might be more readily available to an attacker. Opinion-based questions relate to beliefs an individual has, and thus may be more susceptible to change over time. However, they should be less pervasive than the answers to fact-based questions, as opinions might be less frequently presented and recorded as part of the individual's day-to-day activities.

^[2] W. Haga and M. Zviran, "Question-and-Answer Passwords: An Empirical Evaluation," Information Systems 16:3 (1991), 335-343.

8.4.2.2 Determining the appropriate answer type

Individuals can be prevented from selecting insecure answers if the system requires choosing an answer from a set of fixed answers. Such systems must be designed to disallow answers that would be very common and thus easily guessed by an attacker. However, memorability and repeatability may be hampered if there is no unique answer to satisfy an individual's preference (either the individual's first choice is not available, or more than one satisfactory choice is available). With open answers, larger variation in the answer space is provided, although for certain questions, a user would be able to select highly probable answers. Memorability may be better than with fixed answers, although repeatability can be problematic if the registered answer is ambiguous (e.g., "St." versus "Street"). Controlled answers offer an alternative whereby a large answer space can be used, but control over the possible values improves repeatability. There do not seem to be any significant security advantages to using a controlled answer over simply supporting a large answer space.

An interesting option is supported with answers whereby the answer registered and the answer presented need not be of the same type. Two such options are:

Fixed answer at registration; open answer at authentication. When registering his question and answer, the individual is provided a fixed answer set corresponding to the question. However, at subsequent authentication, an open answer is used, allowing the individual to enter his response, rather than choosing from a list. Still, as noted earlier, fixed answers at registration are problematic.
Open answer at registration; fixed answer at authentication. When registering his question and answer, the individual provides a free-form response. However, at subsequent authentication, a fixed list of answers is provided, one of which is the correct answer originally chosen by the individual. This option offers improved repeatability and can be an advantage to individuals with poor memories.

Expanding upon the open-fixed option, a likely implementation might involve the storage of a set of "fake answers" along with the user's given answer upon registration. At answer presentation, the user's answer would consistently be presented along with the same set of fake answers. There are numerous issues to consider regarding the secure implementation of such a system. In particular:

The "fake answer" set must not be repeated across users; otherwise, an attacker could easily determine the fake answer sets (and thus eliminate and recover the user's submitted answer) by attempting to recover two or more users.
The fake answer set must be consistent from one recovery attempt to the next; otherwise, an attacker could identify the user's answer as the only consistent answer across a number of recovery attempts.
The fake answer set must be changed should the user choose to modify his submitted answer; otherwise, an attacker (aware of a potential answer update) could determine the user's answer from the variance in the answer sets from before and after the update.

Care must be taken in the selection of the fake answer sets for each user so that the user's submitted answer is sufficiently concealed by the fake answers. For example, suppose that the user is asked the question "What is your favorite fruit?" but answers with the word "mushroom." In this case, if only fruits were provided as part of the fake answer set, the user's submitted answer would be easily distinguishable. Optionally, "incorrect" fake answers might be provided in order to anticipate any user variance and serve to confuse would-be attackers. Finally, two security problems present themselves with this scenario:

The size of the fake answer set should be large enough to resist exhaustive guessing attacks against the individual user.
The user's submitted answer cannot be stored only as a hash, because it must be displayed to the user as part of answer presentation.

Thus, while great care must be taken with this solution, it does offer an interesting variation.
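The consistency requirements for the fake answer set can be sketched as follows. The record layout, the decoy pool, and the function names are hypothetical. Decoys are drawn independently for each user, which makes identical sets across users unlikely but does not guarantee that they differ.

```python
import secrets
from dataclasses import dataclass

_rng = secrets.SystemRandom()

@dataclass
class ChoiceRecord:
    answer: str          # kept in the clear, since it must be displayed
    choices: list[str]   # answer plus decoys, in a fixed display order

def build_choices(answer: str, decoy_pool: list[str], n_decoys: int) -> ChoiceRecord:
    """Draw decoys once, at registration, and fix their order."""
    candidates = [d for d in set(decoy_pool) if d != answer]
    if len(candidates) < n_decoys:
        raise ValueError("decoy pool too small")
    choices = _rng.sample(candidates, n_decoys) + [answer]
    _rng.shuffle(choices)
    return ChoiceRecord(answer=answer, choices=choices)

def present(record: ChoiceRecord) -> list[str]:
    """Every recovery attempt shows the same stored set in the same order."""
    return list(record.choices)

def update_answer(new_answer: str, decoy_pool: list[str], n_decoys: int) -> ChoiceRecord:
    """A changed answer gets a freshly drawn decoy set, not the old one."""
    return build_choices(new_answer, decoy_pool, n_decoys)
```

The sketch does not address how the decoy pool is chosen. As the "mushroom" example shows, the pool would need entries that plausibly conceal unexpected answers, and `n_decoys` would need to be large enough to resist guessing.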

8.4.3. Complementary Security Techniques

In addition to the construction, evaluation, and grouping of questions, additional techniques can be used for authenticating individuals, some of which are more suited to a recovery system than to general user authentication. Most notably, mailing to an address of record is a useful tool. For example, if the address of record is an email address, then as part of the recovery process an appropriate message can be emailed giving instructions. Some recovery systems will even choose to rely only upon a mailing, and not include any additional authentication (e.g., with a challenge question). While it is certainly possible for an attacker to intercept unprotected email (if an individual needs to recover, he likely won't have key material to support a protected email message), the decision to use an additional factor is a risk management decision. When combined with a challenge question recovery system, an additional factor is used, and an email might be sent immediately after the user has answered the challenge questions successfully. By using an address of record, additional security is provided, as other security precautions are typically in place to control access to that address.^[3] Adding another communication step does impact usability; however, the extent to which this is true depends primarily upon the amount of time required for this step to be completed. For some accounts, such latency may not be tolerable.

^[3] S. Garfinkel, "Email-Based Identification and Authentication: An Alternative to PKI?", IEEE Security and Privacy (Nov./Dec. 2003).

Further security measures can greatly improve the usability of a challenge question system by reducing the security rigor that must be applied to each question and possibly reducing the number of questions. These include:

A system lockout feature whereby access to the recovery functionality would be reduced or removed after a number of failed attempts.
A "graduated lockout" feature that would reduce access over time, perhaps locking out recovery for a fixed period of time after some number of failed recovery attempts, and fully blocking the recovery after some number of temporary lockouts.

CANADA'S GOL SOLUTION

A candidate challenge question system, based upon the framework in this chapter, was recently designed in support of Canada's Government OnLine solution.^[4] Input to some of the design decisions came from a focus group consisting of 17 individuals from the general population who had Internet experience. Compared to a previous five-question system that was perceived negatively, participants in the current group appreciated the following three-question system:

Question 1. Consists of 15 fixed questions, where the focus group input was used to determine several of these questions. The corresponding answer is open, both at registration and recovery. Some of the fixed questions proposed for this fixed list include: "What was my first pet's name?" "Where did I first meet my significant other?" and "What was the last name of my childhood best friend?"
Question 2. Consists of a controlled question, "Please choose a person who is memorable to you," and an open hint. Originally, a fixed hint was used, but participants were not comfortable with the choices it offered, as they had difficulty mapping their desired hint to a single selection of a fixed hint.
Question 3. Consists of a controlled question, "Please choose a date that is memorable to you," and an open hint. The corresponding answer is controlled at both registration and recovery, consisting of drop-down selections for each of year, month, and day.

Free-form answers are normalized, removing whitespace, some punctuation, and capitalization. A confirmation page is displayed to confirm the user's answers. Some additional lessons learned from the focus group include the following:

Although questions related to "first-time" events are good for repeatability, they can be more difficult for older users to recall.
Regarding questions with calendar date answers, participants indicated an inability to recall more than a half-dozen dates. However, even in this situation, such a question offers strength against a random attack, while being more susceptible to a targeted attack. Thus, additional questions and/or complementary security techniques should also be used.
Although participants indicated a preference for open questions, the candidate list of questions they provided did confirm the designers' assumptions that an insufficient level of security would be attained for open questions.

^[4] Mike Just, "An Overview of Public Key Certificate Support for Canada's Government On-Line (GoL) Initiative," Proceedings of the 2nd Annual PKI Research Workshop (April 2003).
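The normalization of free-form answers mentioned in the sidebar might be implemented along the following lines. The sidebar does not say which punctuation is removed, so stripping all ASCII punctuation is an assumption made for this sketch, as is the function name.

```python
import string

def normalize_answer(raw: str) -> str:
    """Fold case and drop whitespace and ASCII punctuation."""
    return "".join(
        ch
        for ch in raw.casefold()
        if not ch.isspace() and ch not in string.punctuation
    )

print(normalize_answer("  St. John's "))   # stjohns
```

The same function would be applied at both registration and recovery so that the two forms compare equal. It does not resolve differences in wording, such as "St." versus "Street."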

Of course, the denial-of-service implications of using such features must be considered carefully. Reverse Turing Tests (e.g., CAPTCHA^[5]) help reduce the likelihood of success for automated attacks.^[6] Client puzzles^[7] offer a variation for limiting the effectiveness of denial-of-service attacks, whereby the client is required to perform additional computations before his request can be processed.

^[5] The CAPTCHA Project; http://www.captcha.net/.

^[6] G. Mori and J. Malik, "Up to the Challenge: Computer Scientists Crack a Set of AI-Based Puzzles," SIAM News (Nov. 2002).

^[7] J. Brainard and A. Juels, "Client Puzzles: A Cryptographic Defense Against Connection Depletion," Proceedings of the Network and Distributed System Security (NDSS) Symposium (Feb. 1999).
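The idea of making a client perform computation before its request is processed can be illustrated with a simple hash-based puzzle. Note that this is a hashcash-style sketch and not the construction from the cited client puzzles paper; the function names and the difficulty value are assumptions.

```python
import hashlib
import os

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def make_challenge() -> bytes:
    """Server side: issue a fresh random challenge for each request."""
    return os.urandom(16)

def _digest(challenge: bytes, counter: int) -> bytes:
    return hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: search for a counter; expected work is about 2**difficulty hashes."""
    counter = 0
    while leading_zero_bits(_digest(challenge, counter)) < difficulty:
        counter += 1
    return counter

def check(challenge: bytes, difficulty: int, counter: int) -> bool:
    """Server side: verification costs a single hash."""
    return leading_zero_bits(_digest(challenge, counter)) >= difficulty
```

The asymmetry is the point of the design: the server verifies with one hash, while each recovery request costs the client many. A real deployment would also need to expire challenges and prevent their reuse, which this sketch omits.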