We can summarize the points of Holmes’s methodology that apply to software debugging under the following categories:
Use cross-disciplinary knowledge
Focus on facts
Pay attention to unusual details
Gather facts before hypothesizing
State the facts to someone else
Start by observing
Exclude alternative explanations
Reason in both directions
Watch for red herrings
To apply the wisdom of Sherlock Holmes to debugging software, we must consider the analogy between software defects and crime. We must answer the following questions:
Who did it?—suspect
How did the culprit do it?—means
When did the culprit do it?—opportunity
Our approach to motive is somewhat different, since we assume that all defects were caused accidentally. We are interested in an answer to the question, why did this problem happen? We treat the why question the way an accident investigator would, rather than the way a detective would. The detective seeks to assign guilt for a crime. The investigator seeks to find contributing causes, to prevent the accident from occurring again.
Detectives don’t need to find a motive either, although juries are more easily convinced when a motive is given. If the prosecution presents an ironclad case that only one person had both the means and opportunity to commit a crime, a rational jury should convict the defendant. By analogy, if only one possible piece of code had both the means and opportunity to cause an observable failure, the programmer should focus on that code segment as the cause of the defect.
In The Valley of Fear [Do15], Holmes is called in to assist in the investigation of a grisly murder in the country. As the police are pursuing a man who fits the general description of the suspect all over the country, Holmes tells them to give up the chase. They want an explanation. He describes some interesting features of the ancient manor house where the body was discovered. The police inspectors complain that he’s making fools of them, and he replies, “Breadth of view … is one of the essentials of our profession. The interplay of ideas and the oblique uses of knowledge are often of extraordinary interest.”
In The Five Orange Pips [Do92], a young man asks Holmes to help him. His uncle and father have died from unusual accidents, and he has received the same mysterious threats that they did. After the man leaves 221B Baker Street, Holmes and Watson discuss the meager evidence:
Problems may be solved in the study which have baffled all those who sought a solution by the aid of their senses. To carry the art, however, to its highest pitch, it is necessary that the reasoner should be able to utilize all the facts which have come to his knowledge; and this in itself implies, as you will readily see, a possession of all knowledge, which, even in these days of free education and encyclopedias, is a somewhat rare accomplishment.
When developing applications software, understanding the application is as important as understanding the software if you want to diagnose defects. Back in the bad old days, corporate information technology (IT) departments were referred to as data processing (DP). Those DP departments employed an artificial separation of tasks between systems analysts and programmers. Systems analysts talked with the users, understood their requirements, and translated them into specifications for programmers to implement.
Nowadays, most development methodologies encourage programmers to work closely with users to develop prototypes, create specifications, and the like. The Extreme Programming movement, not surprisingly, has taken this trend to its logical conclusion [Be99]. It advocates daily feedback from the users on what the programmers created the previous day. Not every project can be developed using the Extreme Programming approach, but the advocacy of close interaction with the client is a good idea.
Domain-specific knowledge makes it possible to identify problems in logic that are plain to the user of the application. At a minimum, you should learn enough about the application domain to master its jargon and fundamental concepts. If you’re using the object-oriented analysis and design methodology, you will be identifying entities in the real world that should be modeled by objects in your program. This is difficult without having some understanding of the application domain.
In The Adventure of the Copper Beeches [Do92], a young woman asks Holmes’s advice. She wants to know whether she should take a position as a governess, which has some unusual requirements. She decides to take the job and promises to telegraph Holmes if she needs help. As Holmes waits for an update on her situation, Watson observes him muttering, “Data! Data! Data! I can’t make bricks without clay.”
In A Study in Scarlet [Do88], we’re first introduced to the characters of Dr. Watson and Sherlock Holmes. Inspector Gregson of Scotland Yard asks Holmes to help him with a baffling case of murder. Holmes and Watson review the scene of the crime, and Holmes makes a number of deductions about the murderer. Holmes and Watson then head off to interview the policeman who found the corpse. On the way, Holmes remarks, “There is nothing like first-hand evidence.”
One of the fundamental tenets of modern science is that results must be reproducible to be accepted. Holmes understood this principle well. The police he worked with were often convinced that a suspect was guilty based on their view that the suspect had a motive. Motives are purely subjective. They can’t be observed or measured.
Holmes focuses on facts. He seeks to establish that a suspect has access to the means used to commit the crime and opportunity to commit the crime. These criteria are objective and can be proved or disproved with facts.
Don’t accept defect reports unless you’re given sufficient data to reproduce the problem. If you don’t have sufficient data to reproduce the problem, you only have second-hand evidence that a problem exists. You can get very frustrated attempting to debug an alleged defect without sufficient data.
Just because someone is technically adept, don’t assume that person is capable of providing a reproducible defect description. Some of the most useless defect reports we have ever received were written by experienced programmers.
In A Case of Identity [Do92], Holmes is visited by a young woman who wishes to find her fiancé, who disappeared shortly before they were to be married. As she describes the man, Holmes remarks on the value of the details she is providing: “It has long been an axiom of mine that the little things are infinitely the most important.”
In The Boscombe Valley Mystery [Do92], Holmes requests that Watson take a few days away from his practice and his wife. He wants a companion to journey with him to investigate an unusual murder case near the Welsh border. Holmes describes the murderer in great detail to police detective Lestrade, after observing the scene of the crime. Lestrade leaves to pursue his own methods. Holmes then explains to Watson how he came to be certain of the details of his description: “You know my method. It is founded upon the observance of trifles.”
In The Man with a Twisted Lip [Do92], Watson discovers Holmes in disguise in an opium den. Watson is helping a woman who believes she saw her husband looking out the window on the second floor of that same building. The husband seems to have disappeared, and evidence indicates that he may have been murdered. When Holmes and Watson visit her home, she receives a letter indicating that he’s alive. Holmes notes several interesting characteristics of the letter and envelope and remarks, “It is, of course, a trifle, but there is nothing so important as trifles.”
In A Study in Scarlet [Do88], Holmes solves a case of revenge murder, which had taken the victims and their pursuer across the Atlantic. Holmes reviews how he solved the case with Watson: “I have already explained to you that what is out of the common is usually a guide rather than a hindrance.”
In The Reigate Puzzle [Do93], Sherlock Holmes accompanies Dr. Watson to the country estate of one of the officers Watson served with in India. The purpose of the trip is a holiday to restore Holmes’s strength. He had exhausted himself with a very important case of financial fraud. When a series of burglaries results in a murder, the local constabulary calls Holmes in for assistance. After nearly getting killed himself when he identifies the villain, he explains to Watson and his friend how he came to the solution: “It is of the highest importance in the art of detection to be able to recognize, out of a number of facts, which are incidental and which vital. Otherwise your energy and attention must be dissipated instead of being concentrated.”
To know what is unusual, you must know what is common. What should be common in an operational piece of software is correct behavior.
Correct behavior is defined by an external reference. Some common external references include the following:
A standards document
A functional specification document
A prototype system
A competitive product
The correct behavior of some software is defined by a document issued by a standards group, such as ANSI/ISO or the IEEE. Compilers for programming languages, for example, must accept as valid all programs that the standard for the language defines as valid. They must also reject as invalid all programs that the standard for the language defines as invalid. If you’re working with a product whose behavior is defined by a standards document, you should always compare bug reports against the standard.
The correct behavior of most software is defined by a functional specification document. This document defines what the software should do when presented with both valid and invalid inputs. It doesn’t define how the software should achieve this result. This is normally done in an internal design document. If you’re working with a product whose behavior is defined by a functional specification, you should always compare bug reports against the specification.
The correct behavior of some software is defined by the behavior of other software. In the case of a prototype, the customer or sponsor agrees that the behavior of the prototype is how the final product should work. In the case of a replacement product, the customer or sponsor mandates that the behavior of a competitive product is how the final product should work. If the purpose of your product is to replace software that comes from another source, you may even choose to be “bug-for-bug” compatible.
Without an external reference that defines correct behavior, trying to resolve a defect report is like flying blind. You only know you’re done when you hit the ground.
In A Study in Scarlet [Do88], Inspector Gregson of Scotland Yard asks Holmes to help him with a baffling case of murder. On the way to the scene of the crime, Watson is puzzled by Holmes’s lack of attention to the matter at hand. Holmes replies, “No data yet. It is a capital mistake to theorize before you have all the evidence. It biases the judgment.”
In A Scandal in Bohemia [Do92], Watson reads aloud a letter that Holmes has just received. It announces that Holmes will receive a visit from an unnamed visitor of great importance. Watson wonders what it all means, and Holmes says, “I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
In The Adventure of Wisteria Lodge [Do17], Holmes is visited by an upset young man who has had an unusual visit with an acquaintance in the country. Just as he’s about to tell the story, police inspector Gregson visits Holmes looking for this very man as a suspect in a murder. After the man tells his story, and the police take him to their station, Holmes and Watson review the case. As Holmes begins to formulate a hypothesis, he comments, “If fresh facts which come to our knowledge all fit themselves into the scheme, then our hypothesis may gradually become a solution.”
The five steps of the scientific method are as follows:
State the problem.
Form a hypothesis.
Observe and experiment.
Interpret the data.
Draw conclusions.
You shouldn’t be experimenting (collecting data) unless you’re trying to confirm a hypothesis. The probability that you will generate a correct program by randomly mutating your source code is much lower than the probability that you will be hit by lightning next year.
The best way to ensure that you have a hypothesis is to write it down. If you take the trouble to record hypotheses, we recommend that you also record the corresponding problems and observations. There are several ways to do this.
You can keep a notebook in which you write as you work. The benefit of doing this is that you can walk away from the computer and read and review your notes.
You can keep an open editor window, in which you type as you collect information. The benefit of having this data online is that you can search it quickly. Instead of saying, “I think I’ve seen this before,” while scratching your head, you can execute a search and know for sure. The drawback of this approach is that if ignorant or unscrupulous authority figures find out about the existence of your information, they may use it in ways that you never intended. In addition, your employer may not allow you to take this information with you to another job.
You can use a hand-held computer to record your notes as you work. This approach combines the benefits of the notebook and the editor window. The drawback is the investment in the hardware. If you bear the cost on your own, of course, you’re more likely to be able to walk away with information when you change jobs. You also have to pick up and put down the hand-held, which some people find disrupts their efficient use of the keyboard.
Silver Blaze [Do93] begins as Holmes and Watson depart by train for the scene of a murder in the country. As is often the case, Holmes rehearses the facts for Watson and explains why he does so.
At least I have got a grip of the essential facts of the case. I shall enumerate them to you, for nothing clears up a case so much as stating it to another person, and I can hardly expect your cooperation if I do not show you the position from which we start.
Describing a problem to someone else is one of the oldest ways programmers have used to uncover defects. Weinberg was one of the first computer scientists to document and recommend its use [We71]. It has been rediscovered for the umpteenth time by the Extreme Programming movement [Be99].
Since the chief benefit of this method is to get the viewpoint of another person, the place to start is by giving your viewpoint. Try to answer the following questions:
What do I know for sure? (What have you observed?)
What do I believe to be true? (What have you inferred?)
What do I not know?
In The Adventure of the Cardboard Box [Do93], Inspector Lestrade requests assistance from Holmes in solving the riddle of a box containing two human ears, which was sent to a woman in the suburbs of London. After determining who was responsible, Holmes turns the information over to the inspector and reviews the case with Watson back at their home. He explains his method thus: “Let me run over the principal steps. We approached the case, you remember, with an absolutely blank mind, which is always an advantage. We had formed no theories. We were simply there to observe and to draw inferences from our observations.”
In A Scandal in Bohemia [Do92], Dr. Watson has married and moved away and returns to visit Holmes. Holmes delights Watson with his deductions about Watson’s life and work, based on trivial details. Watson remarks that Holmes always baffles him with these deductions, even though his eyesight is just as good as Holmes’s. Sherlock replies, “You see, but you do not observe. The distinction is clear.”
Detecting and debugging differ in the extent to which we can create additional evidence. Detectives can take things from a crime scene and have them analyzed and identified. All forensic analysis, however, is after the fact.
When we debug software, we can repeat the defective behavior and observe what we want to see. For a detective, it would be the equivalent of going back in time and planting a video camera at the scene of the crime.
When collecting data, consider not only what to observe, but also what point of view to observe it from. It is helpful to consider how application performance analysis tools collect data. They observe one aspect of a program’s behavior. The alternative approaches they use to collect data are analogous to either manual or automatic collection of information related to software bugs.
First, performance information can be collected either when an event occurs, such as a subroutine call, or at fixed time intervals. The former approach is referred to as event-based, while the latter is called sampling.
Second, the collected information can be accumulated as discrete records, or it can be summarized as it’s generated. The former approach is called tracing, while the latter is referred to as reduction.
Since there are two axes of choice, and two options on each axis, there are four possible ways to apply these approaches. When you’re instrumenting a program to collect data, consider all four approaches. Each has its benefits and drawbacks.
Event-based data collection is precise, but can take longer.
Sampling-based data collection is an approximation, but can take less time.
Trace-based data collection is complete, but can use lots of storage.
Reductionist data collection loses details, but uses much less storage.
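The four combinations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TIMELINE is a made-up record of which function is running at each clock tick, and the helper names (events, samples, trace, reduce_counts) are hypothetical, not taken from any real profiling tool.

```python
from collections import Counter

# Hypothetical timeline: the function that is executing at each clock tick.
TIMELINE = ["parse", "parse", "parse", "emit", "parse",
            "parse", "emit", "emit", "parse", "emit"]

def events(timeline):
    """Event-based: yield (tick, name) each time a different function is entered."""
    previous = None
    for tick, name in enumerate(timeline):
        if name != previous:
            yield tick, name
        previous = name

def samples(timeline, interval):
    """Sampling: yield (tick, name) at fixed intervals, whatever happens to be running."""
    for tick in range(0, len(timeline), interval):
        yield tick, timeline[tick]

def trace(observations):
    """Tracing: keep every observation as a discrete record."""
    return list(observations)

def reduce_counts(observations):
    """Reduction: summarize observations as they arrive, keeping only a count per name."""
    return Counter(name for _, name in observations)

event_trace = trace(events(TIMELINE))                 # every entry, with its tick
event_summary = reduce_counts(events(TIMELINE))       # entries per function only
sample_trace = trace(samples(TIMELINE, 2))            # one record every 2 ticks
sample_summary = reduce_counts(samples(TIMELINE, 2))  # sampled counts only
```

In this toy run the event-based trace records all six function entries, while sampling every second tick misses most of the time spent in emit. That is the trade-off between precision and cost described above.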
Another way to look at point of view is the difference between synchronic and diachronic approaches. A synchronic (“with time”) approach takes a snapshot of a static situation at a given instant. A diachronic (“through time”) approach observes the evolution of a situation over some period. Make sure you use both viewpoints when observing the behavior of a program.
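As a small sketch of the two viewpoints, assume a hypothetical routine, running_total, that exposes its state after every step. The synchronic view inspects every variable at one instant; the diachronic view follows one variable across the whole run.

```python
def running_total(values):
    """Hypothetical routine under observation; yields its state after each step."""
    total = 0
    for step, value in enumerate(values):
        total += value
        yield {"step": step, "value": value, "total": total}

states = list(running_total([5, -2, 7, -20, 4]))

# Synchronic view: all variables at one chosen instant (here, step 3).
snapshot = states[3]

# Diachronic view: a single variable followed through time.
history = [state["total"] for state in states]
```

The snapshot shows that the total is negative at step 3, and the history shows that it first went negative at that step. Neither view alone tells both things.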
The Sign of the Four [Do90] begins with Holmes in a cocaine reverie. He is waiting for a problem to challenge his intellect, expounding to Watson on various aspects of the science of detection. After Holmes outlines the life of Watson’s brother, based on observing a watch that the brother owned, Watson is astonished. He wonders whether Holmes was just lucky at guessing certain facts. Holmes replies, “I never guess. It is a shocking habit—destructive to the logical faculty. What seems strange to you is only so because you do not follow my train of thought or observe the small facts upon which large inferences may depend.”
If you ever hear yourself saying, “let’s see what this will do,” or “maybe this will fix the problem,” it is time to go home. If you don’t know what effect a change will have, you have no business making the change. You can undo weeks of work in a few minutes of desperation. There are many definitions of hacking, some positive, some negative. Hacking at its worst is making changes without being sure of what effect they will have.
A hypothesis isn’t the same thing as a guess. A hypothesis includes a means of verifying the correctness of the proposition. Guessing is symptomatic of inadequate knowledge of the software. If you find yourself guessing, you need to spend more time understanding the software you’re working on. You may need to study the application domain, the programming language, or the system design before you will be able to generate hypotheses that include means of verification. Taking time to learn these aspects of the software will reduce your total debugging time.
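One way to hold yourself to this standard is to refuse to record a hypothesis unless it comes with a check. The sketch below is an illustration only: the Hypothesis class, the average function, and the check are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    claim: str                  # the proposition, stated in words
    check: Callable[[], bool]   # the means of verifying it

def average(values):
    """Hypothetical function under suspicion."""
    return sum(values) / len(values)

def fails_on_empty_input():
    """Return True if average() raises ZeroDivisionError for an empty list."""
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False

h = Hypothesis("average() fails when given an empty list", fails_on_empty_input)
confirmed = h.check()
```

A guess would be the claim alone. The check is what turns it into a hypothesis that an experiment can confirm or refute.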
In The Adventure of the Beryl Coronet [Do92], Holmes is called upon to solve the disappearance of jewels entrusted to a banker as security on a loan to a member of the royal household. After he returns the jewels to the banker, he explains how he solved the case. In the middle of this explanation, he offers this nugget of logic: “It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth.”
In The Sign of the Four [Do90], Holmes and Watson are called upon to aid a young woman whose father, an infantry officer, disappeared mysteriously and who had been receiving valuable treasure in the mail from an even more mysterious benefactor. After meeting with one of her benefactors and finding the other one dead, Holmes and Watson review what they know about the case. As they consider how the murderer committed his crime, Holmes rebukes Watson: “How often have I said to you that when you have eliminated the impossible, whatever remains, ‘however improbable,’ must be the truth?”
In computer software, causation isn’t a matter of probability. Either a code segment caused a problem or it did not. Something may be a logically possible cause until we have sufficient information to make a yes or no determination.
Try to eliminate as many possible causes and culprits as you can with a single observation. Eliminating possible causes isn’t quite as simple as it sounds. For example, you can’t eliminate a procedure as a possible cause merely because it isn’t executed. The lack of execution may have the side effect of leaving default values unchanged, resulting in an incorrect computation. The same goes for loops that aren’t executed and conditional branches that aren’t taken.
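A minimal sketch of this situation follows. Everything in it is hypothetical: we assume a specification that says orders of 100 or more receive a 10 percent discount, and we invent the names apply_discount and invoice_total for the illustration.

```python
def apply_discount(order):
    """Hypothetical procedure; sets the discount for large orders."""
    order["percent_off"] = 10

def invoice_total(amount, threshold=100):
    order = {"amount": amount, "percent_off": 0}  # default, meant to be overwritten
    if amount > threshold:   # assumed spec says "100 or more", so this skips amount == 100
        apply_discount(order)
    return order["amount"] * (100 - order["percent_off"]) // 100

at_threshold = invoice_total(100)     # apply_discount is never called
above_threshold = invoice_total(200)  # apply_discount runs
```

A trace of invoice_total(100) would show that apply_discount never ran, which might tempt you to cross it off the list. Yet the wrong total is explained precisely by its not running: the default of 0 was never replaced.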
Defects that are difficult to locate are often caused by a conjunction of several conditions, none of which is sufficient to cause the problem on its own. While you may be able to eliminate each individual condition as a cause of the defect, you should be careful not to eliminate the conjunction of the conditions from your list of suspects as well.
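The following sketch shows why testing conditions one at a time can mislead. The function fails is a hypothetical stand-in for running the program, and the two conditions (use_cache and big_input) are invented for the example.

```python
from itertools import product

def fails(use_cache, big_input):
    """Hypothetical stand-in for running the program; True means the failure appears."""
    return use_cache and big_input

# Try every combination of the two conditions, not just each one alone.
results = {combo: fails(*combo) for combo in product([False, True], repeat=2)}

cache_alone = results[(True, False)]
big_input_alone = results[(False, True)]
both_together = results[(True, True)]
```

Each condition passes when tried alone, so each could be "eliminated" individually. Only the run with both conditions enabled shows the failure.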
In The Adventure of Black Peter [Do05], Holmes is called to help a local police detective solve the grisly murder of a former ship’s captain. After setting a trap for the culprit and apprehending a suspect, Holmes reviews the case with Watson. Holmes expresses disappointment with the work of the policeman and states, “One should always look for a possible alternative and provide against it. It is the first rule of criminal investigation.”
In The Hound of the Baskervilles [Do02], Holmes is consulted by Dr. Mortimer, the executor of the estate of Sir Charles Baskerville; he’s concerned about the manner of Sir Charles’s death and the potential for harm to the heir, Sir Henry Baskerville. After hearing all of the relevant facts, Holmes asks for twenty-four hours to consider the matter. Holmes asks Watson to obtain tobacco for him to smoke and begins the process Watson describes as follows: “He weighed every particle of evidence, constructed alternative theories, balanced one against the other, and made up his mind as to which points were essential and which immaterial.”
When you formulate a hypothesis, consider whether you already have evidence in hand that disqualifies it. This is much easier to do if you’re keeping a log of your observations and hypotheses.
It takes less time to disqualify a hypothesis in this way than to perform yet another experiment. After you have performed an experiment, consider brainstorming a list of all the hypotheses that this experiment disqualifies.
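If the log is kept in a structured form, this check can even be mechanical. The sketch below is only one possible layout, with made-up observations and hypotheses. Each hypothesis lists the observations it predicts, and it is disqualified as soon as a recorded observation contradicts one of them.

```python
# Hypothetical log entries: what has actually been observed so far.
observations = {
    "fails_on_linux": True,
    "fails_on_windows": True,
    "fails_with_empty_input": False,
}

# Hypothetical hypotheses, each with the observations it predicts.
hypotheses = {
    "platform-specific path handling": {"fails_on_linux": True, "fails_on_windows": False},
    "empty input not handled": {"fails_with_empty_input": True},
    "off-by-one in record count": {"fails_on_linux": True, "fails_on_windows": True},
}

def disqualified(predictions, observed):
    """True if any recorded observation contradicts one of the predictions."""
    return any(key in observed and observed[key] != expected
               for key, expected in predictions.items())

surviving = [name for name, predictions in hypotheses.items()
             if not disqualified(predictions, observations)]
```

Here two of the three hypotheses are ruled out by evidence already in the log, without running another experiment.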
In A Study in Scarlet [Do88], Holmes explains to Watson how he was able to capture the criminal in three days:
“In solving a problem of this sort, the grand thing is to be able to reason backward. That is a very useful accomplishment, and a very easy one, but people do not practice it much. In the everyday affairs of life it is more useful to reason forward, and so the other comes to be neglected. There are fifty who can reason synthetically for one who can reason analytically.”
“I confess,” said I, “that I do not quite follow you.”
“I hardly expected that you would. Let me see if I can make it clearer. Most people, if you describe a train of events to them, will tell you what the result would be. They can put those events together in their minds, and argue from them that something will come to pass. There are few people, however, who, if you told them a result, would be able to evolve from their own inner consciousness what the steps were that led up to that result. This power is what I mean when I talk of reasoning backward, or analytically.”
In The Five Orange Pips [Do92], a young man whose uncle and father have died from unusual accidents asks Holmes to help him when he receives the same mysterious threat that they did. After the man leaves 221B Baker Street, Holmes and Watson discuss the meager evidence. “‘The ideal reasoner,’ he remarked, ‘would when he had once been shown a single fact in all its bearings, deduce from it not only all the chain of events which led up to it but also all the results which would follow from it.’”
The American Heritage Dictionary of the English Language defines deduction as, “The process of reasoning in which a conclusion follows necessarily from the stated premises; inference by reasoning from the general to the specific.” The same dictionary defines induction as, “The process of deriving general principles from particular facts or instances.”
As Holmes suggests, programmers almost always work on debugging inductively, rather than deductively. It is possible to reason through a nontrivial bug without touching a keyboard. You need a lack of disturbances and a thorough knowledge of the software you’re working on. You mentally enumerate the possible causes of a given effect and then prune them with the knowledge you have.
Many software systems are too big for any one person to have enough knowledge of the system to do deductive debugging. Deductive analysis is possible for medium-sized systems, particularly if a single author created them. If you’re working on such a system, we encourage you to try it sometime.
In A Study in Scarlet [Do88], when a suspect in the first murder turns up dead himself, Holmes tries an experiment, and after he sees the results, comments, “I ought to know by this time that when a fact appears to be opposed to a long train of deductions, it invariably proves to be capable of bearing some other interpretation.”
In The Boscombe Valley Mystery [Do92], Watson is traveling with Holmes by train to Herefordshire to investigate a murder case in which the circumstantial evidence seems to point overwhelmingly to the son of a murdered man. Watson opines that the facts seem to be so obvious as to make the endeavor pointless. Holmes retorts, “There is nothing more deceptive than an obvious fact.”
Red herring is fish cured by smoke, which changes its color. It has a very persistent odor. Dog trainers use red herring for training a dog to follow a scent. A dog that gets a good whiff of red herring will lose any other scent that it has been following. The English idiom “red herring” means a piece of information that distracts or deceives.
Where there is one defect, there are likely to be more. When you come upon a fact that seems to bear upon your investigation, consider all of the interpretations of the fact. The ability to recognize red herrings in a debugging investigation improves with experience. You can quicken your acquisition of this experience by keeping a debugging log.