As Marta bore down on him with Tim in tow, Dan assumed as calm an air as he could. "Hello there, guys. Long time no see."
Marta ignored both his tone and his greeting. "Dan, there's a problem, and it's just what I was afraid of! The code doesn't work, and no one can fix it, not Tim, or Bill, or Sam, or Beth, or Mike—and I don't know how to fix it myself! I can't handle all of this responsibility! I can't make sure it's done right." She was clearly exasperated. "What was I thinking? I knew I should have left when I tried, and not let everyone talk me into staying!"
Tim looked miserable, and he confirmed Marta's assessment of the broken code. "I just don't understand it, Dan. The build worked fine in the test lab, but when we moved it to the multi-processor box, the response time actually got worse. Now we're getting these weird error messages. This morning, the box even blue-screened." He sighed. "The only thing I can think of is that there's something wrong with the SMP support. I guess we'll have to send this box back, move RMS to a regular box, and hope we can live with the speed."
Dan knew that he needed to take charge of the situation. "First of all, let's not have this conference in the hall. Come into my office, calm down, and let's talk." He ushered Tim and Marta into his office, shut the door, and sat with his teammates at his conference table.
Dan could tell that Marta and Tim were up against some common pitfalls of the Stabilizing Phase. "I understand why you're stressed, Marta. Trust me, everyone who works with computers has experienced it. Because you can't see everything going on inside the box, it can be horribly frustrating to track down problems like this one. You can't tell whether it's the hardware, the operating system, some obscure driver, or the software we wrote. But getting angry just muddies your thought processes even more."
He turned to his Network Manager. "And you, Tim, are doing two things you should never do: First, you're assuming too much, and second, you're grasping at straws. You should know better."
Tim's demeanor changed from defeated to angry, which is what Dan wanted. "What other choices do I have in a crisis like this?" he demanded.
"I don't know," said Dan, "because I don't have the hands-on experience with this system that you and the others have." Tim looked somewhat mollified as Dan continued. "What I do have is more troubleshooting experience. If you were a highly paid consultant, you'd have to approach problems systematically. If another company called you in to look at a situation like this one, what would you check first?"
Tim thought for a moment. "Well…first I'd check all the hardware. I'd make sure it was on the approved hardware list and research anything about it that seemed unusual. Then I'd make sure it was all working properly."
"OK," Dan said encouragingly, "those are good general-purpose things to check. Now get more specific. Describe a specific case to me."
"Let me give it a try," said Marta. Her frustration was beginning to subside, and she was intrigued enough by the conversation to want to contribute her natural problem-solving tendencies. "We have a custom software application that interacts with certain services—say, MTS and SQL Server—on a Windows NT server. This application seems to run fine on one box, but either runs poorly or not at all on another box. The second box is an SMP box, which may or may not have anything to do with the problem."
"Can I jump in here?" asked Tim. Marta nodded. "The more we tried to solve this problem, the less stable the box seemed to get, especially after the blue screen this morning."
"Now," challenged Dan, "what strategy can you come up with for tackling this problem?"
"Well, to eliminate the SMP box as the source of the problem, we could get another SMP box, configure it like the one we have, and try running the application on the new one," said Marta.
"True, but unfortunately, that means a schedule slip, because we can't get another SMP box for at least a month," Dan reminded her. He turned to Tim. "You mentioned that the box became more and more unstable? Any clues there, or anything you could check?"
"No, not that I can think of. We've got other SMP boxes, and they are all very stable."
"Anything different about this one?" asked Dan. "The build process, the patch levels, anything like that?"
"I checked the patches already," said Tim, "and they all looked OK, as far as I could tell. I didn't do the build—I had one of the guys do the build when the box came in." Tim thought for a moment. "The build…I wonder if…Hey, could I use your computer for a moment, Dan?"
Tim hurried over to Dan's computer. "Can I log you off and log myself in?" Dan nodded, and Tim logged in and opened several Windows programs.
Dan and Marta watched as Tim arranged the various windows on the screen. He scrolled through file listings and checked various file properties. After a few minutes, he snapped his fingers. "That's it!" Turning to Dan, Tim pointed at the screen. "That's the problem."
"What? What's the problem?" asked Marta, confused.
Dan thought he knew the answer, but wanted Tim to explain the details. "Come on, Tim, don't keep us in suspense."
"Look," said Tim, activating one file window. "This is a directory on the MTS server that's already in production." He switched to another window. "And this is the same directory on the new server. Notice anything?"
"The directories don't contain all the same files," said Marta.
"That's OK," said Tim. "Unless two servers are identical and loaded with exactly the same software, the directories won't be the same. There's another way they're different, but you can't see it in this view." He switched both windows to a detail view and then arranged them on the screen so that all the details were visible. "Now do you notice anything?"
"Some of the file dates aren't the same, even though the file names are," said Marta. "Is that important?"
"Sometimes it can be very important," said Dan, nodding to Tim. "It can mean that different versions of the software have been installed on the two servers. Usually, that doesn't matter. But if one program installs an older version of a file that is also used by another program, the other program may not function correctly."
"It can get really hairy with servers, what with patches, add-on software, and the like," said Tim. "I told my guy to be sure the box was current and that it had all the services and software it needed, but I never told him what order to install everything in, or which files to keep and which ones to overwrite. I just assumed he would know."
"There's that 'assume' word again," said Marta, grinning at Tim. "And you know what happens when you assume."
"OK, OK, I got it," said Tim, holding up his hands.
"So what do we do now?" asked Marta.
"Order in doughnuts," said Dan.
Marta looked puzzled. Tim explained, with a wry smile, "What he means, Marta, is that he knows a Network Manager who's going to learn not to assume anything by rebuilding a server tonight."