Chapter 19. Case Study: Friends Fly Free-for-All
A Promotion Gone Wrong
Thanks to your stellar consulting work in helping High-Hat Airways solve the devastating performance problems introduced by its new package-tracking application (see Chapter 18, "Case Study: High-Hat Delivers!"), you are now the official MySQL performance expert for the airline. Of course, there is no budget to increase the number of billable hours that you can work with the airline, but the CIO assures you that in the event of another big performance crisis, you will be the first person called.
That call arrives several weeks later from the CIO's replacement. Apparently, something went terribly wrong with a promotion the airline ran just after you finished your analysis. According to the new CIO, several members of the senior management team had stock options that were to be vested soon. Wall Street airline analysts had just raised their profitability forecast, but there was a good chance that High-Hat would miss the new target. Missing these estimates by even one penny per share would likely mean that the executives' eventual seaside retirement villas would instead be found several miles inland, and would feature half the planned square footage. Clearly, this could not be allowed to happen.
The senior vice president of marketing had a brilliant idea: High-Hat would immediately run an amazing airfare sale, one that would likely sell hundreds of thousands of new tickets and boost results for that quarter. To keep reservation agent overtime costs down, all ticket sales would be done over the Web, and would be nonrefundable. Some new software would have to be written, but this was a mere detail.
The CIO proceeded to describe how the timing for this airfare sale could not have been worse. There was very little notice, so everything had to be rushed. During the last round of cost-cutting (following your success in solving the shipping problems), many IT employees had been furloughed. The austerity measures had even affected the number of available web and database servers in production. Although the previous CIO had begged for it, there was no money allocated for load-balancing hardware or software. Finally, despite significant pleas for reconsideration, much of the development of the new functionality to support the web-only ticket sale was outsourced to a company owned by the CEO's brother-in-law.
The airfare sale was announced at noon on a Monday, with the first flights available on Wednesday. Things got off to a rocky start: by 1:00 p.m. on Monday, customers had begun calling the reservation agents to complain about how slowly the website was performing. Some customers couldn't even access the site. Others were able to get in, but couldn't make a reservation. Worst of all, numerous customers made a reservation and provided their credit card numbers, but did not receive an acknowledgment from the website.
If Monday and Tuesday were bad, Wednesday was horrendous. Gate agents were astonished to see large jetliners leaving with only two or three passengers on board. Ironically, dozens of other flights were oversold by 300, 400, or even 500 percent. To make matters worse, all ticketed passengers received boarding passes, regardless of whether the flight was oversold. This meant that every seat on these flights needed to hold anywhere between one and five occupants at the same time. This made for some very bad PR: television news crews around the world descended upon airports to report on the various onboard melees and donnybrooks that broke out as between 600 and 1,000 people attempted to squeeze into 200 seats on each airplane.
The aftermath of this fiasco saw the filing of dozens of lawsuits along with the immediate dismissal of the CIO and most of his team. The replacement CIO contacted you; there is much work to be done.
As you did during your last optimization exercise, your first task is to inventory the problems and then set out to diagnose and solve these issues. Based on what you have learned so far, you decide to classify these problems into two categories: server availability and application/transaction issues.