Hackers with botnets teeming with thousands of zombified computers: what is a Web site administrator to do? We've presented a few specific countermeasures during our discussion so far, but in this section we'll explain more broadly how to confront the generic problem of DoS/DDoS.
Almost all defenses against denial-of-service attacks are about enhancing the robustness and scalability of the site. As we've seen, given a large enough botnet, it is practically impossible to completely block an attack, so work has to be done to make it possible to weather the attack. Hopefully, the site will remain up long enough to identify the attacker or bore them and make them go away.
The first thought that runs through many people's heads is to simply add more capacity than the attackers can use up. Unfortunately, economics are against the site administrator. It costs money to add network links and servers, and these additions promise only marginal improvements in defense.
So, we know we cannot completely block denial-of-service attacks and it can be expensive to build the capacity to weather them. What can be done? There are three steps that need to be taken to deal with denial-of-service attacks:
Proactively place defensive measures to blunt and/or weather an attack.
Put in place measures to detect when an attack occurs.
Have plans to respond to an attack.
These steps follow the classic security "defense-in-depth" mantra of preventive, detective, and reactive mitigations. We'll discuss each one in turn in this section.
As shown, attacks can come at many layers of the network or application. Low-level attacks are more common, but high-level attacks can often do more damage. A defense strategy must take into account all the levels an attack could come from. An attacker will always find and exploit the weakest link in the defenses.
Some products are advertised as Anti-DoS; they claim to be able to protect your Web site from a denial-of-service attack and do a good job at protecting against some DoS attacks. Other devices enhance your scalability, which will help the site handle increased load under attack or even increases in normal usage. The key to using these products is to understand what they can and what they cannot do, the areas of protection provided, and the areas that need to be addressed separately.
Firewalls
Firewalls are in many ways the simple solution to denial-of-service attacks. Most sites already use a firewall to restrict network access, so using the firewall to protect against DoS at the same time is a no-brainer. Firewalls are split into two categories, software and hardware, and both provide protection against DoS, although they have different areas of advantage. Software firewalls like Checkpoint are typically better at detecting and notifying when an attack occurs, and some also have limited protection against application-layer denial-of-service. Hardware firewalls have the ability to deal with network traffic at wire speed and are better suited to dealing with massive bandwidth floods.
Checkpoint firewalls have three methods of dealing with a SYN flood attack; these are collectively called SYNDefender. First, SYNDefender Relay ensures that an ACK returns from the source of a SYN before a SYN is sent from the firewall to the server. Second, SYNDefender Gateway forwards SYNs immediately to the server; it then passes the server's SYN/ACK on to the client but responds to the server with an ACK of its own immediately. This allows the server to allocate a connection and reap it if necessary without waiting for the client's ACK to actually return. Finally, SYNDefender Passive Gateway acts the same as the regular Gateway mode except that it does not generate an ACK to the server; instead, it waits for the valid ACK or sends a RST to close the connection if it times out. If servers can handle the connection load, SYNDefender Gateway or Passive Gateway offers the best performance. In an overwhelming flood, SYNDefender Relay is the best defense, as it offloads all the flood handling to the firewall.
Firewalls are used to deal with IP and TCP layer attacks. SYN and UDP floods, smurf and fraggle attacks, and most of the old stack vulnerabilities can be addressed by firewalls. Firewalls that integrate application-layer proxies can also deal with denial-of-service attacks against applications, although frequently the firewall will simply become the bottleneck. A firewall can only drop or block traffic, so a flood that fills up the network links will still take down the Web site and traffic that looks perfectly valid will still get through. Firewalls themselves can be vulnerable to DoS attacks, which leaves the network cut off by its protector.
Most devices on the market will advertise the number of connections they can handle (500,000, 1,000,000, or more), and this may sound impressive. However, a single cable modem can send hundreds of SYNs a second; within minutes a small botnet can fill up the connection table. Look deeper at the products for the detection and management capabilities, especially clustering, which allows the use and management of multiple devices collectively, and failover, which allows for devices to be placed in pairs and for one member to replace its partner if a fault occurs. Hardware devices like Netscreen firewalls and CiscoGuards are capable of sustaining much higher connection loads than software firewalls; they also come with robust clustering capabilities that only high-end software firewalls like Checkpoint support. The other side of the coin is that a cheap software firewall can do much more than a cheap hardware solution. A Linux firewall running IPTables or an OpenBSD firewall running PF can do everything that a cheap SOHO firewall can do and much more, but they require much more manual work and expertise in management.
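To see why an advertised connection count offers less comfort than it seems, it helps to run the arithmetic. The sketch below is only a back-of-the-envelope estimate: the function name and the sample figures (50 bots, 200 SYNs per second each) are assumptions chosen for illustration, and it ignores the timeouts that eventually expire half-open entries.

```python
def seconds_to_fill(table_size: int, bots: int, syns_per_bot_per_sec: int) -> float:
    """Estimate how long a SYN flood needs to occupy every slot in a
    connection table, assuming no entries expire in the meantime."""
    return table_size / (bots * syns_per_bot_per_sec)


# Hypothetical figures: a 1,000,000-entry table attacked by 50 bots,
# each sending 200 SYNs per second.
if __name__ == "__main__":
    print(seconds_to_fill(1_000_000, 50, 200))  # 100.0 seconds
```

Under these assumed numbers, even a million-entry table is exhausted in under two minutes, which is why detection, clustering, and failover matter more than the headline figure.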
Load Balancers
Performing much the same role as a firewall in defending against network DoS attacks, load balancers are designed to be able to soak up large numbers of SYN requests. Most load balancers are also able to deal with HTTP floods by offloading and/or proxying the initial HTTP request. Requests are terminated at the load balancer and only a single connection is made between the load balancer and the Web server, which reduces the load of communications on the Web server, allowing it to devote more resources to handling requests. Many load balancers also support SSL offload. SSL is a very resource-intensive protocol, and the encryption processing takes up a great deal of CPU on a Web server. SSL offload devices use special processors designed to handle encryption tasks; they can handle many more clients than a typical Web server.
There are a few common architectures for setting up load balancing. One is Layer 2 spoofing (called Direct Server Return on Foundry devices). This is a technique where requests come to the load balancer, which simply rewrites the MAC address and sends it back to a switch to forward to the Web server. Rewriting different MAC addresses enables traffic to be forwarded to different Web servers. The advantage of this technique is that return traffic (responses from the Web servers) does not travel over the load balancer's backplane. This prevents a potential bottleneck and allows the device to focus on handling incoming requests. Sites that are primarily serving content will get the most benefit; however, there are drawbacks in the complexity of handling Mega-proxies and SSL connections. Mega-proxies are large Web proxies run by providers like AOL and RoadRunner that aggregate traffic from all their clients. As millions of customers may share the same source IP of the proxy, it becomes impossible to filter based on the IP address.
The second common technique is to place the load balancer inline and use it to handle Layer-4 switching. Each Web server behind the device is set to a VIP (virtual IP) on the device. When traffic is sent to the VIP, it is forwarded to one of the servers that are configured to service the VIP. This is the setup that will most commonly be found when looking at a load-balanced architecture.
The final architecture we will cover is "delayed binding." This is actually the same architecture as the one just explained, but the method of load balancing is different. This is a feature of high-end load balancers that support Layer-7 switching, like Alteon WebSwitches and CiscoGuards. Instead of a request coming into the device and then being sent to a Web server based on the selected load-balancing algorithm (lowest latency, round robin, sequential, load factor, etc.), the device forces a full application connection (HTTP handshake) before it creates a connection to the Web server. This limits basic SYN floods and forces attackers to make valid application connections (which are much slower, not spoofable, and more difficult to execute).
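The load-balancing algorithms named above differ mainly in how the next server is chosen. As a rough illustration only, the following sketch contrasts round robin with a least-connections picker, which stands in here for the "load factor" style of selection. The class names are hypothetical and do not reflect any vendor's implementation.

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotation, regardless of their load."""

    def __init__(self, servers):
        self._ring = cycle(servers)

    def pick(self):
        return next(self._ring)


class LeastConnections:
    """Prefer whichever server currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to the server closes.
        self.active[server] -= 1
```

Round robin is trivially cheap but blind to slow or overloaded servers; a load-aware picker costs a little bookkeeping and keeps a struggling server from receiving its full share of an attack.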
Caching Devices
Caching is one of the best ways of adding capacity to a site. Caching content allows servers to focus on processing more complicated requests. Sites in the past focused on caching static content (images, basic text pages, download files), but with application-layer DoS attacks, it makes sense to judiciously cache as much dynamic content as possible. For most Web sites the homepage is the page that receives the most hits. Sites want to provide the content on that page dynamically, but it may make sense to make the page static and dynamically update it on a frequent basis. This tradeoff in functionality and performance is critical to designing a site that is robust enough to handle a DoS attack.
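The idea of serving a static homepage that is regenerated on a schedule can be sketched in a few lines. This is a minimal illustration, not a production cache: the `TimedPageCache` name, the `render` callback, and the 30-second interval are all assumptions made for the example.

```python
import time


class TimedPageCache:
    """Serve a pre-rendered page and re-render it at most once per TTL."""

    def __init__(self, render, ttl_seconds, clock=time.monotonic):
        self._render = render      # expensive function that builds the page
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._page = None
        self._expires = 0.0

    def get(self):
        now = self._clock()
        if self._page is None or now >= self._expires:
            self._page = self._render()
            self._expires = now + self._ttl
        return self._page


if __name__ == "__main__":
    homepage = TimedPageCache(lambda: "<html>today's headlines</html>", ttl_seconds=30)
    print(homepage.get())  # rendered once, then served from memory for 30 s
```

However many requests arrive, the expensive rendering work runs at most once per interval, so a flood against the homepage costs little more than memory reads.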
Caching devices run the gamut from basic Squid reverse proxies to custom XML processing devices for Web services like Datapower's XML hardware devices. These devices play one of two roles: they keep data or static content in high-speed storage (RAM) to reduce I/O loads, or they offload complex processing tasks like SSL, XSL conversions, SAML assertions, etc. Using specialized devices that are designed for this work allows for the general processing capability of servers to be reserved for more advanced tasks.
Note: The next section will address caching services like Akamai that can supply global Web capacity-on-demand.
So, there are all these wonderful devices out there that can help you mitigate or survive a denial-of-service attack. How do you know which one is right for you? What size device should you get once you have decided on a product? The answer to these questions can be found by looking at your capacity planning and threat modeling. Deciding how many users will be accessing the site and how much it is worth the time and effort of an attacker to take the site down will give the baselines for a capacity plan. Don't forget, however, to take into account a roadmap for site growth and an increase in popularity. At its most basic, a capacity plan has to be able to tell the administrator how much network bandwidth they need to obtain and how many servers to purchase to host the site.
Network
Network bandwidth can be a very easy or very hard thing to obtain depending on the circumstances. A site hosted at a datacenter or collocation facility with direct Ethernet taps to the cage can ramp up bandwidth almost instantly. A simple call to the ISP may be all that is necessary to increase the amount of bandwidth to the site. Many facilities come with bandwidth allocation plans that automatically adjust to the average site load and simply charge more as the amount of bandwidth used increases. On the other hand, sites hosted out of a company's own facility may require the provisioning of new data lines through the ISP or local ILEC before bandwidth can be increased.
Hosting at a datacenter seems to have the edge in addressing denial-of-service attacks, and for the most part this is true; however, there is one potential advantage of using dedicated lines: the ability to use more than one provider. The redundancy of having two fiber optic cables going to two different Telco networks can be a great asset when an attack is coming from only one provider's network. You can shut off that connection and redirect all traffic over the alternate connection. For example, an attacker could be a student at a large university who has compromised numerous boxes in the labs at school. All traffic from the university will travel over the Internet on a single Telco (backbone) network to the site. Alternatively, by filtering all traffic from the university at the network edge, it may be possible to prevent smaller internal network links from being flooded.
Server
With all the whiz-bang gadgets like firewalls and load balancers, it is easy to forget that the primary resource hosting the site is servers. The easiest way to increase capacity is to buy more powerful servers or to buy more servers. Additional server capacity will help not only against a denial-of-service attack, but it can actually support everyday traffic of real users as well! Additional capacity can reduce the latency of requests, making the users' experience better, and support additional application load or new functionality to be added. Gauging the amount of server capacity required can be tough, and this is an area where the input from testing can help (we'll discuss this more in the upcoming "DoS Testing" section). Additional servers may also force additional architectural complexity in the form of clustering. Remember that servers cannot handle all denial-of-service attacks, so the key is to find a balance.
Many precautions can be taken by the administrators of a Web site, but sometimes outside help is needed. Many small sites do not control their own network; they are hosted by an ISP and their datacenter. Larger companies may host some of their own services but still rely on their ISP for others. For small and midsize companies that host their Web presence from their own network, it is often better to allow an ISP to host DNS rather than take responsibility for hosting such services locally. Taking out DNS services is one of the easiest ways to knock a site off the Internet. ISPs usually have dedicated and redundant hardware for hosting DNS, something which few companies can do. DNS can also support the most primitive of load-balancing techniques: round robin DNS. Round robin DNS tells DNS servers to rotate the IP addresses that are returned when a domain name is queried. This spreads the load of a domain name onto multiple IP addresses and multiple servers. Round robin DNS is easy to detect and would not stop a determined attacker, but it may slow them down or at least increase the targets they must attack.
Larger companies and very popular Web sites need to look at more complex techniques for maintaining their uptime during an attack. Working with their ISP, many large sites will implement Global Server Load Balancing (GSLB). GSLB provides a way of geographically segmenting traffic as well as allowing for physically distributed sites to serve traffic. Using it, a popular site can be served from multiple locations; this provides redundancy if one site fails under the load. DDoS attacks that are coming from disparate geographic locations are tougher to handle if there is not enough capacity at any of the locations to handle the attack. Hopefully, it will allow a certain percentage of valid customers to get through.
Another technique that can work hand in hand with GSLB is external caching with a service like Akamai or Savvis CDN. These services cache static content globally and redirect or proxy traffic to the site, protecting the host site from direct network attacks. Akamai's cache devices are designed to soak up SYN traffic, and since they spread the load across many sites, it makes a DDoS attack much harder to target. Unfortunately, such external caches cannot always work; site architecture, type of content, and cost may make an external cache unworkable for some sites.
For small Web sites that are hosted at an ISP, the network edge is not under the control of site administrators. They must work with the ISP to make sure that the best possible network filtering is in place. Larger sites that have their own networking equipment can do much more on their own. The goal of hardening the network edge is to ensure that traffic is filtered as early as possible in the communication path. The farther into the network a rogue communication reaches, the more resources it has consumed.
In a typical network layout, a border router is used to connect a line from the ISP with the network hosting the Web application. ACLs should be placed on the router to filter spoofed packets that claim to come from internal network addresses or nonroutable network space (10.x, 172.16.x, 192.168.x, etc.). Many resources will recommend that ICMP be filtered to prevent ICMP-based attacks or any amplification. This, however, is typically not the best advice. ICMP is a necessary diagnostic protocol, and filtering or blocking it will break many protocols. The better solution is to use rate limiting for SYN and ICMP packets. With Cisco, a CAR (Committed Access Rate) allows policies to be set to provide Quality of Service guarantees to network traffic. ICMP traffic can be restricted to a small percentage of available bandwidth to ensure that a flood or amplification attack over ICMP is filtered at the edge before it reaches hosts.
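Rate limiting of this kind is commonly explained with a token bucket: traffic may pass as long as tokens remain, and tokens refill at the committed rate up to a burst ceiling. The sketch below illustrates that general idea only. It is not Cisco's implementation, and the class name and parameters are assumptions for the example.

```python
class TokenBucket:
    """Allow traffic at a sustained rate with a bounded burst."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # timestamp of the previous check

    def allow(self, now, cost=1.0):
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: drop instead of forwarding


if __name__ == "__main__":
    icmp_limit = TokenBucket(rate_per_sec=2, burst=3)
    print([icmp_limit.allow(0.0) for _ in range(4)])  # [True, True, True, False]
```

Legitimate diagnostics such as an occasional ping fit comfortably inside the burst allowance, while a flood is cut down to the committed rate before it travels any farther into the network.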
No matter what defenses are put in place at the network edge, the servers hosting a Web site must be configured properly themselves. The majority of recommendations for securing a Web site apply equally to defending it against a denial-of-service attack. The number-one priority is keeping the operating system patches up-to-date. All the attacks described to this point involving vulnerabilities were resolved early on with patches. Attacks continued to be successful because few administrators updated their servers to resolve the issue. Strong and consistent patch management is the most important step in defending against an attack.
Beyond patching, all operating systems have methods for tweaking the network stack to handle differing traffic loads. Under Windows, most network settings that can be tuned can be found under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
Linux has a special option called SYN cookies, which can be useful in defending against SYN floods by delaying resource allocation on the OS until a response (ACK) from the client is received. It trades increased processing loads for decreased memory consumption.
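The trick behind SYN cookies is that the server stores nothing when a SYN arrives. Instead, it derives its initial sequence number from the connection details and a secret, then recomputes and checks that value when the ACK comes back. The following sketch shows the concept only; the real Linux implementation differs in its hash, its bit layout, and the extra information it encodes. The secret, the function names, and the time-slot counter are assumptions for illustration.

```python
import hashlib
import hmac

SECRET = b"example-secret"  # hypothetical; a real stack uses a random per-boot key


def make_cookie(src_ip, src_port, dst_ip, dst_port, counter):
    """Derive a 32-bit sequence number from the connection tuple and a
    coarse time counter, so no per-connection state has to be stored."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(src_ip, src_port, dst_ip, dst_port, counter, ack_number):
    """Accept the ACK only if it acknowledges a cookie issued in the
    current or the previous time slot; only then allocate resources."""
    for slot in (counter, counter - 1):
        expected = (make_cookie(src_ip, src_port, dst_ip, dst_port, slot) + 1) % 2**32
        if ack_number == expected:
            return True
    return False
```

A spoofed SYN never produces a matching ACK, so it costs the server one hash computation and no memory, which is exactly the CPU-for-memory trade described above.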
Application design is probably the most difficult area to address concerning denial-of-service attacks. There are so many places where an attack can occur and so many pieces of functionality that can be abused. The first step is in-depth threat modeling of the application. Proper threat modeling and attention to detail during the design and development process will catch many potential problems. This is because threat modeling requires looking at each piece of the site from a state of paranoia.
As we saw with the Google MailTo: application-layer DDoS attack inadvertently caused by MyDoom-O, proper resource allocation across application tiers is a major element of defending against this type of attack. It is critical to cache as much content as possible and just as important to gracefully handle cache misses. Part of your threat modeling should determine the ease with which an attacker can get past your first- and second-layer caches. Next, we'll discuss some ways to handle some common gotchas.
Processing
Whenever possible, do not take on processing tasks on behalf of the client; instead, defer processing to the client when the data is not sensitive. Use standard libraries and protocols when handling encryption. Try not to "roll your own" encryption, authentication, or authorization mechanisms. When processing does need to be performed on behalf of the client, make sure it can be throttled, limited to a specific length of time, and tied back to a valid user (in other words, do not perform arbitrary length computations for anonymous users). Cache results whenever possible, use indexes to make data retrieval faster, and use static content over dynamic whenever possible.
Memory
Do not allow arbitrary length input from the client. When loading data for processing, set reasonable limits for memory usage and fail or throttle transactions that exceed those limits. Use batch processing of large, complex requests to restrict concurrent usage of large amounts of memory. Do not rely on virtual memory because disk latency will play havoc with site performance. Use in-memory caching whenever possible, but make sure the cache cannot grow arbitrarily.
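Two of these rules, capping input length and keeping the cache from growing without bound, are easy to demonstrate. The sketch below is one possible approach under stated assumptions: the 4,096-byte limit, the function name, and the least-recently-used eviction policy are choices made for the example rather than recommendations from any particular framework.

```python
from collections import OrderedDict

MAX_INPUT_BYTES = 4096  # hypothetical limit; size it to the application


def accept_input(body: bytes) -> bytes:
    """Reject oversized client input before any processing happens."""
    if len(body) > MAX_INPUT_BYTES:
        raise ValueError("request body too large")
    return body


class BoundedCache:
    """In-memory cache that evicts the least recently used entry once
    it reaches a fixed number of entries."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._data)
```

With a fixed ceiling, an attacker who requests millions of distinct keys can lower the cache hit rate but cannot exhaust the server's memory.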
Database
Cache data from the database when possible to reduce the number of queries that need to be made. Tune the database pool so there are no starvation issues and make sure (network) connections to the database are set up ahead of time rather than on demand. Limit complex joins and make use of indexes to speed database work. Optimize queries by using stored procedures rather than string assembly. If the database supports pegging tables in memory, do so for tables that are hit frequently.
User Logins
One of the trickiest decisions in application design is control of usernames and passwords. Besides the usual concerns of password strength, a site dealing with denial of service has to decide how to handle logon failure. The most common method of dealing with failed logons and the prevention of brute-force attacks is account lockout. Unfortunately, account lockout can be counterproductive when trying to defend against an attacker eager to keep valid users out of the system. An attacker need not take the system down if they can simply prevent users from logging in.
Application developers have two choices:
Do not implement account lockout but instead implement some method of delaying the attacker enough that brute-force is useless, or
Implement a lockout policy that degrades user experience gracefully.
The first choice is best used in applications where there is a significant risk of guessing/predicting/obtaining usernames. Delay can result from a slowdown in response by the application to requests, password complexity requirements (the stronger the passwords are, the more requests an attacker will need to brute-force the password), or a HIP/CAPTCHA (see the section on these earlier in the chapter).
The second choice works best in systems where the attacker will have to brute-force the username as well as the password. Such systems usually set a number of attempts before lockout. The lockout period then slowly increases the more times lockout is reached. The recovery from a locked-out state must not be so onerous that users who are locked out by an attacker are significantly inconvenienced. The recovery can be strictly lockout-period expiration, a change-password process, or HIP/CAPTCHA, depending on usability.
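A lockout period that grows with repeated lockouts might look like the following sketch. The doubling schedule, the default thresholds, and the `LockoutPolicy` name are assumptions for illustration. The text only calls for the period to increase slowly, so a gentler schedule may suit some sites better.

```python
class LockoutPolicy:
    """Lock an account after repeated failures, for longer each time."""

    def __init__(self, max_attempts=5, base_lock_seconds=60):
        self.max_attempts = max_attempts
        self.base = base_lock_seconds
        self.failures = {}      # username -> consecutive failed logons
        self.lockouts = {}      # username -> lockouts triggered so far
        self.locked_until = {}  # username -> time the lockout expires

    def is_locked(self, user, now):
        return now < self.locked_until.get(user, 0.0)

    def record_failure(self, user, now):
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= self.max_attempts:
            count = self.lockouts.get(user, 0) + 1
            self.lockouts[user] = count
            # Assumed schedule: 60 s, 120 s, 240 s, ... per successive lockout.
            self.locked_until[user] = now + self.base * 2 ** (count - 1)
            self.failures[user] = 0

    def record_success(self, user):
        # A valid logon clears the account's history.
        self.failures.pop(user, None)
        self.lockouts.pop(user, None)
        self.locked_until.pop(user, None)
```

Because recovery here is simply the expiration of the lockout period, a legitimate user locked out by an attacker only has to wait; a site could shorten that wait by substituting a HIP/CAPTCHA or a change-password process.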
Tip: The difficulty attackers have in attacking sites where they need to brute-force both usernames and passwords demonstrates the importance of ensuring that the applications do not leak information to attackers by way of response-error messages during login. Failed logon attempts should not reveal whether a bad username or password caused the failure without first carefully considering the impact.
Even the strongest configuration and the most well-considered design are useless without proactive and continuous testing. The best sites measure their load continuously and test new components or functionality before going live. Testing of basic network floods is not very valuable because an attack will always succeed given enough resources, but testing of application-layer attacks and especially critical Web application functionality is a must. Many load-testing applications like JMeter, OpenSTA, Webload, and Microsoft's Web Application Stress Tool can be adapted for DoS testing very easily. More advanced systems like ANTARA's FlameThrower use dedicated hardware to allow generation of complex requests at wire speed. The goal of testing is to find the points at which the application reaches resource limitations, in addition to determining the load that can be supported. Figure 11-4 shows JMeter graphing a Web application load test.
The first step in defeating a denial-of-service attack is knowing that it is occurring. Having the proper logging, detection facilities, and notification systems in place to detect an attack immediately is far better than waiting until you get calls from irate customers saying they cannot reach the site.
Logging directly onto systems may be the easiest method for determining if an attack is occurring. Review the Task Manager on Windows or run the top utility on UNIX/Linux to determine if the CPU is pegged at 100 percent. Also, look at the I/O load on the system to see if the system is bogged down with disk activity. On almost all operating systems, the netstat command can provide information on the operation of the network stack. Under a SYN flood, netstat will show numerous SYN_RECV socket connections, typically from very random ports and IP ranges. A connection-hogging attack will typically show numerous ESTABLISHED or FIN_WAIT connections. Note, however, that because UDP is connectionless, UDP floods will not show up in netstat's connection listing at all.
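Eyeballing netstat output works for a single host, but tallying the connection states makes a flood obvious at a glance. The sketch below parses captured text laid out like Linux `netstat -ant` output, where the state is the sixth column. The sample lines are invented for illustration, and column layouts vary by operating system, so treat the parsing as an assumption to adapt.

```python
from collections import Counter

# Invented sample resembling Linux `netstat -ant` output.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address      Foreign Address      State
tcp        0      0 192.0.2.10:80      198.51.100.7:41022   SYN_RECV
tcp        0      0 192.0.2.10:80      198.51.100.9:1093    SYN_RECV
tcp        0      0 192.0.2.10:80      203.0.113.5:50211    ESTABLISHED
tcp        0      0 192.0.2.10:22      203.0.113.8:50400    ESTABLISHED
tcp        0      0 192.0.2.10:80      198.51.100.23:6021   SYN_RECV
"""


def count_tcp_states(netstat_output):
    """Tally TCP connection states from netstat-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines; the state is the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

A SYN_RECV count that dwarfs the ESTABLISHED count, or that climbs steadily between samples, is the signature of the SYN flood described above.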
The simplest form of data collection is logs. Most network devices and UNIX hosts support logging to a remote syslog server; Windows hosts report events to the Event log, which can be scraped by custom scripts or an application like MOM (Microsoft Operations Manager). Once logs are collected, some form of processing to detect the important log messages and trigger alerts must be done. Some systems can automatically generate e-mail alerts when activity occurs, in addition to placing messages in the log. Many attacks can be detected by simple performance logs that monitor CPU utilization, memory consumption, etc. Applications that perform their own logging can help identify what the nature of an attack may be. An attack that triggers built-in throttling controls should also trigger application messages, identifying the inputs and source of the request.
Intrusion detection systems (IDSs) are advanced logging systems that perform event classification and anomaly detection. Network-based IDS systems will quickly detect basic SYN floods and network attacks and host-based devices will quickly detect abnormal traffic levels reaching an individual host. Anomaly-based systems, such as those by Arbor Networks, may detect more advanced application attacks by noticing when someone is sending irregular or abusive traffic out of the norm. IDSs that support event correlation across numerous agents will be able to identify when an attack is being spread across multiple servers. The primary conundrum of an IDS installation is tuning the detection system. IDSs typically flood an administrator with data and false positives, and it often takes a full-time IDS operator to tune the systems to reach a point of effectiveness. New to the market and slowly becoming more popular are intrusion prevention systems (IPS). These systems function in a similar fashion to a standard IDS, but when an attack is detected, they are able to act like a firewall and filter the attacking traffic, preventing the attack from reaching the targets.
Once an attack has been detected, the next step is to begin a response. This means taking a logical, carefully prepared plan and putting it into action. Jumping the gun and pulling a plug is rarely the best option.
The first step in handling an attack is to execute a previously devised and tested plan. It is much easier and much safer to put into action a plan that has been considered ahead of time and that has been tested in the past, rather than customizing an instantaneous response. A good plan will include allocating a certain amount of breathing space to get a handle on the full situation before attempting to take remedial actions. The plan should be developed to handle most conceivable possibilities, and each one of these should be tested individually and with an eye toward possible changes or adaptations by the attacker. "Fire drills" to test the DoS response plan should be conducted regularly (at least annually), since no DoS plan is worth the paper it's written on if it hasn't undergone trial by fire.
The first response to most infrastructure and many application denial-of-service attacks is to put in place ACLs or firewall rules to filter traffic from the attacker. Using a sniffer like Ethereal, an RMON probe, or NetFlow data collected from Cisco devices, you should attempt to identify the IP addresses or networks that the attacks are coming from. If network traffic looks normal, begin working your way up the network stack until you reach the application layer to determine the type of attack. Use your baseline analysis to determine what level of traffic from an IP or set of IPs is normal or extreme. If IP addresses are spoofed, you will likely not be able to easily filter the traffic. If the attack is coming from only one IP address or a small set of them, an ACL may quickly and easily end the attack. Traffic floods that are bogging down the network devices or firewalls themselves may require additional help from your ISP.
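Comparing observed traffic against a baseline can be automated once the flow data is in hand. The sketch below assumes the records have already been exported from a sniffer or flow collector as simple (source IP, destination port) pairs. The record format, the function name, and the factor-of-ten threshold are all assumptions for the example.

```python
from collections import Counter


def suspicious_sources(records, baseline_per_ip, factor=10):
    """Return (ip, count) pairs, busiest first, for sources whose request
    count exceeds the per-IP baseline by more than the given factor."""
    counts = Counter(src for src, _dst_port in records)
    limit = baseline_per_ip * factor
    return [(ip, n) for ip, n in counts.most_common() if n > limit]


if __name__ == "__main__":
    # Invented sample: two heavy senders and one ordinary client.
    records = (
        [("198.51.100.7", 80)] * 500
        + [("203.0.113.5", 80)] * 20
        + [("198.51.100.9", 80)] * 300
    )
    print(suspicious_sources(records, baseline_per_ip=20))
    # [('198.51.100.7', 500), ('198.51.100.9', 300)]
```

The resulting list is a candidate set for ACL entries, not a verdict: mega-proxies can legitimately exceed a per-IP baseline, and spoofed sources will not concentrate on a few addresses at all.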
The next step is to contact your ISP and gain their assistance in dealing with the attack. If the traffic is flooding your connections or is spoofed, your ISP may be able to help provide ways of throttling the attack before it reaches you. Your ISP may also be able to trace back the attack or work with the other network providers they peer with to identify the source of the attack. In the case of application attacks or very determined attackers, contacting the ISP may not be enough on its own.
How an attack is targeted can play a major role in the decision-making process for defending against an attack. For example, an attack that is hard-coded against an IP address may be solved as simply as having the ISP change the IP address of the site and updating the corresponding DNS address. In addition, do not drop your guard if a successful defense stops the attack against the site. It may only be a brief pause until the attacker adapts to the defense and finds a new manner for attacking the site.
The final technique for keeping a site up while under an attack is to shift to an alternate method of handling traffic. This is a technique that is commonly used by sites for handling spikes in traffic loads, whether or not they are caused by an attack. Most commonly, dynamic sites shift to a static content operating mode where fixed content is provided, rather than providing the dynamic content that is typically offered. Most popular sites (Amazon, NYTimes, etc.) use this technique already for the homepage, which is the most often hit page on the site. Under extreme load, sites have to be able to switch as much content as possible to static pages, which require little to no processing. Another method of leveraging this technique is to work with an external caching service like Akamai to have a failover capability. If the site reaches a certain level of load or becomes inaccessible, caching providers can take over serving cached static content to maintain the site's presence until the attack ends or can be dealt with.
