Monday 4 February 2013

Some thoughts on Distributed Denial of Service (DDoS) attacks

I recently answered a question on StackOverflow about how web applications can be protected from a DDoS. This was a popular question, and (for me) a very popular answer, and a few other members added their thoughts. I have since expanded on my post, done more research, and rewritten large portions.

UPDATE: Spamhaus was subject to an enormous DDoS in March 2013 and they turned to Cloudflare, a professional CDN I mentioned in this posting. I discuss the attack in a follow-up posting here and a second here.

What is a Distributed Denial of Service?


DDoS is a family of attacks which overwhelm key systems in the datacenter, disabling your web site and the web applications you are running on it. DDoS attacks are very common and carried out for a number of reasons:

  • Political or ideological individuals, or groups such as Anonymous
  • Criminal groups, who either:
    • Extort money from businesses that don't know how, or can't afford, to protect themselves, or
    • Are paid to damage a business competitor's reputation or ability to trade on the web

The first DDoS attacks were very simple, but they have evolved over the years to take advantage of any weakness in network protocols, firewalls, operating systems and web servers. However, at the most basic level, a DDoS can be indistinguishable from just a lot of users accessing your services - almost like a digital Occupy.

DDoS can attack any one (or indeed several) of the many different components of the application stack including:
  • The hosting center's network connection to the internet
  • The hosting center's internal network and routers
  • Your firewall and load balancers
  • Your web servers, application servers and database.
DDoS is just one way hackers can attack your systems and services. When securing your systems, you should consider defense against DDoS as just one aspect of your overall security posture.

There's a lot of information below the break.

How much effort should I invest in protection?


Before you start planning your DDoS defence consider the:
  • Value-at-risk
    • For a non-critical, free-to-use service for a small community, the total value at risk might be peanuts 
    • For a paid-for, public-facing, mission-critical system for an established multi-billion dollar business, the value might be the worth of the company
  • Likelihood of attack
    • This may be very low for an uncontroversial open-source project
    • If you are offering betting/gambling services, or your site has politically charged views, the risk of attack is much higher.
For low-value, low-risk sites, you probably don't want to invest much time or money in defending yourself from an unlikely attack. If your company is at high risk, and your company's future depends on your web presence, you really need to be hiring professional, certified security experts.

What should I do to protect myself?


Attackers choose the time of the attack; they have a lot more experience than you (this is their business); they have more resources than you (it is cheap to purchase a botnet); and they may have a brand new attack that no-one has a defence for. So what can be done?

To defend against DDoS, you need defence in depth - protection at every level of your solution stack - so that you have options and fallbacks while you are under attack.

The rest of the article discusses how you can configure your entire system with defensive measures. You might not need to implement every idea here, but at least consider the following topics:
  • Hosting centers
  • Content delivery networks
  • Security patching
  • Firewalls and security appliances
  • Network monitoring and intrusion detection
  • Operating system and web servers
  • System logging
  • Application level logging
  • Controlling anonymous access
  • Controlling sessions and users
  • Database tier

Hosting center


If you are expecting a DDoS, it's a very good idea to qualify your hosting provider on the level of protection they can provide. Many hosting centers have DDoS experience and tools to monitor and mitigate it - understand their tools, processes and escalation procedures. Also ask what support the hosting provider has from their upstream providers. These services might mean more up-front or monthly cost, but treat this as an insurance policy. A simple query on Google will list many such providers.

Most hosting plans specify bandwidth - and that is the first thing that will be eaten up by a DDoS - so be prepared to pay to upgrade your plan to unlimited. Even with unlimited bandwidth, the disruption to their other customers might force them to disconnect your web site.

Your hosting provider's network monitoring tools may be the first to spot a DDoS - make sure that they have a clear, 24x7 escalation path to your IT organisation. Know their emergency numbers, chat handles, etc, and be on good terms with them.

The Hosting Center should be able to help you by:
  • Firewalling your servers, including blocking all unused ports and services
  • Monitoring the network and services on your servers
  • Protecting you from network-level attacks (such as TCP SYN or ACK floods, UDP floods, ICMP floods, port scans and similar)
  • Blocking traffic from Internet regions or countries
  • Some may even be able to allow only whitelisted IPs (if your business model permits)

Content delivery network


Use a Content Delivery Network such as CloudFlare both to distribute content and services close to your end users and also to hide your real servers from the DDoS attackers. CDNs offer DDoS services similar to hosting centers. In addition, a DDoS attack might not be large enough to take out all the CDN's edge nodes, and so there will still be routes customers can take to reach your web site and services.

Security patching


Keep all your systems and software packages updated with the latest security patches - and I mean all of them:
  • Managed switches - yup even switches sometimes need updating
  • Routers
  • Firewalls
  • Load balancers
  • Operating systems
  • Web servers
  • Languages and their libraries

Firewalls and security appliances


Ensure that you have a good firewall or security appliance set up and regularly reviewed by a qualified security expert. Strong rules on the firewall are a great defence against many simple attacks.

Many firewalls can throttle bandwidth to your web site by various parameters - this might help under some circumstances.

Network monitoring and intrusion detection


Have good network monitoring tools in place - this can help you understand:
  • That you're under attack rather than simply being under heavy load
  • Where the attack is coming from (which may include countries you don't normally do business with) and
  • What the attack actually is (ports, services, protocols, IPs, URIs and packet contents)
Ensure that your monitoring tools are specifically monitoring the web pages and services you are offering. If the number of page requests rockets up, or response time rises above the limit in your SLA, your network monitor should trigger an alarm.

Make sure that you have alarms set up that escalate warnings to your IT, 24x7. Run regular DDoS attack fire-drills to test your network monitor. Make sure that the warnings are actually received by IT.
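
The alarm condition above can be sketched in a few lines of Python. This is only an illustration, not how any particular monitoring product works: the AlarmLimits fields, the idea of comparing against a baseline request rate, and the example numbers are all my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class AlarmLimits:
    # All three values are hypothetical; tune them to your own traffic and SLA.
    baseline_requests_per_minute: float
    surge_multiplier: float
    max_response_seconds: float


def check_alarms(requests_per_minute, response_seconds, limits):
    """Return a list of alarm messages; an empty list means nothing to report."""
    alarms = []
    surge_level = limits.baseline_requests_per_minute * limits.surge_multiplier
    if requests_per_minute > surge_level:
        alarms.append(
            "request rate %.0f/min exceeds %.0f/min" % (requests_per_minute, surge_level)
        )
    if response_seconds > limits.max_response_seconds:
        alarms.append(
            "response time %.2fs exceeds SLA of %.2fs"
            % (response_seconds, limits.max_response_seconds)
        )
    return alarms
```

Whatever raises the alarm, the important part is still that the message reaches a human in IT, 24x7.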

Operating system and web server


Your operating system and web server have many points to configure for performance and security. In general, remove every feature of the software stack that you don't absolutely need and switch off every network service except those you strictly need.

To access your servers, I like SSH - the go-to secure terminal. There are many goodies that come with it, including secure file copying (scp and sftp), securely tunnelling ports and even the ability to mount the server's file system remotely using sshfs in FUSE or FISH on KDE.

There are also many open source and paid-for security modules and packages that you can install and configure.

System logging


A great way of monitoring what is happening on your server is log files, so ensure that your OS and major services (web server, application server and DB) are writing logs, and that they are writing the right amount - not horrendously verbose debug logs, but also not too little.

Make sure log files are written to a different file system volume from the main operating system or the application's data directories (otherwise a DDoS could succeed simply by filling your disk and crashing the application or OS).

Apache by default already has great logs that record, among other things:
  • IP address of requestor (don't reverse DNS this - ironically the cost of doing this makes a DDoS easier for the attackers)
  • Timestamp
  • HTTP method and URI
  • HTTP response code (i.e. logs failures of the application tier)

Ensure the OS has logging on, especially for security alerts. It can be useful if, on every computer and VM in your system, you can log the following at regular intervals (say 1 minute):
  • Load average
  • CPU utilization
  • Disk I/O
  • Network I/O
If you are properly securing an OS, there are a lot more services and logs to consider, but these are strictly outside the scope of this post.
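
Here is a minimal sketch of the periodic sample in Python. It covers only the load average, using os.getloadavg (which is available on Unix only); CPU, disk and network counters would be further columns taken from whatever your platform offers. The host name argument and the tab-separated layout are my own assumptions.

```python
import os
import time


def read_load_average():
    """Return the 1-, 5- and 15-minute load averages (Unix only)."""
    return os.getloadavg()


def format_sample(epoch_seconds, host, load_averages):
    """Format one sample as a tab-separated line with a UTC timestamp."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch_seconds))
    loads = "\t".join("%.2f" % value for value in load_averages)
    return "%s\t%s\t%s" % (stamp, host, loads)
```

Run once a minute from cron or a small loop, format_sample(time.time(), "web01", read_load_average()) gives one line per sample that is easy to graph or grep later.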

Application level logging


A recent trend in DDoS attacks has been to simply overload the application server with valid (and sometimes invalid) requests. In a DDoS, these will come from thousands to millions of different IP addresses, so it's not necessarily going to be easy to block each malicious IP. The attack may take several forms:

  • The attack fires millions of valid requests which overloads the application server or database by brute force
  • The attack finds a service that is particularly computationally expensive (think data mining report) and triggers it many times, thereby overloading the server
  • The attack may seek to save vast amounts of data in the application, eventually clogging up the file system of the web or database tiers. Once the file system is full, no more data can be saved which might even crash the application
  • The attacker finds a web page or service that crashes when presented with illegal arguments, and keeps repeating it

To diagnose the attack, in addition to web server logs, you need good application level monitoring of the requests and responses from your application - a log file is great for this. Your application should write the following fields into the log file:
  • Logical name for the module or service being invoked
  • User ID, if present
  • All, or at least the important, HTTP arguments (especially important as Apache won't log the PUT and POST arguments by default)
  • Any important intermediate results
  • Summary of the result sent back to the user
  • Debugging information if a service crashes (such as file and line number of crash, exception and other helpful information). You should never send this kind of data back to the HTTP client.
  • Thread identifier: Your application is multi-threaded, so you have to use some key to tie together all the lines written for each logical request.
Remember to write an audit record for all important or security-related actions such as:
  • Create user/account
  • Login
  • Logoff/session timeout
  • Change password
  • Delete account
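
As an illustration, the fields above could be written as one tab-separated line per event. This is a sketch under my own assumptions: the field order, the request_id used as the key that ties a request's lines together, and the SENSITIVE_ARGUMENTS list are all hypothetical. I mask sensitive arguments because logging HTTP arguments verbatim would otherwise put passwords into the log.

```python
import json

# Hypothetical list of argument names that must never reach a log file.
SENSITIVE_ARGUMENTS = {"password", "card_number"}


def format_log_line(timestamp, request_id, service, user_id, client_ip, arguments, result):
    """Build one structured log line; request_id ties a request's lines together."""
    safe_arguments = {
        name: ("***" if name in SENSITIVE_ARGUMENTS else value)
        for name, value in arguments.items()
    }
    fields = [
        timestamp,
        request_id,
        service,
        user_id or "-",  # anonymous requests have no user ID
        client_ip,
        json.dumps(safe_arguments, sort_keys=True),
        result,
    ]
    # Strip tabs and newlines so that every event stays on exactly one line.
    return "\t".join(str(field).replace("\t", " ").replace("\n", " ") for field in fields)
```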

I regularly review my log files and get positively irate if there is rubbish like the following in my logs:
Got value 12 
!!!!Cannot get here!!!! 
#################### step 12

Why did the developer put a line of #####'s? It's there to fight the ======= from another developer who had successfully made his message stand out from the third developer's -----------. This escalation is log file cacophony which is positively harmful. I review the DVCS 'blame' and educate the respective developers.

I like to think of log files as if they were exported database tables - you should be able to import them as columnar data, 'join' different logs using well-defined keys (time, userid, IP), and merge consecutive rolling log file segments. If your log contains properly structured data, it can easily be loaded into a log tool (or Excel or similar). It will also be easily parseable with command-line tools (grep, sed, awk). Remember that a DDoS will generate millions of lines of log, so you will need these tools. 
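
To make the 'exported database table' idea concrete, here is a small Python sketch that reads two tab-separated logs and joins them on a shared key. The column names and the tab-separated layout are assumptions for the example; real logs would need their own column lists.

```python
import csv
import io


def read_log(text, columns):
    """Parse tab-separated log text into a list of dicts keyed by column name."""
    reader = csv.DictReader(
        io.StringIO(text), fieldnames=columns, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def join_on(left_rows, right_rows, key):
    """Inner-join two lists of rows on a shared column, like a SQL join."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined
```

For a real attack you would stream the files rather than hold millions of lines in memory, but the principle is the same.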

Make sure that all this information is easily retrievable from the data center - I like to use SSH and scp which I discussed earlier - and that you can correlate logs from different computers and services (i.e. ensure all computers are time synchronized using ntp).

You will likely need to slice'n'dice your logs (especially with respect to URI, time, IP and user) to work out what is going on, and need to generate reports such as:
  • What URIs are being accessed
  • What URIs are failing at a high rate (a likely indicator of the specific URIs the attackers are attacking)
  • Which users are currently logged onto the service
  • From how many IPs is each user accessing the service (which should be no more than a handful)
  • What URIs are anonymous users accessing
  • What arguments are being used for a given service
  • An audit of a specific user's actions
  • Which services have been invoked and what arguments/data are sent
  • Which users are doing the invoking and from which IPs (i.e. logging in your application)
  • What queries and inserts/updates/deletes the DB is performing
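
Two of these reports can be sketched in Python as follows. I am assuming each log row has already been parsed into a dict with "uri", "status", "ip" and (optionally) "user" keys; those names are mine, not a standard.

```python
from collections import Counter, defaultdict


def failure_rate_by_uri(rows):
    """Fraction of requests per URI that ended in a 5xx server error."""
    total = Counter()
    failed = Counter()
    for row in rows:
        total[row["uri"]] += 1
        if int(row["status"]) >= 500:
            failed[row["uri"]] += 1
    return {uri: failed[uri] / total[uri] for uri in total}


def ips_per_user(rows):
    """Number of distinct IPs each logged-on user has been seen from."""
    seen = defaultdict(set)
    for row in rows:
        if row.get("user"):
            seen[row["user"]].add(row["ip"])
    return {user: len(ips) for user, ips in seen.items()}
```

A URI with a failure rate far above the rest, or a user arriving from hundreds of IPs, is a good place to start looking.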

Controlling anonymous access


DDoS attacks often attack only those pages and services accessible to anonymous users. To handle attacks against anonymously visible services, you can implement the following:

  • Use a QoS feature in the load balancer to send all anonymous sessions to separate application servers in your cluster, while logged-on users use another set. This prevents an application-level anonymous DDoS taking out valuable customers
  • Require an anonymous HTTP session initiated by a CAPTCHA
  • If an IP fails a large number of CAPTCHA trials or logins, block the IP (preferably using the hosting center or CDN DDoS services)
  • Be able to switch off anonymous access for the duration of the attack
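
Counting failures per IP could look roughly like this in Python. The class name, the default of 5 failures in 300 seconds, and the choice to pass the current time in as an argument are all my own assumptions; the actual block should still happen upstream at the hosting center or CDN where possible.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Flags an IP once it has too many recent CAPTCHA or login failures."""

    def __init__(self, max_failures=5, window_seconds=300):
        # Both defaults are hypothetical; pick values that suit your users.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)

    def record_failure(self, ip, now):
        """Record one failure at time `now` (seconds); return True if the IP should be blocked."""
        events = self._failures[ip]
        events.append(now)
        # Forget failures that have fallen out of the time window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_failures
```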

Controlling sessions and users

  • Implement an application-level rate limit (like Twitter) so that users are limited to, say:
    • 10 requests a second
    • 100 a minute and 
    • 1000 an hour 
    • The actual limits are dependent on your application and the strength of the attack
  • Have a configurable session timeout
  • Ensure that a user has a limit to the number of separate concurrent sessions (to prevent a hacked account logging on a million times and accessing privileged URIs)
  • If possible make these constraints dynamic, or at least configurable. This way, while you are under attack, you can set aggressive temporary limits in place ('throttling' the attack), such as only one session per user, and no anonymous access. This is certainly not great for your customers, but a lot better than having no service at all
  • Have an administration capability to terminate user sessions and disable users.

Note that, if you implement these suggestions, each request will use slightly more CPU and memory resource - in other words these features make a DDoS easier. Therefore, implement using simple, quick heuristics. Don't implement complex pattern detection algorithms in your application - leave this to the firewalls and network monitoring applications. 
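
In keeping with 'simple, quick heuristics', here is a sketch of a fixed-window rate limiter in Python using the example limits above (10 a second, 100 a minute, 1000 an hour). It is an assumption-laden illustration, not how Twitter or anyone else does it: the class name and the in-memory dictionary are mine, and a real deployment would need to expire old entries and share state between application servers.

```python
class RateLimiter:
    """Fixed-window request counter per user, cheap enough to run on every request."""

    def __init__(self, limits=((1, 10), (60, 100), (3600, 1000))):
        # Each limit is (window_seconds, max_requests); make these configurable.
        self.limits = limits
        self._counts = {}  # (user, window_seconds) -> (window_index, count)

    def allow(self, user, now):
        """Return True and count the request if `user` is within every limit."""
        updates = []
        for window, maximum in self.limits:
            index = int(now // window)
            stored_index, count = self._counts.get((user, window), (index, 0))
            if stored_index != index:
                count = 0  # a new window has started
            if count >= maximum:
                return False  # rejected requests are not counted
            updates.append(((user, window), (index, count + 1)))
        for key, value in updates:
            self._counts[key] = value
        return True
```

Because the limits are just data, they can be tightened while you are under attack and relaxed afterwards.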


Database tier


While you can deploy different instances of your web or application server for anonymous, free and paid users, these instances can all use the same database instance. If, say, the anonymous web site is hit with a DDoS, the load on the database might cause an outage for your paid users. What can you do?

If your database is of the expensive, enterprisy type, it should support resource management. In this case, create different database users for different classes of use, for example:
  • Anonymous, free, paid
  • Transactional use
  • Expensive, slow-running reports

On the database, you can now configure exactly how much CPU and other resources these can access (Oracle, DB2). In another plus, it's a whole lot easier monitoring which class of use is loading the database.

In your application configuration, set up database connection pools as appropriate. An additional point of control is to limit the number of connections for some usage classes.
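
Limiting connections per usage class can be sketched in Python with one semaphore per class. This is not a real connection pool: ConnectionBudget and the class names are hypothetical, and it only shows the idea that anonymous traffic runs out of database connections before paid users are affected.

```python
import threading
from contextlib import contextmanager


class ConnectionBudget:
    """Caps how many database connections each usage class may hold at once."""

    def __init__(self, limits):
        # limits maps a usage class name to its maximum concurrent connections.
        self._slots = {
            name: threading.BoundedSemaphore(size) for name, size in limits.items()
        }

    @contextmanager
    def connection(self, usage_class):
        slot = self._slots[usage_class]
        # Fail fast instead of queueing, so a flood cannot pile up waiting threads.
        if not slot.acquire(blocking=False):
            raise RuntimeError("no free database connections for " + usage_class)
        try:
            yield
        finally:
            slot.release()
```

The application would wrap each database call in "with budget.connection('anonymous'):" (or 'paid', and so on) and return a friendly 'busy' page when the RuntimeError is raised.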

Security plan


I strongly advise you to write a Security Plan covering (among other things):
  • What is at risk, and the cost to the business
  • How an attack is detected
  • Measures taken to protect the assets
  • PR response
  • The escalation procedure and response to attack
  • Processes to keep the system and this document up-to-date
  • Tests to be executed to test the protective capability and IT processes. 
Get this internally reviewed by all relevant parties:
  • Business, to evaluate the cost-effectiveness and understand the business risk of a DDoS
  • Management, to ensure the plans and processes are implementable and conform to HR and other company policies
  • SW dev team as the developers of the application and system
  • QA team who should plan and run the DDoS tests
  • IT team who will own and operate the system, and be the team detecting and managing a DDoS
  • A security expert.

The process of writing the document will cause you and your team to think through the issues more thoroughly. Having an approved DDoS response plan will help you to be prepared if the worst should happen at 3am on your day off. You won't have to think so much on your feet; you can focus on the actual attack; you will communicate with the rest of the organisation at the right levels to get resources and access to key personnel; management will not be blindsided and can follow the scripted plan.

What's the fastest and most common way to stop an attack?


Almost certainly a DDoS is trying to get something from the organisation - money, a change in the company's stance to some political issue or business plan. The fastest way to stop an attack is probably to give in to the blackmail - this might not be desirable or indeed possible.

Otherwise, the first thing to do is contact your hosting and/or CDN provider and work with them (if they haven't contacted you already asking what the hell is going on...).

Start data-mining your logs. Remember to correlate firewall, web server and application level logs. Try and work out the pattern of the attack - it might be a single URI or a group of them. If there is a pattern you can regex, you can build a firewall rule to block it (preferably on the hosting center's firewall). You can also block it in your web server configuration.
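
Before handing a regex to the firewall, it is worth checking it against your logs so you know what it will block. A minimal Python sketch, assuming you have already split the logged URIs into suspected-attack and known-good samples (the function name and the returned keys are my own):

```python
import re


def evaluate_block_rule(pattern, attack_uris, legitimate_uris):
    """Count how many attack and legitimate URIs a candidate regex would block."""
    rule = re.compile(pattern)
    return {
        "attack_blocked": sum(1 for uri in attack_uris if rule.search(uri)),
        "attack_total": len(attack_uris),
        "legitimate_blocked": sum(1 for uri in legitimate_uris if rule.search(uri)),
        "legitimate_total": len(legitimate_uris),
    }
```

A good rule blocks nearly all of the attack sample and next to none of the legitimate one; if it catches real customers, refine the pattern before deploying it.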

Be prepared to share your logs (any and all information) with the provider; these logs, combined with their network monitors, may together provide enough information to block/mitigate the attack.

You should consider switching off anonymous access and throttling the services under attack (i.e. decrease the application's rate limit for the service).

If you are lucky and have a small, fixed customer-base, you might be able to determine your valid customers' IP addresses. If this is the case, you might switch the firewall rules to white-list those IPs for a short while. Make sure all your customers know this is going on so they can call if they need to access from a new IP :)
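
The white-list check itself is simple. Here is a sketch using Python's standard ipaddress module; the function names are mine and the addresses in any example would be placeholders. In practice the list would be loaded into the firewall rather than checked in application code.

```python
import ipaddress


def build_allowlist(entries):
    """Turn strings such as '203.0.113.0/24' or '198.51.100.7' into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(client_ip, allowlist):
    """Return True if the client address falls inside any white-listed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowlist)
```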

Concluding thoughts

This is an essay with a list of ideas - I am sure that it's not comprehensive, and it's certainly not prescriptive. Nevertheless I hope the ideas will help you as you plan and build the right level of security and keep it cost-effective.

If you need to protect a web site or service, you should definitely read other articles and consult with security professionals.

Please comment if you have additional ideas or want to highlight deficiencies in mine.
