9.5 Protecting Data on the Web

only for RuBoard - do not distribute or recompile

The Web isn't a secure environment. The open nature of the networking and web protocols TCP, IP, and HTTP has allowed the development of many tools that can listen in on data transmitted between browsers and web servers. It is easy to snoop on passing traffic and read the contents of HTTP requests and responses. With a little extra effort, a hacker can manipulate traffic and even masquerade as another user.

If an application transmits sensitive information over the Web, an encrypted connection should be provided between the browser and the web server. The information that would warrant an encrypted connection includes:

Sensitive information held on the server; e.g., commercial-in-confidence documents and bank account balances
User credentials: usernames and passwords used to gain access to sensitive services such as online banking or the administration of the winestore
Personal details collected from the user, such as credit card numbers
Session IDs used by the server to link HTTP requests to session variables

In this section we focus on the common method of encrypting data sent over the Web using the Secure Sockets Layer. We discuss the basic mechanics of SSL in this section, and provide an installation and configuration guide for SSL and Apache as part of Appendix A.

This section isn't designed to cover the enormous topic of encryption. We limit our brief discussion to the features of SSL, and how SSL can protect web traffic. More details about cryptographic systems can be found in the references listed in Appendix E.

9.5.1 The Secure Sockets Layer Protocol

The data that is sent between web servers and browsers can be protected using the encryption services of the Secure Sockets Layer protocol, SSL. The SSL protocol addresses three goals:

Privacy: The content of a message transmitted over the Internet can't be understood by a casual (or determined) observer.
Integrity: The contents of a message received are correct and have not been tampered with.
Authentication: Both the sender and receiver of a message can be sure of each other's identity.

SSL was originally developed by Netscape, and there are two versions: SSL v2.0 and SSL v3.0. We don't detail the differences here, but Version 3.0 supports more security features than 2.0. The SSL protocol isn't a standard as such, and the Transport Layer Security 1.0 (TLS) protocol has been proposed by the Internet Engineering Task Force (IETF) as an SSL v3.0 replacement.

9.5.1.1 SSL architecture

To understand how SSL works, you need to consider how browsers and web servers actually send and receive HTTP messages. Browsers send HTTP requests by calling on the host system's TCP/IP networking software, the software that does the work of sending and receiving data over the Internet. When a request is to be sent (for example, when a user clicks on a hypertext link), the browser formulates the HTTP request in memory and uses the host's TCP/IP network service to send the request to the server. TCP/IP doesn't care that the message is HTTP; it is only responsible for getting the complete message to the destination. When a web server receives a message, data is read from its host's TCP/IP service and then interpreted as HTTP. We discuss the relationship between HTTP and TCP/IP in more detail in Appendix B.

As shown in Figure 9-4, the SSL protocol operates as a layer between the browser and the TCP/IP services provided by the host. A browser passes the HTTP message to the SSL layer to be encrypted before the message is passed to the host's TCP/IP service. On the server side, the SSL layer, configured into the web server, decrypts the message received from the TCP/IP service and then passes it to the web server. Once SSL is installed and the web server is configured correctly, the HTTP requests and responses are automatically encrypted. There is no scripting required to use the SSL services.

Figure 9-4. HTTP clients and servers, SSL, and the network layer that implements TCP/IP
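The layering can be sketched from the client side in a few lines of Python. This is only an illustration, not how any particular browser is built: the function names build_request and fetch are our own, and the example relies on Python's standard socket and ssl modules. The point to notice is that the HTTP request bytes are identical in both cases; the only difference is whether they are handed to the plain TCP connection or to an SSL layer wrapped around it.

```python
import socket
import ssl


def build_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, secure=True, path="/"):
    """Send the same HTTP request over plain TCP or over an SSL-wrapped socket.

    Hypothetical helper for illustration; it performs real network I/O.
    """
    port = 443 if secure else 80
    with socket.create_connection((host, port), timeout=10) as tcp:
        if secure:
            # The SSL layer sits between the HTTP message and TCP/IP
            context = ssl.create_default_context()
            channel = context.wrap_socket(tcp, server_hostname=host)
        else:
            channel = tcp
        with channel:
            channel.sendall(build_request(host, path))
            chunks = []
            while True:
                data = channel.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("secure.example.com") would attempt an encrypted request, while fetch("secure.example.com", secure=False) would send the same bytes unprotected; the hostname here is the placeholder used elsewhere in this section.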

Because SSL sits between HTTP and TCP/IP, secure web sites technically don't serve HTTP, at least not directly over TCP. URLs that locate resources on a secure server begin with https://, which means HTTP over SSL. The default port for an SSL service is 443, not port 80 as with HTTP; for example, when a browser connects to https://secure.example.com, it makes a TCP/IP connection to port 443 on secure.example.com. Most browsers and web servers can support SSL, but keys and certificates need to be included in the configuration of the server (and possibly the browser, if client certification is required).
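As a small illustration of the port rule, the following Python sketch works out which host and port a client would connect to for a given URL. The function name connection_target and the DEFAULT_PORTS table are our own; only the standard urllib.parse module is assumed.

```python
from urllib.parse import urlsplit

# Default ports described in the text: 80 for HTTP, 443 for HTTP over SSL
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url):
    """Return the (host, port) pair a client would open a TCP connection to."""
    parts = urlsplit(url)
    # An explicit port in the URL overrides the default for the scheme
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(connection_target("https://secure.example.com"))
print(connection_target("http://secure.example.com/"))
```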

9.5.1.2 Cipher suites

To provide a service that addresses the goals of privacy, integrity, and authentication, SSL uses a combination of cryptographic techniques and functions, such as message digests, digital certificates, and, of course, encryption. There are many different standard algorithms that implement these functions, and SSL can use different combinations to meet particular requirements, such as being legal to use in a particular country! When an SSL connection is established, clients and servers negotiate the best combination of techniques based on common capabilities to ensure the highest level of protection. The combinations of techniques that can be negotiated are known as cipher suites.
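To get a feel for what a cipher suite looks like, the sketch below lists the suites that a default client configuration would offer to a server. It assumes Python 3.6 or later and its standard ssl module; the function name offered_cipher_suites is our own, and the exact list depends on the SSL library installed on the host, so no particular output should be expected.

```python
import ssl


def offered_cipher_suites():
    """Return (name, protocol) pairs for the suites a default client context offers."""
    context = ssl.create_default_context()
    return [(suite["name"], suite["protocol"]) for suite in context.get_ciphers()]


for name, protocol in offered_cipher_suites():
    print(protocol, name)
```

Each name encodes a combination of key exchange, encryption, and message digest algorithms; the client and server settle on one suite that both support during the handshake.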

9.5.1.3 SSL sessions

When a browser connects to a secure site, the SSL protocol performs the following four steps:

A cipher suite is negotiated. The browser and the server identify the major SSL version supported, and then the configured capabilities. The strongest cipher suite that can be supported by both systems is chosen.
A secret key is shared between the server and the browser. Normally the browser generates a secret key that is asymmetrically encrypted using the server's public key. Only the server can learn the secret by decrypting it with the corresponding private key. The shared secret is used as the key to encrypt and decrypt the HTTP messages that are transmitted. This phase is called the key exchange.
The server is authenticated to the browser by examining the server's X.509 digital certificate. Often browsers are preloaded with a list of certificates from Certification Authorities, and authentication of the server is transparent to a user. If the browser doesn't know about the certificate, the user is warned.
The server examines the browser's X.509 certificate to authenticate the client. This step is optional and requires that each client be set up with a signed digital certificate. Apache can be configured to use fields from the browser's X.509 certificate as if they were the username and password encoded into an HTTP Authorization header field. Client certificates aren't commonly used on the Web.

These four steps briefly summarize the network handshaking between the browser and server when SSL is used. Once the browser and server have completed these steps, the HTTP request can be encrypted by SSL and sent to the web server.
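The outcome of the handshake can be inspected from a client program. The sketch below is one way to do so with Python's standard ssl module; the function names make_client_context and describe_handshake are our own, and the hostname in the usage note is only a placeholder. Note that a current Python installation negotiates a version of TLS, the successor to SSL v3.0, rather than SSL v2.0 or v3.0 itself.

```python
import socket
import ssl


def make_client_context():
    """Build a client configuration that insists on authenticating the server."""
    # create_default_context() loads the host's trusted CA certificates,
    # requires a valid server certificate, and checks the hostname
    return ssl.create_default_context()


def describe_handshake(host, port=443):
    """Connect, let the library perform the handshake, and report the result.

    Hypothetical helper for illustration; it performs real network I/O.
    """
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket() runs the handshake: cipher suite negotiation,
        # key exchange, and verification of the server's certificate
        with context.wrap_socket(tcp, server_hostname=host) as secure:
            certificate = secure.getpeercert()
            return {
                "protocol": secure.version(),
                "cipher_suite": secure.cipher()[0],
                "subject": certificate.get("subject"),
                "issuer": certificate.get("issuer"),
            }
```

Calling describe_handshake("secure.example.com") would return the negotiated protocol version and cipher suite along with the subject and issuer fields of the server's certificate. If the certificate can't be verified, the library raises an error instead of warning the user as a browser would.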

The SSL handshaking is slow, and if it were to occur with every HTTP request, the performance of a secure web site would be poor. To improve performance, SSL uses the concept of sessions to allow multiple requests to share the negotiated cipher suite, the shared secret key, and the certificates. An SSL session is managed by the SSL software and isn't the same as a PHP session.

9.5.1.4 Certificates and Certification Authorities

A signed digital certificate encodes information so that the integrity of the information and the signature can be tested. The information contained in a certificate that is used by SSL includes details about the organization and the organization's public key. The public key that is contained in a certificate matches a private key held by the organization that is configured into the organization's web server. The browser uses the public key when an SSL session is established to encrypt a secret. The secret can only be decrypted using the private key configured into the organization's server. Encryption techniques that use a public and private key are known as asymmetric, and SSL uses asymmetric encryption to exchange a secret key. The secret key can then be used to encrypt the messages transmitted over the Internet.
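The idea of using a public key to protect a shared secret can be shown with a deliberately tiny example. The sketch below uses textbook RSA with two-digit primes, which is trivially breakable and is not what SSL software actually does; real implementations use far larger keys, padding, and vetted cryptographic libraries. All names and numbers are our own choices for illustration.

```python
import secrets

# Toy server key pair (assumption for illustration only, NOT secure)
P, Q = 61, 53
N = P * Q          # public modulus, 3233
E = 17             # public exponent, published in the certificate
D = 2753           # private exponent, kept on the server
assert (E * D) % ((P - 1) * (Q - 1)) == 1  # E and D are inverses

PUBLIC_KEY = (N, E)
PRIVATE_KEY = (N, D)


def browser_wrap_secret(secret, public_key):
    """Browser side: encrypt the secret with the server's public key."""
    n, e = public_key
    return pow(secret, e, n)


def server_unwrap_secret(wrapped, private_key):
    """Server side: recover the secret with the matching private key."""
    n, d = private_key
    return pow(wrapped, d, n)


def key_exchange():
    """Simulate the exchange and return (secret, wrapped, recovered)."""
    secret = secrets.randbelow(N - 2) + 2  # random value between 2 and N - 1
    wrapped = browser_wrap_secret(secret, PUBLIC_KEY)
    recovered = server_unwrap_secret(wrapped, PRIVATE_KEY)
    return secret, wrapped, recovered


secret, wrapped, recovered = key_exchange()
print("secret matches after exchange:", secret == recovered)
```

Only the wrapped value crosses the network. An eavesdropper who sees it, and who knows the public key, still needs the private exponent to recover the secret that both sides then use for symmetric encryption.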

A signed certificate also contains details about the Certification Authority (CA). The CA digitally signs a certificate by adding its own organization details, an encrypted digest of the certificate, and its own public key. With this information encoded, the complete signed certificate can be verified as being correct.
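Signing and verifying a certificate can be sketched in the same toy style. Here a hypothetical CA computes a digest of the certificate contents and encrypts it with its private key; anyone holding the CA's public key can decrypt the signature and compare it with a freshly computed digest. The key below is built from two small, publicly known primes and offers no security, SHA-256 is our choice of digest function, and the certificate text is invented; Python 3.8 or later is assumed for the modular inverse.

```python
import hashlib

# Toy CA key pair built from two well-known Mersenne primes (NOT secure)
P = 2**61 - 1
Q = 2**89 - 1
CA_N = P * Q
CA_E = 65537                                   # CA public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))        # CA private exponent


def digest_as_int(data):
    """Hash the certificate contents and reduce the digest to fit the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(certificate_body):
    """CA side: encrypt the digest with the CA's private key."""
    return pow(digest_as_int(certificate_body), CA_D, CA_N)


def verify(certificate_body, signature):
    """Client side: decrypt the signature with the CA's public key and compare."""
    return pow(signature, CA_E, CA_N) == digest_as_int(certificate_body)


# Invented certificate contents for illustration
body = b"organization=Example Winestore; public_key=(3233, 17)"
signature = ca_sign(body)

print(verify(body, signature))
print(verify(body.replace(b"3233", b"9999"), signature))
```

The second check fails because changing any part of the certificate changes its digest, so the signature no longer matches. This is how a signed certificate lets a browser test the integrity of the organization details and public key it contains.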

There are dozens, perhaps hundreds, of CAs. A browser or the user confronted by a browser warning can't be expected to recognize the digital signatures from all these authorities. The X.509 certificate standard solves this problem by allowing issuing CAs to have their signatures digitally signed by a more authoritative CA, who can in turn have its signature signed by yet another, more trusted CA. Eventually the chain of signatures ends with that of a root Certification Authority. It is the certificates from the root CAs that are often preinstalled in a browser. Some browsers allow users to add their own trusted certificates.
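The chain of signatures can be modeled as a series of issuer links that is followed until a trusted root is reached. The sketch below shows only that walk; the function name chain_to_root and all the certificate names are hypothetical, and a real client also verifies each signature, validity period, and permitted usage along the way.

```python
def chain_to_root(subject, issued_by, trusted_roots):
    """Follow issuer links from subject to a trusted root.

    Returns the list of names in the chain, or None if the chain
    never reaches a certificate in trusted_roots.
    """
    chain = [subject]
    current = subject
    while current not in trusted_roots:
        issuer = issued_by.get(current)
        # Stop on an unknown issuer or on a loop, such as a
        # self-signed certificate that isn't itself trusted
        if issuer is None or issuer in chain:
            return None
        chain.append(issuer)
        current = issuer
    return chain


# Hypothetical certificates: each name maps to the name of its signer
issued_by = {
    "secure.example.com": "Example Issuing CA",
    "Example Issuing CA": "Example Intermediate CA",
    "Example Intermediate CA": "Example Root CA",
    "Example Root CA": "Example Root CA",
    "selfsigned.example.com": "selfsigned.example.com",
}
trusted_roots = {"Example Root CA"}  # stands in for the browser's preinstalled list

print(chain_to_root("secure.example.com", issued_by, trusted_roots))
print(chain_to_root("selfsigned.example.com", issued_by, trusted_roots))
```

The first lookup ends at the preinstalled root and succeeds. The second fails because a self-signed certificate is its own issuer and appears in no trusted list, unless the user chooses to add it.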

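Self-signed certificates can be created and used to configure a web server with SSL. We show how to create self-signed certificates in Appendix A. But will they be trusted? For secure applications, the answer is probably not: no Certification Authority vouches for a self-signed certificate, so a browser has no chain of signatures to follow and warns the user.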
