Clients, Servers, and Protocols


 
Network Programming with Perl
By Lincoln D. Stein
Chapter 3. Introduction to Berkeley Sockets


Network communication occurs when two programs exchange data across the net. With rare exceptions, the two programs are not equal. One, the client, initiates the connection and is usually, but not always, connected to a single server at a time. The other partner in the connection, the server, is passive, waiting quietly until it is contacted by a client seeking a connection. In contrast to clients, it is common for a server to service incoming connections from multiple clients at the same time.

Although it is often true that the computer ("host") that runs the server is larger and more powerful than the client machine, this is not a rule by any means. In fact, in some popular applications, such as the X Window System, the situation is reversed. The server is usually a personal computer or other desktop machine, while the client is run on a more powerful "server class" machine.

Most of the network protocols that we are familiar with are client-server applications. This includes the HTTP used on the Web, the SMTP used for Internet e-mail, and all the database access protocols. However, a small but growing class of network applications is peer-to-peer. In peer-to-peer scenarios, the two ends of the connection are equivalent, each making multiple connections to other copies of the same program. The controversial Napster file-sharing protocol is peer-to-peer, as are its spiritual heirs Gnutella and Freenet.

Protocols

We've thrown around the word protocol, but what is it, exactly? A protocol is simply an agreed-upon set of standards whereby two software components interoperate. There are protocols at every level of the networking stack (Figure 3.1).

Figure 3.1. The layers of the TCP/IP stack


At the lowest level is the hardware or datalink layer, where, for example, the drivers built into Ethernet network interface cards have a common understanding of how to interpret the pulses of electric potential on the network wire in terms of Ethernet frames, how to detect when the wire is in use by another card, and how to detect and resolve collisions between two cards trying to transmit at the same time.

One level up is the network layer. At this layer, information is grouped into packets that consist of a header containing the sender's and recipient's addresses, and a payload that consists of the actual data to send. Payloads are typically in the range of 500 bytes to 1500 bytes. The main protocol at this layer is the Internet Protocol, or IP. Internet routers act at this layer by reading packet headers and figuring out how to route the packets to their destinations.

The transport layer is concerned with creating data packets and ensuring the integrity of their contents. The two important protocols at this layer are the Transmission Control Protocol (TCP), which provides reliable connection-oriented communications, and the User Datagram Protocol (UDP), which provides an unreliable message-oriented service. These protocols are responsible for getting the data to its destination. They don't care what is actually inside the data stream.

At the top of the stack is the application layer, where the content of the data stream does matter. There is an abundance of protocols at this level, including such familiar and unfamiliar names as HTTP, FTP, SMTP, POP3, IMAP, SNMP, XDMCP, and NNTP. These protocols specify, sometimes in excruciating detail, how a client should contact a server, what messages are allowed, and what information to exchange with each message.

The combination of the network layer and the transport layer is known as TCP/IP, named after the two major protocols that operate at those layers.

Binary versus Text-Oriented Protocols

Before they can exchange information across the network, hosts have a fundamental choice to make. They can exchange data either in binary form or as human-readable text. The choice has far-reaching ramifications.

To understand this, consider exchanging the number 1984. To exchange it as text, one host sends the other the string 1984, which, in the common ASCII character set, corresponds to the four hexadecimal bytes 0x31 0x39 0x38 0x34. These four bytes will be transferred in order across the network, and (provided the other host also speaks ASCII) will appear at the other end as "1984".
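
The text encoding just described can be sketched in a few lines. This is a minimal illustration written in Python rather than Perl, and the variable names are hypothetical:

```python
# Encode the number 1984 as ASCII text, one byte per digit.
text_bytes = "1984".encode("ascii")

# Render each byte in hexadecimal, as in the paragraph above.
hex_view = " ".join(f"0x{b:02X}" for b in text_bytes)
print(hex_view)

# The receiving side decodes the bytes to a string, then parses the number.
received = int(text_bytes.decode("ascii"))
print(received)
```

The four bytes arrive in the order in which they were sent, so the receiver needs to agree only on the character set.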

However, 1984 can also be treated as a number, in which case it can fit into the two-byte integer represented in hexadecimal as 0x07C0. If this number is already stored in the local host as a number, it seems sensible to transfer it across the network in its native two-byte form rather than convert it into its four-byte text representation, transfer it, and convert it back into a two-byte number at the other end. Not only does this save some computation, but it uses only half as much network capacity.

Unfortunately, there's a hitch. Different computer architectures have different ways of storing integers and floating point numbers. Some machines use two-byte integers, others four-byte integers, and still others use eight-byte integers. This is called word size. Furthermore, computer architectures have two different conventions for storing integers in memory. In some systems, called big-endian architectures, the most significant part of the integer is stored in the first byte of a two-byte integer. On such systems, reading from low to high, 1984 is represented in memory as the two bytes:

 0x07    0xC0
 low  -> high

On little-endian architectures, this convention is reversed, and 1984 is stored in the opposite orientation:

 0xC0    0x07
 low  -> high

These byte orders are a matter of convention, and neither has a significant advantage over the other. The problem comes when transferring such data across the network, because this byte pair has to be transferred serially as two bytes. Data in memory is sent across the network from low to high, so for big-endian machines the number 1984 will be transferred as 0x07 0xC0, while for little-endian machines the bytes will be sent in the reverse order. As long as the machine at the other end has the same native word size and byte order, these bytes will be correctly interpreted as 1984 when they arrive. However, if the recipient uses a different byte order, then the two bytes will be interpreted in the wrong order, yielding hexadecimal 0xC007, or decimal 49,159. Even worse, if the recipient interprets these bytes as the top half of a four-byte integer, it will end up as 0xC0070000, or 3,221,684,224. Someone's anniversary party is going to be very late.
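
The arithmetic in this example can be checked with a short sketch. It is written in Python for illustration only, and it simulates both byte orders explicitly instead of relying on the byte order of the machine it runs on. The variable names are hypothetical:

```python
value = 1984

# The two-byte layouts described above, lowest address first.
big = value.to_bytes(2, byteorder="big")        # 0x07 0xC0
little = value.to_bytes(2, byteorder="little")  # 0xC0 0x07

# Bytes sent by a little-endian host and read by a big-endian host.
swapped = int.from_bytes(little, byteorder="big")

# The same two bytes mistaken for the top half of a four-byte integer.
widened = int.from_bytes(little + b"\x00\x00", byteorder="big")

print(hex(swapped), swapped)
print(hex(widened), widened)
```

Reading the bytes with the byte order they were written in recovers 1984. Any mismatch silently produces a different, perfectly valid-looking number.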

Because of the potential for such binary chaos, text-based protocols are the norm on the Internet. All the common protocols convert numeric information into text prior to transferring it, even though this can result in more data being transferred across the net. Some protocols even convert data that doesn't have a sensible text representation, such as audio files, into a form that uses the ASCII character set, because this is generally easier to work with. By the same token, a great many protocols are line-oriented, meaning that they accept commands and transmit data in the form of discrete lines, each terminated by a commonly agreed-upon newline sequence.
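
Both ideas can be sketched briefly in Python. Several things here are assumptions made for illustration: the CRLF pair as the agreed-upon newline sequence, the command names, and the helper split_lines(). Base64 is used as one example of an encoding that turns arbitrary bytes into ASCII text; the paragraph above does not name a particular scheme.

```python
import base64

CRLF = b"\r\n"  # assumed line terminator for this sketch


def split_lines(buffer):
    """Split a receive buffer into complete lines plus any unfinished tail."""
    *lines, leftover = buffer.split(CRLF)
    return [line.decode("ascii") for line in lines], leftover


# Hypothetical commands; the last one has not fully arrived yet.
lines, leftover = split_lines(b"HELLO alice\r\nSEND 1984\r\nQU")
print(lines)
print(leftover)

# Arbitrary binary data converted to ASCII-only text and back again.
raw = bytes([0x00, 0xFF, 0x07, 0xC0])
encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)
print(encoded)
```

Keeping the unfinished tail matters because data from the network can arrive in arbitrary chunks, so a line may be split across two reads.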

A few protocols, however, are binary. Examples include Sun's Remote Procedure Call (RPC) system, and the Napster peer-to-peer file exchange protocol. Such protocols have to be exceptionally careful to represent binary data in a common format. For integer numbers, there is a commonly recognized network format. In network format, a "short" integer is represented in two big-endian bytes, while a "long" integer is represented with four big-endian bytes. As we will see in Chapter 19, Perl's pack() and unpack() functions provide the ability to convert numbers into network format and back again.
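
As a rough preview of the idea, here is a sketch using Python's struct module, which plays a role similar to Perl's pack() and unpack(). It is an illustration only, not the code from Chapter 19. In the format strings, "!" selects network (big-endian) byte order, "H" is an unsigned two-byte integer, and "L" is an unsigned four-byte integer. The comparable Perl pack() templates are "n" and "N".

```python
import struct

value = 1984

# Convert to network format: a two-byte "short" and a four-byte "long".
short_bytes = struct.pack("!H", value)
long_bytes = struct.pack("!L", value)

print(short_bytes.hex())
print(long_bytes.hex())

# Convert back again; unpack() always returns a tuple.
(decoded_short,) = struct.unpack("!H", short_bytes)
(decoded_long,) = struct.unpack("!L", long_bytes)
print(decoded_short, decoded_long)
```

Because the byte order is stated explicitly in the format string, the result is the same on big-endian and little-endian hosts.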

Floating point numbers and more complicated things like data structures have no commonly accepted network representation. When exchanging binary data, each protocol has to work out its own way of representing such data in a platform-neutral fashion.
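
To make the choice concrete, the following Python sketch shows two conventions a protocol designer might settle on for a floating point value. Both are assumptions chosen for illustration and neither is prescribed by the text: a fixed eight-byte IEEE 754 double in big-endian order, and a plain decimal string. The value and the variable names are hypothetical.

```python
import struct

reading = 98.6  # hypothetical value to transmit

# Convention 1: eight-byte IEEE 754 double, big-endian byte order.
wire_binary = struct.pack("!d", reading)
(back_binary,) = struct.unpack("!d", wire_binary)

# Convention 2: decimal text, parsed again by the receiver.
wire_text = repr(reading).encode("ascii")
back_text = float(wire_text.decode("ascii"))

print(len(wire_binary), wire_text)
```

Either convention works only if both ends agree on it in advance, which is exactly what the protocol specification has to spell out.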

We will stick to text-based protocols for much of this book. However, to give you a taste for what it's like to use a binary protocol, the UDP-based real-time chat system in Chapters 19 and 20 exchanges platform-neutral binary messages.


   


Network Programming with Perl
ISBN: 0201615711
Year: 2000
Pages: 173
