Sockets API Discussion | Network Programming for Microsoft Windows , Second Edition (Microsoft Programming Series)

Let's now walk through the Python socket API and illustrate how the class and instance methods are used.

Creating and Destroying Sockets

We'll begin by investigating how Python creates and destroys local sockets. Python provides a very similar API to the standard BSD4.4 API, as is illustrated in Figures 11.1 and 11.2.

All of the methods discussed in this section require that the application developer make the socket module visible. At the beginning of the Python networking application, a line specifying 'import socket' must be present. The import statement dynamically loads the module so that the contents of the module are visible. The developer must still qualify each name with the module name, such as socket.socket.

The socket class provides the primitive standard socket method that is most familiar to C language programmers. This class method can be used to create stream (TCP), datagram (UDP), and even raw sockets.

sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )

Note the similarities of this function to the C language primitive. The primary difference is the Python naming convention (the 'socket.' prefix that brings the symbol into scope). In this example, we've created a stream socket. To create a datagram socket, we could do the following:

sock = socket.socket( socket.AF_INET, socket.SOCK_DGRAM, 0 )

Finally, a raw socket could be created with:

sock = socket.socket( socket.AF_INET, socket.SOCK_RAW, 0 )

Recall that the socket.AF_INET symbol defines that we're creating an Internet protocol socket. The second parameter (type) defines the semantics of the communication (socket.SOCK_STREAM for TCP, socket.SOCK_DGRAM for UDP, and socket.SOCK_RAW for raw IP). The third parameter defines the particular protocol to use, which is useful here only in the socket.SOCK_RAW case. This can be of type IPPROTO_ICMP, IPPROTO_IP, IPPROTO_RAW, IPPROTO_TCP, or IPPROTO_UDP.
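As a quick check of the three parameters, the following sketch creates a stream socket and a datagram socket and reads back what was requested. It assumes Python 3 (the listings in this section use the older Python 2 print statement), and the helper name describe is ours, not part of the socket module. A raw socket is left out because creating one normally requires administrator privileges.

```python
import socket

def describe(sock):
    """Return the (family, type, protocol) a socket was created with."""
    return (sock.family, sock.type, sock.proto)

# Stream (TCP) socket; the protocol argument is omitted and defaults to 0.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
    tcp_info = describe(tcp)

# Datagram (UDP) socket with the protocol given explicitly as 0.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0) as udp:
    udp_info = describe(udp)

print(tcp_info)
print(udp_info)
```

Using the socket in a with statement closes it automatically at the end of the block, which has the same effect as calling close by hand.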

When we're finished with a socket, we must close it. To close our previously created socket, sock, we use the close method.

sock.close()

Once the close method is called, no further communication is possible with this socket.

Any data queued for transmission would be given some amount of time to be sent before the connection physically closes.

It appears that in version 2.2.2 of Python, the close method aborts the socket connection and, with it, any pending data that has yet to be transmitted to the peer. In cases in which larger amounts of data are being transferred, the shutdown method is a better option. The shutdown method allows the caller to shut down the receive path, the transmit path, or both. For example:

    sock.shutdown( 0 )      # Shut down reads
    sock.shutdown( 1 )      # Shut down writes
    sock.shutdown( 2 )      # Shut down both reads and writes
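One way to use shutdown for an orderly close is sketched below: the sender writes everything, shuts down its transmit path, and the receiver reads until it sees end-of-file. This is a sketch under assumptions, not the book's code: it targets Python 3, it uses socket.socketpair() as a stand-in for an already connected pair of sockets so that it runs without a server, and the function names are hypothetical. socket.SHUT_WR is the named constant for shutting down writes.

```python
import socket

def send_then_half_close(sock, payload):
    """Send all of payload, then shut down the transmit path only."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)   # writes are finished; reads still work

def read_until_eof(sock):
    """Collect data until recv returns an empty result (peer finished writing)."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

# socketpair() gives two connected sockets in one process (illustration only).
sender, receiver = socket.socketpair()
payload = b"x" * 1000
send_then_half_close(sender, payload)
received = read_until_eof(receiver)
sender.close()
receiver.close()
print(len(received))
```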

Socket Addresses

Python has no sockaddr_in structure; instead, an address is a tuple of the form '(hostname, port)'. Python simplifies the construction of addresses (as compared to other languages, such as Ruby). For example, if we were creating an address representing a host to which we were going to connect, we could do this as:

( 'www.mtjones.com', 5930 )

if our plan was to connect our TCP client socket to the host identified as www.mtjones.com and the port 5930. Consider the example of binding our server to the loopback address, at the same port number. We'd use an address constructed as:

( 'localhost', 5930 )

Most often, a server binds to what C programmers know as the INADDR_ANY wildcard address. This permits a server to accept connections from any interface on the host (assuming a multi-homed host). In this case, we'd provide an empty string for host, as:

( '', 5930 )

Another example represents the INADDR_BROADCAST host name. In this case, our address is represented as:

( '<broadcast>', 5930 )

In the examples shown, we've used host names to represent the hosts, but IP addresses can also be used (shown in the following example for localhost):

( '127.0.0.1', 5930 )

Therefore, as we see from the prior address examples, dealing with addresses in Python is simple and flexible.
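To see how these tuples are interpreted, the sketch below binds a datagram socket to two of the address forms shown above and asks the socket which address it ended up with. It assumes Python 3, and it uses port 0 (not one of the book's examples) so that the operating system picks a free port instead of colliding with a port already in use. The helper name bound_address is hypothetical.

```python
import socket

def bound_address(host):
    """Bind a UDP socket to (host, 0) and return the address actually used.

    Port 0 asks the operating system to choose any free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()

loop_host, loop_port = bound_address("127.0.0.1")   # loopback by IP address
any_host, any_port = bound_address("")              # empty string: INADDR_ANY

print(loop_host, loop_port)
print(any_host, any_port)
```

The wildcard binding reports its host as 0.0.0.0, which is how INADDR_ANY appears once it is converted back to a string.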

Socket Primitives

In this section, we look at a number of other important socket control methods.

Bind

The bind method provides a local naming capability to a socket. This can be used to name either client or server sockets, but is used most often in the server case. The bind method for the socket class is an instance method and is provided by the following prototype:

sock.bind( addr )

where addr is a socket address tuple. Sample usage of this form of bind is:

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    sock.bind( ('192.168.1.2', 13) )

In this example, we're creating a stream server on port 13 that will accept client connections only on the interface defined as 192.168.1.2.

Finally, let's look at a UDP example. A UDP socket uses the same bind method of the socket class as a TCP socket does. The following example creates a UDP socket and binds it to host INADDR_ANY (accepting datagrams from any interface) and port 13:

    sock = socket.socket( socket.AF_INET, socket.SOCK_DGRAM )
    sock.bind( ('', 13) )

Note again the simplicity of this pattern, as provided by Python's address tuples.

Listen

Before a server socket can accept incoming client connections, it must call the listen method to declare this willingness. The listen method is provided by the following prototype:

sock.listen( backlog )

The sock argument represents an instance of the server socket and the backlog argument represents the number of outstanding client connections that may be queued. Here's a complete example from socket creation to the listen method.

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    sock.bind( ('192.168.1.2', 13) )
    sock.listen( 5 )

The listen method is always paired ultimately with an accept method, because accept is the method used to accept new client connections enabled by the listen method.

Accept

The accept method is the final call made by servers to accept incoming client connections. Before accept can be invoked, the server socket must be created, a name must be bound to it, and the listen method must be invoked. The accept method accepts a new client connection and returns a tuple in the format (conn, address). The conn element is a new socket object that may be used for client communication and the address is the standard address tuple defined previously. Let's look at a complete simple server example:

    serv = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    serv.bind( ('192.168.1.2', 13) )
    serv.listen( 1 )
    cliconn, (remotehost, remoteport) = serv.accept()
    cliconn.send("Hello!")
    cliconn.close()
    serv.close()

As illustrated in this example, the accept method returns two values. The first, cliconn, is the client socket that can now be used to communicate with the peer (the client that initiated the connection). The second value is the address tuple that identifies the host and port number for the peer socket of the connection. In this example, we accept an incoming client connection, send the string 'Hello!' to the peer, and then close both the client and server sockets.
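The bind, listen, and accept sequence can be exercised in a single process by running the server side in a thread and connecting to it over the loopback interface. The sketch below assumes Python 3 (so the greeting is a bytes object) and binds to port 0 on 127.0.0.1 rather than to 192.168.1.2 port 13, which would need that interface and usually elevated privileges. The function names serve_one and run_demo are hypothetical.

```python
import socket
import threading

def serve_one(serv, greeting):
    """Accept a single client, send it a greeting, and close the connection."""
    conn, (remote_host, remote_port) = serv.accept()
    with conn:
        conn.sendall(greeting)

def run_demo():
    """Start a one-shot server, connect to it, and return what it sent."""
    serv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serv.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
    serv.listen(1)
    port = serv.getsockname()[1]

    worker = threading.Thread(target=serve_one, args=(serv, b"Hello!"))
    worker.start()

    # Client side: connect, then read until the server closes the connection.
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.settimeout(5)
    cli.connect(("127.0.0.1", port))
    chunks = []
    while True:
        chunk = cli.recv(100)
        if not chunk:
            break
        chunks.append(chunk)
    cli.close()

    worker.join()
    serv.close()
    return b"".join(chunks)
```

Calling run_demo() returns the greeting that the accepted connection sent before it was closed.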

Connect

The connect method is used by client Sockets applications to connect to a server. Clients must have created a socket and then defined an address structure containing the host and port number to which they want to connect. This is illustrated by the following code segment (which can be used to connect to the previously illustrated accept method example):

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    sa = ( '192.168.1.2', 13 )
    sock.connect( sa )
    print sock.recv( 100 )
    sock.close()

We create our address structure (as with previous examples) and then pass this to our connect method using our client socket instance. This short script can be further simplified as follows, by passing the anonymous address tuple to the connect method:

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    sock.connect( ('192.168.1.2', 13) )
    print sock.recv( 100 )
    sock.close()

The connect method (for TCP) blocks until either an error occurs, or the three-way handshake with the server completes. For datagram sockets (UDP), the method binds the peer locally so that the peer address isn't required to be specified for every send method. In the following example, we use the connect method to logically connect our client to an external socket:

    sock = socket.socket( socket.AF_INET, socket.SOCK_DGRAM )
    sock.connect( ('192.168.1.2', 13) )
    sock.send( "Hi!" )
    sock.close()

In this example, the sendto method is not required because we predefined our destination using the connect method. We investigate this concept more in the Sockets I/O section.
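The effect of connect on a datagram socket can be observed locally. In this sketch (Python 3 assumed, loopback address and port 0 chosen for illustration, function name hypothetical), a receiver is bound first, a sender connects to it, and send is then used without any address. getpeername reports the destination that connect recorded.

```python
import socket

def connected_udp_roundtrip(message):
    """Send one datagram through a connected UDP socket and receive it."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # OS-chosen port on loopback
    receiver.settimeout(5)
    receiver_addr = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.connect(receiver_addr)        # no handshake; only records the peer
    sender.send(message)                 # destination comes from connect
    peer = sender.getpeername()

    data, source = receiver.recvfrom(100)
    sender.close()
    receiver.close()
    return data, peer == receiver_addr

data, peer_matches = connected_udp_roundtrip(b"Hi!")
print(data, peer_matches)
```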

Sockets I/O

A variety of API methods exists to read data from a socket or write data to a socket. These methods are socket class specific. First, we look at reading and writing from connected sockets and then investigate unconnected (datagram-specific) socket communication.

A number of methods are provided to communicate through a socket. Some of the functions include send, recv, sendto, recvfrom, and sendall. We look at examples of each of these functions in the following sections.

Stream Sockets

Stream sockets utilize the send, recv, and sendall socket methods. Let's now look at some sample code that illustrates the stream-specific methods. The first example illustrates a simple echo server built using stream sockets (stream.py):

    import socket

    serv = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    serv.bind( ('', 45000) )
    serv.listen( 1 )

    while 1:
       cli, (remhost, remport) = serv.accept()
       cli.send( cli.recv( 100 ) )
       cli.close()

In this example, we open a server socket and then await a client connection, storing the newly created client socket in cli. The remote address tuple is ignored in this example. We then simply send what we receive through the client socket back to the source and close the socket. The process then repeats, awaiting a new client connection.

Now, let's look at the TCP client socket that will connect to the previously defined server (client.py):

    import socket

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM)
    sock.connect( ('localhost', 45000) )
    sock.send( "Hello\n", 0 )
    mystring = sock.recv( 100 )
    print mystring
    sock.close()

After we create our TCP client socket using the socket method, we connect to the server using the connect method. For connect, we specify the host name and port to which we want to connect. The echo portion of the client comes next, in which we send our message to the server using the send method and then await the echo response with the recv method. We emit this using the print statement and finally close the socket using the close method.
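The sendall method is listed among the stream methods, but the examples use only send. On a stream socket, send may transmit fewer bytes than requested and recv may return fewer bytes than asked for, so larger transfers usually pair sendall with a receive loop. The following is a minimal sketch of that pairing, assuming Python 3; recv_exact is a hypothetical helper, and socket.socketpair() stands in for a connected client and server.

```python
import socket

def recv_exact(sock, count):
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# Two connected sockets in one process (illustration only).
left, right = socket.socketpair()
message = b"Hello\n" * 100
left.sendall(message)                    # keeps sending until all bytes are queued
echoed = recv_exact(right, len(message))
left.close()
right.close()
print(len(echoed))
```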

Datagram Sockets

The sendto and recvfrom methods are used exclusively by datagram sockets. What differentiates these calls from our previously discussed stream calls (send/recv) is that these calls include addressing information. Because datagrams are not connected, we must define the destination address explicitly. Conversely, when receiving a datagram, the address information is also provided so that the source of the datagram can be identified.

Let's look at a datagram server and client that provide the echo functionality illustrated previously by our stream socket examples. Our datagram server takes on a slightly different form (serverd.py):

    import socket

    serv = socket.socket( socket.AF_INET, socket.SOCK_DGRAM )
    serv.bind( ('', 45000) )

    while 1:
        reply, (remhost, remport) = serv.recvfrom( 100 )
        serv.sendto( reply, (remhost, remport) )

    serv.close()

After creating our datagram socket using the socket method with socket.SOCK_DGRAM, we bind the instance of this socket using the bind method. We provide the interface from which we want to accept client connections (all interfaces, or INADDR_ANY) and the port (45000) to bind. Note that we do not use the listen method, as this method is used only for stream sockets.

In our loop, we then receive datagrams using the recvfrom method. Note the semantics of this call: the method returns not one value but two, the received data (reply) and the source address tuple of the datagram ((remhost, remport)). We use the same source address when returning the datagram back to the source using the sendto method. Returning to the recvfrom call, we must also specify the maximum length of data that we want to receive. The script then continues, awaiting another datagram from a client.

The datagram client utilizes the datagram sendto method with the standard recv method. We can use the standard recv here because we're not interested in the source of the datagram. If we needed to know the source of the datagram, then the recvfrom method would need to be used (clientd.py).

    import socket

    cli = socket.socket( socket.AF_INET, socket.SOCK_DGRAM )
    cli.sendto( "Hello\n", ('localhost', 45000) )
    print cli.recv( 100 )
    cli.close()

In the datagram client, after we've created our datagram socket, we send our string to the echo server using the sendto method. The echo server is identified in the sendto method by the address tuple ('localhost', 45000). We then await the response echo datagram using the recv method, and emit it to the terminal using the print statement. Finally, the close method is used to destroy the client socket.
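When the client does care where a reply came from, it can use recvfrom instead of recv. The sketch below runs one echo exchange in a single process, assuming Python 3 and a loopback server on an OS-chosen port rather than port 45000; the function names are hypothetical. Timeouts are set so that a lost datagram raises an error instead of blocking forever.

```python
import socket

def echo_once(serv):
    """Receive one datagram and send it back to wherever it came from."""
    data, addr = serv.recvfrom(100)
    serv.sendto(data, addr)
    return addr

def run_udp_echo(message):
    """Send message to a local echo server and return the reply and a source check."""
    serv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    serv.bind(("127.0.0.1", 0))          # OS-chosen port on loopback
    serv.settimeout(5)
    server_addr = serv.getsockname()

    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.settimeout(5)
    cli.sendto(message, server_addr)     # destination given explicitly

    echo_once(serv)                      # the datagram is already queued
    reply, replier = cli.recvfrom(100)   # replier is the source address tuple

    cli.close()
    serv.close()
    return reply, replier == server_addr

reply, from_server = run_udp_echo(b"Hello\n")
print(reply, from_server)
```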

Socket Options

Socket options permit an application to change some of the modifiable behaviors of a socket and change the behavior of the methods that manipulate them. For example, an application can modify the sizes of the send or receive socket buffers or the size of the maximum segment used by the TCP layer for a given socket.

Socket options can be slightly more complicated than dealing with options in the C environment. This is because we're operating in a scripting environment that must interface with the host environment and its corresponding structures. When we're dealing with scalar types (such as the following example), manipulating socket options can actually be easier in Python.

Let's look at a simple scalar example first. Let's say that we want to identify the size of the receive buffer for a given socket. This can be done with the following code segment (opt.py):

    import socket

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    size = sock.getsockopt( socket.SOL_SOCKET, socket.SO_RCVBUF )
    print size
    sock.close()

First, we create a new stream socket using the socket method. To get the value of a given socket option, we use the getsockopt method. We specify the socket.SOL_SOCKET argument (called level) because we're interested in a Sockets layer option (as compared to a TCP or IP layer option). For the receiver buffer size, we specify the option name socket.SO_RCVBUF. In this simple case, a scalar value is returned from the getsockopt method. This value is stored into our local variable called size. We print this value using the print statement and then close the socket.
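Setting a scalar option follows the same pattern with setsockopt. The sketch below, which assumes Python 3 and uses a hypothetical function name, enables SO_REUSEADDR and then reads back three scalar options from Figure 11.3. The exact buffer size is platform dependent, so the example only returns it rather than expecting a particular value.

```python
import socket

def scalar_options():
    """Set one scalar option and read back three of them."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Scalar options take a plain integer value.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        reuse = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
        kind = sock.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
        rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return reuse, kind, rcvbuf

reuse, kind, rcvbuf = scalar_options()
print(reuse != 0, kind == socket.SOCK_STREAM, rcvbuf)
```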

Now let's look at how a more complex option is set for a given socket, specifically, the linger option (opt2.py). This option is more complex because it entails the creation of a structure that maps to an operating system type.

The linger option changes the behavior of a stream socket that is closed while data remains to be sent. After close is called, the stack keeps trying to send any remaining data for some amount of time. If the data still cannot be sent when that time expires, it is abandoned. The time between the close and the removal of the data from the send queue is the linger time. In Python, we must construct the structure that is expected by the host environment.

    import socket
    import struct

    sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    ling = struct.pack('ii', 1, 10)
    sock.setsockopt( socket.SOL_SOCKET, socket.SO_LINGER, ling )
    sock.close()

The linger structure (shown here as ling) contains first an enable (1 for enable, 0 for disable) and then the linger time in seconds. In our example, we're enabling linger and setting the linger value to 10 seconds (packing the structure into two 32-bit words, the structure that is expected by the host environment). We could read back what was configured by:

    newlinger = sock.getsockopt( socket.SOL_SOCKET,
                                 socket.SO_LINGER, 8 )
    (linger_onoff, linger_sec) = struct.unpack( 'ii', newlinger )
    print "linger on/off is ", linger_onoff
    print "linger sec is ", linger_sec

Upon reading the linger array using the getsockopt method, we unpack the array using the 'ii' template and the unpack function of the struct module. The template specifies that two 32-bit integers are packed into the array. Note that the return of unpack is a tuple that contains the enable flag and the linger time. We emit the tuple elements using the print statement.

Other socket options are shown in Figure 11.3 (for Sockets layer options) and Figure 11.4 (for IP layer options). Note that the option names are identical to their C language counterparts.

Option Name	Level	Value	get/set	Description
SO_KEEPALIVE	SOL_SOCKET	[0,1]	g/s	Tries to keep the socket alive using keepalives.
SO_RCVBUF / SO_SNDBUF	SOL_SOCKET	int	g/s	Size of the socket-layer receive or send buffers.
SO_RCVLOWAT / SO_SNDLOWAT	SOL_SOCKET	int	g/s	Number of bytes requested before select notification.
SO_RCVTIMEO / SO_SNDTIMEO	SOL_SOCKET	timeval	g/s	Timeout on recv/send in seconds.
SO_REUSEADDR	SOL_SOCKET	[0,1]	g/s	Permits the local address to be reused.
SO_LINGER	SOL_SOCKET	linger	g/s	Linger on close if data to be transmitted to peer.
SO_OOBINLINE	SOL_SOCKET	[0,1]	g/s	Inlines the out-of-band data.
SO_DONTROUTE	SOL_SOCKET	[0,1]	g/s	Bypass the routing tables.
SO_USELOOPBACK	SOL_SOCKET	[0,1]	g/s	Loop back data sent to the receive path for a routing socket.
SO_BROADCAST	SOL_SOCKET	[0,1]	g/s	Permits sending of broadcast datagrams.
SO_TYPE	SOL_SOCKET	int	g	Return the socket type.
SO_ERROR	SOL_SOCKET	int	g	Return error status.

Figure 11.3: SOL_SOCKET socket options provided in Python.

Option Name	Level	Value	get/set	Description
IP_ADD_MEMBERSHIP	IPPROTO_IP	mreq	s	Join a multicast group.
IP_DROP_MEMBERSHIP	IPPROTO_IP	mreq	s	Remove from a multicast group.
IP_MULTICAST_IF	IPPROTO_IP	inaddr	g/s	Specify the outgoing interface for this multicast socket.
IP_MULTICAST_LOOP	IPPROTO_IP	char	g/s	Specify loopback for this multicast socket.
IP_MULTICAST_TTL	IPPROTO_IP	[1..255]	g/s	Configure the IP TTL field for this multicast socket.
IP_HDRINCL	IPPROTO_IP	int	g/s	Notifies that an IP header is part of the outgoing data.
IP_OPTIONS	IPPROTO_IP	String	g/s	Configure the IP options for this socket (a String of up to 44 octets).
IP_TOS	IPPROTO_IP	[0..255]	g/s	Configure the IP TOS field for this socket.
IP_TTL	IPPROTO_IP	[1..255]	g/s	Configure the IP TTL field for this socket.

Figure 11.4: IPPROTO_IP socket options provided in Python.

Other Miscellaneous Functions

Let's now look at a few miscellaneous functions from the Sockets API and the capabilities they provide. The first function that we discuss provides information about the current host. The gethostname function (of the socket module) returns the string name of the host:

    str = socket.gethostname()
    print str

The DNS resolver permits us to resolve a host name to an IP address, or vice versa. Method gethostbyname provides IP address resolution given an FQDN, for example:

    str = socket.gethostbyname("www.microsoft.com")
    print str

where the return string represents the IP address of the FQDN. What if we wanted to go in the opposite direction, providing an IP address and receiving an FQDN? To achieve this, we use the gethostbyaddr method:

    str = socket.gethostbyaddr( "207.46.134.155" )
    print str[0]

The gethostbyaddr method returns a tuple of the form (hostname, aliaslist, ipaddrlist), so the host name string is the first element of the returned tuple.

Now, let's consider the problem of identifying the port number for a given service. To specifically retrieve the port number associated with a service, we use the getservbyname method. This method takes two arguments: the first is the string name of the service that is desired (such as 'http') and the second is the particular protocol over which the service is run (such as 'tcp'). To identify the port number for the SMTP protocol, we could do the following:

portnum = socket.getservbyname( "smtp", "tcp" )

which would return, in this case, 25.
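Each of these lookups can fail, for example when the resolver is unreachable or the local services database has no entry for a name. The sketch below wraps the three calls so that a failed lookup yields None instead of an exception. It assumes Python 3, where the lookup errors are subclasses of OSError; the wrapper names resolve, reverse, and port_for are hypothetical.

```python
import socket

def resolve(name):
    """Return the IPv4 address string for name, or None if lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None

def reverse(ip):
    """Return the primary host name for an IP address, or None on failure."""
    try:
        hostname, aliases, addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return None

def port_for(service, protocol="tcp"):
    """Return the port number registered for service, or None if unknown."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        return None

# A dotted-quad string is returned as-is, without consulting the resolver.
print(resolve("127.0.0.1"))
print(port_for("smtp"))
```

The value printed for the smtp lookup depends on the services database of the host, so it may be None on a minimal system.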

Notification

Let's now look at the concept of event notification for sockets. This capability is commonly provided by the select primitive. In Python, the select method is provided by the 'select' module.

As we saw with C, the descriptors representing our sockets are provided to the select method for which we desire notification when an event occurs. We can configure the select method to tell us when one or more channels are readable, writable, or if an error occurs on them. Further, we can also tell the select method to return after a configurable number of seconds if no event occurs. Consider the following Python TCP server in Listing 11.1.

Listing 11.1 Python TCP server illustrating the select method.

    import socket
    import select

    serv = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    serv.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, 1 )
    sa = ("localhost", 45000)
    serv.bind( sa )
    serv.listen( 1 )

    # Server loop
    while 1:
      cli, (remhost, remport) = serv.accept()
      # Client loop
      while 1:
        socki, socko, socke = select.select( [cli], [], [], 5 )
        if [socki, socko, socke] == [ [], [], [] ]:
          cli.send("Timeout!\n")
        else:
          for sock in socki:
            str = sock.recv( 100 )
            sock.send( str )

This very simple server awaits a client connection and then upon receiving one will echo whatever it receives from the subsequently created client socket. To determine when the client has sent something to be echoed, we use the select method. The select method works with descriptors, which can be either file or socket descriptors. In Windows, the select method works only with socket descriptors, but on Unix and on Macintosh, descriptors can be socket or file descriptors.

In the select method, we specify three lists representing our request for event notification. The first is a list of descriptors for which we want to be notified if a read event is generated (data is available on the descriptor for read). The second is a list of descriptors for write events, and the third is for error events (or exceptions). The final argument represents how many seconds to await an event before timing out. Note that we construct the lists in place as arguments to select. These lists could also be built beforehand using ordinary list operations.

The select method returns three lists that represent the descriptors for which events were generated. The three lists are in the same order as were provided to the select method (read, write, and exception). Our first check is to see whether a timeout occurred. This can be determined by checking the three lists to see if each is empty. We perform this check first, and if the timeout occurred, we emit a timeout message to the client through the client socket.

If the three lists are not empty, then based upon our configuration, we can assume that something changed in the read list (because this is the only list in which we identified a descriptor to the select method). We use an iterator here to step through the returned list, though this is overkill because we know that if a read event was generated, it was based upon our only defined descriptor (cli).

With a read event known, we use the recv method to read the available data from the client socket and then write it back out using the send method (the echo). We then loop back to the while loop awaiting a new read event for our client socket.
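A read event is also reported when the peer closes its end of the connection; in that case recv returns an empty result, and a loop that does not check for it would keep spinning. The sketch below shows one step of such a client loop with that check added. It is an illustration under assumptions: Python 3, a hypothetical echo_step function, and socket.socketpair() standing in for an accepted client connection.

```python
import select
import socket

def echo_step(cli, timeout):
    """Wait up to timeout seconds for data, then echo it back.

    Returns "timeout", "closed", or "echoed".
    """
    readable, _, _ = select.select([cli], [], [], timeout)
    if not readable:
        return "timeout"
    data = cli.recv(100)
    if not data:                  # empty read: the peer closed the connection
        return "closed"
    cli.sendall(data)
    return "echoed"

# Two connected sockets in one process (illustration only).
server_side, client_side = socket.socketpair()

first = echo_step(server_side, 0.1)      # nothing has been sent yet
client_side.sendall(b"ping")
second = echo_step(server_side, 1.0)     # data is waiting to be echoed
reply = client_side.recv(100)
client_side.close()
third = echo_step(server_side, 1.0)      # peer has gone away
server_side.close()

print(first, second, reply, third)
```

A server loop built on this step would leave its client loop when the step reports that the peer closed, and return to accept to wait for the next client.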
