About TCP/IP Network Protocols | The Assembly Programming Master Book

About TCP/IP Network Protocols

The material in this section may seem out of place in this book. However, the sections that follow cover sockets programming, so an elementary introduction to the topics that you will certainly encounter when programming sockets is necessary.

About the Open Systems Interconnection Model

In the Open Systems Interconnection (OSI) model, networking functions are divided into seven layers. The OSI model was developed by the International Organization for Standardization (ISO). Its layers are outlined in Table 17.1.

Table 17.1: Open systems interconnection model layers
Layer	Description
Physical	Transmits bits through physical links, such as coaxial cable, twisted pair, or fiber-optic cable. At this layer, physical media characteristics and signal parameters are defined. Sometimes, this layer is called the hardware layer.
Data-link	Ensures data frame transmission between any two nodes in networks with typical topology or between two neighboring nodes in networks with arbitrary topology. Data-link layer protocols implement a certain structure of links between computers and the methods of their addressing. Addresses used in the data-link layer protocols in LANs are often called MAC addresses. The data-link layer is divided into two sublayers: medium access control (MAC) and logical link control (LLC).
Network	Ensures data delivery between any two nodes in networks with arbitrary topology. The network layer doesn't ensure reliable data transmission.
Transport	Ensures data transmission between any two network nodes with the required reliability level. To achieve this, the transport layer provides functional capabilities for establishing a connection, packet numbering, buffering and ordering, and taking into account duplicated packets.
Session	Provides the means of controlling communications between interacting parties, determining which of them is active, and synchronizing within the framework of the message exchanging procedure.
Presentation	Deals with the external representation of the data. Various types of data conversion can take place at this layer, including data compression and decompression, encryption, and decryption.
Application	The set of various network services provided to users and applications. Examples of such services are e-mail, terminal services, and file transmission.

More details about the OSI model can be found in an old but comprehensive book by Barry Nance [18], which has not become obsolete even now.

About the TCP/IP Family

Protocols of the TCP/IP family (TCP/IP stands for Transmission Control Protocol/Internet Protocol) form a four-layer structure. A schematic representation of this structure is provided in Fig. 17.2.

Application	HTTP, FTP, SMTP, SNMP, telnet
Transport	TCP, UDP
Network	IP, RIP, ARP, ICMP, OSPF
Link	Device drivers

Figure 17.2: The TCP/IP family

The TCP/IP family has a long history; nevertheless, these protocols are the most widely used. Practically every contemporary operating system supports them, partly because the Internet is built on TCP/IP.

The lowest layer in the TCP/IP stack is not regulated, but it supports all known protocols of the physical and data-link layers of the OSI model.

The next, or third, layer is the internetworking layer. It transmits packets using the transport technology of LANs, transmission networks, communications links, etc. The main protocol of the network layer is IP. IP is a datagram protocol (see the note that follows), which doesn't guarantee the delivery of the packet to the destination node. This layer also includes all protocols related to creating and modifying routing tables. These include the Routing Information Protocol (RIP) and the Open Shortest Path First (OSPF) protocol for collecting routing information, and the Internet Control Message Protocol (ICMP) for internetwork control messages.

Note	A datagram is a network packet transmitted independently of other packets without establishing a logical connection.

The second layer is considered the main layer of the TCP/IP stack. Protocols such as TCP and the User Datagram Protocol (UDP) ^[i] operate at this layer. TCP ensures the reliable transmission of messages between distributed processes by forming virtual connections. UDP ensures the transmission of application packets using the datagram method (similar to IP) and serves as a link between IP and application processes.

The first layer is the application layer. It includes higher-layer protocols. For example, the Hypertext Transfer Protocol (HTTP) allows data transmission in the form of Web pages and the exchange of files between the nodes of a Wide Area Network (WAN).

About Internet Protocol Addressing

The ability to quickly find the required destination node is a key feature of computer networks. In IP networks, there are three layers of addressing:

In a LAN, physical or local addressing is based on the network adapter number. These unique 6-byte addresses are assigned by equipment manufacturers. In WANs, local addresses are assigned by the network administrator.
An IP address comprises 4 bytes. Traditionally, IP addresses are written in dotted decimal notation (each byte is written as a decimal number, and the bytes are separated by dots), for example: 137.50.50.83. IP addresses can be assigned manually by a network administrator or automatically by the system. An address consists of two parts: the network address and the host address (more details will be provided later in this chapter). A host may be a member of several networks; therefore, it can have several IP addresses. An IP address is independent of the physical address; therefore, it is not a physical characteristic of a computer. On the contrary, it is a logical characteristic of a specific host.
The symbolic address or identifier can consist of several parts separated by dots. Symbolic addresses are assigned by the network administrator. For example, a name such as SERV1.BANK.COM consists of three parts: the domain name COM, the organization name BANK, and the computer name SERV1.

Consider IP addresses in more detail. The address provided earlier as an example (137.50.50.83) can be written in a binary notation:

 10001001 00110010 00110010 01010011
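The conversion shown above can be reproduced with a few lines of code. The following is only an illustrative sketch in Python (the book itself works in assembly), and the function name to_binary is a hypothetical helper introduced for this example:

```python
def to_binary(dotted):
    """Return the binary form of a dotted-decimal IPv4 address."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("expected four decimal bytes in the range 0-255")
    # Each byte becomes an 8-digit binary string, padded with leading zeroes.
    return " ".join(format(o, "08b") for o in octets)


print(to_binary("137.50.50.83"))  # prints 10001001 00110010 00110010 01010011
```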

Fig. 17.3 illustrates the five existing classes of IP addresses. As you can see, only the first three classes (A, B, and C) are used to address computers (hosts). The choice of the address class depends on the scale of the network (large, medium, or small).

Class	Leading bits	Network number	Host number
A	0	Remainder of byte 1	3 bytes
B	1 0	Remainder of bytes 1 and 2	2 bytes
C	1 1 0	Remainder of bytes 1 through 3	1 byte
D	1 1 1 0	Multicast address
E	1 1 1 1	Reserved

Figure 17.3: Classes of IP addresses

Class A networks have network numbers ranging from 1 to 126 in the first byte. Zero is not used, and 127 is reserved. Because 3 bytes are allocated for the host address, such networks can contain numerous hosts.
Class B networks are of a medium size. In such networks, 2 bytes are allocated for the host address. The first byte of a class B address lies in the range from 128 to 191; consequently, an address such as 137.50.50.83 belongs to a class B network.
Class C networks are small networks, in which only 1 byte is allocated for the host address.
Class D addresses are special addresses, called multicast addresses. If the packet being transmitted has such an address as its destination, this packet will be delivered to all hosts that belong to the multicast group identified by this address. Packets with such an address are called multicast packets.
Class E is the reserved group of addresses.
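Because the class is determined by the leading bits of the first byte, it can be found by comparing that byte with a few thresholds. The sketch below is an illustration in Python; address_class is a hypothetical helper, and it deliberately ignores the special cases (such as 0 and 127) described in this section:

```python
def address_class(dotted):
    """Return the class letter of a dotted-decimal IPv4 address."""
    first = int(dotted.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError("first byte must be in the range 0-255")
    if first < 128:   # leading bit  0
        return "A"
    if first < 192:   # leading bits 10
        return "B"
    if first < 224:   # leading bits 110
        return "C"
    if first < 240:   # leading bits 1110
        return "D"
    return "E"        # leading bits 1111


print(address_class("137.50.50.83"))  # prints B
```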

In addition, there are several special IP addresses:

An address composed of zeroes is called an undefined address. Such an address designates the address of the host that has generated the packet.
If the network number of an address contains only zeroes and only the host number is specified, it is assumed by default that the destination host belongs to the same network as the sender.
The 255.255.255.255 address is the so-called limited broadcast address. A packet with such an address specified as its destination will be delivered to all hosts of the current network. This broadcast is limited because the packet will never leave the current network.
If all positions corresponding to the host number of the destination host are filled with ones, the packet with such an address is sent to all hosts of the network, the number of which is specified in the destination address.
The address 127.0.0.1 is the so-called loopback address, which has special meaning. This address is an internal address of the computer or router protocol stack. It is used for testing programs and for organizing the operation of client and server components of the same application installed on the same computer.

Address Masks

Even if you do not know the working principles of TCP/IP, you have probably noticed that when specifying an IP address, the system also prompts you to specify a mask. The mask is a number of the same length as the IP address, in which the bits set to one specify that the corresponding bits of the address must be interpreted as part of the network address. Masking, or applying the mask to an address, allows you to split a network into several subnets. This approach is useful if within a network there are computers that rarely communicate with each other.
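Applying a mask is a bitwise AND operation. The following Python sketch illustrates this for the example address used earlier; the mask 255.255.255.0 is an assumption chosen only for illustration, and the helper names are hypothetical:

```python
import socket
import struct


def to_int(dotted):
    """Convert a dotted-decimal address to a 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


def to_dotted(value):
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return socket.inet_ntoa(struct.pack("!I", value))


def split_address(address, mask):
    """Return the (network part, host part) of an address for a given mask."""
    addr, m = to_int(address), to_int(mask)
    network = addr & m                # bits where the mask is one
    host = addr & ~m & 0xFFFFFFFF     # bits where the mask is zero
    return to_dotted(network), to_dotted(host)


print(split_address("137.50.50.83", "255.255.255.0"))
# prints ('137.50.50.0', '0.0.0.83')
```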

Physical Addresses and Internet Protocol Addresses

In contrast to the addresses of a network protocol such as Internetwork Packet Exchange (IPX), IP addresses are not bound to the physical addresses of computers. IP addresses are used for transmitting information between networks. Within the limits of a single LAN, packets are transmitted by the local address. Consequently, there must be some mechanism for translating IP addresses to local addresses and for the inverse operation. The Address Resolution Protocol (ARP) is used for determining the local address using the IP address. ARP can operate differently depending on the type of local addressing adopted in a specific network. The inverse task is solved using another protocol, Reverse ARP (RARP).

The host that needs to convert an IP address to a local address formulates an ARP request. Note that for different networks the structure of this request might be different. This request is broadcast within the limits of the current network. All hosts receive this request and compare their IP addresses with the one specified in the request. The host with the matching IP address formulates a reply, in which it specifies its local address and IP address.

In a LAN, local addresses are determined automatically. In WANs, special forwarding tables are used. These tables can be stored on special routers so that requests are sent to the necessary router.

About the Domain Name System Service

The Domain Name System (DNS) ensures automatic mapping between symbolic addresses and IP addresses. DNS is a distributed database. The DNS protocol is an application-layer protocol that operates between DNS clients and DNS servers. All DNS servers form a logical hierarchical structure. The client queries these servers until the required information (a match) is found.

On the Internet, top-level domains correspond to countries or are assigned by the Internet authorities (e.g., the domain name COM means that the server is owned by a company).

Automatically Assigning Internet Protocol Addresses

Manually assigning IP addresses to network hosts is a tedious and labor-intensive job. For networks with 50 computers or more, it is recommended that you abandon this approach. To automate the process of assigning IP addresses, a special protocol was developed: the Dynamic Host Configuration Protocol (DHCP). This protocol enables the administrator not only to fully automate the process of address assignment but also to intervene in this process. It is necessary to distinguish between automatic and dynamic address assignment. When automatic address assignment is chosen, an IP address from the pool is assigned to a computer automatically and for an unlimited time when the computer first connects to the network. When the dynamic assignment mode is used, an IP address is leased to the computer for a certain time, after which the address can be given to another computer. Consequently, with dynamic address assignment, the number of addresses in use can be considerably lower than the number of computers in the network.

Routing

The process of routing is the process of transmitting a packet from the source node to the destination network node. This process involves both routers and individual hosts. Not only routers but also network hosts can have routing tables.

Any record of a routing table must contain at least four fields: the destination network address, the next router address, the output port number, and the distance to the destination network. The latter value can be interpreted in different ways. For example, it might be a time characteristic or the number of nodes, through which the packet must pass. If the routing table contains several records with the same destination network address, then, as a rule, the record with the smallest value of the distance to the destination network is chosen.

The use of such tables assumes the use of so-called single-step routing. Multiple-step routing, when the packet being sent already has information about all routers that it has to pass, is also possible. Such a method is used mainly in debugging situations.

When sending the packet to the next router, ARP is used first because the routing table doesn't contain a local address.

If a host or router detects that the destination address belongs to the local network, it decides to pass the packet to a specific host, using ARP to determine the local address. A routing table also contains records that specify addresses of the networks directly connected to this router. Such records contain zeroes in the field specifying the distance to the destination network.

As a rule, a routing table has the default record, which contains the address of the default router. If the record with the required address is not found in the routing table, then the packet will be sent to the default router. It is assumed that proceeding this way, the packet will reach the so-called backbone routers that contain all-embracing routing tables.
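The lookup described in the preceding paragraphs (choose the record with the smallest distance among those that match the destination network, otherwise fall back to the default record) can be sketched as follows. This is a simplified illustration in Python, not the algorithm of any particular router; the Route tuple, the choose_route function, the prefix notation for network addresses, and all addresses in the table are assumptions made for the example:

```python
import ipaddress
from collections import namedtuple

# The four fields named in the text: destination network, next router,
# output port number, and distance to the destination network.
Route = namedtuple("Route", "network next_router port distance")


def choose_route(table, destination, default=None):
    """Return the matching record with the smallest distance, or the default."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in table if dest in ipaddress.ip_network(r.network)]
    if not matches:
        return default
    return min(matches, key=lambda r: r.distance)


table = [
    Route("137.50.0.0/16", "10.0.0.2", 1, 3),
    Route("137.50.0.0/16", "10.0.0.3", 2, 1),
    Route("10.0.0.0/8", None, 1, 0),   # directly connected network, distance 0
]
default_route = Route("0.0.0.0/0", "10.0.0.1", 1, 0)

print(choose_route(table, "137.50.50.83", default_route).next_router)  # prints 10.0.0.3
print(choose_route(table, "192.168.1.1", default_route).next_router)   # prints 10.0.0.1
```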

Three types of routing algorithms are used when composing routing tables:

Fixed routing: This algorithm is based on manually creating routing tables.
Simple routing: There are three types of simple routing: flood routing, when the packets are sent in all directions except for the one, from which the packet was received; event-dependent routing, when the packets are sent to a specific destination network along the route that previously resulted in a successful delivery; and source routing, when the sender places into the packet information about transit routers that must participate in packet delivery to the destination network.
Adaptive routing: This is the most frequently used type of routing. It is based on the routers' periodic exchange of information about the network topology, which is ever-changing. In this algorithm, not only the network topology but also the bandwidth of specific network sections is taken into account.

Sockets Management

The Windows Sockets specification defines a standard interface to a TCP/IP network that allows applications to communicate with each other. In the simplest interpretation, two applications in a network interoperate through a socket, to which both are connected. By its properties, a socket is similar to a file descriptor; however, it has its own management and control functions. These functions are stored in a separate DLL. To use them in your programs, it is necessary to link with the WS2_32.LIB library. In this section, all structures are described as they appear in the WINDOWS.INC file supplied as part of the MASM32 product.

Now, after considering all essential theoretical aspects, it is time to describe the functions that control sockets, or, to be more precise, the functions that control intercommunications between applications using sockets.

Before using the sockets library, it is necessary to initialize it. The WSAStartup function is intended for this purpose. In the case of an error, this function returns a nonzero code. Consider the parameters of this function:

First parameter: This is a double word, in which the most significant word is not used. In the least significant word, the low-order byte specifies the major version number of the library, and the high-order byte specifies the minor version (revision) number.
Second parameter: This parameter is the address of a special structure that receives information on sockets support. Because the contents of this structure (WSADATA) are of no particular importance here, it is enough to reserve the required number of bytes for it. The structure itself is, briefly, as follows:
```
WSADATA STRUCT
  wVersion         WORD ?
  wHighVersion     WORD ?
  szDescription    BYTE 257 dup (?)
  szSystemStatus   BYTE 129 dup (?)
  iMaxSockets      WORD ?
  iMaxUdpDg        WORD ?
  lpVendorInfo     DWORD ?
WSADATA ENDS
```
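The layout of the version word passed as the first parameter of WSAStartup can be illustrated with simple bit arithmetic. The sketch below is in Python and only models the packing; make_version is a hypothetical helper, and version 2.2 is an assumption chosen as a typical request:

```python
def make_version(major, minor):
    """Pack a version request: low byte = major, high byte = minor."""
    return ((minor & 0xFF) << 8) | (major & 0xFF)


requested = make_version(2, 2)
print(hex(requested))            # prints 0x202
print(hex(make_version(1, 1)))   # prints 0x101
```

In an assembly program, the same value would simply be written as a constant (for example, 0202h) and pushed as the first parameter.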

The next function, socket, creates a socket. If this function completes successfully, it returns the socket descriptor. In the case of an error, the function returns -1 (INVALID_SOCKET). This function has three parameters:

First parameter: Specifies the set of protocols. For TCP/IP, the AF_INET = 2 constant is used.
Second parameter: Defines the mode of interaction. Usually, two constants are used: SOCK_STREAM = 1 for connection-oriented communications and SOCK_DGRAM = 2 for connectionless communications.
Third parameter: Specifies the transport-layer protocol.

To connect to the server, the connect function is used. If this function completes successfully, it returns zero. The function has three parameters:

First parameter: Must contain a previously created socket.
Second parameter: Must specify the address of the sockaddr_in structure containing the address of the server program.
Third parameter: Gives the structure length.

Consider the previously mentioned sockaddr_in structure:

    sockaddr_in STRUCT
      sin_family   WORD ?
      sin_port     WORD ?
      sin_addr     in_addr <>
      sin_zero     BYTE 8 dup (?)
    sockaddr_in ENDS

As you can see, this structure contains another structure inside of it:

    in_addr STRUCT
      S_un ADDRESS_UNION <>
    in_addr ENDS

This structure represents a union:

    ADDRESS_UNION UNION
      S_un__b S_UN_B <>
      S_un_w S_UN_W <>
      S_addr DWORD ?
    ADDRESS_UNION ENDS

And finally, this union includes two more structures:

    S_UN_B STRUCT
      S_b1 BYTE ?
      s_b2 BYTE ?
      s_b3 BYTE ?
      s_b4 BYTE ?
    S_UN_B ENDS

    S_UN_W STRUCT
      S_w1 WORD ?
      s_w2 WORD ?
    S_UN_W ENDS

There is nothing difficult here, although these definitions seem long. This seeming complexity only reflects that the sin_addr address can be set in three ways: as 4 separate bytes, as 2 words, or as a single double word. The examples provided later in this chapter will demonstrate the use of these structures.
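To make the memory layout of sockaddr_in concrete, the following Python sketch builds the same 16 bytes with the struct module and then reads the address field back through the three views offered by the union. It is only a model of the layout, assuming a little-endian (x86) host; pack_sockaddr_in is a hypothetical helper, and port 2000 is an arbitrary example value. Note that the port and the address bytes are stored in network (big-endian) byte order:

```python
import socket
import struct

AF_INET = 2


def pack_sockaddr_in(ip, port):
    """Build the 16-byte image of a sockaddr_in structure."""
    sin_family = struct.pack("<H", AF_INET)  # WORD in host (little-endian) order
    sin_port = struct.pack(">H", port)       # WORD in network (big-endian) order
    sin_addr = socket.inet_aton(ip)          # 4 bytes, first byte of the address first
    sin_zero = bytes(8)                      # 8 padding bytes
    return sin_family + sin_port + sin_addr + sin_zero


raw = pack_sockaddr_in("137.50.50.83", 2000)
addr = raw[4:8]

print(len(raw))                                       # prints 16
print(list(addr))                                     # four bytes: [137, 50, 50, 83]
print([hex(w) for w in struct.unpack("<HH", addr)])   # two words: ['0x3289', '0x5332']
print(hex(struct.unpack("<I", addr)[0]))              # one double word: 0x53323289
```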

The listen function switches the socket into the listening state, in which it listens for external calls. In the case of success, the function returns zero. Parameters of this function are as follows:

First parameter: Gives the socket descriptor.
Second parameter: Defines the maximum length of the incoming calls queue. The standard value is 5.

The accept function is used for accepting client requests to establish a connection. Clients send their requests using the connect function. A call to the accept function must be preceded by a call to the listen function, which organizes the queue. The accept function retrieves the first connection request from the queue and returns the descriptor of a new socket that will be used for data exchange with the client that requested the connection. If the queue of connection requests is empty, the function switches to the waiting state. Consider the function parameters:

First parameter: The descriptor of the socket, through which the program receives the request.
Second parameter: The address of the sockaddr_in structure that will receive information about the connection.
Third parameter: The address of a variable that contains the size of the structure defined by the second parameter.

The bind function binds the socket to a local address and port. If this function completes successfully, it returns zero; otherwise, it returns -1 (SOCKET_ERROR). Parameters of this function are as follows:

First parameter: The descriptor of the socket being bound.
Second parameter: The pointer to the sockaddr_in structure. This structure must be filled beforehand. The sin_family field must be equal to AF_INET = 2. The sin_addr.s_addr field is usually set to INADDR_ANY = 0, which allows connections through any local network interface. The sin_port field must contain the port number (for example, 2000) in network byte order.
Third parameter: The length of the structure pointed to by the second parameter.

For receiving the data, the recv function is used. If data are received successfully, the function returns the number of received bytes. Parameters of this function are as follows:

First parameter: The socket descriptor.
Second parameter: The address of the buffer that receives the data.
Third parameter: The length of the buffer that receives the data.
Fourth parameter: The reception flag. Most frequently, this parameter is set to zero.

For sending the data, the send API function is used. When the function completes successfully, it returns the number of transmitted bytes. The function has the following parameters:

First parameter: Descriptor of the used socket
Second parameter: Address of the buffer that contains the data being transmitted
Third parameter: Buffer length
Fourth parameter: Flag, which is usually set to zero

The closesocket function is used for closing the existing socket. The only parameter of this function is the socket descriptor.

The shutdown function is used for stopping data exchange through a socket before the socket is closed; it does not release the socket, so closesocket must still be called afterward. The first parameter of this function is the descriptor of the socket. The second parameter can take the following values: 0 to stop receiving data from this socket, 1 to stop sending data through this socket, and 2 to stop both.
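The order in which the functions described above are normally called can be seen in a small loopback exchange. The sketch below uses Python's standard socket module as a stand-in, because its methods mirror the functions discussed here one to one (close corresponds to closesocket); it is not the assembly code of this chapter. The server thread, the helper name serve_one_client, and the use of port 0 (which lets the system choose a free port) are assumptions made for the example:

```python
import socket
import threading


def serve_one_client(server_sock, peers):
    """Accept one connection, read a request, and reply in upper case."""
    conn, peer = server_sock.accept()      # waits until a request is queued
    with conn:
        data = conn.recv(1024)             # receive up to 1024 bytes
        conn.sendall(data.upper())         # send the reply
    peers.append(peer[0])


# Server side: socket, bind, listen
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
server.bind(("127.0.0.1", 0))              # loopback address, any free port
server.listen(5)                           # queue length 5
port = server.getsockname()[1]

peers = []
worker = threading.Thread(target=serve_one_client, args=(server, peers))
worker.start()

# Client side: socket, connect, send, shutdown, recv, close
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
client.shutdown(socket.SHUT_WR)            # value 1: stop sending, keep receiving
reply = client.recv(1024)
client.close()

worker.join()
server.close()
print(reply)                               # prints b'HELLO'
```

Note that listen is called before accept, and that the socket returned by accept (not the listening socket) is the one used for exchanging data with the client.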

In addition to the previously listed functions, the functions that follow will be exceptionally useful when working with sockets.

The gethostname function is used for receiving the name of the local computer. The first parameter of this function is the buffer, into which this name will be loaded. The second parameter is the buffer length.

The gethostbyname function is used to get information about the remote computer. The only parameter of this function is the pointer to its network name. The function itself returns the pointer to the hostent structure in the case of success and zero in the case of error. Consider the structure returned by this function:

    hostent STRUCT
      h_name  DWORD ?
      h_alias DWORD ?
      h_addr  WORD ?
      h_len   WORD ?
      h_list  DWORD ?
    hostent ENDS

The fields of this structure are as follows:

h_name: The address of the official name of the host.
h_alias: The pointer to an array of pointers to additional (alias) names. Each name is a zero-terminated string, and the array is terminated by a zero pointer.
h_addr: The type of address, equal to 2 (AF_INET).
h_len: The length of the host address.
h_list: The pointer to an array of pointers to the IP addresses of the host. The array is terminated by a zero pointer. Each IP address is represented as a sequence of 4 bytes directly following each other.
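The information carried by the hostent structure can be inspected without dealing with pointers by using Python's socket.gethostbyname_ex function, which returns the official name, the list of aliases, and the list of addresses. This is only an illustration of what the structure contains; describe_host is a hypothetical helper, the dictionary keys simply reuse the field names above, and the result of looking up "localhost" depends on the configuration of the machine:

```python
import socket


def describe_host(name):
    """Return the hostent-like information for a host name."""
    official, aliases, addresses = socket.gethostbyname_ex(name)
    return {"h_name": official, "h_alias": aliases, "h_list": addresses}


print(socket.gethostname())                # name of the local computer

try:
    print(describe_host("localhost"))
except OSError as error:                   # raised when the name cannot be resolved
    print("lookup failed:", error)
```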

^[i] Thus, the name TCP/IP refers to the entire protocol suite, also called the protocol stack.