TCP/IP Suite

Many years ago, corporate LANs were characterized by multiple protocols. Each network operating system (NOS) implemented its own protocol at OSI Layer 3. That made life very difficult for systems administrators, network administrators, and users alike. Over time, each NOS vendor began to support multiple protocols. That enabled system and network administrators to converge on one protocol. For better or worse, IP eventually emerged as the predominant OSI Layer 3 protocol for LANs, MANs, and WANs. For many companies, the choice was not based on technical superiority, but rather on the need for Internet connectivity. If any other protocol ever had a chance against IP, the Internet boom sealed its fate. Today, IP is by far the most ubiquitous OSI Layer 3 protocol on earth.

Value of Ubiquitous Connectivity

The primary goal of the ARPANET creators was to transform computer science (and ultimately human society) through communication technology. That said, the creators of the ARPANET probably did not envision the proliferation of IP on today's scale. If they had, they surely would have done some things differently. All things considered, they did far better than could be reasonably expected given the challenges they faced. In the days of the ARPANET (launched in 1969), computer science was characterized primarily by batch processing on isolated machines. The creators of the ARPANET believed that the true power of computers was not in their ability to compute, but in their ability to create communities by providing a new means of communication. So, their challenge was to create the world's first computer network.

The underlying physical infrastructure was to be created by a separate team from the protocol designers. It would be called the interface message processor (IMP). Not knowing the details of the IMP made it very difficult to design the communication protocols. Moreover, each host operating system was unique in those days, so creating communication protocols that could be implemented by a broad range of hosts was very challenging. In the end, it was determined that multiple levels of abstraction would facilitate development of a suitable protocol suite. Thus, the layered network model was born.

The meetings of the ARPANET protocol designers were open forum-style discussions that enabled new, creative ideas to be posited and discussed. The designers were mostly college students who did not have great confidence in their authority to make decisions about the protocols under development. No meeting minutes were taken for the first few meetings. Eventually, notes from each meeting were taken down as requests for comments (RFCs). The notes were labeled as such because the meeting attendees thought another team of "real" protocol designers would take over eventually, and they did not want to offend the official designers. That seems a bit comical now, but it had a huge effect on the fate of the ARPANET protocols and, eventually, the protocols of the modern Internet. By adopting a system in which uncommitted ideas are documented and then openly reviewed by a broad audience before being formalized into standards, the creators of the ARPANET virtually guaranteed the long-term success of the Internet Protocol suite. The RFC process lends itself to global participation and enables the development of living protocols (protocols that can be adapted as requirements change).

The emphasis on human community, abstraction through layered protocols, and open dialogue about future developments was critical to ensuring the future of TCP/IP. The critical mass of TCP/IP deployments was achieved many years ago. That mass ensures ongoing support by all major application, NOS, and network vendors. Today, development of new protocols and protocol enhancements is undertaken on a voluntary basis by interested parties comprising vendors and users alike. Maintenance of the global address pool, root domain list, protocol documentation, port registry, and other operational constructs is overseen by nonprofit organizations. While no protocol suite is perfect, TCP/IP seems to get closer every day as the result of unprecedented development efforts aimed at meeting the ever-increasing range of demands placed upon TCP/IP. TCP/IP's application support is unrivaled by any other protocol suite, and virtually every operating system supports TCP/IP. Additionally, IP supports virtually every OSI Layer 2 protocol. That powerful combination of attributes forms the basis of TCP/IP's truly ubiquitous connectivity. Ubiquitous connectivity is hugely advantageous for the users of TCP/IP because it fosters new relationships among computing resources and human communities. In an interesting quote about the thought process that led to the creation of the ARPANET protocols and eventually the modern TCP/IP suite, Stephen Crocker states, "We looked for existing abstractions to use. It would have been convenient if we could have made the network simply look like a tape drive to each host, but we knew that wouldn't do." The recent ratification of the iSCSI protocol, which adapts tape drives and other storage devices to TCP/IP, is testimony to how far TCP/IP has come.

TCP/IP Throughput

IP runs on lower-layer protocols; therefore the achievable throughput depends on the underlying technology. At the low end, IP can run on analog phone lines via PPP and a modem at speeds from 300 bps to 56 Kbps. At the high end, IP can run on SONET OC-192c or 10GE at approximately 10 Gbps. Many factors can affect throughput in IP networks. The maximum transmission unit (MTU) of the underlying technology determines the TCP maximum segment size (MSS) for the source node. The larger the MSS, the more efficiently TCP can use the underlying technology.
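The relationship between MTU and MSS can be illustrated with a small calculation. The following is a minimal sketch that assumes standard 20-byte IP and TCP headers with no options; the function names and the sample MTU values (576, 1500, and 9000 bytes) are illustrative choices, not values taken from this chapter.

```python
IP_HEADER_BYTES = 20   # standard IP header, no optional fields (assumption)
TCP_HEADER_BYTES = 20  # standard TCP header, no optional fields (assumption)


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def payload_efficiency(mtu):
    """Fraction of each IP packet that carries TCP payload."""
    return tcp_mss(mtu) / mtu


# Illustrative MTU values only; real links may use other sizes.
for mtu in (576, 1500, 9000):
    print(f"MTU {mtu}: MSS {tcp_mss(mtu)}, efficiency {payload_efficiency(mtu):.3f}")
```

Under these assumptions, a larger MTU yields a larger MSS, so a smaller share of each packet is spent on headers.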

Fragmentation of IP packets at intermediate routers also can affect throughput. Fragmentation occurs when a router forwards an IP packet onto an interface with a smaller MTU than the source interface. Once a packet is fragmented, it must remain fragmented across the remainder of the path to the destination node. That decreases the utilization efficiency of the path from the point of fragmentation to the destination node. IP fragmentation also increases the processing burden on the intermediate router that performs the fragmentation, and on the destination node that reassembles the fragments. This can lead to degradation of performance by increasing CPU and memory consumption. Path MTU (PMTU) discovery offers a means of avoiding IP fragmentation by using ICMP immediately before TCP session establishment. (See Chapter 6, "OSI Network Layer," for more information about ICMP and PMTU discovery in IP networks.)

TCP optimization is another critical factor. There are many parameters to consider when optimizing TCP performance. RFC 1323 describes many such parameters. Windowing is central to all TCP operations. The TCP slow start algorithm describes a process by which end nodes begin communicating slowly in an effort to avoid overwhelming intermediate network links. This is necessary because end nodes generally do not know how much bandwidth is available in the network or how many other end nodes are vying for that bandwidth. Thus, a conservative approach to data transmission is warranted. As communication continues, the rate of transmission is progressively increased until a packet drop occurs. The dropped packet is assumed to be the result of an overwhelmed intermediate network link. The slow start algorithm reduces the total number of dropped packets that must be retransmitted, which increases the efficiency of the network as a whole. However, the price paid is that the throughput of every new TCP session is lower than it could be while the end nodes are in the slow start phase. TCP is discussed in more detail in Chapter 7, "OSI Transport Layer," and Chapter 9, "Flow Control and Quality of Service."
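The ramp-up behavior of slow start can be sketched with a toy model. This is a simplified illustration, not a TCP implementation: it assumes the sender starts with a window of one segment and doubles the window every round trip (the usual description of slow start), and it models the network as a single hypothetical capacity figure. The function name and the capacity of 40 segments are invented for the example.

```python
def slow_start_rounds(capacity_segments, initial_window=1):
    """Return the window (in segments) sent in each round trip.

    The last entry is the first window that exceeds the hypothetical
    capacity, which is where this toy model assumes a drop occurs.
    """
    windows = []
    cwnd = initial_window
    while cwnd <= capacity_segments:
        windows.append(cwnd)
        cwnd *= 2  # window doubles each round trip during slow start
    windows.append(cwnd)  # this round overshoots the capacity
    return windows


print(slow_start_rounds(capacity_segments=40))
```

The early rounds carry very little data, which is the throughput penalty that every new TCP session pays during the slow start phase.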

As mentioned previously, TCP/IP and IPS protocol overhead reduces the throughput rate available to SCSI. Consider iSCSI; assuming no encryption, three headers must be added before Ethernet encapsulation. Assuming that no optional IP header fields are present, the standard IP header adds 20 bytes. Assuming that no optional TCP header fields are present, the standard TCP header adds another 20 bytes. Assuming that no optional iSCSI header fields are present, the iSCSI basic header segment (BHS) adds another 48 bytes. Table 3-3 summarizes the throughput rates available to SCSI via iSCSI based on the Ethernet ULP throughput rates given in Table 3-2.

Table 3-3. SCSI Throughput via iSCSI on Ethernet

Ethernet Variant    Ethernet ULP Throughput    iSCSI ULP Throughput
GE Fiber Optic      975.293 Mbps               918.075 Mbps
GE Copper           975.293 Mbps               918.075 Mbps
10GBASE-X           9.75293 Gbps               9.18075 Gbps
10GBASE-R           9.75293 Gbps               9.18075 Gbps
10GBASE-W           9.06456 Gbps               8.53277 Gbps


The ULP throughput calculation for FCIP and iFCP differs from that for iSCSI. Both FCIP and iFCP use the common FC frame encapsulation (FC-FE) format defined in RFC 3643. The FC-FE header consists of 7 words, which equates to 28 bytes. This header is carried in the TCP payload, following the TCP header. Using the same assumptions as in Table 3-3, the ULP throughput can be calculated by adding 20 bytes for the IP header, 20 bytes for the TCP header, and 28 bytes for the FC-FE header. Table 3-4 summarizes the throughput rates available to FC via FCIP and iFCP on an Ethernet network.

Table 3-4. FC Throughput via FCIP and iFCP on Ethernet

Ethernet Variant    Ethernet ULP Throughput    FCIP/iFCP ULP Throughput
GE Fiber Optic      975.293 Mbps               931.079 Mbps
GE Copper           975.293 Mbps               931.079 Mbps
10GBASE-X           9.75293 Gbps               9.31079 Gbps
10GBASE-R           9.75293 Gbps               9.31079 Gbps
10GBASE-W           9.06456 Gbps               8.65364 Gbps
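The figures in Tables 3-3 and 3-4 can be approximately reproduced by scaling the Ethernet ULP throughput by the fraction of each frame's payload that remains after the added headers. The sketch below assumes a 1500-byte Ethernet payload per frame; that value is inferred from the table figures rather than stated in this section, and the function name is illustrative.

```python
ETHERNET_PAYLOAD_BYTES = 1500  # assumed payload size per Ethernet frame

IP_HEADER = 20      # standard IP header, no optional fields
TCP_HEADER = 20     # standard TCP header, no optional fields
ISCSI_BHS = 48      # iSCSI basic header segment
FC_FE_HEADER = 28   # FC-FE header: 7 words of 4 bytes


def ulp_throughput(ethernet_ulp_rate, header_bytes):
    """Scale the Ethernet ULP rate by the payload left after the headers."""
    usable = ETHERNET_PAYLOAD_BYTES - sum(header_bytes)
    return ethernet_ulp_rate * usable / ETHERNET_PAYLOAD_BYTES


iscsi_headers = (IP_HEADER, TCP_HEADER, ISCSI_BHS)
fcip_headers = (IP_HEADER, TCP_HEADER, FC_FE_HEADER)

# Gigabit Ethernet rows, in Mbps.
print(f"iSCSI on GE:     {ulp_throughput(975.293, iscsi_headers):.3f} Mbps")
print(f"FCIP/iFCP on GE: {ulp_throughput(975.293, fcip_headers):.3f} Mbps")
```

The results agree with the tables to within rounding of the last digit. The calculation ignores IPsec and any optional header fields, as the tables do.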


Note that the ULP in Table 3-4 is FC, not SCSI. To determine the throughput available to SCSI, the FC framing overhead must also be included. If the IP Security (IPsec) protocol suite is used with any of the IPS protocols, an additional header must be added. This further decreases the throughput available to the ULP. Figure 3-9 illustrates the protocol stacks for iSCSI, FCIP and iFCP. Note that neither FCIP nor iFCP adds additional header bits. Certain bits in the FC-FE header are available for ULP-specific usage.

Figure 3-9. IPS Protocol Stacks


TCP/IP Topologies

IP supports all physical topologies but is subject to the limitations of the underlying protocol. OSI Layer 2 technologies often reduce complex physical topologies into simpler logical topologies. IP sees only the logical topology created by the underlying technology. For example, the logical tree topology created by Ethernet's STP appears to be the physical topology as seen by IP. IP routing protocols can organize the OSI Layer 2 end-to-end logical topology (that is, the concatenation of all interconnected OSI Layer 2 logical topologies) into a wide variety of OSI Layer 3 logical topologies.

Large-scale IP networks invariably incorporate multiple OSI Layer 2 technologies, each with its own limitations. The resulting logical topology at OSI Layer 3 is usually a partial mesh or a hybrid of simpler topologies. Figure 3-10 illustrates a hybrid topology in which multiple sites, each containing a tree topology, are connected via a ring topology.

Figure 3-10. Large-Scale Hybrid Topology


IP routing is a very complex topic that exceeds the scope of this book. Chapter 10, "Routing and Switching Protocols," introduces some basic routing concepts, but does not cover the topic in depth. One point is worth mentioning: Some IP routing protocols divide a large topology into multiple smaller topologies called areas or autonomous regions. The boundary between the logical topologies is usually called a border. The logical topology on each side of a border is derived independently by the instance of the routing protocol running within each area or autonomous region.

TCP/IP Service and Device Discovery

Service and device discovery in TCP/IP environments is accomplished in different ways depending on the context.

Discovery Contexts

In the context of humans, a user often learns the location of a desired service via non-computerized means. For example, a user who needs access to a corporate e-mail server is told the name of the e-mail server by the e-mail administrator. The World Wide Web (more commonly referred to as the Web) provides another example. Users often learn the name of websites via word of mouth, e-mail, or TV advertisements. For example, a TV commercial that advertises for Cisco Systems would supply the company's URL, http://www.cisco.com/. When the user decides to visit the URL, the service (HTTP) and the host providing the service (www.cisco.com) are already known to the user. So, service and device discovery mechanisms are not required. Name and address resolution are the only required mechanisms. The user simply opens the appropriate application and supplies the name of the destination host. The application transparently resolves the host name to an IP address via the Domain Name System (DNS).

Another broadly deployed IP-based name resolution mechanism is the NetBIOS Name Service (NBNS). NBNS enables Microsoft Windows clients to resolve NetBIOS names to IP addresses. The Windows Internet Name Service (WINS) is Microsoft's implementation of NBNS. DNS and NBNS are both standards specified by IETF RFCs, whereas WINS is proprietary to Microsoft. Once the host name has been resolved to an IP address, the TCP/IP stack may invoke ARP to resolve the Ethernet address associated with the destination IP address. An attempt to resolve the destination host's Ethernet address occurs only if the IP address of the destination host is within the same IP subnet as the source host. Otherwise, the Ethernet address of the default gateway is resolved. Sometimes a user does not know of instances of the required service and needs assistance locating such instances. The Service Location Protocol (SLP) can be used in those scenarios. SLP is discussed in the context of storage in the following paragraphs.
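The ARP decision described above (resolve the destination directly when it shares the source host's subnet, otherwise resolve the default gateway) can be sketched with Python's standard ipaddress module. The function name and all addresses are hypothetical; the addresses come from ranges reserved for documentation.

```python
import ipaddress


def arp_target(source_interface, destination, default_gateway):
    """Return the IP address whose Ethernet address should be resolved."""
    local_subnet = ipaddress.ip_interface(source_interface).network
    if ipaddress.ip_address(destination) in local_subnet:
        return destination      # same subnet: resolve the destination host
    return default_gateway      # different subnet: resolve the gateway


# Hypothetical source host 192.0.2.10/24 with gateway 192.0.2.1.
print(arp_target("192.0.2.10/24", "192.0.2.77", "192.0.2.1"))
print(arp_target("192.0.2.10/24", "198.51.100.5", "192.0.2.1"))
```

This models only the choice of which address to resolve; the ARP exchange itself is performed by the host's TCP/IP stack.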

In the context of storage, service and device discovery mechanisms are required. Depending on the mechanisms used, the approach may be service-oriented or device-oriented. This section uses iSCSI as a representative IPS protocol. An iSCSI target node represents a service (SCSI target). An iSCSI target node is, among other things, a process that acts upon SCSI commands and returns data and status to iSCSI initiators. There are three ways to inform an iSCSI initiator of iSCSI target nodes:

  • Manual configuration (no discovery)

  • Semi-manual configuration (partial discovery)

  • Automated configuration (full discovery)

Manual Configuration

Manual configuration works well in small-scale environments where the incremental cost and complexity of dynamic discovery are difficult to justify. Manual configuration can also be used in medium-scale environments that are mostly static, but this is not recommended because the initial configuration can be onerous. The administrator must supply each initiator with a list containing the IP address, port number, and iSCSI target name associated with each iSCSI target node that the initiator will access. TCP port number 3260 is registered for use by iSCSI target devices, but iSCSI target devices may listen on other port numbers. The target name can be specified in extended unique identifier (EUI) format, iSCSI qualified name (IQN) format, or network address authority (NAA) format (see Chapter 8, "OSI Session, Presentation, and Application Layers"). After an initiator establishes an iSCSI session with a target node, it can issue a SCSI REPORT LUNS command to discover the LUNs defined on that target node.
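A manually configured target list might be modeled as follows. This is a minimal sketch, not the configuration format of any real initiator: the class, field, and function names are hypothetical, as are the sample addresses and target names. The "iqn.", "eui.", and "naa." prefixes follow the iSCSI naming conventions and are not spelled out in this section.

```python
from dataclasses import dataclass

ISCSI_DEFAULT_PORT = 3260  # registered TCP port for iSCSI targets

# Name prefixes per the iSCSI naming conventions (not stated in this section).
NAME_PREFIXES = {"iqn.": "IQN", "eui.": "EUI", "naa.": "NAA"}


@dataclass(frozen=True)
class TargetEntry:
    """One manually configured iSCSI target node (hypothetical structure)."""
    ip_address: str
    target_name: str
    port: int = ISCSI_DEFAULT_PORT  # targets may listen on other ports


def name_format(target_name):
    """Classify a target name as IQN, EUI, or NAA by its prefix."""
    lowered = target_name.lower()
    for prefix, label in NAME_PREFIXES.items():
        if lowered.startswith(prefix):
            return label
    raise ValueError(f"unrecognized iSCSI name format: {target_name!r}")


# Hypothetical entries an administrator might supply to one initiator.
targets = [
    TargetEntry("192.0.2.20", "iqn.2001-04.com.example:storage.disk1"),
    TargetEntry("192.0.2.21", "eui.02004567A425678D", port=5003),
]
for entry in targets:
    print(entry.ip_address, entry.port, name_format(entry.target_name))
```

Every initiator needs its own copy of such a list, which is why the initial configuration becomes onerous as the environment grows.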

Semi-Manual Configuration

Semi-manual configuration works well for small- to medium-scale environments. It involves the use of the iSCSI SendTargets command. The SendTargets command employs a device-oriented approach. To understand the operation of the SendTargets command, some background information is needed.

There are two types of iSCSI session: discovery and normal. All iSCSI sessions proceed in two phases: login and full feature. The login phase always occurs first. For discovery sessions, the purpose of login is to identify the initiator node to the target entity, so that security filters can be applied to responses. Thus, the initiator node name must be included in the login request. Because target node names are not yet known to the initiator, the initiator is not required to specify a target node name in the login request. So, the initiator does not log in to any particular target node. Instead, the initiator logs in to the unidentified iSCSI entity listening at the specified IP address and TCP port. This special login procedure is unique to discovery sessions. Normal iSCSI sessions require the initiator to specify the target node name in the login request.

Upon completion of login, a discovery session changes to the full-feature phase. iSCSI commands can be issued only during the full-feature phase. During a discovery session, only the SendTargets command may be issued; no other operations are supported. The sole purpose of a discovery session is to discover the names of and paths to target nodes. Upon receiving a SendTargets command, the target entity issues a SendTargets response containing the iSCSI node names of targets accessible via the IP address and TCP port at which the SendTargets command was received. The response may also contain additional IP addresses and TCP ports at which the specified target nodes can be reached. After discovery of target nodes, a SCSI REPORT LUNS command must be issued to each target node to discover LUNs (via a normal iSCSI session).
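The login requirements for discovery and normal sessions described in this section can be summarized in a short validation routine. This is a sketch of the rules only, not of the iSCSI login exchange; the function name and session-type strings are hypothetical. Requiring the initiator name for normal sessions as well is an assumption, since this section states that requirement only for discovery sessions.

```python
def validate_login(session_type, initiator_name, target_name=None):
    """Check that a login request carries the names its session type needs."""
    if session_type not in ("discovery", "normal"):
        raise ValueError(f"unknown session type: {session_type!r}")
    # The initiator node name lets the target apply security filters.
    # (Assumed here to be required for normal sessions too.)
    if not initiator_name:
        raise ValueError("initiator node name is required")
    # Only normal sessions must name the target node being logged in to.
    if session_type == "normal" and not target_name:
        raise ValueError("normal sessions must specify a target node name")
    return True


# Hypothetical names for illustration.
initiator = "iqn.2001-04.com.example:host1"
print(validate_login("discovery", initiator))
print(validate_login("normal", initiator, "iqn.2001-04.com.example:storage.disk1"))
```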

To establish a discovery session, the initiator must have some knowledge of the target entities. Thus, the initiators must be manually configured with each target entity's IP address and TCP port number. The SendTargets command also may be used during normal iSCSI sessions for additional path discovery to known target nodes.

The SendTargets command contains a parameter that must be set to one of three possible values: ALL, the name of an iSCSI target node, or null. The parameter value ALL can be used only during an iSCSI discovery session. The administrator must configure each initiator with at least one IP address and port number for each target device. Upon boot or reset, the initiator establishes a TCP session and an iSCSI discovery session to each configured target device. The initiator then issues a SendTargets command with a value of ALL to each target device. Each target device returns a list containing all iSCSI target names (representing iSCSI target nodes) to which the initiator has been granted access. The IP address(es), port number(s) and target portal group tag(s) (TPGT) at which each target node can be reached are also returned. (The TPGT is discussed in Chapter 8, "OSI Session, Presentation, and Application Layers.")

After initial discovery of target nodes, normal iSCSI sessions can be established. The discovery session may be maintained or closed. Subsequent discovery sessions may be established. If an initiator issues the SendTargets command during a normal iSCSI session, it must specify the name of a target node or use a parameter value of null. When the parameter value is set to the name of an iSCSI target node, the target device returns the IP address(es), port number(s), and TPGT(s) at which the specified target node can be reached. This is useful for discovering new paths to the specified target node or rediscovering paths after an unexpected session disconnect. This parameter value is allowed during discovery and normal iSCSI sessions. The third parameter value of null is similar to the previous example, except that it can be used only during normal iSCSI sessions. The target device returns a list containing all IP address(es), port number(s), and TPGT(s) at which the target node of the current session can be reached. This is useful for discovering path changes during a normal iSCSI session.
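The rules governing the three SendTargets parameter values can be condensed into a small check. The sketch below encodes only what this section states: ALL is valid only in discovery sessions, a target node name is valid in both session types, and null is valid only in normal sessions. The function name is hypothetical, null is represented as an empty string, and the spelling "ALL" follows this text rather than any wire format.

```python
ALL = "ALL"   # spelling as used in this section
NULL = ""     # the null parameter value, modeled as an empty string


def sendtargets_allowed(session_type, value):
    """Return True if the SendTargets value is valid for the session type."""
    if session_type not in ("discovery", "normal"):
        raise ValueError(f"unknown session type: {session_type!r}")
    if value == ALL:
        return session_type == "discovery"
    if value == NULL:
        return session_type == "normal"
    return True  # a specific target node name is valid in both session types


# Hypothetical target name for illustration.
name = "iqn.2001-04.com.example:storage.disk1"
for session in ("discovery", "normal"):
    for value in (ALL, name, NULL):
        print(session, repr(value), sendtargets_allowed(session, value))
```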

The iSCSI RFCs specify no method for automating the discovery of target devices. However, it is technically possible for initiators to probe with echo-request ICMP packets to discover the existence of other IP devices. Given the range of possible IP addresses, it is not practical to probe every IP address. So, initiators would need to limit the scope of their ICMP probes (perhaps to their local IP subnet). Initiators could then attempt to establish a TCP session to port 3260 at each IP address that replied to the echo-request probe. Upon connection establishment, an iSCSI discovery session could be established followed by an iSCSI SendTargets command. Target devices that are not listening on the reserved port number would not be discovered by this method. Likewise, target devices on unprobed IP subnets would not be discovered. This probe method is not recommended because it is not defined in any iSCSI-related RFC, because it has the potential to generate considerable overhead traffic, and because it suffers from functional limitations.

Automated Configuration

Automated configuration is possible via SLP, which is defined in RFC 2165 and updated in RFC 2608. SLP employs a service-oriented approach that works well for medium- to large-scale environments. SLP defines three entities known as the user agent (UA), service agent (SA), and directory agent (DA). The UA is a process that runs on a client device. It issues service-request messages via multicast or broadcast on behalf of client applications. (Other SLP message types are defined but are not discussed herein.) The SA is a process that runs on a server device and replies to service-request messages via unicast if the server device is running a service that matches the request. The SLP service type templates that describe iSCSI services are defined in RFC 4018. UAs may also send service request messages via unicast if the server location (name or address) is known. An SA must reply to all unicast service requests, even if the requested service is not supported. A UA may include its iSCSI initiator name in the service request message. This allows SAs to filter requests and reply only to authorized initiators. Such a filter is generically called an access control list (ACL).

For scalability, the use of one or more SLP DAs can be enlisted. The DA is a process that runs on a server device and provides a central registration facility for SAs. If a DA is present, each SA registers its services with the DA, and the DA replies to UAs on behalf of SAs. SA service information is cached in the DA store.

There are four ways that SAs and UAs can discover DAs: multicast/broadcast request, multicast/broadcast advertisement, manual configuration, and Dynamic Host Configuration Protocol (DHCP).

The first way involves issuing a service-request message via multicast or broadcast seeking the DA service. The DA replies via unicast. The reply consists of a DAAdvert message containing the name or address of the DA and the port number of the DA service if a non-standard port is in use.

The second way involves listening for unsolicited DAAdvert messages, which are transmitted periodically via multicast or broadcast by each DA.

The third way is to manually configure the addresses of DAs on each device containing a UA or SA. This approach defeats the spirit of SLP and is not recommended. The fourth way is to use DHCP to advertise the addresses of DAs. DHCP code 78 is defined as the SLP Directory Agent option.

When a DA responds to a UA service request message seeking services other than the DA service, the reply contains the location (host name/IP address and TCP port number if a non-standard port is in use) of all hosts that have registered the requested service. In the case of iSCSI, the reply also contains the iSCSI node name of the target(s) accessible at each IP address. Normal name and address resolution then occurs as needed.

SLP supports a scope feature that increases scalability. SLP scopes enable efficient use of DAs in multi-DA environments by confining UA discovery within administratively defined boundaries. The SLP scope feature offers some security benefits, but it is considered primarily a provisioning tool. Every SA and DA must belong to one or more scopes, but scope membership is optional for UAs. A UA that belongs to a scope can discover services only within that scope. UAs can belong to more than one scope at a time, and scope membership is additive. UAs that do not belong to a scope can discover services in all scopes. Scope membership can be manually configured on each UA and SA, or DHCP can be used. DHCP code 79 is defined as the SLP Service Scope option.
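The scope rules can be expressed as a simple set comparison. The following sketch models only the visibility rules stated above (a UA with scopes sees services in any of its scopes, and a UA with no scope sees everything); it does not implement SLP messaging. The function name, service labels, and scope names are hypothetical.

```python
def visible_services(ua_scopes, services):
    """Return the service labels a UA can discover.

    services maps a service label to the set of scopes it belongs to.
    """
    ua_scopes = set(ua_scopes)
    if not ua_scopes:
        # A UA that belongs to no scope can discover services in all scopes.
        return sorted(services)
    # Scope membership is additive: any shared scope makes a service visible.
    return sorted(
        label for label, scopes in services.items() if ua_scopes & set(scopes)
    )


# Hypothetical services registered by SAs, each in one or more scopes.
services = {
    "array-a": {"finance"},
    "array-b": {"engineering"},
    "array-c": {"finance", "engineering"},
}
print(visible_services({"finance"}, services))
print(visible_services(set(), services))
```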

Automated configuration is also possible via the Internet Storage Name Service (iSNS), which is defined in RFC 4171. iSNS employs a service-oriented approach that is well suited to large-scale environments. Like SLP, iSNS provides registration and discovery services. Unlike SLP, these services are provided via a name server modeled on the Fibre Channel Name Server (FCNS). Multiple name servers may be present, but only one may be active. The others act as backup name servers in case the primary name server fails. All discovery requests are processed by the primary iSNS name server; target devices do not receive or reply to discovery requests. This contrasts with the SLP model, in which direct communication between initiators and target devices can occur during discovery. Because of this, the iSNS client is equivalent to both the SLP SA and UA. The iSNS server is equivalent to the SLP DA, and the iSNS database is equivalent to the SLP DA store. Also like SLP, iSNS provides login and discovery control. The iSNS Login Control feature is equivalent to the initiator name filter implemented by SLP targets, but iSNS Login Control is more robust. The iSNS discovery domain (DD) is equivalent to the SLP scope, but iSNS DDs are more robust. Unlike SLP, iSNS supports centralized configuration, state change notification (SCN), and device mapping.

iSNS clients locate iSNS servers via the same four methods that SLP UAs and SAs use to locate SLP DAs. The first method is a client-initiated multicast/broadcast request. Rather than define a new procedure for this, iSNS clients use the SLP multicast/broadcast procedure. This method requires each iSNS client to implement an SLP UA and each iSNS server to implement an SLP SA. If an SLP DA is present, the iSNS server's SA registers with the DA, and the DA responds to iSNS clients' UA requests. Otherwise, the iSNS server's SA responds directly to iSNS clients' UA requests. The service request message contains a request for the iSNS service.

The second method is a server-initiated multicast/broadcast advertisement. The iSNS server advertisement is called the Name Service Heartbeat. In addition to client discovery, the heartbeat facilitates iSNS primary server health monitoring by iSNS backup servers.

The third method is manual configuration. Though this method is not explicitly permitted in the iSNS RFC, support for manual configuration is common in vendor implementations of practically every protocol. As with SLP, this approach is not recommended because it undermines the spirit of iSNS. The fourth method is DHCP. DHCP code 83 is defined as the iSNS option.

All clients (initiators and targets) can register their name, addresses, and services with the name server on the iSNS server upon boot or reset, but registration is not required. Any registered client (including target nodes) can query the name server to discover other registered clients. When a registered iSCSI initiator queries the iSNS, the reply contains the IP address(es), TCP port(s), and iSCSI node name of each target node accessible by the initiator. Unregistered clients are denied access to the name server. In addition, clients can optionally register for state change notification. An SCN message updates registered clients whenever a change occurs in the iSNS database (such as a new client registration). SCN messages are limited by DD membership, so messages are sent only to the affected clients. This is known as regular SCN registration. Management stations can also register for SCN. Management registration allows all SCN messages to be sent to the management node regardless of DD boundaries. Target devices may also register for entity status inquiry (ESI) messages. ESI messages allow the iSNS server to monitor the reachability of target devices. An SCN message is generated when a target device is determined to be unreachable.

DD membership works much like SLP scope membership. Clients can belong to one or more DDs simultaneously, and DD membership is additive. A default DD may be defined into which all clients not explicitly assigned to at least one named DD are placed. Clients in the default DD may be permitted access to all clients in all DDs or may be denied access to all DDs other than the default DD. The choice is implementation specific. Clients belonging to one or more named DDs are allowed to discover only those clients that are in at least one common DD. This limits the probe activity that typically follows target node discovery. As mentioned previously, a SCSI REPORT LUNS command must be issued to each target node to discover LUNs (via a normal iSCSI session). After discovery of target nodes, LUN discovery is usually initiated to every discovered target node. By limiting discovery to only those target nodes that the initiator will use, unnecessary probe activity is curtailed. Management nodes are allowed to query the entire iSNS database without consideration for DD membership. Management nodes also can update the iSNS database with DD and Login Control configuration information that is downloadable by clients, thus centralizing configuration management. The notion of a DD set (DDS) is supported by iSNS to improve manageability. Many DDs can be defined, but only those DDs that belong to the currently active DDS are considered active.
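The interaction between named DDs and the active DDS can be sketched as a three-way set intersection. This is a simplified model under stated assumptions: it covers only clients assigned to named DDs, it omits the default DD because that behavior is implementation specific, and it omits the management-node exemption. The function name and the DD names are hypothetical.

```python
def can_discover(client_dds, peer_dds, active_dds):
    """Return True if two clients share at least one active named DD."""
    shared = set(client_dds) & set(peer_dds)
    # Only DDs that belong to the currently active DDS count.
    return bool(shared & set(active_dds))


# Hypothetical DD assignments and active DDS contents.
active = {"dd-prod", "dd-backup"}
initiator_dds = {"dd-prod", "dd-test"}

print(can_discover(initiator_dds, {"dd-prod"}, active))    # shared active DD
print(can_discover(initiator_dds, {"dd-test"}, active))    # shared DD is inactive
print(can_discover(initiator_dds, {"dd-backup"}, active))  # no DD in common
```

An initiator would then issue REPORT LUNS only to the target nodes for which such a check succeeds, which is how DDs curtail unnecessary probe activity.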

Microsoft has endorsed iSNS as its preferred iSCSI service location mechanism. Additionally, iSNS is required for iFCP operation. However, SLP was recently augmented by the IETF to better accommodate the requirements of FCIP and iSCSI. Thus, it is reasonable to expect that both iSNS and SLP will proliferate in IPS environments.

An iSNS database can store information about iSCSI and Fibre Channel devices. This enables mapping of iSCSI devices to Fibre Channel devices, and vice versa. The common iSNS database also facilitates transparent management across both environments (assuming that the management application supports this). Note that there are other ways to accomplish this. For example, the Cisco MDS9000 cross-registers iSNS devices in the FCNS and vice versa. This enables the Cisco Fabric Manager to manage iSCSI devices via the FCNS.




Storage Networking Protocol Fundamentals
Storage Networking Protocol Fundamentals (Vol 2)
ISBN: 1587051605
EAN: 2147483647
Year: 2007
Pages: 196
Authors: James Long
