Tag Routing with TOS and fwmark

   


Of course, using internal services and routing them differentially is great when you have access to a Policy Routing-capable system. But most of the server systems running over IPv4 today do not implement much of the basic IPv4 suite, let alone the advanced networking portions. There are several facilities available to deal with these types of systems.

The first facility that comes to mind is the QoS (Quality of Service) umbrella of protocols. Many of the items within this scope were originally intended to provide very specific types of routing and queuing services. But what is more interesting, and relevant to this discussion, is the design as a whole.

When you consider most of the various items commonly lumped under the QoS umbrella, such as DiffServ, IntServ, or RSVP, you see that they were designed to prioritize packet traffic flows. A packet is classified and then queued and routed based on that classification. The important part to note is that the packet itself, in part or in whole, is used to make a classification decision about the packet, not unlike the decision made to route a packet based on source address.

This general view is true of all facets of Policy Routing. After all, Policy Routing is routing based on the entire packet itself. And, when you start to consider the actual realities of implementing a routing interface, you quickly realize that queuing is an integral part of the actual act of interfacing to the network; thus the statement that QoS is an integral part of the scope of Policy Routing.

As with any large and complicated system, the various parts of Policy Routing as a whole have unique and specific roles that do not seem to be a part of the intent of the general system. Those roles of the QoS spectrum include traffic flow service levels and the various mechanisms for implementing the queuing structures, among others. That entire scope of usage would require another book and will not be discussed here.

The interesting part of the QoS family, in reference strictly to routing the packet, lies in the mechanisms for classification. As you learned in Chapter 3, "Linux Policy Routing Structures," one of the mechanisms for specifying a route within the RPDB is to use the TOS (Type of Service) tag within the packet header to select a route or drive a rule. Since almost all QoS classification mechanisms are designed to use this field, either in the original format or in various other methods (for example, DiffServ architecture), these classification mechanisms can be used to select packets using very specific parameters.

The specification of the TOS field for use in Policy Routing is best made with a broadly scoped and yet very precise mechanism. Within the Linux implementation this description fits the classifier known as u32. The u32 classifier is a binary-based selection mechanism. It essentially uses two parameters to operate upon a packet. The first parameter is the binary offset into the packet, and the second parameter is the binary match. Because the offset is specified as a binary location, you can look at any given part of the packet. The binary match is specified as a pattern and a mask so that you can look for specific signature patterns or even very specific bits. Thus you have a comprehensive packet selection mechanism over the entire packet.
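
As a small taste of what this looks like (the full tc filter syntax appears later in this chapter), the following u32 selector fragment matches any packet destined for 10.1.1.0/24 by comparing a 32-bit pattern, under a /24 mask, against the destination address field at byte offset 16 of the IP header:

  # 0x0a010100 = 10.1.1.0; offset 16 = IPv4 destination address field
  match u32 0x0a010100 0xffffff00 at 16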

Packet selection mechanisms bring up the other facility of mention: packet filtering. Packet filtering mechanisms are usually considered a function of network security and control mechanisms. As with the QoS family, the essential nature of packet filtering is the important concept.

Packet filtering relies on the ability to select packets for perusal. Most of the packet filtering schemes use an internal representation of the selection mechanism to differentiate the packets. This selection mechanism representation usually takes the form of a tag field added to the packet during the period that the packet traverses the filtering device. Using the native tag field as a selector for routing provides the link to Policy Routing.

Within the Linux kernel, the packet filtering mechanisms ensconced during the 2.1 kernel development provide a mechanism for exposing this tag to the general networking structure. This is the fwmark, called nfmark in the NetFilter architecture. This mark is a specifically provided mapping from the internal tagging mechanism to the general network structures. The mark is administratively assigned as needed by a specific packet filter selection rule. This mark was in all of the 2.2 series kernels and was recently added into the new 2.3/2.4 series kernels.

Either of these two mechanisms, QoS classification or packet filter mark, allows you to specify a tag that decides the routing. These mechanisms can coexist within a single system and can even coexist with their original functionality. You get the best of both worlds.

Example 6.5: Mark My Route

The first of these two facilities you decide to examine is the firewall mark, fwmark. This facility exists in different but related implementations depending on which kernel you use. For the 2.1/2.2 series of kernels you would use the ipchains utility to fwmark the packet. For the 2.3/2.4 series you would use the iptables utility of NetFilter to provide the fwmark. You decide to check out both facilities because some of your older machines are running 2.2 kernels, while many of your newer test machines run the 2.4 series kernels.

Returning to your testing network setup you decide to install a 2.2.12 series kernel on router1 along with the ipchains utility. Then you set up a Web server on host2 along with three different addresses. You will use the fwmark facility of ipchains to tag packets entering router1 from net1. You will then use these tags to selectively allow access to specific addresses of host2.

The addresses assigned on host2 along with the Web aliases are as follows:

 
 host2   10.1.1.3/24
 web1    10.1.1.5/32
 web2    10.1.1.6/32
 web3    10.1.1.7/32
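
As a sketch of how these might be brought up with the ip utility on host2 (assuming eth0 is host2's interface to Network B, which the setup does not state):

  # eth0 assumed to be host2's Network B interface
  ip addr add 10.1.1.3/24 brd + dev eth0
  ip addr add 10.1.1.5/32 dev eth0
  ip addr add 10.1.1.6/32 dev eth0
  ip addr add 10.1.1.7/32 dev eth0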
 

Now on the eth0 interface of router1 you will place your fwmark rules. Recall from Chapter 3 that the INPUT chain is where you would put your tagging rules. The FORWARD chain is after the RPDB along with the OUTPUT chain.

You decide for clarity that you will tag the inbound packets using a fwmark that is the same as the final octet of the destination address. So you implement the following set of chain rules on router1:

 
  ipchains -A input -p tcp -s 0/0 -d 10.1.1.5 80 -m 5
  ipchains -A input -p tcp -s 0/0 -d 10.1.1.6 80 -m 6
  ipchains -A input -p tcp -s 0/0 -d 10.1.1.7 80 -m 7
 

This will tag any packets entering router1 from any interface that are destined for the host2 addresses. There are additional specifications you can add to the ipchains command to further restrict the interface and even the source. If you are interested in those features you know you can look them up in the man pages, but for now you only want to see how the fwmark tag works.
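
For illustration only, such a restricted rule might look like the following sketch; the 192.168.1.0/24 source network is a hypothetical value for net1, and eth0 is the net1-facing interface mentioned above:

  # hypothetical net1 source range; eth0 faces net1
  ipchains -A input -i eth0 -p tcp -s 192.168.1.0/24 -d 10.1.1.5 80 -m 5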

Now you set about using some rules to select routing tables for these fwmarked packets. You note in the extended listing of the fwmark from ipchains, using ipchains -L -n -v, that the fwmark is coded as a hex value. Thus, you see that if you had used a fwmark of 10, the corresponding actual tag would be 0xa. With this in mind you set up the rules, noting that the ip utility refers to the fwmark only in hex. You end up with the following set of rules:

 
  ip rule add fwmark 5 table 5 prio 15000
  ip rule add fwmark 6 table 6 prio 16000
  ip rule add fwmark 7 table 7 prio 17000
 

Of course, you need to populate the tables with the appropriate routes. One of the features of this style of selection is that you can tag different types of packets with the same fwmark. So, for example, when implementing the chain rules on router1 you could have marked both 10.1.1.5 and 10.1.1.6 with the same fwmark. Then the rules would select tables based on this mark. Thus you can tie together disparate packet types into the same routing structure.
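
As a minimal sketch of such a population, assuming the three host2 addresses are all reached directly through router1's eth1 interface (an assumption about the test topology), each table could be given a single host route:

  # eth1 assumed to be router1's interface toward host2
  ip route add 10.1.1.5/32 dev eth1 table 5
  ip route add 10.1.1.6/32 dev eth1 table 6
  ip route add 10.1.1.7/32 dev eth1 table 7
  ip route flush cache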

Now that you have tried out the fwmark facility in kernel 2.2.12, you decide to try kernel 2.4.0 on router1 and implement the same fwmark setup. Since you already know how the rules will look, you only need to figure out how to use the iptables utility under NetFilter. You come up with the following set of iptables commands, which operate the same way as the ipchains rules you set up on router1. Note that you have to specify these rules as operating on the mangle table because you are actually modifying the packet.

 
  iptables -t mangle -A PREROUTING -p tcp -s 0/0 -d 10.1.1.5/32 --dport 80 \
          -j MARK --set-mark 5
  iptables -t mangle -A PREROUTING -p tcp -s 0/0 -d 10.1.1.6/32 --dport 80 \
          -j MARK --set-mark 6
  iptables -t mangle -A PREROUTING -p tcp -s 0/0 -d 10.1.1.7/32 --dport 80 \
          -j MARK --set-mark 7
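
To confirm the mangle rules are in place and counting packets, you can list them with standard iptables options:

  iptables -t mangle -L PREROUTING -n -v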
 

In Chapter 3 you learned that the NetFilter architecture allows you to specify two different locations for packet mangling operations. Since you want to see packets entering router1 from the network you choose the PREROUTING hook. The rules that act on this setup are the same as before.

Now both the ipchains and the iptables commands can be used to set marks within the OUTPUT hook location. This location sets the mark for packets that are exiting from the localhost or loopback interface. Thus you can use all of the dev lo rules you saw in Example 6.2 to route the marked packets.
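
As a sketch of the 2.4 form, marking locally generated Web traffic toward web1 in the OUTPUT hook would look like the following; the mark value simply reuses the convention from the PREROUTING rules:

  iptables -t mangle -A OUTPUT -p tcp -d 10.1.1.5/32 --dport 80 \
          -j MARK --set-mark 5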

Linux DiffServ Architecture

Now that you have tried out the packet filtering techniques for marking the packets, you decide to turn your attention to the QoS classification routines. These routines are designed to tag packets for use with queuing structures. These tags are often in the form of actual changes to the TOS field within the packet header.

To date, most implementations of QoS tend to implement classification and flow control on the output, called the egress, interface. This is purely due to the general viewpoint, dating from the development work of the early 1990s, that you were only performing traditional routing. Since in traditional routing a decision about the packet destination is not made until just before the packet leaves the system, the general consensus was that any queuing must take place after the routing decision. The arrival of Policy Routing has revealed that this idea, as with the traditional routing structure, is limited.

Fortunately, the Linux DiffServ architecture provides an ingress (input) queuing discipline that can meet your needs. This ingress queuing discipline (qdisc) is currently only capable of tagging and policing packets on the ingress. But the plans and future hopes are that it will grow to become a regular full-function qdisc. Additionally, there is an idea floating around to associate the entire DiffServ architecture on Linux with the services rather than the physical interface, similar in thought to the way an address within Policy Routing belongs to a service and not a physical device. Since the entire structure of QoS, including DiffServ, is considered a part of the full Policy Routing structure, this move would align all network mechanisms in the same generalized structure. And that would be best all the way around.

To use the ingress qdisc you need to understand a little of the DiffServ architecture with respect to the various terms and mechanics. In a nutshell, the qdisc is the core function that provides a method for queuing the packets. The class is the group into which the packet is placed and by which the qdisc selects packets. The filter is attached to the qdisc and is the selector of the packet. Basically, you enable a qdisc, attach filters to the qdisc, and provide classes within the qdisc. For your purposes the actual classification will be done by the filter, because the filter is the tagging mechanism.

Qdisc

There is a difference between a queue and a queuing discipline. Each particular network device has a queue that feeds packets to it. Within that device queue you may have several queuing disciplines at work. Think of a store where there is only one register at which you actually purchase your item and leave the store. That register is the device queue. From that register there are several lines that start within the store at a single point, and branch out into several lines that then converge again on the single register. Think of the entire system, beginning with the single entry point into the lines and ending at the single register, as the device queue itself. Then the various lines represent various possible queuing disciplines.

For the ingress qdisc you need only consider that there is only one possible line. Hopefully when the newer generalized structures are implemented, perhaps in 2.5 series Linux development kernels, there will be more possible lines to choose from.

Class

Queuing disciplines and classes are fundamentally intertwined. A queuing discipline may contain several classes of service. These classes and their semantics are fundamental properties of that queuing discipline. Thinking again of the store register lines, each originating line within the whole queue can be a class of the queuing discipline. Each class can contain other queuing disciplines within it, which then can contain classes, and so on and so on. In the end all of the machinations serve merely to differentiate the service received by the various packets.

A queuing discipline does not necessarily have classes. For example, the TBF queuing discipline does not allow classes. If you use TBF you essentially have a single overall class for the entire queuing discipline. In the ingress qdisc there is no real need for classes because the current function is only to provide a mechanism for tagging a packet on reception.
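
For instance, attaching the classless TBF qdisc takes a single command and offers no classes to configure; the device and the rate, latency, and burst numbers here are illustrative values only:

  # illustrative device and parameters
  tc qdisc add dev eth0 root tbf rate 220kbit latency 50ms burst 1540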

Filter

Filters provide the method for checking and tagging packets. These tags can then be used by the classes to determine membership in the class. Filters may be combined arbitrarily with queuing disciplines. Thinking again about the store analogy, the point where a single line splits into several parallel lines indicates the location of a filter application. The actual split mechanism could be a class decision based on an earlier filter tag. Consider the case where everybody who has fewer than five items and wants to pay cash is put into the "less than five items & pays cash" line. There is a filter entity that checks each person and, if they are a "less than 5 & cash," gives them a tag. Either then or later on, another entity (think of a class, or the RPDB) moves the person to another line based on the tag.

Now that you have an idea of how the basic setup works within the Linux DiffServ mechanism, you decide to play with using the ingress qdisc for routing tags.

Example 6.6: Class Wars

In order to test this model you decide to use the 2.4 kernel with the classid-to-mark DiffServ extension. This extension will be part of the regular DiffServ code within the 2.4 series kernels. It provides an internal conversion map from a filter's classid tag to an fwmark. In this way you can use the ultimate packet tagging power tool, the u32 classifier.

You go to router1 and make sure it is running a 2.4 kernel without any of the NetFilter architecture turned on. Then you set up the ingress qdisc on the Network B interface, eth1:

 
  tc qdisc add dev eth1 handle ffff: ingress  
 

Now you need to consider how the u32 filter works.

u32 Filter

The most powerful filter available in Linux is the u32 filter. This filter allows you to actually make a choice based on any data within the packet itself. As with all of the DiffServ implementations for Linux, you will use the tc utility from IPROUTE2. (The complete syntax and use of the tc utility will not be covered in this book. Please refer to the IPROUTE2 documentation for details.) Looking at the tc utility help for this filter gives a faint glimpse of this power:

 
 root@router1#  tc filter add u32 help
 Usage: ... u32 [ match SELECTOR ... ] [ link HTID ] [ classid CLASSID ]
                [ police POLICE_SPEC ] [ offset OFFSET_SPEC ]
                [ ht HTID ] [ hashkey HASHKEY_SPEC ]
                [ sample SAMPLE ]
 or     u32 divisor DIVISOR

 Where: SELECTOR := SAMPLE SAMPLE ...
        SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} } SAMPLE_ARGS
        FILTERID := X:Y:Z
 

The actual heavy-duty selection mechanisms are in the SELECTOR. But all you are told is that the SELECTOR is a series of SAMPLE sections. And nowhere are you told what the SAMPLE_ARGS would have to be. But by reading through the source and looking around the Internet you amass some of the needed information for using u32 in the context of this book.

The u32 selectors are simply binary patterns with binary masks that are used to match any set of data within the packet. The most common usage is to perform matches within the packet header. There are two main types of selectors, and they are deeply interrelated. The human interface selectors are those specified using linguistic aliases for specific protocol and field matches, such as the IP destination address or protocol 4. Then there are the bitwizard selectors, which are specified in terms of the bit pattern length. These are the u32, u16, and u8 selectors themselves. Within the tc utility all of the human selectors are translated into bitwizard selectors.

For example, if you specify matching the human selector match ip tos 0x10 0xff, the tc utility actually matches against the packet as match u8 0x10 0xff at 1. Note from the human specification that you are trying to match TOS 10h. Now, the TOS field within an IP header is one byte long, which is 8 bits and thus a u8 general length selector, and it is located at a one byte offset into the IP packet. Thus you can specify matching a one byte set of bits with a full mask located at one byte into the packet header, which is match u8 0x10 0xff at 1. Or you can simply say match ip tos 0x10 0xff.
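
Seen in complete commands against the ingress qdisc you set up on eth1, the two forms are interchangeable; the prio and classid values here are illustrative:

  tc filter add dev eth1 parent ffff: protocol ip prio 1 u32 \
          match ip tos 0x10 0xff classid :1
  tc filter add dev eth1 parent ffff: protocol ip prio 1 u32 \
          match u8 0x10 0xff at 1 classid :1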

Now you can see why this is bitwizard work: There is no man page or other help, you have to know your packet binary structure and hexadecimal conversions, and even the human interface is somewhat cryptic. But the power available using this filter is incredible. The ability to specify any binary data pattern means that you can pick out individual data streams for routing. Suppose that you are browsing several different Web sites from your machine. To the router, all the data streams look as though they originate from the same address to the same protocol. By using u32, you could look for data patterns that indicate SSL encryption on either the sent or returned packets and route them through a secure link. And that is without looking at the header at all.
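
As a rough sketch of that SSL idea: the first byte of an SSL/TLS record is 0x16 during the handshake, so a filter like the following could tag likely SSL session setups. The offset of 40 assumes a 20-byte IP header and a 20-byte TCP header with no options, so treat this as purely illustrative:

  # 0x16 = SSL/TLS handshake record type; offset assumes no IP/TCP options
  tc filter add dev eth1 parent ffff: protocol ip prio 2 u32 \
          match ip protocol 0x6 0xff \
          match u8 0x16 0xff at 40 classid :2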

Additionally, you can look for certain types of patterns by using the mask portion of the specification. In the TOS example from the previous paragraph you were looking for exactly TOS 16 decimal, which is 0x10 hex. But what if you wanted to consider all TOS decimal levels from 16 through 19 inclusive? Those four values, binary 000100xx, differ only in the low two bits, so you just change the mask to ignore those bits and end up with a command like match u8 0x10 0xfc at 1. Thus, between the specification of the length of the pattern, the pattern itself, and the offset into the packet you can isolate any unique portion of the packet. You can also stack several selectors together to obtain any combination of selections you require.

As you work with the u32 filter you note some tricky behaviors on the part of the selectors. You consider the filter snippet match tcp src 0x1234. This human filter is coded by tc as match u16 0x1234 0xffff nexthdr+0, which means to match a 16-bit 0x1234 pattern within the internal protocol header at offset 0. But what is contained at offset 0 of the internal protocol header is simply the source port for the packet.

Thus, if you were expecting the match tcp part to only match TCP packets, you would be surprised. The filter snippet will actually match UDP packets as well, because they also have the source port contained at offset 0 within the internal protocol header. If you want to match only TCP packets with source port 0x1234, you have to stack up selectors. You would then use match tcp src 0x1234 0xffff match ip protocol 0x6 0xff. The additional selector match ip protocol 0x6 0xff says to look only at packets of protocol 6 hex, which is TCP. Table 6.1 is the full table of selectors as known at this time.
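
Assembled into a complete command on the ingress qdisc, the stacked selectors look like this (the prio and classid values are illustrative):

  tc filter add dev eth1 parent ffff: protocol ip prio 3 u32 \
          match tcp src 0x1234 0xffff \
          match ip protocol 0x6 0xff \
          classid :3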

Table 6.1. u32 Selectors
Command  Option                 Data
match    <ip,tcp,udp> src       <ip source address/mask CIDR>
match    <ip,tcp,udp> dst       <ip dest address/mask CIDR>
match    ip tos                 <original IPv4 TOS field in hex> <hex mask>
match    ip dsfield             <8-bit entire TOS field> <hex mask>
match    ip precedence          <precedence part of TOS field> <hex mask>
match    ip ihl                 <8-bit IP header length in hex>
match    ip protocol            <hex protocol number> <hex mask>
match    ip nofrag              only match non-fragmented IP packets
match    ip firstfrag           only match the first IP fragment
match    ip df                  only match packets with the Don't Fragment flag set
match    ip mf                  only match packets with the More Fragments flag set
match    <ip,tcp,udp> sport     <source port in hex> <hex mask>
match    <ip,tcp,udp> dport     <dest port in hex> <hex mask>
match    ip icmp_type           <ICMP type in hex> <hex mask>
match    ip icmp_code           <ICMP code in hex> <hex mask>
match    icmp type              <ICMP type in hex> <hex mask>
match    icmp code              <ICMP code in hex> <hex mask>

As you can see, there are many ways in which you can look into the packet headers and determine your selection. When you combine these facilities with the ability to also specify any exact bit pattern at any offset into the packet that you want, you can see the power of the Linux DiffServ architecture.

Within the u32 filter there is another kind of selector available: the sample command. The sample command takes the same kinds of arguments as match. However, the sample command normally takes only a single argument for type. So where you would use match ip protocol 0x6 0xff, you can use sample tcp instead.

With your newly acquired knowledge of the u32 filter usage you first decide to try a simple test of the ingress filter. You know you have the ingress qdisc set up on router1 on the Network B interface. You decide to try tagging all incoming packets from 10.1.1.0/24 with classid 1. Then you will use a rule that sends those packets into table 1 and assign them additionally to realms 3/4 for tracking. You end up with the following sequence of commands:

 
  tc filter add dev eth1 parent ffff: protocol ip prio 1 u32 \
          match ip src 10.1.1.0/24 classid :1
  ip rule add fwmark 1 table 1 prio 15000 realms 3/4
  ip route add default via 192.168.1.1 table 1 src 192.168.1.254
  ip route flush cache
 

Then you run a ping from net1 to host1 and look at the output of the qdisc statistics and the realms:

 
 [root@router1 root]#  tc -s qdisc ls dev eth1
 qdisc ingress ffff: ----------------
  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

 [root@router1 root]#  rtacct 3
 Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
 3          0          0          504        6

 [root@router1 root]#  rtacct 4
 Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
 4          504        6          0          0
 

You note that the qdisc statistics do not show any traffic. That is expected, because you are not using the classid anywhere on egress for DiffServ; you are only using the ingress qdisc to be able to tag packets with the u32 filter. You know that the filter is working because your ping packets show up balanced on the realms. The only way the realms would list the packets is if they were acted upon by that rule. So your quick test was successful. You decide to create a listing for this just because of how it would be coded:

 
 You are playing with the MARK of the BEST... ;-}
 

   