18.6 Implementing a Queuing Discipline

   


This section describes how we can implement a queuing discipline. We will use the token-bucket filter as an example, because it represents a fundamental element of many traffic-shaping approaches.

18.6.1 The Token-Bucket Filter

A token-bucket filter is used to control and limit the rate and the burst size (a short period during which a specified data rate may be exceeded) of data streams. Figure 18-4 illustrates the basic idea of a token bucket. In this model, the determining parameters are the rate, R, at which the token bucket is filled with tokens, and the maximum number of tokens, B, the token bucket can hold. Each token represents one byte that may be sent. The token bucket declares a packet compliant with the rate and burst parameters if the number of tokens in the token bucket is at least the length of the packet in bytes.

Figure 18-4. Model of a token bucket.



If a packet is compliant, it may be sent. Subsequently, the number of tokens in the token bucket is reduced by a number corresponding to the packet length. If noncompliant packets are deleted immediately, the token bucket implements a traffic-policing process. In contrast, if such packets are held back until sufficient tokens have accumulated in the token bucket, we speak of traffic shaping.

A real-world implementation realizes this model differently, to reduce the computing cost while producing the same result. It would not make sense to increment a counter representing the number of tokens several times per second even when there is no packet to send. Instead, computations are made only when a packet is ready to be sent and waiting at the input of the token bucket. At that point, we can compute how many tokens are present in the token bucket. To do this computation, we need to know when the last packet was sent and what the filling level of the token bucket was afterwards. The current number of available tokens is the sum of the tokens available after the last transmission and the tokens that arrived in the meantime (i.e., the elapsed interval multiplied by the rate, R). Notice that the number of available tokens can never exceed B. If the number of tokens computed in this way is at least the length of the waiting packet, the packet may be sent. Otherwise, instead of sending the packet, a timer is started; it expires once a sufficient number of tokens has accumulated for the packet to be sent. The timer has to be initialized to an appropriate interval, which is easily calculated from the number of tokens still missing and the rate, R, at which the bucket is filled with more tokens.
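The on-demand computation described above can be sketched in a few lines of userspace C. This is a minimal illustration of the principle, not the kernel code: the structure and function names are invented for the example, and time is represented as seconds in a double rather than in scheduler ticks.

```c
/* Illustrative sketch of on-demand token accounting: tokens are
 * replenished lazily, only when a packet is waiting to be sent.
 * All names here are assumptions for the example. */
struct bucket {
    double rate;     /* R: tokens (bytes) added per second            */
    double depth;    /* B: maximum number of tokens the bucket holds  */
    double tokens;   /* filling level after the last transmission     */
    double t_last;   /* time of the last computation, in seconds      */
};

/* Returns 1 if a packet of pkt_len bytes may be sent at time `now`;
 * on success the token count is reduced by the packet length.
 * Otherwise returns 0 and *delay is set to the interval after which
 * enough tokens will have arrived (the timer interval). */
static int bucket_send(struct bucket *b, double now, double pkt_len,
                       double *delay)
{
    /* tokens after last transmission + tokens arrived in the meantime */
    double tokens = b->tokens + (now - b->t_last) * b->rate;
    if (tokens > b->depth)              /* can never exceed B */
        tokens = b->depth;

    b->t_last = now;
    if (tokens >= pkt_len) {            /* packet is compliant */
        b->tokens = tokens - pkt_len;
        return 1;
    }
    b->tokens = tokens;                 /* keep the updated level */
    *delay = (pkt_len - tokens) / b->rate;  /* missing tokens / R */
    return 0;
}
```

Note that the division for the timer interval happens only on the slow path, when a packet must wait; a compliant packet costs only one multiplication and a comparison.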

Such a token-bucket filter is implemented within the traffic-control framework in the file net/sched/sch_tbf.c. However, the implementation is actually an extension: a dual token bucket. Two token buckets are arranged back to back, a frequently used arrangement to guarantee a mean rate while limiting bursts. The first token bucket is set to a rate, R, corresponding to the desired mean data rate; the second token bucket is set to the peak rate and holds a significantly smaller number of tokens, B, which nevertheless corresponds at least to one maximum transmission unit (MTU).
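The dual-bucket conformance test can be sketched as follows. This is a simplified userspace model under assumed names, not the kernel code: a packet conforms only if both buckets hold enough tokens, so the first bucket bounds the mean rate while the smaller peak bucket caps how fast a burst may drain.

```c
/* Minimal sketch of a dual token bucket (names are illustrative).
 * A packet conforms only if BOTH buckets have enough tokens. */
struct dual_tbf {
    double r_tokens, r_depth, r_rate;   /* mean-rate bucket      */
    double p_tokens, p_depth, p_rate;   /* peak-rate bucket      */
    double t_last;                      /* time of last update   */
};

static int dual_tbf_conforms(struct dual_tbf *q, double now, double len)
{
    double dt = now - q->t_last;
    double r = q->r_tokens + dt * q->r_rate;
    double p = q->p_tokens + dt * q->p_rate;

    if (r > q->r_depth) r = q->r_depth; /* neither bucket exceeds */
    if (p > q->p_depth) p = q->p_depth; /* its maximum depth      */

    q->t_last = now;
    if (r >= len && p >= len) {         /* compliant in both buckets */
        q->r_tokens = r - len;
        q->p_tokens = p - len;
        return 1;
    }
    q->r_tokens = r;                    /* keep the updated levels */
    q->p_tokens = p;
    return 0;
}
```

Because the peak bucket is small (about one MTU), a second packet arriving immediately after a full-size one is held back even when the mean-rate bucket is still well filled.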

To be able to store states between single transmit processes, the token-bucket implementation uses the tbf_sched_data structure:

 struct tbf_sched_data
 {
 /* Parameters */
         u32             limit;        /* Maximal length of backlog: bytes */
         u32             buffer;       /* Token Bucket depth/rate: MUST BE >= MTU/B */
         u32             mtu;
         u32             max_size;
         struct qdisc_rate_table *R_tab;
         struct qdisc_rate_table *P_tab;

 /* Variables */
         long            tokens;       /* Current number of B tokens */
         long            ptokens;      /* Current number of P tokens */
         psched_time_t   t_c;          /* Time check-point */
         struct timer_list wd_timer;   /* Watchdog timer */
 };

The limit field specifies the number of bytes in the queue used to buffer packets that cannot be sent immediately. The buffer field shows that, in most places within the implementation, computations use times (the ratio of bytes to rate) rather than bytes and rates directly. This means that a packet of a specific length consumes a certain amount of transmit time from the token bucket, whose available transmit time is calculated as the ratio of the current number of tokens to the rate, R. The variables tokens and ptokens store the number of tokens in the respective token bucket. The time at which the last packet was transmitted is written to the t_c entry, and the wd_timer field is needed when a timer has to be started for a packet's delayed transmission. Two pointers, R_tab and P_tab, point to the structure qdisc_rate_table, which stores the mapping of packet lengths to transmit times, to avoid divisions during each transmission process. This table is created by the function qdisc_get_rtab() when tbf_change() (net/sched/sch_tbf.c) initializes the token-bucket filter. However, the actual computation is done in user space by the tc_calc_rtable() function in iproute2/tc/tc_core.c.
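The idea behind such a rate table can be sketched as follows. This is an illustration of the precomputation trick, not the kernel's or iproute2's exact code: the names, the 256-slot layout, and the microsecond time unit are assumptions for the example.

```c
/* Sketch of a rate table: the transmit time of a packet is
 * precomputed per size slot, so the hot path needs only a shift
 * and a table lookup instead of a division. Names and layout are
 * illustrative. */
enum { RTAB_SLOTS = 256 };

struct rate_table {
    unsigned int cell_log;          /* log2 of bytes per table slot   */
    unsigned int time[RTAB_SLOTS];  /* transmit time per slot, in us  */
};

/* Fill the table for a given rate in bytes per second. The one
 * division per slot happens here, at configuration time. */
static void rtab_build(struct rate_table *rt, unsigned int cell_log,
                       unsigned int bytes_per_sec)
{
    rt->cell_log = cell_log;
    for (unsigned int i = 0; i < RTAB_SLOTS; i++) {
        unsigned long long sz = (unsigned long long)(i + 1) << cell_log;
        rt->time[i] = (unsigned int)(sz * 1000000ULL / bytes_per_sec);
    }
}

/* Hot-path lookup: packet length (>= 1) to transmit time, division-free. */
static unsigned int rtab_time(const struct rate_table *rt, unsigned int len)
{
    unsigned int slot = (len - 1) >> rt->cell_log;
    if (slot >= RTAB_SLOTS)
        slot = RTAB_SLOTS - 1;
    return rt->time[slot];
}
```

With a cell size of 8 bytes (cell_log = 3) and a rate of 125,000 bytes/s (1 Mbit/s), a 1500-byte packet falls into the slot precomputed for 1504 bytes, so the lookup slightly rounds the packet length up rather than performing an exact division per packet.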

We will now introduce and explain additional functions of the token-bucket filter.

tbf_init()

net/sched/sch_tbf.c


Initialization of the token bucket merely means that the start time has to be defined to initialize the timer. The macro PSCHED_GET_TIME (include/net/pkt_sched.h) is used to establish the start time. This macro accesses the TSC register described in Chapter 2, if it is present. A structure containing a pointer to private data and a pointer to the tbf_watchdog() function is passed to the timer; tbf_watchdog() has to be invoked when the timer expires.

tbf_enqueue()

net/sched/sch_tbf.c


This function serves to accept packets and to append them to the end of the queue. Also, a number of statistics are updated, and error cases are handled.
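The enqueue step can be illustrated with a small userspace analogue. This is a sketch of the backlog-limit idea only, under assumed names; the real tbf_enqueue() works on sk_buffs and the qdisc statistics.

```c
/* Illustrative analogue of an enqueue with a byte backlog limit:
 * append a packet length to a FIFO unless the backlog would exceed
 * the configured limit. All names are assumptions for the example. */
enum { MAXQ = 64 };

struct fifo {
    unsigned int limit;     /* maximal backlog in bytes        */
    unsigned int backlog;   /* current backlog in bytes        */
    unsigned int len[MAXQ]; /* packet lengths, ring buffer     */
    unsigned int head, count;
};

/* Returns 1 on success, 0 if the packet is dropped. */
static int fifo_enqueue(struct fifo *q, unsigned int pkt_len)
{
    if (q->count == MAXQ || q->backlog + pkt_len > q->limit)
        return 0;           /* drop: queue or byte limit reached */
    q->len[(q->head + q->count) % MAXQ] = pkt_len;
    q->count++;
    q->backlog += pkt_len;
    return 1;
}
```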

tbf_dequeue()

net/sched/sch_tbf.c


This function handles the actual traffic-shaping work. First, PSCHED_GET_TIME(now) is used to obtain the current time. Subsequently, the number of available tokens is computed from the old value (q->ptokens or q->tokens) and the time elapsed. Next, the qdisc_rate_table is used to compute the number of tokens required by the packet, and the difference between available and required tokens is calculated. Notice, however, that a token here stands for a time interval rather than for a byte. If the number of tokens in both token buckets is sufficient, tbf_dequeue() returns the top skb from the queue; otherwise, a timer is started. The inaccuracies of the standard Linux timers, described in Chapter 2, can cause the conversion of the interval into jiffies to yield a value of zero. In this case, a minimum delay of one jiffy is selected, to ensure that a packet is never sent too early.
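The rounding issue mentioned above can be shown in isolation. This sketch assumes HZ = 100 (10 ms ticks, as on many older kernels); the function name is invented for the example.

```c
/* Converting a sub-jiffy delay to timer ticks truncates to zero,
 * so a minimum of one jiffy is enforced to avoid sending a packet
 * too early. HZ and the function name are assumptions. */
#define HZ 100                      /* assumed timer frequency */

static long delay_to_jiffies(double delay_sec)
{
    long j = (long)(delay_sec * HZ);    /* truncation can yield 0 */
    if (j == 0)
        j = 1;                          /* wait at least one jiffy */
    return j;
}
```

Rounding up to a full jiffy means a packet may be sent slightly later than the token bucket would strictly require, but never earlier; for shaping, late is acceptable while early would violate the configured rate.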

tbf_watchdog()

net/sched/sch_tbf.c


The tbf_watchdog() function is invoked when the timer for a packet expires. Within this function, only netif_schedule() is invoked, which eventually causes dev->qdisc->dequeue() to be invoked once computing time has been allocated, as shown in Figure 18-3. In the simplest case, without multiple queues in the traffic-control tree, this is tbf_dequeue().


       


    Linux Network Architecture
    ISBN: 131777203
    Year: 2004
    Pages: 187