5.1 The net_device Interface

   


In addition to character and block devices, network devices represent the third category of adapters in the Linux kernel [RuCo01]. This section describes the concept of network devices from the perspective of higher-layer protocols and their data structures and management.

Network adapters differ significantly from the character and block devices introduced in Section 2.5. One of their main characteristics is that they have no representation in the device file system /dev/, which means that they cannot be addressed by simple read-write operations. For example, there are no such network devices as /dev/eth0 or /dev/atm1. Simple read-write access would not work anyway, because network devices operate on a packet basis; a behavior comparable to character-oriented devices can be achieved only by use of complex protocols (e.g., TCP). Network devices are configured separately by the ifconfig tool on the application level. A more recent alternative is the ip tool, which can be used for extensive configuration of most network functions.

One of the reasons why network devices are so special is that the actions of a network adapter cannot be bound to a unique process; instead, they run in the kernel, independently of user processes [RuCo01]. For example, a hard disk is requested by the kernel to pass a block. In the case of network adapters, the action is triggered by the adapter itself, and the adapter has to explicitly request the kernel to accept the packet.

5.1.1 The net_device Structure

struct net_device

include/linux/netdevice.h


    struct net_device {
        char                name[IFNAMSIZ];
        unsigned long       rmem_end, rmem_start, mem_end, mem_start, base_addr;
        unsigned int        irq;
        unsigned char       if_port, dma;
        unsigned long       state;
        struct net_device   *next, *next_sched;
        int                 ifindex, iflink;
        unsigned long       trans_start, last_rx;
        unsigned short      flags, gflags, mtu, type, hard_header_len;
        void                *priv;
        struct net_device   *master;
        unsigned char       broadcast[MAX_ADDR_LEN], pad;
        unsigned char       dev_addr[MAX_ADDR_LEN], addr_len;
        struct dev_mc_list  *mc_list;
        int                 mc_count, promiscuity, allmulti;
        int                 watchdog_timeo;
        struct timer_list   watchdog_timer;
        void                *atalk_ptr, *ip_ptr, *dn_ptr, *ip6_ptr, *ec_ptr;
        struct Qdisc        *qdisc, *qdisc_sleeping, *qdisc_list, *qdisc_ingress;
        unsigned long       tx_queue_len;
        spinlock_t          xmit_lock;
        int                 xmit_lock_owner;
        spinlock_t          queue_lock;
        atomic_t            refcnt;
        int                 features;

        int                 (*init)(struct net_device *dev);
        void                (*uninit)(struct net_device *dev);
        void                (*destructor)(struct net_device *dev);
        int                 (*open)(struct net_device *dev);
        int                 (*stop)(struct net_device *dev);
        int                 (*hard_start_xmit)(struct sk_buff *skb,
                                               struct net_device *dev);
        int                 (*hard_header)(struct sk_buff *skb,
                                           struct net_device *dev,
                                           unsigned short type, void *daddr,
                                           void *saddr, unsigned len);
        int                 (*rebuild_header)(struct sk_buff *skb);
        void                (*set_multicast_list)(struct net_device *dev);
        int                 (*set_mac_address)(struct net_device *dev, void *addr);
        int                 (*do_ioctl)(struct net_device *dev, struct ifreq *ifr,
                                        int cmd);
        int                 (*set_config)(struct net_device *dev, struct ifmap *map);
        int                 (*hard_header_cache)(struct neighbour *neigh,
                                                 struct hh_cache *hh);
        void                (*header_cache_update)(struct hh_cache *hh,
                                                   struct net_device *dev,
                                                   unsigned char *haddr);
        int                 (*change_mtu)(struct net_device *dev, int new_mtu);
        void                (*tx_timeout)(struct net_device *dev);
        int                 (*hard_header_parse)(struct sk_buff *skb,
                                                 unsigned char *haddr);
        int                 (*neigh_setup)(struct net_device *dev,
                                           struct neigh_parms *);
        struct net_device_stats* (*get_stats)(struct net_device *dev);
        struct iw_statistics*    (*get_wireless_stats)(struct net_device *dev);

        struct module       *owner;
        struct net_bridge_port *br_port;
    };

The net_device structure forms the basis of each network device in the Linux kernel. It contains not only information about the network adapter hardware (interrupt, ports, driver functions, etc.), but also the configuration data of the network device with regard to the higher network protocols (IP address, subnet mask, etc.).

As was mentioned at the beginning of this chapter, the net_device structure represents a general interface between higher protocol instances and the hardware used. It allows you to abstract from the network components used. For an efficient implementation of this abstraction, we once again use the concept of function pointers. For this reason, the net_device structure contains a number of function pointers, which are called by higher protocols by using their global names, and then the hardware-specific methods of the driver are called from each network device.

For example, for a network adapter of type 3Com/3c509, a call to hard_start_xmit() actually invokes the driver function el3_start_xmit().
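The function-pointer mechanism can be illustrated with a minimal user-space sketch. The structures below are heavily simplified stand-ins, not the kernel's definitions, and every name carrying the suffix _sketch (as well as the tx_packets counter and send_via_device()) is a hypothetical choice made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; the real structures have many more fields. */
struct sk_buff_sketch {
    size_t len;
};

struct net_device_sketch {
    char name[16];
    int  tx_packets;   /* hypothetical counter, for this demo only */
    int (*hard_start_xmit)(struct sk_buff_sketch *skb,
                           struct net_device_sketch *dev);
};

/* Driver-specific transmit routine, named in the style of the 3c509 driver. */
static int el3_start_xmit_sketch(struct sk_buff_sketch *skb,
                                 struct net_device_sketch *dev)
{
    (void)skb;           /* a real driver would copy the data to the card */
    dev->tx_packets++;
    return 0;            /* 0 = packet accepted */
}

/* What a driver's initialization would do: plug in its own method. */
static void el3_setup_sketch(struct net_device_sketch *dev)
{
    memset(dev, 0, sizeof(*dev));
    strncpy(dev->name, "eth0", sizeof(dev->name) - 1);
    dev->hard_start_xmit = el3_start_xmit_sketch;
}

/* Higher-layer code knows only the generic method name. */
static int send_via_device(struct net_device_sketch *dev,
                           struct sk_buff_sketch *skb)
{
    assert(dev->hard_start_xmit != NULL);
    return dev->hard_start_xmit(skb, dev);
}
```

The caller in send_via_device() never mentions the driver routine by name, so a different adapter could be supported by storing a different function in the same pointer.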

In general, the parameters of the net_device structure can be divided into different areas, as described below.

General Fields of a Network Device

The following parameters of the net_device structure (see previous subsection) are used to manage network devices. They have no significance with regard to special layers or protocol instances.

  • name is the name of the network device. In general, device types are numbered from 0 to n (e.g., eth0 to eth4). Some network devices, such as the loopback device (lo), occur only once, which means that they have fixed names.

    When registering a network device, you can suggest a name, which should be unique. However, you can also let the system assign the ethn name automatically. (See init_etherdev.) The naming convention for network devices will be described in detail in Section 5.2.3.

  • next is used to concatenate several net_device structures. We will see in Section 5.2 that all network devices are managed in a singly linked linear list that starts with the pointer dev_base.

  • owner is a pointer to the module structure of the module that created the net_device structure of this network device.

  • ifindex is a second identifier for a network device, in addition to the name. When a new network device is created, dev_new_index() assigns a new unused index to this device. This index allows you to find a network device in the list of all devices more quickly than a search by name would.

  • iflink specifies the index of the network device used to send a packet. This is normally the index ifindex, but, for tunneling network devices, such as ipip, iflink includes the index of the network device that is eventually used to send the enveloped packet.

  • state: The field dev->state contains status information about the network device and the network adapter. It was added to the kernel for the first time in version 2.3.43 and replaces the previous fields start (network adapter is open), interrupt (driver handles an adapter interrupt), and tbusy (all packet buffers are busy). These functions are now replaced by the following flags in the field state:

    • LINK_STATE_START shows whether the network adapter was opened with dev->open() (i.e., whether it was activated and can be used). However, a set LINK_STATE_START flag does not automatically mean that packets can be sent. In fact, all buffers on the adapter could be busy. (See next flag.) The flag LINK_STATE_START should be treated as read-only, because it should be modified only by the methods used to manage network devices. The method netif_running(dev) is available to test this flag.

    • LINK_STATE_XOFF shows whether the network adapter can accept socket buffers for transmission or whether its transmit buffers (which are normally organized as ring buffers) are already busy. The method netif_queue_stopped(dev) can be used to test for this state. Again, only read access to this flag should be allowed.

      LINK_STATE_XOFF replaces the previous field dev->tbusy. Older drivers accessed the tbusy flag in three different situations. These accesses were replaced by the following functions, which make the code much easier to read:

    • Stopping a transmission: When the packet buffers of a network adapter are busy, dev->tbusy = 1 was previously used to stop sending packets to the adapter. Now, there is the (inline) function netif_stop_queue(dev), which sets the LINK_STATE_XOFF flag in dev->state. This means that no packets are removed from the queue and passed to this adapter. Normally, netif_stop_queue() is called by the driver of an adapter, and then the driver is responsible for restarting the transmission. (See Section 5.3.)

    • Resuming a transmission: Once a network adapter has sent a packet from the (ring) packet buffer, it can resume accepting packets from the kernel. The method netif_start_queue(dev), which deletes the LINK_STATE_XOFF flag, is used for this purpose. In general, netif_start_queue(dev) is used by the driver methods. (See Section 5.3.) This corresponds to dev->tbusy = 0 in older kernel versions.

    • Starting a transmission: The method netif_start_queue(dev) is used to resume passing socket buffers to the network adapter.

    • In addition, the method netif_wake_queue(dev) is used to resume passing packets and, at the same time, to trigger the NET_TX software interrupt, which handles the passing of packets to the network adapter.

    • The field interrupt has no counterpart in the new kernel versions. It was previously used to prevent concurrent handling of interrupt methods. The new and SMP-improved kernels have special methods to control parallel processes. (See Section 2.3.) A driver should use these methods and manage their lock variables in its private data structures as needed.

  • trans_start stores the time (in jiffies) when the transmission of a packet started. If, after some time, the driver still hasn't received an acknowledgment that the packet was sent (ack interrupt), then it can take appropriate action. For this purpose, kernel versions 2.4 and higher use a timer called watchdog_timer.

  • last_rx should include the time (in jiffies) when the last packet arrived.

  • priv is a pointer to the private data of a network device or to the private data of its driver. Private data contains those variables and structures that are required to manage a network adapter. They are not stored in the net_device structure, but they are normally specific to an adapter.

  • qdisc refers to a structure of the type Qdisc, which mirrors the serving strategy of the current network device. Chapter 18 will discuss this issue in detail.

  • refcnt stores the number of references to this network device.

  • xmit_lock, xmit_lock_owner, and queue_lock are used to protect against parallel handling of a transmit process or parallel access to the transmit queue. For example, xmit_lock_owner includes the number of the processor that is currently in the transmit function hard_start_xmit(). When no processor is currently transmitting, xmit_lock_owner takes the value -1.
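The queue-control helpers described for the state field can be modeled in user space as plain bit operations. This is only a sketch under stated assumptions: the bit positions are illustrative, the sketch_ prefix marks functions that merely imitate the behavior described above, and tx_softirq_raised is a hypothetical stand-in for triggering the NET_TX software interrupt.

```c
#include <assert.h>

/* Illustrative bit positions; the kernel defines its own values. */
enum { LINK_STATE_XOFF = 0, LINK_STATE_START = 1 };

struct net_device_sketch {
    unsigned long state;
    int tx_softirq_raised;   /* hypothetical stand-in for NET_TX */
};

/* Transmit buffers are busy: stop passing packets to the adapter. */
static void sketch_netif_stop_queue(struct net_device_sketch *dev)
{
    assert(dev);
    dev->state |= 1UL << LINK_STATE_XOFF;
}

/* The adapter can accept packets again: clear the XOFF flag. */
static void sketch_netif_start_queue(struct net_device_sketch *dev)
{
    assert(dev);
    dev->state &= ~(1UL << LINK_STATE_XOFF);
}

static int sketch_netif_queue_stopped(const struct net_device_sketch *dev)
{
    assert(dev);
    return (int)((dev->state >> LINK_STATE_XOFF) & 1UL);
}

static int sketch_netif_running(const struct net_device_sketch *dev)
{
    assert(dev);
    return (int)((dev->state >> LINK_STATE_START) & 1UL);
}

/* Like start_queue, but also schedules the transmit softirq. */
static void sketch_netif_wake_queue(struct net_device_sketch *dev)
{
    if (sketch_netif_queue_stopped(dev)) {
        sketch_netif_start_queue(dev);
        dev->tx_softirq_raised = 1;
    }
}
```

Keeping all accesses behind small helpers is the design point: drivers never touch the flag word directly, which is what makes the read-only convention for the flags enforceable.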

Hardware-Specific Fields
  • rmem_end, rmem_start, mem_end, mem_start: These fields specify the beginning and end of the common memory space that the network adapter and the kernel share. The range mem_start to mem_end designates the buffers for packets to be sent, and the range rmem_start to rmem_end designates the location for received packets. The size of the buffers indicates the amount of storage available on the card. When using ifconfig to initialize a network adapter, you can specify the addresses of memory locations.

  • base_addr: The I/O base address is also set in the driver's probing routine during a search for a device. ifconfig can be used to display and set the value. In addition, the I/O base address can be specified when loading most of the modules and as a kernel boot parameter.

  • irq: The number of the interrupt of a network adapter is also set during the so-called probing phase of the driver or by explicitly specifying it when loading the module or starting the kernel. In addition, ifconfig can be used to modify the interrupt number during operation.

  • dma contains the number of the DMA (Direct Memory Access) channel, if the device supports the DMA transfer mode.

  • if_port stores the media type of the network adapter currently used. For Ethernet, we distinguish between BNC, Twisted Pair (TP), and AUI. There are no unique constants; instead, each driver can use its own values.

Data on the Physical Layer

The values of the following fields are set by the ether_setup() function for Ethernet cards. They are generally identical for all Ethernet-based cards, except for the flags field, which has to be set to match the card's capability.

There are similar functions to set standard values for token-ring and FDDI adapters (tr_setup(), fddi_setup()). These fields have to be set manually for other network types.

  • hard_header_len specifies the length of the layer-2 packet header. This value is 14 for Ethernet adapters. This does not correspond to the length of the actual packet header on the physical medium, but only to the part passed to the network adapter. In general, the network adapter adds additional fields (e.g., the preamble and checksum for Ethernet).

  • mtu is the maximum transfer unit, which specifies the maximum length of the payload of a layer-2 frame. Layer-3 protocols have to consider this value; they must not pass more octets to the network device. Ethernet has an MTU of 1500 bytes.

  • tx_queue_len specifies the maximum length of the output queue of the network device. ether_setup() sets this value to 100. tx_queue_len should not be confused with the buffers of the network adapter. A network adapter normally has an additional ring buffer for 16 or 32 packets.

  • type specifies the hardware type of the network adapter. The values are specified in RFC 1700 for the ARP protocol, which has to state the hardware type for address-resolution purposes. Linux defines additional constants not defined in RFC 1700. (See Figure 5-2.)

    Figure 5-2. Hardware types defined in RFC 1700 and Linux-specific constants.
     ARPHRD_NETROM      0   /* NET/ROM pseudo            */
     ARPHRD_ETHER       1   /* Ethernet 10Mbps           */
     ARPHRD_EETHER      2   /* Experimental Ethernet     */
     ARPHRD_AX25        3   /* AX.25 Level 2             */
     ARPHRD_PRONET      4   /* PROnet token ring         */
     ARPHRD_CHAOS       5   /* Chaosnet                  */
     ARPHRD_IEEE802     6   /* IEEE 802.2 Ethernet/TR/TB */
     ARPHRD_ARCNET      7   /* ARCnet                    */
     ARPHRD_APPLETLK    8   /* APPLEtalk                 */
     ARPHRD_DLCI       15   /* Frame Relay DLCI          */
     ARPHRD_ATM        19   /* ATM                       */

     /* Dummy types for non-ARP hardware */
     ARPHRD_SLIP      256
     ARPHRD_CSLIP6    259
     ARPHRD_PPP       512
     ARPHRD_LOOPBACK  772   /* Loopback device           */
     ARPHRD_IRDA      783   /* Linux-IrDA                */

  • addr_len, dev_addr[MAX_ADDR_LEN], broadcast[MAX_ADDR_LEN]: These fields contain the data of the layer-2 address. addr_len specifies the length of the layer-2 address, which is stored in the dev_addr field. The third field contains the broadcast address, which can be used to reach all computers in the local network.

  • mc_list points to a linear list (of the type dev_mc_list) with multicast layer-2 addresses. When the network adapter receives a packet with a destination address included in this list, then the network adapter has to pass this packet to the upper layers. The driver method set_multicast_list is used to pass the addresses of this list to the network adapter. The hardware filter of this network adapter (if present) is responsible for passing to the kernel only those packets of interest to this computer.

  • mc_count contains the number of addresses in mc_list.

  • watchdog_timeo and watchdog_timer are used to detect problems an adapter may incur when sending packets. For this reason, the watchdog_timer is initialized when a network device starts, and its handler is always called after watchdog_timeo time units (jiffies). The handling routine dev_watchdog() checks whether or not watchdog_timeo time units have passed since the last transmission of a packet (stored in trans_start). If this is the case, then there were problems in the transmission of the last packet, and the network adapter has to be checked. To check the network adapter, the driver function tx_timeout() is called. If not much time has passed since the last start of a transmission, then nothing is done, except that the watchdog timer is restarted.
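The watchdog check described in the last item can be sketched as follows. This is a user-space model, not the kernel routine: the current tick count is passed in as a parameter instead of being read from jiffies, and the names sketch_dev_watchdog(), sketch_tx_timeout(), and the timeouts counter are assumptions made for this illustration.

```c
#include <assert.h>

struct net_device_sketch {
    unsigned long trans_start;     /* tick count at the last transmit start */
    unsigned long watchdog_timeo;  /* allowed delay, in ticks */
    int timeouts;                  /* hypothetical counter, for this demo only */
    void (*tx_timeout)(struct net_device_sketch *dev);
};

/* A driver's recovery routine; here it only records that it was called. */
static void sketch_tx_timeout(struct net_device_sketch *dev)
{
    dev->timeouts++;
}

/*
 * Timer handler: returns 1 if the driver's tx_timeout() was invoked,
 * 0 if the last transmission is still within its time budget.
 * In both cases the caller would re-arm the timer afterwards.
 */
static int sketch_dev_watchdog(struct net_device_sketch *dev,
                               unsigned long now)
{
    assert(dev && dev->tx_timeout);
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if (now - dev->trans_start > dev->watchdog_timeo) {
        dev->tx_timeout(dev);
        return 1;
    }
    return 0;
}
```

With trans_start = 1000 and watchdog_timeo = 500, a check at tick 1200 does nothing, while a check at tick 1501 calls the driver's timeout method.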

Data on the Network Layer
  • ip_ptr, ip6_ptr, atalk_ptr, dn_ptr, and ec_ptr point to information of layer-3 protocols that use this network device. If the network device was configured for the Internet protocol, among others, then ip_ptr points to a structure of the type in_device, which manages information and configuration parameters of the relevant IP instance. For example, the in_device structure manages a list with IP addresses of the network device, a list with active IP multicast groups, and the parameters for the ARP protocol.

  • family designates the address family of the network device. In the case of the Internet protocol (IP), this field takes the constant AF_INET.

  • pa_alen specifies the length of the addresses of the protocol used. IP addresses of the class AF_INET have the length four bytes.

  • pa_addr, pa_braddr, and pa_mask describe the addressing of a network device on the network layer. pa_addr contains the address of the computer or network device, pa_braddr specifies the broadcast address, and pa_mask includes the network mask. All three values are set by ifconfig when a network device is activated.

  • pa_dstaddr specifies the address of the other partner in a point-to-point connection (e.g., PPP or SLIP).

  • flags includes different switches. Some of them describe properties of the network device (IFF_ARP, IFF_MULTICAST,...); others output the current state (IFF_UP). Table 5-1 lists the meaning of these switches, which can be set by use of the ifconfig command.

    Table 5-1. IFF flags of a network device.

    Flag             Meaning
    ---------------  -------
    IFF_UP           The network device is activated and can send and receive packets.
    IFF_BROADCAST    The device is broadcast-enabled, and the broadcast address pa_braddr is valid.
    IFF_DEBUG        This flag switches the debug mode on (currently not used by any driver).
    IFF_LOOPBACK     This flag shows that this is a loopback network device.
    IFF_POINTOPOINT  This is a point-to-point connection. If this switch is set, then pa_dstaddr should contain the partner's address.
    IFF_NOARP        This device does not support the Address Resolution Protocol (ARP) (e.g., in point-to-point connections).
    IFF_PROMISC      This flag switches the promiscuous mode on. This means that all packets currently received in the network adapter are forwarded to the upper layers, including those not intended for this computer. This mode is of interest for tcpdump only.
    IFF_MULTICAST    This flag activates the receipt of multicast packets. ether_setup() activates this switch. A card that does not support multicast should delete this flag.
    IFF_ALLMULTI     All multicast packets should be received. This is required when the computer is to work as multicast router. IFF_MULTICAST has to be set in addition.
    IFF_PORTSEL      Setting of the output port is supported by the hardware.
    IFF_AUTOMEDIA    Automatic selection of the output medium (autosensing) is enabled.
    IFF_DYNAMIC      Dynamic change of the network device's address is enabled (e.g., for dialup connections).
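Because flags is a bit field, several switches from Table 5-1 can be set at once and are tested with bitwise operations. The following user-space sketch shows the idea. The SK_IFF_ constants are local stand-ins defined only for this example (real code would take the IFF_ constants from the system headers), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for a few IFF_ switches; values are for this sketch only. */
enum {
    SK_IFF_UP        = 0x0001,
    SK_IFF_BROADCAST = 0x0002,
    SK_IFF_LOOPBACK  = 0x0008,
    SK_IFF_PROMISC   = 0x0100,
    SK_IFF_ALLMULTI  = 0x0200,
    SK_IFF_MULTICAST = 0x1000
};

/* Writes the names of all set switches into buf, separated by spaces. */
static void sketch_flags_to_string(unsigned int flags, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } names[] = {
        { SK_IFF_UP, "UP" },             { SK_IFF_BROADCAST, "BROADCAST" },
        { SK_IFF_LOOPBACK, "LOOPBACK" }, { SK_IFF_PROMISC, "PROMISC" },
        { SK_IFF_ALLMULTI, "ALLMULTI" }, { SK_IFF_MULTICAST, "MULTICAST" }
    };
    size_t i, used = 0;
    int n;

    assert(buf && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(flags & names[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                      /* output truncated: stop here */
        used += (size_t)n;
    }
}

/* A multicast router needs ALLMULTI in addition to MULTICAST. */
static int sketch_can_route_multicast(unsigned int flags)
{
    unsigned int need = SK_IFF_MULTICAST | SK_IFF_ALLMULTI;
    return (flags & need) == need;
}
```

The second helper mirrors the rule from the table that IFF_ALLMULTI is meaningful only together with IFF_MULTICAST.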


Device-Driver Methods

As mentioned earlier, one of the tasks of the network device interface is to abstract a network device from the underlying hardware. The driver-specific functions have to be mapped to a uniform interface so that higher protocols can access them. This is exactly what the function pointers of the net_device structure (see above), described in this section, accomplish. These pointers let different instances of the net_device structure use individual functions, which are all addressed through a common name.

Some of these functions depend on the hardware of the network adapter and have to be set in the initialization function of the network driver. The other functions are specific to the MAC protocol used by the network adapter and can be initialized by special methods (e.g., ether_setup()). A function pointer not required can be initialized to NULL.

We will next discuss the tasks of the methods of a network device. More specifically, we will describe their basic tasks from the view of the higher protocols. These methods are implemented by the network driver used. The exact implementation in general will be discussed in Section 5.3, using the skeleton network driver as an example.

  • init() is used to search and initialize network devices. This method is responsible for finding and initializing a network adapter of the present type. Primarily, a net_device structure has to be created and filled with the driver-specific data of the network device or network driver. Subsequently, the network device is registered by register_netdevice(). (See Section 5.3.1.)

  • uninit() is called when a network device is unregistered (unregister_netdevice()). This method can be used to execute driver-specific functions, which may be necessary when a network device is removed. The uninit() has been introduced to the net_device structure since version 2.4 and is currently not used by any driver.

  • destructor() is also new in the net_device structure. This function is called when the last reference to a network device was removed (dev->refcnt) (i.e., when no protocol instances or other components in the Linux kernel point to the net_device structure). This means that you can use the destructor() function to do cleanup work (e.g., free memory or similar things). The destructor() function is currently not used by any driver.

  • open() opens (activates) a named network device. During the activation, the required system resources are requested and assigned. Note that this method can open only network devices that were previously registered. Normally, dev->open() is used in the dev_open() method which, in turn, is called by the ifconfig command. Upon successful execution of open(), the network device can be used.

  • stop() terminates the activity of a network adapter and frees the system resources it has used. The network device is then no longer active, but it remains in the list of registered network devices (dev_base).

  • hard_start_xmit() sends a packet (in the form of a socket buffer) over the network device. If successful (i.e., the packet was delivered to the adapter), then hard_start_xmit() returns with the return value 0; otherwise, 1.

  • get_stats() gets statistics and information about the network device and its activities. This information is returned in the form of a net_device_stats structure. The elements of this structure will be introduced in the course of this chapter.

  • get_wireless_stats() returns additional information for wireless network adapters. This information is forwarded in a structure of the type iw_statistics. The tool iwconfig can be used to display this specific information.

  • set_multicast_list() passes the list with multicast MAC addresses to the network adapter, so that the adapter can receive packets with these addresses. This method is called either when the multicast receipt for the network device is activated (IFF_MULTICAST flag) or when the list of group MAC addresses to be received has changed. (See also Section 17.4.1.)

  • tx_timeout() deals with problems during the transmission of a packet across the network adapter (not when the socket buffer is passed to the network adapter). If no acknowledgment for the packet is received within dev->watchdog_timeo, then the kernel calls the method tx_timeout() to solve the problem.

  • do_ioctl(): This method is generally not used by higher protocols, because it has no generic functions. It is normally used to pass adapter-specific ioctl() commands to the network driver.

  • set_config() is used to change the configuration of a network adapter at runtime. The method lets you change system parameters, such as the interrupt or the memory location of the network adapter.
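The return-value convention of hard_start_xmit() (0 for success, 1 otherwise) can be sketched with a toy adapter that has a small transmit ring. Everything here is an assumption made for illustration: the ring size of 4, the structure sketch_adapter, and the function names are hypothetical, and queue_stopped merely stands in for the LINK_STATE_XOFF flag.

```c
#include <assert.h>

#define TX_RING_SIZE 4   /* hypothetical; chosen small for the demo */

struct sketch_adapter {
    int in_flight;       /* packets handed to the hardware, not yet sent */
    int queue_stopped;   /* stand-in for the LINK_STATE_XOFF flag */
};

/* Returns 0 if the packet was accepted, 1 if the adapter is busy. */
static int sketch_hard_start_xmit(struct sketch_adapter *ad)
{
    assert(ad);
    if (ad->in_flight == TX_RING_SIZE)
        return 1;                    /* caller keeps the packet queued */
    ad->in_flight++;
    if (ad->in_flight == TX_RING_SIZE)
        ad->queue_stopped = 1;       /* ring full: pause further packets */
    return 0;
}

/* Called when the hardware reports a completed transmission. */
static void sketch_tx_done(struct sketch_adapter *ad)
{
    assert(ad && ad->in_flight > 0);
    ad->in_flight--;
    ad->queue_stopped = 0;           /* a slot is free again */
}
```

Stopping the queue as soon as the last slot is taken means the busy return value should be rare in practice; it remains as a safety net for callers that ignore the stopped state.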

The methods for a network device described above depend on the network adapter used, which means that they have to be provided by the driver, if their functionality is required. The methods described below depend less on the hardware of a network adapter, but rather on the layer-2 protocol used. For this reason, they don't necessarily have to be implemented by driver-specific methods, but can run on top of existing methods (e.g., those for Ethernet and FDDI).

  • hard_header() creates a layer-2 packet header from layer-2 addresses for source and destination.

  • rebuild_header() is responsible for rebuilding the layer-2 packet header before a packet is transmitted. This function was the entry point to the ARP protocol in earlier versions of the Linux kernel. The conversion to the neighbour cache (see Section 15.3.1) should create a stored layer-2 packet header, so that rebuild_header() is called only when the hard header cache contains wrong information.

  • hard_header_cache() fills a layer-2 packet header in the hard header cache with passed data. This means that subsequent transmission processes can access a prepared layer-2 packet header.

  • header_cache_update() changes the layer-2 destination address in a stored layer-2 packet header in the hard header cache.

  • hard_header_parse() reads the layer-2 sender address from the layer-2 packet header in the packet data space of a socket buffer and copies it to the passed address, haddr.

  • set_mac_address() can be used to change the layer-2 address of a network adapter, if it supports alternative MAC addresses.

  • change_mtu() changes the MTU (Maximum Transfer Unit) of a network device and implements all necessary changes.
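For Ethernet, the header-related methods above operate on a 14-byte header consisting of the destination address, the source address, and a type field. The following user-space sketch models building such a header, reading the sender address back, and updating the destination in a stored header. It works on plain byte arrays rather than socket buffers, and all names prefixed with sketch_ or SK_ are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_ALEN 6    /* length of an Ethernet address in bytes */
#define SK_ETH_HLEN 14   /* destination + source + type field */

/* Writes an Ethernet header into hdr and returns the header length. */
static int sketch_eth_header(uint8_t *hdr, uint16_t type,
                             const uint8_t *daddr, const uint8_t *saddr)
{
    assert(hdr && daddr && saddr);
    memcpy(hdr, daddr, SK_ETH_ALEN);
    memcpy(hdr + SK_ETH_ALEN, saddr, SK_ETH_ALEN);
    hdr[12] = (uint8_t)(type >> 8);     /* type field, network byte order */
    hdr[13] = (uint8_t)(type & 0xff);
    return SK_ETH_HLEN;
}

/* Copies the sender address to haddr and returns the address length. */
static int sketch_eth_header_parse(const uint8_t *hdr, uint8_t *haddr)
{
    assert(hdr && haddr);
    memcpy(haddr, hdr + SK_ETH_ALEN, SK_ETH_ALEN);
    return SK_ETH_ALEN;
}

/* Replaces only the destination address of a stored header. */
static void sketch_header_cache_update(uint8_t *hdr, const uint8_t *daddr)
{
    assert(hdr && daddr);
    memcpy(hdr, daddr, SK_ETH_ALEN);
}
```

Because the destination address occupies the first six bytes, updating a cached header touches only that prefix and leaves the source address and type field unchanged.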


       


    Linux Network Architecture
    ISBN: 131777203
    EAN: N/A
    Year: 2004
    Pages: 187
