5.2. DevicesTwo kinds of device files exist: block device files and character device files. Block devices transfer data in chunks, and character devices (as the name implies) transfer data one character at a time. A third device type, the network device, is a special case that exhibits attributes of both block and character devices. However, network devices are not represented by files. The old method of assigned numbers for devices where the major number usually referred to a device driver or controller, and the minor number was a particular device within that controller, is giving way to a new dynamic method called devfs. The history behind this change is that the major and minor numbers are both 8-bit values; this allows for little more than 200 statically allocated major devices for the entire planate. (Block and character devices each have their own list of 256 entries.) You can find the official listing of the allocated major and minor device numbers in /Documentation/devices.txt. The Linux Device Filesystem (devfs) has been in the kernel since version 2.3.46. devfs is not included by default in the 2.6.7 kernel build, but it can be enabled by setting CONFIG_DEVFS_FS=Y in the configuration file. With devfs, a module can register a device by name rather than a major/minor number pair. For compatibility, devfs allows the use of old major/minor numbers or generates a unique 16-bit device number on any given system. 5.2.1. Block Device OverviewAs previously mentioned, the Linux operating system sees all devices as files. Any given element in a block device can be randomly referenced. A good example of a block device is the disk drive. The filesystem name for the first IDE disk is /dev/hda. The associated major number of /dev/hda is 3, and the minor number is 0. The disk drive itself usually has a controller and is electro-mechanical by nature (that is, it has moving parts). The "General System File" section in Chapter 6, "Filesystems," discusses the basic construction of a hard disk. 5.2.1.1. Generic Block Device LayerThe device driver registers itself at driver initialization time. This adds the driver to the kernel's driver table, mapping the device number to the block_device_operations structure. The block_device_operations structure contains the functions for starting and stopping a given block device in the system: ------------------------------------------------------------------------- include/linux/fs.h 760 struct block_device_operations { 761 int (*open) (struct inode *, struct file *); 762 int (*release) (struct inode *, struct file *); 763 int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long); 764 int (*media_changed) (struct gendisk *); 765 int (*revalidate_disk) (struct gendisk *); 766 struct module *owner; 767 }; ------------------------------------------------------------------------- The interfaces to the block device are similar to other devices. The functions open() (on line 761) and release() (on line 762) are synchronous (that is, they run to completion when called). The most important functions, read() and write(), are implemented differently with block devices because of their mechanical nature. Consider accessing a block of data from a disk drive. The amount of time it takes to position the head on the proper track and for the disk to rotate to the desired block can take a long time, from the processor's point of view. This latency is the driving force for the implementation of the system request queue. When the filesystem requests a block (or more) of data, and it is not in the local page cache, it places the request on a request queue and passes this queue on to the generic block device layer. The generic block device layer then determines the most efficient way to mechanically retrieve (or store) the information, and passes this on to the hard disk driver. Most importantly, at initialization time, the block device driver registers a request queue handler with the kernel (specifically with the block device manager) to facilitate the read/write operations for the block device. The generic block device layer acts as an interface between the filesystem and the register level interface of the device and allows for per-queue tuning of the read and write queues to make better use of the new and smarter devices available. This is accomplished through the tagged command queuing helper utilities. For example, if a device on a given queue supports command queuing, read and write operations can be optimized to exploit the underlying hardware by reordering requests. An example of per-queue tuning in this case would be the ability to set how many requests are allowed to be pending. See Figure 5.4 for an illustration of how the application layer, the filesystem layer, the generic block device layer, and the device driver interrelate. The file biodoc.txt under /Documentation/block> has more helpful information on this layer and information regarding changes from earlier kernels. Figure 5.4. Block Read/Write5.2.2. Request Queues and Scheduling I/OWhen a read or write request traverses the layers from VFS, through the filesystem drivers and page cache,[6] it eventually ends up entering the block device driver to perform the actual I/O on the device that holds the data requested.
As previously mentioned, the block device driver creates and initializes a request queue upon initialization. This initialization also determines the I/O scheduling algorithm to use when a read or write is attempted on the block device. The I/O scheduling algorithm is also known as the elevator algorithm. The default I/O scheduling algorithm is determined by the kernel at boot time with the default being the anticipatory I/O scheduler.[7] By setting the kernel parameter elevator to the following values, you can change the type of I/O scheduler:
As of this writing, a patch exists that makes the I/O schedulers fully modular. Using modprobe, the user can load the modules and switch between them on the fly.[8] With this patch, at least one scheduler must be compiled into the kernel to begin with.
Before we can describe how these I/O schedulers work, we need to touch on the basics of request queues. Block devices use request queues to order the many block I/O requests the devices are given. Certain block devices, such as a RAM disk, might have little need for ordering requests because the I/O requests to the device have little overhead. Other block devices, like hard drives, need to order requests because there is a great overhead in reading and writing. As previously mentioned, the head of the hard drive has to move from track to track, and each movement is agonizingly slow from the CPU's perspective. Request queues solve this problem by attempting to order block I/O read and write requests in a manner that optimizes throughput but does not indefinitely postpone requests. A common and useful analogy of I/O scheduling is to look at how elevators work.[9] If you were to order the stops an elevator took by the order of the requests, you would have the elevator moving inefficiently from floor to floor; it could go from the penthouse to the ground floor without ever stopping for anyone in between. By responding to requests that occur while the elevator travels in the same direction, it increases the elevator's efficiency and the riders' happiness. Similarly, I/O requests to a hard disk should be grouped together to avoid the high overhead of repeatedly moving the disk head back and forth. All the I/O schedulers mentioned (no-op, deadline, and anticipatory) implement this basic elevator functionality. The following sections look at these elevators in more detail.
5.2.2.1. No-Op I/O SchedulerThe no-op I/O scheduler[10] takes a request and scans through its queue to determine if it can be merged with an existing request. This occurs if the new request is close to an existing request. If the new request is for I/O blocks before an existing request, it is merged on the front of the existing request. If the new request is for I/O blocks after an existing request, it is merged on the back of the existing request. In normal I/O, we read the beginning of a file before the end, and thus, most requests are merged onto the back of existing requests.
If the no-op I/O scheduler finds that the new request cannot be merged into the existing request because it is not near enough, the scheduler looks for a place within the queue between existing requests. If the new request calls for I/O to sectors between existing requests it is inserted into the queue at the determined position. If there are no places the request can be inserted, it is placed on the tail of the request queue. 5.2.2.2. Deadline I/O SchedulerThe no-op I/O scheduler[11] suffers from a major problem; with enough close requests, new requests are never handled. Many new requests that are close to existing ones would be either merged or inserted between existing elements, and new requests would pile up at the tail of the request queue. The deadline scheduler attempts to solve this problem by assigning each request an expiration time and uses two additional queues to manage time efficiency as well as a queue similar to the no-op algorithm to model disk efficiency.
When an application makes a read request, it typically waits until that request is fulfilled before continuing. Write requests, on the other hand, will not normally cause an application to wait; the write can execute in the background while the application continues on to other tasks. The deadline I/O scheduler uses this information to favor read requests over write requests. A read queue and write queue are kept in addition to the queue sorted by a request's sector proximity. In the read and write queue, requests are ordered by time (FIFO). When a new request comes in, it is placed on the sorted queue as in the no-op scheduler. The request is also placed on either the read queue or write queue depending on its I/O request. When the deadline I/O scheduler handles a request, it first checks the head of the read queue to see if that request has expired. If that requests expiration time has been reached, it is immediately handled. Similarly, if no read request has expired, the scheduler checks the write queue to see if the request at its head has expired; if so, it is immediately handled. The standard queue is checked only when no reads or writes have expired and requests are handled in nearly the same way as the no-op algorithm. Read requests also expire faster than write requests: a second versus 5 seconds in the default case. This expiration difference and the preference of handling read requests over write requests can lead to write requests being starved by numerous read requests. As such, a parameter tells the deadline I/O scheduler the maximum number of times reads can starve a write; the default is 2, but because sequential requests can be treated as a single request, 32 sequential read requests could pass before a write request is considered starved.[12]
5.2.2.3. Anticipatory I/O SchedulingOne of the problems with the deadline I/O scheduling algorithm occurs during intensive write operations. Because of the emphasis on maximizing read efficiency, a write request can be preempted by a read, have the disk head seek to new location, and then return to the write request and have the disk head seek back to its original location. Anticipatory I/O scheduling[13] attempts to anticipate what the next operation is and aims to improve I/O throughput in doing so.
Structurally, the anticipatory I/O scheduler is similar to the deadline I/O scheduler. There exist a read and write queue each ordered by time (FIFO) and a default queue that is ordered by sector proximity. The main difference is that after a read request, the scheduler does not immediately proceed to handling other requests. It does nothing for 6 milliseconds in anticipation of an additional read. If another read request does occur to an adjacent area, it is immediately handled. After the anticipation period, the scheduler returns to its normal operation as described under the deadline I/O scheduler. This anticipation period helps minimize the I/O delay associated with moving the disk head from sector to sector across the block device. Like the deadline I/O scheduler, a number of parameters control the anticipatory I/O scheduling algorithm. The default time for reads to expire is second and the default time for writes to expire is second. Two parameters control when to check to switch between streams of reads and writes.[14] A stream of reads checks for expired writes after second and a stream of writes checks for expired reads after second.
The default I/O scheduler is the anticipatory I/O scheduler because it optimizes throughput for most applications and block devices. The deadline I/O scheduler is sometimes better for database applications or those that require high disk performance requirements. The no-op I/O scheduler is usually used in systems where I/O seek time is near negligible, such as embedded systems running from RAM. We now turn our attention from the various I/O schedulers in the Linux kernel to the request queue itself and the manner in which block devices initialize request queues. 5.2.2.4. Request QueueIn Linux 2.6, each block device has its own request queue that manages I/O requests to that device. A process can only update a device's request queue if it has obtained the lock of the request queue. Let's examine the request_queue structure: ------------------------------------------------------------------------- include/linux/blkdev.h 270 struct request_queue 271 { 272 /* 273 * Together with queue_head for cacheline sharing 274 */ 275 struct list_head queue_head; 276 struct request *last_merge; 277 elevator_t elevator; 278 279 /* 280 * the queue request freelist, one for reads and one for writes 281 */ 282 struct request_list rq; ------------------------------------------------------------------------ Line 275This line is a pointer to the head of the request queue. Line 276This is the last request placed into the request queue. Line 277The scheduling function (elevator) used to manage the request queue. This can be one of the standard I/O schedulers (noop, deadline, or anticipatory) or a new type of scheduler specifically designed for the block device. Line 282The request_list is a structure composed of two wait_queues: one for queuing reads to the block device and one for queuing writes. ------------------------------------------------------------------------- include/linux/blkdev.h 283 284 request_fn_proc *request_fn; 285 merge_request_fn *back_merge_fn; 286 merge_request_fn *front_merge_fn; 287 merge_requests_fn *merge_requests_fn; 288 make_request_fn *make_request_fn; 289 prep_rq_fn *prep_rq_fn; 290 unplug_fn *unplug_fn; 291 merge_bvec_fn *merge_bvec_fn; 292 activity_fn *activity_fn; 293 ------------------------------------------------------------------------- Lines 283293These scheduler- (or elevator-) specific functions can be defined to control how requests are managed for the block device. ------------------------------------------------------------------------- include/linux/blkdev.h 294 /* 295 * Auto-unplugging state 296 */ 297 struct timer_list unplug_timer; 298 int unplug_thresh; /* After this many requests */ 299 unsigned long unplug_delay; /* After this many jiffies*/ 300 struct work_struct unplug_work; 301 302 struct backing_dev_info backing_dev_info; 303 ------------------------------------------------------------------------- Lines 294303These functions are used to unplug the I/O scheduling function used on the block device. Plugging refers to the practice of waiting for more requests to fill the request queue with the expectation that more requests allow the scheduling algorithm to order and sort I/O requests that enhance the time it takes to perform the I/O requests. For example, a hard drive "plugs" a certain number of read requests with the expectation that it moves the disk head less when more reads exist. It's more likely that the reads can be arranged sequentially or even clustered together into a single large read. Unplugging refers to the method in which a device decides that it can wait no longer and must service the requests it has, regardless of possible future optimizations. See documentation/block/biodoc.txt for more information. ------------------------------------------------------------------------- include/linux/blkdev.h 304 /* 305 * The queue owner gets to use this for whatever they like. 306 * ll_rw_blk doesn't touch it. 307 */ 308 void *queuedata; 309 310 void *activity_data; 311 ------------------------------------------------------------------------- Lines 304311As the inline comments suggest, these lines request queue management that is specific to the device and/or device driver: ------------------------------------------------------------------------- include/linux/blkdev.h 312 /* 313 * queue needs bounce pages for pages above this limit 314 */ 315 unsigned long bounce_pfn; 316 int bounce_gfp; 317 ------------------------------------------------------------------------- Lines 312317Bouncing refers to the practice of the kernel copying high-memory buffer I/O requests to low-memory buffers. In Linux 2.6, the kernel allows the device itself to manage high-memory buffers if it wants. Bouncing now typically occurs only if the device cannot handle high-memory buffers. ------------------------------------------------------------------------- include/linux/blkdev.h 318 /* 319 * various queue flags, see QUEUE_* below 320 */ 321 unsigned long queue_flags; 322 ------------------------------------------------------------------------- Lines 318321The queue_flags variable stores one or more of the queue flags shown in Table 5.1 (see include/linux/blkdev.h, lines 368375).
------------------------------------------------------------------------- include/linux/blkdev.h 323 /* 324 * protects queue structures from reentrancy 325 */ 326 spinlock_t *queue_lock; 327 328 /* 329 * queue kobject 330 */ 331 struct kobject kobj; 332 333 /* 334 * queue settings 335 */ 336 unsigned long nr_requests; /* Max # of requests */ 337 unsigned int nr_congestion_on; 338 unsigned int nr_congestion_off; 339 340 unsigned short max_sectors; 341 unsigned short max_phys_segments; 342 unsigned short max_hw_segments; 343 unsigned short hardsect_size; 344 unsigned int max_segment_size; 345 346 unsigned long seg_boundary_mask; 347 unsigned int dma_alignment; 348 349 struct blk_queue_tag *queue_tags; 350 351 atomic_t refcnt; 352 353 unsigned int in_flight; 354 355 /* 356 * sg stuff 357 */ 358 unsigned int sg_timeout; 359 unsigned int sg_reserved_size; 360 }; ------------------------------------------------------------------------- Lines 323360These variables define manageable resources of the request queue, such as locks (line 326) and kernel objects (line 331). Specific request queue settings, such as the maximum number of requests (line 336) and the physical constraints of the block device (lines 340347) are also provided. SCSI attributes (lines 355359) can also be defined, if they're applicable to the block device. If you want to use tagged command queuing use the queue_tags structure (on line 349). The refcnt and in_flight fields (on lines 351 and 353) count the number of references to the queue (commonly used in locking) and the number of requests that are in process ("in flight"). Request queues used by block devices are initialized simply in the 2.6 Linux kernel by calling the following function in the devices' __init function. Within this function, we can see the anatomy of a request queue and its associated helper routines. In the 2.6 Linux kernel, each block device controls its own locking, which is contrary to some earlier versions of Linux, and passes a spinlock as the second argument. The first argument is a request function that the block device driver provides. ------------------------------------------------------------------------- drivers/block/ll_rw_blk.c 1397 request_queue_t *blk_init_queue(request_fn_proc *rfn, spinlock_t *lock) 1398 { 1399 request_queue_t *q; 1400 static int printed; 1401 1402 q = blk_alloc_queue(GFP_KERNEL); 1403 if (!q) 1404 return NULL; 1405 1406 if (blk_init_free_list(q)) 1407 goto out_init; 1408 1409 if (!printed) { 1410 printed = 1; 1411 printk("Using %s io scheduler\n", chosen_elevator->elevator_name); 1412 } 1413 1414 if (elevator_init(q, chosen_elevator)) 1415 goto out_elv; 1416 1417 q->request_fn = rfn; 1418 q->back_merge_fn = ll_back_merge_fn; 1419 q->front_merge_fn = ll_front_merge_fn; 1420 q->merge_requests_fn = ll_merge_requests_fn; 1421 q->prep_rq_fn = NULL; 1422 q->unplug_fn = generic_unplug_device; 1423 q->queue_flags = (1 << QUEUE_FLAG_CLUSTER); 1424 q->queue_lock = lock; 1425 1426 blk_queue_segment_boundary(q, 0xffffffff); 1427 1428 blk_queue_make_request(q, __make_request); 1429 blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE); 1430 1431 blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS); 1432 blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS); 1433 1434 return q; 1435 out_elv: 1436 blk_cleanup_queue(q); 1437 out_init: 1438 kmem_cache_free(requestq_cachep, q); 1439 return NULL; 1440 } ------------------------------------------------------------------------- Line 1402Allocate the queue from kernel memory and zero its contents. Line 1406Initialize the request list that contains a read queue and a write queue. Line 1414Associate the chosen elevator with this queue and initialize. Lines 14171424Associate the elevator-specific functions with this queue. Line 1426This function sets the boundary for segment merging and checks that it is at least a minimum size. Line 1428This function sets the function used to get requests off the queue by the driver. It allows an alternate function to be used to bypass the queue. Line 1429Initialize the upper-size limit on a combined segment. Line 1431Initialize the maximum segments the physical device can handle. Line 1432Initialize the maximum number of physical segments per request. The values for lines 14291432 are set in include/linux/blkdev.h. Line 1434Return the initialized queue. Lines 14351439Routines to clean up memory in the event of an error. We now have the request queue in place and initialized. Before we explore the generic device layer and the generic block driver, let's quickly trace the layers of software it takes to get to the manipulation of IO in the block device. (Refer to Figure 5.4.) At the application level, an application has initiated a file operation such as fread(). This request is taken by the virtual filesystem (VFS) layer (covered in Chapter 4), where the file's dentry structure is found, and through the inode structure, where the file's read() function is called. The VFS layer tries to find the requested page in its buffer cache, but if it is a miss, the filesystem handler is called to acquire the appropriate physical blocks. The inode is linked to the filesystem handler, which is associated with the correct filesystem. The filesystem handler calls on the request queue utilities, which are part of the generic block device layer to create a request with the correct physical blocks and device. The request is put on the request queue, which is maintained by the generic block device layer. 5.2.3. Example: "Generic" Block DriverWe now look at the generic block device layer. Referring to Figure 5.4, it resides above the physical device layer and just below the filesystem layer. The most important job of the generic block layer is to maintain request queues and their related helper routines. We first register our device with register_blkdev(major, dev_name, fops). This function takes in the requested major number, the name of this block device (this appears in the /dev directory), and a pointer to the file operations structure. If successful, it returns the desired major number. Next, we create the gendisk structure. The function alloc_disk(int minors) in include/linux/genhd.h takes in the number of partitions and returns a pointer to the gendisk structure. We now look at the gendisk structure: ------------------------------------------------------------------------- include/linux/genhd.h 081 struct gendisk { 082 int major; /* major number of driver */ 083 int first_minor; 084 int minors; 085 char disk_name[16]; /* name of major driver */ 086 struct hd_struct **part; /* [indexed by minor] */ 087 struct block_device_operations *fops; 088 struct request_queue *queue; 089 void *private_data; 090 sector_t capacity; 091 092 int flags; 093 char devfs_name[64]; /* devfs crap */ 094 int number; /* more of the same */ 095 struct device *driverfs_dev; 096 struct kobject kobj; 097 098 struct timer_rand_state *random; 099 int policy; 100 101 unsigned sync_io; /* RAID */ 102 unsigned long stamp, stamp_idle; 103 int in_flight; 104 #ifdef CONFIG_SMP 105 struct disk_stats *dkstats; 106 #else 107 struct disk_stats dkstats; 108 #endif 109 }; ------------------------------------------------------------------------- Line 82The major_num field is filled in from the result of register_blkdev(). Line 83A block device for a hard drive could handle several physical drives. Although it is driver dependent, the minor number usually labels each physical drive. The first_minor field is the first of the physical drives. Line 85The disk_name, such as hda or sdb, is the text name for an entire disk. (Partitions within a disk are named hda1, hda2, and so on.) These are logical disks within a physical disk device. Line 87The fops field is the block_device_operations initialized to the file operations structure. The file operations structure contains pointers to the helper functions in the low-level device driver. These functions are driver dependent in that they are not all implemented in every driver. Commonly implemented file operations are open, close, read, and write. Chapter 4, "Memory Management," discusses the file operations structure. Line 88The queue field points to the list of requested operations that the driver must perform. Initialization of the request queue is discussed shortly. Line 89The private_data field is for driver-dependent data. Line 90The capacity field is to be set with the drive size (in 512KB sectors). A call to set_capacity() should furnish this value. Line 92The flags field indicates device attributes. In case of a disk drive, it is the type of media, such as CD, removable, and so on. Now, we look at what is involved with initializing the request queue. With the queue already declared, we call blk_init_queue(request_fn_proc, spinlock_t). This function takes, as its first parameter, the transfer function to be called on behalf of the filesystem. The function blk_init_queue() allocates the queue with blk_alloc_queue() and then initializes the queue structure. The second parameter to blk_init_queue() is a lock to be associated with the queue for all operations. Finally, to make this block device visible to the kernel, the driver must call add_disk(): ------------------------------------------------------------------------- Drivers/block/genhd.c 193 void add_disk(struct gendisk *disk) 194 { 195 disk->flags |= GENHD_FL_UP; 196 blk_register_region(MKDEV(disk->major, disk->first_minor), 197 disk->minors, NULL, exact_match, exact_lock, disk); 198 register_disk(disk); 199 blk_register_queue(disk); 200 } ------------------------------------------------------------------------- Line 196This device is mapped into the kernel based on size and number of partitions. The call to blk_register_region() has the following six parameters:
Line 198register_disk checks for partitions and adds them to the filesystem. Line 199Register the request queue for this particular region. 5.2.4. Device OperationsThe basic generic block device has open, close (release), ioctl, and most important, the request function. At the least, the open and close functions could be simple usage counters. The ioctl() interface can be used for debug and performance measurements by bypassing the various software layers. The request function, which is called when a request is put on the queue by the filesystem, extracts the request structure and acts upon its contents. Depending on whether the request is a read or write, the device takes the appropriate action. The request queue is not accessed directly, but by a set of helper routines. (These can be found in drivers/block/elevator.c and include/linux/blkdev.h.) In keeping with our basic device model, we want to include the ability to act on the next request in our request function: ------------------------------------------------------------------------- drivers/block/elevator.c 186 struct request *elv_next_request(request_queue_t *q) ------------------------------------------------------------------------- This helper function returns a pointer to the next request structure. By examining the elements, the driver can glean all the information needed to determine the size, direction, and any other custom operations associated with this request. When the driver finishes this request, it indicates this to the kernel by using the end_request() helper function: ------------------------------------------------------------------------- drivers/block/ll_rw_blk.c 2599 void end_request(struct request *req, int uptodate) 2600 { 2601 if (!end_that_request_first(req, uptodate, req->hard_cur_sectors)) { 2602 add_disk_randomness(req->rq_disk); 2603 blkdev_dequeue_request(req); 2604 end_that_request_last(req); 2605 } 2606 } ------------------------------------------------------------------------- Line 2599Pass in the request queue acquired from elev_next_request(), Line 2601end_that_request_first() TRansfers the proper number of sectors. (If sectors are pending, end_request() simply returns.) Line 2602Add to the system entropy pool. The entropy pool is the system method for generating random numbers from a function fast enough to be called at interrupt time. The basic idea is to collect bytes of data from various drivers in the system and generate a random number from them. Chapter 10, "Adding Your Code to the Kernel," discusses this. Another explanation is at the head of the file /drivers/char/random.c. Line 2603Remove request structure from the queue. Line 2604Collect statistics and make the structure available to be free. From this point on, the generic driver services requests until it is released. Referring to Figure 5.4, we now have the generic block device layer constructing and maintaining the request queue. The final layer in the block I/O system is the hardware (or specific) device driver. The hardware device driver uses the request queue helper routines from the generic layer to service requests from its registered request queue and send notifications when the request is complete. The hardware device driver has intimate knowledge of the underlying hardware with regard to register locations, I/O, timing, interrupts, and DMA (discussed in the "Direct Memory Access [DMA]" section of this chapter). The complexities of a complete driver for IDE or SCSI are beyond the scope of this chapter. We offer more on hardware device drivers in Chapter 10 and a series of projects to help you produce a skeleton driver to build on. 5.2.5. Character Device OverviewUnlike the block device, the character device sends a stream of data. All serial devices are character devices. When we use the classic examples of a keyboard controller or a serial terminal as a character stream device, it is intuitively clear we cannot (nor would we want to) access the data from these devices out of order. This introduces the gray area for packetized data transmission. The Ethernet medium at the physical transmission layer is a serial device, but at the bus level, it uses DMA to transfer large chunks of data to and from memory. As device driver writers, we can make anything happen in the hardware, but real-time practicality is the governing force keeping us from randomly accessing an audio stream or streaming data to our IDE drive. Although both sound like attractive challenges, we still have two simple rules we must follow:
The parallel port driver at the end of this chapter is a character device driver. Similarities between character and block drivers is the file I/O-based interface. Externally, both types use file operations such as open, close, read, and write. Internally, the most obvious difference between a character device driver and a block device driver is that the character device does not have the block device system of request queues for read and write operations (as previously discussed). It is often the case that for a non-buffered character device, an interrupt is asserted for each element (character) received. To contrast this to a block device, a chunk(s) of data is retrieved and an interrupt is then asserted. 5.2.6. A Note on Network DevicesNetwork devices have attributes of both block and character devices and are often thought of as a special set of devices. Like a character device, at the physical level, data is transmitted serially. Like a block device, data is packetized and moved to and from the network controller via direct memory access (discussed in the "Direct Memory Access [DMA]" section). Network devices need to be mentioned as I/O in this chapter, but because of their complexity, they are beyond the scope of this book. 5.2.7. Clock DevicesClocks are I/O devices that count the hardware heartbeat of the system. Without the concept of elapsed time, Linux would cease to function. Chapter 7, "Scheduling and Kernel Synchronization," covers the system and real-time clocks. 5.2.8. Terminal DevicesThe earliest terminals were teletype machines (hence the name tty for the serial port driver). The teletypewriter had been in development since the turn of the century with the desire to send and read real text over telegraph lines. By the early 1960s, the teletype had matured with the early RS-232 standard, and it seemed to be a match for the growing number of the day's minicomputers. For communicating with computers, the teletype gave way to the terminal of the 1970s. True terminals are becoming a rare breed. Popular with mainframe and minicomputers in the 1970s to the mid 1980s, they have given way to PCs running terminal-emulator software packages. The terminal itself (often called a "dumb" terminal) was simply a monitor and keyboard device that communicated with a mainframe by using serial communications. Unlike the PC, it had only enough "smarts" to send and receive text data. The main console (configurable at boot time) is the first terminal to come up on a Linux system. Often, a graphical interface is launched, and terminal emulator windows are used thereafter. 5.2.9. Direct Memory Access (DMA)The DMA controller is a hardware device that is situated between an I/O device and (usually) the high-performance bus in the system. The purpose of the DMA controller is to move large amounts of data without processor intervention. The DMA controller can be thought of as a dedicated processor programmed to move blocks of data to and from main memory. At the register level, the DMA controller takes a source and destination address and length to complete its task. Then, while the main processor is idle, it can send a burst of data from a device to memory, or from memory to memory or from memory to a device. Many controllers (disk, network, and graphics) have a DMA engine built-in and can therefore transfer large amounts of data without using precious processor cycles. |