Hard Drive Basics | Apple Pro Training Series. Optimizing Your Final Cut Pro System. A Technical Guide to Real-World Post-Production

While you can just buy hard disks and plug them in, you'll have better luck with them if you understand how they work and how they're connected. The spec sheets don't tell the whole story; hard drives vary in performance depending on how full they are, and the ways in which they're connected and formatted affect how well they can supply a steady stream of video data. Fortunately, the details aren't that difficult to grasp, and once you know the basics you'll be able to read between the lines of the manufacturer's data sheets and pick the best drives for your needs.

Hard drives are fairly simple electromechanical devices. One or more platters, metal or glass disks coated with magnetic material, are mounted on a hub and spun at about 100 revolutions per second. Magnetic read/write heads scan across the surface of the platters to read existing data or record it anew.

Each platter may be coated on one or both surfaces (top and bottom alike). Each coated surface has its own head, and the heads are attached to arms, suspending them just above the magnetic layer. The arms in turn are connected to a common mechanism that either rotates or slides back and forth, to move the heads between the outermost parts of the platters and the innermost sections closest to the hub.

Each surface is divided into a bunch of concentric circles, or tracks, within which data may be recorded. All the tracks addressable with the arm assembly in one place form a cylinder.

Within tracks, data are stored in sectors, the smallest addressable chunk of storage from the disk drive's standpoint. Each sector may hold something like 512 or 1024 bytes. Each track contains sector markers so the drive can find the sector it needs. (The drive actually reads and writes an entire sector at a time; it has no way to individually address "bit 3 of byte 12 of sector 14," for example. That's a complication handled by the drive's controller.)

The smallest chunk of data the Mac OS sees is the block, a collection of one or more sectors. (In the PC world, a block is usually called a cluster.) Lumping several sectors together into blocks makes for more efficient disk access. If your sectors are 512 bytes, and your blocks contain 8 sectors or 4 kilobytes, you can read a 1 MB file by asking the disk for a block 256 times. Asking the disk for a sector would require 2,048 separate calls; the Mac would spend all its time asking for data instead of digesting it. It's more efficient to make a few large requests instead of many smaller ones.
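The arithmetic in that example is easy to check. Here's a minimal sketch using the same figures as the text (512-byte sectors, 8 sectors per block, a 1 MB file); the function and constant names are just for illustration.

```python
# Count how many fixed-size requests it takes to read a whole file.
SECTOR_BYTES = 512            # sector size from the example above
SECTORS_PER_BLOCK = 8         # 8 sectors per block, i.e. 4 KB blocks
FILE_BYTES = 1024 * 1024      # a 1 MB file


def requests_needed(file_bytes, request_bytes):
    """Number of requests of request_bytes each needed to cover file_bytes."""
    # Integer ceiling division: a partial final chunk still costs a request.
    return (file_bytes + request_bytes - 1) // request_bytes


block_bytes = SECTOR_BYTES * SECTORS_PER_BLOCK

print(requests_needed(FILE_BYTES, block_bytes))   # 256 block-sized requests
print(requests_needed(FILE_BYTES, SECTOR_BYTES))  # 2048 sector-sized requests
```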

There's a trade-off in block sizing: make the blocks too small, and your I/O subsystem spends all its time handling piffling little requests; make them too big and you waste space on the disk. The smallest amount you can allocate for a file is one entire block, and the file's space grows in block-long increments. Small, slow disks, like floppies (remember them?) usually have block sizes of a single sector, whereas huge, fast arrays like Xserve RAIDs may use blocks of 4 MB for video I/O. Desktop drives are somewhere in-between, of course; 4 KB blocks are typical.
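To see the wasted-space side of that trade-off, consider a rough sketch that rounds a file's size up to whole blocks. The 1,000-byte file is a made-up example, and the helper names are hypothetical; the block sizes are the ones mentioned above.

```python
def allocated_bytes(file_bytes, block_bytes):
    """Disk space consumed by a non-empty file, rounded up to whole blocks."""
    blocks = (file_bytes + block_bytes - 1) // block_bytes
    return blocks * block_bytes


def slack_bytes(file_bytes, block_bytes):
    """Space allocated to the file but not actually holding its data."""
    return allocated_bytes(file_bytes, block_bytes) - file_bytes


small_file = 1000  # bytes; a hypothetical tiny text file

for label, block in (("512-byte", 512), ("4 KB", 4096), ("4 MB", 4 * 1024 * 1024)):
    print(label, "blocks waste", slack_bytes(small_file, block), "bytes")
```

A tiny file wastes almost nothing with single-sector blocks but nearly the whole block at 4 MB, which is why the giant block sizes only make sense on volumes holding large media files.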

Note

For single disks you initialize with OS X's Disk Utility, you don't need to worry about block sizes: Disk Utility picks sensible sizes for you. If you set up RAIDs or SANs, the RAID or SAN administration software may very well ask you for a block size, and that's why we're wasting your time with this stuff in the first place.

Starting with an empty disk, the heads write data into each track within a cylinder in turn until the entire cylinder is full. Then the arms step to the next cylinder, and the heads write new data into it, and the cycle repeats, cylinder by cylinder, until the disk is full. Disks normally write from the outermost cylinders inwards, the same way a phonograph record starts playing from the outside inwards. Of course, once you've written a bunch of files and then deleted some, the progress is a bit less predictable; we'll cover that in a bit.

Raw Performance Factors

Drives are rated by their rotational speed. Slow laptop drives spin at 4200 RPM (revolutions per minute); faster laptop drives and some desktop drives run at 5400 RPM. Most desktop drives these days turn at 7200 RPM; expensive, high-performance drives may run at 10,000, 12,000, or even 15,000 RPM.

All else being equal, data transfer speeds scale with rotational speeds, but you pay the price in dollars, power consumption, heat generation, and reduced longevity. Faster drives, in addition to sucking more power and generating more heat, tend to run noisier, too, an important consideration if you're filling up a quiet editing suite with the things.

The outermost cylinders of current drives have roughly twice the diameter of the innermost cylinders, thus they have twice the linear storage space (the circumference being 2πr, of course). (See? High-school geometry was good for something.) Because the platters rotate at a fixed speed, the outermost cylinder passes beneath the heads twice as quickly as the innermost cylinder. Bits are recorded on the platters at a fixed density (number of bits per inch of travel, if you will), so bits on the outermost cylinders get read and written about twice as fast as on the innermost cylinders. As you fill a drive up with data, the read/write speed decreases from its maximum when the drive is empty to only half that when the drive is full. The general rule of thumb is that you'll have 80 percent of the drive's performance when it's two-thirds full, but that performance drops off rapidly thereafter.
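One way to put rough numbers on this is an idealized geometric model. The sketch below assumes evenly spaced tracks, a constant bit density, an inner radius exactly half the outer radius, and a disk that fills strictly from the outside in. None of these holds exactly on a real drive, and the function names are made up for illustration.

```python
import math


def relative_rate_at_fill(fill, inner_ratio=0.5):
    """Transfer rate at the current write position, relative to the outermost
    cylinder, once `fill` (0.0 to 1.0) of the capacity has been used.

    Capacity per track is proportional to its radius, so the capacity used
    between the outer radius R and radius r is proportional to R^2 - r^2.
    Solving for r/R gives the expression below.
    """
    return math.sqrt(1.0 - (1.0 - inner_ratio ** 2) * fill)


def average_rate_up_to_fill(fill, inner_ratio=0.5):
    """Average rate over everything written so far, relative to the maximum.

    Each track takes one revolution to transfer, and the rate falls linearly
    with radius, so the average is the midpoint of the outer and current rates.
    """
    return (1.0 + relative_rate_at_fill(fill, inner_ratio)) / 2.0


for fill in (0.0, 2.0 / 3.0, 1.0):
    print(f"{fill:.2f} full: at-position {relative_rate_at_fill(fill):.2f}, "
          f"average so far {average_rate_up_to_fill(fill):.2f}")
```

Under those assumptions the rate at the two-thirds mark is about 71 percent of the maximum, while the average over everything written up to that point is about 85 percent, so the 80 percent rule of thumb sits between the two. At completely full, the at-position rate is the expected one-half.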

When a cylinder has been read or written to, the arm assembly moves the heads to the next cylinder. This process takes a long time by disk drive standards: the assembly has to move, to lock onto an embedded servo track (the "lane markings" that divide one track from its neighbors), and then wait for the desired data sector to come flying past the heads. Thus, whenever the heads seek to a new cylinder, there's an interruption in the data flow.

All else being equal, a drive with a shorter seek time will provide smoother data flow with fewer interruptions than comparable drives with longer seek times. It may seem brief enough to you and me, but to an impatient computer, hungry for data, the delay could mean the difference between smooth playback and a dropped frame.
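Part of that delay, the wait for the desired sector to come around, follows directly from the rotational speed. Here's a small sketch of the arithmetic; it assumes the usual convention that the average wait is half a revolution, and the function names are illustrative only.

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Average wait for a given sector to reach the head: half a revolution."""
    return revolution_ms(rpm) / 2.0


# The rotational speeds mentioned earlier in this section.
for rpm in (4200, 5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {revolution_ms(rpm):5.2f} ms per revolution, "
          f"{average_rotational_latency_ms(rpm):4.2f} ms average wait")
```

This covers only the rotational wait; the time to move the arm and settle on the track comes on top of it and has to be read from the drive's spec sheet.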

It's not so bad when a file is recorded sequentially from cylinder to cylinder and all the data are contiguous. But as you use a drive, writing and then deleting files at random, the free space on the drive starts looking like Swiss cheese: little holes here and there. This fragmentation means that files start getting spread out in discontiguous chunks, and the drive spends more time seeking and less time reading and writing.

All else being equal, a drive with low fragmentation will record and play back data faster and more smoothly than a heavily fragmented drive. (Note that OS X performs defragmentation automatically as you read and write to a drive. Also note that if you erase a media drive between projects, you automatically defragment it. Running separate defragmentation utilities under OS X is usually not useful because of these factors.)
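To get a feel for how much fragmentation can cost, here's a back-of-the-envelope sketch. It charges one repositioning delay (seek plus rotational wait) per fragment and ignores buffering entirely. Every number in it is a hypothetical placeholder, not a measurement of any real drive.

```python
def effective_rate_mb_s(file_mb, fragments, sustained_mb_s, reposition_ms):
    """Effective throughput when a file is split into `fragments` pieces,
    paying one repositioning delay per piece and no other overhead."""
    transfer_s = file_mb / sustained_mb_s
    reposition_s = fragments * reposition_ms / 1000.0
    return file_mb / (transfer_s + reposition_s)


FILE_MB = 1000.0        # hypothetical 1,000 MB media file
SUSTAINED_MB_S = 50.0   # hypothetical sustained transfer rate
REPOSITION_MS = 12.0    # hypothetical seek plus rotational wait

for fragments in (1, 100, 2000):
    rate = effective_rate_mb_s(FILE_MB, fragments, SUSTAINED_MB_S, REPOSITION_MS)
    print(f"{fragments:>5} fragments: about {rate:.1f} MB/s")
```

With these made-up figures, a contiguous file reads at essentially the full sustained rate, while the same file in 2,000 pieces manages less than half of it.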

To help smooth and speed the flow of precious data, modern drives use memory buffers, serving the same purpose as cache memory on processors. While writing, data can flow into the cache even as the heads are busily seeking to new cylinders; on playback, clever algorithms can fill the buffer with data not yet requested that might be needed, minimizing the time the computer spends waiting on balky mechanical systems.

All else being equal, larger buffers result in smoother performance with fewer delays and higher sustained data transfer rates. 2 MB buffers are common on commodity drives at this time, and higher-performance drives typically carry 8 or 16 MB buffers, but these figures are constantly changing, and a year from now they may be entirely different.

You'll notice I keep saying, "all else being equal." The sad truth is, unless you control for all other variables, these performance factors give only a vague indication of actual performance. Although faster is generally better, less fragmented is better, and larger buffers are better, the actual performance of a drive is conditioned by all of these factors in combination, so you can't predict performance based on these raw numbers alone.

For example, this year's 4200 RPM drives are faster than the 5400 RPM drives from a couple of years ago: bit density has increased, and the data-buffering algorithms have improved, so data flows to and from the disk much faster.

Two seemingly identical drives from different manufacturers (both current-model, 7200 RPM, 250 GB drives with 8 MB buffers) may show radically different performance in real-world tests, because their designers took different approaches to buffering and caching the data.

What really counts are a drive's sustained read and sustained write figures. These are measurements of the actual data throughput over a reasonably long period of time. Even then, bear in mind that sustained rates are usually measured on the outer cylinders of a reasonably unfragmented drive. Performance on the inner cylinders of a highly fragmented disk is likely to be a disappointing fraction of the quoted specification.

Note

Burst or peak throughput figures are sometimes provided in a drive's specifications; these measure how quickly the drive and its electronics can supply a chunk of data in best-case conditions, not cases where heads need to seek across cylinders, buffers need filling, and the like. Video and audio capture and playback rely on sustained data rates, not peak rates.

Ready to despair? Don't worry, there are several things you can do, both before and after getting your drives, to predict and verify performance:

Read the drive's specifications for the rated sustained read and write speeds. Remember, these numbers are best-case sustained numbers; derate them to 80 percent if you expect to fill the drives to the two-thirds point, or 50 percent if you plan to fill them with captured video (unless the specs say the numbers were obtained with two-thirds full or completely full drives). Also compare seek times, buffer sizes, and rotation rates to get an overall feel for how one drive compares to another.
Several Web sites measure real-world performance, so you don't have to extrapolate manufacturers' numbers to actual results. Robert Morgan's "Bare Feats" (www.barefeats.com) is probably the best. Robert tests drives, drive arrays, graphics cards, FireWire ports, and the like, and is even so kind as to run FCP rendering tests on different Macs every so often.
Various "tweaker" Web sites, like ExtremeTech.com and TomsHardware.com, are great places to find drive performance tests as well as all manner of inside-the-box technologies, though these sites tend to be rather geeky and somewhat more PC-oriented than is entirely necessary.
Can't find real-world information on drives you already have? You can run your own tests using the SpeedTools Utilities from Intech Software (www.speedtools.com) or the command-line tool DiskTester from Lloyd Chambers ($20 shareware, via email at disktester@llc4.com. Put "DiskTester" in the subject line of your email).
Run tests with FCP itself: try capturing to and playing back from the drives in question, and see how well they work. If your capture card supports multiple data rates, you can ratchet the rates up and down to find the drives' breaking points. For playback, try setting up multiple picture-in-picture effects to multiply the number of streams being played in real time. Don't forget to test with Timelines containing fast cuts and multiple clips; such Timelines have the same stress-inducing effect as fragmented clips do. Playing or scrubbing a Timeline backwards is an especially good test, since the drives have to supply all the frames out of sequence.
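The derating suggested in the first item above is simple enough to script. This is a minimal sketch: the 80 and 50 percent factors come from the text, while the spec-sheet figure and the required data rate are hypothetical placeholders you'd replace with your own numbers.

```python
DERATE_TWO_THIRDS_FULL = 0.80   # expected when filling to the two-thirds point
DERATE_COMPLETELY_FULL = 0.50   # expected when filling the drive completely


def derated_rate(spec_mb_s, factor):
    """Spec-sheet sustained rate scaled down by a derating factor."""
    return spec_mb_s * factor


def can_sustain(spec_mb_s, factor, required_mb_s):
    """True if the derated sustained rate still meets the required rate."""
    return derated_rate(spec_mb_s, factor) >= required_mb_s


SPEC_SUSTAINED_MB_S = 40.0   # hypothetical best-case figure from a data sheet
REQUIRED_MB_S = 24.0         # hypothetical data rate of your video format

print(can_sustain(SPEC_SUSTAINED_MB_S, DERATE_TWO_THIRDS_FULL, REQUIRED_MB_S))
print(can_sustain(SPEC_SUSTAINED_MB_S, DERATE_COMPLETELY_FULL, REQUIRED_MB_S))
```

With these example numbers the drive passes at two-thirds full but falls short if filled completely. Treat the result as a first screen only; the real-world tests described above are what settle the question.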

Tip

If you find Bare Feats useful, help Rob out with a PayPal or snail-mail donation, using the link provided for that purpose. Rob provides the Mac community with an invaluable service, and he should be encouraged.

Oh, one more thing: The way a drive is interfaced to your Mac is another important factor. Both the native interface on the drive itself (that is, ATA or SCSI) and the way it's connected to your FCP system (directly, in a FireWire case, across a network, or through a Fibre Channel switch) affect the actual performance you'll see. We'll cover interfaces and connections a bit later.

Disks, Partitions, and Volumes

The disk as we've described it is of no use to your Mac until one or more volumes have been created on it. A disk is a physical device, that is, something you can drop on your foot. A volume is a logical device, one visible to your computer as a usable place to store data, but it may not have a direct correspondence to a single physical device.

You create volumes when you initialize or format a disk (in Disk Utility, this happens when you erase a disk). OS X creates the necessary data structures, such as the catalog listing all the files and folders on the volume, and the bitmap, which tells the Mac which blocks are used by existing files and which are free to use for new ones.
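The roles of the catalog and the bitmap can be illustrated with a toy model. The class below is purely a teaching sketch with invented names; it is not how OS X lays these structures out on disk. The catalog maps file names to block numbers, and the bitmap records which blocks are taken.

```python
class ToyVolume:
    """A tiny stand-in for a volume: a block bitmap plus a file catalog."""

    def __init__(self, total_blocks):
        self.bitmap = [False] * total_blocks   # True means the block is in use
        self.catalog = {}                      # file name -> list of block numbers

    def write_file(self, name, blocks_needed):
        """Claim the first free blocks found and record them in the catalog."""
        free = [i for i, used in enumerate(self.bitmap) if not used]
        if len(free) < blocks_needed:
            raise OSError("volume full")
        chosen = free[:blocks_needed]
        for i in chosen:
            self.bitmap[i] = True
        self.catalog[name] = chosen
        return chosen

    def delete_file(self, name):
        """Remove the catalog entry and mark the file's blocks free again."""
        for i in self.catalog.pop(name):
            self.bitmap[i] = False


vol = ToyVolume(8)
vol.write_file("a", 2)        # takes blocks 0 and 1
vol.write_file("b", 2)        # takes blocks 2 and 3
vol.write_file("c", 2)        # takes blocks 4 and 5
vol.delete_file("b")          # frees blocks 2 and 3
d = vol.write_file("d", 3)    # has to use the hole plus a block past "c"
print(d)                      # [2, 3, 6]
```

Notice that file "d" ends up in two separate pieces, which is the Swiss-cheese fragmentation effect described earlier, in miniature.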

Disks can be partitioned into two or more volumes. Each volume is a separate logical entity; each mounts on the desktop as a separate drive. There are nearly as many arguments about partitioning as there are Mac users; I'll just mention a couple of pro-partitioning points you may want to consider.

Partitioning for Data Segregation

Although you usually want to keep media on dedicated drives, there are perfectly legitimate reasons to partition disks into separate "system" and "media" drives.

If you're doing field production with PowerBooks, for example, you have only one internal drive; it's fast enough for DV capture and playback, and a lot more convenient than external FireWire drives. Partitioning the internal disk lets you store project media on its own drive, so when you're finished with it, you can simply erase it instead of having to ferret out captured clips, render files, and the like from your Final Cut Pro Documents folder on your system drive.

Erasing the media partition eliminates any fragmentation that might have occurred during the project, too. Deleting project folders from a system disk does nothing to clean up fragmentation issues.

Note, however, that the separate partition is a logical drive, not a physical drive; it still shares its mechanism with any other volumes on the same disk. If your Mac needs to fetch a file from the system volume in the middle of a capture, the same heads writing the captured data have to run off to the system partition in search of the required file, possibly causing dropped frames. There are still good reasons to store your media files on a physically separate disk, even when it seems inconvenient!

Partitioning for Performance

You may have disks that perform adequately on the outer cylinders for the data rates you need to support, but aren't quite good enough on the inner cylinders for video purposes. You can partition the disks into media volumes on the higher-performance outer part of the disk, and either store less critical files on the inner part or simply leave it unused. With careful configuring, you can sometimes wind up with more storage at a lower cost than if you bought drives sufficient to sustain capture all the way to the inner tracks.
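For a rough idea of how large such an outer media partition could be, here's a sketch based on an idealized model: evenly spaced tracks, constant bit density, and an inner radius half the outer radius, so that the transfer rate falls in proportion to the radius. Real drives won't match this exactly, the data rates shown are hypothetical, and the function name is made up, so verify any actual partition with real capture tests.

```python
def usable_fraction(outer_mb_s, required_mb_s, inner_ratio=0.5):
    """Fraction of capacity, counted from the outer edge inward, where the
    idealized transfer rate stays at or above required_mb_s."""
    needed = required_mb_s / outer_mb_s      # minimum acceptable r/R
    if needed > 1.0:
        return 0.0                           # even the outer edge is too slow
    if needed <= inner_ratio:
        return 1.0                           # the whole disk is fast enough
    # Capacity between radius R and r is proportional to R^2 - r^2.
    return (1.0 - needed ** 2) / (1.0 - inner_ratio ** 2)


OUTER_MB_S = 60.0      # hypothetical sustained rate on the outermost cylinders
REQUIRED_MB_S = 45.0   # hypothetical rate your video format needs

print(f"{usable_fraction(OUTER_MB_S, REQUIRED_MB_S):.0%} of the disk qualifies")
```

With these placeholder numbers, a bit more than half the disk would be fast enough for the media volume, leaving the rest for less critical files.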

More Info

Bare Feats describes one such case, using cheap, large SATA drives in place of faster, smaller, more expensive SCSI drives: www.barefeats.com/hard35.html.

Drive Formats

When you initialize a volume, Disk Utility asks you for a volume format. You normally have at least two choices, including "Mac OS Extended" and "Mac OS Extended (Journaled)." These are the correct choices for best performance. (Mac OS Extended is also known as HFS+.)

The two formats are identical except for journaling. Journaling is a high-reliability scheme in which Mac OS keeps a separate journal of all changes made to the file system as they're happening, even before the volume's catalog and bitmap get updated. The journal lets the drive survive power outages and system crashes with a better chance of avoiding data corruption.
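The general idea of recording a change before applying it can be shown with a toy model. This sketch illustrates only the concept; the class and method names are invented, and it says nothing about how Mac OS actually stores or replays its journal.

```python
class ToyJournaledVolume:
    """Records each catalog change in a journal before applying it."""

    def __init__(self):
        self.journal = []    # changes noted down but not yet applied
        self.catalog = {}    # file name -> list of block numbers

    def log_change(self, name, blocks):
        """Step 1: note the intended change in the journal."""
        self.journal.append((name, list(blocks)))

    def apply_pending(self):
        """Step 2: update the catalog, then clear the journal."""
        for name, blocks in self.journal:
            self.catalog[name] = blocks
        self.journal.clear()

    def recover(self):
        """After a crash, replay whatever the journal still holds."""
        self.apply_pending()


jvol = ToyJournaledVolume()
jvol.log_change("clip.mov", [10, 11, 12])
# Imagine the power fails here, before the catalog was updated.
jvol.recover()
print(jvol.catalog)   # {'clip.mov': [10, 11, 12]}
```

The extra bookkeeping step is also where journaling's small performance cost comes from: every change gets written down twice.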

If you're on the edge of acceptable performance, you may get better results with the non-journaled format, but the differences are small: Journaling slows things down by only a few percent, not usually enough to notice amongst all the other factors affecting data transfer. Nonetheless, some capture device vendors suggest turning journaling off.

Other options you'll see will be MS-DOS File System and Unix File System. The MS-DOS option is useful for FireWire and other removable drives to be shared with PCs, but it's not efficient enough for media capture and playback (although you may be able to capture or play back low-bit-rate formats like DV). Likewise, the Unix choice isn't suitable for video capture.

Tip

If you're configuring a RAID, SAN, or certain third-party external drives, you may be given other options. Check the product's documentation for the proper format to use. Xsan, for example, needs to be set up as ACFS, not Mac OS Extended!

There's another gotcha to watch out for: FCP sets its default scratch disk to ~/Documents/Final Cut Pro Documents, where ~ is a user's home directory. Normally, that works fine, at least for DV capture. However, if a hapless user is so bold as to turn on FileVault, things won't work so well. FileVault converts the user's home directory and everything it contains into an encrypted disk image: every byte of data read or written therein has to go through encryption and decryption.

There are two simple fixes to this problem: turn off FileVault, or move FCP's scratch disk outside the user's home directory.
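If you manage several editing stations, a quick script can flag scratch disks that still live inside a home directory. This is a minimal sketch: the paths are hypothetical examples, the helper name is made up, and the check is purely textual, so it doesn't follow symbolic links or query FCP's actual settings.

```python
from pathlib import PurePosixPath


def is_inside(path, directory):
    """True if `path` is `directory` itself or lies somewhere beneath it."""
    try:
        PurePosixPath(path).relative_to(PurePosixPath(directory))
        return True
    except ValueError:
        # relative_to raises ValueError when path is not under directory.
        return False


HOME = "/Users/editor"   # hypothetical user home directory

scratch_candidates = [
    "/Users/editor/Documents/Final Cut Pro Documents",   # the default location
    "/Volumes/Media/FCP Scratch",                        # hypothetical media drive
]

for scratch in scratch_candidates:
    where = "inside" if is_inside(scratch, HOME) else "outside"
    print(f"{scratch}: {where} the home directory")
```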