7.8 Hierarchical Storage Management

Hierarchical Storage Management (HSM) is sometimes confused with backup products and with backup auxiliary products that help keep track of a file's archival position on a tape, or even its archival condition. Although both HSM and backup move files and data between a primary medium (typically disk) and a secondary medium (typically tape or optical media), they differ greatly in the degree of transparency they provide. With HSM, the file still appears to be present. In some cases the file may have special attributes to indicate that the access time may be higher. With backup products, the application does not see a file that was backed up and then deleted.

Another way to look at HSM is to compare it to virtual memory. Virtual memory is an imaginary area of memory, that is, memory that a computer system believes it has but that is not physically present in the machine. Operating systems create virtual memory by swapping out regions of memory (precious and fast online resources) that are not actively in use to hard disk (relatively inexpensive and slower media), and then allocating the freed memory to another process. When a process tries to access memory that has been swapped out to disk, that process halts temporarily while the required memory contents are restored from disk, after which the process continues running. HSM does the same thing, except that the precious resource with HSM is online disk storage, and the relatively inexpensive resources are tape or other media, such as optical media.

Some products integrate backup with HSM. To balance HSM functionality against disruption of regular data center activities, HSM products typically establish some benchmarks (thresholds). When the free disk space falls to a particular benchmark, the HSM product kicks in, even during regular hours, to migrate some files to offline media and free up disk space.

7.8.1 Remote Storage Services

Windows 2000 Server includes a Remote Storage Services (RSS) module that provides Hierarchical Storage Management functionality. RSS keeps track of free space and files accessed on volumes, called managed volumes. A storage administrator establishes criteria about the minimum free disk space required, how long a file needs to be unused before it is a candidate for migration to offline storage, and so on.

RSS has an engine that periodically checks the amount of free disk space, scans files for premigration, and, when the free disk space falls below a minimum, truncates the premigrated files. (A complete description of premigration is included later in this section.) Unused files are migrated to an offline medium (for example, tape), depending on various criteria explained in detail later in this section. Reparse point data is set on each such file, including the information necessary for migration of the file.

File migration selects the appropriate files on the basis of policies set by the administrator. These policies deal with

  • The amount of free disk space desired to be maintained

  • The last date and time the file was accessed

  • The size of the file

  • Some administrator-defined inclusion or exclusion rules as to what files or directories may be migrated
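The policy checks listed above can be sketched as a simple filter. This is a minimal illustration, not the RSS implementation: the `FileInfo` and `MigrationPolicy` types, their field names, and the glob-style exclusion patterns are all assumptions made for the example. The free-space criterion is left out here because it governs when migration runs rather than which files qualify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatchcase


@dataclass
class FileInfo:
    path: str
    size: int                 # bytes
    last_access: datetime


@dataclass
class MigrationPolicy:
    # All field names are hypothetical, chosen only for illustration.
    min_idle: timedelta             # how long a file must have been unused
    min_size: int                   # ignore files smaller than this (bytes)
    exclude_patterns: tuple = ()    # administrator exclusion rules (globs)


def select_candidates(files, policy, now):
    """Return the files eligible for migration, least recently used first."""
    eligible = [
        f for f in files
        if now - f.last_access >= policy.min_idle
        and f.size >= policy.min_size
        and not any(fnmatchcase(f.path, pat) for pat in policy.exclude_patterns)
    ]
    return sorted(eligible, key=lambda f: f.last_access)
```

Sorting by last access time is a design choice of this sketch; the text does not say in what order RSS processes candidates.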

RSS integrates into the rest of the Windows 2000 platform as follows:

  • The Explorer GUI shows a special icon for files that have been migrated to removable media.

  • The command-line window, when listing files that have been migrated to removable media, shows the size of the file within parentheses.

  • The Windows NT backup application coordinates with RSS; Windows NT backup opens the file using the FILE_FLAG_OPEN_NO_RECALL flag in the CreateFile API. This ensures that the backup application can read the file data to back up the file, yet the backup itself does not cause the file to be migrated back to disk.

  • RSS jobs are submitted via the Windows NT job scheduler, and they can be administered like any other ordinary jobs.

  • The Windows 2000 Indexing Service can recognize the fact that a file has not changed, other than the file being migrated between disk storage and remote storage.

  • Network timeouts are automatically extended when a file on remote storage is accessed, allowing time for the file to be migrated back to disk.

RSS deals with files that can be in one of three states:

  1. A normal file, wherein the file is resident on disk.

  2. A premigrated file, wherein the file's unnamed data stream is copied to offline media, yet it is also left untouched on the disk. A premigrated file also has a reparse point on the file.

  3. A migrated file, wherein the file's unnamed data stream is copied to offline media and deleted from the disk; the file is marked with the sparse attribute, and the FILE_ATTRIBUTE_OFFLINE attribute is set. The reparse point is set on the file with a special tag, IO_REPARSE_TAG_HSM.
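As a rough sketch, the three states can be inferred from a file's attribute bits. The function below uses the Windows attribute constants exposed by Python's standard `stat` module; the `RssState` enum and `classify` function are hypothetical names. It also assumes that any reparse point on the file belongs to RSS, which attribute bits alone cannot confirm (that would require reading the reparse tag).

```python
import stat
from enum import Enum


class RssState(Enum):
    NORMAL = "normal"
    PREMIGRATED = "premigrated"
    MIGRATED = "migrated"


def classify(attrs: int) -> RssState:
    """Map Windows file attribute bits to the RSS state they suggest."""
    has_reparse = bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT)
    is_offline = bool(attrs & stat.FILE_ATTRIBUTE_OFFLINE)
    if not has_reparse:
        return RssState.NORMAL
    # Assumption: the reparse point carries the IO_REPARSE_TAG_HSM tag.
    # The attribute bits do not say which tag is actually present.
    return RssState.MIGRATED if is_offline else RssState.PREMIGRATED
```

On Windows, the attribute bits for a real file are available as `os.stat(path).st_file_attributes`; that field does not exist on other platforms.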

Figure 7.6 shows the RSS architecture. Remote Storage Services consists of multiple user mode and kernel mode components, described in the following paragraphs.

Figure 7.6. RSS Architecture


When a migrated file is accessed, the RSS filter driver catches the IRP and queues it up. The RSS filter driver communicates with the RSS file system agent, requesting that the file data be restored from the removable media. The RSS file system agent, in turn, invokes the services of the RSS engine. The RSS engine retrieves the data and streams it to the RSS filter driver, which restores the file data piece by piece as it receives the pieces of data. The data retrieved so far is used to complete any pending I/O that was waiting for that data. The file status (including reparse point data) is updated to move the file back into the premigrated state from its current condition of being marked as migrated.
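The way queued I/O is completed while data is still streaming in can be modeled with a toy class. This is a simplified sketch under stated assumptions: data is assumed to arrive strictly in file order, and the class and method names (`RecallInProgress`, `queue_read`, `on_chunk`) are hypothetical and do not correspond to any RSS interface.

```python
class RecallInProgress:
    """Toy model of a recall: data arrives in order, reads wait for their range."""

    def __init__(self, file_size: int):
        self.file_size = file_size
        self.data = bytearray()   # bytes restored to disk so far
        self.pending = []         # queued (offset, length) read requests
        self.completed = []       # (offset, bytes) results, in completion order

    def queue_read(self, offset: int, length: int) -> None:
        """Queue a read; it completes at once if its data is already present."""
        self.pending.append((offset, length))
        self._complete_ready_reads()

    def on_chunk(self, chunk: bytes) -> None:
        """Called as each piece of file data streams in from remote storage."""
        self.data.extend(chunk)
        self._complete_ready_reads()

    def _complete_ready_reads(self) -> None:
        still_waiting = []
        for offset, length in self.pending:
            end = min(offset + length, self.file_size)
            if end <= len(self.data):
                self.completed.append((offset, bytes(self.data[offset:end])))
            else:
                still_waiting.append((offset, length))
        self.pending = still_waiting

    @property
    def fully_recalled(self) -> bool:
        return len(self.data) >= self.file_size
```

A read near the start of the file completes as soon as the first pieces arrive, while a read near the end stays queued until the recall has progressed that far.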

When a migrated file is opened with the FILE_FLAG_OPEN_NO_RECALL flag in the CreateFile API, read IRPs are satisfied in a similar manner, with the difference that the file is not migrated from tape to disk. The data is read off the tape and fed to the application without being written to the disk. Note that Windows 2000 can handle only one file recall (migrating the file from remote storage back to disk) at a time. [1] This is true even when multiple Remote Storage drives are available and the files accessed are on two separate paths on two separate media in two separate drives.

[1] Windows Server 2003 is expected to have the same limitation as well. Given that the code implementing Hierarchical Storage Management has been licensed by Microsoft from a company that is now part of Sun Microsystems, one can expect some interesting developments in the successor to the Windows Server 2003 product.
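The difference between an ordinary open and a no-recall open can be illustrated with a small model. This is a sketch only: the `ManagedFile` type and the `no_recall` parameter are hypothetical stand-ins for the file's on-disk state and for the no-recall open option, and no Windows API is called.

```python
from dataclasses import dataclass


@dataclass
class ManagedFile:
    remote_data: bytes          # copy held on remote storage
    disk_data: bytes = b""      # empty while the file is migrated
    state: str = "migrated"


def read_file(f: ManagedFile, no_recall: bool = False) -> bytes:
    """Return the file contents, recalling to disk unless no_recall is set."""
    if f.state != "migrated":
        return f.disk_data
    if no_recall:
        # Data is fed straight to the caller; nothing is written to disk.
        return f.remote_data
    # Ordinary open: restore to disk; the file returns to the premigrated state.
    f.disk_data = f.remote_data
    f.state = "premigrated"
    return f.disk_data
```

A backup application would take the `no_recall=True` path, so that backing up a volume does not pull every migrated file back onto disk.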

The RSS engine acts as a Removable Storage Management (RSM) client, using RSM to manage the media. The RSS engine uses a database to store details about the media it uses.

The RSS file system agent is responsible for periodically scanning managed NTFS volumes and preparing a list of files that need to be migrated. This list is prepared on the basis of the criteria decided by the administrator. The file system agent communicates the list of files to be migrated to the RSS engine. Once the files have been premigrated by the RSS engine, the RSS file system agent adds the file to its database of files that are in the premigrated state.

When the free space falls below a benchmark set by the administrator, an automatic task initiated by the RSS file system agent runs. The file system agent deletes the data stream of files that have been premigrated (moves the file from premigrated state to fully migrated state) after verifying that the files have not changed since they were premigrated. This verification is accomplished via the USN journal mechanism described in Chapter 6.
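The truncation pass described above can be sketched as follows. This is an illustrative simplification: the `PremigratedFile` type is hypothetical, and comparing a single stored journal number against the current one stands in for the real USN journal check, whose details are not given in this section.

```python
from dataclasses import dataclass


@dataclass
class PremigratedFile:
    path: str
    size: int                  # bytes that truncation would free
    usn_at_premigration: int   # journal position recorded when the copy was made
    current_usn: int           # latest journal position for this file


def truncate_until(files, free_space: int, desired_free: int):
    """Truncate unchanged premigrated files until desired_free is reached.

    Returns (paths truncated, resulting free space).
    """
    truncated = []
    for f in files:
        if free_space >= desired_free:
            break
        if f.current_usn != f.usn_at_premigration:
            # The file changed after premigration, so the remote copy is stale.
            continue
        truncated.append(f.path)
        free_space += f.size
    return truncated, free_space
```

Because the data was already copied during premigration, this pass only has to verify and truncate, which is why disk space can be freed quickly once the benchmark is crossed.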

Remote Storage Services is not installed by default. RSS comes with a GUI management tool that can also be run remotely from another Windows machine. The GUI has several components:

  • A Windows Explorer component that allows users to view information such as date and timestamp of when the file was premigrated or file data location on remote storage; users can force premigration of a user-selected file, provided that they have proper access to that file.

  • A disk management component that shows volume information such as free or busy space, amount of storage used by premigrated files, amount of storage used by file placeholders, and so on.

  • A user interface that allows an administrator to cancel a file recall (migration from remote storage to disk) operation.

  • A management interface to establish the high and low benchmarks, free disk space, criteria that establish selection of files for migration, and so on.

RSS has several limitations, including the following:

  • RSS shipping with Windows 2000 does not have clustering support. One can speculate that a future version of Windows NT may include RSS clustering support.

  • The Windows 2000 product supports 4-millimeter tape, 8-millimeter tape, and digital linear tape (DLT) as secondary media. Future versions of Windows NT may include support for other media types, such as optical media.

  • RSS will provide migration only for the unnamed data stream, and it does not handle named data streams at all.

  • RSS uses reparse points that are new to the NTFS version that ships with Windows 2000. Hence, RSS cannot support older versions of NTFS volumes.

  • RSS can manage only fixed volumes. It cannot work with volumes on removable media (for example, a DVD or a Jaz drive).

  • RSS should be installed only after the managed volume has been compressed (if that is desired).

  • RSS should be installed after the Indexing Service is installed, if the Indexing Service is desired.

  • RSS currently maintains a database on the system volume. This means that a volume managed by RSS is not completely self-referencing, and thus the volume cannot easily be moved from one server to another.

  • RSS cannot migrate hidden, system, encrypted, or sparse files, or any files with extended attributes, to or from remote storage.

7.8.2 Windows 2000 Removable Storage Management

Windows 2000 Removable Storage Management (RSM) is a subsystem that provides some important functionalities, including

  • Support for tape devices, tape robots, and jukeboxes

  • Management of removable media such as tapes and CDs

  • Provisions for sharing of the tape devices, robots, and jukeboxes between different applications, such as backup and Hierarchical Storage Management

Windows 2000 provides a set of components that jointly facilitate storage management and development of applications dealing with removable storage media. These components consist of

  • Removable storage administrative module

  • Removable Storage Manager (API)

  • Removable storage database

Removable Storage Services is a tool used to accomplish a variety of tasks, including backup. Thus it is not a replacement for backup, but a tool used to accomplish and manage backup and restore operations.

To understand these components, it is best to understand the overall architecture of the Removable Storage Management, as described in the next section.

7.8.2.1 Windows 2000 RSM Architecture

Figure 7.7 shows the overall architecture of the Windows 2000 RSM subsystem. The RSM Service plays a significant role in the RSM subsystem. The RSM Service acts as the repository of the RSM API implementation. It receives requests from applications and places them in a queue, handling the requests as the desired resources become available. When the service starts, it also performs some inventory and initialization of the various libraries, identifying stand-alone drives and associating drives with changers.
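The queuing behavior of the RSM Service can be pictured with a toy model in which mount requests wait until a drive becomes free. The `RequestQueue` class and its methods are hypothetical names for illustration and are not part of the RSM API; first-come, first-served ordering is an assumption of the sketch.

```python
from collections import deque


class RequestQueue:
    """Toy model: mount requests wait until a drive is free."""

    def __init__(self, drives):
        self.free_drives = list(drives)
        self.waiting = deque()    # (application, media) in arrival order
        self.active = {}          # drive -> (application, media)

    def submit(self, application: str, media: str) -> None:
        """Queue a request and serve it immediately if a drive is free."""
        self.waiting.append((application, media))
        self._dispatch()

    def release(self, drive: str) -> None:
        """Free a drive and hand it to the next waiting request, if any."""
        del self.active[drive]
        self.free_drives.append(drive)
        self._dispatch()

    def _dispatch(self) -> None:
        while self.waiting and self.free_drives:
            drive = self.free_drives.pop(0)
            self.active[drive] = self.waiting.popleft()
```

With a single drive, a backup request and an HSM request are served one after the other, which is the kind of device sharing between applications that RSM is meant to provide.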

Figure 7.7. RSM Architecture


Vendors developing changer and other RSM-type hardware should write a minichanger driver. The changer class driver implements much of the functionality common across devices and takes care of creating device objects to represent the device. However, the minichanger driver does have to deal with some knowledge of Windows NT drivers, including IRPs. Nevertheless, the range of functionality that the minichanger driver must supply is limited, compared to regular Windows NT drivers.

Note that RSM is involved in managing and setting up a device that houses removable media, as well as the media itself. After establishing device ownership, and mounting and positioning the appropriate media, RSM is no longer in the data path, meaning that it adds no I/O overhead at all.

7.8.2.2 Windows 2000 RSM APIs

The Windows 2000 Platform SDK describes how to build an application using the RSM APIs and provides details about them. The biggest contribution of these APIs is the simplicity they bring to building storage management applications.

Figure 7.8 shows the situation before the RSM API existed. Each application was complex because it had to deal with individual devices and device types. Further, every time a new device or device type was introduced, every application would require modification.

Figure 7.8. Application Development before RSM APIs


Figure 7.9 shows the simplification provided by RSM APIs in developing the application. Obviously RSM provides extensibility in addition to simplification. When RSM adds support for a new type of device, the application can now deal with a new device with little or no modification at all. The list of devices supported by RSM is continually changing; the latest list is available at the Microsoft hardware compatibility list Web site (http://www.microsoft.com/hwdq/hcl).

Figure 7.9. Application Development Simplified by RSM APIs


RSM APIs can be divided into several categories, on the basis of the functionality they provide. This functionality includes the following:

  • Drive cleaning-related APIs, such as reserve cleaner, inject cleaner, eject cleaner, and (run) cleaner

  • Device state change detection APIs on stand-alone drives

  • Database-related APIs for backing up and restoring the RSM database, as well as registering and deregistering for database notifications

  • Library control functions for injecting, moving, and ejecting media (within a library), as well as enabling and disabling drive and changer resources

  • Mounting, dismounting, and managing media pools

  • Other APIs necessary to build a robust storage management application, such as APIs to query status, cancel outstanding operations, or deal with RSM objects

7.8.2.3 RSM Database

The RSM database stores information vital to the RSM subsystem. Examples of the kind of information stored here include

  • Media inventory

  • Media pool details, including pool configurations and contents of pools

  • Library configurations

The database does not contain catalog information of which files are on which media. That type of functionality is up to the storage application.

Users can back up the database by manually copying the files. These files are normally stored under the %SystemRoot%\System32\ntmsdata folder. The RSM Service must be stopped before the files can be copied.

The database can also be backed up via an RSM API. In this case the service must be running, but it is advisable that the service not be actively in use by other programs.

7.8.2.4 Media Pools

Media pools provide for ease of use, including administrative use. A media pool is simply a collection of media such as tapes or CDs that share a common property. There are two kinds of media pools: application pools and system pools.

System pools are used by the operating system for its purposes. There are three kinds of system media pools:

  1. A free pool contains media that any application can claim for its use and return to the free pool when it is finished with them.

  2. An import pool contains media that are recognizable but never before encountered. A good example might be a medium that was written by a known backup software program on a different computer system and has been mounted for the first time on this computer system.

  3. An unrecognized pool contains media recognized to be nonblank, but Removable Storage Services is unable to determine the nature of the data on the media. An example may be media created on a different operating system or a backup software signature that is unknown. The intent is to allow the administrator to recognize the situation and save the media from being overwritten.
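The choice between the import pool and the unrecognized pool for newly seen, nonblank media can be sketched as a small decision function. The function name, its parameters, and the format strings in the usage note are all hypothetical; the sketch covers only this one decision, since the free pool is populated by applications returning media rather than by recognition.

```python
from enum import Enum
from typing import Optional


class SystemPool(Enum):
    FREE = "free"
    IMPORT = "import"
    UNRECOGNIZED = "unrecognized"


def classify_new_media(label_format: Optional[str],
                       known_formats: set,
                       already_in_database: bool) -> Optional[SystemPool]:
    """Pick a system pool for nonblank media that has just been mounted.

    Returns None when the medium is already known, in which case it simply
    stays in whatever pool the database records for it.
    """
    if already_in_database:
        return None
    if label_format is not None and label_format in known_formats:
        # Recognizable format, but this system has never seen the medium.
        return SystemPool.IMPORT
    # Nonblank, but the nature of the data cannot be determined.
    return SystemPool.UNRECOGNIZED
```

For example, `classify_new_media("backup-v1", {"backup-v1"}, False)` yields the import pool, while a medium whose label cannot be read (`label_format=None`) lands in the unrecognized pool so that an administrator can keep it from being overwritten.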

Application pools are created by storage management applications via the RSM APIs. Examples of applications that do this include Windows backup and Windows Remote Storage (HSM).

7.8.2.5 RSM Administration User Interface

The Removable Storage Management administration user interface first shipped with Windows 2000. It is simply a Microsoft Management Console (MMC) snap-in, which allows the storage or system administrator to view and configure objects. These objects include media pools, work queues, and the physical properties of storage devices.


   
Inside Windows Storage: Server Storage Technologies for Windows 2000, Windows Server 2003 and Beyond
ISBN: 032112698X
Year: 2003
Pages: 111
Authors: Dilip C. Naik
