10.2 Windows 2000

The most remarkable advance of Windows 2000 was to make the Windows NT platform highly conducive to use in a SAN or NAS environment. This release also established Windows as a player in the NAS market. Sections 10.2.1 through 10.2.13 review the storage-related advances made in Windows 2000.

10.2.1 Improved Storage Unit Accessibility

Windows 2000 dramatically enhances storage unit accessibility by

  • Vastly reducing the scenarios in which a system reboot is required to change storage unit configurations (for example, adding, removing, growing, or shrinking volumes)

  • Vastly increasing the number of storage units that the operating system can handle

Windows NT 4.0 requires that a system be rebooted before it can access a new physical storage device (called a target in storage terminology). Windows 2000 PnP improvements support the dynamic addition and removal of storage targets. The bus driver reports an event called BusChangeDetected, and the operating system responds with action that may involve rescanning for all devices attached to the bus. The improvement is that the operating system and associated drivers support access to new devices without requiring a reboot, and the operating system also does not require the storage bus (e.g., SCSI, Fibre Channel) to be reset, an operation that consumes a fair amount of time. A new target may be added or a logical unit number (LUN) on an existing target may be dynamically made visible, and neither operation requires a bus reset.

Note that Windows NT (including Windows 2000) still will claim ownership of all LUNs that it sees. Microsoft has indicated that it is working on a feature that will allow an administrator to specify which LUNs should be accessed by Windows NT (and which ones should not). This feature would allow LUNs to be visible to Windows NT (the operating system, not Windows NT applications), yet Windows NT would not use them and thus would not risk destroying any existing data on those LUNs.

Windows 2000 supports vastly improved storage connectivity by being able to address up to 128 targets (SCSI identifiers) and up to 255 LUNs per target. By comparison, Windows NT 4.0 supported only 8 targets and 8 LUNs per target, and Windows NT 4.0 Service Pack 5 supported 255 targets and 8 LUNs per target.

10.2.2 New Volume and Disk Management

Windows 2000 introduced a new concept called dynamic disks. The term dynamic disk is a logical concept that is applied to the physical disk. Physical disks remain basic disks with a partition table, just as they were in Windows NT 4.0, until they are explicitly converted to dynamic disks. To put it differently, dynamic disks are physical disks with a different partition table format that allows the partitions on the disks to shrink and grow dynamically. This new partition table is stored in a redundant fashion, to ensure that it is always available, even when some disk clusters are corrupted. Complete details are available in Chapter 6.

Dynamic disks can be recognized only by Windows 2000 and higher versions of Windows NT. Dynamic disks still retain the old-style partition table as well, to ensure that data does not become corrupted when the disk is accidentally referenced either by a Windows NT 4.0 system or in a heterogeneous operating environment in which the other operating system understands only the old-style partition table. However, the partition table simply contains a single entry that shows the whole disk as being in use. This old-style partition table does not reflect the true logical organization of the disk.

Two new kernel mode components have been added. The Mount Manager handles mount operations for basic and dynamic disks, including assigning drive letters. The Partition Manager handles PnP operations, as well as power management and WMI operations on disk partitions. The two combine to provide functionality that allows two volume managers to coexist in the system. One is the old FtDisk driver, which in some senses is indeed a volume manager. The other one is either the default Logical Disk Manager (LDM) shipped with Windows 2000 or the Logical Volume Manager (LVM) upgrade product sold by VERITAS.

Together with improvements in PnP, the new disk management features ensure that

  • Volumes may be dynamically shrunk and grown without the operating system having to be shut down and the system rebooted. While a volume is being grown, an application can continue to access and use it.

  • Volumes may be mounted and unmounted on the fly. Of course, an unmounted volume is no longer available for use, but the system does not need to be rebooted after a volume has been gracefully removed or a new volume has been added.

  • Volumes may be converted from using one RAID class to using another RAID class without a system reboot. RAID 0, RAID 1, and RAID 5 volumes can be configured, broken, and rebuilt on the fly, all without requiring a reboot.

  • Legacy volumes, including volumes that use a form of RAID supported by the legacy FtDisk volume manager, are still supported.

  • Dynamic disks are self-describing; that is, all metadata needed to figure out the layout of a disk is stored on the disk itself. Windows NT 4.0 stored part of this metadata in the registry instead. This means that dynamic disks may be freely transported between Windows 2000 systems and be readily available for use without any further reconfiguration needed.

  • A GUI and a command-line tool are available for disk management.

  • New APIs have been added to enable volume management applications. Examples include APIs to enumerate all volumes (FindFirstVolume to enumerate the first volume, FindNextVolume to enumerate the next, and FindVolumeClose to end the search and free up resources associated with the search). GetVolumeInformation returns information about the volume, including whether the volume supports reparse points (described in Section 10.2.10). A short sketch of these APIs appears after this list.
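
The following minimal sketch shows how an application might use these APIs to enumerate all volumes on a system and report whether each one supports reparse points. The program is illustrative only; error handling is abbreviated.

    /* Enumerate all volumes and report reparse point support. */
    #include <windows.h>
    #include <tchar.h>
    #include <stdio.h>

    int main(void)
    {
        TCHAR volName[MAX_PATH];
        HANDLE hFind = FindFirstVolume(volName, MAX_PATH);   /* first volume */
        if (hFind == INVALID_HANDLE_VALUE)
            return 1;

        do {
            DWORD fsFlags = 0;
            TCHAR fsName[MAX_PATH];
            /* volName is a volume root of the form \\?\Volume{GUID}\ */
            if (GetVolumeInformation(volName, NULL, 0, NULL, NULL,
                                     &fsFlags, fsName, MAX_PATH)) {
                _tprintf(_T("%s (%s) reparse points: %s\n"), volName, fsName,
                         (fsFlags & FILE_SUPPORTS_REPARSE_POINTS)
                             ? _T("yes") : _T("no"));
            }
        } while (FindNextVolume(hFind, volName, MAX_PATH));   /* next volume */

        FindVolumeClose(hFind);   /* end the search and free its resources */
        return 0;
    }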

10.2.3 Dfs Improvements

Dfs was introduced in Windows NT 4.0. Dfs allows an administrator to manage shares efficiently on a network, and it allows users to work with share names or directories that reflect their work environment. Users do not need to correlate the physical network and server share environments with their work environment. For example, users in a company could access \\MyCompany\Accounts for all files and directories related to the Accounting department and not have to worry about the real physical server name or its shares.

Dfs has been substantially improved in Windows 2000. The improvements include the following:

  • Better availability and fault tolerance. Windows 2000 introduces Dfs support for clustering and also removes the single point of failure in terms of the single Dfs root server by allowing for multiple servers to host the Dfs root directory. This is done by providing Dfs and Active Directory integration in the form of storing the Dfs topology data in Active Directory. Furthermore, changes to the Dfs topology can be made and are immediately effective without the Dfs service having to be stopped and restarted. Dfs can now also work with the Windows 2000 File Replication Service to replicate changes in the Dfs replicas.

  • Better administration. Dfs is now administered via a graphical Microsoft Management Console (MMC) tool. Dfs also supports more configurable parameters, for example, the amount of time a Dfs entry can be cached. Dfs in Windows 2000 is integrated with Active Directory, so the administrator has one less domain to worry about.

  • Performance optimizations. Windows 2000 clients select a Dfs replica that is on the same site rather than randomly selecting a replica that may be on a remote site. When multiple replicas are on a local site, the Windows 2000 client selects one randomly. Windows NT 4.0 clients do a lot more processing to use Dfs than Windows 2000 clients do. The older Dfs clients are really Dfs unaware and simply try to resolve a path, assuming that the path contains a valid server name. Only after the attempt to connect to a nonexistent server fails does the client use the services of the dfs.sys driver to resolve the path. Windows 2000 clients instead assume a Dfs-aware path and first use the dfs.sys driver to attempt to resolve the pathname.

  • Load balancing. Because multiple servers may store the same set of files, a client can access any of them, and different clients may access the same file on different servers.

10.2.4 Offline Folders and Client-Side Caching

Windows 2000 allows an administrator to mark shares on a server as containing cacheable files. The files within that share (e.g., data and program executable files or dynamic link libraries) are then downloaded to the client and stored locally. This download happens when the user first opens the file. Files that are not opened are not downloaded. The files are synchronized in the background or when the user logs on or off. When synchronization results in a conflict (because the locally made changes now conflict with changes that were meanwhile made on the server folder), the conflict is resolved with the assistance of the user. The main advantages are these:

  • Performance gains because reading a local disk is much faster than reading a file over the network

  • Improved availability because the user can keep working on the client even when the server is offline

  • Improved availability when the user is undocked from the network and traveling

Offline folders can be administratively set to have the following options:

  • Manual caching for documents, where an administrator explicitly marks the document files that users may cache

  • Automatic caching for documents, where all document files opened by a user are automatically cached

  • Automatic caching for program files, where all program files (e.g., .exe and .dll files) are automatically cached

10.2.5 File Replication Service

Windows 2000 introduces a full two-way File Replication Service (FRS) that uses the NTFS change log journal to detect changed files and directories. The File Replication Service is used extensively by Active Directory for its internal needs. It is also used extensively in conjunction with Dfs to make multiple copies of files and the Dfs metadata available on multiple servers, thus allowing for load balancing and fault-tolerant features.

10.2.6 File System Content Indexing

Windows 2000 Server ships with the Indexing Service fully integrated with the operating system and tools. For example,

  • Indexing Service functionality can be accessed simply via the Find files or folders dialog in Explorer.

  • The Indexing Service can index file contents and file attributes, including user-defined attributes.

  • The Indexing Service can also index offline content managed by Remote Storage Services.

  • On NTFS volumes, the Indexing Service uses the change log journal to determine which files have changed since the last index run.

  • The Indexing Service uses the bulk ACL-checking feature and returns only those files that the searching user has permission to access. Files that a user is not allowed to access are eliminated from the search results.

  • The Indexing Service can also work on FAT volumes, but it will work less efficiently on FAT volumes than it does on NTFS volumes because it cannot use NTFS-specific features such as the change log journal.

10.2.7 Setup Improvements

Windows 2000 setup can create a native NTFS volume. Earlier, the process involved setting up a FAT partition and then converting the FAT partition to an NTFS volume. Note the almost interchangeable use of the terms partition and volume. As explained in Chapter 6, partitions and volumes refer to the same concept but have slightly different behavior. For one, whereas a volume can be dynamically resized, a partition cannot.

10.2.8 File System Improvements

Windows 2000 makes some improvements in the area of file systems. Although the majority of the enhancements are in NTFS, Windows 2000 also includes major improvements in optical media file systems, as well as the venerable FAT file system. However, the expectation is that the FAT file system code is in maintenance mode at Microsoft, and no significant enhancements in the FAT file system should be expected. NTFS improvements are so significant that they rate their own individual section (10.2.9).

10.2.8.1 FAT File System Improvements

Windows 2000 supports FAT32, the first time that FAT32 has been supported on the Windows NT platform. FAT32 was of course supported on the Windows 9X platform. FAT32 can support volumes up to 2 terabytes (TB) in size and a maximum file size of 4GB. For performance reasons, the utilities used to create a FAT32 volume limit the volume size to 32GB, but this limitation is a function of the utility and not of the underlying file system itself. FAT32 uses a smaller disk cluster size and hence is more efficient in using disk space.

10.2.8.2 Optical Media File Systems

Windows NT 4.0 supported CDFS (CD-ROM File System), and this support is continued in Windows 2000. Windows 2000 adds the Universal Disk Format (UDF) file system defined by the Optical Storage Technology Association (OSTA) to the list of file systems supported on the Windows NT platform.

10.2.9 NTFS Improvements

NTFS in Windows 2000 has been enhanced in numerous ways. The enhancements require a change in the NTFS on-disk structure. When a Windows 2000 system mounts a legacy NTFS partition for the first time (legacy as in "one created with a prior version of Windows NT"), the partition is silently converted. The conversion is fairly efficient, and the time taken for conversion is independent of the underlying partition size. Windows NT 4.0 SP4 and higher has been provided with an upgraded version of NTFS that can recognize the new NTFS partition.

Implementing these features required a change in the on-disk structure for NTFS. For existing volumes, the on-disk structure is changed when the volume is first mounted on a Windows 2000 system. For the system volume, this is done when the system is upgraded from Windows NT 4.0 to Windows 2000.

Microsoft has also invested in considerable enterprise-level testing of Windows 2000 in general and NTFS in particular. Some examples include the following:

  • The maximum tested file size was 2^44 - 64K, which is approximately 16TB. By contrast, the theoretical maximum file size is 2^64 - 1 bytes, which is approximately 17 exabytes (EB).

  • The maximum number of files tested per volume was approximately 34 million. By contrast, the theoretical maximum limit is 2^32 - 1, which is approximately 4 billion.

10.2.9.1 Native Property Sets

Windows 2000 introduces support for native property sets, which are simply user-defined metadata that can be associated with a file. This metadata can also be indexed by the Indexing Service that now ships with Windows 2000 Server. An example is defining a document author or intended document audience. Users can then search for a document by these user-defined tags or metadata. NTFS treats files as a collection of attribute/value pairs. The user-defined properties are simply stored as additional optional attributes on a file.

10.2.9.2 File Scan by Owner

Windows NT tracks user accounts by a unique security identifier (SID). NTFS in Windows 2000 can scan a volume and identify all files owned by a particular SID. This process facilitates administration; for example, when an intern leaves, the files owned by the intern can be quickly identified and cleaned up.

10.2.9.3 Improved Access Control List Checking

NTFS in Windows NT 4.0 kept access control lists (ACLs) on a per-file and per-directory basis. If a user had 50 files with identical ACLs, that ACL would be stored 50 times, once with each file. NTFS in Windows 2000 stores ACLs in a directory and indexes them as well. So for the scenario just described, the ACL would be stored just once, and each of the 50 files would have a "pointer" that would help identify the ACL.

This change facilitates bulk ACL checking, which is used by the Indexing Service. When a user performs a search, the Indexing Service prepares a list of files, and before returning the list to the user, it performs ACL checking and eliminates all files to which the user does not have access. Thus users see only the files that they can access.

Bulk ACL checking can be used in other scenarios as well. For example, it can be used to determine what a given user is allowed to do with a given set of files.

10.2.9.4 Journal Log File

NTFS tracks changes to the file system for two purposes: (1) to be able to recover in case of a disaster by first logging undo and redo information and (2) to offer application developers access to information that indicates which files or directories were changed and the nature of the change. The journal log file serves the second purpose: only the changes themselves are tracked, and the information in the journal log file is insufficient to implement an undo operation. The journal file survives across reboot operations. If the file becomes full or is deleted, applications can detect the fact that some change information has been lost and behave accordingly, for example, by rescanning all the directories and files of interest. This facility is extremely useful to a broad range of applications, such as replication and backup.
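
As a small illustration, the following sketch queries the state of the change journal (also called the USN journal) on the C: volume. The volume path is an assumption, and the caller is assumed to have sufficient rights to open the volume.

    /* Query the NTFS change (USN) journal state on C:. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        USN_JOURNAL_DATA jd;
        DWORD bytes;
        HANDLE hVol = CreateFileA("\\\\.\\C:", GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (hVol == INVALID_HANDLE_VALUE)
            return 1;

        if (DeviceIoControl(hVol, FSCTL_QUERY_USN_JOURNAL, NULL, 0,
                            &jd, sizeof(jd), &bytes, NULL)) {
            /* NextUsn is where the next change record will be written; an
               application can remember it and later read forward from it to
               see which files changed (but not what the old data was). */
            printf("journal ID %I64x, next USN %I64d\n",
                   jd.UsnJournalID, jd.NextUsn);
        }
        CloseHandle(hVol);
        return 0;
    }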

10.2.9.5 NTFS Stream Renaming

The Windows NT NTFS has always shipped with support for multiple data streams per file. One example of an application that uses multiple data streams is the Windows NT Macintosh server. Until Windows 2000, there was no way to rename a data stream, once it was created. One could create a new file with a new named data stream and then copy the contents of the old file to the new one, but this approach is rather inefficient. The version of NTFS that ships with Windows 2000 now provides an API to allow an application to rename an existing named data stream.

10.2.9.6 Object IDs and Link Tracking

Windows 2000 implements link tracking. Links can be shortcuts for files or OLE objects such as an Excel or PowerPoint document embedded within a file such as a Word file. An application can track a link even when the source object behind the link moves in various ways, including the following:

  • A document representing the link source is moved within the same volume on a Windows NT server.

  • A document representing the link source is moved between volumes on the same Windows NT server.

  • A document representing the link source is moved from one Windows NT server to another Windows NT server within the same domain.

  • A complete volume containing the link source is moved from one Windows NT server to another Windows NT server within the same domain.

  • A Windows NT server with a mounted volume that contains the link source is renamed.

  • A network share on a Windows NT server that contains the source of the link is renamed.

  • The document representing the link source is renamed.

  • Any combination of the above.

Basically, each file in Windows 2000 (and higher Windows NT versions) can have an optional unique object identifier. To track a file, an application refers to the file by its unique object identifier. When the file reference fails, a user mode link-tracking service is called for assistance (by Windows NT). The user mode service attempts to locate the file, using its object identifier in a trial-and-error fashion for all the scenarios just described.
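
A minimal sketch of how an application might obtain the object identifier used for link tracking follows. The file path is hypothetical; FSCTL_CREATE_OR_GET_OBJECT_ID returns the file's existing object ID or creates one if none exists.

    /* Obtain (or create) the object ID that link tracking uses to follow a file. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        FILE_OBJECTID_BUFFER oid;
        DWORD bytes;
        int i;

        /* Hypothetical file; opened read/write because creating a new
           object ID modifies the file's metadata. */
        HANDLE hFile = CreateFileA("C:\\reports\\budget.doc",
                                   GENERIC_READ | GENERIC_WRITE,
                                   FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            return 1;

        if (DeviceIoControl(hFile, FSCTL_CREATE_OR_GET_OBJECT_ID, NULL, 0,
                            &oid, sizeof(oid), &bytes, NULL)) {
            /* oid.ObjectId is the 16-byte identifier a shortcut records so
               that the link can be resolved after the file moves. */
            for (i = 0; i < 16; i++)
                printf("%02x", oid.ObjectId[i]);
            printf("\n");
        }
        CloseHandle(hFile);
        return 0;
    }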

10.2.9.7 NTFS Sparse Files

Windows NT 3.51 implemented file system compression. The implication was that a file with a lot of zeros as data would be compressed. However, compression and decompression do take time, and the compressed data still occupies some clusters on the disk. Consider a file that is logically 1GB long but contains only 4K of data at the very beginning and 4K of data at the very end. With Windows 2000, such a file, if marked as a sparse file, will occupy just 8K on the disk (assuming an appropriate disk cluster size).

NTFS organizes its internal data structures to recognize the fact that there are "missing data blocks," and when an application issues a read request for data within this region, NTFS zero-fills the application buffer. The application never realizes that there is anything special about the file when it does read or write operations on the file. Sparse files have a user-controlled attribute that can be set to indicate that the file is sparse. Applications that are not aware that the file is sparse will function normally, but the system will run a little less efficiently because some time will be spent returning zeros for data that is in unallocated blocks of the file. Applications that are aware of the sparse-file attribute can bypass this behavior.

File and directory enumeration APIs (e.g., FindFirstFile, FindNextFile) return the flag FILE_ATTRIBUTE_SPARSE_FILE. The Win32 APIs BackupRead, BackupWrite, CopyFile, and MoveFile have been updated to be sparse file aware (for example, CopyFile will preserve a file as a sparse file and does not do read or write operations that would turn the file into a nonsparse file).
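
The following sketch works through the 1GB example above: it marks a hypothetical file as sparse, writes 4K at each end, and then asks how much space is actually allocated. GetCompressedFileSize reports allocated rather than logical size for both compressed and sparse files.

    /* Create a 1GB sparse file that allocates only about 8K on disk. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE data[4096] = { 1 };      /* 4K of "real" data */
        DWORD bytes, hi = 0, lo;
        LARGE_INTEGER pos;

        HANDLE h = CreateFileA("C:\\temp\\sparse.dat",
                               GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        /* Mark the file sparse before writing, so unwritten regions stay
           unallocated rather than being backed by real clusters. */
        DeviceIoControl(h, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, &bytes, NULL);

        /* 4K at the very beginning ... */
        WriteFile(h, data, sizeof(data), &bytes, NULL);

        /* ... and 4K ending at the 1GB mark; the gap occupies no clusters. */
        pos.QuadPart = (1024LL * 1024 * 1024) - sizeof(data);
        SetFilePointerEx(h, pos, NULL, FILE_BEGIN);
        WriteFile(h, data, sizeof(data), &bytes, NULL);
        CloseHandle(h);

        lo = GetCompressedFileSizeA("C:\\temp\\sparse.dat", &hi);
        printf("allocated bytes: %I64u\n",
               ((unsigned __int64)hi << 32) | lo);
        return 0;
    }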

10.2.9.8 Disk Quotas

Disk quota tracking and management features are implemented in NTFS that ships with Windows 2000. Disk quotas are tracked on a per-user and per-volume basis. An administrator can

  • Turn the feature off

  • Use the feature simply to track usage

  • Use the feature to track usage and also enforce policies that limit the amount of disk storage a given user can consume

Disk quotas are tracked by user security identifier (SID), the unique identifier assigned to every user that logs on to a Windows NT machine (the user may also be another computer or a system account, which is simply a user with a different set of privileges than a human user has). Each file or directory records the SID of its owner. Disk quotas are set on a per-volume basis, and the administrative tools allow the policy to be replicated to other volumes.

Recall that Windows 2000 can also return a list of files on a per-user basis. This functionality is simply a subset of quota tracking, because quota tracking computes the total size of the storage used by the list of files returned by a per-user search.
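
One effect visible to ordinary applications is that, when a quota limit is enforced, the free space reported for the calling user shrinks to that user's remaining quota. The sketch below shows this with GetDiskFreeSpaceEx, whose first output parameter is the free space available to the caller and whose third is the true free space on the volume.

    /* Compare quota-limited free space with the volume's actual free space. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        ULARGE_INTEGER availToCaller, totalBytes, totalFree;
        if (GetDiskFreeSpaceExA("C:\\", &availToCaller, &totalBytes, &totalFree)) {
            printf("free to calling user: %I64u bytes\n", availToCaller.QuadPart);
            printf("free on volume      : %I64u bytes\n", totalFree.QuadPart);
        }
        return 0;
    }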

10.2.9.9 CHKDSK Improvements

Windows 2000 NTFS reduces the number of situations in which CHKDSK needs to run, and it significantly reduces the amount of time taken to run CHKDSK. The phrase "your mileage may vary" comes to mind in view of the fact that the exact amount of improvement depends on the size of the volume and the nature of the corruption, if any. For volumes with millions of files, however, an improvement that reduces the amount of time needed to run CHKDSK by a factor of 10 is quite possible.

10.2.9.10 Defragmentation

Windows 2000 ships with a defragmentation applet that is really a light version of Diskeeper from Executive Software, an ISV. Like Windows NT 4.0, Windows 2000 supports defragmentation APIs. The built-in defragmentation applet has some limitations that full-fledged defragmentation applications typically do not have. For example,

  • It cannot defragment the NTFS MFT (master file table) or the system paging file.

  • It cannot defragment directories.

  • It does not have Microsoft Cluster Server support.

10.2.10 Reparse Points

Reparse points represent a significant new architectural feature in NTFS and the Windows NT I/O subsystem. Note that reparse point implementation requires changes in the I/O subsystem and also the NTFS file system. It is conceivable that somebody could implement reparse points in a file system other than NTFS. In addition, reparse points provide the basis of some very important functionality. Thus, reparse points rate their own subsection, rather than being buried as an NTFS feature improvement.

Reparse points provide the foundation for implementing the following features:

  • Symbolic links

  • Directory junction points

  • Volume mount points

  • Single-instance storage

  • Remote storage (Hierarchical Storage Management)

We will look at these features in more detail in Sections 10.2.10.1 through 10.2.10.4.

A reparse point is an object on an NTFS directory or file. A reparse point can be created, manipulated, and deleted by an application using the Win32 API set in general, and CreateFile, ReadFile, and WriteFile in particular. Recall that the Win32 API set allows an application to create and manipulate user-defined attributes on a file or directory. Think of reparse points as simply user-defined attributes that are handled in a special manner. This special handling includes ensuring the uniqueness of some portions of the attribute object and extra processing in the I/O subsystem. An ISV would typically write the following:

  • Some user mode utilities to create, manage, and destroy reparse points

  • A file system filter driver that implements the reparse point-related functionality

Each reparse point consists of a tag and a data blob. The tag is a unique 32-bit tag that is assigned by Microsoft. ISVs can request that such a unique tag be assigned to them. Figure 10.1 shows the structure of the reparse tag, including the following elements:

  • A bit (M) indicating whether a tag is for a Microsoft device driver or not.

  • A bit (L) indicating whether the driver will incur a high latency to retrieve the first data byte. An example here is the HSM solution, in which retrieving data from offline media will incur a high latency.

  • A bit (N) indicating whether the file or directory is an alias or redirection for another file or directory.

  • Some reserved bits.

  • The actual 16-bit tag value.

Figure 10.1. Reparse Point Tag Structure


The data blob portion of the reparse point is up to 16K in size. NTFS will make this data blob available to the vendor-written device driver as part of the I/O subsystem operation handling reparse points.
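
The tag bits shown in Figure 10.1 are visible to applications and drivers through constants and macros in the Windows headers. The sketch below decodes a few of the Microsoft-assigned tags mentioned in this chapter; the bit positions follow the figure and the published header values.

    /* Decode the M, L, and N bits and the 16-bit value of some reparse tags. */
    #include <windows.h>
    #include <stdio.h>

    static void describe_tag(ULONG tag)
    {
        printf("tag 0x%08lx:%s%s%s value 0x%04lx\n",
               tag,
               (tag & 0x80000000) ? " Microsoft"      : "",   /* M bit */
               (tag & 0x40000000) ? " high-latency"   : "",   /* L bit */
               (tag & 0x20000000) ? " name-surrogate" : "",   /* N bit */
               tag & 0x0000FFFF);                             /* tag value */
    }

    int main(void)
    {
        describe_tag(IO_REPARSE_TAG_MOUNT_POINT);  /* mount points and junctions */
        describe_tag(IO_REPARSE_TAG_HSM);          /* Remote Storage (high latency) */
        describe_tag(IO_REPARSE_TAG_SIS);          /* Single Instance Storage */
        return 0;
    }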

To understand the sequence of operations and how reparse points are implemented, consider Figure 10.2. For simplicity, this discussion assumes that the user has the required privileges for the requested operation. In addition, Figure 10.2 shows only one file system filter driver, in the interest of keeping things simple and relevant.

Figure 10.2. Reparse Point Operation


The sequence of steps in creating reparse point functionality includes the following, as illustrated in Figure 10.2:

Step 1. Using the Win32 subsystem, an application makes a file open request.

Step 2. After some verification, the Win32 subsystem directs the request to the NT Executive subsystem.

Step 3. The Windows NT I/O Manager builds an IRP (IRP_MJ_CREATE, the create/open request) and sends it to the NTFS file system. The IRP is intercepted by the reparse point file system filter driver.

Step 4. The filter driver intercepts the IRP, specifies a completion routine that should be called when the IRP completes, and using the services of the I/O Manager, sends the IRP to the NTFS file system driver.

Step 5. The IRP reaches the file system. The file system looks at the IRP_MJ_CREATE request packet, locates the file or directory of interest, and notes the reparse point tag associated with it. NTFS puts the reparse point tag and data into the IRP and then fails the IRP with a special error code.

Step 6. The I/O subsystem now calls each filter driver (one at a time) that has registered a completion routine for the IRP. Each driver completion routine looks at the error and the reparse point tag in the IRP. If the driver does not recognize the tag as its own, it invokes the I/O Manager to call the next driver's I/O completion routine. Assume that one of the drivers recognizes the reparse point tag as its own. The driver can then use the data within the reparse point to resubmit the IRP with some changes based on the data in the reparse point; for example, the pathname may be changed before the IRP is resubmitted.

Step 7. NTFS completes the resubmitted IRP operation. A typical example might be that the pathname was changed and the open request succeeds. The I/O Manager completes the open request; each file system filter driver may then be invoked at its completion routine again. The driver notices that the open request succeeded and takes appropriate action. Finally, the IRP is completed, and the application gets back a handle to the file.

If no filter driver recognizes the reparse point tag, the file or directory open request fails.

Whereas some applications may need to be aware of reparse point functionality, other applications may not care and never even realize that a reparse point exists at all. For example, a Microsoft Office application simply opening a Word, PowerPoint, or Excel document may not care at all about reparse point functionality that redirects the open request to a different volume. However, some applications that walk a tree recursively may need to be aware of the possibility of having paths that create a loop. Applications can suppress the reparse point functionality by using the appropriate option (FILE_FLAG_OPEN_REPARSE_POINT) in CreateFile, DeleteFile, and RemoveDirectory API requests. This is how the reparse point data can be created, modified, or deleted. The GetVolumeInformation API returns the flag FILE_SUPPORTS_REPARSE_POINTS. The FindFirstFile and FindNextFile APIs return the flag FILE_ATTRIBUTE_REPARSE_POINT to indicate the presence of a reparse point.
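
For an application that does want to examine a reparse point itself, the sketch below opens a hypothetical junction with FILE_FLAG_OPEN_REPARSE_POINT, so that the reparse is not triggered, and reads the tag and data blob back with FSCTL_GET_REPARSE_POINT. The first two fields of the returned buffer (the tag and the data length) are common to Microsoft and third-party reparse point formats.

    /* Open a reparse point without triggering it and read its tag and data. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE buf[MAXIMUM_REPARSE_DATA_BUFFER_SIZE];   /* 16K upper bound */
        REPARSE_GUID_DATA_BUFFER *rp = (REPARSE_GUID_DATA_BUFFER *)buf;
        DWORD bytes;

        /* Hypothetical directory hosting a reparse point.
           FILE_FLAG_BACKUP_SEMANTICS is needed to open a directory handle. */
        HANDLE h = CreateFileA("C:\\mount\\cdrom", FILE_READ_ATTRIBUTES,
                               FILE_SHARE_READ, NULL, OPEN_EXISTING,
                               FILE_FLAG_OPEN_REPARSE_POINT |
                               FILE_FLAG_BACKUP_SEMANTICS, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        if (DeviceIoControl(h, FSCTL_GET_REPARSE_POINT, NULL, 0,
                            buf, sizeof(buf), &bytes, NULL)) {
            printf("reparse tag 0x%08lx, %u bytes of reparse data\n",
                   rp->ReparseTag, rp->ReparseDataLength);
        }
        CloseHandle(h);
        return 0;
    }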

All reparse points on an NTFS volume are indexed within a file called $Index that lives in the \$Extend directory. An application can thus quickly enumerate all reparse points that exist on a volume.

Note that this section describes reparse points as being integral to NTFS. Although it is true that the FAT file system does not support reparse points, an ISV or Microsoft could conceivably write another file system, different from NTFS, that also supported reparse points. Such a task would not be trivial, but note that reparse points need to be implemented in three areas:

  1. The file system (for example, NTFS)

  2. The I/O subsystem and the Win32 API set

  3. The tools and utilities

Microsoft has obviously done the necessary work in all three areas, and hence it is conceivable for a new file system to support reparse points as well.

Sections 10.2.10.1 through 10.2.10.4 describe applications of the reparse point mechanism.

10.2.10.1 Volume Mount Points

Windows NT 4.0 required that a drive letter be used to mount volumes or partitions. This constraint limited a system to having 26 volumes or partitions at the most. Windows 2000 allows mounting a volume without using a drive letter. The only limitations are as follows:

  • A volume may be mounted only on a local directory; that is, a volume cannot be mounted on a network share.

  • A volume may be mounted only on an empty directory.

  • This empty directory must be on an NTFS volume (only NTFS supports reparse points).

Applications accessing the directory that hosts the mount point do not notice anything special about the directory unless the application explicitly requests such information.

APIs have been added and modified to provide application support for volume mount points. GetVolumeInformation indicates via a flag whether the volume supports mount points. FindFirstVolumeMountPoint and FindNextVolumeMountPoint are used to find the volume mount points. FindVolumeMountPointClose is used to free up resources consumed by FindFirstVolumeMountPoint and FindNextVolumeMountPoint. GetVolumeNameForVolumeMountPoint returns the volume name to which a volume mount point resolves.
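
A minimal sketch of these APIs follows. It enumerates the mount points under the C: volume and resolves each one to the \\?\Volume{GUID}\ name it points at; the drive letter is an assumption, and error handling is abbreviated.

    /* Enumerate mount points under C:\ and resolve each to its target volume. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char mountPoint[MAX_PATH], target[MAX_PATH], full[MAX_PATH];

        HANDLE h = FindFirstVolumeMountPointA("C:\\", mountPoint, MAX_PATH);
        if (h == INVALID_HANDLE_VALUE)
            return 0;                      /* no mount points on this volume */

        do {
            /* GetVolumeNameForVolumeMountPoint wants the full mount point
               path, ending in a backslash. */
            _snprintf(full, MAX_PATH, "C:\\%s", mountPoint);
            if (GetVolumeNameForVolumeMountPointA(full, target, MAX_PATH))
                printf("%s -> %s\n", full, target);
        } while (FindNextVolumeMountPointA(h, mountPoint, MAX_PATH));

        FindVolumeMountPointClose(h);      /* free resources held by the search */
        return 0;
    }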

10.2.10.2 Directory Junction Points

Directory junction points are closely related to volume mount points. The difference is that whereas volume mount points resolve a directory to a new volume, directory junction points resolve a directory to a new directory that exists on the same local volume where the directory junction point itself resides. Directory junction points may be created by use of the linkd.exe tool that ships with the Windows 2000 Resource Kit.

10.2.10.3 Single Instance Storage

Single Instance Storage is a feature that can significantly reduce storage space requirements by detecting duplicate files and replacing the duplicate instances with a special link. Single Instance Storage is used with the Remote Installation Service (RIS), which allows Windows NT systems to boot from a network share.

With RIS, administrators typically create multiple images, for example, an image for the Accounting department, an image for the Engineering department, and so on. In each image, a lot of the same files are duplicated again and again. However, a simple link file will not serve the purpose here. If the administrator later decides that one image should be slightly different and changes a particular file, all other images with links that point to that file will also be affected. Enter SIS.

SIS replaces duplicate files via a sparse file with no allocated region and a reparse point on this sparse file. The reparse point has enough information to locate the backup file, which is the one file left in its original state. (If there are four files that are duplicates, one file is left as the backup file, and the other three are turned into sparse files with reparse points.) The size of the sparse file will still show the original size of the file. When an application opens a SIS sparse file, the filter driver intercepts the file open IRP request, opens the file backing the SIS file, and passes a handle to this other file back to the calling application. Thereafter, the application does its I/O on the backup file.

10.2.10.4 Remote Storage (Hierarchical Storage Management)

Remote Storage is a storage management service that provides functionality to move files transparently between online media (e.g., disk) and offline media (e.g., tape). When a file is moved to offline media, a stub file is left behind on the disk. The stub file uses NTFS reparse points to mark the file as special in some way and to indicate where exactly on offline storage the file data can be located. Remote Storage is also often referred to as Hierarchical Storage Management (HSM) because the service deals with two distinct tiers of storage.

The administrator can set some policies, such as the amount of disk space that should be kept free and the types of files that should be dealt with by Remote Storage Services. RSS periodically runs and checks whether the disk space use is in compliance with the policies set by the administrator. RSS periodically checks the last-accessed timestamp on files and locates files that have not been accessed for a while (typically 30 days). These files are then periodically copied to offline storage, but each file is still left on disk and marked as premigrated by setting an NTFS reparse point on the file. When the free disk space falls below a set value, RSS deletes the file data from the disk and modifies the reparse point data to indicate that the file has now moved from a premigrated state to a fully migrated state.

Remote Storage Services cannot deal with hidden, system, encrypted, or sparse files, or with files that have extended attributes. Remote Storage is well integrated with the rest of the operating system components. For example,

  • Remote Storage Services interacts with the Indexing Service.

  • Remote Storage Services uses the Windows 2000 job scheduler to schedule its backup jobs.

  • Windows backup can recognize files that have been migrated to offline media and refrain from recalling those files simply for the purpose of backing them up.

  • Remote Storage Services is also integrated with NTFS security; for example, RSS will recall a file from offline media only if a user is permitted access to the file.

  • The Explorer GUI is also Remote Storage aware. Files that have been migrated from disk to offline media are shown with a special icon.

  • The NTFS change log journal is also Remote Storage aware, and a flag indicates when a file moves between online and offline media. An application can use this flag to determine that the underlying file has not changed in any manner.

Figure 10.3 shows the Remote Storage (HSM) and Removable Storage Management (RSM) architecture. The HSM application uses the RSM API (described in Section 10.2.11). RSM itself is a user mode subsystem that has its own private database to keep track of the media and devices that it is managing.

Figure 10.3. HSM and RSM Architecture


10.2.11 Removable Storage Management

Removable Storage allows an application to manage all enterprise removable media, irrespective of whether the media are online or offline. Whereas online media could be in a variety of devices, including a media changer, tape jukebox, or robotic library, offline media are typically sitting on a shelf.

Figure 10.4 illustrates the architecture and advantage of using RSM. The application deals with a single interface provided by RSM. RSM itself deals with the intricacies of new devices and devices that come and go.

Figure 10.4. Removable Storage Management


RSM uses a concept of media pools to organize and manage media. A media pool represents a collection of media. Media pools are broadly categorized into two types: system and application. The system pool holds media for system-related purposes, as well as unrecognized and unassigned media. The system pool is further categorized as follows:

  • The free pool holds media that is currently not in use and is freely available for assignment.

  • The import pool holds newly added media that are recognized. Media in the import pool are later manipulated by applications and moved into the application pool (if the application deems the media to be of use) or to the free pool if the application determines that the media is no longer of any interest.

  • The unrecognized pool holds newly added media that has not been recognized. As with the import pool, an application may later move media into the application pool or the free pool.

Application pools hold media created by applications such as backup or Remote Storage.

10.2.12 Encrypting File System

Windows 2000 introduces the encrypting file system (EFS), which is really a service that is layered on top of NTFS. The service provides a mechanism to encrypt data before it is written to the disk by NTFS and decrypt data after it is read by NTFS but before it is passed back to the application that requested the data. EFS also provides a mechanism to manage the keys needed for the encryption, as well as recovery keys to allow an authorized entity other than the file owner to retrieve the data.

The data is encrypted by a symmetric cipher (in particular, a variation of DES), and the randomly generated key for this symmetric encryption is written to the file after it has been encrypted with an asymmetric cipher using the user's public key. To ensure that the file can be retrieved (e.g., when a disgruntled employee is fired and refuses to cooperate), the randomly generated symmetric cipher key is also encrypted with an authorized entity's public key and written to the file.

Encrypted files cannot be compressed; that is, either a file can be compressed or it can be encrypted, but not both. EFS ensures that the data remains encrypted even when somebody manages to access the disk directly without going through NTFS and its security mechanism that would disallow the access. EFS can be enabled on a per-file or per-directory basis.
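
From an application's point of view, per-file encryption can be requested with a single Win32 call. The sketch below encrypts a hypothetical file and then confirms the result through the file's attributes; EncryptFile and DecryptFile are exported by advapi32.

    /* Encrypt a file with EFS and verify the encrypted attribute. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        const char *path = "C:\\private\\notes.txt";   /* hypothetical file */
        DWORD attrs;

        if (!EncryptFileA(path)) {          /* requires an NTFS volume */
            printf("EncryptFile failed: %lu\n", GetLastError());
            return 1;
        }

        attrs = GetFileAttributesA(path);
        if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_ENCRYPTED))
            printf("%s is now encrypted\n", path);

        /* DecryptFileA(path, 0) would reverse the operation. */
        return 0;
    }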

More details about EFS are available in Chapter 6.

10.2.13 System File Protection

One of the problems plaguing Windows has been the fact that operating system files get corrupted or replaced. Application vendors often replace system files with either their own files or newer versions. This practice can lead to system management problems at best, and reliability problems at worst. Starting with Windows 2000, Microsoft introduced a feature called System File Protection.

This feature protects certain system files using a background mechanism. A list of monitored system files is stored in a cache folder, and a copy of each monitored file is also cached in that folder. If a file is not available in the cache, it is loaded from the installation media (the Windows CD). The background process computes a signature for each protected file and compares it to the signature of the cached copy. If the signatures do not match, the file is silently updated from the cached copy. The service pack and hot-fix installation utilities have the necessary code to update both the cached and the noncached copies of a file.


   