The New Technology File System | The Assembly Programming Master Book

NTFS is an advanced and sophisticated file system developed independently of FAT. In my opinion, this is one of the most advanced and perfect file systems. Because of this, I'll cover its structure in detail.

Similar to FAT, NTFS also stores files as chains of clusters. Clusters in NTFS can range in size from 512 bytes to 64 KB. The standard (default) cluster size is 4 KB. The main structure in NTFS is the Master File Table (MFT) file. It is necessary to point out that, according to the general NTFS concept, an NTFS volume contains nothing but files. A filename is limited to 255 characters, and a pathname cannot exceed 32,767 characters.

Figure 11.1: General structure of an NTFS volume

For MFT, the main NTFS file, an area at the beginning of the volume is allocated (the boot sector contains the number of the first MFT cluster). However, because MFT is also a file, it can be located anywhere. To avoid fragmenting this file (although it can be fragmented), disk space equal to 12% of the entire volume is reserved for it beforehand. If necessary, the operating system can shrink or enlarge this area and later return it to its initial size. This area is not a system area in the strict sense and is therefore included in the total amount of available disk space.

The MFT file contains records about each file in the system. The MFT record size is 1 KB. If one record is not sufficient for describing a file, other records are added. The first 16 files, whose records are located in the beginning of MFT, are system files. The names of these files start with the $ character. Table 11.3 provides information about these files. Note that the first record (numbering starts from 0) of the MFT file contains information about the MFT file itself.

Table 11.3: NTFS metafiles
Record No. in MFT	Name of system file	Comment
0	`$MFT`	Master file table.
1	`$MFTMIRR`	The copy of the first 16 records of the MFT. In a normal situation, this file resides directly in the middle of the volume.
2	`$LOGFILE`	Log file for system recovery. According to this file, the operating system can recover the file system of a damaged volume with a high level of probability.
3	`$VOLUME`	The volume file (volume label, file system version, size, etc.).
4	`$ATTRDEF`	This file contains the list of standard volume attributes.
5	`$.`	Root directory. Like any file, it can grow or shrink. Note that all system files are located in this directory.
6	`$BITMAP`	This file contains the bitmap for finding available space in the volume.
7	`$BOOT`	The bootstrap file.
8	`$BADCLUS`	The file that lists bad clusters.
9	`$SECURE`	Security information.
10	`$UPCASE`	Uppercase and lowercase mapping for the volume.
11	`$QUOTA`	The directory that contains files used for disk quotas.
12-15		Reserved records.
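
Because every MFT record occupies 1 KB and numbering starts from 0, the position of a record follows from simple arithmetic. The following is a minimal sketch in Python (it is not part of the book's listings). The function name and the sample volume parameters are hypothetical, and the sketch assumes that the MFT is stored contiguously, which, as noted above, is not guaranteed.

```python
MFT_RECORD_SIZE = 1024  # bytes per MFT record, as stated in the text


def mft_record_offset(mft_start_cluster, cluster_size, record_no):
    """Byte offset of an MFT record from the start of the volume.

    Assumption: the MFT is not fragmented, so record N simply lies
    N * 1 KB past the first MFT cluster named in the boot sector.
    """
    if record_no < 0:
        raise ValueError("record numbers start from 0")
    return mft_start_cluster * cluster_size + record_no * MFT_RECORD_SIZE


# Hypothetical volume: 4 KB clusters, MFT starting at cluster 4.
# Record 5 describes the root directory (see Table 11.3).
root_dir_offset = mft_record_offset(4, 4096, 5)
```

With these sample values, four MFT records fit into each cluster, so record 5 lies in the second cluster of the MFT.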

Now, consider the MFT file record. Every record consists of a record header followed by a sequence of attributes, each made up of an attribute header and a value. The record header contains the checksum; the file ordinal number, which is increased when the record is reused for another file; the file access counter; the number of bytes used in the record; and other fields. The record header is followed by the header of the first attribute and the value of this attribute. Then comes the header of the second attribute, and so on. If an attribute is too large to fit in the record, it is stored outside the MFT record, in separate clusters of the volume (a nonresident attribute). The important point is that if the amount of data stored in a file is small, the data are stored directly in the MFT record.

Table 11.4 lists the attributes.

Table 11.4: Attributes of the MFT records
Attribute	Description
Standard information (information attribute)	Information about the owner, security data, hard links counter, and bit attributes (read only, archive, etc.).
Filename	Filename in the Unicode format.
Security descriptor	This attribute has become obsolete; security information is now kept in the `$SECURE` file.
List of attributes	Location of additional MFT records, used when the attributes do not fit within a single record.
Object identifier	64-bit file identifier unique for this volume.
Reparse point	For creating hierarchical storage; this attribute instructs the procedure that processes the filename to carry out additional operations.
Volume name	Used in `$VOLUME`.
Information about the volume	Volume version (used in `$VOLUME`).
Root index	For directories.
Index layout	For large directories implemented as B-trees instead of normal lists.
Bit array	For large directories.
Logged utility stream	Controls logging to the `$LOGFILE` file.
Data	Flow of the file data; the header of this attribute is followed by the list of clusters where the data reside, or by the data themselves if their size doesn't exceed a few hundred bytes.

Thus, in NTFS, files are nothing but sets of attributes. Each attribute is represented as a flow (stream) of bytes. As you can see, one of the attributes is the data stored in the file, or the data flow. The file system allows you to add new attributes to a file, which can contain additional information.

NTFS implements many interesting technical solutions. One of them was already mentioned: a small file can reside entirely in a single MFT record. Another is that the operating system, when writing a file, always tries to place it in as few cluster chains (sections in which disk clusters follow each other directly) as possible. Groups of file clusters are described by special structures, records placed into MFT records. For example, a file made up of a single chain of clusters is described by one such record. The same can be said about a file that consists of a small number of cluster chains. Each cluster chain within a record is described by a pair of values: the cluster offset from the starting position and the number of clusters. The record header specifies the offset of the first cluster from the starting point of the file and the offset of the first cluster that goes beyond the limits of the current record.

Fig. 11.2 shows a schematic illustration of an MFT record for a file that consists of nine clusters. The record header specifies the offset of the first cluster of the file and the offset of the first cluster that doesn't fit within this record. This header is followed by two pairs of numbers that specify the continuous sequences of clusters. The first element of each pair is the offset of the cluster from the starting point of the disk space, and the second element is the number of clusters in the chain. Each such pair is also called a series. As you can see, in this case the file consists of two continuous chains and is specified by two series. Note that the numeric values defining the number of clusters and the cluster offset are 64-bit values in NTFS.
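
The way such a record is read can be sketched in a few lines of Python. This is only an illustration of the description above, not the on-disk format: the function name is hypothetical, and the sample pairs (20, 4) and (64, 5) are invented values for a nine-cluster file stored as two chains, not the values from the figure.

```python
def locate_cluster(first, beyond, series, file_cluster):
    """Map a cluster offset within the file to a cluster offset on disk.

    first, beyond -- the two header values: the offset of the first file
        cluster described by this record and of the first one beyond it.
    series -- list of (disk_offset, count) pairs, one per continuous chain.
    """
    if not first <= file_cluster < beyond:
        raise IndexError("cluster is not described by this record")
    position = first
    for disk_offset, count in series:
        if file_cluster < position + count:
            return disk_offset + (file_cluster - position)
        position += count
    raise ValueError("the series do not cover the range given in the header")


# Hypothetical nine-cluster file: header (0, 9) and two series.
header = (0, 9)
series = [(20, 4), (64, 5)]
```

For instance, file cluster 3 is the last cluster of the first chain, and file cluster 4 is the first cluster of the second chain.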

Figure 11.2: Example of an information record about the location of a file that consists of nine clusters

What will happen if the file is so fragmented that its chains cannot all be described by a single MFT record? In this case, several MFT records are used. Their numbers do not necessarily differ from each other by 1. To interrelate these records, a base record is used. In the first MFT record that starts this file, the base record precedes the record containing the descriptor of cluster chains. It also has a header followed by a list of the numbers of the MFT records that contain information about the location of the file data on the disk. All other MFT records have the same structure as shown in Fig. 11.2. A question might arise: What if the base record cannot fit within a single MFT record? In this case, it is placed into a separate file. According to the NTFS terminology, this makes the record nonresident.

Now, consider some other specific features of NTFS.

Directories in NTFS

A directory in NTFS is a special file that stores references to other files and directories, thus creating the hierarchical structure of the disk file system. Like a normal file, a directory can fit within an MFT record, provided that it isn't too large. Fig. 11.3 shows a schematic picture of an MFT record containing a small directory. Note that the information attribute contains information about the root directory. The directory entries themselves contain the filename length and some other parameters; the main item is the MFT index of the file. For large directories, a different data storage format is used. Such directories are built in the form of B-trees, which ensures fast alphabetic searching and allows the fast addition of new files.

Figure 11.3: Small directory entirely fits within an MFT record

NTFS File Compression

NTFS allows files to be stored in a compressed form. The compression mechanism is interesting and deserves a special mention. Compression is carried out in 16-cluster blocks. When writing data, the operating system tries to compress the first 16 clusters of a file, then the next 16-cluster block, and so on. If the system fails to compress a block, the block is written "as is." Suppose that you are compressing a 16-cluster block. Assume that the offset of the first cluster is 50 and that you achieved 25% compression. This means that instead of 16 clusters you now have only 12. For simplicity, assume that the clusters are sequential, thus forming a chain. The compressed cluster chain will be represented in the MFT record by two pairs of numbers, (50, 12) and (0, 4), instead of the single pair (50, 16). The second pair lets the operating system recognize which chains were compressed. As you can see, the file compression mechanism is simple and is built into the file system.
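
The bookkeeping in this example can be sketched as follows. This is a minimal Python illustration of the pairs described above, not of the compression algorithm itself; the function names are hypothetical.

```python
BLOCK = 16  # size of a compression block, in clusters


def describe_block(start, clusters_used):
    """Return the pairs that describe one 16-cluster block.

    start -- disk offset of the first cluster of the block
    clusters_used -- clusters the block occupies after compression
    """
    if not 0 < clusters_used <= BLOCK:
        raise ValueError("a block occupies between 1 and 16 clusters")
    if clusters_used == BLOCK:
        return [(start, BLOCK)]  # compression failed: stored "as is"
    # The second pair, with offset 0, accounts for the clusters saved.
    return [(start, clusters_used), (0, BLOCK - clusters_used)]


def is_compressed(pairs):
    """A block is compressed if its description ends with a (0, n) pair."""
    return len(pairs) == 2 and pairs[1][0] == 0
```

Calling describe_block(50, 12) yields the two pairs from the example, (50, 12) and (0, 4), whereas describe_block(50, 16) yields the single pair (50, 16).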

The standard method of determining which file system is used on a given partition is to call the GetVolumeInformation function. I won't describe this function in detail but will mention that its seventh parameter (out of eight) is the buffer into which the file system type is loaded when the function returns.

Reparse Points

A reparse point allows you to extend NTFS functionality. Reparse points were introduced in the NTFS version intended for use with Windows 2000. The developers made provision for several types of reparse points, including volume mount points, NTFS directory junction points, and HSM reparse points.

Volume mount points allow a volume to be bound to the directory without assigning a drive letter to that volume. Thus, several volumes can be joined and assigned a single drive letter. For example, assume that the mount point is C:\TEMP, and the D: volume is mounted to it. After that, all directories of that volume will be available through the mount point: C:\TEMP\ARH, C:\TEMP\PROGRAM, etc. When the user attempts to access, say, the C:\TEMP\PROGRAM\FC.EXE file, the system detects the mount point for the TEMP directory connected to the D: volume and then accesses the PROGRAM directory on that volume.
Directory junction points are similar to mount points. However, this mechanism is used for mounting directories instead of volumes. For instance, returning to the previous example, you can connect the PROGRAM directory to the TEMP directory and access the FC.EXE file in a similar way: C:\TEMP\PROGRAM\FC.EXE.
In HSM (hierarchical storage management), reparse points are used for moving rarely used files to a backup storage medium. When a file is moved, its contents are deleted and replaced with a reparse point. The reparse point data contain all the information needed for the HSM system to find the file on the backup medium. When the user accesses the file, the system processes its reparse point and determines that the file has been moved to the backup medium (and where it resides). After that, the system starts the mechanism for retrieving the file from the backup storage medium. After the file has been retrieved successfully, the reparse point is deleted automatically, and the file access procedure is repeated.

If the reparse point is related to a file or directory, NTFS creates a $Reparse attribute for it. This attribute stores the code and the reparse point data. Because of this mechanism, NTFS easily detects all reparse points on the volume.

File Searching

For searching files, Windows provides two functions: FindFirstFile and FindNextFile. They resemble the corresponding MS-DOS functions and, as in MS-DOS, operate in coordination. When the search is successful, the first function returns an identifier (a search handle), which the second function then uses to continue the search.

The first parameter of the FindFirstFile function is a pointer to the search string. The second parameter is a pointer to the structure that receives information about the files found. The FindNextFile function receives the identifier obtained by the first function as its first parameter, and its second parameter is a pointer to the same structure. An example illustrating the use of this structure is presented in Listing 11.1.

The difference between the Windows file-searching functions and the respective MS-DOS functions is that in Windows only the search mask (*.*, *.exe, etc.) is specified as input. If a file is found, the returned structure contains all the information about it, which makes it possible to decide whether this is the file you need. In MS-DOS, it was also necessary to specify the file attribute for the search.
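
The same search pattern (open a search, examine each entry, decide by its attributes, close the search) can be sketched in a high-level language. The following Python fragment is only an analogy: it uses the standard os.scandir and fnmatch modules rather than the Windows API, and the function name is hypothetical. Note that in fnmatch the mask that matches every name is "*", not "*.*".

```python
import fnmatch
import os


def find_files(directory, mask="*"):
    """Return (files, directories) in `directory` whose names match `mask`."""
    files, directories = [], []
    with os.scandir(directory) as entries:    # plays the role of the search handle
        for entry in entries:                 # each step resembles FindNextFile
            if not fnmatch.fnmatch(entry.name, mask):
                continue
            if entry.is_dir():                # corresponds to testing the directory attribute
                directories.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(files), sorted(directories)
```

Leaving the with block closes the search, which corresponds to calling FindClose.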

The program in Listing 11.1 searches for files in the specified directory. The program accepts one or two parameters or no parameters at all. If parameters are supplied, the first one is interpreted as the search directory. The program accounts for the presence or absence of a trailing backslash (allowed options are C:, C:\, C:\WINDOWS\, C:\WINDOWS\SYSTEM, etc.). The second parameter (in the program, it has the number 3 because parameter number 1 is the program name at the start of the command line), if present, specifies the search mask. If this parameter is missing, *.* is the default mask. Finally, if no parameters are specified, the program searches the current directory using the *.* mask. Note that you can easily extend this program and make a useful utility out of it. I hope that you will do this on your own. Comments about this program are provided after the listing.

Listing 11.1: A simple program that searches for files and displays the list of found files

    ; The FILES.ASM file
    .586P
    ; Flat memory model
    .MODEL FLAT, stdcall
    ; Constants
    STD_OUTPUT_HANDLE equ -11
    STD_INPUT_HANDLE  equ -10
    ; Prototypes of external procedures
    EXTERN wsprintfA:NEAR
    EXTERN CharToOemA@8:NEAR
    EXTERN GetStdHandle@4:NEAR
    EXTERN WriteConsoleA@20:NEAR
    EXTERN ReadConsoleA@20:NEAR
    EXTERN ExitProcess@4:NEAR
    EXTERN GetCommandLineA@0:NEAR
    EXTERN lstrcatA@8:NEAR
    EXTERN FindFirstFileA@8:NEAR
    EXTERN FindNextFileA@8:NEAR
    EXTERN FindClose@4:NEAR
    ;-----------------------------------------------
    ; The structure for file searching
    ; using the FindFirstFile and FindNextFile functions
    _FIND STRUC
    ; File attribute
        ATR     DWORD ?
    ; File creation time
        CRTIME  DWORD ?
                DWORD ?
    ; File access time
        ACTIME  DWORD ?
                DWORD ?
    ; File modification time
        WRTIME  DWORD ?
                DWORD ?
    ; File size
        SIZEH   DWORD ? ; Most significant part
        SIZEL   DWORD ? ; Least significant part
    ; Reserved
                DWORD ?
                DWORD ?
    ; Long filename
        NAM     DB 260 DUP(0)
    ; Short filename
        ANAM    DB 14 DUP(0)
    _FIND ENDS
    ;----------------------------------------------
    ; INCLUDELIB directives for linking libraries
    includelib c:\masm32\lib\user32.lib
    includelib c:\masm32\lib\kernel32.lib
    ;----------------------------------------------
    ; Data segment
    _DATA SEGMENT
        BUF     DB 0
                DB 100 dup(0)
        LENS    DWORD ? ; Number of displayed characters
        HANDL   DWORD ?
        HANDL1  DWORD ?
        MASKA   DB "*.*", 0
        AP      DB "\", 0
        FIN     _FIND <0>
        TEXT    DB "Press <ENTER> to continue", 13, 10, 0
        BUFIN   DB 10 DUP(0)
        FINDH   DWORD ?
        NUM     DB 0
        NUMF    DWORD 0 ; Files counter
        NUMD    DWORD 0 ; Directories counter
        FORM    DB "Files found: %lu", 0
        FORM1   DB "Directories found: %lu", 0
        BUFER   DB 100 DUP(?)
        DIR     DB " <DIR>", 0
        PAR     DB 0 ; Number of parameters
    _DATA ENDS
    ; Code segment
    _TEXT SEGMENT
    START:
    ; Get the output handle
        PUSH  STD_OUTPUT_HANDLE
        CALL  GetStdHandle@4
        MOV   HANDL, EAX
    ; Get the input handle HANDL1
        PUSH  STD_INPUT_HANDLE
        CALL  GetStdHandle@4
        MOV   HANDL1, EAX
    ; Convert strings for output
        PUSH  OFFSET TEXT
        PUSH  OFFSET TEXT
        CALL  CharToOemA@8
        PUSH  OFFSET FORM
        PUSH  OFFSET FORM
        CALL  CharToOemA@8
        PUSH  OFFSET FORM1
        PUSH  OFFSET FORM1
        CALL  CharToOemA@8
    ; Get the number of parameters
        CALL  NUMPAR
        MOV   PAR, AL
    ; Search the current directory if only one parameter is passed
        CMP   EAX, 1
        JE    NO_PAR
    ;--------------------
    ; Get the parameter with EDI number
        MOV   EDI, 2
        LEA   EBX, BUF
        CALL  GETPAR
        PUSH  OFFSET BUF
        CALL  LENSTR
    ; Add the trailing backslash if it is missing
        CMP   BYTE PTR [BUF+EBX-1], "\"
        JE    NO_SL ; Already present; still check for the mask
        PUSH  OFFSET AP
        PUSH  OFFSET BUF
        CALL  lstrcatA@8
    NO_SL:
    ; Is there a parameter specifying the search mask?
        CMP   PAR, 3
        JB    NO_PAR
    ; Get the search mask
        MOV   EDI, 3
        LEA   EBX, MASKA
        CALL  GETPAR
    NO_PAR:
    ;--------------------
        CALL  FIND
    ; Display the number of files
        PUSH  NUMF
        PUSH  OFFSET FORM
        PUSH  OFFSET BUFER
        CALL  wsprintfA
        ADD   ESP, 12 ; wsprintfA is cdecl: the caller removes the arguments
        LEA   EAX, BUFER
        MOV   EDI, 1
        CALL  WRITE
    ; Display the number of directories
        PUSH  NUMD
        PUSH  OFFSET FORM1
        PUSH  OFFSET BUFER
        CALL  wsprintfA
        ADD   ESP, 12
        LEA   EAX, BUFER
        MOV   EDI, 1
        CALL  WRITE
    _END:
        PUSH  0
        CALL  ExitProcess@4
    ;***********************
    ; Procedures
    ;***********************
    ; Display a string (terminated by a line feed character)
    ; EAX --- To the string beginning
    ; EDI --- With or without a line feed
    WRITE PROC
    ; Get the parameter length (LENSTR returns it in EBX)
        PUSH  EAX
        CALL  LENSTR
        MOV   ESI, EAX
        CMP   EDI, 1
        JNE   NO_ENT
    ; Terminated by the line feed
        MOV   BYTE PTR [EBX+ESI], 13
        MOV   BYTE PTR [EBX+ESI+1], 10
        MOV   BYTE PTR [EBX+ESI+2], 0
        ADD   EBX, 2
    NO_ENT:
    ; String output
        PUSH  0
        PUSH  OFFSET LENS
        PUSH  EBX
        PUSH  EAX
        PUSH  HANDL
        CALL  WriteConsoleA@20
        RET
    WRITE ENDP
    ; Procedure for determining the string length
    ; String in [EBP+08H]
    ; Length in EBX
    LENSTR PROC
        PUSH  EBP
        MOV   EBP, ESP
        PUSH  EAX
        PUSH  EDI
    ;---------------------
        CLD
        MOV   EDI, DWORD PTR [EBP+08H]
        MOV   EBX, EDI
        MOV   ECX, 100 ; Limit the string length
        XOR   AL, AL
        REPNE SCASB ; Find the 0 character
        SUB   EDI, EBX ; String length including the 0 character
        MOV   EBX, EDI
        DEC   EBX
    ;---------------------
        POP   EDI
        POP   EAX
        POP   EBP
        RET   4
    LENSTR ENDP
    ; Procedure for determining the number of parameters
    ; in the string
    ; Determine the number of parameters (->EAX)
    NUMPAR PROC
        CALL  GetCommandLineA@0
        MOV   ESI, EAX ; Pointer to the string
        XOR   ECX, ECX ; Counter
        MOV   EDX, 1 ; Indicator
    L1:
        CMP   BYTE PTR [ESI], 0
        JE    L4
        CMP   BYTE PTR [ESI], 32
        JE    L3
        ADD   ECX, EDX ; Parameter number
        MOV   EDX, 0
        JMP   L2
    L3:
        OR    EDX, 1
    L2:
        INC   ESI
        JMP   L1
    L4:
        MOV   EAX, ECX
        RET
    NUMPAR ENDP
    ; Get the parameter from the command line
    ; EBX --- Points to the buffer where the parameter
    ; will be loaded
    ; Zero-terminated string is loaded into the buffer
    ; EDI --- Number of the parameter
    GETPAR PROC
        CALL  GetCommandLineA@0
        MOV   ESI, EAX ; Pointer to the string
        XOR   ECX, ECX ; Counter
        MOV   EDX, 1 ; Indicator
    L1:
        CMP   BYTE PTR [ESI], 0
        JE    L4
        CMP   BYTE PTR [ESI], 32
        JE    L3
        ADD   ECX, EDX ; Number of the parameter
        MOV   EDX, 0
        JMP   L2
    L3:
        OR    EDX, 1
        JMP   L5 ; A blank is a delimiter; do not copy it
    L2:
        CMP   ECX, EDI
        JNE   L5
        MOV   AL, BYTE PTR [ESI]
        MOV   BYTE PTR [EBX], AL
        INC   EBX
    L5:
        INC   ESI
        JMP   L1
    L4:
        MOV   BYTE PTR [EBX], 0
        RET
    GETPAR ENDP
    ; Searching for files in a directory and their output
    ; Directory name in BUF
    FIND PROC
    ; Path with a mask
        PUSH  OFFSET MASKA
        PUSH  OFFSET BUF
        CALL  lstrcatA@8
    ; The search starts here
        PUSH  OFFSET FIN
        PUSH  OFFSET BUF
        CALL  FindFirstFileA@8
        CMP   EAX, -1
        JE    _ERR
    ; Save the search descriptor
        MOV   FINDH, EAX
    LF:
    ; Exclude "." and ".." "files"
        CMP   BYTE PTR FIN.NAM, "."
        JE    _NO
    ; Is this a directory?
        TEST  BYTE PTR FIN.ATR, 10H
        JE    NO_DIR
        PUSH  OFFSET DIR
        PUSH  OFFSET FIN.NAM
        CALL  lstrcatA@8
        INC   NUMD
        DEC   NUMF
    NO_DIR:
    ; Convert the string
        PUSH  OFFSET FIN.NAM
        PUSH  OFFSET FIN.NAM
        CALL  CharToOemA@8
    ; Results output
        LEA   EAX, FIN.NAM
        MOV   EDI, 1
        CALL  WRITE
    ; Increase the counters
        INC   NUMF
        INC   NUM
    ; Page end?
        CMP   NUM, 22
        JNE   _NO
        MOV   NUM, 0
    ; Wait for the input string
        MOV   EDI, 0
        LEA   EAX, TEXT
        CALL  WRITE
        PUSH  0
        PUSH  OFFSET LENS
        PUSH  10
        PUSH  OFFSET BUFIN
        PUSH  HANDL1
        CALL  ReadConsoleA@20
    _NO:
    ; Continue the search
        PUSH  OFFSET FIN
        PUSH  FINDH
        CALL  FindNextFileA@8
        CMP   EAX, 0
        JNE   LF
    ; Terminate the search
        PUSH  FINDH
        CALL  FindClose@4
    _ERR:
        RET
    FIND ENDP
    _TEXT ENDS
    END START

The program in Listing 11.1 is simple. The only new feature that you'll find here is the use of the FindFirstFile and FindNextFile functions. Procedures for working with command-line parameters were encountered before. Information is output into the current console; this feature, too, has been encountered before. To get the console descriptor, the GetStdHandle function is used. The WRITE procedure simplifies the code sections responsible for screen output. Earlier, I promised that string API functions would be paid the attention they deserve. In this program, I have kept my word. Along with custom string procedures, this program makes use of the lstrcat string function, which concatenates strings. As for the command-line parameters, note that if the directory name contains blanks, the name must be specified in abbreviated (short) form, for example, C:\PROGRA~1 instead of C:\PROGRAM FILES. The reason is straightforward: blanks serve as parameter delimiters. To solve the problem properly, it is necessary to introduce a special delimiter for parameters, for example, ^ or /.
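
The effect of treating blanks as delimiters is easy to see in a short model of the scanning logic. The following Python sketch mirrors the idea used by the NUMPAR and GETPAR procedures (a flag marks that a new parameter starts after a blank); the function name and the sample command lines are hypothetical.

```python
def split_params(command_line):
    """Split a command line on blanks, parameter by parameter."""
    params = []
    in_gap = True                 # corresponds to the indicator kept in EDX
    for ch in command_line:
        if ch == " ":
            in_gap = True         # a blank ends the current parameter
        elif in_gap:
            params.append(ch)     # the first character of a new parameter
            in_gap = False
        else:
            params[-1] += ch
    return params


# A directory name with a blank is split into two parameters.
broken = split_params("files.exe C:\\PROGRAM FILES")
```

Here the single directory name becomes two parameters, C:\PROGRAM and FILES, so the second half would be taken for the search mask.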

The program in Listing 11.1 searches either the current directory or the specified directory. If this program were written in a high-level programming language, such as C, it could easily be modified to search the directory tree. Only the FIND procedure, which would have to be called recursively, would require a minor modification. This ease comes from the presence of local variables in high-level languages. Well, try to implement the same thing based on the materials presented in Chapter 2. Is it possible to achieve the same goal without using local variables?

Note

The length of the first parameter of the FindFirstFile API function cannot exceed the value of the MAX_PATH constant, which is equal to 260. If you need to use longer strings, it is necessary to use the Unicode version of this function (which has the W suffix). In this case, the string length can reach approximately 32,000 characters. However, do not forget to convert the string into Unicode format and precede it with the \\?\ prefix.
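
The rule from this note can be sketched as a small helper. This is only an illustration in Python: the function name is hypothetical, and the sketch assumes an absolute path that starts with a drive letter (network paths need a different form of the prefix, which is not handled here).

```python
MAX_PATH = 260
# The prefix is: backslash, backslash, question mark, backslash.
PREFIX = "\\\\?\\"


def to_long_path(path):
    """Add the long-path prefix when `path` is too long for MAX_PATH.

    MAX_PATH counts the terminating zero, so at most 259 characters
    fit without the prefix.
    """
    if path.startswith(PREFIX) or len(path) < MAX_PATH:
        return path
    return PREFIX + path
```

The resulting string would then be converted to Unicode and passed to the W version of the function.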

The program presented in Listing 11.2 is similar to the previous program. However, it searches the entire directory tree, starting from the specified directory. This is one of the most complicated programs presented in this book. Therefore, I strongly recommend that you study it carefully. I hope that you'll be able to improve it, and I'd like to give you a direction in which you can work. The second command-line parameter allows you to specify the search mask. If you specify, for example, the *.EXE mask, it is applied not only to files but also to directories. This is an obvious drawback and the first candidate for elimination.

The directory tree can be easily searched recursively; however, for this purpose, you need local variables. The point of using local variables in a recursive algorithm is that part of the data must be preserved when control returns from a nested call.

In the program under consideration, for simplicity, I abandoned the LENSTR procedure and decided to use the lstrlen API function instead. In addition, I improved the output to display the fully qualified filename on the screen.

Listing 11.2: Example program that recursively searches the directory tree

    ; The FILES.ASM file
    .586P
    ; Flat memory model
    .MODEL FLAT, stdcall
    ; Constants
    STD_OUTPUT_HANDLE equ -11
    STD_INPUT_HANDLE  equ -10
    ; Prototypes of external procedures
    EXTERN  wsprintfA:NEAR
    EXTERN  CharToOemA@8:NEAR
    EXTERN  GetStdHandle@4:NEAR
    EXTERN  WriteConsoleA@20:NEAR
    EXTERN  ReadConsoleA@20:NEAR
    EXTERN  ExitProcess@4:NEAR
    EXTERN  GetCommandLineA@0:NEAR
    EXTERN  lstrcatA@8:NEAR
    EXTERN  lstrcpyA@8:NEAR
    EXTERN  lstrlenA@4:NEAR
    EXTERN  FindFirstFileA@8:NEAR
    EXTERN  FindNextFileA@8:NEAR
    EXTERN  FindClose@4:NEAR
    ;------------------------------------
    ; The structure for file search
    ; using the FindFirstFile and FindNextFile functions
    _FIND STRUC
    ; File attribute
        ATR    DWORD ?
    ; File creation time
        CRTIME DWORD ?
               DWORD ?
    ; File access time
        ACTIME DWORD ?
               DWORD ?
    ; File modification time
        WRTIME DWORD ?
               DWORD ?
    ; File size
        SIZEH  DWORD ?
        SIZEL  DWORD ?
    ; Reserved
               DWORD ?
               DWORD ?
    ; Long filename
        NAM    DB 260 DUP(0)
    ; Short filename
        ANAM   DB 14 DUP(0)
    _FIND ENDS
    ;-----------------------------------------
    ; INCLUDELIB directives for the linker
    includelib c:\masm32\lib\user32.lib
    includelib c:\masm32\lib\kernel32.lib
    ;-----------------------------------------
    ; Data segment
    _DATA SEGMENT
        BUF    DB 0
               DB 100 dup(0)
        LENS   DWORD ? ; Number of output characters
        HANDL  DWORD ?
        HANDL1 DWORD ?
        MASKA  DB "*.*"
               DB 50 DUP(0)
        AP     DB "\", 0
        FIN    _FIND <0>
        TEXT   DB "Press <Enter> to continue", 13, 10, 0
        BUFIN  DB 10 DUP(0) ; Input buffer
        NUM    DB 0
        NUMF   DWORD 0 ; Files counter
        NUMD   DWORD 0 ; Directories counter
        FORM   DB "Number of files found: %lu", 0
        FORM1  DB "Number of directories found: %lu", 0
        DIRN   DB " <DIR>", 0
        PAR    DWORD 0
        PRIZN  DB 0
    _DATA ENDS
    ; Code segment
    _TEXT SEGMENT
    START:
    ; Get the output handle
        PUSH STD_OUTPUT_HANDLE
        CALL GetStdHandle@4
        MOV  HANDL, EAX
    ; Get the input handle HANDL1
        PUSH STD_INPUT_HANDLE
        CALL GetStdHandle@4
        MOV  HANDL1, EAX
    ; Convert strings for output
        PUSH OFFSET TEXT
        PUSH OFFSET TEXT
        CALL CharToOemA@8
        PUSH OFFSET FORM
        PUSH OFFSET FORM
        CALL CharToOemA@8
        PUSH OFFSET FORM1
        PUSH OFFSET FORM1
        CALL CharToOemA@8
    ; Get the number of parameters
        CALL NUMPAR
        MOV  PAR, EAX
    ; If there is only one parameter, search the current directory
        CMP  EAX, 1
        JE   NO_PAR
    ;--------------------------------------
    ; Get the parameter with the EDI number
        MOV  EDI, 2
        LEA  EBX, BUF
        CALL GETPAR
        CMP  PAR, 3
        JB   NO_PAR
    ; Get the search mask
        MOV  EDI, 3
        LEA  EBX, MASKA
        CALL GETPAR
    NO_PAR:
    ;-------------------------
        PUSH OFFSET BUF
        CALL FIND
    ; Output the number of files
        PUSH NUMF
        PUSH OFFSET FORM
        PUSH OFFSET BUF
        CALL wsprintfA
        ADD  ESP, 12 ; wsprintfA is cdecl: the caller removes the arguments
        LEA  EAX, BUF
        MOV  EDI, 1
        CALL WRITE
    ;++++++++++++++++
    ; Output the number of directories
        PUSH NUMD
        PUSH OFFSET FORM1
        PUSH OFFSET BUF
        CALL wsprintfA
        ADD  ESP, 12
        LEA  EAX, BUF
        MOV  EDI, 1
        CALL WRITE
    _END:
        PUSH 0
        CALL ExitProcess@4
    ; Procedures
    ;****************************************
    ; Output the string (terminated with line feed)
    ; EAX --- To the beginning of the string
    ; EDI --- With or without line feed
    WRITE PROC
    ; Get the parameter length
        PUSH EAX
        PUSH EAX
        CALL lstrlenA@4
        MOV  ESI, EAX
        POP  EBX
        CMP  EDI, 1
        JNE  NO_ENT
    ; Line feed in the end
        MOV  BYTE PTR [EBX+ESI], 13
        MOV  BYTE PTR [EBX+ESI+1], 10
        MOV  BYTE PTR [EBX+ESI+2], 0
        ADD  EAX, 2
    NO_ENT:
    ; String output
        PUSH 0
        PUSH OFFSET LENS
        PUSH EAX
        PUSH EBX
        PUSH HANDL
        CALL WriteConsoleA@20
        RET
    WRITE ENDP
    ; Procedure for determining the number of parameters
    ; Determine the number of parameters (->EAX)
    NUMPAR PROC
        CALL GetCommandLineA@0
        MOV  ESI, EAX ; Pointer to the string
        XOR  ECX, ECX ; Counter
        MOV  EDX, 1 ; Indicator
    L1:
        CMP  BYTE PTR [ESI], 0
        JE   L4
        CMP  BYTE PTR [ESI], 32
        JE   L3
        ADD  ECX, EDX ; Parameter number
        MOV  EDX, 0
        JMP  L2
    L3:
        OR   EDX, 1
    L2:
        INC  ESI
        JMP  L1
    L4:
        MOV  EAX, ECX
        RET
    NUMPAR ENDP
    ; Get the parameter from the command line
    ; EBX --- Points to the buffer, in which to load the parameter
    ; Zero-terminated string is loaded into the buffer
    ; EDI --- Parameter number
    GETPAR PROC
        CALL GetCommandLineA@0
        MOV  ESI, EAX ; Pointer to the string
        XOR  ECX, ECX ; Counter
        MOV  EDX, 1 ; Indicator
    L1:
        CMP  BYTE PTR [ESI], 0
        JE   L4
        CMP  BYTE PTR [ESI], 32
        JE   L3
        ADD  ECX, EDX ; Parameter number
        MOV  EDX, 0
        JMP  L2
    L3:
        OR   EDX, 1
        JMP  L5 ; A blank is a delimiter; do not copy it
    L2:
        CMP  ECX, EDI
        JNE  L5
        MOV  AL, BYTE PTR [ESI]
        MOV  BYTE PTR [EBX], AL
        INC  EBX
    L5:
        INC  ESI
        JMP  L1
    L4:
        MOV  BYTE PTR [EBX], 0
        RET
    GETPAR ENDP
    ;-------------------------------
    ; Searching files in the directory and sending them for output
    FINDH   EQU  [EBP-4]   ; Search descriptor
    DIRS    EQU  [EBP-304] ; Fully qualified filename
    DIRSS   EQU  [EBP-604] ; For storing the directory
    DIRV    EQU  [EBP-904] ; For temporary storage
    DIR     EQU  [EBP+8]   ; Parameter - Directory name
    FIND PROC
        PUSH EBP
        MOV  EBP, ESP
        SUB  ESP, 904
    ; Initializing local variables
        MOV  ECX, 300
        MOV  AL, 0
        MOV  EDI, 0
    CLR:
        MOV  BYTE PTR DIRS+[EDI], AL
        MOV  BYTE PTR DIRSS+[EDI], AL
        MOV  BYTE PTR DIRV+[EDI], AL
        INC  EDI
        LOOP CLR
    ; Determine the path length
        PUSH DIR
        CALL lstrlenA@4
        MOV  EBX, EAX
        MOV  EDI, DIR
        CMP  BYTE PTR [EDI], 0
        JE   _OK
    ; Add the trailing backslash if it is missing
        CMP  BYTE PTR [EDI+EBX-1], "\"
        JE   _OK
        PUSH OFFSET AP
        PUSH DIR
        CALL lstrcatA@8
    _OK:
    ; Store the directory
        PUSH DIR
        LEA  EAX, DIRSS
        PUSH EAX
        CALL lstrcpyA@8
    ; Path with the mask
        PUSH OFFSET MASKA
        PUSH DIR
        CALL lstrcatA@8
    ; Search starts here
        PUSH OFFSET FIN
        PUSH DWORD PTR DIR
        CALL FindFirstFileA@8
        CMP  EAX, -1
        JE   _ERR
    ; Save the search descriptor
        MOV  FINDH, EAX
    LF:
    ; Exclude the "files" "." and ".."
        CMP  BYTE PTR FIN.NAM, "."
        JE   _FF
    ;-------------------------
        LEA  EAX, DIRSS
        PUSH EAX
        LEA  EAX, DIRS
        PUSH EAX
        CALL lstrcpyA@8
    ;-------------------------
        PUSH OFFSET FIN.NAM
        LEA  EAX, DIRS
        PUSH EAX
        CALL lstrcatA@8
    ; Is this a directory?
        TEST BYTE PTR FIN.ATR, 10H
        JE   NO_DIR
    ; Add the <DIR> string
        PUSH OFFSET DIRN
        LEA  EAX, DIRS
        PUSH EAX
        CALL lstrcatA@8
    ; Increase the counters
        INC  NUMD
        DEC  NUMF
    ; Set the directory attribute
        MOV  PRIZN, 1
    ; Display the directory name
        LEA  EAX, DIRS
        PUSH EAX
        CALL OUTF
        JMP  _NO
    NO_DIR:
    ; Display the filename
        LEA  EAX, DIRS
        PUSH EAX
        CALL OUTF
    ; Indicator of the file (not a directory)
        MOV  PRIZN, 0
    _NO:
        CMP  PRIZN, 0
        JZ   _F
    ; Directory, preparing a recursive call
        LEA  EAX, DIRSS
        PUSH EAX
        LEA  EAX, DIRV
        PUSH EAX
        CALL lstrcpyA@8
        PUSH OFFSET FIN.NAM
        LEA  EAX, DIRV
        PUSH EAX
        CALL lstrcatA@8
    ; Calling
        LEA  EAX, DIRV
        PUSH EAX
        CALL FIND
    ; Continue the search
    _F:
        INC  NUMF
    _FF:
        PUSH OFFSET FIN
        PUSH FINDH
        CALL FindNextFileA@8
        CMP  EAX, 0
        JNE  LF
    ; Close the search descriptor
        PUSH FINDH
        CALL FindClose@4
    _ERR:
        MOV  ESP, EBP
        POP  EBP
        RET  4
    FIND ENDP
    ;----------------------------
    ; Page output of the names of found files
    STRN    EQU  [EBP+8]
    OUTF PROC
        PUSH EBP
        MOV  EBP, ESP
    ; Convert the string
        PUSH STRN
        PUSH STRN
        CALL CharToOemA@8
    ; Output of the results
        MOV  EAX, STRN
        MOV  EDI, 1
        CALL WRITE
        INC  NUM
    ; End of page?
        CMP  NUM, 22
        JNE  NO
        MOV  NUM, 0
    ; Wait for string input
        MOV  EDI, 0
        LEA  EAX, TEXT
        CALL WRITE
        PUSH 0
        PUSH OFFSET LENS
        PUSH 10
        PUSH OFFSET BUFIN
        PUSH HANDL1
        CALL ReadConsoleA@20
    NO:
        POP  EBP
        RET  4
    OUTF ENDP
    _TEXT ENDS
    END START

The role of the local variables in the FIND procedure needs some explanation. The FINDH variable stores the search descriptor for the current directory. FIND can call itself recursively before the search in the current directory has been completed, and that search must continue after the recursive call returns. This is possible only if the old value of the descriptor is still available. A local variable guarantees this, because each call of FIND has its own copy on the stack, and that copy is destroyed only when the call returns to the lower level (to the parent directory).

The DIRSS variable plays a similar role. It stores the current directory. This is important because the fully qualified filename is formed using this variable.

The DIRS and DIRV variables play an auxiliary role. In principle, they could be replaced by global variables. In this respect, bear in mind that, although global variables are generally undesirable in recursive algorithms, from the efficiency point of view the less stack space the local variables require, the better: every nested call of FIND reserves 904 bytes on the stack.

Here is another aspect that I want to explain: The DIRV variable is used for passing the directory name in the recursive call. Why can't you use the DIRSS variable for the same purpose? The point is that the procedure receives a pointer to the string, not a copy of it. FIND modifies the string referenced by its DIR parameter (it appends the backslash and the search mask), so passing the address of DIRSS would change the DIRSS variable of the calling level of recursion. Naturally, this isn't the goal.

The Program Translation Using TASM

The main problem with translating the programs presented in Listings 11.1 and 11.2 relates to local labels. A local label is a label that is visible only within a certain block of the program. In this case, such a block is the procedure. MASM automatically recognizes labels located within a specific procedure and interprets them as local labels. Therefore, no problems arise when labels with the same names are encountered in different procedures. TASM uses a slightly different approach: By default, labels are considered global. Thus, local labels must be preceded by the @@ prefix. Furthermore, it is necessary to insert the LOCALS directive at the start of the procedure. Having inserted the LOCALS directive and marked the required labels as local, you'll easily convert the program to a format acceptable to TASM. Also, do not forget to convert wsprintfA to _wsprintfA.

^[i] This is certainly an elegant solution that provides some redundancy of information. Under some conditions, this redundancy allows you to recover a damaged file system.

^[i] Do not confuse MFT file records and file records describing positions of the file clusters on the disk.

^[i] Naturally, it is possible to do without local variables. For example, it is possible to store the data in a global array and access the required elements of this array depending on the recursion level.