NTFS is an advanced and sophisticated file system developed independently of FAT. In my opinion, it is one of the most advanced and best-designed file systems, which is why I'll cover its structure in detail.
Like FAT, NTFS stores files as chains of clusters. An NTFS cluster can range in size from 512 bytes to 64 KB; the standard (default) cluster size is 4 KB. The main NTFS structure is the Master File Table (MFT) file. Note that, according to the general NTFS concept, a volume contains nothing but files: even the file system's own service structures are stored as files. A filename is limited to 255 characters, and a pathname cannot exceed 32,767 characters.
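To make the cluster arithmetic concrete, here is a small C sketch. It assumes that the file's data actually occupy clusters; as explained below, a small file may instead be stored entirely in its MFT record. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clusters needed for a file of `size` bytes on a volume whose clusters
   are `cluster_size` bytes long. The result is rounded up, because a
   partly used cluster is still allocated in full. */
uint64_t clusters_needed(uint64_t size, uint32_t cluster_size)
{
    assert(cluster_size > 0);
    return (size + cluster_size - 1) / cluster_size;
}

/* Bytes allocated in the last cluster but not used by the file (slack). */
uint64_t slack_bytes(uint64_t size, uint32_t cluster_size)
{
    return clusters_needed(size, cluster_size) * cluster_size - size;
}
```

For example, a 10,000-byte file needs three 4 KB clusters and wastes 2,288 bytes, but with 512-byte clusters it needs 20 clusters and wastes only 240 bytes.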
For the MFT, the main NTFS file, an area at the beginning of the volume is allocated (the boot sector contains the number of the first MFT cluster). However, because the MFT is itself a file, it can be located anywhere on the volume. To keep this file from becoming fragmented (although fragmentation is still possible), a zone of about 12.5% of the volume is reserved for it beforehand. If necessary, the operating system can shrink or enlarge this zone and later return it to its initial size. This zone is not a system area in the strict sense, so it is included in the total amount of free disk space.
The MFT contains a record for each file in the system. An MFT record is 1 KB in size; if one record is not sufficient to describe a file, additional records are used. The first 16 records, at the beginning of the MFT, are reserved for system files, whose names start with the $ character. Table 11.3 provides information about these files. Note that the very first record (numbering starts from 0) describes the MFT file itself. [i]
Record No. in MFT | Name of system file | Comment |
---|---|---|
0 | $MFT | Master file table. |
1 | $MFTMIRR | A copy of the first MFT records (in current NTFS versions, the first four). In a normal situation, this file resides in the middle of the volume. |
2 | $LOGFILE | Log file for system recovery. Using this file, the operating system can, with a high degree of probability, restore the file system of a damaged volume to a consistent state. |
3 | $VOLUME | The volume file (volume label, file system version, size, etc.). |
4 | $ATTRDEF | Definitions of the attributes that can be used on the volume. |
5 | . | The root directory (its name is a single dot). Like any file, it can grow or shrink. Note that all system files are located in this directory. |
6 | $BITMAP | The bitmap used to find free clusters on the volume. |
7 | $BOOT | The bootstrap file. |
8 | $BADCLUS | The file that lists bad clusters. |
9 | $SECURE | Security information (security descriptors shared by the files of the volume). |
10 | $UPCASE | Table for converting lowercase Unicode characters to uppercase; used when comparing filenames. |
11 | $EXTEND | The directory that contains files for optional extensions, such as $QUOTA, which holds disk quota information. |
12-15 | (none) | Reserved records. |
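Because the boot sector stores the number of the first MFT cluster and every record has a fixed size, the position of any record in Table 11.3 can be computed directly, at least while the MFT is not fragmented. The following C sketch assumes the commonly documented boot-sector layout: the OEM ID "NTFS" plus four blanks at offset 3, bytes per sector at offset 0x0B, sectors per cluster at 0x0D, the first MFT cluster at 0x30 (a 64-bit value), and the record size at 0x40. A negative value -n there means 2^n bytes, so 0xF6 (-10) gives 1,024 bytes. The structure and function names are illustrative, and real code would read the boot sector from the volume rather than build it in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read an n-byte little-endian integer. */
static uint64_t read_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

typedef struct {
    uint32_t bytes_per_cluster;
    uint32_t record_size;   /* size of one MFT record, in bytes */
    uint64_t mft_offset;    /* byte offset of MFT record 0 on the volume */
} MftGeometry;

/* Extract the MFT geometry from an NTFS boot sector. */
MftGeometry mft_geometry(const uint8_t *boot)
{
    assert(memcmp(boot + 3, "NTFS    ", 8) == 0);   /* OEM ID */
    MftGeometry g;
    uint32_t bytes_per_sector = (uint32_t)read_le(boot + 0x0B, 2);
    uint32_t sectors_per_cluster = boot[0x0D];
    int8_t rec = (int8_t)boot[0x40];   /* "clusters per MFT record" */

    g.bytes_per_cluster = bytes_per_sector * sectors_per_cluster;
    /* A positive value counts clusters; a negative value -n means 2^n bytes. */
    g.record_size = rec > 0 ? (uint32_t)rec * g.bytes_per_cluster
                            : 1u << -rec;
    g.mft_offset = read_le(boot + 0x30, 8) * g.bytes_per_cluster;
    return g;
}

/* Byte offset of MFT record n; valid only while the MFT is contiguous. */
uint64_t mft_record_offset(const MftGeometry *g, uint32_t n)
{
    return g->mft_offset + (uint64_t)n * g->record_size;
}
```

With 4 KB clusters and the MFT starting at cluster 4, the root directory (record 5) would start at byte 4 * 4096 + 5 * 1024 = 21,504.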
Now, consider the structure of an MFT record. Every record consists of a header followed by attributes; each attribute, in turn, consists of its own header and its value. The record header contains update sequence data used to check the record's integrity (they play the role of a checksum); the sequence number, which is increased each time the record is reused for another file; the hard-link counter; the number of bytes used in the record; and other fields. The record header is followed by the header of the first attribute and the value of this attribute, then by the header of the second attribute, and so on. If an attribute value is too large, it is stored outside the MFT record, in clusters elsewhere on the volume (a nonresident attribute). The important point is that if a file contains only a small amount of data, the data are stored directly in its MFT record.
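The following C sketch walks the attribute list of one MFT record as described above. It assumes the commonly documented NTFS layout. The record starts with the signature FILE, the offset of the first attribute is a 16-bit value at offset 0x14, and the number of bytes in use is a 32-bit value at 0x18. Each attribute header begins with a 32-bit type code (for example, 0x10 for standard information, 0x30 for the filename, 0x80 for data) followed by a 32-bit total length, and the list ends with the type code 0xFFFFFFFF. The sketch also assumes that the update-sequence fixups have already been applied to a record read from disk. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t le16(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t le32(const uint8_t *p)
{
    return le16(p) | (le16(p + 2) << 16);
}

#define ATTR_END 0xFFFFFFFFu   /* type code that terminates the attribute list */

/* Walk the attributes of one MFT record (fixups assumed already applied).
   Stores up to max type codes in types[] and returns the number of
   attributes found, or -1 if the record is invalid. */
int list_attributes(const uint8_t *rec, uint32_t rec_size,
                    uint32_t *types, int max)
{
    assert(rec != NULL);
    if (rec_size < 0x30 || memcmp(rec, "FILE", 4) != 0)
        return -1;                            /* not an MFT record */
    uint32_t off = le16(rec + 0x14);          /* offset of the first attribute */
    uint32_t used = le32(rec + 0x18);         /* bytes in use in the record */
    if (used > rec_size)
        return -1;
    int n = 0;
    while (off + 8 <= used) {
        uint32_t type = le32(rec + off);      /* attribute header: type ... */
        if (type == ATTR_END)
            break;
        uint32_t len = le32(rec + off + 4);   /* ... and total length */
        if (len == 0 || len > used - off)
            return -1;                        /* corrupted record */
        if (n < max)
            types[n] = type;
        n++;
        off += len;                           /* next attribute header */
    }
    return n;
}
```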
Table 11.4 lists the attributes.
Attribute | Description |
---|---|
Standard information (information attribute) | Time stamps, bit attributes (read-only, archive, etc.) and, in newer NTFS versions, owner and security identifiers. |
Filename | Filename in Unicode format. |
Security descriptor | The file's access rights. Starting with NTFS 3.0 (Windows 2000), security descriptors are normally stored centrally in the $SECURE file, so this attribute is rarely used. |
List of attributes | Location of additional MFT records; used when the attributes do not fit within a single record. |
Object identifier | 128-bit (GUID) file identifier, unique within the volume. |
Reparse point | Used, for example, for hierarchical storage; this attribute instructs the code that processes the filename to carry out additional operations. |
Volume name | Used in $VOLUME. |
Information about the volume | Volume version (used in $VOLUME). |
Index root | For directories. |
Index allocation | For large directories implemented as B-trees instead of simple lists. |
Bitmap | For large directories (marks which index blocks are in use). |
Logged utility stream | A data stream whose changes are recorded in the $LOGFILE file. |
Data | The stream of file data; the header of this attribute is followed either by the list of clusters where the data reside or, if the data take no more than a few hundred bytes, by the data themselves. |
Thus, in NTFS, a file is nothing but a set of attributes, and each attribute value is a stream of bytes. As you can see, the data stored in the file (the data stream) is just one of these attributes. The file system allows you to add new attributes to a file, which can contain additional information.
NTFS implements many interesting technical solutions. One of them was already mentioned: a small file can reside entirely within a single MFT record. Another is that, when writing a file, the operating system always tries to place it in as few cluster chains (sections in which disk clusters directly follow each other) as possible, making each chain as long as possible. Groups of file clusters are described by special structures, records placed into MFT records. [i] For example, a file made up of a single chain of clusters is described by one such record; the same is true of a file that consists of a small number of cluster chains. Each cluster chain within a record is described by a pair of values: the offset of its first cluster from the starting position and the number of clusters. The record header specifies the offset of the first cluster from the starting point of the file and the offset of the first cluster that goes beyond the limits of the current record.
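The way clusters are grouped into chains can be sketched in C: the function below collapses the clusters of a file, listed in file order, into (first cluster, count) pairs. The fewer and longer the chains, the fewer pairs have to be stored. The Run type and the function name are illustrative, and the cluster numbers in the usage are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One chain of consecutive clusters: the first cluster on the volume
   and the number of clusters in the chain. */
typedef struct {
    uint64_t lcn;
    uint64_t count;
} Run;

/* Collapse the clusters of a file, given in file order, into chains.
   Returns the number of chains written to out (at most max). */
size_t make_runs(const uint64_t *clusters, size_t n, Run *out, size_t max)
{
    size_t r = 0;
    for (size_t i = 0; i < n; i++) {
        if (r > 0 && clusters[i] == out[r - 1].lcn + out[r - 1].count) {
            out[r - 1].count++;            /* continues the current chain */
        } else {
            assert(r < max);               /* caller must provide room */
            out[r].lcn = clusters[i];      /* starts a new chain */
            out[r].count = 1;
            r++;
        }
    }
    return r;
}
```

Nine clusters numbered 100 to 104 and 200 to 203 collapse into just two pairs, (100, 5) and (200, 4).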
Fig. 11.2 shows a schematic illustration of an MFT record for a file that consists of nine clusters. The record header specifies the offset of the first cluster of the file and of the first cluster that doesn't fit within this record. This header is followed by two pairs of numbers that specify the continuous sequences of clusters. The first element of each pair is the offset of the chain's first cluster from the beginning of the volume, and the second element is the number of clusters in the chain. Such pairs are also called series. In this case, the file consists of two continuous chains and is therefore specified by two series. Note that in NTFS, the cluster numbers and cluster counts are 64-bit values.
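Reading a file means translating a cluster index within the file into a cluster number on the volume. These are commonly called the virtual cluster number (VCN) and the logical cluster number (LCN). A minimal C sketch of that lookup over the series described above follows. The cluster values in the example (two chains of five and four clusters, nine in total, like the file in Fig. 11.2) are assumptions, not the figure's actual numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t lcn;    /* first cluster of the chain on the volume */
    uint64_t count;  /* number of clusters in the chain */
} Run;

/* Translate a cluster index within the file (VCN) into a cluster number
   on the volume (LCN) by walking the series in order.
   Returns -1 if vcn lies beyond the last series. */
int64_t vcn_to_lcn(const Run *runs, size_t nruns, uint64_t vcn)
{
    assert(runs != NULL || nruns == 0);
    uint64_t first = 0;                    /* VCN at which this series starts */
    for (size_t i = 0; i < nruns; i++) {
        if (vcn < first + runs[i].count)
            return (int64_t)(runs[i].lcn + (vcn - first));
        first += runs[i].count;
    }
    return -1;
}
```

With the series (100, 5) and (200, 4), file cluster 4 is volume cluster 104, and file cluster 5 is volume cluster 200.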
What happens if the file is so fragmented that its chains cannot all be described in a single MFT record? In this case, several MFT records are used, and their numbers do not necessarily follow one another (they need not differ by 1). To tie these records together, a base record is used. In the first MFT record of the file, the base record precedes the record containing the descriptors of the cluster chains. The base record also has a header, followed by a list of the numbers of the MFT records that contain information about the location of the file data on the disk. All other MFT records have the structure shown in Fig. 11.2. A question might arise: What if the base record itself does not fit within a single MFT record? In this case, it is moved out of the MFT record and stored in separate clusters; in NTFS terminology, it becomes nonresident.
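When a file's chains are spread over several MFT records, the header of each record tells which range of file clusters it covers, so finding the right record is a simple range search. The C sketch below models only that step. The ExtentRecord type and the record numbers in the example are hypothetical, chosen to show that the records need not be adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one MFT record that holds part of a file's
   series: its record number and the range of file clusters it covers,
   as given in its header. */
typedef struct {
    uint32_t mft_record;
    uint64_t first_vcn;   /* first file cluster described here */
    uint64_t next_vcn;    /* first file cluster NOT described here */
} ExtentRecord;

/* Return the number of the MFT record describing file cluster vcn,
   or -1 if none of the listed records covers it. */
int64_t record_for_vcn(const ExtentRecord *list, size_t n, uint64_t vcn)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (vcn >= list[i].first_vcn && vcn < list[i].next_vcn)
            return list[i].mft_record;
    return -1;
}
```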
Now, consider some other specific features of NTFS.
A directory in NTFS is a special file that stores references to other files and directories, thus creating the hierarchical structure of the file system. As with a normal file, a directory that isn't too large can fit within an MFT record. Fig. 11.3 shows a schematic picture of an MFT record containing a small directory. Note that the information attribute contains information about the root directory. Each directory entry contains the filename, its length, and some other parameters, together with a reference to the file's MFT record, where the main information about the file is kept. Large directories use a different storage format: they are organized as B-trees, which ensures fast alphabetical searching and allows new files to be added quickly.
NTFS allows files to be stored in compressed form. The compression mechanism is interesting and deserves special mention. Compression is carried out in 16-cluster blocks: when writing data, the operating system tries to compress the first 16 clusters of the file, then the next 16-cluster block, and so on. If a block cannot be compressed (that is, compression would not reduce the number of clusters it occupies), it is written "as is." Suppose that you are compressing a 16-cluster block whose first cluster is at offset 50 and that you achieve a 25% compression ratio. This means that instead of 16 clusters, you now have only 12. For simplicity, assume that these clusters are consecutive, forming a single chain. In the MFT record, the compressed chain will be represented by two pairs of numbers, (50, 12) and (0, 4), instead of one pair, (50, 16). The second pair, which refers to no disk clusters, tells the operating system that the preceding chain holds compressed data. As you can see, the file compression mechanism is simple and built into the file system.
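The pair notation used in this example can be reproduced with a small C function. It follows the text's convention that a pair with offset 0 marks the clusters saved by compression. The Run type and the function name are illustrative, and the sketch does not model how NTFS actually encodes such pairs on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT 16   /* clusters per compression block */

typedef struct {
    uint64_t lcn;    /* first cluster; 0 marks the saved tail, as in the text */
    uint64_t count;  /* number of clusters */
} Run;

/* Describe one 16-cluster block that starts at cluster lcn and occupies
   `used` clusters after compression. Writes one or two pairs to out[]
   and returns how many were written. */
size_t unit_runs(uint64_t lcn, uint64_t used, Run out[2])
{
    assert(used >= 1 && used <= UNIT);
    out[0].lcn = lcn;
    out[0].count = used;
    if (used == UNIT)
        return 1;              /* not compressed: stored "as is" */
    out[1].lcn = 0;            /* tail that occupies no disk clusters */
    out[1].count = UNIT - used;
    return 2;
}
```

For the block in the example, unit_runs(50, 12, out) produces (50, 12) and (0, 4), while an incompressible block gives the single pair (50, 16).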
The standard way to determine the file system used in a partition is to call the GetVolumeInformation function. I won't describe this function in detail; I'll only mention that its seventh parameter (out of eight) is a buffer that receives the name of the file system (for example, NTFS or FAT32) after the call.
Reparse points allow you to extend NTFS functionality. They were introduced in the NTFS version intended for use with Windows 2000. The developers made provision for several types of reparse points, including volume mount points, directory junction points, and reparse points for hierarchical storage management (HSM).
Volume mount points allow a volume to be bound to a directory without assigning a drive letter to that volume. Thus, several volumes can be joined under a single drive letter. For example, assume that the mount point is C:\TEMP and that the D: volume is mounted to it. After that, all directories of that volume are available through the mount point: C:\TEMP\ARH, C:\TEMP\PROGRAM, etc. When the user attempts to access, say, the C:\TEMP\PROGRAM\FC.EXE file, the system detects that the TEMP directory is a mount point connected to the D: volume and then accesses the PROGRAM directory on that volume.
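The path redirection described above can be modeled as a prefix substitution. The C sketch below is a simplified, hypothetical illustration of the idea, not the way Windows implements mount points. It compares names case-insensitively because Windows paths are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If path lies under mount_dir, replace that prefix with the root of the
   mounted volume. Returns 1 and fills out if the path crosses the mount
   point; returns 0 if it does not (or if out is too small). */
int resolve_mount(const char *path, const char *mount_dir,
                  const char *volume_root, char *out, size_t out_size)
{
    assert(path && mount_dir && volume_root && out);
    size_t m = strlen(mount_dir);
    for (size_t i = 0; i < m; i++)          /* stops at path's '\0' too */
        if (toupper((unsigned char)path[i]) !=
            toupper((unsigned char)mount_dir[i]))
            return 0;                       /* different prefix */
    if (path[m] != '\\' && path[m] != '\0')
        return 0;                           /* C:\TEMPX is not under C:\TEMP */
    const char *rest = (path[m] == '\\') ? path + m + 1 : path + m;
    int n = snprintf(out, out_size, "%s%s", volume_root, rest);
    return n >= 0 && (size_t)n < out_size;
}
```

With C:\TEMP mounted to D:\, the call resolve_mount("C:\\TEMP\\PROGRAM\\FC.EXE", "C:\\TEMP", "D:\\", buf, sizeof buf) yields D:\PROGRAM\FC.EXE.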
Directory junction points (connection points) are similar to mount points; however, this mechanism connects directories rather than volumes. Returning to the previous example, you could connect the D:\PROGRAM directory to C:\TEMP\PROGRAM and access the FC.EXE file the same way: C:\TEMP\PROGRAM\FC.EXE.
In HSM, rarely used files are moved to a backup storage medium, and reparse points record where they went. When a file is moved, its contents are deleted and replaced with a reparse point, whose data contain all the information the HSM system needs to find the file on the backup medium. When the user accesses the file, the system processes its reparse point, determines that the file has been moved to the backup medium (and where it resides), and starts the mechanism for retrieving the file from that medium. After the file has been restored successfully, the reparse point is deleted automatically, and the file access procedure is repeated.
If a reparse point is associated with a file or directory, NTFS stores it in a special attribute ($REPARSE_POINT) that contains the reparse point code (tag) and its data. In addition, NTFS records all reparse points in the $REPARSE index in the $EXTEND directory, so it can easily find every reparse point on the volume.
For searching files, Windows provides two functions: FindFirstFile and FindNextFile. They resemble the corresponding MS-DOS functions (functions 4Eh and 4Fh of INT 21h), and, as in MS-DOS, they work together: if the search succeeds, the first function returns a search handle, which the second function then uses to continue the search.
The first parameter of the FindFirstFile function is a pointer to the search string (a path with a mask). The second parameter is a pointer to the structure (WIN32_FIND_DATA) that receives information about the file found. The FindNextFile function receives the handle returned by FindFirstFile as its first parameter and a pointer to the same kind of structure as its second parameter. When the search is over, the handle must be closed with FindClose. The use of these functions and of this structure is illustrated in Listing 11.1.
The difference between the Windows file-search functions and their MS-DOS counterparts is that only the search mask (*.*, *.exe, etc.) is specified in the input. When a file is found, the returned structure contains all information about it, so you can decide whether this is the file you need. In MS-DOS, you also had to specify the attributes of the files to search for.
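Since only a mask is passed, it helps to see what matching a name against a mask involves. The C sketch below implements the classic * and ? wildcard rules case-insensitively. It is a simplified model, not the actual algorithm Windows uses (which has additional legacy rules). The only such rule imitated here is that *.* matches every name, even one without a dot. The function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* '*' matches any run of characters (possibly empty), '?' matches
   exactly one character; letters are compared case-insensitively. */
int mask_match(const char *mask, const char *name)
{
    assert(mask != NULL && name != NULL);
    if (strcmp(mask, "*.*") == 0)
        return 1;                      /* Windows convention: matches all */
    const char *star = NULL, *resume = NULL;
    while (*name) {
        if (*mask == '?' ||
            (*mask && *mask != '*' &&
             toupper((unsigned char)*mask) == toupper((unsigned char)*name))) {
            mask++;                    /* characters match: advance both */
            name++;
        } else if (*mask == '*') {
            star = mask++;             /* first let the star match nothing */
            resume = name;
        } else if (star) {
            mask = star + 1;           /* let the star absorb one more char */
            name = ++resume;
        } else {
            return 0;                  /* mismatch and no star to fall back on */
        }
    }
    while (*mask == '*')               /* trailing stars match the empty rest */
        mask++;
    return *mask == '\0';
}
```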
The program in Listing 11.1 searches for files in the specified directory. It can be started with one or two parameters or with none. The first parameter is interpreted as the directory to search. The program accepts the directory with or without a trailing backslash (allowed options are C:, C:\, C:\WINDOWS\, C:\WINDOWS\SYSTEM, etc.). The second parameter (in the program, it has the number 3 because parameter 1 is the program name itself), if present, specifies the search mask; if it is missing, *.* is used as the default mask. Finally, if no parameters are specified, the program searches the current directory using the *.* mask. You can easily extend this program and turn it into a useful utility; I hope that you will do this on your own. Comments about this program are provided after the listing.
```asm
; The FILES.ASM file
.586P
; Flat memory model
.MODEL FLAT, stdcall
; Constants
STD_OUTPUT_HANDLE equ -11
STD_INPUT_HANDLE  equ -10
; Prototypes of external procedures
EXTERN wsprintfA:NEAR
EXTERN CharToOemA@8:NEAR
EXTERN GetStdHandle@4:NEAR
EXTERN WriteConsoleA@20:NEAR
EXTERN ReadConsoleA@20:NEAR
EXTERN ExitProcess@4:NEAR
EXTERN GetCommandLineA@0:NEAR
EXTERN lstrcatA@8:NEAR
EXTERN FindFirstFileA@8:NEAR
EXTERN FindNextFileA@8:NEAR
EXTERN FindClose@4:NEAR
;-----------------------------------------------
; The structure for file searching
; using the FindFirstFile and FindNextFile functions
_FIND STRUC
    ; File attribute
    ATR    DWORD ?
    ; File creation time
    CRTIME DWORD ?
           DWORD ?
    ; File access time
    ACTIME DWORD ?
           DWORD ?
    ; File modification time
    WRTIME DWORD ?
           DWORD ?
    ; File size
    SIZEH  DWORD ?          ; Most significant part
    SIZEL  DWORD ?          ; Least significant part
    ; Reserved
           DWORD ?
           DWORD ?
    ; Long filename
    NAM    DB 260 DUP(0)
    ; Short filename
    ANAM   DB 14 DUP(0)
_FIND ENDS
;----------------------------------------------
; INCLUDELIB directives for linking libraries
includelib c:\masm32\lib\user32.lib
includelib c:\masm32\lib\kernel32.lib
;----------------------------------------------
; Data segment
_DATA SEGMENT
    BUF    DB 0
           DB 100 dup(0)
    LENS   DWORD ?          ; Number of displayed characters
    HANDL  DWORD ?
    HANDL1 DWORD ?
    MASKA  DB "*.*", 0
           DB 50 DUP(0)     ; Room for a mask taken from the command line
    AP     DB "\", 0
    FIN    _FIND <0>
    TEXT   DB "Press <ENTER> to continue", 13, 10, 0
    BUFIN  DB 10 DUP(0)
    FINDH  DWORD ?
    NUM    DB 0
    NUMF   DWORD 0          ; Files counter
    NUMD   DWORD 0          ; Directories counter
    FORM   DB "Files found: %lu", 0
    FORM1  DB "Directories found: %lu", 0
    BUFER  DB 100 DUP(?)
    DIR    DB " <DIR>", 0
    PAR    DB 0             ; Number of parameters
_DATA ENDS
; Code segment
_TEXT SEGMENT
START:
; Get the output handle
    PUSH STD_OUTPUT_HANDLE
    CALL GetStdHandle@4
    MOV  HANDL, EAX
; Get the input handle HANDL1
    PUSH STD_INPUT_HANDLE
    CALL GetStdHandle@4
    MOV  HANDL1, EAX
; Convert strings for output
    PUSH OFFSET TEXT
    PUSH OFFSET TEXT
    CALL CharToOemA@8
    PUSH OFFSET FORM
    PUSH OFFSET FORM
    CALL CharToOemA@8
    PUSH OFFSET FORM1
    PUSH OFFSET FORM1
    CALL CharToOemA@8
; Get the number of parameters
    CALL NUMPAR
    MOV  PAR, AL
; Search the current directory if only one parameter is passed
    CMP  EAX, 1
    JE   NO_PAR
;--------------------
; Get the parameter with EDI number
    MOV  EDI, 2
    LEA  EBX, BUF
    CALL GETPAR
    PUSH OFFSET BUF
    CALL LENSTR
; Add the trailing backslash if it is missing
    CMP  BYTE PTR [BUF+EBX-1], "\"
    JE   HAS_BS
    PUSH OFFSET AP
    PUSH OFFSET BUF
    CALL lstrcatA@8
HAS_BS:
; Is there a parameter specifying the search mask?
    CMP  PAR, 3
    JB   NO_PAR
; Get the search mask
    MOV  EDI, 3
    LEA  EBX, MASKA
    CALL GETPAR
NO_PAR:
;--------------------
    CALL FIND
; Display the number of files
    PUSH NUMF
    PUSH OFFSET FORM
    PUSH OFFSET BUFER
    CALL wsprintfA
    ADD  ESP, 12        ; wsprintfA is cdecl: the caller removes the arguments
    LEA  EAX, BUFER
    MOV  EDI, 1
    CALL WRITE
; Display the number of directories
    PUSH NUMD
    PUSH OFFSET FORM1
    PUSH OFFSET BUFER
    CALL wsprintfA
    ADD  ESP, 12
    LEA  EAX, BUFER
    MOV  EDI, 1
    CALL WRITE
_END:
    PUSH 0
    CALL ExitProcess@4
;***********************
; Procedures
;***********************
; Display a string (optionally terminated by a line feed)
; EAX --- Address of the string
; EDI --- 1: append CR/LF; any other value: no line feed
WRITE PROC
; Get the string length (LENSTR returns it in EBX)
    PUSH EAX
    CALL LENSTR
    MOV  ESI, EAX
    CMP  EDI, 1
    JNE  NO_ENT
; Terminate the string with CR/LF
    MOV  BYTE PTR [EBX+ESI], 13
    MOV  BYTE PTR [EBX+ESI+1], 10
    MOV  BYTE PTR [EBX+ESI+2], 0
    ADD  EBX, 2
NO_ENT:
; String output
    PUSH 0
    PUSH OFFSET LENS
    PUSH EBX
    PUSH EAX
    PUSH HANDL
    CALL WriteConsoleA@20
    RET
WRITE ENDP
; Procedure for determining the string length
; String in [EBP+08H]
; Length in EBX
LENSTR PROC
    PUSH EBP
    MOV  EBP, ESP
    PUSH EAX
    PUSH EDI
;---------------------
    CLD
    MOV  EDI, DWORD PTR [EBP+08H]
    MOV  EBX, EDI
    MOV  ECX, 300       ; Limit the string length (long names reach 260 chars)
    XOR  AL, AL
    REPNE SCASB         ; Find the 0 character
    SUB  EDI, EBX       ; String length including the 0 character
    MOV  EBX, EDI
    DEC  EBX
;---------------------
    POP  EDI
    POP  EAX
    POP  EBP
    RET  4
LENSTR ENDP
; Procedure for determining the number of parameters
; in the command line (->EAX)
NUMPAR PROC
    CALL GetCommandLineA@0
    MOV  ESI, EAX       ; Pointer to the string
    XOR  ECX, ECX       ; Counter
    MOV  EDX, 1         ; Indicator: the previous character was a blank
L1:
    CMP  BYTE PTR [ESI], 0
    JE   L4
    CMP  BYTE PTR [ESI], 32
    JE   L3
    ADD  ECX, EDX       ; Parameter number
    MOV  EDX, 0
    JMP  L2
L3:
    OR   EDX, 1
L2:
    INC  ESI
    JMP  L1
L4:
    MOV  EAX, ECX
    RET
NUMPAR ENDP
; Get the parameter from the command line
; EBX --- Points to the buffer where the parameter
;         will be loaded (as a zero-terminated string)
; EDI --- Number of the parameter
GETPAR PROC
    CALL GetCommandLineA@0
    MOV  ESI, EAX       ; Pointer to the string
    XOR  ECX, ECX       ; Counter
    MOV  EDX, 1         ; Indicator
L1:
    CMP  BYTE PTR [ESI], 0
    JE   L4
    CMP  BYTE PTR [ESI], 32
    JE   L3
    ADD  ECX, EDX       ; Number of the parameter
    MOV  EDX, 0
    JMP  L2
L3:
    OR   EDX, 1
    JMP  L5             ; Blanks are delimiters: do not copy them
L2:
    CMP  ECX, EDI
    JNE  L5
    MOV  AL, BYTE PTR [ESI]
    MOV  BYTE PTR [EBX], AL
    INC  EBX
L5:
    INC  ESI
    JMP  L1
L4:
    MOV  BYTE PTR [EBX], 0
    RET
GETPAR ENDP
; Searching for files in a directory and displaying them
; Directory name in BUF
FIND PROC
; Path with a mask
    PUSH OFFSET MASKA
    PUSH OFFSET BUF
    CALL lstrcatA@8
; The search starts here
    PUSH OFFSET FIN
    PUSH OFFSET BUF
    CALL FindFirstFileA@8
    CMP  EAX, -1
    JE   _ERR
; Save the search descriptor
    MOV  FINDH, EAX
LF:
; Exclude the "." and ".." "files" (and any name starting with a dot)
    CMP  BYTE PTR FIN.NAM, "."
    JE   _NO
; Is this a directory?
    TEST BYTE PTR FIN.ATR, 10H
    JE   NO_DIR
    PUSH OFFSET DIR
    PUSH OFFSET FIN.NAM
    CALL lstrcatA@8
    INC  NUMD
    DEC  NUMF
NO_DIR:
; Convert the string
    PUSH OFFSET FIN.NAM
    PUSH OFFSET FIN.NAM
    CALL CharToOemA@8
; Results output
    LEA  EAX, FIN.NAM
    MOV  EDI, 1
    CALL WRITE
; Increase the counters
    INC  NUMF
    INC  NUM
; Page end?
    CMP  NUM, 22
    JNE  _NO
    MOV  NUM, 0
; Wait for the input string
    MOV  EDI, 0
    LEA  EAX, TEXT
    CALL WRITE
    PUSH 0
    PUSH OFFSET LENS
    PUSH 10
    PUSH OFFSET BUFIN
    PUSH HANDL1
    CALL ReadConsoleA@20
_NO:
; Continue the search
    PUSH OFFSET FIN
    PUSH FINDH
    CALL FindNextFileA@8
    CMP  EAX, 0
    JNE  LF
; Terminate the search
    PUSH FINDH
    CALL FindClose@4
_ERR:
    RET
FIND ENDP
_TEXT ENDS
END START
```
The program in Listing 11.1 is simple. The only new feature here is working with the FindFirstFile and FindNextFile functions. The procedures for working with command-line parameters were encountered before, and so was output to the current console. To get the console handle, the GetStdHandle function is used. The WRITE procedure simplifies the code sections responsible for screen output. Earlier, I promised that string API functions would get the attention they deserve, and in this program I have kept my word: along with custom string procedures, it uses the lstrcat function, which concatenates strings. As for the command-line parameters, note that if the directory name contains blanks, it must be specified in its short (8.3) form, for example, C:\PROGRA~1 instead of C:\PROGRAM FILES. The reason is straightforward: blanks serve as parameter delimiters. To solve the problem properly, you would need to introduce a special parameter delimiter, for example, ^ or /.
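For comparison, here is how the same command-line handling might look in C: counting blank-separated parameters (as NUMPAR does), extracting one of them (as GETPAR does), and building the string passed to FindFirstFile. The function names are hypothetical. Like the assembly version, the sketch splits parameters only at blanks, so a directory name containing a blank falls apart into two parameters. This is exactly the problem described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count blank-separated parameters; the program name itself is parameter 1. */
int count_params(const char *cmd)
{
    int count = 0;
    for (;;) {
        cmd += strspn(cmd, " ");           /* skip blanks */
        if (*cmd == '\0')
            return count;
        count++;
        cmd += strcspn(cmd, " ");          /* skip the parameter itself */
    }
}

/* Copy parameter n (1-based) into buf; buf becomes "" if there are fewer
   than n parameters. A parameter longer than buf is truncated. */
void get_param(const char *cmd, int n, char *buf, size_t size)
{
    assert(size > 0);
    buf[0] = '\0';
    for (int i = 1;; i++) {
        cmd += strspn(cmd, " ");
        if (*cmd == '\0')
            return;
        size_t len = strcspn(cmd, " ");
        if (i == n) {
            if (len >= size)
                len = size - 1;
            memcpy(buf, cmd, len);
            buf[len] = '\0';
            return;
        }
        cmd += len;
    }
}

/* Build the FindFirstFile argument the way Listing 11.1 does: directory
   ("" means the current one), a backslash if missing, then the mask
   ("*.*" by default). Returns 0 if out is too small. */
int build_pattern(const char *dir, const char *mask, char *out, size_t size)
{
    size_t len = strlen(dir);
    const char *sep = (len > 0 && dir[len - 1] != '\\') ? "\\" : "";
    int n = snprintf(out, size, "%s%s%s", dir, sep,
                     (mask != NULL && *mask != '\0') ? mask : "*.*");
    return n >= 0 && (size_t)n < size;
}
```

For the command line FILES C:\WINDOWS *.EXE, parameters 2 and 3 produce the search string C:\WINDOWS\*.EXE, whereas FILES C:\PROGRAM FILES is seen as three parameters, with C:\PROGRAM as the directory.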
The program in Listing 11.1 searches either the current directory or the specified one. If this program were written in a high-level language such as C, it could easily be modified to search the whole directory tree: only the search procedure would need a minor change so that it could call itself recursively. This ease is due to the local variables available in high-level languages. Try to implement the same thing based on the material presented in Chapter 2. Is it possible to achieve the same goal without using local variables?
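Here is what such a recursive procedure might look like in C. To keep the example self-contained and testable, it walks a hypothetical in-memory tree instead of calling FindFirstFile and FindNextFile; the Node type and all names are assumptions. The loop index plays the role of the search handle, and the full path is built in a local buffer, so each level of recursion keeps its own copies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a directory tree. */
typedef struct Node {
    const char *name;
    int is_dir;
    const struct Node *children;   /* array of entries (directories only) */
    int nchildren;
} Node;

static unsigned long num_files, num_dirs;
static char last_path[260];        /* last full name visited */

/* Visit every entry below dir; `path` is dir's full name. The index i and
   the buffer full are local, so every call has its own copies. */
void walk(const Node *dir, const char *path)
{
    for (int i = 0; i < dir->nchildren; i++) {   /* i acts as the handle */
        const Node *e = &dir->children[i];
        char full[260];                          /* MAX_PATH-sized buffer */
        snprintf(full, sizeof full, "%s\\%s", path, e->name);
        strcpy(last_path, full);
        if (e->is_dir) {
            num_dirs++;
            walk(e, full);       /* i and full are intact after the return */
        } else {
            num_files++;
        }
    }
}
```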
Note | The length of the first parameter of the FindFirstFile API function cannot exceed the value of the MAX_PATH constant, which is equal to 260. If you need longer strings, use the Unicode version of this function (the one with the W suffix, FindFirstFileW). In this case, the path can be approximately 32,767 characters long. However, do not forget to convert the string to Unicode and precede it with the \\?\ prefix. |
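Adding the prefix is simple string work. The following C sketch assumes an absolute path. By the documented Windows convention, a UNC path such as \\server\share\file takes the form \\?\UNC\server\share\file instead. The function name is hypothetical, and the sketch only builds the wide string; it does not call any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Prepend the \\?\ prefix to an absolute path (\\?\UNC\ for UNC paths).
   Returns 0 if out is too small. */
int long_path(const wchar_t *path, wchar_t *out, size_t size)
{
    assert(path != NULL && out != NULL);
    int n;
    if (wcsncmp(path, L"\\\\?\\", 4) == 0)
        n = swprintf(out, size, L"%ls", path);               /* already prefixed */
    else if (wcsncmp(path, L"\\\\", 2) == 0)
        n = swprintf(out, size, L"\\\\?\\UNC\\%ls", path + 2); /* \\server\share */
    else
        n = swprintf(out, size, L"\\\\?\\%ls", path);        /* C:\... */
    return n >= 0;   /* swprintf returns a negative value if out is too small */
}
```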
The program presented in Listing 11.2 is similar to the previous one; however, it searches the entire directory tree, starting from the specified directory. This is one of the most complicated programs in this book, so I strongly recommend that you study it carefully. I hope that you'll be able to improve it, and I'd like to suggest some directions. The second command-line parameter allows you to specify the search mask. However, if you specify, for example, *.EXE, the mask is applied not only to files but also to directories, so subdirectories whose names don't match it are never searched. This obvious drawback is the first candidate for elimination.
The directory tree can easily be searched recursively; however, for this purpose, you need local variables. [i] The point of using a local variable in a recursive algorithm is that each level of recursion keeps its own copy of the data, which must still be intact when a deeper call returns.
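The alternative mentioned in the footnote, keeping the data in global arrays indexed by the recursion level, can be sketched in C as well. Here the per-level state (the directory being scanned, the next entry to examine, and the length of the path so far) lives in global arrays, and a loop with an explicit level counter takes the place of the recursive call. The in-memory Node tree, the MAX_DEPTH limit, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct Node {
    const char *name;
    int is_dir;
    const struct Node *children;
    int nchildren;
} Node;

#define MAX_DEPTH 32

/* Per-level state in global arrays instead of local variables. */
static const Node *level_dir[MAX_DEPTH];  /* directory being scanned */
static int level_next[MAX_DEPTH];         /* next entry ("search handle") */
static size_t level_len[MAX_DEPTH];       /* length of that directory's path */
static char path[1024];                   /* one shared path buffer */
static unsigned long num_entries;

void walk_global(const Node *root)
{
    int level = 0;
    strcpy(path, root->name);
    level_dir[0] = root;
    level_next[0] = 0;
    level_len[0] = strlen(path);
    while (level >= 0) {
        if (level_next[level] == level_dir[level]->nchildren) {
            level--;                       /* like returning to the parent */
            continue;
        }
        const Node *e = &level_dir[level]->children[level_next[level]++];
        path[level_len[level]] = '\0';     /* cut the path back to this level */
        assert(level_len[level] + strlen(e->name) + 2 <= sizeof path);
        strcat(path, "\\");
        strcat(path, e->name);
        num_entries++;
        if (e->is_dir) {                   /* like a recursive call */
            assert(level + 1 < MAX_DEPTH);
            level++;
            level_dir[level] = e;
            level_next[level] = 0;
            level_len[level] = strlen(path);
        }
    }
}
```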
In the program under consideration, for simplicity, I abandoned the LENSTR procedure and decided to use the lstrlen API function instead. In addition, I improved the output to display the fully qualified filename on the screen.
```asm
; The FILES.ASM file
.586P
; Flat memory model
.MODEL FLAT, stdcall
; Constants
STD_OUTPUT_HANDLE equ -11
STD_INPUT_HANDLE  equ -10
; Prototypes of external procedures
EXTERN wsprintfA:NEAR
EXTERN CharToOemA@8:NEAR
EXTERN GetStdHandle@4:NEAR
EXTERN WriteConsoleA@20:NEAR
EXTERN ReadConsoleA@20:NEAR
EXTERN ExitProcess@4:NEAR
EXTERN GetCommandLineA@0:NEAR
EXTERN lstrcatA@8:NEAR
EXTERN lstrcpyA@8:NEAR
EXTERN lstrlenA@4:NEAR
EXTERN FindFirstFileA@8:NEAR
EXTERN FindNextFileA@8:NEAR
EXTERN FindClose@4:NEAR
;------------------------------------
; The structure for file search
; using the FindFirstFile and FindNextFile functions
_FIND STRUC
    ; File attribute
    ATR    DWORD ?
    ; File creation time
    CRTIME DWORD ?
           DWORD ?
    ; File access time
    ACTIME DWORD ?
           DWORD ?
    ; File modification time
    WRTIME DWORD ?
           DWORD ?
    ; File size
    SIZEH  DWORD ?
    SIZEL  DWORD ?
    ; Reserved
           DWORD ?
           DWORD ?
    ; Long filename
    NAM    DB 260 DUP(0)
    ; Short filename
    ANAM   DB 14 DUP(0)
_FIND ENDS
;-----------------------------------------
; INCLUDELIB directives for the linker
includelib c:\masm32\lib\user32.lib
includelib c:\masm32\lib\kernel32.lib
;-----------------------------------------
; Data segment
_DATA SEGMENT
    BUF    DB 0
           DB 100 dup(0)
    LENS   DWORD ?          ; Number of output characters
    HANDL  DWORD ?
    HANDL1 DWORD ?
    MASKA  DB "*.*"
           DB 50 DUP(0)
    AP     DB "\", 0
    FIN    _FIND <0>
    TEXT   DB "Press <Enter> to continue", 13, 10, 0
    BUFIN  DB 10 DUP(0)     ; Input buffer
    NUM    DB 0
    NUMF   DWORD 0          ; Files counter
    NUMD   DWORD 0          ; Directories counter
    FORM   DB "Number of files found: %lu", 0
    FORM1  DB "Number of directories found: %lu", 0
    DIRN   DB " <DIR>", 0
    PAR    DWORD 0
    PRIZN  DB 0
_DATA ENDS
; Code segment
_TEXT SEGMENT
START:
; Get the output handle
    PUSH STD_OUTPUT_HANDLE
    CALL GetStdHandle@4
    MOV  HANDL, EAX
; Get the input handle HANDL1
    PUSH STD_INPUT_HANDLE
    CALL GetStdHandle@4
    MOV  HANDL1, EAX
; Convert strings for output
    PUSH OFFSET TEXT
    PUSH OFFSET TEXT
    CALL CharToOemA@8
    PUSH OFFSET FORM
    PUSH OFFSET FORM
    CALL CharToOemA@8
    PUSH OFFSET FORM1
    PUSH OFFSET FORM1
    CALL CharToOemA@8
; Get the number of parameters
    CALL NUMPAR
    MOV  PAR, EAX
; If there is only one parameter, search the current directory
    CMP  EAX, 1
    JE   NO_PAR
;--------------------------------------
; Get the parameter with the EDI number
    MOV  EDI, 2
    LEA  EBX, BUF
    CALL GETPAR
    CMP  PAR, 3
    JB   NO_PAR
; Get the search mask
    MOV  EDI, 3
    LEA  EBX, MASKA
    CALL GETPAR
NO_PAR:
;-------------------------
    PUSH OFFSET BUF
    CALL FIND
; Output the number of files
    PUSH NUMF
    PUSH OFFSET FORM
    PUSH OFFSET BUF
    CALL wsprintfA
    ADD  ESP, 12        ; wsprintfA is cdecl: the caller removes the arguments
    LEA  EAX, BUF
    MOV  EDI, 1
    CALL WRITE
;++++++++++++++++
; Output the number of directories
    PUSH NUMD
    PUSH OFFSET FORM1
    PUSH OFFSET BUF
    CALL wsprintfA
    ADD  ESP, 12
    LEA  EAX, BUF
    MOV  EDI, 1
    CALL WRITE
_END:
    PUSH 0
    CALL ExitProcess@4
; Procedures
;****************************************
; Output the string (optionally terminated with a line feed)
; EAX --- Address of the beginning of the string
; EDI --- 1: append CR/LF; any other value: no line feed
WRITE PROC
; Get the string length
    PUSH EAX
    PUSH EAX
    CALL lstrlenA@4
    MOV  ESI, EAX
    POP  EBX
    CMP  EDI, 1
    JNE  NO_ENT
; Line feed at the end
    MOV  BYTE PTR [EBX+ESI], 13
    MOV  BYTE PTR [EBX+ESI+1], 10
    MOV  BYTE PTR [EBX+ESI+2], 0
    ADD  EAX, 2
NO_ENT:
; String output
    PUSH 0
    PUSH OFFSET LENS
    PUSH EAX
    PUSH EBX
    PUSH HANDL
    CALL WriteConsoleA@20
    RET
WRITE ENDP
; Procedure for determining the number of parameters
; Determine the number of parameters (->EAX)
NUMPAR PROC
    CALL GetCommandLineA@0
    MOV  ESI, EAX       ; Pointer to the string
    XOR  ECX, ECX       ; Counter
    MOV  EDX, 1         ; Indicator
L1:
    CMP  BYTE PTR [ESI], 0
    JE   L4
    CMP  BYTE PTR [ESI], 32
    JE   L3
    ADD  ECX, EDX       ; Parameter number
    MOV  EDX, 0
    JMP  L2
L3:
    OR   EDX, 1
L2:
    INC  ESI
    JMP  L1
L4:
    MOV  EAX, ECX
    RET
NUMPAR ENDP
; Get the parameter from the command line
; EBX --- Points to the buffer, in which to load the parameter
;         (as a zero-terminated string)
; EDI --- Parameter number
GETPAR PROC
    CALL GetCommandLineA@0
    MOV  ESI, EAX       ; Pointer to the string
    XOR  ECX, ECX       ; Counter
    MOV  EDX, 1         ; Indicator
L1:
    CMP  BYTE PTR [ESI], 0
    JE   L4
    CMP  BYTE PTR [ESI], 32
    JE   L3
    ADD  ECX, EDX       ; Parameter number
    MOV  EDX, 0
    JMP  L2
L3:
    OR   EDX, 1
    JMP  L5             ; Blanks are delimiters: do not copy them
L2:
    CMP  ECX, EDI
    JNE  L5
    MOV  AL, BYTE PTR [ESI]
    MOV  BYTE PTR [EBX], AL
    INC  EBX
L5:
    INC  ESI
    JMP  L1
L4:
    MOV  BYTE PTR [EBX], 0
    RET
GETPAR ENDP
;-------------------------------
; Searching files in the directory and sending them for output
FINDH EQU [EBP-4]       ; Search descriptor
DIRS  EQU [EBP-304]     ; Fully qualified filename
DIRSS EQU [EBP-604]     ; For storing the directory
DIRV  EQU [EBP-904]     ; For temporary storage
DIR   EQU [EBP+8]       ; Parameter - Directory name
FIND PROC
    PUSH EBP
    MOV  EBP, ESP
    SUB  ESP, 904
; Initializing local variables
    MOV  ECX, 300
    MOV  AL, 0
    MOV  EDI, 0
CLR:
    MOV  BYTE PTR DIRS+[EDI], AL
    MOV  BYTE PTR DIRSS+[EDI], AL
    MOV  BYTE PTR DIRV+[EDI], AL
    INC  EDI
    LOOP CLR
; Determine the path length
    PUSH DIR
    CALL lstrlenA@4
    MOV  EBX, EAX
    MOV  EDI, DIR
    CMP  BYTE PTR [EDI], 0
    JE   _OK
; Add the trailing backslash if it is missing
    CMP  BYTE PTR [EDI+EBX-1], "\"
    JE   _OK
    PUSH OFFSET AP
    PUSH DIR
    CALL lstrcatA@8
_OK:
; Store the directory
    PUSH DIR
    LEA  EAX, DIRSS
    PUSH EAX
    CALL lstrcpyA@8
; Path with the mask
    PUSH OFFSET MASKA
    PUSH DIR
    CALL lstrcatA@8
; Search starts here
    PUSH OFFSET FIN
    PUSH DWORD PTR DIR
    CALL FindFirstFileA@8
    CMP  EAX, -1
    JE   _ERR
; Save the search descriptor
    MOV  FINDH, EAX
LF:
; Exclude the "files" "." and ".."
    CMP  BYTE PTR FIN.NAM, "."
    JE   _FF
;-------------------------
    LEA  EAX, DIRSS
    PUSH EAX
    LEA  EAX, DIRS
    PUSH EAX
    CALL lstrcpyA@8
;-------------------------
    PUSH OFFSET FIN.NAM
    LEA  EAX, DIRS
    PUSH EAX
    CALL lstrcatA@8
; Is this a directory?
    TEST BYTE PTR FIN.ATR, 10H
    JE   NO_DIR
; Add the <DIR> string
    PUSH OFFSET DIRN
    LEA  EAX, DIRS
    PUSH EAX
    CALL lstrcatA@8
; Increase the counters
    INC  NUMD
    DEC  NUMF
; Set the directory attribute
    MOV  PRIZN, 1
; Display the directory name
    LEA  EAX, DIRS
    PUSH EAX
    CALL OUTF
    JMP  _NO
NO_DIR:
; Display the filename
    LEA  EAX, DIRS
    PUSH EAX
    CALL OUTF
; Indicator of the file (not a directory)
    MOV  PRIZN, 0
_NO:
    CMP  PRIZN, 0
    JZ   _F
; Directory, preparing a recursive call
    LEA  EAX, DIRSS
    PUSH EAX
    LEA  EAX, DIRV
    PUSH EAX
    CALL lstrcpyA@8
    PUSH OFFSET FIN.NAM
    LEA  EAX, DIRV
    PUSH EAX
    CALL lstrcatA@8
; Calling
    LEA  EAX, DIRV
    PUSH EAX
    CALL FIND
; Continue the search
_F:
    INC  NUMF
_FF:
    PUSH OFFSET FIN
    PUSH FINDH
    CALL FindNextFileA@8
    CMP  EAX, 0
    JNE  LF
; Close the search descriptor
    PUSH FINDH
    CALL FindClose@4
_ERR:
    MOV  ESP, EBP
    POP  EBP
    RET  4
FIND ENDP
;----------------------------
; Page output of the names of found files
STRN EQU [EBP+8]
OUTF PROC
    PUSH EBP
    MOV  EBP, ESP
; Convert the string
    PUSH STRN
    PUSH STRN
    CALL CharToOemA@8
; Output of the results
    MOV  EAX, STRN
    MOV  EDI, 1
    CALL WRITE
    INC  NUM
; End of page?
    CMP  NUM, 22
    JNE  NO
    MOV  NUM, 0
; Wait for string input
    MOV  EDI, 0
    LEA  EAX, TEXT
    CALL WRITE
    PUSH 0
    PUSH OFFSET LENS
    PUSH 10
    PUSH OFFSET BUFIN
    PUSH HANDL1
    CALL ReadConsoleA@20
NO:
    POP  EBP
    RET  4
OUTF ENDP
_TEXT ENDS
END START
```
Now, it is necessary to explain the role of local variables in the FIND procedure. The FINDH variable stores the search handle for the current directory. The FIND procedure can call itself recursively before the search in the current directory is complete, so after the recursive call returns, the search must continue, and this is possible only with the old value of the handle. A local variable makes this possible because each call to FIND has its own copy, which is destroyed only when that call returns to the previous level (to the parent directory).
The DIRSS variable plays a similar role. It stores the current directory. This is important because the fully qualified filename is formed using this variable.
The DIRS and DIRV variables play an auxiliary role; in principle, they could be replaced by global variables. In this respect, bear in mind that although global variables are undesirable in recursive algorithms, from the efficiency point of view, the less stack space the local variables take at each level of recursion, the better.
Here is another aspect that I want to explain: when FIND is called, the directory name is passed in the DIRV variable. Why can't you use the DIRSS variable for the same purpose? The point is that the procedure receives a pointer to the string rather than a copy of its value. FIND modifies the string it receives through the DIR parameter (it appends a backslash and the search mask), so passing DIRSS would change the DIRSS variable of the calling level as well. Naturally, this isn't the goal.
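The difference between passing a shared buffer and passing a scratch copy is easy to demonstrate in C. In this hypothetical sketch, prepare_search stands in for FIND, which appends the search mask to the string it receives; the other function names are also illustrative.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for what FIND does to its DIR parameter:
   it appends the search mask to the string in place. */
static void prepare_search(char *dir)
{
    strcat(dir, "\\*.*");
}

/* Passing the caller's own buffer: the caller's string is modified. */
void call_with_shared(char *dirss)
{
    prepare_search(dirss);
}

/* Passing a scratch copy, as Listing 11.2 does with DIRV:
   the caller's buffer stays intact. */
void call_with_copy(const char *dirss)
{
    char dirv[300];
    assert(strlen(dirss) + sizeof "\\*.*" <= sizeof dirv);
    strcpy(dirv, dirss);
    prepare_search(dirv);
}
```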
The main problem with assembling the programs in Listings 11.1 and 11.2 with TASM relates to local labels. A local label is a label that is valid only within a certain block of the program, in this case, a procedure. MASM automatically treats labels located within a procedure as local, so no problems arise when labels with the same names appear in different procedures. TASM uses a slightly different approach: by default, labels are global. Thus, local labels must be preceded by the @@ prefix, and the LOCALS directive must be inserted ahead of the code that uses them. Having inserted the LOCALS directive and marked the required labels as local, you'll easily convert the program to a format acceptable to TASM. Also, do not forget to change wsprintfA to _wsprintfA.
[i] Certainly, this is an elegant solution that provides a certain redundancy of information. Under some conditions, this allows you to recover a damaged file system.
[i] Do not confuse MFT records with the records inside them that describe the positions of the file clusters on the disk.
[i] Naturally, it is possible to do without local variables. For example, it is possible to store the data in a global array and access the required elements of this array depending on the recursion level.