In this mainly theoretical chapter, the reader will become acquainted with CD organization and the principles of optical recording. Without this knowledge, any study of CD-protection principles is simply impossible.
Physically, a CD is a thin plate made of polycarbonate plastic with a thin reflective aluminum (or, in some cases, gold) layer. The reflective layer, in turn, is covered by another, special protective layer (Fig. 1.1). The reflective layer is imprinted with a chain of microscopic pits and lands, arranged in the form of a long, continuous spiral track. This track is similar to those that you would see on an old vinyl phonograph record (Fig. 1.2). Unlike the vinyl record, however, the information on a CD winds from the disc's center to its outer edge. CDs, therefore, are similar to sequential access devices with accelerated rewinding.
Contrary to common belief, pits and lands do not directly correspond to the ones and zeroes of binary code. Information is stored on CDs according to much more sophisticated principles. A binary one is represented by a change from a pit to a land or from a land to a pit, while a binary zero is represented by the lack of a change for the current interval (Fig. 1.3). At the same time, between two ones there must be no less than two and no more than ten zeroes. The lower limit is a result of technological problems involved in manufacturing the physical discs, while the upper limit is due to the instability of the disc rotation speed. For example, if the stability of the rotation speed is 3 percent, then, when reading a sequence of ten zeroes, the error will be 1/3 bit, which doesn't pose any problems. However, when reading a sequence of fifteen zeroes, this error grows to 1/2 bit, and the drive will have problems rounding this error.
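This run-length rule is easy to verify mechanically. Below is a minimal sketch, assuming the channel bits are given as a '0'/'1' string; the function name is my own invention for illustration:

```c
#include <stdbool.h>

/* Checks the EFM run-length rule stated above: between any two binary
   ones there must be no less than 2 and no more than 10 zeroes. */
bool efm_runs_valid(const char *bits)
{
    int prev = -1;                          /* index of the previous '1' */
    for (int i = 0; bits[i] != '\0'; i++) {
        if (bits[i] != '1')
            continue;
        if (prev >= 0) {
            int zeroes = i - prev - 1;      /* zeroes between two ones */
            if (zeroes < 2 || zeroes > 10)
                return false;
        }
        prev = i;
    }
    return true;
}
```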
Fourteen bits make up one EFM word, which encodes a normal 8-bit byte according to a special table. In fact, the abbreviation EFM stands for Eight to Fourteen Modulation. Every two EFM words are separated by three merging bits, which are intended, first, to resolve encoding conflicts (for instance, where one EFM word terminated by a binary one is followed by another EFM word that starts with the same value) and, second, to prevent the occurrence of erroneous sync groups, which will be covered in detail a little later.
A group of 36 bytes makes up an F1 frame, which comprises the sync group that precedes it, a subcode byte, and two 12-byte data groups, each supplied with a 4-byte checksum (CRC) field.
Frames are joined to form sectors, also called blocks. Each sector contains 98 interleaved frames (interleaving reduces the influence of medium defects, since the useful information is spread over the track). At the same time, the first 16 bytes of each sector are occupied by the header, which contains a 12-byte Sync field, a 3-byte Address field, and a 1-byte Mode field (Fig. 1.5).
The sector is significant because it is the smallest unit of data that a CD drive can read in raw mode. Note that there are no drives that allow retrieval of the frame contents as is. On the contrary, all drives forcibly recover frame contents at the hardware level, using the CRC fields for this purpose. Details of the error-recovery technique will be provided later. For the moment, it is enough to note that the lack of access to the actual raw bytes rules out obtaining a bit-by-bit disc copy. This means that a protection mechanism has, in principle, the capability of distinguishing a copy from the original.
Along with various auxiliary information, the F1 frame contains one subcode byte, also called the subchannel or control byte (see Fig. 1.7). Subchannel data are entirely isolated from the sector contents and, in some ways, behave much like the multiple data streams of the NTFS file system (see Inside the Windows NT File System by Helen Custer). Fig. 1.6 illustrates this.
Each of the eight bits that make up the subcode byte is designated by an uppercase Latin character: P, Q, R, S, T, U, V, and W, respectively. Identically named bits of the subchannel bytes of all frames are joined into so-called subcode channels. Channels consist of sections, each of which is created by joining the subchannel data of 98 frames, which equals one sector (see Fig. 1.7). However, section and sector boundaries may not coincide. Consequently, to guarantee the retrieval of one section from a disc, we must read at least two sectors. The first two frames of a section are used for synchronization, while the remaining 96 are dedicated to the storage of actual data. By simple calculation, we can discover that every channel carries exactly 96 bits, or 12 bytes, of raw, unprocessed data per section.
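The joining of identically named bits can be sketched as follows. This is an illustration only: it assumes the 96 data-carrying subcode bytes of one section are already in hand, and that the P bit occupies the most significant bit of each subcode byte (the usual convention for raw subchannel reads); the helper itself is hypothetical, not part of any drive API:

```c
#include <stdint.h>

/* Collects one subcode channel (0 = P, 1 = Q, ... 7 = W) from the 96
   data frames of a section into 96 bits = 12 bytes, first frame first. */
void extract_subchannel(const uint8_t subcode[96], int channel,
                        uint8_t out[12])
{
    for (int i = 0; i < 12; i++)
        out[i] = 0;
    for (int frame = 0; frame < 96; frame++) {
        int bit = (subcode[frame] >> (7 - channel)) & 1;
        out[frame / 8] |= (uint8_t)(bit << (7 - frame % 8));
    }
}
```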
Data of the P and Q subchannels arrive in a form that is ready for use. The first 12 bytes are the most important, while the others are used for alignment. Data of the R-W channels must be specially prepared before use (this process is known as cooking). The 96 6-bit symbols that make up these channels are divided into 4 groups of 24 symbols each. Every group of this type is called a pack, and includes 16 symbols of user data and two EDC/ECC fields, one of which contains 2 symbols of correcting codes and the other of which contains 4 symbols.
What information is stored in subcode channels? According to the ECMA-130 standard, normal CDs use only two subchannels: P and Q.
The P subchannel contains the termination marker of the current track and a pointer to the next track, while the Q subchannel is used for the storage of service information, which determines the current position of this block on the disc. Of all of the channels, this is the most important.
In its structure, the Q subchannel comprises the following four parts: 4 control bits corresponding to the Control field; 4 address bits corresponding to the q-Mode (ADR) field; 72 bits of Q-data, corresponding to the q-Data field; and 16 bits of checksum, corresponding to the CRC field (Fig. 1.8).
| Byte | Contents | Description |
|---|---|---|
| 0 | Control/ADR | |
| 1 | TNO (Track Number) | |
| 2 | Index | |
| 3 | PMin | Head position in relation to the track start |
| 4 | PSec | |
| 5 | PFrame | |
| 6 | Zero | |
| 7 | AMin | Head position in relation to the start of the disc |
| 8 | ASec | |
| 9 | AFrame | |
| 10-11 | CRC | |
| 12-15 | Reserved for alignment | |
The Control field defines the track contents (audio or data), the number of audio channels (stereo or quadraphonic), and specifies whether copying of the data is permitted. In particular, the sequence 0110 specifies that the user-data part of the sector stores digital data that may be copied. Conversely, the sequence 0100 prohibits data copying from the disc. In other words, if bit 1 (counting bits from zero, right to left) is set, then copying is permitted. Curiously, most drives always reset this bit to zero, even when files created by the user are written to the disc. Nevertheless, copiers ignore these absurd prohibitions altogether; therefore, the end user may remain unaware of the problem!
The q-Mode field defines the format of data representation in the q-Data field. For most CDs, this is equal to 1.
The q-Data field in q-Mode == 1 comprises nine single-byte fields containing information about the sector (other modes are not considered because of their exotic nature):
TNO (Track Number): contains the number of the current track, taking values from 01 to 99; the magic value 0xAA designates the Lead-out track.
INDEX: contains the index of the current section within the current track. 00 specifies a pause, while values from 01 to 99 identify sections with useful data. Currently, however, this possibility is not utilized, and the section index is either always equal to zero (audio pause) or to one (actual data); the index of the Lead-out track must be equal to zero.
MIN, SEC, FRAC: the playback time of the current track from its starting point (minutes:seconds:frames, respectively). Also called the relative playback time.
ZERO: this field must always be equal to zero.
A-MIN, A-SEC, A-FRAC: the playback time of the disc from the starting point of the data area (minutes:seconds:frames, respectively). Also called the absolute playback time.
The CRC field contains the checksum of the contents of the Q subcode channel. It is calculated according to the following polynomial: G(x) = x^16 + x^12 + x^5 + 1.
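A bitwise implementation of this checksum might look as follows. This is a sketch, not a drive API: it processes the ten data bytes of the Q channel MSB-first with a zero initial value; as far as I can tell, the 16 parity bits are recorded on the disc in inverted form, which the sketch leaves to the caller:

```c
#include <stdint.h>
#include <stddef.h>

/* CRC over G(x) = x^16 + x^12 + x^5 + 1 (0x1021), MSB first, initial
   value 0 -- the Q-subchannel checksum before inversion. */
uint16_t crc_q(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```

A handy self-check: appending the (non-inverted) remainder to the message and recomputing must yield zero.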
The addressing of sectors originated with audio discs. It is written in the following format: mm:ss:ff (minutes:seconds:frames, where a frame is 1/75 of a second and frames range from 0 to 74). Counting starts from the beginning of the program area; i.e., the addresses of the Lead-in area are negative.
To convert an MSF address to the LBA format, the following formula can be used: Logical Sector Address = (((Minute*60) + Seconds)*75 + Frames) - 150
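In code, both directions of the conversion are one-liners. A sketch; the function names are my own:

```c
/* MSF -> LBA: 75 frames per second, minus the 150-frame (2-second)
   offset of the program area. */
long msf_to_lba(int minute, int second, int frame)
{
    return ((long)minute * 60 + second) * 75 + frame - 150;
}

/* The inverse conversion, LBA -> MSF. */
void lba_to_msf(long lba, int *minute, int *second, int *frame)
{
    lba += 150;
    *frame  = (int)(lba % 75);
    *second = (int)(lba / 75 % 60);
    *minute = (int)(lba / 75 / 60);
}
```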
IEC 908 is the standard for audio CDs. Because it was published in 1982 in the form of a book with a red cover, it is informally called the Red Book. This standard describes the sector as a logical block 2,352 bytes long that has no additional fields and represents a continuous audio flow of digitized music. Logically enough, the sectors of all other storage media, including diskettes and hard disks, are organized in a similar way at the logical level. They differ only in sector length (for instance, the sector length of diskettes and hard disks equals 512 bytes).
Unfortunately, any attempt to use an audio CD for storing data invariably failed. The prohibitively high storage density, along with the technical imperfection of the read mechanism, resulted in persistent errors when reading the disc. The number of these errors over a 10-second interval might reach 200! For audio, this seemed relatively normal, since erroneous bits can easily be corrected by interpolation. Although bit-exact reproduction of the audio stream is not guaranteed in this case, even the well-trained ear of a musician is unable to notice the difference. Therefore, the increase in storage density for the sake of increasing disc capacity seemed justified.
Naturally, for the correction of errors that occur in the course of reading data files, the interpolation approach is not suitable: the file being read would be irrecoverably corrupted. To solve this problem, it was necessary to increase the redundancy of the information being written to the CD and to introduce additional error-correction codes. For the purpose of maintaining compatibility with existing equipment (including the existing production capacities), the existing information-storage format was preserved, but with another level of abstraction added.
The Yellow Book standard, published in 1983, describes the sector as a complex structure comprising a 12-byte sync sequence, a 4-byte header, a 2,048-byte data area, a 4-byte Error Detection Code (EDC) field, an 8-byte Auxiliary field, and a 276-byte Error Correction Code (ECC) field. This structure is shown in Fig. 1.9.
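The field sizes add up to exactly 2,352 bytes, which can be captured in a struct. A sketch; the type and field names are mine, not from the standard:

```c
#include <stdint.h>

/* Byte layout of a Yellow Book Mode 1 sector (see Fig. 1.9). All
   members are byte arrays, so the struct contains no padding. */
typedef struct {
    uint8_t sync[12];        /* 00 FF FF FF FF FF FF FF FF FF FF 00 */
    uint8_t header[4];       /* minutes, seconds, frames, mode */
    uint8_t user_data[2048]; /* the part visible to ordinary reads */
    uint8_t edc[4];          /* error detection code (checksum) */
    uint8_t intermediate[8]; /* zero-filled auxiliary field */
    uint8_t p_parity[172];   /* Reed-Solomon P parity */
    uint8_t q_parity[104];   /* Reed-Solomon Q parity */
} RawSectorMode1;
```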
Naturally, CD-ROM drive hardware hides all of these details and displays the contents of the auxiliary fields only when it receives a special command (which, by the way, is not supported by all models). From the programmer's point of view, the 2,048 bytes of the user data area are the minimum set of information with which a standard drive must work. Conveniently, as a result of this contraction of the actual sector, the length of a logical sector became a multiple of the sector length of all other devices. So, what is the problem? Why should we dig so deep? The answer is obvious: By manipulating the auxiliary fields, you can create discs that cannot be copied with standard tools. Besides, you'll be able to crack protection mechanisms that prevent users from copying protected discs.
Thus, if you are not too bored with pure theory, let's continue with our efforts. In order to, hopefully, bolster your inspiration, I'll just point out that these theoretical considerations will soon come to an end, and we will embark on the captivating process of investigating the disc under the microscope.
The synchronization field contains the following sequence: 00h FFh FFh FFh FFh FFh FFh FFh FFh FFh FFh 00h, and serves as an indicator of the sector's starting point.
The Header field contains four bytes, the first three of which are occupied by the address of the current physical sector, specified in the following format: minutes:seconds:frames. The final, fourth byte determines the format of the remaining part of the sector (mode). If mode == 0, then all bytes in positions 16 to 2,351 of the sector are set to 00h and are read without any additional processing. If mode == 1, then the remainder of the sector appears as shown in Fig. 1.10 (which we are discussing presently). Naturally, there are also other modes. However, because they are not very common, we won't engage in a discussion of this topic. Interested readers can find all of the required information in the appropriate specifications.
Fig. 1.10: Sector format for the MODE1 mode
The 2,048 bytes of the user data area, as the name suggests, represent the useful information.
The four-byte EDC field contains the sector checksum, calculated according to the following polynomial: P(x) = (x^16 + x^15 + x^2 + 1) * (x^16 + x^2 + x + 1). At the same time, the least significant parity bit (x^0) is written into the most significant bit of the checksum, and computation of the checksum starts from the least significant data bit.
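Because the computation starts from the least significant bit, the 32-bit polynomial is conveniently used in bit-reversed form (0xD8018001). The sketch below follows this well-known reflected implementation; for a Mode 1 sector the checksum covers, as far as I can tell, the sync, header, and user data fields (bytes 0 through 2,063):

```c
#include <stdint.h>
#include <stddef.h>

/* EDC over P(x) = (x^16+x^15+x^2+1)*(x^16+x^2+x+1), computed from the
   least significant data bit; 0xD8018001 is that polynomial with its
   bits reversed. */
uint32_t edc_compute(const uint8_t *data, size_t len)
{
    uint32_t edc = 0;
    for (size_t i = 0; i < len; i++) {
        edc ^= data[i];
        for (int b = 0; b < 8; b++)
            edc = (edc & 1) ? (edc >> 1) ^ 0xD8018001u : edc >> 1;
    }
    return edc;
}
```

Appending the four checksum bytes (least significant byte first) and recomputing must give zero, which makes a convenient self-test.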
The auxiliary Intermediate field stores eight bytes filled with zeroes. To be honest, I don't quite understand how it is used (to all appearances, it is not used at all or, at least, it is clearly not used by protection mechanisms).
P-parity and Q-parity fields, whose lengths are 172 and 104 bytes, respectively, contain so-called Reed-Solomon error-correction codes. The mathematical principles of their operation are described in detail in ECMA-130. We won't concentrate on these codes here, especially because, for the vast majority of problems, there is no need to compute ECC codes. Most frequently, crackers simply fill these fields with random, senseless garbage, thus imitating an unrecoverable sector read error (or, simply speaking, emulating physical defects of the disc surface by creating sectors that cannot be read at the logical level). This approach is the most appropriate for cracking protection mechanisms that rely on physical defects of the disc's surface.
Merging bits solve three important problems; without them, it would be impossible to read information from a CD.
First, merging bits prevent conflict situations from arising. These conflicts can occur at the junction point of two EFM words, the first of which terminates with a one, and the second of which starts with a one (Fig. 1.11). Because two binary ones (each of which corresponds to a transition from pit to land or from land to pit) must be separated by at least two zeroes, this combination is considered a violation. The drive simply won't notice that something is present here (the length of such a pit/land is considerably smaller than the diameter of the focused laser spot). Therefore, to be detected reliably, its length must be increased to at least 3T (see Fig. 1.12 for more details). On the other hand, if the tail of one EFM word consists of eight successive binary zeroes, and the following EFM word starts with the same sequence of 8 zeroes, we will have a chain comprising 16 zeroes. When reading such a chain, an error will occur, since, according to the standard, there must be no more than ten zeroes between two binary ones. Otherwise, the length-detection error becomes too extreme. If you haven't quite grasped the idea, just take a map and try to measure the distance between two cities using an ordinary ruler; this makes the nature of the problem a bit clearer. Briefly speaking, merging bits are chosen in such a way as to ensure that there are no less than two and no more than ten zeroes between the two closest neighboring binary ones.
Second, merging bits prevent the occurrence of erroneous sync groups. The sequence of bits that makes up a sync group (for reference: 100000000001000000000010) can only occur in the frame header and, therefore, serves as a kind of indicator of its starting point. When the read head moves across the spiral track to find a specified sector, it has to use some method of detecting its current location (the starting point of a frame, the middle of a frame, or even the middle of an EFM word). The reading device passes the stream of digital data through itself until it encounters the next sync group. When a sync group is encountered, the circuitry of the CD-ROM drive concludes that this point is nothing other than the starting point of a new frame! Just imagine the confusion if false, parasitic sync groups were allowed to appear in the middle of a frame! Without merging bits, this would occur on a regular basis. For example, consider the following EFM words: 10000000000100 and 00000000100100. If these words are glued together directly, a false parasitic sync group appears at their junction. Any attempt at reading such a frame would cause a crash. Merging bits that connect such EFM words allow us to avoid these situations.
Third, look at the illustration shown in Fig. 1.12: the CD has no other markings except for the spiral track comprising the sequence of pits and lands. Fast alternation of pits and lands generates an HF (High Frequency) signal, shown in graph (b). This signal is important for holding the read head on the spiral track, because there is no other method of distinguishing the track from the inter-track intervals. Another complication relates to the lack of a reference signal (or decision signal, as it is called in the standard), without which the read head cannot reliably distinguish dark surface areas from light ones. A precise quotation from the standard is provided below: "The information contained in the HF signal is extracted in the form of the positions of the crossings of the HF signal with a decision level I_D." This decision level I_D is the level in the middle of the extreme values of I_3 (I_3 is the signal level corresponding to the maximum frequency of pit and land alternation).
Now, imagine what happens if, in some section of the spiral track, there is a considerable excess of pits over lands, or vice versa. Instead of an alternating high-frequency signal, the drive will produce a direct current of a high or low level. Lacking a decision signal, the drive will experience considerable difficulty in telling which is which! In other words, within one frame, the number of pits and lands must be approximately equal.
How can this balance be achieved? After all, we cannot write strictly ordered sequences to the disc. Furthermore, even a brief glance at the EFM code table is sufficient to show that zeroes predominate over ones. Whatever EFM sequences we write, it is simply impossible to achieve the necessary quorum of ones! But there is no direct correspondence between bits and pits (or lands). This means that a binary zero can be encoded both by a pit and by a land! Assume that we are writing the following EFM sequence to the disc: "10000000100000". Despite the evident excess of binary zeroes, this EFM code contains an approximately equal number of pits and lands (Fig. 1.13, a).
To compute this ratio more precisely, a special value was introduced, known as the DSV (Digital Sum Value). It is computed in the following way: initially, the DSV is equal to zero; each successive pit increases it by one, while each land decreases it by one. In particular, for an EFM sequence such as "10000000100000", the DSV is equal to two (Fig. 1.13, a). Note that this is 2 rather than -2, since we are interested only in the absolute value of the number, and whether it is positive or negative is of no importance (in fact, if this sequence started from a land rather than a pit, we would get the inverse result: eight minuses and six pluses).
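The computation just described takes only a few lines. A sketch, assuming the channel bits arrive as a '0'/'1' string and that the sequence starts on a pit (as noted, only the absolute value matters):

```c
#include <stdlib.h>

/* Absolute DSV of a channel-bit string: a binary one toggles between
   pit and land; a pit adds one, a land subtracts one. */
int dsv(const char *bits)
{
    int level = 1;  /* +1 = pit, -1 = land */
    int sum = 0;
    for (int i = 0; bits[i] != '\0'; i++) {
        if (bits[i] == '1')
            level = -level;  /* transition: pit <-> land */
        sum += level;
    }
    return abs(sum);
}
```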
According to the standard, the DSV computed for the entire frame must fall within the interval from 2 to 10; otherwise, there will be problems in reading the sector (if the read operation is even possible). Note that not all EFM codes are characterized by a low DSV value. For instance, consider the following: "00010010000000", for which the DSV is 8. Although this value formally satisfies the standard's requirements, if there are at least ten such sequences in a row on a disc, the DSV will grow catastrophically, to a value of 80!
Decreasing the DSV to an acceptable minimum is the third task carried out by merging bits. How do they do this? Look at the illustration shown in Fig. 1.13, b. Here, one EFM word having a large positive DSV is followed by another EFM word with a high DSV. As mentioned above, the DSV is an unsigned value or, to be more precise, the actual sign (negative or positive) of an EFM word's DSV doesn't depend on the word itself, but rather on the word's context! Consider the following degenerate sequence: "...00000000...". Since a binary zero corresponds to the lack of a change at this location on the disc surface, this sequence can be encoded either by eight lands or by eight pits. Now, let us assume that we are writing two EFM words, both of which have a significant excess of lands. Is it possible to turn the lands of the second word into pits? It is, provided that at least one of the three merging bits is equal to one. As we know, a binary one corresponds to a transition from land to pit (or from pit to land). Therefore, the second EFM word will start from a pit, and its DSV will become strongly negative (in other words, there will be an excess of pits over lands). As a result, the EFM word with a strongly positive DSV will be, to some extent, compensated for by an EFM word with a strongly negative DSV. Thus, the total DSV will be close to zero (Fig. 1.13).
Do application programmers need to know all of these details of physical encoding, you might ask? The answer is straightforward: The requirements for merging bits are mutually exclusive and can produce very unfavorable EFM sequences with unavoidably high DSVs. Such a sequence is shown in Fig. 1.14. The first EFM word is terminated by 8 zeroes. Since sequences comprising eleven or more consecutive zeroes are disallowed, either the first or the second of the merging bits must necessarily be set to 1. However, in this case, the next EFM word will take a strongly negative DSV value, which we cannot compensate for. This is because, between the second and the third EFM words, only a single combination of merging bits is possible: "000". Any other combination would violate the rule that no less than two zeroes must be present between two neighboring binary ones. As a result, the third EFM word also has a strongly negative DSV value. If we fill the entire sector with this sequence, its total DSV will be catastrophically negative!
To prevent the occurrence of such sequences, all data being written to the disc are scrambled first, i.e., transformed into a pseudo-random sequence that resembles white noise in its characteristics. Accordingly, when reading data, the inverse operation is carried out. However, if desired, it is possible to bypass the scrambler! Some protection mechanisms do exactly this (see "Protection Based on Weak Sectors").
Before writing data to the disc, the sector contents undergo a scrambling operation: the data are transformed into a pseudo-random sequence that resembles white noise in its characteristics. This eliminates the unpremeditated generation of regular sequences with high DSV values, which are unfavorable from the point of view of the read device: the reading of such sequences is extremely unstable (for more details, see "Sync Groups, Merging Bits, and DSV").
Scrambling is applied to all fields of a sector except for the 12-byte sync group at its start (indeed, if we scrambled the sync group as well, how would we find it later in the data stream?). In total, this operation covers 2,340 bytes of data (Fig. 1.15).
Scrambling is carried out by the CD-R/CD-RW drive at the hardware level, and is absolutely transparent for the programmer. Accordingly, when reading a sector, an inverse procedure is carried out, i.e., descrambling. As a result of this operation, the white noise is removed from the sector, which is then converted back to its initial form.
The transparency of the scrambling mechanism creates a false impression that its algorithm is absolutely useless to the programmer and of interest only to hardware designers. In reality, this is not the case. Since the scrambler was designed specifically for the elimination of unpremeditated occurrences of unfavorable sequences, the ability to create such sequences on purpose allows us to create discs that are unreadable at the hardware level. What next, you might ask? Why choose such a complicated way of creating an unreadable disc? Why not just take a disc and scratch its reflective layer with a sharp needle, so that it will also be unreadable? After all, it is also possible to smash it with a paperweight or a sledgehammer. All joking aside, there is a point to this. As a result of the presence of correcting codes, it is possible to create an unfavorable sequence, compute the correcting codes corresponding to it, and write this sequence to the disc in a slightly modified form, in such a way as to ensure that, on the one hand, it changes from an unfavorable sequence into a favorable one and, on the other hand, that after passing through the Reed-Solomon decoder, it is restored to its initial (unfavorable) form. Any attempt to copy such a disc using a standard CD-copying program will fail, because the program will write the unfavorable sequence as is, and when an attempt is made to read this sequence, an error will occur! Promising, isn't it? More details on this technique will be provided in Chapter 6, which is dedicated to protection mechanisms based on weak sectors. For the moment, let's concentrate on the scrambler.
According to the ECMA-130 standard, the scrambling algorithm appears as follows: Each bit in the input stream of the scrambler is added modulo 2 to the least significant bit of a maximum-length register. The least significant bit of each byte comes first in the input stream. The 15-bit register is of the parallel block synchronized type, and is fed back according to the polynomial x^15 + x + 1. After the Sync of the Sector, the register is pre-set to the value 0000 0000 0000 0001, where the ONE is the least significant bit (Fig. 1.16).
void UpdateShiftRegister()
{
    int i;
    for (i = 0; i < 8; i++)
    {
        int hibit = ((ShiftRegister & 1) ^ ((ShiftRegister & 2) >> 1)) << 15;
        ShiftRegister = (hibit | ShiftRegister) >> 1;
    }
}

void Scramble()
{
    int i;
    for (i = 12; i < 2352; i++)
    {
        Scrambled[i] = Scrambled[i] ^ (ShiftRegister & 0xFF);
        UpdateShiftRegister();
    }
}
Does this make sense to you? It didn't to me either... at least until I took a disassembler and reverse engineered the Clone CD program. As a matter of fact, Clone CD is an excellent tool for bypassing protection mechanisms based on weak sectors. Because of this, it must contain a built-in scrambler.
Among the functions exported by ElbyECC.dll (supplied as part of Clone CD), there is one very interesting function: RawScrambleSector, the disassembled listing of which appears in Listing 1.2.
.text:100020E0 RawScrambleSector proc near
.text:100020E0
.text:100020E0 arg_0           = dword ptr  4
.text:100020E0
.text:100020E0                 mov     eax, [esp+arg_0] ; Loading the passed argument into EAX
.text:100020E4                 mov     ecx, offset ScrmblrTbl ; ECX contains a pointer to ScrmblrTbl
.text:100020E9                 add     eax, 0Ch        ; Skipping 12 bytes of sync sequence
.text:100020EC                 push    esi             ; Saving ESI
.text:100020ED                 push    edi             ; Saving EDI
.text:100020EE                 sub     ecx, eax        ; Computing the delta
.text:100020F0                 mov     edx, 249h       ; 2340 / 4 DWORDs to scramble
.text:100020F5
.text:100020F5 loc_100020F5:                           ; CODE XREF: RawScrambleSector+22
.text:100020F5                 mov     esi, [ecx+eax]  ; Taking the next DWORD from the table
.text:100020F8                 mov     edi, [eax]      ; The next DWORD to scramble
.text:100020FA                 xor     edi, esi        ; XOR'ing
.text:100020FC                 mov     [eax], edi      ; Saving the result
.text:100020FE                 add     eax, 4          ; Next DWORD
.text:10002101                 dec     edx             ; Decrementing the counter
.text:10002102                 jnz     short loc_100020F5 ; Looping
.text:10002104                 pop     edi             ; Restoring EDI
.text:10002105                 pop     esi             ; Restoring ESI
.text:10002106                 retn                    ; Returning from the function
.text:10002106 RawScrambleSector endp
Analysis of the disassembled listing shows that the developers of Clone CD preferred a fast table algorithm to the tedious and resource-consuming fuss of working with a polynomial. This algorithm reduces to superimposing a pseudo-random sequence over the sector being scrambled by means of the XOR operation. The result is nothing other than a Vernam (one-time pad) cipher, whose key length equals the length of the sector portion being scrambled (2,340 bytes). An implementation of this algorithm in a high-level language would appear approximately as follows:
void RawScrambleSector(char *raw_sector)
{
    int a;
    DWORD *p;
    DWORD *MyScramblerTable = (DWORD *)ScramblerTable;

    p = (DWORD *)(raw_sector + 12);
    for (a = 0; a < 2340 / 4; a++)
        p[a] ^= MyScramblerTable[a];
}
Now all that remains is to look into the pseudo-random sequence itself. The first eight members of this sequence (which were retrieved by the disassembler from Clone CD) appear as follows:
dd 060008001h
dd 01E002800h
dd 006600880h
dd 081FE02A8h
dd 028606080h
dd 0889E1E28h
dd 0AAAE6668h
dd 0E0017FFCh
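There is no need to store the table at all: it can be regenerated from the ECMA-130 shift register quoted earlier (polynomial x^15 + x + 1, preset to 0001h, clocked eight times per byte). A sketch; the first two double words below can be checked against the values quoted above:

```c
#include <stdint.h>

/* Regenerates the 2,340-byte scrambler stream. The low 8 bits of the
   register scramble one byte (least significant bit first); the
   register is then clocked 8 times with feedback x^15 + x + 1. */
void make_scrambler_table(uint8_t table[2340])
{
    unsigned reg = 0x0001;  /* 15-bit register, preset to 1 */
    for (int i = 0; i < 2340; i++) {
        table[i] = (uint8_t)(reg & 0xFF);
        for (int b = 0; b < 8; b++) {
            unsigned newbit = (reg ^ (reg >> 1)) & 1;  /* bit0 ^ bit1 */
            reg = (reg >> 1) | (newbit << 14);
        }
    }
}
```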
The full table is too large to be provided here in its entirety. Even printed in the smallest font, it would consume an entire page (more than 4,000 characters). Therefore, it is more interesting to discover the pattern according to which the members of this sequence are related to one another, and to recreate the algorithm that computes all members of the sequence from the first one. This small programming puzzle is not as difficult as it might seem at first glance. True, a brief glance at the first 8 members of the pseudo-random sequence won't provide any clues to its nature. The numbers change chaotically and bear a close resemblance to the dancing men mystery solved by Sherlock Holmes. In this case, however, frequency analysis is useless, and the problem cannot be solved by brute force. The good news is that we are not solving it from scratch! First, we know for sure that scrambling is carried out in 16-bit words (the width of the scrambling register is close to this: 15 bits, to be exact). Because of this, we must analyze words, not double words. The fact that the XORing is carried out in 32-bit blocks doesn't change anything, because XOR is a bitwise operation; the bit width of the operands has no influence on the final result! Second, the most convenient way of analyzing patterns is to do so at the bit level, because this is precisely the level at which this pseudo-random sequence is generated.
The script shown in Listing 1.5 automatically converts all table elements into 16-bit words displayed in binary format. Start IDA, press <F2>, and load the file containing this script. Then, move the cursor to the first element of the table, press <Shift>+<F2>, and enter the following command: x2bin(ScreenEA(), ScreenEA()+2340, 2). Pressing <Ctrl>+<Enter> (or simply <Enter>, if you have an earlier IDA version) will start the script.
// x_start - the starting address for conversion
// x_end   - the ending address for conversion
// x_pow   - the number of bytes in a single element
static x2bin(x_start, x_end, x_pow)
{
    auto a, p;
    for (p = x_start;;)
    {
        // Converting into the element of the required width
        if (x_pow == 1) MakeByte(p);
        else if (x_pow == 2) MakeWord(p);
        else if (x_pow == 4) MakeDword(p);
        else return 0;

        // Converting into binary code
        OpBinary(p, 0);

        // Next element
        p = p + x_pow;

        // Exit, if everything is done
        if (p > x_end) break;
    }
}
The updated pseudo-random sequence of the scrambler will appear as follows (Listing 1.6 provides its first 16 elements).
dw 1000000000000001b
dw 0110000000000000b
dw 0010100000000000b
dw 0001111000000000b
dw 0000100010000000b
dw 0000011001100000b
dw 0000001010101000b
dw 1000000111111110b
dw 0110000010000000b
dw 0010100001100000b
dw 0001111000101000b
dw 1000100010011110b
dw 0110011001101000b
dw 1010101010101110b
dw 0111111111111100b
dw 1110000000000001b
Now, the pattern can be detected easily (which shows how important it is to format a listing correctly). The bits of each successive element are moved one position to the right, nestling up to the logical East and forming a kind of bit stream that grows linearly in its diagonal flow (each new element adds one bit to it). At a certain stage, however, it is suddenly broken into a set of smaller streamlets, interlacing and forming an indecipherable mess. Nevertheless, the physical principles forming the foundation of this pattern are still hidden in darkness and mist. So, there is nothing left for us to do other than to resort to blind trial, hoping for a little intuition and luck.
Now, what do we know? Unfortunately, not a great deal... The scrambler XORs the contents of its internal register with the flow of data being scrambled. After each scrambled 16-bit word, it modifies the value of this register... but how? Let's take two adjacent elements of our pseudo-random sequence and try to guess the sequence of operations that generates the next element. There are only a few possible answers: shift, XOR, AND, or OR. It is unlikely that the creators of the CD-ROM would have used anything else in the scrambler.
Let's, therefore, take the element 0110000000000000b, from which the scrambler has created 0010100000000000b in some as yet unknown way. Clearly, a shift has taken place. To compensate for it, let's shift the next element one position to the left and write the new value under the original (as if we were carrying out modulo-2 addition):
dw 0110000000000000b
dw ????????????????b XOR
---------------------
dw 0101000000000000b
All but the laziest readers will be able to determine the unknown item here: 0110000000000000b XOR 0101000000000000b gives us... 0011000000000000b. But wait! This is the source item shifted one position to the right! Let's consider how the next pair of numbers behaves:
dw 0010100000000000b
dw ????????????????b XOR
---------------------
dw 0011110000000000b
Well, 0010100000000000b XOR 0011110000000000b gives us a value of 0001010000000000b, which is, once again, the source item shifted one position to the right. Consequently, we have chosen the right method! After quickly writing the simplest script computing the next members on the basis of those preceding them, we can obtain the correct results for all of the members of the sequence from the second to the seventh, inclusive. After that, however, our theory ceases to work. In fact, theory and practice go their different ways, like a married couple after a divorce. Quite unexpectedly, a binary one appears in the most significant bit. In the next iteration, it generates a parasitic streamlet. Is it possible that the bit shift in the word is cyclic, i.e., that the least significant bit is periodically carried to the top? An attempt to compute the next member refutes this theory.
Having spent a couple of hours trying to find information on polynomials and the specific features of their implementation in the literature available to me, I couldn't find anything. Having finally decided that, so to speak, hasty climbers have sudden falls, I simply produced a printout of the first hundred members of our pseudo-random sequence and manually computed the next element for each on the basis of the one that preceded it. Having completed this operation, I marked all of the exceptions that I detected. It turned out that the 14th and 15th bits (counting from zero) are spontaneously inverted from time to time. All of the other bits behaved in complete accordance with the theory.
Now, all that remained was to detect the conditions under which these bit mutations take place. I discovered pretty quickly that if bit 1 (counting from zero) of the computed member is set to 1, then the 15th bit of the same member is inverted. This was fairly obvious, especially on the printout. Discovering the second exception to the rule was a little more difficult: if bit 0 of the computed member is set to 1, then the 15th and 14th bits of the same member are inverted. Accordingly, if both bit 0 and bit 1 are set to 1, then only the 14th bit is inverted, due to the double inversion (Fig. 1.17). That's it! Now we can compute the entire pseudo-random sequence!
The source code of the program generating the scrambling sequence is provided below. Naturally, it isn't a masterpiece of optimization. It is, however, illustrative enough.
/*-----------------------------------------------------------------------------
 *
 *              GENERATES THE SEQUENCE FOR CD SCRAMBLER
 *              =========================================
 *
 * build 0x001 @ 07.06.2003
------------------------------------------------------------------------------*/
#include <stdio.h>

// Fragment of the real scrambling sequence for checking the program
// ------------------------------------------------------------------
// 0x8001,0x6000,0x2800,0x1e00,0x0880,0x0660,0x02a8,0x81fe,0x6080,0x2860,0x1e28,
// 0x889e,0x6668,0xaaae,0x7ffc,0xe001,0x4800,0x3600,0x1680,0x0ee0,0x04c8,0x8356,
// 0xe17e,0x48e0,0x3648,0x96b6,0xeef6,0xccc6,0xd552,0x9ffd,0xa801,0x7e00,0x2080,

printf_bin(int a)
{
    int b;
    for (b = 15; b >= 0; b--) printf("%x", (a & (1 << b)) ? 1 : 0);
    printf("%x\n", a);
}

main()
{
    int a, tmp;
    int reg = 0x8001;       // The first element of the scrambling sequence

    for (a = 1; a < 1170 /* the scrambled sector part length in words */; a++)
    {
        // Printing
        printf_bin(reg);
        if ((a % 8) == 0) printf(".............%03d................\n", a / 8);

        // Modulo-2 addition with shift
        tmp = reg >> 1; tmp = reg ^ tmp; reg = tmp >> 1;

        // Processing the polynomial x^15 + x + 1, i.e., 1<<15 + 1<<1 + 1<<0
        if (reg & 1 << 1) reg = reg ^ (1 << 15);
        if (reg & 1 << 0) reg = reg ^ ((1 << 15) | (1 << 14));
    }
}
The next production cycle starts with chopping the freshly scrambled sector into 24-byte slices of data called F1-frames. A simple computation shows that one sector with a length of 2,352 bytes is sliced into exactly 98 F1-frames.
F1-frame. The structure of F1-frames is extremely simple: each frame consists of 24 bytes (12 words), numbered from 0 to 23 and sequentially mapped to the appropriate cells of the sector (Fig. 1.18, a). At the same time, the boundaries of frames and sectors do not necessarily have to match. Therefore, the sector can start from any of the following positions of the first frame: 0, 4, 8, 12, 16, or 20 (Fig. 1.18, b). The starting position of the sector in the F1-frame isn't stored anywhere (after all, there is no place to store it). Instead, the starting position of the sector is recognized by the presence of a sync group, which is hard not to notice!
The standard provides a rather vague description of the process of mapping sectors to frames. However, it does tell us that the starting point of the next sector directly follows the end of the previous sector (byte 2,351 of a sector is immediately followed by byte 0 of the next sector). Consequently, a change in the starting position of the sector doesn't wrap the sector's tail around to the starting point of the first frame. Instead, the sector's tail is carried into the next frame. Briefly speaking, if the starting position of the sector is not equal to zero, then every 49th frame simultaneously contains the bytes of two sectors. As is easy to see, these will be the first and the last frames of the sector, and, since one sector contains 98 frames, 98/2 == 49.
Changing the starting position of the first byte of the sector within a frame results in considerable changes to its DSV (see Sync Groups, Merging Bits, and DSV). As a result, a CD-R recorder is able to normalize sectors that have catastrophically high DSV values. The drive's firmware must choose the starting position with the smallest DSV value or, at least, ensure that the DSV value doesn't exceed the allowed limits. Unfortunately, most low-end CD-R recorders are too simple to cope with this task. They always start the frame from byte 0 of the sector. As a consequence, a copy of the disc at the frame level can differ significantly from the original. Despite the fact that it is impossible to determine the starting position programmatically (standard CD-ROM drives refuse to disclose this information), nothing is easier for the developers of protection mechanisms than forming a weak sector with a catastrophically high DSV value for starting position 0, but quite a normal one for all other starting positions (see Protection Mechanisms Based on Weak Sectors). After this, it is practically impossible to copy such a sector using standard writing equipment, because few CD-ROM drive models are able to read a sector with such a high DSV. For example, my ASUS-50x seems to be able to do this. However, it doesn't do it reliably, and, of course, it can't do it for every disc. At the same time, none of the recorders of which I am aware allows you to choose the starting position manually (this possibility, at least, is not provided for by the standard, and CD-RW drives cannot yet operate at a level this low).
It is, of course, possible to use a little cunning and corrupt several bytes of the sector intentionally, without damaging the error-correction codes (even minor changes introduced into the source data result in monstrous changes to the DSV), so that the drive's firmware will return the corrupted data to its initial state on the fly. However, if the protection mechanism isn't completely stupid, it will easily distinguish this kind of rough imitation from the original. After all, when reading raw sectors, all of the tweaks that have been performed on the error-correcting codes will be disclosed immediately!
At the same time, most CD-RW drives (if not all of them) carefully trace the DSV value and correctly choose the starting position. This seems logical enough: the contrast range of CD-RW media is too low and, therefore, the requirements for the DSV value here are considerably more stringent than for CD-R media. Hence, if a protected disc cannot be copied onto CD-R, try to copy it onto CD-RW. By this I mean using a CD-RW drive to copy the protected disc onto a CD-RW disc, because some CD-RW recorders (for example, Plextor, PHILIPS) always start the frame from byte 0 of the sector when writing to CD-R discs, yet determine the starting position of the sector correctly when writing to CD-RW media! Of course, this is an irritating circumstance, but nothing can be done about it.
The order of bytes in the sector is different from the order of bytes in the F1-frame. This means that when mapping the sector contents to a frame, even and odd bytes are swapped (Fig. 1.19). This mixing is intended to reduce the negative influence of disc defects that involve two adjacent bytes simultaneously.
The software method of slicing a sector into frames is provided in the listing below. Here, sector is a pointer to the 32-bit words of the source sector, and F1_frame is an array of 98 frames, each containing 24 bytes:
/* Generate F1-frames: adjacent bytes within each 16-bit word are swapped */
for (a = 0; a < 98; a++)
    for (b = 0; b < 24; b += 4, sector++)
        *(unsigned long *)&F1_frame[a][b] =
            ((*sector & 0xFF00FF00UL) >> 8) | ((*sector & 0x00FF00FFUL) << 8);
F2-frame. Newly created F1-frames are supplied to the input of a special coder (the Cross-Interleaved Reed-Solomon Coder, also known as the CIRC coder), where their 24 bytes are complemented by 8 bytes of Reed-Solomon checksums. As a result, F2-frames with a length of 32 bytes are produced at the coder output.
The contents of the bytes forming F1-frames remain unchanged at the bit level (the bit pattern of each of the 24 8-bit bytes of an F1-frame remains unchanged). However, the bytes themselves are redistributed over 106 F2-frames. As a result, F1-frames are spread over the spiral track, which makes them less sensitive to radial scratches on the disc surface and any other local defects.
Mixing is achieved by means of so-called delay lines and is carried out according to the following scheme (Fig. 1.20). The first delay section swallows the F1-frames supplied to its input. These frames are already split into 12 two-byte words, in which the least significant and most significant bytes are designated by the letters A and B, respectively. The words are numbered sequentially from W12n to W12n+11. Thus, the first byte of the frame has the number W12n,A, while the last one is numbered W12n+11,B.
The first delay line splits the frame contents into two groups of words, one of which (W12n+2, W12n+3, W12n+6, W12n+7, W12n+10, W12n+11) is supplied unhindered to the output, while the second (W12n+0, W12n+1, W12n+4, W12n+5, W12n+8, W12n+9) is forcibly delayed for the duration of the processing of the next two F1-frames. The words W12n+1, ..., W12n+10 are carefully mixed according to a strictly defined scheme. In fact, a picture is worth 128 K words here, because it is much easier to draw the scheme graphically (Fig. 1.20) than to describe it in normal language.
Mixed words are supplied to the input of the C2 encoder, where they are complemented by four parity bytes computed according to Reed-Solomon codes and sequentially numbered from Q12n+0 to Q12n+3.
The words complemented by parity Q-bytes are then supplied to the second delay line, where each is delayed for a time interval, measured in F1-frame processing periods, ranging from 1D to 27D (where D is equal to 4).
The words that have finally been freed are sent to the next Reed-Solomon coder, designated C1, where they are supplemented with four parity bytes sequentially numbered from Pn+0 to Pn+3. This two-stage scheme of redundant encoding considerably reduces the probability of irrecoverable errors, because between the C2 and C1 coders the data being processed are carefully mixed!
Finally, the third delay line delays all even bytes of the data flow for the time required to process a single F1-frame. That's it! The newly created F2-frame then exits the output of the third delay line. This frame comprises 32 bytes, sequentially numbered from 0 to 31: 24 bytes of payload, 4 Q-bytes of parity, and 4 P-bytes of parity. At the same time, the 24 bytes of useful data contained in the F2-frame include the data of a large set of different F1-frames! In other words, you can't consider the F2-frame to be simply an F1-frame supplied with a checksum.
When data are read from a CD, the inverse process takes place (Fig. 1.21). First, the bytes being read pass through the delay line that holds even bytes for the interval required to process one frame. They are then supplied to the C1 decoder, which checks the correctness of the checksum and tries, if necessary, to recover corrupted bytes. There is then another delay section (the 1D-27D delay lines) and another decoder (the C2 decoder), which recovers whatever couldn't be recovered by its predecessor. Finally, F1-frames that are ready for use leave the output of the last delay line. Further on, they are assembled into sectors. Sectors have already been covered, so we won't repeat ourselves here.
At such a level of redundancy, error-correction codes can recover up to 2 corrupted bytes per 24/28-byte slice of the source data. If three or more bytes are corrupted, the decoder can only report an error; it is unable to recover the original contents of the corrupted bytes. Still, it is possible to determine exactly which bytes were corrupted. Consequently, it might be possible to determine their approximate values by means of interpolation. Naturally, this technique is not suitable for recovering data CDs. However, it produces satisfactory results for Audio CDs. Even on high-quality media, the number of irrecoverable errors is actually quite large. Therefore, CD drives must actively carry out interpolation. Note that the vast majority of music lovers aren't even aware of its existence.
First-generation CD-ROM drives intended for computers formally supported audio discs and even managed to produce high-quality playback. However, any attempt to grab these discs and make a digital copy produced a result in which the speakers issued continuous crackling sounds, similar to those produced by old, scratched vinyl gramophone records. While nostalgia isn't necessarily a bad thing, the days of vinyl records are long gone. The main reason for this is that CDs produce considerably better sound quality, which doesn't degrade with time, while records inevitably decay, even if you store them very carefully. Why, then, do our grabbed CDs become as crippled as records, producing the same hissing and crackling sounds?
This occurs because early CD-ROM drives read Audio CDs as is, and didn't attempt to recover corrupted bytes when they occurred. Furthermore, they didn't even report the number of such bytes. As a result, application software didn't have any information and didn't know what to interpolate! If the corruption involves the least significant bits of a byte, it might remain undetectable to the human ear (even that of a true music lover). However, if corruption involves the most significant bits, it can be heard as a sharp click, noticeable even to those who are hard of hearing. It would be possible to try to read the corrupted sector several times (in the hope that one of the reading attempts would prove successful), or to analyze the read data to find and smooth all sharp peaks and dips. However, this is a half-measure, as most of you will understand! High-quality audio grabbing on such drives is virtually impossible. Moreover, some CD manufacturers, keen on protecting their products from unauthorized copying, began to introduce a large number of irrecoverable errors into their products intentionally. As a result, these CDs could be read normally (even on computer CD drives), but any attempt to grab or copy them to CD-R inevitably failed. The sound quality was so horrible that it threatened to make a true music lover sick. The sound was even worse than that produced by an old-fashioned gramophone.
The situation has begun to improve. Some contemporary CD-ROM drives are capable of returning pointers to corrupted bytes in the data flow. At the software level, this is achieved by passing the BEh (READ CD) command with a nonzero value in the Error Flags field to the drive. For reference, this field is located in the first two bits of the 9th byte of the ATAPI/SCSI packet. The result of using this command is illustrated by the /etc/RAW.CD.READ/aspi32.C2.c demo example. Those of you who would like more detailed information on this topic are advised to read the standard for DVD/CD-ROM drives, which can be found at the following address: http://www.stanford.edu/~csapuntz/specs/INF-8020.PDF (page 143). Now, let us concentrate not so much on the format of the fields of the READ CD command as on the C2 pointers themselves. Strictly speaking, these are not pointers at all but, rather, ordinary bitmaps placed at the tail of the data returned by the drive. Each byte of the source data has its own corresponding bit in the C2 pointers. Analyzing the value of this bit allows us to determine unambiguously whether or not the corresponding byte is corrupted. Taking into account the fact that the sector length is 2,352 bytes, it is easy to compute the total size of all the C2-pointer bits: 2,352/8 = 294 bytes. The first byte of the sector corresponds to the first bit of the C2 pointers (Fig. 1.22).
To find out if a specific drive offers this capability, just send the MODE SENSE 10 (5Ah) command to the drive with the Page Code equal to 2Ah (CD/DVD Capabilities and Mechanical Status Page). Bit 4 of the 13th byte of the returned data then specifies whether the C2 Pointers option is supported. If this bit is equal to 0, the function is not supported (see the etc/RAW.CD.READ/aspi32.cfg.c demo example). In particular, my PHILIPS CDRW 2400 doesn't provide this capability.
But enough bad news. Let's return to our C1 and C2 decoders or, to be more precise, to the technique of counting errors. There are at least six types of errors: a) single-character (recoverable) errors corresponding to the first stage of recovery (i.e., recoverable by the C1 decoder); b) two-character (recoverable) errors corresponding to the same stage; and c) three-character (unfortunately, irrecoverable) errors corresponding to the same stage. A similar pattern is typical for the second stage of recovery, related to the C2 decoder. It is common practice to designate errors by the Latin character E followed by a two-digit number, the first digit of which specifies the number of errors (1, 2, or 3), while the second specifies the stage of correction (1 or 2). All possible combinations of these digits are outlined in Table 1.2.
E11 | The number of single-character (recoverable) errors at the C1 stage |
E21 | The number of two-character (recoverable) errors at the C1 stage |
E31 | The number of three-character (irrecoverable) errors at the C1 stage |
E12 | The number of single-character (recoverable) errors at the C2 stage |
E22 | The number of two-character (recoverable) errors at the C2 stage |
E32 | The number of three-character (irrecoverable) errors at the C2 stage |
Three-character errors that cannot be recovered at the C1 stage (i.e., E31 errors) can, in most cases, be successfully recovered at the next recovery stage. However, a single E31 error can cause up to 30 E12 errors, because the data of 160 F1-frames are carefully mixed between the C1 and C2 decoders!
Three-character errors irrecoverable at the C2 stage (i.e., E32 errors) serve as evidence of a serious physical defect on the disc surface. Unfortunately, such errors are not as uncommon as they might seem at first glance, even on virgin discs. This is due to imperfect technological processes. Because of this, it is necessary to use redundant error-correction codes on data CDs (for Audio CDs, interpolation is used in such cases, but for data CDs interpolation is senseless). More detail on this topic was provided in Raw and Cooked Sectors.
F3-frame. When an F2-frame produced at the output of the CIRC coder is complemented by another byte of auxiliary data, called the control byte, an F3-frame is formed, which was already considered in Pits, Lands, Frames, and Sectors in Brief. The structure of the F3-frame is perfectly simple: first comes the control byte, followed by the 32 bytes inherited from the F2-frame. The control byte contains eight subcode bits, which, in turn, form channels that are 98 bits long, which means that they span an entire section comprising 98 F3-frames (for more detail, see Subcode Channels).
The structure formed by the 98 F3-frames is called a section, and it represents a self-sufficient entity not bound in any way to sector boundaries. According to the ECMA-130 standard: These Sections are asynchronous with the Sectors, i.e., there is no prescribed relation between the number of the F1-Frame, in which the first byte of a Sector is placed, and the number of the F3-Frame, in which the first Control byte of the Section is placed. Another section of the same standard reads as follows: The address of a Sector is recorded in the Sector Header, also as an absolute time. It has no prescribed relation to the addresses of the Sections, because the mapping of a Sector on the Sections during recording is implementation-dependent due to the freedom left in clause 16. Therefore, the address of a Sector is filled in just before the Sector enters the CIRC encoder.
The control bytes of the first two F3-frames of each section (i.e., the first two control bytes of the section) are processed in a special manner. In contrast to the other 3,232 bytes of the section, which are converted into 14-bit EFM sequences and written directly to the disc, these two bytes are replaced by fixed sync groups, named SYNC0 (00100000000001) and SYNC1 (00000000010010), respectively (Fig. 1.23).