In the previous section of this chapter, we mentioned that error-protected Reed-Solomon codes are based on the following mathematical components: polynomial arithmetic and the arithmetic of Galois fields. We can't proceed any further until we have covered both of these topics in detail, so let's be patient before assaulting the peaks of higher math. After that, we'll be engaged in pure coding without any sophisticated math.
Section 4.6 of The Art of Computer Programming by Donald Knuth is dedicated to polynomial arithmetic. There, the following definition is given to this area of math: formally speaking, a polynomial over S is an expression of the form u(x) = u_n*x^n + ... + u_1*x + u_0, where the coefficients u_n, ..., u_1, u_0 are elements of some algebraic system S, and the variable x can be considered a formal symbol without a determining value. Let us assume that the algebraic system S represents a commutative ring with 1. This means that S allows addition, subtraction, and multiplication operations satisfying the normal requirements: addition and subtraction are associative and commutative binary operations defined over S, and multiplication is distributive in relation to addition. There are also an identity element for addition, 0, and an identity element for multiplication, 1, such that a + 0 == a and a * 1 == a for every a belonging to S. Subtraction is the inverse operation in relation to addition; however, no assumptions have been made about division as the operation inverse to multiplication. The polynomial 0*x^(n+m) + ... + 0*x^(n+1) + u_n*x^n + ... + u_1*x + u_0 is considered identical to u_n*x^n + ... + u_1*x + u_0, although formally these polynomials are different.
Thus, instead of presenting the data word D, the codeword C, and the remainder from the division R as integer numbers (as we did earlier), we can relate them to the appropriate coefficients of a binary polynomial and carry out all further mathematical operations according to the rules of polynomial arithmetic. At first glance, the gain from this approach seems doubtful. However, let's not be hasty. Instead, let's convert some number (the first one that comes to mind; for instance, let it be 69h) into a binary polynomial. Start the built-in Windows application named Calculator, or any other suitable application of your choice, and convert this number into binary form (with a bit of skill, it is possible to carry out this operation mentally): 69h -> 1101001.
Well, the rightmost coefficient is equal to 1, then there are two zeroes, followed by 1... briefly speaking, we get the following: 1*x^6 + 1*x^5 + 0*x^4 + 1*x^3 + 0*x^2 + 0*x + 1. In fact, the bit string 1101001 is simply one of the forms of writing the above-mentioned polynomial. Certainly, this form of data presentation might lack clarity from the beginner's point of view, but it is very convenient for machine processing. Stop. If 69h already represents a polynomial, then what is the difference between the addition of the polynomials 69h and 27h and the addition of the integer numbers 69h and 27h?! Undoubtedly, there is a difference. As Nietzsche put it, there are no facts; all that exists are interpretations. The interpretations of numbers and of polynomials are different, and mathematical operations over them are carried out according to different rules.
Coefficients in polynomial arithmetic are strictly typified, and the coefficient of x^k is of a different type than the coefficient of x^m (provided, of course, that k != m). Operations over numbers of different types are not allowed! All coefficients are processed independently; carries to the more significant bit and borrows from the more significant bit are not taken into account. Let us demonstrate this with the example of adding 69h and 27h:
  1101001 (69h)             1101001 (69h)
+ 0100111 (27h)           + 0100111 (27h)
---------------           ---------------
  1001110 (4Eh)            10010000 (90h)

 polynomial addition       ordinary addition
 (carries discarded)       (carries propagate)
Simple calculations show that modulo-2 addition of polynomials gives the same result as their subtraction and magically coincides with the XOR operation. The match with XOR is purely accidental, though. However, the equivalence of addition and subtraction forces us to review the nature of things to which most of us are accustomed. For instance, let us recall arithmetic problems such as: "Mary has one apple. Pete takes it away from her. We give another apple to Mary. How many apples does Mary have? How many apples would she have had if Pete hadn't robbed her?" In modulo-2 arithmetic, the answers would be one and zero, respectively. Yes! If Pete hadn't taken the apple from Mary, then 1 + 1 == 0, and the poor girl would not have any apples at all.
However, we are digressing. Let's forget the quarrels between the kids and return to the fictitious variable of our polynomial and its coefficients. Thanks to their typification and lack of mutual relations, we can process numbers of practically unlimited length by simply XORing their bits. This is one of the main advantages of polynomial arithmetic: unnoticeable at first glance, but the reason polynomial arithmetic has become so widespread.
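This carry-free processing of long operands can be sketched in a few lines of C. The helper below is not from the original text (the name and byte-string layout are my own assumptions), but it shows why the length of the operands is irrelevant: every byte is processed completely independently.

```c
#include <stddef.h>

/* Hypothetical helper: modulo-2 addition (which is also subtraction)
   of two polynomials stored as byte strings of equal length.
   Each byte is XORed independently; no carry ever propagates,
   so the operands may be arbitrarily long. */
void poly_add(unsigned char *dst, const unsigned char *a,
              const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = a[i] ^ b[i];
}
```

Adding a value to itself yields zero, which is exactly the addition-equals-subtraction property discussed above.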
In our case, however, polynomial arithmetic alone is not sufficient. To implement a Reed-Solomon coder/decoder, we'll also need Galois fields. What are these, you ask?
As long ago as the 1960s, when computers were cast-iron and 20-MB hard disks resembled washing machines, a beautiful legend was born about a green extraterrestrial creature that came from the stars and recorded the entire Encyclopaedia Britannica onto a thin, light-silvery rod, which the creature took with it as it left. These days, when 100-GB hard disks are no bigger than a pack of cigarettes, this density of information recording is no longer a wonder. However, the point of this legend is that the extraterrestrial creature had mastered the technology of writing an infinite volume of information on an infinitely small medium; the Encyclopaedia Britannica was chosen just as an example. The alien could have copied the contents of all the Internet servers by placing only a thin scratch on its metal rod. You don't believe it? You have objections? Let us convert the Encyclopaedia Britannica into digital form and obtain a very large number. Then let us place a decimal point before it, thus converting this number into a very long decimal fraction. Now it only remains to find two numbers, A and B, such that the result of dividing A by B is exactly equal to this decimal fraction, with precision to the last digit. Writing these numbers to the rod can be carried out by placing a mark on it, dividing the rod into two sections whose lengths are multiples of the values A and B, respectively. To read the information back, it is enough to measure the lengths of the A and B sections and then divide A by B. The first ten digits following the decimal point will be more or less precise, after which practice will outdo the perfect theory, burying the latter under a thick cover of information garbage that results from the impossibility of measuring the geometric size of real-world objects exactly.
In the digital world, the situation is even worse. Every programmer knows only too well that the division of integer and real numbers is subject to rather stringent limitations. First and foremost, division is a very resource-consuming operation with regard to processor time. As if that were not enough, the operation is mathematically inexact! That is, if c = a * b, it doesn't necessarily follow that a == c/b! Thus, normal arithmetic is not suitable for a practical implementation of Reed-Solomon codes. Therefore, one has to resort to a special branch of mathematics: the mathematics of finite Galois fields.
Here, we interpret the group as a set of integer values sequentially numbered from 0 to 2^n - 1, for example: {0, 1, 2, 3} or {00h, 01h, 02h, 03h, 04h, 05h, 06h, 07h, 08h, 09h, 0Ah, 0Bh, 0Ch, 0Dh, 0Eh, 0Fh}. Groups containing 2^n elements are called Galois fields and are denoted as follows: GF(2^n).
Group members are, without fail, subject to the associative, commutative, and distributive laws. However, they are processed in a way that might, at first glance, seem unnatural:
The sum of any two group members is always present in this group.
For any group member, designated as a, there always exists an identity member, usually denoted as e, satisfying the following condition: a + e = e + a = a
For each member a of the group, there is an inverse member -a, such that a + (-a) = 0
Let us start with the first thesis. Does this seem like gibberish to you? Suppose that we have the following group: {0, 1, 2, 3}. Is it possible to obtain a number less than or equal to 3 when calculating the value 2 + 3?! As it turns out, addition in Galois fields is carried out without taking the carry into account, and the sum of two members of a Galois group is equal to c = (a + b) % 2^n, where the % operation stands for calculating the remainder of the division. As applied to our case: (2 + 3) % 4 == 1. In mathematics, this operation is called modulo-4 addition, or congruence addition.
Naturally, the following question will likely interest you: does modulo addition find any practical application, or is it used only in abstract theoretical constructions? Good question. You mechanically carry out this operation many times each day without even noticing that it is exactly this operation: addition without taking the carry into account. For instance, suppose that you woke up at 6 p.m. and then worked on your computer for 9 hours without a pause. Then you accidentally look at your watch. Provided that the watch shows the exact time, where will the hour hand be? Obviously, the required value is the modulo-12 sum of 6 and 9, which is equal to (6 + 9) % 12 = 3. Here is an illustrative example of Galois-style arithmetic. And now, as an experiment, let us subtract 6 from 3 (if you find this difficult, just look at your watch).
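The clock example fits into a couple of lines of C. These helper functions are hypothetical (the names are mine), but they show that congruence subtraction is just as well-defined as congruence addition:

```c
/* Congruence (modulo-n) addition: any carry beyond n is discarded */
int mod_add(int a, int b, int n)
{
    return (a + b) % n;
}

/* Congruence subtraction: the extra +n keeps the C remainder
   non-negative when a < b */
int mod_sub(int a, int b, int n)
{
    return ((a - b) % n + n) % n;
}
```

mod_add(6, 9, 12) gives 3, the position of the hour hand, and mod_sub(3, 6, 12) gives 9: subtracting 6 hours from 3 o'clock lands on 9 o'clock.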
Now, let us proceed to the most important fact: since the result of dividing one group member by another (naturally, a nonzero one) must itself be present in the group, this operation is precise despite being an integer division. The result is exact rather than approximate! Consequently, if c = a * b, then a == c/b. In other words, multiplication and division are unambiguously and consistently defined for all group members, except for division by zero. At the same time, multiplication doesn't increase the bit width!
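To make the exact-division claim tangible before any special machinery is introduced, here is a sketch over the small prime field GF(7). This is a prime field, not one of the GF(2^n) fields used in the rest of the chapter, and the function names are mine; but it demonstrates the key property: for every nonzero b, dividing a * b by b returns exactly a.

```c
#define P 7   /* a prime, so every nonzero element has an inverse */

/* Field multiplication: ordinary product reduced modulo P */
int f_mul(int a, int b)
{
    return (a * b) % P;
}

/* Field division (b != 0): brute-force search for the quotient x
   such that b * x == a. For a prime modulus it always exists
   and is unique. */
int f_div(int a, int b)
{
    for (int x = 0; x < P; x++)
        if (f_mul(b, x) == a % P)
            return x;
    return -1;   /* unreachable for b != 0 */
}
```

Note that, e.g., f_div(1, 2) == 4, because (2 * 4) % 7 == 1: the "quotient" has little to do with ordinary integer division, yet multiplication and division invert each other exactly.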
Naturally, this is not standard multiplication (and, of course, not in every Galois field does 2*2 = 4). However, Galois arithmetic doesn't necessarily have to correspond to common sense. The most important thing about it is that it works, and works rather well. The existence of hard disks and CD-ROM/DVD drives is the best confirmation of this fact, since they all use this arithmetic for purposes of one sort or another.
As was already mentioned, modulo-2 Galois fields have become the most widespread in computing. This is because these fields are the most natural from the machine-processing point of view, since computing is binary by nature.
To implement a Reed-Solomon coder/decoder, we will need four basic arithmetic operations: addition, subtraction, multiplication, and division. These operations will be covered in detail in the following few sections.
Modulo-2 addition in Galois fields is identical to subtraction, and is implemented by the bitwise XOR operation. This aspect was already discussed when studying polynomial arithmetic. Therefore, we will simply provide an example of a software implementation of the addition/subtraction function:
// This function returns the result of modulo-2 addition
// (subtraction) of two polynomials a and b
int gf_sum(int a, int b)
{
    return a ^ b;
}
Having opened an elementary school math textbook, we will read there that multiplication is the operation of addition repeated a number of times. Since we have already learned how to carry out addition in Galois fields, should we feel confident that the multiplication function won't cause us any serious difficulties? Unfortunately, no. I always knew that two multiplied by two equals four. However, I never believed this to be an absolute truth, and after encountering Galois fields for the first time, I understood how right I was [i]. As it turns out, there are other kinds of mathematics in which two multiplied by two is not equal to four, and in which the multiplication operation is not defined through addition but, rather, is based on quite a different concept.
In fact, if we try to wrap the gf_sum function into a loop, we will get just the same addition, defined in a different way: a * b will be equal to a if b is odd, and to zero if b is even (XORing a value with itself an even number of times always yields zero). Who needs this type of multiplication? Actually, the function implementing real Galois multiplication is so complicated and resource-consuming that, to simplify its implementation, it is necessary to transform the polynomials into index form, add the indexes modulo the order of the field, and then carry out an inverse transformation of the sum of the indexes into polynomial form.
What is the index? The index is the exponent of the power of two that produces the required polynomial. For instance, the index of the polynomial 8 is equal to 3 (2^3 = 8), while the index of the polynomial 2 is equal to 1 (2^1 = 2). It is quite easy to show that a * b = 2^i * 2^j = 2^(i+j). In particular, 2 * 8 = 2^1 * 2^3 = 2^(1+3) = 2^4 = 16. Now let us compose the following table and carry out some experiments with it:
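For ordinary powers of two, the index is just the binary logarithm, and multiplication collapses into an addition of indexes. A hypothetical sketch (the names are mine, and it is valid only for exact powers of two, still in customary arithmetic):

```c
/* Index of a power of two: 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3, ... */
int log2i(unsigned v)
{
    int i = 0;
    while (v > 1) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Multiply two powers of two by adding their indexes:
   a * b = 2^(log2(a) + log2(b)) */
unsigned mul_by_index(unsigned a, unsigned b)
{
    return 1u << (log2i(a) + log2i(b));
}
```

mul_by_index(2, 8) returns 16, reproducing the 2 * 8 = 2^(1+3) example; Galois arithmetic will extend this trick to all of the field's members, not just powers of two.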
i | index_of[i]
---|---
001 | 0
002 | 1
004 | 2
008 | 3
016 | 4
Up to this moment, we have operated with the concepts of customary arithmetic, which is why about two thirds of the table fields remain blank: equations such as 2^x = 3 have no integer solutions, so some polynomials do not seem to correspond to any index at all. However, since the number of polynomials of any Galois field is equal to the number of possible indexes, we can map them to one another in a specifically predefined way, paying no attention to the fact that this mapping makes no sense from the customary mathematical point of view. The specific mapping scheme can be chosen almost arbitrarily; the only thing that matters is that the mapping must be consistent, i.e., it must satisfy all the above-listed requirements of groups (see "Galois Fields").
Naturally, since the final result depends directly on the chosen mapping scheme, both parties (the Reed-Solomon coder and decoder) must observe specific agreements. Different Reed-Solomon coders/decoders can use different mapping schemes that are incompatible with one another.
In particular, the Reed-Solomon decoder built into a CD-ROM drive carries out multiplication according to the table provided in Listing 2.13. Having encountered such a table in the disassembled listing of a program being investigated, you'll be able to quickly and reliably identify all of the functions that use it.
i | alpha | index | i | alpha | index | i | alpha | index |
---|---|---|---|---|---|---|---|---|
000 | 001 | -1 | 047 | 035 | 69 | 094 | 113 | 70 |
001 | 002 | 0 | 048 | 070 | 29 | 095 | 226 | 64 |
002 | 004 | 1 | 049 | 140 | 181 | 096 | 217 | 30 |
003 | 008 | 25 | 050 | 005 | 194 | 097 | 175 | 66 |
004 | 016 | 2 | 051 | 010 | 125 | 098 | 067 | 182 |
005 | 032 | 50 | 052 | 020 | 106 | 099 | 134 | 163 |
006 | 064 | 26 | 053 | 040 | 39 | 100 | 017 | 195 |
007 | 128 | 198 | 054 | 080 | 249 | 101 | 034 | 72 |
008 | 029 | 3 | 055 | 160 | 185 | 102 | 068 | 126 |
009 | 058 | 223 | 056 | 093 | 201 | 103 | 136 | 110 |
010 | 116 | 51 | 057 | 186 | 154 | 104 | 013 | 107 |
011 | 232 | 238 | 058 | 105 | 9 | 105 | 026 | 58 |
012 | 205 | 27 | 059 | 210 | 120 | 106 | 052 | 40 |
013 | 135 | 104 | 060 | 185 | 77 | 107 | 104 | 84 |
014 | 019 | 199 | 061 | 111 | 228 | 108 | 208 | 250 |
015 | 038 | 75 | 062 | 222 | 114 | 109 | 189 | 133 |
016 | 076 | 4 | 063 | 161 | 166 | 110 | 103 | 186 |
017 | 152 | 100 | 064 | 095 | 6 | 111 | 206 | 61 |
018 | 045 | 224 | 065 | 190 | 191 | 112 | 129 | 202 |
019 | 090 | 14 | 066 | 097 | 139 | 113 | 031 | 94 |
020 | 180 | 52 | 067 | 194 | 98 | 114 | 062 | 155 |
021 | 117 | 141 | 068 | 153 | 102 | 115 | 124 | 159 |
022 | 234 | 239 | 069 | 047 | 221 | 116 | 248 | 10 |
023 | 201 | 129 | 070 | 094 | 48 | 117 | 237 | 21 |
024 | 143 | 28 | 071 | 188 | 253 | 118 | 199 | 121 |
025 | 003 | 193 | 072 | 101 | 226 | 119 | 147 | 43 |
026 | 006 | 105 | 073 | 202 | 152 | 120 | 059 | 78 |
027 | 012 | 248 | 074 | 137 | 37 | 121 | 118 | 212 |
028 | 024 | 200 | 075 | 015 | 179 | 122 | 236 | 229 |
029 | 048 | 8 | 076 | 030 | 16 | 123 | 197 | 172 |
030 | 096 | 76 | 077 | 060 | 145 | 124 | 151 | 115 |
031 | 192 | 113 | 078 | 120 | 34 | 125 | 051 | 243 |
032 | 157 | 5 | 079 | 240 | 136 | 126 | 102 | 167 |
033 | 039 | 138 | 080 | 253 | 54 | 127 | 204 | 87 |
034 | 078 | 101 | 081 | 231 | 208 | 128 | 133 | 7 |
035 | 156 | 47 | 082 | 211 | 148 | 129 | 023 | 112 |
036 | 037 | 225 | 083 | 187 | 206 | 130 | 046 | 192 |
037 | 074 | 36 | 084 | 107 | 143 | 131 | 092 | 247 |
038 | 148 | 15 | 085 | 214 | 150 | 132 | 184 | 140 |
039 | 053 | 33 | 086 | 177 | 219 | 133 | 109 | 128 |
040 | 106 | 53 | 087 | 127 | 189 | 134 | 218 | 99 |
041 | 212 | 147 | 088 | 254 | 241 | 135 | 169 | 13 |
042 | 181 | 142 | 089 | 225 | 210 | 136 | 079 | 103 |
043 | 119 | 218 | 090 | 223 | 19 | 137 | 158 | 74 |
044 | 238 | 240 | 091 | 163 | 92 | 138 | 033 | 222 |
045 | 193 | 18 | 092 | 091 | 131 | 139 | 066 | 237 |
046 | 159 | 130 | 093 | 182 | 56 | 140 | 132 | 49 |
141 | 021 | 197 | 180 | 150 | 20 | 219 | 086 | 177 |
142 | 042 | 254 | 181 | 049 | 42 | 220 | 172 | 187 |
143 | 084 | 24 | 182 | 098 | 93 | 221 | 069 | 204 |
144 | 168 | 227 | 183 | 196 | 158 | 222 | 138 | 62 |
145 | 077 | 165 | 184 | 149 | 132 | 223 | 009 | 90 |
146 | 154 | 153 | 185 | 055 | 60 | 224 | 018 | 203 |
147 | 041 | 119 | 186 | 110 | 57 | 225 | 036 | 89 |
148 | 082 | 38 | 187 | 220 | 83 | 226 | 072 | 95 |
149 | 164 | 184 | 188 | 165 | 71 | 227 | 144 | 176 |
150 | 085 | 180 | 189 | 087 | 109 | 228 | 061 | 156 |
151 | 170 | 124 | 190 | 174 | 65 | 229 | 122 | 169 |
152 | 073 | 17 | 191 | 065 | 162 | 230 | 244 | 160 |
153 | 146 | 68 | 192 | 130 | 31 | 231 | 245 | 81 |
154 | 057 | 146 | 193 | 025 | 45 | 232 | 247 | 11 |
155 | 114 | 217 | 194 | 050 | 67 | 233 | 243 | 245 |
156 | 228 | 35 | 195 | 100 | 216 | 234 | 251 | 22 |
157 | 213 | 32 | 196 | 200 | 183 | 235 | 235 | 235 |
158 | 183 | 137 | 197 | 141 | 123 | 236 | 203 | 122 |
159 | 115 | 46 | 198 | 007 | 164 | 237 | 139 | 117 |
160 | 230 | 55 | 199 | 014 | 118 | 238 | 011 | 44 |
161 | 209 | 63 | 200 | 028 | 196 | 239 | 022 | 215 |
162 | 191 | 209 | 201 | 056 | 23 | 240 | 044 | 79 |
163 | 099 | 91 | 202 | 112 | 73 | 241 | 088 | 174 |
164 | 198 | 149 | 203 | 224 | 236 | 242 | 176 | 213 |
165 | 145 | 188 | 204 | 221 | 127 | 243 | 125 | 233 |
166 | 063 | 207 | 205 | 167 | 12 | 244 | 250 | 230 |
167 | 126 | 205 | 206 | 083 | 111 | 245 | 233 | 231 |
168 | 252 | 144 | 207 | 166 | 246 | 246 | 207 | 173 |
169 | 229 | 135 | 208 | 081 | 108 | 247 | 131 | 232 |
170 | 215 | 151 | 209 | 162 | 161 | 248 | 027 | 116 |
171 | 179 | 178 | 210 | 089 | 59 | 249 | 054 | 214 |
172 | 123 | 220 | 211 | 178 | 82 | 250 | 108 | 244 |
173 | 246 | 252 | 212 | 121 | 41 | 251 | 216 | 234 |
174 | 241 | 190 | 213 | 242 | 157 | 252 | 173 | 168 |
175 | 255 | 97 | 214 | 249 | 85 | 253 | 071 | 80 |
176 | 227 | 242 | 215 | 239 | 170 | 254 | 142 | 88 |
177 | 219 | 86 | 216 | 195 | 251 | 255 | 000 | 175 |
178 | 171 | 211 | 217 | 155 | 96 | |||
179 | 075 | 171 | 218 | 043 | 134 |
Using this table, you can easily carry out the transformation from polynomial form into index form, and vice versa. How should you use it? Let us assume that we need to multiply the polynomials 69 and 96. Find the number 69 in column i; the corresponding value in the index column is 221. Memorize it (or write it down) and proceed with the number 96, whose index is equal to 30. Carry out a modulo-255 addition of the indexes (the indexes cycle with period 2^8 - 1 = 255, not 256): (221 + 30) % 255 = 251. Now transform the result from index form back into polynomial form: find the number 251 in column i; the corresponding polynomial, in the alpha column, is 216. Thus, 69 * 96 = 216 in this field. (If we carry out the inverse operation, dividing 216 by 96, we will get 69 again, which demonstrates the consistency of the multiplication and division operations, and of Galois arithmetic as a whole.) This is rather fast, although it is sometimes unclear why the table is composed in this particular way rather than some other. The worst thing is that the trustworthiness of the result cannot be felt intuitively, because all of these concepts are pure abstractions. This significantly complicates program debugging (it is rather difficult to debug anything whose operating principle you don't fully understand).
Still, it is not necessary to enter the multiplication table manually from the keyboard. It can be generated automatically, on the fly, at program execution time. A possible implementation of such a generator looks as follows:
#define m 8     // The exponent of the RS polynomial
                // (equal to 8, according to the ECMA-130 standard)
#define n 255   // n = 2**m - 1 (length of the codeword)
#define t 1     // number of errors to be corrected
#define k 253   // k = n - 2*t (length of the data word)

// prime generator polynomial
// according to ECMA-130: P(x) = x^8 + x^4 + x^3 + x^2 + 1
int p[m+1] = {1, 0, 1, 1, 1, 0, 0, 0, 1};

int alpha_to[n+1];   // table of exponents of the primitive member
int index_of[n+1];   // index table for fast multiplication

//--------------------------------------------------------------------
// Generating the look-up tables for fast multiplication in GF(2^m)
// on the basis of the prime generator polynomial from p[0] to p[m].
//
// Look-up tables:
//   index -> polynomial form: alpha_to[] contains j = alpha^i,
//   where alpha is the primitive member, usually equal to 2,
//   and ^ is the raising-to-power operation (not XOR!)
//
//   polynomial form -> index: index_of[j = alpha^i] = i
//
// Simon Rockliff
//--------------------------------------------------------------------
generate_gf()
{
    int i, mask;

    mask = 1;
    alpha_to[m] = 0;

    for (i = 0; i < m; i++)
    {
        alpha_to[i] = mask;
        index_of[alpha_to[i]] = i;
        if (p[i] != 0) alpha_to[m] ^= mask;
        mask <<= 1;
    }

    index_of[alpha_to[m]] = m;
    mask >>= 1;

    for (i = m + 1; i < n; i++)
    {
        if (alpha_to[i-1] >= mask)
            alpha_to[i] = alpha_to[m] ^ ((alpha_to[i-1] ^ mask) << 1);
        else
            alpha_to[i] = alpha_to[i-1] << 1;
        index_of[alpha_to[i]] = i;
    }
    index_of[0] = -1;
}
The multiplication function itself looks rather trivial; in fact, it fits within five lines of code. In most software implementations of the Reed-Solomon coder/decoder that I have seen, the division operation is not even implemented as a separate procedure but, rather, is carried out directly at the point where it is needed.
// The function returns the result of multiplication
// of two polynomials in Galois fields
int gf_mul(int a, int b)
{
    int sum;

    if ((a == 0) || (b == 0)) return 0; // Some optimization
                                        // won't hurt.

    sum = index_of[a] + index_of[b];    // Calculating the sum
                                        // of the polynomial indexes

    if (sum >= n) sum -= n;             // Reducing the sum
                                        // modulo n = 2**m - 1

    return alpha_to[sum];               // Transforming the result
                                        // into polynomial form
                                        // and returning it
}
Division in Galois fields is carried out in practically the same way as multiplication, with the only exception being that the indexes are not added but, rather, subtracted from one another: a/b == 2^i / 2^j == 2^(i-j). To transform between polynomial and index forms, the same look-up table can be used.
Naturally, do not forget that, however perverse Galois fields might seem, even abstract arithmetic doesn't allow division by zero. Therefore, the division function must be supplied with an appropriate check.
// This function returns the result of division of two
// polynomials, a and b, in Galois fields.
// In the case of an attempt to divide by zero, the function
// returns -1.
int gf_div(int a, int b)
{
    int diff;

    if (b == 0) return -1;              // Division by zero
                                        // is not allowed!

    if (a == 0) return 0;               // Some optimization
                                        // won't hurt.

    diff = index_of[a] - index_of[b];   // Calculating the
                                        // difference of the indexes

    if (diff < 0) diff += n;            // Reducing the difference
                                        // modulo n = 2**m - 1

    return alpha_to[diff];              // Transforming the result
                                        // into polynomial form
                                        // and returning it
}
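Putting the pieces together, here is a compact, self-contained sketch of the table generator plus the multiplication and division functions. It is not the book's exact code: the ECMA-130 field polynomial is written as the bit mask 0x11D instead of the p[] array, and the generator uses an equivalent, slightly shorter shift-and-reduce formulation.

```c
#define GF_N  255    /* 2^8 - 1: order of the multiplicative group */
#define PRIM  0x11D  /* x^8 + x^4 + x^3 + x^2 + 1, per ECMA-130    */

static int alpha_to[GF_N + 1];   /* index -> polynomial            */
static int index_of[GF_N + 1];   /* polynomial -> index            */

/* Build both look-up tables by repeatedly multiplying by alpha = 2
   and reducing by the field polynomial whenever bit 8 pops out. */
static void build_gf(void)
{
    int i, x = 1;
    for (i = 0; i < GF_N; i++) {
        alpha_to[i] = x;
        index_of[x] = i;
        x <<= 1;
        if (x & 0x100) x ^= PRIM;
    }
    index_of[0] = -1;            /* log of zero is undefined       */
}

/* Multiplication: add indexes modulo GF_N */
static int gf_mul(int a, int b)
{
    int sum;
    if (a == 0 || b == 0) return 0;
    sum = index_of[a] + index_of[b];
    if (sum >= GF_N) sum -= GF_N;
    return alpha_to[sum];
}

/* Division: subtract indexes modulo GF_N; -1 on division by zero */
static int gf_div(int a, int b)
{
    int diff;
    if (b == 0) return -1;
    if (a == 0) return 0;
    diff = index_of[a] - index_of[b];
    if (diff < 0) diff += GF_N;
    return alpha_to[diff];
}
```

With these tables, gf_mul(69, 96) yields 216 and gf_div(216, 96) returns 69 again, so multiplication and division invert each other exactly for every member of the field.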
Ancient hard disk models developed by IBM provide a good example of the implementation of a Reed-Solomon coder/decoder. The IBM 3370 model had a very simple and illustrative Reed-Solomon coder/decoder of the (174, 171) type in the Galois field GF(256). In other words, this coder/decoder operated with 8-bit cells (2^8 = 256), and there were 3 bytes of checksum field per 171 data bytes, producing a codeword 174 bytes long. As we will see later, all three checksum bytes were calculated absolutely independently of one another; consequently, the Reed-Solomon coder/decoder operated with only one byte at a time, which significantly simplified its architecture.
In contemporary hard disks, the Reed-Solomon coder/decoder has become significantly more complicated, and the number of check bytes has grown immensely. As a result, implementations have to operate with numbers of unnatural length (about 1,408 bits or more), and the source code bristles with a thick layer of additional checks, loops, and functions that significantly complicate its understanding. Furthermore, most manufacturers of computer hardware have recently migrated to hardware implementations of Reed-Solomon coders/decoders, realized entirely within a single chip. In other words, with all due respect for technological progress, it is much better to study the basic operating principles of Reed-Solomon coders/decoders using the examples of the ancient models.
The listing provided below shows a fragment of the original code of the firmware for the IBM 3370 hard disk:
for (s0 = s1 = sm1 = i = 0; i < BLOCK_SIZE; ++i)
{
    s0  = s0 ^ input[i];
    s1  = GF_mult_by_alpha[s1 ^ input[i]];
    sm1 = GF_mult_by_alpha_inverse[sm1 ^ input[i]];
}
err_i = GF_log_base_alpha[GF_divide[s1][s0]];  // Calculating the
                                               // error syndrome
input[err_i] ^= s0;                            // Correcting the
                                               // erroneous byte
Can we understand how it works? With regard to the s0 variable, everything is clear: this variable stores the checksum calculated according to the trivial algorithm. As you probably remember, addition in Galois fields is carried out by the bitwise XOR operation; therefore, s0 += input[i].
The purpose of the s1 variable is more difficult to clarify. To understand the idea behind its transformations, we must know the contents of the GF_mult_by_alpha table. Although, for the sake of brevity, this table is not provided here, its purpose is easy to guess from its self-explanatory name: the contents of s1 are added to the next byte of the data flow being checked, and the result is then multiplied by the so-called primitive member, designated alpha and equal to 2. In other words, s1 = 2 * (s1 + input[i]).
Now, let us assume that one of the bytes of the data flow becomes corrupted (let us designate its position as err_i). Then the index of the corrupted byte can be determined by the trivial division of s1 by s0. Why is this the case? The iterated expression s1 = 2 * (s1 + input[i]) is nothing other than the multiplication of the data word by a polynomial generated dynamically on the basis of the primitive member alpha. The checksum of the data word stored in the s0 variable actually represents the same data word, simply in a more compact form. As already mentioned, if the error took place at position x, then the remainder produced by dividing the codeword by the generator polynomial will be equal to k = 2^x. Now, it only remains to calculate x from the given k, which in our case is carried out by looking up the GF_log_base_alpha table, which stores the mappings between 2^x and x. As soon as the position of the erroneous byte is detected, it can be corrected by XORing it with the calculated checksum s0 (input[err_i] ^= s0). Naturally, this is true only for single errors; the above-described algorithm is incapable of correcting errors comprising two or more corrupted bytes per data block. The third checksum byte, sm1, is present for exactly this purpose: it protects the decoder from incorrect attempts at error correction, i.e., it prevents error correction when the number of errors exceeds one. If the expression s1/s0 == sm1 * s0 turns out to be false, the disk controller can record the fact of multiple errors, but should report that these errors cannot be recovered.
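The whole single-error dance can be reproduced in a self-contained sketch. Everything below is my own reconstruction, not IBM firmware: the tables are rebuilt with the ECMA-130 polynomial 0x11D, the block is a toy 16 bytes instead of 174, and, to avoid implementing the encoder, the syndromes of the clean block are taken as the reference instead of the zero syndromes that a real decoder would obtain from the check bytes. The arithmetic, however, is exactly the s0/s1 scheme described above.

```c
#define GF_N   255
#define PRIM   0x11D   /* ECMA-130 field polynomial               */

static int alpha_to[GF_N + 1], index_of[GF_N + 1];

static void build_gf(void)
{
    int i, x = 1;
    for (i = 0; i < GF_N; i++) {
        alpha_to[i] = x;
        index_of[x] = i;
        x <<= 1;
        if (x & 0x100) x ^= PRIM;
    }
    index_of[0] = -1;
}

/* Multiply by the primitive member alpha = 2 (shift and reduce) */
static int mul_alpha(int x)
{
    x <<= 1;
    return (x & 0x100) ? (x ^ PRIM) & 0xFF : x;
}

/* The s0/s1 syndromes of a block, exactly as in the IBM fragment */
static void syndromes(const unsigned char *in, int blk_n,
                      int *s0, int *s1)
{
    int i, a = 0, b = 0;
    for (i = 0; i < blk_n; i++) {
        a ^= in[i];
        b  = mul_alpha(b ^ in[i]);
    }
    *s0 = a;
    *s1 = b;
}

/* Locate and repair a single corrupted byte, given the syndromes
   of the clean block. The syndrome deltas satisfy
   ds0 = e and ds1 = e * alpha^(blk_n - pos), so their quotient's
   logarithm reveals the error position. Single-error only. */
static int repair(unsigned char *in, int blk_n, int s0_ok, int s1_ok)
{
    int s0, s1, ds0, ds1, ratio, pos;
    syndromes(in, blk_n, &s0, &s1);
    ds0 = s0 ^ s0_ok;                  /* error value e           */
    ds1 = s1 ^ s1_ok;
    if (ds0 == 0) return -1;           /* nothing to repair       */
    ratio = index_of[ds1] - index_of[ds0];  /* log(ds1 / ds0)     */
    if (ratio < 0) ratio += GF_N;
    pos = blk_n - ratio;
    in[pos] ^= ds0;                    /* flip the bad bits back  */
    return pos;
}
```

Injecting an arbitrary single-byte error anywhere in the block and running repair() restores the original contents and reports the error position, which is the essence of the firmware fragment above.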
As is well known, defects of a magnetic surface tend to produce error bursts rather than single errors. To compensate for this weakness of the error-correction algorithm, IBM engineers resorted to byte interleaving. The IBM 3370 hard disk used an interleave factor of 3:1, which means that the first byte of the first block goes first, followed by the first byte of the second block, then the first byte of the third block, and only after that the second byte of the first block. This trick increased the corrective capability of the hard disk from a single error to a sequence of three corrupted bytes. However, if bytes that were not directly adjacent to one another were corrupted, the corrective capability once again dropped to one corrupted byte per block; the probability of such an event, though, was significantly lower.
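A minimal sketch of the 3:1 interleave (the function names and buffer layout are mine; the real drive does this at the level of its sector layout, not with in-memory buffers):

```c
/* blocks holds 3 blocks of block_len bytes back to back.
   Interleaving writes them in round-robin order, so a burst of up
   to three adjacent corrupted bytes lands in three different
   blocks, one error each. */
void interleave3(unsigned char *out, const unsigned char *blocks,
                 int block_len)
{
    for (int i = 0; i < block_len; i++)
        for (int j = 0; j < 3; j++)
            out[i * 3 + j] = blocks[j * block_len + i];
}

/* The inverse operation, applied when reading back */
void deinterleave3(unsigned char *blocks, const unsigned char *in,
                   int block_len)
{
    for (int i = 0; i < block_len; i++)
        for (int j = 0; j < 3; j++)
            blocks[j * block_len + i] = in[i * 3 + j];
}
```

After interleaving, stream positions 0, 1, 2 carry the first bytes of blocks 0, 1, 2 respectively, which is exactly the ordering described in the text.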
Naturally, this kind of algorithm can be implemented not only in hard disks. By varying the block size and the interleave factor, you will be able to ensure stronger or weaker protection with higher or lower information redundancy. Suppose that we have N sectors on a disk. Having divided them into blocks of 174 sectors each, with 3 sectors per block dedicated to storing the checksum, we will be able to restore at least N/174 disk sectors. For an average disk capacity of 100 GB (which corresponds to 209,715,200 sectors), we will be able to restore up to 1,205,259 sectors, even in the case of their total physical destruction, while only about 2 percent of the disk space is reserved for storing checksums. Most people would agree that situations in which a hard disk degrades so fast that the correcting capability of the Reed-Solomon code proves insufficient for restoring information are quite rare (provided, of course, that incipient disk corruption is noticed in due time, and that the interleave factor is chosen correctly, so that sectors belonging to the same platter of the hard disk are served by different correcting blocks; otherwise, corruption of the surface of one platter results in a group error that this algorithm cannot recover).
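The redundancy arithmetic of the paragraph above, as a quick check (pure integer arithmetic; the sector count assumes 512-byte sectors, as in the text):

```c
/* One correctable sector per 174-sector block */
long long recoverable_sectors(long long total_sectors)
{
    return total_sectors / 174;
}

/* Fraction of space spent on checksums: 3 sectors out of every 174 */
double checksum_overhead_percent(void)
{
    return 3.0 * 100.0 / 174.0;   /* roughly 1.7 percent */
}
```

For the 209,715,200 sectors of a 100-GB disk, this gives 1,205,259 recoverable sectors at under 2 percent overhead, matching the figures quoted above.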
What should we do if an entire hard disk is damaged? In this case, the most reasonable approach is to create an array of several disks storing useful information mixed with correcting codes. The main drawback of such an approach is its inefficiency with arrays comprising a small number of hard disks. The reasonable minimum is as follows: four data disks and one disk for check information. In this case, the loss of any one of the data disks is compensated for by the check disk that remains functional; a damaged check disk can easily be replaced by a new one, with all the check codes recalculated. However, if two disks fail simultaneously, the result is a catastrophe. An array containing 15 disks, among which 12 are data disks and 3 are reserved for check codes, provides a significantly higher level of fault tolerance: it tolerates the simultaneous failure of any two disks and, under favorable circumstances, can even survive the simultaneous failure of three disks.
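The four-data-disks-plus-one-check-disk minimum is easy to model. Here is a hedged sketch (names and sizes are mine) using plain XOR parity, which is the degenerate single-check-disk case; arrays with several check disks would use Reed-Solomon codes instead:

```c
#define DISKS 4   /* number of data disks              */
#define SIZE  8   /* toy disk size, in bytes           */

/* The check disk stores the XOR of all data disks */
void make_parity(unsigned char parity[SIZE],
                 unsigned char data[DISKS][SIZE])
{
    for (int i = 0; i < SIZE; i++) {
        parity[i] = 0;
        for (int d = 0; d < DISKS; d++)
            parity[i] ^= data[d][i];
    }
}

/* Rebuild one lost data disk by XORing the parity disk with
   every surviving data disk. A second simultaneous loss cannot
   be recovered by this scheme, exactly as noted in the text. */
void rebuild(unsigned char out[SIZE], unsigned char data[DISKS][SIZE],
             int lost, const unsigned char parity[SIZE])
{
    for (int i = 0; i < SIZE; i++) {
        out[i] = parity[i];
        for (int d = 0; d < DISKS; d++)
            if (d != lost)
                out[i] ^= data[d][i];
    }
}
```

The design choice is the same addition-equals-subtraction trick from the beginning of the chapter: since XOR is its own inverse, "subtracting" the surviving disks from the parity leaves exactly the lost one.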
Actually, there is nothing new in any of this. Appropriate RAID controllers are available in practically any computer store. However, I can hardly imagine the cost of a level-15 RAID controller, and I am not sure that it would even be possible to make it work (in my experience, even basic-level RAID controllers are extremely error-prone and whimsical, and make extremely high demands on both the hardware and the OS environment). Finally, practically all RAID controllers require either absolutely identical disks or, at least, disks that are very similar in their characteristics and/or interfaces. What if such disks are not available?
A software implementation of RAID, actively promoted by the author of this book, is free of all of the drawbacks listed above. You can use disks of different geometries and even of different capacities. Nothing forces you to concentrate them in a single location; on the contrary, the disks can even be accessed over the network. Furthermore, it is not necessary to reserve an entire disk for RAID storage: you can dedicate an arbitrary portion of the disk space to this purpose.
How can we use this in practice? The first idea that comes to mind is to use part of the hard disk space for storing redundant information that will allow the restoration of the disk contents in the event of a failure. If several computers are joined into a network then, at relatively low overhead, we will be able to restore any hard disk on any computer connected to the network, even if its information is totally destroyed, thanks solely to the redundant information distributed among all of the other computers. It would be hard to conceive of a more reliable system for information storage! The author has implemented this scheme in the LANs of several enterprises, where it has proven its high survivability, flexibility, and wide range of functional capabilities. The need to constantly back up the contents of the hard disks was automatically eliminated. This is especially valuable in peer-to-peer networks, where there is no dedicated server. Such networks exist, and they are not rare. (No, I am not promoting the development of this type of network or claiming that it is a good solution; I am simply noting that they exist and are not going to disappear entirely in the near future.)
The only drawback of a software RAID implementation is its low performance. In particular, by installing software RAID on a server that processes thousands of queries per second and intensively modifies a large number of files, you won't gain performance. Still, the concept of performance is rather relative: with a sufficiently fast processor, it is possible to encode and decode the information on the fly without any loss of throughput! On the other hand, if read operations dominate over write operations, installing software RAID is a natural solution, since integrity control of the information being read is carried out at the hardware level by the drive itself. Consequently, when systematic encoding is used (i.e., the data words are stored separately from the parity bytes), the Reed-Solomon decoder doesn't need to interfere with this process; its help is required only when part of the information is badly damaged, which realistically doesn't happen often. Thus, it is not worth paying large sums to firms specializing in hardware RAID implementations, especially since they don't give home users and small businesses the attention that they deserve.
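The point about systematic encoding deserves a small illustration. Because the parity lives apart from the data, an undamaged read returns the data verbatim and the decoder never runs; it is invoked only when the drive reports a bad sector. The sketch below uses trivial XOR parity as a stand-in for Reed-Solomon, and all names in it are my own:

```python
# Sketch of the systematic-encoding read path: the fast path performs no
# decoding at all; the slow path recovers a single erased data word.
# XOR parity stands in here for a real Reed-Solomon code.

def encode(data_words):
    """Systematic codeword: the data words first, one parity word appended."""
    parity = 0
    for w in data_words:
        parity ^= w
    return data_words + [parity]

def read(codeword, bad_index=None):
    """Return the data words; bad_index marks a known-bad data word, if any."""
    data = codeword[:-1]
    if bad_index is None:
        return data                  # fast path: data comes back verbatim
    # Slow path: XOR of all other words (parity included) yields the lost one.
    recovered = 0
    for i, w in enumerate(codeword):
        if i != bad_index:
            recovered ^= w
    data[bad_index] = recovered
    return data

cw = encode([0x69, 0x42, 0x17])
damaged = cw[:]
damaged[1] = 0xFF                    # the drive returns garbage for this word
assert read(cw) == [0x69, 0x42, 0x17]
assert read(damaged, bad_index=1) == [0x69, 0x42, 0x17]
```

The asymmetry is the whole argument: the common case (clean reads) costs nothing, and the expensive recovery machinery is paid for only on the rare damaged read.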
[i] In other words, when I press the light switch, I know that the light will come on. However, I cannot be sure of it (suppose that an electrician has cut the wires, or the bulb has burned out, etc.). The same is true of math. The heap of rubbish that most of us were fed in secondary school, or even in college, is not math. It is a set of voodoo rites that students must repeat to obtain the result, but which do not allow them to grasp the underlying ideas, the zen of math. I don't know whether or not this is for the better, but I feel it my duty to remind you that the math taught in many educational institutions has no more relation to real math than programming has to torturing the mouse while struggling with Microsoft Word or installing Windows.