Many programs spend a considerable amount of time copying and moving large amounts of data. For this purpose, Intel processors provide a group of instructions, called string commands, designed to speed up operations on multibyte arrays and strings.
The group of string commands includes commands such as movs, lods, cmps, and scas. When these commands are used without the repetition prefix (rep), they can hinder application performance: they are complex instructions that take a fairly large number of processor cycles, and the processor cannot optimize them well. In that case, equivalent code built from ordinary commands is a good alternative.
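As a rough illustration of this substitution, a single lodsd or stosd could be written with ordinary commands as sketched below. The sketch assumes a flat memory model and a cleared direction flag (DF = 0), so that the string command would have moved the pointer forward. Note one difference: add changes the flags, whereas lodsd and stosd leave them untouched.

```asm
; lodsd replaced by ordinary commands (assumes DF = 0)
    mov  EAX, [ESI]     ; load the double word addressed by ESI
    add  ESI, 4         ; advance the source pointer by 4 bytes

; stosd replaced by ordinary commands (assumes DF = 0)
    mov  [EDI], EAX     ; store EAX at the address held in EDI
    add  EDI, 4         ; advance the destination pointer by 4 bytes
```

If the surrounding code depends on the flags being preserved, lea ESI, [ESI+4] can advance the pointer without modifying them.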
But if you use the string commands with the repetition prefix (rep), you can achieve high performance in copying and moving data. In this case, the most efficient commands are those whose mnemonics indicate the size of the operands (such as lodsb, lodsw, lodsd, movsb, movsw, etc.). For instance, if you need to copy 100 double words from the src memory buffer to the dst buffer, you can use the following sequence of commands:
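. . .
    mov  ECX, 100       ; number of double words to copy
    lea  ESI, src       ; ESI -> source buffer
    lea  EDI, dst       ; EDI -> destination buffer
    cld                 ; clear the direction flag: copy toward higher addresses
    rep  movsd          ; copy ECX double words from [ESI] to [EDI]
. . .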
The following sequence of commands may be a good alternative to the code fragment considered above:
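. . .
    mov  ECX, 100       ; number of double words to copy
    lea  ESI, src       ; ESI -> source buffer
    lea  EDI, dst       ; EDI -> destination buffer
next:
    mov  EAX, [ESI]     ; load one double word from the source
    mov  [EDI], EAX     ; store it in the destination
    add  ESI, 4         ; advance both pointers by 4 bytes
    add  EDI, 4
    dec  ECX            ; one double word fewer to copy
    jnz  next           ; repeat until the counter reaches zero
. . .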
If you go further and unroll the loop so that it copies two double words per iteration, you can obtain code that is considerably more efficient than the variant using the rep movsd command:
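. . .
    mov  ECX, 100       ; number of double words to copy (must be even here)
    lea  ESI, src       ; ESI -> source buffer
    lea  EDI, dst       ; EDI -> destination buffer
next:
    mov  EAX, [ESI]     ; copy the first double word of the pair
    mov  [EDI], EAX
    mov  EAX, [ESI+4]   ; copy the second double word of the pair
    mov  [EDI+4], EAX
    add  ESI, 8         ; advance both pointers by 8 bytes
    add  EDI, 8
    sub  ECX, 2         ; two double words fewer to copy
    jnz  next           ; repeat until the counter reaches zero
. . .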
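The unrolled loop above terminates correctly only when the count in ECX is even, because the counter is decreased by 2 and then tested for zero. One possible way to handle an arbitrary count is sketched below: copy a single double word first if the count is odd, then copy the rest in pairs. The variable count and the labels pairs and done are hypothetical names introduced for this illustration; src and dst are the buffers used in the fragments above.

```asm
    mov  ECX, count     ; number of double words to copy (hypothetical variable)
    lea  ESI, src       ; ESI -> source buffer
    lea  EDI, dst       ; EDI -> destination buffer
    shr  ECX, 1         ; ECX = number of pairs, CF = 1 if the count was odd
    jnc  pairs          ; even count: no leftover double word
    mov  EAX, [ESI]     ; odd count: copy one double word up front
    mov  [EDI], EAX
    add  ESI, 4
    add  EDI, 4
pairs:
    test ECX, ECX       ; any pairs to copy?
    jz   done
next:
    mov  EAX, [ESI]     ; copy the first double word of the pair
    mov  [EDI], EAX
    mov  EAX, [ESI+4]   ; copy the second double word of the pair
    mov  [EDI+4], EAX
    add  ESI, 8         ; advance both pointers by 8 bytes
    add  EDI, 8
    dec  ECX            ; one pair fewer to copy
    jnz  next
done:
```

The explicit test at pairs is needed because the add commands on the odd-count path overwrite the zero flag produced by shr.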