Vertex shader versions 1.0 and 1.1 can be up to 128 instructions long, while version 2.0 (DirectX 9) shaders can be up to 256 instructions long. These instructions are a mix of general instructions, macroinstructions, and version and constant definition instructions. Each instruction operates on input registers or constants and can select which of a register's four float elements to use; those elements can in turn be swizzled, negated, or masked.
The philosophy of shaders is to keep them simple and straightforward. Thus you'll see a very limited set of instructions, which, with a bit of cleverness, can be put to a much wider variety of uses, as we'll see later. Generally (with a few notable exceptions), each shader instruction takes one clock cycle to execute, so a shader's execution time grows almost linearly with its length. A macroinstruction can expand into a dozen or more general instructions, so count its expanded size when you calculate the shader's length and its number of clock cycles. Register masks and swizzles, on the other hand, add no execution time and can be used freely.
abs(macro) | vs 2.0 |
This macro computes the absolute value of the input register.
One slot
abs Dest0, Source0
This macro is equivalent to
max Dest0, Source0, -Source0
which you can use in shaders written for versions earlier than vs 2.0. Either way, Dest0 ends up holding the absolute value of Source0.
Setup One source register, Source0.
Results Dest0 is filled with the absolute value of Source0.
abs r0, r0
abs r0.z, r0.z
SetSourceRegisters();
// Simulate the abs macro
TempReg.x = fabs( Source0.x );
TempReg.y = fabs( Source0.y );
TempReg.z = fabs( Source0.z );
TempReg.w = fabs( Source0.w );
WriteDestinationRegisters();
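To check that the pre-2.0 replacement really matches abs, the small C++ sketch below emulates both forms on a four-element register. The Vec4 type and the function names are hypothetical helpers for illustration. They are not part of any DirectX API.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// abs Dest0, Source0 (native macro in vs 2.0)
Vec4 shader_abs(const Vec4& s) {
    return { std::fabs(s.x), std::fabs(s.y), std::fabs(s.z), std::fabs(s.w) };
}

// Element-wise negation, as the -Source0 source modifier applies it.
Vec4 negate(const Vec4& s) {
    return { -s.x, -s.y, -s.z, -s.w };
}

// max Dest0, Source0, Source1, element by element.
Vec4 shader_max(const Vec4& a, const Vec4& b) {
    return { std::max(a.x, b.x), std::max(a.y, b.y),
             std::max(a.z, b.z), std::max(a.w, b.w) };
}

// The pre-2.0 equivalent: max Dest0, Source0, -Source0
Vec4 abs_via_max(const Vec4& s) {
    return shader_max(s, negate(s));
}
```

For any element x, the larger of x and -x is |x|. That is why the two-operand max with a negated source is enough.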
add | vs 1.0, 1.1, 2.0 |
Adds two source registers and places the sum in the destination register.
One slot
add Dest0, Source0, Source1
Adds the Source0 and Source1 registers and places the result in the Dest0 register.
Setup Two source registers, Source0 and Source1.
Results Each element of Dest0 is filled with the element-by-element addition of the elements of Source0 and Source1.
add r0, r0, c2
add r0.z, r0.z, -r0.z // r0.z = 0
SetSourceRegisters();
// Simulate the add instruction
TempReg.x = Source0.x + Source1.x;
TempReg.y = Source0.y + Source1.y;
TempReg.z = Source0.z + Source1.z;
TempReg.w = Source0.w + Source1.w;
WriteDestinationRegisters();
call | vs 2.0 |
Makes an unconditional function call to the instruction label.
One slot
call InstructionLabelID
Pushes the address of the following instruction onto the internal shader stack and then sets the current instruction address to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1, 16]. Calls cannot be nested.
Typically, you'd create a shader subroutine that terminates with the ret instruction.
Setup Requires a valid, existing instruction label.
Results The shader execution is transferred to the instruction following the instruction label.
call 1
call 16
call Fred // Error! Invalid label
call 0    // Error! Invalid label (out of range)
// Simulate the call instruction
// a bare function pointer type
typedef void (*fp)(void);
// take the address of the label
fp pFP = (fp)InstructionLabelID;
pFP(); // call the function
// returns here only when ret is executed
callnz | vs 2.0 |
Call if Not Zero. Conditionally makes a function call to the instruction label.
One slot
callnz InstructionLabelID BoolSource0
If the boolean register BoolSource0 is not zero, then the address of the following instruction is pushed onto the internal shader stack, and then the current instruction address is set to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1, 16]. Calls cannot be nested.
Typically, you'd create a shader subroutine that terminates with the ret instruction.
Setup BoolSource0 is a boolean register. Requires a valid, existing instruction label.
Results If the source register is not zero, the shader execution is transferred to the instruction following the instruction label.
callnz 1 b0 // transfer execution to label 1 if b0 != 0
callnz 2 r0 // Error! Not a boolean register
// Simulate the callnz instruction
// a bare function pointer type
typedef void (*fp)(void);
if ( 0 != BoolSource0 )
{
    fp pFP = (fp)InstructionLabelID;
    pFP(); // call the function
}
crs(macro) | vs 2.0 |
Computes the three-component cross product.
Two slots
crs Dest0, Source0, Source1
Computes the three-component cross product using the right-hand rule. There are fairly severe restrictions on the use of swizzles. The w elements of all registers are ignored.
This macro is equivalent to the following sequence of instructions:
mul Dest0.xyz, Source0.yzxw, Source1.zxyw
mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0
Setup Two source registers, Source0 and Source1. These registers must not be the same as the destination register. The source registers must not have any swizzles.
The destination register must have a destination mask, and that mask must not contain a reference to the w element of the destination register.
Results The cross product of the two input registers is stored in the specified elements of the destination register.
crs r0.xyz, r1, r2 // r0.xyz = r1 x r2 (cross product)
SetSourceRegisters();
// Simulate the crs macro
TempReg.x = Source0.y * Source1.z - Source0.z * Source1.y;
TempReg.y = Source0.z * Source1.x - Source0.x * Source1.z;
TempReg.z = Source0.x * Source1.y - Source0.y * Source1.x;
// note w component ignored
WriteDestinationRegisters();
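To check that the mul/mad expansion produces the right-hand-rule cross product, the C++ sketch below applies the two instructions literally, swizzles included. It then compares the result against a direct formula. Vec4, yzxw, zxyw, crs_macro, and cross3 are hypothetical helpers. The destination's w is passed in and left untouched to model the .xyz write mask.

```cpp
// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// The two swizzle patterns used by the crs expansion.
Vec4 yzxw(const Vec4& v) { return { v.y, v.z, v.x, v.w }; }
Vec4 zxyw(const Vec4& v) { return { v.z, v.x, v.y, v.w }; }

// crs Dest0.xyz, Source0, Source1 expanded into mul + mad.
Vec4 crs_macro(const Vec4& s0, const Vec4& s1, Vec4 dest) {
    // mul Dest0.xyz, Source0.yzxw, Source1.zxyw
    Vec4 a = yzxw(s0), b = zxyw(s1);
    dest.x = a.x * b.x;
    dest.y = a.y * b.y;
    dest.z = a.z * b.z;
    // mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0
    Vec4 c = yzxw(s1), d = zxyw(s0);
    dest.x = -c.x * d.x + dest.x;
    dest.y = -c.y * d.y + dest.y;
    dest.z = -c.z * d.z + dest.z;
    return dest;   // dest.w unchanged (.xyz mask)
}

// Reference cross product (right-hand rule), w ignored.
Vec4 cross3(const Vec4& a, const Vec4& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x, 0.0f };
}
```

Note that the mad reads Dest0 after the mul has written it. This is why the sources must not be the destination register: otherwise the second instruction would read an operand the first one had already overwritten.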
dcl | vs 2.0 |
Declare. Map a vertex element to an input register.
Takes no slots
dcl Dest0
In order to make it easier to optimize and verify shaders, VS 2.0 requires a declaration statement for all input registers. Thus all texture or vertex input registers must be declared before use in the shader. Dest0 will be a specific input register. The partial precision modifier (_pp) can be applied to the declaration statement to indicate that lower precision is acceptable when using this register. A component mask on Dest0 indicates which elements are in use and valid; if you omit it, the default mask (all four elements) applies. dcl statements must appear before the first executable instruction.
dcl t1.rg // using a 2D texture
dcl t2    // using a 4D texture (default mask)
dcl_pp t3 // indicate partial precision is OK
def | vs 1.0, 1.1, 2.0 |
Sets the value of vertex shader floating point constants. In DirectX 8 it is up to the programmer to insert these into the shader code.
No slot
def Dest0, value0, value1, value2, value3
Stores four floating point values in the elements of the Dest0 register. If def instructions are used in a shader, they must follow the vs instruction and precede any other instructions.
Setup Four floating point values separated by commas.
Results In DirectX 8 this instruction has no effect on the shader code that follows. If you use def in a shader, then when the shader is assembled you will have to use the fourth parameter of D3DXAssembleShader(), which receives an ID3DXBuffer interface containing a compiled shader code fragment. You must manually insert this fragment into your shader declaration.
In DirectX 9, this instruction causes the register to immediately assume the values specified. Previous values are restored when the shader exits.
def c0, 0.0f, 0.5f, 0.25f, -1.0f
def c1, 1.0f, 2.0f, 5.0f, 10.0f
defi | vs 2.0 |
Sets the value of vertex shader integer constants.
No slot
defi IntDest0, value0, value1, value2, value3
Stores four integer values in the elements of the IntDest0 register for use in this shader.
Setup Four integer values separated by commas.
Results Locally sets these values into the register. A local definition takes precedence over an external SetVertexShaderConstantI() call that sets the same shader constant. The previous values of the register are restored upon exit from the shader.
defi i0, 0, 2, 4, 8
defi i1, -2, -1, 1, 2
defb | vs 2.0 |
Sets the value of vertex shader boolean constants.
No slot
defb BoolDest0, value0, value1, value2, value3
Stores four boolean values in the elements of BoolDest0 register for use in this shader. Zero indicates false. Nonzero indicates true.
Setup Four booleans separated by commas.
Results Locally sets these values into the register. A local definition takes precedence over an external SetVertexShaderConstantB() call that sets the same shader constant. The previous values of the register are restored upon exit from the shader.
defb b0, 0, 1, 0, 2 // false, true, false, true
dp3 | vs 1.0, 1.1, 2.0 |
Three-component dot product (dot product 3) is computed and the result replicated in all specified channels of the destination register.
One slot
dp3 Dest0, Source0, Source1
Computes the dot product of the Source0 and Source1 registers, and places the result in the Dest0 register. Only the x, y, and z values are used to compute the dot product; the w component is ignored.
Setup Two source registers, Source0 and Source1.
Results Unless otherwise masked, each element of Dest0 is filled with the dot product of the first three elements of registers Source0 and Source1.
dp3 r0, v3, c2   // fill r0 with dp3
dp3 r1.x, v3, c2 // just fill r1.x
SetSourceRegisters();
// Simulate the dp3 instruction
TempReg.x = TempReg.y = TempReg.z = TempReg.w =
    Source0.x * Source1.x +
    Source0.y * Source1.y +
    Source0.z * Source1.z; // note w component ignored
WriteDestinationRegisters();
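The C++ sketch below models two details from the dp3 description. First, the w elements are ignored. Second, the scalar result lands only in the elements the write mask enables. The bit-flag mask encoding, Vec4, and dp3 are assumptions made for illustration and do not come from DirectX.

```cpp
// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// Hypothetical write-mask encoding for this sketch.
constexpr int kMaskX = 1, kMaskY = 2, kMaskZ = 4, kMaskW = 8;
constexpr int kMaskXYZW = kMaskX | kMaskY | kMaskZ | kMaskW;

// dp3 Dest0.mask, Source0, Source1
// Only x, y, z take part; masked-off destination elements keep
// their previous values.
void dp3(Vec4& dest, const Vec4& s0, const Vec4& s1, int mask = kMaskXYZW) {
    float d = s0.x * s1.x + s0.y * s1.y + s0.z * s1.z;
    if (mask & kMaskX) dest.x = d;
    if (mask & kMaskY) dest.y = d;
    if (mask & kMaskZ) dest.z = d;
    if (mask & kMaskW) dest.w = d;
}
```

Calling dp3 with kMaskX corresponds to `dp3 r1.x, v3, c2`. Calling it with the default mask corresponds to `dp3 r0, v3, c2`.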
dp4 | vs 1.0, 1.1, 2.0 |
Four-component dot product (dot product 4) is computed and the result stored in all specified channels of the destination register.
One slot
dp4 Dest0, Source0, Source1
Computes the dot product of the Source0 and Source1 registers, and places the result in the Dest0 register. If no mask is specified on the destination, then the entire register is filled with the dot product.
Setup Two source registers, Source0 and Source1.
Results Unless otherwise masked, each element of Dest0 is filled with the dot product of the four elements of registers Source0 and Source1.
dp4 r0, v3, c2
dp4 r1.x, v3, c2 // just fill r1.x
SetSourceRegisters();
// Simulate the dp4 instruction
TempReg.x = TempReg.y = TempReg.z = TempReg.w =
    Source0.x * Source1.x +
    Source0.y * Source1.y +
    Source0.z * Source1.z +
    Source0.w * Source1.w;
WriteDestinationRegisters();
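A common use of dp4 is transforming a position by a 4×4 matrix stored one row per constant register. This is shown here only as an illustration; the text above does not prescribe it. The C++ sketch below assumes the rows sit in c0 through c3 and emulates four masked dp4 instructions. Vec4, dp4, and transform are hypothetical helpers.

```cpp
// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// dp4: all four elements take part.
float dp4(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Emulates:
//   dp4 r0.x, v0, c0
//   dp4 r0.y, v0, c1
//   dp4 r0.z, v0, c2
//   dp4 r0.w, v0, c3
// where c[0..3] hold the matrix rows (assumed layout).
Vec4 transform(const Vec4& v, const Vec4 c[4]) {
    return { dp4(v, c[0]), dp4(v, c[1]), dp4(v, c[2]), dp4(v, c[3]) };
}
```

Each dp4 writes a single element through its mask, so four one-slot instructions produce the full transformed vector.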
dst | vs 1.0, 1.1 |
Computes a distance vector in the format typically used for attenuated lighting calculations.
One slot
dst Dest0, Source0, Source1
Creates a distance vector from a set of distance squared and reciprocal distance values, and puts them in a format that can be used for attenuated lighting calculations.
Setup Two source registers are required to be set up. Source0 should be set up as [n/a, d², d², n/a]. Source1 should be set up as [n/a, 1/d, n/a, 1/d]. Elements noted as n/a are not used, and their values are ignored.
Results Dest0 will be filled with elements that correspond to [1, d, d², 1/d]. Dest0.y is computed from the product of Source0.y and Source1.y.
dst r2, r0, r1
SetSourceRegisters();
// Simulate the dst instruction
TempReg.x = 1;
TempReg.y = Source0.y * Source1.y;
TempReg.z = Source0.z;
TempReg.w = Source1.w;
WriteDestinationRegisters();
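As a worked check, take d = 2. Then Source0 = [n/a, 4, 4, n/a] and Source1 = [n/a, 0.5, n/a, 0.5] give [1, 2, 4, 0.5]. The sketch below also suggests why this layout suits attenuation, assuming the common constant/linear/quadratic model: a single dot product of the result with (a0, a1, a2, 0) yields a0 + a1·d + a2·d². Vec4, dst, and attenuation_denominator are hypothetical names.

```cpp
// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// dst Dest0, Source0, Source1
// Source0 = [n/a, d*d, d*d, n/a], Source1 = [n/a, 1/d, n/a, 1/d]
// Result  = [1, d, d*d, 1/d]
Vec4 dst(const Vec4& s0, const Vec4& s1) {
    return { 1.0f, s0.y * s1.y, s0.z, s1.w };
}

// Assumed use: dot the dst result with (a0, a1, a2, 0)
// to get a0 + a1*d + a2*d*d in one instruction.
float attenuation_denominator(const Vec4& dv, float a0, float a1, float a2) {
    return dv.x * a0 + dv.y * a1 + dv.z * a2;
}
```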
else | vs 2.0 |
Provides an alternate path of execution for an if-else-endif block.
One slot
else
Must be inside of an if-endif block. If the boolean argument of the if statement is false, then the execution will skip to the else instruction and continue to the terminating endif statement. If the boolean was true then execution will skip over the code enclosed by the else-endif block. There can be only one else statement in an if-endif block.
Setup The else statement must be between an if and endif statement.
Results If the argument provided to the if statement was false, then the code inside the else-endif block will be executed.
else
endif | vs 2.0 |
The termination point for an if-endif or ifc-endif block.
Zero slots
endif
When used with the if or ifc instruction, creates a block of instructions that is executed conditionally.
Setup You must have an if or ifc instruction in your shader prior to this instruction.
Results Execution is controlled by the if or ifc instruction that precedes this instruction. When the argument of that statement is false and there is no else, execution jumps to the statement following the endif.
if b1
    // if b1 != 0, this section gets executed
else // optional else statement
    // if b1 = 0, this section gets executed
endif
endloop | vs 2.0 |
The termination point for a loop-endloop block.
One slot
endloop
When used with the loop instruction, creates a block of instructions that executes a variable number of times.
Setup You must have a loop instruction in your shader prior to this instruction.
Results When the loop reaches the endloop instruction, the loop counter (specified in the loop instruction) is incremented by the step value (also specified in the loop instruction). See the loop instruction for pseudocode of a complete loop-endloop block.
endloop
// Simulate the endloop instruction
// Assume that LoopCounter, LoopStep, and LoopIterator
// were defined in the loop instruction, and that
// StartLoopOffset is the instruction following
// the loop instruction
LoopCounter += LoopStep;
--LoopIterator;
if ( LoopIterator > 0 )
    goto StartLoopOffset;
// otherwise fall through
endrep | vs 2.0 |
The termination point for a rep-endrep block.
Zero slots
endrep
When used with the rep instruction, creates a block of instructions that is executed a specified number of times.
Setup You must have a rep instruction in your shader prior to this instruction.
Results Execution is controlled by the rep instruction that precedes this instruction. When the iteration count of that statement is zero then execution will jump to the statement following the endrep. See the rep instruction for simulation code.
defi i0, 20, 0, 0, 0
rep i0 // the repeat count comes from i0.x (20)
    // this section gets executed 20 times
endrep
exp(macro) | vs 1.0, 1.1, 2.0 |
This macro computes two raised to a power, to at least 20 bits of precision. By default, only the source register's w element is used, and the result is replicated into the entire destination register. Note that the expp instruction sets the destination's w element to 1.
At least 12 instruction slots
exp Dest0, Source0
Calculates 2^Source0.w and writes the result in Dest0. Unless otherwise specified, Source0.w is the input value, and all elements of Dest0 receive the result. This is somewhat different from the expp instruction, which always sets Dest0.w to 1. A replicate swizzle is required on the source register.
exp r0, c1.w   // fill all of r0 with exp2(c1.w)
exp r0.x, c1.y // store exp2(c1.y) in r0.x
SetSourceRegisters();
// Simulate the exp macro
TempReg.x = TempReg.y = TempReg.z = TempReg.w = ::pow( 2, Source0.w );
WriteDestinationRegisters();
expp | vs 1.0, 1.1, 2.0 |
Computes two raised to a power at partial precision. For DirectX 8 the result is split into a partial precision part and a higher precision pair of integer and fractional parts. This allows you to use the lower precision single element, or a more complicated integer/fraction calculation when you need higher precision. The destination's w element is set to 1. Only the source register's w element is used. If Source0.w < 0 then the results are undefined.
For DirectX 9, the partial precision result fills the destination register.
Note: Don't confuse this with the exp macro!
One slot
expp Dest0, Source0
The DirectX 8 version computes low- and higher precision values for 2^Source0.w, where Dest0.z contains the low-precision single-element approximation, Dest0.x and Dest0.y contain the integer and fractional parts, and Dest0.w is set to 1.
You have a choice in which part of the results to use. The low-precision part contains the exponential of the input value to 10 bits of precision. The two-part higher precision result contains the exponential of the integer part of the input value, plus the fractional part of the input value. For the fraction, you must provide a function that computes 2^n for 0 <= n < 1 to your desired precision, and then multiply its result by the integer part's exponential.
The DirectX 9 version just computes the low precision part.
Setup Store the value you want the exponential of in Source0.w. The value should be positive. The other register elements are ignored. A replicate swizzle is required on the source register.
Results Dest0.z will contain a low-precision exponential value. Dest0.x will contain the exponential of the integer part of the input. Dest0.y will contain the fractional part of the input, not the exponential of the fractional part. You have to do the conversion yourself. Dest0.w is set to 1.0.
expp r0, r1.w
// DirectX 8 version
SetSourceRegisters();
// Simulate the expp instruction
float wInt = floor( Source0.w ); // integer part of w
// compute the higher precision parts
TempReg.x = pow( 2, wInt );
TempReg.y = Source0.w - wInt;    // fractional part of w
// calculate 2^(Source0.w), then reduce its precision
// by clearing the low-order mantissa bits
float full = pow( 2, Source0.w );
unsigned int bits = *(unsigned int*)&full & 0xffffff00;
TempReg.z = *(float*)&bits;
// set w to 1
TempReg.w = 1;
WriteDestinationRegisters();
// DirectX 9 version
SetSourceRegisters();
// Simulate the expp instruction
float full = pow( 2, Source0.w );
// reduce precision by clearing the low-order mantissa bits
unsigned int bits = *(unsigned int*)&full & 0xffffff00;
TempReg.x = TempReg.y = TempReg.z = TempReg.w = *(float*)&bits;
WriteDestinationRegisters();
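The C++ sketch below emulates the DirectX 8 expp result layout and shows how the two-part result recombines: 2^w equals Dest0.x multiplied by 2 raised to Dest0.y. It makes several assumptions. std::exp2 stands in for the user-supplied fractional approximation the text calls for. memcpy does the bit masking so that the code stays well defined. The integer part is taken with floor, which matters only for negative inputs. Vec4, truncate_low_bits, expp_dx8, and expp_high_precision are hypothetical names.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// Clear the low 8 bits of a float's representation (the 0xffffff00 mask).
float truncate_low_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits &= 0xffffff00u;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// DirectX 8 style expp on the w input.
Vec4 expp_dx8(float w) {
    float wInt = std::floor(w);
    Vec4 r;
    r.x = std::exp2(wInt);                  // 2^(integer part)
    r.y = w - wInt;                         // fractional part, NOT 2^frac
    r.z = truncate_low_bits(std::exp2(w));  // reduced-precision 2^w
    r.w = 1.0f;
    return r;
}

// Recombine the high-precision pair: 2^w = 2^int * 2^frac.
float expp_high_precision(const Vec4& r) {
    return r.x * std::exp2(r.y);
}
```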
frc(macro) | vs 1.0, 1.1, 2.0 |
This macro computes the fractional part of each of the input register's elements, defined as x - floor(x), and places it into the destination register's elements. The results are always in the range [0,1).
Three instruction slots
frc Dest0, Source0
Takes the fractional parts of Source0's elements and places them in Dest0's elements. For negative inputs this is not the fraction of the absolute value: frc of -0.25 gives 0.75. Version 1.x requires a .y or .xy write mask.
frc r0.xy, r1   // use r1.xy and store the fractions in r0.xy
// use r1.x and store its fraction in r0.y;
// r0.x (and z and w) remain unchanged
frc r0.y, r1.x
// negating the source changes the result for
// non-integer values: frc(-0.25) is 0.75
frc r0.y, -r1.x
frc r0, r1      // Error! No write mask (version 1.x)
SetSourceRegisters();
// Simulate the frc macro: x - floor(x), always in [0,1)
TempReg.x = Source0.x - floor( Source0.x );
TempReg.y = Source0.y - floor( Source0.y );
TempReg.z = Source0.z - floor( Source0.z );
TempReg.w = Source0.w - floor( Source0.w );
WriteDestinationRegisters();
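The DirectX SDK reference defines frc per element as Source0 - floor(Source0). Under that definition the result always lies in [0,1). A negative input does not simply yield the fraction of its absolute value: frc(-0.25) is 0.75, not 0.25. Here is a minimal C++ sketch, with Vec4 and frc as hypothetical names.

```cpp
#include <cmath>

// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// frc Dest0, Source0: element-wise x - floor(x), always in [0, 1).
Vec4 frc(const Vec4& s) {
    return { s.x - std::floor(s.x), s.y - std::floor(s.y),
             s.z - std::floor(s.z), s.w - std::floor(s.w) };
}
```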
if | vs 2.0 |
The start of an if-else-endif block. Conditionally execute a block of code.
One slot
if BoolReg0
The argument must be a boolean constant register. There must be a terminating endif that follows the if instruction. The else instruction is optional and must be between the if and endif statements. If the boolean argument is true, execution continues immediately after the if statement until either the else or the endif statement is reached.
if blocks can be nested inside an if-endif or a loop-endloop block, but they must lie entirely inside it.
Setup The argument must be a boolean register. Your shader must have an endif instruction following this instruction.
Results Execution is controlled by the if instruction. When the boolean argument is false, execution jumps to the statement following the matching else, or to the statement following the endif if there is no else.
if b1
    // if b1 != 0, this section gets executed
else // optional else statement
    // if b1 = 0, this section gets executed
endif
label | vs 2.0 |
Defines a label for use with a call or callnz instruction.
Zero slots
label <n>
The label instruction marks the next instruction as having the specified label, making it a target for a subroutine call. The argument <n> must be an integer label in the range [1,16], that is, there can be a total of sixteen labels.
Setup The argument must be an integer.
Results When a call or callnz instruction calls the integer label, execution immediately (and, for the callnz instruction, conditionally) goes to the instruction following the label statement. Execution returns when a ret instruction is encountered.
// VS 2.0
call 12   // somewhere in the shader, call subroutine label 12
label 12  // label 12
// the subroutine instructions go here
ret       // execution returns after the call
lit | vs 1.0, 1.1, 2.0 |
Computes the traditional diffuse and specular lighting coefficients from the dot products n • l and n • h and a specular power coefficient.
One slot
lit Dest0, Source0
You'll need to calculate the normalized n • l and n • h dot products, and specify a specular power value, before using this instruction. The result is the traditional diffuse component in Dest0.y,
    Dest0.y = max(n • l, 0)
and the traditional specular component (i.e., Blinn's equation) in Dest0.z,
    Dest0.z = (n • h)^power (zero unless both n • l and n • h are positive)
Note that there are no k parameters in the equations. If you need them, you'll have to do the multiplication in your shader.
Setup Source0.x should contain the normalized dot product between the normal and the direction from the vertex to the light. Source0.y should contain the normalized dot product between the normal and the half-angle vector. Source0.w should contain the power value in the range −128 to +128. Source0.z is ignored.
Results Dest0.x and Dest0.w are set to 1. If Source0.x (n • l) is positive, it's stored in Dest0.y; else Dest0.y is set to 0. If both Source0.x and Source0.y (n • h) are positive, then Dest0.z is set to Source0.y raised to the Source0.w power; else it is set to zero. The power value (Source0.w) is clamped to the range [−128,128].
Note: Early versions of the SDK documentation incorrectly stated that negative power (exponent) values would cause undefined results.
lit r0, r1
SetSourceRegisters();
// Simulate the lit instruction
// these are constants
TempReg.x = TempReg.w = 1;
// if n dot l is positive...
if ( Source0.x > 0 )
{
    TempReg.y = Source0.x;
    // if n dot h is positive
    if ( Source0.y > 0 )
    {
        // clamp the power value to an 8.8
        // fixed-point representation of the
        // maximum allowable value
        const float kPowerMax = 127.9961f;
        float ClampedPower = Source0.w;
        if ( ClampedPower < -kPowerMax )
            ClampedPower = -kPowerMax;
        else if ( ClampedPower > kPowerMax )
            ClampedPower = kPowerMax;
        // actual value in shader math is only
        // good to seven fractional bits of precision
        TempReg.z = pow( Source0.y, ClampedPower );
    }
    else
    {
        TempReg.z = 0; // if n dot h was negative/zero
    }
}
else
{
    TempReg.y = 0; // if n dot l was negative/zero
    TempReg.z = 0;
}
WriteDestinationRegisters();
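Here is a self-contained C++ version of the lit simulation that runs outside a shader framework. Vec4 and lit are hypothetical names. std::pow stands in for the hardware's reduced-precision power function, so results can differ slightly from real hardware.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// lit Dest0, Source0
// Source0.x = n.l, Source0.y = n.h, Source0.w = specular power.
Vec4 lit(const Vec4& s) {
    const float kPowerMax = 127.9961f;
    Vec4 r{1.0f, 0.0f, 0.0f, 1.0f};       // x and w are always 1
    if (s.x > 0.0f) {
        r.y = s.x;                          // diffuse term
        if (s.y > 0.0f) {
            float p = std::max(-kPowerMax, std::min(s.w, kPowerMax));
            r.z = std::pow(s.y, p);         // specular term
        }
    }
    return r;
}
```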
log(macro) | vs 1.0, 1.1, 2.0 |
This macro computes log2 of the input argument to at least 20 bits of precision, using the absolute value of the source register's w element. Unlike the logp instruction, it does not set the destination's w element to 1.
Note: Don't confuse this with the logp instruction.
At least 12 instruction slots
log Dest0, Source0
Computes log2 of the absolute value of the Source0.w element (unless otherwise specified) and places the result into all elements of Dest0. If the argument is zero, all elements of Dest0 are set to minus infinity. This differs from the logp instruction, which always sets Dest0.w to 1.
log r0, r1
SetSourceRegisters();
// Simulate the log macro
if ( 0 == Source0.w )
{
    TempReg.x = TempReg.y = TempReg.z = TempReg.w = MINUS_INFINITY;
}
else
{
    // note we use the absolute value
    TempReg.x = TempReg.y = TempReg.z = TempReg.w =
        log( fabs( Source0.w ) ) / log( 2 );
}
WriteDestinationRegisters();
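A compact C++ sketch of the log macro follows. It uses std::log2, and IEEE minus infinity for a zero input. Vec4 and log_macro are hypothetical names. The hardware's precision (at least 20 bits) is not modeled.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// log Dest0, Source0: log2(|Source0.w|) replicated into all elements,
// minus infinity when the input is zero.
Vec4 log_macro(const Vec4& s) {
    float v = std::fabs(s.w);
    float r = (v == 0.0f) ? -std::numeric_limits<float>::infinity()
                          : std::log2(v);
    return { r, r, r, r };
}
```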
logp | vs 1.0, 1.1, 2.0 |
Computes log2 at partial precision. In DirectX 8 the result is split into a single 10-bit precision part and a higher precision two-element part. This allows you to use the lower precision single element, or a more complicated exponent/mantissa calculation if you need higher precision.
For DirectX 9 the partial-precision result fills the entire destination register.
One slot
logp Dest0, Source0
The DirectX 8 version computes low- and higher precision values for log2(Source0.w). The destination's w element is set to 1. Only the source register's w element is used. If Source0.w is negative, then its absolute value is used. If Source0.w is zero, then the results are negative infinity in Dest0.x and Dest0.z, and 1.0 in Dest0.y.
You have a choice in which part of the results to use. The low-precision part will contain the log2 of the input value to 10 bits of precision.
The two-part higher precision result holds the exponent and the mantissa of the input value.
To use the high-precision results, you'll need to provide a function that computes log2 over the range [1,2) with your desired precision. Apply it to the mantissa in Dest0.y and add the result to the exponent in Dest0.x to get the log2 of your input value.
The DirectX 9 version just computes the low-precision part.
Note: Don't confuse this with the log macro!
Setup For DirectX 8, store the value you want the log2 of in Source0.w. The value should be positive. The other register elements are ignored. For DirectX 9, indicate the element by using a (required) replicate swizzle.
Results DirectX 8: Dest0.z contains the low-precision (10-bit) single-element approximation. Dest0.x contains the exponent, the most significant part of the two-element result; this value can be negative. Dest0.y contains the mantissa of the two-element result, as a value in the range [1,2). You have to do the conversion yourself.
DirectX 9: All destination elements will have the low-precision result.
logp r0, r1
// DirectX 8 version:
SetSourceRegisters();
// Simulate the logp instruction
float v = fabs( Source0.w ); // only positive values
TempReg.y = TempReg.w = 1.0f;
if ( 0 == v )
{
    TempReg.x = TempReg.z = MINUS_INFINITY;
}
else
{
    float logValue = (float)( log( v ) / log( 2 ) );
    // store exponent
    TempReg.x = (float)floor( logValue );
    // store mantissa: keep the 23 mantissa bits of v
    // and force the exponent to 0, giving [1,2)
    unsigned int p = ( *(unsigned int*)&v & 0x7FFFFF ) | 0x3F800000;
    TempReg.y = *(float*)&p;
    // store the low-precision part by clearing
    // the low 8 bits of the result
    unsigned int temp = *(unsigned int*)&logValue & 0xFFFFFF00;
    TempReg.z = *(float*)&temp;
}
WriteDestinationRegisters();
// DirectX 9 version
SetSourceRegisters();
// Simulate the logp instruction
float v = fabs( Source0.w ); // only positive values
float logValue;
if ( 0 == v )
{
    logValue = MINUS_INFINITY;
}
else
{
    logValue = (float)( log( v ) / log( 2 ) );
    // reduce precision by clearing the low 8 bits
    unsigned int temp = *(unsigned int*)&logValue & 0xFFFFFF00;
    logValue = *(float*)&temp;
}
TempReg.x = TempReg.y = TempReg.z = TempReg.w = logValue;
WriteDestinationRegisters();
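The C++ sketch below emulates the DirectX 8 logp layout and checks that exponent plus log2(mantissa) recovers log2 of the input. It reads the exponent field directly instead of flooring the logarithm, which assumes a normalized float input. memcpy keeps the bit manipulation well defined. Vec4 and logp_dx8 are hypothetical names, and std::log2 stands in for your own [1,2) approximation.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for a four-element float shader register.
struct Vec4 { float x, y, z, w; };

// DirectX 8 style logp: x = exponent, y = mantissa in [1,2),
// z = reduced-precision log2, w = 1.
Vec4 logp_dx8(float w) {
    float v = std::fabs(w);
    Vec4 r{0.0f, 1.0f, 0.0f, 1.0f};
    if (v == 0.0f) {
        r.x = r.z = -std::numeric_limits<float>::infinity();
        return r;
    }
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    // exponent field minus the bias of 127 (normalized input assumed)
    r.x = static_cast<float>(static_cast<int>((bits >> 23) & 0xFFu) - 127);
    // keep the mantissa bits, force the exponent to 0 -> value in [1,2)
    std::uint32_t m = (bits & 0x7FFFFFu) | 0x3F800000u;
    std::memcpy(&r.y, &m, sizeof m);
    // reduced precision: clear the low 8 bits of log2(v)
    float lg = std::log2(v);
    std::uint32_t lb;
    std::memcpy(&lb, &lg, sizeof lb);
    lb &= 0xFFFFFF00u;
    std::memcpy(&r.z, &lb, sizeof lb);
    return r;
}
```

For an input of 12, which is 1.5 × 2^3, this gives Dest0.x = 3 and Dest0.y = 1.5, and 3 + log2(1.5) is log2(12).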
loop | vs 2.0 |
The starting point for a loop-endloop block. Iterate over a block of code a number of times.
One slot
loop IntSource0
When used with the endloop instruction, creates a block of instructions that executes a variable number of times. Each time through the loop, the loop counter is incremented by the specified step. Compare this to the rep instruction, which does not increment the loop counter independently.
Setup The argument must be an integer register. IntSource0.x holds the number of times the loop is to execute. The loop counter register is incremented at the endloop.
The counter can be used to index into the constant register array. IntSource0.y is the initial value for the loop counter register. IntSource0.z specifies the step for the loop counter register. Execution goes to the statement following the matching endloop instruction when IntSource0.x <= 0. The loop counter register, aL, is available inside the loop.
You must not nest loops, nor jump either out of or into a loop-endloop block. The endloop instruction must follow the loop instruction.
Results The instructions in the loop-endloop block are executed IntSource0.x times with the loop counter getting incremented each time through.
loop i0    // assume i0.x, i0.y, and i0.z are set up
    // aL is available and incremented each time
    // through the loop
endloop
// Simulate the loop instruction
// -- loop statement begin
int LoopCount = IntSource0.x;     // iterations remaining
aL = IntSource0.y;                // loop counter
StartLoopOffset:                  // <- label
if ( LoopCount <= 0 ) goto EndLoopOffset;
// -- loop statement end
// section of code between the loop/endloop
// -- endloop statement begin
aL += IntSource0.z;               // increment loop counter by the step
LoopCount--;                      // one fewer iteration left
goto StartLoopOffset;
EndLoopOffset: ;                  // <- label
// -- endloop statement end
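Here is the same control flow written as a structured C++ loop instead of gotos. IntReg and RunLoop are hypothetical names; the function records the value of aL seen by each pass through the body, which is the value you would use to index the constant registers.

```cpp
#include <vector>

// Hypothetical integer register: x = iteration count,
// y = initial loop counter, z = step.
struct IntReg { int x, y, z, w; };

// Returns the aL value seen by each pass through the loop body.
std::vector<int> RunLoop(const IntReg& i0)
{
    std::vector<int> seen;
    int count = i0.x;          // iterations remaining
    int aL = i0.y;             // loop counter register
    while (count > 0) {        // loop: skip everything when count <= 0
        seen.push_back(aL);    // the body would use aL, e.g. to index c[aL]
        aL += i0.z;            // endloop: step the loop counter
        --count;               // endloop: one fewer iteration left
    }
    return seen;
}
```

With i0 = (3, 2, 1, 0) the body sees aL = 2, 3, 4; with a count of 0 the body never runs.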
lrp(macro) | vs 2.0 |
Linear interpolation between two registers (a.k.a. "lerp") using a fraction specified in a third register. This is done on an element-by-element basis.
Two slots
_______________________________________________________________________________ lrp Dest0, Source0, Source1, Source2 _______________________________________________________________________________
This macro instruction interpolates between two floating-point values, Source1 and Source2, based upon a third, Source0. When Source0 is zero, Source2 is placed in the destination. When Source0 is one, Source1 is placed in the destination. Values in the [0,1] range interpolate between Source2 and Source1. If the value is outside the range [0,1], the result is indeterminate. Dest0 must be a temporary register and cannot be the same as Source0 or Source2.
The macro expands to the following code:
_______________________________________________________________________________
add Dest0, Source1, -Source2
mad Dest0, Dest0, Source0, Source2
_______________________________________________________________________________
Setup Dest0 must be a temporary register, and Source0 should be in the range [0,1].
Results The value between Source2 and Source1 is interpolated using the fraction in Source0. The result is written in Dest0.
lrp r0, c14, r1, r2
lrp r0, r14.x, r1, r2    // lrp using a single fraction for all elements
SetSourceRegisters();
// Simulate the lrp instruction
TempReg.x = Source0.x * (Source1.x - Source2.x) + Source2.x;
TempReg.y = Source0.y * (Source1.y - Source2.y) + Source2.y;
TempReg.z = Source0.z * (Source1.z - Source2.z) + Source2.z;
TempReg.w = Source0.w * (Source1.w - Source2.w) + Source2.w;
WriteDestinationRegisters();
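The add/mad expansion computes Dest0 = Source0 * (Source1 - Source2) + Source2, which is the familiar Source0 * Source1 + (1 - Source0) * Source2 with Source0 acting as the fraction. A scalar C++ sketch comparing the two forms on one element (LrpFormula and LrpExpanded are hypothetical names):

```cpp
// lrp on one element: fraction f (Source0), endpoints a (Source1), b (Source2).
float LrpFormula(float f, float a, float b)
{
    return f * a + (1.0f - f) * b;
}

// Same result via the macro expansion:
//   add Dest0, Source1, -Source2
//   mad Dest0, Dest0, Source0, Source2
float LrpExpanded(float f, float a, float b)
{
    float d = a - b;   // add
    d = d * f + b;     // mad
    return d;
}
```

A fraction of 0 returns Source2, a fraction of 1 returns Source1, and 0.25 between 10 and 2 gives 4 with either form.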
m3x2(macro) | vs 1.0, 1.1, 2.0 |
Matrix 3 by 2. Performs a matrix multiply on the input vector and input matrix and stores the result. This macro is typically used for 2D transformation calculations.
Two instruction slots
_______________________________________________________________________________ m3x2 Dest0, Source0, Source1 _______________________________________________________________________________
Does a matrix multiply, assuming that Source0 is the input vector, that the matrix starts at register Source1, and that the required number of consecutive registers follow Source1. Only those elements actually used in the calculations are read; only those that are calculated are written to the destination register. The w elements in the source matrix and vector are unused, and only the x and y elements of the destination are written.
This macro
_______________________________________________________________________________ m3x2 Dest0, Source0, Source1 _______________________________________________________________________________
expands to the following:
_______________________________________________________________________________
dp3 Dest0.x, Source0, Source1
dp3 Dest0.y, Source0, Source2
_______________________________________________________________________________
Note | You can use the swizzle or negate modifier if you expand this macro yourself. |
Warning | Make sure that your Dest0 and Source0 registers aren't the same registers. It will compile and run, but your results will be incorrect. |
Note | You are not allowed to use the swizzle or negate modifier on Source1. |
m3x2 r0, v0, c6          // will use c7 as well
m3x2 r0, v0, c6.yzxw     // Error! Can't use swizzle
SetSourceRegisters();
// Simulate the m3x2 macro
TempReg.x = Source0.x * Source1.x + Source0.y * Source1.y + Source0.z * Source1.z;
TempReg.y = Source0.x * Source2.x + Source0.y * Source2.y + Source0.z * Source2.z;
WriteDestinationRegisters();
m3x3(macro) | vs 1.0, 1.1, 2.0 |
Matrix 3 by 3. Performs a matrix multiply on the input vector and input matrix and stores the result. This macro typically is used for normal transformations during lighting calculations.
Three instruction slots
_______________________________________________________________________________ m3x3 Dest0, Source0, Source1 _______________________________________________________________________________
Does a matrix multiply, assuming that Source0 is the input vector, that the matrix starts at register Source1, and that the required number of consecutive registers follow Source1. Only those elements actually used in the calculations are read; only those that are calculated are written to the destination register. The w elements in the source matrix and vector are unused, and only the x, y, and z elements of the destination register are written.
This macro
_______________________________________________________________________________ m3x3 Dest0, Source0, Source1 _______________________________________________________________________________
expands to the following:
_______________________________________________________________________________
dp3 Dest0.x, Source0, Source1
dp3 Dest0.y, Source0, Source2
dp3 Dest0.z, Source0, Source3
_______________________________________________________________________________
Note | You can use the swizzle or negate modifier if you expand this macro yourself. |
Warning | Make sure that your Dest0 and Source0 registers are different. It will compile, but your results will be incorrect. |
Note | You are not allowed to use the swizzle or negate modifier on Source1. |
m3x3 r0, v0, c6          // will use c7 & c8 as well
m3x3 r0, v0, c6.yzxw     // Error! Can't use swizzle
SetSourceRegisters();
// Simulate the m3x3 macro
TempReg.x = Source0.x * Source1.x + Source0.y * Source1.y + Source0.z * Source1.z;
TempReg.y = Source0.x * Source2.x + Source0.y * Source2.y + Source0.z * Source2.z;
TempReg.z = Source0.x * Source3.x + Source0.y * Source3.y + Source0.z * Source3.z;
WriteDestinationRegisters();
m3x4(macro) | vs 1.0, 1.1, 2.0 |
Matrix 3 by 4. Performs a matrix multiply on the input vector and input matrix and stores the result.
Four instruction slots
_______________________________________________________________________________ m3x4 Dest0, Source0, Source1 _______________________________________________________________________________
Does a matrix multiply, assuming that Source0 is the input vector, that the matrix starts at register Source1, and that the required number of consecutive registers follow Source1. Only those elements actually used in the calculations are read; only those that are calculated are written to the destination register. The w elements in the source matrix and vector are unused; all four elements of the destination are written.
This macro
_______________________________________________________________________________ m3x4 Dest0, Source0, Source1 _______________________________________________________________________________
expands to the following:
_______________________________________________________________________________
dp3 Dest0.x, Source0, Source1
dp3 Dest0.y, Source0, Source2
dp3 Dest0.z, Source0, Source3
dp3 Dest0.w, Source0, Source4
_______________________________________________________________________________
Note | You can use the swizzle or negate modifier if you expand this macro yourself. |
Warning | Make sure that your Dest0 and Source0 registers are different. It will compile, but your results will be incorrect. |
Note | You are not allowed to use the swizzle or negate modifier on Source1. |
m3x4 r0, v0, c6          // will use c7, c8, & c9 as well
m3x4 r0, v0, c6.yzxw     // Error! Can't use swizzle
SetSourceRegisters();
// Simulate the m3x4 macro
TempReg.x = Source0.x * Source1.x + Source0.y * Source1.y + Source0.z * Source1.z;
TempReg.y = Source0.x * Source2.x + Source0.y * Source2.y + Source0.z * Source2.z;
TempReg.z = Source0.x * Source3.x + Source0.y * Source3.y + Source0.z * Source3.z;
TempReg.w = Source0.x * Source4.x + Source0.y * Source4.y + Source0.z * Source4.z;
WriteDestinationRegisters();
m4x3(macro) | vs 1.0, 1.1, 2.0 |
Matrix 4 by 3. Performs a matrix multiply on the input vector and input matrix and stores the result.
Three instruction slots
_______________________________________________________________________________ m4x3 Dest0, Source0, Source1 _______________________________________________________________________________
Does a matrix multiply, assuming that Source0 is the input vector, that the matrix starts at register Source1, and that the required number of consecutive registers follow Source1. All source register elements are used, but Dest0.w is left unmodified.
This macro
_______________________________________________________________________________ m4x3 Dest0, Source0, Source1 _______________________________________________________________________________
expands to the following:
_______________________________________________________________________________
dp4 Dest0.x, Source0, Source1
dp4 Dest0.y, Source0, Source2
dp4 Dest0.z, Source0, Source3
_______________________________________________________________________________
Note | You can use the swizzle or negate modifier if you expand this macro yourself. |
Warning | Make sure that your Dest0 and Source0 registers are different. It will compile, but your results will be incorrect. |
Note | You are not allowed to use the swizzle or negate modifier on Source1. |
m4x3 r0, v0, c6          // will use c7 & c8 as well
m4x3 r0, v0, c6.yzxw     // Error! Can't use swizzle
SetSourceRegisters();
// Simulate the m4x3 macro
TempReg.x = Source0.x * Source1.x + Source0.y * Source1.y + Source0.z * Source1.z + Source0.w * Source1.w;
TempReg.y = Source0.x * Source2.x + Source0.y * Source2.y + Source0.z * Source2.z + Source0.w * Source2.w;
TempReg.z = Source0.x * Source3.x + Source0.y * Source3.y + Source0.z * Source3.z + Source0.w * Source3.w;
WriteDestinationRegisters();
m4x4(macro) | vs 1.0, 1.1, 2.0 |
Matrix 4 by 4. Performs a matrix multiply on the input vector and input matrix and stores the result.
Four instruction slots
_______________________________________________________________________________ m4x4 Dest0, Source0, Source1 _______________________________________________________________________________
Does a matrix multiply, assuming that Source0 is the input vector, that the matrix starts at register Source1, and that the required number of consecutive registers follow Source1. All source register elements are used, and all four elements of the destination register are written.
This macro
_______________________________________________________________________________ m4x4 Dest0, Source0, Source1 _______________________________________________________________________________
expands to the following:
_______________________________________________________________________________
dp4 Dest0.x, Source0, Source1
dp4 Dest0.y, Source0, Source2
dp4 Dest0.z, Source0, Source3
dp4 Dest0.w, Source0, Source4
_______________________________________________________________________________
Note | You can use the swizzle or negate modifier if you expand this macro yourself. |
Warning | Make sure that your Dest0 and Source0 registers are different. It will compile, but your results will be incorrect. |
Note | You are not allowed to use the swizzle or negate modifier on Source1. |
m4x4 r0, v0, c6          // will use c7, c8, & c9 as well
m4x4 r0, v0, c6.yzxw     // Error! Can't use swizzle
SetSourceRegisters();
// Simulate the m4x4 macro
TempReg.x = Source0.x * Source1.x + Source0.y * Source1.y + Source0.z * Source1.z + Source0.w * Source1.w;
TempReg.y = Source0.x * Source2.x + Source0.y * Source2.y + Source0.z * Source2.z + Source0.w * Source2.w;
TempReg.z = Source0.x * Source3.x + Source0.y * Source3.y + Source0.z * Source3.z + Source0.w * Source3.w;
TempReg.w = Source0.x * Source4.x + Source0.y * Source4.y + Source0.z * Source4.z + Source0.w * Source4.w;
WriteDestinationRegisters();
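The Dest0/Source0 warning follows from the expansion: each dp4 writes one element of Dest0 before the next dp4 reads Source0 again. This C++ sketch (the names Vec4, Dp4, and M4x4Expanded are hypothetical) executes the four dp4 steps in order, writing straight into the destination, so you can see what goes wrong when both arguments are the same register.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Four-element dot product, like the dp4 instruction.
float Dp4(const Vec4& a, const Vec4& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Runs the m4x4 expansion one dp4 at a time, writing into dest immediately.
// rows[0..3] play the role of Source1 .. Source1+3.
// If dest and src are the same object, later dp4 steps read modified data.
void M4x4Expanded(Vec4& dest, const Vec4& src, const std::array<Vec4, 4>& rows)
{
    for (int i = 0; i < 4; ++i)
        dest[i] = Dp4(src, rows[i]);
}
```

With a matrix that swaps x and y, a separate destination yields (2, 1, 3, 4) for the input (1, 2, 3, 4), but running it in place yields (2, 2, 3, 4), because the second dp4 reads the x element the first one just overwrote.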
mad | vs 1.0, 1.1, 2.0 |
Multiply and add. Multiplies two registers, then adds a third to the result, and then stores the result.
One slot
_______________________________________________________________________________ mad Dest0, Source0, Source1, Source2 _______________________________________________________________________________
Multiplies Source0 by Source1, then adds Source2 to the result. The final result is stored in Dest0.
Setup Source0 and Source1 are the registers to be multiplied. Source2 is the register to be added to the result of the multiplication of Source0 and Source1.
Results Dest0 contains (Source0 * Source1) + Source2.
mad r0, r0, r1, r2
SetSourceRegisters();
// Simulate the mad instruction
TempReg.x = Source0.x * Source1.x + Source2.x;
TempReg.y = Source0.y * Source1.y + Source2.y;
TempReg.z = Source0.z * Source1.z + Source2.z;
TempReg.w = Source0.w * Source1.w + Source2.w;
WriteDestinationRegisters();
max | vs 1.0, 1.1, 2.0 |
Stores the maximum value from comparing two source registers element by element into the destination register.
One slot
_______________________________________________________________________________ max Dest0, Source0, Source1 _______________________________________________________________________________
Finds the maximum between elements of Source0 and Source1, then stores the results in the corresponding elements of Dest0. The resulting register may not be equal to either input register, since the comparison is per element.
Setup Source0 and Source1 are the registers to be compared.
Result Dest0 contains the maximum elements of the two input registers done on an element-by-element basis.
max r0, r1, r2
SetSourceRegisters();
// Simulate the max instruction
TempReg.x = Source0.x >= Source1.x ? Source0.x : Source1.x;
TempReg.y = Source0.y >= Source1.y ? Source0.y : Source1.y;
TempReg.z = Source0.z >= Source1.z ? Source0.z : Source1.z;
TempReg.w = Source0.w >= Source1.w ? Source0.w : Source1.w;
WriteDestinationRegisters();
min | vs 1.0, 1.1, 2.0 |
Stores the minimum value from comparing two source registers element by element into the destination register.
One slot
_______________________________________________________________________________ min Dest0, Source0, Source1 _______________________________________________________________________________
Finds the minimum between elements of Source0 and Source1, then stores the results in the corresponding elements of Dest0. The resulting register may not be equal to either input register, since the comparison is per element.
Setup Source0 and Source1 are the registers to be compared.
Results Dest0 contains the minimum of the two input registers done on an element-by-element basis.
min r0, r1, r2
SetSourceRegisters();
// Simulate the min instruction
TempReg.x = Source0.x < Source1.x ? Source0.x : Source1.x;
TempReg.y = Source0.y < Source1.y ? Source0.y : Source1.y;
TempReg.z = Source0.z < Source1.z ? Source0.z : Source1.z;
TempReg.w = Source0.w < Source1.w ? Source0.w : Source1.w;
WriteDestinationRegisters();
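Because max and min compare element by element, the result can mix elements taken from both inputs. A small C++ sketch (the names Vec4, Max4, and Min4 are hypothetical) makes this concrete:

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Per-element maximum, as the max instruction computes it.
Vec4 Max4(const Vec4& a, const Vec4& b)
{
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r[i] = a[i] >= b[i] ? a[i] : b[i];
    return r;
}

// Per-element minimum, as the min instruction computes it.
Vec4 Min4(const Vec4& a, const Vec4& b)
{
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r[i] = a[i] < b[i] ? a[i] : b[i];
    return r;
}
```

For (1, 5, 2, 0) and (3, 4, 2, -1), Max4 returns (3, 5, 2, 0), which matches neither input.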
mov | vs 1.0, 1.1, 2.0 |
Copies the source register into the destination register. Useful for moving from a temporary register into an output register or for swizzling. The source and destination registers can be the same.
For DirectX 8, the mov instruction is the only instruction that can use the address register as a destination and only in vertex shaders version 1.1 or later. If the address register is the destination, then the value is rounded to the integer value that is less than or equal to the initial value. In VS 2.0 you must use the mova instruction to set the address register.
One slot
_______________________________________________________________________________ mov Dest0, Source0 _______________________________________________________________________________
Move Source0 into Dest0. A special case occurs in DirectX 8 when Dest0 is the address register. In this case, the value stored is the largest integer that is less than or equal to the initial value; that is, the number is rounded toward negative infinity. Thus 1.5 would get stored as 1, while −1.5 would get stored as −2. In DirectX 9, only floating-point data can be moved.
Setup Source0 is the register to be copied.
Results Dest0 contains a copy of Source0, unless it's the address register in a DirectX 8 shader, in which case it is the largest integer value that's less than or equal to the initial value in the register. If the destination is the address register, then, unless otherwise specified, only Source0.x is used.
mov r0, r1
mov a0.x, c1.w    // initializing address register, DirectX 8
SetSourceRegisters();
// Simulate the mov instruction
if ( Dest0 == a0 )    // address register in a DirectX 8 shader
{
    // use only the integer part, rounding toward negative infinity
    TempReg.x = (int)::floor( Source0.x );
}
else    // DirectX 9 shader, or the destination is a float register
{
    TempReg.x = Source0.x;
    TempReg.y = Source0.y;
    TempReg.z = Source0.z;
    TempReg.w = Source0.w;
}
WriteDestinationRegisters();
mova | vs 2.0 |
Moves data from a floating-point register into the address register.
One slot
_______________________________________________________________________________ mova Dest0, Source0 _______________________________________________________________________________
This instruction rounds Source0 to the nearest integer and places the result in Dest0. Dest0 must be the address register. Rounding is to nearest even, though this is not exactly specified and applications should not rely on this behavior: for values exactly halfway between two integers, some implementations may round up, others down, and others may pick either direction. The _sat modifier is not supported.
Setup Source0 is the floating point register to be rounded, then placed in the address register.
Results The rounded value from Source0 is placed in the address register.
mova a0.x, c1.w    // move one element
mova a0, c1        // move all
SetSourceRegisters();
// Simulate the mova instruction
// Note: RoundToNearestInteger() is
// implementation dependent for halfway values
TempReg.x = RoundToNearestInteger( Source0.x );
TempReg.y = RoundToNearestInteger( Source0.y );
TempReg.z = RoundToNearestInteger( Source0.z );
TempReg.w = RoundToNearestInteger( Source0.w );
WriteDestinationRegisters();    // writes to the address register
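The two ways of loading the address register round differently. In this C++ sketch (the function names are hypothetical), std::floor models the DirectX 8 mov into a0, and std::nearbyint stands in for mova. nearbyint follows the current floating-point rounding mode, which by default rounds halfway cases to even; as the description above says, real hardware need not do the same, so treat the halfway results as an assumption.

```cpp
#include <cmath>

// DirectX 8: mov a0.x, src -> largest integer <= src.x
int MovToAddressDx8(float v)
{
    return static_cast<int>(std::floor(v));
}

// vs 2.0 mova stand-in: nearest integer. Ties follow the current
// rounding mode (round-half-to-even by default), which hardware need not match.
int MovaSketch(float v)
{
    return static_cast<int>(std::nearbyint(v));
}
```

So 1.5 and −1.5 become 1 and −2 with mov, while mova sends 1.6 to 2 and −1.6 to −2.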
mul | vs 1.0, 1.1, 2.0 |
Multiplies the two source registers element by element and stores the result in the destination register.
One slot
_______________________________________________________________________________ mul Dest0, Source0, Source1 _______________________________________________________________________________
Multiplies Source0 by Source1 and stores the result in Dest0.
Setup Source0 and Source1 are the two registers to be multiplied.
Results Dest0 contains the result of the multiplication of Source0 and Source1.
mul r0, r1, r2
SetSourceRegisters();
// Simulate the mul instruction
TempReg.x = Source0.x * Source1.x;
TempReg.y = Source0.y * Source1.y;
TempReg.z = Source0.z * Source1.z;
TempReg.w = Source0.w * Source1.w;
WriteDestinationRegisters();
nop | vs 1.0–2.0 |
Defines the null instruction (No-Operation).
One slot, possibly optimized out.
_______________________________________________________________________________ nop _______________________________________________________________________________
You can use it to create a shader that does nothing but take up slots and/or time as it executes to see how a shader of that length would affect your rendering. It's possible that a driver might optimize away this instruction.
Setup None.
Results Nothing.
nop
nrm(macro) | vs 2.0 |
This macro will normalize all elements of a register.
Three slots
_______________________________________________________________________________ nrm Dest0, Source0 _______________________________________________________________________________
This macro takes all elements of Source0 and normalizes them so that the square root of the sum of the squares of all elements in Dest0 is one. Dest0 cannot be the same register as Source0.
This macro
_______________________________________________________________________________ nrm Dest0, Source0 _______________________________________________________________________________
is equivalent to the following:
_______________________________________________________________________________
dp4 Dest0.x, Source0, Source0
rsq Dest0.x, Dest0.x
mul Dest0, Source0, Dest0.x
_______________________________________________________________________________
nrm r0, v0
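A scalar C++ sketch that follows the three-instruction expansion literally (dp4, then rsq, then mul). Vec4 and Nrm are hypothetical names, and the rsq step takes the absolute value as rsq does. A zero input yields non-finite results, since rsq of 0 is positive infinity.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// nrm following the expansion: dp4, then rsq, then mul.
Vec4 Nrm(const Vec4& s)
{
    float d = s[0] * s[0] + s[1] * s[1] + s[2] * s[2] + s[3] * s[3];  // dp4
    float inv = 1.0f / std::sqrt(std::fabs(d));                       // rsq
    return { s[0] * inv, s[1] * inv, s[2] * inv, s[3] * inv };        // mul
}
```

For example, (3, 0, 4, 0) normalizes to approximately (0.6, 0, 0.8, 0).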
pow(macro) | vs 2.0 |
Computes the power function for a scalar value.
Three slots
_______________________________________________________________________________ pow Dest0, Source0, Source1 _______________________________________________________________________________
Only the w element of each source register is used, and only the absolute value of Source0 is used. Dest0 is filled with abs(Source0.w) raised to the Source1.w power. The result is replicated in all elements of the destination.
This macro
_______________________________________________________________________________ pow Dest0, Source0, Source1 _______________________________________________________________________________
is equivalent to the following:
_______________________________________________________________________________
log Dest0.w, Source0               // takes absolute value
mul Dest0.w, Dest0.w, Source1.w
exp Dest0, Dest0.w
_______________________________________________________________________________
pow r0, r3, c6    // assume r3.w and c6.w are set
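Since the macro is just log, mul, and exp in base 2, it relies on the identity |a|^b = 2^(b · log2 |a|). A scalar C++ sketch of the three steps (PowMacro is a hypothetical name; std::log2 and std::exp2 stand in for the shader's log and exp):

```cpp
#include <cmath>

// pow macro on the selected elements: log (of |a|), mul, exp (base 2).
float PowMacro(float a, float b)
{
    float l = std::log2(std::fabs(a));  // log: takes the absolute value
    float m = l * b;                    // mul
    return std::exp2(m);                // exp
}
```

PowMacro(-2, 3) returns 8 rather than -8, because the log step discards the sign.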
rcp | vs 1.0, 1.1, 2.0 |
Computes the reciprocal of an element of the source register and stores it in the destination register.
One slot
_______________________________________________________________________________ rcp Dest0, Source0 _______________________________________________________________________________
Computes the reciprocal of a single element of the source register and stores it in all elements of the destination register. Only one element of the source is used. If no element is specified, then Source0.w is used. A value of exactly 1 on input returns 1 on output (no round-off error), whereas a value of 0 on input returns positive infinity.
Setup Source0 contains the element of which to take the reciprocal. If unspecified, Source0.w is used.
Results Dest0 contains the reciprocal of the specified element copied in all elements.
Note | This is one of the few instructions that will take more than one clock to execute. Use it sparingly, and when you do use it, try to arrange your code so that you don't need the results immediately. |
rcp r0, r1
SetSourceRegisters();
// Simulate the rcp instruction
if ( 0.0f == Source0.w )         // if 0
{
    TempReg.w = PLUS_INFINITY;
}
else if ( 1.0f == Source0.w )    // if 1
{
    TempReg.w = 1.0f;
}
else
{
    TempReg.w = 1.0f / Source0.w;
}
TempReg.x = TempReg.y = TempReg.z = TempReg.w;
WriteDestinationRegisters();
rep | vs 2.0 |
Repeat. Indicates the start of a rep-endrep block.
One slot
_______________________________________________________________________________ rep IntSource0 _______________________________________________________________________________
IntSource0 must be an integer register. Only the .x element is used, and its maximum initial value is 255. The block is executed IntSource0.x times, as long as the number is positive. Compare this to the loop instruction, which additionally maintains and increments the loop counter register, aL.
Setup IntSource0 must be an integer register with the .x element initialized to the number of times to iterate through the block.
Results The instructions in the rep-endrep block are executed IntSource0.x times.
defi i0, 10, 0, 0, 0    // i0.x is set to the count
rep i0
    // the instructions here will get executed i0.x times
endrep
// Simulate the rep instruction
int LoopCounter = IntSource0.x;
TopLoop:                          // <- label
if ( LoopCounter <= 0 ) goto EndLoop;
// the instructions between rep and endrep
// would go here
// Simulate the endrep instruction
LoopCounter--;
goto TopLoop;
EndLoop: ;                        // <- label
ret | vs 2.0 |
Indicates the end of a subroutine.
One slot
_______________________________________________________________________________ ret _______________________________________________________________________________
This instruction returns to the instruction following the calling call or callnz instruction, or returns from the main function.
Setup Returns to the address following the most recent call or callnz instruction, or returns from the main function.
Results The path of execution is changed to the next instruction on the instruction stack.
ret
rsq | vs 1.0, 1.1, 2.0 |
Computes the reciprocal square root of the specified element of the source register and stores it in all elements of the destination register.
One slot
_______________________________________________________________________________ rsq Dest0, Source0 _______________________________________________________________________________
Computes the reciprocal square root of the specified element of the source register and stores it in all elements of the destination register. If no element is specified, then Source0.w is used. The absolute value of the input is used. A value of exactly 1 on input returns 1 on output (no round-off), whereas a value of 0 on input returns positive infinity.
Setup Source0 contains the element of which to take the reciprocal square root. If unspecified, Source0.w is used.
Results Dest0 contains the reciprocal square root of the absolute value of the specified element copied in all elements.
Note | This is one of the few instructions that will take more than one clock to execute. Use it sparingly, and when you do use it, try to arrange your code so that you don't need the results immediately. |
rsq r0, r1
SetSourceRegisters();
// Simulate the rsq instruction
float v = fabs( Source0.w );
if ( 0.0f == v )         // if 0
{
    TempReg.w = PLUS_INFINITY;
}
else if ( 1.0f == v )    // if 1
{
    TempReg.w = 1.0f;
}
else
{
    TempReg.w = 1.0f / sqrt( v );
}
TempReg.x = TempReg.y = TempReg.z = TempReg.w;
WriteDestinationRegisters();
sge | vs 1.0, 1.1, 2.0 |
Sets Greater-Than or Equal-To. Stores 1 in the destination register if the first source register is greater than or equal to the second source register. If not it stores 0 in the destination register. Does an element-by-element comparison and assignment.
One slot
_______________________________________________________________________________ sge Dest0, Source0, Source1 _______________________________________________________________________________
Compares the two source registers element by element. If the first source register's element is greater than or equal to the second source register's element, the value 1 is placed in the destination register's element. If not, it stores 0 in the destination register's element. The resulting register can be a mix of 0s and 1s.
Setup Source0 and Source1 are the registers to be compared.
Results The element Dest0.n contains 1.0 if the Source0.n is greater than or equal to Source1.n; otherwise, it contains 0.0. This is done for all elements of Dest0.
sge r0, r1, r2
SetSourceRegisters();
// Simulate the sge instruction
TempReg.x = Source0.x >= Source1.x ? 1.0f : 0.0f;
TempReg.y = Source0.y >= Source1.y ? 1.0f : 0.0f;
TempReg.z = Source0.z >= Source1.z ? 1.0f : 0.0f;
TempReg.w = Source0.w >= Source1.w ? 1.0f : 0.0f;
WriteDestinationRegisters();
sgn (macro) | vs 2.0 |
Computes the sign of each element in a register.
Three slots
_______________________________________________________________________________ sgn Dest0, Source0, Source1, Source2 _______________________________________________________________________________
Computes the sign of the elements of Source0, using two temporary scratch registers. All elements of the source registers are compared. The comparison is done element by element. Source1 and Source2 should be temporary registers and should not be the same. If an element in Source0 was > 0, then the corresponding element in Dest0 will be 1. If it was < 0, then the result will be −1. If it was 0, the result will be 0.
This macro
_______________________________________________________________________________ sgn Dest0, Source0, Source1, Source2 _______________________________________________________________________________
is equivalent to the following:
_______________________________________________________________________________
slt Source1, Source0, -Source0
slt Source2, -Source0, Source0
add Dest0, Source2, -Source1
_______________________________________________________________________________
Note | Source1 and Source2 will be modified after this macro! |
sgn r3, r1, r2
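To confirm that the three-instruction expansion produces -1, 0, or 1, here is a scalar C++ sketch that mirrors it step by step (Slt and SgnMacro are hypothetical names; the locals s1 and s2 play the roles of the scratch registers Source1 and Source2):

```cpp
// slt on one element: 1.0 if a < b, else 0.0
float Slt(float a, float b) { return a < b ? 1.0f : 0.0f; }

// sgn via the expansion:
//   slt Source1, Source0, -Source0   -> 1 when x < 0
//   slt Source2, -Source0, Source0   -> 1 when x > 0
//   add Dest0, Source2, -Source1
float SgnMacro(float x)
{
    float s1 = Slt(x, -x);
    float s2 = Slt(-x, x);
    return s2 - s1;
}
```

At most one of the two comparisons can be 1, so the difference is always -1, 0, or 1.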
sincos (macro) | vs 2.0 |
Computes the sine and cosine values for a scalar argument.
Eight slots
_______________________________________________________________________________ sincos Dest0, Source0, Source1, Source2 _______________________________________________________________________________
Estimates the sine and cosine values inside a shader with a maximum error of 0.002 through the use of a Taylor series expansion. Source0 must have a replicate swizzle to indicate which element to use. This should be a value in radians in the range [-π, π]. Dest0 should be a temporary register. The destination must have .x, .y, or .xy as a write mask.
Setup One element of Source0 has to have the value in radians. Source1 and Source2 have to be set up with the following values to perform the expansion.
Results The resulting sine and cosine values are written in Dest0.x and Dest0.y respectively.
// setup values (shown symbolically; def itself takes literal values)
def c1, 1.0f/(7!*128), 1.0f/(6!*64), 1.0f/(4!*16), 1.0f/(5!*16)
def c2, 1.0f/(3!*8), 1.0f/(2!*8), 1.0f, 0.5f
// assume value to take sin/cos of is in r0.x
sincos r0.xy, r0.x, c1, c2
slt | vs 1.0, 1.1, 2.0 |
Set Less-Than. Stores 1 in the destination register if the first source register is less than the second source register. If not, it stores 0 in the destination register. Does an element-by-element comparison and assignment.
One slot
_______________________________________________________________________________ slt Dest0, Source0, Source1 _______________________________________________________________________________
Compares the two source registers element by element. If the first source register's element is less than the second source register's element, the value 1 is placed in the destination register's element. If not, it stores 0 in the destination register's element. The resulting register may consist of a mix of 0s and 1s.
Setup Source0 and Source1 are the registers to be compared.
Results The element Dest0.n contains 1.0 if Source0.n is less than Source1.n; otherwise, it contains 0.0. This is done for all elements of Dest0.
slt r0, r1, r2
SetSourceRegisters();
// Simulate the slt instruction
TempReg.x = Source0.x < Source1.x ? 1.0f : 0.0f;
TempReg.y = Source0.y < Source1.y ? 1.0f : 0.0f;
TempReg.z = Source0.z < Source1.z ? 1.0f : 0.0f;
TempReg.w = Source0.w < Source1.w ? 1.0f : 0.0f;
WriteDestinationRegisters();
sub | vs 1.0, 1.1 |
Subtracts one register from another and places the result into the destinatoin register.
One slot
_______________________________________________________________________________ sub Dest0, Source0, Source1 _______________________________________________________________________________
Subtracts Source1 from Source0 and places the result in the Dest0 register.
Setup Two source registers, Source0 and Source1.
Result Each element of Des0 is filled with the element-by-element subtraction of the element of Source1 from Source0.
sub r0, r0, c2
SetSourceRegisters();
// Simulate the sub instruction
TempReg.x = Source0.x - Source1.x;
TempReg.y = Source0.y - Source1.y;
TempReg.z = Source0.z - Source1.z;
TempReg.w = Source0.w - Source1.w;
WriteDestinationRegisters();
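A minimal C sketch of sub follows, using a hypothetical `Vec4` type. Each element of Source1 is subtracted from the matching element of Source0, so the y result uses Source1.y. For comparison, the sketch also computes the same value as an add with a negated second source, which is simply the arithmetic identity a - b = a + (-b); `vs_sub` and `vs_add_neg` are illustrative names only.

```c
/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Simulates sub Dest0, Source0, Source1: element-by-element Source0 - Source1. */
Vec4 vs_sub(Vec4 src0, Vec4 src1)
{
    Vec4 dest = { src0.x - src1.x, src0.y - src1.y,
                  src0.z - src1.z, src0.w - src1.w };
    return dest;
}

/* The same result written as add Dest0, Source0, -Source1. */
Vec4 vs_add_neg(Vec4 src0, Vec4 src1)
{
    Vec4 dest = { src0.x + -src1.x, src0.y + -src1.y,
                  src0.z + -src1.z, src0.w + -src1.w };
    return dest;
}
```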
vs | vs 1.0, 1.1, 2.0 |
Defines the version of the vertex shader code you are using.
No slots
_______________________________________________________________________________
vs.integer1.integer2 // DirectX 8
vs_integer1_integer2 // DirectX 9
_______________________________________________________________________________
The argument is vs.x.y for DirectX 8 shaders and vs_x_y for DirectX 9 shaders, where x is the major version number and y is the minor version number. Both values are integers.
Setup Two integers that instruct the assembler on the major and minor version numbers of the shader version you want to use. This must be the first instruction in your shader.
Results Tells the assembler which features to allow in the shader instructions that follow.
// DirectX 8
vs.1.0 // not using the address register in this one
vs.1.1 // uses address register
// DirectX 9
vs_2_0
By default, a register reference in a shader expands to a reference to all the elements of the register, in x, y, z, w order. This means that whenever you write R, where R is the name of some register, it automatically gets expanded (at least conceptually) to R.xyzw. What I want to get across is that the specific element-by-element reference to the register is made without your having to do anything. This is merely a semantic convenience.
What this means is that it costs nothing extra to reorder or even ignore specific elements of a register if it makes sense to. I'll say it again: it costs nothing to use these masks in your shaders. They are there so you can take advantage of the single-instruction multiple-data nature of the shader language to possibly merge similar computations or save on register usage. If, in fact, you don't need to have a value stored in every element of a register, then by all means use a destination mask so that only the elements you need are written.
destination mask/write mask |
Note the word destination. Masks can be used only to select which elements of a register are written to. (Hence they are often referred to as write masks.) Even if an instruction usually writes to all four elements of the destination, the mask selects which element(s) of the destination are actually written. If you leave an element out of a mask, it doesn't get written. The elements in a mask must be in order: x comes before y, which comes before z, which comes before w.
mov r1, c1 // use all
mov r1.xyzw, c1 // use all explicitly (default)
mov r1.xw, c1 // just move c1.x and c1.w
mov r1.wx, c1 // Error! - invalid order
mov r1.wzyx, c1 // Error! - invalid order
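Here is a minimal C sketch of how a write mask behaves: it parses a mask string, rejects masks whose letters are out of x, y, z, w order (matching the error lines above), and then writes only the selected elements. The bit encoding (x = 1, y = 2, z = 4, w = 8) and the function names are assumptions of this sketch, not how the DirectX assembler encodes masks; an empty string stands in for "no mask".

```c
#include <string.h>

/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Parses a write mask such as "xw" into bit flags (x=1, y=2, z=4, w=8).
   Returns -1 if a letter is unknown, repeated, or out of x, y, z, w order.
   An empty mask means the default .xyzw. */
int parse_write_mask(const char *mask)
{
    const char *order = "xyzw";
    int bits = 0;
    int last = -1;                    /* position of the previous letter */
    if (mask[0] == '\0')
        return 0xF;
    for (const char *p = mask; *p; ++p) {
        const char *pos = strchr(order, *p);
        if (pos == NULL)
            return -1;                /* not one of x, y, z, w */
        int idx = (int)(pos - order);
        if (idx <= last)
            return -1;                /* out of order or repeated */
        bits |= 1 << idx;
        last = idx;
    }
    return bits;
}

/* Writes value into dest through the mask: unselected elements keep
   their previous contents. */
void write_masked(Vec4 *dest, Vec4 value, int bits)
{
    if (bits & 1) dest->x = value.x;
    if (bits & 2) dest->y = value.y;
    if (bits & 4) dest->z = value.z;
    if (bits & 8) dest->w = value.w;
}
```

With this sketch, `mov r1.xw, c1` corresponds to `write_masked(&r1, c1, parse_write_mask("xw"))`, which leaves r1.y and r1.z untouched.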
source swizzle |
Note the word source. Swizzles can be used only to select the order and the elements of a register to use as a source. A full swizzle specifies four elements, each one of x, y, z, or w; there is no restriction on their order, and elements may repeat. If fewer than four elements are specified, the last element specified is replicated.
mov r1, c1 // use all - same as below
mov r1, c1.xyzw // use all in order
mov r1, c1.wzyx // reverse the order
mov r1, c1.wwww // just use the w element
mov r1, c1.xyzy // replace w with y
mov r1.w, c1.zxyx // move c1.x into r1.w
mov r1.z, c1.xzwy // move c1.w into r1.z
mov r1, c1.x // same as c1.xxxx
mov r1, c1.xy // same as c1.xyyy
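The replication rule is easy to model. This minimal C sketch applies a swizzle string to a register, repeating the last letter when fewer than four are given, so "xy" behaves like "xyyy". The `Vec4` type, `elem`, and `swizzle` are hypothetical helpers, and the sketch assumes the swizzle string is well formed (only x, y, z, w and at most four letters).

```c
#include <string.h>

/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Reads element i (0=x, 1=y, 2=z, 3=w) of a register. */
static float elem(Vec4 v, int i)
{
    switch (i) {
    case 0:  return v.x;
    case 1:  return v.y;
    case 2:  return v.z;
    default: return v.w;
    }
}

/* Applies a source swizzle such as "wzyx" or "xy" to src. Missing trailing
   letters repeat the last one given; an empty swizzle means .xyzw.
   Assumes sw contains only x, y, z, w and at most four letters. */
Vec4 swizzle(Vec4 src, const char *sw)
{
    static const char *letters = "xyzw";
    int sel[4] = { 0, 1, 2, 3 };
    size_t n = strlen(sw);
    for (size_t i = 0; i < 4 && n > 0; ++i) {
        char c = sw[i < n ? i : n - 1];   /* replicate the last letter */
        sel[i] = (int)(strchr(letters, c) - letters);
    }
    Vec4 out = { elem(src, sel[0]), elem(src, sel[1]),
                 elem(src, sel[2]), elem(src, sel[3]) };
    return out;
}
```

Because the swizzle is applied before any write mask, `mov r1.w, c1.zxyx` picks up the fourth swizzled element, which is c1.x.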
source absolute value |
Available only in vs 2.0. The absolute value modifier, _abs, takes the absolute value of the source register. If _abs is combined with the negate modifier, the absolute value is taken first.
vs_2_0 // vertex shader 2.0 or better
mov r5, r5_abs // absolute value of r5 in r5
mov r5, -r5_abs // all values in r5 will be negative or zero
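The order of the two modifiers matters: if negation were applied first, the absolute value would make every element nonnegative again. This minimal C sketch applies them in the order described above; `apply_source_modifiers`, its integer flags, and the `absf` helper are hypothetical names chosen for illustration.

```c
/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Absolute value of one float, written out to avoid needing libm. */
static float absf(float a)
{
    return a < 0.0f ? -a : a;
}

/* Applies source modifiers in order: absolute value first, then negation. */
Vec4 apply_source_modifiers(Vec4 v, int use_abs, int negate)
{
    if (use_abs) {
        v.x = absf(v.x); v.y = absf(v.y);
        v.z = absf(v.z); v.w = absf(v.w);
    }
    if (negate) {
        v.x = -v.x; v.y = -v.y; v.z = -v.z; v.w = -v.w;
    }
    return v;
}
```

So `-r5_abs` corresponds to `apply_source_modifiers(r5, 1, 1)`, and no element of the result can be positive.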
saturation instruction modifier |
Available only in vs 2.0. The saturation instruction modifier clamps the results to the [0,1] range.
vs_2_0 // vertex shader 2.0 or better
add_sat r5, r5, r5 // r5 + r5, clamped to [0,1], stored in r5
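As a minimal sketch of what the saturation modifier does to add, the C code below adds element by element and then clamps each result to [0, 1] before it is stored. `saturate` and `vs_add_sat` are hypothetical names, and the sketch does not attempt to model how hardware treats special values such as NaN.

```c
/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Clamps one value to the [0, 1] range. */
static float saturate(float a)
{
    if (a < 0.0f) return 0.0f;
    if (a > 1.0f) return 1.0f;
    return a;
}

/* Simulates add_sat Dest0, Source0, Source1: add element by element,
   then clamp each element of the result to [0, 1]. */
Vec4 vs_add_sat(Vec4 src0, Vec4 src1)
{
    Vec4 dest = { saturate(src0.x + src1.x), saturate(src0.y + src1.y),
                  saturate(src0.z + src1.z), saturate(src0.w + src1.w) };
    return dest;
}
```

For r5 = (-0.5, 0.25, 0.75, 2.0), `add_sat r5, r5, r5` gives (0, 0.5, 1, 1): the clamp applies to the sum, not to the inputs.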
source negation |
The negation modifier negates an entire source register. It can be combined with swizzles.
mov r1, c1 // move c1 into r1
mov r1, -c1 // move negative c1 into r1
mov r1.w, -r1 // negate just r1's w and store it
mov r1.w, -c1.zzzy // move -c1.y into r1.w
mov -r1, c1 // Error! Can't negate destination
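The modifiers combine in a fixed sequence: the source is swizzled, then negated, and only then written through the destination mask. This minimal C sketch models `mov dest.mask, -src.swizzle`; the `mov_neg` name, the index array for the swizzle, and the bit-flag mask (x = 1, y = 2, z = 4, w = 8) are assumptions of the sketch. There is deliberately no way to negate the destination, matching the last error line above.

```c
/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Simulates mov dest.mask, -src.swizzle.
   sel[i] picks the source element (0=x, 1=y, 2=z, 3=w) for position i;
   mask bits are x=1, y=2, z=4, w=8. */
void mov_neg(Vec4 *dest, Vec4 src, const int sel[4], int mask)
{
    float s[4] = { src.x, src.y, src.z, src.w };
    float v[4];
    for (int i = 0; i < 4; ++i)
        v[i] = -s[sel[i]];            /* swizzle, then negate */
    if (mask & 1) dest->x = v[0];     /* write mask applied last */
    if (mask & 2) dest->y = v[1];
    if (mask & 4) dest->z = v[2];
    if (mask & 8) dest->w = v[3];
}
```

For `mov r1.w, -c1.zzzy`, pass the swizzle {2, 2, 2, 1} and mask 8: only r1.w changes, and it receives -c1.y.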
address registers |
Not available in shader version 1.0. Only register a0.x can be used in version 1.1. The address register holds a signed offset into the constant register file, and it can be loaded only with the mov or mova instruction. When it is used, the resulting constant index must fall within the legal range for the constant registers (i.e., 0 to 95 for most 1.1 vertex shaders).
vs.1.1
mov a0.x, c5.w // load a0.x
mov r1, c1 // regular move, format 1
mov r1, c[1] // same thing, alternative format
mov r1, c[1+a0.x] // relative move
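Relative addressing can be modeled as an integer offset added to a base index, with a range check on the final index. In this minimal C sketch, `ShaderState`, `load_a0x`, and `read_relative` are hypothetical names, the 96-entry constant file follows the 0 to 95 range cited above, and the float-to-integer conversion when loading a0.x is an assumption (rounding down); check the DirectX documentation for the exact rule your shader version uses.

```c
#include <assert.h>

#define NUM_CONSTANTS 96   /* 0..95, the range cited above for most 1.1 shaders */

/* Hypothetical four-float register type used only for this sketch. */
typedef struct { float x, y, z, w; } Vec4;

/* Hypothetical shader state: constant register file plus a0.x. */
typedef struct {
    Vec4 c[NUM_CONSTANTS];
    int a0x;                 /* address register a0.x, held as an integer */
} ShaderState;

/* Simulates mov a0.x, value. ASSUMPTION: the float is rounded down
   (toward negative infinity) to an integer when loaded. */
void load_a0x(ShaderState *s, float value)
{
    int i = (int)value;      /* C casts truncate toward zero */
    if ((float)i > value)    /* adjust negative non-integers downward */
        i -= 1;
    s->a0x = i;
}

/* Simulates reading c[base + a0.x]: a0.x is a signed offset, and the
   final index must land inside the constant register file. */
Vec4 read_relative(const ShaderState *s, int base)
{
    int index = base + s->a0x;
    assert(index >= 0 && index < NUM_CONSTANTS);
    return s->c[index];
}
```

In these terms, `mov a0.x, c5.w` is `load_a0x(&s, s.c[5].w)` and `mov r1, c[1+a0.x]` is `r1 = read_relative(&s, 1)`.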