OUTPUT REGISTER MASKS, ARGUMENT, AND INSTRUCTION MODIFIERS | Real-Time Shader Programming (The Morgan Kaufmann Series in Computer Graphics)

In order to give you more control over how an individual register or instruction is used, you have an array of masks, selectors, and modifiers to manipulate exactly how an instruction works and what register channels are used or written to.

source negation

ps 1.0–2.0

Negation can be used to negate an entire source register before it is used. Source negation is indicated by placing a minus sign in front of the source register to be negated. The source register values are unchanged.

Rules for using source negation:

For use only with arithmetic instructions.
Cannot be used with the invert modifier.
Performed after other modifiers.

    mov     t0, -v0         // t0 = -1.0 * v0
    mul     t0, -v0, -c3
    add     r0,  v0, -v1
    mul     r1, 1-v0, -v1   // with invert modifier

source invert

ps 1.0–1.4

Subtracts all elements of a register from 1 and uses the result as the source value. Source invert is indicated by placing a 1- (number 1 followed by a minus sign) in front of the source register to be inverted. The source register values are unchanged.

Rules for using source invert:

For use only with arithmetic instructions.
Cannot be combined with any other register modifier except the alpha-replicate modifier.

    mov     r0, 1-v0        // inverts colors (1 - v0)
    mul     r1, 1-v0, -v1
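As a rough numerical model of negation and invert (plain Python rather than shader assembly; the helper names `negate` and `invert` and the sample values are made up for illustration), each modifier hands the instruction a modified copy of the source and leaves the register itself alone:

```python
# Hypothetical model: a register is a 4-tuple of floats (r, g, b, a).

def negate(src):
    """Model of the -reg source modifier: every channel is multiplied by -1."""
    return tuple(-c for c in src)

def invert(src):
    """Model of the 1-reg source modifier: every channel is subtracted from 1."""
    return tuple(1.0 - c for c in src)

v0 = (0.25, 0.5, 0.75, 1.0)   # made-up color value
print(negate(v0))             # what "mov r0, -v0" would read
print(invert(v0))             # what "mov r0, 1-v0" would read
print(v0)                     # the source register itself is unchanged
```

This only models the arithmetic; it says nothing about the clamping or register ranges a particular piece of hardware applies.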

source bias

ps 1.0–1.4

The bias modifier is used for shifting the range of the input register from the [0,1] range to the [−0.5, +0.5] range. The bias modifier is indicated by adding a _bias suffix to a register. Essentially, the modifier subtracts 0.5 from the register's values before they are used. Be careful when using this modifier with the color registers, since the range of the color registers is [0,1], and you'll get an implicit clamping. The source register values are unchanged.

If you use it with the _x2 instruction modifier (for example, mov_x2), you can convert a register range from [0,1] to [−1,1], the same as a source signed scaling modifier.

Note

If you used the D3DTOP_ADDSIGNED texture operation in one of your DirectX texture stages, the bias modifier performs the same operation.

Rules for using source bias:

For use only with arithmetic instructions.
Cannot be combined with the invert modifier.
Initial data outside the [0,1] range may produce undefined results.

    // Shift range from [0,1] to [-0.5, 0.5]
    mov         r0,  r0_bias    // r0 = r0 - 0.5
    // Shift range from [0,1] to [-0.5, 0.5]
    // Then shift sign
    mov         r0,  -r0_bias   // r0 = 0.5 - r0
    // shift range from [0,1] to [-1,1]
    mov_x2      r0,  r0_bias
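The three listings above boil down to simple arithmetic, which can be sketched in plain Python. This is only a numerical model; the helper names `bias`, `negate`, and `times_two` and the sample values are assumptions made for illustration.

```python
# Hypothetical model: a register is a tuple of floats, one per channel.

def bias(src):
    """Model of the _bias source modifier: subtract 0.5 from every channel."""
    return tuple(c - 0.5 for c in src)

def negate(src):
    """Model of source negation, which is performed after other modifiers."""
    return tuple(-c for c in src)

def times_two(values):
    """Model of the _x2 instruction modifier applied to the result."""
    return tuple(2.0 * c for c in values)

r0 = (0.0, 0.25, 0.5, 1.0)    # made-up values spanning [0,1]
print(bias(r0))               # mov    r0, r0_bias   -> range [-0.5, 0.5]
print(negate(bias(r0)))       # mov    r0, -r0_bias  -> 0.5 - r0
print(times_two(bias(r0)))    # mov_x2 r0, r0_bias   -> range [-1, 1]
```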

source signed scaling

ps 1.0–1.4

The signed scaling modifier (also called bias times two) is used for shifting the range of the input register from the [0,1] range to the [−1,+1] range, typically when you want to use the full signed range of which registers are capable. The signed scaling modifier is indicated by adding a _bx2 suffix to a register. Essentially, the modifier subtracts 0.5 from the register's values and then multiplies that result by 2 before they are used. The source register values are unchanged.

For PS 1.0 and 1.1, arguments for the texm3x2* and texm3x3* instructions can use the _bx2 modifier. For PS 1.2 and 1.3, arguments for any tex* instruction can use the _bx2 modifier.

Note

If you used the D3DTOP_ADDSIGNED2X texture operation in one of your DirectX texture stages, the signed scaling modifier performs the same operation.

Rules for using source signed scaling:

For use only with arithmetic instructions, apart from the texture instruction arguments noted above.
Cannot be combined with the invert modifier.
Initial data outside the [0,1] range may produce undefined results.

    mov     t0, t0_bx2     // t0 = 2.0 * (t0 - 0.5)
    mov     r0, r0_bx2     // darken dull colors
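One place the signed range is handy is unpacking a direction vector that was stored in a [0,1] texture. Here is a small numerical sketch of that in Python; the helper names `bx2` and `dp3`, the texel, and the light vector are all made up for illustration, and the code models only the arithmetic, not real shader behavior.

```python
# Hypothetical model of the _bx2 source modifier followed by a dp3.

def bx2(src):
    """Subtract 0.5 from every channel, then multiply by 2."""
    return tuple(2.0 * (c - 0.5) for c in src)

def dp3(a, b):
    """Three-component dot product over the first three channels."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

texel = (0.5, 0.5, 1.0, 1.0)   # made-up texel holding a packed vector
light = (0.0, 0.0, 1.0, 0.0)   # made-up light direction

normal = bx2(texel)            # unpacked to the [-1, 1] range
print(normal)
print(dp3(normal, light))
```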

source scale by two

ps 1.4

The scale by two modifier doubles the values of the input register before they are used, so a [0,1] input is read as [0,2]. The scale by two modifier is indicated by adding a _x2 suffix to a register. The source register values are unchanged.

Rules for using scale by two:

For use only with arithmetic instructions.
Cannot be combined with the invert modifier.
Available for PS 1.4 shaders only.

 mov   r0, r0_x2 // 2x r0

source replication/selection

Just as vertex shaders let you select the particular elements of a source register to use, so do pixel shaders, with some differences. You can select only a single element, and that element will be replicated to all channels. You specify a channel to replicate by adding a .n suffix to the register, where n is r, g, b, or a (or x, y, z, or w).

    mov     r0,       v0.a     // ps.1.0
    mov     r0.a,     v0.b     // ps.1.1 ps.1.2 ps.1.3
    // these commands are an error if not ps.1.4
    mov     r0,     v0.b        // same as v0.bbbb
    mov     r0,     v0.g        // same as v0.gggg
    mov     r0,     v0.brga     // ps 2.0

SOURCE REGISTER SELECTORS (REGISTER SWIZZLE; x = available, - = not available)

PS version	.rrrr	.gggg	.bbbb	.aaaa	.gbra	.brga	.abgr
1.0	-	-	-	x	-	-	-
1.1	-	-	x	x	-	-	-
1.2	-	-	x	x	-	-	-
1.3	-	-	x	x	-	-	-
1.4 phase 1	x	x	x	x	-	-	-
1.4 phase 2	x	x	x	x	-	-	-
2.0	x	x	x	x	x	x	x
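To make replication concrete, here is a minimal Python sketch of what a selector does to the value an instruction reads. The `swizzle` helper and the sample register are hypothetical; the sketch handles both the single-channel form (replicated to all four channels) and a full four-letter swizzle, and it does not check which selectors a given shader version allows.

```python
# Hypothetical model: map channel letters (rgba or xyzw) to tuple indices.
CHANNELS = {"r": 0, "g": 1, "b": 2, "a": 3,
            "x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(src, selector):
    """Return the value read through a selector such as ".b" or ".brga"."""
    names = selector.lstrip(".")
    if len(names) == 1:
        names = names * 4          # a single channel is replicated
    if len(names) != 4:
        raise ValueError("expected one or four channel letters")
    return tuple(src[CHANNELS[n]] for n in names)

v0 = (0.1, 0.2, 0.3, 0.4)          # made-up register contents
print(swizzle(v0, ".b"))           # same as .bbbb
print(swizzle(v0, ".bbbb"))
print(swizzle(v0, ".brga"))
```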

texture register modifiers

ps 1.4 only

PS 1.4 has its own set of modifiers for texture instructions. Since only the texcrd and texld instructions are used to load or sample textures with PS 1.4, these modifiers are unique to those instructions. Note that you can interchange rgba syntax with xyzw syntax, thus _dz is the same as _db.

Source Texture Register Selectors

These allow you to swizzle the source register to a limited extent. The syntax is added as a suffix on the register. They can be used anytime texcrd or texld can be used. Since the instructions will read only three components, these selectors allow you to fill the register's last two channels with either the .z value or the .w value instead of leaving it uninitialized. You can mix the .xyw selector with the _dw modifier. You can use the _dz modifier only on a temporary register, but not more than twice per shader. This allows you to map a 4D texture into 3D texture space so it can be manipulated in the shader.

    texld     r0, t0.xyz        // source is read as t0.xyzz
    texld     r0, t0.rgb        // alternate syntax
    texld     r0, r0_dz.xyz     // with a register modifier

PS 1.4 SOURCE REGISTER SELECTORS

DESCRIPTION	SYNTAX
Source register looks like .xyzz	.xyz
Source register looks like .xyww	.xyw

Once you use a particular selector on a texture register, you cannot use a different one on the same source register in the same shader. For example, the following is a legal set of instructions; register t2 is used with the .xyz selector twice:

    texld     r0,     t2.xyz
    texld     r1,     t2.xyz

However, the following, which uses register t2 with the .xyz selector and then the .xyw selector is in error:

    texld     r0,     t2.xyz
    texld     r1,     t2.xyw     // Error: register t2 used again
                                 // but with a different selector.
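Both halves of this rule, what a selector expands to and the one-selector-per-register restriction, can be sketched in a few lines of Python. The helpers `apply_selector` and `check_selectors` are hypothetical, they model only the .xyz and .xyw spellings, and the real assembler's diagnostics will look different.

```python
def apply_selector(src, selector):
    """Expand a PS 1.4 texture selector into the four values that are read."""
    x, y, z, w = src
    if selector == ".xyz":
        return (x, y, z, z)
    if selector == ".xyw":
        return (x, y, w, w)
    raise ValueError("only .xyz and .xyw are modeled here")

def check_selectors(uses):
    """uses: (register, selector) pairs in program order.

    Returns the registers that were used with more than one selector.
    """
    first_seen = {}
    conflicts = []
    for reg, sel in uses:
        if first_seen.setdefault(reg, sel) != sel and reg not in conflicts:
            conflicts.append(reg)
    return conflicts

print(apply_selector((1.0, 2.0, 3.0, 4.0), ".xyw"))
print(check_selectors([("t2", ".xyz"), ("t2", ".xyz")]))   # legal
print(check_selectors([("t2", ".xyz"), ("t2", ".xyw")]))   # t2 conflicts
```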

Source Texture Register Modifiers

These modifiers allow you to do a perspective divide (by either the .z or the .w element) in the pixel shader. The syntax is added as a suffix on the register. They can be used anytime texcrd or texld can be used. Only the .xy channels of the destination will be modified. If the divisor is zero, then the destination is set to 1. The _dw modifier is for Phase 1; the _dz modifier is for Phase 2.

PS 1.4 SOURCE REGISTER MODIFIERS

DESCRIPTION	SYNTAX
Divide x,y by z	`_dz`
Divide x,y by w	`_dw`

    texld r0, t0_dz
    // these are the same as above
    texld r0, t0_dz.xyz
    texld r0, t0_db.xyz
    texld r0, t0_db.rgb

You can mix the .xyw selector with the _dw modifier. The _dw modifier can be used as many times as necessary in Phase 1. After Phase 1, the .w channel is invalid, thus you can't use the modifier. You can use the _dz modifier only on a temporary register (thus, only in Phase 2), and not more than twice per shader. The following shows what phase an instruction would be valid or invalid for (I've ignored usage restrictions on texture register, etc.):

    // Phase 1
    texld r0, r0_dz     // Invalid - _dz is Phase 2 only
    texld r0, r0_dw     // Valid phase
    // Phase 2
    texld r0, t1_dz.xyz // Invalid - texture register
    texld r0, t1_db.xyz // Invalid - _db == _dz
    texld r0, r0_dz.xyz // Valid - temp register
    texld r0, r0_dw.xyz // Invalid - w is undefined
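The divide itself is easy to model. The following Python sketch follows the description above: x and y are divided by z (for _dz or _db) or by w (for _dw), and a zero divisor yields 1. The `projective_divide` helper and the sample coordinates are hypothetical, and the phase and register restrictions are not modeled.

```python
def projective_divide(src, modifier):
    """Model of the PS 1.4 _dz/_db and _dw source modifiers on (x, y)."""
    x, y, z, w = src
    if modifier in ("_dz", "_db"):
        divisor = z
    elif modifier == "_dw":
        divisor = w
    else:
        raise ValueError("unknown modifier: " + modifier)
    if divisor == 0:
        return (1.0, 1.0)          # per the text: result is set to 1
    return (x / divisor, y / divisor)

coords = (2.0, 4.0, 2.0, 8.0)      # made-up texture coordinates
print(projective_divide(coords, "_dz"))
print(projective_divide(coords, "_dw"))
print(projective_divide((2.0, 4.0, 0.0, 8.0), "_db"))
```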

Destination Write Masks

These write masks control which channel(s) are written to. They can be used anytime texcrd or texld can be used. No mask is the same as specifying all. Only the combinations shown in the table can be used.

PS 1.4 DESTINATION WRITE MASKS

DESCRIPTION	SYNTAX
Writes to the xyzw channels	`.xyzw`
Writes to the xyz channels	`.xyz`
Writes to the xy channels	`.xy`

    mov r0.xy,   t0
    mov r0.rg,   t0 // same as previous
    mov r0.xyzw, t0
    mov r0,      t0 // same as previous

destination write mask

ps 1.0–2.0

Note the word destination. Masks can be used only to select which elements of a register are to be written to. Unlike vertex shaders, however, all you can do is select all channels (.rgba), color channels only (.rgb), or the alpha channel (.a)—though later pixel shaders allow more control. This mimics the traditional lighting pipeline in which you can have color and alpha channels processed separately. Omitting a mask is the same as specifying the full mask. The alpha mask is also referred to as the scalar mask since it uses a scalar value. The color write mask is sometimes referred to as the vector mask. An alternative syntax is to use .xyzw instead of .rgba.

Destination write masks are supported only for arithmetic instructions, with the exception of the texcrd and texld instructions. The dp3 instruction can use only .rgb or .rgba masks for PS 1.0–1.3.

Destination masks are particularly important when you start getting set up for instruction pairing.

Note that with PS 1.4 shaders you have the ability to operate on individual channels, giving you a lot more flexibility.

DESTINATION WRITE MASK DESCRIPTIONS
MASK	OPERATION
`.rgb`	The operation works on the color channel (rgb) and is scheduled for execution in the vector pipeline.
`.a`	The operation works on the alpha channel and is scheduled for execution in the scalar pipeline.
`.r, .g, .b`	Lets you select the destination channel to write to.
`.rgba`	The operation works on the color and alpha channel, and is scheduled for parallel execution in the vector and scalar pipelines. This is the default if a mask is not specified.
`.(r)(g)(b)(a)`	Arbitrary mask. Must be listed in .rgba order but can use any of the masks.

DESTINATION WRITE MASK SELECTORS (x = available, - = not available)

PS version	r	g	b	a	rgb	rgba	(r)(g)(b)(a)
1.0	-	-	-	x	x	x	-
1.1	-	-	-	x	x	x	-
1.2	-	-	-	x	x	x	-
1.3	-	-	-	x	x	x	-
1.4 phase 1	x	x	x	x	x	x	x
1.4 phase 2	x	x	x	x	x	x	x
2.0	x	x	x	x	x	x	x

Here are some examples of using the write mask.

    // color channel is modulated
    mul r0.rgb,  t0, v0
    // alpha is added using a different source register
    add r0.a,    t1, v1
    //
    mul r0.rgb,  t0, v0
    +add r0.a,   t0, v0 // note instruction pairing
    // variations that have the same effect
    // no masks is equivalent to
    mul r0,      t0, v0
    mul r0.rgba, t0, v0 // full specification
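If it helps to see what a write mask does to the destination, here is a small Python model of the first two instructions above. The helper `write_masked` and all register values are made up; the sketch accepts any subset of channels and does not enforce the per-version limits from the table.

```python
INDEX = {"r": 0, "g": 1, "b": 2, "a": 3, "x": 0, "y": 1, "z": 2, "w": 3}

def write_masked(dest, result, mask=".rgba"):
    """Copy only the masked channels of result into dest."""
    out = list(dest)
    for name in mask.lstrip("."):
        out[INDEX[name]] = result[INDEX[name]]
    return tuple(out)

def mul(a, b):
    return tuple(p * q for p, q in zip(a, b))

def add(a, b):
    return tuple(p + q for p, q in zip(a, b))

t0 = (0.5, 0.5, 0.5, 0.25)     # made-up texture and color values
v0 = (1.0, 0.5, 0.0, 0.5)
t1 = (0.0, 0.0, 0.0, 0.25)
v1 = (0.0, 0.0, 0.0, 0.5)

r0 = (0.0, 0.0, 0.0, 0.0)
r0 = write_masked(r0, mul(t0, v0), ".rgb")   # mul r0.rgb, t0, v0
r0 = write_masked(r0, add(t1, v1), ".a")     # add r0.a,   t1, v1
print(r0)
```

Each instruction still computes all four channels in this model; the mask only decides which of them land in the destination.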

Note that specifying exactly the same operation on the color and alpha channel (including registers) will automatically cause pairing to occur. The following code fragments cause the same code to be assembled in the pixel shader:

    // no masks, a single operation
    mul r0,      t0,   v0

This is the same as writing

    // full mask with a single operation
    mul r0.rgba, t0,   v0

This is the same as writing

    // color and alpha mask with the same operation
    mul r0.rgb,  t0,   v0 // on color
    mul r0.a,    t0,   v0 // on alpha, same arguments

except it takes up an extra slot and will run slower. However, you can rewrite it as

    // color and alpha mask with the same operation
    // with pairing (DirectX 8 only!)
    mul r0.rgb,  t0,    v0 // on color
    +mul r0.a,   t0,    v0 // on alpha, same arguments

Now that the instructions are paired, you've freed one slot and reduced the run time. The point is that you can now change the alpha manipulations and perform something different in the scalar (alpha) pipe.

instruction modifiers

Note that these are placed on the actual instructions, not the arguments. The pixel shader assembler supports shift and scale modifier flags, as well as a saturation modifier flag that affects the generated output result. The modifiers can be thought of as shift left (power-of-two multiply), shift right (power-of-two divide), and saturate (clamp output range to [0,1]).

Rules for using instruction modifiers:

For use only with arithmetic instructions.
The _sat modifier can be appended to any other instruction modifier, and it must come last.

INSTRUCTION MODIFIERS DESCRIPTION
MODIFIER	OPERATION
`_x2`	2× modifier. Multiply the results by 2 before storing in the register.
`_x4`	4× modifier. Multiply the results by 4 before storing in the register.
`_x8`	8× modifier. Multiply the results by 8 before storing in the register.
`_d2`	Half modifier. Divide the results by 2 before storing in the register.
`_d4`	Quarter modifier. Divide the results by 4 before storing in the register.
`_d8`	Eighth modifier. Divide the results by 8 before storing in the register.
`_sat`	Saturation modifier. Clamps the results to the range [0,1] before storing.
`_pp`	Partial precision hint.

INSTRUCTION MODIFIERS USAGE (x = available, - = not available)

PS version	_x2	_x4	_x8	_d2	_d4	_d8	_sat	_pp
1.0	x	x	-	x	-	-	x	-
1.1	x	x	-	x	-	-	x	-
1.2	x	x	-	x	-	-	x	-
1.3	x	x	-	x	-	-	x	-
1.4 phase 1	x	x	x	x	x	x	x	-
1.4 phase 2	x	x	x	x	x	x	x	-
2.0	-	-	-	-	-	-	x	x

Here are some examples of using instruction modifiers.

    add_x2     r0, v1, v1
    add_d2     r0, v1, v0
    add_sat    r0, v1, v0
    add_x2_sat r0, v1, v1
    add_d2_sat r0, v1, v1
    add_sat_d2 r0, v1, v1 // Error! _sat must be last
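The order of operations (scale first, then saturate) can be modeled with a few lines of Python. The `apply_instruction_modifiers` helper and the sample values are hypothetical; the sketch covers only the scale and _sat modifiers, rejects a _sat that is not last, and does not model _pp or per-version availability.

```python
SCALES = {"x2": 2.0, "x4": 4.0, "x8": 8.0,
          "d2": 0.5, "d4": 0.25, "d8": 0.125}

def apply_instruction_modifiers(result, modifiers):
    """Apply a suffix such as "_x2_sat" to an instruction's raw result."""
    parts = [p for p in modifiers.split("_") if p]
    if "sat" in parts and parts[-1] != "sat":
        raise ValueError("_sat must be last")
    out = tuple(result)
    for part in parts:
        if part == "sat":
            out = tuple(min(max(c, 0.0), 1.0) for c in out)
        elif part in SCALES:
            out = tuple(c * SCALES[part] for c in out)
        else:
            raise ValueError("modifier not modeled: _" + part)
    return out

raw = (1.25, 0.75, 0.25, 1.5)      # made-up result of an add
print(apply_instruction_modifiers(raw, "_sat"))
print(apply_instruction_modifiers(raw, "_d2"))
print(apply_instruction_modifiers(raw, "_x2_sat"))
```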

partial precision declaration modifier

ps 2.0

DirectX 9 introduced the partial precision declaration modifier for texture coordinate registers. This modifier lets the shader writer hint that operations on this texture coordinate register can be performed and stored at a lower precision (at least 16 bits). The implementation may ignore this hint. If the hint is applied, the implementation may propagate this lower precision through the shader.

Here is an example of using the partial precision modifier.

 dcl_pp t2 // use t2 in lower precision
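To get a feel for what a 16-bit value can cost you, here is a tiny Python experiment that rounds a number through a half-precision float. It assumes the lower-precision format is an IEEE 16-bit float, which the hint itself does not promise; the `to_half` helper is made up for illustration.

```python
import struct

def to_half(value):
    """Round a Python float to the nearest 16-bit half float and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

for coord in (0.5, 0.1, 0.3333):
    stored = to_half(coord)
    # The error column shows how far the stored value drifted.
    print(coord, stored, abs(stored - coord))
```

Values such as 0.5 survive exactly, while most others pick up a small error, which is the kind of loss you accept when you apply the hint.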

instruction pairing

DirectX 8 only

PS 2.0 removed the need for instruction pairing, but it's valid in PS 1.0–1.4.

You might see documentation talking about the scalar pipeline or the vector pipeline. This refers to the pipeline that corresponds to the alpha (scalar) or color (vector) hardware path. Since you typically want to process the alpha channel in a different manner from the color channels, there are (supposedly) separate, parallel vector and scalar hardware paths on the graphics processor: one for vector processing (color) and one for scalar processing (alpha). Since the pixel shader assembler can't be assumed to be that smart, it needs to be told when to pair color operations with alpha operations.

In order to use instruction pairing, you first break up the operations to be performed on the color and alpha channels (assuming of course that they are different) using the output masks. You then place a plus sign in front of the second instruction of the pair. Generally, you want to take a look at your pixel shader code and see which instruction that operates only on the color channels can be paired with a potentially independent instruction that operates on the alpha channel.

Note

For PS 1.0, the destination register for paired instructions must be the same. For all other versions, the destination register can be different for the co-issued instruction. The dp4 and bem instructions can't be co-issued.

The following example demonstrates instruction pairing. We have an operation on the color channels of texture zero. At the same time, we want to add in the alpha channel from texture one. We can get these operations to happen at the same time by instruction pairing.

    // an example of instruction pairing
    // an RGB (vector) operation
    mul r0.rgb, t0, v0 // note write mask
    // now co-issue an alpha operation
    +add r0.a, t1, v0  // note alpha mask and plus sign

The reason that it's worthwhile to do this is that pairing instructions allows the pixel shader to execute the operations in parallel, thus reducing the number of clocks required and achieving better graphics processor utilization. Pairing also increases the number of slots that are available since each pair of instructions takes only one slot.
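The slot savings are easy to tally. This Python sketch counts slots for a listing under the rule just stated, that a co-issued instruction (one starting with a plus sign) shares a slot with the instruction before it. The `count_slots` helper is hypothetical and does no validation of masks or of which instructions may legally be paired.

```python
def count_slots(lines):
    """Count instruction slots; a leading '+' shares the previous slot."""
    slots = 0
    for line in lines:
        text = line.strip()
        if not text or text.startswith("//"):
            continue                      # skip blanks and comments
        if text.startswith("+"):
            if slots == 0:
                raise ValueError("nothing to pair with")
            continue                      # co-issued: no new slot
        slots += 1
    return slots

unpaired = ["mul r0.rgb, t0, v0",
            "add r0.a, t1, v0"]
paired = ["mul r0.rgb, t0, v0",
          "+add r0.a, t1, v0"]
print(count_slots(unpaired))
print(count_slots(paired))
```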

Note that you can pair the vector with the scalar or the scalar with the vector. As long as one operates on the color channels, while the other works on the alpha channel and one immediately follows the other, they can be in either order.

    // the colors are modulated
    mul r0.rgb, t0, v0
    // alpha is added
    +add r0.a, t0, v0

The dp3 instruction is a special case. When used, it can be paired with an instruction that is operating on the alpha component of its destination register. The dp3 uses the .rgb elements so that you can pair it with an instruction that uses just the alpha pipe.

    // ps.1.0
    // two unrelated operations can get paired
    dp3 r0.rgb, t0, v0
    +add r0.a, t0, v0 // note same register is Dest0
    // another example
    dp3 r0.rgb, t0, v0
    +mov r1.a, v1.a   // note different destination for ps.1.1+

The output masks can affect how the two pipelines are allocated, but there can be ambiguities in the order of operations unless explicit pairing syntax is used.