The Ordering Rules


HyperTransport packet ordering rules are divided into three groups: general rules, rules for upstream I/O ordering, and rules for downstream I/O ordering. Even the peer-to-peer example in Figure 6-1 on page 121 can be broken into two parts: the request moving to the bridge (covered by upstream ordering rules) and the reflection of the request downstream to the peer-to-peer target (covered by downstream I/O ordering rules). Refer to Chapter 20, entitled "I/O Compatibility," on page 457 for a discussion of ordering when packets move between HyperTransport and another protocol (PCI, PCI-X, or AGP).

General I/O Ordering Limits

Ordering Covers Targets At Same Hierarchy Level

Ordering rules only apply to the order in which operations are detected by targets at the same level in the HyperTransport fabric hierarchy. Referring to Figure 6-2 on page 122, assume that two peer-to-peer writes targeting devices on two different chains have been performed by the end device in chain 0.

Figure 6-2. Targets At Different Levels In Hierarchy And In Different Chains

graphics/06fig02.jpg

In Figure 6-2 on page 122, assume that Request A, a write transaction, is sent first. It is immediately followed by Request B, another write request. HyperTransport general ordering rules are then applied:

  1. Upstream ordering rules assure that the two writes (Request A and Request B) arrive at the host bridge in the order they were generated.

  2. When the host bridge then reflects the two write transactions downstream onto the separate chains (Chain 1 and Chain 2), downstream ordering rules guarantee that they will leave the host bridge in the order they arrived.

  3. Once the two writes reach their respective chains, there is no way to guarantee that they will arrive at their respective targets in the order the requester intended because the ultimate targets are at different levels in the hierarchy.

  4. The HyperTransport specification indicates that if the requester must be certain of the completion order at the targets, it should either poll the target of Request A for completion before issuing Request B or use a non-posted write for Request A and wait for the response to return before sending Request B.
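The alternative in step 4 can be sketched as a toy event simulation (Python; the delays and names are illustrative, not anything defined by the specification). When Request A is posted, Request B may beat it to its target on the faster chain; when A is non-posted and the requester waits for A's response before issuing B, the intended order is preserved:

```python
import heapq

def run(posted: bool):
    """Two writes from one requester to targets on different chains.
    Chain 1 (target of Request A) is slow; chain 2 (target of B) is fast.
    If A is posted, B is issued immediately and can arrive first.
    If A is non-posted, the requester waits for A's response before B."""
    CHAIN1_DELAY, CHAIN2_DELAY = 5, 1     # arbitrary illustrative delays
    events, arrivals = [], []
    heapq.heappush(events, (CHAIN1_DELAY, "A"))      # Request A issued at t=0
    if posted:
        heapq.heappush(events, (CHAIN2_DELAY, "B"))  # B issued immediately
    while events:
        t, name = heapq.heappop(events)              # next arrival at a target
        arrivals.append(name)
        if name == "A" and not posted:
            # A's response has returned; only now is Request B issued.
            heapq.heappush(events, (t + CHAIN2_DELAY, "B"))
    return arrivals

print(run(posted=True))    # ['B', 'A']: arrival order not guaranteed
print(run(posted=False))   # ['A', 'B']: waiting for the response preserves it
```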

Read And Non-Posted Write Completion At Target

Non-posted transactions issued by one requester to the same target are required to complete at the target in the order they were issued by the requester. This means that any combination of reads and non-posted writes must complete at the target in the original order they were issued. However, there is no ordering guarantee on the responses which are returned for each.

Referring to Figure 6-3 on page 123, ordering rules for target completion of non-posted requests and subsequent responses may be summarized:

  1. The requester issues a non-posted write or read request to the target (1).

  2. The requester issues another non-posted read or write request (2).

Figure 6-3. Non-Posted Requests And Responses At Target

graphics/06fig03.jpg

Ordering rules require that the two requests be handled internally by the target in order. When the responses return, they may come back in either order (3) and (4). The results of non-posted transactions must be globally visible (to all system devices) before a response is returned.
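A minimal sketch of this behavior (Python; illustrative names only): the target drains same-stream non-posted requests strictly in issue order, while the responses it returns carry no ordering guarantee, modeled here with a shuffle:

```python
from collections import deque
import random

def target_process(requests):
    """Complete non-posted requests in issue order; return responses
    in an arbitrary order."""
    fifo = deque(requests)
    completed = []
    while fifo:
        completed.append(fifo.popleft())   # strictly in-order completion
    responses = completed[:]
    random.shuffle(responses)              # responses: any order
    return completed, responses

done, resp = target_process(["read-1", "nonposted-write-2"])
assert done == ["read-1", "nonposted-write-2"]   # completion order preserved
assert sorted(resp) == sorted(done)              # same responses, any order
```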

What If A Device Requires Response Ordering?

All HyperTransport devices must be able to tolerate out-of-order response delivery or else restrict outstanding non-posted requests to one at a time. This also applies to bridges which sit between HyperTransport and a protocol that requires responses be returned in order. The bridge must not issue more outstanding requests than it has internal buffer space to hold responses it may be required to reorder.
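One way to picture the buffer-space constraint (a hypothetical Python sketch, not a spec-defined interface): the bridge caps outstanding non-posted requests at its response-buffer depth, holds responses that arrive out of order, and releases them in issue order:

```python
class OrderedResponseBridge:
    """Hypothetical bridge to an in-order protocol. It never has more
    non-posted requests outstanding than it has buffer slots to hold
    responses it may need to reorder."""

    def __init__(self, buffer_slots):
        self.buffer_slots = buffer_slots
        self.outstanding = []     # request tags, in issue order
        self.buffered = set()     # responses held for in-order release

    def can_issue(self):
        return len(self.outstanding) < self.buffer_slots

    def issue(self, tag):
        if not self.can_issue():
            raise RuntimeError("would exceed response buffer space")
        self.outstanding.append(tag)

    def receive_response(self, tag):
        """Buffer the response, then release every response now at the
        head of the issue order."""
        self.buffered.add(tag)
        released = []
        while self.outstanding and self.outstanding[0] in self.buffered:
            head = self.outstanding.pop(0)
            self.buffered.discard(head)
            released.append(head)
        return released

bridge = OrderedResponseBridge(buffer_slots=2)
bridge.issue("A")
bridge.issue("B")
assert not bridge.can_issue()                      # buffer full: no third request
assert bridge.receive_response("B") == []          # B held for reordering
assert bridge.receive_response("A") == ["A", "B"]  # released in issue order
```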

Support For The Producer-Consumer Ordering Model

When the PassPW and Sequence ID bits are cleared in a request packet, HyperTransport transactions are compatible with the same producer-consumer model PCI employs. Basic features of the model include:

  1. A producer device anywhere in the system may send data and modify a flag indicating data availability to a consumer anywhere in the system.

  2. The data and flag need not be located in the same device as long as the consumer of the data waits for the response of a flag read before attempting to access the data.

  3. In cases where the consumer is allowed to issue two ordered reads without making them part of an ordered sequence (setting SequenceID tag to a non-zero value), the producer-consumer model is only supported if the flag and data are within the same device.

  4. Ordering rules guarantee that if the flag is modified after the data becomes available, the flag read will return valid status.
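The handshake these rules protect can be shown in miniature (Python sketch; the `data` and `flag` locations are illustrative). Because the flag write may not pass the data write, a flag read that returns "ready" guarantees the data is already valid:

```python
memory = {"data": None, "flag": "empty"}

def producer():
    memory["data"] = 42          # posted write: data first...
    memory["flag"] = "ready"     # ...then the flag, which may not pass it

def consumer():
    if memory["flag"] == "ready":   # flag read response returned "ready"
        return memory["data"]       # data is guaranteed valid
    return None

producer()
assert consumer() == 42
```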

Producer-Consumer Model Simpler If Flag/Data In Same Place

If the flag and data are restricted to being in the same device, the PassPW bit may be set in requests, which relaxes the ordering of responses and improves performance. At the same time, the producer-consumer model is maintained.

Upstream Ordering Rules

Posted requests, non-posted requests, and responses travel in independent virtual channels. Each uses a different command, which permits devices to distinguish them from one another. Requests have a Sequence ID field. Assigning non-zero sequence ID fields to non-posted requests forces all tunnel and bridge devices in the path to the target to forward these requests in the same order they were received. The target is also required to maintain this order when processing these requests internally. Requests with a Sequence ID of zero are not considered to be part of an ordered sequence. Requests and response packets also carry a May Pass Posted Writes (PassPW) bit.
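The fields this section names can be modeled as a small record (Python sketch; field names follow the text, not exact packet bit layouts), along with the test for membership in a strongly ordered sequence:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    vc: str        # "posted", "nonposted", or "response" virtual channel
    unit_id: int   # transaction stream (UnitID) for upstream traffic
    seq_id: int    # Sequence ID; 0 means not part of an ordered sequence
    pass_pw: bool  # May Pass Posted Writes bit

def same_ordered_sequence(a: Packet, b: Packet) -> bool:
    """Strong ordering applies only to packets sharing a non-zero
    Sequence ID within one transaction stream and virtual channel."""
    return (a.seq_id != 0 and a.seq_id == b.seq_id
            and a.unit_id == b.unit_id and a.vc == b.vc)

a = Packet(vc="nonposted", unit_id=2, seq_id=5, pass_pw=False)
b = Packet(vc="nonposted", unit_id=2, seq_id=5, pass_pw=True)
assert same_ordered_sequence(a, b)
c = Packet(vc="nonposted", unit_id=2, seq_id=0, pass_pw=False)
assert not same_ordered_sequence(a, c)   # Sequence ID 0: no sequence
```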

Reordering Packets In Different Transaction Streams

Other than when a Fence command is issued, there is no ordering guarantee for packets originating from different sources. Traffic from each UnitID is considered a separate transaction stream; devices may reorder upstream packets from different streams as necessary. Figure 6-4 on page 125 depicts reordering being done by a tunnel (UnitID1); this device is forwarding packets upstream on behalf of two devices behind it (UnitID2 and UnitID3).

  1. UnitID3 issues packet (1) first. It is forwarded by UnitID2 to UnitID1.

  2. Next UnitID1 receives a packet (2) from UnitID2.

  3. When UnitID1 forwards the two packets onto its upstream link, it may send packet (2) first. Packet (2) has then been reordered around packet (1).

Figure 6-4. Upstream Reordering: Packets From Different Transaction Streams

graphics/06fig04.jpg

No Reordering Packets In A Strongly Ordered Sequence

If one requester has issued a series of request packets carrying the same non-zero SequenceID, the packets may not be reordered (regardless of the state of the PassPW bit). The sequence only applies to packets within a single transaction stream (UnitID) and virtual channel. Upstream devices may still reorder these packets with respect to those from other streams. Figure 6-5 on page 126 illustrates an ordered sequence issued by an I/O Hub cave device. Key details include:

  1. The I/O Hub issues a series of requests (1), (2), (3). All carry the same, non-zero SequenceID in the request.

  2. When they are received by the first tunnel device, it checks the sequence ID field and the UnitID (all are identical). When it forwards the three packets to the PCI-X tunnel, it sends them in the same strongly ordered sequence.

  3. The HyperTransport-to-PCI-X bridge makes the same determination and forwards packets (1), (2), and (3) through its tunnel interface to the host bridge in the same order.

  4. The host bridge is also required to treat the three packets as a strongly ordered sequence internally.

  5. If these were non-posted requests, there would be no guarantee of ordering in the responses returned to the I/O hub.

Figure 6-5. A Strongly Ordered Sequence Must Be Preserved

graphics/06fig05.jpg
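A checker for the behavior in Figure 6-5 might look like this (Python sketch with illustrative tuples): a forwarded ordering is legal only if every strongly ordered sequence (same UnitID, same non-zero SequenceID) keeps its relative order, while unordered packets may move freely:

```python
def legal_forwarding(inp, out):
    """Packets are (unit_id, seq_id, name) tuples. The output ordering is
    legal iff it is a permutation of the input that preserves the relative
    order within each non-zero (unit_id, seq_id) sequence."""
    if sorted(inp) != sorted(out):
        return False                     # not the same set of packets
    def sequences(packets):
        seqs = {}
        for uid, sid, name in packets:
            if sid != 0:                 # seq_id 0: no ordering constraint
                seqs.setdefault((uid, sid), []).append(name)
        return seqs
    return sequences(inp) == sequences(out)

inp = [(2, 5, "1"), (2, 5, "2"), (3, 0, "x"), (2, 5, "3")]
# The unordered packet "x" may move; the sequence 1-2-3 may not reorder.
assert legal_forwarding(inp, [(3, 0, "x"), (2, 5, "1"), (2, 5, "2"), (2, 5, "3")])
assert not legal_forwarding(inp, [(2, 5, "2"), (2, 5, "1"), (3, 0, "x"), (2, 5, "3")])
```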

Packets With PassPW Bit Clear Are Restricted In Passing

Packets with the PassPW bit clear must not pass an already posted request that is in the same stream and not part of the same ordered sequence. This forces packets of all types (posted request, non-posted request, or response) which have not been granted relaxed ordering privileges to remain behind all previously posted requests within the same transaction stream. This guarantees that the ultimate target (e.g. the host bridge) will see them all in the original order they were issued. Figure 6-6 on page 127 illustrates this case.

  1. The I/O Hub first issues a posted request (1).

  2. It then issues another packet (2) which has the PassPW bit clear. This could be a posted or non-posted request, or a response packet.

  3. When the upstream tunnel devices receive the two packets and determine they are from the same source (UnitID), and that the first is traveling in the posted virtual channel and the second has PassPW disabled, the order will be maintained during forwarding.

  4. The host bridge is guaranteed to see the two packets in the original order they were issued, (1) then (2).

Figure 6-6. Packets With PassPW Clear Can't Pass Posted Requests

graphics/06fig06.jpg
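The rule reduces to a small predicate (Python sketch, illustrative): a later packet may pass an earlier posted request only if the two are in different streams or the later packet's PassPW bit is set; within the same ordered sequence, reordering is never allowed:

```python
def may_pass_posted(later_pass_pw: bool, same_stream: bool,
                    same_sequence: bool) -> bool:
    """May a later packet pass an earlier posted request?"""
    if same_sequence:
        return False        # within an ordered sequence: never reorder
    if not same_stream:
        return True         # different streams: no ordering relationship
    return later_pass_pw    # same stream: the PassPW bit decides

assert not may_pass_posted(False, same_stream=True, same_sequence=False)
assert may_pass_posted(True, same_stream=True, same_sequence=False)
assert may_pass_posted(False, same_stream=False, same_sequence=False)
```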

Packets With PassPW Bit Set May Or May Not Pass

Packets with the PassPW bit set may or may not pass an already posted request that is in the same stream and not part of the same ordered sequence. It is up to the forwarding devices to determine whether there is a benefit to reordering the two. Figure 6-7 on page 128 illustrates this case.

  1. The I/O Hub first issues a posted request (1).

  2. It then issues another packet (2) which has the PassPW bit set = 1. This could be a posted or non-posted request, or a response packet.

  3. When the upstream tunnel devices receive the two packets and determine they are from the same source (UnitID), and that the first is traveling in the posted virtual channel and the second has PassPW set, they may or may not reorder the two packets as they are sent upstream.

  4. In this case, it is indeterminate which packet will arrive at the host bridge first. If it matters which arrives first, then the requester would have issued the packets as a strongly ordered sequence (both packets would carry the same, non-zero Sequence ID).

Figure 6-7. Packets With PassPW Set May Or May Not Pass Other Posted Requests

graphics/06fig07.jpg

Non-Posted Requests May Pass Each Other

For non-posted requests which are not part of an ordered sequence (Sequence ID = 0), ordering rules allow them to pass other non-posted requests in the same transaction stream. Again, it is up to the forwarding devices to determine whether there is a benefit to reordering the non-posted requests. Figure 6-8 on page 129 illustrates this case.

  1. The I/O Hub first issues a non-posted request packet (1).

  2. It then issues another non-posted request packet (2).

  3. When the upstream tunnel devices receive the two packets and determine they are from the same source (UnitID) and that they are non-posted requests which are not part of an ordered sequence, they may or may not reorder the two packets as they are sent upstream.

  4. Again, it is indeterminate which packet will arrive at the host bridge first. If it matters which arrives first, then the requester would have issued the packets as a strongly ordered sequence (both packets would carry the same, non-zero Sequence ID).

Figure 6-8. Non-Posted Requests May Pass Each Other

graphics/06fig08.jpg

Posted Requests And Responses Must Be Able To Pass

Posted requests and responses must be able to pass previous non-posted requests that are not part of the same ordered sequence. This is part of the HyperTransport deadlock-avoidance strategy. Note: The HyperTransport specification provides several additional recommendations for designers related to deadlock avoidance.

Figure 6-9 on page 130 illustrates the case of posted requests and responses passing previous non-posted requests in the same transaction stream.

  1. The I/O Hub first issues a non-posted request packet (1).

  2. It then issues another posted request or response packet (2).

  3. When the upstream tunnel devices receive the two packets and determine they are from the same source (UnitID), they may or may not reorder the posted request (or response) around the non-posted request as they are sent upstream.

  4. Again, a strongly ordered sequence could have been used if determinacy was required.

Figure 6-9. Posted Request Or Response Must Be Able To Pass Non-Posted Requests

graphics/06fig09.jpg

Posted Request Must Be Able To Pass A Response

Posted requests must be able to pass an earlier response which is not part of the same ordered sequence. This is another component of the HyperTransport deadlock-avoidance strategy. Note: The HyperTransport specification provides several additional recommendations for designers related to deadlock-avoidance.

Figure 6-10 on page 131 illustrates the case of a posted request passing an earlier response packet in the same transaction stream.

  1. The I/O Hub first issues a response packet (1).

  2. It then issues a posted request packet (2).

  3. When the upstream tunnel devices receive the two packets, they must be able to reorder the posted request around the earlier response as they are sent upstream.

  4. Again, a strongly ordered sequence could have been used if determinacy was required.

Figure 6-10. Posted Request Must Be Able To Pass An Earlier Response

graphics/06fig10.jpg

Non-Posted Requests Or Responses May Pass A Response

Non-posted requests or responses may or may not pass an earlier response which is not part of the same ordered sequence. Figure 6-11 on page 132 illustrates the case of a non-posted request or response passing an earlier response packet in the same transaction stream.

  1. The I/O Hub first issues a response packet (1).

  2. It then issues a non-posted request or response packet (2).

  3. When the upstream tunnel devices receive the two packets and determine they are from the same source (UnitID), they may or may not reorder the non-posted request or later response around the earlier response as they are sent upstream.

  4. Again, a strongly ordered sequence could have been used if determinacy was required.

Figure 6-11. Non-Posted Request/Response May Pass Earlier Responses

graphics/06fig11.jpg
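The pass rules of the last several sections can be collected into one table-driven function (Python sketch; the 'no'/'may'/'must' labels are this summary's shorthand, not spec mnemonics). It applies to two packets in the same transaction stream that are not part of the same ordered sequence:

```python
def pass_rule(later_vc: str, earlier_vc: str, pass_pw: bool) -> str:
    """Return 'no' (must not pass), 'may' (reordering optional), or
    'must' (must be able to pass, for deadlock avoidance) for a later
    packet relative to an earlier one in the same stream, not in the
    same ordered sequence. VCs: 'posted', 'nonposted', 'response'."""
    if earlier_vc == "posted":
        return "may" if pass_pw else "no"       # the PassPW bit governs
    if earlier_vc == "nonposted":
        # posted requests and responses must be able to pass non-posted
        return "may" if later_vc == "nonposted" else "must"
    if earlier_vc == "response":
        # posted requests must be able to pass earlier responses
        return "must" if later_vc == "posted" else "may"
    raise ValueError(earlier_vc)

assert pass_rule("response", "posted", pass_pw=False) == "no"
assert pass_rule("nonposted", "nonposted", pass_pw=False) == "may"
assert pass_rule("posted", "nonposted", pass_pw=False) == "must"
assert pass_rule("posted", "response", pass_pw=False) == "must"
assert pass_rule("nonposted", "response", pass_pw=False) == "may"
```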

Host Ordering Requirements

A system that hosts HyperTransport must assure that the ordering of virtual channel traffic in the HyperTransport topology is extended to the host system. Because the host bridge interfaces directly to the processor(s), main memory, and the HyperTransport fabric, it plays a central role in enforcing the proper ordering interaction between the host system and HyperTransport. Figure 6-12 on page 133 illustrates the central role played by the host bridge in host system and HyperTransport ordering.

Figure 6-12. Host Bridge Extends Ordering To Host System

graphics/06fig12.jpg

Note that some aspects of host system ordering are system-specific. For example, CPU host bus protocol and cache management vary with the processor. Still, host ordering rules make it possible for any host system to reliably interact with the HyperTransport fabric.

Host Ordering Requirements: General Features

The HyperTransport specification breaks down the ordering rules governing transaction completion in the host system into a set of rules for ordered pairs of transactions. Depending on the request types and where the target locations are, the second request may be received but might have to wait to take effect in the host fabric until the first request reaches a specific point in completion called its ordering point. "Taking effect", in this case, means that a read request actually fetches data, a write request actually exposes new data, peer-to-peer requests are actually queued for reissue downstream, etc.

How read and write accesses originating in HyperTransport are handled depends on the type of space they target in the host system.

  1. Cacheable address ranges have strongest ordering

  2. Non-cacheable memory, I/O, and MMIO have weaker ordering

  3. Interrupt and System Management Address ranges have special ordering

Two Ordering Points Are Defined

There are two ordering points (degrees of transaction completion) defined for the first transaction in an ordered pair; this information is used in determining whether the second request of the ordered pair may take effect or must wait. The ordering points are called Globally Ordered (GO) and Globally Visible (GV).

Globally Ordered (GO)

HyperTransport defines the globally ordered point for the first request as the point where it is guaranteed to be observed in the correct order (with respect to the second transaction) from any "observer". While the two transactions are guaranteed to complete in the proper order, they may not have actually done so yet. This means agents such as caches may not have been updated at this ordering point.

Globally Visible (GV)

HyperTransport defines the globally visible ordering point for the first request as the point where it is assured to be "visible" to all observers (CPUs, I/O devices, etc.). It also means that all side effects of the first request (cache transitions, etc.) have completed.

Note: If there are no "sideband" agents (caches, etc.), GO and GV are equivalent.

Ordering Rule Summary

Table 6-1 on page 134 summarizes the host ordering rules for various combinations of transaction ordered pairs.

Table 6-1. Summary Of Host Ordering Rules For Transaction Pairs

First Command         | Second Command                   | Second Command Waits For First To Be:
----------------------|----------------------------------|--------------------------------------
Cacheable Write       | Cacheable Write                  | GV
Cacheable Write       | Cacheable Read                   | GO
Cacheable Read        | Cacheable Read or Write          | GO
Non-Cacheable         | Non-Cacheable                    | GO
Cacheable Write       | Non-Cacheable                    | GV
Cacheable Read        | Non-Cacheable                    | GO
Non-Cacheable         | Cacheable                        | GO
Cacheable Write       | Flush/Interrupt/SysMgmt Response | GV
Cacheable Read        | Flush/Interrupt/SysMgmt Response | No Wait Requirement
Non-Cacheable         | Flush/Interrupt/SysMgmt Response | GO
Flush/Response        | Any                              | No Wait Requirement
Int/SysMgmt           | Fence or Response                | GV
Int/SysMgmt           | Any Except Fence/Response        | No Wait Requirement
Posted Cacheable      | Fence                            | GV
Posted Non-Cacheable  | Fence                            | GO
Any Non-Posted        | Fence                            | No Wait Requirement
Fence                 | Any                              | GV
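Table 6-1 can be transcribed directly as a lookup (Python sketch; the key strings are illustrative shorthand for the command classes): given an ordered pair, it yields what the second command waits for, with None meaning no wait requirement:

```python
# (first command, second command) -> what the second waits for;
# None means "no wait requirement". Key strings are shorthand only.
HOST_ORDERING = {
    ("cacheable-write", "cacheable-write"):            "GV",
    ("cacheable-write", "cacheable-read"):             "GO",
    ("cacheable-read",  "cacheable-read-or-write"):    "GO",
    ("non-cacheable",   "non-cacheable"):              "GO",
    ("cacheable-write", "non-cacheable"):              "GV",
    ("cacheable-read",  "non-cacheable"):              "GO",
    ("non-cacheable",   "cacheable"):                  "GO",
    ("cacheable-write", "flush/int/sysmgmt-response"): "GV",
    ("cacheable-read",  "flush/int/sysmgmt-response"): None,
    ("non-cacheable",   "flush/int/sysmgmt-response"): "GO",
    ("flush/response",  "any"):                        None,
    ("int/sysmgmt",     "fence-or-response"):          "GV",
    ("int/sysmgmt",     "any-except-fence/response"):  None,
    ("posted-cacheable",     "fence"):                 "GV",
    ("posted-non-cacheable", "fence"):                 "GO",
    ("any-non-posted",       "fence"):                 None,
    ("fence",                "any"):                   "GV",
}

# The example in the text: a cacheable read followed by a cacheable write
assert HOST_ORDERING[("cacheable-read", "cacheable-read-or-write")] == "GO"
```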

Host Responses To Non-Posted Requests

Although the second request in an ordered pair may be allowed to take effect in the host system before the first request is globally visible, the host is not allowed to return the response for a non-posted HyperTransport request until all previous ordered requests are complete and the side effects (e.g. cache state transitions) of the current request are similarly globally visible.

An Example (Refer to Table 6-1 and Figure 6-13)

Figure 6-13. Ordering Example: Read Followed By Posted Write To Cacheable Memory

graphics/06fig13.jpg

Assume that an ordered pair of requests has been received by the host bridge. The first is a read request targeting a cacheable area of memory; the second is a non-posted write targeting the same location in cacheable memory space. Both requests are part of the same strongly ordered sequence (same stream, same VC, SeqID > 0). According to Table 6-1 (third row), the second request (the cacheable write) must wait until the first request is globally ordered (it is guaranteed to complete first).

  1. The I/O hub issues the read request targeting a cacheable area of memory.

  2. It immediately issues a second request, a non-posted write to the same location. These two requests are part of a strongly ordered sequence.

  3. The host bridge causes a snoop cycle of processor caches before allowing the read of memory (the processor may have a modified cache line).

  4. With cache coherency taken care of, the read of memory completes. The response (not shown) may be returned to the requester because previous ordered requests are complete and side effects of this read are handled.

  5. The cacheable write request is submitted for CPU cache look-up. The cache line will be invalidated in the event of a hit in the cache.

  6. Coherency assured, the cacheable write is allowed to complete to memory.

Downstream I/O Ordering

Downstream ordering rules in HyperTransport are much the same as the upstream rules previously described, with a few exceptions:

  1. While the same virtual channels are used (posted request, non-posted request, and response), downstream I/O streams are determined by the target of the transaction instead of the source.

  2. Although UnitID uniquely identifies upstream transaction stream requests, it can't be used for this purpose in downstream requests because the UnitID field is always that of the host bridge (UnitID 0). All downstream request traffic is assumed to be part of the same transaction stream (the host bridge's).

  3. The bridge bit is used to help nodes distinguish downstream from upstream response traffic. It also helps devices interpret the UnitID field in responses. Upstream responses carry the UnitID of the sender (the original target), while downstream responses carry the UnitID of the original requester. Interior nodes are only allowed to claim response packets which carry their UnitID and are moving downstream (bridge bit set = 1).

  4. A host bridge (this includes the secondary interface of HyperTransport-HyperTransport bridges) which performs a peer-to-peer reflection must preserve strongly ordered sequences (non-zero Sequence ID) when it reissues them downstream. It is allowed to change the Sequence ID tag, but the same tag will be applied to all requests in the sequence.
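The claiming rule in point 3 is compact enough to state as code (Python sketch, illustrative names): an interior node claims a response only when the bridge bit marks it as moving downstream and it carries the node's own UnitID:

```python
def claims_response(node_unit_id: int, resp_unit_id: int,
                    bridge_bit: bool) -> bool:
    """An interior node claims a response only if it is moving downstream
    (bridge bit set) and carries the node's own UnitID; downstream
    responses carry the UnitID of the original requester."""
    return bridge_bit and resp_unit_id == node_unit_id

assert claims_response(3, 3, bridge_bit=True)
assert not claims_response(3, 3, bridge_bit=False)   # moving upstream
assert not claims_response(3, 2, bridge_bit=True)    # another node's response
```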

Double-Hosted Chain Ordering

Upstream traffic and downstream traffic in HyperTransport have no ordering interaction because they are in different transaction streams. A special case arises in a shared double-hosted chain when one of the host bridges must send traffic to the other host bridge. Refer to Figure 6-14 on page 138.

  1. Host bridge A sends a posted write targeting host bridge B.

  2. At nearly the same time, host bridge B performs a read from host bridge A.

  3. The read response/data will be traveling in the same direction as the posted write (towards host bridge B).

  4. Although the posted write request is traveling downstream and the read response is traveling upstream (from the perspective of Device B), the producer-consumer ordering model requires that both must be treated as being in the same transaction stream (response will push posted write request if PassPW is clear).

  5. Devices in the path can perform ordering tests on upstream responses based only on UnitID (both 0 in the case of two bridges communicating with each other), and by disregarding the direction of the requests.

  6. In the event a host has its Act as Slave bit set = 1, it won't use UnitID 0 for its requests and responses; in this case, conventional ordering based on UnitID will work.

Figure 6-14. Double-Hosted Chain Ordering

graphics/06fig14.jpg



HyperTransport™ System Architecture
ISBN: 0321168453
Year: 2003
Pages: 182