The three error reporting methods (error responses, fatal and non-fatal interrupts, and sync flood) have different system implications. They are described here in order of increasing severity.

Error Responses (Non-Posted Requests Only)

The HyperTransport specification considers error responses the preferred error reporting mechanism because they are the most localized (conveyed only from target to requester). Error responses are transaction-specific and do not prevent the link from performing other transfers, even to or from the same device.

Every RdSized or Atomic Read-Modify-Write request results in the return of a Read response from the target, followed by all of the requested data. All non-posted WrSized and Flush requests result in the return of a Target Done response, which confirms the completion of the operation but is not accompanied by data. The response packet format is shown in Figure 10-14 on page 250.

Figure 10-14. Response Packet And Error Bits
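The request-to-response pairing described above can be summarized as a small lookup. This is only an illustrative sketch: the names `EXPECTED_RESPONSE` and `expected_response`, and the string labels used as keys, are hypothetical and are not taken from the specification.

```python
# Illustrative lookup only; all names and string labels are assumptions for this sketch.
# Maps each non-posted request type named in the text to
# (response type returned, whether data accompanies the response).
EXPECTED_RESPONSE = {
    "RdSized": ("Read response", True),
    "Atomic RMW": ("Read response", True),
    "WrSized (non-posted)": ("Target Done", False),
    "Flush": ("Target Done", False),
}


def expected_response(request_type):
    """Return (response_name, carries_data) for a non-posted request type."""
    try:
        return EXPECTED_RESPONSE[request_type]
    except KeyError:
        raise ValueError(f"not a non-posted request covered here: {request_type!r}")
```

A requester model could use such a table to decide whether to wait for data after the response packet arrives.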
When either a Read or Target Done response packet is returned to a requester, the requester checks the state of the two error bits, Error and NXA (Non-Existent Address), contained in the packet to determine whether the transaction completed properly. The two sources of error responses are the target device and, in the case of a non-existent address, the end-of-chain device.

Error Response Returned By The Target

If a non-posted request reaches a target, but the target cannot complete the operation (for example, it cannot source or accept data), the target returns the appropriate response with the Error bit set. If the request called for the return of data (RdSized or Atomic RMW), all requested data (as indicated in the Mask/Count field of the request) is also returned. Sending the data, even though it is invalid, allows devices in the path to deallocate buffer space and retire the outstanding transaction.

A returning response with Error set and NXA cleared is equivalent to a PCI target abort; HyperTransport requesters detecting this "non-NXA" error response set the Received Target Abort bit in the PCI Status register. Bridges seeing this error on a secondary bus would set the equivalent bit in the Secondary Status CSR.

Error Response Returned By An End-Of-Chain Device

If a non-posted request fails to reach the target (for example, because of a bad address), an end-of-chain device must send the response on the target's behalf. The response has both the Error and NXA bits set. As in the target response above, if the request called for the return of data, all requested data (again, invalid) is returned, in this case as FFh.

A returning response with both Error and NXA set is equivalent to a PCI master abort; HyperTransport requesters detecting the NXA error response set the Received Master Abort bit in the PCI Status register. Bridges seeing this error on a secondary bus would set the equivalent bit in the Secondary Status CSR.
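The requester-side check described above can be sketched as follows. The function names are hypothetical, and the Error and NXA bits are passed in as booleans because their positions in the response packet are given only in Figure 10-14. The two PCI Status bit positions are those defined by the PCI Local Bus specification. The sketch also assumes that NXA is meaningful only when Error is set.

```python
# PCI Status register bit positions, as defined by the PCI Local Bus specification.
RECEIVED_TARGET_ABORT = 1 << 12
RECEIVED_MASTER_ABORT = 1 << 13


def classify_response(error, nxa):
    """Classify a returned Read or Target Done response by its error bits.

    Assumption for this sketch: NXA is only meaningful when Error is set,
    so a response with Error clear is treated as a normal completion.
    """
    if not error:
        return "ok"
    return "master_abort" if nxa else "target_abort"


def update_pci_status(status, error, nxa):
    """Return the PCI Status value after logging the response outcome."""
    outcome = classify_response(error, nxa)
    if outcome == "target_abort":
        status |= RECEIVED_TARGET_ABORT   # Error set, NXA clear
    elif outcome == "master_abort":
        status |= RECEIVED_MASTER_ABORT   # Error and NXA both set
    return status
```

A bridge model would apply the same logic to its Secondary Status register when the error response arrives on its secondary bus.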
Fatal And Non-Fatal Interrupts

Using interrupts to inform the system of errors is slightly more complex because the interrupt message must travel up through the topology to the host. An interrupt can indicate a non-fatal error (roughly analogous to INTR# in an x86 machine), which implies that the device issuing it has seen an error but may be able to recover from it. Alternatively, an interrupt can indicate a fatal error condition (analogous to NMI# in an x86 machine), which indicates that the nature of the error is such that recovery is not possible. Interrupts of either type do not prevent the link from performing other transfers. The conditions under which fatal or non-fatal interrupts are to be used are device and driver specific.

In HyperTransport, interrupts are typically sent using an interrupt message scheme rather than the sideband interrupt signals found in other buses. Devices are not prevented from using external pins as an option, although this method is beyond the scope of the HyperTransport specification.

An interrupt message transaction is actually a special case of the standard sized byte write (WrSized Byte) request. Devices in the system can distinguish interrupt messages from other sized writes by the following attributes of the request:
Note: the data payload for interrupt requests is system-specific. Refer to Chapter 8, entitled "HT Interrupts," on page 199 for a detailed discussion of HyperTransport interrupts.

Sync Flood: When All Else Fails

In some cases, one or more links in HyperTransport may get into a state where ordinary packets cannot be sent reliably. For example, a device may detect a series of CRC errors indicating either that the external link is broken or, more likely, that the device is not synchronized with its neighbor with respect to CRC stuffing in the CAD stream. In that case, it cannot send new packets; it also cannot convey the fault using fatal/non-fatal interrupts because they travel in the same channels as other packets.

Sync flood reports errors that cannot be signalled by other methods. It is roughly analogous to the PCI SERR# (system error) event and has a serious impact on the entire chain. Sync flood packets put the chain into an inactive state pending a warm reset to restore normal packet protocol. The behavior of the device initiating the sync flood is slightly different from that of the other devices, which propagate it. The basic rules are described below.

Device Initiating The Sync Flood
Devices Detecting Sync Flood
Sync Flooding And HyperTransport Bridges
Miscellaneous Notes

Flooding Continues Until Reset

Once a device commences the sync flood operation, it must continue until a reset is detected on the affected bus. This assures that the sync flood propagates throughout the chain.

CRC Not Checked During Sync Flood

CRC is not generated or checked on links where a sync flood event is in progress. Normal packet protocol, including CRC, resumes after a reset occurs on the chain. Refer to Chapter 12, entitled "Reset & Initialization," on page 275 for a discussion of reset and initialization.

Sync Flood Example

Figure 10-15 on page 255 illustrates the sequence of events associated with a sync flood on a single chain.

Figure 10-15. Sync Flood Example
Sequence of events: (Figure 10-15 on page 255)
A warm reset on this chain will restore normal protocol.
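The two rules in the Miscellaneous Notes (flooding persists until reset, and CRC is suspended while flooding) can be captured in a toy state model. This is a minimal sketch for illustration only: the class `LinkModel` and its method names are hypothetical, and it models a single link rather than propagation along a chain.

```python
class LinkModel:
    """Toy model of one link's sync flood state (hypothetical, for illustration)."""

    def __init__(self):
        self.flooding = False

    def start_sync_flood(self):
        # Once started, the flood persists; only reset() below ends it.
        self.flooding = True

    def reset(self):
        # A reset on the chain restores normal packet protocol.
        self.flooding = False

    @property
    def crc_active(self):
        # CRC is neither generated nor checked while a sync flood is in progress.
        return not self.flooding
```

For example, after `link.start_sync_flood()` the `crc_active` property stays `False` until `link.reset()` is called, mirroring the requirement that normal protocol, including CRC, resumes only after reset.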