Ethernet Errors | Upgrading and Repairing Networks (5th Edition)

Most of the problems addressed in the following sections can be remedied by making a simple change to your network: if you are still using hubs, consider upgrading to switches. Although it is possible to continue using CSMA/CD-intensive Ethernet devices (and therefore to worry about collisions), many new applications rely on high bandwidth to access multiple servers and other network resources, and they will cause network traffic delays. It doesn't matter how good a new application is if it can't get data pushed through a slow network pipe. And the trend today is toward centralizing servers rather than installing large applications or data on desktop computers.

However, if you are still using hubs and other equipment that allows collisions to occur, the following sections might help you solve some of the problems that can arise.

Simple Error Detection

A lot of things can go wrong when you send hundreds of thousands of bits out on a copper wire, hoping they arrive at their destination in the proper order and unchanged. As new technologies achieve higher speeds, detecting errors becomes increasingly important.

The simplest method for error detection is called a parity check. An example of this method is transmitting characters using the 7-bit ASCII character set with an eighth bit added. If even parity is being used, the eighth bit is set to zero or one, whichever makes the number of "1" bits even. If odd parity is being used, the eighth bit is chosen to make the number of "1" bits odd. The receiving station can calculate what the parity bit should be by examining the first seven bits and making a simple calculation. This scheme breaks down easily, however, if more than one bit was transmitted in error. Any even number of flipped bits leaves the parity unchanged, so the error goes undetected.
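The following minimal sketch shows even parity over 7-bit ASCII. It assumes that the parity bit occupies the high-order (eighth) bit position. The function names are hypothetical and do not come from any library. The example shows that a single flipped bit is caught, but two flipped bits slip through.

```python
def parity_bit(char7: int, even: bool = True) -> int:
    """Return the parity bit for a 7-bit value (0-127)."""
    ones = bin(char7 & 0x7F).count("1")
    # Even parity: choose the bit that makes the total count of 1s even.
    # Odd parity: choose the bit that makes the total count of 1s odd.
    return ones % 2 if even else 1 - ones % 2


def encode(char7: int, even: bool = True) -> int:
    """Place the parity bit in the eighth (high-order) bit position."""
    return (parity_bit(char7, even) << 7) | (char7 & 0x7F)


def check(byte8: int, even: bool = True) -> bool:
    """Receiver side: recompute parity from the low seven bits and compare."""
    return encode(byte8 & 0x7F, even) == byte8


sent = encode(ord("C"))        # 'C' = 0b1000011 has three 1 bits, so parity bit = 1
one_flip = sent ^ 0b0000001    # single-bit error
two_flips = sent ^ 0b0000011   # two-bit error
print(check(sent), check(one_flip), check(two_flips))  # True False True
```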

Also, this type of error checking operates at the byte level and is not very useful for determining whether an error exists in a frame of data that can be up to 1,518 bytes long. Ethernet frames use the frame check sequence (FCS) to check the integrity of the frame. Higher-level protocols employ other methods to ensure that packets arrive intact and in the correct order. Besides corrupted frames that can be detected using the FCS, there are other common types of Ethernet errors. This chapter takes a quick look at the most common errors and their possible causes.

Bad FCS and Misaligned Frames

The most obvious place to start is the frame check sequence (FCS) error. The MAC layer computes a cyclic redundancy check (CRC) value based on the contents of the frame and places this value in the FCS field. The receiving station performs the same calculation. By comparing its result against the value stored by the transmitting station, it can determine whether the frame has been damaged in transit.
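The compute-and-compare idea can be sketched with Python's `zlib.crc32`. That function uses the same CRC-32 polynomial, initial value, and final inversion as IEEE 802.3. The sketch assumes the common software convention of storing the 4-byte FCS least significant byte first. The helper names are hypothetical, and the sample frame is placeholder data.

```python
import struct
import zlib


def compute_fcs(frame: bytes) -> bytes:
    """CRC-32 over the frame (destination address through data) as 4 bytes."""
    crc = zlib.crc32(frame) & 0xFFFFFFFF
    # Assumption: FCS bytes are stored least significant byte first.
    return struct.pack("<I", crc)


def add_fcs(frame: bytes) -> bytes:
    """Sender side: append the FCS to the frame."""
    return frame + compute_fcs(frame)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the received FCS."""
    if len(frame_with_fcs) < 4:
        return False
    body, received = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return compute_fcs(body) == received


frame = bytes(60)                                    # placeholder 60-byte frame of zeros
good = add_fcs(frame)
damaged = bytes([good[0] ^ 0x01]) + good[1:]         # flip one bit in transit
print(fcs_ok(good), fcs_ok(damaged))                 # True False
```

A CRC-32 detects every single-bit error and every burst error up to 32 bits long. This makes it far stronger than a parity bit for whole frames.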

It is possible that the sending station computed this value incorrectly because of a hardware problem in the component that performs this MAC-layer function. It is also possible that the adapter sending the frame has some other kind of problem and is not transmitting the bits on the wire correctly. As with most errors, the problem might also lie with noise on the cables that connect the network.

When bad FCS errors exceed 2% or 3% of total network traffic, you should begin troubleshooting to find the offending device. Using a LAN analyzer, you can usually identify the source address of the faulty device and take corrective action.
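The threshold check is simple arithmetic on analyzer counters. This sketch assumes that the ratio is measured in frames (bad-FCS frames divided by all frames in an interval). A byte-based ratio would be an equally reasonable reading of "traffic". The counter values and function names are hypothetical.

```python
def fcs_error_percent(bad_fcs_frames: int, total_frames: int) -> float:
    """Bad-FCS frames as a percentage of all frames seen in the interval."""
    if total_frames <= 0:
        return 0.0
    return 100.0 * bad_fcs_frames / total_frames


def needs_troubleshooting(bad_fcs_frames: int, total_frames: int,
                          threshold_percent: float = 2.0) -> bool:
    """True when the error percentage exceeds the chosen 2% to 3% threshold."""
    return fcs_error_percent(bad_fcs_frames, total_frames) > threshold_percent


# Hypothetical counters read from an analyzer over one monitoring interval.
print(fcs_error_percent(25_000, 1_000_000))      # 2.5
print(needs_troubleshooting(25_000, 1_000_000))  # True
print(needs_troubleshooting(5_000, 1_000_000))   # False
```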

To determine whether the suspected device is indeed the source of the error, first power it off and continue to monitor the network. If errors continue to occur but another address appears to be the source, there might be cabling problems on the network. If the errors disappear when the device is powered off, you can troubleshoot it further to locate the cause. You should look for the following:

Bad connector: Check the connector that attaches the network cable to the workstation's adapter card.
Bad port: If the workstation is connected to a hub or a switch, the port on that device might be causing the problem. Also, be sure to check the connector on that end of the cable segment.
Bad cable: There's always the chance that a cable has been damaged or disconnected. If nothing you try solves the problem, use a diagnostic tool, such as a time domain reflectometer (TDR), to search for problems in the network cabling.
Malfunctioning network card: Finally, replace the network adapter card on the workstation to see whether this clears up the problem.

Because a frame is composed of bytes (units of 8 bits), the number of bits in the frame should be evenly divisible by eight when it reaches its destination. If it's not, something has gone wrong. This type of error is called a misaligned frame, and the frame usually has a bad FCS as well. The most common causes of this type of error are electrical interference on the network and collisions. Another common cause is an incorrect network topology, in which more than two multiport repeaters are cascaded.
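The alignment test amounts to checking whether the received bit count is a multiple of eight. The following is a hypothetical per-frame classifier. It assumes that the receiver reports the raw bit count and whether the FCS matched. Built this way, it reports a misaligned frame with a bad FCS as two errors for a single event.

```python
def frame_errors(bit_count: int, fcs_valid: bool) -> list:
    """Hypothetical classifier returning the error labels for one received frame."""
    errors = []
    if bit_count % 8 != 0:
        # Not a whole number of 8-bit bytes: a misaligned frame.
        errors.append("misaligned")
    if not fcs_valid:
        errors.append("bad FCS")
    return errors


print(frame_errors(512, True))    # []
print(frame_errors(515, False))   # ['misaligned', 'bad FCS']
```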

You can troubleshoot this type of problem using the same methods as for a bad FCS error. Of course, if you are aware of a topology problem, you already know where the problem is.

Short Frames (Runts)

A runt is an Ethernet frame that is smaller than the minimum size of 64 bytes. Remember that the transmitting NIC must keep transmitting a frame long enough for the signal to make a round trip across the collision domain before it stops. Otherwise, the transmitting NIC cannot reliably detect a collision. On 10-Mbps Ethernet, this maximum round-trip time, called the slot time, is 51.2 microseconds. That is exactly the time it takes to transmit 512 bits, or 64 bytes. This minimum frame size does not include the preamble.
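The arithmetic behind the 64-byte minimum works as follows. The sketch assumes the 512-bit-time slot time used by 10-Mbps and 100-Mbps Ethernet. It works in integer nanoseconds to avoid floating-point rounding. The constant and function names are illustrative.

```python
SLOT_TIME_BITS = 512  # slot time in bit times (10- and 100-Mbps Ethernet)


def slot_time_ns(bit_rate_bps: int) -> int:
    """Time needed to transmit one slot (512 bits) at the given rate, in nanoseconds."""
    return SLOT_TIME_BITS * 1_000_000_000 // bit_rate_bps


MIN_FRAME_BYTES = SLOT_TIME_BITS // 8  # 64 bytes, preamble excluded

print(slot_time_ns(10_000_000) / 1000)   # 51.2 (microseconds at 10 Mbps)
print(slot_time_ns(100_000_000) / 1000)  # 5.12 (microseconds at 100 Mbps)
print(MIN_FRAME_BYTES)                   # 64
```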

Short frames can appear on the network wire for many reasons, including the following:

Collisions
Faulty network adapters
Topology errors

If a runt frame has a valid FCS value, which indicates that the frame appears to be internally valid, the problem is most likely in the network card that generated the frame. If the FCS value is not correct for the frame's contents, the problem most likely is due to collisions or topology.
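The rule of thumb in the preceding paragraph can be written as a small decision function. It is a hypothetical helper that assumes frame length is measured from the destination address through the FCS. It only suggests where to look first and does not diagnose the fault.

```python
from typing import Optional

MIN_FRAME_BYTES = 64  # destination address through FCS, preamble excluded


def runt_suspect(length_bytes: int, fcs_valid: bool) -> Optional[str]:
    """Suggest where to look first for a short frame; None if it is not a runt."""
    if length_bytes >= MIN_FRAME_BYTES:
        return None
    if fcs_valid:
        # Internally consistent but too short: the sender probably built it that way.
        return "network card that generated the frame"
    # Damaged and too short: typical of collision fragments or topology errors.
    return "collisions or topology"


print(runt_suspect(40, True))    # network card that generated the frame
print(runt_suspect(40, False))   # collisions or topology
print(runt_suspect(64, False))   # None
```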

Collisions are a normal event for Ethernet. Sometimes, however, the byproduct of a collision results in signals on the wire that are interpreted as a short frame. If you are seeing a lot of short-frame errors, check the utilization statistics for the segment. If peak utilization is heavy but average utilization is acceptable, try to rearrange user workloads so that some tasks are deferred to times when the network is less busy. Another option is to place high-end workstations that use a lot of bandwidth on a separate LAN segment, freeing up bandwidth on the original segment. Connecting these segments with a switch or router might solve your problem.

If the utilization values for the segment are low, you might want to investigate further to determine the workstation or device that is originating the short frames, and subject the NIC to diagnostic testing to determine whether it is at fault. This can be a difficult task because a lot of errors of this type occur with frames so short that you cannot determine the source address.

Ignoring the topology rules of Ethernet also can produce short frames. A common error is to use more than four repeaters for a single collision domain, which can result in short frames appearing on the wire.

Giant Frames and Jabber

Sometimes, a network adapter produces frames that are larger than the maximum allowable size. The opposite of a short frame error is a giant frame error. According to the rules that govern Ethernet communications, the maximum size of a frame is 1,518 bytes, excluding the preamble. Oversized frames can appear on the wire for the following reasons:

A defective NIC that is transmitting continuously.
Bits indicating the length of the frame have been corrupted and indicate that the frame is larger than it actually is.
There is noise on the wire. Random noise on a faltering cable can be interpreted as part of a frame, but this is not a very common reason why oversized frame errors occur.
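An analyzer's size check reduces to comparing each frame's length against the two limits given in the text. This sketch assumes untagged frames with the preamble excluded. The function name is hypothetical.

```python
MIN_FRAME_BYTES = 64    # smallest legal frame, preamble excluded
MAX_FRAME_BYTES = 1518  # largest legal untagged frame, preamble excluded


def size_error(length_bytes: int) -> str:
    """Classify a received frame as 'runt', 'giant', or 'ok' by length alone."""
    if length_bytes < MIN_FRAME_BYTES:
        return "runt"
    if length_bytes > MAX_FRAME_BYTES:
        return "giant"
    return "ok"


for n in (60, 64, 1518, 1600):
    print(n, size_error(n))   # 60 runt / 64 ok / 1518 ok / 1600 giant
```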

Finding a malfunctioning device might be simple if your LAN analyzer can detect the source address. You can power off or disconnect the suspected node to determine whether it is causing the problem. However, if the malfunctioning card is repeatedly sending out meaningless signals, you might not be able to detect its address. In that case, you need to examine each workstation on the segment, removing them from the network one at a time to see whether the condition clears up.

The term jabber is sometimes used to refer to oversized frames, but it is really just a catch-all term used to indicate that a device on the network is not following the rules and is behaving improperly when it comes to signaling on the network. A defective NIC might be sending out frames that are larger than allowed, or it might be signaling continuously.

This type of error can literally bring down an entire segment because an adapter that transmits continuously does not give any other station a chance to use the wire. Stations are supposed to check the network medium to see whether it is busy before transmitting. As a result, workstations that are functioning normally simply wait for the network to become available, and that never happens while the faulty adapter keeps transmitting.

Multiple Errors

Depending on the tool used to monitor the network, the number of different error types you see might vary. For example, misaligned frame errors usually have a bad FCS field as well. Some analyzers record two errors for one event, whereas others might record the error as one type or the other.

Check the documentation for the product you use to determine whether this is true for your particular product.

Broadcast Storms

Broadcast storms usually occur when devices on the network generate traffic that causes even more traffic to be generated. Although this additional traffic might be due to physical problems in the network devices or the network media, it is usually caused by higher-level protocols. The problem with trying to detect the cause of this type of situation is that when it occurs, you are usually unable to access the network. Broadcast storms can slow down network access dramatically, and can sometimes bring it to a halt.

When monitoring the network for broadcast activity, you normally see a rate of 100 broadcast frames per second or less. When this rate stays above 100 per second on an ongoing basis, there might be a problem with a network card, or you might need to divide the broadcast domain into smaller parts. You can use routers to do this because they do not forward broadcast frames unless they are configured to do so. Many bridges also can be configured to detect excessive broadcasts and to drop broadcast packets until the storm subsides.
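A monitoring script might flag a sustained broadcast rate like the one described above. The following is a minimal sketch with two assumptions. First, it takes per-second counts of frames whose destination address is all ones (ff:ff:ff:ff:ff:ff). Second, it treats "ongoing" as five consecutive seconds above the threshold. That window length is a hypothetical choice, not a standard value.

```python
from typing import Iterable

BROADCAST_MAC = b"\xff" * 6


def is_broadcast(frame: bytes) -> bool:
    """A frame is a broadcast if its destination address (first 6 bytes) is all ones."""
    return frame[:6] == BROADCAST_MAC


def sustained_broadcast_alarm(per_second: Iterable[int], threshold: int = 100,
                              min_seconds: int = 5) -> bool:
    """True if the broadcast rate exceeds `threshold` for `min_seconds` seconds in a row."""
    run = 0
    for rate in per_second:
        run = run + 1 if rate > threshold else 0
        if run >= min_seconds:
            return True
    return False


# Hypothetical one-second samples of broadcast frame counts.
print(sustained_broadcast_alarm([40, 250, 90, 60]))               # False (brief spike)
print(sustained_broadcast_alarm([120, 180, 300, 450, 600, 800]))  # True
```

Requiring several consecutive seconds above the threshold keeps a single brief spike from raising an alarm.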