Understanding NNTP


Modern news servers use the Network News Transfer Protocol (NNTP) both among themselves and with news clients (often called news readers). An NNTP server normally listens on TCP port 119. NNTP was designed for the transfer of news on TCP/IP networks, but Usenet isn't restricted to such networks. Indeed, the earliest news servers used other network protocols. NNTP is therefore not the only news transfer protocol in existence, but it is the most common one on TCP/IP networks today.

The basic currency of NNTP is the message (aka the post or article), which is a single document that normally originates from one person at one site. (Multiple people may collaborate on a single message, but this is rare.) Messages are collected into newsgroups, as described earlier, but a single message may be posted to multiple newsgroups. Such cross-posting is discouraged, especially when it's taken to extremes with dozens of newsgroups. Newsgroups are arranged hierarchically, with multiple names, similar to directory names, that combine to form a complete newsgroup name. These names are separated by periods (.), with the least specific name to the left. For instance, comp.os.linux.misc and comp.os.linux.hardware are both newsgroups in the comp.os.linux hierarchy, and so are closely related, as you might expect. A newsgroup that's somewhat more distant from these is comp.dcom.modems, and a still more distant one is rec.arts.sf.dune.
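As a rough illustration of how the period-separated names express relatedness, the following Python sketch counts how many leading name components two newsgroups share. The function name shared_depth is hypothetical and is not part of any news software; the group names are the ones used above.

```python
def shared_depth(group_a: str, group_b: str) -> int:
    """Count the leading name components two newsgroup names have in common."""
    depth = 0
    for part_a, part_b in zip(group_a.split("."), group_b.split(".")):
        if part_a != part_b:
            break
        depth += 1
    return depth


# Compare comp.os.linux.misc against the other groups mentioned in the text.
for other in ["comp.os.linux.hardware", "comp.dcom.modems", "rec.arts.sf.dune"]:
    print(other, shared_depth("comp.os.linux.misc", other))
```

A larger count means the two groups sit closer together in the hierarchy; a count of zero means they don't even share a top-level name.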

When a user posts a message, the news server attaches an identifying code to the message in a header line called Message-ID. This code consists of a serial number of some sort, generated by the server, followed by the server's name. Because the ID includes the server's name, it should be unique, assuming the server keeps track of the serial numbers it generates. News servers use message IDs to keep track of which messages they've already seen, and therefore which messages are worth transferring.
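The idea of a serial number joined to a server name can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name MessageIdSource, the simple counter, and the exact <serial@server> layout are hypothetical, and real servers generate their serial portion in their own ways.

```python
import itertools


class MessageIdSource:
    """Hands out IDs of the form <serial@server-name> (an illustrative layout)."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        # The counter stands in for whatever serial scheme a real server uses.
        self._serials = itertools.count(1)

    def next_id(self) -> str:
        return f"<{next(self._serials)}@{self.server_name}>"


source = MessageIdSource("news.tiny.edu")
seen = set()  # IDs this server has already handled

first = source.next_id()
seen.add(first)
second = source.next_id()

print(first, first in seen)
print(second, second in seen)
```

The set of seen IDs is the part that matters for transfers: a server consults it to decide whether an offered message is new.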

When two news servers connect to each other, they can transfer messages using either of two types of protocols. As with mail, these are known as push and pull protocols. For purposes of this transfer, one computer acts as the server and the other as a client. In a push protocol, the sending system acts as the client: it tells the server about each message it has available, in turn, using the message IDs. The server can then check its database and decide whether it needs a specific message. The process repeats for the next message, and so on. This procedure requires that the server do a lot of work, because it must check its database of message IDs with every exchange. The alternative is a pull protocol, in which the receiving system takes on the role of the client. This system requests a complete list of articles that arrived on the server after a given date. The client can then request specific messages. This process can be more efficient, but it requires careful checking so that the server doesn't accidentally deliver messages from newsgroups that should be private.
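The difference between the two approaches can be sketched with a small in-memory simulation in Python. It doesn't speak the NNTP wire protocol (the two approaches correspond roughly to NNTP's IHAVE and NEWNEWS commands); the NewsStore class and the push and pull functions are hypothetical names chosen for illustration, and the message IDs and dates are made up.

```python
from datetime import datetime


class NewsStore:
    """Toy article store mapping message ID -> (arrival time, body)."""

    def __init__(self):
        self.articles = {}

    def add(self, message_id, arrived, body):
        self.articles[message_id] = (arrived, body)

    def wants(self, message_id):
        # The database check a receiving server makes for an offered ID.
        return message_id not in self.articles

    def ids_since(self, cutoff):
        return [mid for mid, (arrived, _) in self.articles.items() if arrived > cutoff]


def push(sender, receiver, now):
    """Sender offers each ID in turn; receiver checks its database every time."""
    transferred = []
    for message_id, (_, body) in sender.articles.items():
        if receiver.wants(message_id):
            receiver.add(message_id, now, body)
            transferred.append(message_id)
    return transferred


def pull(receiver, sender, cutoff, now):
    """Receiver asks once for IDs newer than cutoff, then fetches what it lacks."""
    transferred = []
    for message_id in sender.ids_since(cutoff):
        if receiver.wants(message_id):
            _, body = sender.articles[message_id]
            receiver.add(message_id, now, body)
            transferred.append(message_id)
    return transferred


big, small = NewsStore(), NewsStore()
big.add("<1@news.pangaea.edu>", datetime(2002, 1, 1), "older article")
big.add("<2@news.pangaea.edu>", datetime(2002, 1, 5), "newer article")
small.add("<1@news.pangaea.edu>", datetime(2002, 1, 2), "older article")

now = datetime(2002, 1, 6)
pulled = pull(small, big, datetime(2002, 1, 3), now)
pushed = push(big, small, now)
print(pulled)
print(pushed)
```

In the push case the receiving side's wants check runs once per offered message, which is the per-exchange database work described above. In the pull case the sender filters by date once, and the sketch omits the access checks a real server would need before listing or delivering articles.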

Because people are constantly generating news messages, news servers need some way to purge old messages; if this weren't done, news messages would soon fill the server's hard disk (a danger that's very real on full Usenet servers even with careful pruning of old messages). Typically, a news server will automatically expire old messages, meaning that they're deleted after they've been made available for a certain period of time. How long a server retains messages depends on many factors, including the available disk space, the number of newsgroups carried, the traffic on these newsgroups, and the popularity of various groups. It's possible to set different retention intervals for different newsgroups.
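Per-newsgroup retention intervals might be modeled as in the Python sketch below. The interval values, the RETENTION table, and the expire function are all assumptions made for illustration; actual news servers have their own expiration configuration formats. For simplicity, each article is treated as belonging to a single newsgroup.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy: a default plus per-group overrides.
DEFAULT_RETENTION = timedelta(days=7)
RETENTION = {
    "comp.os.linux.misc": timedelta(days=14),
    "rec.arts.sf.dune": timedelta(days=3),
}


def expire(articles, now):
    """Return only the articles still inside their group's retention interval."""
    kept = []
    for group, arrived, message_id in articles:
        limit = RETENTION.get(group, DEFAULT_RETENTION)
        if now - arrived <= limit:
            kept.append((group, arrived, message_id))
    return kept


now = datetime(2002, 1, 15)
articles = [
    ("comp.os.linux.misc", datetime(2002, 1, 5), "<1@news.tiny.edu>"),
    ("rec.arts.sf.dune", datetime(2002, 1, 5), "<2@news.tiny.edu>"),
    ("comp.dcom.modems", datetime(2002, 1, 10), "<3@news.tiny.edu>"),
]
kept_ids = [message_id for _, _, message_id in expire(articles, now)]
print(kept_ids)
```

Here the two articles that arrived on the same day meet different fates because their groups have different retention intervals.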

Whichever transfer method is used and however often messages are expired, transfers can involve far more than two computers. News servers can link to each other in an extremely complex web of connections. Each of these connections is known as a news feed. Typically, a smaller or less-well-connected site requests a news feed from a larger or better-connected site. For instance, the news administrator at Tiny College might request a news feed from the much larger Pangaea University. This means that most of the news articles to be made available on news.tiny.edu would come from news.pangaea.edu. The bulk of the transfers would flow in this direction, but of course Tiny College users might post news, so news.pangaea.edu would also accept some messages from news.tiny.edu. Pangaea University, in turn, has a news feed from some other source, which leads to others, and so on.

This relationship need not be entirely linear, though. For instance, it's possible that news.pangaea.edu doesn't carry some newsgroups that the Tiny College news administrators want. In this case, those administrators might seek out a secondary news feed for those newsgroups, and perhaps more news feeds for others. Pangaea University might do the same to obtain the groups it needs. In all these cases, not all newsgroups need be transferred. A site might drop a newsgroup or even an entire hierarchy from its feed to conserve disk space, because the group isn't of interest at the destination site, or for any other reason.
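Dropping individual newsgroups or whole hierarchies from a feed can be sketched in Python as a simple filter over group names. The function names in_hierarchy and feed_groups and the particular list of dropped entries are hypothetical; real news servers express feed selections in their own configuration syntax.

```python
def in_hierarchy(group: str, hierarchy: str) -> bool:
    """True if group is the named group itself or lies beneath that hierarchy."""
    return group == hierarchy or group.startswith(hierarchy + ".")


def feed_groups(offered, dropped):
    """Return the offered groups that are not excluded by any dropped entry."""
    return [g for g in offered if not any(in_hierarchy(g, d) for d in dropped)]


offered = [
    "comp.os.linux.misc",
    "comp.os.linux.hardware",
    "comp.dcom.modems",
    "rec.arts.sf.dune",
]
# Drop the entire rec hierarchy and one individual group.
carried = feed_groups(offered, ["rec", "comp.dcom.modems"])
print(carried)
```

Matching on the hierarchy name plus a trailing period avoids accidentally dropping an unrelated group whose name merely begins with the same letters.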

The end result is a system of interconnected news servers that includes a few very large servers that feed other large servers and smaller servers. These in turn feed others, and so on. Any of these servers may also function as a server to news client programs. These news readers also use NNTP, and they can both retrieve and post messages, but they don't feed other sites. When a news reader posts a message, the news server adds or modifies some headers to indicate that the post originated with it, and often to identify the client that submitted the message.

It's important to remember that the flow of news articles goes both ways. Indeed, if it weren't for posts at the extreme branches of the news-feeding "tree," the large sites that constitute the "trunk" wouldn't have any news to feed to smaller sites. The large sites simply serve as aggregation points for the massive amounts of news generated by individuals who read and post to newsgroups. Individual news servers, though, receive far more news from their feeds than they generate locally.



Advanced Linux Networking
ISBN: 0201774232
Year: 2002
Pages: 203
