Getting Started


Let's try to set up a Spread network with just two daemons:

# cat spread.conf
DebugFlags = { PRINT EXIT }
EventTimeStamp
DangerousMonitor = true
Spread_Segment 10.0.0.255:4913 {
        www-0-1                    10.0.0.132
        www-0-2                    10.0.0.133
}
#./spread
/===========================================================================\
| The Spread Toolkit.                                                       |
| Copyright (c) 1993-2002 Spread Concepts LLC                               |
...
| All rights reserved.                                                      |
| Version 3.17.03 Built 15/October/2004                                     |
\===========================================================================/
Conf_init:using file:spread.conf
[Mon 08 May 2006 07:32:19] ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
[Mon 08 May 2006 07:32:19] Conf_init:My proc id (192.168.221.22) is not in configuration
Exit caused by Alarm(EXIT)


The daemon failed to start. This is the first hurdle you are likely to hit when starting Spread. Notice that the error refers to an IP address (192.168.221.22) that does not appear anywhere in our configuration file. By default, Spread does something you might not expect: it takes the machine's hostname, resolves it to an IP address, and looks for that address in the configuration file. If the address is not there, the daemon exits. We can, however, use the -n option to tell Spread which named entry in the configuration file it should start as:

$./spread -n www-0-1
| Version 3.17.03 Built 15/October/2004                                     |
\===========================================================================/
Conf_init:using file:spread.conf
[Mon 08 May 2006 08:15:54] ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
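The default lookup described above can be sketched in a few lines of Python, which makes it clear why the first start failed and why -n avoids the problem. This is an illustration of the behavior, not Spread's actual code: `parse_procs` and `default_proc_name` are hypothetical helpers, and the parser assumes simple two-column `name address` entries inside a `Spread_Segment` block, as in our spread.conf.

```python
import re
import socket


def parse_procs(conf_text):
    """Map proc names to IP addresses for simple 'name address' entries.

    Assumes each proc inside a Spread_Segment block is written as
    '<name> <ip>' on its own line; other spread.conf syntax is ignored.
    """
    procs = {}
    for body in re.findall(r"Spread_Segment\s+\S+\s*\{([^}]*)\}", conf_text):
        for line in body.splitlines():
            fields = line.split("#", 1)[0].split()  # drop comments, split columns
            if len(fields) == 2:
                procs[fields[0]] = fields[1]
    return procs


def default_proc_name(conf_text, my_ip=None):
    """Mimic the default lookup: resolve our hostname, then search for that IP."""
    if my_ip is None:
        my_ip = socket.gethostbyname(socket.gethostname())
    for name, ip in parse_procs(conf_text).items():
        if ip == my_ip:
            return name
    return None  # Spread exits here: "My proc id (...) is not in configuration"
```

With our configuration, `default_proc_name(conf, "192.168.221.22")` returns `None`, which corresponds to the failed start, while passing `-n www-0-1` names the entry directly so no address lookup is needed.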


This time the daemon appears to start. There is no message saying so, but at least it no longer exits as before. From here we have two options: start another daemon and see whether the two can communicate, or run the spuser application that ships with Spread, join a group, and send a message to confirm that the daemon works. Let's take the second route:

# ./spuser
Spread library version is 3.17.3
SP_error: (-2) Could not connect. Is Spread running?
Bye.
# ./spuser usage
Usage: spuser
        [-u <user name>]  : unique (in this machine) user name
        [-s <address>]    : either port or port@machine
        [-n <username>]   : username for authentication
        [-p <password>]   : users password
        [-r ]    : use random user name
# ./spuser -s 4913
Spread library version is 3.17.3
User: connected to 4913 with private group #user#fog1
==========
User Menu:
----------
        j <group> -- join a group
        l <group> -- leave a group
        s <group> -- send a message
        b <group> -- send a burst of messages
        r -- receive a message (stuck)
        p -- poll for a message
        e -- enable asynchonous read (default)
        d -- disable asynchronous read
        q -- quit
User> j test
User> s test
enter message: weee
User> q
Bye.
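The first spuser attempt failed with "Could not connect" because spuser did not know which port our daemon was using; with `-s 4913` it got through. A quick way to check whether a daemon is accepting client connections at all is to try a TCP connection to its port. This sketch assumes the daemon accepts client sessions over TCP on the segment port (the full debug log later in this section shows `Sess_init: INET bind for port 4913`); note that a local spuser session without `@machine` may go through the Unix-domain socket (`/tmp/4913` in that log) instead. `parse_spread_name` and `daemon_listening` are hypothetical helpers.

```python
import socket


def parse_spread_name(name):
    """Split a spuser-style address ('port' or 'port@machine') into (port, host)."""
    port, _, host = name.partition("@")
    return int(port), host or "127.0.0.1"  # bare port: assume the local machine


def daemon_listening(name, timeout=1.0):
    """Return True if something accepts TCP connections at that address."""
    port, host = parse_spread_name(name)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable host
        return False
```

Running `daemon_listening("4913")` on www-0-1 should return True while the daemon is up; `daemon_listening("4913@www-0-2")` probes the other node over the network instead.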


I may be paranoid, but things are a little too quiet on the western front, as the saying goes. We did learn that spuser can connect only if we give it the port Spread is running on, but the message we sent to the group we joined never seems to arrive. To find out what is going on, let's enable a few more DebugFlags options in spread.conf and start over:

# grep DebugFlags spread.conf
DebugFlags = { PRINT EXIT SESSION MEMBERSHIP GROUPS }
# ./spread -n www-0-2
[Mon 08 May 2006 07:58:57] Sess_init: ended ok
[Mon 08 May 2006 07:58:57] Scast_alive: State is 2
[Mon 08 May 2006 07:58:58] Scast_alive: State is 2
[Mon 08 May 2006 07:58:59] Send_join: State is 4
[Mon 08 May 2006 07:59:00] Send_join: State is 4
[Mon 08 May 2006 07:59:01] Send_join: State is 4
[Mon 08 May 2006 07:59:02] Send_join: State is 4
[Mon 08 May 2006 07:59:03] Send_join: State is 4
[Mon 08 May 2006 07:59:09] Memb_token_loss: I lost my token, state is 5
[Mon 08 May 2006 07:59:09] Scast_alive: State is 2
[Mon 08 May 2006 07:59:10] Scast_alive: State is 2
[Mon 08 May 2006 07:59:11] Send_join: State is 4
[Mon 08 May 2006 07:59:12] Send_join: State is 4
[Mon 08 May 2006 07:59:13] Send_join: State is 4
[Mon 08 May 2006 07:59:14] Send_join: State is 4
[Mon 08 May 2006 07:59:15] Send_join: State is 4
[Mon 08 May 2006 07:59:21] Memb_token_loss: I lost my token, state is 5
[Mon 08 May 2006 07:59:21] Scast_alive: State is 2
[Mon 08 May 2006 07:59:22] Scast_alive: State is 2
[Mon 08 May 2006 07:59:23] Send_join: State is 4
[Mon 08 May 2006 07:59:24] Send_join: State is 4
[Mon 08 May 2006 07:59:25] Send_join: State is 4
[Mon 08 May 2006 07:59:26] Send_join: State is 4
[Mon 08 May 2006 07:59:27] Send_join: State is 4
[Mon 08 May 2006 07:59:33] Memb_token_loss: I lost my token, state is 5
[Mon 08 May 2006 07:59:33] Scast_alive: State is 2
[Mon 08 May 2006 07:59:34] Scast_alive: State is 2
[Mon 08 May 2006 07:59:35] Send_join: State is 4


Aha! Even if the meaning of each message is not obvious, something is clearly wrong: the token is lost over and over, and the daemon cycles through the same states (2, then 4, then a token loss in state 5) in a loop that never completes. Let's go back to basics. The good news is that the Spread configuration is simple, so there are not many places where we could have gone wrong. Let's check the node definitions again. We are telling Spread to start as the node called www-0-2, which has the IP address 10.0.0.133. Let's make sure, just in case:

# /sbin/ifconfig
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet 10.0.0.132 netmask 0xffffff00 broadcast 10.0.0.255
        inet6 fe80::290:27ff:fef6:3a0e%fxp0 prefixlen 64 scopeid 0x2
        ether 00:90:27:f6:3a:0e
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active


Oops! We got confused about which machine we are on: this is www-0-1 (10.0.0.132), yet we told Spread that it is www-0-2. The mistake is ours, although you would expect a more alarming message when Spread is started under an address that the machine does not have. Let's see whether this is our only problem:

#./spread -n www-0-1
Conf_init: using file: spread.conf
[Mon 08 May 2006 07:48:21] ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
[Mon 08 May 2006 07:48:21] Memb_token_loss: I lost my token, state is 1
[Mon 08 May 2006 07:48:21] Sess_init: INET bind for port 4913 interface 0.0.0.0 ok
[Mon 08 May 2006 07:48:21] Sess_init: INET went ok on mailbox 6
[Mon 08 May 2006 07:48:21] Sess_init: UNIX bind for name /tmp/4913 ok
[Mon 08 May 2006 07:48:21] Sess_init: UNIX went ok on mailbox 7
[Mon 08 May 2006 07:48:21] G_init:
[Mon 08 May 2006 07:48:21] Sess_init: ended ok
[Mon 08 May 2006 07:48:21] Scast_alive: State is 2
[Mon 08 May 2006 07:48:22] Scast_alive: State is 2
[Mon 08 May 2006 07:48:23] Send_join: State is 4
[Mon 08 May 2006 07:48:24] Send_join: State is 4
[Mon 08 May 2006 07:48:25] Send_join: State is 4
[Mon 08 May 2006 07:48:26] Send_join: State is 4
[Mon 08 May 2006 07:48:27] Send_join: State is 4
[Mon 08 May 2006 07:48:28] Memb_handle_token: handling form2 token
[Mon 08 May 2006 07:48:28] Handle_form2 in FORM
[Mon 08 May 2006 07:48:28] Memb_transitional
[Mon 08 May 2006 07:48:28] G_handle_trans_memb:
[Mon 08 May 2006 07:48:28] G_handle_trans_memb in GOP
[Mon 08 May 2006 07:48:28] G_handle_trans_memb: Received trans memb id of: {proc_id: -1062675178  time:    1147088908}
[Mon 08 May 2006 07:48:28] Memb_regular Membership id is ( -1062675178, 1147088909)
[Mon 08 May 2006 07:48:28] --------------------
[Mon 08 May 2006 07:48:28] Configuration at www-0-1 is:
[Mon 08 May 2006 07:48:28] Num Segments 1
[Mon 08 May 2006 07:48:28]      1       10.0.0.255   4913
[Mon 08 May 2006 07:48:28]              www-0-1                 10.0.0.132
[Mon 08 May 2006 07:48:28] ====================
[Mon 08 May 2006 07:48:28] G_handle_reg_memb:  with (10.0.0.132, 1147088909) id
[Mon 08 May 2006 07:48:28] G_handle_reg_memb in GTRANS


This time we succeeded. In fact, even after going back to the minimal set of DebugFlags, the output now tells us that the daemon started successfully:

# grep Debug spread.conf
DebugFlags = { PRINT EXIT }
# ./spread -n www-0-1
Conf_init: using file: spread.conf
[Mon 08 May 2006 08:12:52] ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
Membership id is ( -1062675178, 1147090380)
[Mon 08 May 2006 08:12:59] --------------------
[Mon 08 May 2006 08:12:59] Configuration at www-0-1 is:
[Mon 08 May 2006 08:12:59] Num Segments 1
[Mon 08 May 2006 08:12:59]      1       10.0.0.255   4913
[Mon 08 May 2006 08:12:59]              www-0-1                    10.0.0.132
[Mon 08 May 2006 08:12:59] ====================
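The difference between the looping run and the successful one can also be checked mechanically, which helps when a daemon's output scrolls past in a terminal. The sketch below is a heuristic of our own, not a Spread feature: it looks only at the output after the most recent `Membership id is` line and treats repeated `Memb_token_loss` messages as a sign that the daemon never settles. The threshold of three losses is an arbitrary assumption, and `classify_daemon_log` is a hypothetical helper.

```python
def classify_daemon_log(log_text, loss_threshold=3):
    """Heuristic health check for a Spread daemon's debug output."""
    last_memb = log_text.rfind("Membership id is")
    # Only token losses after the latest membership count against the daemon;
    # an early loss during startup (as in the full debug run) is ignored.
    tail = log_text[last_memb:] if last_memb >= 0 else log_text
    losses = tail.count("Memb_token_loss")
    if losses >= loss_threshold:
        return "stuck: repeated token loss"
    if last_memb >= 0:
        return "membership established"
    return "undetermined"
```

Fed the www-0-2 run from earlier, it reports a stuck daemon; fed the output above, it reports an established membership.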


A membership report like this is what we want to see after starting Spread, because it shows that the new daemon successfully established a membership. Now let's see whether spuser gives us more satisfying results:

# ./spuser -s 4913
Spread library version is 3.17.3
User: connected to 4913 with private group #user#www-0-1
==========
User Menu:
----------
        j <group> -- join a group
        l <group> -- leave a group
        s <group> -- send a message
        b <group> -- send a burst of messages
        r -- receive a message (stuck)
        p -- poll for a message
        e -- enable asynchonous read (default)
        d -- disable asynchronous read
        q -- quit
User>  j test
User> ============================
Received REGULAR membership for group test with 1 members, where I am member 0:
        #user#www-0-1
grp id is -1062675178 1147091682 1
Due to the JOIN of #user#www-0-1
User>  s test
enter message: weee
User> ============================
received SAFE message from #user#www-0-1, of type 1, (endian 0) to 1 groups (5 bytes): weee


Indeed, this looks much more like what we would expect to see. We joined a group that we called test and sent a message that was actually received. Let's attempt to bring the second server up as well:

# ./spread -n www-0-2
Conf_init:using file:spread.conf
[Mon 08 May 2006 08:44:41] ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
Membership id is ( -1062675179, 1147092289)
[Mon 08 May 2006 08:44:48] --------------------
[Mon 08 May 2006 08:44:48] Configuration at www-0-2 is:
[Mon 08 May 2006 08:44:48] Num Segments 1
[Mon 08 May 2006 08:44:48]      2       10.0.0.255   4913
[Mon 08 May 2006 08:44:48]              www-0-1                    10.0.0.132
[Mon 08 May 2006 08:44:48]              www-0-2                    10.0.0.133
[Mon 08 May 2006 08:44:48] ====================
++++++++++++++++++++++
Num of groups: 1
[1] group test with 1 members:
        [1] #user#www-0-1
----------------------


The output on www-0-1 shows exactly the same membership, which is a good sign. Both daemons also know about the group test and its single member, because we left the spuser session running.
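Comparing the `Membership id is` lines printed by the two daemons is a quick way to confirm that they joined the same membership. The sketch below also decodes the id, assuming (consistent with the output in this section, though the text does not state it) that the first field is a daemon proc id, an IPv4 address stored as a signed 32-bit integer, and the second a Unix timestamp. Under that assumption, -1062675178 decodes to 192.168.221.22, the same address Spread reported as its proc id in the very first failed start, and 1147088909 falls on 8 May 2006, the date in the logs. The helper names are hypothetical.

```python
import ipaddress
import re
from datetime import datetime, timezone

MEMB_ID = re.compile(r"Membership id is \(\s*(-?\d+),\s*(-?\d+)\s*\)")


def membership_id(log_text):
    """Return the last (proc_id, time) pair printed in a daemon's output."""
    found = MEMB_ID.findall(log_text)
    if not found:
        return None
    proc_id, stamp = found[-1]
    return int(proc_id), int(stamp)


def decode_membership_id(memb_id):
    """Assumed layout: signed 32-bit IPv4 proc id plus a Unix timestamp."""
    proc_id, stamp = memb_id
    ip = ipaddress.IPv4Address(proc_id & 0xFFFFFFFF)  # reinterpret as unsigned
    when = datetime.fromtimestamp(stamp, tz=timezone.utc)
    return str(ip), when.isoformat()


def same_membership(log_a, log_b):
    """True if both outputs report the same (latest) membership id."""
    a = membership_id(log_a)
    return a is not None and a == membership_id(log_b)
```

For example, `decode_membership_id((-1062675178, 1147088909))` returns `("192.168.221.22", "2006-05-08T11:48:29+00:00")`.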

At this point we have cleared most of the typical hurdles of setting up Spread for the first time. It is possible, however, that the second daemon starts but cannot communicate with the first one. In that case, on www-0-2 you might see something like this:

Membership id is ( -1062675178, 1147092289)
[Mon 08 May 2006 08:44:48] --------------------
[Mon 08 May 2006 08:44:48] Configuration at www-0-2 is:
[Mon 08 May 2006 08:44:48] Num Segments 1
[Mon 08 May 2006 08:44:48]      2       10.0.0.255   4913
[Mon 08 May 2006 08:44:48]              www-0-2                    10.0.0.133
[Mon 08 May 2006 08:44:48] ====================


On www-0-1, the output might look like this:

Membership id is ( -1062675179, 1147092272)
[Mon 08 May 2006 08:44:48] --------------------
[Mon 08 May 2006 08:44:48] Configuration at www-0-1 is:
[Mon 08 May 2006 08:44:48] Num Segments 1
[Mon 08 May 2006 08:44:48]      2       10.0.0.255   4913
[Mon 08 May 2006 08:44:48]              www-0-1                    10.0.0.132
[Mon 08 May 2006 08:44:48] ====================


Each daemon reports a membership that contains only itself, which means the two daemons cannot communicate with each other. The most common cause is a firewall. Spread needs to communicate over both UDP/IP and TCP/IP on the port specified in the configuration file and on the port immediately above it; in this example, that means ports 4913 and 4914. Keep in mind that Spread's default configuration uses port 4803, in which case the ports to open would be 4803 and 4804.
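The port arithmetic is easy to script, for example when preparing firewall rules for every node. The sketch below reads the port from a `Spread_Segment` line like ours and lists the protocol/port pairs that, per the text above, must be open; `spread_ports` is a hypothetical helper, and it simply rejects segment lines that do not carry an explicit `address:port`.

```python
import re


def spread_ports(segment_line):
    """List (protocol, port) pairs to allow for a 'Spread_Segment addr:port' line."""
    match = re.search(r"Spread_Segment\s+[\d.]+:(\d+)", segment_line)
    if match is None:
        raise ValueError("no 'Spread_Segment address:port' found")
    port = int(match.group(1))
    # The configured port and the one immediately above it, over UDP and TCP.
    return [(proto, p) for proto in ("udp", "tcp") for p in (port, port + 1)]
```

For our configuration, `spread_ports("Spread_Segment 10.0.0.255:4913 {")` yields UDP and TCP entries for 4913 and 4914.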

Alternatively, Spread can be configured to use IP multicast by specifying a multicast address in the Spread_Segment line. If you run into trouble with multicast, the first step should be to check the network's multicast setup independently of Spread; multicast problems often have nothing to do with Spread and need to be troubleshot separately. Another common situation is that Spread works with two daemons but stops working when a third daemon is added. When multicast or broadcast does not work, Spread silently falls back to unicast if the configuration allows it. With three or more nodes in a segment, however, unicast is not an option, so the underlying network problem surfaces and Spread stops working.
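Whether a segment uses multicast can be read directly from its address, since IPv4 multicast addresses lie in 224.0.0.0/4. The sketch below flags the situation described above, where a two-daemon setup may be masking a broken multicast or broadcast network. `check_segment` is a hypothetical helper; treating every non-multicast segment address as a broadcast address (like our 10.0.0.255) is an assumption, and the warnings only restate the behavior described in the text.

```python
import ipaddress


def check_segment(address, num_daemons):
    """Return (mode, warnings) for a Spread segment address and daemon count."""
    ip = ipaddress.IPv4Address(address)
    # Assumption: a non-multicast segment address is a broadcast address.
    mode = "multicast" if ip.is_multicast else "broadcast"
    warnings = []
    if num_daemons <= 2:
        warnings.append(
            f"{mode} may be silently replaced by unicast; verify it works "
            "independently before adding a third daemon"
        )
    if ip.is_multicast:
        warnings.append("check the network's multicast setup independently of Spread")
    return mode, warnings
```

Our segment, `check_segment("10.0.0.255", 2)`, is reported as broadcast with a warning that two daemons alone do not prove broadcast works.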

Spread comes with two other tools, spsend and sprecv, that are useful for troubleshooting network problems without involving Spread itself at all:

# make spsend
gcc -g -O2 -Wall -I. -I.   -DHAVE_CONFIG_H -c s.c
gcc -o spsend s.o alarm.o data_link.o events.o memory.o  -lnsl
fog2:~/spread-src-3.17.3 $ make sprecv
gcc -g -O2 -Wall -I. -I.   -DHAVE_CONFIG_H -c r.c
gcc -o sprecv r.o alarm.o data_link.o  -lnsl
$ ./spsend
Checking (127.0.0.1, 4444). Each burst has 100 packets, 1024 bytes each with 10 msec delay in between, for a total of 10000 packets
sent 1000 packets of 1024 bytes
sent 2000 packets of 1024 bytes
sent 3000 packets of 1024 bytes
sent 4000 packets of 1024 bytes
sent 5000 packets of 1024 bytes
sent 6000 packets of 1024 bytes
sent 7000 packets of 1024 bytes
sent 8000 packets of 1024 bytes
sent 9000 packets of 1024 bytes
sent 10000 packets of 1024 bytes
total time is (1,300054), with 0 problems
# ./spsend usage
Usage:
        [-p <port number>] : to send on, default is 4444
        [-b <burst>]       : number of packets in each burst, default is 100
        [-t <delay>]       : time (mili-secs) to wait between bursts, default 10
        [-n <num packets>] : total number of packets to send, default is 10000
        [-s <num bytes>]   : size of each packet, default is 1024
        [-a <IP address>]  : default is 127.0.0.1
# ./sprecv usage
Usage: r
        [-p <port number>] : to receive on, default is 4444
        [-a <multicast class D address>] : if receiving multicast is desirable,
                                           default is 0
        [-i <IP interface>] : set interface, default is 0
        [-d ]              : print a detailed report whenever messages are missed
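The burst pattern spsend uses (per its usage text: 100 packets per burst, 1024 bytes each, 10 ms between bursts, 10000 packets to 127.0.0.1) can be imitated with plain UDP sockets when the Spread sources are not at hand. This sketch runs sender and receiver in one process over loopback, so it only exercises the local network stack; to test a real link you would split the two halves across machines, as spsend and sprecv do. `udp_burst_test` is a hypothetical helper, not part of Spread.

```python
import socket
import threading
import time


def udp_burst_test(num_packets=10000, burst=100, size=1024, delay_ms=10,
                   host="127.0.0.1"):
    """Send UDP packets in bursts and count how many arrive (spsend-style)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind((host, 0))      # port 0: let the OS choose a free port
    rx.settimeout(0.5)      # receiver stops after 0.5 s of silence,
    port = rx.getsockname()[1]  # so delay_ms must stay well below 500
    received = 0

    def receive():
        nonlocal received
        while True:
            try:
                rx.recv(65535)
            except socket.timeout:
                return
            received += 1

    receiver = threading.Thread(target=receive)
    receiver.start()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * size
    sent = 0
    start = time.monotonic()
    while sent < num_packets:
        for _ in range(min(burst, num_packets - sent)):
            tx.sendto(payload, (host, port))
            sent += 1
        time.sleep(delay_ms / 1000.0)  # pause between bursts
    elapsed = time.monotonic() - start

    receiver.join()
    tx.close()
    rx.close()
    return {"sent": sent, "received": received,
            "lost": sent - received, "seconds": elapsed}
```

A nonzero `lost` count on loopback usually points at local buffer limits rather than the network; on a real link it corresponds to the missed messages sprecv can report with `-d`.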


Both programs are straightforward to use and independent of Spread itself, so we will not go into detail about them. Spread also ships with another tool that is useful for both basic and advanced troubleshooting: spmonitor. With spmonitor we can view the current state of the Spread daemons, and we can also create artificial network partitions to test how robust our applications are in the presence of network faults:

#./spmonitor -n `hostname`
=============
Monitor Menu:
-------------
        0. Activate/Deactivate Status {all, none, Proc, CR}
        1. Define Partition
        2. Send   Partition
        3. Review Partition
        4. Cancel Partition Effects
        5. Define Flow Control
        6. Send   Flow Control
        7. Review Flow Control
        8. Terminate Spread Daemons {all, none, Proc, CR}
        9. Exit
Monitor>  0
=============
Activate Status
-------------
        Enter Proc Name: www-0-1
        Enter Proc Name:
Monitor: send status query
Monitor> ============================
============================
Status at www-0-1 V 3.17. 3 (state 1, gstate 1) after 718730 seconds :
Membership  :  9  procs in 1 segments, leader is www-0-1
rounds   : 18742726     tok_hurry : 3874314     memb change:      33
sent pack:  930924 recv pack : 6136167  retrans    : 1270737
u retrans: 1219986 s retrans :   50751  b retrans  :       0
My_aru   : 2560710 Aru       : 2560710  Highest seq: 2560710
Sessions :       2 Groups    :       5  Window     :      60
Deliver M: 12011390     Deliver Pk: 12479141    Pers Window:      15
Delta Mes:     146 Delta Pack:     146  Delta sec  :      10
==================================


Inspecting the state of a Spread daemon can give good insight into the health of the Spread setup. The status message is sent at regular intervals until we turn it off, so we can follow how the various figures change. The first status line shows the version of Spread we are running and the uptime of the instance we are looking at. The state and gstate variables should both be 1 if the daemon membership is stable. The second line shows a membership that currently has nine daemons in one segment (this output comes from a live setup that has seven more nodes listed in its configuration). The rounds figure on the third line is the number of times the token has rotated around the Spread ring. The membership change count on the same line shows how often daemons joined or left the system, whether because of crashes, restarts, or network issues; a large value there may indicate an unstable network. The fourth and fifth lines show how many packets were sent, received, and retransmitted in total, along with a breakdown of the retransmissions by type (unicast, segment, and broadcast). A high number of retransmissions may also indicate problems, such as a congested network. Segment retransmissions (s retrans) may point to connectivity issues between segments in multisegment setups. Sessions is the number of applications connected locally to this daemon instance, and Groups is the number of groups that exist in the system. The Window and Pers Window parameters define the flow control characteristics of the system: how many messages can be sent per token revolution, and how many messages each sender can send while it holds the token.
We can also see the number of messages and packets delivered since this instance started (a message may be split into multiple packets), as well as the number of messages and packets delivered during the Delta interval since the previous status message.
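Because the status block is plain `name : value` text, its figures are easy to track with a small script, for instance to watch message rates or to check that the unicast, segment, and broadcast retransmission counts add up to the total (1219986 + 50751 + 0 = 1270737 in the sample). The sketch below parses the status text shown above; `parse_status` and `summarize` are hypothetical helpers, and the regular expression assumes field names made of letters, spaces, and underscores, as in this output.

```python
import re

# Status text as printed by spmonitor in the example above.
SAMPLE = """\
Status at www-0-1 V 3.17. 3 (state 1, gstate 1) after 718730 seconds :
Membership  :  9  procs in 1 segments, leader is www-0-1
rounds   : 18742726     tok_hurry : 3874314     memb change:      33
sent pack:  930924 recv pack : 6136167  retrans    : 1270737
u retrans: 1219986 s retrans :   50751  b retrans  :       0
My_aru   : 2560710 Aru       : 2560710  Highest seq: 2560710
Sessions :       2 Groups    :       5  Window     :      60
Deliver M: 12011390     Deliver Pk: 12479141    Pers Window:      15
Delta Mes:     146 Delta Pack:     146  Delta sec  :      10
"""

FIELD = re.compile(r"([A-Za-z][A-Za-z_ ]*?)\s*:\s*(\d+)")


def parse_status(text):
    """Return {field name: integer value} for every 'name : number' pair."""
    return {name.strip(): int(value) for name, value in FIELD.findall(text)}


def summarize(stats):
    """Derive a few health figures from the parsed status fields."""
    return {
        "msgs_per_sec": stats["Delta Mes"] / stats["Delta sec"],
        "packets_per_msg": stats["Deliver Pk"] / stats["Deliver M"],
        "memb_changes": stats["memb change"],
        "retrans_breakdown_ok": stats["u retrans"] + stats["s retrans"]
        + stats["b retrans"] == stats["retrans"],
    }
```

On this sample, `summarize(parse_status(SAMPLE))` reports 14.6 messages per second over the last 10-second interval and about 1.04 packets per delivered message.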

We now have a Spread network running and can move on to look at the programming API used to build our own applications.




Scalable Internet Architectures
ISBN: 067232699X
Year: 2006
Pages: 114
