12.4. The Fault Tree
The fault tree presented in this section is for diagnosing and fixing problems that occur when you're installing and reconfiguring Samba. Before you set out to troubleshoot any part of the Samba suite, you should know the following information:
For clarity, we've renamed the server in the following examples to server.example.com, and the client system to client.example.com.
12.4.1. How to Use the Fault Tree
Start the tests here, without skipping forward; it won't take long (about five minutes) and might actually save you time backtracking. Whenever a test succeeds, you will be given the name of a section to which you can safely skip.
12.4.2. Troubleshooting Low-Level IP
The first series of tests is that of the low-level services that Samba needs to run. The tests in this section verify that:
Subsequent sections add the Samba daemons smbd and nmbd, host-based access control, authentication and per-user access control, file services, and browsing. The tests are described in considerable detail to make them understandable by both technically oriented end users and experienced systems and network administrators.
12.4.2.1. Testing the networking software with ping
The first command to enter on both the server and the client is ping 127.0.0.1. This tries to send data to the loopback address and indicates whether any networking support is functioning. On both Windows and Unix, you can run ping 127.0.0.1 from a command shell and usually interrupt it after a few lines. Here is an example on a Linux server:
$ ping 127.0.0.1
PING localhost: 56 data bytes
64 bytes from localhost (127.0.0.1): icmp-seq=0. time=1. ms
64 bytes from localhost (127.0.0.1): icmp-seq=1. time=0. ms
64 bytes from localhost (127.0.0.1): icmp-seq=2. time=1. ms
^C
----127.0.0.1 PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/1
Some versions of ping let you set a limit on how many times it makes the round trip, so you don't have to manually interrupt the command. For instance, on Linux you could enter ping -c5 127.0.0.1 to stop automatically after five transmissions.
If you get ping: no answer from _ ._ ._ ._ or 100% packet loss, IP networking is not working on the system. The address 127.0.0.1 is the internal loopback address and doesn't depend on the computer being physically connected to a network, so a failure here is a local problem: TCP/IP isn't installed, it's misconfigured, or a firewall is blocking ICMP packets. See your operating system documentation if it's a Unix server. If it's a Windows client, follow the instructions in Chapter 3 to install networking support.
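The pass/fail rule above can be checked mechanically. The following is a minimal sketch, not part of the Samba suite: ping_loss_percent is a hypothetical helper name, and it assumes your ping prints a summary line containing "N% packet loss", as in the transcript above.

```shell
# Hypothetical helper: read ping output on stdin and print the
# packet-loss percentage from its summary line.
# Prints nothing and returns 1 if no "packet loss" line is found.
ping_loss_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p' | grep .
}

# Example use (assumes a ping that accepts -c, as on Linux):
#   ping -c5 127.0.0.1 | ping_loss_percent
```

A result of 0 means the loopback test passed. A result of 100, or no summary line at all, points to the local problems described above.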
12.4.2.2. Testing local name services with ping
Next, try to ping localhost from a shell on the Samba server. The name localhost is the conventional hostname for the 127.0.0.1 loopback interface, and it should resolve to that address. After typing ping localhost, you should see output similar to the following:
$ ping localhost
PING localhost: 56 data bytes
64 bytes from localhost (127.0.0.1): icmp-seq=0. time=0. ms
64 bytes from localhost (127.0.0.1): icmp-seq=1. time=0. ms
64 bytes from localhost (127.0.0.1): icmp-seq=2. time=0. ms
^C
If this succeeds, try the same test on the client. Otherwise:
12.4.2.3. Testing the networking hardware with ping
Next, ping the server's network IP address from itself. This should get you exactly the same results as pinging 127.0.0.1:
$ ping 192.168.236.86
PING 192.168.236.86: 56 data bytes
64 bytes from 192.168.236.86 (192.168.236.86): icmp-seq=0. time=1. ms
64 bytes from 192.168.236.86 (192.168.236.86): icmp-seq=1. time=0. ms
64 bytes from 192.168.236.86 (192.168.236.86): icmp-seq=2. time=1. ms
^C
----192.168.236.86 PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/1
If this test works on the server, repeat it for the client. Otherwise:
12.4.2.4. Testing connections with ping
Now, ping the server by name (instead of its IP address), once from the server and once from the client. This is the general test to determine whether your network is working:
$ ping server
PING server.example.com: 56 data bytes
64 bytes from server.example.com (192.168.236.86): icmp-seq=0. time=1. ms
64 bytes from server.example.com (192.168.236.86): icmp-seq=1. time=0. ms
64 bytes from server.example.com (192.168.236.86): icmp-seq=2. time=1. ms
^C
----server.example.com PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/1
If successful, this test tells you four things:
If this test isn't successful, one of several things can be wrong with the network:
If this command works from the server, repeat it from the client.
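The four ping tests in this section run in a fixed order, and each one only means something if the previous one passed. As a rough sketch, the sequence could be scripted as follows. The function name ping_fault_tree and the PING override variable are illustrative assumptions, and the script assumes a ping that accepts -c, as on Linux.

```shell
# Hypothetical helper: run the four ping tests in order and stop at the
# first failure. Arguments: server name, server IP address.
# PING may be set to a different command (for example, for a dry run);
# it defaults to the system ping, assumed here to accept -c COUNT.
ping_fault_tree() {
    server_name=$1
    server_ip=$2
    for target in 127.0.0.1 localhost "$server_ip" "$server_name"; do
        if ${PING:-ping} -c 3 "$target" >/dev/null 2>&1; then
            echo "ok: $target"
        else
            echo "FAILED: $target"
            return 1
        fi
    done
}

# Example use, with the addresses from this chapter:
#   ping_fault_tree server 192.168.236.86
```

The first FAILED line tells you which subsection to return to: 127.0.0.1 for the networking software, localhost for local name services, the IP address for the networking hardware, and the server name for name resolution and connectivity.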
12.4.3. Troubleshooting Server Daemons
Once you've confirmed that basic networking is working properly, the next step is to make sure that the daemons are running on the server. This determination takes three separate tests, because no single one of the following tests can decisively prove that everything is functioning properly.
To be sure that the daemons are running, you need to find out whether they:
12.4.3.1. Tracking daemon startup
First, check the Samba logs. If you've started the daemons, the message smbd version release started should appear. If it doesn't, you need to restart the Samba daemons.
If the daemon reports that it has indeed started, look out for bind failed on port XXX socket_addr=0 (Address already in use). This means another daemon has been started on port 139 or 445 (smbd). Also, nmbd will report a similar failure if it cannot bind to port 137 or 138. Either you've started a daemon twice, or the inetd server has tried to provide a daemon for you. If it's the latter, we'll diagnose that in a moment.
Another useful trick for locating a startup failure is to start the failing service from the command line and monitor its progress. All Samba daemons support the -i command-line option for just such a purpose. Combined with a high debug level dumping to standard output, this option should help you to locate the exact point at which startup fails. The following example illustrates the message displayed when you try to launch smbd while a previous instance is still running:
$ smbd -d 10 -i
....
ERROR: smbd is already running. File /var/run/smbd.pid exists and process id 31654 is running.
talloc report on 'null_context' (total 453 bytes in 73 blocks)
lp_talloc contains 453 bytes in 72 blocks
From here, you can check the process listing to verify whether the existing process is in fact smbd. It is possible that a previous instance of smbd has exited but not cleaned up its pid file, and that another process exists with that same pid. Use the ps command on the server with the "long" option for your system type (commonly ps ax or ps -ef), and see whether smbd and nmbd are already running. This often looks like the following:
$ ps ax | grep mbd
31654 ?  Ss  0:00 smbd -D -d3
31656 ?  Ss  0:02 nmbd -D -d3
31657 ?  S   0:00 smbd -D -d3
This example illustrates that smbd and nmbd have already started as standalone daemons (the -D option) at log level 3 (-d3).
12.4.3.2. Looking for daemons bound to ports
Both smbd and nmbd have to register with the operating system so that they can get access to the necessary TCP/IP ports. The netstat command will tell you if this has been done. Run the command netstat -a on the server, and look for lines mentioning netbios-ns (137/udp), netbios-dgm (138/udp), netbios-ssn (139/tcp), or microsoft-ds (445/tcp). With the -n option, as in the following example, the ports are shown by number instead of by service name:
$ netstat -an | egrep ':(137|138|139|445)'
tcp  0  0  *:139  *:*  LISTEN
tcp  0  0  *:445  *:*  LISTEN
udp  0  0  *:137  *:*
udp  0  0  *:138  *:*
Although you may see additional lines listed, there should be at least two UDP lines, one for the NetBIOS name service port (137) and one for the NetBIOS datagram service (138), indicating that the nmbd server is registered and (we hope) is waiting to answer requests. There should also be at least one TCP line for each of the values of the smb ports parameter in smb.conf. The default value includes both ports 139 and 445, so frequently you will see a TCP line for each one. Additionally, these ports should be in the LISTEN state. This means that smbd is up and waiting to accept connections.
There might be other TCP lines indicating connections from smbd to clients, one for each client. These are usually in the ESTABLISHED state. If there are smbd lines in the ESTABLISHED state, smbd is definitely running. If there is only one line in the LISTEN state, you can't be sure yet. If both of the lines are missing, a daemon has not succeeded in starting, so it's time to check the logs, and then go back to Chapter 2.
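The checks described above (two UDP lines, plus a TCP line in the LISTEN state for each smbd port) can be expressed as a small filter. This is only a sketch. samba_ports_status is a hypothetical name, and it assumes the Linux netstat -an column layout shown above, with the local address in the fourth column and the state in the sixth, and the default smb ports value of 139 and 445.

```shell
# Hypothetical helper: read `netstat -an` output on stdin and report, for
# each port Samba normally uses, whether a matching local socket exists.
# Assumes Linux-style columns: proto, recv-q, send-q, local, foreign, state.
samba_ports_status() {
    awk '
        $1 ~ /^udp/ && $4 ~ /[:.]137$/ { seen["udp/137"] = 1 }
        $1 ~ /^udp/ && $4 ~ /[:.]138$/ { seen["udp/138"] = 1 }
        $1 ~ /^tcp/ && $4 ~ /[:.]139$/ && $6 == "LISTEN" { seen["tcp/139"] = 1 }
        $1 ~ /^tcp/ && $4 ~ /[:.]445$/ && $6 == "LISTEN" { seen["tcp/445"] = 1 }
        END {
            n = split("udp/137 udp/138 tcp/139 tcp/445", want, " ")
            for (i = 1; i <= n; i++)
                print want[i], ((want[i] in seen) ? "bound" : "MISSING")
        }'
}

# Example use:
#   netstat -an | samba_ports_status
```

Matching on the local address column keeps an outgoing connection to some other host's port 139 from being mistaken for a local listener. If you have changed smb ports in smb.conf, a MISSING line for the port you removed is expected.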
If there is a line for each client, it might be coming either from a Samba daemon or from the meta-daemon, inetd. It's quite possible that your inetd startup file contains lines that start Samba daemons without your realizing it; for instance, although such behavior is becoming increasingly rare, the lines might have been placed there if you installed Samba as part of a Linux distribution. The daemons started by inetd prevent ours from running. This problem typically produces log messages such as bind failed on port XXX socket_addr=0 (Address already in use).
Check your /etc/inetd.conf file or /etc/xinetd.d/ directory; unless you're intentionally starting the daemons from there, any servers bound to the netbios-* or microsoft-ds ports should be disabled. Refer to Chapter 2 for details concerning Samba and inetd.
12.4.3.3. Checking smbd with telnet
The easiest way to test that the smbd server is actually working is to send it a meaningless message and see if it is rejected. Try something such as the following:
$ echo "hello" | telnet localhost 139
Trying 192.168.236.86 ...
Connected to server.
Escape character is '^]'.
Connection closed by foreign host.
This command sends an erroneous but harmless message to smbd. If you get a Connected message followed by a Connection closed message, the test was a success. You have an smbd daemon listening on the port and rejecting improper connection messages. On the other hand, if you get telnet: connect: Connection refused, most likely no daemon is present. A less likely explanation is that you have attempted to connect to the wrong port. Remember that the ports used by smbd are controlled by the smb ports option. Make sure you use one of these ports. If all else fails, check the logs and go back to Chapter 2.
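If you run this probe often, the interpretation rules above can be written down as a small classifier. This is a sketch under assumptions: classify_smbd_probe is a hypothetical name, and the message strings it matches are the ones quoted in this section, so other telnet implementations may word them differently.

```shell
# Hypothetical helper: read a telnet transcript on stdin and report what
# it implies about smbd, using the messages quoted in this section.
classify_smbd_probe() {
    transcript=$(cat)
    case $transcript in
        *"Connection refused"*)
            echo "refused: most likely no daemon on that port" ;;
        *Connected*"Connection closed"*)
            echo "ok: a daemon accepted the connection and rejected the message" ;;
        *)
            echo "inconclusive: check the logs" ;;
    esac
}

# Example use (2>&1 because telnet may print some messages on stderr):
#   echo "hello" | telnet localhost 139 2>&1 | classify_smbd_probe
```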
Regrettably, there isn't an easy test for nmbd. If the telnet test and the netstat test both say that an smbd is running, there is a good chance that netstat will also be correct about nmbd running. nmbd is tested further later in this chapter when we troubleshoot network browsing problems.
12.4.3.4. Testing daemons with testparm
Once you know there's a daemon, you should always run testparm, in hopes of getting something such as the following:
$ testparm
Load smb config files from /usr/local/samba/lib/smb.conf
Processing section "[homes]"
Processing section "[printers]"
...
Processing section "[tmp]"
Loaded services file OK.
...
The testparm program normally reports the processing of a series of sections and responds with Loaded services file OK if it succeeds. If there is something wrong with the file, testparm reports one or more of the following messages, which also appear in the logs as noted:
After the first testparm test, repeat it with (exactly) three parameters: the name of your smb.conf file, the name of your client, and its IP address:
$ testparm /usr/local/samba/lib/smb.conf client 192.168.236.10
This command runs one more test that checks the hostname and address against the hosts allow and hosts deny options and might produce the Allow connection from hostname to service and/or Deny connection from hostname to service messages for the client system. A Deny connection message indicates that the hosts allow and/or hosts deny options in your smb.conf prohibit access to that service from the client system.
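When smb.conf defines many shares, it can help to pull out just the services that the client is denied. The filter below is a sketch. denied_services is a hypothetical name, and it assumes that each denial is printed on its own line beginning with "Deny connection from" and ending with "to" followed by the service name, as in the message quoted above.

```shell
# Hypothetical helper: read testparm output on stdin and print the name of
# each service for which a "Deny connection from ... to SERVICE" line appears.
# Assumes one message per line, with the service name last.
denied_services() {
    sed -n 's/^Deny connection from .* to \(.*\)$/\1/p'
}

# Example use, with the file and client from this section:
#   testparm /usr/local/samba/lib/smb.conf client 192.168.236.10 2>&1 | denied_services
```

An empty result means that no Deny connection lines were printed for that client.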
12.4.4. Troubleshooting SMB Connections
Now that you know the servers are up, you need to make sure that they're running properly. Start by placing a simple smb.conf file in the /usr/local/samba/lib directory.
12.4.4.1. A minimal smb.conf file
In the following tests, we assume that you have a [temp] share suitable for testing, plus at least one valid user account (we'll use one named rose). An smb.conf file that includes just these is as follows:
[global]
    workgroup = EXAMPLE
    security = user
[homes]
    read only = no
[temp]
    path = /data/tmp
    read only = no
12.4.4.2. Testing locally with smbclient
The first test ensures that the server can list its own services (shares). Run the command smbclient -L localhost -N to anonymously connect to the server from itself. You should see the following:
$ smbclient -L localhost -N
Anonymous login successful
Domain=[EXAMPLE] OS=[Unix] Server=[Samba 3.0.22]

Sharename   Type    Comment
---------   -----   ----------
temp        Disk
homes       Disk
IPC$        IPC     IPC Service (Samba 3.0.22)
...
If you received this output or something similar, move on to the next section, "Testing connections with smbclient." On the other hand, if you receive an error, check the following:
12.4.4.3. Testing connections with smbclient
Run the command smbclient //server/temp -U rose to connect to the server's [temp] share and to see if you can connect to a file service. We assume that a valid account for the user named rose has already been created. You should get the following response:
$ smbclient //server/temp -U rose
Password: <enter password>
Domain=[EXAMPLE] OS=[Unix] Server=[Samba 3.0.22]
smb: \> quit
If you get Get_Hostbyname: Unknown host name or Connect error: Connection refused, see the previous section, "Testing locally with smbclient," for the possible diagnoses.
Now, at the Password: prompt, provide the password for the account given as the -U argument value. If you then get an smb: \> prompt, the connection works. Enter quit and continue on to the next section, "Testing connections with net use."
A response of NT_STATUS_LOGON_FAILURE indicates either that you are using an invalid account name or that the password you used didn't match the credentials for the account. It is a good idea to verify that the account exists by running pdbedit --verbose rose.
An error message referring to NT_STATUS_BAD_NETWORK_NAME can be caused by any one of the following:
Once you have connected to [temp] successfully, repeat the test, this time logging in to your home directory (e.g., connect to the network path //server/rose). If you have to change anything to get that to work, retest [temp] again afterward.
12.4.4.4. Testing connections with net use
Run the following command on the Windows client to see whether it can connect to the server:
C:\> net use * \\server\temp /user:rose
You should be prompted for a password whether or not the password for rose on the Samba server is different from the one you used to log on to the Windows console. Once the correct password has been transmitted, you should see the response:
The command was completed successfully.
If that worked, congratulations! You have completed all of these tests successfully, and your server should be ready to accept connections from users. Otherwise: