Performing Solaris Initialization and Shutdown | Solaris 9 Sun Certified System Administrator Study Guide

Earlier in this chapter, you learned about the Solaris boot process. Now is a good time to review that four-phase SPARC boot sequence and the files involved:

Boot PROM
Boot programs (bootblk, ufsboot)
Kernel initialization (ufsboot, /sbin/init, /etc/inittab)
Init (/sbin/init, /sbin/rc*)

After the boot PROM is done with its initial boot sequence, it finds and loads the bootblk file. The bootblk file then loads ufsboot, which finds and begins to load the kernel. Although bootblk and ufsboot are critical to the Solaris boot process, they're not very exciting to talk about. Because we've already discussed the boot PROM in some depth, and bootblk and ufsboot don't merit a great deal of dialogue, it's time to take a look at what happens on the computer when loading Solaris.

System Initialization

If your SPARC system is configured to auto-boot (auto-boot? = true in OpenBoot), you don't need to worry about starting the /sbin/init process (also known just as the init process). It will happen as the system boots. However, if your system doesn't auto-boot, you'll be sitting at an ok prompt waiting for something to happen.

Instead of sitting there waiting (computers can wait a long time), type boot to get the computer to start Solaris. There are various command-line arguments you can use with boot, and some are shown in Table 3.5.

Table 3.5: Common *boot* Command Options
Option	Description
-a	Performs an interactive boot
-r	Performs a reconfiguration boot
-s	Boots into run level S (regardless of what's in the /etc/inittab file)
-v	Boots and displays messages in a verbose mode
net	Boots from the network instead of the local disk
cdrom	Boots from the local CD-ROM or DVD-ROM
floppy	Boots from the floppy disk drive

Command options can be combined. If, for example, you wanted to boot from the CD-ROM and also wanted to boot in verbose mode, you could type boot cdrom -v (the device name comes before any options). After you issue the boot command, Solaris will begin to load.

The first process the kernel starts is called the swapper. Swapper has the official name of sched and a process ID of 0. The swapper's responsibility is to schedule all other processes, so it makes sense that it needs to be started first.

After the swapper is running, the kernel starts /sbin/init, which has a process ID of 1. Because /sbin/init is directly responsible for starting all other processes when Solaris boots, it's known as a parent process. Spawned processes are called children or child processes. One of the first things that /sbin/init does is to look at the /etc/inittab file to see which other processes need to be started and what order they should be started in. Here is an example /etc/inittab file:

 ap::sysinit:/sbin/autopush -f /etc/iu.ap
 ap::sysinit:/sbin/soconfig -f /etc/sock2path
 fs::sysinit:/sbin/rcS sysinit >/dev/msglog 2<>/dev/msglog </dev/console
 is:3:initdefault:
 p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog
 sS:s:wait:/sbin/rcS          >/dev/msglog 2<>/dev/msglog </dev/console
 s0:0:wait:/sbin/rc0          >/dev/msglog 2<>/dev/msglog </dev/console
 s1:1:respawn:/sbin/rc1       >/dev/msglog 2<>/dev/msglog </dev/console
 s2:23:wait:/sbin/rc2         >/dev/msglog 2<>/dev/msglog </dev/console
 s3:3:wait:/sbin/rc3          >/dev/msglog 2<>/dev/msglog </dev/console
 s5:5:wait:/sbin/rc5          >/dev/msglog 2<>/dev/msglog </dev/console
 s6:6:wait:/sbin/rc6          >/dev/msglog 2<>/dev/msglog </dev/console
 fw:0:wait:/sbin/uadmin 2 0   >/dev/msglog 2<>/dev/msglog </dev/console
 of:5:wait:/sbin/uadmin 2 6   >/dev/msglog 2<>/dev/msglog </dev/console
 rb:6:wait:/sbin/uadmin 2 1   >/dev/msglog 2<>/dev/msglog </dev/console
 sc:234:respawn:/usr/lib/saf/sac -t 300
 co:234:respawn:/usr/lib/saf/ttymon -g -h -p "`uname -n` console login: " -T sun -d /dev/console -l console -m ldterm,ttcompat

Each line in the inittab file is a separate entry, and each entry in the /etc/inittab file contains the following fields:

 id:runlevel:action:process

Each entry starts with a one- to four-character identifier (id), which is just a name for the particular entry. The runlevel field specifies which run levels this entry applies to. The action field describes how the process listed in the process field is supposed to be executed. Possible values include initdefault, sysinit, boot, bootwait, wait, powerfail, and respawn. The last field, process, identifies a program or script to execute. Table 3.6 describes the action field values used in the default /etc/inittab file.

Table 3.6: *action* Field Values
Value	Description
initdefault	Defines the default run level for this computer.
powerfail	This process is executed when init receives a power fail signal.
respawn	If the process associated with this entry fails, restart it immediately.
sysinit	These actions are executed before the login prompt is displayed. Each entry using sysinit must be completed before the next entry can be processed.
wait	The process associated with this entry must be completed before another process linked to the same run level can be started.
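
The four-field layout is easier to see when an entry is split programmatically. The following Python sketch is only an illustration and is not part of Solaris; the names InittabEntry and parse_inittab_entry are assumptions made for this example. It splits on the first three colons only, because the process field can itself contain colons (the ttymon entry's console login: prompt is one case).

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    """One /etc/inittab entry, in the order id:runlevel:action:process."""
    id: str
    runlevel: str
    action: str
    process: str


def parse_inittab_entry(line: str) -> InittabEntry:
    """Split one inittab line into its four fields.

    Only the first three colons are separators; anything after the
    third colon belongs to the process field.
    """
    fields = line.strip().split(":", 3)
    if len(fields) != 4:
        raise ValueError(f"not an inittab entry: {line!r}")
    return InittabEntry(*fields)


if __name__ == "__main__":
    entry = parse_inittab_entry(
        "s3:3:wait:/sbin/rc3 >/dev/msglog 2<>/dev/msglog </dev/console"
    )
    print(entry.runlevel, entry.action)  # prints "3 wait"
```

An entry such as is:3:initdefault: parses with an empty process field, which matches the way initdefault is used: it names a run level rather than a program to run.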

Tip

On the exam, you might be expected to read parts of an /etc/inittab file and to determine what actions Solaris will take based on the information presented.

Based on the /etc/inittab file presented earlier, the default run level for Solaris 9 is run level 3. This is defined by the fourth entry, which has initdefault in the action field. If you're not clear about what a run level is, don't worry just yet, because we'll cover that in the next section. For now, just accept run level 3 as the default.

If the system boots into run level 3, entries in the /etc/inittab file that have a 3 in the runlevel field will be processed. Those lines are as follows:

 ap::sysinit:/sbin/autopush -f /etc/iu.ap
 ap::sysinit:/sbin/soconfig -f /etc/sock2path
 fs::sysinit:/sbin/rcS sysinit >/dev/msglog 2<>/dev/msglog </dev/console
 is:3:initdefault:
 p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog
 s2:23:wait:/sbin/rc2         >/dev/msglog 2<>/dev/msglog </dev/console
 s3:3:wait:/sbin/rc3          >/dev/msglog 2<>/dev/msglog </dev/console
 sc:234:respawn:/usr/lib/saf/sac -t 300
 co:234:respawn:/usr/lib/saf/ttymon -g -h -p "`uname -n` console login: " -T sun -d /dev/console -l console -m ldterm,ttcompat

You might take a look at these lines and say, "Wait a minute. The first three lines don't have a 3 anywhere in them!" That's true, but remember that because they have sysinit in the action field, they will be processed regardless of the run level. The fourth line defines the default run level (initdefault) as 3.

The fifth entry, p3, tells Solaris to run the shutdown process (/usr/sbin/shutdown) if a power fail signal is detected. This line also applies to run levels S, 1, 2, and 4. Line 6, s2, says to run the /sbin/rc2 script, and line 7 says to run the /sbin/rc3 script. More on rc scripts a few sections from now.

Lines 8 and 9 both start processes that need to be restarted if they ever fail. The /usr/lib/saf/sac process starts the port monitors, and the /usr/lib/saf/ttymon process monitors the console for login requests.
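
The selection rule just described can be sketched in Python. This is a simplified model of the explanation above, not the logic of the real init program: the function name entries_for_run_level is hypothetical, and the sample text is an abbreviated inittab with the redirections trimmed. An entry is kept if its action is sysinit or if the run level appears in its runlevel field.

```python
from typing import List

# Abbreviated sample based on the inittab shown earlier (redirections trimmed).
SAMPLE_INITTAB = """\
ap::sysinit:/sbin/autopush -f /etc/iu.ap
is:3:initdefault:
p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0
sS:s:wait:/sbin/rcS
s0:0:wait:/sbin/rc0
s2:23:wait:/sbin/rc2
s3:3:wait:/sbin/rc3
sc:234:respawn:/usr/lib/saf/sac -t 300
"""


def entries_for_run_level(inittab_text: str, level: str) -> List[str]:
    """Return the ids of entries that apply to the given run level.

    Following the description in the text, an entry is kept when its
    action is sysinit (processed regardless of run level) or when the
    run level appears in its runlevel field.
    """
    selected = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entry_id, runlevels, action, _process = line.split(":", 3)
        if action == "sysinit" or level in runlevels:
            selected.append(entry_id)
    return selected


if __name__ == "__main__":
    print(entries_for_run_level(SAMPLE_INITTAB, "3"))
    # prints "['ap', 'is', 'p3', 's2', 's3', 'sc']"
```

Note that the p3 entry is selected for run level 3 but only acts when a power fail signal arrives, so being selected is not the same as being executed at boot.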

Run Levels

A run level defines the operation of Solaris, including what resources and services are available to users. You will also hear run levels called run states or init states. Although Solaris 9 has eight run levels, a system can be in only one run level at a time. Run levels and their associated scripts are not platform dependent. They will be the same whether you're running SPARC or IA. Table 3.7 describes the eight run levels and their features.

Table 3.7: The Eight Solaris Run Levels
Run Level	Description	Purpose
0	Stops all services, terminates all processes, and unmounts all file systems.	To shut down Solaris and return the machine to the Forth Monitor. Preparation for system shutdown.
S or s	Single-user state. Only root is allowed to log in, and all logged-in users are logged out.	For performing administrative tasks that require all users to be logged out (such as system backups).
1	Single-user state. Logged-in users are allowed to remain logged in.	For performing administrative tasks, and not allowing new users to log in.
2	Multiuser state. All file systems are mounted, but NFS is not started.	For normal operations when the sharing of file systems across a network is not required.
3	Multiuser state with NFS. All file systems are mounted, and the NFS daemon is started.	For normal operations that require resources to be shared across a network.
4	Alternative multiuser state.	Not currently used.
5	Power-down state.	To shut down the computer. On supported systems, run level 5 will automatically power down the computer. If not, the Forth Monitor will be presented.
6	Reboot.	To reboot the machine into the default run level.

Note

Although init state 4 is not used, Sun still considers Solaris to have eight run levels.

Run states can be changed with the init command. The init command reads the /etc/inittab file and processes whatever scripts are needed to change to the run level you indicated. For example, to change to run state S, you could issue the command init s. Reading the /etc/inittab file tells the init process to run the /sbin/rcS script.

To determine which run state your computer is in, issue the who -r command. It will tell you which run state you are in and when the system was last booted into this state.

Run Control Scripts

There's one more piece to the puzzle of understanding the boot process and init states. You already know that the init process reads the /etc/inittab file whenever the system is booted or a run level is changed by way of the init command. In the example we used earlier, changing the system to run level 3 forces the execution of the /sbin/rc2 and /sbin/rc3 scripts. In fact, each of the run levels has an associated run control (rc) script located in the /sbin directory. They are named rcS and rc0 through rc6.

For each script in the /sbin directory, there exists a related directory named /etc/rc#.d that contains the scripts that bring about the desired run level's characteristics. For example, if you issue the init 1 command, you execute the /sbin/rc1 script, which in turn executes all scripts located in the /etc/rc1.d directory (of which there are 40). Those scripts are designed to keep the system running, but not allow any new users to log in.

Note

The exceptions are the /sbin/rc5 and /sbin/rc6 scripts, which execute the /etc/rc0.d/K* scripts to kill all active processes and unmount all file systems.

Each /etc/rc#.d directory contains a number of scripts designed to either stop or start services and daemons. Script names start with either an S or a K, followed by two digits and a name. Examples are S88sendmail and K21dhcp. S scripts are designed to start services, and K scripts kill services. The number after the S or K indicates the order in which to execute the script. All kill scripts are executed before start scripts, and lower-numbered scripts will be executed before higher-numbered scripts.
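
The ordering rule can be illustrated with a short Python sketch. It models only the naming convention described above and is not how the rc scripts themselves are implemented; the function name execution_order is hypothetical, and S01example and K05example are made-up script names used alongside the two examples from the text.

```python
from typing import List


def execution_order(script_names: List[str]) -> List[str]:
    """Order rc script names the way the text describes.

    K (kill) scripts come before S (start) scripts. Within each group a
    plain string sort puts lower two-digit numbers first. Names that
    start with neither K nor S are left out.
    """
    kill = sorted(n for n in script_names if n.startswith("K"))
    start = sorted(n for n in script_names if n.startswith("S"))
    return kill + start


if __name__ == "__main__":
    names = ["S88sendmail", "K21dhcp", "S01example", "K05example"]
    print(execution_order(names))
    # prints "['K05example', 'K21dhcp', 'S01example', 'S88sendmail']"
```

The two-digit numbering is what makes a simple string sort sufficient: 05 sorts before 21 as text, whereas an unpadded 5 would not.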

The combination of scripts in a directory is really the key to setting up a run level. As you might expect, the /etc/rc2.d and /etc/rc3.d directories contain a lot more S scripts than K scripts, whereas the /etc/rc0.d directory has nothing but K scripts by default.

If you want to stop or start an individual service without changing the run level, scripts in the /etc/init.d directory can be executed. For example, if you wanted to change the running status of the NFS server daemon without leaving run level 3, you could run the /etc/init.d/nfs.server script with either the stop or start argument:

 # /etc/init.d/nfs.server stop

If there is no script that stops or starts a service that you require, you can create your own, give it an appropriate name, and place it in the directory corresponding to the run level in which you want the script to execute. To prevent a script from running, disable it by renaming it. It must not start with a K or S, or the script will be run. A good recommendation is to rename it with a preceding underscore. The script S90samba would become _S90samba. This way, if you ever want to re-enable the script, it is easy to find and rename appropriately.
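
The renaming convention can be sketched as a small Python helper. This is an illustrative assumption rather than a Solaris tool: disable_script is a hypothetical name, and the demonstration runs in a temporary directory instead of a real /etc/rc#.d directory.

```python
import os
import tempfile


def disable_script(rc_dir: str, name: str) -> str:
    """Rename an rc script so it starts with an underscore; return the new path."""
    if not name.startswith(("K", "S")):
        raise ValueError(f"{name!r} is not an active K or S script")
    new_path = os.path.join(rc_dir, "_" + name)
    os.rename(os.path.join(rc_dir, name), new_path)
    return new_path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real /etc/rc#.d.
    with tempfile.TemporaryDirectory() as rc_dir:
        open(os.path.join(rc_dir, "S90samba"), "w").close()
        disable_script(rc_dir, "S90samba")
        print(os.listdir(rc_dir))  # prints "['_S90samba']"
```

Keeping the original name intact after the underscore is what makes the change easy to reverse: removing the first character restores the script.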

System Shutdown

Solaris 9 is designed to run continuously for long periods of time. It's conceivable to run a Solaris machine for a year or longer without needing to power down or reboot. However, at times you might need to shut down the system. Perhaps you want to upgrade the memory in your Solaris server. Or the power went out in your building and your backup power supply has only about 10 minutes left. Or you have made changes to kernel parameters in the /etc/system file and need to reboot. These are all times that you need to either power cycle or kill the power completely, and it's best to shut down Solaris gracefully.

The two preferred commands to shut down Solaris are init and shutdown. To execute either command, you must be logged in as the superuser or have an equivalent role. Both init and shutdown perform a clean shutdown of Solaris, meaning they run all the necessary kill scripts to shut down services and daemons in an orderly manner.

Note

The reboot and halt commands can also be used to stop Solaris, but they are not recommended. Neither reboot nor halt performs a clean shutdown, and using these commands can result in data loss or corrupt files.

Shutting Down a Server

The preferred command to bring down a server properly is the shutdown command. By default, this command takes the system to init state S. Before it shuts down the system (remember, all users are logged off when the system is changed to run state S), the shutdown command can send a message to all logged-in users and wait a specified amount of time before shutting down.

The shutdown command uses the following syntax:

 # shutdown -i init_state -g grace_period -y "message"

All the arguments are optional. If you execute shutdown with no arguments, the system will wait 60 seconds, ask you to confirm the shutdown, and then change to run state S. To specify an alternate run state or grace period, use the -i or -g arguments, respectively. The grace period is specified in seconds. The -y argument will execute the shutdown command after the grace period without prompting you. The -y switch is not required to supply a shutdown message.
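
To make the optional arguments concrete, here is a hedged Python sketch that assembles a shutdown argument list in the order shown in the syntax line above. The function build_shutdown_command and its parameter names are assumptions for this example; the sketch only builds the list and does not run anything.

```python
from typing import List, Optional


def build_shutdown_command(
    init_state: Optional[str] = None,
    grace_seconds: Optional[int] = None,
    assume_yes: bool = False,
    message: Optional[str] = None,
) -> List[str]:
    """Assemble a shutdown argument list; omitted options use shutdown's defaults."""
    argv = ["shutdown"]
    if init_state is not None:
        argv += ["-i", init_state]
    if grace_seconds is not None:
        if grace_seconds < 0:
            raise ValueError("grace period must not be negative")
        argv += ["-g", str(grace_seconds)]
    if assume_yes:
        argv.append("-y")
    if message:
        argv.append(message)
    return argv


if __name__ == "__main__":
    # Take the system to run level 0 after five minutes, without prompting.
    print(build_shutdown_command("0", 300, True, "Log out now!"))
```

Calling the function with no arguments returns just the command name, which corresponds to the default behavior described above.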

Tip

If you have made changes to the /etc/inittab file and want it to be reread without rebooting, issue the init q command.

Before shutting down a server, it's recommended that you see who is logged in first. To do this, use the who command. You will see something like this:

 # who
 root      console     Aug  4 09:27
 ramini    pts/0       Aug 31 13:02     (badpun)
 mgantz    pts/1       Mar 17 05:44     (macuser)

This output shows three users logged in. The superuser is logged in locally, whereas the other two users are logged in remotely. You can see when each user logged in and the name of the computer that each user is logging in from. Although the shutdown command does warn users, you might want to send the users an e-mail or warn them personally for good measure.
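
If you need to pick out the remote users from a listing like this, a short Python sketch shows the idea. It assumes the exact column layout of the sample above, where a remote login ends with the host name in parentheses; the names Login and parse_who are hypothetical, and real who output can vary.

```python
from typing import List, NamedTuple, Optional

# The sample listing from the text.
WHO_OUTPUT = """\
root      console     Aug  4 09:27
ramini    pts/0       Aug 31 13:02     (badpun)
mgantz    pts/1       Mar 17 05:44     (macuser)
"""


class Login(NamedTuple):
    user: str
    line: str
    remote_host: Optional[str]  # None for a local login


def parse_who(output: str) -> List[Login]:
    """Parse who-style rows; a trailing "(host)" field marks a remote login."""
    logins = []
    for row in output.splitlines():
        fields = row.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        last = fields[-1]
        is_remote = last.startswith("(") and last.endswith(")")
        host = last[1:-1] if is_remote else None
        logins.append(Login(fields[0], fields[1], host))
    return logins


if __name__ == "__main__":
    remote = [entry.user for entry in parse_who(WHO_OUTPUT) if entry.remote_host]
    print(remote)  # prints "['ramini', 'mgantz']"
```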

Shutting Down a Workstation

Generally speaking, only one user is logged into a workstation at a time. Warnings and grace periods don't need to be issued, so you can use init to shut down the system properly. The init command requires a run level to be specified as an argument. To change a system to run level 0, type init 0.

Tip

The poweroff command can also be used and is the equivalent of init 5.

Using the init 0 command will return a SPARC computer to the OpenBoot ok prompt. If you're using an IA computer, you will see the message Type any key to continue. In either case, it's safe to power off the computer. If you want to reboot, type boot at the ok prompt, or press any key on the IA computer to reboot.

Although the reboot and halt commands are not recommended for general use, they do have a purpose. Neither command takes as long to execute as init or shutdown, because neither runs the kill scripts. So if you're desperate and in a hurry to save a few seconds, feel free to use reboot or halt. Also, reboot and halt might work if either init or shutdown fails.

If none of the shutdown commands work, it's possible that the system is hung. You have a few options when this happens. One is to use a Stop key sequence, such as Stop+A or L1+A, depending on your keyboard type (or the Break key if you're logged in through a terminal). Another option is to simply power the machine down. Neither option is very attractive, but they're about all you have left if nothing else works.

Real World Scenario: The Pitfalls of Shutting Down a Server

Your Solaris server is in need of a memory upgrade. You decide that next Tuesday at 8 P.M. you are going to take the server down, install the memory, and perform some routine cleaning. That late at night will be a good time to perform maintenance because everyone in the office is usually gone by then. Just to be safe and to warn everyone far in advance, you send out a company-wide e-mail to alert users of the scheduled downtime.

Tuesday arrives, and 8:00 eventually does as well. You execute a shutdown -i 0 -y "Log out now!" command to bring down the server. No more than two minutes later, your phone starts ringing off the hook. Four angry managers want to know where their unsaved files are. What happened?

You did use the shutdown command, which was a good choice, but a few things could have made this scenario turn out better. First, you could have used a longer grace period; 60 seconds is not a lot of time for users to save files and log off. Besides, what if someone is away from his or her desk?

Second, you forgot the almighty rule when scheduling and performing server maintenance: some users will always, always forget about the e-mail they received (not to mention ignore the warning) and then be mad when the server does go down. To alleviate this issue, educate your users. Make sure they understand that when you say the server is coming down, you're not kidding. It would be wise for them to save their files and log off. To help them remember, you could take a brief tour of your facilities and warn users personally (although this isn't always possible).

To summarize, when bringing down a server, give the users plenty of warning, educate them about your IT department practices, and make sure to bring the server down gracefully. In the end, everyone will benefit.

Troubleshooting a Hung System

Solaris 9 is a solid operating system and rarely has serious operational problems. Occasionally, though, a process or application can hang or even cause the operating system itself to hang. If your system seems to be unresponsive, there are some steps you can take in an attempt to rectify the situation.

If your system is running in a windowed environment (such as CDE), a hung application can be terminated without affecting the rest of the system. The first thing to try is moving the mouse pointer. If it moves, then it's possible the application is frozen. If the mouse pointer does not move, it could indicate that CDE has locked up or the operating system is stuck.

If your mouse pointer works, move it over the open window that is not responding. Press Control+Q (the Control and Q keys at the same time) to attempt to unfreeze the window. (A common sequence is to use Control+Q to attempt to activate the window, followed by Control+C, and then Alt+F4.) If the window you are in is unresponsive, you can always attempt to use another window. You can also try logging into the machine remotely and using the pgrep utility to find the hung process and pkill to kill it.

On IA computers, if the keyboard and mouse are not responding, you might have to use the computer's reset button if it has one. If the reset button does not work, hold the power button in, and the computer should power itself off after about four seconds. Again, these aren't recommended ways to shut down a normally operating computer, but if it's hung, you might not have any other choice.

If the preceding actions don't help, or you are not running a windowed system, you can try the following:

1. Press Control+\ to force the running program to quit.
2. Press Control+C to interrupt the running program.
3. Log in remotely, identify the hung process, and kill it.
4. Log in remotely, become a superuser, and reboot the computer.
5. If none of the first four steps works, force a crash dump and reboot. To force a crash dump:
   1. Press Stop+A or L1+A to stop your system (or the Break key if you are connected through a terminal).
   2. Type sync at the ok prompt.
   3. Reboot the computer and log back in.
6. If nothing else works, turn the power off, wait a minute, and turn the power back on.

Using common sense in troubleshooting can go a long way, too. For example, if your keyboard is not responding (for example, the Caps Lock and/or Num Lock keys don't toggle the lights on the keyboard), then all the key sequences designed to kill applications are not likely to work. Similarly, if your keyboard and mouse are not responsive in a windowed system, it's probable that the operating system is locked, and you will need to turn the power off and back on.