2.2. Tweaking a Running Kernel: sysctl


As the BSD kernels have evolved, many of their internal data structures have become more exposed to the administrator. Modern BSD kernels can now be tuned while they run, using a program called sysctl(8) that displays and sometimes alters the value of a variable in the running kernel. Most sysctl variables control highly specialized and localized functionality in the kernel. Your server's role, and the environment in which it is deployed, largely determine which of these variables, if any, you need to set. A very active database server might need more open files per process than usual. A very active web server will often need various TCP/IP variables tuned. An Internet server that is in a hostile network will need options set that are unnecessary for an intranet server on a friendly network.

These variables might indicate the default settings in a kernel module or device driver (e.g., net.inet.ip.ttl=64). Sometimes they offer low-level tuning on things like buffer sizes (e.g., kern.ipc.maxsockbuf=262144). Some of them are read-only and dynamic to provide a snapshot in time of some kernel state that is constantly in flux (e.g., vm.loadavg=0.04 0.08 0.08). Others merely offer a convenient way to get information out of the kernel. Changing these values is a fundamental skill that will be required as part of most significant tasks discussed in this book.

sysctls Change

The names and behaviors of sysctl variables change from time to time. New versions of the operating system introduce new variables and retire old ones. In this chapter we discuss FreeBSD 5.3-RELEASE and OpenBSD 3.6. There have been significant changes in FreeBSD, for example, from the 4.x versions to the 5.x versions.

There are several places to find all the names and values for variables. The definitive source, of course, is to run sysctl -a. That will list all of the sysctl variables known to the current kernel, and it will show their current values. It does not, however, indicate which values can be changed, nor what would happen if they were changed. The sysctl(3) and sysctl(8) manpages indicate whether particular values are changeable. Some values are not changeable at runtime, but can be altered from their defaults at boot time by entries in /etc/sysctl.conf(5).
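Because sysctl -a prints hundreds of lines, it helps to filter the output down to one subtree. The following is a minimal sketch, not a standard tool: list_sysctls is a hypothetical helper name, and the sketch assumes that FreeBSD's sysctl prints "name: value" while OpenBSD's prints "name=value", so it normalizes both to the second form.

```sh
#!/bin/sh
# list_sysctls PREFIX: read `sysctl -a` style output on stdin and print
# only the variables whose names start with PREFIX, as name=value.
# (list_sysctls is a hypothetical helper, not part of either OS.)
list_sysctls() {
    prefix=$1
    # Normalize "name: value" to "name=value", then keep only the
    # lines whose variable name begins with the requested prefix.
    sed 's/^\([^:=]*\): /\1=/' |
        awk -F= -v p="$prefix" 'index($1, p) == 1'
}

# Typical use on a live system:
#   sysctl -a | list_sysctls net.inet.tcp.
```

The filter only reads standard input, so it can also be run against a saved copy of sysctl -a output from another machine.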

2.2.1. Setting sysctl Values

Reading sysctl values is easy: just run sysctl variablename. Any user can read all kernel values. Setting variables is equally easy. Just add an equals (=) sign and the new value: sysctl kern.securelevel=2. Only root can set values. Once you have a set of sysctl values that you like and want to make default, you can add them to sysctl.conf(5) in a simple variable=value format. The only values that can go in /etc/sysctl.conf, however, are those that can be set once the system is up and running multiuser. FreeBSD allows some values to be set at boot time, but not later. Those go in /boot/loader.conf. The loader(8) manpage lists which variables can only be tuned at boot time in this file.
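As an illustration of the variable=value format, a small /etc/sysctl.conf might look like the sketch below. The two settings are simply the example values quoted earlier in this section, not recommendations, and kern.ipc.maxsockbuf is a FreeBSD variable name that may not exist on other systems.

```
# /etc/sysctl.conf: one variable=value pair per line, applied at boot
net.inet.ip.ttl=64
kern.ipc.maxsockbuf=262144
```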

sysctl is a very powerful tool and it has no error checking. Tweaking the wrong variable the wrong way can send your system spiraling downward quickly. Fortunately, sysctl values are not permanent, so a simple reboot will fix a badly set sysctl value. Be sure to test them before codifying them in a file like /etc/sysctl.conf.
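One cautious way to test a value before codifying it is to record the old value at the moment you change it. The sketch below assumes a hypothetical helper named try_sysctl; the SYSCTL variable is likewise an invention of this example that lets you substitute a harmless stand-in command for a dry run.

```sh
#!/bin/sh
# Hypothetical helper: change a sysctl in the running kernel only and
# print the previous value so that it can be restored by hand.
# SYSCTL may be overridden for a dry run; it defaults to the real command.
SYSCTL=${SYSCTL:-sysctl}

try_sysctl() {
    name=$1
    new=$2
    # -n prints the value without the variable name.
    old=$("$SYSCTL" -n "$name") || return 1
    "$SYSCTL" "$name=$new" >/dev/null || return 1
    echo "$name: was $old, now $new (restore with: sysctl $name=$old)"
}

# Example (as root): try_sysctl kern.ipc.maxsockbuf 262144
```

Nothing here writes to /etc/sysctl.conf, so a reboot still discards the change if the experiment goes badly.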

2.2.2. Kernel Security Level

There are several sysctl variables that are important to overall system security. Probably the single most important sysctl variable in the entire system is kern.securelevel, usually referred to simply as the system's securelevel. Its value has diverse effects across a wide variety of functions and features.

Table 2-2 summarizes the various ways in which the kernel securelevel affects system operations. They are explained in detail in the following sections.

Table 2-2. Kernel security levels ("yes" = permitted, "-" = not permitted)
	Securelevel
System property	-1	0	1	2	3
System immutable and append-only flags can be changed	yes	yes	-	-	-
Raw disk devices for mounted file systems can be written^*	yes	yes	-	-	-
/dev/mem and /dev/kmem can be written	yes	yes	-	-	-
Kernel modules can be loaded and unloaded	yes	yes	-	-	-
Non-mounted raw disk devices can be written^*	yes	yes	yes	-	-
Filesystems can be mounted	yes	yes	yes	-	-
Time can be adjusted more than one second forward or back	yes	yes	yes	-	-
IP filtering and firewall rules can be changed	yes	yes	yes	yes	-

^* Raw disk devices only exist in OpenBSD.

Securelevel 2 in OpenBSD is equivalent to 3 in FreeBSD.

Filesystems can be mounted under OpenBSD and FreeBSD 4.x, but not under FreeBSD 5.x.

Root can raise the securelevel at any time, but it can never be lowered. You must change your configuration file and reboot to run at a lower level.
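The one-way nature of the securelevel can be sketched as a small wrapper that checks the current level before asking the kernel to change it. The function name raise_securelevel and the overridable SYSCTL variable are hypothetical, and the sketch deliberately ignores the single-user special case of dropping from 0 to -1.

```sh
#!/bin/sh
# Hypothetical helper: raise kern.securelevel, refusing any request
# that would not raise it (the kernel would reject a lowering anyway).
SYSCTL=${SYSCTL:-sysctl}

raise_securelevel() {
    want=$1
    have=$("$SYSCTL" -n kern.securelevel) || return 1
    if [ "$want" -le "$have" ]; then
        echo "securelevel is $have; refusing to set $want (lowering requires a reboot)" >&2
        return 1
    fi
    "$SYSCTL" kern.securelevel="$want" >/dev/null || return 1
    echo "securelevel raised from $have to $want"
}

# Example (as root): raise_securelevel 2
```

Checking first costs one extra sysctl call but produces a clearer message than the kernel's bare permission error.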

2.2.2.1 Level -1: "permanently insecure"

If the system finds itself in securelevel -1 at the end of the boot process, it will not raise the securelevel. Thus, setting your default level to -1 is how you get the system to stay insecure at boot time. At various times in later chapters we will recommend that you "reboot to a lower securelevel" in order to accomplish something that can only be done that way. Setting your default securelevel to -1 is how you'd do that.

In FreeBSD, you may also specify kern_securelevel_enable="NO" in /etc/rc.conf to boot into securelevel -1.

At kernel securelevels 0 and -1 the operating system behaves like any traditional Unix: root is supreme and can do all things. All filesystem flags are enforced but can be both set and unset. All devices can be read and written to as their permissions indicate. Firewall rules can be set and unset at will, and the system clock can be set arbitrarily.

Production systems that have any sort of security requirement should normally run with a securelevel higher than 0. It's worth noting that even if you immediately promote your system to securelevel 1, you'll still be able to configure everything without difficulty. It's only after flags have been set on something you want to change that the securelevel may interfere. You can install new packages and set flags on the newly installed files without worrying about the securelevel.

2.2.2.2 Level 0: transitional security level

There is no operational difference between securelevel 0 and securelevel -1. The system only runs in level 0 briefly during boot time and when it is in single-user mode. Generally systems boot at level 0 and then switch to a higher level after booting, or they boot at -1 and stay there.

Securelevel 0 is unusual in that you can lower the securelevel back to -1 if you want to. If your OpenBSD system normally runs at securelevel 1 or 2 and you want to boot to a lower level just once, you can do so without modifying your configuration file. Boot to single-user mode and you will find your system in securelevel 0. Set the securelevel to -1 by running sysctl kern.securelevel=-1. When the system then continues booting into multiuser mode, it will not raise the securelevel to its default.

2.2.2.3 Level 1: improved operational security

At level 1, the kernel imposes stricter security constraints than a traditional Unix system. Write access to raw disk devices (which exist on OpenBSD and FreeBSD 4.x) is denied, even to root-equivalent processes, if the raw device corresponds to a currently mounted filesystem. On FreeBSD, though raw devices don't exist, similar constraints apply. You cannot write to the disk device of the mounted filesystem.

In traditional Unices there are two styles of devices: "raw" and "cooked." The raw devices do not use buffering in the kernel, but instead perform their I/O directly to the device. The kernel mediates access to cooked devices. It might prefetch more data than a program asks for and store the extra in a buffer, or it might buffer data that a program writes until it has enough to make a call to the real device. Raw devices are used by programs like fsck(8), dump(8), mount(8), and newfs(8) to read and write data directly off of disks.

The kernel also enforces the system append-only and system immutable flags (see Section 2.1, earlier in the chapter). The /dev/mem and /dev/kmem devices cannot be opened for writing, which helps protect against rogue processes writing into other processes' memory.

Kernel modules cannot be loaded or unloaded in any level greater than zero, and this might occasionally interfere with your maintenance. The BSD kernels have become increasingly modular in recent years, which helps reduce the amount of RAM the kernel uses. They don't load device drivers for devices you don't have. For example, most dedicated servers rarely mount CD-ROMs. The ISO-9660 filesystem driver, therefore, is not built into the kernel. It is available instead as a kernel module. The first time you try to use a CD-ROM on a system running at securelevel 1 or higher, the mount(8) command will automatically try to load the kernel module and fail. If the ISO-9660 driver is something you need often, you will have to either add a command to the boot process to load the driver at boot time, or compile the driver into the kernel. For dedicated servers, this is rarely an issue after the server is configured the first time.

Just be aware that you cannot add certain module-based functionality to a server with a non-zero securelevel. You have to reconfigure the system so it will boot into a lower securelevel, reboot, and then do the work that requires the kernel module.
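On FreeBSD, one way to have a module loaded before the securelevel is raised is to name it in /boot/loader.conf. The fragment below is a sketch that assumes the ISO-9660 module is called cd9660 on your release; check the module name on your own system before relying on it.

```
# /boot/loader.conf (FreeBSD): load the ISO-9660 filesystem module at boot
cd9660_load="YES"
```

Because the boot loader loads the module before init raises the securelevel, mounting a CD-ROM later should not need to load anything.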

2.2.2.4 Level 2: high security

Each level of security includes all the protections of all the lower levels. At level 2, all of the level 1 protections remain, but several are expanded. For OpenBSD and FreeBSD 4.x, no raw disk device can be opened for writing at all once you're in level 2. This means, among other things, that the newfs(8) command cannot be used to create filesystems. Additionally the growfs(8) command will not run at this level. Even though FreeBSD does not use raw devices anymore, these same restrictions are implemented in FreeBSD on the so-called "cooked" devices.

It's interesting to note that mounting an MFS filesystem will fail in securelevel 2 in FreeBSD 4.x and in FreeBSD 5.x, but for different reasons. In the first case, it will fail because newfs cannot run. In the latter case, you just can't mount filesystems in securelevel 2.

This protection is selective, so devices such as tapes and network devices are still accessible to processes with root privileges. Additionally, the protections are only for writing to raw disk devices, not reading. If reading from raw devices were restricted, then dump(8) and fsck(8) would not be able to run. Programs like FreeBSD's camcontrol(8) and anything else that tries to directly manipulate the SCSI bus would fail, though. Interestingly, mtx(1), a tape library control program available in ports/misc/mtx, will not work at this securelevel, but the built-in chio(1) library control program will. The reason is that mtx tries to use /dev/pass0, a device that directly reads and writes on the SCSI bus, whereas chio uses API calls in the operating system. If direct reads and writes to the SCSI bus were allowed, then disks could be written and manipulated bypassing the securelevel.

Time is also carefully controlled at level 2 and higher. It cannot be adjusted forward or backward by more than 1 second at a time. This restriction on time has two beneficial effects. It makes log entries more trustworthy because the time cannot be modified to make events look like they happened in the future or in the past. Accurate time also affects digital signatures. If any processes on the system use asymmetric cryptography to apply digital signatures (e.g., PGP or S/MIME email), the accuracy of the clock is critical to the validity of the signature.

Restricting time adjustments is not normally a problem when you keep your time synchronized using ntpd(8). At boot time, before the securelevel is set, the system calls ntpdate(8) (or ntpd -q) to adjust the clock however much it needs to be adjusted. Then the system starts the ntpd daemon to continuously synchronize the local clock with the public time servers. Although the clocks in typical Intel-based PCs tend to drift significantly, even the worst can be kept in check by ntpd.

2.2.2.5 Level 3: network security

This level only exists in FreeBSD. The protections it offers in FreeBSD are duplicated in OpenBSD at securelevel 2. At kernel securelevel 3, two network features become immutable: the firewall rules (see ipfw(8) and ipfirewall(4)) and the dummynet(4) traffic shaping parameters. This is most beneficial when the system is acting as a router, performing network address translation, or serving as some other core network device. It helps prevent an attacker from opening holes in your firewall to allow malicious network traffic through. These are the only differences between level 3 and level 2. If you do not have concerns about firewall rules (for instance, if you do not use the firewalling features), there is no particular value in running in securelevel 3.

2.2.2.6 Setting the securelevel for FreeBSD

The FreeBSD securelevel is controlled by two variables in /etc/rc.conf(5). A typical configuration looks like:

kern_securelevel_enable="YES"
kern_securelevel="2"

By default, kern_securelevel_enable is set to NO in /etc/defaults/rc.conf, which causes the system default of 0 to be demoted to -1 at boot. You may find additional details about securelevels in the manpage for init(8).

2.2.2.7 Setting the securelevel for OpenBSD

The OpenBSD securelevel is configured to be 1 by default. To change your default securelevel, edit /etc/rc.securelevel(8) and change the line securelevel=1 to a new value. OpenBSD documents its behavior in the securelevel(7) manpage.

2.2.2.8 Thoughts on using securelevel

Making rc.conf or rc.conf.local immutable in a securelevel greater than 1 is the only way to prevent an attacker who gains control of your system from lowering its security level. The only way to reduce the security level of your system in such a configuration is to interrupt the boot process at the console and enter single-user mode.

Before trying to "make your system go to eleven" on the kernel securelevel, however, think about whether the hassle in maintenance is worth the additional security. We would argue that the additional security is mostly illusory, but the maintenance hassle is absolutely real. The additional security is minimal because your attacker's goal is probably not to reboot the system to a lower securelevel. Instead, she really wants to modify database records, set up back doors into the system, and so forth. If she gets into the system at all, you will need more than directory permissions and securelevel to protect your assets. Don't look at securelevel as a silver bullet that cures all security problems. Security levels can make administration more complicated and time-consuming in addition to making systems safer. Sometimes all the work of raising the securelevel can be bypassed somewhat, too. See the sidebar "Sidestepping Immutability," for an example of bypassing securelevel in FreeBSD 4.x or OpenBSD.

The more volatile your filesystem, the harder it is to run at a high security level all the time. Each time you install software, you will be installing files in potentially sensitive areas. For instance, if you use something like Osiris (http://osiris.shmoo.com/) to capture all the attributes of files on the filesystem, that database will need to be updated after a software install. If that database is stored on the system itself, it will probably be immutable, and the securelevel settings will make it impossible to update it without rebooting into a lower securelevel. Clearly, software installation in such an environment requires more thorough planning and staging than a non-securelevel installation. The more sensitive your users are to system reboots, the more planning and staging you'll need.

The above considerations notwithstanding, no server that needs to be secured should omit the kernel securelevel setting. It is one of the distinguishing features that sets FreeBSD and OpenBSD apart from similar free Unix-like operating systems.

2.2.3. Other Security-Related Kernel Variables

Several additional variables are available that each add some small value to securing a server. Not all of them are appropriate to all environments.

2.2.3.1 Random PIDs

A number of exploits, such as race conditions, make use of the fact that process IDs (PIDs) have historically been issued sequentially by the operating system. Process ID 1 is assigned to init, and then all the others are issued incrementally. If the server launches the BIND nameserver at boot time, for example, the named(8) process probably gets the same PID (plus or minus a few) every time. Unless something unusual happens and the nameserver is killed, an attacker can count on named having that PID on his target. Of course, every server will be slightly different, but once you learn PIDs for a particular system, they will only change a little each time the server boots. Other processes, such as sendmail(8), fork often. If a process runs with root privileges and it forks, there is a second copy of it (the child process) running with root privileges for some small instant of time. Typically the child sheds its root privileges as quickly as it can, but sometimes the child process is vulnerable to some outside influence like creating a temporary file named the same as one it plans to open. By knowing the PID of the child process and how to influence it, attackers can try to exploit these kinds of race conditions.

FreeBSD allows you to choose to assign PIDs to processes randomly by setting the kern.randompid variable to 1. PID 1 still goes to init, but everything else is random. OpenBSD does not offer a choice for this behavior. It always assigns random PIDs, except to init.

2.2.3.2 Controlling core dumps

When a process crashes, or when it is sent certain unhandled signals, it might "dump core" in an attempt to aid in debugging what went wrong. More often than not, this core file is large and useless for the system administrator. It is a snapshot of the memory the program was using at the time the program crashed, along with various register values like the stack pointer. If you are the program's developer and you have the source code and motivation, you can use the core file to help track down just what was happening when the program crashed.

Most of the time, system administrators and system users have no use for core files. More importantly, such files can inadvertently leak important information to a malicious user. Imagine, for instance, a situation where an attacker can cause the web server process to dump core. Perhaps there is a buffer overflow that causes a segmentation fault, or some kind of unhandled exception. If the web server dumps core while it has, for example, the server's SSL private key unencrypted in memory, that key will be in the core file. While the core file probably will not be dumped somewhere visible in the web hierarchy, the attacker may have some other means of getting the httpd.core file from the web server (e.g., perhaps the attacker has a login account). If he does get the core file, he can rummage through the core file to extract the SSL private key, and now he can decrypt all that server's traffic! A lot of things have to go wrong for such an attack to actually succeed, but you can see the potential.

Table 2-3 describes some sysctl variables that control core dump behavior. In general, core dumps should be turned off unless you have a very specific need for them. Note that two of these are FreeBSD specific, and the last one performs the same role, but is named differently in FreeBSD and OpenBSD.

Table 2-3. Core dump controlling sysctl variables
Variable name	Default	Usage
`kern.coredump` (FreeBSD only)	1	Enables core dumps by programs. If they are enabled, the core file will be named according to the `kern.corefile` template below. This can virtually always be disabled safely.
`kern.corefile` (FreeBSD only)	`%N.core`	Template for core filenames. %N is the name of the program that crashed.
`kern.sugid_coredump` (FreeBSD)	0	Setuid and setgid programs are especially likely to have sensitive information in memory, so they do not dump core by default. You would only enable this if you were actually trying to get a core file from such a program for debugging or forensic purposes.
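`kern.nosuidcoredump` (OpenBSD)	1	OpenBSD's counterpart to `kern.sugid_coredump`, with the sense inverted: when set to 1, setuid and setgid programs do not dump core.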
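For a FreeBSD server with no debugging needs, the table above suggests a one-line addition to /etc/sysctl.conf. This is a sketch for FreeBSD only, since kern.coredump is listed as FreeBSD specific; the setuid and setgid defaults are already restrictive and are left alone.

```
# /etc/sysctl.conf (FreeBSD): do not write core files for crashed programs
kern.coredump=0
```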

2.2.3.3 Reducing visibility in the network

Two related variables tweak behaviors in the TCP/IP stack to make a system less visible to probes. This helps you keep a low profile in the presence of automated scanners, and it helps reduce the amount of resources the kernel spends responding to such probes. The variables are net.inet.tcp.blackhole and net.inet.udp.blackhole. To use them, put the following lines in /etc/sysctl.conf:

net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1

Network Scans: What Are They?

When your system is available on the Internet, it will definitely be the subject of one or more probes. If the system is behind a firewall, the firewall can block the majority of the probes. Otherwise, your system will receive a lot of weird traffic that malicious people send. They are trying to figure out what operating system and software you are running and whether there are any known exploits for it. They do this in a variety of ways.

Someone who is scanning usually just picks IP address ranges and starts sending packets to ports on those IP addresses. There are a number of tools such as nmap (http://www.insecure.org/nmap/) that automate this process. Once he knows what ports your server listens to, he will probe those ports more specifically. For instance, he will connect to port 22 to determine whether the version of OpenSSH you run is affected by any of the vulnerabilities OpenSSH has had over the years. He may also send packets with the SYN and FIN bits set. Such packets are not used in normal TCP/IP connections, but malicious scanners use them regularly. It turns out that most operating systems are idiosyncratic in how they process such packets, so the system's response can often identify what operating system is running.

You will never completely defeat such scans, since you have to have some connectivity. Giving away as little information as possible, though, is a good practice.

These variables cause the kernel to drop packets when another system attempts to connect to a TCP or UDP port where no process is listening. The normal kernel behavior would be to compose a TCP reset packet, or an ICMP port unreachable message, and send it as a response. When the two "blackhole" sysctl variables are set, the kernel does nothing when it receives connection attempts on non-listening ports. The probing system gains no information about your system. It cannot distinguish a non-listening port from a timeout in the network. This will slow an attacker down some because he will wait some amount of time after each probe packet is sent. His scan will take a lot longer. Turning on these variables also reduces the amount of resources consumed by the kernel's responding to such probes. Any system that is subject to a lot of random port scans or probes should have these variables on. Any system that is connected to the Internet is subject to just such scans and so should have these variables enabled.

2.2.3.4 Dropping "synfins"

A similar variable, net.inet.tcp.drop_synfin, will cause the kernel to drop all TCP packets that have the SYN and FIN bits set.

See the sidebar "Network Scans: What Are They?" for more information on synfin packets and how they are used.

These so-called "synfin packets" are often used by programs like queso and nmap to "fingerprint" an operating system. Some believe that dropping such packets (i.e., not responding to them) violates the TCP specification, but there are vigorous, compelling arguments on both sides. Some system and network administrators are not comfortable with dropping them. It is unlikely, however, that failing to respond will have any adverse effect on the operation of your server.

Dropping SYN+FIN packets requires that you change your FreeBSD kernel configuration. Unless you add the options TCP_DROP_SYNFIN statement to your configuration and recompile your kernel, the net.inet.tcp.drop_synfin variable will not be honored. Consult Chapter 9 of the FreeBSD Handbook (O'Reilly) for a thorough walkthrough of kernel configuration and compilation.
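Taken together, the two pieces might look like the sketch below. It shows fragments of two different files: the kernel configuration line comes from the text above, while the value 1 for the sysctl is an assumption that the variable is a simple on/off switch.

```
# In the FreeBSD kernel configuration file, before recompiling:
options         TCP_DROP_SYNFIN

# In /etc/sysctl.conf, once the new kernel is running:
net.inet.tcp.drop_synfin=1
```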