5.5. The init Thread

The code found in .../init/main.c is responsible for bringing the kernel to life. After start_kernel() performs some basic kernel initialization, calling early initialization functions explicitly by name, the very first kernel thread is spawned. This thread eventually becomes the kernel thread called init(), with a process ID (PID) of 1. As you will learn, init() becomes the parent of all Linux processes in user space. At this point in the boot sequence, two distinct threads are running: the one represented by start_kernel() and the newly spawned init(). The former goes on to become the idle process, having completed its work. The latter becomes the init process. This can be seen in Listing 5-9.

Listing 5-9. Creation of Kernel `init` Thread

    static void noinline rest_init(void)
            __releases(kernel_lock)
    {
            kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
            numa_default_policy();
            unlock_kernel();
            preempt_enable_no_resched();

            /*
             * The boot idle thread must execute schedule()
             * at least one to get things moving:
             */
            schedule();

            cpu_idle();
    }

The start_kernel() function calls rest_init(), reproduced in Listing 5-9. The kernel's init process is spawned by the call to kernel_thread(). The init thread goes on to complete the rest of the system initialization, while the thread of execution started by start_kernel() loops forever in the call to cpu_idle().

The reason for this structure is interesting. You might have noticed that start_kernel(), a relatively large function, was marked with the __init macro. This means that the memory it occupies will be reclaimed during the final stages of kernel initialization. Execution must leave this function, and the memory region it occupies, before that memory can be reclaimed. The solution was for start_kernel() to call rest_init(), shown in Listing 5-9, a much smaller function that remains in memory and becomes the idle process.

5.5.1. Initialization via initcalls

When init() is spawned, it eventually calls do_initcalls(), which is the function responsible for calling all the initialization functions registered with the *_initcall family of macros. The code is reproduced in Listing 5-10 in simplified form.

Listing 5-10. Initialization via initcalls

    static void __init do_initcalls(void)
    {
        initcall_t *call;

        for (call = &__initcall_start; call < &__initcall_end; call++) {
            if (initcall_debug) {
                printk(KERN_DEBUG "Calling initcall 0x%p", *call);
                print_symbol(":%s()", (unsigned long) *call);
                printk("\n");
            }

            (*call)();
        }
    }

This code is self-explanatory, except for the two labels marking the loop boundaries: __initcall_start and __initcall_end. These labels are not found in any C source or header file. They are defined in the linker script file used during the link stage of vmlinux. These labels mark the beginning and end of the list of initialization functions populated using the *_initcall family of macros. You can see each of the labels by looking at the System.map file in the top-level kernel directory. They all begin with the string __initcall, as described in Listing 5-8.
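The same idea can be tried in an ordinary user space program. The following is a minimal sketch, not kernel code: it assumes GCC or Clang with a GNU-compatible linker on an ELF system, where the linker automatically provides `__start_<name>` and `__stop_<name>` symbols for any section whose name is a valid C identifier. The `demo_initcall` macro, the `demo_initcalls` section, and the two setup functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Hypothetical macro: stores a pointer to fn in the "demo_initcalls" section. */
#define demo_initcall(fn) \
    static initcall_t demo_initcall_##fn \
        __attribute__((used, section("demo_initcalls"))) = fn

/* Not defined in any C file: the linker supplies these two boundary symbols. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

static int setup_clock(void)   { puts("setup_clock()");   return 0; }
static int setup_console(void) { puts("setup_console()"); return 0; }

demo_initcall(setup_clock);
demo_initcall(setup_console);

/* Walk the table between the two boundary symbols; return the number of calls. */
int run_demo_initcalls(void)
{
    int count = 0;
    initcall_t *call;

    for (call = __start_demo_initcalls; call < __stop_demo_initcalls; call++) {
        assert(*call != NULL);
        (*call)();
        count++;
    }
    return count;
}
```

Calling run_demo_initcalls() from main() invokes every function registered with the macro. As in the kernel, no C source file contains a list of these functions; the list exists only because the linker gathers the pointers into one section.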

In case you were wondering about the debug print statements in do_initcalls(), you can watch these calls being executed during bootup by starting your kernel with the kernel command line parameter initcall_debug. This parameter enables the printing of the debug information shown in Listing 5-10.^[9]

^[9] You might have to change the default loglevel on your system (making the console more verbose) to see these debug messages. This is described in many references about Linux system administration. In any case, you should see them in the kernel log file.

Here is an example of what you will see when you enable these debug statements:

    ...
    Calling initcall 0xc00168f4: tty_class_init+0x0/0x3c()
    Calling initcall 0xc000c32c: customize_machine+0x0/0x2c()
    Calling initcall 0xc000c4f0: topology_init+0x0/0x24()
    Calling initcall 0xc000e8f4: coyote_pci_init+0x0/0x20()
    PCI: IXP4xx is host
    PCI: IXP4xx Using direct access for memory space
    ...

Notice the call to customize_machine(), the example of Listing 5-7. The debug output includes the virtual kernel address of the function (0xc000c32c, in this case) and the size of the function (0x2c here). This is a useful way to see the details of kernel initialization, especially the order in which various subsystems and modules get called. Even on a modestly configured embedded system, dozens of these initialization functions are invoked in this manner. In this example, taken from an ARM XScale embedded target, there are 92 such calls to various kernel-initialization routines.
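If you want to post-process this output, for example to list the functions in the order they ran, each line can be split into its parts. The following is a minimal sketch assuming the exact line format shown above, in which the symbol is followed by an offset into the function and then the function's size. The `initcall_record` structure and `parse_initcall_line()` are hypothetical helpers, not part of the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the fields of one initcall_debug line. */
struct initcall_record {
    unsigned long address;  /* virtual kernel address of the function */
    char symbol[64];        /* function name                          */
    unsigned long offset;   /* offset into the function (0x0 here)    */
    unsigned long size;     /* size of the function in bytes          */
};

/* Returns 1 if line matches the "Calling initcall" format, 0 otherwise. */
int parse_initcall_line(const char *line, struct initcall_record *rec)
{
    int fields;

    assert(line != NULL && rec != NULL);
    memset(rec, 0, sizeof(*rec));

    /* %63[^+] reads the symbol name up to, but not including, the '+'. */
    fields = sscanf(line, "Calling initcall 0x%lx: %63[^+]+0x%lx/0x%lx",
                    &rec->address, rec->symbol, &rec->offset, &rec->size);
    return fields == 4;
}
```

Lines that are not initcall records, such as the PCI messages in the sample output, simply fail to match and can be skipped.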

5.5.2. Final Boot Steps

After the init() thread has been spawned and all the various initialization calls have completed, the kernel performs its final steps in the boot sequence. These include freeing the memory used by the initialization functions and data, opening a system console device, and starting the first userspace process. Listing 5-11 reproduces the last steps in the kernel's init() from main.c.

Listing 5-11. Final Kernel Boot Steps from `main.c`

    if (execute_command) {
          run_init_process(execute_command);
          printk(KERN_WARNING "Failed to execute %s.  Attempting "
                              "defaults...\n", execute_command);
    }
    run_init_process("/sbin/init");
    run_init_process("/etc/init");
    run_init_process("/bin/init");
    run_init_process("/bin/sh");

    panic("No init found.  Try passing init= option to kernel.");

Notice that if the code proceeds to the end of the init() function, a kernel panic results. If you've spent any time experimenting with embedded systems or custom root file systems, you've undoubtedly encountered this very common error message as the last line of output on your console. It is one of the most frequently asked questions (FAQs) on a variety of public forums related to Linux and embedded systems.

One way or another, one of these run_init_process() calls must succeed. The run_init_process() function does not return on successful invocation. Instead, it replaces the calling process with the new one, using the familiar execve() system call for this functionality. The most common system configurations spawn /sbin/init as the userland^[10] initialization process. We study this functionality in depth in the next chapter.

^[10] Userland is an often-used term for any program, library, script, or anything else in user space.
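The try-each-candidate pattern of Listing 5-11 can be imitated in user space. The following is a minimal sketch, not the kernel's implementation: it uses the C library's execv(), a wrapper around execve(), and the function name `try_init_candidates()` is hypothetical. The key property is the same as in the kernel: control reaches the line after the exec call only when that call has failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to execute each path in a NULL-terminated list, in order.
 * On the first success this function never returns, because the
 * calling process image is replaced. If every attempt fails, it
 * returns the number of failed attempts.
 */
int try_init_candidates(const char *const paths[])
{
    int failures = 0;
    size_t i;

    assert(paths != NULL);
    for (i = 0; paths[i] != NULL; i++) {
        char *const argv[] = { (char *)paths[i], NULL };

        execv(paths[i], argv);  /* returns only on failure */
        fprintf(stderr, "Failed to execute %s: %s\n",
                paths[i], strerror(errno));
        failures++;
    }
    return failures;
}
```

A caller that gets a return value at all knows that no candidate could be started, which is the situation in which the kernel calls panic().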

One option available to the embedded system developer is to use a custom userland initialization program. That is the purpose of the conditional statement in the previous code snippet. If execute_command is non-null, it points to a string containing a custom user-supplied command to be executed in user space. The developer specifies this command on the kernel command line, and it is set via the __setup macro we examined earlier in this chapter. An example kernel command line incorporating several concepts discussed in this chapter might look like this:

initcall_debug init=/sbin/myinit console=ttyS1,115200 root=/dev/hda1

This kernel command line instructs the kernel to display all the initialization routines as encountered, configures the initial console device as /dev/ttyS1 at 115,200 bps, and executes a custom user space initialization process called myinit, located in the /sbin directory on the root file system. It directs the kernel to mount its root file system from the device /dev/hda1, which is the first partition on the first IDE hard drive. Note that, in general, the order of parameters given on the kernel command line is irrelevant. The next chapter covers the details of user space system initialization.
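To make the structure of such a command line concrete, here is a minimal user space sketch that looks up one parameter in a space-separated string. It is not how the kernel's __setup mechanism is implemented; `cmdline_get()` is a hypothetical helper that assumes parameters take the form key or key=value and contain no embedded spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up key in a space-separated command line.
 * Returns 1 and copies the value (empty for a bare flag) into value
 * if the key is present, or returns 0 if it is absent.
 */
int cmdline_get(const char *cmdline, const char *key, char *value, size_t len)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    assert(klen > 0 && len > 0);
    while (*p) {
        const char *end;
        size_t tlen;

        while (*p == ' ')               /* skip separators */
            p++;
        end = p;
        while (*end && *end != ' ')     /* find the end of this token */
            end++;
        tlen = (size_t)(end - p);

        /* Match "key" exactly or "key=...", but not a longer name. */
        if (tlen >= klen && strncmp(p, key, klen) == 0 &&
            (tlen == klen || p[klen] == '=')) {
            const char *v = (tlen == klen) ? end : p + klen + 1;
            size_t vlen = (size_t)(end - v);

            if (vlen >= len)            /* truncate to fit the buffer */
                vlen = len - 1;
            memcpy(value, v, vlen);
            value[vlen] = '\0';
            return 1;
        }
        p = end;
    }
    return 0;
}
```

With the example command line above, looking up init yields /sbin/myinit, while initcall_debug is found as a bare flag with an empty value. Because each token is examined independently, the lookup does not depend on the order of the parameters.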