
Appendices


• Appendix A, "Kernel Virtual Address Maps"
• Appendix B, "Adding a System Call to Solaris"
• Appendix C, "A Sample Procfs Utility"



Appendix A. Kernel Virtual Address Maps

In this appendix, we illustrate the allocation- and location-specific information for the segments that constitute the Solaris 10 kernel address space.

The kernel address space is represented by the address space pointed to by the system object, kas. The segment drivers manage the manipulation of the segments within the kernel address space. Figure A.1 illustrates the architecture.

Figure A.1. Kernel Address Space and Segments


You can look at the kernel address space with the seg walker and the ::seg dcmd, using the kernel address space pointer, kas. The seg walker shows the kernel address space segments, and the ::seg dcmd shows detail for each segment.

sol9# mdb -k
> kas::walk seg | ::seg

SEG             BASE             SIZE              DATA OPS
         1841258          1000000           8f2000                 0 segkmem_ops
         1835878          18f2000           40e000                 0 segkmem_ops
         18358c8         70000000         10000000                 0 segkmem_ops
         1843410         edd00000          2300000                 0 segkmem_ops
         18355d0      2a100000000         1ec20000       300000c8090 segkp_ops
         18357e8      2a750000000          1cbc000       30000451f40 segmap_ops
         1835618      30000000000      1fff8000000                 0 segkmem_ops
         18455f8      50000000000      20000000000                 0 segkmem_ops
         18389f0      70000000000           3da000                 0 segkmem_ops
         1838a38 fffffa0000000000      40000000000       30000464760 segkpm_ops
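To make the BASE and SIZE columns concrete, here is a minimal user-space sketch that finds which segment covers a given address. The struct and function names (demo_seg, demo_seg_lookup) are hypothetical simplifications, not the kernel's struct seg or its lookup routine; only the four table rows are taken from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct seg. */
struct demo_seg {
        uint64_t base;          /* BASE column */
        uint64_t size;          /* SIZE column */
        const char *ops;        /* OPS column */
};

/* The first four rows of the ::seg output above. */
static const struct demo_seg demo_segs[] = {
        { 0x1000000ULL,  0x8f2000ULL,   "segkmem_ops" },
        { 0x18f2000ULL,  0x40e000ULL,   "segkmem_ops" },
        { 0x70000000ULL, 0x10000000ULL, "segkmem_ops" },
        { 0xedd00000ULL, 0x2300000ULL,  "segkmem_ops" },
};

#define DEMO_NSEGS      (sizeof (demo_segs) / sizeof (demo_segs[0]))

/* Return the segment whose [base, base + size) range covers addr, or NULL. */
static const struct demo_seg *
demo_seg_lookup(uint64_t addr)
{
        size_t i;

        for (i = 0; i < DEMO_NSEGS; i++) {
                const struct demo_seg *s = &demo_segs[i];

                /* Sanity check: a table entry must not wrap around. */
                assert(s->base + s->size > s->base);

                if (addr >= s->base && addr - s->base < s->size)
                        return (s);
        }
        return (NULL);
}
```

In this sketch, the first segment ends exactly where the second begins (0x1000000 + 0x8f2000 = 0x18f2000), so a lookup of 0x18f2000 lands in the second row.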


For more detail, you can use the ::print dcmd to print each of the kernel's segment structures.

> kas::walk seg | ::print "struct seg"

{
    s_base = scb
    s_size = 0x8f2000
    s_szc = 0
    s_flags = 0
    s_as = kas
    s_tree = {
        avl_child = [ 0, 0 ]
        avl_pcb = 0x1835899
    }
    s_ops = segkmem_ops
    s_data = 0
}
{
    s_base = 0x18f2000
    s_size = 0x40e000
    s_szc = 0
    s_flags = 0
    s_as = kas
    s_tree = {
        avl_child = [ ktextseg+0x20, kvseg32+0x20 ]
        avl_pcb = 0x1843431
    }
    s_ops = segkmem_ops
    s_data = 0
}
...


The next figures illustrate the Solaris 10 kernel address space:

  • Figure A.2. Solaris 10 sun4u 64-Bit Kernel Address Space

  • Figure A.3. Solaris 10 amd64 64-Bit Kernel Address Space

  • Figure A.4. Solaris 10 x86 32-Bit Kernel Address Space



Appendix B. Adding a System Call to Solaris

Contributed by Eric Schrock

In this appendix, we provide an example of how to add a system call to Solaris.



B.1. Setting Kernel Parameters

For the purposes of this appendix, we will assume that it's a simple system call that lives in the generic kernel code, and we'll put the code into an existing file to avoid having to deal with Makefiles. The goal is to print an arbitrary message to the console whenever the system call is issued.

B.1.1. Picking a Syscall Number

Before writing any real code, we first have to pick a number that will represent our system call. The main source of documentation here is syscall.h, which describes all the available system call numbers, as well as which ones are reserved. The maximum number of syscalls is currently 256 (NSYSCALL), which doesn't leave much space for new ones. This could theoretically be extended; I believe the hard limit is in the size of sysset_t, whose 16 integers must be able to represent a complete bitmask of all system calls. With 32 bits per integer, this puts our actual limit at 16*32, or 512, system calls. But for the purposes of this example, we'll pick system call number 56, which is currently unused. For my own amusement, we'll name our system call 'schrock'. So first we add the following line to syscall.h.
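The 16*32 arithmetic can be illustrated with a small user-space sketch of a bitmask set built from 16 32-bit words. The type and function names here (demo_sysset_t, demo_addset, demo_ismember) are hypothetical, and the bit numbering is an assumption of this sketch; the real sysset_t macros may lay out their bits differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_WORDS      16                              /* integers in the set */
#define DEMO_BITS       32                              /* bits per integer */
#define DEMO_MAXSYS     (DEMO_WORDS * DEMO_BITS)        /* 16 * 32 = 512 */

/* Hypothetical stand-in for sysset_t: one bit per system call number. */
typedef struct {
        uint32_t word[DEMO_WORDS];
} demo_sysset_t;

/* Clear every bit in the set. */
static void
demo_emptyset(demo_sysset_t *sp)
{
        memset(sp, 0, sizeof (*sp));
}

/* Mark system call number num as a member of the set. */
static void
demo_addset(demo_sysset_t *sp, int num)
{
        assert(num >= 0 && num < DEMO_MAXSYS);
        sp->word[num / DEMO_BITS] |= (uint32_t)1 << (num % DEMO_BITS);
}

/* Return nonzero if system call number num is in the set. */
static int
demo_ismember(const demo_sysset_t *sp, int num)
{
        assert(num >= 0 && num < DEMO_MAXSYS);
        return ((sp->word[num / DEMO_BITS] >> (num % DEMO_BITS)) & 1);
}
```

With this layout, system call 56 occupies bit 24 of word 1, and any number at or above 512 has no bit to occupy, which is the limit described above.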

#define SYS_uadmin       55
#define SYS_schrock      56
#define SYS_utssys       57

See usr/src/uts/common/sys/syscall.h


B.1.2. Writing the Syscall Handler

Next, we have to actually add the function that will get called when we invoke the system call. What we should really do is add a new file, schrock.c, to usr/src/uts/common/syscall, but instead, we'll just add the code to getpid.c.

#include <sys/cmn_err.h>

int
schrock(void *arg)
{
        char    buf[1024];
        size_t  len;

        if (copyinstr(arg, buf, sizeof (buf), &len) != 0)
                return (set_errno(EFAULT));

        cmn_err(CE_WARN, "%s", buf);

        return (0);
}


Note that declaring a buffer of 1024 bytes on the stack is a very bad thing to do in the kernel. We have limited stack space, and a stack overflow will result in a panic. We also don't check that the length of the string was less than our scratch space. But this will suffice for illustrative purposes. The cmn_err() function is the simplest way to display messages from the kernel.
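As a rough illustration of the two caveats above, the following user-space sketch takes its scratch buffer from the heap instead of the stack and reports a distinct error when the string does not fit. Everything here is a hypothetical analog: demo_copystr and demo_handler are invented names, malloc() and free() stand in for kernel allocation routines such as kmem_alloc() and kmem_free(), and fprintf() stands in for cmn_err(). It is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical bounded string copy: copy src, including its terminating
 * NUL, into dst of size dstlen. Returns 0 and stores the number of bytes
 * copied (NUL included) in *lenp, or ENAMETOOLONG if src does not fit.
 */
static int
demo_copystr(const char *src, char *dst, size_t dstlen, size_t *lenp)
{
        size_t i;

        assert(src != NULL && dst != NULL && lenp != NULL);

        for (i = 0; i < dstlen; i++) {
                dst[i] = src[i];
                if (src[i] == '\0') {
                        *lenp = i + 1;
                        return (0);
                }
        }
        return (ENAMETOOLONG);
}

/* Copy msg into a heap scratch buffer of maxlen bytes and print it. */
static int
demo_handler(const char *msg, size_t maxlen)
{
        char *buf;
        size_t len;
        int err;

        if ((buf = malloc(maxlen)) == NULL)
                return (ENOMEM);

        err = demo_copystr(msg, buf, maxlen, &len);
        if (err == 0)
                (void) fprintf(stderr, "WARNING: %s\n", buf);

        free(buf);
        return (err);
}
```

The caller can then distinguish a message that was too long from other failures, rather than folding every error into one code.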

B.1.3. Adding an Entry to the Syscall Table

We need to place an entry in the system call table. This table lives in sysent.c, and makes heavy use of macros to simplify the source. Our system call takes a single argument and returns an integer, so we'll need to use the SYSENT_CI macro. We need to add a prototype for our syscall, and add an entry to the sysent and sysent32 tables.

int     rename();
void    rexit();
int     schrock();
int     semsys();
int     setgid();

/* ... */

        /* 54 */ SYSENT_CI("ioctl",             ioctl,           3),
        /* 55 */ SYSENT_CI("uadmin",            uadmin,          3),
        /* 56 */ SYSENT_CI("schrock",           schrock,         1),
        /* 57 */ IF_LP64(
                        SYSENT_2CI("utssys",    utssys64,        4),
                        SYSENT_2CI("utssys",    utssys32,        4)),

/* ... */

        /* 54 */ SYSENT_CI("ioctl",             ioctl,           3),
        /* 55 */ SYSENT_CI("uadmin",            uadmin,          3),
        /* 56 */ SYSENT_CI("schrock",           schrock,         1),
        /* 57 */ SYSENT_2CI("utssys",           utssys32,        4),

See usr/src/uts/common/os/sysent.c
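The idea behind a macro-built table indexed by system call number can be sketched in user space as follows. This is a deliberately simplified assumption of how such a table might look: demo_sysent, DEMO_SYSENT, DEMO_NOSYS, and the two toy handlers are hypothetical and do not reflect the real struct sysent or the SYSENT_CI macro's expansion.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*demo_call_t)(long);

/* Hypothetical, much-simplified table entry: name, handler, argument count. */
struct demo_sysent {
        const char *name;
        demo_call_t callc;
        int narg;
};

#define DEMO_SYSENT(n, f, a)    { n, f, a }
#define DEMO_NOSYS              { NULL, NULL, 0 }

/* Toy handlers standing in for real system call implementations. */
static long demo_double(long x) { return (2 * x); }
static long demo_negate(long x) { return (-x); }

/* The array index is the call number, as in the listing above. */
static const struct demo_sysent demo_table[] = {
        /* 0 */ DEMO_NOSYS,
        /* 1 */ DEMO_SYSENT("double", demo_double, 1),
        /* 2 */ DEMO_SYSENT("negate", demo_negate, 1),
};

#define DEMO_NSYSCALL   (sizeof (demo_table) / sizeof (demo_table[0]))

/* Dispatch call number num; returns 0 on success, -1 if the slot is unused. */
static int
demo_dispatch(int num, long arg, long *rvalp)
{
        assert(rvalp != NULL);

        if (num < 0 || (size_t)num >= DEMO_NSYSCALL ||
            demo_table[num].callc == NULL)
                return (-1);

        *rvalp = demo_table[num].callc(arg);
        return (0);
}
```

Because the position in the array is the call number, inserting an entry in the wrong slot silently renumbers everything after it, which is why the listing above keeps the number in a comment beside each entry.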


B.1.4. Updating /etc/name_to_sysnum

At this point, we could write a program to invoke our system call, but the point here is to illustrate everything that needs to be done to integrate a system call, so we can't ignore the little things. One of these little things is /etc/name_to_sysnum, which provides a mapping between system call names and numbers, and is used by dtrace(1M), truss(1), and friends. Of course, there is one version for x86 and one for SPARC, so you will have to add the following lines to both the Intel and SPARC versions.

ioctl                    54
uadmin                   55
schrock                  56
utssys                   57
fdsync                   58

See /etc/name_to_sysnum
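A consumer of a "name number" file like this one might parse it along the following lines. This is a minimal sketch, assuming whitespace-separated fields and names shorter than 32 characters; demo_parse_line and demo_name_to_num are hypothetical helpers and not how dtrace(1M) or truss(1) actually read the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name number" line. On success, store the name (at most 31
 * characters) in name and the number in *nump, and return 0. Return -1
 * for lines that do not match.
 */
static int
demo_parse_line(const char *line, char name[32], int *nump)
{
        assert(line != NULL && name != NULL && nump != NULL);
        return (sscanf(line, "%31s %d", name, nump) == 2 ? 0 : -1);
}

/* Look up want in a NULL-terminated array of lines; return its number or -1. */
static int
demo_name_to_num(const char *const lines[], const char *want)
{
        char name[32];
        int i, num;

        for (i = 0; lines[i] != NULL; i++) {
                if (demo_parse_line(lines[i], name, &num) == 0 &&
                    strcmp(name, want) == 0)
                        return (num);
        }
        return (-1);
}
```

Fed the five lines shown above, a lookup of "schrock" would yield 56, and a name that is missing from the file yields -1.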


B.1.5. Updating truss(1)

Truss does fancy decoding of system call arguments. In order to do this, we need to maintain a table in truss that describes the type of each argument for every syscall. This table is found in systable.c. Since our syscall takes a single string, we add the following entry:

{"ioctl",       3, DEC, NOV, DEC, IOC, IOA},                     /*  54 */
{"uadmin",      3, DEC, NOV, DEC, DEC, DEC},                     /*  55 */
{"schrock",     1, DEC, NOV, STG},                               /*  56 */
{"utssys",      4, DEC, NOV, HEX, DEC, UTS, HEX},                /*  57 */
{"fdsync",      2, DEC, NOV, DEC, FFG},                          /*  58 */

See usr/src/cmd/truss/systable.c


Don't worry too much about the different constants. But be sure to read up on the truss source code if you're adding a complicated system call.
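The general idea of a per-argument type code driving the output format can be sketched as below. The enum values and demo_format_arg are hypothetical and only loosely modeled on the DEC, HEX, and STG entries in the table above; truss itself has many more codes and reads strings out of the traced process rather than from its own address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical format codes, one per argument. */
enum demo_fmt {
        DEMO_DEC,       /* print as signed decimal */
        DEMO_HEX,       /* print as hexadecimal */
        DEMO_STG        /* print as a quoted string */
};

/*
 * Format one argument value according to its code. Returns the
 * snprintf()-style length, or -1 for an unknown code.
 */
static int
demo_format_arg(enum demo_fmt fmt, uintptr_t val, char *buf, size_t buflen)
{
        assert(buf != NULL && buflen > 0);

        switch (fmt) {
        case DEMO_DEC:
                return (snprintf(buf, buflen, "%ld", (long)val));
        case DEMO_HEX:
                return (snprintf(buf, buflen, "0x%lx", (unsigned long)val));
        case DEMO_STG:
                /* In this sketch the pointer is valid in our own process. */
                return (snprintf(buf, buflen, "\"%s\"", (const char *)val));
        }
        return (-1);
}
```

An entry that declares its single argument as a string would then be rendered with quotes around the message rather than as a raw pointer value.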

B.1.6. Updating proc_names.c

This is the file that gets missed the most often when adding a new syscall. Libproc uses the table in proc_names.c to translate between system call numbers and names. Why it doesn't make use of /etc/name_to_sysnum is anybody's guess, but for now you have to update the systable array in this file:

"ioctl",                 /* 54 */
"uadmin",                /* 55 */
"schrock",               /* 56 */
"utssys",                /* 57 */
"fdsync",                /* 58 */

See usr/src/lib/libproc/common/proc_names.c
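A table like this is typically consulted in both directions. The following sketch shows one way to do that over the five entries listed above; demo_sysname, demo_sysnum, the DEMO_FIRST offset, and the "SYS#" fallback format are assumptions of this example rather than libproc's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_FIRST      54      /* number of the first entry in this excerpt */

static const char *const demo_names[] = {
        "ioctl",                /* 54 */
        "uadmin",               /* 55 */
        "schrock",              /* 56 */
        "utssys",               /* 57 */
        "fdsync",               /* 58 */
};

#define DEMO_NNAMES     (sizeof (demo_names) / sizeof (demo_names[0]))

/* Number to name; unknown numbers get a generic "SYS#<num>" label. */
static const char *
demo_sysname(int num, char *buf, size_t buflen)
{
        assert(buf != NULL && buflen > 0);

        if (num >= DEMO_FIRST && (size_t)(num - DEMO_FIRST) < DEMO_NNAMES)
                (void) snprintf(buf, buflen, "%s",
                    demo_names[num - DEMO_FIRST]);
        else
                (void) snprintf(buf, buflen, "SYS#%d", num);
        return (buf);
}

/* Name to number; returns -1 if the name is not in the table. */
static int
demo_sysnum(const char *name)
{
        size_t i;

        for (i = 0; i < DEMO_NNAMES; i++) {
                if (strcmp(demo_names[i], name) == 0)
                        return (DEMO_FIRST + (int)i);
        }
        return (-1);
}
```

If the new name were left out of the array, a lookup of 56 in this sketch would return the wrong neighbor's name or the generic label, which is the kind of quiet failure that makes this file easy to miss.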


B.1.7. Putting It All Together

Finally, everything is in place. We can test our system call with a simple program:

#include <sys/syscall.h>

int
main(int argc, char **argv)
{
        syscall(SYS_schrock, "OpenSolaris Rules!");
        return (0);
}


If we run this on our system, we'll see the following output on the console:

June 14 13:42:21 halcyon genunix: WARNING: OpenSolaris Rules!


Because we did all the extra work, we can actually observe the behavior using truss(1), mdb(1), or dtrace(1M).