4.13. Solaris and Linux APIs

Linux 2.6 comes with the latest GNU C library. As of this writing, the latest release of GNU glibc is 2.3.5. As seen in the preceding section, the GNU libc distribution installs several libraries, which may come in both archived and shared form. The main objective of this section is to compare the Solaris APIs, as documented in the Solaris 10 man pages, with GNU libc, as documented in the available online manuals, header files, and Linux man pages.

The standard C library on Linux systems is GNU libc. Much of the interface of GNU libc has been determined by the history of UNIX and various standards. GNU libc supports most standards that modern UNIX systems support today, such as ISO C and the POSIX standards. GNU libc also supports features of the two major UNIX variants, namely BSD and System V. The list of libraries included in GNU libc can be found in Chapter 3.

To learn what version of glibc you have on your installation, run the shared library /lib/libc.so.6 as if it were a program. For example, from the /lib directory, type the following on the command line:

    # ./libc.so.6
    GNU C Library stable release version 2.3.3 (20040405), by Roland McGrath et al.
    Copyright (C) 2004 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Configured for i686-suse-linux.
    Compiled by GNU CC version 3.3.3 (SuSE Linux).
    Compiled on a Linux 2.6.4 system on 2004-04-05.
    Available extensions:
            GNU libio by Per Bothner
            crypt add-on version 2.1 by Michael Glad and others
            linuxthreads-0.10 by Xavier Leroy
            GNU Libidn by Simon Josefsson
            NoVersion patch for broken glibc 2.0 binaries
            BIND-8.2.3-T5B
            libthread_db work sponsored by Alpha Processor Inc
            NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
    Thread-local storage support included.
    Report bugs using the 'glibcbug' script to <bugs@gnu.org>.
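The version can also be checked from inside a program. The following is a minimal sketch that assumes a glibc-based system: gnu_get_libc_version() is a glibc-specific call declared in <gnu/libc-version.h> and is not available on Solaris. The helper name glibc_version_numbers is hypothetical and used only for illustration.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Split the running glibc version string (for example "2.3.3") into its
 * major and minor numbers. Returns 0 on success, or -1 if the string does
 * not start with "major.minor". */
int glibc_version_numbers(int *major, int *minor)
{
    const char *version = gnu_get_libc_version();

    return sscanf(version, "%d.%d", major, minor) == 2 ? 0 : -1;
}
```

A caller that only needs to display the version can print the string returned by gnu_get_libc_version() directly.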
Table A-3 in Appendix A shows Solaris basic library functions and their equivalents on Linux as implemented through the GNU libc distribution. In the following subsections, we turn our focus to the basic system interfaces, such as memory management and interprocess communication.

4.13.1. Memory Management

Table 4-21 compares library functions that applications use for memory management on Solaris and Linux.
4.13.2. Interprocess Communication (IPC)

Here we provide information about a set of techniques that processes can use to communicate with each other. These techniques are collectively called IPC and include pipes, FIFOs, message queues, shared memory, and semaphores.

4.13.2.1. Pipes

A pipe is a technique used to communicate between two threads in a process or between parent and child processes. When a process calls fork, its file descriptors are copied to the new child process. As a result, the parent can communicate with the child. A pipe can be created by calling the pipe function:

    #include <unistd.h>
    int pipe(int fildes[2]);

The behavior differs between the two systems: pipes in Solaris are full duplex, whereas pipes in Linux are half duplex. In Linux, the file descriptor fildes[0] is only for reading, and the file descriptor fildes[1] is only for writing.

The popen and pclose functions are supported in both Solaris and Linux:

    #include <stdio.h>
    FILE *popen(const char *command, const char *mode);
    int pclose(FILE *stream);

The call to popen creates a pipe between the calling process and the program specified by command, which runs in a child process. The calling process can either read from or write to the pipe, as specified by the mode argument. The return value is a stream pointer: if the mode is w, writing to the stream writes to the standard input of the command; if the mode is r, reading from the stream reads the standard output of the command.

4.13.2.2. FIFOs

A first-in, first-out (FIFO) file (also known as a named pipe) is a pipe that has a name in the filesystem associated with it. This enables the pipe to be opened or closed by any process. The processes on either end of the pipe need not be related to each other. You can create a FIFO programmatically by calling the mkfifo function:

    #include <sys/types.h>
    #include <sys/stat.h>
    int mkfifo(const char *pathname, mode_t mode);

This function works the same way on both Solaris and Linux.
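As an illustration of mkfifo, the sketch below creates a FIFO, lets a child process write a short message into it, and reads the message back in the parent. The function name fifo_roundtrip and its parameters are hypothetical, and the example assumes the caller passes a path in a writable directory and a buffer of at least one byte.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a FIFO at path, have a child write msg into it, and read the
 * message back in the parent. Returns the number of bytes read, or -1 on
 * error. buf is always NUL-terminated on success. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *buf, size_t len)
{
    unlink(path);                        /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child: opening for writing blocks until a reader opens the FIFO. */
        int wfd = open(path, O_WRONLY);
        if (wfd == -1)
            _exit(1);
        ssize_t written = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(written == (ssize_t)strlen(msg) ? 0 : 1);
    }

    /* Parent: opening for reading blocks until the writer opens the FIFO. */
    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);
    if (rfd != -1) {
        n = read(rfd, buf, len - 1);
        if (n >= 0)
            buf[n] = '\0';
        close(rfd);
    }
    waitpid(pid, NULL, 0);               /* reap the child */
    unlink(path);                        /* the name is no longer needed */
    return n;
}
```

The two processes here happen to be parent and child, but because the FIFO has a name in the filesystem, the writer could equally be an unrelated program that opens the same path.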
A FIFO can have multiple readers and writers. A write request of PIPE_BUF bytes or less from a writer is guaranteed not to be interleaved with data from other processes. On Linux, PIPE_BUF is 4096 bytes,[33] whereas it is 5120 bytes on Solaris.[34]
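Because the limit differs between the two systems, portable code can query it at run time rather than hard-coding 4096 or 5120. The following is a small sketch using fpathconf with _PC_PIPE_BUF on the write end of a pipe; the function name pipe_atomic_limit is hypothetical.

```c
#include <unistd.h>

/* Return the largest write size that is guaranteed to be atomic on a pipe,
 * as reported by the running system, or -1 if it cannot be determined. */
long pipe_atomic_limit(void)
{
    int fds[2];

    if (pipe(fds) == -1)
        return -1;

    /* fds[1] is the write end; this usage is valid on both systems. */
    long limit = fpathconf(fds[1], _PC_PIPE_BUF);

    close(fds[0]);
    close(fds[1]);
    return limit;
}
```

POSIX requires the value to be at least 512 bytes, so a writer that keeps each record at or below the returned limit avoids interleaving on either platform.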
4.13.2.3. POSIX Messages

Messages enable multiple processes to send formatted data streams to arbitrary processes. On Linux, POSIX messages are supported only when the option _POSIX_MESSAGE_PASSING is defined. You can determine whether your Linux system has the option _POSIX_MESSAGE_PASSING set by issuing the following command:

    $ getconf _POSIX_MESSAGE_PASSING
    200112

Table 4-22 compares message queue interfaces in Solaris and Linux.
4.13.2.4. POSIX Semaphores

As with POSIX messages, you can use getconf to check whether your Linux system supports POSIX semaphores. If it does, you should see output like the following:

    $ getconf _POSIX_SEMAPHORES
    200112

Table 4-23 compares POSIX semaphore interfaces in Solaris and Linux.
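The getconf checks shown for POSIX messages and POSIX semaphores can also be made from within a program by calling sysconf. The sketch below assumes the standard _SC_MESSAGE_PASSING and _SC_SEMAPHORES names from <unistd.h>; the two helper function names are hypothetical.

```c
#include <unistd.h>

/* sysconf returns the supported POSIX revision (for example 200112) when an
 * option is available, and -1 when it is not. */

/* Return 1 if POSIX message queues are supported at run time, else 0. */
int has_posix_message_queues(void)
{
    return sysconf(_SC_MESSAGE_PASSING) > 0;
}

/* Return 1 if POSIX semaphores are supported at run time, else 0. */
int has_posix_semaphores(void)
{
    return sysconf(_SC_SEMAPHORES) > 0;
}
```

An application ported from Solaris could call these helpers at startup and fall back to the System V interfaces when an option is reported as unavailable.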
4.13.2.5. POSIX Shared Memory

Table 4-24 compares POSIX shared memory interfaces in Solaris and Linux.
On Linux, these two functions are available in glibc version 2.2 and later. To use them in your code, link against the real-time library by passing the -lrt flag to the compiler.

4.13.2.6. System V Messages

Table 4-25 compares System V message interfaces in Solaris and Linux.
The definition of struct msqid_ds used in the function msgctl() is not the same on Solaris and Linux.

In Solaris, /usr/include/sys/msg.h:

    struct msqid_ds {
        struct ipc_perm msg_perm;   /* operation permission struct */
        struct msg *msg_first;      /* ptr to first message on q */
        struct msg *msg_last;       /* ptr to last message on q */
        msglen_t msg_cbytes;        /* current # bytes on q */
        msgqnum_t msg_qnum;         /* # of messages on q */
        msglen_t msg_qbytes;        /* max # of bytes on q */
        pid_t msg_lspid;            /* pid of last msgsnd */
        pid_t msg_lrpid;            /* pid of last msgrcv */
    #if defined(_LP64)
        time_t msg_stime;           /* last msgsnd time */
        time_t msg_rtime;           /* last msgrcv time */
        time_t msg_ctime;           /* last change time */
    #else
        time_t msg_stime;           /* last msgsnd time */
        int32_t msg_pad1;           /* reserved for time_t expansion */
        time_t msg_rtime;           /* last msgrcv time */
        int32_t msg_pad2;           /* time_t expansion */
        time_t msg_ctime;           /* last change time */
        int32_t msg_pad3;           /* time_t expansion */
    #endif
        short msg_cv;
        short msg_qnum_cv;
        long msg_pad4[3];           /* reserve area */
    };

In Linux, /usr/include/bits/msq.h:

    struct msqid_ds {
        struct ipc_perm msg_perm;         /* structure describing operation permission */
        __time_t msg_stime;               /* time of last msgsnd command */
        unsigned long int __unused1;
        __time_t msg_rtime;               /* time of last msgrcv command */
        unsigned long int __unused2;
        __time_t msg_ctime;               /* time of last change */
        unsigned long int __unused3;
        unsigned long int __msg_cbytes;   /* current number of bytes on queue */
        msgqnum_t msg_qnum;               /* number of messages currently on queue */
        msglen_t msg_qbytes;              /* max number of bytes allowed on queue */
        __pid_t msg_lspid;                /* pid of last msgsnd() */
        __pid_t msg_lrpid;                /* pid of last msgrcv() */
        unsigned long int __unused4;
        unsigned long int __unused5;
    };

The default maximum size of a message queue (msg_qbytes) on Linux is set to the system parameter MSGMNB (16384 bytes). The msg_qbytes value can be raised beyond MSGMNB by using the msgctl function with the appropriate privileges.
On Linux, the maximum size for a message text is set to MSGMAX (8192 bytes).

4.13.2.7. System V Semaphores

Table 4-26 compares System V semaphore interfaces in Solaris and Linux.
Here is how Linux defines struct semid_ds (which is different from Solaris):

    struct semid_ds {
        struct ipc_perm sem_perm;       /* operation permission struct */
        __time_t sem_otime;             /* last semop() time */
        unsigned long int __unused1;
        __time_t sem_ctime;             /* last time changed by semctl() */
        unsigned long int __unused2;
        unsigned long int sem_nsems;    /* number of semaphores in set */
        unsigned long int __unused3;
        unsigned long int __unused4;
    };

The glibc headers do not define union semun, so Linux programmers should define a union like the following to use for the fourth argument of the semctl function:

    union semun {
        int val;                      /* value for SETVAL */
        struct semid_ds *buf;         /* buffer for IPC_STAT & IPC_SET */
        unsigned short int *array;    /* array for GETALL & SETALL */
        struct seminfo *__buf;        /* buffer for IPC_INFO */
    };

4.13.2.8. System V Shared Memory

Table 4-27 compares System V shared memory interfaces in Solaris and Linux.
There is a slight difference in the possible values of the third argument in the function shmat:

    void *shmat(int shmid, const void *shmaddr, int shmflg);

Table 4-28 compares the value of shmflg available in Solaris and Linux.
In Linux, you can request that a shared memory segment be backed by large pages by specifying SHM_HUGETLB in the shmflg argument of the function shmget.

In Solaris, shared memory segments must be explicitly removed (by using the shmctl function) when there is no reference to them. In Linux, you can specify that the shared memory segment be removed on last detach.

4.13.3. Memory Placement Optimization (MPO)

Solaris's MPO[36] provides performance improvements on systems in which each CPU accesses some areas of memory more quickly than others. This architecture is also known as NUMA (Non-Uniform Memory Access). The essence of the NUMA architecture is the presence of multiple memory subsystems, as opposed to a single one on an SMP system. With MPO, Solaris takes memory locality into account by placing memory as close as possible to the processors that access it, while still maintaining balance in the system to avoid bottlenecks. Solaris also provides several APIs[37] for developers who want to further optimize application performance through MPO. Some of those APIs include getcpuid, gethomelgroup, lgrp_affinity_get, lgrp_affinity_set, lgrp_children, lgrp_init, lgrp_mem_size, and lgrp_view.
The Linux community has made a tremendous effort to make the Linux kernel NUMA-aware. The 2.6 kernel features NUMA awareness in the scheduler so that the majority of processes execute using local memory. With the advent of IBM Power5 and AMD Opteron systems, NUMA is becoming more prevalent in the marketplace and is used in many servers, from entry-level to high-end. For more information about how NUMA is supported in the kernel, consult "Linux on NUMA Systems" (www.kernel.org/pub/linux/kernel/people/mbligh/presentations/OLS2004-numa_paper.pdf). Enterprise-level Linux distributions such as SuSE SLES9 and Red Hat EL4 are preconfigured with NUMA support.

Applications do not require any code changes to take advantage of the NUMA support provided in the Linux kernel. If the application binds its processes to a processor, the kernel attempts to allocate memory closest to those processes. However, programmers who want to take an extra step to further optimize the performance of their applications can use the NUMA API to instruct the kernel where and how memory should be allocated. The Linux NUMA API enables applications to assign specific allocation behaviors (policies) to regions of their own virtual memory space. There is also a user-space numactl utility that controls NUMA policy for processes or shared memory.

4.13.4. vfstab

Solaris filesystem routines employ vfstab structures and contain vfs in the function name, such as getvfsent. Linux provides equivalent interfaces, but the routines use fstab structures and contain fs in the routine name, such as getfsent. vfstab on Solaris is defined in /usr/include/sys/vfstab.h. The definition of fstab on Linux is in /usr/include/fstab.h.

4.13.5. posix_spawn() and posix_spawnp()

In Solaris 10, these calls are implemented as a vfork/exec combination. As a result, the fork handlers are not run when posix_spawn or posix_spawnp is called.
These two calls have been available in Linux since glibc version 2.2 as user-level implementations. These calls use vfork if the POSIX_SPAWN_USEVFORK flag is set.[38] You can find more information about posix_spawn and posix_spawnp at www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html.
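The following sketch shows a basic use of posix_spawnp that should behave the same way on both systems: it starts a program found through PATH, waits for it, and returns its exit status. The function name spawn_and_wait is hypothetical. The example passes NULL for the file actions and attributes, so it does not set the POSIX_SPAWN_USEVFORK flag mentioned above.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the caller's environment, passed to the child */

/* Run the program named by argv[0] (searched for in PATH) with the given
 * argument vector, and wait for it to finish. Returns the child's exit
 * status, or -1 if it could not be spawned or did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp returns an error number directly instead of setting errno. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with the argument vector {"sh", "-c", "exit 7", NULL} would be expected to return 7, assuming sh is available in PATH.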
4.13.6. plock()

The Linux equivalent of the plock function is mlock:

    #include <sys/mman.h>
    int mlock(const void *addr, size_t len);

The mlock function call disables paging for the memory in the range starting at addr with length len bytes. All pages that contain a part of the specified memory range are guaranteed to be resident in RAM when the mlock call returns successfully.

4.13.7. waitpid()

This call suspends execution of the current process until a child as specified by one of the arguments has exited, or until a signal is delivered whose action is to terminate the current process or to call a signal-handling function. Table 4-29 compares the options available for this call in Solaris and Linux.
In Solaris, if the calling process has SA_NOCLDWAIT set or has SIGCHLD set to SIG_IGN, and the process has no unwaited children that were transformed into zombie processes, waitpid() blocks until all of the process's children terminate and then fails, setting errno to ECHILD. Linux 2.6 conforms to this behavior.

In Linux 2.4, if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored. That is, the call blocks until the next child terminates and then returns the PID and status of that child.

The waitid() system call (available since Linux 2.6.9) provides better control over which child state changes to wait for.