4.13. Solaris and Linux APIs

Linux 2.6 comes with the latest GNU C library. As of this writing, the latest release of glibc is 2.3.5. As seen in the preceding section, the GNU libc distribution installs several libraries, which may come in both archive (static) and shared form. The main objective of this section is to compare the Solaris APIs, as documented in the Solaris 10 man pages, with those of GNU libc, as documented in the available online manuals, header files, and Linux man pages.

The C library used on every Linux system is GNU libc. Much of the interface of GNU libc has been determined by the history of UNIX and various standards. GNU libc supports most standards that modern UNIX systems support today, such as ISO C and the POSIX standards. GNU libc also supports features of the two major UNIX variants, namely BSD and System V. The list of libraries included in GNU libc can be found in Chapter 3. To learn what version of glibc you have on your installation, run the library itself as a program. For example, from the /lib directory, type the following on the command line:

    # ./libc.so.6
    GNU C Library stable release version 2.3.3 (20040405), by Roland McGrath et al.
    Copyright (C) 2004 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Configured for i686-suse-linux.
    Compiled by GNU CC version 3.3.3 (SuSE Linux).
    Compiled on a Linux 2.6.4 system on 2004-04-05.
    Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        linuxthreads-0.10 by Xavier Leroy
        GNU Libidn by Simon Josefsson
        NoVersion patch for broken glibc 2.0 binaries
        BIND-8.2.3-T5B
        libthread_db work sponsored by Alpha Processor Inc
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
    Thread-local storage support included.
    Report bugs using the 'glibcbug' script to <bugs@gnu.org>.

Table A-3 in Appendix A shows Solaris basic library functions and their equivalents on Linux as implemented through the GNU libc distribution.

In the following subsections, we turn our focus to the basic system interfaces, such as memory management and interprocess communication.

4.13.1. Memory Management

Table 4-21 compares library functions that applications use for memory management on Solaris and Linux.

Table 4-21. Memory Management APIs
Solaris	Linux	Description
`mlock`	`mlock`^[*]	Locks pages in memory.
`munlock`	`munlock`	Unlocks pages in memory.
`mlockall`	`mlockall`^[*]	Locks address space.
`munlockall`	`munlockall`	Unlocks address space.
`msync`	`msync`^[*]	Synchronizes memory with physical storage.
`malloc`	`malloc`	Returns a pointer to a block of memory at least as large as the amount of requested memory.
`free`	`free`	Returns the memory.
`calloc`	`calloc`	Returns a pointer to a block of memory that is initialized to 0.
`memalign`	`memalign`^[*]	Allocates a specified number of bytes on a specified alignment boundary.
`valloc`	`valloc`^[*]	Allocates a specified number of bytes that are aligned on a page boundary.
`realloc`	`realloc`	Changes the size of the memory block allocated to a process.

^[*] errno for this function in Solaris is different from what is used in Linux.

4.13.2. Interprocess Communication (IPC)

Here we provide information about a set of techniques that processes can use to communicate with each other. These techniques are collectively called IPC and include pipes, FIFOs, message queues, shared memory, and semaphores.

4.13.2.1. Pipes

A pipe is a technique used to communicate between two threads in a process or between parent and child processes. When a process calls fork, its file descriptors are copied to the new child process. As a result, the parent can communicate with the child. A pipe can be created by calling the pipe function:

    #include <unistd.h>
    int pipe(int fildes[2]);

Pipes in Solaris are full duplex. Pipes in Linux, however, are half duplex: the file descriptor fildes[0] is only for reading, and the file descriptor fildes[1] is only for writing.
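Code that must run on both systems can simply treat every pipe as half duplex. The following is a minimal sketch of that convention, not a definitive implementation: the helper name pipe_from_child is hypothetical, the child writes only to fildes[1], and the parent reads only from fildes[0].

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes msg into the pipe and the parent
 * reads it back into buf (NUL-terminated). Returns the number of bytes
 * read, or -1 on failure. Only fildes[1] is written and only fildes[0]
 * is read, so the code does not rely on full-duplex pipes. */
ssize_t pipe_from_child(const char *msg, char *buf, size_t buflen)
{
    int fildes[2];
    pid_t pid;
    ssize_t n;
    ssize_t total = 0;

    if (buflen == 0 || pipe(fildes) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fildes[0]);
        close(fildes[1]);
        return -1;
    }

    if (pid == 0) {                       /* child: writer */
        size_t len = strlen(msg);

        close(fildes[0]);                 /* unused read end */
        _exit(write(fildes[1], msg, len) == (ssize_t)len ? 0 : 1);
    }

    close(fildes[1]);                     /* parent: unused write end */
    while ((size_t)total < buflen - 1) {
        n = read(fildes[0], buf + total, buflen - 1 - (size_t)total);
        if (n <= 0)                       /* EOF or error */
            break;
        total += n;
    }
    buf[total] = '\0';
    close(fildes[0]);
    waitpid(pid, NULL, 0);                /* reap the child */
    return total;
}
```

A caller might pass a small stack buffer, for example pipe_from_child("hello", buf, sizeof buf). If traffic is needed in both directions, creating a second pipe keeps the code portable.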

The popen and pclose functions are both supported in Solaris and Linux:

    #include <stdio.h>
    FILE *popen(const char *command, const char *mode);
    int pclose(FILE *stream);

The call to popen creates a pipe between the calling process and the program specified by command in the child process. The calling process can either read from or write to the pipe, as specified by the mode argument. The return value is a stream pointer such that one can write to the standard input of the command, if the mode is w, by writing to the file stream, and one can read from the standard output of the command, if the mode is r, by reading from the file stream.

4.13.2.2. FIFOs

A first-in, first-out (FIFO) file (also known as a named pipe) is a pipe that has a name in the filesystem associated with it. This enables the pipe to be opened or closed by any process. The processes on either end of the pipe need not be related to each other.

You can create a FIFO programmatically by calling the mkfifo function:

    #include <sys/types.h>
    #include <sys/stat.h>
    int mkfifo(const char *pathname, mode_t mode);

This function works the same way on both Solaris and Linux. A FIFO can have multiple readers and writers. A write request of PIPE_BUF bytes or less from a writer is guaranteed not to be interleaved with data from other processes. On Linux, PIPE_BUF is 4096 bytes,^[33] whereas it is 5120 bytes on Solaris.^[34]

^[33] /usr/include/linux/limits.h

^[34] /usr/include/limits.h

4.13.2.3. POSIX Messages

Messages enable multiple processes to send formatted data streams to arbitrary processes. On Linux, POSIX messages are supported only when the option _POSIX_MESSAGE_PASSING is defined. You can determine whether your Linux system has the option _POSIX_MESSAGE_PASSING set by issuing the following command:

    $ getconf _POSIX_MESSAGE_PASSING
    200112

Table 4-22 compares message queue interfaces in Solaris and Linux.

Table 4-22. Comparison Between POSIX Message Queue Interfaces in Solaris and Linux
Solaris Interfaces	Linux Interfaces	Description
`mq_close`	`mq_close`	Closes a message queue.
`mq_getattr`	`mq_getattr`	Gets message queue attributes.
`mq_notify`	`mq_notify`	Notifies the process (or thread) that a message is available on a queue.
`mq_open`	`mq_open`	Opens a message queue.
`mq_receive`	`mq_receive`	Receives a message from a message queue.
`mq_reltimedreceive_np`	N/A	Receives a message from a message queue and stops waiting if the specified timeout expires. (Timeout is specified as a relative time interval.)
`mq_reltimedsend_np`	N/A	Sends a message to a message queue and stops blocking on the full message queue if the specified timeout expires. (Timeout is specified as a relative time interval.)
`mq_send`	`mq_send`	Sends a message to a message queue.
`mq_setattr`	`mq_setattr`	Sets/gets message queue attributes.
`mq_timedreceive`	`mq_timedreceive`	Receives a message from a message queue and stops waiting if the specified timeout expires.
`mq_timedsend`	`mq_timedsend`	Sends a message to a message queue and stops blocking on the full message queue if the specified timeout expires.
`mq_unlink`	`mq_unlink`	Removes a message queue.
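The interfaces common to both systems can be exercised with a short round trip such as the sketch below. The helper name mq_roundtrip and the queue limits it requests are assumptions made for this example; the system may impose its own limits on mq_maxmsg and mq_msgsize. On Linux, older glibc releases need the real-time library (-lrt) at link time.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: creates a queue called name, sends msg (including
 * its terminating NUL), receives it back into buf, and removes the queue.
 * Returns the number of bytes received, or -1 on failure. */
ssize_t mq_roundtrip(const char *name, const char *msg,
                     char *buf, size_t buflen)
{
    struct mq_attr attr;
    mqd_t mq;
    ssize_t n = -1;

    memset(&attr, 0, sizeof attr);
    attr.mq_maxmsg = 4;                 /* assumed small queue depth */
    attr.mq_msgsize = (long)buflen;     /* mq_receive needs a buffer at
                                           least mq_msgsize bytes long */

    mq = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, &attr);
    if (mq == (mqd_t)-1)
        return -1;

    if (mq_send(mq, msg, strlen(msg) + 1, 0) == 0)
        n = mq_receive(mq, buf, buflen, NULL);

    mq_close(mq);
    mq_unlink(name);
    return n;
}
```

Code that relies on the Solaris relative-timeout variants would need to compute an absolute deadline and call mq_timedreceive or mq_timedsend instead.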

4.13.2.4. POSIX Semaphores

Similar to POSIX messages, if your Linux system supports POSIX semaphores, you should see the following output:

    $ getconf _POSIX_SEMAPHORES
    200112

Table 4-23 compares POSIX semaphore interfaces in Solaris and Linux.

Table 4-23. Comparison Between POSIX Semaphore Interfaces in Solaris and Linux
Solaris Interfaces	Linux Interfaces	Description
`sem_close`	`sem_close`	Closes a named semaphore.
`sem_destroy`	`sem_destroy`	Destroys an unnamed semaphore.
`sem_getvalue`	`sem_getvalue`	Gets the value of a semaphore.
`sem_init`	`sem_init`	Initializes an unnamed semaphore.
`sem_open`	`sem_open`	Initializes/opens a named semaphore.
`sem_post`	`sem_post`	Increments the count of a semaphore.
`sem_reltimedwait_np`	N/A	Locks a semaphore, but stops waiting when the specified timeout expires. (Timeout is specified as relative time interval.)
`sem_timedwait`	`sem_timedwait`	Locks a semaphore, but stops waiting when the specified timeout expires.
`sem_trywait`	`sem_trywait`	Acquires a semaphore only if it can do so without waiting.
`sem_unlink`	`sem_unlink`	Removes a named semaphore.
`sem_wait`	`sem_wait`	Acquires or waits for a semaphore.

4.13.2.5. POSIX Shared Memory

Table 4-24 compares POSIX shared memory interfaces in Solaris and Linux.

Table 4-24. Comparison Between POSIX Shared Memory Interfaces in Solaris and Linux
Solaris Interfaces	Linux Interfaces	Description
`shm_open`	`shm_open`	Opens a shared memory object. In Linux, using `O_TRUNC` with `O_RDONLY` successfully truncates an existing shared memory object, whereas the result of this combination is undefined in Solaris.
`shm_unlink`	`shm_unlink`	Removes a shared memory object.

On Linux, these two functions are available in glibc version 2.2 and later. To use these functions in your code, you need to use the required real-time library by specifying the -lrt flag with the compiler.
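A typical use of the two calls is sketched below: create an object, size it with ftruncate, map it, and later reopen it read-only. The helper name shm_roundtrip is hypothetical. The sketch deliberately avoids combining O_TRUNC with O_RDONLY, because the table above notes that the result differs between the two systems.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores text in a new shared memory object of
 * size bytes, reopens the object read-only, and copies its contents
 * into out (which must hold size bytes). Returns 0 on success, -1 on
 * failure. The object is removed before returning. */
int shm_roundtrip(const char *name, const char *text, char *out, size_t size)
{
    char *p;
    int fd;

    if (strlen(text) >= size)
        return -1;

    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)size) == -1)      /* new objects have size 0 */
        goto fail;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    strcpy(p, text);
    munmap(p, size);
    close(fd);

    fd = shm_open(name, O_RDONLY, 0);          /* second, read-only view */
    if (fd == -1) {
        shm_unlink(name);
        return -1;
    }
    p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    memcpy(out, p, size);
    munmap(p, size);
    close(fd);
    shm_unlink(name);
    return 0;

fail:
    close(fd);
    shm_unlink(name);
    return -1;
}
```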

4.13.2.6. System V Messages

Table 4-25 compares System V message interfaces in Solaris and Linux.

Table 4-25. Comparison Between System V Message Interfaces in Solaris and Linux
Solaris	Linux	Description
`msgctl`	`msgctl`	Message control operation
`msgget`	`msgget`	Get message queue
`msgids`	N/A	Discover all message queue identifiers
`msgrcv`	`msgrcv`	Message receive operation
`msgsnap`	N/A	Message queue snapshot operation
`msgsnd`	`msgsnd`	Message send operation

The definition of struct msqid_ds used in the function msgctl() is not the same on Solaris and Linux.

In Solaris, /usr/include/sys/msg.h:

    struct msqid_ds {
        struct ipc_perm msg_perm;    /* operation permission struct */
        struct msg   *msg_first;     /* ptr to first message on q */
        struct msg   *msg_last;      /* ptr to last message on q */
        msglen_t     msg_cbytes;     /* current # bytes on q */
        msgqnum_t    msg_qnum;       /* # of messages on q */
        msglen_t     msg_qbytes;     /* max # of bytes on q */
        pid_t        msg_lspid;      /* pid of last msgsnd */
        pid_t        msg_lrpid;      /* pid of last msgrcv */
    #if defined(_LP64)
        time_t       msg_stime;      /* last msgsnd time */
        time_t       msg_rtime;      /* last msgrcv time */
        time_t       msg_ctime;      /* last change time */
    #else
        time_t       msg_stime;      /* last msgsnd time */
        int32_t      msg_pad1;       /* reserved for time_t expansion */
        time_t       msg_rtime;      /* last msgrcv time */
        int32_t      msg_pad2;       /* time_t expansion */
        time_t       msg_ctime;      /* last change time */
        int32_t      msg_pad3;       /* time_t expansion */
    #endif
        short        msg_cv;
        short        msg_qnum_cv;
        long         msg_pad4[3];    /* reserve area */
    };

In Linux, /usr/include/bits/msq.h:

    struct msqid_ds {
        struct ipc_perm msg_perm;        /* structure describing operation permission */
        __time_t msg_stime;              /* time of last msgsnd command */
        unsigned long int __unused1;
        __time_t msg_rtime;              /* time of last msgrcv command */
        unsigned long int __unused2;
        __time_t msg_ctime;              /* time of last change */
        unsigned long int __unused3;
        unsigned long int __msg_cbytes;  /* current number of bytes on queue */
        msgqnum_t msg_qnum;              /* number of messages currently on queue */
        msglen_t msg_qbytes;             /* max number of bytes allowed on queue */
        __pid_t msg_lspid;               /* pid of last msgsnd() */
        __pid_t msg_lrpid;               /* pid of last msgrcv() */
        unsigned long int __unused4;
        unsigned long int __unused5;
    };

The default maximum size of a message queue (msg_qbytes) on Linux is set to the system parameter MSGMNB (16384 bytes). The msg_qbytes value can be raised beyond MSGMNB by using the msgctl function with the appropriate privileges. On Linux, the maximum size for a message text is set to MSGMAX (8192 bytes).
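Because the two layouts differ, portable code should access struct msqid_ds only through the field names that both definitions share, never by offset or by the padding members. The sketch below illustrates this with msg_qnum; the helper name sysv_msg_roundtrip and the struct demo_msg type are hypothetical.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Hypothetical message type: a long type tag followed by the text. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/* Hypothetical helper: sends text through a private queue, checks the
 * queue state with IPC_STAT, receives the text into out, and removes
 * the queue. Returns 0 on success, -1 on failure. */
int sysv_msg_roundtrip(const char *text, char *out, size_t outlen)
{
    struct demo_msg m;
    struct msqid_ds ds;
    size_t len = strlen(text) + 1;        /* include the NUL */
    int rc = -1;
    int id;

    if (len > sizeof m.mtext || len > outlen)
        return -1;

    id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    m.mtype = 1;
    memcpy(m.mtext, text, len);

    /* msg_qnum is present under the same name on Solaris and Linux. */
    if (msgsnd(id, &m, len, 0) == 0 &&
        msgctl(id, IPC_STAT, &ds) == 0 && ds.msg_qnum == 1 &&
        msgrcv(id, &m, sizeof m.mtext, 1, 0) == (ssize_t)len) {
        memcpy(out, m.mtext, len);
        rc = 0;
    }

    msgctl(id, IPC_RMID, NULL);           /* queues persist until removed */
    return rc;
}
```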

4.13.2.7. System V Semaphores

Table 4-26 compares System V semaphore interfaces in Solaris and Linux.

Table 4-26. Comparison Between System V Semaphore Interfaces in Solaris and Linux
Solaris	Linux	Description
`semctl`	`semctl`^[35]	Semaphore control operation
`semget`	`semget`	Get set of semaphores
`semids`	N/A	Discover all semaphore IDs
`semop`	`semop`	Semaphore operation
`semtimedop`	`semtimedop`	Semaphore operation with time limit

^[35] See the text following the table.

Here is how Linux defines struct semid_ds (which is different from Solaris):

    struct semid_ds {
        struct ipc_perm sem_perm;        /* operation permission struct */
        __time_t sem_otime;              /* last semop() time */
        unsigned long int __unused1;
        __time_t sem_ctime;              /* last time changed by semctl() */
        unsigned long int __unused2;
        unsigned long int sem_nsems;     /* number of semaphores in set */
        unsigned long int __unused3;
        unsigned long int __unused4;
    };

Linux programmers should define a union like the following to use for the fourth argument of the semctl function:

    union semun {
        int val;                    /* value for SETVAL */
        struct semid_ds *buf;       /* buffer for IPC_STAT & IPC_SET */
        unsigned short int *array;  /* array for GETALL & SETALL */
        struct seminfo *__buf;      /* buffer for IPC_INFO */
    };
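The following sketch shows the union in use with SETVAL. The helper name sem_create_with_value is hypothetical, and the Linux-specific __buf member is left out here on the assumption that the example never issues IPC_INFO.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* The caller defines the union; the Linux-only IPC_INFO member is
 * omitted in this sketch because it is not used. */
union semun {
    int val;                    /* value for SETVAL */
    struct semid_ds *buf;       /* buffer for IPC_STAT & IPC_SET */
    unsigned short int *array;  /* array for GETALL & SETALL */
};

/* Hypothetical helper: creates a private set holding one semaphore and
 * sets its value to initial. Returns the set identifier, or -1 on
 * failure. The caller removes the set with semctl(id, 0, IPC_RMID). */
int sem_create_with_value(int initial)
{
    union semun arg;
    int id;

    id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) == -1) {   /* fourth argument is the union */
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}
```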

4.13.2.8. System V Shared Memory

Table 4-27 compares System V shared memory interfaces in Solaris and Linux.

Table 4-27. Comparison Between System V Shared Memory Interfaces in Solaris and Linux
Solaris	Linux	Description
`shmat`	`shmat` (see comment following this table)	Attach the shared memory segment to the data segment of the calling process
`shmctl`	`shmctl`	Shared memory control
`shmdt`	`shmdt`	Detach the shared memory segment
`shmget`	`shmget`	Get shared memory segment identifier
`shmids`	N/A	Discover all shared memory identifiers

There is a slight difference in the possible values of the third argument in the function shmat.

void *shmat(int shmid, const void *shmaddr, int shmflg);

Table 4-28 compares the value of shmflg available in Solaris and Linux.

Table 4-28. Comparison Between shmflg in Solaris and Linux
shmflg in Solaris	shmflg in Linux	Description
`SHM_RDONLY`	`SHM_RDONLY`	Attach read-only (else read-write)
`SHM_RND`	`SHM_RND`	Round attach address to `SHMLBA`
`SHM_SHARE_MMU`	N/A	Share VM resources such as page table
`SHM_PAGEABLE`	N/A	Share VM resources and create the dynamic shared memory (DISM) framework
N/A	`SHM_REMAP`	Take over region on attach

In Linux, you can request that a shared memory segment be backed by large pages by specifying SHM_HUGETLB in the shmflg argument of the function shmget.

In Solaris, shared memory segments must be explicitly removed (by using the shmctl function) when there is no reference to them. In Linux, you can specify that the shared memory segment be removed on last detach.
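On Linux, one way to get removal on last detach is to mark the segment with IPC_RMID right after attaching it: the segment stays usable while it is attached and is destroyed once the last process detaches. The sketch below assumes this Linux behavior and should not be relied on for Solaris; the helper name shm_attach_autoremove is hypothetical.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper: creates a private segment of size bytes, attaches
 * it, and immediately marks it for removal. Returns the attach address,
 * or NULL on failure. If id_out is not NULL, it receives the segment
 * identifier. */
void *shm_attach_autoremove(size_t size, int *id_out)
{
    void *addr;
    int id;

    id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return NULL;

    addr = shmat(id, NULL, 0);            /* let the system pick the address */
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return NULL;
    }

    /* Mark for destruction; the mapping remains valid until shmdt. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(addr);
        return NULL;
    }

    if (id_out != NULL)
        *id_out = id;
    return addr;
}
```

With this pattern a crashed process does not leave a stale segment behind, because the kernel releases the segment when the final attachment disappears.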

4.13.3. Memory Placement Optimization (MPO)

Solaris's MPO^[36] provides performance improvements on systems in which each CPU accesses some area of memory more quickly than others. This architecture is also known as NUMA (Non-Uniform Memory Access). The essence of the NUMA architecture is the presence of multiple memory subsystems, as opposed to a single one on an SMP system. With MPO, Solaris can recognize the memory locality effects by ensuring that memory is as close as possible to the processors that access it while still maintaining balance in the system to avoid bottlenecks. Solaris also provides several APIs^[37] for developers who want to further optimize application performance through MPO. Some of those APIs include getcpuid, gethomelgroup, lgrp_affinity_get, lgrp_affinity_set, lgrp_children, lgrp_init, lgrp_mem_size, and lgrp_view.

^[36] www.sun.com/software/solaris/performance.jsp

^[37] http://iforce.sun.com/protected/solaris10/adoptionkit/tech/mpo/mpo_man.html

The Linux community has made a tremendous effort to make the Linux kernel NUMA-aware. The 2.6 kernel features NUMA awareness in the scheduler so that the majority of processes execute in the local memory. With the advent of IBM Power5 and AMD Opteron systems, NUMA is becoming more prevalent in the marketplace, used in many servers, entry-level or high-end. For more information about how NUMA is supported in the kernel, consult "Linux on NUMA Systems" (www.kernel.org/pub/linux/kernel/people/mbligh/presentations/OLS2004-numa_paper.pdf). Enterprise-level Linux distributions such as SuSE SLES9 and Red Hat EL4 are preconfigured with NUMA support.

Applications do not require any code changes to take advantage of the NUMA support provided in the Linux kernel. If the application binds its processes to a processor, the kernel attempts to allocate memory closest to those processes.

However, if programmers would like to take an extra step to further optimize the performance of their applications, they can also use the NUMA API to instruct the kernel where memory should be allocated and how. The Linux NUMA API enables applications to assign specific allocation behaviors (policies) to regions of their own virtual memory space. There is also a user-space numactl utility that controls NUMA policy for processes or shared memory.

4.13.4. vfstab

Solaris filesystem routines employ vfstab structures and contain vfs in the function name, such as getvfsent. Linux provides equivalent interfaces, but the routines use fstab structures and contain fs in the routine name, such as getfsent. vfstab on Solaris is defined in /usr/include/sys/vfstab.h. The definition of fstab on Linux is in /usr/include/fstab.h.
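On the Linux side, a loop over the table with the fs routines might look like the sketch below. The helper name list_fstab is hypothetical; the field names are those of struct fstab in /usr/include/fstab.h.

```c
#include <fstab.h>
#include <stdio.h>

/* Hypothetical helper: prints one line per filesystem table entry to out.
 * Returns the number of entries, or -1 if the table cannot be opened. */
int list_fstab(FILE *out)
{
    struct fstab *fs;
    int count = 0;

    if (setfsent() == 0)                  /* 0 means the table could not be opened */
        return -1;

    while ((fs = getfsent()) != NULL) {
        fprintf(out, "%s on %s type %s\n",
                fs->fs_spec,              /* block special device name */
                fs->fs_file,              /* mount point */
                fs->fs_vfstype);          /* filesystem type */
        count++;
    }

    endfsent();
    return count;
}
```

Solaris code built around getvfsent passes a FILE pointer and a caller-supplied struct vfstab, so a port typically needs a small adapter rather than a one-for-one rename.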

4.13.5. posix_spawn() and posix_spawnp()

In Solaris 10, these calls are implemented as a vfork/exec combination. That is, the fork handlers are not run when posix_spawn or posix_spawnp is called. These two calls have been available in Linux since glibc version 2.2 as user-level implementations. These calls use vfork if the POSIX_SPAWN_USEVFORK flag is set.^[38] You can find more information about posix_spawn and posix_spawnp at www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html.

^[38] The discussion on using vfork in posix_spawn() can also be found at http://sources.redhat.com/bugzilla/show_bug.cgi?id=378.
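A minimal use of posix_spawnp that behaves the same way on both systems is sketched below. The helper name spawn_and_wait is hypothetical, and no spawn attributes or file actions are set, so the sketch does not depend on how either implementation creates the child.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: runs argv[0] (looked up through PATH) with the
 * caller's environment and waits for it. Returns the exit status of the
 * command, or -1 if it could not be started or did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp returns an error number directly; it does not set errno. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```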

4.13.6. plock()

The Linux equivalent of the plock function is mlock. The mlock function call disables paging for the memory in the range starting at addr with length len bytes. All pages that contain a part of the specified memory range are guaranteed to be resident in RAM when the mlock call returns successfully.
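The sketch below locks a page-aligned buffer with mlock, uses it, and unlocks it again. The helper name touch_locked_buffer is hypothetical. The call can fail for unprivileged processes when the requested length exceeds the locked-memory resource limit, so callers should be prepared for an error return.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocates len bytes on a page boundary, locks the
 * range in RAM, fills it, then unlocks and frees it.
 * Returns 0 on success, or -1 with errno set on failure. */
int touch_locked_buffer(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *addr;
    int saved;
    int rc;

    if (page <= 0 || len == 0) {
        errno = EINVAL;
        return -1;
    }

    rc = posix_memalign(&addr, (size_t)page, len);
    if (rc != 0) {
        errno = rc;                       /* posix_memalign returns the error */
        return -1;
    }

    if (mlock(addr, len) == -1) {
        saved = errno;                    /* keep the mlock error across free */
        free(addr);
        errno = saved;
        return -1;
    }

    memset(addr, 0xA5, len);              /* pages are resident at this point */

    rc = munlock(addr, len);
    saved = errno;
    free(addr);
    errno = saved;
    return rc;
}
```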

4.13.7. waitpid()

This call suspends execution of the current process until a child as specified by one of the arguments has exited, or until a signal is delivered whose action is to terminate the current process or to call a signal-handling function. Table 4-29 compares the options available for this call in Solaris and Linux.

Table 4-29. Comparison of the Options for waitpid() Between Solaris and Linux
Solaris	Linux	Description
`WNOHANG`	`WNOHANG`	Return immediately if status is not immediately available for one of the child processes specified by `pid`.
`WUNTRACED`	`WUNTRACED`	The status of any specified child processes that are stopped, and whose status has not yet been reported since they stopped, is also reported to the calling process.
`WCONTINUED`	`WCONTINUED` (since Linux 2.6.10)	Wait for processes continued.
`WNOWAIT`	N/A	Keep the process whose status is returned in a waitable state. The process may be waited for again with the identical result.
N/A	`__WCLONE`	(Linux-specific option) Wait for the clone children only.
N/A	`__WALL`	(Linux-specific option) Wait for all children, regardless of type (clone or nonclone).
N/A	`__WNOTHREAD`	(Linux-specific option) Do not wait for children of other threads in the same thread group.

In Solaris, if the calling process has SA_NOCLDWAIT set or has SIGCHLD set to SIG_IGN and the process has no unwaited children that were transformed into zombie processes, it blocks until all of its children terminate, and waitpid() fails and sets errno to ECHILD. Linux 2.6 conforms to this behavior.

In Linux 2.4, if a wait() or waitpid() call is made while SIGCHLD is being ignored, the call behaves just as though SIGCHLD were not being ignored. That is, the call blocks until the next child terminates and then returns the PID and status of that child.

The waitid() system call (available since Linux 2.6.9) provides better control over which child state changes to wait for.
