25.1. Extending Python with Python's C API

A Python extension module named x resides in a dynamic library with the same filename (x.pyd on Windows; x.so on most Unix-like platforms) in an appropriate directory (often the site-packages subdirectory of the Python library directory). You generally build the x extension module from a C source file x.c whose overall structure is:

 #include <Python.h>

 /* omitted: the body of the x module */

 void initx(void)
 {
     /* omitted: the code that initializes the module named x */
 }

When you have built and installed the extension module, a Python statement import x loads the dynamic library, then locates and calls the function named initx, which must do all that is needed to initialize the module object named x.

25.1.1. Building and Installing C-Coded Python Extensions

To build and install a C-coded Python extension module, it's simplest and most productive to use the distribution utilities, distutils, covered in "The Distribution Utilities (distutils)" on page 150. In the same directory as x.c, place a file named setup.py that contains the following statements:

 from distutils.core import setup, Extension

 setup(name='x',
       ext_modules=[Extension('x', sources=['x.c'])])

From a shell prompt in this directory, you can now run:

 C:\> python setup.py install

to build the module and install it so that it becomes usable in your Python installation. distutils performs all needed compilation and linking steps, with the right compiler and linker commands and flags, and copies the resulting dynamic library into an appropriate directory, dependent on your Python installation (depending on that installation's details, you may need to have administrator or super-user privileges for the installation; for example, on a Mac or Linux, you may need to run sudo python setup.py install). Your Python code can then access the resulting module with the statement import x.

25.1.1.1. The C compiler you need

To compile C-coded extensions to Python, you normally need the same C compiler that was used to build the Python version you want to extend. For most platforms, this usually means the free gcc compiler that normally comes with your platform or can be freely downloaded for it. On the Macintosh, gcc comes with Apple's free XCode (a.k.a. Developer Tools) Integrated Development Environment (IDE).

For Windows, you normally need the Microsoft product known as Visual Studio 7.1 (a.k.a. Visual Studio 2003). However, it may be possible to compile C-coded extensions on Windows without having to purchase that Microsoft product. At http://www.vrplumber.com/programming/mstoolkit/, you will find instructions that show how to perform this task by downloading, installing, and configuring five other Microsoft components, ones that can be downloaded without paying license fees. Unfortunately, at the time of this writing, the freely downloadable Microsoft Visual Studio 2005 is not suitable for compiling extensions for the standard distributions of Python for Windows, which (for both Python 2.4 and 2.5) are compiled with Visual Studio 2003.

25.1.1.2. Compatibility of C-coded extensions among Python versions

In general, a C-coded extension compiled to run with one version of Python is not guaranteed to run with another. For example, a version compiled for Python 2.4 is only certain to run with 2.4, not with 2.3 nor 2.5. On some platforms, such as Windows, you cannot even try to run an extension with a different version of Python; on others, such as Linux or Mac OS X, a given extension may happen to work right on more than one version of Python, but you will at least get a warning when the module is imported, and the most prudent course is to heed the warning and recompile the extension appropriately.

At a C-source level, on the other hand, compatibility is almost always preserved. One exception is with the new Python 2.5, in which many values that used to be C ints are now of type Py_ssize_t, which is equivalent to int on 32-bit platforms, but is a 64-bit signed integer (specifically, the signed equivalent of size_t) on 64-bit platforms. This C API change lets you address more than two billion items in a Python 2.5 sequence on a 64-bit platform, and makes no difference on 32-bit platforms. If your C-coded extension, originally developed and tested under previous versions of Python, produces errors and warnings from the C compiler when you recompile your sources for Python 2.5, the cause is almost certainly this: you need (with the help of the errors and warnings from the C compiler) to find and change the occurrences of int that must become Py_ssize_t. A simple checking tool to make this process easier can be freely downloaded from http://svn.effbot.python-hosting.com/stuff/sandbox/python/ssizecheck.py. To ensure that your extension remains compilable for Python 2.4 and earlier, as well as becoming correct for Python 2.5 on 64-bit machines, insert, early in your source files, the lines:

 #if PY_VERSION_HEX < 0x02050000
 typedef int Py_ssize_t;
 #endif
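The comparison in the guard above works because PY_VERSION_HEX packs the whole version number into one integer, most significant part first: major, minor, and micro version in one byte each, then a release-level nibble and a serial nibble. The following sketch reproduces that packing in plain C, so it compiles without Python's headers; the names pack_version_hex, needs_ssize_typedef, and the LEVEL_ constants are illustrative helpers, not part of the C API.

```c
/* Release-level nibbles, with the values Python's headers use. */
enum {
    LEVEL_ALPHA = 0xA,
    LEVEL_BETA = 0xB,
    LEVEL_CANDIDATE = 0xC,
    LEVEL_FINAL = 0xF
};

/* Hypothetical helper: pack a version the way PY_VERSION_HEX is laid out. */
unsigned long pack_version_hex(unsigned major, unsigned minor,
                               unsigned micro, unsigned level,
                               unsigned serial)
{
    return ((unsigned long)major << 24) |
           ((unsigned long)minor << 16) |
           ((unsigned long)micro << 8) |
           ((unsigned long)level << 4) |
           (unsigned long)serial;
}

/* Mirrors the test in the guard: nonzero when the extension itself
   must supply the Py_ssize_t typedef. */
int needs_ssize_typedef(unsigned long version_hex)
{
    return version_hex < 0x02050000UL;
}
```

With this packing, Python 2.4.4 final is 0x020404F0 and the first 2.5 alpha is 0x020500A1, so the typedef is supplied for every 2.4 release and for none of the 2.5 releases.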

25.1.2. Overview of C-Coded Python Extension Modules

Your C function initx generally has the following overall structure:

 void initx(void)
 {
     PyObject* thismod = Py_InitModule3("x", x_methods, "docstring for x");
     /* optional: calls to PyModule_AddObject(thismod, "somename", someobj)
        and other Python C API calls to finish preparing module object
        thismod and its types (if any) and other objects.
     */
 }

More details are covered in "Module Initialization" on page 617. x_methods is an array of PyMethodDef structs. Each PyMethodDef struct in the x_methods array describes a C function that your module x makes available to Python code that imports x. Each such C function has the following overall structure:

 static PyObject* func_with_named_arguments(PyObject* self, PyObject* args,
                                            PyObject* kwds)
 {
     /* omitted: body of function, accessing arguments via the Python C
        API function PyArg_ParseTupleAndKeywords, returning a PyObject*
        result, NULL for errors */
 }

or some slightly simpler variant, such as:

 static PyObject* func_with_positional_args_only(PyObject* self, PyObject* args)
 {
     /* omitted: body of function, accessing arguments via the Python C
        API function PyArg_ParseTuple, returning a PyObject* result,
        NULL for errors */
 }

How C-coded functions access arguments passed by Python code is covered in "Accessing Arguments" on page 621. How such functions build Python objects is covered in "Creating Python Values" on page 624, and how they raise or propagate exceptions back to the Python code that called them is covered in "Exceptions" on page 625. When your module defines new Python types (as well as or instead of Python-callable functions), your C code defines one or more instances of struct PyTypeObject. This subject is covered in "Defining New Types" on page 638.

A simple example that makes use of all these concepts is shown in "A Simple Extension Example" on page 635. A toy-level "Hello World" example could be as simple as:

 #include <Python.h>

 static PyObject* helloworld(PyObject* self)
 {
     return Py_BuildValue("s", "Hello, C-coded Python extensions world!");
 }

 static char helloworld_docs[] =
     "helloworld( ): return a popular greeting phrase\n";

 static PyMethodDef helloworld_funcs[] = {
     {"helloworld", (PyCFunction)helloworld, METH_NOARGS, helloworld_docs},
     {NULL}
 };

 void inithelloworld(void)
 {
     Py_InitModule3("helloworld", helloworld_funcs,
                    "Toy-level extension module");
 }

Save this as helloworld.c and build it through a setup.py script with distutils. After you have run python setup.py install, you can use the newly installed module, for example from a Python interactive session, such as:

 >>> import helloworld
 >>> print helloworld.helloworld( )
 Hello, C-coded Python extensions world!
 >>>

25.1.3. Return Values of Python's C API Functions

Nearly all functions in the Python C API return either an int or a PyObject*. Most functions returning int return 0 in case of success and -1 to indicate errors. Some functions return results that are true or false: these functions return 0 to indicate false and an integer not equal to 0 to indicate true, and never indicate errors. Functions returning PyObject* return NULL in case of errors. See "Exceptions" on page 625 for more details on how C-coded functions handle and raise errors.

25.1.4. Module Initialization

Function initx must contain, at a minimum, a call to one of the module initialization functions supplied by the C API. You can always use the Py_InitModule3 function.

Py_InitModule3	`PyObject* Py_InitModule3(char* name,PyMethodDef* methods,char* doc)` `name` is the C string name of the module you are initializing (e.g., `"x"`). `methods` is an array of `PyMethodDef` structures, covered in "The PyMethodDef structure" on page 619. `doc` is the C string that becomes the docstring of the module. `Py_InitModule3` returns a `PyObject*` that is a borrowed reference to the new module object, as covered in "Reference Counting" on page 620. In practice, this means that you can ignore the return value if you need to perform no more initialization operations on this module. Otherwise, assign the return value to a C variable of type `PyObject*` and continue initialization.

`Py_InitModule3` initializes the module object to contain the functions described in table `methods`. Further initialization, if any, may add other module attributes and is generally best performed with calls to the following convenience functions.

PyModule_AddIntConstant	`int PyModule_AddIntConstant(PyObject* module,char* name,long value)` Adds to module `module` an attribute named `name` with integer value `value`.
PyModule_AddObject	`int PyModule_AddObject(PyObject* module,char* name,PyObject* value)` Adds to module `module` an attribute named `name` with value `value` and steals a reference to `value`, as covered in "Reference Counting" on page 620.
PyModule_AddStringConstant	`int PyModule_AddStringConstant(PyObject* module,char* name,char* value)` Adds to module `module` an attribute named `name` with string value `value`.

Some module initialization operations may be conveniently performed by executing Python code with `PyRun_String` (covered in "PyRun_String" on page 649) with the module's dictionary as both the `globals` and `locals` argument. If you find yourself using `PyRun_String` extensively, rather than just as an occasional convenience, consider the possibility of splitting your extension module in two: a C-coded extension module that offers raw, fast functionality, and a Python module that wraps the C-coded extension to provide further convenience and handy utilities. When you do need to get a module's dictionary, use the `PyModule_GetDict` function.

PyModule_GetDict	`PyObject* PyModule_GetDict(PyObject* module)` Returns a borrowed reference to the dictionary of module `module`. You should not use `PyModule_GetDict` for the specific tasks supported by `PyModule_Add` functions (as covered in "PyModule_AddObject" on page 618): use `PyModule_GetDict` only for such purposes as supporting the use of `PyRun_String`.

If you need to access another module, you can import it by calling the `PyImport_Import` function.

PyImport_Import	`PyObject* PyImport_Import(PyObject* name)` Imports the module named in Python string object `name` and returns a new reference to the module object, like Python's `__import__(name)`. `PyImport_Import` is the highest-level, simplest, and most often used way to import a module. Beware, in particular, of using function `PyImport_ImportModule`, which may often look more convenient because it accepts a `char*` argument. `PyImport_ImportModule` operates on a lower level, bypassing any import hooks that may be in force, so extensions that use it will be far harder to incorporate in packages such as those built by tools `py2exe` and `cxFreeze`, covered in "py2exe" and "cxFreeze". Always do any needed importing by calling `PyImport_Import`, unless you have very specific needs and know exactly what you're doing.

25.1.4.1. The PyMethodDef structure

To add functions to a module (or nonspecial methods to new types, as covered in "Defining New Types" on page 638), you must describe the functions or methods in an array of PyMethodDef structures and terminate the array with a sentinel (i.e., a structure whose fields are all 0 or NULL). PyMethodDef is defined as follows:

 typedef struct {
     char* ml_name;        /* Python name of function or method */
     PyCFunction ml_meth;  /* pointer to C function impl */
     int ml_flags;         /* flag describing how to pass arguments */
     char* ml_doc;         /* docstring for the function or method */
 } PyMethodDef;

You must cast the second field to (PyCFunction) unless the C function's signature is exactly PyObject* function(PyObject* self, PyObject* args), which is the typedef for PyCFunction. This signature is correct when ml_flags is METH_O, which indicates a function that accepts a single argument, or METH_VARARGS, which indicates a function that accepts positional arguments. For METH_O, args is the only argument. For METH_VARARGS, args is a tuple of all arguments, to be parsed with the C API function PyArg_ParseTuple. However, ml_flags can also be METH_NOARGS, which indicates a function that accepts no arguments, or METH_KEYWORDS, which indicates a function that accepts both positional and named arguments. For METH_NOARGS, the signature is PyObject* function(PyObject* self), without arguments. For METH_KEYWORDS, the signature is:

 PyObject* function(PyObject* self, PyObject* args, PyObject* kwds)

args is the tuple of positional arguments, and kwds is the dictionary of named arguments; both are parsed with the C API function PyArg_ParseTupleAndKeywords. In these cases, you do need to cast the second field to (PyCFunction).

When a C-coded function implements a module's function, the self parameter of the C function is always NULL for any value of the ml_flags field. When a C-coded function implements a nonspecial method of an extension type, the self parameter points to the instance on which the method is being called.
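To make the role of the sentinel concrete, the following plain C sketch walks a sentinel-terminated table the way a consumer of such an array must: it has no length to rely on, so it stops at the first entry whose name is NULL. ToyMethodDef, the TOY_ flags, and toy_find are stand-ins invented for this illustration (so that the code compiles without Python's headers); they are not the real PyMethodDef or METH_ flags, and the sketch does not claim to show how Python itself scans the table.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the METH_* flags; the values are arbitrary here. */
enum { TOY_VARARGS = 1, TOY_KEYWORDS = 2, TOY_NOARGS = 4, TOY_O = 8 };

/* Toy stand-in for PyMethodDef, with the same four roles per entry. */
typedef struct {
    const char *name;     /* name visible to callers */
    int (*meth)(int);     /* pointer to the implementation */
    int flags;            /* calling convention */
    const char *doc;      /* docstring */
} ToyMethodDef;

int toy_double(int n) { return 2 * n; }
int toy_negate(int n) { return -n; }

ToyMethodDef toy_methods[] = {
    {"double", toy_double, TOY_O, "double(n): return 2*n"},
    {"negate", toy_negate, TOY_O, "negate(n): return -n"},
    {NULL, NULL, 0, NULL} /* sentinel: all fields 0 or NULL */
};

/* Walk the table until the sentinel; return the entry, or NULL if
   no entry has the requested name. */
const ToyMethodDef *toy_find(const ToyMethodDef *table, const char *name)
{
    const ToyMethodDef *def;
    for (def = table; def->name != NULL; ++def) {
        if (strcmp(def->name, name) == 0)
            return def;
    }
    return NULL;
}
```

Forgetting the sentinel would let the loop run past the end of the array, which is why every such table must end with one.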

25.1.5. Reference Counting

Python objects live on the heap, and C code sees them as pointers of type PyObject*. Each PyObject counts how many references to itself are outstanding and destroys itself when the number of references goes down to 0. To make this possible, your code must use Python-supplied macros: Py_INCREF to add a reference to a Python object and Py_DECREF to abandon a reference to a Python object. The Py_XINCREF and Py_XDECREF macros are like Py_INCREF and Py_DECREF, but you may also use them innocuously on a null pointer. The test for a non-null pointer is implicitly performed inside the Py_XINCREF and Py_XDECREF macros, which saves you from needing to write out that test explicitly when you don't know whether the pointer might be null.

A PyObject* p, which your code receives by calling or being called by other functions, is known as a new reference if the code that supplies p has already called Py_INCREF on your behalf. Otherwise, it is known as a borrowed reference. Your code is said to own new references it holds, but not borrowed ones. You can call Py_INCREF on a borrowed reference to make it into a reference that you own; you must do this if you need to use the reference across calls to code that might cause the count of the reference you borrowed to be decremented. You must always call Py_DECREF before abandoning or overwriting references that you own, but never on references you don't own. Therefore, understanding which interactions transfer reference ownership and which rely on reference borrowing is absolutely crucial. For most functions in the C API, and for all functions that you write and Python calls, the following general rules apply:

PyObject* arguments are borrowed references.
A PyObject* returned as the function's result transfers ownership.

For each of the two rules, there are a few exceptions for some functions in the C API. PyList_SetItem and PyTuple_SetItem steal a reference to the item they are setting (but not to the list or tuple object into which they're setting it). So do the faster versions of these two functions that exist as C preprocessor macros, PyList_SET_ITEM and PyTuple_SET_ITEM. So does PyModule_AddObject, covered in "PyModule_AddObject" on page 618. There are no other exceptions to the first rule. The rationale for these exceptions, which may help you remember them, is that the object you're setting is most often one you created for the purpose, so the reference-stealing semantics save you from having to call Py_DECREF immediately afterward.

The second rule has more exceptions than the first one. There are several cases in which the returned PyObject* is a borrowed reference rather than a new reference. The abstract functions, whose names begin with PyObject_, PySequence_, PyMapping_, and PyNumber_, return new references. This is because you can call them on objects of many types, and there might not be any other reference to the resulting object that they return (i.e., the returned object might have to be created on the fly). The concrete functions, whose names begin with PyList_, PyTuple_, PyDict_, and so on, return a borrowed reference when the semantics of the object they return ensure that there must be some other reference to the returned object somewhere.

In this chapter, I indicate all cases of exceptions to these rules (i.e., the return of borrowed references and the rare cases of reference stealing from arguments) regarding all functions that I cover. When I don't explicitly mention a function as being an exception, it means that the function follows the rules: its PyObject* arguments, if any, are borrowed references, and its PyObject* result, if any, is a new reference.
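The ownership rules are easier to follow with a small model in hand. The following plain C sketch implements a toy reference-counted object and a one-slot container, with one setter that steals its argument's reference and one that follows the general borrowing rule. Every name here (ToyObject, ToySlot, toy_incref, and so on) is invented for the illustration; the real macros and functions live in Python.h and differ in detail.

```c
#include <stdlib.h>

typedef struct {
    int refcnt;               /* number of outstanding references */
    int value;
} ToyObject;

int toy_live_objects = 0;     /* lets callers observe destruction */

/* Returns a new reference: the caller owns it and must release it. */
ToyObject *toy_new(int value)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->value = value;
    toy_live_objects++;
    return op;
}

void toy_incref(ToyObject *op) { op->refcnt++; }

/* Abandons one reference; the object destroys itself at zero. */
void toy_decref(ToyObject *op)
{
    if (--op->refcnt == 0) {
        toy_live_objects--;
        free(op);
    }
}

/* Like toy_decref, but innocuous on a null pointer. */
void toy_xdecref(ToyObject *op) { if (op != NULL) toy_decref(op); }

typedef struct { ToyObject *item; } ToySlot;

/* Steals the reference to item: ownership passes to the slot, so the
   caller must not release item afterward. */
void toy_slot_set_steal(ToySlot *slot, ToyObject *item)
{
    toy_xdecref(slot->item);  /* release whatever was stored before */
    slot->item = item;
}

/* Borrows item: the slot takes its own reference, and the caller
   still owns the one it passed. */
void toy_slot_set_borrow(ToySlot *slot, ToyObject *item)
{
    toy_incref(item);         /* first, in case item is already stored */
    toy_xdecref(slot->item);
    slot->item = item;
}

/* Returns a borrowed reference, valid while the slot holds the item. */
ToyObject *toy_slot_get(const ToySlot *slot) { return slot->item; }
```

After toy_slot_set_steal(&slot, toy_new(1)) there is nothing left for the caller to release, whereas after toy_slot_set_borrow the caller still owes a toy_decref on its own reference.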

25.1.6. Accessing Arguments

A function that has ml_flags in its PyMethodDef set to METH_NOARGS is called from Python with no arguments. The corresponding C function has a signature with only one argument, self. When ml_flags is METH_O, Python code must call the function with exactly one argument. The C function's second argument is a borrowed reference to the object that the Python caller passes as the argument's value.

When ml_flags is METH_VARARGS, Python code can call the function with any number of positional arguments, which the Python interpreter implicitly collects into a tuple. The C function's second argument is a borrowed reference to the tuple. Your C code can then call the PyArg_ParseTuple function.

PyArg_ParseTuple

int PyArg_ParseTuple(PyObject* tuple,char* format,...)

Returns 0 for errors, and a value not equal to 0 for success. tuple is the PyObject* that was the C function's second argument. format is a C string that describes mandatory and optional arguments. The following arguments of PyArg_ParseTuple are addresses of C variables in which to put the values extracted from the tuple. Any PyObject* variables among the C variables are borrowed references. Table 25-1 lists the commonly used code strings, of which zero or more are joined to form string format.

Table 25-1. Format codes for PyArg_ParseTuple
Code	C type	Meaning
`c`	`char`	A Python string of length `1` becomes a C `char`.
`d`	`double`	A Python `float` becomes a C `double`.
`D`	`Py_complex`	A Python `complex` becomes a C `Py_complex`.
`f`	`float`	A Python `float` becomes a C `float`.
`i`	`int`	A Python `int` becomes a C `int`.
`l`	`long`	A Python `int` becomes a C `long`.
`L`	`long long`	A Python `int` becomes a C `long long` (`__int64` on Windows).
`O`	`PyObject*`	Gets non-`NULL` borrowed reference to Python argument.
`O!`	`type+PyObject*`	Like code `O`, plus type checking (see below).
`O&`	`convert+void*`	Arbitrary conversion (see below).
`s`	`char*`	Python string without embedded nulls to C `char*`.
`s#`	`char*+int`	Any Python string to C address and length.
`t#`	`char*+int`	Read-only single-segment buffer to C address and length.
`u`	`Py_UNICODE*`	Python Unicode without embedded nulls to C.
`u#`	`Py_UNICODE*+int`	Any Python Unicode to C address and length.
`w#`	`char*+int`	Read/write single-segment buffer to C address and length.
`z`	`char*`	Like `s`, also accepts `None` (sets C `char*` to `NULL`).
`z#`	`char*+int`	Like `s#`, also accepts `None` (sets C `char*` to `NULL`).
`(...)`	as per `...`	A Python sequence is treated as one argument per item.
`|`		The following arguments are optional.
`:`		Format end, followed by function name for error messages.
`;`		Format end, followed by entire error message text.

Code formats d to L accept numeric arguments from Python. Python coerces the corresponding values. For example, a code of i can correspond to a Python float; the fractional part gets truncated, as if built-in function int had been called. Py_complex is a C struct with two fields named real and imag, both of type double.

O is the most general format code and accepts any argument, which you can later check and/or convert as needed. Variant O! corresponds to two arguments in the variable arguments: first the address of a Python type object, then the address of a PyObject*. O! checks that the corresponding value belongs to the given type (or any subtype of that type) before setting the PyObject* to point to the value; otherwise, it raises TypeError (the whole call fails, and the error is set to an appropriate TypeError instance, as covered in "Exceptions" on page 625). Variant O& also corresponds to two arguments in the variable arguments: first the address of a converter function you coded, then a void* (i.e., any address). The converter function must have signature int convert(PyObject*, void*). Python calls your conversion function with the value passed from Python as the first argument and the void* from the variable arguments as the second argument. The conversion function must either return 0 and raise an exception (as covered in "Exceptions" on page 625) to indicate an error, or return 1 and store whatever is appropriate via the void* it gets.

Code format s accepts a string from Python and the address of a char* (i.e., a char**) among the variable arguments. It changes the char* to point at the string's buffer, which your C code must treat as a read-only, null-terminated array of chars (i.e., a typical C string; however, your code must not modify it). The Python string must contain no embedded null characters. s# is similar, but corresponds to two arguments among the variable arguments: first the address of a char*, then the address of an int to set to the string's length. The Python string can contain embedded nulls, and therefore so can the buffer to which the char* is set to point. u and u# are similar, but accept a Unicode string, and the C-side pointers must be Py_UNICODE* rather than char*. Py_UNICODE is a macro defined in Python.h, and corresponds to the type of a Python Unicode character in the implementation (this is often, but not always, a C wchar_t).
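The reason s must reject embedded nulls while s# need not is a property of C itself: code that receives only a char* finds the end of the data by looking for the first null byte, so anything after an embedded null is invisible to it. The following plain C sketch shows the difference; toy_data, visible_without_length, and count_byte are names made up for the illustration, and the length is a size_t here rather than the int that s# uses.

```c
#include <stddef.h>
#include <string.h>

/* Five bytes of data with a null in the middle, plus the terminating
   null that a C string literal always appends. */
const char toy_data[] = "ab\0cd";
const size_t toy_data_len = sizeof toy_data - 1;  /* explicit length */

/* What code relying on null termination alone can see. */
size_t visible_without_length(const char *buf)
{
    return strlen(buf);
}

/* What code given an explicit length can see: counts every byte equal
   to c, including bytes that follow an embedded null. */
size_t count_byte(const char *buf, size_t len, char c)
{
    size_t i, n = 0;
    for (i = 0; i < len; ++i) {
        if (buf[i] == c)
            ++n;
    }
    return n;
}
```

Here strlen reports 2 although the buffer holds 5 bytes of data, while the length-aware function still finds the d that follows the embedded null.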

t# and w# are similar to s#, but the corresponding Python argument can be any object of a type respecting the buffer protocol, respectively read-only and read/write. Strings are a typical example of read-only buffers. mmap and array instances are typical examples of read/write buffers, and like all read/write buffers they are also acceptable where a read-only buffer is required (i.e., for a t#).

When one of the arguments is a Python sequence of known fixed length, you can use format codes for each of its items, and corresponding C addresses among the variable arguments, by grouping the format codes in parentheses. For example, code (ii) corresponds to a Python sequence of two numbers and, among the remaining arguments, corresponds to two addresses of ints.

The format string may include a vertical bar (|) to indicate that all following arguments are optional. In this case, you must initialize the C variables, whose addresses you pass among the variable arguments for later arguments, to suitable default values before you call PyArg_ParseTuple. PyArg_ParseTuple does not change the C variables corresponding to optional arguments that were not passed in a given call from Python to your C-coded function.

The format string may optionally end with :name to indicate that name must be used as the function name if any error messages are needed. Alternatively, the format string may end with ;text to indicate that text must be used as the entire error message if PyArg_ParseTuple detects errors (this form is rarely used).
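The calling pattern described above (a format string followed by the addresses of C variables, with | marking where optional arguments begin) can be modeled in a few lines of plain C. The sketch below is only a toy: toy_parse is a hypothetical function, its "arguments" are plain longs rather than Python objects, and it supports only codes i and l, the | marker, and a trailing :name. It is meant to show why defaults must be set before the call, not how PyArg_ParseTuple is implemented.

```c
#include <stdarg.h>
#include <stddef.h>

/* Toy model of the calling pattern only. Returns 0 for errors and 1
   for success, the same convention as PyArg_ParseTuple. */
int toy_parse(const long *args, size_t nargs, const char *format, ...)
{
    va_list ap;
    size_t used = 0;      /* how many of args have been consumed */
    int optional = 0;     /* set once | has been seen */
    int ok = 1;
    const char *p;

    va_start(ap, format);
    for (p = format; ok && *p != '\0' && *p != ':'; ++p) {
        if (*p == '|') {
            optional = 1;
        } else if (*p == 'i' || *p == 'l') {
            if (used == nargs) {
                /* Out of arguments: acceptable only for optional
                   ones, whose C variables are left untouched. */
                ok = optional;
                break;
            }
            if (*p == 'i')
                *va_arg(ap, int *) = (int)args[used];
            else
                *va_arg(ap, long *) = args[used];
            ++used;
        } else {
            ok = 0;       /* unsupported format code */
        }
    }
    va_end(ap);
    if (ok && used < nargs)
        ok = 0;           /* too many arguments were passed */
    return ok;
}
```

A caller that declares int a = 0, b = 7; and parses one argument with format "i|i:f" gets a set from the argument while b keeps its default of 7, which is why the default has to be assigned before the call.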

A function that has ml_flags in its PyMethodDef set to METH_KEYWORDS accepts positional and keyword arguments. Python code calls the function with any number of positional arguments, which the Python interpreter collects into a tuple, and keyword arguments, which the Python interpreter collects into a dictionary. The C function's second argument is a borrowed reference to the tuple, and the third one is a borrowed reference to the dictionary. Your C code then calls the PyArg_ParseTupleAndKeywords function.

PyArg_ParseTupleAndKeywords

int PyArg_ParseTupleAndKeywords(PyObject* tuple, PyObject* dict, char* format, char** kwlist,...)

Returns 0 for errors, and a value not equal to 0 for success. tuple is the PyObject* that was the C function's second argument. dict is the PyObject* that was the C function's third argument. format is the same as for PyArg_ParseTuple, except that it cannot include the (...) format code to parse nested sequences. kwlist is an array of char* terminated by a NULL sentinel, with the names of the parameters, one after the other. For example, the following C code:

 static PyObject* func_c(PyObject* self, PyObject* args, PyObject* kwds)
 {
     static char* argnames[] = {"x", "y", "z", NULL};
     double x, y=0.0, z=0.0;
     if(!PyArg_ParseTupleAndKeywords(
         args,kwds,"d|dd",argnames,&x,&y,&z))
         return NULL;
     /* rest of function snipped */

is roughly equivalent to this Python code:

 def func_py(x, y=0.0, z=0.0):
     x, y, z = map(float, (x,y,z))
     # rest of function snipped

25.1.7. Creating Python Values

C functions that communicate with Python must often build Python values, both to return as their PyObject* result and for other purposes, such as setting items and attributes. The simplest and handiest way to build a Python value is most often with the Py_BuildValue function.

Py_BuildValue

PyObject* Py_BuildValue(char* format,...)

format is a C string that describes the Python object to build. The following arguments of Py_BuildValue are C values from which the result is built. The PyObject* result is a new reference. Table 25-2 lists the commonly used code strings, of which zero or more are joined into string format. Py_BuildValue builds and returns a tuple if format contains two or more format codes, or if format begins with ( and ends with ). Otherwise, the result is not a tuple. When you pass buffers (as, for example, in the case of format code s#), Py_BuildValue copies the data. You can therefore modify, abandon, or free( ) your original copy of the data after Py_BuildValue returns. Py_BuildValue always returns a new reference (except for format code N). Called with an empty format, Py_BuildValue("") returns a new reference to None.

Table 25-2. Format codes for Py_BuildValue
Code	C type	Meaning
`c`	`char`	A C `char` becomes a Python string of length `1`.
`d`	`double`	A C `double` becomes a Python `float`.
`D`	`Py_complex*`	A C `Py_complex` becomes a Python `complex`.
`i`	`int`	A C `int` becomes a Python `int`.
`l`	`long`	A C `long` becomes a Python `int`.
`N`	`PyObject*`	Passes a Python object and steals a reference.
`O`	`PyObject*`	Passes a Python object and `INCREF`s it as normal.
`O&`	`convert+void*`	Arbitrary conversion (see below).
`s`	`char*`	C `0`-terminated `char*` to Python string, or `NULL` to `None`.
`s#`	`char*+int`	C `char*` and length to Python string, or `NULL` to `None`.
`u`	`Py_UNICODE*`	C wide, null-terminated string to Python Unicode, or `NULL` to `None`.
`u#`	`Py_UNICODE*+int`	C wide string and length to Python Unicode, or `NULL` to `None`.
`(...)`	As per `...`	Builds Python tuple from C values.
`[...]`	As per `...`	Builds Python list from C values.
`{...}`	As per `...`	Builds Python dictionary from C values, alternating keys and values (must be an even number of C values).

Code O& corresponds to two arguments among the variable arguments: first the address of a converter function you code, then a void* (i.e., any address). The converter function must have signature PyObject* convert(void*). Python calls the conversion function with the void* from the variable arguments as the only argument. The conversion function must either return NULL and raise an exception (as covered in "Exceptions" on page 625) to indicate an error, or return a new reference PyObject* built from data obtained through the void*.

Code {...} builds dictionaries from an even number of C values, alternately keys and values. For example, Py_BuildValue("{issi}",23,"zig","zag",42) returns a dictionary like Python's {23:'zig','zag':42}.
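To see how format codes pair up with the C values that follow them, the following plain C sketch consumes a format string and its variable arguments in the same left-to-right order, but writes the textual form of the described Python value into a buffer instead of building objects. It is a deliberately small toy under stated assumptions: toy_build_repr is a hypothetical function, it handles only codes i and s and a single non-nested (...), [...], or {...} group, and it does not model the tuple that Py_BuildValue builds implicitly for two or more top-level codes.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 on success, -1 on an unsupported code or a full buffer. */
int toy_build_repr(char *out, size_t outsize, const char *format, ...)
{
    va_list ap;
    size_t used = 0;
    int items = 0;      /* values emitted inside the current group */
    char group = 0;     /* opening bracket of the current group, or 0 */
    int status = 0;
    const char *p;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    va_start(ap, format);
    for (p = format; status == 0 && *p != '\0'; ++p) {
        char piece[128];
        int n;
        if (*p == '(' || *p == '[' || *p == '{') {
            group = *p;
            items = 0;
            n = snprintf(piece, sizeof piece, "%c", *p);
        } else if (*p == ')' || *p == ']' || *p == '}') {
            group = 0;
            n = snprintf(piece, sizeof piece, "%c", *p);
        } else if (*p == 'i' || *p == 's') {
            /* In a {...} group, odd-numbered values follow their key. */
            const char *sep = items == 0 ? ""
                : (group == '{' && items % 2 == 1) ? ": " : ", ";
            if (*p == 'i')
                n = snprintf(piece, sizeof piece, "%s%d", sep,
                             va_arg(ap, int));
            else
                n = snprintf(piece, sizeof piece, "%s'%s'", sep,
                             va_arg(ap, const char *));
            ++items;
        } else {
            n = -1;     /* unsupported format code */
        }
        if (n < 0 || (size_t)n >= sizeof piece ||
            used + (size_t)n >= outsize) {
            status = -1;
        } else {
            memcpy(out + used, piece, (size_t)n + 1);
            used += (size_t)n;
        }
    }
    va_end(ap);
    return status;
}
```

Called with the format "{issi}" and the values 23, "zig", "zag", 42, the sketch fills the buffer with {23: 'zig', 'zag': 42}, matching the alternation of keys and values described above.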

Note the crucial difference between codes N and O. N steals a reference from the corresponding PyObject* value among the variable arguments, so it's convenient to build an object including a reference you own that you would otherwise have to Py_DECREF. O does no reference stealing, so it's appropriate to build an object including a reference you don't own, or a reference you must also keep elsewhere.

25.1.8. Exceptions

To propagate exceptions raised from other functions you call, return NULL as the PyObject* result from your C function. To raise your own exceptions, set the current-exception indicator and return NULL. Python's built-in exception classes (covered in "Standard Exception Classes" on page 130) are globally available, with names starting with PyExc_, such as PyExc_AttributeError, PyExc_KeyError, and so on. Your extension module can also supply and use its own exception classes. The most commonly used C API functions related to raising exceptions are the following.

PyErr_Format	`PyObject* PyErr_Format(PyObject* type,char* format,...)` Raises an exception of class `type`, which must be either a built-in such as `PyExc_IndexError` or an exception class created with `PyErr_NewException`. Builds the associated value from format string `format`, which has syntax similar to `printf`'s, and the following C values indicated as variable arguments above. Returns `NULL`, so your code can just call: return PyErr_Format(PyExc_KeyError, "Unknown key name (%s)", thekeystring);
PyErr_NewException	`PyObject* PyErr_NewException(char* name,PyObject* base,PyObject* dict)` Subclasses exception class `base`, with extra class attributes and methods from dictionary `dict` (normally `NULL`, meaning no extra class attributes or methods), creating a new exception class named `name` (string `name` must be of the form `"modulename.classname"`) and returning a new reference to the new class object. When `base` is `NULL`, uses `PyExc_Exception` as the base class. You normally call this function during initialization of a module object `module`. For example: PyModule_AddObject(module, "error", PyErr_NewException("mymod.error", NULL, NULL));
PyErr_NoMemory	`PyObject* PyErr_NoMemory( )` Raises an out-of-memory error and returns `NULL`, so your code can just call: return PyErr_NoMemory( );
PyErr_SetObject	`void PyErr_SetObject(PyObject* type,PyObject* value)` Raises an exception of class `type`, which must be a built-in such as `PyExc_KeyError` or an exception class created with `PyErr_NewException`, with `value` as the associated value (a borrowed reference). `PyErr_SetObject` is a `void` function (i.e., returns no value).
PyErr_SetFromErrno	`PyObject* PyErr_SetFromErrno(PyObject* type)` Raises an exception of class `type`, which must be a built-in such as `PyExc_OSError` or an exception class created with `PyErr_NewException`. Takes all details from global variable `errno`, which C standard library functions and system calls set for many error cases, and the standard C library function `strerror`, which translates such error codes into appropriate strings. Returns `NULL`, so your code can just call: return PyErr_SetFromErrno(PyExc_IOError);
PyErr_SetFromErrnoWithFilename	`PyObject* PyErr_SetFromErrnoWithFilename(PyObject* type,char* filename)` Like `PyErr_SetFromErrno`, but also provides string `filename` as part of the exception's value. When `filename` is `NULL`, works like `PyErr_SetFromErrno`.

Your C code may want to deal with an exception and continue, as a `try`/`except` statement would let you do in Python code. The most commonly used C API functions related to catching exceptions are the following.

PyErr_Clear	`void PyErr_Clear( )` Clears the error indicator. Innocuous if no error is pending.
PyErr_ExceptionMatches	`int PyErr_ExceptionMatches(PyObject* type)` Call only when an error is pending, or the whole program might crash. Returns a value not equal to `0` when the pending exception is an instance of the given `type` or any subclass of `type`, or `0` when the pending exception is not such an instance.
PyErr_Occurred	`PyObject* PyErr_Occurred( )` Returns `NULL` if no error is pending; otherwise, a borrowed reference to the type of the pending exception. (Don't use the returned value; call `PyErr_ExceptionMatches` instead, in order to catch exceptions of subclasses as well, as is normal and expected.)
PyErr_Print	`void PyErr_Print( )` Call only when an error is pending, or the whole program might crash. Outputs a standard traceback to `sys.stderr`, then clears the error indicator.

If you need to process errors in highly sophisticated ways, study other error-related functions of the C API, such as `PyErr_Fetch`, `PyErr_NormalizeException`, `PyErr_GivenExceptionMatches`, and `PyErr_Restore`. However, I do not cover such advanced and rarely needed possibilities in this book.
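
The clear-and-continue pattern that the catching functions support mirrors a Python `try`/`except` clause. As a rough Python-level model only (this is not C API code; the helper name `first_or_default` is hypothetical, and the snippet assumes a Python 3 interpreter rather than the Python 2 releases this chapter targets), catching a base class also catches its subclasses, which is the check `PyErr_ExceptionMatches` performs:

```python
def first_or_default(container, key, default=None):
    """Return container[key], or default when the lookup fails.

    Hypothetical helper: it models a C function that calls
    PyErr_ExceptionMatches(PyExc_LookupError) and then PyErr_Clear().
    """
    try:
        return container[key]
    except LookupError:
        # KeyError and IndexError both subclass LookupError, so both
        # match here; any other exception keeps propagating.
        return default


print(first_or_default({'a': 1}, 'b', 0))   # KeyError is caught: prints 0
print(first_or_default([10, 20], 5, -1))    # IndexError is caught: prints -1
```

An exception that does not match, such as the `TypeError` raised by indexing `None`, propagates to the caller, just as the C code would return `NULL` without clearing the error indicator.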

25.1.9. Abstract Layer Functions

The code for a C extension typically needs to use some Python functionality. For example, your code may need to examine or set attributes and items of Python objects, call Python-coded and built-in functions and methods, and so on. In most cases, the best approach is for your code to call functions from the abstract layer of Python's C API. These are functions that you can call on any Python object (functions whose names start with PyObject_), or on any object within a wide category, such as mappings, numbers, or sequences (with names starting with PyMapping_, PyNumber_, and PySequence_, respectively).

Some of the functions callable on specifically typed objects within these categories duplicate functionality that is also available from PyObject_ functions. In these cases, you should almost invariably use the more general PyObject_ function instead. I don't cover such almost-redundant functions in this book.

Functions in the abstract layer raise Python exceptions if you call them on objects to which they are not applicable. All of these functions accept borrowed references for PyObject* arguments and return a new reference (NULL for an exception) if they return a PyObject* result.

The most frequently used abstract layer functions are the following.

PyCallable_Check

int PyCallable_Check(PyObject* x)

True if x is callable, like Python's callable(x).

PyEval_CallObject

PyObject* PyEval_CallObject(PyObject* x,PyObject* args)

Calls callable Python object x with the positional arguments held in tuple args. Returns the call's result, like Python's return x(*args).

PyEval_CallObjectWithKeywords

PyObject* PyEval_CallObjectWithKeywords(PyObject* x,PyObject* args,PyObject* kwds)

Calls callable Python object x with the positional arguments held in tuple args and the named arguments held in dictionary kwds. Returns the call's result, like Python's return x(*args,**kwds).

PyIter_Check

int PyIter_Check(PyObject* x)

True if x supports the iterator protocol (i.e., if x is an iterator).

PyIter_Next

PyObject* PyIter_Next(PyObject* x)

Returns the next item from iterator x. Returns NULL without raising any exception if x's iteration is finished (i.e., when Python's x.next( ) raises StopIteration).

PyNumber_Check

int PyNumber_Check(PyObject* x)

True if x supports the number protocol (i.e., if x is a number).

PyObject_CallFunction

PyObject* PyObject_CallFunction(PyObject* x,char* format,...)

Calls the callable Python object x with positional arguments described by format string format, using the same format codes as Py_BuildValue, covered in "Py_BuildValue" on page 624. When format is NULL, calls x with no arguments. Returns the call's result.

PyObject_CallMethod

PyObject* PyObject_CallMethod(PyObject* x,char* method,char* format,...)

Calls the method named method of Python object x with positional arguments described by format string format, using the same format codes as Py_BuildValue. When format is NULL, calls the method with no arguments. Returns the call's result.

PyObject_Cmp

int PyObject_Cmp(PyObject* x1,PyObject* x2,int* result)

Compares objects x1 and x2 and places the result (-1, 0, or 1) in *result, like Python's result=cmp(x1,x2).

PyObject_DelAttrString

int PyObject_DelAttrString(PyObject* x,char* name)

Deletes x's attribute named name, like Python's del x.name.

PyObject_DelItem

int PyObject_DelItem(PyObject* x,PyObject* key)

Deletes x's item with key (or index) key, like Python's del x[key].

PyObject_DelItemString

int PyObject_DelItemString(PyObject* x,char* key)

Deletes x's item with key key, like Python's del x[key].

PyObject_GetAttrString

PyObject* PyObject_GetAttrString(PyObject* x,char* name)

Returns x's attribute name, like Python's x.name.

PyObject_GetItem

PyObject* PyObject_GetItem(PyObject* x,PyObject* key)

Returns x's item with key (or index) key, like Python's x[key].

PyObject_GetItemString

PyObject* PyObject_GetItemString(PyObject* x,char* key)

Returns x's item with key key, like Python's x[key].

PyObject_GetIter

PyObject* PyObject_GetIter(PyObject* x)

Returns an iterator on x, like Python's iter(x).

PyObject_HasAttrString

int PyObject_HasAttrString(PyObject* x,char* name)

True if x has an attribute name, like Python's hasattr(x,name).

PyObject_IsTrue

int PyObject_IsTrue(PyObject* x)

True if x is true for Python, like Python's bool(x).

PyObject_Length

int PyObject_Length(PyObject* x)

Returns x's length, like Python's len(x).

PyObject_Repr

PyObject* PyObject_Repr(PyObject* x)

Returns x's detailed string representation, like Python's repr(x).

PyObject_RichCompare

PyObject* PyObject_RichCompare(PyObject* x,PyObject* y,int op)

Performs the comparison indicated by op between x and y, and returns the result as a Python object. op can be Py_EQ, Py_NE, Py_LT, Py_LE, Py_GT, or Py_GE, corresponding to Python comparisons x==y, x!=y, x<y, x<=y, x>y, or x>=y.

PyObject_RichCompareBool

int PyObject_RichCompareBool(PyObject* x,PyObject* y,int op)

Like PyObject_RichCompare, but returns 0 for false and 1 for true.

PyObject_SetAttrString

int PyObject_SetAttrString(PyObject* x,char* name,PyObject* v)

Sets x's attribute named name to v, like Python's x.name=v.

PyObject_SetItem

int PyObject_SetItem(PyObject* x,PyObject* key,PyObject* v)

Sets x's item with key (or index) key to v, like Python's x[key]=v.

PyObject_SetItemString

int PyObject_SetItemString(PyObject* x,char* key,PyObject *v)

Sets x's item with key key to v, like Python's x[key]=v.

PyObject_Str

PyObject* PyObject_Str(PyObject* x)

Returns x's readable string form, like Python's str(x).

PyObject_Type

PyObject* PyObject_Type(PyObject* x)

Returns x's type object, like Python's type(x).

PyObject_Unicode

PyObject* PyObject_Unicode(PyObject* x)

Returns x's Unicode string form, like Python's unicode(x).

PySequence_Contains

int PySequence_Contains(PyObject* x,PyObject* v)

True if v is an item in x, like Python's v in x.

PySequence_DelSlice

int PySequence_DelSlice(PyObject* x,int start,int stop)

Deletes x's slice from start to stop, like Python's del x[start:stop].

PySequence_Fast

PyObject* PySequence_Fast(PyObject* x,char* msg)

Returns a new reference to a tuple with the same items as x, unless x is already a tuple or a list, in which case returns a new reference to x itself. When x is not iterable, raises TypeError with msg as the error message. When you need to get many items of an arbitrary sequence x, it's fastest to call t=PySequence_Fast(x,msg) once, then call PySequence_Fast_GET_ITEM(t,i) as many times as needed, and finally call Py_DECREF(t).

PySequence_Fast_GET_ITEM

PyObject* PySequence_Fast_GET_ITEM(PyObject* x,int i)

Returns a borrowed reference to item i of x, where x must be the result of PySequence_Fast, x!=NULL, and 0<=i<PySequence_Fast_GET_SIZE(x). Violating these conditions can cause program crashes. This approach is optimized for speed, not for safety.

PySequence_Fast_GET_SIZE

int PySequence_Fast_GET_SIZE(PyObject* x)

Returns the length of x. x must be the result of PySequence_Fast, x!=NULL.

PySequence_GetSlice

PyObject* PySequence_GetSlice(PyObject* x,int start,int stop)

Returns x's slice from start to stop, like Python's x[start:stop].

PySequence_List

PyObject* PySequence_List(PyObject* x)

Returns a new list object with the same items as x, like Python's list(x).

PySequence_SetSlice

int PySequence_SetSlice(PyObject* x,int start,int stop,PyObject* v)

Sets x's slice from start to stop to v, like Python's x[start:stop]=v. Just as in the equivalent Python statement, v must also be a sequence.

PySequence_Tuple

PyObject* PySequence_Tuple(PyObject* x)

Returns a new reference to a tuple with the same items as x, like Python's tuple(x).
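
To see how several of these functions combine, here is a Python-level sketch of the loop that C code typically builds from `PyObject_GetIter`, `PyIter_Next`, and `PyNumber_Add`. It is a model of the protocol rather than C API code: the function name `total` is hypothetical, and the snippet assumes a Python 3 interpreter (where the iterator method is spelled `next(it)` rather than `it.next( )`).

```python
def total(iterable):
    """Sum the items of any iterable by driving the iterator protocol by hand."""
    it = iter(iterable)          # corresponds to PyObject_GetIter(x)
    result = 0
    while True:
        try:
            item = next(it)      # corresponds to PyIter_Next(it)
        except StopIteration:
            break                # PyIter_Next returns NULL with no error set
        result = result + item   # corresponds to PyNumber_Add(result, item)
    return result


print(total([1, 2, 3]))                  # prints 6
print(total(x * x for x in range(4)))    # prints 14
```

In C, each value that `PyIter_Next` and `PyNumber_Add` return is a new reference that the loop must release with `Py_DECREF` once it is no longer needed; Python code gets that bookkeeping for free.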

Other functions whose names start with PyNumber_ allow you to perform numeric operations. Unary PyNumber functions, which take one argument PyObject* x and return a PyObject*, are listed in Table 25-3 with their Python equivalents.

Table 25-3. Unary PyNumber functions
Function	Python equivalent
`PyNumber_Absolute`	`abs(x)`
`PyNumber_Float`	`float(x)`
`PyNumber_Int`	`int(x)`
`PyNumber_Invert`	`~x`
`PyNumber_Long`	`long(x)`
`PyNumber_Negative`	`-x`
`PyNumber_Positive`	`+x`

Binary PyNumber functions, which take two PyObject* arguments x and y and return a PyObject*, are similarly listed in Table 25-4.

Table 25-4. Binary PyNumber functions
Function	Python equivalent
`PyNumber_Add`	`x + y`
`PyNumber_And`	`x & y`
`PyNumber_Divide`	`x / y`
`PyNumber_Divmod`	`divmod(x, y)`
`PyNumber_FloorDivide`	`x // y`
`PyNumber_Lshift`	`x << y`
`PyNumber_Multiply`	`x * y`
`PyNumber_Or`	`x | y`
`PyNumber_Remainder`	`x % y`
`PyNumber_Rshift`	`x >> y`
`PyNumber_Subtract`	`x - y`
`PyNumber_TrueDivide`	`x / y` (nontruncating)
`PyNumber_Xor`	`x ^ y`

All the binary PyNumber functions have in-place equivalents whose names start with PyNumber_InPlace, such as PyNumber_InPlaceAdd and so on. The in-place versions try to modify the first argument in place, if possible, and in any case return a new reference to the result, be it the first argument (modified) or a new object. Python's built-in numbers are immutable; therefore, when the first argument is a number of a built-in type, the in-place versions work just the same as the ordinary versions. Function PyNumber_Divmod returns a tuple with two items (the quotient and the remainder) and has no in-place equivalent.
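
The difference between mutable and immutable first arguments can be observed from Python with the standard `operator` module, whose `iadd` function behaves like the `+=` statement and so gives a reasonable model of `PyNumber_InPlaceAdd`. This sketch assumes a Python 3 interpreter; the variable names are arbitrary.

```python
import operator

# A list is mutable: the in-place operation alters it and returns it.
numbers = [1, 2]
same = operator.iadd(numbers, [3])
print(same is numbers, numbers)      # prints: True [1, 2, 3]

# An int is immutable: the in-place operation returns a new object.
n = 1000
m = operator.iadd(n, 1)
print(m, n)                          # prints: 1001 1000

# The ordinary (not in-place) operation always builds a new object.
fresh = operator.add(numbers, [4])
print(fresh is numbers)              # prints: False
```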

There is one ternary PyNumber function, PyNumber_Power.

PyNumber_Power

PyObject* PyNumber_Power(PyObject* x,PyObject* y,PyObject* z)

When z is Py_None, returns x raised to the y power, like Python's x**y or, equivalently, pow(x,y). Otherwise, returns x**y%z, like Python's pow(x,y,z). The in-place version is named PyNumber_InPlacePower.
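
The Python equivalents named above can be checked directly. A minimal illustration, assuming a Python 3 interpreter (the sample values are arbitrary):

```python
x, y, z = 7, 5, 13

print(pow(x, y) == x ** y)             # two-argument form: prints True
print(pow(x, y, z) == (x ** y) % z)    # three-argument form: prints True
print(pow(x, y, None) == pow(x, y))    # None as z means "no modulus": prints True
```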

25.1.10. Concrete Layer Functions

Each specific type of Python built-in object supplies concrete functions to operate on instances of that type, with names starting with Pytype_ (e.g., PyInt_ for functions related to Python ints). Most such functions duplicate the functionality of abstract-layer functions or auxiliary functions covered earlier in this chapter, such as Py_BuildValue, which can generate objects of many types. In this section, I cover some frequently used functions from the concrete layer that provide unique functionality or substantial convenience or speed. For most types, you can check if an object belongs to the type by calling Pytype_Check, which also accepts instances of subtypes, or Pytype_CheckExact, which accepts only instances of type, not of subtypes. Signatures are the same as for function PyIter_Check, covered in "PyIter_Check" on page 628.
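
The distinction between the two kinds of check has a direct Python-level counterpart. In this sketch (hypothetical helper names and a hypothetical `float` subclass, assuming a Python 3 interpreter), `check` models a Pytype_Check function and `check_exact` models the corresponding Pytype_CheckExact function:

```python
class Celsius(float):
    """Hypothetical float subclass, used only for this illustration."""


def check(obj, typ):
    # Accepts instances of typ and of its subtypes.
    return isinstance(obj, typ)


def check_exact(obj, typ):
    # Accepts only instances whose type is exactly typ.
    return type(obj) is typ


t = Celsius(21.5)
print(check(t, float), check_exact(t, float))          # prints: True False
print(check(21.5, float), check_exact(21.5, float))    # prints: True True
```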

PyDict_GetItem	`PyObject* PyDict_GetItem(PyObject* x,PyObject* key)` Returns a borrowed reference to the item with key `key` of dictionary `x`.
PyDict_GetItemString	`PyObject* PyDict_GetItemString(PyObject* x,char* key)` Returns a borrowed reference to the item with key `key` of dictionary `x`.
PyDict_Next	`int PyDict_Next(PyObject* x,int* pos,PyObject** k,PyObject** v)` Iterates over items in dictionary `x`. You must initialize `*pos` to `0` at the start of the iteration: `PyDict_Next` uses and updates `*pos` to keep track of its place. For each successful iteration step, returns `1`; when there are no more items, returns `0`. Updates `*k` and `*v` to point to the next key and value, respectively (borrowed references), at each step that returns `1`. You can pass either `k` or `v` as `NULL` if you are not interested in the key or value. During an iteration, you must not change in any way the set of `x`'s keys, but you can change `x`'s values as long as the set of keys remains identical.
PyDict_Merge	`int PyDict_Merge(PyObject* x,PyObject* y,int override)` Updates dictionary `x` by merging the items of dictionary `y` into `x`. `override` determines what happens when a key `k` is present in both `x` and `y`: if `override` is `0`, then `x[k]` remains the same; otherwise, `x[k]` is replaced by the value `y[k]`.
PyDict_MergeFromSeq2	`int PyDict_MergeFromSeq2(PyObject* x,PyObject* y,int override)` Like `PyDict_Merge`, except that `y` is not a dictionary but a sequence of sequences, where each subsequence has length `2` and is used as a `(key,value)` pair.
PyFloat_AS_DOUBLE	`double PyFloat_AS_DOUBLE(PyObject* x)` Returns the C `double` value of Python `float` `x`, very fast, without any error checking.
PyList_New	`PyObject* PyList_New(int length)` Returns a new, uninitialized list of the given `length`. You must then initialize the list, typically by calling `PyList_SET_ITEM` `length` times.
PyList_GET_ITEM	`PyObject* PyList_GET_ITEM(PyObject* x,int pos)` Returns the `pos` item of list `x`, without any error checking.
PyList_SET_ITEM	`int PyList_SET_ITEM(PyObject* x,int pos,PyObject* v)` Sets the `pos` item of list `x` to `v`, without any error checking. Steals a reference to `v`. Use only immediately after creating a new list `x` with `PyList_New`.
PyString_AS_STRING	`char* PyString_AS_STRING(PyObject* x)` Returns a pointer to the internal buffer of string `x`, very fast, without any error checking. You must not modify the buffer in any way, unless you just allocated it by calling `PyString_FromStringAndSize(NULL,size)`.
PyString_AsStringAndSize	`int PyString_AsStringAndSize(PyObject* x,char** buffer,int* length)` Puts a pointer to the internal buffer of string `x` in `*buffer`, and `x`'s length in `*length`. You must not modify the buffer in any way, unless you just allocated it by calling `PyString_FromStringAndSize(NULL,size)`.
PyString_FromFormat	`PyObject* PyString_FromFormat(char* format,...)` Returns a Python string built from format string `format`, which has syntax similar to `printf`'s, and from the C values passed as the variable arguments (`...`) in the signature.
PyString_FromStringAndSize	`PyObject* PyString_FromStringAndSize(char* data,int size)` Returns a Python string of length `size`, copying `size` bytes from `data`. When `data` is `NULL`, the Python string is uninitialized, and you must initialize it. You can get the pointer to the string's internal buffer by calling `PyString_AS_STRING`.
PyTuple_New	`PyObject* PyTuple_New(int length)` Returns a new, uninitialized tuple of the given `length`. You must then initialize the tuple, typically by calling `PyTuple_SET_ITEM` `length` times.
PyTuple_GET_ITEM	`PyObject* PyTuple_GET_ITEM(PyObject* x,int pos)` Returns the `pos` item of tuple `x`, without error checking.
PyTuple_SET_ITEM	`int PyTuple_SET_ITEM(PyObject* x,int pos,PyObject* v)` Sets the `pos` item of tuple `x` to `v`, without error checking. Steals a reference to `v`. Use only immediately after creating a new tuple `x` with `PyTuple_New`.
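
The rule stated for `PyDict_Next` (values may change during an iteration, the set of keys may not) has a visible counterpart in Python code. The sketch below assumes a Python 3 interpreter and arbitrary sample data. Python-level iteration detects a change in the dictionary's size and raises `RuntimeError`; C code that calls `PyDict_Next` should not count on a comparable check and must simply respect the rule.

```python
prices = {'apple': 1, 'pear': 2}

# Replacing values while iterating is fine: the set of keys is unchanged.
for key in prices:
    prices[key] = prices[key] * 10

# Adding a key while iterating is not: Python-level iteration detects it.
try:
    for key in prices:
        prices[key + '_copy'] = 0
    outcome = 'no error'
except RuntimeError:
    outcome = 'RuntimeError'

print(prices['apple'], prices['pear'], outcome)   # prints: 10 20 RuntimeError
```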

25.1.11. A Simple Extension Example

Example 25-1 exposes the functionality of Python C API functions PyDict_Merge and PyDict_MergeFromSeq2 for Python use. The update method of dictionaries works like PyDict_Merge with override=1, but Example 25-1 is more general.

Example 25-1. A simple Python extension module merge.c

    #include <Python.h>

    static PyObject*
    merge(PyObject* self, PyObject* args, PyObject* kwds)
    {
        static char* argnames[] = {"x","y","override",NULL};
        PyObject *x, *y;
        int override = 0;
        if(!PyArg_ParseTupleAndKeywords(args, kwds, "O!O|i", argnames,
            &PyDict_Type, &x, &y, &override))
                return NULL;
        if(-1 == PyDict_Merge(x, y, override)) {
            if(!PyErr_ExceptionMatches(PyExc_TypeError))
                return NULL;
            PyErr_Clear( );
            if(-1 == PyDict_MergeFromSeq2(x, y, override))
                return NULL;
        }
        return Py_BuildValue("");
    }

    static char merge_docs[] = "\
    merge(x,y,override=False): merge into dict x the items of dict y (or the pairs\n\
        that are the items of y, if y is a sequence), with optional override.\n\
        Alters dict x directly, returns None.\n\
    ";

    static PyObject*
    mergenew(PyObject* self, PyObject* args, PyObject* kwds)
    {
        static char* argnames[] = {"x","y","override",NULL};
        PyObject *x, *y, *result;
        int override = 0;
        if(!PyArg_ParseTupleAndKeywords(args, kwds, "O!O|i", argnames,
            &PyDict_Type, &x, &y, &override))
                return NULL;
        result = PyObject_CallMethod(x, "copy", "");
        if(!result)
            return NULL;
        if(-1 == PyDict_Merge(result, y, override)) {
            if(!PyErr_ExceptionMatches(PyExc_TypeError)) {
                Py_DECREF(result);   /* release the copy on failure */
                return NULL;
            }
            PyErr_Clear( );
            if(-1 == PyDict_MergeFromSeq2(result, y, override)) {
                Py_DECREF(result);   /* release the copy on failure */
                return NULL;
            }
        }
        return result;
    }

    static char mergenew_docs[] = "\
    mergenew(x,y,override=False): merge into dict x the items of dict y (or\n\
        the pairs that are the items of y, if y is a sequence), with optional\n\
        override.  Does NOT alter x, but rather returns the modified copy as\n\
        the function's result.\n\
    ";

    static PyMethodDef funcs[] = {
        {"merge", (PyCFunction)merge, METH_KEYWORDS, merge_docs},
        {"mergenew", (PyCFunction)mergenew, METH_KEYWORDS, mergenew_docs},
        {NULL}
    };

    void
    initmerge(void)
    {
        Py_InitModule3("merge", funcs, "Example extension module");
    }

This example declares as static every function and global variable in the C source file, except initmerge, which must be visible from the outside so Python can call it. Since the functions and variables are exposed to Python via PyMethodDef structures, Python does not need to see their names directly. Therefore, declaring them static is best: this ensures that names don't accidentally end up in the whole program's global namespace, as might otherwise happen on some platforms, possibly causing conflicts and errors.

The format string "O!O|i" passed to PyArg_ParseTupleAndKeywords indicates that function merge accepts three arguments from Python: an object with a type constraint, a generic object, and an optional integer. At the same time, the format string indicates that the variable part of PyArg_ParseTupleAndKeywords's arguments must contain four addresses in the following order: the address of a Python type object, two addresses of PyObject* variables, and the address of an int variable. The int variable must be previously initialized to its intended default value, since the corresponding Python argument is optional.

And indeed, after the argnames argument, the code passes &PyDict_Type (i.e., the address of the dictionary type object). Then it passes the addresses of the two PyObject* variables. Finally, it passes the address of variable override, an int that was previously initialized to 0, since the default, when the override argument isn't explicitly passed from Python, is "no overriding." If the return value of PyArg_ParseTupleAndKeywords is 0, the code immediately returns NULL to propagate the exception; this automatically diagnoses most cases where Python code passes wrong arguments to our new function merge.

When the arguments appear to be okay, it tries PyDict_Merge, which succeeds if y is a dictionary. When PyDict_Merge raises a TypeError, indicating that y is not a dictionary, the code clears the error and tries again, this time with PyDict_MergeFromSeq2, which succeeds when y is a sequence of pairs. If that also fails, it returns NULL to propagate the exception. Otherwise, it returns None in the simplest way (i.e., with return Py_BuildValue("")) to indicate success.

Function mergenew basically duplicates merge's functionality; however, mergenew does not alter its arguments, but rather builds and returns a new dictionary as the function's result. The C API function PyObject_CallMethod lets mergenew call the copy method of its first Python-passed argument, a dictionary object, and obtain a new dictionary object that it then alters (with exactly the same logic as function merge). It then returns the altered dictionary as the function result (thus, no need to call Py_BuildValue in this case).

The code of Example 25-1 must reside in a source file named merge.c. In the same directory, create the following script named setup.py:

    from distutils.core import setup, Extension
    setup(name='merge',
          ext_modules=[ Extension('merge',sources=['merge.c']) ])

Now, run python setup.py install at a shell prompt in this directory (with a user ID having appropriate privileges to write into your Python installation, or sudo on Unix-like systems if necessary). This command builds the dynamically loaded library for the merge extension module, and copies it to the appropriate directory, depending on your Python installation. Now your Python code can use the module. For example:

    import merge
    x = {'a':1,'b':2 }
    merge.merge(x,[['b',3],['c',4]])
    print x                               # prints: {'a':1, 'b':2, 'c':4 }
    print merge.mergenew(x,{'a':5,'d':6},override=1)
                                          # prints: {'a':5, 'b':2, 'c':4, 'd':6 }
    print x                               # prints: {'a':1, 'b':2, 'c':4 }

This example shows the difference between merge (which alters its first argument) and mergenew (which returns a new object and does not alter its argument). It also shows that the second argument can be either a dictionary or a sequence of two-item subsequences. Further, it demonstrates default operation (where keys that are already in the first argument are left alone) as well as the override option (where keys coming from the second argument take precedence, as in Python dictionaries' update method).
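
For comparison, the behavior just described can be modeled in a few lines of pure Python. This is a sketch of the intended semantics, not a translation of the C code: it assumes a Python 3 interpreter, uses the presence of a `keys` attribute as its (assumed) test for "y is a mapping," and defines plain functions rather than a module named merge.

```python
def merge(x, y, override=False):
    """Merge into dict x the items of mapping y, or the pairs in sequence y."""
    if not isinstance(x, dict):
        raise TypeError('x must be a dict')    # models the "O!" constraint
    if hasattr(y, 'keys'):
        pairs = [(key, y[key]) for key in y.keys()]
    else:
        pairs = [tuple(pair) for pair in y]
    for key, value in pairs:                   # each pair must have two items
        if override or key not in x:
            x[key] = value


def mergenew(x, y, override=False):
    """Like merge, but leave x alone and return the merged copy."""
    result = x.copy()
    merge(result, y, override)
    return result


x = {'a': 1, 'b': 2}
merge(x, [['b', 3], ['c', 4]])
print(sorted(x.items()))
# prints: [('a', 1), ('b', 2), ('c', 4)]
print(sorted(mergenew(x, {'a': 5, 'd': 6}, override=True).items()))
# prints: [('a', 5), ('b', 2), ('c', 4), ('d', 6)]
print(sorted(x.items()))
# prints: [('a', 1), ('b', 2), ('c', 4)]
```

The C version earns its keep when the dictionaries are large or the calls are frequent; the Python model is mainly useful as an executable specification against which to test the extension.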

25.1.12. Defining New Types

In your extension modules, you often want to define new types and make them available to Python. A type's definition is held in a large struct named PyTypeObject. Most of the fields of PyTypeObject are pointers to functions. Some fields point to other structs, which in turn are blocks of pointers to functions. PyTypeObject also includes a few fields that give the type's name, size, and behavior details (option flags). You can leave almost all fields of PyTypeObject set to NULL if you do not supply the related functionality. You can point some fields to functions in the Python C API in order to supply certain aspects of fundamental object functionality in standard ways.

The best way to implement a type is to copy from the Python sources the file Modules/xxsubtype.c, which Python supplies exactly for such didactical purposes, and edit it. It's a complete module with two types, subclassing from list and dict, respectively. Another example in the Python sources, Objects/xxobject.c, is not a complete module, and the type in this file is minimal and old-fashioned, and does not use modern recommended approaches. See http://www.python.org/dev/doc/devel/api/type-structs.html for detailed documentation on PyTypeObject and other related structs. File Include/object.h in the Python sources contains the declarations of these types, as well as several important comments that you would do well to study.

25.1.12.1. Per-instance data

To represent each instance of your type, declare a C struct that starts, right after the opening brace, with macro PyObject_HEAD. The macro expands into the data fields that your struct must begin with in order to be a Python object. These fields include the reference count and a pointer to the instance's type. Any pointer to your structure can be correctly cast to a PyObject*. You can choose to look at this practice as a kind of C-level implementation of a (single) inheritance mechanism.
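
The layout rule can be explored from Python with the standard `ctypes` module. The following sketch does not use Python's real object header: `Head` is a hypothetical stand-in for the fields that PyObject_HEAD contributes, and `Point` is a hypothetical per-instance struct. It only shows why a pointer to a struct that begins with the shared head can be treated as a pointer to the head itself (assuming a Python 3 interpreter).

```python
import ctypes


class Head(ctypes.Structure):
    """Stand-in for the fields PyObject_HEAD contributes (hypothetical layout)."""
    _fields_ = [('refcount', ctypes.c_ssize_t),
                ('type_id', ctypes.c_void_p)]


class Point(ctypes.Structure):
    """Per-instance struct: the shared head first, then type-specific data."""
    _fields_ = [('head', Head),
                ('x', ctypes.c_double),
                ('y', ctypes.c_double)]


p = Point(Head(1, 0xBEEF), 3.0, 4.0)

# Because the head sits at offset 0, a pointer to Point can be cast to a
# pointer to Head, and both views share the same memory.
as_head = ctypes.cast(ctypes.pointer(p), ctypes.POINTER(Head)).contents
print(Point.head.offset, as_head.refcount, hex(as_head.type_id))
# prints: 0 1 0xbeef

as_head.refcount += 1
print(p.head.refcount)    # prints: 2
```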

The PyTypeObject struct that defines your type's characteristics and behavior must contain the size of your per-instance struct, as well as pointers to the C functions you write to operate on your structure. Therefore, you normally place the PyTypeObject toward the end of your C-coded module's source code, after the definitions of the per-instance struct, and of all the functions that operate on instances of the per-instance struct. Each x that points to a structure starting with PyObject_HEAD, and in particular each PyObject* x, has a field x->ob_type that is the address of the PyTypeObject structure that is x's Python type object.

25.1.12.2. The PyTypeObject definition

Given a per-instance struct such as:

    typedef struct {
        PyObject_HEAD
        /* other data needed by instances of this type, omitted */
    } mytype;

the corresponding PyTypeObject struct almost invariably begins in a way similar to:

    static PyTypeObject t_mytype = {
    /* tp_head */        PyObject_HEAD_INIT(NULL)   /* use NULL, for MSVC++ */
    /* tp_internal */    0,                 /* must be 0 */
    /* tp_name */        "mymodule.mytype", /* type name, including module */
    /* tp_basicsize */   sizeof(mytype),
    /* tp_itemsize */    0,                 /* 0 except variable-size type */
    /* tp_dealloc */     (destructor)mytype_dealloc,
    /* tp_print */       0,                 /* usually 0, use str instead */
    /* tp_getattr */     0,                 /* usually 0 (see getattro) */
    /* tp_setattr */     0,                 /* usually 0 (see setattro) */
    /* tp_compare */     0,                 /* see also richcompare */
    /* tp_repr */        (reprfunc)mytype_str,   /* like Python's __repr__ */
        /* rest of struct omitted */

For portability to Microsoft Visual C++, the PyObject_HEAD_INIT macro at the start of the PyTypeObject must have an argument of NULL. During module initialization, you must call PyType_Ready(&t_mytype), which, among other tasks, inserts in t_mytype the address of its type (the type of a type is also known as a metatype), normally &PyType_Type. Another slot in PyTypeObject that points to another type object is tp_base, which comes later in the structure. In the structure definition itself, you must have a tp_base of NULL, again for compatibility with Microsoft Visual C++. However, before you invoke PyType_Ready(&t_mytype), you can optionally set t_mytype.tp_base to the address of another type object. When you do so, your type inherits from the other type, just as a class coded in Python can optionally inherit from a built-in type. For a Python type coded in C, inheriting means that, for most fields in the PyTypeObject, if you set the field to NULL, PyType_Ready copies the corresponding field from the base type. A type must specifically assert in its field tp_flags that it is usable as a base type; otherwise, no other type can inherit from it.

The tp_itemsize field is of interest only for types that, like tuples, have instances of different sizes, and can determine instance size once and forever at creation time. Most types just set tp_itemsize to 0. Fields tp_getattr and tp_setattr are generally set to NULL because they exist only for backward compatibility; modern types use fields tp_getattro and tp_setattro instead. Field tp_repr is typical of most of the following fields, which are omitted here: the field holds the address of a function, which corresponds directly to a Python special method (here, __repr__). You can set the field to NULL, indicating that your type does not supply the special method, or else set the field to point to a function with the needed functionality. If you set the field to NULL, but also point to a base type from the tp_base slot, you inherit the special method, if any, from your base type. You often need to cast your functions to the specific typedef type that a field needs (here, type reprfunc for field tp_repr) because the typedef has a first argument PyObject* self, while your functions, being specific to your type, normally use more specific pointers. For example:

 static PyObject* mytype_str(mytype* self) {.../* rest omitted */

Alternatively, you can declare mytype_str with a PyObject* self, then use a cast (mytype*)self in the function's body. Either alternative is acceptable style, but it's more common to locate the casts in the PyTypeObject declaration.

25.1.12.3. Instance initialization and finalization

The task of finalizing your instances is split among two functions. The tp_dealloc slot must never be NULL, except for immortal types (i.e., types whose instances are never deallocated). Python calls x->ob_type->tp_dealloc(x) on each instance x whose reference count decreases to 0, and the function thus called must release any resource held by object x, including x's memory. When an instance of mytype holds no other resources that must be released (in particular, no owned references to other Python objects that you would have to DECREF), mytype's destructor can be extremely simple:

    static void mytype_dealloc(PyObject *x)
    {
        x->ob_type->tp_free((PyObject*)x);
    }

The function in the tp_free slot has the specific task of freeing x's memory. Often, you can just put in slot tp_free the address of the C API function _PyObject_Del.
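
The moment at which finalization happens can be observed from Python. As a loose analogy only (a `__del__` method is not the same thing as a tp_dealloc function, and the immediate timing shown here is specific to the reference counting that CPython uses), this sketch with a hypothetical `Resource` class shows an instance being finalized exactly when its last reference goes away. It assumes a Python 3 CPython interpreter.

```python
class Resource(object):
    """Hypothetical class that records when its instances are finalized."""
    released = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs when the instance's reference count drops to 0 (in CPython).
        Resource.released.append(self.name)


r = Resource('first')
alias = r                     # a second reference to the same instance
del r
print(Resource.released)      # prints: []  (alias still keeps it alive)
del alias
print(Resource.released)      # prints: ['first']
```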

The task of initializing your instances is split among three functions. To allocate memory for new instances of your type, put in slot tp_alloc the C API function PyType_GenericAlloc, which does absolutely minimal initialization, clearing the newly allocated memory bytes to 0 except for the type pointer and reference count. Similarly, you can often set field tp_new to the C API function PyType_GenericNew. In this case, you can perform all per-instance initialization in the function you put in slot tp_init, which has the signature:

 int init_name(PyObject *self,PyObject *args,PyObject *kwds)

The positional and named arguments to the function in slot tp_init are those passed when calling the type to create the new instance, just like, in Python, the positional and named arguments to __init__ are those passed when calling the class object. Again, as for types (classes) defined in Python, the general rule is to do as little initialization as feasible in tp_new and do as much as possible in tp_init. Using PyType_GenericNew for tp_new accomplishes this. However, you can choose to define your own tp_new for special types, such as ones that have immutable instances, where initialization must happen earlier. The signature is:

 PyObject* new_name(PyObject *subtype ,PyObject *args,PyObject *kwds)

The function in tp_new must return the newly created instance, normally an instance of subtype (which may be a type that inherits from yours). The function in tp_init, on the other hand, must return 0 for success, or -1 to indicate an exception.

If your type is subclassable, it's important that any instance invariants be established before the function in tp_new returns. For example, if it must be guaranteed that a certain field of the instance is never NULL, that field must be set to a non-NULL value by the function in tp_new. Subtypes of your type might fail to call your tp_init function; therefore, such indispensable initializations, needed to establish invariants, should always be in tp_new for subclassable types.
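
The same division of labor applies to classes coded in Python, where `__new__` plays the role of tp_new and `__init__` the role of tp_init. In this sketch (hypothetical class names, assuming a Python 3 interpreter), the invariant "attribute `tags` is always a list" is established in `__new__`, so it holds even for a subclass that never calls the base class's `__init__`:

```python
class Tagged(object):
    def __new__(cls, *args, **kwds):
        self = super(Tagged, cls).__new__(cls)
        self.tags = []            # invariant: tags always exists and is a list
        return self

    def __init__(self, *tags):
        self.tags.extend(tags)    # optional, per-call initialization


class Careless(Tagged):
    def __init__(self, *tags):
        pass                      # fails to call Tagged.__init__


print(Tagged('a', 'b').tags)      # prints: ['a', 'b']
print(Careless('a').tags)         # prints: []  (the invariant still holds)
```

Had `Tagged` created the `tags` attribute in `__init__` instead, accessing `Careless('a').tags` would raise `AttributeError`; the C-level analog is a field left `NULL` that later code dereferences.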

25.1.12.4. Attribute access

Access to attributes of your instances, including methods (as covered in "Attribute Reference Basics" on page 89), is mediated by the functions you put in slots tp_getattro and tp_setattro of your PyTypeObject struct. Normally, you put there the standard C API functions PyObject_GenericGetAttr and PyObject_GenericSetAttr, which implement standard semantics. Specifically, these API functions access your type's methods via the slot tp_methods, pointing to a sentinel-terminated array of PyMethodDef structs, and your instances' members via the slot tp_members, which is a similar sentinel-terminated array of PyMemberDef structs:

 typedef struct {
     char* name;        /* Python-visible name of the member */
     int type;          /* code defining the data-type of the member */
     int offset;        /* offset of the member in the per-instance struct */
     int flags;         /* READONLY for a read-only member */
     char* doc;         /* docstring for the member */
 } PyMemberDef;

As an exception to the general rule that including Python.h gets you all the declarations you need, you have to include structmember.h explicitly in order to have your C source see the declaration of PyMemberDef.

The type field is generally T_OBJECT for members that are PyObject*, but many other type codes are defined in Include/structmember.h for members that your instances hold as C-native data (e.g., T_DOUBLE for double or T_STRING for char*). For example, say that your per-instance struct is something like:

 typedef struct {
     PyObject_HEAD
     double datum;
     char* name;
 } mytype;

Expose to Python per-instance attributes datum (read/write) and name (read-only) by defining the following array and pointing your PyTypeObject's tp_members to it:

 static PyMemberDef mytype_members[] = {
     {"datum", T_DOUBLE, offsetof(mytype, datum), 0, "The current datum"},
     {"name", T_STRING, offsetof(mytype, name), READONLY, "Datum name"},
     {NULL}   /* sentinel */
 };
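From the Python side, members declared this way show up as descriptors on the type. The sketch below is an observation about CPython rather than part of the C API. It assumes that the built-in slice type declares start as a read-only member and that __slots__ entries are exposed through the same kind of descriptor. Sample is a hypothetical class used only for illustration.

```python
class Sample(object):
    __slots__ = ("datum",)      # hypothetical class, for illustration only


s = Sample()
s.datum = 2.5                   # read/write, comparable to flags == 0

# Name of the descriptor type that exposes each member.
slot_kind = type(Sample.datum).__name__
builtin_kind = type(slice.start).__name__

sl = slice(1, 10)
try:
    sl.start = 5                # comparable to a READONLY member
    readonly_error = None
except AttributeError as exc:
    readonly_error = exc
```

Reading sl.start keeps working after the failed assignment; only rebinding is rejected.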

Using PyObject_GenericGetAttr and PyObject_GenericSetAttr for tp_getattro and tp_setattro also provides further possibilities, which I do not cover in detail in this book. Field tp_getset points to a sentinel-terminated array of PyGetSetDef structs, the equivalent of having property instances in a Python-coded class. If your PyTypeObject's field tp_dictoffset is not equal to 0, the field's value must be the offset, within the per-instance struct, of a PyObject* that points to a Python dictionary. In this case, the generic attribute access API functions use that dictionary to allow Python code to set arbitrary attributes on your type's instances, just like for instances of Python-coded classes.
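The effect of a nonzero tp_dictoffset can be seen from Python without writing any C. In this sketch, which assumes CPython, function objects serve as an example of a C-coded type with a per-instance dictionary, while plain object instances have none. The __dictoffset__ attribute of a type mirrors the tp_dictoffset field, and probe is a hypothetical function name.

```python
import types


def probe():
    pass


# Function objects are implemented in C, yet accept arbitrary attributes.
probe.note = "stored in the per-instance dictionary"

plain = object()
try:
    plain.note = "object instances have no per-instance dictionary"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

function_offset = types.FunctionType.__dictoffset__   # nonzero
object_offset = object.__dictoffset__                 # zero
```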

Another dictionary is per-type, not per-instance; the PyObject* for the per-type dictionary is slot tp_dict of your PyTypeObject struct. You can set slot tp_dict to NULL, and then PyType_Ready initializes the dictionary appropriately. Alternatively, you can set tp_dict to a dictionary of type attributes, and then PyType_Ready adds other entries to that same dictionary, in addition to the type attributes you set. It's generally easier to start with tp_dict set to NULL, call PyType_Ready to create and initialize the per-type dictionary, and then, if need be, add any further entries to the dictionary via explicit C code.
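The per-type dictionary is visible from Python through vars (or the type's __dict__ attribute), which gives a read-only view. The sketch below assumes CPython's names for its descriptor types. It inspects two entries that the built-in dict type ends up with once the type has been readied: one that comes from its method table and one that wraps a C-level slot.

```python
type_dict = vars(dict)          # read-only view of the per-type dictionary

kinds = {
    # Entry created from the type's method table.
    "keys": type(type_dict["keys"]).__name__,
    # Entry created to wrap a C-level slot function.
    "__len__": type(type_dict["__len__"]).__name__,
}

try:
    type_dict["extra"] = 1      # the view itself cannot be modified
    view_is_writable = True
except TypeError:
    view_is_writable = False
```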

Field tp_flags is a long whose bits determine your type struct's exact layout, mostly for backward compatibility. Normally, set this field to Py_TPFLAGS_DEFAULT to indicate that you are defining a normal, modern type. You should set tp_flags to Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_GC if your type supports cyclic garbage collection. Your type should support cyclic garbage collection if instances of the type contain PyObject* fields that might point to arbitrary objects and form part of a reference loop. However, to support cyclic garbage collection, it's not enough to add Py_TPFLAGS_HAVE_GC to field tp_flags; you also have to supply appropriate functions, indicated by slots tp_traverse and tp_clear, and register and unregister your instances appropriately with the cyclic garbage collector. Supporting cyclic garbage collection is an advanced subject, and I do not cover it further in this book. Similarly, I do not cover the advanced subject of supporting weak references.
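The flag bits of existing types can be inspected from Python through the __flags__ attribute of a type. The sketch below assumes that Py_TPFLAGS_HAVE_GC is bit 14, as defined in CPython's Include/object.h; treat that constant as an implementation detail. It compares a container type, which takes part in cyclic garbage collection, with a type whose instances cannot refer to other objects.

```python
import gc

HAVE_GC = 1 << 14   # assumed value of Py_TPFLAGS_HAVE_GC in CPython

list_has_gc = bool(list.__flags__ & HAVE_GC)   # lists can form reference loops
int_has_gc = bool(int.__flags__ & HAVE_GC)     # ints hold no object references

# The collector tracks instances only of types that support it.
tracked_list = gc.is_tracked([])
tracked_int = gc.is_tracked(7)
```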

Field tp_doc, a char*, is a null-terminated character string that is your type's docstring. Other fields point to structs (whose fields point to functions); you can set each such field to NULL to indicate that you support none of the functions of that kind. The fields pointing to such blocks of functions are tp_as_number, for special methods typically supplied by numbers; tp_as_sequence, for special methods typically supplied by sequences; tp_as_mapping, for special methods typically supplied by mappings; and tp_as_buffer, for the special methods of the buffer protocol.

Objects that are not sequences can still support one or a few of the methods listed in the block to which tp_as_sequence points; in this case, the PyTypeObject must have a non-NULL field tp_as_sequence, even if the block of function pointers it points to is in turn mostly full of NULLs. For example, dictionaries supply a __contains__ special method so that you can check if x in d when d is a dictionary. At the C code level, the method is a function pointed to by field sq_contains, which is part of the PySequenceMethods struct to which field tp_as_sequence points. Therefore, the PyTypeObject struct for the dict type, named PyDict_Type, has a non-NULL value for tp_as_sequence, even though a dictionary supplies no other field in PySequenceMethods except sq_contains, and therefore all other fields in *(PyDict_Type.tp_as_sequence) are NULL.
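The Python-level counterpart of filling in only sq_contains is a class that defines __contains__ and nothing else from the sequence protocol. The following sketch uses a hypothetical EvenNumbers class to show that membership tests work while other sequence operations remain unsupported.

```python
class EvenNumbers(object):
    """Hypothetical container: supports 'in' but is not a sequence."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


evens = EvenNumbers()
has_four = 4 in evens
has_five = 5 in evens

try:
    len(evens)                  # no __len__, so this is rejected
    sized = True
except TypeError:
    sized = False

d = {"a": 1}
key_present = "a" in d          # dict membership test on keys
```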

25.1.12.5. Type definition example

Example 25-2 is a complete Python extension module that defines the very simple type intpair, each instance of which holds two integers named first and second.

Example 25-2. Defining a new intpair type

 #include "Python.h"
 #include "structmember.h"

 /* per-instance data structure */
 typedef struct {
     PyObject_HEAD
     int first, second;
 } intpair;

 static int intpair_init(PyObject *self, PyObject *args, PyObject *kwds)
 {
     static char* nams[] = {"first","second",NULL};
     int first, second;
     if(!PyArg_ParseTupleAndKeywords(args, kwds, "ii", nams, &first, &second))
         return -1;
     ((intpair*)self)->first = first;
     ((intpair*)self)->second = second;
     return 0;
 }

 static void intpair_dealloc(PyObject *self)
 {
     self->ob_type->tp_free(self);
 }

 static PyObject* intpair_str(PyObject* self)
 {
     return PyString_FromFormat("intpair(%d,%d)",
         ((intpair*)self)->first, ((intpair*)self)->second);
 }

 static PyMemberDef intpair_members[] = {
     {"first", T_INT, offsetof(intpair, first), 0, "first item" },
     {"second", T_INT, offsetof(intpair, second), 0, "second item" },
     {NULL}
 };

 static PyTypeObject t_intpair = {
     PyObject_HEAD_INIT(0)               /* tp_head */
     0,                                  /* tp_internal */
     "intpair.intpair",                  /* tp_name */
     sizeof(intpair),                    /* tp_basicsize */
     0,                                  /* tp_itemsize */
     intpair_dealloc,                    /* tp_dealloc */
     0,                                  /* tp_print */
     0,                                  /* tp_getattr */
     0,                                  /* tp_setattr */
     0,                                  /* tp_compare */
     intpair_str,                        /* tp_repr */
     0,                                  /* tp_as_number */
     0,                                  /* tp_as_sequence */
     0,                                  /* tp_as_mapping */
     0,                                  /* tp_hash */
     0,                                  /* tp_call */
     0,                                  /* tp_str */
     PyObject_GenericGetAttr,            /* tp_getattro */
     PyObject_GenericSetAttr,            /* tp_setattro */
     0,                                  /* tp_as_buffer */
     Py_TPFLAGS_DEFAULT,                 /* tp_flags */
     "two ints (first,second)",          /* tp_doc */
     0,                                  /* tp_traverse */
     0,                                  /* tp_clear */
     0,                                  /* tp_richcompare */
     0,                                  /* tp_weaklistoffset */
     0,                                  /* tp_iter */
     0,                                  /* tp_iternext */
     0,                                  /* tp_methods */
     intpair_members,                    /* tp_members */
     0,                                  /* tp_getset */
     0,                                  /* tp_base */
     0,                                  /* tp_dict */
     0,                                  /* tp_descr_get */
     0,                                  /* tp_descr_set */
     0,                                  /* tp_dictoffset */
     intpair_init,                       /* tp_init */
     PyType_GenericAlloc,                /* tp_alloc */
     PyType_GenericNew,                  /* tp_new */
     _PyObject_Del,                      /* tp_free */
 };

 void initintpair(void)
 {
     static PyMethodDef no_methods[] = { {NULL} };
     PyObject* this_module = Py_InitModule("intpair", no_methods);
     PyType_Ready(&t_intpair);
     PyObject_SetAttrString(this_module, "intpair", (PyObject*)&t_intpair);
 }

The intpair type defined in Example 25-2 offers almost no substantial benefit compared to an equivalent definition in Python, such as:

 class intpair(object):
     __slots__ = 'first', 'second'
     def __init__(self, first, second):
         self.first = first
         self.second = second
     def __repr__(self):
         return 'intpair(%s,%s)' % (self.first, self.second)

The C-coded version does, however, ensure that the two attributes are integers, truncating float arguments as needed. For example:

 import intpair
 x = intpair.intpair(1.2, 3.4)               # x is: intpair(1,3)

Each instance of the C-coded version of intpair occupies somewhat less memory than an instance of the Python version in the above example. However, the purpose of Example 25-2 is purely didactic: to present a C-coded Python extension that defines a simple new type.