23.3. Basic Embedding Techniques

As you can probably tell from the preceding overview, there is much flexibility in the embedding domain. To illustrate common embedding techniques in action, this section presents a handful of short C programs that run Python code in one form or another. Most of these examples make use of the simple Python module file shown in Example 23-1.

Example 23-1. PP3E\Integrate\Embed\Basics\usermod.py

    #########################################################
    # C runs Python code in this module in embedded mode.
    # Such a file can be changed without changing the C layer.
    # There is just standard Python code (C does conversions).
    # You can also run code in standard modules like string.
    #########################################################

    message = 'The meaning of life...'

    def transform(input):
        input = input.replace('life', 'Python')
        return input.upper()

If you know any Python at all, you probably know that this file defines a string and a function; the function returns whatever it is passed with string substitution and uppercase conversions applied. It's easy to use from Python:

    .../PP3E/Integrate/Embed/Basics$ python
    >>> import usermod                          # import a module
    >>> usermod.message                         # fetch a string
    'The meaning of life...'
    >>> usermod.transform(usermod.message)      # call a function
    'THE MEANING OF PYTHON...'

With a little Python API wizardry, it's not much more difficult to use this module the same way in C.

23.3.1. Running Simple Code Strings

Perhaps the simplest way to run Python code from C is by calling the PyRun_SimpleString API function. With it, C programs can execute Python programs represented as C character string arrays. This call is also very limited: all code runs in the same namespace (the module __main__), the code strings must be Python statements (not expressions), and there is no direct way to communicate inputs or outputs with the Python code run.

Still, it's a simple place to start. Moreover, when augmented with an imported C extension module that the embedded Python code can use to communicate with the enclosing C layer, this technique can satisfy many embedding goals. To demonstrate the basics, the C program in Example 23-2 runs Python code to accomplish the same results as the interactive session listed in the prior section.

Example 23-2. PP3E\Integrate\Embed\Basics\embed-simple.c

    /*******************************************************
     * simple code strings: C acts like the interactive
     * prompt, code runs in __main__, no output sent to C;
     *******************************************************/

    #include <Python.h>    /* standard API def */

    main()
    {
        printf("embed-simple\n");
        Py_Initialize();
        PyRun_SimpleString("import usermod");                /* load .py file */
        PyRun_SimpleString("print usermod.message");         /* on Python path */
        PyRun_SimpleString("x = usermod.message");           /* compile and run */
        PyRun_SimpleString("print usermod.transform(x)");
    }

The first thing you should notice here is that when Python is embedded, C programs always call Py_Initialize to initialize linked-in Python libraries before using any other API functions. The rest of this code is straightforward: C submits hardcoded strings to Python that are roughly what we typed interactively. Internally, PyRun_SimpleString invokes the Python compiler and interpreter to run the strings sent from C; as usual, the Python compiler is always available in systems that contain Python.

23.3.1.1. Compiling and running

To build a standalone executable from this C source file, you need to link its compiled form with the Python library file. In this chapter, "library" usually means the binary library file that is generated when Python is compiled, not the Python source code library.

Today, everything in Python that you need in C is compiled into a single Python library file when the interpreter is built (e.g., libpython2.4.dll on Cygwin). The program's main function comes from your C code, and depending on your platform and the extensions installed in your Python, you may also need to link any external libraries referenced by the Python library.

Assuming no extra extension libraries are needed, Example 23-3 is a minimal makefile for building the C program in Example 23-2 under Cygwin on Windows. Again, makefile details vary per platform, but see Python manuals for hints. This makefile uses the Python include-files path to find Python.h in the compile step and adds the Python library file to the final link step to make API calls available to the C program.

Example 23-3. PP3E\Integrate\Embed\Basics\makefile.1

    # a Cygwin makefile that builds a C executable that embeds
    # Python, assuming no external module libs must be linked in;
    # uses Python header files, links in the Python lib file;
    # both may be in other dirs (e.g., /usr) in your install;

    PYLIB = /usr/bin
    PYINC = /usr/include/python2.4

    embed-simple: embed-simple.o
            gcc embed-simple.o -L$(PYLIB) -lpython2.4 -g -o embed-simple

    embed-simple.o: embed-simple.c
            gcc embed-simple.c -c -g -I$(PYINC)

To build a program with this file, launch make on it as usual:

    .../PP3E/Integrate/Embed/Basics$ make -f makefile.1
    gcc embed-simple.c -c -g -I/usr/include/python2.4
    gcc embed-simple.o -L/usr/bin -lpython2.4 -g -o embed-simple

Things may not be quite this simple in practice, though, at least not without some coaxing. The makefile in Example 23-4 is the one I actually used to build all of this section's C programs on Cygwin.

Example 23-4. PP3E\Integrate\Embed\Basics\makefile.basics

    # cygwin makefile to build all 5
    # basic embedding examples at once

    PYLIB = /usr/bin
    PYINC = /usr/include/python2.4

    BASICS = embed-simple.exe   \
             embed-string.exe   \
             embed-object.exe   \
             embed-dict.exe     \
             embed-bytecode.exe

    all:    $(BASICS)

    embed%.exe: embed%.o
            gcc embed$*.o -L$(PYLIB) -lpython2.4 -g -o $@

    embed%.o: embed%.c
            gcc embed$*.c -c -g -I$(PYINC)

    clean:
            rm -f *.o *.pyc $(BASICS) core

On some platforms, you may need to also link in other libraries because the Python library file used may have been built with external dependencies enabled and required. In fact, you may have to link in arbitrarily many more externals for your Python library, and frankly, chasing down all the linker dependencies can be tedious. Required libraries may vary per platform and Python install, so there isn't a lot of advice I can offer to make this process simple (this is C, after all). The standard C development techniques will apply.

One thing to note is that on some platforms, if you're going to do much embedding work and you run into external dependency issues, you might want to build Python on your machine from its source with all unnecessary extensions disabled in the Modules/Setup file (or the top-level setup.py Distutils script in more recent releases). This produces a Python library with minimal external requirements, which links much more easily.

For example, if your embedded code won't be building GUIs, Tkinter can simply be removed from the library; see the README file at the top of Python's source distribution for details. You can also find a list of external libraries referenced from your Python in the generated makefiles located in the Python source tree. In any event, the good news is that you need to resolve linker dependencies only once.
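As an alternative to reading generated makefiles by hand, the interpreter itself can report much of this build information. The following is a minimal sketch in Python 3 (an assumption; the book's examples use Python 2.4) based on the standard sysconfig module. The configuration variable names LIBDIR, LDLIBRARY, and LIBS come from Python's own build makefile and may be missing (None) on some platforms, notably Windows.

```python
import sysconfig

# Directory that holds Python.h, for the compiler's -I flag.
include_dir = sysconfig.get_paths()["include"]

# Values recorded when this Python was built; any of them may be None.
lib_dir = sysconfig.get_config_var("LIBDIR")       # for the linker's -L flag
lib_name = sysconfig.get_config_var("LDLIBRARY")   # Python library file name
extra_libs = sysconfig.get_config_var("LIBS")      # external libs to link

print("include dir:", include_dir)
print("library dir:", lib_dir)
print("library    :", lib_name)
print("extra libs :", extra_libs)
```

Treat the output as a starting point for the PYINC and PYLIB makefile settings rather than as a guaranteed recipe; what is reported depends on how your Python was built and installed.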

Once you've gotten the makefile to work, run it to build the C program with Python libraries linked in:

    .../PP3E/Integrate/Embed/Basics$ make -f makefile.basics clean
    rm -f *.o *.pyc embed-simple.exe embed-string.exe embed-object.exe embed-dict.exe embed-bytecode.exe core
    .../PP3E/Integrate/Embed/Basics$ make -f makefile.basics
    gcc embed-simple.c -c -g -I/usr/include/python2.4
    gcc embed-simple.o -L/usr/bin -lpython2.4 -g -o embed-simple.exe
    ...lines deleted...
    gcc embed-bytecode.c -c -g -I/usr/include/python2.4
    gcc embed-bytecode.o -L/usr/bin -lpython2.4 -g -o embed-bytecode.exe
    rm embed-dict.o embed-object.o embed-simple.o embed-bytecode.o embed-string.o

After building, run the resulting C program as usual, however that is done on your platform:^[*]

^[*] Under Python 2.4 and Cygwin on Windows, I first had to set PYTHONPATH to include the current directory in order to run the embedding examples, using the shell command export PYTHONPATH=. (a single dot). I also had to use the shell command ./embed-simple to execute the program due to my system path setting. Your mileage may vary; if you have trouble, try running the embedded Python commands import sys and print sys.path from C to see what Python's path looks like, and take a look at the Python/C API manual for more on path configuration for embedded applications.

    .../PP3E/Integrate/Embed/Basics$ embed-simple
    embed-simple
    The meaning of life...
    THE MEANING OF PYTHON...

Most of this output is produced by Python print statements sent from C to the linked-in Python library. It's as if C has become an interactive Python programmer.

Naturally, strings of Python code run by C probably would not be hardcoded in a C program file like this. They might instead be loaded from a text file or GUI, extracted from HTML or XML files, fetched from a persistent database or socket, and so on. With such external sources, the Python code strings that are run from C could be changed arbitrarily without having to recompile the C program that runs them. They may even be changed onsite, and by end users of a system. To make the most of code strings, though, we need to move on to more flexible API tools.

23.3.2. Running Code Strings with Results and Namespaces

Example 23-5 uses the following API calls to run code strings that return expression results back to C:

Py_Initialize: Initializes linked-in Python libraries as before
PyImport_ImportModule: Imports a Python module and returns a pointer to it
PyModule_GetDict: Fetches a module's attribute dictionary object
PyRun_String: Runs a string of code in explicit namespaces
PyObject_SetAttrString: Assigns an object attribute by name string
PyArg_Parse: Converts a Python return value object to C form

The import calls are used to fetch the namespace of the usermod module listed in Example 23-1 earlier so that code strings can be run there directly (and will have access to names defined in that module without qualifications). PyImport_ImportModule is like a Python import statement, but the imported module object is returned to C; it is not assigned to a Python variable name. As a result, it's probably more similar to the Python __import__ built-in function.

The PyRun_String call is the one that actually runs code here, though. It takes a code string, a parser mode flag, and dictionary object pointers to serve as the global and local namespaces for running the code string. The mode flag can be Py_eval_input to run an expression, or Py_file_input to run a statement; when running an expression, the result of evaluating the expression is returned from this call (it comes back as a PyObject* object pointer). The two namespace dictionary pointer arguments allow you to distinguish global and local scopes, but they are typically passed the same dictionary such that code runs in a single namespace.^[*]

^[*] A related function lets you run files of code but is not demonstrated in this chapter: PyObject* PyRun_File(FILE *fp, char *filename, mode, globals, locals). Because you can always load a file's text and run it as a single code string with PyRun_String, the PyRun_File call is not always necessary. In such multiline code strings, the \n character terminates lines and indentation groups blocks as usual.

Example 23-5. PP3E\Integrate\Embed\Basics\embed-string.c

    /* code-strings with results and namespaces */

    #include <Python.h>

    main()
    {
        char *cstr;
        PyObject *pstr, *pmod, *pdict;
        printf("embed-string\n");
        Py_Initialize();

        /* get usermod.message */
        pmod  = PyImport_ImportModule("usermod");
        pdict = PyModule_GetDict(pmod);
        pstr  = PyRun_String("message", Py_eval_input, pdict, pdict);

        /* convert to C */
        PyArg_Parse(pstr, "s", &cstr);
        printf("%s\n", cstr);

        /* assign usermod.X */
        PyObject_SetAttrString(pmod, "X", pstr);

        /* print usermod.transform(X) */
        (void) PyRun_String("print transform(X)", Py_file_input, pdict, pdict);
        Py_DECREF(pmod);
        Py_DECREF(pstr);
    }

When compiled and run, this file produces the same result as its predecessor:

    .../PP3E/Integrate/Embed/Basics$ embed-string
    embed-string
    The meaning of life...
    THE MEANING OF PYTHON...

But very different work goes into producing this output. This time, C fetches, converts, and prints the value of the Python module's message attribute directly by running a string expression. It then assigns a global variable (X) within the module's namespace to serve as input for a Python print statement string.

Because the string execution call in this version lets you specify namespaces, you can better partition the embedded code your system runs: each grouping can have a distinct namespace to avoid overwriting other groups' variables. And because this call returns a result, you can better communicate with the embedded code; expression results are outputs, and assignments to globals in the namespace in which code runs can serve as inputs.

Before we move on, I need to explain two coding issues here. First, this program also decrements the reference count on objects passed to it from Python, using the Py_DECREF call introduced in Chapter 22. These calls are not strictly needed here (the objects' space is reclaimed when the program exits anyhow), but they demonstrate how embedding interfaces must manage reference counts when Python passes their ownership to C. If this were a function called from a larger system, for instance, you would generally want to decrement the count to allow Python to reclaim the objects.

Second, in a realistic program, you should generally test the return values of all the API calls in this program immediately to detect errors (e.g., import failure). Error tests are omitted in this section's examples to keep the code simple, but they will appear in later code listings and should be included in your programs to make them more robust.
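The two ideas behind PyRun_String, explicit namespaces and a mode flag that separates expressions from statements, have close Python-level counterparts in the eval and exec built-ins. The sketch below uses Python 3 syntax (an assumption; the book's listings target Python 2.4), and the dictionary names group_a and group_b are made up for illustration. It mirrors the idea, not the C call's internals.

```python
# Two independent namespaces: each group of code strings gets its own dict,
# so a variable named X in one cannot overwrite X in the other.
group_a = {"X": "The meaning of life..."}   # inputs preset by the host program
group_b = {"X": 42}

# Expression mode (compare Py_eval_input): the value comes back to the caller.
result_a = eval("X.replace('life', 'Python').upper()", group_a, group_a)

# Statement mode (compare Py_file_input): no value comes back; any output
# must be left behind as a name assigned in the namespace.
exec("X = X + 1", group_b, group_b)

print(result_a)        # output of the expression run in group_a
print(group_b["X"])    # output fetched from group_b's namespace
print(group_a["X"])    # untouched by the code run in group_b
```

Passing the same dictionary for both the global and local scope, as the C example does with pdict, makes each code string run in a single flat namespace.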

23.3.3. Calling Python Objects

The last two sections dealt with running strings of code, but it's easy for C programs to deal in terms of Python objects too. Example 23-6 accomplishes the same task as Examples 23-2 and 23-5, but it uses other API tools to interact with objects in the Python module directly:

PyImport_ImportModule: Imports the module from C as before
PyObject_GetAttrString: Fetches an object's attribute value by name
PyEval_CallObject: Calls a Python function (or class, or method)
PyArg_Parse: Converts Python objects to C values
Py_BuildValue: Converts C values to Python objects

We met both of the data conversion functions in Chapter 22. The PyEval_CallObject call in this version of the example is the key call here: it runs the imported function with a tuple of arguments, much like the Python apply built-in function and newer func(*args) call syntax. The Python function's return value comes back to C as a PyObject*, a generic Python object pointer.

Example 23-6. PP3E\Integrate\Embed\Basics\embed-object.c

    /* fetch and call objects in modules */

    #include <Python.h>

    main()
    {
        char *cstr;
        PyObject *pstr, *pmod, *pfunc, *pargs;
        printf("embed-object\n");
        Py_Initialize();

        /* get usermod.message */
        pmod = PyImport_ImportModule("usermod");
        pstr = PyObject_GetAttrString(pmod, "message");

        /* convert string to C */
        PyArg_Parse(pstr, "s", &cstr);
        printf("%s\n", cstr);
        Py_DECREF(pstr);

        /* call usermod.transform(usermod.message) */
        pfunc = PyObject_GetAttrString(pmod, "transform");
        pargs = Py_BuildValue("(s)", cstr);
        pstr  = PyEval_CallObject(pfunc, pargs);
        PyArg_Parse(pstr, "s", &cstr);
        printf("%s\n", cstr);

        /* free owned objects */
        Py_DECREF(pmod);
        Py_DECREF(pstr);
        Py_DECREF(pfunc);        /* not really needed in main() */
        Py_DECREF(pargs);        /* since all memory goes away  */
    }

When compiled and run, the result is the same again:

    .../PP3E/Integrate/Embed/Basics$ embed-object
    embed-object
    The meaning of life...
    THE MEANING OF PYTHON...

But this output is generated by C this time: first, by fetching the Python module's message attribute value, and then by fetching and calling the module's transform function object directly and printing the return value sent back to C. Input to the transform function is a function argument here, not a preset global variable. Notice that message is fetched as a module attribute this time, instead of by running its name as a code string; there is often more than one way to accomplish the same goals with different API calls.

Running functions in modules like this is a simple way to structure embedding; code in the module file can be changed arbitrarily without having to recompile the C program that runs it. It also provides a direct communication model: inputs and outputs to Python code can take the form of function arguments and return values.
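For readers who find the C calls easier to follow next to their Python counterparts, here is a rough step-by-step analogy in Python 3 syntax (an assumption; the book's code targets Python 2.4). The variable names echo those in the C listing, and an in-memory stand-in for usermod.py is registered so that the sketch is self-contained. Each line approximates one API call; it is not how the C API is implemented.

```python
import importlib
import sys
import types

# Stand-in for usermod.py (Example 23-1), built in memory for this sketch.
_standin = types.ModuleType("usermod")
exec(
    "message = 'The meaning of life...'\n"
    "def transform(input):\n"
    "    return input.replace('life', 'Python').upper()\n",
    _standin.__dict__,
)
sys.modules["usermod"] = _standin

pmod = importlib.import_module("usermod")   # like PyImport_ImportModule
pstr = getattr(pmod, "message")             # like PyObject_GetAttrString
print(pstr)

pfunc = getattr(pmod, "transform")          # fetch the function object
pargs = (pstr,)                             # like Py_BuildValue("(s)", ...)
presult = pfunc(*pargs)                     # like PyEval_CallObject
print(presult)
```

The arguments travel as a tuple and the result comes back as an ordinary object, which is the same input and output model the C program uses, minus the explicit data conversions and reference counting.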

23.3.4. Running Strings in Dictionaries

When we used PyRun_String earlier to run expressions with results, code was executed in the namespace of an existing Python module. However, sometimes it's more convenient to create a brand-new namespace for running code strings that is independent of any existing module files. The C file in Example 23-7 shows how; the new namespace is created as a new Python dictionary object, and a handful of new API calls are employed in the process:

PyDict_New: Makes a new empty dictionary object
PyDict_SetItemString: Assigns to a dictionary's key
PyDict_GetItemString: Fetches (indexes) a dictionary value by key
PyRun_String: Runs a code string in namespaces, as before
PyEval_GetBuiltins: Gets the built-in scope's namespace dictionary

The main trick here is the new dictionary. Inputs and outputs for the embedded code strings are mapped to this dictionary by passing it as the code's namespace dictionaries in the PyRun_String call. The net effect is that the C program in Example 23-7 works exactly like this Python code:

    >>> d = {}
    >>> d['Y'] = 2
    >>> exec 'X = 99' in d, d
    >>> exec 'X = X + Y' in d, d
    >>> print d['X']
    101

But here, each Python operation is replaced by a C API call.

Example 23-7. PP3E\Integrate\Embed\Basics\embed-dict.c

    /***************************************************
     * make a new dictionary for code string namespace;
     ***************************************************/

    #include <Python.h>

    main()
    {
        int cval;
        PyObject *pdict, *pval;
        printf("embed-dict\n");
        Py_Initialize();

        /* make a new namespace */
        pdict = PyDict_New();
        PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins());

        PyDict_SetItemString(pdict, "Y", PyInt_FromLong(2));   /* dict['Y'] = 2   */
        PyRun_String("X = 99",  Py_file_input, pdict, pdict);  /* run statements  */
        PyRun_String("X = X+Y", Py_file_input, pdict, pdict);  /* same X and Y    */
        pval = PyDict_GetItemString(pdict, "X");               /* fetch dict['X'] */

        PyArg_Parse(pval, "i", &cval);                         /* convert to C */
        printf("%d\n", cval);                                  /* result=101 */
        Py_DECREF(pdict);
    }

When compiled and run, this C program creates this sort of output:

    .../PP3E/Integrate/Embed/Basics$ embed-dict
    embed-dict
    101

The output is different this time: it reflects the value of the Python variable X assigned by the embedded Python code strings and fetched by C. In general, C can fetch module attributes either by calling PyObject_GetAttrString with the module or by using PyDict_GetItemString to index the module's attribute dictionary (expression strings work too, but they are less direct). Here, there is no module at all, so dictionary indexing is used to access the code's namespace in C.

Besides allowing you to partition code string namespaces independent of any Python module files on the underlying system, this scheme provides a natural communication mechanism. Values that are stored in the new dictionary before code is run serve as inputs, and names assigned by the embedded code can later be fetched out of the dictionary to serve as code outputs. For instance, the variable Y in the second string run refers to a name set to 2 by C; X is assigned by the Python code and fetched later by C code as the printed result.

There is one subtlety: dictionaries that serve as namespaces for running code are generally required to have a __builtins__ link to the built-in scope searched last for name lookups, set with code of this form:

    PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins());

This is esoteric, and it is normally handled by Python internally for modules. For raw dictionaries, though, we are responsible for setting the link manually.
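The built-in link is easy to observe from Python itself. The sketch below repeats the dictionary-as-namespace steps in Python 3 syntax (an assumption; the book's interactive session uses the Python 2 exec statement) and checks for the __builtins__ key before and after running code. At the Python level, exec adds the key automatically when it is missing, which is the step the C program performs by hand.

```python
d = {}                                         # brand-new namespace
d["Y"] = 2                                     # input preset by the host
had_builtins_before = "__builtins__" in d      # nothing links to built-ins yet

exec("X = 99", d, d)                           # run statements in the dict
exec("X = X + Y", d, d)                        # same X and Y as before
result = d["X"]                                # output fetched by the host

has_builtins_after = "__builtins__" in d       # exec inserted the link

print(result)
print(had_builtins_before, has_builtins_after)
```

Seeing the key appear only after the first exec call shows why a raw dictionary created in C needs the explicit PyDict_SetItemString step: nothing has run yet to add the link.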

23.3.5. Precompiling Strings to Bytecode

When you call Python function objects from C, you are actually running the already compiled bytecode associated with the object (e.g., a function body). When running strings, Python must compile the string before running it. Because compilation is a slow process, this can be a substantial overhead if you run a code string more than once. Instead, precompile the string to a bytecode object to be run later, using the API calls illustrated in Example 23-8:^[*]

^[*] In case you've forgotten: bytecode is simply an intermediate representation for already compiled program code in the current standard Python implementation. It's a low-level binary format that can be quickly interpreted by the Python runtime system. Bytecode is usually generated automatically when you import a module, but there may be no notion of an import when running raw strings from C.

Py_CompileString: Compiles a string of code and returns a bytecode object
PyEval_EvalCode: Runs a compiled bytecode object

The first of these takes the mode flag that is normally passed to PyRun_String, as well as a second string argument that is used only in error messages. The second takes two namespace dictionaries. These two API calls are used in Example 23-8 to compile and execute three strings of Python code in turn.

Example 23-8. PP3E\Integrate\Embed\Basics\embed-bytecode.c

    /* precompile code strings to bytecode objects */

    #include <Python.h>
    #include <compile.h>
    #include <eval.h>

    main()
    {
        int i;
        char *cval;
        PyObject *pcode1, *pcode2, *pcode3, *presult, *pdict;
        char *codestr1, *codestr2, *codestr3;
        printf("embed-bytecode\n");

        Py_Initialize();
        codestr1 = "import usermod\nprint usermod.message";     /* statements */
        codestr2 = "usermod.transform(usermod.message)";        /* expression */
        codestr3 = "print '%d:%d' % (X, X ** 2),";              /* use input X */

        /* make new namespace dictionary */
        pdict = PyDict_New();
        if (pdict == NULL) return -1;
        PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins());

        /* precompile strings of code to bytecode objects */
        pcode1 = Py_CompileString(codestr1, "<embed>", Py_file_input);
        pcode2 = Py_CompileString(codestr2, "<embed>", Py_eval_input);
        pcode3 = Py_CompileString(codestr3, "<embed>", Py_file_input);

        /* run compiled bytecode in namespace dict */
        if (pcode1 && pcode2 && pcode3) {
            (void)    PyEval_EvalCode((PyCodeObject *)pcode1, pdict, pdict);
            presult = PyEval_EvalCode((PyCodeObject *)pcode2, pdict, pdict);
            PyArg_Parse(presult, "s", &cval);
            printf("%s\n", cval);
            Py_DECREF(presult);

            /* rerun code object repeatedly */
            for (i = 0; i <= 10; i++) {
                PyDict_SetItemString(pdict, "X", PyInt_FromLong(i));
                (void) PyEval_EvalCode((PyCodeObject *)pcode3, pdict, pdict);
            }
            printf("\n");
        }

        /* free referenced objects */
        Py_XDECREF(pdict);
        Py_XDECREF(pcode1);
        Py_XDECREF(pcode2);
        Py_XDECREF(pcode3);
    }

This program combines a variety of techniques that we've already seen. The namespace in which the compiled code strings run, for instance, is a newly created dictionary (not an existing module object), and inputs for code strings are passed as preset variables in the namespace. When built and executed, the first part of the output is similar to previous examples in this section, but the last line represents running the same precompiled code string 11 times:

    .../PP3E/Integrate/Embed/Basics$ embed-bytecode
    embed-bytecode
    The meaning of life...
    THE MEANING OF PYTHON...
    0:0 1:1 2:4 3:9 4:16 5:25 6:36 7:49 8:64 9:81 10:100

If your system executes strings multiple times, it is a major speedup to precompile to bytecode in this fashion.
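The compile-once, run-many pattern is also available at the Python level through the compile built-in, which takes the same three kinds of input as Py_CompileString: a code string, a filename used in error messages, and a mode. The following sketch uses Python 3 syntax (an assumption; the book's code targets Python 2.4), and its code strings are invented for illustration rather than taken from the C example.

```python
# Compile each string once; the mode mirrors Py_file_input / Py_eval_input.
code1 = compile("import math\nroot = math.sqrt(16)", "<embed>", "exec")
code2 = compile("root * 2", "<embed>", "eval")
code3 = compile("squares.append('%d:%d' % (X, X ** 2))", "<embed>", "exec")

namespace = {"squares": []}          # new dictionary used as the namespace

exec(code1, namespace, namespace)    # statements: run for their effects
doubled = eval(code2, namespace, namespace)   # expression: result returned

# Rerun the same code object with a different input each time;
# no recompilation happens inside the loop.
for i in range(11):
    namespace["X"] = i
    exec(code3, namespace, namespace)

line = " ".join(namespace["squares"])
print(doubled)
print(line)
```

The loop body only resets an input and reruns an existing code object, so the cost of parsing and compiling the string is paid once rather than on every iteration.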