Python C API – First Step
I am once again at that time of the year when I look back at projects I started and did not finish. I am trying to make a personal commitment to pick one or two of those and get them rolling. One of these projects involves embedding Python to make it extensible. As it has been a long time since I started this project, I barely remember how that stuff works, so I am once again tackling the problem to see where I can get.
I plan to keep this post fairly simple (but can't guarantee I will be able to), showing some code and explaining parts of it. Where I see the need (and have the knowledge to do so) I will try to discuss a little bit about how certain things work.
For everything discussed here, I will be using Python 3.2. It is possible some of this stuff won't work with Python 2.x but will probably work fine with Python 3.1.
So, let's start with a simple program that will just initialize Python and list all modules that are loaded as soon as the initialization is finished.
```c
#include "Python.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    char *name;
    PyObject *dictModule;
    Py_ssize_t size;
    PyObject *poKey;
    PyObject *poValue;

    Py_Initialize();

    /* sys.modules: the dictionary of every module loaded so far */
    dictModule = PyImport_GetModuleDict();

    /* size is the iteration cursor for PyDict_Next and must start at 0 */
    size = 0;
    while(PyDict_Next(dictModule, &size, &poKey, &poValue))
    {
        /* str -> bytes (UTF-8) -> char *; the temporary bytes object is never released here */
        name = PyBytes_AsString(PyUnicode_AsEncodedString(poKey, "utf-8", "Error"));
        printf("Type: %s - Name: %s\n", (char *)poValue->ob_type->tp_name, name);
    }

    Py_Finalize();
    return 0;
}
```
To compile this code, you can use the following command (assuming you saved the source in a file called main.c). Note that the libraries come after the source file, since some linkers only resolve symbols against libraries listed after the files that use them:
gcc -Wall `python3.2-config --includes` -o pyx main.c `python3.2-config --libs`
If python3.2-config is not available on your platform, this is what it expands to on my computer:
gcc -Wall -I/usr/include/python3.2mu -I/usr/include/python3.2mu -o pyx main.c -lpthread -ldl -lutil -lm -lpython3.2mu
It is up to you to figure out the include paths :) If you have trouble, let me know in the comments.
Since this is a new topic, I will talk a little bit about the new types we see: PyObject and Py_ssize_t.
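Py_ssize_t is a signed integer type with the same size as size_t; on most platforms it is simply a typedef to ssize_t. The API uses it for sizes, indexes and positions (PyDict_Next() takes a Py_ssize_t *, for example), so it is best to use Py_ssize_t rather than guessing at int or long, which may not match on every platform.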
PyObject is the structure that seems to be the base of pretty much everything in Python. A module, a dictionary, an object, a function... they are all handled through a PyObject pointer. There are functions and/or macros that can be used to identify what kind of information is stored in a PyObject. For example, if one wants to know if the PyObject in hand is a dictionary, the macro PyDict_Check(PyObject *p) can be used. This macro returns true if the PyObject 'p' is a dictionary (or an instance of a subclass of dict).
I won't be discussing much of the inner workings of PyObject, because that would take a long time, and I am not too familiar with it yet (hopefully I will have a post in the future just to discuss it). I guess it is enough to know that most functions that handle a Python object will deal with this structure.
Py_Initialize() simply initializes the interpreter and loads everything that Python needs. I couldn't express this better than the documentation, so let me quote it:
Initialize the Python interpreter. In an application embedding Python, this should be called before using any other Python/C API functions; with the exception of Py_SetProgramName() and Py_SetPath(). This initializes the table of loaded modules (sys.modules), and creates the fundamental modules builtins, __main__ and sys. It also initializes the module search path (sys.path). It does not set sys.argv; use PySys_SetArgvEx() for that. This is a no-op when called for a second time (without calling Py_Finalize() first). There is no return value; it is a fatal error if the initialization fails.
It is safe to ignore the information regarding other functions for now.
So, as soon as Python had been initialized, I was curious about which modules were loaded, to see if they would match what the documentation mentioned. Digging into it, I found the function PyImport_GetModuleDict(), which returns the dictionary of loaded modules, sys.modules. With this in hand I am able to iterate through the dictionary and print everything that is available in it.
In order to navigate through the dictionary, we can use the function PyDict_Next() which has the following prototype:
int PyDict_Next(PyObject *p, Py_ssize_t *ppos, PyObject **pkey, PyObject **pvalue);
PyObject *p - The dictionary we want to iterate through.
Py_ssize_t *ppos - Tracks the position inside the dictionary of the entry being visited. This value HAS to be initialized to 0 prior to the first call and should not be changed inside the loop calling PyDict_Next (as this is the variable the function uses to keep track of where it is).
PyObject **pkey - Receives the key of the current entry. In sys.modules the keys are string objects holding the module names, but in general a key can be any hashable object.
PyObject **pvalue - Receives the value the current key refers to. This can be anything: a module, another dictionary, a function, a long, a string and so on... Both the key and the value are borrowed references, so they must not be released with Py_DECREF().
Strings in Python 3 are all Unicode and are represented by a PyObject as well. For that reason, in order to print the string value to the terminal, we first need to convert the PyUnicode object to a PyBytes object, which can then be read as a regular C string (char *). This is what is happening in the following line:
name = PyBytes_AsString(PyUnicode_AsEncodedString(poKey, "utf-8", "Error"));
When trying to determine the type of the PyObject I could not find a function that would return a string telling the type, so I started analyzing the PyObject structure and found a field tp_name (which I presume means "type name") inside the type object that its ob_type field points to. I guess accessing PyObject's members directly isn't recommended, but as this is being used for learning purposes I might be forgiven.
After everything is completed, we just finalize the Python interpreter by calling Py_Finalize(). This will make sure Python does its cleanup and frees the memory it allocated.
As stated in the post title, this is just the first step playing with the API. I plan to show some more code in the posts to follow, where we will be able to execute Python code without interacting with the interpreter, and some code that imports a Python module (a .py file with Python code) and then interacts with the interpreter by reading the data in the file and executing its functions.
For now, this is it!