GilgaLab

3 May 2011

Python C API – Second Step

Posted by Henrique

As promised, here is the second step in playing with the Python C API. If you haven't read the first post, I strongly suggest you do so: Python C API - First Step

This will be a much more exciting read with many new concepts. If you find something confusing or are having a hard time understanding it, please let me know and we can discuss!

For this post I will show code that does many things, such as:

  • Look for the 'sys' module in the default loaded modules
  • Add a new path to the sys.path list in the sys module
  • Import a module written by us
  • Access the members of the module that we loaded
  • Call a function from the module that we loaded
  • And more!

I hope that from this introduction you feel excited about what is coming ahead!

So, let's start looking at the module we will import:

''' Save this code in a file called example.py '''
 
def doubleValue(x):
    y = 2 * x
    return y
 
myString = "Hello World!"

And finally the code that will do all those wonderful things mentioned in the list above:

/**
 * Save the code in a file called main.c
 * Compile with one of the following:
 *    gcc -Wall main.c -o pyx `python3.2-config --includes` `python3.2-config --libs`
 *    gcc -Wall -I/usr/include/python3.2mu main.c -o pyx -lpthread -ldl -lutil -lm -lpython3.2mu
 * (the libraries come after main.c because some linkers resolve symbols in order)
 * 
 * Run with: ./pyx
 * 
 * Use the code as you wish. If any help is needed, please let me know.
 * If you are going to republish this code somewhere, I would be grateful
 * if you can keep the credits of the code and/or a link to the original.
 *
 * Author: Henrique M. D.
 * Email: typoon at gmail dot com
 */
 
#include "Python.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for memset() */
 
int main(int argc, char** argv) {
    int i;
    char module_dir[255];
    char *name;
    char *str;
    //wchar_t *module_path;
    PyObject *poModule;
    PyObject *poDictModule;
    PyObject *poKey;
    PyObject *poValue;
    PyObject *poString;
    PyObject *poSys;
    PyObject *poList;
    PyObject *poListItem;
    PyObject *poPath;
    PyObject *poMyString;
    PyObject *poDoubleValue;
    PyObject *poResult;
    Py_ssize_t size;
 
    /* Start Part 1 */
 
    // This is the path where our .py scripts are. We will need to
    // add this path to the sys.path list later on prior to loading
    // it. If we don't do that, Python won't be able to load our script
    // as it won't find it.
    // Change this path accordingly to your setup
    memset(module_dir, 0, sizeof(module_dir));
    sprintf(module_dir, "/home/gilgamesh/codes/Pyx/scripts");
 
    Py_Initialize();
 
    poDictModule = PyImport_GetModuleDict();
 
    poString = PyUnicode_FromString("sys");
    poSys = PyDict_GetItem(poDictModule, poString);
    Py_DECREF(poString); // refcount = 0, it can be freed now
 
    if(poSys == NULL)
    {
        printf("Cannot find the sys item in the dictionary. Aborting...\n");
        return -1;
    }
 
    if(!PyModule_Check(poSys))
    {
        printf("This is not the sys module. Leaving...\n");
        return -1;
    }
 
    poDictModule = PyModule_GetDict(poSys);
    poList = PyDict_GetItemString(poDictModule, "path");
 
    if(poList == NULL || !PyList_Check(poList))
    {
        printf("This is not the path list. Aborting...\n");
        return -1;
    }
 
    printf("Path list before:\n");
    for(i = 0; i < PyList_Size(poList); i++)
    {
        poListItem = PyList_GetItem(poList, i);
        if(PyUnicode_Check(poListItem))
        {
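            // PyUnicode_AsEncodedString() returns a new reference, so we
            // keep it in a variable and release it after printing
            PyObject *poBytes = PyUnicode_AsEncodedString(poListItem, "utf-8", "strict");
            if(poBytes != NULL)
            {
                printf("\tValue [%d] = %s\n", i, PyBytes_AsString(poBytes));
                Py_DECREF(poBytes);
            }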
        }
    }
    printf("\n");
 
    poPath = PyUnicode_FromString(module_dir);
    PyList_Append(poList, poPath);
    Py_DECREF(poPath);
 
    printf("Path list after:\n");
    for(i = 0; i < PyList_Size(poList); i++)
    {
        poListItem = PyList_GetItem(poList, i);
        if(PyUnicode_Check(poListItem))
        {
            PyObject *poBytes = PyUnicode_AsEncodedString(poListItem, "utf-8", "strict");
            if(poBytes != NULL)
            {
                printf("\tValue [%d] = %s\n", i, PyBytes_AsString(poBytes));
                Py_DECREF(poBytes); // release the temporary bytes object
            }
        }
    }
 
    /* End Part 1 */
 
    /* Start Part 2 */
 
    // Here we will import our script. It is called example.py
    // We just pass the name of the module without the .py as the
    // Python interpreter knows that it should look for the file
    // example.py
    poModule = PyImport_ImportModule("example");
 
    if(poModule == NULL)
    {
        printf("Error importing module\n");
        PyErr_Print();
        return -1;
    }
 
    printf("Module imported successfully!\n\n");
    poDictModule = PyModule_GetDict(poModule);
 
    size = 0;
    while(PyDict_Next(poDictModule, &size, &poKey, &poValue))
    {
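        PyObject *poBytes = PyUnicode_AsEncodedString(poKey, "utf-8", "strict");
        if(poBytes != NULL)
        {
            name = PyBytes_AsString(poBytes);
            printf("Type: %s - Symbol name: %s \n", (char *)poValue->ob_type->tp_name, name);
            Py_DECREF(poBytes); // release the temporary bytes object
        }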
    }
    printf("\n");
 
    // Get the objects of the symbols we defined in our script    
    poMyString = PyDict_GetItemString(poDictModule, "myString");
 
    if(poMyString)
    {
        if(PyUnicode_Check(poMyString))
        {
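            PyObject *poBytes = PyUnicode_AsEncodedString(poMyString, "utf-8", "strict");
            if(poBytes != NULL)
            {
                str = PyBytes_AsString(poBytes);
                printf("myString value is: %s\n", str);
                Py_DECREF(poBytes); // release the temporary bytes object
            }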
        }
        else
        {
            printf("myString is not a PyUnicode object. Are you messing up with the script?\n");
        }
    }
    else
    {
        printf("Symbol 'myString' not found...\n");
    }
 
    poDoubleValue = PyDict_GetItemString(poDictModule, "doubleValue");
 
    if(poDoubleValue)
    {
        if(PyFunction_Check(poDoubleValue))
        {
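            poResult = PyObject_CallFunction(poDoubleValue, "i", 2);
            if(poResult != NULL)
            {
                printf("doubleValue(2) = %ld\n", PyLong_AsLong(poResult));
                Py_DECREF(poResult);
            }
            else
            {
                // The call raised an exception; show the traceback
                PyErr_Print();
            }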
        }
        else
        {
            printf("doubleValue is not a PyFunction object. Are you messing up with the script?\n");
        }
    }
    else
    {
        printf("Symbol 'doubleValue' not found...\n");
    }
 
    Py_Finalize();
 
    /* End Part 2 */
 
    return 0;
 
}

I will not repeat myself about things that have already been explained in the first post, so I will just give an overview on what is happening here.

This code will import a Python module (a script written by us), print the value of the variable 'myString' that is declared in the script and execute the function 'doubleValue' that (as the name suggests) takes one integer argument, multiplies it by 2 and returns the result.

In order to achieve this, we need some preparation.

For Python to be able to find our module, we first need to tell it where our script is. This is what 'Part 1' in the code is all about.

When one wants to import a module, the Python interpreter checks each path in the list "sys.path" for the module to be imported. If the module is found, it is loaded; otherwise the interpreter raises an error saying that the module could not be found.

As our module lives somewhere other than the default locations that Python searches, we need to add its directory to the list of module paths.
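To make the search behaviour concrete, here is a small sketch of the same idea at the Python level rather than through the C API. The temporary directory and the module name gilga_example_demo are made up for the illustration; they stand in for the scripts directory and example.py used in this post.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the scripts directory used in this post.
script_dir = tempfile.mkdtemp()
module_name = "gilga_example_demo"  # made-up name, chosen to avoid clashes
with open(os.path.join(script_dir, module_name + ".py"), "w") as f:
    f.write("def doubleValue(x):\n    return 2 * x\n")

# The directory is not on sys.path yet, so the import fails.
try:
    importlib.import_module(module_name)
    found_before = True
except ImportError:
    found_before = False

# After appending the directory, the same import succeeds.
sys.path.append(script_dir)
importlib.invalidate_caches()
module = importlib.import_module(module_name)

print(found_before, module.doubleValue(21))
```

The C code in Part 1 does the equivalent of the sys.path.append() call, just spelled out with PyList_Append().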

    poDictModule = PyImport_GetModuleDict();
    poString = PyUnicode_FromString("sys");
    poSys = PyDict_GetItem(poDictModule, poString);
    Py_DECREF(poString); // refcount = 0, it can be freed now

First, we grab the sys.modules dictionary so we can navigate later to the sys module and grab the path list.

We then create a PyUnicode representing the string "sys" that will be used to search for the "sys" module in the dictionary.

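PyDict_GetItem() takes two arguments: the dictionary we want to get an item from, and a PyObject with the string (a PyUnicode) representing the key in the dictionary. It returns a PyObject* to the item associated with that key, or NULL if the key is not present. The returned pointer is a borrowed reference, so we must not decrement its reference count.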

After we have our item, we decrement the reference count of 'poString' so it can be free()'d by the system. I will talk about refcounts in another post that is on its way, so don't worry too much about it for now.
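For comparison, this is roughly what the lookup looks like when written in Python itself. It is only a sketch of the equivalent operation, not of the C API: None plays the role of the NULL that PyDict_GetItem() hands back for a missing key, and the name no_such_module_for_demo is invented.

```python
import sys

modules = sys.modules             # the dictionary PyImport_GetModuleDict() returns
sys_module = modules.get("sys")   # found: the module object itself
missing = modules.get("no_such_module_for_demo")  # not found: None instead of NULL

print(sys_module is sys)
print(missing is None)
```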

At this point we have the module 'sys' in the poSys variable, and can finally go ahead, grab its dictionary and find the 'path' list in it.

    poDictModule = PyModule_GetDict(poSys);
    poList = PyDict_GetItemString(poDictModule, "path");

This is the other way of getting an item from a dictionary. Instead of going to the trouble of creating a PyUnicode object, we can use PyDict_GetItemString() to get the item identified by a key passed as a char*.
Not much new here: we grab the dictionary of the "sys" module and then grab the "path" list from it.
I know that "sys.path" is a list because the Python documentation for the sys module describes it as a list of strings.
So now we have the sys.path list in the 'poList' variable.
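The same chain of steps can be sketched in Python, with each line annotated with the C API call it loosely corresponds to. The variable names are mine and purely illustrative.

```python
import sys
import types

is_module = isinstance(sys, types.ModuleType)  # PyModule_Check(poSys)
namespace = vars(sys)                          # PyModule_GetDict(poSys)
path_list = namespace.get("path")              # PyDict_GetItemString(..., "path")
is_list = isinstance(path_list, list)          # PyList_Check(poList)

print(is_module, is_list, path_list is sys.path)
```

Note that the list we get out of the dictionary is the very same object as sys.path, which is why appending to it later affects imports.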

    printf("Path list before:\n");
    for(i = 0; i < PyList_Size(poList); i++)
    {
        poListItem = PyList_GetItem(poList, i);
        if(PyUnicode_Check(poListItem))
        {
            PyObject *poBytes = PyUnicode_AsEncodedString(poListItem, "utf-8", "strict");
            if(poBytes != NULL)
            {
                printf("\tValue [%d] = %s\n", i, PyBytes_AsString(poBytes));
                Py_DECREF(poBytes); // release the temporary bytes object
            }
        }
    }
    printf("\n");

Here we are just printing all the items in the sys.path list. Prior to this point one can see that I have made all the checks to make sure that poList is really a PyList and that we can access it as such.

The sys.path list is a list of Strings, so it is okay to just go ahead and convert each item (that we retrieved using PyList_GetItem()) to a char* and then print it out.

An equivalent code in Python to do this would be:

import sys
 
print(sys.path)

Now we want to add the directory where our module lives:

    poPath = PyUnicode_FromString(module_dir);
    PyList_Append(poList, poPath);
    Py_DECREF(poPath);

As we are working with a list of PyUnicode (strings), we need to create a PyObject that represents our path as a PyUnicode, so again we are using PyUnicode_FromString() for this.

After that, we simply append the new path to the list, and again decrement the refcount of the recently created PyObject. The list now holds its own reference to it, so the interpreter will clean it up when the time comes.

    printf("Path list after:\n");
    for(i = 0; i < PyList_Size(poList); i++)
    {
        poListItem = PyList_GetItem(poList, i);
        if(PyUnicode_Check(poListItem))
        {
            PyObject *poBytes = PyUnicode_AsEncodedString(poListItem, "utf-8", "strict");
            if(poBytes != NULL)
            {
                printf("\tValue [%d] = %s\n", i, PyBytes_AsString(poBytes));
                Py_DECREF(poBytes); // release the temporary bytes object
            }
        }
    }

Print the list again, to prove that our path is now in the sys.path list! :D

Great! Up to this point we saw some new functions, learned some cool stuff about how to handle a list, and practiced handling dictionaries again.

What we want to do now is load our module, print all its dictionary entries, access its 'myString' member and call its 'doubleValue' function.
All of these things are marked in the code as Part 2.

    // Here we will import our script. It is called example.py
    // We just pass the name of the module without the .py as the
    // Python interpreter knows that it should look for the file
    // example.py
    poModule = PyImport_ImportModule("example");

I guess the comment in this part says it all. Just to make sure, the script presented at the beginning of this post should be saved in a file called 'example.py' for this to work. If you have saved it in a file with a different name, just make the changes here :)

    printf("Module imported successfully!\n\n");
    poDictModule = PyModule_GetDict(poModule);
 
    size = 0;
    while(PyDict_Next(poDictModule, &size, &poKey, &poValue))
    {
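        PyObject *poBytes = PyUnicode_AsEncodedString(poKey, "utf-8", "strict");
        if(poBytes != NULL)
        {
            name = PyBytes_AsString(poBytes);
            printf("Type: %s - Symbol name: %s \n", (char *)poValue->ob_type->tp_name, name);
            Py_DECREF(poBytes); // release the temporary bytes object
        }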
    }
    printf("\n");

The module was successfully imported, so we get its dictionary and start printing everything in it!
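If you want to know what to expect from that loop, the following Python sketch walks a module dictionary in the same way. To keep it self-contained it builds a stand-in for example.py in memory instead of importing the real file, so treat the setup lines as an assumption made for the demo.

```python
import types

# In-memory stand-in for example.py (assumption made for this demo)
example = types.ModuleType("example")
source = 'def doubleValue(x):\n    return 2 * x\n\nmyString = "Hello World!"\n'
exec(source, example.__dict__)

# type(value).__name__ is the Python-level view of ob_type->tp_name
symbols = {name: type(value).__name__ for name, value in vars(example).items()}
for name, type_name in symbols.items():
    print("Type: %s - Symbol name: %s" % (type_name, name))
```

Besides our two symbols, the dictionary also contains entries such as __name__ that every module gets, so the C program prints those too.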

    poMyString = PyDict_GetItemString(poDictModule, "myString");
 
    if(poMyString)
    {
        if(PyUnicode_Check(poMyString))
        {
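            PyObject *poBytes = PyUnicode_AsEncodedString(poMyString, "utf-8", "strict");
            if(poBytes != NULL)
            {
                str = PyBytes_AsString(poBytes);
                printf("myString value is: %s\n", str);
                Py_DECREF(poBytes); // release the temporary bytes object
            }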
        }
        else
        {
            printf("myString is not a PyUnicode object. Are you messing up with the script?\n");
        }
    }
    else
    {
        printf("Symbol 'myString' not found...\n");
    }

We retrieve the object that refers to the 'myString' variable in our Python code. If the symbol 'myString' is not present in the dictionary, the function PyDict_GetItemString() will return NULL, so that is why we check to see if there is anything in the poMyString variable. After that, we just want to make sure it is a PyUnicode object, so we can print its value. The printing part is pretty much the same thing we have been doing for every other PyUnicode.
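The lookup-then-type-check pattern translates to Python fairly directly. In this sketch the helper read_string() and the sentinel _MISSING are hypothetical names of mine; the sentinel stands in for the NULL that the C code receives when the symbol does not exist.

```python
import types

_MISSING = object()  # sentinel standing in for the NULL that C gets back

def read_string(module, name):
    """Hypothetical helper mirroring the lookup-then-type-check pattern."""
    value = vars(module).get(name, _MISSING)
    if value is _MISSING:
        return "Symbol %r not found..." % name
    if not isinstance(value, str):  # what PyUnicode_Check() tests in C
        return "%s is not a str object" % name
    return "%s value is: %s" % (name, value)

# In-memory stand-in for example.py (assumption made for this demo)
example = types.ModuleType("example")
example.myString = "Hello World!"
example.notAString = 42

print(read_string(example, "myString"))
print(read_string(example, "notAString"))
print(read_string(example, "absent"))
```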

    poDoubleValue = PyDict_GetItemString(poDictModule, "doubleValue");
 
    if(poDoubleValue)
    {
        if(PyFunction_Check(poDoubleValue))
        {
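            poResult = PyObject_CallFunction(poDoubleValue, "i", 2);
            if(poResult != NULL)
            {
                printf("doubleValue(2) = %ld\n", PyLong_AsLong(poResult));
                Py_DECREF(poResult);
            }
            else
            {
                // The call raised an exception; show the traceback
                PyErr_Print();
            }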
        }
        else
        {
            printf("doubleValue is not a PyFunction object. Are you messing up with the script?\n");
        }
    }
    else
    {
        printf("Symbol 'doubleValue' not found...\n");
    }

Now we retrieve the object that refers to the function 'doubleValue' that we declared in the script. So far, this is pretty much the same thing we have been doing all along. After that we check to make sure that what we got is really a function, and then comes the interesting part!
PyObject_CallFunction() is one of the many ways to execute a function. Let's have a look at its prototype:

PyObject* PyObject_CallFunction(PyObject *callable, char *format, ...)

Let's talk about the arguments first.
callable is the PyObject that represents the function we want to call.

format works much like the 'format' parameter of printf(): it is a string that describes the types of the parameters that come right after it. The difference is that the characters used to represent things are different. For our code, we have PyObject_CallFunction(poDoubleValue, "i", 2). Here the parameter right after the format is taken as a C int, because the format string is "i". If the format string were "iii", we would need to pass three ints, like this: PyObject_CallFunction(poDoubleValue, "iii", 2, 3, 4); (doubleValue itself accepts only one argument, so this particular call would fail with a TypeError; it only illustrates the syntax).

... stands for all the other parameters that are needed, depending on the 'format' string.

The 'format' string must describe exactly as many parameters as the function expects.

After calling the function, we simply print the result (an int in Python, which the C API handles as a PyLong).
We then decrement the reference count of the result for cleanup.
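One way to picture the format string is that each character becomes one positional argument of the call. The Python sketch below shows that correspondence; add3() is a hypothetical three-argument function I added only so that the "iii" case has something valid to call.

```python
import types

def doubleValue(x):
    return 2 * x

def add3(a, b, c):  # hypothetical function for the "iii" case
    return a + b + c

# PyFunction_Check() corresponds to a check against types.FunctionType
is_function = isinstance(doubleValue, types.FunctionType)

result = doubleValue(*(2,))   # like format "i" followed by one int
total = add3(*(2, 3, 4))      # like format "iii" followed by three ints

# Passing more arguments than the function accepts raises TypeError;
# in C, PyObject_CallFunction() would return NULL with that exception set.
try:
    doubleValue(*(2, 3, 4))
    mismatch = False
except TypeError:
    mismatch = True

print(is_function, result, total, mismatch)
```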

After all this, we just call Py_Finalize() to shut down the interpreter, and we are done!

References:

Python/C API Reference Manual

Filed under: C, Programming, Python
3 May 2011

Python C API – First Step

Posted by Henrique

I am once again in that time of the year where I start to look back at projects I started and did not finish. I am trying to make a personal commitment to select one or two of those and get them rolling. One of these projects involves embedding Python to make it extensible. As it has been a long time since I started this project, I barely remember how that stuff works, so I am once again tackling the problem to see where I can get.

I plan to keep this post fairly simple (but can't guarantee I will manage to), showing some code and explaining parts of it. Where I see the need (and have the knowledge to do so) I will try to discuss a little bit about how certain things work.

For everything discussed here, I will be using Python 3.2. It is possible some of this stuff won't work with Python 2.x but will probably work fine with Python 3.1.

So, let's start with a simple program that will just initialize Python and list all modules that are loaded as soon as the initialization is finished.

#include "Python.h"
#include <stdio.h>
#include <stdlib.h>
 
int main(int argc, char** argv) {
    char *name;
    PyObject *dictModule;
    Py_ssize_t size;
    PyObject *poKey;
    PyObject *poValue;
 
    Py_Initialize();
    dictModule = PyImport_GetModuleDict();
 
    size = 0;
    while(PyDict_Next(dictModule, &size, &poKey, &poValue))
    {
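        PyObject *poBytes = PyUnicode_AsEncodedString(poKey, "utf-8", "strict");
        if(poBytes != NULL)
        {
            name = PyBytes_AsString(poBytes);
            printf("Type: %s - Name: %s\n", (char *)poValue->ob_type->tp_name, name);
            Py_DECREF(poBytes); // release the temporary bytes object
        }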
    }
 
    Py_Finalize();
 
    return 0;
}

To compile this code, you can use the following command (assuming you saved the source in a file called main.c):
gcc -Wall -o pyx main.c `python3.2-config --includes` `python3.2-config --libs`

If python3.2-config is not available on your platform, this is what it expands to on my computer:
gcc -Wall -I/usr/include/python3.2mu -o pyx main.c -lpthread -ldl -lutil -lm -lpython3.2mu

It is up to you to figure out the include paths :) If you have trouble, let me know in the comments.

Since this is a new topic, I will talk a little bit about the new types we see: PyObject and Py_ssize_t.

Py_ssize_t is a signed integer type with the same size as size_t (typically a typedef of ssize_t). It is recommended to use Py_ssize_t instead of a concrete type, as the underlying type can differ between platforms and this guarantees compatibility.
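If you are curious how big Py_ssize_t is on your machine, you can check it from Python without writing any C. This is just a quick sketch using the ctypes module: sys.maxsize is the largest value a Py_ssize_t can hold.

```python
import ctypes
import sys

size_bytes = ctypes.sizeof(ctypes.c_ssize_t)

# Same width as size_t, but signed
same_width = size_bytes == ctypes.sizeof(ctypes.c_size_t)
# sys.maxsize is the largest value a Py_ssize_t can hold
max_matches = sys.maxsize == 2 ** (8 * size_bytes - 1) - 1

print(size_bytes, same_width, max_matches)
```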

PyObject is the struct that seems to be the base of pretty much everything in Python. A module, a dictionary, an object, a function... they are all represented by a PyObject. There are functions and/or macros that can be used to identify what kind of information is stored in a PyObject. For example, if one wants to know if the PyObject in hand is a dictionary, the macro PyDict_Check(PyObject *p) can be used. This macro returns true if the PyObject 'p' is a dictionary (or an instance of a subclass of dict).

I won't be discussing much of the inner workings of PyObject, because that would take a long time, and I am still not too familiar with that yet (hopefully I will have a post in the future just to discuss it). I guess it is enough to know that most functions that handle a Python object will deal with this structure.
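At the Python level, the closest thing to these check macros is isinstance(). The sketch below is only an analogy to show which kinds of objects pass a PyDict_Check()-style test; the labels are mine.

```python
import collections
import sys

checks = {
    "plain dict": isinstance({}, dict),
    "sys.modules": isinstance(sys.modules, dict),
    "OrderedDict (a dict subclass)": isinstance(collections.OrderedDict(), dict),
    "list": isinstance([], dict),
}
for label, result in checks.items():
    print(label, result)
```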

Py_Initialize() simply initializes the interpreter and loads everything that Python needs. I couldn't express this better than the documentation, so let me quote it:

Initialize the Python interpreter. In an application embedding Python, this should be called before using any other Python/C API functions; with the exception of Py_SetProgramName() and Py_SetPath(). This initializes the table of loaded modules (sys.modules), and creates the fundamental modules builtins, __main__ and sys. It also initializes the module search path (sys.path). It does not set sys.argv; use PySys_SetArgvEx() for that. This is a no-op when called for a second time (without calling Py_Finalize() first). There is no return value; it is a fatal error if the initialization fails.

It is safe to ignore the information regarding other functions for now.

So, as soon as Python has been initialized, I was curious about which modules were loaded, to see if they would match what the documentation mentioned. Digging into it, I found the function PyImport_GetModuleDict(), which returns the dictionary used for sys.modules (it maps module names to the loaded modules). With this in hand I am able to iterate through the dictionary and print everything that is available in it.

In order to navigate through the dictionary, we can use the function PyDict_Next() which has the following prototype:

int PyDict_Next(PyObject *p, Py_ssize_t *ppos, PyObject **pkey, PyObject **pvalue);

PyObject *p - The dictionary we want to iterate through.
Py_ssize_t *ppos - This tells the function the offset inside the dictionary where the current data is found. This value HAS to be initialized to 0 prior to the first call and should not be changed inside the loop calling PyDict_Next (as this is the variable the function uses to keep track of the offset).
PyObject **pkey - This receives the key of the current dictionary entry. For sys.modules and for module dictionaries it is a string object containing the name of the symbol being analyzed.
PyObject **pvalue - This receives the value to which the current key refers. It can be anything: a module, another dictionary, a function, a long, a string and so on...
The function returns true for each entry and false once all entries have been reported. Both the key and the value are borrowed references, so they must not be decremented.
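In Python terms, a PyDict_Next() loop is what iterating over the items of a dictionary does for you. The sketch below lists the loaded modules the same way the C program does, and also shows why the set of keys must stay the same during iteration: Python refuses to continue, whereas in C the behaviour would simply be unreliable. The small dictionary d is made up for the demo.

```python
import sys

# Same listing as the C loop: type name and key for every entry
entries = [(type(module).__name__, name)
           for name, module in list(sys.modules.items())]
print(len(entries), "modules loaded")

# Adding keys while iterating is detected and rejected by Python
d = {"a": 1, "b": 2}
try:
    for key in d:
        d[key + "_copy"] = 0
    changed_during_iteration = False
except RuntimeError:
    changed_during_iteration = True

print(changed_during_iteration)
```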

Strings in Python are all Unicode and are represented by a PyObject as well. For that reason, in order to print the string value to the terminal, we first need to convert the PyUnicode object to a PyBytes object, whose buffer can then be read as a regular C string (char *). PyUnicode_AsEncodedString() returns a new reference, so the bytes object has to be released once we are done with it. This is what is happening in the following lines:

PyObject *poBytes = PyUnicode_AsEncodedString(poKey, "utf-8", "strict");
name = PyBytes_AsString(poBytes);
/* ... use name ... */
Py_DECREF(poBytes);
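The third argument of PyUnicode_AsEncodedString() names an error handler, just like the second argument of str.encode() in Python. Here is a small sketch of the Python-level equivalent. It shows the standard "strict" handler, and what happens with a name that is not a registered handler once an encoding problem actually occurs (the lone surrogate is there only to provoke such a problem).

```python
text = "Hello World!"

# Python-level counterpart of PyUnicode_AsEncodedString(obj, "utf-8", "strict")
encoded = text.encode("utf-8", "strict")
print(type(encoded).__name__, encoded)

# A lone surrogate cannot be encoded as UTF-8, so the error handler is consulted.
bad = "\udc80"
try:
    bad.encode("utf-8", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "Error" is not a registered handler name, so the lookup itself fails.
try:
    bad.encode("utf-8", "Error")
    unknown_handler = False
except LookupError:
    unknown_handler = True

print(strict_failed, unknown_handler)
```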

When trying to determine the type of the PyObject I could not find a function that would return a string telling the type, so I started analyzing the PyObject structure and found a field tp_name (which I presume means "type name") inside the ob_type structure contained in it. I guess accessing PyObject's members directly isn't recommended, but as this is being used for learning purposes I might be forgiven.

After everything is completed, we just finalize the Python interpreter by calling Py_Finalize(). This will make sure Python does its cleanup and frees the memory it was using.

As stated in the post title, this is just the first step in playing with the API. I plan to show some more code in the posts to follow, where we will be able to execute Python code without interacting with the interpreter, as well as code that imports a Python module (a .py file with Python code) and then interacts with the interpreter by reading the data in the file and executing its functions.

For now, this is it!

References:

Python/C API Reference Manual

Filed under: C, Programming, Python
18 Sep 2009

Release: Dive Into Python 3

Posted by Henrique

Today Mark Pilgrim's book 'Dive Into Python 3' was released in digital format. Apparently it will also go on sale starting October 16.
I read through the first chapter of the book and it seemed good. The author writes in a way that is easy to understand and starts by showing code instead of spending 75 pages telling stories about the language.

Link: Dive Into Python 3

Source: Dzone

Filed under: Python