Personal blog of Matthias M. Fischer


Calling C from within Python, Part 1: Making a Shared Object

11th October 2022

Introduction

Due to an acute urge to use C again, some days ago I used it to implement the classical Runge-Kutta fourth-order method for solving systems of differential equations of arbitrary dimensionality. (See here for the source code.) While this was a nice quick finger exercise for using double pointers as well as pointers to functions, there was no real "use" to the whole ordeal. Thus, I figured I should make use of this little project to additionally teach myself something entirely new. This would not only make the whole thing way more exciting to me, but also contribute to expanding my skill set.

Hence, the idea was born to learn how to call a piece of compiled software initially written in C from within a Python script. Especially in numerics-heavy scientific applications, this can be a very useful approach to drastically reduce the time required for computations. In the following series of blog posts, I'll present what I've learned for future reference, both for myself and others. This first post is about the solver written in C and how to compile it in a way that makes it accessible to a Python script.

Implementing the Solver

I will not go into the mathematical details of the solver here, since they're readily available for studying throughout the literature, both online as well as offline. However, I will briefly comment about some specific design choices I made during implementation.

First, in order to be able to solve an arbitrary system of differential equations contained in a function f, the solver needs to take a pointer to f as one of its parameters. The function f takes the current time point double t, as well as a position in the system's state space (passed as a pointer to a contiguous block of d doubles, double* y), and somehow will yield the local gradient of the system, again as a pointer double* dydt to a contiguous block of d doubles.

Because f will be called repeatedly during the numerical integration, I figured it wouldn't really be efficient to allocate memory upon every single call, return a pointer to it, and free the block of memory afterwards. Instead, I pass a pointer double* dydt as third function argument, to which the result will be written. This allows us to recycle the same block of memory in every call. The function thus will be of void type, returning nothing.

For instance, a function describing the famous Lorenz system might look like this:

void lorenz(double t, double* y, double* dydt) {
    // The Lorenz system as an example function to solve.   
    dydt[0] = 10.0*(y[1]-y[0]);
    dydt[1] = y[0]*(28.0-y[2]) - y[1];
    dydt[2] = y[0]*y[1] - (8.0/3.0)*y[2];}

Thus, the signature of our solver routine will look like this:

double** solve(
           void (*f)(double, double*, double*), // The function to solve
           double* y0,                          // Initial condition of the system
           double dt,                           // The time step for integration
           double tmax,                         // The maximum time
           int d)                               // The dimension of the system

Second, if we intend to call this piece of code from within Python, we need to make sure to also offer a possibility to deallocate the memory again (from within Python), in which the results are stored and returned.

Let's think this through. The solver returns a double**, that is, a pointer to a block of memory, in which pointers to contiguous blocks of doubles are stored. To be precise, the solver returns a pointer to an array of d+1 double*s. The first of these pointers points to a block of doubles that contains the time points, at which the system has been evaluated. The remaining d pointers point to blocks of doubles in which the solution of the d system variables at the respective time points are stored.

Thus, in order to properly deallocate all of the memory, we first need to free the d+1 "arrays", and afterwards the "array" containing the pointers itself. This can easily be done like so:

void dealloc(double** ptr, int d) {
    for (int i = 0; i<d+1; i++) {
        free(ptr[i]);
    }
    free (ptr);
}

Now, we only need to remember to call this function after every use of our solver in order to avoid any memory leaks.

Compiling it as a Shared Object

Now that the solver is implemented, we need to compile it using our favourite C compiler. Personally, I use clang, but gcc works just as fine as well. To be able to use the code from Python, we need to compile it not to an ELF, like we normally would, but to a so-called "shared object". This basically means a standalone file, which can be automatically linked into a program when the program starts. Due to this dynamic linking, we also need to compile the code in a position-independent manner, meaning that it can be safely placed at any position in memory. (For instance, this concerns how jump adresses are stores.) Overall, we thus run

clang rk4.c -fPIC -shared -o rk4.so

and thus obtain our desired rk4.so.

Outlook

In the next blog post, I'll describe how to use this shared object file from within a Python script using the ctypes library. Generally, this is rather straight-forward to do, however I encountered some little traps and pitfalls, which I think are worth writing down. Stay tuned!