Everyone knows you should type your variables in cython and get amazing speedups! If only… The first time I followed this advice I got a 1.3x speedup and balked. So here are a few things I gathered after cythonising for more than a bit.
The cython tutorial shows a basic way to compile your code. However, as someone who likes to folderize my scripts a fair bit, I have found this magic script to be pretty foolproof.
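A minimal sketch of what such a build script can look like, assuming setuptools and Cython are installed and setup.py is run from the top of the source tree. The helper names module_name and find_extensions are made up for illustration; this is not necessarily the author's script.

```python
"""setup.py sketch: compile every .pyx file found below the project root."""
import sys
from pathlib import Path

ROOT = Path(".")  # assumption: setup.py is run from the top of the source tree


def module_name(pyx_path, root=ROOT):
    """Turn pkg/sub/fast.pyx (relative to root) into the dotted name pkg.sub.fast."""
    relative = Path(pyx_path).relative_to(root).with_suffix("")
    return ".".join(relative.parts)


def find_extensions(root=ROOT):
    """Build one setuptools Extension per .pyx file below root."""
    from setuptools import Extension  # lazy import keeps module_name stdlib-only

    return [
        Extension(module_name(path, root), [str(path)])
        for path in sorted(Path(root).rglob("*.pyx"))
    ]


# Only build when invoked as: python setup.py build_ext --inplace
if __name__ == "__main__" and "build_ext" in sys.argv:
    from Cython.Build import cythonize
    from setuptools import setup

    setup(
        ext_modules=cythonize(
            find_extensions(),
            compiler_directives={"language_level": "3"},
        )
    )
```

Running python setup.py build_ext --inplace should then leave each compiled module next to its .pyx file, so the folder layout doubles as the package layout.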
If you like bash scripts like me, make the script check the exit status of the compile step and stop when compilation fails; otherwise bash will happily run the rest of your pipeline on your old cython scripts.
my_module....so will cause an import conflict and my_module.py gets imported instead. Best to use different names.
The cimport numpy as np and import numpy as np convention. cimport imports C functions from the NumPy C API: see __init__.pxd from the Cython project. For reasons of perhaps convenience, the convention is to import both as np. I assume that internally Cython checks the C API for availability of the class or method, and only if it is not present uses the normal python API.
Missing pxd file. Along with the magic script above, we need to set the path to our modules, i.e., export PYTHONPATH=$PYTHONPATH:xyz_directory/code/ so that the compiler can find our pxd files.
Achieving speedups as advertised
- Memory views. Memory views give efficient access to the memory buffers underlying numpy arrays, which is where the amazing speedups in lookups and writes come from. Definitely don’t tl;dr this.
- Enumerating over python arrays. A common pattern in python is for i, val in enumerate(values):, however there is no equivalent in C, so we should simply index the value instead:
for i in range(len(values)): val = values[i]
- Cheap wins with libc.math. Most numpy and python math functions that you would use have a C equivalent. Cheap win on speed, easy to do. Why not?
- Cheap wins but risky. If the code is known to work, putting cython directives in the file header to switch off a bunch of runtime checks can speed things up. These require you not to do negative indexing, among other things, so read up on them before using them. Here’s the list I got, courtesy of Tim Vieira.
- “Risky” because these can do unexpected things. For example, cdivision=True is documented as generating C code with no zero division checking. However, it ALSO does not do floating point division automatically, i.e., int/int=int.
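To make the cdivision difference concrete, here is a small pure-Python emulation of C-style integer division next to Python's own operators. It is only an illustration of the two sets of semantics, not Cython output; c_div and c_mod are hypothetical helpers, and the Python side assumes true division as under language_level=3.

```python
def c_div(a, b):
    """Integer division the way C does it: truncate toward zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def c_mod(a, b):
    """Remainder the way C does it: the result takes the sign of a."""
    return a - b * c_div(a, b)


# Python semantics: true division, floor division, remainder with the
# sign of the divisor.
print(7 / 2)     # 3.5
print(-7 // 2)   # -4
print(-7 % 2)    # 1

# C semantics for integers: no float result, truncation toward zero,
# remainder with the sign of the dividend.
print(c_div(7, 2), c_div(-7, 2), c_mod(-7, 2))   # 3 -3 -1
```

Note that the emulation still raises ZeroDivisionError on a zero divisor, whereas real C code with the check removed has undefined behaviour there.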
Mysterious Segmentation faults
This can occur when calling C code from Python, and in my case there was no indication which line caused the fault. Thanks to my lab mate Mitchell Gordon for pointing out that I should use a debugger to step through line by line until the segmentation fault occurs instead of stupidly printing line by line. (It was when a line in a try-except statement was doing some illegal indexing.)
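A lighter-weight complement to stepping through with a debugger is Python's standard faulthandler module, which prints the Python-level traceback when the process receives a fatal signal. This is a sketch of that alternative, not what was used in the story above, and it only points at the Python line that called into the compiled code.

```python
import faulthandler

# Dump the Python traceback to stderr if the process receives a fatal
# signal such as SIGSEGV, instead of dying silently.
faulthandler.enable()

print(faulthandler.is_enabled())

# ... import and call the compiled cython module below this point ...
```

The same effect is available without touching the code by running the script as python -X faulthandler script.py.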
The type of the exception value must match the return type of the function. When our return type is void, the only option is to declare except *. Always declare the exception, or a bug in the code will have the program complaining from start to end.
Handling numpy arrays and operations in cython class
- When to use np.float64_t and when np.float64 (the same goes for np.int32_t and np.int32). Thanks to the above naming convention, which makes it ambiguous which np we are using, errors like float64_t is not a constant, variable or function identifier may be encountered. Essentially, we use np.float64_t to declare the C object type, and use np.float64 to create the object.
- When not to use np.ndarray[np.float64_t, ndim=1]. Our intuitive np.ndarray declaration will fail when used as an attribute of a class. It throws the error: Buffer types only allowed as function local variables.
The reason for this is that attributes of our cdef class are members of a struct, and hence we can only expose simple C datatypes. If we are taking in arbitrary python objects, then the way to do this is with cdef object. For numpy arrays, though, we have something better: memory views.
- Numpy operations on memory views. Numpy vector and matrix operations would require us to convert the memory views back to np.ndarrays, as computations cannot be done in the memory views directly. The good news is that numpy arrays can be written directly into memory views after they have been manipulated.
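The idea of a view as a window onto someone else's buffer can be sketched with the standard library alone. This is an analogy using array and the built-in memoryview rather than numpy and cython typed memoryviews (double[:]), which are compiled and statically typed but rest on the same buffer protocol.

```python
from array import array

values = array("d", [1.0, 2.0, 3.0, 4.0])  # contiguous buffer of C doubles
view = memoryview(values)                   # no copy: a window onto the same memory

view[0] = 10.0                               # element write lands in the buffer
view[2:] = array("d", [30.0, 40.0])          # bulk write of a computed result

print(values.tolist())   # [10.0, 2.0, 30.0, 40.0]
print(view.format, view.itemsize, view.nbytes)
```

The view itself offers indexing and slicing but no arithmetic, which mirrors why the vector operations above have to happen on the array side before the result is written back.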
Numpy Array or memory view?!
This is one of the more confusing things about converting python code to cython. Sometimes python operations written in numpy are faster than the cythonic version. The yellow cython annotation html is not going to help here, because every numpy call is a python call and will glare at you bright yellow no matter how fast it actually is.
Short of timing the operations, which can turn into a real pain when your operations are chained (does it make sense to convert back and forth between array and memory view? Probably not), deciding requires knowing what numpy is doing under the hood. If the internal numpy operation makes use of C routines, vectorization or multithreading, it is going to be faster than your finicky cython for loops. One thing is for sure: lists are bad.
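When timing is the only way to settle it, a small timeit harness keeps the comparison honest. The sketch below uses a stand-in pair from the standard library (an interpreted loop versus the C-implemented sum) purely to show the shape; in practice the two candidates would be the numpy call and the compiled cython function.

```python
import timeit

data = list(range(100_000))


def loop_sum(values):
    """Interpreted loop: every iteration runs through the Python interpreter."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def builtin_sum(values):
    """Same reduction, but the loop runs inside the C implementation of sum()."""
    return sum(values)


for candidate in (loop_sum, builtin_sum):
    # repeat() returns one total per run; the minimum is the least noisy figure
    best = min(timeit.repeat(lambda: candidate(data), number=20, repeat=3))
    print(f"{candidate.__name__}: {best / 20 * 1e3:.3f} ms per call")
```

Checking that both candidates return the same result before comparing their timings avoids benchmarking two different computations.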
Now if we have determined that the numpy arrays are faster, we may seem doomed to conversion because of the struct issue described above, where we can only expose simple C datatypes. There is a way around it, which is to declare private attributes for the cython class. However, this means we can’t access our attribute easily, and we have to implement boilerplate getter and setter methods if we are calling it from outside the class.