Optimizing with Cython Introduction - Cython Tutorial




Cython Tutorial Series - 1 - Intro

Welcome to a Cython tutorial. The purpose of Cython is to act as an intermediary between Python and C/C++. At its heart, Cython is a superset of the Python language, which allows you to add typing information and class attributes that can then be translated to C code and to C-Extensions for Python.

If you've done much Python programming and shared it with your non-Python programmer friends and colleagues, chances are you've been asked why you're using Python, since, of course, it's such a "slow" language!

Isn't Python slow? What about the GIL? That dynamic typing though?

I find myself frequently defending Python by explaining that, while pure Python is indeed quite slow, Python in practice is not. Libraries like NumPy, Pandas, and Scikit-learn are all C-optimized. When you use them, you're actually making use of C/C++ power; you're just able to use Python syntax. In fact, NumPy, Pandas, and Scikit-learn all make use of Cython! Chances are, the C-optimized code in these popular libraries is going to be far faster than the C code you might write yourself, and that's if you manage to write it without any bugs.

Despite there already being many C-optimized libraries, sometimes there isn't a C wrapper for what you're trying to do, and many libraries, or parts of them, are meant to be open enough to not force static typing on you, so you're stuck with non-optimized code that is slower than C. So, here, we're going to talk about optimizing code with Cython.

At a quick glance, Cython initially appeared to me to be quite complex and imposing, and unlikely to be worth the effort to learn. That was until I sat in on a PyCon talk about it and realized it's actually unbelievably simple, or at least can be. The main crux: add typing information... and seriously, that's all you need to do to get massive gains.

What's typing information? In Python, when you declare a variable, like:

x = 5.0

You never had to tell the language that the variable 'x' was a float. In fact, later, you can assign 'Gary' to x and be just fine. This is because Python checks every single time for you to figure out the type. This is called "dynamic typing."
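To make that concrete, here is a small plain-Python sketch (nothing Cython-specific yet) that rebinds the same name to values of different types and asks Python what it currently holds. The variable names first_type and second_type are just for illustration:

```python
# The same name can point at values of different types over time.
x = 5.0
first_type = type(x).__name__    # 'float'

x = 'Gary'
second_type = type(x).__name__   # 'str'

print(first_type, second_type)
```

Every operation on x has to look up that type at run time, which is the work static typing lets us skip.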

This is nifty, and makes learning initially very simple, but it severely slows things down, because the type has to be worked out at run time for every operation.

Instead, if we're willing to, we can use static typing and Cython to get some serious speed ups. Many languages do something more like:

float x = 5.0

Cython wants something like:

cdef float x = 5.0

Of course you need to keep x as a float, but, as long as you adhere to static typing alone, you will be significantly rewarded. To do this, we need two things:

  1. Cython
  2. A C/C++ compiler

For #1, you simply do pip install cython. For #2, things can get a little more hairy depending on your operating system:

Linux: Congratulations, you're probably done, likely already having a compiler. If not, a sudo apt-get install build-essential is likely all you need to do.

Mac: You need a C compiler, most likely Clang (which is also what the gcc command runs on a Mac). You can get it by installing Apple's Xcode or its command line tools.

Windows: Use either MinGW, or get the exact same version of Visual C that compiled your version of Python. Here's the Cython guide for MinGW on Windows: http://cython.readthedocs.io/en/latest/src/tutorial/appendix.html You can also look into Python(x,y), Enthought Canopy, or WinPython, all of these I believe come with MinGW ready to go for you to make life easier (possibly, no promises!).
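If you're not sure what you already have, a quick check from Python can help. This is only a rough sketch: check_toolchain is a made-up helper name, the list of compiler command names is an assumption, and a compiler can be installed without being on your PATH (Visual C's cl usually is only visible from a developer prompt):

```python
import importlib.util
import shutil

def check_toolchain(compilers=("gcc", "clang", "cc", "cl")):
    """Report whether Cython is importable and which compilers are on PATH."""
    has_cython = importlib.util.find_spec("Cython") is not None
    # shutil.which returns the full path of a command, or None if not found.
    found = {name: shutil.which(name) for name in compilers}
    return has_cython, found

has_cython, found = check_toolchain()
print("Cython installed:", has_cython)
for name, path in found.items():
    print(name, "->", path or "not found")
```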

Once you have Cython and a compiler, let's go through the Cython workflow and make our own C extension! Let's start with a simple Python file:

#example_original.py
def test(x):
    y = 0
    for i in range(x):
        y += i
    return y

How do we prepare this file to be passed through Cython? Simple, rather than .py, we do .pyx

#example_cython.pyx
def test(x):
    y = 0
    for i in range(x):
        y += i
    return y

We obviously don't have any typing information yet. We'll add that in later, but, for now, we'll stick with this.

Once you have a .pyx, you're ready to build. To do this, we're going to make a setup.py file:

# distutils was removed from the standard library in Python 3.12; setuptools provides the same setup()
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules = cythonize('example_cython.pyx'))

Next, in your terminal, do:

python setup.py build_ext --inplace

This should create a build directory, a C file (.c), and a shared object file (.so on Linux and Mac; on Windows it is a .pyd instead). With this, we can import our C extension. To illustrate this, you can now delete, or otherwise move, your example_original.py and example_cython.pyx files so all that remains is the build directory and the .c and .so files. Now create a new file called testing.py, and we can import our new C extension:

#testing.py
import example_cython

print(example_cython.test(5))  # 0+1+2+3+4 = 10

Congratulations! You did it!
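If the import fails instead, it's worth checking the name of the compiled file. It is typically longer than a plain example_cython.so, since it carries a tag for your Python version and platform. As a small sketch, the standard library can tell you which file endings your interpreter accepts for extension modules:

```python
import importlib.machinery

# File name endings that this interpreter accepts for compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)
```

The file produced by the build step should end in one of these. If it doesn't, it was probably built with a different Python than the one you're importing from.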

So far we've not really done any Cython typing, so this code isn't much more optimized. It's still interesting to show, because it was very simple to do, and it illustrates that you can do as much, or as little, as you want with Cython.

Now, we'll begin adding typing information. Let's go over some of the typing declarations:

cdef declarations:

  • cdef int x,y,z
  • cdef char *s
  • cdef float x = 5.2 (single precision)
  • cdef double x = 40.5 (double precision)
  • cdef list languages
  • cdef dict abc_dict
  • cdef object thing
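These are real C types, so they behave like C, not like Python. The sketch below uses the standard library's ctypes purely as a stand-in to mimic C values from plain Python (Cython itself doesn't work through ctypes), and it assumes a 32-bit C int, which is the common case:

```python
import ctypes

# cdef float -> C float (single precision): 5.2 is rounded to the nearest float.
single = ctypes.c_float(5.2).value
# cdef double -> C double, the same precision as a Python float.
double = ctypes.c_double(5.2).value

# A C int has a fixed width (32 bits assumed here), unlike Python's int.
# ctypes does no overflow checking, so the value wraps around.
wrapped = ctypes.c_int32(2**31).value

print(single, double, wrapped)
```

So a cdef float loses some precision compared to a Python float, and a cdef int can silently overflow where a Python int would just keep growing. In real C, signed overflow is technically undefined, so treat the wrap-around here as an illustration only.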

def, cdef, and cpdef

  • def - regular Python function. Callable from Python; arguments and return value are Python objects.
  • cdef - Cython-only functions. You can't access these from Python-only code; they must be called from within Cython, since no Python wrapper is generated for them.
  • cpdef - C and Python. Will create a C function and a wrapper for Python. Why not *always* use cpdef? The wrapper has to convert arguments and return values to and from Python objects, so a function that takes or returns something C-only, like a pointer to a C array, has to be a cdef. We'll be mostly using cpdef, however.

Now, we're going to start with the same code from before, this time saved as example.py so the timing script further down can import it:

#example.py
def test(x):
    y = 0
    for i in range(x):
        y += i
    return y

Now let's save this file as example_cy.pyx, and begin to make some changes.

#example_cy.pyx
def test(int x):
    cdef int y = 0
    cdef int i
    for i in range(x):
        y += i
    return y

Above, we've added typing information to the input parameter and to the two variables that we use inside the function. Let's build and test this now.

#setup.py
# distutils was removed from the standard library in Python 3.12; setuptools provides the same setup()
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules = cythonize('example_cy.pyx'))

Now we can test the outcome with:

#testing_things.py
import timeit

cy = timeit.timeit('''example_cy.test(5)''',setup='import example_cy',number=100)
py = timeit.timeit('''example.test(5)''',setup='import example', number=100)

print(cy, py)
print('Cython is {}x faster'.format(py/cy))

Running this in terminal:

$ python3 testing_things.py
5.769999916083179e-06 4.714800024885335e-05
Cython is 8.171230664567945x faster

Not bad! There are a few other minor changes we could make, like:

cpdef int test(int x):
    cdef int y = 0
    cdef int i
    for i in range(x):
        y += i
    return y

But this isn't necessarily all that much better (run it a few times, or use a higher iteration count with timeit).

So where are our gains coming from? Well, all we're doing is adding typing information, so where would that increase our speeds the most? Anywhere we're using variables the most frequently. In our case, that'd be the for loop. So, if we wanted to impress our friends with our Cython powers, and show how much more speed we can get out of Cython, all we need to do is make x larger. So this time:

#testing_things.py
import timeit

cy = timeit.timeit('''example_cy.test(5000)''',setup='import example_cy',number=100)
py = timeit.timeit('''example.test(5000)''',setup='import example', number=100)

print(cy, py)
print('Cython is {}x faster'.format(py/cy))

Again in terminal:

$ python3 testing_things.py
0.0002787369999168732 0.04767731600031766
Cython is 171.04767581819533x faster

We're wizards!
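To see why the gap widens, it helps to break the two runs above down into time per loop iteration. This is just arithmetic on the numbers printed earlier; per_iteration_ns is a made-up helper, and dividing by x lumps the function call overhead in with the loop:

```python
number = 100  # calls per measurement, as in the timing scripts

# x -> (Cython total seconds, Python total seconds), copied from the output above
timings = {
    5:    (5.769999916083179e-06, 4.714800024885335e-05),
    5000: (0.0002787369999168732, 0.04767731600031766),
}

def per_iteration_ns(total, number, x):
    """Total seconds for `number` calls -> nanoseconds per loop iteration."""
    return total / number / x * 1e9

speedups = {}
for x, (cy, py) in timings.items():
    speedups[x] = py / cy
    print('x={}: {:.0f}x faster, {:.1f} ns vs {:.2f} ns per iteration'.format(
        x, speedups[x], per_iteration_ns(py, number, x), per_iteration_ns(cy, number, x)))
```

Pure Python costs roughly 95 ns per iteration in both runs, while the Cython figure drops from about 11.5 ns to about 0.56 ns, because the fixed cost of calling the function gets spread over more iterations.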

One option we have for finding areas where we could attempt to use Cython is Cython's annotated HTML output. For example, let's take our original Python script and convert it to .pyx:

#furthertesting.pyx
def test(x):
    y = 0
    for i in range(x):
        y += i
    return y

Now, in your terminal, you can do:

$ cython -a furthertesting.pyx

This should give you a furthertesting.html file:

Cython: furthertesting.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

Raw output: furthertesting.c

+1: def test(x):
/* Python wrapper */
static PyObject *__pyx_pw_14furthertesting_1test(PyObject *__pyx_self, PyObject *__pyx_v_x); /*proto*/
static PyMethodDef __pyx_mdef_14furthertesting_1test = {"test", (PyCFunction)__pyx_pw_14furthertesting_1test, METH_O, 0};
static PyObject *__pyx_pw_14furthertesting_1test(PyObject *__pyx_self, PyObject *__pyx_v_x) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("test (wrapper)", 0);
  __pyx_r = __pyx_pf_14furthertesting_test(__pyx_self, ((PyObject *)__pyx_v_x));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_14furthertesting_test(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_x) {
  PyObject *__pyx_v_y = NULL;
  PyObject *__pyx_v_i = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("test", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("furthertesting.test", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_y);
  __Pyx_XDECREF(__pyx_v_i);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple_ = PyTuple_Pack(3, __pyx_n_s_x, __pyx_n_s_y, __pyx_n_s_i); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_14furthertesting_1test, NULL, __pyx_n_s_furthertesting); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+2: 	y = 0
  __Pyx_INCREF(__pyx_int_0);
  __pyx_v_y = __pyx_int_0;
+3: 	for i in range(x):
  __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_INCREF(__pyx_v_x);
  __Pyx_GIVEREF(__pyx_v_x);
  PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_x);
  __pyx_t_2 = __Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_1, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (likely(PyList_CheckExact(__pyx_t_2)) || PyTuple_CheckExact(__pyx_t_2)) {
    __pyx_t_1 = __pyx_t_2; __Pyx_INCREF(__pyx_t_1); __pyx_t_3 = 0;
    __pyx_t_4 = NULL;
  } else {
    __pyx_t_3 = -1; __pyx_t_1 = PyObject_GetIter(__pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __pyx_t_4 = Py_TYPE(__pyx_t_1)->tp_iternext; if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 3, __pyx_L1_error)
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  for (;;) {
    if (likely(!__pyx_t_4)) {
      if (likely(PyList_CheckExact(__pyx_t_1))) {
        if (__pyx_t_3 >= PyList_GET_SIZE(__pyx_t_1)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_2 = PyList_GET_ITEM(__pyx_t_1, __pyx_t_3); __Pyx_INCREF(__pyx_t_2); __pyx_t_3++; if (unlikely(0 < 0)) __PYX_ERR(0, 3, __pyx_L1_error)
        #else
        __pyx_t_2 = PySequence_ITEM(__pyx_t_1, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 3, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_2);
        #endif
      } else {
        if (__pyx_t_3 >= PyTuple_GET_SIZE(__pyx_t_1)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_2 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_3); __Pyx_INCREF(__pyx_t_2); __pyx_t_3++; if (unlikely(0 < 0)) __PYX_ERR(0, 3, __pyx_L1_error)
        #else
        __pyx_t_2 = PySequence_ITEM(__pyx_t_1, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 3, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_2);
        #endif
      }
    } else {
      __pyx_t_2 = __pyx_t_4(__pyx_t_1);
      if (unlikely(!__pyx_t_2)) {
        PyObject* exc_type = PyErr_Occurred();
        if (exc_type) {
          if (likely(exc_type == PyExc_StopIteration || PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();
          else __PYX_ERR(0, 3, __pyx_L1_error)
        }
        break;
      }
      __Pyx_GOTREF(__pyx_t_2);
    }
    __Pyx_XDECREF_SET(__pyx_v_i, __pyx_t_2);
    __pyx_t_2 = 0;
/* … */
  }
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+4: 		y += i
    __pyx_t_2 = PyNumber_InPlaceAdd(__pyx_v_y, __pyx_v_i); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 4, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_DECREF_SET(__pyx_v_y, __pyx_t_2);
    __pyx_t_2 = 0;
+5: 	return y
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_y);
  __pyx_r = __pyx_v_y;
  goto __pyx_L0;

This will produce a C file along with a .html file. Open that .html file, and you can see lines highlighted in yellow according to how much they interact with Python. This isn't perfect, and you won't always be able to improve things based on this output, but it can help you locate areas that you could possibly improve. For example, we can generate this same HTML file for our actual Cython file:

$ cython -a example_cy.pyx

example_cy.html:

Cython: example_cy.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

Raw output: example_cy.c

+1: cpdef int test(int x):
static PyObject *__pyx_pw_10example_cy_1test(PyObject *__pyx_self, PyObject *__pyx_arg_x); /*proto*/
static int __pyx_f_10example_cy_test(int __pyx_v_x, CYTHON_UNUSED int __pyx_skip_dispatch) {
  int __pyx_v_y;
  int __pyx_v_i;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("test", 0);
/* … */
  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* Python wrapper */
static PyObject *__pyx_pw_10example_cy_1test(PyObject *__pyx_self, PyObject *__pyx_arg_x); /*proto*/
static PyObject *__pyx_pw_10example_cy_1test(PyObject *__pyx_self, PyObject *__pyx_arg_x) {
  int __pyx_v_x;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("test (wrapper)", 0);
  assert(__pyx_arg_x); {
    __pyx_v_x = __Pyx_PyInt_As_int(__pyx_arg_x); if (unlikely((__pyx_v_x == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 1, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L3_error:;
  __Pyx_AddTraceback("example_cy.test", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_10example_cy_test(__pyx_self, ((int)__pyx_v_x));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_10example_cy_test(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_x) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("test", 0);
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_f_10example_cy_test(__pyx_v_x, 0)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("example_cy.test", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
+2: 	cdef int y = 0
  __pyx_v_y = 0;
 3: 	cdef int i
+4: 	for i in range(x):
  __pyx_t_1 = __pyx_v_x;
  for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) {
    __pyx_v_i = __pyx_t_2;
+5: 		y += i
    __pyx_v_y = (__pyx_v_y + __pyx_v_i);
  }
+6: 	return y
  __pyx_r = __pyx_v_y;
  goto __pyx_L0;

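Now we can see that the loop itself compiles down to plain C, and the only remaining relation to Python is the wrapper generated for our cpdef, which is there because we wanted to be able to use this function from Python.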

Okay, that's all for now for Cython. I may bring in more advanced topics in the future, but, believe it or not, most of your gains will come purely from using static typing. You can also look into various commands like "with nogil." Cython can get quite a bit more complicated for you if you're up for it. If you're familiar with C/C++, I highly recommend you dive in more. Otherwise, consider places in your code where Python has to keep verifying the type of some variable. This can either be in loops, or in programs that scale out. For example, if you have a heavily trafficked website, or maybe you've got some sort of crawlbot, or maybe you're analyzing tick prices from stocks, any time you're scaling out the use of variables, you should consider adding typing information for some serious performance improvements.

As you've seen, doing this is super quick and painless. Hope it's helped!
