Neil’s Notes and Thoughts

Using Cython for Great Speedups

May 03, 2012

About Cython

Python is a great dynamic language; it provides layers of abstractions (dynamic typing and memory management to name two) over the underlying hardware and makes it easier and faster to write productive code compared to C, Java and other strongly typed languages. Unfortunately, the layers of abstractions generally means it is slower. Many times this does not matter, but sometimes you want a piece of Python code to be fast and would be willing to make some restrictions to the code, trading Python’s dynamism, for faster runtimes. This is essentially what Python C-Extensions provide and using Cython is a very easy way to build C-extensions from existing Python code.

Cython allows you to take your existing working Python code, add type declarations, thereby restricting what can be done with the code, and creates a C-extension for you. This C-extension can be imported like any other Python module. Depending on your code, you can get a 4-5x performance improvement immediately with just several annotations. Your code will also use less memory and will likely be faster the equivalent node.js code.

Using Cython

Here is a simple example of how to use Cython. Here I use Cython to annotate a function to find the nth prime number. Note, there are more efficient algorithms to determine if a number is prime, this algorithm is used mostly for simplicity and pedagogical reasons:

Here is the program’s entry point. I added a flag, --with-cython, to test the Cython optimized-code side-by-side with the pure-Python original:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# nthprime.py
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('n')
    parser.add_argument('--with-cython', default=False, action='store_true')
    args = parser.parse_args()

    if args.with_cython:
        import libprimes_fast as libprimes
    else:
        import libprimes
    print libprimes.nth_prime(int(args.n))

if __name__ == '__main__':
    main()

Here is the "unoptimized" pure Python code to find the nth prime number:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# libprimes.py
import math

def nth_prime(n):

    def _is_prime(num):
        for i in xrange(3, int(math.sqrt(num))+1, 2):
            if num % i == 0:
                return False
        return True

    assert n > 0
    if n == 1:
        return 2
    if n == 2:
        return 3
    prime_count = 2
    num = 3
    while prime_count != n:
        num += 2
        if _is_prime(num):
            prime_count += 1
    return num

Here is the "optimized" Cython code with the type annotations (NOTE: the .pyx file extension which means this is no longer pure Python and must be pre-processed using the Cython code generator):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#libprimes_fast.pyx
import math

def nth_prime(unsigned long long n):
    cdef unsigned long long prime_count
    cdef unsigned long long num

    def _is_prime(unsigned long long num):
        cdef unsigned long long i
        for i in xrange(3, int(math.sqrt(num))+1, 2):
            if num % i == 0:
                return False
        return True

    assert n > 0
    if n == 1:
        return 2
    if n == 2:
        return 3
    prime_count = 2
    num = 3
    while prime_count != n:
        num += 2
        if _is_prime(num):
            prime_count += 1
    return num

A setup.py is needed facilitate building the Cython C-extension:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext


setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("libprimes_fast", ["libprimes_fast.pyx"])]
)

To build the Cython C-extension module use:

$ python setup.py build_ext --inplace
running build_ext
cythoning libprimes_fast.pyx to libprimes_fast.c
building 'libprimes_fast' extension
llvm-gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c libprimes_fast.c -o build/temp.macosx-10.7-intel-2.7/libprimes_fast.o
llvm-gcc-4.2 -Wl,-F. -bundle -undefined dynamic_lookup -Wl,-F. -arch i386 -arch x86_64 build/temp.macosx-10.7-intel-2.7/libprimes_fast.o -o ./libprimes_fast.so

The above tells Cython to create a Python C-extension module for the libprimes_fast module containing the function nth_prime. The generated source of the C-extension is in the intermediate source file libprimes_fast.c. When compiled using the system C compiler, LLVM here, this will produce the C-extension module libprimes_fast.so. The cdef annotation restricts the type of a variable to native C types (all unsigned long long in this example.) In the body of nth_prime, the value of the annotated variables (ex. prime_count) are marshalled to and remain the native C type which is simpler and faster to manage on the underlying hardware. When the value is returned to the caller it will get marshalled back to a Python Object.

Here is a test run on my MacBook with a 2.2GHz i7 Processor using Python 2.7. First running the unoptimized Python code:

$ time python nthprime.py 100000
1299709

real    0m5.607s
user    0m5.597s
sys     0m0.010s

Then with the Cython created C-extension:

$ time python nthprime.py --with-cython 100000
1299709

real    0m0.715s
user    0m0.704s
sys     0m0.010s

That is a quick and nearly 90% reduction in the runtime with only 5 lines of annotations. There are some limitations to Cython, so read the Cython FAQ and Cython Docs if you are going to do anything involved. Finally, if building C-extensions is not your thing, give PyPy a try. It is an implementation of Python in Python with JIT Compilation and may give you similar if not better speed ups without annotating your code.

Tags: python, cython, performance, speed, optimization