# Python number crunching faster? Part I

Everyone has their favorite computing platform, regardless if it is Matlab, Octave, Scilab, Mathematica, Mathcad, etc. I have been using Python and the common numerical and scientific packages available. Personally, I have found this to be very useful in my work. Lately there has been some chatter on speeding up Python.

From another project I follow, MyHDL, I was introduced to the Python JIT compiler, PyPy. And part of the PyPy effort is to create a native version of numpy. The reason for doing this effort is two fold. First, to provide numpy support in pypy (frequently requested feature, i.e a missing module from CPython world). Second, to experiment with a faster version of numpy (numeric array processing).

I do not know the details of the implementation approach but this is an interesting effort. My interest comes from the MyHDL perspective. PyPy has significantly sped up the MyHDL simulations. But majority of my designs include numpy/scipy in the testbench and/or configuration. Without numpy/scipy support I am unable to take advantage of the significant speed increase.

Travis Oliphant, the original developer of numpy, recently posted a basic benchmark, including pypy results. He included a generic Python version (no numpy, no foreign code) of a Laplace equation solver as a benchmark and compared PyPy and CPython (and others), in one of his tests. PyPy had a 42x speedup over raw Python and an 1.18x increase over the numpy implementation. In his comparisons other versions were faster but these required writing the algorithms in other, lower, languages, e.g. Fortran or C.

CPython Raw | CPython/Numpy | PyPy Raw | PyPy/Numpy |

36.3 | 1 | 0.85 | TBC |

33.36 | 1 | 2.45 | TBC |

The above table shows relative speed up (slow down) to the numpy version. The first row is Travis's results the second row is my results, same code different machine.

The PyPy numpy support isn't complete enough to implement the Laplace benchmark using the native PyPy numpy. It is presumed that the native.numpy.pypy (NNP) will be even faster than the raw (non-vectorized) Python version. It would be nice to complete Travis's table with the NNP but as mentioned this will have to wait for some future endeavor.

Very basic array support is available in the NNP, so I ran a simple exercise to see if a NNP array processing is faster than a CPython.numpy, raw PyPy, and raw CPython. The following is the simple test code and the results.

Raw code (lists and for loops)

def test(): N = 2**16 n = range(N) x = [0 for ii in range(N)] y = [0 for ii in range(N)] t = [0 for ii in range(N)] for ii in range(N): x[ii] = sin(ii*pi/4.) y[ii] = sin(ii*pi/1333.) for ii in range(N): t[ii] = x[ii]*y[ii] tsum = 0 for ii in range(N): tsum = t[ii] + tsum print tsum return t

Numpy code using ndarrays

def test(): N = 2**16 n = np.array(range(N)) x = np.sin(n*pi/4) y = np.sin(n*pi/1333) t = x*y t = t.sum() print t return t Normalized results for the above tests.

CPython Raw | CPython/Numpy | PyPy Raw | PyPy/Numpy |

1 | 0.23 | 0.37 | 0.20 |

If anyone is familiar with the PyPy/Numpy development and has additional information on the progress or approach feel free to post in the comments.

## Update 21-Oct-2011

There has been a good discussion at reddit demonstrating the raw Python 2 code could have been more efficient. Just as you would avoid loops with numpy code (often referred to as vectorizing) in the generic Python code this should be avoid as well. But the tools to complete this are different than the tools used with arrays and the functions that work on arrays. In generic Python code this usually means using *map*, *zip*, and other iterable methods to avoid explicit loops.

As the kind commentators on reddit showed; my basic Python is lacking. The reddit user, dalke, provided a complete set of examples on how to speed up the basic Python code (the code without numpy). The following is one of dalke's examples and an updated comparison table.

import math from itertools import izip def test_opt(): N = 2**16 pi = math.pi sin = math.sin t = [sin(ii * pi/4.)*sin(ii * pi/1333.) for ii in xrange(N)] tsum = sum(t) return t

Normalized results, where "Raw" was the original test code with the loops and "Raw-Opt" is the above test_opt() function.

CPython Raw | CPython Raw-Opt | PyPy Raw | PyPy Raw-Opt |

1 | 0.52 | 0.36 | 0.15 |

Or in other words the optimized function was almost twice (1.92) as fast as the non-optimized (original loop code). And the pypy execution of the optimized function was over 3 times as fast as the CPython optimized function.

This was a Python coding mistake by me. Most DSP'ers are familiar with "vectorizing" code in Matlab or the numpy package. You avoid loops because the interpreter has to constantly evaluate the loop. When you vectorize code you avoid this. The basic Python code has methods to avoid this as shown by the code example provided by dalke. When the code is written in a more optimized approach it is definitely faster. Still not as fast as pypy or the numpy version.

But wait. It depends on what I mean in the last sentence in the previous paragraph. The optimized function on pypy is faster than the pypy-numpy (from the previous results). For this small/simple example pypy code with optimal Python coding is the faster result.

The examples would have never been written with loops using numpy arrays (or any other array language, Matlab, Octave, etc), which they were not but the basic Python code was. My excuse, I don't do much processing in base Python language. I didn't take enough time to think about the base Python code and its implementation. This is where online collaboration is gold. It isn't shy to point out mistakes. Good feedback was received via the reddit comments. @Rhomboid In the future I will do my best not to post "awful" code :)

From this post it sounds like the next version of pypy, 1.7 release, will have enough support to redo the original Laplace benchmark with the pypy numpy (the above TBC). Also note, the pypy-numpy has become a hot topic for good reason. Here are a couple links, blog1, blog2, blog3, and pypy-dev.

**Previous post by Christopher Felton:**

Impulse Response Approximation

**Next post by Christopher Felton:**

[Book Review] Numpy 1.5 Beginner's Guide

## Comments:

- Comments
- Write a Comment Select to add a comment

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Registering will allow you to participate to the forums on ALL the related sites and give you access to all pdf downloads.