Python number crunching faster? Part I

Christopher Felton●September 17, 2011●4 comments

Everyone has their favorite computing platform, regardless if it is Matlab, Octave, Scilab, Mathematica, Mathcad, etc. I have been using Python and the common numerical and scientific packages available. Personally, I have found this to be very useful in my work. Lately there has been some chatter on speeding up Python.

From another project I follow, MyHDL, I was introduced to the Python JIT compiler, PyPy. And part of the PyPy effort is to create a native version of numpy. The reason for doing this effort is two fold. First, to provide numpy support in pypy (frequently requested feature, i.e a missing module from CPython world). Second, to experiment with a faster version of numpy (numeric array processing).

I do not know the details of the implementation approach but this is an interesting effort. My interest comes from the MyHDL perspective. PyPy has significantly sped up the MyHDL simulations. But majority of my designs include numpy/scipy in the testbench and/or configuration. Without numpy/scipy support I am unable to take advantage of the significant speed increase.

This article is available in PDF format for easy printing

Travis Oliphant, the original developer of numpy, recently posted a basic benchmark, including pypy results. He included a generic Python version (no numpy, no foreign code) of a Laplace equation solver as a benchmark and compared PyPy and CPython (and others), in one of his tests. PyPy had a 42x speedup over raw Python and an 1.18x increase over the numpy implementation. In his comparisons other versions were faster but these required writing the algorithms in other, lower, languages, e.g. Fortran or C.

CPython Raw	CPython/Numpy	PyPy Raw	PyPy/Numpy
36.3	1	0.85	TBC
33.36	1	2.45	TBC

The above table shows relative speed up (slow down) to the numpy version. The first row is Travis's results the second row is my results, same code different machine.

The PyPy numpy support isn't complete enough to implement the Laplace benchmark using the native PyPy numpy. It is presumed that the native.numpy.pypy (NNP) will be even faster than the raw (non-vectorized) Python version. It would be nice to complete Travis's table with the NNP but as mentioned this will have to wait for some future endeavor.

Very basic array support is available in the NNP, so I ran a simple exercise to see if a NNP array processing is faster than a CPython.numpy, raw PyPy, and raw CPython. The following is the simple test code and the results.

Raw code (lists and for loops)

def test():
    N = 2**16
    n = range(N)
    x = [0 for ii in range(N)]
    y = [0 for ii in range(N)]
    t = [0 for ii in range(N)]
    for ii in range(N):
         x[ii] = sin(ii*pi/4.)
         y[ii] = sin(ii*pi/1333.)
    for ii in range(N):
        t[ii] = x[ii]*y[ii]
    tsum = 0
    for ii in range(N):
        tsum = t[ii] + tsum
    print tsum
    return t

Numpy code using ndarrays

def test():
    N = 2**16
    n = np.array(range(N))
    x = np.sin(n*pi/4)
    y = np.sin(n*pi/1333)
    t = x*y
    t = t.sum()
    print t
    return t
Normalized results for the above tests.

CPython Raw	CPython/Numpy	PyPy Raw	PyPy/Numpy
1	0.23	0.37	0.20

If anyone is familiar with the PyPy/Numpy development and has additional information on the progress or approach feel free to post in the comments.

Update 21-Oct-2011

There has been a good discussion at reddit demonstrating the raw Python 2 code could have been more efficient. Just as you would avoid loops with numpy code (often referred to as vectorizing) in the generic Python code this should be avoid as well. But the tools to complete this are different than the tools used with arrays and the functions that work on arrays. In generic Python code this usually means using map, zip, and other iterable methods to avoid explicit loops.

As the kind commentators on reddit showed; my basic Python is lacking. The reddit user, dalke, provided a complete set of examples on how to speed up the basic Python code (the code without numpy). The following is one of dalke's examples and an updated comparison table.

import math
from itertools import izip
def test_opt():
    N = 2**16
    pi = math.pi
    sin = math.sin
    t = [sin(ii * pi/4.)*sin(ii * pi/1333.) for ii in xrange(N)]
    tsum = sum(t)
    return t

Normalized results, where "Raw" was the original test code with the loops and "Raw-Opt" is the above test_opt() function.

CPython Raw	CPython Raw-Opt	PyPy Raw	PyPy Raw-Opt
1	0.52	0.36	0.15

Or in other words the optimized function was almost twice (1.92) as fast as the non-optimized (original loop code). And the pypy execution of the optimized function was over 3 times as fast as the CPython optimized function.

This was a Python coding mistake by me. Most DSP'ers are familiar with "vectorizing" code in Matlab or the numpy package. You avoid loops because the interpreter has to constantly evaluate the loop. When you vectorize code you avoid this. The basic Python code has methods to avoid this as shown by the code example provided by dalke. When the code is written in a more optimized approach it is definitely faster. Still not as fast as pypy or the numpy version.

But wait. It depends on what I mean in the last sentence in the previous paragraph. The optimized function on pypy is faster than the pypy-numpy (from the previous results). For this small/simple example pypy code with optimal Python coding is the faster result.

The examples would have never been written with loops using numpy arrays (or any other array language, Matlab, Octave, etc), which they were not but the basic Python code was. My excuse, I don't do much processing in base Python language. I didn't take enough time to think about the base Python code and its implementation. This is where online collaboration is gold. It isn't shy to point out mistakes. Good feedback was received via the reddit comments. @Rhomboid In the future I will do my best not to post "awful" code :)

From this post it sounds like the next version of pypy, 1.7 release, will have enough support to redo the original Laplace benchmark with the pypy numpy (the above TBC). Also note, the pypy-numpy has become a hot topic for good reason. Here are a couple links, blog1, blog2, blog3, and pypy-dev.

Comments

Comments
Write a Comment

Select to add a comment

[ - ]

Comment by nomebug●September 16, 2011

Not a PyPy dev, but I've been following their progress eagerly and am very interested in the optimizations that they are performing to the process of interpreting python code. If you'd like to speak to some of the developers implementing the Native version of NumPy for Pypy, come on over to #pypy on irc.freenode.net. They consider slowdowns in code (relative to CPython) to be bugs, so if you happen to run in to any, reporting it as a bug on http://bugs.pypy.org would be appreciated.

[ - ]

Comment by cfelton●September 17, 2011

@nomebug Thanks for the comments and information. I follow (rarely) the pypy dev from the gmane list, http://dir.gmane.org/gmane.comp.python.pypy. Do you know the difference between the gmane list and the irc.freenode.net? I think pypy is a great enabler for the Python language. I have been encouraged by the results for the numerical computing. Although the execution is much faster than Python it lags (I assume it always will) compared to the compiled languages, see the fortran speeds compared to the python and pypy in Travis's post. With pypy/numpy there is much power. It allows developers to define algorithms at a high-level and a more native feel, loops versus vectorizing an algorithm. Or it can be vectorized as well. There are many areas for comparison and investigation on the performance of the pypy.numpy combination.

[ - ]

Comment by nomebug●September 19, 2011

@cfelton, The gmane link you've posted seems to be a mirror of the original mailing list (pypy-dev hosted on python.org). Here is the link to the parent mailing list: http://mail.python.org/pipermail/pypy-dev/ As to the difference between IRC and the mailing list, IRC stands for "Internet Relay Chat", and is usually a much faster protocol to communicate in. The devs like to hang out on a channel called #pypy where they discuss most of the features that they implement in PyPy. If you are unfamiliar with IRC, there's a web client for freenode here: http://webchat.freenode.net/ To use it, you'd just need to enter a nickname (nospaces) and a channel (#pypy). If you have any questions for the NumPy devs, you may want to ask Alex_Gaynor or justinpeel, as they're some of the main developers on that branch. It's a pretty friendly channel so feel free to ask questions :) Later, nomebug

[ - ]

Comment by cfelton●September 19, 2011

@nomebug, Thanks for the additional information. Yes, there are an infinite number of ways to connect on the internet! Hopefully, we will see some great things come out of the pypy-numpy development.

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers:

Choose a Username

E-Mail (Work, School or ieee)

First Name

Last Name

Employer

Job Title

Country

State

Password

Confirm Password

By checking this box, I agree with the terms of use and privacy policy By checking this box, I consent to receive occasional emails from the *Related sites and their partners. I understand that these emails will only contain relevant information and that I can unsubscribe at any time.

Python number crunching faster? Part I

Update 21-Oct-2011

Sign in

About Christopher Felton

Popular Posts by Christopher Felton

Blogs - Hall of Fame

Free PDF Downloads

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group