Accelerating MATLAB DSP Code on the GPU
Intrigued by GPUs, I've spent a few days testing out Jacket, an interface that lets you accelerate MATLAB (my favorite, if frustrating, language) on NVIDIA GPUs. It's definitely got some caveats. But it was really easy to accelerate my code. And the results were impressive. So I thought I'd put up a few simple DSP-related benchmarks I created and ran on my laptop (a MacBook Air with an NVIDIA GeForce 9400M graphics card). The m-files for the two functions I benchmarked (2D FFT and 2D interpolation) can be downloaded here.
If you're interested in lower-level GPU DSP programming, I suggest you check out Shehrzad Qureshi's excellent blog on the subject.
NOTE: The benchmarks I'm putting up (and all benchmarks, really) should be taken with large grains of salt. I threw my code together pretty quickly, and results will vary greatly depending on your system setup. My intent here is just to convey my impressions of the tool after spending a few hours with it.
I was pleasantly surprised by how quickly I was able to accelerate code. Basically, you just cast data you want processed on the GPU to one of Jacket's GPU data types. Then make normal MATLAB function calls. For example, the code below performs a 2D FFT on an image stored in a matrix 'I1'.
I1 = gsingle(I1);
I1_fft = fft2(I1);
I cast here to 'gsingle' (as opposed to 'gdouble') because my GPU only supports single precision.
If you want to bring data back to the CPU (for instance, to use a function not supported by Jacket), you just cast the data back to a standard MATLAB datatype.
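As a minimal sketch of that round trip: the snippet below assumes Jacket is installed and that casting with the standard single() is enough to return the result to the CPU, as described above. The variable names and the random test image are made up for illustration.

```matlab
% Hypothetical test image; any real single-precision matrix would do.
I1 = rand(256, 256, 'single');

% CPU -> GPU: cast to Jacket's single-precision GPU type.
I1_gpu = gsingle(I1);

% Runs on the GPU because the input is a GPU type.
I1_fft_gpu = fft2(I1_gpu);

% GPU -> CPU: cast back to a standard MATLAB type (assumed behavior).
I1_fft = single(I1_fft_gpu);
```

From here, I1_fft is an ordinary MATLAB array and can be passed to any function, supported by Jacket or not.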
Getting optimal performance when combining functions requires a bit more work (vectorizing your code, minimizing data transfers between the CPU and GPU, and a couple of other general principles). But it doesn't seem too difficult from reading the documentation.
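To make the vectorizing point concrete, here is a small sketch in plain MATLAB (no Jacket required) that applies the same gain and offset to an image with a nested loop and with a single array expression. The names gain and offset and the test image are illustrative only. With a GPU library, the idea would be to cast the input once, run the vectorized expression, and cast back once, rather than touching elements one at a time.

```matlab
I = rand(256, 256, 'single');   % hypothetical test image
gain = single(2);
offset = single(0.5);

% Element-by-element version: one tiny operation per iteration.
out_loop = zeros(size(I), 'single');
for r = 1:size(I, 1)
    for c = 1:size(I, 2)
        out_loop(r, c) = gain * I(r, c) + offset;
    end
end

% Vectorized version: one expression over the whole array.
out_vec = gain * I + offset;

% Largest difference between the two results (should be negligible).
max_diff = max(abs(out_loop(:) - out_vec(:)));
```

Both versions compute the same thing; the vectorized form just hands the whole array to one operation, which is the shape of work a GPU handles well.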
Figure 1 shows the speedup for the 2D FFT function (fft2.m). The x-axis specifies the size of the image passed to fft2 (128x128, 256x256, ..., 2048x2048).
Figure 1. GPU vs CPU Speedup
Interestingly, the speedup starts dropping off at 2048x2048 pixels, and at 4096x4096 pixels, I got an out-of-memory error. Not sure what's happening there. Probably a limitation of my GPU...
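For a rough sense of scale, the arithmetic below estimates the storage needed for a 4096x4096 fft2 call. It assumes the input is a dense real single-precision array and the output a dense complex single-precision array (two singles per element). It ignores any workspace the FFT library allocates and anything else resident on the GPU, so treat it as a lower bound rather than a diagnosis.

```matlab
N = 4096;
bytes_per_single = 4;

% Real single-precision input: N*N values.
input_MiB = N * N * bytes_per_single / 2^20;

% Complex single-precision output: N*N values, real and imaginary parts.
output_MiB = N * N * 2 * bytes_per_single / 2^20;

total_MiB = input_MiB + output_MiB;
fprintf('input: %d MiB, output: %d MiB, total: %d MiB\n', ...
        input_MiB, output_MiB, total_MiB);
```

That comes to 64 MiB for the input and 128 MiB for the output, 192 MiB in total before any temporary buffers, which is worth comparing against the memory your GPU actually has available.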
A more interesting function for me was interp2(), which performs 2D interpolation. As this is a very computationally demanding function, and one used in many image processing algorithms, acceleration could be useful. Figure 2 shows the speedup.
Figure 2. GPU vs CPU Speedup
Again, I got an out-of-memory error on 4096x4096. But the speedup is impressive. I should also note that while the 128x128 result is actually a slight slowdown (0.85X), this is probably due to not “warming up” the GPU properly (see this wiki on benchmarking Jacket for more on this subject). Running this benchmark (and fft2) back to back produces a much better speedup for 128x128.
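A timing loop with warm-up might look something like the sketch below. It is not the benchmark code used for the figures, just a generic pattern in plain MATLAB: run the function a couple of times untimed, then time several runs and take the median. The run counts and the 512x512 test matrix are arbitrary choices. On a GPU you would also need to make sure the computation has actually finished before stopping the timer; the Jacket benchmarking wiki mentioned above is the place to check how.

```matlab
A = rand(512, 512, 'single');   % hypothetical test input
f = @() fft2(A);                % function under test

n_warmup = 2;                   % untimed runs to absorb one-time startup costs
n_runs = 10;                    % timed runs

for k = 1:n_warmup
    f();
end

t = zeros(1, n_runs);
for k = 1:n_runs
    tic;
    f();
    t(k) = toc;
end

t_med = median(t);              % median is less sensitive to outliers than mean
fprintf('median time: %.6f s over %d runs\n', t_med, n_runs);
```

Timing the CPU and GPU versions this way and dividing the CPU median by the GPU median gives a speedup figure that is less skewed by first-call overhead.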
As for the tool's usefulness for DSP in general, it's probably not suitable for end implementations of most DSP applications, due to the real-time and cost-constrained nature of most DSP apps (a cellphone running MATLAB?). However, it will be useful in accelerating algorithm development for most DSP apps, which are usually designed in MATLAB. And it will be very useful for applications that process large datasets offline. Analysis of 3D seismic data for oil exploration, medical imaging, and analysis of radar/sonar/satellite data are three examples that come to mind.
I should also note that if free is more your budget, there's also GPUmat, a freeware MATLAB accelerator also based on NVIDIA's CUDA platform. I haven't tested it out yet. But judging from the documentation, it's similar, with fewer supported functions, less documentation, fewer application examples, etc. It does FFTs, basic math and matrix operations, and some general MATLAB functions, but is sparse on more complex functions (for instance, no interpolation functions such as interp2.m).