
Speed up of 2D Lorentzian filter?

Started by Christian Gollwitzer February 4, 2017
On Mon, 06 Feb 2017 14:32:46 -0600, Tim Wescott wrote:

> On Mon, 06 Feb 2017 08:47:47 +0100, Christian Gollwitzer wrote:
>
>> On 06.02.17 at 04:35, Tim Wescott wrote:
>>> On Sat, 04 Feb 2017 13:21:21 +0100, Christian Gollwitzer wrote:
>>>> https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase-userguide.html#tth_sEc1.3
>>>
>>> I don't think you can get there from here unless you're willing to
>>> trade off circular symmetry for speed.
>>>
>>> You could implement a filter with transfer function
>>> 1/((1 + Au^2)(1 + Av^2)) by filtering each row individually, then each
>>> column -- but I don't think that's really what you're looking for.
>>
>> Thanks! I can see now that the circular symmetry is impossible to
>> achieve with simple 1D filtering. The Gaussian is a very special case,
>> because it can be factored into two 1D Gaussians (and there the IIR
>> filter beats the FFT method easily). I can't see a way to factor the
>> Lorentzian into two 1D functions; it's probably not possible.
>>
>> The thing is that the rest of the tomographic reconstruction runs on
>> the GPU - using OpenCL I could get enormous speed-ups and outperform a
>> machine with 36 CPU cores by a factor of 2. This phase contrast filter,
>> however, will be tough. FFTs on the GPU are not much faster than on the
>> CPU according to some references, due to the irregular memory access
>> pattern. Therefore I was hoping to get an improvement on that end. I'll
>> have to do the FFT on the GPU to see how long it really takes.
>
> Something just popped out of my way-back machine and into my head:
>
> I have a DSP book that was handed to me in the 1980's, and it was old
> then. In the chapter on the DFT, they discuss a bunch of different FFT
> algorithms, including one that has the order of operations arranged so
> that you can store the input, the intermediate results, and the output
> all on one (or maybe two) tapes.
>
> Yes, tapes. I'm not so old as to have fond memories of them, but I'm
> just old enough to have seen the glow in my slightly-older peers.
>
> It occurs to me that -- assuming you want to pursue things that hard,
> and if memory access along a line is indeed the issue -- you might be
> able to use the same concept on a GPU.
>
> "Digital Signal Processing", Alan V. Oppenheim and Ronald W. Schafer,
> Prentice-Hall, 1975, ISBN 0-13-214635-5:
>
> Figure 6.14 on page 301, titled "Rearrangement of fig. 6.10 having the
> same geometry for each stage, thereby permitting sequential data
> accessing and storage."
>
> If you can't find any references, shoot me an email and I'll see if I
> can copy the relevant pages for you.
>
> Hope this helps...
BTW, I think the last time I read that page was when I was an undergrad, in
the mid-1980's. I found it by flipping through the chapter and glancing at
the figures until I saw the one I remembered. I can't remember where I leave
my car keys -- but some bit of mathemagical trivia that I've never even used?
SURE!

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
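The exchange above contrasts the circularly symmetric Lorentzian-type filter,
which has to be applied as a genuine 2D operation (e.g. via a 2D FFT), with
the separable approximation 1/((1 + Au^2)(1 + Av^2)) that can be built from
1D row and column passes. Below is a minimal NumPy sketch of that
distinction; the constant A, the image size, and the use of FFT-domain
multiplication for both variants are illustrative assumptions, not the
AnkaPhase implementation. In practice the separable variant would more
likely be a pair of cheap 1D recursive filters; the FFT form here just makes
the two frequency responses easy to compare.

# Minimal sketch (not from the thread): radially symmetric vs. separable
# low-pass applied in the frequency domain. A and the image size are
# arbitrary placeholders.
import numpy as np

def lorentzian_filter_2d(img, A):
    """Apply 1 / (1 + A*(u^2 + v^2)): circularly symmetric in frequency."""
    ny, nx = img.shape
    u = np.fft.fftfreq(nx)                    # horizontal frequencies
    v = np.fft.fftfreq(ny)                    # vertical frequencies
    U, V = np.meshgrid(u, v)
    H = 1.0 / (1.0 + A * (U**2 + V**2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * H))

def separable_filter_2d(img, A):
    """Apply 1 / ((1 + A*u^2)(1 + A*v^2)): rows first, then columns."""
    ny, nx = img.shape
    Hu = 1.0 / (1.0 + A * np.fft.fftfreq(nx)**2)
    Hv = 1.0 / (1.0 + A * np.fft.fftfreq(ny)**2)
    tmp = np.real(np.fft.ifft(np.fft.fft(img, axis=1) * Hu[None, :], axis=1))
    return np.real(np.fft.ifft(np.fft.fft(tmp, axis=0) * Hv[:, None], axis=0))

rng = np.random.default_rng(0)
img = rng.random((256, 256))
a = lorentzian_filter_2d(img, 100.0)
b = separable_filter_2d(img, 100.0)
print("max difference between the two results:", np.abs(a - b).max())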
On 06.02.2017 23:32, Tim Wescott wrote:
> On Mon, 06 Feb 2017 08:47:47 +0100, Christian Gollwitzer wrote:
>
>> [...]
>> The thing is that the rest of the tomographic reconstruction runs on
>> the GPU - using OpenCL I could get enormous speed-ups and outperform a
>> machine with 36 CPU cores by a factor of 2. This phase contrast filter,
>> however, will be tough. FFTs on the GPU are not much faster than on the
>> CPU according to some references, due to the irregular memory access
>> pattern. Therefore I was hoping to get an improvement on that end. I'll
>> have to do the FFT on the GPU to see how long it really takes.
>
> [... Tim's suggestion of the sequential-access FFT from Oppenheim &
> Schafer, Fig. 6.14, snipped ...]
IDK, but the OP would likely just use one of the standard FFT packages.

The fun part of GPU computing is the ease of getting started. If you have a
not-very-old GPU and Matlab, you can play with it using literally a couple
lines of code. There's a single function to put an array of data into a GPU
device, and a single function to retrieve it. Matlab overloads a lot of
common functions to perform calculations on the device. You don't have to
use Matlab, of course; it's just the easiest way to get started:

http://blogs.mathworks.com/loren/2012/02/06/using-gpus-in-matlab/

Gene
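Gene's point about how little code this takes applies outside Matlab too.
The sketch below uses CuPy, which is not mentioned in the thread and is only
an assumed stand-in: one call moves the data onto the GPU, the overloaded
FFT runs there (via cuFFT), and one call brings the result back, much like
Matlab's gpuArray/gather pair.

# Rough Python/CuPy analogue (an assumption, not from the thread) of the
# Matlab workflow: move data to the GPU, compute there, move it back.
import numpy as np
import cupy as cp            # requires an NVIDIA GPU with CUDA

img = np.random.rand(2048, 2048).astype(np.float32)

g = cp.asarray(img)          # host -> device (like gpuArray)
G = cp.fft.fft2(g)           # 2D FFT executed on the GPU
spectrum = cp.asnumpy(G)     # device -> host (like gather)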
On Tue, 07 Feb 2017 01:03:30 +0300, Evgeny Filatov wrote:

> On 06.02.2017 23:32, Tim Wescott wrote:
>> [... quote of the earlier exchange snipped ...]
>
> IDK, but the OP would likely just use one of the standard FFT packages.
>
> The fun part of GPU computing is the ease of getting started. If you
> have a not-very-old GPU and Matlab, you can play with it using literally
> a couple lines of code. There's a single function to put an array of
> data into a GPU device, and a single function to retrieve it. Matlab
> overloads a lot of common functions to perform calculations on the
> device. You don't have to use Matlab, of course; it's just the easiest
> way to get started:
>
> http://blogs.mathworks.com/loren/2012/02/06/using-gpus-in-matlab/
I would kind of expect that SOMEONE would have an FFT algorithm that's
optimized for the GPU. Assuming that's even possible, just compiling code
for a "regular" FFT might not get you the best results.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
On 07.02.17 at 00:22, Tim Wescott wrote:
> On Tue, 07 Feb 2017 01:03:30 +0300, Evgeny Filatov wrote:
>
>> On 06.02.2017 23:32, Tim Wescott wrote:
>>> I have a DSP book that was handed to me in the 1980's, and it was old
>>> then. In the chapter on the DFT, they discuss a bunch of different FFT
>>> algorithms, including one that has the order of operations arranged so
>>> that you can store the input, the intermediate results, and the output
>>> all on one (or maybe two) tapes.
That sounds interesting; however, as Evgeny guessed, I am not very keen to
implement my own FFT algorithm.
>> The fun part of GPU computing is the ease of getting started. If you
>> have a not-very-old GPU and Matlab, you can play with it using literally
>> a couple lines of code. There's a single function to put an array of
>> data into a GPU device, and a single function to retrieve it. Matlab
>> overloads a lot of common functions to perform calculations on the
>> device. You don't have to use Matlab, of course; it's just the easiest
>> way to get started:
I am using OpenCL to program the GPU. I haven't tried the Matlab toolbox so
far, but I guess that, for the kind of algorithms I'm after, straight OpenCL
is not just the fastest option but also relatively easy to implement.
> I would kind of expect that SOMEONE would have an FFT algorithm that's
> optimized for the GPU.
Yes, of course. nVidia has cuFFT in their CUDA toolkit, and for OpenCL there
are two serious implementations, Apple FFT and clFFT (a project from AMD).
clFFT can do 2D FFTs, and they also use a bag of tricks that a mere mortal
like me would not come up with. For instance, after they transform the rows,
they perform a transpose operation with caching in the local shared memory,
to reduce the stride to 1, and then transpose back. This is much faster than
performing an FFT with a stride of 8192 to transform the columns.

Preliminary testing shows approximately the same speed on the Intel Iris GPU
as on the CPU in my MacBook; hopefully I can gain something when I manage to
compile this on my Windows machine at work, which is equipped with a big
nVidia GPU.

Thanks for your ideas,

Christian
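Christian's description of clFFT's column pass can be illustrated on the CPU
as well: FFT the rows, transpose so that the former columns become
unit-stride rows, FFT those, and transpose back. The NumPy sketch below is
only an illustration of that idea, not clFFT code, and the 512x512 size is
arbitrary; on a GPU the point of the transposes is to keep every FFT pass
reading memory with stride 1.

# Illustration (not clFFT code): a 2D FFT done as row FFTs plus transposes,
# so that every FFT pass works on contiguous data.
import numpy as np

def fft2_via_transpose(img):
    step1 = np.fft.fft(img, axis=1)          # FFT along contiguous rows
    step2 = np.ascontiguousarray(step1.T)    # transpose: columns become rows
    step3 = np.fft.fft(step2, axis=1)        # FFT the former columns, stride 1
    return np.ascontiguousarray(step3.T)     # transpose back

img = np.random.rand(512, 512)
assert np.allclose(fft2_via_transpose(img), np.fft.fft2(img))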
On Wed, 8 Feb 2017 14:59:27 +0100, Christian Gollwitzer
<auriocus@gmx.de> wrote:

> On 07.02.17 at 00:22, Tim Wescott wrote:
>> [... quote of Tim's sequential-access FFT suggestion snipped ...]
>
> That sounds interesting; however, as Evgeny guessed, I am not very keen
> to implement my own FFT algorithm.
Most of the crazy variants of the DFT/FFT have already been implemented in
some available library somewhere, like FFTW or similar libraries. I suspect
searching around on terms related to "FIFO FFT" may turn up something. The
FIFO-oriented FFT algorithms have been around for many decades, so I'd be
surprised if you had to code your own.
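As one concrete example of leaning on an existing library instead of coding
your own (SciPy's multithreaded FFT backend here, chosen as an assumption;
the post itself names FFTW, which has its own bindings), a 2D transform is a
one-liner:

# Sketch: let an existing library do the work, using all available CPU cores.
import numpy as np
from scipy import fft

img = np.random.rand(4096, 4096).astype(np.float32)
spectrum = fft.fft2(img, workers=-1)   # multithreaded 2D FFT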