DSPRelated.com
Forums

Speed up of 2D Lorentzian filter?

Started by Christian Gollwitzer February 4, 2017
Hi all,

I'm looking for a way to speed up a special image reconstruction 
algorithm. [*]

Actually, the filter is very simple linear filter. In essence, the image 
is Fourier transformed, multiplied by

	1/(1+A*(u^2+v^2))

(u, v are the conjugate coordinates of x,y) and transformed back. 
Realistic images can be as large as 4096x4096, so with padding an 8kx8k 
FFT is performed on several thousand images, which takes hours even on 
powerful machines.

I am relatively ignorant of classic filter design, but looking at this 
transfer functions, I have the feeling that it should be possible to do 
using some kind of recursive filter. For instance, the 1D-Fourier 
transform of a Lorientzian is an exponential exp(-|x|) - therefore, in 
the 1D case, I think this would be a simple exponential smoothing filter 
- isn't it? Is there any way to design an IIR filter that operates 
similarly in 2D, and is rotationally symmetric?

Best regards,

	Christian



[*] The algorithm is used to correct X-ray images taken at synchrotrons 
where the X-rays are deflected from the straight line due to refraction 
and the details can be found here:

https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase-userguide.html#tth_sEc1.3
On 04.02.2017 15:21, Christian Gollwitzer wrote:
> Hi all, > > I'm looking for a way to speed up a special image reconstruction > algorithm. [*] > > Actually, the filter is very simple linear filter. In essence, the image > is Fourier transformed, multiplied by > > 1/(1+A*(u^2+v^2)) > > (u, v are the conjugate coordinates of x,y) and transformed back. > Realistic images can be as large as 4096x4096, so with padding an 8kx8k > FFT is performed on several thousand images, which takes hours even on > powerful machines. > > I am relatively ignorant of classic filter design, but looking at this > transfer functions, I have the feeling that it should be possible to do > using some kind of recursive filter. For instance, the 1D-Fourier > transform of a Lorientzian is an exponential exp(-|x|) - therefore, in > the 1D case, I think this would be a simple exponential smoothing filter > - isn't it? Is there any way to design an IIR filter that operates > similarly in 2D, and is rotationally symmetric? > > Best regards, > > Christian > > > > [*] The algorithm is used to correct X-ray images taken at synchrotrons > where the X-rays are deflected from the straight line due to refraction > and the details can be found here: > > https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase-userguide.html#tth_sEc1.3 >
I don't know anything about 2D filtering. Would just comment anyway, because it's fun. With the 1D transfer function exp(-|x|), it won't be a single exponential smoothing filter. More like, a sum of two exponential smoothing filters running in opposite directions. That sort of highlights the problem you'd have with the 2D case. Suppose you know the 2D transfer function as f(r), with the parameter r = sqrt(x^2 + y^2). [I don't know the analytical answer for f(r), but even if there's no analytical form you could still find a suitable approximation or set f(r) up as a table.] It looks now that if you want to calculate the answer in spatial domain (as a 2D filter), it would be the sum of low-pass smoothing filters propagating in all possible directions. What if you don't do it for all directions, but just for a few of them, like 8? I'd expect that you'd get some approximation to the desired answer, where what would supposed to be a circle appears as an octagon. However, even if that works for you, I highly doubt that such an approach would outperform the 2D FFT, because the latter is highly optimized. In terms of complexity, it's hard to beat O(N*log(N)). My approach would be still O(N). However, you'd need the matrix of an incredible size to worry about log(N). ;-) Gene
On 05.02.2017 4:43, Evgeny Filatov wrote:
> On 04.02.2017 15:21, Christian Gollwitzer wrote: >> Hi all, >> >> I'm looking for a way to speed up a special image reconstruction >> algorithm. [*] >> >> Actually, the filter is very simple linear filter. In essence, the image >> is Fourier transformed, multiplied by >> >> 1/(1+A*(u^2+v^2)) >> >> (u, v are the conjugate coordinates of x,y) and transformed back. >> Realistic images can be as large as 4096x4096, so with padding an 8kx8k >> FFT is performed on several thousand images, which takes hours even on >> powerful machines. >> >> I am relatively ignorant of classic filter design, but looking at this >> transfer functions, I have the feeling that it should be possible to do >> using some kind of recursive filter. For instance, the 1D-Fourier >> transform of a Lorientzian is an exponential exp(-|x|) - therefore, in >> the 1D case, I think this would be a simple exponential smoothing filter >> - isn't it? Is there any way to design an IIR filter that operates >> similarly in 2D, and is rotationally symmetric? >> >> Best regards, >> >> Christian >> >> >> >> [*] The algorithm is used to correct X-ray images taken at synchrotrons >> where the X-rays are deflected from the straight line due to refraction >> and the details can be found here: >> >> https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase-userguide.html#tth_sEc1.3 >> >> > > I don't know anything about 2D filtering. Would just comment anyway, > because it's fun. > > With the 1D transfer function exp(-|x|), it won't be a single > exponential smoothing filter. More like, a sum of two exponential > smoothing filters running in opposite directions. > > That sort of highlights the problem you'd have with the 2D case. Suppose > you know the 2D transfer function as f(r), with the parameter r = > sqrt(x^2 + y^2). [I don't know the analytical answer for f(r), but even > if there's no analytical form you could still find a suitable > approximation or set f(r) up as a table.] It looks now that if you want > to calculate the answer in spatial domain (as a 2D filter), it would be > the sum of low-pass smoothing filters propagating in all possible > directions. > > What if you don't do it for all directions, but just for a few of them, > like 8? I'd expect that you'd get some approximation to the desired > answer, where what would supposed to be a circle appears as an octagon. > > However, even if that works for you, I highly doubt that such an > approach would outperform the 2D FFT, because the latter is highly > optimized. In terms of complexity, it's hard to beat O(N*log(N)). My > approach would be still O(N). However, you'd need the matrix of an > incredible size to worry about log(N). ;-) > > Gene >
I do have a question, though. You said you are using zero padding. But nothing in the algorithm description says it's necessary. Actually you do not lose any information by not using zero padding. Whatever you get by zero padding could be obtained by interpolation after the FFT. So you could sort of, perform the ffts without zero padding, and then, if you need to, run a simple interpolation filter. Anyway, I just wanted to draw your attention to the fact that zero padding shouldn't be considered an absolute must, but rather a feature the use of which needs to be justified. Gene
On Sunday, February 5, 2017 at 3:15:52 AM UTC-8, Evgeny Filatov wrote:
> On 05.02.2017 4:43, Evgeny Filatov wrote: > ... > > Anyway, I just wanted to draw your attention to the fact that zero > padding shouldn't be considered an absolute must, but rather a feature > the use of which needs to be justified. > > Gene
Zero padding is required to achieve linear convolution instead of circular convolution which would alias the opposite edges together. This fact is recognized in the algorithm description in the OP's reference. Dale B. Dalrymple
On 05.02.2017 20:44, dbd wrote:
> On Sunday, February 5, 2017 at 3:15:52 AM UTC-8, Evgeny Filatov wrote: >> On 05.02.2017 4:43, Evgeny Filatov wrote: >> ... >> >> Anyway, I just wanted to draw your attention to the fact that zero >> padding shouldn't be considered an absolute must, but rather a feature >> the use of which needs to be justified. >> >> Gene > > Zero padding is required to achieve linear convolution instead of circular convolution which would alias the opposite edges together. This fact is recognized in the algorithm description in the OP's reference. > > Dale B. Dalrymple >
Oops! Indeed, you are correct, zero padding is addressed in OP's link and there's a good reason to use it. Still, there doesn't seem to be any particular reason to make the margins that large. Perhaps it was a matter of convenience to make all dimensions a power of two, but it also looks like lame design. There are FFT routines with different radixes, so it looks like "a degree of freedom" which is ignored. Gene
On Sat, 04 Feb 2017 13:21:21 +0100, Christian Gollwitzer wrote:

> Hi all, > > I'm looking for a way to speed up a special image reconstruction > algorithm. [*] > > Actually, the filter is very simple linear filter. In essence, the image > is Fourier transformed, multiplied by > > 1/(1+A*(u^2+v^2)) > > (u, v are the conjugate coordinates of x,y) and transformed back. > Realistic images can be as large as 4096x4096, so with padding an 8kx8k > FFT is performed on several thousand images, which takes hours even on > powerful machines. > > I am relatively ignorant of classic filter design, but looking at this > transfer functions, I have the feeling that it should be possible to do > using some kind of recursive filter. For instance, the 1D-Fourier > transform of a Lorientzian is an exponential exp(-|x|) - therefore, in > the 1D case, I think this would be a simple exponential smoothing filter > - isn't it? Is there any way to design an IIR filter that operates > similarly in 2D, and is rotationally symmetric? > > Best regards, > > Christian > > > > [*] The algorithm is used to correct X-ray images taken at synchrotrons > where the X-rays are deflected from the straight line due to refraction > and the details can be found here: > > https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase-
userguide.html#tth_sEc1.3 I don't think you can get there from here unless you're willing to trade off circular symmetry for speed. You could implement a filter with transfer function 1/((1 + Au^2)(1 + Av^2)) by filtering each row individually, then each column -- but I don't think that's really what you're looking for. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!
Am 06.02.17 um 04:35 schrieb Tim Wescott:
> On Sat, 04 Feb 2017 13:21:21 +0100, Christian Gollwitzer wrote: >> https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase- > userguide.html#tth_sEc1.3 > > I don't think you can get there from here unless you're willing to trade > off circular symmetry for speed. > > You could implement a filter with transfer function > 1/((1 + Au^2)(1 + Av^2)) by filtering each row individually, then each > column -- but I don't think that's really what you're looking for.
Thanks! I can see it now, that the circular symmetry is impossible to achieve with simple 1D filtering. The Gaussian is a very special case, because it can be factored into two 1D-Gaussians (and there the IIR filter beats the FFT method easily). I can't see a way how to factor the Lorentzian into two 1D functions, probably it's not possible. The thing is that the rest of the tomographic reconstruction runs on the GPU - using OpenCL I could get enormous speed ups and outperform a machine with 36 CPU cores by a factor of 2. This phase contrast filter, however, will be tough. FFTs on the GPU are not much faster than on the CPU according to some references, due to the irregular memory access pattern. Therefore I was hoping to get an improvement on that end. I'll have to do the FFT on the GPU to see how long it really takes. Best regards, Christian
On 06.02.2017 10:47, Christian Gollwitzer wrote:
> The thing is that the rest of the tomographic reconstruction runs on the > GPU - using OpenCL I could get enormous speed ups and outperform a > machine with 36 CPU cores by a factor of 2. This phase contrast filter, > however, will be tough. FFTs on the GPU are not much faster than on the > CPU according to some references, due to the irregular memory access > pattern. Therefore I was hoping to get an improvement on that end. I'll > have to do the FFT on the GPU to see how long it really takes. > > Best regards, > > Christian
With the current generation of GPUs the one real bottleneck is getting the data in and out of the GPU memory. If the entire processing is done on the GPU, you should also get some tangible gains with FFTs: https://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench Regardless of the hardware, it should be possible to speed up your algorithm if you curtail the FFT size by reducing the margins. The exact size might depend on the filter parameters, but quadrupling the FFT size (as you suggested in the OP) might be unnecessary. Gene
On Mon, 06 Feb 2017 08:47:47 +0100, Christian Gollwitzer wrote:

> Am 06.02.17 um 04:35 schrieb Tim Wescott: >> On Sat, 04 Feb 2017 13:21:21 +0100, Christian Gollwitzer wrote: >>> https://imagej.nih.gov/ij/plugins/ankaphase/ankaphase- >> userguide.html#tth_sEc1.3 >> >> I don't think you can get there from here unless you're willing to >> trade off circular symmetry for speed. >> >> You could implement a filter with transfer function 1/((1 + Au^2)(1 + >> Av^2)) by filtering each row individually, then each column -- but I >> don't think that's really what you're looking for. > > Thanks! I can see it now, that the circular symmetry is impossible to > achieve with simple 1D filtering. The Gaussian is a very special case, > because it can be factored into two 1D-Gaussians (and there the IIR > filter beats the FFT method easily). I can't see a way how to factor the > Lorentzian into two 1D functions, probably it's not possible. > > The thing is that the rest of the tomographic reconstruction runs on the > GPU - using OpenCL I could get enormous speed ups and outperform a > machine with 36 CPU cores by a factor of 2. This phase contrast filter, > however, will be tough. FFTs on the GPU are not much faster than on the > CPU according to some references, due to the irregular memory access > pattern. Therefore I was hoping to get an improvement on that end. I'll > have to do the FFT on the GPU to see how long it really takes.
Something just popped out of my way-back machine and into my head: I have a DSP book that was handed to me in the 1980's, and it was old then. In the chapter on the DFT, they discuss a bunch of different FFT algorithms, including one that has the order of operations arranged so that you can store the input, the intermediate results, and the output all on one (or maybe two) tapes. Yes, tapes. I'm not so old to have fond memories of them, but I'm just old enough to have seen the glow in my slightly-older peers. It occurs to me that -- assuming you want to pursue things that hard, and if memory accesses in a line is indeed the issue -- you might be able to use the same concept in a GPU. "Digital Signal Processing", Alan V. Oppenheim and Ronald W. Shafer, Prentice-Hall, 1975, ISBN 0-13-214635-5: Figure 6.14 on page 301, titled "Rearrangement of fig 6.10 having the same geometry for each stage, thereby permitting sequential data accessing and storage." If you can't find any references shoot me an email and I'll see if I can copy the relevant pages for you. Hope this helps... -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!
On Mon, 06 Feb 2017 14:02:09 +0300, Evgeny Filatov wrote:

> On 06.02.2017 10:47, Christian Gollwitzer wrote: >> The thing is that the rest of the tomographic reconstruction runs on >> the GPU - using OpenCL I could get enormous speed ups and outperform a >> machine with 36 CPU cores by a factor of 2. This phase contrast filter, >> however, will be tough. FFTs on the GPU are not much faster than on the >> CPU according to some references, due to the irregular memory access >> pattern. Therefore I was hoping to get an improvement on that end. I'll >> have to do the FFT on the GPU to see how long it really takes. >> >> Best regards, >> >> Christian > > With the current generation of GPUs the one real bottleneck is getting > the data in and out of the GPU memory. If the entire processing is done > on the GPU, you should also get some tangible gains with FFTs: > > https://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench > > Regardless of the hardware, it should be possible to speed up your > algorithm if you curtail the FFT size by reducing the margins. The exact > size might depend on the filter parameters, but quadrupling the FFT size > (as you suggested in the OP) might be unnecessary. > > Gene
Maybe do an overlap-and-add? It should be just as possible with 2D FFT as one, I think. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com I'm looking for work -- see my website!