Hi
I have previosly been using a image processing library called Imaq vision.
I am now wiriting my own functions.
In Imaq there is a convolution function that takes any sized kernel and
convolves it with an image.
For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250
millisecs on my machine.
My quick test case
of looping through the kernel values for each pixel in the image is much
slower.
ie
int x, y, i, j, out
for(y=0; y < image_height; y++)
for(x=0; x < image_width; x++)
for(j=0; j < kernel_height; j++)
for(i=0; i < _widthkernel; i++)
out++;
printf("%d", out);
This code takes around 650 millsecs, and doesn't do anthing useful.
Does anyone know how the Imaq convolution code achieves it speed.
I find it hard to believe that they wrote special cases for each possible kernel
size, to remove the inner loops ?
Also I know their implementation is not fourier based as the image requires a
border depending on the kernel size.
Thanks for any help.
Fast Convolution
Started by ●September 17, 2006
Reply by ●September 18, 20062006-09-18
On Fri, 15 Sep 2006 g...@yahoo.co.uk wrote:
> Does anyone know how the Imaq convolution code achieves it speed.
> I find it hard to believe that they wrote special cases for each
> possible kernel size, to remove the inner loops ?
>
> Also I know their implementation is not fourier based as the image
> requires a border depending on the kernel size.
Maybe this reference helps (however, I don't know if this is the strategy
that they follow, but it is faster than the trivial convolution
implementation:
Recursive Gaussian Derivative Filters (1998)
Lucas J. van Vliet, Ian T. Young, Piet W. Verbeek
http://citeseer.ist.psu.edu/vanvliet98recursive.html
The computational complexity is 2N multiplications per pixel per dimension
independent of the size (s) of the Gaussian kernel.
www.ph.tn.tudelft.nl/~lucas/publications/
Kind regards,
Filip Rooms
"We all know Linux is great... it does infinite loops in 5 seconds."
- Linus Torvalds about the superiority of Linux on the Amsterdam
Linux Symposium
> Does anyone know how the Imaq convolution code achieves it speed.
> I find it hard to believe that they wrote special cases for each
> possible kernel size, to remove the inner loops ?
>
> Also I know their implementation is not fourier based as the image
> requires a border depending on the kernel size.
Maybe this reference helps (however, I don't know if this is the strategy
that they follow, but it is faster than the trivial convolution
implementation:
Recursive Gaussian Derivative Filters (1998)
Lucas J. van Vliet, Ian T. Young, Piet W. Verbeek
http://citeseer.ist.psu.edu/vanvliet98recursive.html
The computational complexity is 2N multiplications per pixel per dimension
independent of the size (s) of the Gaussian kernel.
www.ph.tn.tudelft.nl/~lucas/publications/
Kind regards,
Filip Rooms
"We all know Linux is great... it does infinite loops in 5 seconds."
- Linus Torvalds about the superiority of Linux on the Amsterdam
Linux Symposium
Reply by ●September 18, 20062006-09-18
Hello glennpierce2001,
Most probable that their code written as separable kernel
filter (like gaussian blur are).
In this case convolution realized not as 2D filter, but as two 1D
filters.
So it takes less than Width*Heigth*KernelSize*KernelSize operations in common case,
but takes 2*Width*Heigth*KernelSize, so you can optimize
Gaussian Blur approximately at KernelSize/2 times.
At first filtering stage you apply 1D gaussian kernel horizontally, and
at second filtering stage 1D gaussian kernel vertically.
Additionaly, Gaussian Blur (at sample) can be optimized further
using recursive filtering (separated too), but you should use floating point
calculations for calculating this convolution with small error,
so it often not more optimal.
Friday, September 15, 2006, 4:04:18 PM, you wrote:
gycu> Hi
gycu> I have previosly been using a image processing library called Imaq vision.
gycu> I am now wiriting my own functions.
gycu> In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.
gycu> For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.
gycu> My quick test case
gycu> of looping through the kernel values for each pixel in the image is much slower.
gycu> ie
gycu> int x, y, i, j, out
gycu> for(y=0; y < image_height; y++)
gycu> for(x=0; x < image_width; x++)
gycu> for(j=0; j < kernel_height; j++)
gycu> for(i=0; i < _widthkernel; i++)
gycu> out++;
gycu> printf("%d", out);
gycu> This code takes around 650 millsecs, and doesn't do anthing useful.
gycu> Does anyone know how the Imaq convolution code achieves it speed.
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?
gycu> Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.
gycu> Thanks for any help.
--
Best regards,
Alexander mailto:0...@inbox.ru
Most probable that their code written as separable kernel
filter (like gaussian blur are).
In this case convolution realized not as 2D filter, but as two 1D
filters.
So it takes less than Width*Heigth*KernelSize*KernelSize operations in common case,
but takes 2*Width*Heigth*KernelSize, so you can optimize
Gaussian Blur approximately at KernelSize/2 times.
At first filtering stage you apply 1D gaussian kernel horizontally, and
at second filtering stage 1D gaussian kernel vertically.
Additionaly, Gaussian Blur (at sample) can be optimized further
using recursive filtering (separated too), but you should use floating point
calculations for calculating this convolution with small error,
so it often not more optimal.
Friday, September 15, 2006, 4:04:18 PM, you wrote:
gycu> Hi
gycu> I have previosly been using a image processing library called Imaq vision.
gycu> I am now wiriting my own functions.
gycu> In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.
gycu> For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.
gycu> My quick test case
gycu> of looping through the kernel values for each pixel in the image is much slower.
gycu> ie
gycu> int x, y, i, j, out
gycu> for(y=0; y < image_height; y++)
gycu> for(x=0; x < image_width; x++)
gycu> for(j=0; j < kernel_height; j++)
gycu> for(i=0; i < _widthkernel; i++)
gycu> out++;
gycu> printf("%d", out);
gycu> This code takes around 650 millsecs, and doesn't do anthing useful.
gycu> Does anyone know how the Imaq convolution code achieves it speed.
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?
gycu> Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.
gycu> Thanks for any help.
--
Best regards,
Alexander mailto:0...@inbox.ru
Reply by ●September 21, 20062006-09-21
I thought that was likly too but I did a test with a un separable kernel like
below.
The speed was similiar to a kernel of all 1's.
static float array[7][7] = {{1.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 4.0, 1.0, 5.0, -1.0, 6.0, 1.0},
{7.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{8.0, 1.0, -1.0, 1.0, 9.0, 1.0, 1.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}};
void Imaq_Convolution (IPIImageRef in, IPIImageRef out)
{
int i, nPoints, fftX, fftY;
IPIConvoDesc matrix;
IPI_Cast (in, IPI_PIXEL_SGL);
IPI_Cast (out, IPI_PIXEL_SGL);
matrix.matrixWidth = 7;
matrix.matrixHeight = 7;
matrix.matrixElements = (float*) &array;
matrix.divider = 48.0;
IPI_Convolute (in, IPI_NOMASK, out, &matrix, IPI_BO_CLEAR);
}
----- Original Message ----
From: Alexander Osipov <0...@inbox.ru>
To: i...; g...@yahoo.co.uk
Sent: Sunday, 17 September, 2006 6:33:11 PM
Subject: Re: [imagedsp] Fast Convolution
Hello glennpierce2001,
Most probable that their code written as separable kernel
filter (like gaussian blur are).
In this case convolution realized not as 2D filter, but as two 1D
filters.
So it takes less than Width*Heigth* KernelSize* KernelSize operations in common case,
but takes 2*Width*Heigth* KernelSize, so you can optimize
Gaussian Blur approximately at KernelSize/2 times.
At first filtering stage you apply 1D gaussian kernel horizontally, and
at second filtering stage 1D gaussian kernel vertically.
Additionaly, Gaussian Blur (at sample) can be optimized further
using recursive filtering (separated too), but you should use floating point
calculations for calculating this convolution with small error,
so it often not more optimal.
Friday, September 15, 2006, 4:04:18 PM, you wrote:
gycu> Hi
gycu> I have previosly been using a image processing library called Imaq vision.
gycu> I am now wiriting my own functions.
gycu> In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.
gycu> For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.
gycu> My quick test case
gycu> of looping through the kernel values for each pixel in the image is much slower.
gycu> ie
gycu> int x, y, i, j, out
gycu> for(y=0; y < image_height; y++)
gycu> for(x=0; x < image_width; x++)
gycu> for(j=0; j < kernel_height; j++)
gycu> for(i=0; i < _widthkernel; i++)
gycu> out++;
gycu> printf("%d", out);
gycu> This code takes around 650 millsecs, and doesn't do anthing useful.
gycu> Does anyone know how the Imaq convolution code achieves it speed.
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?
gycu> Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.
gycu> Thanks for any help.
--
Best regards,
Alexander mailto:0xef15h@inbox. ru
The speed was similiar to a kernel of all 1's.
static float array[7][7] = {{1.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 4.0, 1.0, 5.0, -1.0, 6.0, 1.0},
{7.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{8.0, 1.0, -1.0, 1.0, 9.0, 1.0, 1.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}};
void Imaq_Convolution (IPIImageRef in, IPIImageRef out)
{
int i, nPoints, fftX, fftY;
IPIConvoDesc matrix;
IPI_Cast (in, IPI_PIXEL_SGL);
IPI_Cast (out, IPI_PIXEL_SGL);
matrix.matrixWidth = 7;
matrix.matrixHeight = 7;
matrix.matrixElements = (float*) &array;
matrix.divider = 48.0;
IPI_Convolute (in, IPI_NOMASK, out, &matrix, IPI_BO_CLEAR);
}
----- Original Message ----
From: Alexander Osipov <0...@inbox.ru>
To: i...; g...@yahoo.co.uk
Sent: Sunday, 17 September, 2006 6:33:11 PM
Subject: Re: [imagedsp] Fast Convolution
Hello glennpierce2001,
Most probable that their code written as separable kernel
filter (like gaussian blur are).
In this case convolution realized not as 2D filter, but as two 1D
filters.
So it takes less than Width*Heigth* KernelSize* KernelSize operations in common case,
but takes 2*Width*Heigth* KernelSize, so you can optimize
Gaussian Blur approximately at KernelSize/2 times.
At first filtering stage you apply 1D gaussian kernel horizontally, and
at second filtering stage 1D gaussian kernel vertically.
Additionaly, Gaussian Blur (at sample) can be optimized further
using recursive filtering (separated too), but you should use floating point
calculations for calculating this convolution with small error,
so it often not more optimal.
Friday, September 15, 2006, 4:04:18 PM, you wrote:
gycu> Hi
gycu> I have previosly been using a image processing library called Imaq vision.
gycu> I am now wiriting my own functions.
gycu> In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.
gycu> For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.
gycu> My quick test case
gycu> of looping through the kernel values for each pixel in the image is much slower.
gycu> ie
gycu> int x, y, i, j, out
gycu> for(y=0; y < image_height; y++)
gycu> for(x=0; x < image_width; x++)
gycu> for(j=0; j < kernel_height; j++)
gycu> for(i=0; i < _widthkernel; i++)
gycu> out++;
gycu> printf("%d", out);
gycu> This code takes around 650 millsecs, and doesn't do anthing useful.
gycu> Does anyone know how the Imaq convolution code achieves it speed.
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?
gycu> Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.
gycu> Thanks for any help.
--
Best regards,
Alexander mailto:0xef15h@inbox. ru