I thought that was likly too but I did a test with a un separable kernel like below.
The speed was similiar to a kernel of all 1's.

static float array[7][7] = {{1.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 4.0, 1.0, 5.0, -1.0, 6.0, 1.0},
{7.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{1.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0},
{8.0, 1.0, -1.0, 1.0, 9.0, 1.0, 1.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}};

void Imaq_Convolution (IPIImageRef in, IPIImageRef out)
{
int i, nPoints, fftX, fftY;
IPIConvoDesc matrix;

IPI_Cast (in, IPI_PIXEL_SGL);
IPI_Cast (out, IPI_PIXEL_SGL);

matrix.matrixWidth = 7;
matrix.matrixHeight = 7;
matrix.matrixElements = (float*) &array;
matrix.divider = 48.0;

IPI_Convolute (in, IPI_NOMASK, out, &matrix, IPI_BO_CLEAR);
}

----- Original Message ----
From: Alexander Osipov <0...@inbox.ru>
To: i...; g...@yahoo.co.uk
Sent: Sunday, 17 September, 2006 6:33:11 PM
Subject: Re: [imagedsp] Fast Convolution

Hello glennpierce2001,

Most probable that their code written as separable kernel
filter (like gaussian blur are).
In this case convolution realized not as 2D filter, but as two 1D
filters.
So it takes less than Width*Heigth* KernelSize* KernelSize operations in common case,
but takes 2*Width*Heigth* KernelSize, so you can optimize
Gaussian Blur approximately at KernelSize/2 times.
At first filtering stage you apply 1D gaussian kernel horizontally, and
at second filtering stage 1D gaussian kernel vertically.
Additionaly, Gaussian Blur (at sample) can be optimized further
using recursive filtering (separated too), but you should use floating point
calculations for calculating this convolution with small error,
so it often not more optimal.

Friday, September 15, 2006, 4:04:18 PM, you wrote:

gycu> Hi

gycu> I have previosly been using a image processing library called Imaq vision.
gycu> I am now wiriting my own functions.

gycu> In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.

gycu> For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.

gycu> My quick test case

gycu> of looping through the kernel values for each pixel in the image is much slower.

gycu> ie

gycu> int x, y, i, j, out

gycu> for(y=0; y < image_height; y++)
gycu> for(x=0; x < image_width; x++)
gycu> for(j=0; j < kernel_height; j++)
gycu> for(i=0; i < _widthkernel; i++)
gycu> out++;

gycu> printf("%d", out);

gycu> This code takes around 650 millsecs, and doesn't do anthing useful.

gycu> Does anyone know how the Imaq convolution code achieves it speed.
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?

gycu> Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.

gycu> Thanks for any help.

--
Best regards,
Alexander mailto:0xef15h@inbox. ru

Hello glennpierce2001,

Most probable that their code written as separable kernel
filter (like gaussian blur are).
In this case convolution realized not as 2D filter, but as two 1D
filters.
So it takes less than Width*Heigth*KernelSize*KernelSize operations in common case,
but takes 2*Width*Heigth*KernelSize, so you can optimize
Gaussian Blur approximately at KernelSize/2 times.
At first filtering stage you apply 1D gaussian kernel horizontally, and
at second filtering stage 1D gaussian kernel vertically.
Additionaly, Gaussian Blur (at sample) can be optimized further
using recursive filtering (separated too), but you should use floating point
calculations for calculating this convolution with small error,
so it often not more optimal.

Friday, September 15, 2006, 4:04:18 PM, you wrote:

gycu> Hi

gycu> I have previosly been using a image processing library called Imaq vision.
gycu> I am now wiriting my own functions.

gycu> In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.

gycu> For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.

gycu> My quick test case

gycu> of looping through the kernel values for each pixel in the image is much slower.
gycu> ie

gycu> int x, y, i, j, out

gycu> for(y=0; y < image_height; y++)
gycu> for(x=0; x < image_width; x++)
gycu> for(j=0; j < kernel_height; j++)
gycu> for(i=0; i < _widthkernel; i++)
gycu> out++;

gycu> printf("%d", out);
gycu> This code takes around 650 millsecs, and doesn't do anthing useful.

gycu> Does anyone know how the Imaq convolution code achieves it speed.
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?

gycu> Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.

gycu> Thanks for any help.

--
Best regards,
Alexander mailto:0...@inbox.ru

On Fri, 15 Sep 2006 g...@yahoo.co.uk wrote:

> Does anyone know how the Imaq convolution code achieves it speed.
> I find it hard to believe that they wrote special cases for each
> possible kernel size, to remove the inner loops ?
>
> Also I know their implementation is not fourier based as the image
> requires a border depending on the kernel size.

Maybe this reference helps (however, I don't know if this is the strategy
that they follow, but it is faster than the trivial convolution
implementation:

Recursive Gaussian Derivative Filters (1998)
Lucas J. van Vliet, Ian T. Young, Piet W. Verbeek

http://citeseer.ist.psu.edu/vanvliet98recursive.html

The computational complexity is 2N multiplications per pixel per dimension
independent of the size (s) of the Gaussian kernel.

www.ph.tn.tudelft.nl/~lucas/publications/

Kind regards,

Filip Rooms

"We all know Linux is great... it does infinite loops in 5 seconds."
- Linus Torvalds about the superiority of Linux on the Amsterdam
Linux Symposium

Hi

I have previosly been using a image processing library called Imaq vision.
I am now wiriting my own functions.

In Imaq there is a convolution function that takes any sized kernel and convolves it with an image.

For a image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.

My quick test case

of looping through the kernel values for each pixel in the image is much slower.
ie

int x, y, i, j, out

for(y=0; y < image_height; y++)
for(x=0; x < image_width; x++)
for(j=0; j < kernel_height; j++)
for(i=0; i < _widthkernel; i++)
out++;

printf("%d", out);
This code takes around 650 millsecs, and doesn't do anthing useful.

Does anyone know how the Imaq convolution code achieves it speed.
I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops ?

Also I know their implementation is not fourier based as the image requires a border depending on the kernel size.

Thanks for any help.