I thought that was likely too, but I ran a test with a non-separable kernel
like the one below.
The speed was similar to that of a kernel of all 1's.
/* 7x7 test kernel, deliberately not separable. */
static float array[7][7] = {{1.0, 6.0,  1.0, 1.0,  1.0, 1.0, 1.0},
                            {1.0, 4.0,  1.0, 5.0, -1.0, 6.0, 1.0},
                            {7.0, 1.0,  3.0, 1.0,  1.0, 1.0, 1.0},
                            {1.0, 1.0,  1.0, 1.0,  1.0, 1.0, 1.0},
                            {1.0, 7.0,  1.0, 1.0,  1.0, 1.0, 1.0},
                            {8.0, 1.0, -1.0, 1.0,  9.0, 1.0, 1.0},
                            {1.0, 1.0,  1.0, 1.0,  1.0, 1.0, 1.0}};

void Imaq_Convolution (IPIImageRef in, IPIImageRef out)
{
    IPIConvoDesc matrix;

    /* Work on single-precision float images. */
    IPI_Cast (in, IPI_PIXEL_SGL);
    IPI_Cast (out, IPI_PIXEL_SGL);

    /* Describe the kernel: size, row-major elements, divider. */
    matrix.matrixWidth = 7;
    matrix.matrixHeight = 7;
    matrix.matrixElements = &array[0][0];
    matrix.divider = 48.0;

    IPI_Convolute (in, IPI_NOMASK, out, &matrix, IPI_BO_CLEAR);
}
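
For what it's worth, here is a rough sketch of how one could check that a kernel really is non-separable. A kernel is separable exactly when it has rank 1, that is, when it is the outer product of a column vector and a row vector. The function name kernel_is_separable and the eps tolerance are my own assumptions for illustration; nothing here comes from Imaq.

```c
#include <stddef.h>

/* Absolute value for floats, kept local to avoid needing libm. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/*
 * Hypothetical helper: returns 1 if the rows*cols kernel k (row-major)
 * has rank 1 (is separable), 0 otherwise.
 * A nonzero matrix has rank 1 iff, for a nonzero pivot k[pr][pc],
 *   k[i][j] * k[pr][pc] == k[i][pc] * k[pr][j]   for every i, j.
 */
static int kernel_is_separable(const float *k, int rows, int cols, float eps)
{
    int pr = -1, pc = -1;

    /* Find any nonzero element to use as the pivot. */
    for (int n = 0; n < rows * cols && pr < 0; n++) {
        if (abs_f(k[n]) > eps) {
            pr = n / cols;
            pc = n % cols;
        }
    }
    if (pr < 0)
        return 1; /* all-zero kernel: trivially separable */

    float p = k[pr * cols + pc];
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            float lhs = k[i * cols + j] * p;
            float rhs = k[i * cols + pc] * k[pr * cols + j];
            if (abs_f(lhs - rhs) > eps)
                return 0; /* a 2x2 minor is nonzero, so rank > 1 */
        }
    }
    return 1;
}
```

Called as kernel_is_separable(&array[0][0], 7, 7, 1e-6f) on the kernel above, it should return 0: with array[0][0] as the pivot, array[1][1]*array[0][0] is 4 while array[1][0]*array[0][1] is 6. A kernel of all 1's returns 1.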
----- Original Message ----
From: Alexander Osipov <0...@inbox.ru>
To: i...; g...@yahoo.co.uk
Sent: Sunday, 17 September, 2006 6:33:11 PM
Subject: Re: [imagedsp] Fast Convolution
Hello glennpierce2001,
Most probably their code is written as a separable kernel
filter (as Gaussian blur is).
In that case the convolution is implemented not as one 2D filter, but as
two 1D filters.
So instead of the Width*Height*KernelSize*KernelSize operations of the
general case,
it takes only 2*Width*Height*KernelSize, so you can speed up
Gaussian blur by a factor of approximately KernelSize/2.
In the first filtering stage you apply the 1D Gaussian kernel horizontally,
and in the second filtering stage you apply the 1D Gaussian kernel
vertically.
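
The two stages can be sketched in C roughly as follows. This is only a minimal illustration under my own assumptions, not how Imaq does it: the function name convolve_separable is hypothetical, the image is a row-major float buffer, and a border of ksize/2 pixels is simply left untouched in dst. The kernel is not flipped, which makes no difference for a symmetric kernel such as a Gaussian.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: filter a w*h row-major float image with the 1D
 * kernel kx along rows, then with the 1D kernel ky along columns.
 * Both kernels have ksize taps (ksize odd). Pixels closer than ksize/2
 * to the image edge are not written in dst.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated.
 */
static int convolve_separable(const float *src, float *dst, int w, int h,
                              const float *kx, const float *ky, int ksize)
{
    int r = ksize / 2;
    float *tmp = calloc((size_t)w * (size_t)h, sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Stage 1: horizontal pass, src -> tmp. */
    for (int y = 0; y < h; y++) {
        for (int x = r; x < w - r; x++) {
            float sum = 0.0f;
            for (int i = 0; i < ksize; i++)
                sum += src[y * w + (x + i - r)] * kx[i];
            tmp[y * w + x] = sum;
        }
    }

    /* Stage 2: vertical pass, tmp -> dst. */
    for (int y = r; y < h - r; y++) {
        for (int x = r; x < w - r; x++) {
            float sum = 0.0f;
            for (int j = 0; j < ksize; j++)
                sum += tmp[(y + j - r) * w + x] * ky[j];
            dst[y * w + x] = sum;
        }
    }

    free(tmp);
    return 0;
}
```

For the 1280*1040 image and 7*7 kernel from the question, this is about 2*1280*1040*7 = 18,636,800 multiply-adds instead of 1280*1040*49 = 65,228,800 for the direct 2D loop, a factor of 3.5 (KernelSize/2).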
Additionally, Gaussian blur (for example) can be optimized further
using recursive filtering (also separable), but you have to use
floating-point calculations to compute this convolution with small error,
so it is often not faster overall.
Friday, September 15, 2006, 4:04:18 PM, you wrote:
gycu> Hi
gycu> I have previously been using an image processing library called Imaq Vision.
gycu> I am now writing my own functions.
gycu> In Imaq there is a convolution function that takes a kernel of any size and convolves it with an image.
gycu> For an image of size 1280*1040 and a kernel of 7*7 it does this in around 250 millisecs on my machine.
gycu> My quick test case of looping through the kernel values for each pixel in the image is much slower.
gycu> ie
gycu> int x, y, i, j, out = 0;
gycu> for(y=0; y < image_height; y++)
gycu>   for(x=0; x < image_width; x++)
gycu>     for(j=0; j < kernel_height; j++)
gycu>       for(i=0; i < kernel_width; i++)
gycu>         out++;
gycu> printf("%d", out);
gycu> This code takes around 650 millisecs, and doesn't do anything useful.
gycu> Does anyone know how the Imaq convolution code achieves its speed?
gycu> I find it hard to believe that they wrote special cases for each possible kernel size, to remove the inner loops.
gycu> Also I know their implementation is not Fourier based, as the image requires a border depending on the kernel size.
gycu> Thanks for any help.
--
Best regards,
Alexander mailto:0xef15h@inbox.ru