Re: Re[2]: Fast Convolution - Glenn Pierce - Sep 25 11:14:45 2006



Ah, after some more research I think they may be using MMX.
Now I just have to learn how to use it as well.

Thanks a lot for the help.

Glenn

----- Original Message ----
From: Alexander Osipov <0...@inbox.ru>
To: Glenn Pierce <g...@yahoo.co.uk>
Sent: Thursday, 21 September, 2006 8:25:48 PM
Subject: Re[2]: [imagedsp] Fast Convolution

Hello Glenn,

      OK, I see.
      Maybe it is simply an SSE/MMX optimization (vectorization)?
      To be sure, you can time this convolution code with different
      kernel sizes and see how its speed scales with kernel size.
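
As a rough illustration of what such vectorization might look like (this is not IMAQ's actual code, which nobody in the thread has seen), here is a minimal C sketch that applies one kernel tap to a row of pixels four floats at a time using SSE intrinsics, with a scalar fallback when SSE is not available. The helper name `accumulate_tap` and its parameters are hypothetical.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: dst[i] += weight * src[i] for i in [0, n).
   A full convolution would call this once per kernel tap, with src
   pointing at the input row shifted by that tap's offset. */
static void accumulate_tap(float *dst, const float *src, float weight, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 w = _mm_set1_ps(weight);      /* broadcast the weight */
    for (; i + 4 <= n; i += 4) {
        __m128 s = _mm_loadu_ps(src + i);      /* unaligned load of 4 floats */
        __m128 d = _mm_loadu_ps(dst + i);
        d = _mm_add_ps(d, _mm_mul_ps(s, w));   /* d += s * w, 4 lanes at once */
        _mm_storeu_ps(dst + i, d);
    }
#endif
    for (; i < n; i++)                         /* scalar tail, or full fallback */
        dst[i] += weight * src[i];
}
```

Structuring the work as "one tap over a whole row" keeps the inner loop free of per-pixel kernel indexing, which is the kind of loop that vectorizes well regardless of kernel size.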

Thursday, September 21, 2006, 12:28:08 PM, you wrote:

GP> I thought that was likely too, but I ran a test with a non-separable kernel like the one below.
GP> The speed was similar to that of a kernel of all 1's.

GP> static float array[7][7] = {{1.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0},
GP>                              {1.0, 4.0, 1.0, 5.0, -1.0, 6.0, 1.0},
GP>                              {7.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0},
GP>                              {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0},
GP>                              {1.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0},
GP>                              {8.0, 1.0, -1.0, 1.0, 9.0, 1.0, 1.0},
GP>                              {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}};

GP> void Imaq_Convolution (IPIImageRef in, IPIImageRef out)
GP> {
GP>     IPIConvoDesc matrix;

GP>     /* Convert both images to single-precision float pixels. */
GP>     IPI_Cast (in, IPI_PIXEL_SGL);
GP>     IPI_Cast (out, IPI_PIXEL_SGL);

GP>     /* Describe the 7x7 kernel defined above. */
GP>     matrix.matrixWidth = 7;
GP>     matrix.matrixHeight = 7;
GP>     matrix.matrixElements = &array[0][0];
GP>     matrix.divider = 48.0;

GP>     IPI_Convolute (in, IPI_NOMASK, out, &matrix, IPI_BO_CLEAR);
GP> }
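
Whether a kernel is separable can be checked mechanically: a kernel is separable exactly when its matrix has rank 1, that is, when every row is a multiple of one common row. The following is a minimal C sketch of such a check; the function name `kernel_is_separable` and the tolerance parameter are hypothetical and not part of any library mentioned in this thread.

```c
#include <stddef.h>

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Returns 1 if the rows x cols kernel k (row-major) has rank <= 1,
   i.e. it is the outer product of a column vector and a row vector,
   and 0 otherwise. tol is an absolute tolerance for the comparison. */
static int kernel_is_separable(const float *k, size_t rows, size_t cols, float tol)
{
    size_t p = 0, q = 0, r, c;
    int found = 0;

    /* Find any non-zero element to use as the pivot k[p][q]. */
    for (r = 0; r < rows && !found; r++)
        for (c = 0; c < cols && !found; c++)
            if (abs_f(k[r * cols + c]) > tol) { p = r; q = c; found = 1; }
    if (!found)
        return 1;  /* an all-zero kernel is trivially separable */

    /* Rank 1 iff k[r][c] * k[p][q] == k[r][q] * k[p][c] for all r, c. */
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++) {
            float lhs = k[r * cols + c] * k[p * cols + q];
            float rhs = k[r * cols + q] * k[p * cols + c];
            if (abs_f(lhs - rhs) > tol)
                return 0;
        }
    return 1;
}
```

Called as `kernel_is_separable(&array[0][0], 7, 7, 1e-6f)` on the 7x7 kernel above, the check fails as soon as it compares the second row against the first, which confirms that this kernel cannot be split into two 1D passes.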

GP> ----- Original Message ----
GP> From: Alexander Osipov <0...@inbox.ru>
GP> To: i...@yahoogroups.com; g...@yahoo.co.uk
GP> Sent: Sunday, 17 September, 2006 6:33:11 PM
GP> Subject: Re: [imagedsp] Fast Convolution

GP>  Hello glennpierce2001,

GP>  Most probably their code is written as a separable kernel
GP>  filter (as a Gaussian blur is).
GP>  In that case the convolution is implemented not as one 2D filter but as
GP>  two 1D filters.
GP>  In the general case a 2D convolution takes Width*Height*KernelSize*KernelSize
GP>  operations, whereas the separable form takes only 2*Width*Height*KernelSize,
GP>  so a Gaussian blur can be sped up by a factor of roughly KernelSize/2.
GP>  In the first filtering stage you apply the 1D Gaussian kernel horizontally, and
GP>  in the second stage you apply the 1D Gaussian kernel vertically.
GP>  Additionally, a Gaussian blur (for example) can be optimized further
GP>  using recursive filtering (also separable), but you need floating-point
GP>  calculations to compute that convolution with a small error,
GP>  so it is often not faster in practice.
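
The two-pass scheme described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the function name `convolve_separable` is hypothetical, the image is a row-major float array, pixels outside the image are treated as zero, and the kernel is applied without flipping (which makes no difference for a symmetric kernel such as a Gaussian).

```c
#include <stdlib.h>

/* Hypothetical sketch: filter a w x h float image with a 1D kernel of
   odd length n, first along rows, then along columns. Costs about
   2*n multiply-adds per pixel instead of n*n. Returns 0 on success,
   -1 if the temporary buffer cannot be allocated. */
static int convolve_separable(const float *in, float *out, int w, int h,
                              const float *kernel, int n)
{
    int r = n / 2, x, y, t;
    float *tmp = malloc((size_t)w * (size_t)h * sizeof *tmp);
    if (!tmp)
        return -1;

    /* Pass 1: horizontal 1D filter, in -> tmp. */
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++) {
            float sum = 0.0f;
            for (t = 0; t < n; t++) {
                int xx = x + t - r;
                if (xx >= 0 && xx < w)
                    sum += kernel[t] * in[y * w + xx];
            }
            tmp[y * w + x] = sum;
        }

    /* Pass 2: vertical 1D filter, tmp -> out. */
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++) {
            float sum = 0.0f;
            for (t = 0; t < n; t++) {
                int yy = y + t - r;
                if (yy >= 0 && yy < h)
                    sum += kernel[t] * tmp[yy * w + x];
            }
            out[y * w + x] = sum;
        }

    free(tmp);
    return 0;
}
```

For a 7-tap kernel this is 14 multiply-adds per pixel instead of 49, which matches the KernelSize/2 estimate given above.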
       
GP>  Friday, September 15, 2006, 4:04:18 PM, you wrote:
 
GP>  gycu> Hi
 
GP>  gycu> I have previously been using an image processing library called Imaq Vision.
GP>  gycu> I am now writing my own functions.

GP>  gycu> In Imaq there is a convolution function that takes a kernel of any size and convolves it with an image.

GP>  gycu> For an image of size 1280*1040 and a 7*7 kernel it does this in around 250 milliseconds on my machine.

GP>  gycu> My quick test case of looping through the kernel values for each pixel in the image is much slower.
 
GP>  gycu> i.e.

GP>  gycu> int x, y, i, j, out = 0;

GP>  gycu> for(y=0; y < image_height; y++)
GP>  gycu>     for(x=0; x < image_width; x++)
GP>  gycu>         for(j=0; j < kernel_height; j++)
GP>  gycu>             for(i=0; i < kernel_width; i++)
GP>  gycu>                 out++;

GP>  gycu> printf("%d", out);
 
GP>  gycu> This code takes around 650 milliseconds, and doesn't do anything useful.

GP>  gycu> Does anyone know how the Imaq convolution code achieves its speed?
GP>  gycu> I find it hard to believe that they wrote special cases for each possible kernel size to remove the inner loops.

GP>  gycu> Also, I know their implementation is not Fourier-based, as the image requires a border that depends on the kernel size.
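
For comparison with the counting loop above, a direct 2D filter that does real work and leaves a kernel-dependent border might look like the following C sketch. It is not the Imaq implementation: the function name `convolve_direct`, the row-major float layout, and the choice to leave border pixels untouched are all assumptions, and the kernel is applied without flipping (identical to convolution for symmetric kernels).

```c
/* Hypothetical sketch: direct 2D filtering of a w x h float image with a
   kw x kh kernel (both odd). Only pixels whose whole neighbourhood lies
   inside the image are written, so a border of kw/2 columns and kh/2
   rows of the output is left untouched. */
static void convolve_direct(const float *in, float *out, int w, int h,
                            const float *kernel, int kw, int kh, float divider)
{
    int rx = kw / 2, ry = kh / 2, x, y, i, j;

    for (y = ry; y < h - ry; y++)
        for (x = rx; x < w - rx; x++) {
            float sum = 0.0f;
            for (j = 0; j < kh; j++) {
                /* Hoist the row pointers so the innermost loop is a
                   plain dot product over contiguous floats. */
                const float *row = in + (y + j - ry) * w + (x - rx);
                const float *krow = kernel + j * kw;
                for (i = 0; i < kw; i++)
                    sum += krow[i] * row[i];
            }
            out[y * w + x] = sum / divider;
        }
}
```

Because the innermost loop runs over contiguous memory with no bounds checks, it needs no special case per kernel size and is a natural target for compiler or hand-written vectorization.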
 
GP>  gycu> Thanks for any help.
 
GP>  -- 
GP>  Best regards,
GP>   Alexander                            mailto:0xef15h@inbox. ru

GP>                        

-- 
Best regards,
 Alexander                            mailto:0...@inbox.ru


