DSPRelated.com
Forums

convolution with fft for image in image search/similarity

Started by Andreas September 30, 2003
Hello!

I have been trying to implement an image in image search algorithm using
fft, as described in the dspguide, chapter 24 (http://www.dspguide.com/),
using FFT Convolution.

As far as I have understood it, the search image is used as some kind of a
filter (like sharpen, etc.) on the original image (in which I search) in
order to give the resulting image a distribution of similarity of the image
one is searching for at that position. (Bright values = the image you are
searching for fits this part of the image you search in very well.)
I hope this is kind of understandable..

Anyway, this is the algorithm as I have understood it, and I have been
trying to implement/copy it, but in vain:
1. embed the 2 images (large: image to search in, small: image to search in
the large one) into 2 large images, the edge-size of which is to be a power
of 2 and larger than the sum of the 2 original image-edge-sizes
The smaller image is being embedded after having been rotated by 180
degrees.
(is the restriction on the size of the edges necessary??)
2. fill the rest of the new images with 0s (black)
3. do a 2d fft on both images
4. multiply them, small * large (ordering is important, isnt it?)
5.do a 2d ifft on the product
6. this should be the desired image/matrix (high values imply a high
similarity between the small image and the large image at that position) (I
have complex numbers here.. they are not an image anymore. Do I just take
the real part?)

Unfortunately, all I get out of this is rubbish..  and I dont rather know,
why. I have tried a few different possibilities, like offsetting the image
(256 grayscale) values, so that they are between -128 and 127, and a few
other things, but nothing seems to work really well.

If you have any ideas about what I am doing wrong, what I should be doing,
or where I could find additional information about the matter, please tell
me!!

Many thx in advance,
Andreas


Andreas,

The method you're using seems OK, although I'm not sure about the
reversing one image by 180 degrees.

One thing you might need to do is to pre-whiten the images. The link:

http://www.itee.uq.edu.au/~kootsoop/prewhiteningexample.htm

shows an example of trying to find the singer's nose using
correlation.  It doesn't work very well unless you do some sort of
pre-whitening (like doing a row-wise difference, as a simple example).

Ciao,

Peter K.

-- 
Peter J. Kootsookos

"I will ignore all ideas for new works [..], the invention of which has
 reached its limits and for whose improvement I see no further hope."

- Julius Frontinus, c. AD 84
In article <blcgkv$2in$1@nets3.rz.RWTH-Aachen.DE>,
 "Andreas" <reiff(remove_me)@amo.de> wrote:

|  Anyway, this is the algorithm as I have understood it, and I have been
|  trying to implement/copy it, but in vain:
|  1. embed the 2 images (large: image to search in, small: image to search in
|  the large one) into 2 large images, the edge-size of which is to be a power
|  of 2 and larger than the sum of the 2 original image-edge-sizes

You can cut this down to be the power of 2 greater or equal to the 
larger image size, unless you don't want "wrap-around"  correlations 
(matches which begin at the right edge and wrap around to the left, or 
wrap bottom to top).

|  The smaller image is being embedded after having been rotated by 180
|  degrees.

Apparently the rotation by 180 degrees is to reverse the search image 
from top-to-bottom and left-to-right.  This allows the subsequent 
multiplication in the frequency domain to perform correlation instead of 
convolution, but usually it is much easier to just perform the 
multiplication by the complex conjugate of the frequency domain 
components of the search image (no reversal / rotation required).

|  4. multiply them, small * large (ordering is important, isnt it?)

No, the order is not important.  Multiplication is commutative.


|  5.do a 2d ifft on the product
|  6. this should be the desired image/matrix (high values imply a high
|  similarity between the small image and the large image at that position) (I
|  have complex numbers here.. they are not an image anymore. Do I just take
|  the real part?)

At this point you should have only real components (the imaginary part 
should be zero, or very close to zero).

The steps you state sound correct, so there appears to be something 
wrong in your implementation.  Verify that:

1) the result of both forward FFTs are conjugate symmetric (real 
components of first half are mirrored in the second half, and imaginary 
components are mirrored and negated).

2) The result of the multiplication of the two FFTs is also conjugate 
symmetric.


   -- Tim Olson
Many thanks for both your answers!

It was actually my own stupidity (or the strange ideas of the programmers of
Matlab). I had never used the program before and assumed when writing M1 *
M2 it would do like I have been told in endless mathematics lectures. But of
course, it didnt, and instead I had to use the .* operator.

By the way, multiplication of matrices is associative but not communitative!
This doesnt effect this application too much, because it doesnt matter which
is the input and which the convolution kernel, but in general, results
should be different:

1   0 *  2 3 =  2   3
0   0     0 0      0  0

2 3 * 1  0 =   2 0
0 0    0  0      0 0

(This actually is the normal multiplication.. in a program like Matlab I
would have to use .* as I know now.)

Just for correctness ;-)

Anyway, thx a lot for all your help, I have been despairing over this for
many, many days, and without you telling me I was on the right way I
probably would have given up!!