comp.dsp | Image correlation via FFT. Why doesn't this work?

Hello,

I have a program that performs FFT convolution to search
for occurrences of a small image inside another image.

I feel like I understand the concept, but my program doesn't
do what it should.  The small search image is simply cut out from the
larger image, so I should see a bright spike in the resultant FFT
convolution.  But I am seeing a very ambiguous result.

I created a web page with pictures and my fairly basic source code, if
anyone has the desire to take a look.. I'd really appreciate it.

The web page is located at: http://viz.nu/fft/

Thanks very much in advance,

Kristopher


_____________________________________
Do you know a company who employs DSP engineers?  
Is it already listed at http://dsprelated.com/employers.php ?

Reply by Oli Charlesworth ●April 20, 20072007-04-20

typewriter said the following on 20/04/2007 23:36:
> Hello,
> 
> I have a program that performs FFT convolution to search
> for occurrences of a small image inside another image.
> 
> I feel like I understand the concept, but my program doesn't
> do what it should.  The small search image is simply cut out from the
> larger image, so I should see a bright spike in the resultant FFT
> convolution.  But I am seeing a very ambiguous result.
> 
> I created a web page with pictures and my fairly basic source code, if
> anyone has the desire to take a look.. I'd really appreciate it.
> 
> The web page is located at: http://viz.nu/fft/


I tried your images in MATLAB, I get a very similar end result to you, 
but with a very prominent peak near the bottom right.

One thing to try is to first remove the DC offset from both your input 
images (i.e. by subtracting the mean in the spatial domain).


-- 
Oli

Reply by typewriter ●April 23, 20072007-04-23

Oli, Thanks for looking at this.  I will try the DC offset removal
you mentioned.  I wonder if you can answer a couple questions I have?

1. If the original image is less than 512x512, and I am zero padding
to 512x512, then what does any correlation information mean that
falls outside the original image boundaries?

2. Is there a good way to scale the generated correlation "image"
for viewing (ie, [0-1] grayscale)? I have tried linear scaling, 
and messing around with log scaling. Maybe the scaling I am 
using to make the correlation more visible is hurting my results.

Thanks for any more information, I really do appreciate it.


Kristopher


>typewriter said the following on 20/04/2007 23:36:
>> Hello,
>> 
>> I have a program that performs FFT convolution to search
>> for occurrences of a small image inside another image.
>> 
>> I feel like I understand the concept, but my program doesn't
>> do what it should.  The small search image is simply cut out from the
>> larger image, so I should see a bright spike in the resultant FFT
>> convolution.  But I am seeing a very ambiguous result.
>> 
>> I created a web page with pictures and my fairly basic source code, if
>> anyone has the desire to take a look.. I'd really appreciate it.
>> 
>> The web page is located at: http://viz.nu/fft/
>
>
>I tried your images in MATLAB, I get a very similar end result to you, 
>but with a very prominent peak near the bottom right.
>
>One thing to try is to first remove the DC offset from both your input 
>images (i.e. by subtracting the mean in the spatial domain).
>
>
>-- 
>Oli
>



_____________________________________
Do you know a company who employs DSP engineers?  
Is it already listed at http://dsprelated.com/employers.php ?

Reply by dbell ●April 24, 20072007-04-24

On Apr 23, 3:34 pm, "typewriter" <kristopher.coll...@gmail.com> wrote:
> Oli, Thanks for looking at this.  I will try the DC offset removal
> you mentioned.  I wonder if you can answer a couple questions I have?
>
> 1. If the original image is less than 512x512, and I am zero padding
> to 512x512, then what does any correlation information mean that
> falls outside the original image boundaries?
>
> 2. Is there a good way to scale the generated correlation "image"
> for viewing (ie, [0-1] grayscale)? I have tried linear scaling,
> and messing around with log scaling. Maybe the scaling I am
> using to make the correlation more visible is hurting my results.
>
> Thanks for any more information, I really do appreciate it.
>
> Kristopher
>
>
>
>
>
> >typewriter said the following on 20/04/2007 23:36:
> >> Hello,
>
> >> I have a program that performs FFT convolution to search
> >> for occurrences of a small image inside another image.
>
> >> I feel like I understand the concept, but my program doesn't
> >> do what it should.  The small search image is simply cut out from the
> >> larger image, so I should see a bright spike in the resultant FFT
> >> convolution.  But I am seeing a very ambiguous result.
>
> >> I created a web page with pictures and my fairly basic source code, if
> >> anyone has the desire to take a look.. I'd really appreciate it.
>
> >> The web page is located at:http://viz.nu/fft/
>
> >I tried your images in MATLAB, I get a very similar end result to you,
> >but with a very prominent peak near the bottom right.
>
> >One thing to try is to first remove the DC offset from both your input
> >images (i.e. by subtracting the mean in the spatial domain).
>
> >--
> >Oli
>
> _____________________________________
> Do you know a company who employs DSP engineers?  
> Is it already listed athttp://dsprelated.com/employers.php?- Hide quoted text -
>
> - Show quoted text -

Do you necessarily expect the maximum result when you get alignment?
May not happen even for an ideally extracted smaller image.  You might
try normalizing the energy in the parts of the images used to compute
the result for each 2D offset (obviously constant for the smaller one,
but offset-dependent for the larger image).

Also, the possibility for a "bright spike" is highly data dependent.
Best to hope for a maximum, rather than a spike.

Dirk

Reply by typewriter ●April 24, 20072007-04-24

>Also, the possibility for a "bright spike" is highly data dependent.
>Best to hope for a maximum, rather than a spike.

I was thinking the same thing, so I began finding the maximum.  However my
results show that the maximum is not always in the "correct" place.  For
*some* images (including the example Lichtenstein image on my page
viz.nu/fft), the program indicates a maximum at the *correct* point, and
for others it does not.  

I tried using the football/soccer images on this page : 
http://www.gamedev.net/reference/programming/features/imageproc/page3.asp
Where the author is performing the exact same method I am.  However, not
only do I not match his white peak, but my maximum is in the wrong place
(it falls in the man's shirt).  I whipped up another web page to show
what's happening : http://www.viz.nu/fft/page2.html .

I've gone over my functions many times, and do not see anything
suspicious.  

Thanks very much for your help guys.  I wouldn't be posting
if I hadn't done so much on my own to try and get this.  So
your help is greatly appreciated!

Kristopher

>Do you necessarily expect the maximum result when you get alignment?
>May not happen even for an ideally extracted smaller image.  You might
>the result for each 2D offset (obviously constant for the smaller one,
>but offset-dependent for the larger image).
>
>
>Dirk
>
>

>On Apr 23, 3:34 pm, "typewriter" <kristopher.coll...@gmail.com> wrote:
>> Oli, Thanks for looking at this.  I will try the DC offset removal
>> you mentioned.  I wonder if you can answer a couple questions I have?
>>
>> 1. If the original image is less than 512x512, and I am zero padding
>> to 512x512, then what does any correlation information mean that
>> falls outside the original image boundaries?
>>
>> 2. Is there a good way to scale the generated correlation "image"
>> for viewing (ie, [0-1] grayscale)? I have tried linear scaling,
>> and messing around with log scaling. Maybe the scaling I am
>> using to make the correlation more visible is hurting my results.
>>
>> Thanks for any more information, I really do appreciate it.
>>
>> Kristopher
>>
>>
>>
>>
>>
>> >typewriter said the following on 20/04/2007 23:36:
>> >> Hello,
>>
>> >> I have a program that performs FFT convolution to search
>> >> for occurrences of a small image inside another image.
>>
>> >> I feel like I understand the concept, but my program doesn't
>> >> do what it should.  The small search image is simply cut out from
the
>> >> larger image, so I should see a bright spike in the resultant FFT
>> >> convolution.  But I am seeing a very ambiguous result.
>>
>> >> I created a web page with pictures and my fairly basic source code,
if
>> >> anyone has the desire to take a look.. I'd really appreciate it.
>>
>> >> The web page is located at:http://viz.nu/fft/
>>
>> >I tried your images in MATLAB, I get a very similar end result to
you,
>> >but with a very prominent peak near the bottom right.
>>
>> >One thing to try is to first remove the DC offset from both your
input
>> >images (i.e. by subtracting the mean in the spatial domain).
>>
>> >--
>> >Oli
>>
>> _____________________________________
>> Do you know a company who employs DSP engineers?  
>> Is it already listed athttp://dsprelated.com/employers.php?- Hide
quoted text -
>>
>> - Show quoted text -
>

_____________________________________
Do you know a company who employs DSP engineers?  
Is it already listed at http://dsprelated.com/employers.php ?

Reply by typewriter ●April 24, 20072007-04-24

I have uploaded an interactive Java applet of my program.  Sometimes the
peak value corresponds to the correct search location, other times it does
not.  So strange!!!

http://www.viz.nu/fft/


>>Also, the possibility for a "bright spike" is highly data dependent.
>>Best to hope for a maximum, rather than a spike.
>
>I was thinking the same thing, so I began finding the maximum.  However
my
>results show that the maximum is not always in the "correct" place.  For
>*some* images (including the example Lichtenstein image on my page
>viz.nu/fft), the program indicates a maximum at the *correct* point, and
>for others it does not.  
>
>I tried using the football/soccer images on this page : 
>http://www.gamedev.net/reference/programming/features/imageproc/page3.asp
>Where the author is performing the exact same method I am.  However, not
>only do I not match his white peak, but my maximum is in the wrong place
>(it falls in the man's shirt).  I whipped up another web page to show
>what's happening : http://www.viz.nu/fft/page2.html .
>
>I've gone over my functions many times, and do not see anything
>suspicious.  
>
>Thanks very much for your help guys.  I wouldn't be posting
>if I hadn't done so much on my own to try and get this.  So
>your help is greatly appreciated!
>
>Kristopher
>
>
>

_____________________________________
Do you know a company who employs DSP engineers?  
Is it already listed at http://dsprelated.com/employers.php ?

Reply by Rune Allnor ●April 25, 20072007-04-25

On 21 Apr, 00:36, "typewriter" <kristopher.coll...@gmail.com> wrote:
> Hello,
>
> I have a program that performs FFT convolution to search
> for occurrences of a small image inside another image.
>
> I feel like I understand the concept, but my program doesn't
> do what it should.  The small search image is simply cut out from the
> larger image, so I should see a bright spike in the resultant FFT
> convolution.  But I am seeing a very ambiguous result.

First: Correlation methods only looks for "similar" results, not "the
same." So if the texture of your image is similar across, you might
find partial matches all over, meaning that the actual match does
not stand as clearly out as you might have hoped for.

As for correlaion: I am not sure is is the correct method to use.
Correlation works for zero-mean signals, as found in radar, sonar
and communication applications. Images are not zero mean,
so I am wondering if covariance would be better.

I would try this:

- Select your target sub-frame and remove its mean
- Select a sub-frame in your base image and remove is mean
- Compute the covariance coefficient between the two
  sub-frames as a double sum

   c(u,v) = sum_u sum_v x(n-u,m-v)y(n-u,m-v)

The reason for doing this the hard way is that the mean has
to be removed from each sub-frame independently.

To see why removing the mean is important, consider the
signals

x1 = [1 0]'
y1 = [0 1]'

which is a cosine and sine sampled at 0 ad pi/2. The two are
orthogonal, as expected:

c1 = x1'y2 = 1*0 + 0*1 = 0.

If we add a non-zero mean to either one, the story changes:

x2 = x1 + 10

c2 = x2'y1 = (1+10)*0 + (0+10)*1 = 10.

So if the mean varies significantly between subframes in the image,
correlation does not work. If you want to use covariance on a
frame-by-frame basis, you have to remove the mean of each
particular sub-frame, and FFT-based methods can no longer be
used.

Rune

Reply by typewriter ●April 25, 20072007-04-25

>>I have a program that performs FFT convolution to search
>> for occurrences of a small image inside another image.
>> Online Applet : http://www.viz.nu/fft/

Rune said:
>First: Correlation methods only looks for "similar" results, not "the
>same." So if the texture of your image is similar across, you might
>find partial matches all over, meaning that the actual match does
>not stand as clearly out as you might have hoped for.

Right, but if the small image is cut out directly from the larger one,
the convolution should generate a maximum at the match point(however
unclear it may be to the naked eye).  Also, I *want* to see "similar"
areas, but have the "best" match appear as the maximum, as I understand
this method to work. (see: http://www.dspguide.com/ch24/6.htm )

>As for correlaion: I am not sure is is the correct method to use.
>Correlation works for zero-mean signals, as found in radar, sonar

Perhaps I'm not performing "textbook" correlation.  Correct, images are
not zero mean, but it should still work.  This method has been used
succesfully by others, ( see :
http://www.gamedev.net/reference/programming/features/imageproc/page3.asp
).  I could subtract the mean, but doing so for each subframe would really
slow it down.  

I am using the same test images as the URL above, with which the author
sucessfully demonstrated this concept.  Although those images are among
the ones that my program fails on!  But it *does* work with others..  

_____________________________________
Do you know a company who employs DSP engineers?  
Is it already listed at http://dsprelated.com/employers.php ?

Reply by dbell ●April 26, 20072007-04-26

On Apr 24, 3:41 pm, "typewriter" <kristopher.coll...@gmail.com> wrote:
> >Also, the possibility for a "bright spike" is highly data dependent.
> >Best to hope for a maximum, rather than a spike.
>
> I was thinking the same thing, so I began finding the maximum.  However my
> results show that the maximum is not always in the "correct" place.  For
> *some* images (including the example Lichtenstein image on my page
> viz.nu/fft), the program indicates a maximum at the *correct* point, and
> for others it does not.  
>
> I tried using the football/soccer images on this page :http://www.gamedev.net/reference/programming/features/imageproc/page3...
> Where the author is performing the exact same method I am.  However, not
> only do I not match his white peak, but my maximum is in the wrong place
> (it falls in the man's shirt).  I whipped up another web page to show
> what's happening :http://www.viz.nu/fft/page2.html.
>
> I've gone over my functions many times, and do not see anything
> suspicious.  
>
> Thanks very much for your help guys.  I wouldn't be posting
> if I hadn't done so much on my own to try and get this.  So
> your help is greatly appreciated!
>
> Kristopher
>
>
>
>
>
>
>
> >Do you necessarily expect the maximum result when you get alignment?
> >May not happen even for an ideally extracted smaller image.  You might
> >the result for each 2D offset (obviously constant for the smaller one,
> >but offset-dependent for the larger image).
>
> >Dirk
>
> >On Apr 23, 3:34 pm, "typewriter" <kristopher.coll...@gmail.com> wrote:
> >> Oli, Thanks for looking at this.  I will try the DC offset removal
> >> you mentioned.  I wonder if you can answer a couple questions I have?
>
> >> 1. If the original image is less than 512x512, and I am zero padding
> >> to 512x512, then what does any correlation information mean that
> >> falls outside the original image boundaries?
>
> >> 2. Is there a good way to scale the generated correlation "image"
> >> for viewing (ie, [0-1] grayscale)? I have tried linear scaling,
> >> and messing around with log scaling. Maybe the scaling I am
> >> using to make the correlation more visible is hurting my results.
>
> >> Thanks for any more information, I really do appreciate it.
>
> >> Kristopher
>
> >> >typewriter said the following on 20/04/2007 23:36:
> >> >> Hello,
>
> >> >> I have a program that performs FFT convolution to search
> >> >> for occurrences of a small image inside another image.
>
> >> >> I feel like I understand the concept, but my program doesn't
> >> >> do what it should.  The small search image is simply cut out from
> the
> >> >> larger image, so I should see a bright spike in the resultant FFT
> >> >> convolution.  But I am seeing a very ambiguous result.
>
> >> >> I created a web page with pictures and my fairly basic source code,
> if
> >> >> anyone has the desire to take a look.. I'd really appreciate it.
>
> >> >> The web page is located at:http://viz.nu/fft/
>
> >> >I tried your images in MATLAB, I get a very similar end result to
> you,
> >> >but with a very prominent peak near the bottom right.
>
> >> >One thing to try is to first remove the DC offset from both your
> input
> >> >images (i.e. by subtracting the mean in the spatial domain).
>
> >> >--
> >> >Oli
>
> >> _____________________________________
> >> Do you know a company who employs DSP engineers?  
> >> Is it already listed athttp://dsprelated.com/employers.php?-Hide
> quoted text -
>
> >> - Show quoted text -
>
> _____________________________________
> Do you know a company who employs DSP engineers?  
> Is it already listed athttp://dsprelated.com/employers.php?- Hide quoted text -
>
> - Show quoted text -

Try doing the energy normalization (actually the root of it IIRC) on
the two images that I mentioned earlier.  Then all values should be in
range [-1.0,1.0]. Compute something along the lines of a correlation
coefficient (name might be something similar) for each possible
alignment.

There is an easy way (FFT based filtering) to compute the energy for
each point by starting with the larger image with all values squared
and a smaller image (same size as the actual smaller image) composed
entirely of a constant value (you can figure out the scaling) so that
the right squared values are summed for each offset.

In order to test your complete algorithm I would extract a smaller
image from a section of the larger image so the ideal result matching
result is possible and predictable.  If you don't get the ideal
result, you have errors in your code.

Dirk

Reply by Rune Allnor ●April 26, 20072007-04-26

On 25 Apr, 23:58, "typewriter" <kristopher.coll...@gmail.com> wrote:
>  I could subtract the mean, but doing so for each subframe would really
> slow it down.  

Ah. So you would rather do things fast but incorrectly than do things
correctly?

I know, the FFT is a cute tool, but it is *not* a silver bullet! There
*are*
DSP appliations where the FFT can't be used.

> I am using the same test images as the URL above, with which the author
> sucessfully demonstrated this concept.  Although those images are among
> the ones that my program fails on!  But it *does* work with others..  

If you try the same methods on the same data as were published
but achieve different results, there are three possibilities:

1) You don't apply the same algorithm as was published.
2) What is published is not what was done.
3) The data have changed between test and publication, as might
    be a side effect of images being compressed for web publication.

Of these options, item 1) is the one you can work with. Make sure
you understand the algorithm and make sure you implement it
correctly. Only when you are 100% sure of that, and still can't
reproduce the published results, can you start thinking about
the possibility of items 2) or 3) being relevant.

Rune

Previous12 Next

Image correlation via FFT. Why doesn't this work?

Sign in

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group