cross correlation of two images

Started by kaz 6 years ago, 7 replies, latest reply 6 years ago, 503 views

Hi All,

I am designing a cross correlation of two images to detect the offset between them. According to my model, correlation works best if I correlate the two images directly, without any pre-processing.

However, fellow engineers favour doing some expensive feature extraction, such as contrast enhancement and edge detection, before the correlation.

It is true that feature extraction gives the two images better visual quality to the human eye, but I can see that it makes the correlation worse, apparently because such processing creates more difference between the two images.

I wonder if people on this forum have thoughts on the issue of pre-processing images before correlation.
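
For reference, a minimal sketch of the "direct" scenario: sliding correlation over a limited search range with no pre-processing. It assumes NumPy; the function name `estimate_offset`, the `max_shift` parameter, the mean-product score and the synthetic test image are illustrative choices, not details of the actual design.

```python
import numpy as np

def estimate_offset(ref, img, max_shift):
    """Return the (dy, dx) for which img[y + dy, x + dx] best matches ref[y, x],
    scored by the mean product of the overlapping pixels."""
    ref = ref.astype(float) - ref.mean()   # remove the DC level so that
    img = img.astype(float) - img.mean()   # overall brightness does not dominate
    h, w = ref.shape
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows of the two images for this candidate shift
            r = ref[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            m = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
            score = np.mean(r * m)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Synthetic test: a random greyscale image and a copy shifted by (3, -2)
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32))
img = np.roll(ref, shift=(3, -2), axis=(0, 1))
offset = estimate_offset(ref, img, max_shift=5)
print(offset)
```

Dividing by the overlap size (the mean rather than the sum) keeps large shifts, which overlap less, from being penalised purely for having fewer pixels.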


Reply by DaniloDara, May 21, 2017

IMHO, if the system is not real time, or there is enough processing power, plain cross correlation with no pre-processing is more powerful (provided that no geometrical realignment is necessary).

If the system is real time and cross correlation is too expensive, then some feature extraction with ad-hoc correlation is probably a good choice.

Reply by kaz, May 21, 2017

It is real time and we have full cross correlation logic available. 

I am in fact comparing two scenarios, both of which use correlation:

1) direct cross correlation

2) feature extraction then cross correlation.

Reply by bmoers, May 22, 2017


Like most image processing problems, it depends on the image content! If you are trying to align pages of text, there is enough high-frequency content to produce good correlation peaks with precise locations. On the other hand, if the images are of rolling plains and farm fields, low-frequency content dominates and the correlation peaks are broad. Some feature extraction, such as edge enhancement, is useful on such images.
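
A rough way to see this effect, sketched with NumPy on synthetic data. White noise stands in for broadband content, a box blur produces a low-frequency image, and a 4-neighbour Laplacian stands in for edge enhancement. All of these are illustrative assumptions, as are the helper names and the "half of peak" width measure.

```python
import numpy as np

def peak_width(img):
    """Number of lags whose circular autocorrelation is at least half the peak."""
    x = img - img.mean()
    spec = np.fft.fft2(x)
    acf = np.fft.ifft2(spec * np.conj(spec)).real
    return int(np.sum(acf >= 0.5 * acf.max()))

def box_blur(img, k):
    """Circular k-by-k box blur, used here to build a low-frequency image."""
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += np.roll(img, (dy, dx), axis=(0, 1))
    return out / (k * k)

def laplacian(img):
    """Circular 4-neighbour Laplacian, a simple edge-enhancing filter."""
    return (4 * img
            - np.roll(img, 1, axis=0) - np.roll(img, -1, axis=0)
            - np.roll(img, 1, axis=1) - np.roll(img, -1, axis=1))

rng = np.random.default_rng(0)
busy = rng.standard_normal((64, 64))   # broadband stand-in for detailed content
smooth = box_blur(busy, 9)             # low-frequency content dominates
edges = laplacian(smooth)              # edge-enhanced version of the smooth image

w_busy, w_smooth, w_edges = peak_width(busy), peak_width(smooth), peak_width(edges)
print(w_busy, w_smooth, w_edges)       # fewer lags means a sharper peak
```

The smooth image should show a much wider peak than either the broadband image or its edge-enhanced version, which is the case where pre-processing helps locate the peak.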

Reply by kaz, May 22, 2017

Thanks bmoers & DaniloDara,

Indeed, I tested random 2D data sets with various ratios of greyscale values (0~255) and binary values (0, 1), and I can see that the peak quality can get bad.

Interestingly, I am using an FFT instead of time-domain xcorr, and for some reason the FFT-based correlation for greyscale seems quite resilient, in terms of accuracy and peak value, relative to time-domain sliding. Both are OK with binary images.

Reply by kaz, May 23, 2017

A follow up.

I compared the classic time-domain (sliding) correlation with FFT-based correlation using the same input image. The sliding correlation result gets better if feature extraction is applied to the image, but the FFT correlation can perform well with or without feature extraction.

I also noticed that feature extraction may have some negative effect on similarity (correlation gets worse) as well as some improvement, with the final result depending on the net effect as a function of image tones.

Of course, FFT correlation should ideally be equivalent to sliding, and it can be, which makes my conclusion above nonsense. In practice, due to resolution limitations, the FFT I used produces a "somehow distorted" output relative to sliding, but it is accurate enough and, surprisingly, doesn't care about feature extraction.

Thus I am getting an advantage from a distortion!! I have never come across a situation like this.

Reply by bmoers, May 23, 2017


As you say, time-domain correlation can be equivalent to an FFT-based correlation, but it appears your implementations are not equivalent. You have not shared any details of the implementations (and until now they were irrelevant). Time-domain implementations are often not circular, whereas a minimal FFT correlation is circular. In an FFT correlation, the image needs to be padded to twice its size (in both dimensions) to avoid a circular correlation. This is only one of many possible explanations for what you see as 'distortion'.
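
The circular versus linear point can be checked numerically. This is a small NumPy sketch on random 6x8 arrays; the function names are made up for illustration and it says nothing about the actual implementations being compared. It shows that zero-padding to (2h-1) x (2w-1) makes the FFT result match sliding, and that the unpadded FFT result is the sliding result wrapped modulo the image size.

```python
import numpy as np

def xcorr_sliding(a, b):
    """Linear correlation c(dy, dx) = sum of a[y+dy, x+dx] * b[y, x], all lags."""
    h, w = b.shape
    padded = np.pad(a, ((h - 1, h - 1), (w - 1, w - 1)))   # zero padding
    out = np.empty((a.shape[0] + h - 1, a.shape[1] + w - 1))
    for i in range(out.shape[0]):        # row i holds lag dy = i - (h - 1)
        for j in range(out.shape[1]):    # col j holds lag dx = j - (w - 1)
            out[i, j] = np.sum(padded[i:i + h, j:j + w] * b)
    return out

def xcorr_fft(a, b, shape):
    """Circular correlation on an FFT grid of the given shape (zero-padded)."""
    fa = np.fft.fft2(a, s=shape)
    fb = np.fft.fft2(b, s=shape)
    return np.fft.ifft2(fa * np.conj(fb)).real

rng = np.random.default_rng(1)
a = rng.standard_normal((6, 8))
b = rng.standard_normal((6, 8))
h, w = b.shape

sliding = xcorr_sliding(a, b)                          # shape (11, 15)
linear_fft = xcorr_fft(a, b, (2 * h - 1, 2 * w - 1))   # padded FFT
# The FFT output stores negative lags at the end; rotate to the sliding layout
linear_fft = np.roll(linear_fft, (h - 1, w - 1), axis=(0, 1))
circular = xcorr_fft(a, b, (h, w))                     # no padding

# Wrapping the linear result modulo the image size reproduces the circular one
wrapped = np.zeros((h, w))
for i in range(sliding.shape[0]):
    for j in range(sliding.shape[1]):
        wrapped[(i - (h - 1)) % h, (j - (w - 1)) % w] += sliding[i, j]

same_when_padded = np.allclose(sliding, linear_fft)
same_when_wrapped = np.allclose(wrapped, circular)
print(same_when_padded, same_when_wrapped)
```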

Reply by kaz, May 23, 2017

Thanks bmoers,

The image size is 160x288.

The FFT resolution is 256 x 512, padded in a mirrored pattern across the two images.

Ideally the FFT should be (160+160-1) x (288+288-1) using zero padding, which gives results identical to the sliding case, but my implementation platform is an FPGA and I have to use a power-of-2 FFT. It gives me the same offset-detection performance as sliding.
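
The size arithmetic above, as a small sketch. The helper names are hypothetical, and `alias_free_lag` assumes plain zero padding; the mirrored padding described in this post behaves differently and is not modelled here.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def alias_free_lag(size, fft_size):
    """Largest |lag| that does not wrap onto another lag, ASSUMING both inputs
    of length `size` are zero-padded to `fft_size` (with fft_size >= size)."""
    return min(fft_size - size, size - 1)

# (image dimension, FFT length used in this post)
for size, used in ((160, 256), (288, 512)):
    full = 2 * size - 1                  # length of the full linear correlation
    print(size, full, next_pow2(full), alias_free_lag(size, used))
# prints:
# 160 319 512 96
# 288 575 1024 224
```

Under that zero-padding assumption, a 256 x 512 FFT would still report offsets up to 96 rows and 224 columns without wrap-around, which may be part of why the smaller FFT is good enough when the expected offset is small.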