robert bristow-johnson wrote:
> > in article 1106476416.358718.66140@z14g2000cwz.googlegroups.com,
Rune Allnor
> > at allnor@tele.ntnu.no wrote on 01/23/2005 05:33:
> >
> >> I don't know how the implementation you found works, but there is
> >> basically only one way to implement the autocorrelation.
>
> in article KrQId.49525$re1.33287@fe2.columbus.rr.com, Mark Borgerding
at
> mark@borgerding.net wrote on 01/23/2005 11:40:
>
> > Not true.  But your comments below indicate you already suspect
there is
> > another way.
>
> i can't even remember all of the variations of computing
autocorrelation.
> there is the "fast autocorrelation" by use of FFT,
magnitude-squaring, and
> iFFT.  there are tricks that save you from having to compute the
> autocorrelation for every lag value, if you don't need it for every
lag.
> there are ways to compute a normalized autocorrelation from the AMDF
(or
> more precisely, the ASDF).  the normalized autocorrelation and half
the ASDF
> add to a constant or are supposed to, for a large window.  what else?
if a
> rectangular window is used, there are ways to save partial sums from
> previous autocorrelations to save computation of the present one.  i
dunno.
> lotsa tricks.

Thanks for your comments, guys.

Mark: I suspected the FFT could be used for these kinds of things,
but I haven't actually seen a way to compute the time-domain auto
correlation sequence by the FFT. Your comments indicate that I was
right in that there are implementational aspects one needs to keep
track of, and that the FFT is a relevant alternative at least for
large amounts of data. The OP did not mention what numbers of samples
are relevant for his application, but he did mention "speech codec",
which makes me suspect he will use on the order of 80 - 100 samples
per time sequence (10 ms frames sampled at 8 kHz). I'm not sure
the faster FFT wil make up for the added overhead and book-keeping
needed, in such short sequences.

RBJ: I know you have been doing lot of fast implementations, while
I'm more academically inclined. I used the word "implement" in my post
which, of course, is wrong. I should have said "there is one way to
*compute* the autocorrelation". As Mark commented, it's not strictly
correct but it's not quite as bad as what I first wrote. "Computation"
relates to the basic algorithm while "implementation" relates to
how the computations actually are done.

Rune

> in article 1106476416.358718.66140@z14g2000cwz.googlegroups.com, Rune Allnor
> at allnor@tele.ntnu.no wrote on 01/23/2005 05:33:
> 
>> I don't know how the implementation you found works, but there is
>> basically only one way to implement the autocorrelation.

in article KrQId.49525$re1.33287@fe2.columbus.rr.com, Mark Borgerding at
mark@borgerding.net wrote on 01/23/2005 11:40:

> Not true.  But your comments below indicate you already suspect there is
> another way.

i can't even remember all of the variations of computing autocorrelation.
there is the "fast autocorrelation" by use of FFT, magnitude-squaring, and
iFFT.  there are tricks that save you from having to compute the
autocorrelation for every lag value, if you don't need it for every lag.
there are ways to compute a normalized autocorrelation from the AMDF (or
more precisely, the ASDF).  the normalized autocorrelation and half the ASDF
add to a constant or are supposed to, for a large window.  what else?  if a
rectangular window is used, there are ways to save partial sums from
previous autocorrelations to save computation of the present one.  i dunno.
lotsa tricks.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Rune Allnor wrote:
> EC-AKD wrote:
>>  Is there any optimised method to write auto corelation of any signal

> I don't know how the implementation you found works, but there is
> basically only one way to implement the autocorrelation. 

Not true.  But your comments below indicate you already suspect there is 
another way.

> If you have very long sequences you might be able to save some
> time by using the FFT and do some of the computations in
> frequency domain. This approach opens several new cans of worms,
> as one have to deal with cyclic convolutions etc in time domain.
> It is not obvious that such an approach will work. Even if it
> does, the computational savings might not be relevant until the
> sequences are several thousand samples long.

Autocorrelation in the frequency domain works well.  The computational 
savings are considerable.

The computational cost for autocorrelation of N lags (produces 2N+1 
samples) is roughly half the cost of overlap add/save filtering for a 
FIR of length N.

If memory serves, I implemented the algorithm in a manner that took a 
3GHz Xeon only 1s to autocorrelate 50e6 samples, producing a 2049 point 
autocorrelation sequence.

Richard Blahut's "Fast Algorithms for Digital Signal Processing" 
discusses a very clever implementation of autocorrelation when the 
required number of lags (delays) N is much less than the full-length 
autocorrelation sequence. The algorithm is sort of like an extension to 
overlap-add ... but not quite.  For each block, the freq resp of a full 
buffer is mult'd by a half-zero buffer.

There are a couple of tricks worth doing, but I can't do justice to them 
in a post.  Read the book to find out how to a) postpone the inverse fft 
b) use the FFTs of two consecutive half-zero buffers to compute the FFT 
of their corresponding full buffer.

Unfortunately, this book is out of print.  Used copies can still be found.

If time is precious,  I can code it up for you.  I've already done it 
once before, so I could do it faster than someone starting cold (and 
thus, maybe even cheaper).  Email me with details if you are interested.

My work email is ( I'm trying to keep it somewhat spam-free )
	markb at 3db dash labs dot c0m
or my home
	mark@borgerding.net  -- already given up

-- Mark Borgerding

Dear Rune and Steve,

Thanks a lot for replies. Steve, actually the autocorelation in AMR
takes quite a few cycles, nearly 12% of the total cycles. The input
sequence is 160 samples long and the corelation is to be found for lag
ranging from 0 to about 140.
For each lag value, we have to fetch the same 160 samples of the input
signal again and again. Can we somehow prevent too much loading.
I am supposed to code this in assembly for ARM9E platform and hence was
wondering if any special instruction couild be used.
Thanks and Regards,
Anoop Deoras

EC-AKD wrote:
> Hi,
>
>   Is there any optimised method to write auto corelation of any
signal
> in C. I was referring to AMR Speech codec's one of the auto
corelation
> function and found that it is too costly in terms of cycles it takes.
>
> Has anyone come up with any optimised method to deal with
> autocorelation.

I don't know how the implementation you found works, but there is
basically only one way to implement the autocorrelation. There
are some variations where one use different scaling coefficients
for the different coefficients in the autocorrelation sequence,
but these are more expensive than the "naive" estimator for the
autocorrelation. Another variation is known as "autocovariance",
where one explicitly subtracts the mean from the sequence, and
have to estimate that as well. Again, this is more expensive
than the "naive" estimator.

If you have very long sequences you might be able to save some
time by using the FFT and do some of the computations in
frequency domain. This approach opens several new cans of worms,
as one have to deal with cyclic convolutions etc in time domain.
It is not obvious that such an approach will work. Even if it
does, the computational savings might not be relevant until the
sequences are several thousand samples long.

Rune

EC-AKD wrote:

>Hi,
>
>  Is there any optimised method to write auto corelation of any signal
>in C. I was referring to AMR Speech codec's one of the auto corelation
>function and found that it is too costly in terms of cycles it takes.
>
>Has anyone come up with any optimised method to deal with
>autocorelation.
>
>Any pointers would be appreciated.
>
>Thanks and Regards,
>Anoop Deoras
>  
>
The autocorrelation should be a tiny part of the total compute load in 
AMR. I can't imagine what you could do to make it otherwise. The 
autocorrelation code in the AMR reference code is pretty much how it is 
normally implemented.

Regards,
Steve