comp.dsp | TI 54x FIRS Not Compatible with Circular Buffering? | page 2

Reply by Brian Dam Pedersen ● September 14, 2004

Randy Yates wrote:

> I don't mean to put you off, but go read the responses I had to Jerry
> Avins and understand what this instruction is doing. A key point is
> that xmem and ymem are pointing to symmetric points about a center
> point in the input data buffer, so everytime you move a notch (a
> coefficient) you must increment one and decrement the other.

[ Blushing - brushing off old 54xx manual to read more about FIRS 
(*cough*) ]

OK - here is a somewhat more thought-over shot at it. If we consider an 
8-tap symmetric FIR filter, and we denote x[n] as [0], x[n-1] as [1] 
etc, we want the following equation to be realized using the FIRS 
instruction:

y[n]=c[0]([0]+[7])+c[1]([1]+[6])+c[2]([2]+[5])+c[3]([3]+[4])
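
As a quick check of the folded equation, here is a small Python sketch comparing it with the ordinary 8-tap sum. The coefficient and sample values are made up, and the function names are only for illustration.

```python
def fir_direct(h, x_recent):
    """Ordinary FIR sum: x_recent[k] is x[n-k], h holds all M taps."""
    return sum(hk * xk for hk, xk in zip(h, x_recent))


def fir_folded(c, x_recent):
    """Folded sum for an even-length symmetric filter: c holds taps 0..M/2-1."""
    m = 2 * len(c)
    return sum(c[k] * (x_recent[k] + x_recent[m - 1 - k]) for k in range(len(c)))


c = [1, 2, 3, 4]                        # hypothetical c[0]..c[3]
h = c + c[::-1]                         # full symmetric response, 8 taps
x_recent = [5, -1, 2, 7, 0, 3, -4, 6]   # [0], [1], ..., [7] in the notation above

print(fir_direct(h, x_recent), fir_folded(c, x_recent))
```

Both forms give the same value; the folded one needs only M/2 multiplies, which is the saving FIRS is after.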

The key is to get data aligned in x/y memory so that the FIRS 
instruction can form the sums inside the () using the same pointer 
update for both X and Y. This can be done by arranging the data the 
following way (8 consecutive samples shown; after that the pattern 
repeats). Note that each iteration replaces [7] with a new sample, which 
then becomes the newest ([0]), while the other samples get "older".

          *                               *
n=0: x  [0][2][4][6] | n=1: x  [1][3][5][7]
      y  [7][5][3][1] |      y  [0][6][4][2]
          *                      *
                   *                   *
n=2: x  [2][4][6][0] | n=3: x  [3][5][7][1]
      y  [1][7][5][3] |      y  [2][0][6][4]
             *                      *
                *                   *
n=4: x  [4][6][0][2] | n=5: x  [5][7][1][3]
      y  [3][1][7][5] |      y  [4][2][0][6]
                *                      *
             *                   *
n=6: x  [6][0][2][4] | n=7: x  [7][1][3][5]
      y  [5][3][1][7] |      y  [6][4][2][0]
                   *                      *

If we start the pointers at the stars, a circular increment by 1 will 
implement the desired functionality, provided that the coefficient 
vector has the form [ c[0] c[2] c[3] c[1] ] for even samples and [ c[0] 
c[1] c[3] c[2] ] for odd samples. That means we need to store M 
coefficients (even though M/2 describe the filter fully). Maintaining 
the pointers and data input is a little tricky, but can be done with two 
circular pointers (modulo 4 in this case). If we denote these two index 
pointers px and py, and index the coefficients by pc (a linear index 
pointer), the following pseudocode implements our filter (how to map 
this to the TI is left as an exercise). The loop processes two samples 
at a time:

px=py=0 # Actually - if you are not doing block processing with a block
        # size modulo M, these should be stored /restored for each
        # sample block. The FIRS approach in here requires at least 2
        # samples per block.

do (an_even_number_of_samples){
   # Insert a sample in the delay line
   xmem[px]=new_sample_even
   B=0
   pc=0
   do (M/2){
     B+=coeffs[pc++]*(xmem[px++]+ymem[py++]) # the FIRS instruction
   }
   # Output B to the appropriate place here
   somemem[outputptr++]=B
   # Adjust the X pointer according to the scheme above
   px--
   # Insert another sample in the delay line
   ymem[py]=new_sample_odd
   # Note that we do NOT reset pc here since a different coeff order
   # is needed for the odd samples
   B=0
   do (M/2){
     B+=coeffs[pc++]*(xmem[px++]+ymem[py++]) # the FIRS instruction
                                             # strikes again
   }
   # Output one more B
   somemem[outputptr++]=B
   # Adjust the Y pointer according to the scheme above
   py++
}

I hope this is a little more helpful than my first post. Generalization 
to an odd number of taps is left as an exercise.

DISCLAIMER: None of this is actually tested in code, but I think the 
idea is correct.
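
Since the scheme above is untested, here is a rough Python model of the pseudocode that can be compared against a plain FIR sum. It is a sanity check under stated assumptions, not C54x code: the delay lines start zero-filled, the circular pointers are modelled with `%`, and the names `firs_interleaved` and `fir_reference` plus the test data are invented for illustration. The rule used to build the coefficient table for general even M is inferred from the 8-tap layout rather than stated in the post.

```python
def fir_reference(c, samples):
    """Direct-form FIR with h = c + reversed(c) and zero initial state."""
    h = list(c) + list(reversed(c))
    return [sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(samples))]


def firs_interleaved(c, samples):
    """Model of the two-samples-per-pass pseudocode for M = 2 * len(c) taps."""
    half = len(c)                       # M/2 FIRS iterations per output
    m = 2 * half
    # Coefficient table of length M: even-sample order, then odd-sample order.
    even = [min(2 * k, m - 1 - 2 * k) for k in range(half)]
    odd = [min(a, m - 1 - a) for a in ((2 * k - 1) % m for k in range(half))]
    coeffs = [c[i] for i in even + odd]

    xmem = [0] * half                   # X bank: receives even-numbered samples
    ymem = [0] * half                   # Y bank: receives odd-numbered samples
    px = py = 0
    out = []
    for i in range(0, len(samples) - 1, 2):
        xmem[px] = samples[i]           # insert the even sample
        pc = 0
        for is_odd in (False, True):
            if is_odd:
                ymem[py] = samples[i + 1]   # insert the odd sample
            acc = 0
            for _ in range(half):       # the repeated FIRS instruction
                acc += coeffs[pc] * (xmem[px] + ymem[py])
                pc += 1
                px = (px + 1) % half
                py = (py + 1) % half
            out.append(acc)
            if is_odd:
                py = (py + 1) % half    # adjust Y after the odd sample
            else:
                px = (px - 1) % half    # adjust X after the even sample
    return out


c = [1, 2, 3, 4]
samples = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
print(firs_interleaved(c, samples) == fir_reference(c, samples))
```

The model only covers the indexing; accumulator width, scaling and saturation on the real device are left out.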

-- Brian Dam Pedersen
    M.Sc.EE.

Reply by Jerry Avins ● September 14, 2004

Randy Yates wrote:

> Jerry Avins <jya@ieee.org> writes:
> 
> 
>>Randy Yates wrote:
>>
>>   ...
>>
>>
>>>Hi Jerry,
>>>xmem and ymem both point to data - pmad points to the
>>>coefficients. (pmad
>>
>>>stands for "program memory address"). This instruction computes the
>>>following, in C meta code
>>
>>>  B += A * *(pmad+n);
>>
>>>  A = *(xmem++) + *(ymem--);
>>>where n is incremented by one when you repeat the instruction and
>>
>>>I'm assuming the corresponding addressing form for xmem and ymem
>>>as shown in the code above (i.e., *ARx+, *AR7-).
>>>You see the idea? The data on both sides of the symmetric FIR are
>>
>>>added first, then multiplied by the one coefficient. This saves
>>>MIPS (since this instruction can be done in 1 cycle) AND memory
>>>since you only have to store ((M - 1) / 2) + 1 coefficients. Note
>>>that you must precompute the first coefficient's multiplication and
>>>first data sum before entering the repeat loop with FIRS.
>>
>>Gotcha, Randy; thanks. I see where it saves space, but not time if an
>>addition takes as long as a MAC. 
> 
> 
> But in this case it doesn't. The addition and the MAC are ALL done in
> 1 cycle. Pretty slick, eh? Them folks at TI sure are smart. Except that
> they forgot how to make it work with circular addressing...

Can you set the index stride? If so, you can index backward by setting
it to [size - 1].

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Randy Yates ● September 14, 2004

Jerry Avins <jya@ieee.org> writes:

> Randy Yates wrote:
> 
> > Jerry Avins <jya@ieee.org> writes:
> >
> 
> >>Randy Yates wrote:
> >>
> >>   ...
> >>
> >>
> >>>Hi Jerry,
> >>>xmem and ymem both point to data - pmad points to the
> >>>coefficients. (pmad
> >>
> >>>stands for "program memory address"). This instruction computes the
> >>>following, in C meta code
> >>
> >>>  B += A * *(pmad+n);
> >>
> >>>  A = *(xmem++) + *(ymem--);
> >>>where n is incremented by one when you repeat the instruction and
> >>
> >>>I'm assuming the corresponding addressing form for xmem and ymem
> >>>as shown in the code above (i.e., *ARx+, *AR7-).
> >>>You see the idea? The data on both sides of the symmetric FIR are
> >>
> >>>added first, then multiplied by the one coefficient. This saves
> >>>MIPS (since this instruction can be done in 1 cycle) AND memory
> >>>since you only have to store ((M - 1) / 2) + 1 coefficients. Note
> >>>that you must precompute the first coefficient's multiplication and
> >>>first data sum before entering the repeat loop with FIRS.
> >>
> >>Gotcha, Randy; thanks. I see where it saves space, but not time if an
> >> addition takes as long as a MAC.
> 
> > But in this case it doesn't. The addition and the MAC are ALL done in
> 
> > 1 cycle. Pretty slick, eh? Them folks at TI sure are smart. Except that
> > they forgot how to make it work with circular addressing...
> 
> Can you set the index stride? 

Not only can you, you must. The only circular addressing mode available
for this instruction is the one which increments by the amount in AR0
circularly, so both operands must stride the same way. 

I'm sure you could see this in an instant if you picked up the mnemonic
assembly language book from TI and looked at the instruction - it's
document SPRU172.

> If so, you can index backward by setting
> it to [size - 1].

You can, but then both operands would index backward.
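
To make the point concrete, here is a toy index model in Python. It is plain modular arithmetic on a hypothetical 8-word circular buffer, not a model of the C54x address generation unit; `walk` and `SIZE` are names made up for this sketch.

```python
SIZE = 8  # hypothetical circular buffer length, one slot per tap


def walk(start_x, start_y, step_x, step_y, count):
    """Return the (x_index, y_index) pairs visited with circular post-modify."""
    return [((start_x + k * step_x) % SIZE, (start_y + k * step_y) % SIZE)
            for k in range(count)]


# What a symmetric filter needs: one pointer walks up, the other walks down.
needed = walk(0, 7, +1, -1, 4)
# One shared stride of +1: both pointers walk up.
shared_up = walk(0, 7, +1, +1, 4)
# A shared stride of SIZE - 1 (the same as -1 modulo SIZE): both walk down.
shared_down = walk(0, 7, SIZE - 1, SIZE - 1, 4)

print("needed     ", needed)
print("shared +1  ", shared_up)
print("shared size-1", shared_down)
```

Only the first list pairs up the two ends of the delay line; with any single shared stride the two indexes keep a constant difference instead of a constant sum.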
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Jerry Avins ● September 14, 2004

Randy Yates wrote:

   ...

> I'm sure you could see this in an instant if you picked up the mnemonic
> assembly language book from TI and looked at the instruction - it's
> document SPRU172.

Speculation is foolish. To one who scorns voting to decide what time it 
is, your admonition to RTFM is particularly apt.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Randy Yates ● September 16, 2004

Hi Brian,

Thanks for this detailed response. You may have something here, but
I'm having a helluva time trying to understand your notation. I also
don't understand how you're shifting new data into your arrays. 

Would you mind re-expressing using actual memory indexes and
explaining exactly how new data is shifted into memory each
sample?

--Randy



-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Brian Dam Pedersen ● September 16, 2004

Randy Yates wrote:
> Hi Brian,
> 
> Thanks for this detailed response. You may have something here, but
> I'm having a helluva time trying to understand your notation. I also
> don't understand how you're shifting new data into your arrays. 
> 
> Would you mind re-expressing using actual memory indexes and
> explaining exactly how new data is shifted into memory each
> sample?

I can try. In the example below I still use an 8-tap filter. If we 
consider a memory block in x/y memory starting at address 0, it will 
look like this for time instances t=0 and t=1 (seeing this will 
hopefully let you correlate it with my notation):

          t=0       !     t=1
       X      Y     !   X       Y
@0 x[n]    x[n-7]  ! x[n-1]  x[n]
@1 x[n-2]  x[n-5]  ! x[n-3]  x[n-6]
@2 x[n-4]  x[n-3]  ! x[n-5]  x[n-4]
@3 x[n-6]  x[n-1]  ! x[n-7]  x[n-2]

or to take absolute sample numbers (first eight samples):

          t=0       !     t=1
       X      Y     !   X       Y
@0 x[0]   x[-7]    ! x[0]   x[1]
@1 x[-2]  x[-5]    ! x[-2]  x[-5]
@2 x[-4]  x[-3]    ! x[-4]  x[-3]
@3 x[-6]  x[-1]    ! x[-6]  x[-1]

          t=2       !     t=3
       X      Y     !   X       Y
@0 x[0]   x[1]     ! x[0]   x[1]
@1 x[-2]  x[-5]    ! x[-2]  x[3]
@2 x[-4]  x[-3]    ! x[-4]  x[-3]
@3 x[2]   x[-1]    ! x[2]   x[-1]

          t=4       !     t=5
       X      Y     !   X       Y
@0 x[0]   x[1]     ! x[0]   x[1]
@1 x[-2]  x[3]     ! x[-2]  x[3]
@2 x[4]   x[-3]    ! x[4]   x[5]
@3 x[2]   x[-1]    ! x[2]   x[-1]

          t=6       !     t=7
       X      Y     !   X       Y
@0 x[0]   x[1]     ! x[0]   x[1]
@1 x[6]   x[3]     ! x[6]   x[3]
@2 x[4]   x[-3]    ! x[4]   x[5]
@3 x[2]   x[-1]    ! x[2]   x[7]

So x[n] is always the newest sample (it must be inserted into the memory 
block prior to filtering, of course) and is denoted [0] in my notation. 
[1] is then x[n-1], [2] is x[n-2] and so forth. I hope you can see the 
memory layout now at the 8 time instances. (lowest address is leftmost)

>>The key is to get data aligned in x/y memory so that we can form the
>>sums inside the () by the FIRS instruction (the same pointer update
>>for both X and Y). This can be done by forming data the following way
>>(8 consecutive samples shown, the rest is a repetition). Note that
>>each iteration replaces [7] with a new sample, that then becomes the
>>newest ([0]), while the other samples gets "older".
>>
>>
>>          *                               *
>> n=0: x  [0][2][4][6] | n=1: x  [1][3][5][7]
>>      y  [7][5][3][1] |      y  [0][6][4][2]
>>          *                      *
>>                   *                   *
>> n=2: x  [2][4][6][0] | n=3: x  [3][5][7][1]
>>      y  [1][7][5][3] |      y  [2][0][6][4]
>>             *                      *
>>                *                   *
>> n=4: x  [4][6][0][2] | n=5: x  [5][7][1][3]
>>      y  [3][1][7][5] |      y  [4][2][0][6]
>>                *                      *
>>             *                   *
>> n=6: x  [6][0][2][4] | n=7: x  [7][1][3][5]
>>      y  [5][3][1][7] |      y  [6][4][2][0]
>>                   *                      *
>>
>>If we start the pointers at the stars, a circular increment by 1 will
>>implement the desired functionality , provided that the coefficient
>>vector has the form [ c[0] c[2] c[3] c[1] ] for even samples and [
>>c[0] c[1] c[3] c[2] ] for odd samples, meaning that we need M
>>coefficients (even though M/2 describes the filter fully). Maintaining
>>the pointers and data input is a little tricky, but can be done with
>>two circular pointers (modulo 4 in this case). If we denote these two
>>index pointers px and py, and index the coefficients by pc (a linear
>>index pointer) the following pseudocode implements our filter (how to
>>map this to the TI is left as an exercise). The loop processes two
>>samples at a time

So in order to get the filter to run correctly, the pointers into X and 
Y memory must start at the locations that are marked by stars above 
prior to startup of the FIRS sequence, but after putting a new sample 
into the block. If we use AR1 for X indexing and AR2 for Y indexing, 
that means that they should start according to the following table at 
t=0..7 for an 8-tap filter (again assuming that the memory block starts 
at addr 0). Also, the new samples should be injected at the positions in 
the I column prior to running the filter:

t  AR1  AR2   I
0   0    0    X0
1   3    0    Y0
2   3    1    X3
3   2    1    Y1
4   2    2    X2
5   1    2    Y2
6   1    3    X1
7   0    3    Y3

As you can see AR1 should be decremented every other sample (after 
executing FIRS), and AR2 should be incremented every other sample - 
which is why I wrote the pseudocode to process two samples at a time. 
Also the AR registers can be used one at a time to inject new samples 
(prior to executing FIRS), even samples by AR1 and odd samples by AR2, 
as you can see from the indices where new samples should be injected in 
each bank. This pattern extends to any odd-order FIRS (even number of 
taps), so the  pseudocode I have below actually is valid for all even M. 
I use px for denoting a pointer to X memory - that would be AR1 in the 
above table. Similarly py would be AR2 in the above table. The ++ is 
circular, so is the -- .
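
The table can be regenerated from the two adjustment rules alone. This short Python sketch does that for any even M; it assumes the repeated FIRS leaves both pointers where they started (M/2 circular increments on a length-M/2 buffer), so only the px--/py++ adjustments are modelled. `pointer_schedule` is a name invented for this sketch.

```python
def pointer_schedule(m, steps):
    """Return (t, AR1, AR2, insert_slot) rows for an m-tap filter, m even."""
    half = m // 2
    px = py = 0
    rows = []
    for t in range(steps):
        if t % 2 == 0:
            rows.append((t, px, py, f"X{px}"))   # even sample goes in via AR1
            px = (px - 1) % half                 # circular decrement after FIRS
        else:
            rows.append((t, px, py, f"Y{py}"))   # odd sample goes in via AR2
            py = (py + 1) % half                 # circular increment after FIRS
    return rows


for row in pointer_schedule(8, 8):
    print(*row)
```

After M samples both pointers are back at 0, which is why a block size that is a multiple of M needs no pointer save/restore.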

The coefficients are tricky, but you will be able to see why they need 
to be different for even and odd samples by tracking the pointers 
through the patterns above (or just the tables I made in this new post). 
You will note that they always start out at the newest and oldest sample 
and proceed forward in memory. When the FIRS instructions are executed 
for even samples, they compute in this order:

c[0](x[n-0]+x[n-7])+
c[2](x[n-2]+x[n-5])+
c[3](x[n-3]+x[n-4])+
c[1](x[n-1]+x[n-6])

and for odd samples:

c[0](x[n-0]+x[n-7])+
c[1](x[n-1]+x[n-6])+
c[3](x[n-3]+x[n-4])+
c[2](x[n-2]+x[n-5])
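
For other even tap counts the two orders can be generated instead of worked out by hand. The rule in this Python sketch is inferred from the layout (X and Y hold alternate samples, and a sample of age a pairs with age M-1-a under coefficient min(a, M-1-a)); treat it as an assumption to be checked, and `coefficient_orders` as a made-up name.

```python
def coefficient_orders(m):
    """Return (even_order, odd_order) coefficient index lists for even m."""
    half = m // 2

    def fold(age):
        # Coefficient index shared by the samples of age `age` and m-1-age.
        return min(age, m - 1 - age)

    even = [fold(2 * k) for k in range(half)]            # X walks ages 0, 2, 4, ...
    odd = [fold((2 * k - 1) % m) for k in range(half)]   # X walks ages M-1, 1, 3, ...
    return even, odd


print(coefficient_orders(8))
```

For M = 8 this reproduces the [0 2 3 1] and [0 1 3 2] orders listed above.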

[Hmmm.... Looking at this again, it looks like you can do with the [0 2 
3 1] version of the coeffs if you set the stride in AR0 to -1 for odd 
samples when executing the FIRS. It shouldn't matter which way you do 
it, since the pointers start and end the same place both going forward 
and backward ... Maybe ... It is late here in Europe... #-]
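
The late-night hunch can be tried out in a few lines of Python. This is a sketch under the same modelling assumptions as before (zero-filled delay lines, `%` for circular addressing, invented names); it walks forward on even samples and backward on odd ones with a single coefficient table of M/2 entries, then compares with a plain FIR sum. It says nothing about whether the C54x lets you change the stride this way between the two FIRS runs.

```python
def fir_reference(c, samples):
    """Direct-form FIR with h = c + reversed(c) and zero initial state."""
    h = list(c) + list(reversed(c))
    return [sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(samples))]


def firs_single_table(c, samples):
    """Interleaved X/Y scheme using one coefficient order and stride +1 / -1."""
    half = len(c)
    m = 2 * half
    coeffs = [c[min(2 * k, m - 1 - 2 * k)] for k in range(half)]  # [0 2 3 1] for M=8
    xmem = [0] * half
    ymem = [0] * half
    px = py = 0
    out = []
    for t, sample in enumerate(samples):
        is_odd = t % 2 == 1
        if is_odd:
            ymem[py] = sample
        else:
            xmem[px] = sample
        step = -1 if is_odd else 1          # stride -1 on odd samples
        acc = 0
        for k in range(half):
            acc += coeffs[k] * (xmem[px] + ymem[py])
            px = (px + step) % half
            py = (py + step) % half
        out.append(acc)
        if is_odd:
            py = (py + 1) % half
        else:
            px = (px - 1) % half
    return out


c = [1, 2, 3, 4]
samples = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
print(firs_single_table(c, samples) == fir_reference(c, samples))
```

In this model the outputs match, so at the indexing level the single [0 2 3 1] table with a reversed walk on odd samples does the job.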

Looking again at the pseudocode - the new comments refer to the first 
time the loop is executed:

>>
>>px=py=0 # Actually - if you are not doing block processing with a block
>>        # size modulo M, these should be stored /restored for each
>>        # sample block. The FIRS approach in here requires at least 2
>>        # samples per block.
>>
>>do (an_even_number_of_samples){
>>   # Insert a sample in the delay line

      px is 0, and should be used according to my table to
      insert x[0] into memory

>>   xmem[px]=new_sample_even
>>   B=0
>>   pc=0
>>   do (M/2){
>>     B+=coeffs[pc++]*(xmem[px++]+ymem[py++]) # the FIRS instruction
>>   }
>>   # Output B to the appropriate place here
>>   somemem[outputptr++]=B
>>   # Adjust the X pointer according to the scheme above
>>   px--

      px is now 3 (assuming t=0), which is what it should be for t=1,
      which is started below. x[1] should be inserted at Y0, which is
      what is happening below:

>>   # Insert another sample in the delay line
>>   ymem[py]=new_sample_odd
>>   # Note that we do NOT reset pc here since a different coeff order
>>   # is needed for the odd samples
>>   B=0

      Now here is where I think you could do px--/py-- in order to use
      the same coefficients as for even samples.

>>   do (M/2){
>>     B+=coeffs[pc++]*(xmem[px++]+ymem[py++]) # the FIRS instruction
>>     # strikes again
>>
>>   }
>>   # Output one more B
>>   somemem[outputptr++]=B
>>   # Adjust the Y pointer according to the scheme above

      Again considering the first round, we increment py, so that it
      points to 1. Now px,py is 3,1 which is what it should be in order
      to process x[2] according to the table. px is used in the beginning
      of the next round to insert the sample into X3 (again correct
      according to the table), while py is used to insert x[3] into Y1 in
      the next round.
>>   py++
>>}
>>

I hope this made it a little more clear - otherwise post again; it would 
be a waste of both your time and mine if we give up now ;)

-- Brian

Reply by Randy Yates ● September 17, 2004

Brian Dam Pedersen <brian.pedersen@mail.danbbs.dk> writes:

> Randy Yates wrote:
> > Hi Brian,
> > Thanks for this detailed response. You may have something here, but
> 
> > I'm having a helluva time trying to understand your notation. I also
> > don't understand how you're shifting new data into your
> > arrays. Would you mind re-expressing using actual memory indexes and
> 
> > explaining exactly how new data is shifted into memory each
> > sample?
> 
> I can try. In the example below I still use an 8 tap filter. If we
> consider a memory block in x/y memory at address 0, it will look like
> this for time instances t=0 and 1(seeing this will hopefully make you
> able to correlate this to my notation):
> 
> 
> 
>           t=0       !     t=1
>        X      Y     !   X       Y
> @0 x[n]    x[n-7]  ! x[n-1]  x[n]
> @1 x[n-2]  x[n-5]  ! x[n-3]  x[n-6]
> @2 x[n-4]  x[n-3]  ! x[n-5]  x[n-4]
> @3 x[n-6]  x[n-1]  ! x[n-7]  x[n-2]
> 
> or to take absolute sample numbers (first eight samples):
> 
>           t=0       !     t=1
>        X      Y     !   X       Y
> @0 x[0]   x[-7]    ! x[0]   x[1]
> @1 x[-2]  x[-5]    ! x[-2]  x[-5]
> @2 x[-4]  x[-3]    ! x[-4]  x[-3]
> @3 x[-6]  x[-1]    ! x[-6]  x[-1]
> 
>           t=2       !     t=3
>        X      Y     !   X       Y
> @0 x[0]   x[1]     ! x[0]   x[1]
> @1 x[-2]  x[-5]    ! x[-2]  x[3]
> @2 x[-4]  x[-3]    ! x[-4]  x[-3]
> @3 x[2]   x[-1]    ! x[2]   x[-1]
> 
>           t=4       !     t=5
>        X      Y     !   X       Y
> @0 x[0]   x[1]     ! x[0]   x[1]
> @1 x[-2]  x[3]     ! x[-2]  x[3]
> @2 x[4]   x[-3]    ! x[4]   x[5]
> @3 x[2]   x[-1]    ! x[2]   x[-1]
> 
>           t=6       !     t=7
>        X      Y     !   X       Y
> @0 x[0]   x[1]     ! x[0]   x[1]
> @1 x[6]   x[3]     ! x[6]   x[3]
> @2 x[4]   x[-3]    ! x[4]   x[5]
> @3 x[2]   x[-1]    ! x[2]   x[7]

Still utterly confused. What are the "@x" x = 0, 1, 2, 3 at the left? 
Is that the coefficient index? Is a new sample shifted in each time
into x[0], and the remaining samples shifted down? Specifically, 
for each new sample do we do the following:

x[-7] = x[-6]
x[-6] = x[-5]
x[-5] = x[-4]
x[-4] = x[-3]
x[-3] = x[-2]
x[-2] = x[-1]
x[-1] = x[0]
x[0] = a

? If so, why wouldn't the pattern be constant? I.e., why do we need
these two columns? Is t time?

Perhaps it would be easier just to write the code (in 54x assembly)
and post it? 

Here is the solution I settled on. Essentially I create a
buffer of length N+L-1, where L is the length of the filter and N is
the block length, copy the new data in, and then do a straight,
non-circularly-indexed convolution within it.  In my design, L = 161
and N = 160. The buffer is arranged so that the first word of each new
block of 160 words is located at relative buffer address pcmBuffer+

;
; Get the input data buffer address into AR1
;
            STM      #pcmBuffer+80, AR1               ; [2]
;
; Set arithmetic modes required for this convolution
;
            RSBX     FRCT                             ; [1] no left shift by one on multiply
            RSBX     CMPT                             ; [1] no MAR compatibility mode
            SSBX     SXM                              ; set SXM for FIRS
;
; Perform the FIR filtering:
;
FIRFilter:
;
;    b. setup block repeat count
;
            STM      #PCM_BLOCK_SIZE-1, BRC           ; [2]
;
;    d. convolve
;
            RPTB     BbConvolveEnd-1                  ; [4]
;
;    e. Make initial computation for FIRS and setup data pointers AR2 and AR3 
;
            MVMM     AR1, AR2                         ; [1*]
            MVMM     AR1, AR3                         ; [1*]
            LD       *AR2-, 16, A                     ; [1*]
            MAR      *AR3+                            ; [1*]
;
            RPTZ     B, #((BB_FILTER_SIZE-1)/2)+1-1   ; [2*]
            FIRS     *AR2-, *AR3+, #filterCoefficients; [(FILTER_SIZE-1)/2*]
;
            SFTA     B, 16-15                         ; [1*]
            SAT      B                                ; [1*]
            STH      B, *AR7+                         ; [1*]
            LD       *AR1+, A                         ; [1*] bogus read - just get AR1 modified
                                                      ; [91*160]
;
BbConvolveEnd:
;
; Move the new data into the old data:
;
            STM      #pcmBuffer+PCM_BLOCK_SIZE, AR2   ; [2]
            STM      #pcmBuffer, AR3                  ; [2]
            RPT      #PCM_BLOCK_SIZE-1                ; [1]
            MVDD     *AR2+, *AR3+                     ; [PCM_BLOCK_SIZE]
;
            RET                                       ; [5]
                                                      ; 182 + 91*160 = 14742
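
For readers without a 54x at hand, the same buffering idea can be sketched in Python. This is a model of the data flow only: it ignores the FRCT/SXM settings, the shift and the saturation, it uses tiny made-up sizes instead of L = 161 and N = 160, and `make_block_filter` is a hypothetical name. The history move keeps the newest L-1 samples, which coincides with copying one block only when L-1 equals N, as it does in the routine above.

```python
def make_block_filter(h, block_len):
    """Block FIR on a linear buffer; h is symmetric with an odd number of taps."""
    taps = len(h)
    half = (taps - 1) // 2
    centre_first = [h[half + k] for k in range(half + 1)]  # centre tap, then outward
    buf = [0] * (block_len + taps - 1)    # taps-1 history words, then the new block

    def process(block):
        buf[taps - 1:] = block            # copy the new data in behind the history
        out = []
        for i in range(block_len):
            centre = half + i             # centre of the filter window for output i
            acc = centre_first[0] * buf[centre]          # pre-computed first product
            for k in range(1, half + 1):                 # the repeated FIRS
                acc += centre_first[k] * (buf[centre - k] + buf[centre + k])
            out.append(acc)
        buf[:taps - 1] = buf[block_len:]  # move the newest samples down as history
        return out

    return process


h = [1, 2, 3, 2, 1]                       # hypothetical 5-tap symmetric filter
x = [4, -2, 7, 1, 0, 5, -3, 6]
block_filter = make_block_filter(h, 4)
y = block_filter(x[:4]) + block_filter(x[4:])
print(y)
```

The price of avoiding circular addressing is the extra copy of the history at the end of every block, the same trade the MVDD loop makes.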
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Randy Yates ● September 17, 2004

Randy Yates <randy.yates@sonyericsson.com> writes:

> block of 160 words is located at relative buffer address pcmBuffer+

160. 

:)
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
