I know that the C64x+ has a circular addressing mode. But do I have to use assembly to implement a circular buffer? In C, you cannot specify the register used for addressing. Does anyone know how to implement it in C? Thank you
Circular addressing mode in TI C64x+
Started by ●March 9, 2008
Reply by ●March 9, 2008
Kevin wrote:
> I know that C64x+ has the circular addressing mode. But, do I have to
> use the assembly to implement a circular buffer?

Hi Kevin,

yes, you have to use assembly. The compiler won't do it for you.

A little tip: Before you dive into assembly, simulate the circular addressing inside the address calculation in C and compile with and without this address calculation. Pointer update is rarely the execution path with the longest dependency chain.

Chances are very good that "hand-made" circular addressing runs as fast as the real thing.

Good luck,
Nils
Reply by ●March 9, 2008
Nils wrote:
> A little tip: Before you dive into assembly, simulate the circular
> addressing inside the address calculation in C and compile with and
> without this address calculation. Pointer update is rarely the execution
> path with the longest dependency chain.
>
> You have very good chances that "hand made"-circular addressing runs as
> fast as the real thing.

Checking for pointer wraparound on every update can introduce significant overhead. If this happens inside a loop, you can break the loop into two loops: one before the wraparound and one after it. Then the check has to be done only once.

BTW, the VDSP C++ compiler optimizes modulo indexes into circular addressing. So expressions like foo = bar[(i+1)%BUFFER_SIZE] are translated into efficient code.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
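A minimal sketch of the loop-splitting idea described above, in plain C. The function name `sum_circular` and its parameters are made up for illustration and the summation merely stands in for whatever the loop body does; nothing here is specific to the C64x+ or to any TI library.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `count` samples of a circular buffer holding `size` elements,
   beginning at index `start`. All names are illustrative only. */
long sum_circular(const short *buf, size_t size, size_t start, size_t count)
{
    assert(start < size && count <= size);

    /* How many samples can be read before the end of the buffer. */
    size_t first = size - start;
    if (first > count)
        first = count;

    long acc = 0;
    size_t i;

    /* Loop 1: from `start` up to the wrap point, no wrap check inside. */
    for (i = 0; i < first; i++)
        acc += buf[start + i];

    /* Loop 2: the remaining samples, read from the start of the buffer. */
    for (i = 0; i < count - first; i++)
        acc += buf[i];

    return acc;
}
```

The wrap position is computed once, outside the loops, so both loop bodies are plain linear accesses that a compiler is free to pipeline.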
Reply by ●March 9, 2008
Hi Vladimir,

Vladimir Vassilevsky wrote:
> BTW, the VDSP C++ compiler optimizes the modulo indexes into the
> circular addressing. So, the expressions like foo =
> bar[(i+1)%BUFFER_SIZE] are translated into the efficient code.

Does VDSP generate code for the TI C64x? That would be interesting to know. I was under the impression that only CCS from TI can compile for that architecture.

Anyway, the circular addressing of the C64x can only do power-of-two ranges, so all it takes to simulate it in software is a single AND on the index. That can go into the L, S and D execution units. It's unlikely that a C-compiled loop is so dense that there isn't a single unused execution unit. I've never seen such dense loops in practice except for *very* simple loops.

Nils
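As a rough illustration of the single-AND trick for power-of-two sizes, here is a sketch in portable C. The helper name `wrap_index` is hypothetical; how the compiler schedules the AND on a given device is not something this snippet can show.

```c
#include <assert.h>

/* Wrap `index` into the range [0, size) for a power-of-two `size`.
   For such sizes, index & (size - 1) gives the same result as
   index % size but needs only one AND. Illustrative helper. */
unsigned wrap_index(unsigned index, unsigned size)
{
    /* The mask trick is only valid when size is a power of two. */
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return index & (size - 1u);
}
```

A typical use would be `sample = buf[wrap_index(i + 1u, 16u)];` in place of `buf[(i + 1) % 16]`.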
Reply by ●March 11, 2008
I am a newbie here and I hope to learn something from this discussion.

Why do you need circular addressing? Is this for managing an I/O buffer or inside a compute loop?

I have been wondering why one would need circular addressing in a compute loop such as a FIR filter, especially with a 64x+ device. I have used circular addressing with a 55xx family device. Writing assembly code for a 55xx device is relatively simple compared to 64x+. The introduction of SPLOOP in 64x+ has helped address pipeline issues, but assembly-level programming is still not easy. TI has been encouraging the user community to use either intrinsics or linear assembly instead of assembly. My experience so far is that the use of intrinsics and pragmas can improve execution efficiency so much that it is hard to justify anything in assembly or linear assembly.

Here are a couple of TI documents which I found very useful:

"Introduction to Compiler Consultant" (SPRAA14, http://focus.ti.com/lit/an/spraa14/spraa14.pdf) by George Mock.
"Hand-Tuning Loops and Control Code on the TMS320C6000" (SPRA666, http://focus.ti.com/lit/an/spra666/spra666.pdf) by Elana Granston.

When I looked at the source code for dsplib and imglib I did not come across anything that utilized built-in features for circular addressing.

In the case of a FIR filter, it may be more efficient to copy the samples to the history buffer first, carry out the filtering operations, and then update the history buffer. The increased execution efficiency you get in the filter loop may be enough to justify the overhead associated with copying. This depends on the relative sizes of the filter and the sample set.

Perhaps the GURUs, Gurus and gooroos of this forum can enlighten us a bit.
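The copy-then-filter approach for a FIR filter could look roughly like the sketch below. It is an assumption-laden illustration, not code from dsplib: `NTAPS`, `NBLOCK`, `fir_state` and `fir_block` are invented names, and the block and filter sizes are arbitrary.

```c
#include <assert.h>
#include <string.h>

#define NTAPS  4   /* filter length (illustrative) */
#define NBLOCK 8   /* samples per block (illustrative) */

/* work[] holds the NTAPS-1 most recent old samples followed by
   NBLOCK new samples, so the filter loop never has to wrap. */
typedef struct {
    short work[NTAPS - 1 + NBLOCK];
} fir_state;

void fir_block(fir_state *s, const short *coef, const short *in, int *out)
{
    int n, k;

    assert(s != 0 && coef != 0 && in != 0 && out != 0);

    /* 1. Copy the new samples in behind the history. */
    memcpy(&s->work[NTAPS - 1], in, NBLOCK * sizeof(short));

    /* 2. Filter with purely linear addressing:
          out[n] = sum over k of coef[k] * x[n - k]. */
    for (n = 0; n < NBLOCK; n++) {
        int acc = 0;
        for (k = 0; k < NTAPS; k++)
            acc += coef[k] * s->work[n + NTAPS - 1 - k];
        out[n] = acc;
    }

    /* 3. Keep the last NTAPS-1 samples as history for the next block. */
    memmove(&s->work[0], &s->work[NBLOCK], (NTAPS - 1) * sizeof(short));
}
```

The copying cost is one `memcpy` of `NBLOCK` samples plus one `memmove` of `NTAPS-1` samples per block, which is why the trade-off depends on the relative sizes of the filter and the block.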
Reply by ●March 11, 2008
RamachandraPailoor@gmail.com wrote:
> I am a newbie here and I hope to learn something from this discussion.
>
> Why do you need circular addressing? Is this for managing an i/o
> buffer or inside a compute loop?

A circular buffer is an efficient way to implement a shift register in a computer. Instead of moving the data, one moves pointers to the two important data elements: the next datum to be read from the buffer (the output), and the next datum to be overwritten by new data (the input). In most DSP applications, those can be the same. You might be able to take it from there, sparing me the typing of an example, but if you let me know that the spark doesn't leap into flame I'll elaborate.

...
>> Thank you

You're welcome.

Jerry
--
Engineering is the art of making what you want from things you can get.
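For readers who would like the example spelled out, here is one possible sketch of a circular buffer used as a shift register, for the common case where the input and output positions coincide. The names `delay_line`, `delay_push` and `DELAY_LEN` are invented for this illustration, and the length is assumed to be a power of two so the wrap is a single AND.

```c
#include <assert.h>

#define DELAY_LEN 4u   /* illustrative; must be a power of two */

typedef struct {
    short data[DELAY_LEN];
    unsigned head;     /* slot holding the oldest sample, overwritten next */
} delay_line;

/* Overwrite the oldest sample with `in` and return the sample that was
   replaced. The data never moves; only the index advances. */
short delay_push(delay_line *d, short in)
{
    short oldest;

    assert(d != 0);
    oldest = d->data[d->head];
    d->data[d->head] = in;
    d->head = (d->head + 1u) & (DELAY_LEN - 1u);
    return oldest;
}
```

Each call behaves like clocking a `DELAY_LEN`-stage shift register: the value pushed now comes back out `DELAY_LEN` calls later.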
Reply by ●March 13, 2008
First, thank you all for the discussion. I learned a lot from you guys.

I am a newbie myself. In my case, the circular addressing is for managing an I/O buffer. It is an acquisition process. The channel samples are constantly written to memory at a very high speed by an FPGA. The DSP detects the start of a packet. Usually, the acquisition is done by hardware.
Reply by ●March 13, 2008
Kevin,

From your comments it sounds like the FPGA is generating the write addresses and is somehow able to generate addresses for circular writes. Am I right? Does the DSP get events marking the arrival of input samples?

Do you have to "process" directly from this input buffer for start-of-packet detection? Can you afford to copy chunks of input data to a local buffer and then carry out the processing? I cannot comment further without a better picture of your situation.

In some cases, the overhead associated with copying is more than offset by the increased execution efficiency you get when you can make assumptions about the input, such as alignment and size. What I am trying to say is that there may be ways to circumvent your inability to efficiently generate circular addresses.

Rama
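Copying a chunk out of the circular input buffer into a linear local buffer might be sketched as below. This is generic C under stated assumptions: `copy_from_circular` and its parameters are hypothetical names, and on a real C64x+ system the copy would more likely be handed to a DMA engine than done with `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` samples, starting at index `start` of a circular buffer
   with `ring_size` elements, into the linear buffer `dst`.
   Illustrative helper, not part of any TI library. */
void copy_from_circular(short *dst, const short *ring,
                        size_t ring_size, size_t start, size_t count)
{
    size_t first;

    assert(start < ring_size && count <= ring_size);

    /* Part 1: from `start` up to the end of the ring (or fewer). */
    first = ring_size - start;
    if (first > count)
        first = count;
    memcpy(dst, ring + start, first * sizeof *dst);

    /* Part 2: whatever is left, taken from the start of the ring. */
    memcpy(dst + first, ring, (count - first) * sizeof *dst);
}
```

After the copy, `dst` is contiguous and can be placed at whatever alignment the processing loop prefers, which is the kind of assumption about the input mentioned above.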
Reply by ●March 13, 2008
Kevin wrote:
...
>> Perhaps the GURUs, Gurus and gooroos of this forum can enlighten us a
>> bit.

Those of us who are short on theory but make things work anyway are can-gurus. We tend to jump from topic to topic.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●March 13, 2008
<RamachandraPailoor@gmail.com> wrote in message news:2aba8822-f5cf-46c8-b240-18d7db72c8d8@i7g2000prf.googlegroups.com...
> Kevin,
>
> From your comments it sounds like the FPGA is generating the write
> addresses and somehow it is able to generate addresses for circular
> writes. Am I right? Does the DSP get events marking the arrival of
> input samples?

Yes, you are right. And the DSP needs to be notified of the input samples.

> Do you have to "process" directly from this input buffer for start of
> packet detection? Can you afford to copy chunks of input data to a
> local buffer and then carry out the processing? I can not comment
> further without a better picture of your situation.
>
> In some cases, the overhead associated with copying is more than
> offset by increased execution efficiency you get when you can make
> assumptions about input such as alignment and size. What I am trying
> to say is that there may be ways to circumvent you inability to
> efficiently generate circular addresses.

I agree with you. We are exploring many options to find the best solution. For now, I don't think we can afford to copy the input data to a local buffer and then process it. The processing is not dword-aligned. You are saying that we can copy the input data to a local buffer so that we can greatly improve the efficiency, right? Well, it seems that it doesn't improve that much (_amemd8 vs. _memd8). By the way, it is just a correlation. Maybe my code is not efficient enough. I have not tried all the options yet.






