Hi,

I've run into an extremely weird EDMA problem. Please help me figure it out. Many thanks!

My system is a wireless baseband processing system. The digital signal processing part incorporates a Virtex-E 2600 FPGA and a TI C6416. In every 20 ms wireless frame, the DSP needs to write several segments of data to FPGA RAM via EDMA channels. The amount of data is not large, so the normal transfer time is about 20 us. But sporadically (about once every several minutes) a single EDMA write takes very long, up to about 500 us, and then the transfer time returns to its normal value. EDMA reads work properly; only writes have this problem.

The DSP RAM is connected to the FPGA RAM through EMIFA CE3. The related configuration is as follows:

Memory Type: 32-bit async. interf.
Read Strobe Width: 4
Read Setup Width: 2
Read Hold Width: 1
Write Strobe Width: 26
Write Setup Width: 5
Write Hold Width: 3
Turn around time: 3
Sync. interf. data read latency: 2
Sync. interf. data write latency: 0
CE Extension Register: Inactive
Read Enable Enable: ADS mode
Synchronization clock: Sync. to ECLKOUT1

The DSP processor clock is 480 MHz. The FPGA memory clock is 80 MHz.

All the EDMA channels have the priority "Urgent" (is this a possible cause of an EDMA transfer being blocked? But the transfers cannot happen simultaneously).

Please help me figure out what's wrong. Thanks in advance for any suggestions!

Rose
Help!!! An EDMA problem!
Started by ●December 9, 2004
Reply by ●December 9, 2004
Don't put all your EDMA channels at the same priority. That will definitely cause problems as the EDMA becomes more heavily loaded.

Take the example of a transfer to external memory. Due to the slowness of external memory, the EDMA cannot transfer data every cycle. If you have transfers on other priority levels, then those transfers will take place while you're waiting to write to external memory again. On the other hand, if you're using everything in one queue, then all your other transfers will have to wait for the current transfer to finish.

Typically you should put the shorter, faster transfers at the higher priority levels. It's the same sort of philosophy as ISR prioritization.

Brad
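A toy model can make the single-queue effect concrete. The sketch below is not EDMA code: it assumes every transfer is submitted at time zero, that each priority queue drains strictly first-in first-out, and that queues do not slow each other down. The queue labels, transfer names, and durations (40 us and 1 us) are made-up illustration values, not measurements from this system.

```python
from collections import defaultdict


def completion_times(transfers):
    """Return {name: completion time in us} for a list of transfers.

    transfers: list of (name, queue, duration_us), all submitted at t = 0
    in list order. Toy assumptions: each queue serves its own transfers
    first-in first-out, and different queues progress independently.
    """
    busy_until = defaultdict(int)  # per-queue time at which the queue is free
    done = {}
    for name, queue, duration_us in transfers:
        busy_until[queue] += duration_us
        done[name] = busy_until[queue]
    return done


# Everything on one queue: the short copy waits behind the slow write.
same_queue = completion_times([
    ("emif_write", "urgent", 40),
    ("short_copy", "urgent", 1),
])

# Short transfer on its own, higher-priority queue: it no longer waits.
split_queues = completion_times([
    ("emif_write", "low", 40),
    ("short_copy", "urgent", 1),
])

print(same_queue["short_copy"], split_queues["short_copy"])
```

In this simplified picture the short copy finishes at 41 us when it shares a queue with the slow write and at 1 us when it has a queue of its own. Real hardware interleaves accesses in more complicated ways, so treat the numbers only as an illustration of the ordering effect.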
Reply by ●December 12, 2004
Thanks for your suggestions.

I tried distributing the EDMA channels across different priorities, but the problem still exists. In my project, the EDMA channels that write data from the DSP to the FPGA are started sequentially. That is, a channel that is started later depends on the completion of the channel started before it: if the transfer completion interrupt of channel A does not arrive, the transfer of channel B will not start.

I made a minimal system: all the tasks are removed from DSP/BIOS, and only one EDMA channel is activated by interrupt 4 to transfer data from the DSP to the FPGA. The problem still exists, but the long transfer time (>500 us) happens much less frequently.

Could the problem be caused by something wrong in the EMIFA configuration?
Reply by ●December 12, 2004
Rose,

A few more questions for you:
1) How much data are you transferring?
2) Do you have the L2 cache turned on?
3) How are you measuring the 20 us and 500 us times?
4) What did you configure in the EDMA parameters as the source of the transfer?

Brad
Reply by ●December 13, 2004
Thank you, Brad. Please see my answers to your questions below.

Brad Griffis wrote:
> A few more questions for you:
> 1) How much data are you transferring?

I transfer 224 16-bit words from the DSP to the FPGA each time the channel is started.

> 2) Do you have the L2 cache turned on?

I configured the L2 mode as "4 way cache (0k)" and the L2 requester priority queue as "urgent". Does this mean that the L2 cache is turned on?

> 3) How are you measuring the 20us and 500us times?

After calling the API function EDMA_setChannel to initiate a transfer, I call the API function GPIO_pinWrite to drive GPIO pin 0 high. After the transfer completes, the EDMA interrupt posts a software interrupt, in which I use if... if... conditions to process the different channels. When the branch for the channel that writes data from the DSP to the FPGA is entered, GPIO_pinWrite is called again to drive GPIO pin 0 low. I think the pulse width on GPIO pin 0 represents the transfer time of the corresponding channel. I use an oscilloscope and a logic analyzer to capture the long pulses.

> 4) What did you configure in the EDMA parameters as the source of the
> transfer?

I configured the name of a global array as the source address of the transfer; the array is declared in the DSP/BIOS CSL extern declaration section.

BTW, I noticed that my 6416 chip is marked "TMX320C6416GLZ". It does not seem to be a released, reliable silicon version. Could this be the cause of my problem? Where can I find the bug list for this silicon version?
Reply by ●December 13, 2004
Rose,

More questions:
1) Does the FPGA have a 32-bit wide interface which you can use to write to it? You said before that the EMIF was configured as 32-bit asynchronous, though you then said you do 224 16-bit writes. Perhaps you have a 32-bit interface, so that this is implemented as 112 32-bit writes?
2) Please tell me either how you configured the cache or, better yet, the value of the CCFG register. Your description, "4 way cache (0k)", isn't one of the options in the Two Level Memory Guide. If there isn't any cache configured, how can it be 4-way associative?
3) Do you have any other interrupts in your system besides the EDMA transfer completion interrupt?
4) Are you using any of the DSP library functions (filters, etc.), or do you have any loops with lots of number crunching going on?
5) Are you running any code from external memory such as SDRAM?
6) Are you starting the EDMA transfer in an ISR generated by EXT_INT4? Rather than using EDMA_set you could have the transfer sourced directly from EXT_INT4, such that you never enter the ISR (or you at least eliminate a little bit of code from the ISR).

Brad
Reply by ●December 13, 2004
Brad,

Thanks for your kind attention. Below are my answers:

Brad Griffis wrote:
> More questions:
> 1) Does the FPGA have a 32-bit wide interface which you can use to write to
> it? You said before the EMIF was configured as 32-bit asynchronous though
> below you said you do 224 16-bit writes. ? Perhaps you have a 32-bit
> interface so that is implemented as 112 32-bit writes?

The FPGA does have a 32-bit wide memory interface. The EDMA channel I mentioned is configured to update the source address (DSP internal memory) in 16-bit steps and the destination address (FPGA SRAM) in 32-bit steps. In fact, only the FPGA's memory interface is 32 bits wide; the internal data elements are 16 bits wide in order to save memory space.

> 2) Please tell me either how you configured the cache or better yet the
> value of the CCFG register. Your description, "4 way cache (0k)", isn't one
> of the options in the Two Level Memory Guide. If there isn't any cache
> configured how can it be 4-way associative?

The value of CCFG is "0x00000000". I am not familiar with the L2 cache configuration. Could this be a wrong setting?

> 3) Do you have any other interrupts in your system besides the EDMA
> transfer completion interrupt?

Yes, I have several other external interrupts enabled, such as interrupts 4, 5, and 6 for communication between the FPGA and the DSP. In my system, the FPGA performs the chip-rate processing of the CDMA signals, such as de-spreading, and the DSP performs the symbol-rate processing, such as channel estimation and compensation.

> 4) Are you using any of the DSP library functions (filters, etc.) or do you
> have any loops with lots of number crunching going on?

I am not sure what you mean by "DSP library functions (filters)" and "number crunching". I do call some CSL API functions and some of the intrinsics.

> 5) Are you running any code from external memory such as SDRAM.

No. All the code runs from internal RAM. You know, the 6416 has a pretty large 1 MByte internal RAM. :)

> 6) Are you starting the EDMA transfer in an ISR generated by EXT_INT4?
> Rather than using EDMA_set you could have the transfer sourced directly from
> EXT_INT4 such that you never enter the ISR (or you at least eliminate a
> little bit of code from the ISR).

Actually, the EDMA transfer is started in a task; let's call it task "T3". The whole process is like this:

Interrupt 6 initiates an EDMA transfer through which the despread values of the wireless baseband signals are read from the FPGA to the DSP; then a task is activated by posting a semaphore to perform channel estimation and compensation. After this, the compensated signals are converted into soft-decision LLR values for channel decoding. These LLRs are written to the FPGA via EDMA. The FPGA controls the decoding of these data using a separate Turbo codec ASIC. When decoding is over, the FPGA generates an interrupt to inform the DSP that decoding has finished. The DSP then initiates an EDMA transfer to read the decoded information bits (together with the CRC checksum) back to the DSP. Finally, the DSP performs CRC decoding of these data in task "T3" and writes the information bits to the FPGA to count bit errors (a BERT is implemented in the FPGA).

The process is a little bit complex, but we have to do it in such an awkward way because we are using old designs to develop new demo systems and the project schedule is tight. We cannot make too many revisions. :(
Reply by ●December 13, 2004
Your cache configuration is fine. According to your CCFG settings you have the cache disabled (i.e. L2 memory is configured as SRAM).

I'm a little puzzled by the transfer times you're reporting. Based on your setup, hold, and strobe times (34 cycles total) and the amount of data you need to transfer (112 32-bit elements), I don't see how you could ever transfer it in 20 us. I calculate it should take about 47 us.

I think your problem is most likely linked to interrupt latency. You mentioned that you start your EDMA transfer in "T3". However, you did not specify how you measure when it has ended. I'm assuming you're using an EDMA completion interrupt. Is that right? Probably what is happening is that some other interrupt is sometimes holding off execution of your EDMA interrupt.

Brad
Reply by ●December 14, 2004
Sorry, I made a mistake with the EMIFA config. The parameters I actually used are:

Write Strobe: 9
Write Setup: 4
Write Hold: 3

Yes, I use the EDMA interrupt routine to handle the transfer completion event of each channel. Can you give me a hint about how to check whether any other interrupt is holding off the EDMA interrupt?
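The two sets of write timings can be compared with a quick back-of-the-envelope calculation. The sketch below assumes that each asynchronous write costs setup + strobe + hold EMIF clock cycles, that the EMIF clock (ECLKOUT1) runs at the 80 MHz quoted for the FPGA memory clock, and that the block goes out as 112 32-bit writes. None of these assumptions is confirmed in the thread, and any extra cycles between accesses are ignored.

```python
def async_write_time_us(setup, strobe, hold, n_writes, eclk_mhz):
    """Estimated duration in microseconds of n_writes asynchronous writes.

    Assumes one write costs (setup + strobe + hold) EMIF clock cycles and
    that writes follow each other back to back with no extra cycles.
    """
    cycles_per_write = setup + strobe + hold
    return cycles_per_write * n_writes / eclk_mhz


N_WRITES = 112   # assumption: 224 16-bit words sent as 112 32-bit writes
ECLK_MHZ = 80    # assumption: ECLKOUT1 equals the 80 MHz FPGA memory clock

# Timings from the first post: setup 5, strobe 26, hold 3 (34 cycles).
first_posted = async_write_time_us(5, 26, 3, N_WRITES, ECLK_MHZ)

# Corrected timings: setup 4, strobe 9, hold 3 (16 cycles).
corrected = async_write_time_us(4, 9, 3, N_WRITES, ECLK_MHZ)

print(f"first posted: {first_posted:.1f} us, corrected: {corrected:.1f} us")
```

Under these assumptions the originally posted timings give about 47.6 us, in line with the estimate of roughly 47 us, while the corrected timings give about 22.4 us, which is close to the 20 us normally observed. Changing `N_WRITES` to 224 models the case where every 16-bit word is written separately.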
Reply by ●December 14, 2004
Are you using DSP/BIOS? If so, are you using the Interrupt Dispatcher (i.e. did you check the box on the "dispatcher" tab of the HWI properties)? If you're using the Interrupt Dispatcher, that should make things easier, as the RTOS kernel will always decide which thread should be running. Any HWI of higher priority than your EDMA completion interrupt will hold it off from running if that HWI is triggered.

The other possibility is that some uninterruptible code is running. Whenever you are in the delay slots of a branch on the 64xx, you are uninterruptible. This is especially evident in a very tight loop (common in filters, etc.), since you remain in the delay slots for the entire duration of the loop if the loop is less than 5 cycles total. This is what I was alluding to when asking if you were using any of the DSP libraries (dsplib). These are C-callable assembly code libraries, written by TI, for things such as filters.

Brad
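To judge whether uninterruptible code could plausibly account for the outliers, it helps to turn the delay into CPU cycles. The sketch below is plain arithmetic, not a measurement: it assumes the extra delay is the difference between the 500 us outlier and the 20 us normal case, uses the 480 MHz CPU clock from the first post, and takes a 4-cycle loop body as a hypothetical example of a loop short enough to stay uninterruptible.

```python
def cycles_in(duration_us, cpu_mhz):
    """Number of CPU cycles that elapse in duration_us at cpu_mhz."""
    return duration_us * cpu_mhz


def holdoff_us(iterations, cycles_per_iteration, cpu_mhz):
    """Time in microseconds that a loop runs, i.e. how long it could
    hold off interrupts if it is uninterruptible for its whole duration."""
    return iterations * cycles_per_iteration / cpu_mhz


CPU_MHZ = 480                 # CPU clock from the first post
extra_delay_us = 500 - 20     # assumption: outlier minus normal pulse width

blocked_cycles = cycles_in(extra_delay_us, CPU_MHZ)

# Hypothetical 4-cycle loop body: iterations needed to fill the whole gap.
iterations_needed = blocked_cycles // 4

print(blocked_cycles, iterations_needed,
      holdoff_us(iterations_needed, 4, CPU_MHZ))
```

With these figures the gap corresponds to 230,400 CPU cycles, or 57,600 iterations of the hypothetical 4-cycle loop. A single tight loop would have to run that long without a break, so higher-priority interrupts or several such loops in a row are also worth checking.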






