
Dredge Up ADSP-21161 RTI(DB) Thread

Started by Unknown May 23, 2006
Does anyone know if this anomaly had / has a resolution or work-around?
 ( See below for old posts )

Here is what my group is experiencing:

My project uses a 21161 with the following architecture:

3 x 16-bit SRAM ( for execution of 48-bit instructions )
16-bit flash for NV storage of program and data

2 ADC connected via SPORT0 and SPORT2

UART connected to IRQ0

Memory mapped custom ARINC-429 Communications PLD connected to IRQ2

We are using v3.5 of the tools.  The majority of the code is written in
C.  The only assembly is the code written by Analog Devices, slightly
modified ( mostly to remove unused code ).  We are not using the VDK.

We run the majority of our code out of external SRAM.  The Analog
Devices libraries that we used are located in internal RAM, bank 0.
We chose to use the provided interrupt() handling for all of our
interrupts.

We are seeing a problem where our system will reset aperiodically.
It was first seen in the field, but by increasing ARINC bus traffic,
we are able to make it happen more frequently.  The symptoms are
varied, and we are having a very difficult time tracking down the
problem.

Symptoms include:
- Failed CRC checks on program memory, and configurable parameter
memory ( configurable parameters stored in external SRAM during
execution ).  We force a reset in these cases.

- Corrupt Data being transmitted on ARINC.

- Processor execution getting 'lost' and the watchdog resets.

- Changes in code ( adding / removing instructions ) can make the
problem better or worse.

We have implemented a scheme to instrument where the code is
executing ( writing data to unused external RAM ) and then dumping
that data out upon reset.  I am not seeing anything specific in the
execution where the instrumentation stops after a certain ISR or
function.  With reduced incoming ARINC bus traffic ( only essential
data ), I was able to run the system for 6 days without it resetting.
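
For readers who want to try the same kind of instrumentation, here is a minimal sketch in C of a checkpoint trace kept in a ring buffer. It is not the code used in this project: the names trace_buf, trace_mark and trace_dump are hypothetical, and placing the buffer in unused external RAM ( via the linker description file ) is assumed rather than shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout. On the target, trace_buf would be
   placed in an unused external RAM region that survives a reset. */
#define TRACE_DEPTH 64u            /* must be a power of two */

typedef struct {
    uint32_t head;                 /* total number of entries ever written */
    uint32_t entry[TRACE_DEPTH];   /* most recent checkpoint IDs */
} trace_log_t;

static volatile trace_log_t trace_buf;

/* Record a checkpoint ID, e.g. at ISR entry/exit or function boundaries. */
static void trace_mark(uint32_t id)
{
    uint32_t slot = trace_buf.head & (TRACE_DEPTH - 1u);
    trace_buf.entry[slot] = id;
    trace_buf.head = trace_buf.head + 1u;
}

/* Copy the newest `count` checkpoints into out[], oldest first.
   Returns the number of entries actually copied. */
static uint32_t trace_dump(uint32_t *out, uint32_t count)
{
    uint32_t head = trace_buf.head;
    uint32_t avail = (head < TRACE_DEPTH) ? head : TRACE_DEPTH;
    uint32_t i;

    assert(out != 0);
    if (count > avail) {
        count = avail;
    }
    for (i = 0; i < count; i++) {
        out[i] = trace_buf.entry[(head - count + i) & (TRACE_DEPTH - 1u)];
    }
    return count;
}
```

Note that trace_mark is not atomic: with nested interrupts enabled, a higher priority ISR that fires between the slot write and the head update can overwrite one entry. For a debugging aid that may be acceptable; otherwise the two statements would need to run with interrupts masked.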


With the sporadic nature of the problem, we are focusing on the
interrupt handling.  We have already converted the UART to be polled
instead of interrupt driven, but that hasn't eliminated our problem.

Questions:
1.  Is anyone using the 21161 with the interrupt handling provided by
Analog Devices ( vectoring scheme, we have our own C handlers )?

2.  Has anyone had any problems similar to this?

3.  I'd like to use the HW breakpoints to try to catch
the 'culprit' code corrupting memory that is not supposed to be
written to after it is loaded.  Is anyone familiar with using them?  I
tried once, but it broke constantly, and the breaks did not appear
to happen on the accesses I had asked for.

4.  Are there any schemes for protecting areas of external RAM that
don't require HW?

5.  Any ideas on how to track down something that is trampling
registers or memory, but not in any sort of identifiable pattern?

6.  If we migrate to v4.0 of the tools, we will need to perform a
significant amount of re-test and documentation and delay the
delivery of our product to our customer ( probably 2 months and we
are already behind schedule ).  Is it worth the effort to upgrade?
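
Regarding question 4, one software-only approach is to "seal" a region that should never change with a checksum and re-check it at chosen points ( for example at the exit of each ISR ) to narrow down when the corruption happens. The sketch below is only an illustration of that idea in portable C: guarded_region_t, region_seal and region_intact are hypothetical names, the CRC is the common CRC-16/CCITT-FALSE variant, and the byte-wise loop would need adapting to the target's word size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
   no bit reflection, no final XOR. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    size_t i;
    int bit;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Hypothetical descriptor for a region that must stay constant. */
typedef struct {
    const uint8_t *base;
    size_t len;
    uint16_t expected;
} guarded_region_t;

/* Record the checksum of the region once it has been loaded. */
static void region_seal(guarded_region_t *r, const uint8_t *base, size_t len)
{
    r->base = base;
    r->len = len;
    r->expected = crc16_ccitt(base, len);
}

/* Returns 1 if the region still matches its sealed checksum, else 0. */
static int region_intact(const guarded_region_t *r)
{
    return crc16_ccitt(r->base, r->len) == r->expected;
}
```

Splitting the protected memory into many small sealed regions keeps each check cheap and also shows which block was hit, not only that something was.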


Thanks in advance for any comments, suggestions, wisdom, etc. that
you can provide.

Regards,

Robert Allen
Senior Software Engineer
Goodrich Sensor Systems








>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Old Posts >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
I spent a bunch of time tracking down an issue with the 21161 SHARC, so
I thought I'd mention it to you guys. The issue has to do with the
timing of the automatic context switching on returning from interrupts,
specifically with SIMD mode and a delayed RTI instruction.

On the SHARC, when certain interrupts are received, the MODE1 and ASTAT
registers are automatically pushed onto the stack, and then popped off
when the interrupt routine is finished. This all happens transparently
to the user. When returning from an interrupt, you can use the syntax
RTI (DB); (delayed branch) so that you can squeeze in 2 more
instructions instead of suffering a 2-cycle pipeline hit. It appears
that the timing of the automatic popping of the MODE1 register is such
that the 2nd delayed instruction executes with the incorrect SIMD mode
setting. An example will illustrate.

main code:
    assume SIMD mode is always turned on here

interrupt code:
    interrupt happens, MODE1 is pushed
    interrupt code explicitly turns SIMD off and does processing

at the end of the ISR:
    RTI (DB);
    r0 = 1234;           //first instruction
    dm(0x50000) = r0;    //second instruction
    //return actually happens here

The result of this code is that the second instruction,
dm(0x50000) = r0, performs the additional (aka implicit) write
dm(0x50001) = s0 because SIMD mode was on in the interrupted code. This
will corrupt location 0x50001, which is what was happening to me.

Does this sound like a hardware anomaly to you guys? If so, who should
I report it to? The documentation seems to be silent on the timing of
the popping of the status registers and a delayed RTI.

From: Mattias Schick <mattias.schick@lawo.de>
Date: Mon, May 24 2004 4:35 am
Groups: comp.dsp

Hi Jon,
that is not an anomaly of the chip.
The "effect latency" of the MODEx and ASTATx registers is one cycle
(HW-RefMan Table 3-2). So the first instruction in the delayed branch
is executed with the unchanged state of MODE1 and ASTAT, the second
instruction with the popped values.

Mattias

From: Andor Bariska <an2or@nospam.net>
Date: Mon, May 24 2004 4:35 am
Groups: comp.dsp

Jon Harris wrote:
...
> On the SHARC, when certain interrupts are received, the MODE1 and
> ASTAT registers are automatically pushed onto the stack, and then
> popped off when the interrupt routine is finished. This all happens
> transparently to the user. When returning from an interrupt, you can
> use the syntax RTI (DB); (delayed branch) so that you can squeeze in
> 2 more instructions instead of suffering a 2-cycle pipeline hit.

Jon, are you sure the SHARC does not behave the same way?

I would think that the automatic pop of the status stack at the RTI has
the same latency as the "pop sts" instruction, or a direct write to the
mode1 register, namely one instruction cycle.

This would be consistent with you finding that the second instruction
after the RTI already reacts to the popped mode1 register.

But, as you said, the manual is silent about this. I think I also
noticed different behaviour in the simulator as compared to the target
hardware in this issue.

Regards,
Andor

From: Jon Harris <goldentully@hotmail.com>
Date: Mon, May 24 2004 11:22 am
Groups: comp.dsp

"Andor Bariska" <an2or@nospam.net> wrote in message
news:40b1c1fa$1@pfaff2.ethz.ch...
> Jon Harris wrote:
> ...
> > On the SHARC, when certain interrupts are received, the MODE1 and
> > ASTAT registers are automatically pushed onto the stack, and then
> > popped off when the interrupt routine is finished. This all happens
> > transparently to the user. When returning from an interrupt, you
> > can use the syntax RTI (DB); (delayed branch) so that you can
> > squeeze in 2 more instructions instead of suffering a 2-cycle
> > pipeline hit.
>
> Jon, are you sure the SHARC does not behave the same way?

I'm confused about this. I'm only talking about the SHARC.

> I would think that the automatic pop of the status stack at the RTI
> has the same latency as the "pop sts" instruction, or a direct write
> to the mode1 register, namely one instruction cycle.
>
> This would be consistent with you finding that the second instruction
> after the RTI already reacts to the popped mode1 register.
>
> But, as you said, the manual is silent about this. I think I also
> noticed different behaviour in the simulator as compared to the
> target hardware in this issue.

Another odd thing I noticed is that the effect of the register/DAG
primary/secondary select bits also in MODE1 seems to be different than
the SIMD bit. I haven't confirmed this for sure, but it sure looks like
that 2nd instruction after the RTI still uses the interrupt code's
register/DAG select setting rather than that of the interrupted code.

I guess the bottom line here is to use RTI(DB) with caution!

From: Bernhard Holzmayer <holzmayer.bernhard@deadspam.com>
Date: Tues, May 25 2004 12:31 am
Groups: comp.dsp

Jon Harris wrote:
> ...
> I guess the bottom line here is to use RTI(DB) with caution!

So do I...
I was just tracing a bug around such a RTI(db) command, and I found
that it works fine if I code it as:

    pop sts;
    rti;

while it fails if I modify it to

    rti(db);
    pop sts;
    nop;

I observed another related issue (which meanwhile made it into the
anomaly list): the integrated status push of mode1 and astat fails
under certain circumstances. Eventually, an additional <push sts>
command (and the corresponding pop sts at the return location) helped.
Newest versions of VisualDSP should handle this correctly.

Concerning your initial topic: I checked the anomaly list, but didn't
find a corresponding entry. However, the anomaly list for the 21160
(issue 16) reveals a difference depending on how the mode1 register is
written. In combination with the strange behaviour I observed, I guess
that there are risks around rti(db) in combination with the status
stack recovery.

While it's recommended to either place a nop after a pop sts or avoid
critical accesses immediately after, I assume that this is necessary
too if you deal with the implicit pop sts which rti(db) does when
returning from distinct interrupts like IRQ0..2.

Problems might increase if you have nested interrupts enabled. I'm
wondering if it works correctly if a higher priority interrupt occurs
during the rti(db) ...??

Bernhard

From: Andor <an2or@mailcircuit.com>
Date: Tues, May 25 2004 2:16 am
Groups: comp.dsp

Jon Harris wrote:
> "Andor Bariska" <an2or@nospam.net> wrote in message
...
> > Jon, are you sure the SHARC does not behave the same way?
>
> I'm confused about this. I'm only talking about the SHARC.

Sorry. I thought you were differentiating between SHARC (2106x) family
and Hammerhead (21161). It was my confusion.

> > [...]
>
> Another odd thing I noticed is that the effect of the register/DAG
> primary/secondary select bits also in MODE1 seems to be different
> than the SIMD bit. I haven't confirmed this for sure, but it sure
> looks like that 2nd instruction after the RTI still uses the
> interrupt code's register/DAG select setting rather than that of the
> interrupted code.

I'm sure this isn't the case with the SHARC - don't know (but suspect
the same) for the Hammerhead.

> I guess the bottom line here is to use RTI(DB) with caution!

Definitely. I once had a bug very similar to yours - popping the status
stack in the second instruction after a RTI (DB) :-). The simple
solution was to exchange last with second to last instruction in the
interrupt routine.

Regards,
Andor

From: Jon Harris <goldentully@hotmail.com>
Date: Tues, May 25 2004 10:27 am
Groups: comp.dsp

Andor <an2or@mailcircuit.com> wrote in message
news:ce45f9ed.0405242316.a96ccef@posting.google.com...
> Jon Harris wrote:
> > "Andor Bariska" <an2or@nospam.net> wrote in message
> ...
> > > Jon, are you sure the SHARC does not behave the same way?
> >
> > I'm confused about this. I'm only talking about the SHARC.
>
> Sorry. I thought you were differentiating between SHARC (2106x)
> family and Hammerhead (21161). It was my confusion.

I see. In my usage, SHARC refers to anything in the 21x6x family. It
looks to me that ADI really isn't using the Hammerhead name much
anymore. It doesn't seem to appear in 21161 documentation, and a web
search at analog.com gets only a few hits. I'm going to use part
numbers from here on out to avoid ambiguity.

I have code for both 21065L and 21161. The problem was originally
noticed on the 21161 in conjunction with SIMD mode. I then went back to
look at the 21065L code which is a bit different. The strange thing was
that it looked like it should have had a similar problem with register
and DAG select but didn't!

> > > [...]
> >
> > Another odd thing I noticed is that the effect of the register/DAG
> > primary/secondary select bits also in MODE1 seems to be different
> > than the SIMD bit. I haven't confirmed this for sure, but it sure
> > looks like that 2nd instruction after the RTI still uses the
> > interrupt code's register/DAG select setting rather than that of
> > the interrupted code.
>
> I'm sure this isn't the case with the SHARC - don't know (but suspect
> the same) for the Hammerhead.

When I have a chance I'd like to re-try this on both 21161 and 21065L
and see what the behavior is.

> > I guess the bottom line here is to use RTI(DB) with caution!
>
> Definitely. I once had a bug very similar to yours - popping the
> status stack in the second instruction after a RTI (DB) :-). The
> simple solution was to exchange last with second to last instruction
> in the interrupt routine.

Glad I'm not alone! The fact that we've both suffered from this
indicates that ADI should include some info on this in their
documentation.

From: Jon Harris <goldentully@hotmail.com>
Date: Tues, May 25 2004 6:36 pm
Groups: comp.dsp

"Jon Harris" <goldentully@hotmail.com> wrote in message
news:2hh6p7Fd0kbfU1@uni-berlin.de...
> Andor <an2or@mailcircuit.com> wrote in message
> news:ce45f9ed.0405242316.a96ccef@posting.google.com...
>
> I have code for both 21065L and 21161. The problem was originally
> noticed on the 21161 in conjunction with SIMD mode. I then went back
> to look at the 21065L code which is a bit different. The strange
> thing was that it looked like it should have had a similar problem
> with register and DAG select but didn't!
>
> > > [...]
>
> When I have a chance I'd like to re-try this on both 21161 and 21065L
> and see what the behavior is.

OK, I just tested this and have a definitive answer. The upshot is that
changes to the alternate register bits in MODE1 do not take effect on
the second instruction after RTI (DB), but changes to the SIMD bit do.
This sounds like an anomaly to me--either one or the other is wrong.

Here is my test (in a mix of pseudo code and assembler):

Test A: Main loop code is running with alternate registers selected

Interrupt code:
    set r14 and r15 primary and alternate to known values
    select primary registers
    RTI(DB);
    dm(test1) = r14;    // instruction 1
    dm(test2) = r15;    // instruction 2

Results: test1 and test2 both receive the values in primary registers.
This means that the change to MODE1 did NOT take effect until after
instruction 2. Behavior was the same between 21065L and 21161.

Test B: Main loop code is running with SIMD bit turned on

Interrupt code:
    turn off SIMD mode
    set r14/s14 and r15/s15 primary and alternate to known values
    RTI(DB);
    dm(test1) = r14;    // instruction 1
    dm(test2) = r15;    // instruction 2

Results: test1 receives the value in primary r14. Location test1+1 is
not affected (no SIMD write). test2 receives the value in primary r15.
Location test2+1 receives the value in primary s15 (SIMD write
occurred). This means that the change to MODE1 took effect just before
instruction 2. This was on the 21161. I could not try Test B on the
21065L because it doesn't have SIMD mode.

Conclusion: the effect latency for MODE1 is not consistent among
various bits in the case of an automatic pop of STS in an RTI(DB)
instruction. The SIMD bit takes effect before the alternate register
select bits.

From: Bernhard Holzmayer <holzmayer.bernhard@deadspam.com>
Date: Wed, May 26 2004 12:26 am
Groups: comp.dsp

Jon Harris wrote:
> ...
> Conclusion: the effect latency for MODE1 is not consistent among
> various bits in the case of an automatic pop of STS in an RTI(DB)
> instruction. The SIMD bit takes effect before the alternate register
> select bits.

Thanks for this examination. If this is true (and I see no reason to
doubt it...), it will certainly be a great help to avoid conflicts.

Bernhard

From: Mattias Schick <mattias.schick@lawo.de>
Date: Wed, May 26 2004 3:17 am
Groups: comp.dsp

[message text not preserved]

From: Ron Huizen <rhuizen@bittware.com>
Date: Wed, May 26 2004 1:47 pm
Groups: comp.dsp

Jon,

In response to your earlier question of how to report an anomaly, send
a message to dsp.support@analog.com with the details, including your
test code. This will get the process started. The more people report
their problems, the better. It won't even be considered to be fixed in
new silicon if they don't know about it, and even if it never gets
fixed, at least if it's documented people shouldn't get burned (as
badly) by it.

All silicon has anomalies, some companies do better than others at
making them available to the users. ADI does a decent job, posting them
all on their web site. While some people are shocked to learn all
products (including DSPs) have problems, good engineers understand that
everything has problems, and knowing about them is a heck of a lot
better than not.

Ron
-----------
Ron Huizen
BittWare

"Jon Harris" <goldentully@hotmail.com> wrote in message
news:2hi37dFdbdiiU1@uni-berlin.de...
> [full quote of the test results post above snipped]

From: Andor <an2or@mailcircuit.com>
Date: Wed, May 26 2004 5:02 pm
Groups: comp.dsp

Jon Harris wrote:
...
> OK, I just tested this and have a definitive answer.

Thanks for the update.

Jon, as far as I can see you are not a member of the ADSP e-mail group.
This would be the perfect place to post this information, and we sure
can use every experienced hand available :).

http://groups.yahoo.com/group/adsp

Regards,
Andor

From: Jon Harris <goldentully@hotmail.com>
Date: Thurs, May 27 2004 5:05 pm
Groups: comp.dsp

OK, I sent the information to ADI DSP support and also signed up for
ADSP group. I'm not sure how much time I'll have to spend in that
group, but I'll give it a try.

-Jon

"Andor" <an2or@mailcircuit.com> wrote in message
news:ce45f9ed.0405261402.1e68d3cc@posting.google.com...
> Jon Harris wrote:
> ...
> > OK, I just tested this and have a definitive answer.
>
> Thanks for the update.
>
> Jon, as far as I can see you are not a member of the ADSP e-mail
> group. This would be the perfect place to post this information, and
> we sure can use every experienced hand available :).
>
> http://groups.yahoo.com/group/adsp
>
> Regards,
> Andor

From: Jaime Andres Aranguren Cardona <jaime.aranguren@ieee.org>
Date: Thurs, May 27 2004 9:20 pm
Groups: comp.dsp

an2or@mailcircuit.com (Andor) wrote in message
<news:ce45f9ed.0405261402.1e68d3cc@posting.google.com>...
> Jon Harris wrote:
> ...
> > OK, I just tested this and have a definitive answer.
>
> Thanks for the update.
>
> Jon, as far as I can see you are not a member of the ADSP e-mail
> group. This would be the perfect place to post this information, and
> we sure can use every experienced hand available :).
>
> http://groups.yahoo.com/group/adsp

I agree that joining that group is very useful for asking and answering
questions regarding ADI's chips and tools. However, I think that a good
practice is to crosspost to comp.dsp too, because that way your
questions and answers can be viewed by a broader audience.

Regards,

JaaC

> Regards,
> Andor

From: Jon Harris <goldentully@hotmail.com>
Date: Wed, Jun 2 2004 8:05 pm
Groups: comp.dsp

An update, I received this reply from Analog Devices DSP support:

Thank you for contacting Analog Devices DSP Support.

We were able to replicate your problem on the ADSP-21161 EZ-KIT Lite.

Please note that the status stack (ASTAT and mode1 register) should get
popped only after executing the two instructions that are followed by
the delayed branch instruction. That means if instructions

    RTI(db);
    dm(test1) = r14;
    dm(test2) = r15

are executed, status stack should be popped only after executing the
instruction dm(test2) = r15.

The behavior that you/we are seeing in case of alternate register set
(SRRFH) is perfectly fine. But in case of SIMD enable bit (PEYEN),
you/we have observed that status stack is getting popped immediately
after executing rti instruction. As there is an effect latency of one
cycle, you/we have observed that both implicit and explicit part of the
instruction is executed only for dm(test2) = r15 and not for
dm(test1) = r14.

We need to confirm this behavior from the design team before
documenting this. We shall keep you posted on this.

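The behavior reported in the old posts can be summarized with a small host-side toy model in C. This is only a sketch of what the thread describes, not a model of the silicon: the types, field names and the function rti_db_delay_slots are invented for illustration, and the register file is reduced to a single primary/alternate r/s pair selected by one bit.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 8u

/* Only the two MODE1 bits discussed in the thread, each as a flag. */
typedef struct {
    int peyen;   /* SIMD enable */
    int srrfh;   /* alternate register select (simplified to one bit) */
} mode1_t;

typedef struct {
    uint32_t r_pri, s_pri;      /* primary r/s register pair */
    uint32_t r_alt, s_alt;      /* alternate r/s register pair */
    uint32_t mem[MEM_WORDS];
} machine_t;

/* Models dm(addr) = r; with SIMD on, the implicit dm(addr+1) = s
   write happens as well. */
static void store_reg(machine_t *m, mode1_t mode, uint32_t addr)
{
    assert(addr + 1u < MEM_WORDS);
    m->mem[addr] = mode.srrfh ? m->r_alt : m->r_pri;
    if (mode.peyen) {
        m->mem[addr + 1u] = mode.srrfh ? m->s_alt : m->s_pri;
    }
}

/* Two delay slots of RTI (DB) as observed on the 21161 in the thread:
   slot 1 runs with the ISR's mode bits; slot 2 already sees the
   interrupted code's SIMD bit, but still the ISR's register select. */
static void rti_db_delay_slots(machine_t *m, mode1_t isr,
                               mode1_t interrupted,
                               uint32_t addr1, uint32_t addr2)
{
    mode1_t slot1 = isr;
    mode1_t slot2 = isr;

    slot2.peyen = interrupted.peyen;   /* restored one slot early */
    store_reg(m, slot1, addr1);
    store_reg(m, slot2, addr2);
}
```

With the ISR running in SISD mode on primary registers and the interrupted code in SIMD mode on alternate registers, the model reproduces Test A and Test B: both stores use primary registers, and only the second store also writes addr2 + 1. Keeping any data memory write out of the second delay slot, or using a plain rti, avoids the stray write in this model.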
rallen_prsch911@yahoo.com wrote:

> Does anyone know if this anomaly had / has a resolution or work-around?
> ( See below for old posts )

[old posts and much else snipped]

> Here is what my group is experiencing:
>
> My project uses a 21161 with the following architecture:
>
> 3 x 16-bit SRAM ( for execution of 48-bit instruction )
> 16-bit flash for NV storage or program and data
>
> 2 ADC connected via SPORT0 and SPORT2
>
> UART connected to IRQ0
>
> Memory mapped custom ARINC-429 Communications PLD connected to IRQ2
>
> We are using v3.5 of the tools. Majority of the code is written in
> C. The only assembly is that written by Analog Devices and slightly
> modified ( mostly to remove unused code ). We are not using the VDK.
>
> We run the majority of our code out of external SRAM. The Analog
> Devices libraries that we used are located in internal RAM, bank 0.
> We chose to use the provided interrupt() handling for all of our
> interrupts.
>
> We are seeing a problem where our system will reset aperiodically.
> It was first seen in the field, but by increasing ARINC bus traffic,
> we are able to make it happen more frequently. The symptoms are
> varied, and we are having a very difficult time tracking down the
> problem.
>
> Symptoms include:
> ...

Have you considered that a sub- or interrupt routine might leave an
extra push on the stack? Does any routine manipulate data on the stack
in a way that would be disastrous if it were interrupted?

Jerry
--
Engineering is the art of making what you want from things you can get.

Jerry,

Thanks for the reply.

The only code that is doing any stack manipulation is the code that
shipped with Analog Devices tools.  We are using their generic
interrupt dispatching with our own C interrupt handlers.  We do have
nested interrupts enabled, so we are allowing higher priority
interrupts to happen during lower priority interrupts.

Would you expect things to go wrong pretty quickly if we had a
mismatched push/pop pair?  Our system can run several hours without
having a reset, but it is possible that a certain combination of events
is causing a problem.  That is what I am trying to determine right now,
but it doesn't appear to be the case, so far.

I've looked at the PC Stack and the C run-time stack for overflows, but
I haven't found any evidence yet.
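
One common way to gather that kind of evidence is to "paint" the stack area with a known pattern at startup and later count how many words are still untouched. The sketch below is only an illustration in portable C, not code from this project: the fill value, the function names, and the assumption that the stack grows downward toward `lowest` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern (an assumption); any value unlikely to occur in real data works. */
#define STACK_FILL 0xDEADBEEFu

/* Fill the whole stack region with the pattern before the stack is in use. */
static void stack_paint(uint32_t *lowest, size_t words)
{
    size_t n;
    assert(lowest != NULL);
    for (n = 0; n < words; n++) {
        lowest[n] = STACK_FILL;
    }
}

/*
 * Count the untouched words starting at the lowest address.
 * Assumes the stack grows downward, so the lowest addresses are used last.
 * A result of 0 means the stack reached (or passed) the end of its region.
 */
static size_t stack_headroom(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    assert(lowest != NULL);
    while (n < words && lowest[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Calling `stack_headroom()` from the reset dump would show how close the run-time stack ever came to its limit, which a snapshot of the current pointer alone cannot show.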

So far, there are only 2 things that seem to make the problem better:
  1.  Reduce the frequency of interrupts
  2.  Make a minor code change ( seems like moving things around
changes the behavior ).

Thanks again for your reply.

Rob


"rallen911" <rallen_prsch911@yahoo.com> wrote in
news:1148477743.733355.28300@j73g2000cwa.googlegroups.com: 

> Jerry,
>
> Thanks for the reply.
>
> The only code that is doing any stack manipulation is the code that
> shipped with Analog Devices tools.  We are using their generic
> interrupt dispatching with our own C interrupt handlers.  We do have
> nested interrupts enabled, so we are allowing higher priority
> interrupts to happen during lower priority interrupts.
>
> Would you expect things to go wrong pretty quickly if we had a
> mismatched push/pop pair?  Our system can run several hours without
> having a reset, but it is possible that a certain combination of
> events is causing a problem.  That is what I am trying to determine
> right now, but it doesn't appear to be the case, so far.
>
> I've looked at the PC Stack and the C run-time stack for overflows,
> but I haven't found any evidence yet.
>
> So far, there are only 2 things that seem to make the problem better:
>   1.  Reduce the frequency of interrupts
>   2.  Make a minor code change ( seems like moving things around
> changes the behavior ).
>
> Thanks again for your reply.
>
> Rob
Since the problems tend to be "random", I would concentrate on the ISRs.
It might be that something is changing in an ISR that is affecting
either the foreground program or another ISR.

You might start by unnesting interrupts. This will make the ISRs
independent of each other. If an ISR is damaging another ISR, this
should stop.

I would look for wild pointers, instructions with effect latencies (I
don't remember the specifics for the 21161), etc. I would also examine
loops for proper exit and examine code for CALLs near the end of a
loop.

It's been a long time since I coded for a 21161, so some of my advice
might not be relevant. I would reread the user manual with these issues
in mind.

Another completely different kind of problem could be due to hardware
noise (or layout). I tend to think this is less likely due to the timing
of the problem and your observation that increasing interrupt frequency
has an effect. You could write a simple hello world program and run it
for a long time. My guess is that it will run fine.

Are you using old silicon? There was a problem with the 21161 PLL, but
this has been fixed for a long time.

Given that you can influence the problem by 1 & 2, I think you will find
it's a software problem.

--
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com
rallen_prsch911@yahoo.com wrote:
> Does anyone know if this anomaly had / has a resolution or work-around?
> ( See below for old posts )
Have you checked for stack overflow? The 21xxx family uses i7 as the
stack pointer, and the RTH sets it up as a circular buffer. The
processor can generate an interrupt when you get a circular buffer
wrap-around. If I were you, I'd turn that on and set a breakpoint at
the CBI7 interrupt vector.

As for the HW breakpoints, I found them to be most useful when used to
define illegal areas - break when you try to execute outside your
"regular" code, or try to access data from non-existent memory. But it
has been a looong time since I've used them.

I think upgrading from 3.5 to 4.0 would be unlikely to resolve anything.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
In theory, theory and practice are the same, but in practice, they're not
Jim,

My current efforts have instrumentation that takes a snapshot of the
current execution point ( I have an enumeration that tells where in the
program we are executing ), current C run-time stack ( I7 / M7 ) value,
and PCSTKP value.  I have the main loop instrumented, a few other
functions that we suspected, as well as the C interrupt handlers.
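
As a rough illustration of this kind of instrumentation, the sketch below keeps the most recent snapshots in a small ring buffer so they can be dumped after a reset. It is plain portable C written for this write-up, not the project's code: the enumeration values, `TRACE_DEPTH`, and the function names are all hypothetical, and the caller is assumed to read the I7 and PCSTKP values itself and pass them in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical execution-point IDs; the real enumeration is not shown in the thread. */
typedef enum {
    TP_MAIN_LOOP = 1,
    TP_ARINC_ISR,
    TP_UART_ISR,
    TP_SPORT_ISR
} trace_point_t;

/* One snapshot: where the code is, plus the two stack indicators. */
typedef struct {
    uint32_t point;   /* trace_point_t value */
    uint32_t i7;      /* C run-time stack pointer at the snapshot */
    uint32_t pcstkp;  /* PC stack pointer at the snapshot */
} trace_rec_t;

#define TRACE_DEPTH 64u  /* assumed size; must be a power of two */

typedef struct {
    trace_rec_t rec[TRACE_DEPTH];
    uint32_t    next;  /* total number of records written so far */
} trace_buf_t;

static void trace_init(trace_buf_t *tb)
{
    assert(tb != NULL);
    tb->next = 0u;
}

/* Store a snapshot, overwriting the oldest record once the buffer is full. */
static void trace_put(trace_buf_t *tb, trace_point_t p, uint32_t i7, uint32_t pcstkp)
{
    trace_rec_t *r;
    assert(tb != NULL);
    r = &tb->rec[tb->next & (TRACE_DEPTH - 1u)];
    r->point  = (uint32_t)p;
    r->i7     = i7;
    r->pcstkp = pcstkp;
    tb->next++;
}

/*
 * Return the k-th most recent record (k = 0 is the newest), or NULL if
 * fewer than k + 1 records are available. Ignores wrap of the 32-bit counter.
 */
static const trace_rec_t *trace_recent(const trace_buf_t *tb, uint32_t k)
{
    uint32_t stored;
    assert(tb != NULL);
    stored = (tb->next < TRACE_DEPTH) ? tb->next : TRACE_DEPTH;
    if (k >= stored) {
        return NULL;
    }
    return &tb->rec[(tb->next - 1u - k) & (TRACE_DEPTH - 1u)];
}
```

With nested interrupts enabled, `trace_put()` as written is not atomic, so a real version would likely need to mask interrupts around the update or keep a separate buffer per interrupt level.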

The last reset that I trapped, the PCSTKP was at 1 and the I7 / M7
values were well within their allocated stack area.

When you say the RTH sets it up as a CB, I'm not sure I understand what
you're saying.  I've looked at the startup code a bit, and I don't
remember seeing that.  If you could point me in the right direction, I
would appreciate it.

I also tried rebuilding the source code with v4.0, but that did not get
rid of the resetting.

rallen911 wrote:
> Jim,
>
> My current efforts have instrumentation that takes a snapshot of the
> current execution point ( I have an enumeration that tells where in the
> program we are executing ), current C run-time stack ( I7 / M7 ) value,
> and PCSTKP value.  I have the main loop instrumented, a few other
> functions that we suspected, as well as the C interrupt handlers.
>
> The last reset that I trapped, the PCSTKP was at 1 and the I7 / M7
> values were well within their allocated stack area.
>
> When you say the RTH sets it up as a CB, I'm not sure I understand what
> you're saying.  I've looked at the startup code a bit, and I don't
> remember seeing that.  If you could point me in the right direction, I
> would appreciate it.
IIRC, the RTH sets B7 equal to the lowest address in the stack (which is
the last stack location to be used), and it sets the L7 register equal
to the stack size. If the stack pointer wraps, you should get an
interrupt on CBI7. If the RTH no longer does that, you can do it by
hand - just be warned that anything you write to B7 is automatically
written to I7 as well. If you're using I7 as the stack pointer at the
time you do this, well... you know what happens.

IMHO, the problem you are attacking right now is one of the most
difficult types to find. Sometimes you find it's a heretofore unknown
silicon bug. I have seen problems in older ADI floating-point chips
where an interrupt on the second-to-the-last instruction of a
counter-based loop caused problems, or where a hardware interrupt
returning to an uncached program memory access zeroed the ASTAT
register. These were exceedingly difficult to find. Good luck.

Another thing you could try would be to crank up the interrupt frequency
to force the problem to occur more quickly. I know that's hard to do in
a live system. It could be that bringing more types of interrupts into
the picture makes the bug worse (which really improves your chances of
finding it). Are you using the timer? You could set it to a couple of
microseconds and have an ISR do almost nothing (timer_counter++;) and
see if that has an impact.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
In theory, theory and practice are the same, but in practice, they're not
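
To make the B7 / L7 / I7 relationship above concrete, here is a small host-side model in portable C. It is only a sketch of the behaviour as described in this post (base, length, wrap on post-modify, and the B-write-also-loads-I side effect); it is not SHARC code, and the struct and function names are invented for illustration. The `wrapped` flag stands in for the circular-buffer interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one DAG index register set used as a circular-buffer stack. */
typedef struct {
    uint32_t b;        /* base: lowest address of the buffer (B7) */
    uint32_t l;        /* length of the buffer in words (L7) */
    uint32_t i;        /* current index, i.e. the stack pointer (I7) */
    int      wrapped;  /* set when a post-modify wraps; models the interrupt */
} dag_cb_t;

/* Writing the base register also loads the index register, as warned above. */
static void cb_write_base(dag_cb_t *d, uint32_t base)
{
    d->b = base;
    d->i = base;
}

/* Set up an empty, downward-growing stack occupying [base, base + len). */
static void cb_setup_stack(dag_cb_t *d, uint32_t base, uint32_t len)
{
    assert(len > 0u);
    cb_write_base(d, base);   /* clobbers i, so i must be set afterwards */
    d->l = len;
    d->i = base + len - 1u;   /* first word to be used is the highest address */
    d->wrapped = 0;
}

/* Post-modify access: return the address used, then step i by m with wrap. */
static uint32_t cb_post_modify(dag_cb_t *d, int32_t m)
{
    uint32_t used = d->i;
    int64_t next = (int64_t)d->i + m;
    assert(d->l > 0u);
    if (next < (int64_t)d->b) {
        next += d->l;         /* fell off the bottom: wrap to the top */
        d->wrapped = 1;
    } else if (next >= (int64_t)d->b + d->l) {
        next -= d->l;         /* ran off the top: wrap to the bottom */
        d->wrapped = 1;
    }
    d->i = (uint32_t)next;
    return used;
}
```

In this model a four-word stack tolerates three pushes without a wrap; the fourth push uses the last word and the index wraps back to the top, which is the event the circular-buffer interrupt is meant to catch.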
Ron Huizen wrote:
> We debugged a nasty problem with the 21160 some time ago, which also applies
> to the 21161 (see anomaly #63 about DAG stalls).  Perhaps Jim Thomas can
> provide some more details, but I believe that a certain sequence of
> instructions which cause a DAG stall made it act as though an external
> access had occurred, and it ended up the standard ADI library (can't recall
> if it was the ISR support) did this.
>
> The fix was a tools patch (again, can't recall which one, again, maybe Jim
> can help) to avoid the situation.
>
> Sorry for the sketchy memory, but it was some time ago.  We were the first
> to find the anomaly, and it took a lot of digging.
I had almost forgotten about that. I can't remember the details off the
top of my head anymore either. I'll dig around and see if I can
unearth them.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
In theory, theory and practice are the same, but in practice, they're not
OK - it was definitely a silicon bug, but ADI chose to leave it there
and change the tools so that the compiler no longer generated code that
would make it happen. The code they used for returning from a C call
would cause this to happen, and when it did, we would see multiprocessor
system hangs - one processor would want the bus, and assumed the
DAG-stalled processor had it. But the DAG-stalled processor didn't think
it had the bus at all, so it didn't release it.

This doesn't sound too much like it's your problem, especially since
you've switched to 4.0 of the tools, and I believe the work-around made
it into that version.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
In theory, theory and practice are the same, but in practice, they're not