DSPRelated.com Forums

ADSP-21161 Reset Problem

Started by rallen_prsch911 May 19, 2006
My project uses a 21161 with the following architecture:

3 x 16-bit SRAM ( for execution of 48-bit instructions )
16-bit flash for NV storage of program and data

2 ADC connected via SPORT0 and SPORT2

UART connected to IRQ0

Memory mapped custom ARINC-429 Communications PLD connected to IRQ2

We are using v3.5 of the tools. The majority of the code is written in
C. The only assembly code is Analog Devices' own, slightly modified
( mostly to remove unused code ). We are not using the VDK.

We run the majority of our code out of external SRAM. The Analog
Devices libraries that we used are located in internal RAM, bank 0.
We chose to use the provided interrupt() handling for all of our
interrupts.

We are seeing a problem where our system will reset aperiodically.
It was first seen in the field, but by increasing ARINC bus traffic,
we are able to make it happen more frequently. The symptoms are
varied, and we are having a very difficult time tracking down the
problem.

Symptoms include:
- Failed CRC checks on program memory and configurable parameter
memory ( configurable parameters are stored in external SRAM during
execution ). We force a reset in these cases.

- Corrupt data being transmitted on ARINC.

- Processor execution getting 'lost' and the watchdog resets.

- Changes in code ( adding / removing instructions ) can make the
problem better or worse.
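For context on the first symptom, here is a minimal sketch of a periodic CRC check over a memory region. It assumes a standard bitwise CRC-32 (reflected polynomial 0xEDB88320); the thread does not say which CRC the project actually uses, and `crc32_region` and `params_ok` are hypothetical names. It is written for a byte-addressable host. On the 21161 the SHARC C compiler uses 32-bit `char`, so the inner loop would need adapting to the word size and to how 48-bit instructions are packed into the 16-bit SRAMs.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, poly 0xEDB88320, init/xorout 0xFFFFFFFF).
 * Table-free, so it is small enough for a background check. */
uint32_t crc32_region(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; if the bit shifted out was 1, apply the polynomial. */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical check: compare a region against the CRC recorded when it
 * was loaded. Returns 1 if intact, 0 if corrupted (caller forces a reset). */
int params_ok(const uint8_t *region, size_t len, uint32_t expected_crc)
{
    return crc32_region(region, len) == expected_crc;
}
```

A CRC only says *that* a region changed, not where or when; running it more often shortens the window in which the corruption happened.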

We have implemented a scheme to instrument where the code is
executing ( writing data to unused external RAM ) and then dumping
that data out upon reset. The instrumentation does not consistently
stop after any particular ISR or function. With reduced incoming ARINC
bus traffic ( only essential data ), I was able to run the system for
6 days without it resetting. Given the sporadic nature of the problem,
we are focusing on the
interrupt handling. We have already converted the UART to be polled
instead of interrupt driven, but that hasn't eliminated our problem.
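The instrumentation scheme described above can be sketched as a post-mortem trace ring buffer. This is a minimal illustration, not the poster's code: it assumes a region that the C startup code does not zero, so it survives a watchdog reset. Here it is modeled as a static array (`trace_ram`); on the target it would be placed in unused external SRAM through the linker description file. All names, the depth and the magic value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 256u           /* hypothetical; power of two */
#define TRACE_MAGIC 0x7EACE0DEu    /* marks the buffer as initialized */

/* Hypothetical layout. On target this would live in otherwise unused
 * external SRAM that the C startup does not zero-fill. */
typedef struct {
    uint32_t magic;
    uint32_t next;                 /* total number of records written */
    uint32_t rec[TRACE_DEPTH];     /* ring of trace IDs */
} trace_buf_t;

static trace_buf_t trace_ram;

/* Call once at boot, after dumping the previous contents. */
void trace_init(void)
{
    trace_ram.magic = TRACE_MAGIC;
    trace_ram.next = 0;
}

/* Record an ID (e.g. a per-function or per-ISR code). Not atomic: if ISRs
 * also log, wrap this in an interrupt disable on the target. */
void trace_log(uint32_t id)
{
    trace_ram.rec[trace_ram.next % TRACE_DEPTH] = id;
    trace_ram.next++;
}

/* Copy up to max of the most recent records, oldest first, into out.
 * Returns the number copied, or 0 if the buffer was never initialized. */
size_t trace_dump(uint32_t *out, size_t max)
{
    if (trace_ram.magic != TRACE_MAGIC)
        return 0;
    uint32_t n = trace_ram.next < TRACE_DEPTH ? trace_ram.next : TRACE_DEPTH;
    if (n > max)
        n = (uint32_t)max;
    uint32_t start = trace_ram.next - n;
    for (uint32_t i = 0; i < n; i++)
        out[i] = trace_ram.rec[(start + i) % TRACE_DEPTH];
    return n;
}
```

Logging ISR entry and exit separately (two different IDs) makes it easier to see whether a reset happened while an ISR was still nested.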

Questions:
1. Is anyone using the 21161 with the interrupt handling provided by
Analog Devices ( vectoring scheme, with our own C handlers )?

2. Has anyone had any problems similar to this?

3. I'd like to use the HW breakpoints to try to catch
the 'culprit' code corrupting memory that is not supposed to be
written to after it is loaded. Is anyone familiar with using them? I
tried once, but it seemed to break all the time, and when it did
break, it did not appear to stop where I asked it to.

4. Are there any schemes for protecting areas of external RAM that
don't require HW?

5. Any ideas on how to track down something that is trampling
registers or memory, but not in any sort of identifiable pattern?

6. If we migrate to v4.0 of the tools, we will need to perform a
significant amount of re-testing and documentation and delay the
delivery of our product to our customer ( probably 2 months, and we
are already behind schedule ). Is it worth the effort to upgrade?

Thanks in advance for any comments, suggestions, wisdom, etc. that
you can provide.

Regards,

Robert Allen
Senior Software Engineer
Goodrich Sensor Systems
On Fri, 19 May 2006, rallen_prsch911 wrote:

> We have implemented a scheme to instrument where the code is
> executing ( writing data to unused external RAM ) and then dumping
> that data out upon reset. The instrumentation does not consistently
> stop after any particular ISR or function. With reduced incoming ARINC
> bus traffic ( only essential data ), I was able to run the system for
> 6 days without it resetting.

Put a current meter on the system and see if it changes as a function of
bus traffic. You'll be lucky if it's a simple hardware fix, but it's
worth a look.

> 3. I'd like to use the HW breakpoints to try to catch
> the 'culprit' code corrupting memory that is not supposed to be
> written to after it is loaded. Is anyone familiar with using them? I
> tried once, but it seemed to break all the time, and when it did
> break, it did not appear to stop where I asked it to.

No, it can't stop exactly there; it has to empty the pipeline first, so
you'll stop 3 instructions after the requested break. You can set EMUN
via JTAG and it will stop after the number of counts in that register.
For other possible ways to break, see chapter 10 of the 21160 hardware
manual.

> 4. Are there any schemes for protecting areas of external RAM that
> don't require HW?

No. You can limit access to the 4 different banks, but there is no
"supervisor" mode like on 68k micros.
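Without a supervisor mode, software cannot *prevent* a stray write, but it can *detect* one sooner. A common approach (not from the thread; the struct, names and pattern value are hypothetical) is to bracket a write-once region with guard words and check them often, for example from the main loop or a timer ISR. Note the limitation: guards only catch overruns that run off the edges of the region, not random writes landing in the middle of it.

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   8u
#define GUARD_PATTERN 0xDEADC0DEu   /* hypothetical fill value */

/* A protected region bracketed by guard words. Writes that run off
 * either end of `data` land in a guard and become detectable. */
typedef struct {
    uint32_t lo[GUARD_WORDS];
    uint32_t data[64];              /* e.g. configurable parameters */
    uint32_t hi[GUARD_WORDS];
} guarded_params_t;

void guard_fill(guarded_params_t *g)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        g->lo[i] = GUARD_PATTERN;
        g->hi[i] = GUARD_PATTERN;
    }
}

/* Returns 0 if both guards are intact, else 1 + index of the first bad
 * word (lo guard: 1..GUARD_WORDS, hi guard: GUARD_WORDS+1..2*GUARD_WORDS). */
size_t guard_check(const guarded_params_t *g)
{
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (g->lo[i] != GUARD_PATTERN)
            return 1 + i;
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (g->hi[i] != GUARD_PATTERN)
            return 1 + GUARD_WORDS + i;
    return 0;
}
```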

> 5. Any ideas on how to track down something that is trampling
> registers or memory, but not in any sort of identifiable pattern?

Yeah, but they are all a pain in the butt :-) It sounds like you are
already on the right track; more detailed hunting will get you there
eventually.
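One form that "more detailed hunting" can take, sketched under the assumption that RAM can be spared for a copy of the watched region: snapshot the region right after it is loaded, then periodically compare and record the first word that differs. Logging the offset and old/new values next to a trace of recently executed code narrows down what ran just before the change. `diff_report_t` and `region_changed` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t   index;     /* word offset of the first mismatch */
    uint32_t expected;  /* value from the snapshot */
    uint32_t actual;    /* value found in the live region */
} diff_report_t;

/* Compare n words of `live` against the known-good `snap`.
 * Returns 1 and fills *r if the region changed, else 0. */
int region_changed(const volatile uint32_t *live, const uint32_t *snap,
                   size_t n, diff_report_t *r)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = live[i];     /* read once: the live value may change */
        if (v != snap[i]) {
            r->index = i;
            r->expected = snap[i];
            r->actual = v;
            return 1;
        }
    }
    return 0;
}
```

The corrupted value itself is often a clue: it may match a known variable, a pointer into a particular buffer, or data from the ARINC receive path.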

Check the stack for overflow conditions; you can only nest loops 8
levels deep. Maybe some combination of interrupts and code causes a loop
counter overflow.
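The loop and PC stacks on the SHARC are hardware stacks inside the core, so they cannot be inspected this way. The C runtime stack, however, lives in RAM and can overflow silently into neighboring data. A common check (not mentioned in the thread; names and pattern hypothetical) is to "paint" the stack region at startup and later measure how much of it was never touched. The sketch assumes the stack grows downward from the top of the region toward `base[0]`. Check the actual growth direction for your runtime before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5A5A5A5u   /* hypothetical fill pattern */

/* Fill a stack region with the pattern. On target this must run before
 * that part of the stack is in use, e.g. from startup code. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_PAINT;
}

/* Assuming the stack grows downward from base[words-1] toward base[0],
 * count the untouched words at the low end. A result near 0 means the
 * stack came close to (or past) its limit at some point. */
size_t stack_headroom(const uint32_t *base, size_t words)
{
    size_t free_words = 0;
    while (free_words < words && base[free_words] == STACK_PAINT)
        free_words++;
    return free_words;
}
```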

Try doing a memory halt: it may be that writing to a certain bank causes
you to load weird vectors later on.

Good luck! Sounds like a challenge that will be a great war story a year
or two from now :-)

Patience, persistence, truth,
Dr. mike