Occasional code restart.

Started by Roberto Bonacina May 5, 2005
On the DSP56F807, I have found a condition under which the code
occasionally restarts.

The code executing when the restart occurs is the following (C source
with the corresponding disassembly):

// CAN Rx interrupt disabled during queue counter manipulation.
periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
P:00003471: 80F411850001 bfclr #0x1,X:0x1185
can_rx_queue.counter--;
P:00003474: F0540429 move X:0x0429,X0
P:00003476: 6411 decw X0
P:00003477: D0540429 move X0,X:0x0429
periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
P:00003479: 82F411850001 bfset #0x1,X:0x1185

Occasionally, during the execution of this snippet, the bfset
instruction is not reached and the code restarts from the beginning
(P:0x0000 or so), probably because a CAN RX interrupt occurs after the
bfclr instruction.

The snippet is part of a function that parses a CAN message, and it is
periodically called (every 10 ms).

I have reproduced the effect in the lab by sending the following CAN
message to the processor once every 3 ms: CAN ID = 0x10003C00; data:
0x07 0xC0 0x4C 0x01 0x01 0x00 0x05 0x02. In this case, the code restarts
within 30 minutes at most (in the field the event is very rare, and hard
to debug, because the same message is sent only every 2 seconds).

It looks like a pipeline dependency that shows up when a CAN RX
interrupt occurs inside the critical code section.

The problem seems to disappear when the code is modified this way:

// CAN Rx interrupt disabled during queue counter manipulation.
periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
asm(nop);
can_rx_queue.counter--;
periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);

or this one:

// CAN Rx interrupt disabled during queue counter manipulation.
periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
asm(decw can_rx_queue.counter);
periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);

Has anyone seen something similar and knows the details? Are there
other cases that need particular attention, beyond the ones documented
in the Freescale FAQs and errata?

Best Regards,
Roberto Bonacina


--- In motoroladsp@moto..., Roberto Bonacina <rbonacina@r...>
wrote:
> On DSP56F807, I found a condition on which occasional code restart
> happens.
>
> The code in execution when the code restarting occurs is the following
> (C code and relative disassembly):
>
> // CAN Rx interrupt disabled during queue counter manipulation.
> periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> P:00003471: 80F411850001 bfclr #0x1,X:0x1185
> can_rx_queue.counter--;
> P:00003474: F0540429 move X:0x0429,X0
> P:00003476: 6411 decw X0
> P:00003477: D0540429 move X0,X:0x0429
> periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> P:00003479: 82F411850001 bfset #0x1,X:0x1185

According to FAQ 24969 you need at least one nop between the instruction
that disables an interrupt and the code that is supposed to be protected.
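
A minimal sketch of that pattern, written so it also compiles on a host
PC. The register variable, the PIPELINE_NOP macro and the function name
are hypothetical stand-ins, not SDK names; on the target the macro would
expand to asm(nop) and the register would be the real CAN receive
interrupt enable register. Whether the FAQ actually covers a
peripheral-level mask like this one is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the CAN Rx interrupt enable register (X:0x1185 in the
 * disassembly). A plain variable here so the sketch runs on a host. */
static volatile uint16_t can_rx_int_enable = 0x0001u;

#define CANRXFIE 0x0001u /* bit mask taken from the bfclr/bfset operands */

/* Hypothetical hook: asm(nop) on the target, an empty statement here. */
#ifndef PIPELINE_NOP
#define PIPELINE_NOP() ((void)0)
#endif

static volatile uint16_t rx_queue_counter = 3u;

/* Decrement the queue counter with the CAN Rx interrupt masked at the
 * peripheral. The nop sits between the masking write and the first
 * instruction of the protected read-modify-write. */
static void rx_queue_pop_count(void)
{
    can_rx_int_enable &= (uint16_t)~CANRXFIE; /* mask CAN Rx interrupt */
    PIPELINE_NOP();                           /* give the write time to land */
    assert((can_rx_int_enable & CANRXFIE) == 0u);
    rx_queue_counter--;                       /* protected update */
    can_rx_int_enable |= CANRXFIE;            /* unmask again */
}
```

On the host the nop does nothing, so this only illustrates where the
instruction goes; the effect itself can only be checked on the hardware.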

> Occasionally, during the execution of this snippet, the bfset
> instruction is not reached and the code restarts from the beginning
> (P:0x0000 or so), probably because a CAN RX interrupt occurs after the
> bfclr instruction.

What causes the device to reset? Are you sure? You can track down
whether the reset was caused by the COP or POR by checking the SYS_STS
(System Status) register in your boot code and reporting the result.

If this doesn't show what happened then you need to add diagnostics to
your vector table to determine what happened.
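
A rough sketch of what such a boot-time check could look like. The bit
masks below are assumptions and must be checked against the SYS_STS
description in the user manual; the enum and function names are made up
for illustration. The value passed in would be whatever the boot code
reads from the register before anything else touches it.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions of the reset flags in SYS_STS. Verify them in
 * the user manual before relying on this. */
#define SYS_STS_POR  0x0004u /* power-on reset */
#define SYS_STS_EXTR 0x0008u /* external reset pin */
#define SYS_STS_COPR 0x0010u /* COP (watchdog) reset */

typedef enum {
    RESET_UNKNOWN,  /* no flag set: possibly not a hardware reset at all */
    RESET_POR,
    RESET_EXTERNAL,
    RESET_COP
} reset_cause_t;

/* Map a raw SYS_STS value to a single cause. POR is checked first
 * because a power-on event may leave other flags set as well. */
static reset_cause_t decode_reset_cause(uint16_t sys_sts)
{
    if (sys_sts & SYS_STS_POR)  return RESET_POR;
    if (sys_sts & SYS_STS_COPR) return RESET_COP;
    if (sys_sts & SYS_STS_EXTR) return RESET_EXTERNAL;
    return RESET_UNKNOWN;
}
```

If the result is RESET_UNKNOWN after a restart, one possible reading is
that the code reached the start address through a jump or a vector
rather than through a hardware reset.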

> The snippet is part of a function that parses a CAN message, and it is
> periodically called (every 10 ms).
>
> I have reproduced the effect in lab sending the following CAN message to
> the processor once every 3 ms: CAN ID = 0x10003C00; data: 0x07 0xC0 0x4C
> 0x01 0x01 0x00 0x05 0x02. In this case, the code restarts in a maximum
> time of 30 minutes (on field, the event is very rare -and hard to debug-
> because the same message is sent every 2 seconds).
>
> It seems to be a pipeline dependency, in the case CAN RX interrupt
> occurs in the critical code section.
>
> The problem seems to disappear modifying the code this way:
>
> // CAN Rx interrupt disabled during queue counter manipulation.
> periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> asm(nop);
> can_rx_queue.counter--;
> periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
>

According to FAQ 24969 you need at least one nop between the instruction
that disables an interrupt and the code that is supposed to be protected.

> or this one:
>
> // CAN Rx interrupt disabled during queue counter manipulation.
> periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> asm(decw can_rx_queue.counter);
> periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
>
> Is there someone that had similar things, and knows details? Are there
> other cases in which we have to pay particular attention, other than the
> ones documented in Freescale FAQs and Errata?
>
> Best Regards,
> Roberto Bonacina


Actually, FAQ 24969 talks about adding a nop after masking interrupts
in the status register, not about masking one specific interrupt. As
you can see, I am only masking the CAN RX interrupt in the CAN
register. Or maybe the FAQ is incomplete?
You should also know that if I replace the periphBitClear and
periphBitSet calls with archDisableInt and archEnableInt (which work on
the status register), the problem disappears without inserting any
nops. So I assume the problem is related to a pipeline dependency or
something similar (a DSP erratum): when the CAN RX interrupt is disabled
and an interrupt occurs during the next three instructions (probably
the CAN RX interrupt itself), the DSP gets messed up. To be precise: it
happens during the next three instructions of my specific sequence,
because with other sequences the problem disappears.

The device is not being reset by external causes. I did trace SYS_STS:
POR and EXTR are not occurring. I did not check COP because I don't use
it (I have an external watchdog/supervisor), and even if it were the
cause, you would have to explain the fault model to me.
Anyway, there is a consistent way to reproduce the problem, and I think
I have documented it sufficiently (one detail was missing: my CAN speed
is 62.5 kbps). I was expecting Freescale to reproduce the problem and
find out what is happening; I cannot do much more with my means because
I have no way to trace the program counter step by step, but I think
Freescale can.
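
For anyone trying to reproduce this, a back-of-the-envelope estimate of
the bus load may help. The sketch below counts the bits of an extended
(29-bit ID) CAN data frame without stuff bits and converts them to time
at a given bit rate. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire for an extended (29-bit ID) CAN data frame, not
 * counting stuff bits, including the 3-bit intermission after it. */
static uint32_t can_ext_frame_bits(uint32_t data_bytes)
{
    /* SOF + base ID + SRR + IDE + extended ID + RTR + 2 reserved + DLC
     * + CRC + CRC delimiter + ACK slot + ACK delimiter + EOF */
    const uint32_t overhead =
        1u + 11u + 1u + 1u + 18u + 1u + 2u + 4u + 15u + 1u + 1u + 1u + 7u;
    const uint32_t intermission = 3u;

    assert(data_bytes <= 8u);
    return overhead + 8u * data_bytes + intermission;
}

/* Time in microseconds needed to send `bits` at `bit_rate_bps`. */
static uint32_t can_frame_time_us(uint32_t bits, uint32_t bit_rate_bps)
{
    return (uint32_t)(((uint64_t)bits * 1000000u) / bit_rate_bps);
}
```

With 8 data bytes this gives 131 bits, which is 2096 us at 62.5 kbps, so
at one message every 3 ms the bus is busy roughly 70% of the time even
before stuff bits are counted.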

My vector table is traced indirectly: if an unhandled interrupt occurs,
an infinite-loop callback is called, and in that case the external
watchdog resets the device (EXTR flag); if a handled interrupt occurs,
the SDK function insterruptXX.asm is called, which calls the
FastDispatcher, which in turn calls the Dispatcher (I don't have fast
interrupts). The Dispatcher has been modified to trace the interrupt
code, the Dispatcher address itself and the return address.
When the program restarts, there is no Dispatcher trace near the last
things traced, so if a handled interrupt occurred (that is my opinion),
it caused trouble before the Dispatcher was called.
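
To give an idea of the kind of trace meant here, the following is a
minimal sketch of a ring buffer that records the interrupt code and the
return address. It is not the modified SDK Dispatcher; all names and the
buffer depth are hypothetical. On the target the buffer would also have
to live in RAM that the startup code does not clear, so that it can be
inspected after the restart (an assumption about the memory setup).

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_DEPTH 16u /* must be a power of two for the index mask */

typedef struct {
    uint16_t irq_code;    /* interrupt number seen by the dispatcher */
    uint16_t return_addr; /* address the interrupt will return to */
} trace_entry_t;

static trace_entry_t trace_buf[TRACE_DEPTH];
static uint16_t trace_head; /* count of entries written, wraps freely */

/* Record one interrupt; the oldest entry is overwritten when full. */
static void trace_irq(uint16_t irq_code, uint16_t return_addr)
{
    trace_entry_t *e = &trace_buf[trace_head & (TRACE_DEPTH - 1u)];
    e->irq_code = irq_code;
    e->return_addr = return_addr;
    trace_head++;
}

/* Return the entry written `age` calls ago (0 = most recent). */
static trace_entry_t trace_recent(uint16_t age)
{
    assert(age < TRACE_DEPTH);
    return trace_buf[(uint16_t)(trace_head - 1u - age) & (TRACE_DEPTH - 1u)];
}
```

After a restart, the last few entries show which handled interrupts
reached the dispatcher and where they were going to return.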

As I already said, my hope is that Freescale will try to reproduce the
problem and find out what is happening, but so far I have received only
interlocutory answers (I opened service request 1-186722008 with
Freescale on 28.04.2005, still with no significant results, and service
request 1-61031541 with Metrowerks on 28.04.2005, with the same
outcome). OK, I described the problem in exact terms on 05.05.2005, but
a week has now gone by...

Last, I want you to know that, as a customer, I am quite dissatisfied
with the Freescale/Metrowerks products for the DSP568xx (why should I,
as a C programmer, have to take care of pipeline dependencies? Isn't
that the job of the DSP itself and/or the compiler/assembler?) and with
the support, which seems to have some difficulty focusing on the
problems.

Waiting for feedback and hoping for the best of it.

Best Regards,
Roberto Bonacina

-----Original Message-----
From: motoroladsp@moto... [mailto:motoroladsp@moto...] On behalf of
Michael W. Mann
Sent: Thursday, 12 May 2005 19:13
To: motoroladsp@moto...
Subject: [motoroladsp] Re: Occasional code restart.