motoroladsp | Occasional code restart.

On the DSP56F807, I have found a condition under which an occasional code restart happens.

The code executing when the restart occurs is the following (C code and the corresponding disassembly):

    // CAN Rx interrupt disabled during queue counter manipulation.
    periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
    P:00003471: 80F411850001    bfclr  #0x1,X:0x1185
    can_rx_queue.counter--;
    P:00003474: F0540429        move   X:0x0429,X0
    P:00003476: 6411            decw   X0
    P:00003477: D0540429        move   X0,X:0x0429
    periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
    P:00003479: 82F411850001    bfset  #0x1,X:0x1185

Occasionally, during the execution of this snippet, the bfset instruction is not reached and the code restarts from the beginning (P:0x0000 or so), probably because a CAN RX interrupt occurs after the bfclr instruction.

The snippet is part of a function that parses a CAN message, and it is called periodically (every 10 ms).

I have reproduced the effect in the lab by sending the following CAN message to the processor once every 3 ms: CAN ID = 0x10003C00; data: 0x07 0xC0 0x4C 0x01 0x01 0x00 0x05 0x02. In this case, the code restarts within 30 minutes at most (in the field, the event is very rare, and hard to debug, because the same message is sent every 2 seconds).

It seems to be a pipeline dependency that shows up when the CAN RX interrupt occurs in the critical code section.

The problem seems to disappear when the code is modified this way:

    // CAN Rx interrupt disabled during queue counter manipulation.
    periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
    asm(nop);
    can_rx_queue.counter--;
    periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);

or this way:

    // CAN Rx interrupt disabled during queue counter manipulation.
    periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
    asm(decw can_rx_queue.counter);
    periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);

Has anyone seen something similar and knows the details? Are there other cases that need particular attention, beyond the ones documented in the Freescale FAQs and errata?

Best Regards,
Roberto Bonacina
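As background on why the snippet masks the CAN Rx interrupt at all, here is a minimal host-side sketch of the lost update that an unprotected move/decw/move sequence can suffer. It assumes the receive ISR increments the same counter, which the post does not show; `counter`, `simulated_rx_isr` and `decrement_with_interrupt` are hypothetical names, and the "interrupt" is an ordinary function call. It illustrates the race the masking is meant to prevent, not the restart itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for can_rx_queue.counter (not the SDK's real type). */
static volatile uint16_t counter;

/* ASSUMPTION: the CAN Rx ISR adds one message to the queue. */
static void simulated_rx_isr(void)
{
    counter++;
}

/* Mimics the three-instruction decrement from the disassembly, optionally
 * injecting the simulated interrupt between the load and the store. */
static uint16_t decrement_with_interrupt(uint16_t start, int interrupt_in_window)
{
    uint16_t x0;

    assert(start > 0);
    counter = start;

    x0 = counter;                 /* move X:counter,X0 */
    if (interrupt_in_window) {
        simulated_rx_isr();       /* ISR update lands between load and store */
    }
    x0--;                         /* decw X0 */
    counter = x0;                 /* move X0,X:counter (overwrites ISR update) */

    return counter;
}
```

Starting from 5, one decrement plus one ISR increment should leave 5, but the interleaved case leaves 4 because the store overwrites the ISR's update. The single-instruction `asm(decw can_rx_queue.counter)` variant from the post avoids the load/store window altogether.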

Reply by Michael W. Mann ● May 12, 2005

--- In motoroladsp@moto..., Roberto Bonacina <rbonacina@r...> wrote:
> On DSP56F807, I found a condition on which occasional code restart
> happens.
>
> The code in execution when the code restarting occurs is the following
> (C code and relative disassembly):
>
> // CAN Rx interrupt disabled during queue counter manipulation.
> periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> P:00003471: 80F411850001 bfclr #0x1,X:0x1185
> can_rx_queue.counter--;
> P:00003474: F0540429 move X:0x0429,X0
> P:00003476: 6411 decw X0
> P:00003477: D0540429 move X0,X:0x0429
> periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> P:00003479: 82F411850001 bfset #0x1,X:0x1185

According to FAQ 24969 you need at least one nop between the instruction that disables an interrupt and the code that is supposed to be protected.

> Occasionally, during the execution of this snippet, the bfset
> instruction is not reached and the code restarts from the beginning
> (P:0x0000 or so), probably because a CAN RX interrupt occurs after the
> bfclr instruction.

What causes the device to reset? Are you sure? You can track down whether the reset was caused by the COP or POR by checking the SYS_STS (System Status) register in your boot code and reporting the result. If this doesn't show what happened then you need to add diagnostics to your vector table to determine what happened.

> The snippet is part of a function that parses a CAN message, and it is
> periodically called (every 10 ms).
>
> I have reproduced the effect in lab sending the following CAN message to
> the processor once every 3 ms: CAN ID = 0x10003C00; data: 0x07 0xC0 0x4C
> 0x01 0x01 0x00 0x05 0x02. In this case, the code restarts in a maximum
> time of 30 minutes (on field, the event is very rare -and hard to debug-
> because the same message is sent every 2 seconds).
>
> It seems to be a pipeline dependency, in the case CAN RX interrupt
> occurs in the critical code section.
>
> The problem seems to disappear modifying the code this way:
>
> // CAN Rx interrupt disabled during queue counter manipulation.
> periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> asm(nop);
> can_rx_queue.counter--;
> periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
>

According to FAQ 24969 you need at least one nop between the instruction that disables an interrupt and the code that is supposed to be protected.

> or this one:
>
> // CAN Rx interrupt disabled during queue counter manipulation.
> periphBitClear(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
> asm(decw can_rx_queue.counter);
> periphBitSet(CANRXFIE, &ArchIO.CAN.RxIntEnableReg);
>
> Is there someone that had similar things, and knows details? Are there
> other cases in which we have to pay particular attention, other than the
> ones documented in Freescale FAQs and Errata?
>
> Best Regards,
> Roberto Bonacina
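The reset-cause check suggested above could look roughly like the following sketch. The bit masks are assumptions made for illustration and must be checked against the DSP56F807 user manual; the enum, `decode_reset_cause` and `reset_cause_name` are hypothetical helpers, not SDK functions. The decoder is written as a pure function over a status word so it can be exercised on a host; on target, the boot code would pass it the value read from SYS_STS.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions in SYS_STS; verify against the device manual. */
#define RST_POR   0x0004u   /* power-on reset */
#define RST_EXTR  0x0008u   /* external reset pin */
#define RST_COPR  0x0010u   /* COP (watchdog) reset */

typedef enum {
    RESET_NONE_LATCHED = 0,  /* no reset flag set in the status word */
    RESET_POWER_ON,
    RESET_EXTERNAL,
    RESET_COP
} reset_cause_t;

/* Decode a status word read early in boot code. Power-on is checked first
 * because other flags may not be meaningful right after power-up. */
static reset_cause_t decode_reset_cause(uint16_t sys_sts)
{
    if (sys_sts & RST_POR)  return RESET_POWER_ON;
    if (sys_sts & RST_COPR) return RESET_COP;
    if (sys_sts & RST_EXTR) return RESET_EXTERNAL;
    return RESET_NONE_LATCHED;
}

/* Human-readable name for reporting the result over a debug channel. */
static const char *reset_cause_name(reset_cause_t cause)
{
    static const char *const names[] = {
        "none latched", "power-on", "external", "COP"
    };
    assert((unsigned)cause <= (unsigned)RESET_COP);
    return names[cause];
}
```

If no flag is latched when the startup code runs again, one possible reading is that execution reached the reset vector without a hardware reset, which is the point where vector-table diagnostics become useful.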

Reply by Roberto Bonacina ● May 13, 2005

Actually, FAQ 24969 talks about adding a nop after masking interrupts in the status register, not about masking a specific interrupt source. As you can see, I am only masking the CAN RX interrupt in the CAN register. Or maybe the FAQ is incomplete?

You should also know that if I replace the periphBitClear and periphBitSet calls with archDisableInt and archEnableInt (which work on the status register), the problem disappears without inserting any nops.

So I assume the problem is related to a pipeline dependency or something similar (a DSP erratum): when the CAN RX interrupt is disabled and an interrupt (probably the CAN RX interrupt itself) occurs during the next three instructions, this messes up the DSP. I mean the next three instructions of my specific sequence, because with other sequences the problem disappears.

The device is not being reset by external causes. I did trace SYS_STS: POR and EXTR are not occurring. I didn't check COP because I don't use it (I have an external watchdog/supervisor), and even if COP were the cause, you would have to explain the fault model to me.

In any case, there is a consistent way to reproduce the problem, and I think I have documented it sufficiently (one detail was missing: my CAN speed is 62.5 kbps). I was expecting Freescale to reproduce the problem and find out what is happening. I cannot do much more with my means because I have no way to trace the program counter step by step, but I think Freescale can.

My vector table is traced indirectly. If an unhandled interrupt occurs, an infinite-loop callback is called, and in this case the external watchdog resets the device (EXTR flag). If a handled interrupt occurs, the SDK function insterruptXX.asm is called, and this calls the FastDispatcher which, in turn, calls the Dispatcher (I don't have fast interrupts). The Dispatcher has been modified to trace the interrupt code, the Dispatcher address itself and the return address. When the program restarts, there is no Dispatcher entry near the last things traced, so if a handled interrupt has occurred (which is my opinion), it caused trouble before the Dispatcher was called.

As I already said, I expect Freescale to try to reproduce the problem and find out what is happening, but so far I have received only holding replies. I opened service request 1-186722008 with Freescale on 28.04.2005, still with no significant results, and service request 1-61031541 with Metrowerks on 28.04.2005, with the same outcome. Admittedly, I only described the problem precisely on 05.05.2005, but a week has now gone by...

Lastly, I want it known that, as a customer, I am quite dissatisfied with the Freescale/Metrowerks products for the DSP568xx (why should I, a C programmer, have to take care of pipeline dependencies? Isn't that the job of the DSP itself and/or the compiler/assembler?) and with the support, which seems to have some difficulty focusing on the problem.

Waiting for feedback and hoping for the best.

Best Regards,
Roberto Bonacina

-----Original Message-----
From: motoroladsp@moto... [mailto:motoroladsp@moto...] On behalf of Michael W. Mann
Sent: Thursday, May 12, 2005 19:13
To: motoroladsp@moto...
Subject: [motoroladsp] Re: Occasional code restart.
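The Dispatcher tracing described in this reply can be pictured as a small ring buffer. The following is only a sketch under assumptions: the entry layout, `TRACE_DEPTH` and the function names are hypothetical and are not taken from the SDK or from the poster's code, and the log is assumed to live in RAM that the startup code does not clear so it survives the restart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 16u   /* must be a power of two for the index mask */

/* One record per dispatched interrupt (hypothetical layout). */
typedef struct {
    uint16_t irq_code;     /* interrupt number seen by the dispatcher */
    uint16_t return_addr;  /* program address the interrupt will return to */
} trace_entry_t;

typedef struct {
    trace_entry_t entries[TRACE_DEPTH];
    uint16_t head;         /* index of the next slot to write */
    uint16_t count;        /* valid entries, saturates at TRACE_DEPTH */
} trace_log_t;

/* Called at the top of the dispatcher: overwrite the oldest slot. */
static void trace_record(trace_log_t *log, uint16_t irq_code, uint16_t return_addr)
{
    assert(log != NULL);
    log->entries[log->head].irq_code = irq_code;
    log->entries[log->head].return_addr = return_addr;
    log->head = (uint16_t)((log->head + 1u) & (TRACE_DEPTH - 1u));
    if (log->count < TRACE_DEPTH) {
        log->count++;
    }
}

/* Return the entry that is `age` records old (0 = newest), or NULL. */
static const trace_entry_t *trace_recent(const trace_log_t *log, uint16_t age)
{
    assert(log != NULL);
    if (age >= log->count) {
        return NULL;
    }
    return &log->entries[(log->head + TRACE_DEPTH - 1u - age) & (TRACE_DEPTH - 1u)];
}
```

After a restart, the boot code would walk `trace_recent(&log, 0)`, `trace_recent(&log, 1)` and so on. A newest entry whose return address lies far from the failing snippet would be consistent with the observation above that the failure happens before the Dispatcher is reached.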
