Hi all,

This is my first post here and I'm fairly new to DSPs (I'm much more used to microcontrollers).

I have inherited an 'almost finished' DSP project (a small motion controller using a TMS320LF2406A). It is now mostly working, except that it sometimes stops working. This can happen after only a few moves, or it can run for an hour or more.

I've been searching alternately for hardware (also new) and software errors and fixed a few bugs along the way, but the problem remains. But I think (hope) I've now narrowed it down to interrupts getting disabled permanently somehow. If I check INTM in ST0 and toggle an output when it is set (interrupts disabled), I get an output square wave as soon as the controller stops working. As most operations depend on flags set in the timer interrupt, everything grinds to a halt.

At this time I suspect the following piece of code is the source of the problem:

    _ResetCurrentPower:
        POPD    *+              ; relax hardware stack
        SST     #0,*+           ; save current ST0
        SETC    INTM            ; disable interrupts for a while

        LDP     #motion_pwm
        SPLK    #0h,motion_pwm  ; reset power control field

        MAR     *-              ; access saved ST0
        BIT     *-,6            ; check if interrupt mask was set on call
        BCND    RSP_01,TC       ; exit if not

        CLRC    INTM            ; enable interrupts

    RSP_01:
        PSHD    *               ; load return register from SW stack into HW stack
        RET                     ; return to caller

This code can be called from an interrupt routine or from the (C) main program. That is (I suspect) why INTM is only conditionally cleared and not always. My suspicion is that there is an unsafe access in here, but I have not found it yet.

Can anyone comment on this code?
Are there safer ways to set/clear INTM and to test if it was originally set?

TIA,

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

Real programmers don't comment their code. It was hard to write,
it should be hard to understand.
TMS320LF2406A, interrupts getting permanently disabled
Started by ●February 1, 2005
Reply by ●February 1, 2005
stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:

> Hi all,
>
> This is my first post here and I'm fairly new to DSP's (I'm much
> more used to microcontrollers).
>
> I have inherited an 'almost finished' dsp project (small motion
> controller using a TMS320LF2406A). It is now mostly working except
> that it sometimes stops working. This can be after only a few moves
> or it can run for an hour or more.
>
> I've been searching alternately for hardware (also new) and software
> errors and fixed a few bugs along the way, but the problem remains. But I
> think (hope) I've now narrowed it down to interrupts getting disabled
> permanently somehow. If I check INTM in ST0 and toggle an output when it
> is set (intr. disable), I get an output squarewave as soon as the
> controller stops working. As most operations depend on flags set in the
> timer interrupt, everything grinds to a halt.
>
> At this time I suspect the following piece of code is the source of the
> problem:
>
> _ResetCurrentPower:
>     POPD    *+              ; relax hardware stack
>     SST     #0,*+           ; save current ST0
>     SETC    INTM            ; disable interrupts for a while
>
>     LDP     #motion_pwm
>     SPLK    #0h,motion_pwm  ; reset power control field
>
>     MAR     *-              ; access saved ST0

Stef, if this routine was called from main and you get an interrupt between these two instructions (the MAR above and the BIT below) which also invokes this routine, then the saved value of ST0 will be overwritten by the hardware stack value in the interrupt routine, so that when the interrupt returns the BIT test below will read the wrong value. I think - if my analysis is correct.

>     BIT     *-,6            ; check if interrupt mask was set on call
>     BCND    RSP_01,TC       ; exit if not
>
>     CLRC    INTM            ; enable interrupts
>
> RSP_01:
>     PSHD    *               ; load return register from SW stack into HW stack
>     RET                     ; return to caller
>
>
> This code can be called from an interrupt routine or from the (C) main
> program. That is (I suspect) why INTM is only conditionally cleared and
> not always. My suspicion is that there is an unsafe access in here, but
> I have not found it yet.
>
> Can anyone comment on this code?
> Are there safer ways to set/clear INTM and to test if it was originally
> set?
>
> TIA,

--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
Reply by ●February 1, 2005
In comp.dsp,
Randy Yates <randy.yates@sonyericsson.com> wrote:
>stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:
>
>> I've been searching alternately for hardware (also new) and software
>> errors and fixed a few bugs along the way, but the problem remains. But I
>> think (hope) I've now narrowed it down to interrupts getting disabled
>> permanently somehow. If I check INTM in ST0 and toggle an output when it
>> is set (intr. disable), I get an output squarewave as soon as the
>> controller stops working. As most operations depend on flags set in the
>> timer interrupt, everything grinds to a halt.
>>
>> At this time I suspect the following piece of code is the source of the
>> problem:
>>
>> _ResetCurrentPower:
>>     POPD    *+              ; relax hardware stack
>>     SST     #0,*+           ; save current ST0
>>     SETC    INTM            ; disable interrupts for a while
>>
>>     LDP     #motion_pwm
>>     SPLK    #0h,motion_pwm  ; reset power control field
>>
>>     MAR     *-              ; access saved ST0
>
>Stef, if this routine was called from main and you get an interrupt
>between these two instructions which also invokes this routine, then
>the saved value of ST0 will be overwritten by the hardware stack value
>in the interrupt routine, so that when the interrupt returns the BIT
>test below will read the wrong value. I think - if my analysis is
>correct.
>

That should not be possible (should it?), as interrupts are disabled at this point by the "SETC INTM" above. But thanks for thinking along with me. So far I have failed to come up with a plausible explanation.

>>     BIT     *-,6            ; check if interrupt mask was set on call
>>     BCND    RSP_01,TC       ; exit if not
>>
>>     CLRC    INTM            ; enable interrupts
>>
>> RSP_01:
>>     PSHD    *               ; load return register from SW stack into HW stack
>>     RET                     ; return to caller
>>
>>
>> This code can be called from an interrupt routine or from the (C) main
>> program. That is (I suspect) why INTM is only conditionally cleared and
>> not always. My suspicion is that there is an unsafe access in here, but
>> I have not found it yet.
>>
>> Can anyone comment on this code?
>> Are there safer ways to set/clear INTM and to test if it was originally
>> set?
>>

I have now duplicated the routine and call one version from interrupts and one from the main code. In the IRQ version I have removed all the interrupt disable/enable code, and the non-IRQ version always does the disable/enable, unconditionally. So far this has run for half an hour, but I won't be sure until it has at least survived an all-night test.

So it gets more likely that the above code is indeed the cause of the problem. Does anyone know of a safe way to first test if interrupts are disabled, then disable them, and then re-enable them if they were enabled to begin with?

If not, I'll just go with the duplicated routines. These routines get rather trivial if the conditional enable is removed. The IRQ version can even be reduced to this (no disable/enable required):

    _ResetCurrentPowerIRQ:
        LDP     #motion_pwm
        SPLK    #0h,motion_pwm
        RET

And as it is only used in 2 locations, it is probably best to just inline it.

Oh, the joy of debugging someone else's code for things like this, plus an occasional divide by zero and variables used in the main loop that are updated in an IRQ. It really gets to you after a week or so. :-(((

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

NEWS FLASH!!
Today the East German pole-vault champion became the West German pole-vault
champion.
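The "test, disable, then re-enable only if it was enabled" pattern asked about above can be sketched in C. This is a host-side model under stated assumptions, not LF2406A code: `sim_st0` is a hypothetical variable standing in for the ST0 register, `INTM_MASK` assumes INTM is bit 9 of ST0, and the function names are invented for illustration. On the target the two helpers would have to map onto SST, SETC INTM and CLRC INTM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: INTM is bit 9 of ST0 on the C2xx core. */
#define INTM_MASK 0x0200u

/* Hypothetical stand-ins: on the real part ST0 is a CPU register. */
uint16_t sim_st0;
uint16_t motion_pwm = 0x1234u;

/* Mask interrupts and report whether they were already masked. */
bool irq_save_and_disable(void)
{
    bool was_masked = (sim_st0 & INTM_MASK) != 0u;  /* models SST + BIT */
    sim_st0 |= INTM_MASK;                           /* models SETC INTM */
    return was_masked;
}

/* Unmask interrupts only if they were unmasked before the matching save. */
void irq_restore(bool was_masked)
{
    if (!was_masked) {
        sim_st0 &= (uint16_t)~INTM_MASK;            /* models CLRC INTM */
    }
}

/* Same job as _ResetCurrentPower, callable from main code or from an ISR. */
void reset_current_power(void)
{
    bool was_masked = irq_save_and_disable();
    motion_pwm = 0u;                                /* the protected update */
    irq_restore(was_masked);
}
```

The saved state lives in a local variable of the caller, so nested calls each keep their own copy. Whether a C local ends up somewhere an interrupt cannot overwrite depends on the compiler's stack conventions, which is exactly the kind of detail the hand-written assembly has to get right itself.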
Reply by ●February 1, 2005
stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:

> In comp.dsp,
> Randy Yates <randy.yates@sonyericsson.com> wrote:
> >stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:
> >
> >> At this time I suspect the following piece of code is the source of the
> >> problem:
> >>
> >> _ResetCurrentPower:
> >>     POPD    *+              ; relax hardware stack
> >>     SST     #0,*+           ; save current ST0
> >>     SETC    INTM            ; disable interrupts for a while
> >>
> >>     LDP     #motion_pwm
> >>     SPLK    #0h,motion_pwm  ; reset power control field
> >>
> >>     MAR     *-              ; access saved ST0
> >
> >Stef, if this routine was called from main and you get an interrupt
> >between these two instructions which also invokes this routine, then
> >the saved value of ST0 will be overwritten by the hardware stack value
> >in the interrupt routine, so that when the interrupt returns the BIT
> >test below will read the wrong value. I think - if my analysis is
> >correct.
> >
> That should not be possible (should it?) as the interrupts are disabled
> at this point by the "SETC INTM" above.

Doh!

> But thanks for thinking with
> me. I have sofar failed to come up with a plausible explanation.

Will it blow something up if you simply remove the conditional temporarily? If not, then you can verify that it is indeed this section of code (or not).

There are LOTS of possibilities here because you haven't really explained the entire context to us. For example, are you sure you're not going into the weeds? Sometimes you can get a rogue execution and then come back into your main loop.

Are you checking this with an emulator?

You could also try a different tack, e.g., just do the BIT test on entry instead of saving ST0 and then checking it later. I don't think what you're doing in the meantime will affect TC.

Are you sure you're saving all used registers in the interrupt handler?

--RY

--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
Reply by ●February 1, 2005
Stef wrote:

- snip -

> I have now duplicated the routine and call one from interrupts and one
> from main code. In the IRQ version I have removed all int disable/enable
> stuff and the non-irq version always does the enable/disable,
> unconditionally. Sofar this has run for half an hour, but I won't be sure
> till it has at least survived and all-night test.
>
> So it gets more likely that the above code is indeed the cause of the
> problem. Does anyone know of a safe way to first test if interrupts
> are disabled, then disable them and then re-enable them if they where
> enable to begin with?

On the 28xx there's a mechanism for saving the interrupt state to memory (or the stack) and restoring it. That way you don't have to test -- you just do.

> If not, I'll just go with the duplicate stuff. These routines get
> rather trivial if the conditional enable is removed. The irq version
> can even be reduced to this (no disable/enable required):
>
> _ResetCurrentPowerIRQ:
>     LDP     #motion_pwm
>     SPLK    #0h,motion_pwm
>     RET
>
> And as it is only used in 2 locations, it is probably best to just
> inline it.
>
> Oh, the joy of debugging someone else's code for things like this and
> an occasional divide by zero and using variables in the main loop that
> are updated in an irq. It really gets to you after a week or so. :-(((
>

Well, it's pretty common to use an IRQ to receive information, stuff it into a magic location and set a magic flag -- but the code should be _really well documented_, or language features should be used to protect the data from idle inspection/modification/whatnot.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
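A minimal sketch of the "magic location plus magic flag" hand-off, using C language features to keep the shared data away from casual access. Everything here is hypothetical and host-side: `timer_isr_model` is an ordinary function standing in for a real interrupt handler, and `irq_masked` stands in for the INTM bit. The shared variables are `static` and `volatile`, so other files can only reach them through the two functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared state: file-local so only the functions below can touch it,
 * volatile so the compiler re-reads it on every access. */
static volatile bool    sample_ready;  /* set by the ISR, cleared by main */
static volatile int16_t sample_value;  /* written by the ISR only         */
static volatile bool    irq_masked;    /* hypothetical stand-in for INTM  */

/* Stand-in for the timer ISR: store the data first, then raise the flag. */
void timer_isr_model(int16_t new_value)
{
    sample_value = new_value;
    sample_ready = true;
}

/* Main-loop side: copy the data out and clear the flag while interrupts
 * are masked. Returns true if a fresh sample was copied into *out. */
bool take_sample(int16_t *out)
{
    bool got_one = false;
    bool was_masked = irq_masked;  /* remember the caller's state     */

    irq_masked = true;             /* would be SETC INTM on the target */
    if (sample_ready) {
        *out = sample_value;
        sample_ready = false;
        got_one = true;
    }
    irq_masked = was_masked;       /* restore, do not blindly enable   */
    return got_one;
}
```

The main loop then only ever works on its private copy in `*out`, which avoids reading a variable halfway through an update by the interrupt.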
Reply by ●February 1, 2005
In comp.dsp,
Randy Yates <randy.yates@sonyericsson.com> wrote:
>stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:
>
>> In comp.dsp,
>> Randy Yates <randy.yates@sonyericsson.com> wrote:
>> >stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:
>> >
>> >> At this time I suspect the following piece of code is the source of the
>> >> problem:
>> >>
>> >> _ResetCurrentPower:
>> >>     POPD    *+              ; relax hardware stack
>> >>     SST     #0,*+           ; save current ST0
>> >>     SETC    INTM            ; disable interrupts for a while
>> >>
>> >>     LDP     #motion_pwm
>> >>     SPLK    #0h,motion_pwm  ; reset power control field
>> >>
>> >>     MAR     *-              ; access saved ST0
>> >
>> >Stef, if this routine was called from main and you get an interrupt
>> >between these two instructions which also invokes this routine, then
>> >the saved value of ST0 will be overwritten by the hardware stack value
>> >in the interrupt routine, so that when the interrupt returns the BIT
>> >test below will read the wrong value. I think - if my analysis is
>> >correct.
>> >
>> That should not be possible (should it?) as the interrupts are disabled
>> at this point by the "SETC INTM" above.
>
>Doh!
>

:-)

>> But thanks for thinking with
>> me. I have sofar failed to come up with a plausible explanation.
>
>Will it blow something up if you simply remove the conditional temporarily?
>If not, then you can verify that it is indeed this section of code (or not).
>

See my earlier explanation below, that's just what I have done: no test, and separate routines for IRQ and non-IRQ.

>There are LOTS possibilities here because you haven't really explained
>to us the entire context. For example, are you sure you're not
>going into the weeds? Sometimes you can get a rogues execution and
>then come back into your main loop.

I'm toggling an output in my main loop. This toggles every 500-800 us when running normally (interrupts on). When the controller fails, the toggle immediately accelerates to 35 us, with no 'long' gap in between. This fast looping comes from the main loop idling when it is not flagged by the IRQ.

>Are you checking this with an emulator?
>

Unfortunately I do not have an emulator. And I actually doubt it would help find this particular fault any sooner. But they do have advantages, and if this takes any longer I'll try to get one ordered.

>You could also try a different tack, e.g., just do the BIT test on entry instead
>of saving ST0 and then checking it later. I don't think what you're doing in
>the mean time will affect the TC.
>

To my knowledge, you cannot test ST0 directly. You first have to get it into memory, as BIT only operates on memory.

>Are you sure you're saving all used registers in the interrupt handler?
>

Yes, there is a common save call at the start of each interrupt handler.

>>
>> I have now duplicated the routine and call one from interrupts and one
>> from main code. In the IRQ version I have removed all int disable/enable
>> stuff and the non-irq version always does the enable/disable,
>> unconditionally. Sofar this has run for half an hour, but I won't be sure
>> till it has at least survived and all-night test.
>>

Now running OK for 5 hours, I'll see what it does overnight.

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

Hey dol! merry dol! ring a dong dillo!
Ring a dong! hop along! fal lal the willow!
Tom Bom, jolly Tom, Tom Bombadillo!
		-- J. R. R. Tolkien
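For readers puzzled by the `6` in `BIT *-,6`: as far as the C2xx instruction set goes, BIT takes a bit code that counts from the most significant bit, so bit code k tests bit (15 - k) of the memory word. With INTM assumed to be bit 9 of ST0, that gives bit code 6. The C model below only illustrates that arithmetic. The function names are made up, and the bit position should be checked against the CPU reference guide for the part in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: INTM sits at bit 9 of the saved ST0 word. */
#define ST0_INTM_BIT 9u

/* Convert an ordinary bit number (0 = LSB) into a BIT-style bit code. */
unsigned bit_code_for(unsigned bit_number)
{
    return 15u - bit_number;
}

/* Model of "BIT word, bit_code": returns the value TC would take. */
bool bit_test_model(uint16_t word, unsigned bit_code)
{
    return ((word >> (15u - bit_code)) & 1u) != 0u;
}
```

Under these assumptions `bit_code_for(ST0_INTM_BIT)` evaluates to 6, and `bit_test_model(saved_st0, 6)` is true exactly when INTM was set in the saved word, which is the case where the original routine skips `CLRC INTM`.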
Reply by ●February 1, 2005
In comp.dsp,
Tim Wescott <tim@wescottnospamdesign.com> wrote:
>Stef wrote:
>
>> So it gets more likely that the above code is indeed the cause of the
>> problem. Does anyone know of a safe way to first test if interrupts
>> are disabled, then disable them and then re-enable them if they where
>> enable to begin with?
>
>On the 28xx there's a mechanism for saving the interrupt state to memory
>(or the stack) and restoring it. That way you don't have to test -- you
>just do.
>

And I think the 24xx has it too. And it was already half in the actual code! Saving ST0 on the stack also saves the INTM bit (as it is in ST0), so restoring ST0 when done should also restore the old state of INTM. Right?

Was this the method you were thinking of? I'm actually a bit afraid to put this stuff back in, as it is working OK now (for at least 5 hours, with the overnight test still to come).

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

Youth is a blunder, manhood a struggle, old age a regret.
		-- Benjamin Disraeli, "Coningsby"
Reply by ●February 1, 2005
Stef wrote:

> In comp.dsp,
> Tim Wescott <tim@wescottnospamdesign.com> wrote:
>
>>Stef wrote:
>>
>>
>>>So it gets more likely that the above code is indeed the cause of the
>>>problem. Does anyone know of a safe way to first test if interrupts
>>>are disabled, then disable them and then re-enable them if they where
>>>enable to begin with?
>>
>>On the 28xx there's a mechanism for saving the interrupt state to memory
>>(or the stack) and restoring it. That way you don't have to test -- you
>>just do.
>>
> And I think the 24xx has it too. And it was already half in the actual code!
> Saving ST0 on the stack also saves the INTM bit (as it is in ST0),
> restoring ST0 when done should also restore the old state of INTM. Right?
>
> Was this the method you where thinking of? I'm actually a bit affraid to
> put this stuff back in, as it is working OK now (for at least 5 hours, wait
> the night).
>

Sounds like it -- I forget the details, but I remember (a) checking the assembly and seeing that it worked correctly, and (b) that it's been working fine for over a year now.

If you have something that works, and you can put it to bed, and it won't get up later and bite you, then maybe you should just keep it the way it is. If you or someone with access to you will be cursing you later for doing it, then maybe you should fix it.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply by ●February 1, 2005
In comp.dsp,
Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid> wrote:
>In comp.dsp,
>Tim Wescott <tim@wescottnospamdesign.com> wrote:
>>Stef wrote:
>>
>>> So it gets more likely that the above code is indeed the cause of the
>>> problem. Does anyone know of a safe way to first test if interrupts
>>> are disabled, then disable them and then re-enable them if they where
>>> enable to begin with?
>>
>>On the 28xx there's a mechanism for saving the interrupt state to memory
>>(or the stack) and restoring it. That way you don't have to test -- you
>>just do.
>>
>
>And I think the 24xx has it too. And it was already half in the actual code!
>Saving ST0 on the stack also saves the INTM bit (as it is in ST0),
>restoring ST0 when done should also restore the old state of INTM. Right?
>

That will not be safe either. See this code (the original):

    _ResetCurrentPower:
        POPD    *+              ; relax hardware stack
        SST     #0,*+           ; save current ST0
        SETC    INTM            ; disable interrupts for a while

        ; Do something

        MAR     *-              ; access saved ST0
        BIT     *-,6            ; check if interrupt mask was set on call
        BCND    RSP_01_Org,TC   ; exit if not

        CLRC    INTM            ; enable interrupts

    RSP_01_Org:
        PSHD    *               ; load return register from SW stack into HW stack
        RET                     ; return to caller

The "PSHD *" at the end looks a bit suspicious. The stack pointer was already decremented to get access to the return address before the CLRC INTM. If an interrupt occurs between CLRC INTM and PSHD, the value on the stack is destroyed. But in spra357.pdf (from TI) I read this: "By definition the next instruction after CLRC cannot be interrupted". Why is this not printed in the datasheets, in the description of interrupts and in the description of the CLRC instruction? So this code is actually safe as it is (I mean the return, not the conditional INTM).

But if I now rewrite it to pop ST0 from the stack:

    _ResetCurrentPower:
        POPD    *+              ; relax hardware stack
        SST     #0,*+           ; save current ST0
        SETC    INTM            ; disable interrupts for a while

        ; Do something

        MAR     *-              ; access saved ST0
        LST     #0,*-           ; restore ST0 and with it INTM to its old state

    RSP_01_Org:
        PSHD    *               ; load return register from SW stack into HW stack
        RET                     ; return to caller

In this case there is no CLRC, so the next instruction can be interrupted (?), making the PSHD unsafe. spra357 specifically mentions only CLRC as the instruction after which no immediate interrupt can occur. But from the pipeline examples, it looks like a number of instructions are in fact always executed after interrupts are re-enabled and before the next interrupt occurs, regardless of whether they were re-enabled by CLRC or by LST. But as it is not mentioned anywhere I can find, I'm not going to gamble on it.

Why does this processor only have post increment/decrement and no pre increment/decrement? Then I could just do this:

        LST     #0,-*           ; restore ST0 from stack and with it INTM to its old state

    RSP_01_Org:
        PSHD    -*              ; load return register from SW stack into HW stack
        RET                     ; return to caller

That would be perfectly safe and neat, sigh.

--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)

"It ain't so much the things we don't know that get us in trouble. It's the
things we know that ain't so."
		-- Artemus Ward aka Charles Farrar Brown
Reply by ●February 1, 2005
stef33d@yahooI-N-V-A-L-I-D.com.invalid (Stef) writes:

> In comp.dsp,
> Randy Yates <randy.yates@sonyericsson.com> wrote:
>>There are LOTS possibilities here because you haven't really explained
>>to us the entire context. For example, are you sure you're not
>>going into the weeds? Sometimes you can get a rogues execution and
>>then come back into your main loop.
>
> I'm toggling an output in my main. This toggles at 500-800 us when running
> normally (interrupts on). When the controller fails, the toggle immediately
> accelerates to 35us with no 'long' gap inbetween. This fast looping comes
> from an idle main loop when not flagged by the irq.

Got it. These simple troubleshooting tricks are often the most powerful.

>>Are you checking this with an emulator?
>>
> Unfortunately I do not have an emulator. And I actually doubt it will
> make you find this particular fault earlier. But they do have advantages
> and I this takes any longer I'll try to get one ordered.

The 54x has a mode (I think) where it "reflects" the current PC on the external address lines. If the 24x has the same capability, you might try hooking a logic analyzer to it and determining what happens at crash time, i.e., where the processor is executing.

You can also put phantom writes (or reads) to unused, externally mapped memory at various points in your code (again, I'm making some assumptions about the HW architecture of the 24x, which I don't know anything about) and then use the logic analyzer to capture them, so you have a "trail" showing where your execution is.

Good luck.

--
%  Randy Yates                  % "How's life on earth?
%% Fuquay-Varina, NC            %  ... What is it worth?"
%%% 919-577-9882                % 'Mission (A World Record)',
%%%% <yates@ieee.org>           % *A New World Record*, ELO
http://home.earthlink.net/~yatescr
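The "phantom write" trail can be sketched in C along these lines. It is only an illustration under assumptions: `TRACE_PORT`, the `trace_point` values and `trace()` are hypothetical names. On real hardware `TRACE_PORT` would have to be defined as an unused address that actually appears on the external bus, which depends on the board. Here it falls back to an ordinary variable so the code can run on a host.

```c
#include <stdint.h>

/* Host fallback: without a real external address, write to a variable.
 * On the target, TRACE_PORT would be defined as a pointer to an unused,
 * externally visible address (board-specific, not shown here). */
#ifndef TRACE_PORT
volatile uint16_t trace_port_model;
#define TRACE_PORT (&trace_port_model)
#endif

/* Hypothetical breadcrumb IDs, one per place of interest in the code. */
enum trace_point {
    TP_MAIN_LOOP   = 1,
    TP_TIMER_ISR   = 2,
    TP_RESET_POWER = 3
};

/* One volatile write per breadcrumb; a logic analyzer triggering on the
 * address would capture the ID from the data lines. */
static inline void trace(enum trace_point id)
{
    *TRACE_PORT = (uint16_t)id;
}

/* Example call sites. */
void main_loop_step(void)   { trace(TP_MAIN_LOOP); }
void reset_power_entry(void) { trace(TP_RESET_POWER); }
```

Because each breadcrumb is a single store, the timing impact stays small, and the last ID captured before the fast 35 us toggling starts would point at the code that ran just before interrupts stayed masked.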






