Reply by Andrew Nesterov November 12, 2007
Hi Jim,

The last question first: no, nothing else is needed to set up interrupts
correctly. At startup, before the application tasks are created and BIOS
is started, I would check for and clear any spurious interrupts, but the
ISR handles them anyway.

This problem relates to the robustness of the code as a whole (BIOS, CSL
and the application together). Under all circumstances PCE1 should be
in the .text section or in one of the executable subsections defined by
BIOS. If it is anywhere else, there is a bug either in pointers or in the
way interrupts are handled.

When the CPU starts servicing an interrupt, it saves (among other things)
the current PCE1 value in the IRP (interrupt return pointer) and jumps
to the corresponding IST entry at ISTP+INT_NO*32, where the branch
to the actual ISR is usually taken. The ISR returns with B IRP instead
of the regular B B3, so there is no chance for the ISR to go astray
as long as it keeps the IRP intact.
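
As a quick check of the address arithmetic, here is a minimal host-side
sketch in C. The function name ist_entry_address() is made up for
illustration and is not part of BIOS or CSL; it assumes that each IST entry
is one 32-byte fetch packet and that the table base is aligned to 1 KB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of BIOS or CSL: computes where the CPU
 * fetches from when interrupt int_no (0..15) is taken.
 * Assumptions: ist_base is the 1 KB aligned table base held in ISTP,
 * and each entry is one 32-byte fetch packet. */
static uint32_t ist_entry_address(uint32_t ist_base, unsigned int_no)
{
    assert(int_no < 16u);               /* the table has 16 entries */
    assert((ist_base & 0x3FFu) == 0u);  /* base must be 1 KB aligned */
    return ist_base + (uint32_t)int_no * 32u;
}
```

With the table at address 0, INT4 lands at 0x80 and INT13 at 0x1A0. If the
PC is found somewhere that is neither such an entry nor executable code,
the IRP or a pointer was most likely damaged on the way.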

That is what goes on on the CPU side of the game. The OS adds more
complexity to this: since the CPU itself makes practically no distinction
between contexts (interrupt, i.e. a "privileged" one, or "user"), the OS
has to track the current context explicitly. It needs this so that it does
not perform a context switch while still in interrupt mode, which could
otherwise happen when the ISR makes a semaphore post call. The OS prevents
itself from doing so, at the cost that a C interrupt handler is now
defined without the interrupt modifier! :) See Chpt 4 of SPRU423. You need
to remove the interrupt modifier from the mb_isr() definition (in fact it
was not there originally).

The simple algorithm (without BIOS) is as follows:

Interrupt occurs
CPU branches to the IST
IST branches to ISR
ISR saves context
ISR completes its actions
ISR restores context
ISR branches back via B IRP

What BIOS does:

Before interrupts are enabled, the address of the ISR is registered with
BIOS by an HWI_dispatchPlug() call, tasks are created, etc. Then BIOS starts.

Interrupt occurs
CPU branches to the IST
IST branches to the BIOS dispatcher
Dispatcher saves context
Dispatcher INCREMENTS NESTED INTERRUPTS COUNTER (hwi.h62)
Dispatcher calls ISR as a regular C subroutine
ISR does its actions and possibly calls a BIOS service that may
  lead to task rescheduling. In response, BIOS checks the interrupt
  counter and postpones the task switch until the counter is back to 0
ISR returns
Dispatcher DECREMENTS COUNTER; if the counter is 0, it calls the scheduler
  and restores not the previously saved context, but the context returned
  by the scheduler.
Dispatcher branches to the restored context.
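
The counter logic in the steps above can be modeled on a host machine with
a few lines of C. This is only a sketch of the idea, not the BIOS source:
all names (nest_count, model_dispatch() and so on) are made up, and the
context save/restore is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side model only; names are illustrative, not the BIOS source. */
static int  nest_count;      /* nested-interrupt counter kept by the dispatcher */
static bool switch_pending;  /* an ISR made a task ready while in interrupt mode */
static int  scheduler_runs;  /* how many times the scheduler actually ran */

/* Stand-in for a BIOS service such as SEM_post(): inside an interrupt it
 * only records that a task switch is needed. */
static void model_sem_post(void)
{
    if (nest_count > 0) {
        switch_pending = true;   /* defer: we are in interrupt context */
    } else {
        scheduler_runs++;        /* task context: reschedule right away */
    }
}

/* Stand-in for the dispatcher. isr is a plain C function, which is why
 * the real handler must not carry the interrupt modifier. */
static void model_dispatch(void (*isr)(void))
{
    nest_count++;                /* context save would come before this */
    isr();                       /* called as a regular C subroutine */
    nest_count--;
    if (nest_count == 0 && switch_pending) {
        switch_pending = false;
        scheduler_runs++;        /* scheduler chooses the context to restore */
    }
}

/* Example ISR that posts a semaphore. */
static void model_isr(void)
{
    model_sem_post();
}

/* Example ISR that is itself interrupted by another interrupt. */
static void model_outer_isr(void)
{
    int before = scheduler_runs;
    model_dispatch(model_isr);          /* nested interrupt */
    assert(scheduler_runs == before);   /* switch is still deferred here */
}
```

Calling model_dispatch(model_isr) runs the scheduler once, after the ISR
has returned. Calling model_dispatch(model_outer_isr) also runs it only
once, after the outermost interrupt is finished. A handler that returns on
its own with B IRP bypasses the decrement step entirely, which is one way
to end up at a bad address.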

Other suggestions:

One potential place for wrong pointers is the call of the application
callback function, which is made in the ISR before it returns. There are a
couple of things to check:

* is the pointer to the callback routine passed correctly?
* does the declaration of the callback function match what BIOS
expects? A mismatch might corrupt the stack, which is shared by the ISR and
the callback. It might be useful to compare SP before and after the
callback call. Did you step over the callback and make sure that it returns
to the ISR?

I also think that a few other places in the code (based on your
next-to-last revision) need to be checked again:

The transmitter tasks sleep for 2 ticks before they exit (via the implicit
TSK_exit() call); this is redundant and can be removed.

WaitForDMA() calls PCI_dspIntReqClear(), so that function ends up being
called twice before the PCI_dspIntReqSet() call, which is also redundant.

Next, WaitForDMA() resets the semaphore, which might already have been
posted; in that case one host-to-DSP interrupt would be missed. Actually, I
would rather use the binary calls (SEM_postBinary/SEM_pendBinary) to make it
a mutex, which is what it is supposed to be. A closely related issue is the
use of the validxfer variable. Currently it mimics the semaphore but is not
protected; I would avoid using unprotected objects.
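
To illustrate how the reset can lose an interrupt, here is a small
host-side model in C. It is a sketch under assumptions: ModelSem and the
model_*() functions are made up and only mimic a semaphore count, and
wait_with_reset() follows my reading of what WaitForDMA() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side model of a semaphore count; not the BIOS implementation. */
typedef struct { int count; } ModelSem;

static void model_post(ModelSem *s)  { s->count++; }
static void model_reset(ModelSem *s) { s->count = 0; }

/* Non-blocking pend for the model: true if a post was consumed. */
static bool model_try_pend(ModelSem *s)
{
    if (s->count > 0) {
        s->count--;
        return true;
    }
    return false;
}

/* Waiter that resets first, as WaitForDMA() is described to do. */
static bool wait_with_reset(ModelSem *s)
{
    model_reset(s);              /* discards a post that already arrived */
    return model_try_pend(s);
}

/* Waiter that simply pends. */
static bool wait_without_reset(ModelSem *s)
{
    return model_try_pend(s);
}
```

If the ISR posts before the task reaches the wait, wait_with_reset()
returns false (on the target the task would block for an interrupt that
has already come and gone), while wait_without_reset() sees the post.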

The last thing that is still unclear to me is the logic of posting an
interrupt to the host. So far, the interrupt is sent (by calling
PCI_dspIntReqSet()) as soon as all (or part of) the 64K chunk transfers have
been queued, but they might not yet have actually been transferred over the
PCI bus, because ASYNC_write() is asynchronous. Does the host logic really
expect this? I guess this is the place where a race condition might occur.
What would happen if buildall_tsk() slept for a second in its infinite loop
before raising the interrupt to the host? That would certainly let all the
outstanding transfers complete.
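
One possible alternative to sleeping, sketched below as a host-side model
in C, is to count the outstanding transfers and raise the host interrupt
only from the last completion. All names here (XferState, on_chunk_*())
are hypothetical and not taken from the attached code.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side model; names are illustrative, not from the attached code. */
typedef struct {
    int  outstanding;   /* transfers queued but not yet completed */
    bool all_queued;    /* set once the last chunk has been submitted */
    int  host_irqs;     /* how many times the host interrupt was raised */
} XferState;

/* Raise the host interrupt only when nothing is left in flight.
 * In the real code this is where PCI_dspIntReqSet() would go. */
static void maybe_notify_host(XferState *x)
{
    if (x->all_queued && x->outstanding == 0) {
        x->host_irqs++;
        x->all_queued = false;   /* arm for the next batch */
    }
}

/* Called by the task for every chunk it queues. */
static void on_chunk_queued(XferState *x)
{
    x->outstanding++;
}

/* Called by the task after the last chunk has been queued. */
static void on_last_chunk_queued(XferState *x)
{
    x->all_queued = true;
    maybe_notify_host(x);
}

/* Called from the completion path (the MASTER_OK handling in the ISR). */
static void on_chunk_complete(XferState *x)
{
    assert(x->outstanding > 0);
    x->outstanding--;
    maybe_notify_host(x);
}
```

On the target the state is shared between the ISR and the task, so the
updates in the task would have to be protected, for example by disabling
interrupts around them; the model ignores that.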

I also wonder if it is absolutely necessary to create a task to handle each
64K chunk. An async chan_submit call queues transfers that cannot be started
immediately, so a single task could call ASYNC_write() in a loop similar to
the one now used to create the separate tasks. The order of transactions
might be different, but the overall result is the same :)
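
A rough sketch of such a loop in C follows. Since I do not want to guess
the ASYNC_write() argument list, the submit step is a hypothetical function
pointer (SubmitFn); submit_all_chunks() and count_bytes() are made-up names
as well.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE (64u * 1024u)   /* 64K chunk, as in the discussion */

/* Hypothetical submit hook; in the real code this is where the
 * ASYNC_write() call would go. Returns 0 on success. */
typedef int (*SubmitFn)(size_t offset, size_t length, void *ctx);

/* Queue a whole buffer from one task. Returns the number of chunks
 * submitted, or -1 if a submit failed. */
static int submit_all_chunks(size_t total, SubmitFn submit, void *ctx)
{
    int n = 0;
    size_t offset = 0;

    assert(submit != NULL);
    while (offset < total) {
        size_t len = total - offset;
        if (len > CHUNK_SIZE) {
            len = CHUNK_SIZE;        /* last chunk may be shorter */
        }
        if (submit(offset, len, ctx) != 0) {
            return -1;
        }
        offset += len;
        n++;
    }
    return n;
}

/* Trivial submit used for illustration: adds up the queued bytes. */
static int count_bytes(size_t offset, size_t length, void *ctx)
{
    (void)offset;
    *(size_t *)ctx += length;
    return 0;
}
```

For a 200000-byte buffer this submits four chunks, three full ones and a
short one, all from the same task.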

By the way, currently the CHAN_NUM constant is 1, which means that only one
transmitter task is created, no matter what the transfer size is.

Did you find out how to assign mb_isr() to INT13 instead of the default INT4?

Hope this helps,

Andrew
> Subject: ISR initialization/reset issue
> Posted by: "jim" j...@justsosoftware.com jiml8
> Date: Sun Nov 11, 2007 12:39 pm ((PST))
Reply by jim November 11, 2007
Following up on my problem of an ISR "falling off the cliff...sometimes", I
have been single-stepping through a lot of assembler, and doing a lot of
analysis. Based upon this (and the feedback from this mailing list) I have
re-written the ISR provided in the c64xx_pci.c module to incorporate the
features that I need for my Linux - DSP DMA transfer mechanism, and I have
deleted the ISR that I had placed in the code in my customized async_pci.c
module since that capability is now carried in the single ISR in c64xx_pci.c.

I now have the following repeatable problem, which indicates some
initialization/reconfiguration problem someplace in the code.

1. When I first start the DSP, and load my compiled program, run it under the
Code Composer debugger, then trigger an interrupt from the Linux host, and
single-step through the code, the ISR runs exactly once. It branches
properly to handle the Host interrupt, does a SEM_post to inform the task
which is waiting that the interrupt has occurred, then "falls off the cliff".
The task waiting on the semaphore never stops, and the program counter
usually winds up in the IRQ_intTable bss section, from which it never emerges
because it is in a loop. However sometimes the PC has wound up in the
_ftable bss section.

2. If I then halt the DSP program and reload it (without resetting the DSP),
and re-run it, then re-trigger from the host and single step through the
code, the ISR runs to completion and the program then branches properly to
the task which is waiting on the semaphore, and that task proceeds. It
builds all of its tasks to do the asynchronous transfer, then sleeps which
passes control to the first of these tasks.

The first task then sets up its asynchronous transfer and sleeps. When it
sleeps, the ISR runs again since apparently an interrupt (MASTER_OK) has been
set. The ISR processes the interrupt properly and handles the transfer, then
returns. When it returns, it "falls off the cliff" and control is never
passed back to the next waiting process (which would be transferring more
data). This time, when it "falls off the cliff", it usually winds up well
down in a section of memory (internal memory) that is labeled "GBL_stackbeg,
_HWI_STKTOP, __stack". However, I have found it a few times in a section that
apparently has to do with the non maskable interrupt, suggesting a PCI error
or some such. Again at this point it is in a loop and never returns.

Also, at this point the PCI bus is deadlocked forcing me to reboot the host.

3. If I try to run the DSP code with no breakpoints set, the ISR
always "falls off the cliff" and the PC may wind up anywhere. Often I get a
display that is blank (actually, just dashed lines), showing no memory
present with no values.

What this says to me is that there is something associated with interrupts
that is not being configured properly at initialization time, but is being
configured serendipitously during the first run, thus enabling the ISR to run
more or less properly the first time through, after the program has been
reloaded. However, this configuration item, whatever it is, is being
improperly reset while the GIO processing occurs and the system is then going
off the rails again.

Could someone either provide me with or steer me toward something other than
reference documentation that discusses the dispatcher program, and also tells
me the considerations and implications for setting up an ISR?

Presently I am hooking the ISR and the semaphore that I am using into the
system dynamically; no static definitions in a cdb file. I have attached the
relevant code modules for inspection.