DSPRelated.com
Forums

ISR falls off cliff...sometimes.

Started by jim October 23, 2007
Sorry for late response. I have been having some severe computer problems.
Comes from dropping the system HD onto a concrete floor a couple of months
ago while replacing power supply, then not noticing that some I/O errors were
appearing on drive, then trying to do major system software update. I use
SCSI drives in my workstation, and they practically never fail - and when
they do they tend to do it gracefully. My fault that I wasn't running
diagnostics on this one periodically, which would have given me a heads-up
about problems coming.

I have it more or less sorted out now, though.

On Tuesday 30 October 2007 08:20:05 you wrote:
> Jim,
>
> On 10/29/07, Andrew Nesterov wrote:
> > > Subject: Re: ISR falls off cliff...sometimes.
> > > Posted by: "jim" j...@justsosoftware.com jiml8
> > > Date: Sun Oct 28, 2007 7:20 am ((PDT))
> > >
> > > ... And now...the REST of the story!
> > >
> > > My client came up with a brand new board, and we wired it to be a PCI
> > > interface (rather than GPIO) and plugged it in. Behavior of this new
> > > board was consistent; it would transfer data two or three times, then
> > > hang. I checked the initialization EMIF stuff; the board was set to run
> > > at 500 MHz when it was a 1000 MHz board. So I turned up the speed.
> > > Behavior remained exactly the same as it had been; two or three correct
> > > transfers, then hang.
> >
> > I saw that Spectrum Digital's C6416 EVM does not have a PCI connector,
> > does it? So you needed to wire its on-board connector into an external
> > PCI connector, right? Perhaps this on-board connector has muxed GPIO and
> > PCI pins - this is probably what Jeff was asking, if I understood him
> > correctly.
>
> The brand new board was designed by the customer. Correct??
> The Spectrum Digital board was probably using HPI as opposed to the
> stated 'GPIO'. Correct??
>
> Just trying to get calibrated.
>
No, the new board was another Spectrum Digital board. The GPIO connector
provided on that board can be turned into a PCI connector by the addition of
a couple of jumpers.

Turns out that there is some problem in the PCI interface. After installing
the new board, my software drivers worked perfectly on both sides of the
interface. This lasted about two days, then I started seeing the same
symptoms on the new board that I had been seeing on the old one. This has to
be a hardware issue, and the only possibility that I or my client can see is
some kind of mismatch on that bus that is causing a failure.

It seems that my client was wiring these jumpers in place to pull up some pins
on the GPIO bus, making it a PCI bus, by wiring directly to the +3.3V power
line without using a pullup resistor. I propose to him that this lack of a
resistor (and the consequent lack of current limiting on that pullup line).
However, I have not tested for that (he is testing for that now) and I am not
at all certain that this is right.

It also seems that we have uncovered a bug in the TI diagnostic program; the
original "symptom" we were seeing involved memory errors, accompanied by a
failure of Flash indicated by the TI diagnostic software. Seems that TI now
claims that this indicated failure is spurious because they were not properly
testing the case where the GPIO bus had been configured as a PCI bus, and one
pin that floats on the GPIO bus was being pulled for the PCI case, resulting
in a spurious indication of failure.

That may be true, but Code Composer is mature enough that I am a bit
skeptical.

> > > Now, while working with the previous board, I had added a number of
> > > code hacks while trying to deal with the inconsistency in behavior,
> > > before finally determining absolutely that mine was a hardware problem.
> > >
> > > So, I removed these hacks. Instantly, the thing ran exactly the way I
> > > wanted it to run.
> > >
> > > My PCI DMA Master code is derived from the asynch_pcitest code provided
> > > with the driver development environment, and described in spru616.pdf,
> > > and it now works...almost.
> >
> > I assume that your DSP-side code does master reads from the host, is it
> > close to the truth?
> >
> > > Well, it DOES work, but there is a bug which will become a problem
> > > eventually (though I'll get through my demo on Monday).
>
> It's not Monday any more, I wonder how the demo went... :-)
>
My demo to my client went as planned. His demo to his client (who is funding
the whole thing) got postponed.

> > > The one problem I still have that I know about is this. My Linux driver
> > > (written from scratch) triggers an interrupt to the DSP to tell it to
> > > do a DMA transfer. The DSP reconfigures the interrupts and registers as
> > > required (the async code does this), transfers the data, then my
> > > program reconfigures the interrupts to what I need and sends an
> > > interrupt to the Linux system telling it the data is transferred.
> >
> > Can you be more specific here? I am not certain what you mean by
> > reconfiguring interrupts. Does the DSP-side program remap interrupts
> > on the fly? What do you mean by DMA transfer? Is it simply a PCI Master
> > read, or have you set up a separate DMA transfer to move the data that
> > the PCI controller read into an intermediate input buffer on to its
> > final destination?
>
> I agree with Andrew. It is not clear what you are trying to accomplish.
>
Basically, I am integrating both sides of this interface. On the Linux side,
I wrote a driver from the ground up, and it works. On the DSP side, I am
unfamiliar with the environment and needed something working pretty quickly.

I need a program on the DSP side which will function as a PCI bus master and
do a DMA transfer of arbitrary size to a specified buffer on the Linux host.
When I say "arbitrary size", that is what I mean; presently I have provided a
single 4 meg buffer on the Linux host but intend to add another one, and I am
using two 4 Meg pages on the DSP SDRAM as data buffers, and my ultimate goal
is to double-buffer the DMA transfer, and ping-pong back and forth between
the buffers for an indefinite period of time with the DSP doing data
processing on the buffer which is NOT currently being transferred and the
Linux host emptying the buffer on its side of the interface which is NOT
currently being filled.
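
A minimal sketch of the ping-pong bookkeeping described above, assuming two buffers and a strict fill-then-drain order. All names here (`pingpong`, `buf_state`, and the four functions) are hypothetical and come from neither the TI code nor the Linux driver; the sketch only models which buffer each side may touch and does no DMA itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for two alternating buffers. */
enum { NUM_BUFFERS = 2 };

typedef enum { BUF_EMPTY, BUF_FILLING, BUF_FULL, BUF_DRAINING } buf_state;

typedef struct {
    buf_state state[NUM_BUFFERS];
    size_t fill_idx;   /* buffer the producer (DMA side) fills next */
    size_t drain_idx;  /* buffer the consumer (host side) empties next */
} pingpong;

void pingpong_init(pingpong *pp)
{
    size_t i;
    assert(pp != NULL);
    for (i = 0; i < NUM_BUFFERS; i++)
        pp->state[i] = BUF_EMPTY;
    pp->fill_idx = 0;
    pp->drain_idx = 0;
}

/* Claim the next buffer for filling; returns its index, or -1 if the
 * consumer has not finished emptying it yet. */
int pingpong_begin_fill(pingpong *pp)
{
    if (pp->state[pp->fill_idx] != BUF_EMPTY)
        return -1;
    pp->state[pp->fill_idx] = BUF_FILLING;
    return (int)pp->fill_idx;
}

/* Mark the buffer being filled as full and move on to the other one. */
void pingpong_end_fill(pingpong *pp)
{
    assert(pp->state[pp->fill_idx] == BUF_FILLING);
    pp->state[pp->fill_idx] = BUF_FULL;
    pp->fill_idx = (pp->fill_idx + 1) % NUM_BUFFERS;
}

/* Claim the next full buffer for emptying; returns its index, or -1 if
 * nothing is ready. */
int pingpong_begin_drain(pingpong *pp)
{
    if (pp->state[pp->drain_idx] != BUF_FULL)
        return -1;
    pp->state[pp->drain_idx] = BUF_DRAINING;
    return (int)pp->drain_idx;
}

/* Mark the buffer being emptied as free again. */
void pingpong_end_drain(pingpong *pp)
{
    assert(pp->state[pp->drain_idx] == BUF_DRAINING);
    pp->state[pp->drain_idx] = BUF_EMPTY;
    pp->drain_idx = (pp->drain_idx + 1) % NUM_BUFFERS;
}
```

In a real system the state words would have to live somewhere both sides can see and be updated under the interrupt handshake; the sketch leaves that out on purpose.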

There is a demo program called async_pci that is provided by TI and uses the
async extensions to the GIO PCI interface, as described in SPRU616.pdf. I
started with that demo program, chopped out the demo portions that I didn't
need, added an interrupt handler so I could trigger it from Linux, put in the
necessary semaphores and so forth, and ran with it.

The asynchronous capability of this driver is particularly appealing to me; I
gain the capability to queue up 64K transfers (the maximum the hardware will
do in one burst) up to the size of the 4 meg buffer and have them run off
with minimal CPU intervention until the entire buffer is transferred. This
frees the CPU for other things.
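
To make the arithmetic concrete, here is a small sketch of splitting one buffer into transfers of at most 64K each. It assumes "64K" means exactly 65536 bytes, which I have not checked against the hardware documentation, and the names `chunk`, `chunk_count` and `chunk_at` are made up for illustration. It only computes offsets and lengths; queueing the transfers is left to the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-transfer limit: "64K" taken as 65536 bytes. */
#define MAX_BURST_BYTES (64u * 1024u)

typedef struct {
    size_t offset;  /* byte offset of this transfer within the buffer */
    size_t length;  /* bytes in this transfer (the last one may be short) */
} chunk;

/* Number of transfers needed to cover `total` bytes. */
size_t chunk_count(size_t total)
{
    return (total + MAX_BURST_BYTES - 1) / MAX_BURST_BYTES;
}

/* Describe transfer number `index` (0-based) of a `total`-byte buffer. */
chunk chunk_at(size_t total, size_t index)
{
    chunk c;
    assert(index < chunk_count(total));
    c.offset = index * MAX_BURST_BYTES;
    c.length = total - c.offset;
    if (c.length > MAX_BURST_BYTES)
        c.length = MAX_BURST_BYTES;
    return c;
}
```

Under that assumption a 4 meg buffer comes out as 64 transfers of 64K each.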

This demo program uses a PCI IOM "mini-driver" named c64xx_pci.c (and an
associated .h file), which is also provided by TI. I have found and fixed
three bugs in this code - it simply didn't work right as shipped. This
mini-driver is invoked from a library (CSL library) provided by TI for which
I do not have the source. Thus, my calls to, for instance, ASYNC_write get
translated into GIO calls, then pass through the CSL library and reappear in
c64xx_pci.c. I can, of course, do source level debugging in my top-level
code and in c64xx_pci.c, but not in the intermediate level which is a
library.

I can read/write assembler, but given this is object code on a platform which
I am just learning, sorting out exactly what it is doing is a very, very time
consuming process.

Now, I configure interrupts so that INT13 is my Host to DSP interrupt, which
I use to trigger an interrupt handler which then kicks off my DMA transfer
code, which is heavily derived from the async_pci demo code. Over the course
of the GIO calls, routines in c64xx_pci.c are repeatedly invoked, and some of
those routines reconfigure the interrupt structure in a fashion that is
apparently required by the ASYNC system. The interrupts are apparently not
properly restored to their original state when the ASYNC system is finished,
and this requires me to reconfigure them when my program is done with all
transfers.

After all transfers for this buffer are completed, my driver then sets a DSP
to Host interrupt so that the Linux host knows a buffer has been filled
(which may or may not be the end of the transfer, but Linux has to empty that
buffer).

All of this appears to be working, but some counter is apparently not being
reset, and as a consequence the DSP to Host interrupt is getting set N times,
where N is the number of times my DSP driver has gone through the main
transfer loop since the last time the DSP was powered up.

Why is this happening? I have no idea. I actually trigger the interrupt with
the PCI_dspIntReqSet() routine out of the CSL library.

Actually, the entire segment of code which resets the interrupts to the way I
want them, then triggers the interrupt is this:

IRQ_disable(IRQ_EVT_DSPINT);
IRQ_map(IRQ_EVT_DSPINT, 13);
IRQ_enable(IRQ_EVT_DSPINT);
*datastatusregister = *datastatusregister | 0x2;
PCI_dspIntReqClear();
PCI_dspIntReqSet();

This happens at the bottom of the main processing loop and immediately after
this the program branches back to the top of that loop and sleeps on a
semaphore waiting for the next interrupt from the host.

> > > Now, my Linux driver expects to share the interrupt, and if the
> > > interrupt really is for it, it tests to see if the interrupt is valid.
> > > If the interrupt is not valid, it writes a message into the log saying
> > > that it received an interrupt for an unknown reason, then clears the
> > > interrupt and takes no action.
> >
> > To share the interrupt with whom? Do you mean that there are several PCI
> > devices on your host machine, or there are several processes on the DSP
> > side that may assert the PCI interrupt? How does your host-side code
> > decide that the interrupt is for it and is valid?
>
> Andrew,
> This is part of the PCI spec. Two separate PCI drivers must be able
> to share a single interrupt. I don't know if you were around in the
> early days of PCs transitioning to PCI. You sometimes [depending on
> interrupt assignments] couldn't run certain "board combinations" [it
> was actually driver combinations].
>
> mikedunn
>
Yes. Presently only the one device is using the interrupt, but I am not
willing to assume this will always be the case. My Linux driver incorporates
the code to process the interrupt to determine if the interrupt really is for
it, and if not it just passes it on.
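
Roughly, the decision the handler makes looks like the sketch below. This is a user-space model, not the actual driver: the `sketch_` names are hypothetical, the enum only stands in for the kernel's IRQ_NONE / IRQ_HANDLED return values, and treating bit 0x2 of the shared status word as the "DSP really raised this" flag is an assumption based on the DSP-side code earlier in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
typedef enum { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 } sketch_irqreturn;

/* Assumed meaning: flag the DSP sets before raising the host interrupt. */
#define STATUS_DSP_TO_HOST 0x2u

typedef struct {
    volatile uint32_t *status;  /* shared status word */
    int device_asserted;        /* models "our device has its interrupt pending" */
    unsigned valid_count;       /* interrupts accepted as genuine */
    unsigned spurious_count;    /* interrupts from our device with no flag set */
} sketch_dev;

sketch_irqreturn sketch_isr(sketch_dev *dev)
{
    assert(dev != NULL && dev->status != NULL);

    /* Not our device: report that so other handlers on the line get a look. */
    if (!dev->device_asserted)
        return SKETCH_IRQ_NONE;

    /* Ours: acknowledge it either way so the line is released. */
    dev->device_asserted = 0;

    if (*dev->status & STATUS_DSP_TO_HOST) {
        *dev->status &= ~STATUS_DSP_TO_HOST;  /* consume the flag */
        dev->valid_count++;
    } else {
        dev->spurious_count++;  /* the "unknown reason" case that gets logged */
    }
    return SKETCH_IRQ_HANDLED;
}
```

In this model the repeated log messages correspond to the spurious branch: the device asserts the line again, but the flag was already consumed by the first pass.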

> > The problem looks to me like a sync problem. The code on the two sides
> > (host and DSP) does not work in concert; the sides do not synchronize
> > their actions with each other. Have you thought over the communication
> > protocol, ready/not ready/ack software signaling?
> >
> > Rgds,
> >
> > Andrew
> >
Oh my yes. I have taken over several SDRAM memory locations on the DSP side
of the interface for status information which is tested/set/cleared as
appropriate by the host or the dsp, and the whole thing is interrupt driven,
with tests on both sides for valid interrupts. The major problem at this
time is that for some unknown reason the DSP interrupts, and interrupts, and
interrupts. The Linux driver discards these spurious interrupts, but at some
point they'll slow the bus down enough that it matters.

> > > So, here is the bug. After loading the DSP program, the first time my
> > > Linux client program orders the DSP to do a DMA transfer, I get one
> > > message in the Linux log file saying that an interrupt was received for
> > > an unknown reason.
> > >
> > > The second time my client program orders a DMA transfer, I get two
> > > messages in the log file about an interrupt for an unknown reason
> > >
> > > The tenth time, I get ten messages in the log (actually I get one
> > > message, then another message which says: "the previous message was
> > > repeated 9 times").
> > >
> > > The 100th time, 100 messages.
> > >
> > > What this means is that something someplace in the DSP is counting and
> > > not clearing, and it is setting NOT EMIFA as many times as it has in
> > > its count. Eventually, this is going to slow down the PCI bus to a
> > > noticeable degree.
> > >
> > > Given that I am using the async demo code (so far, I might add, I have
> > > found and fixed no fewer than three bugs in that code) as the basis for
> > > the DSP side of my PCI interface driver, does anyone have any idea what
> > > is counting?
> > >
> > > On Wednesday 24 October 2007 12:08:22 you wrote:
> > >> Jim,
> > >>
> > >> On 10/24/07, jim wrote:
> > >>> On Wednesday 24 October 2007 11:06:48 you wrote:
> > >>>> Jim,
> > >>>>
> > >>>> On 10/24/07, jim wrote:
> > >>>>> Well, I did find at least part of my problem.
> > >>>>>
> > >>>>> There seems to be a hardware issue with the memory address/refresh
> > >>>>> logic. I've been wrestling with inconsistent behavior, and while I
> > >>>>> have strongly suspected hardware, that isn't an easy call to make
> > >>>>> particularly when programming at such a low level.
> > >>>>
> > >>>> Make sure that you go through the EMIF setup parameters, clocks,
> > >>>> etc. to be sure that it is setup correctly. If you used the
> > >>>> 'delivered EMIF settings' and changed any of the clock
> > >>>> configuration, your refresh rate could be too slow.
> > >>>>
> > >>>> mikedunn
> > >>>
> > >>> I will certainly look at that. I would not have thought it would be
> > >>> possible to configure the refresh rate from outside. It is certainly
> > >>> not something I would have ever looked for.
> > >>
> > >> I am not sure what you mean by "from outside" [from outside the
> > >> RAM??]. Hopefully your [or someone's] low level initialization code is
> > >> taking care of the EMIF setup. Are you using DSP/BIOS?? or some other
> > >> executive??
> > >>
> > >>> That actually is a possibility???
> > >>
> > >> Yes, and I have the "have I lost my mind??" experience and the gray
> > >> hair to prove it. I abstain from pulling my hair out :-)
> > >>
> > >>>>> However, I can now document some random changes in program code
> > >>>>> (which cannot be accounted for by wild pointers), and using code
> > >>>>> composer, I watched the program change a local unsigned int
> > >>>>> variable (which it was supposed to do), which also caused the next
> > >>>>> local unsigned int variable on the same stack to change as well to
> > >>>>> the same value (which should be impossible). The variable that was
> > >>>>> supposed to be changed was a return from an exec function
> > >>>>> (HWI_disable), so if this change is due to some pointer problem,
> > >>>>> the problem must be in the HWI_disable function.
> > >>>>>
> > >>>>> This accounts for a lot of things, and I still have some hair left,
> > >>>>> having not pulled it all out.
> > >>>>>
> > >>>>> On Tuesday 23 October 2007 11:42:31 you wrote:
> > >>>>>> Hi Jim,
> > >>>>>>
> > >>>>>> Please look at the page 77 of SPRU581C.pdf, on the bit 4 INTRST of
> > >>>>>> RSTSRC. It says that "This bit must be asserted before another
> > >>>>>> host interrupt can be generated."
> > >>>>>>
> > >>>>>> Next, it would help to parse all the bits in the PCIIS (even if
> > >>>>>> they are all disabled in the PCIIEN) and clear them in the PCI
> > >>>>>> ISR.
> > >>>>>>
> > >>>>>> Third, it is recommended to parse an interrupt source register
> > >>>>>> (any of them, not only the PCIIS) in a loop inside an ISR and
> > >>>>>> clear any set bits until the register becomes zero, e.g. for the
> > >>>>>> PCI:
> > >>>>>>
> > >>>>>> volatile uint32 temp;
> > >>>>>>
> > >>>>>> while (temp = PCIIS) // this reads the PCIIS
> > >>>>>> {
> > >>>>>>     test if bit[0], ..., [n] is set
> > >>>>>>     if set, clear bit[0], ..., [n] and perform the necessary
> > >>>>>>     actions, e.g.: if bit[3] HOSTSW was set then clear bit[4]
> > >>>>>>     INTRST in RSTSRC
> > >>>>>> }
> > >>>>>>
> > >>>>>> // PCIIS is clear now, exit the ISR
> > >>>>>>
> > >>>>>> Hope this helps,
> > >>>>>>
> > >>>>>> Andrew
> > >>>>>>
> > >>>>>>> 11a. ISR falls off cliff...sometimes.
> > >>>>>>> Posted by: "jim" j...@justsosoftware.com jiml8
> > >>>>>>> Date: Mon Oct 22, 2007 8:42 pm ((PDT))
> > >>>>>>>
> > >>>>>>> I have defined an ISR that responds to an interrupt on the PCI
> > >>>>>>> bus from a host. The current code for this ISR is this:
> > >>>>>>>
> > >>>>>>> interrupt void DMAtoHost(void)
> > >>>>>>> {
> > >>>>>>> unsigned int intval,*datastatusregister;
> > >>>>>>> puts("interrupted\n");
> > >>>>>>> datastatusregister = (unsigned int *)DATASTATUSREGISTER;
> > >>>>>>> intval = *datastatusregister & 0x04;
> > >>>>>>> if(intval == 4) { /* is this a host interrupt */
> > >>>>>>> intval = *datastatusregister;
> > >>>>>>> intval = intval & 0xfffffffb;
> > >>>>>>> *datastatusregister = intval; /* if so clear it */
> > >>>>>>> SEM_post(HostDMASem);
> > >>>>>>> puts("resetting PCIIS\n");
> > >>>>>>> }
> > >>>>>>> PCI_RSET(PCIIS,0x00000008);
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> Now, DATASTATUSREGISTER is a memory location on the DSP SDRAM
> > >>>>>>> that I have taken over for tracking status information between my
> > >>>>>>> Linux driver and my DSP interface. Here I am testing a flag in
> > >>>>>>> that register to make sure that this interrupt really was set by
> > >>>>>>> the host (my linux driver both sets the interrupt and sets this
> > >>>>>>> flag to tell that it did it), and if so, I clear that location as
> > >>>>>>> well as clearing the PCIIS register.
> > >>>>>>>
> > >>>>>>> Basically, if this really was an interrupt from the host, this
> > >>>>>>> ISR posts to a semaphore called HostDMASem, and there is a task
> > >>>>>>> sleeping on this semaphore waiting to be told to transfer data.
> > >>>>>>>
> > >>>>>>> I have defined this ISR statically using the Code Composer
> > >>>>>>> configuration tool, and it is hooked to interrupt 13.
> > >>>>>>>
> > >>>>>>> I have tried various ways to set that semaphore; my current
> > >>>>>>> iteration has the semaphore defined statically using the
> > >>>>>>> configuration tool, and it is initialized as one of the very
> > >>>>>>> first things done in the main routine of the program when it
> > >>>>>>> starts, like this:
> > >>>>>>>
> > >>>>>>> SEM_new(HostDMASem,0);
> > >>>>>>>
> > >>>>>>> That main task then spawns a new task called buildall_tsk, which
> > >>>>>>> is the task which winds up sleeping on the semaphore. This task
> > >>>>>>> arrives at the semaphore through a subroutine call, like this:
> > >>>>>>>
> > >>>>>>> void WaitForDMA(void)
> > >>>>>>> {
> > >>>>>>> Bool semstatus;
> > >>>>>>> SEM_reset(HostDMASem,0);
> > >>>>>>> semstatus = SEM_pend(HostDMASem,SYS_FOREVER);
> > >>>>>>> if(!semstatus){puts("semaphore bombed\n");}
> > >>>>>>> else {puts("semaphore worked\n");}
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> My problem is this. When an interrupt from the host is set, this
> > >>>>>>> ISR invokes apparently correctly. However, sometimes when it
> > >>>>>>> reaches the end, it falls off the cliff (apparently the semaphore
> > >>>>>>> is not being recognized properly or some such) and nothing
> > >>>>>>> happens.
> > >>>>>>>
> > >>>>>>> Further, when the routine falls off the cliff, subsequent
> > >>>>>>> interrupts are apparently ignored; when the routine falls off the
> > >>>>>>> cliff, it doesn't get invoked again until I completely reset the
> > >>>>>>> system (which sometimes involves cycling power as well as
> > >>>>>>> restarting Code Composer).
> > >>>>>>>
> > >>>>>>> This seems to be erratic and I can't define specific conditions
> > >>>>>>> that cause it or not. I suspect some initialization thing that
> > >>>>>>> Code Composer is doing, but I have no idea what.
> > >>>>>>>
> > >>>>>>> The card is a Spectrum Digital 6416 card.
> > >>>>>>>
> > >>>>>>> I am tearing my hair out over this, and trust me; I don't look
> > >>>>>>> good bald. Anyone here have any idea what is going on?
> >
Sorry for late response; I have been having computer problems.

On Sunday 28 October 2007 11:02:28 you wrote:
> Jim-
>
> > ... And now...the REST of the story!
> >
> > My client came up with a brand new board, and we wired it to be a PCI
> > interface (rather than GPIO) and plugged it in. Behavior of this new
> > board was consistent; it would transfer data two or three times, then
> > hang. I checked the initialization EMIF stuff; the board was set to run
> > at 500 MHz when it was a 1000 MHz board. So I turned up the speed.
> > Behavior remained exactly the same as it had been; two or three correct
> > transfers, then hang.
>
> If the 6416 runs at 1 GHz, what is your external mem speed? 641x supports
> SDRAM, not DDR2, so your mem speed can't be more than 166 MHz, probably
> better to use 125 to 133 MHz.
>
Memory is running at whatever the card sets it to; I have not set that and
didn't see any way to do so.

> Also what do you mean by GPIO? There is no effective way to use
> host-to-DSP interface using GPIO.
>
Spectrum Digital board provides a GPIO interface which can be configured as
PCI by soldering on some jumpers.

> The basic way to know your mem transfer is truly working is to write an
> area of DSP internal memory, then write another area with different
> address, then go back and read the first area. This avoids any possible
> "bus hold" or "stale data" issue, whether at driver, bus, or DSP interface
> level. I would think this would be a necessary requirement for you to give
> a solid demo. If you do this, what do you get?
>
On the old card, response was erratic. I first defined the card as having a
memory problem when this didn't work. Long story, but changing cards seems
to have fixed that problem.

> -Jeff

Correction for woefully sloppy language.

I wrote:

>It seems that my client was wiring these jumpers in place to pull up some
>pins on the GPIO bus, making it a PCI bus, by wiring directly to the +3.3V
>power line without using a pullup resistor. I propose to him that this lack
>of a resistor (and the consequent lack of current limiting on that pullup
>line).

Which should be finished. "I propose to him that this lack of a resistor (and
the consequent lack of current limiting on that pullup line)" ... is a
possible cause of the problem.

I wrote:

> It also seems that we have uncovered a bug in the TI diagnostic program;
> the original "symptom" we were seeing involved memory errors, accompanied
> by a failure of Flash indicated by the TI diagnostic software. Seems that
> TI now claims that this indicated failure is spurious because they were not
> properly testing the case where the GPIO bus had been configured as a PCI
> bus, and one pin that floats on the GPIO bus was being pulled for the PCI
> case, resulting in a spurious indication of failure.
>
Which should be made more explicit. The indicated Flash memory failures are
what TI is labeling spurious because of a pin being pulled rather than
floating; the memory failures were real.

On Tuesday 30 October 2007 13:27:01 you wrote:

> >>> Now, while working with the previous board, I had added a number of
> >>> code hacks while trying to deal with the inconsistency in behavior,
> >>> before finally determining absolutely that mine was a hardware problem.
> >>>
> >>> So, I removed these hacks. Instantly, the thing ran exactly the way I
> >>> wanted it to run.
> >>>
> >>> My PCI DMA Master code is derived from the asynch_pcitest code provided
> >>> with the driver development environment, and described in spru616.pdf,
> >>> and it now works...almost.
> >>
> >> I assume that your DSP-side code does master reads from the host, is it
> >> close to the truth?
> >>

One important clarification.

I have the DSP configured to do DMA Master WRITES (Not reads) from the DSP to
the Linux host. This DSP is going to be buried deep in a system which does a
LOT of things. Actually, data will be arriving at the DSP via a serial port
(I get to write that code next) from the backend of an antenna/downconvert
system. It gets processed in the DSP based upon the mode of operation, then
passed on to the Linux host for further processing, storage, and display.

I want the whole thing to be asynchronous; any synchronicity is going to bite
me in the a$$ further down the road.

Jim L-

> Sorry for late response; I have been having computer problems.
>
> On Sunday 28 October 2007 11:02:28 you wrote:
> > Jim-
> >
> > > ... And now...the REST of the story!
> > >
> > > My client came up with a brand new board, and we wired it to be a PCI
> > > interface (rather than GPIO) and plugged it in. Behavior of this new
> > > board was consistent; it would transfer data two or three times, then
> > > hang. I checked the initialization EMIF stuff; the board was set to run
> > > at 500 MHz when it was a 1000 MHz board. So I turned up the speed.
> > > Behavior remained exactly the same as it had been; two or three correct
> > > transfers, then hang.
> >
> > If the 6416 runs at 1 GHz, what is your external mem speed? 641x supports
> > SDRAM, not DDR2, so your mem speed can't be more than 166 MHz, probably
> > better to use 125 to 133 MHz.
> >
> Memory is running at whatever the card sets it to; I have not set that and
> didn't see any way to do so.

EMIF clock rate is specified in EMIF control registers; see this document,
"TMS320C6000 DSP External Memory Interface (EMIF)":

http://focus.ti.com/lit/ug/spru266e/spru266e.pdf

See specifically Table 4-9 in section 4.5, look for mention of EMIF clock rates and
ECLKOUT1 and ECLKOUT2 pins.

> > Also what do you mean by GPIO? There is no effective way to use
> > host-to-DSP interface using GPIO.
> >
> Spectrum Digital board provides a GPIO interface which can be configured as
> PCI by soldering on some jumpers.

This is the "HPI Expansion Header", not GPIO. When used as a PCI interface, the J1
header is intended for daughtercards, not for a PC interface. The PCI spec requires
certain minimum trace lengths and other aspects concerning PCI bus loading. With a
PC, it might work if your client made an adapter card that places the PCI mating
connector for J1 very near the motherboard. That would require the DSK board to fit
inside the PC, while still obtaining its power through an external source -- a very
risky situation indeed. If this configuration doesn't zap either the motherboard or
the DSK at some point, I would be surprised.

> > The basic way to know your mem transfer is truly working is to write an
> > area of DSP internal memory, then write another area with different
> > address, then go back and read the first area. This avoids any possible
> > "bus hold" or "stale data" issue, whether at driver, bus, or DSP interface
> > level. I would think this would be a necessary requirement for you to give
> > a solid demo. If you do this, what do you get?
> >
> On the old card, response was erratic. I first defined the card as having a
> memory problem when this didn't work. Long story, but changing cards seems
> to have fixed that problem.

The mem test method I described, if successful with various areas and types of
memories and range of block lengths from single transfer to very long blocks, is the
gold standard for testing C6x board drivers.
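
A minimal sketch of that write-A, write-B, read-back-A sequence, under loud assumptions: plain arrays stand in for the two DSP memory areas, and the function names and seed values are hypothetical. In a real test the fills and reads would go through the host driver and the PCI bus instead of direct pointer access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pattern word that depends on both a per-area seed and the word index,
 * so a stale or misaddressed word is unlikely to match by accident. */
static uint32_t pattern_word(uint32_t seed, size_t i)
{
    return seed ^ ((uint32_t)i * 2654435761u);
}

/* Write the pattern for `seed` into `words` 32-bit words at `dst`. */
void fill_block(volatile uint32_t *dst, size_t words, uint32_t seed)
{
    size_t i;
    assert(dst != NULL);
    for (i = 0; i < words; i++)
        dst[i] = pattern_word(seed, i);
}

/* Return how many words at `src` differ from the pattern for `seed`. */
size_t verify_block(const volatile uint32_t *src, size_t words, uint32_t seed)
{
    size_t i, mismatches = 0;
    assert(src != NULL);
    for (i = 0; i < words; i++)
        if (src[i] != pattern_word(seed, i))
            mismatches++;
    return mismatches;
}

/* Write area A, write area B at a different address, then go back and
 * check A (and B). Returns the total number of mismatched words. */
size_t write_write_read_test(volatile uint32_t *a, volatile uint32_t *b,
                             size_t words)
{
    fill_block(a, words, 0xA5A5A5A5u);
    fill_block(b, words, 0x5A5A5A5Au);
    return verify_block(a, words, 0xA5A5A5A5u) +
           verify_block(b, words, 0x5A5A5A5Au);
}
```

Running it over a range of block lengths, from one word up to very long blocks, matches the range of cases described above.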

-Jeff

On Friday 02 November 2007 07:46:06 you wrote:
> Jim L-
>
> > Sorry for late response; I have been having computer problems.
> >
> > On Sunday 28 October 2007 11:02:28 you wrote:
> > > Jim-
> > >
> > > > ... And now...the REST of the story!
> > > >
> > > > My client came up with a brand new board, and we wired it to be a PCI
> > > > interface (rather than GPIO) and plugged it in. Behavior of this new
> > > > board was consistent; it would transfer data two or three times, then
> > > > hang. I checked the initialization EMIF stuff; the board was set to
> > > > run at 500 MHz when it was a 1000 MHz board. So I turned up the
> > > > speed. Behavior remained exactly the same as it had been; two or
> > > > three correct transfers, then hang.
> > >
> > > If the 6416 runs at 1 GHz, what is your external mem speed? 641x
> > > supports SDRAM, not DDR2, so your mem speed can't be more than 166 MHz,
> > > probably better to use 125 to 133 MHz.
> >
> > Memory is running at whatever the card sets it to; I have not set that
> > and didn't see any way to do so.
>
> EMIF clock rate is specified in EMIF control registers; see this document,
> "TMS320C6000 DSP External Memory Interface (EMIF)":
>
> http://focus.ti.com/lit/ug/spru266e/spru266e.pdf
>
> See specifically Table 4-9 in section 4.5, look for mention of EMIF clock
> rates and ECLKOUT1 and ECLKOUT2 pins.
>
> > > Also what do you mean by GPIO? There is no effective way to use
> > > host-to-DSP interface using GPIO.
> >
> > Spectrum Digital board provides a GPIO interface which can be configured
> > as PCI by soldering on some jumpers.
>
> This is the "HPI Expansion Header", not GPIO. When used as a PCI
> interface, the J1 header is intended for daughtercards, not for a PC
> interface. The PCI spec requires certain minimum trace lengths and other
> aspects concerning PCI bus loading. With a PC, it might work if your
> client made an adapter card that places the PCI mating connector for J1
> very near the motherboard. That would require the DSK board to fit inside
> the PC, while still obtaining its power through an external source -- a
> very risky situation indeed. If this configuration doesn't zap either the
> motherboard or the DSK at some point, I would be surprised.
>
I stand corrected; you are right. It is the HP interface. At this time my
client has the hardware (and most of the hardware docs) and I don't. The
actual layout does not have us installing the spectrum digital board into a
standard PC; we are developing using an unboxed single board computer (which
physically is smaller than the spectrum digital board) and a custom-wired
header to properly mate the PCI interfaces. Overall cable lengths are
completely acceptable. The SBC has its own power, as does the Spectrum
Digital board (both brick power supplies).

I actually thought originally there might be a bucking regulator problem
between the two boards, but the interface spec on the Spectrum Digital board
does not call out any power being transferred between cards. However one
light does come on on the Spectrum Digital card when the SBC is powered up,
even if the Spectrum Digital card is not powered, so power is making it
through somehow.

My job has just grown; I get to sort out the evident hardware issue between
the two cards. I can do this probably (if I have enough equipment or if it
is simple enough) and I am going to start by making sure there is no power
being transferred between the cards, just signals and ground. I rather
strongly suspect that this will solve the problems we are having.

> > > The basic way to know your mem transfer is truly working is to write an
> > > area of DSP internal memory, then write another area with different
> > > address, then go back and read the first area. This avoids any
> > > possible "bus hold" or "stale data" issue, whether at driver, bus, or
> > > DSP interface level. I would think this would be a necessary
> > > requirement for you to give a solid demo. If you do this, what do you
> > > get?
> >
> > On the old card, response was erratic. I first defined the card as
> > having a memory problem when this didn't work. Long story, but changing
> > cards seems to have fixed that problem.
>
> The mem test method I described, if successful with various areas and types
> of memories and range of block lengths from single transfer to very long
> blocks, is the gold standard for testing C6x board drivers.
>
Yes, well...actually, that is the gold standard for testing memory, period.

Hello Jim,

Before you finally sort out the hardware issues, a quick question:
could you set up a bit pattern on both sides, do two PCI
transfers (PC->DSP and DSP->PC), and check with the debuggers that
the data arrives correctly in both directions? I do not mean sending
megabytes; 4-8 bytes is enough for now to assume that PCI is OK.

Thanks for the thorough description of how your system works.
(Apparently it does not work the way you planned :)
Assuming that the linear growth in the number of interrupts sent
to the host is not a hardware issue, the problem is in the
interaction of the application code with the minidriver (md) code. Thus
I need to look at the code, as even the detailed description
you posted is not enough to track the problem down.

I need three parts: the minidriver c64xx_pci (I probably have
a very similar one, evm642_pci.c, but without your modifications),
your PCI interrupt routine, and the loop that initiates PCI
transactions.

What really makes me curious is the linear growth of the number
of the DSP->PC interrupts. So far I think that this is a certain
combination of semaphore signaling(?), switching between two interrupt
routines(?) and firing of 64x64K packets(?) that makes it happen. Without
looking at the code it is really difficult to figure it out.

Does the c64xx_pci code have a line "#define ISR_VECTOR_ID 4" ?
Can you check that the CCS debugger does not enter these two
cases (if they are present in the c64xx_pci code):
case C64XX_PCI_DSP_INT_REQ_SET:
or
case C64XX_PCI_DSP_INT_REQ_CLEAR:
inside the mdControlChan() function?

What is the value of the size parameter to the GIO_write() call,
in case you use it?

Regards,

Andrew

> Subject: Re: ISR falls off cliff...sometimes.
> Posted by: "jim" j...@justsosoftware.com jiml8
> Date: Thu Nov 1, 2007 3:14 pm ((PDT))
>
> Sorry for late response. I have been having some severe computer problems.
> Comes from dropping the system HD onto a concrete floor a couple of months
> ago while replacing power supply, then not noticing that some I/O errors were
> appearing on drive, then trying to do major system software update. I use
> SCSI drives in my workstation, and they practically never fail - and when
> they do they tend to do it gracefully. My fault that I wasn't running
> diagnostics on this one periodically, which would have given me a heads-up
> about problems coming.
>
> I have it more or less sorted out now, though.
>
> One important clarification.
>
> I have the DSP configured to do DMA Master WRITES (Not reads) from the DSP to
> the Linux host. This DSP is going to be buried deep in a system which does a
> LOT of things. Actually, data will be arriving at the DSP via a serial port
> (I get to write that code next) from the backend of an antenna/downconvert
> system. It gets processed in the DSP based upon the mode of operation, then
> passed on to the Linux host for further processing, storage, and display.
>
> I want the whole thing to be asynchronous; any synchronicity is going to bite
> me in the a$$ further down the road.
>
> On Tuesday 30 October 2007 08:20:05 you wrote:
>> Jim,
>>
>> On 10/29/07, Andrew Nesterov wrote:
>>>> Subject: Re: ISR falls off cliff...sometimes.
>>>> Posted by: "jim" j...@justsosoftware.com jiml8
>>>> Date: Sun Oct 28, 2007 7:20 am ((PDT))
>>>>
>>>> ... And now...the REST of the story!
>>>>
>>>> My client came up with a brand new board, and we wired it to be a PCI
>>>> interface (rather than GPIO) and plugged it in. Behavior of this new
>>>> board was consistent; it would transfer data two or three times, then
>>>> hang. I checked the initialization EMIF stuff; the board was set to run
>>>> at 500 MHz when it was a 1000 MHz board. So I turned up the speed.
>>>> Behavior remained exactly the same as it had been; two or three correct
>>>> transfers, then hang.
>>>
>>> I saw that Spectrum Digital's C6416 EVM does not have a PCI connector,
>>> does it? So you needed to wire its on-board connector into an external
>>> PCI connector, right? Perhaps this on-board connector has muxed GPIO and
>>> PCI pins - this is probably what Jeff was asking, if I understood him
>>> correctly.
>>
>> The brand new board was designed by the customer. Correct??
>> The Spectrum Digital board was probably using HPI as opposed to the
>> stated 'GPIO'. Correct??
>>
>> Just trying to get calibrated.
>>
> No, the new board was another Spectrum Digital board. The GPIO connector
> provided on that board can be turned into a PCI connector by the addition of
> a couple of jumpers.
>
> Turns out that there is some problem in the PCI interface. After installing
> the new board, my software drivers worked perfectly on both sides of the
> interface. This lasted about two days, then I started seeing the same
> symptoms on the new board that I had been seeing on the old one. This has to
> be a hardware issue, and the only possibility that I or my client can see is
> some kind of mismatch on that bus that is causing a failure.
>
> It seems that my client was wiring these jumpers in place to pull up some pins
> on the GPIO bus, making it a PCI bus, by wiring directly to the +3.3V power
> line without using a pullup resistor. I proposed to him that this lack of a
> resistor (and the consequent lack of current limiting on that pullup line)
> could be the cause. However, I have not tested for that (he is testing for
> that now) and I am not at all certain that this is right.
>
> It also seems that we have uncovered a bug in the TI diagnostic program; the
> original "symptom" we were seeing involved memory errors, accompanied by a
> failure of Flash indicated by the TI diagnostic software. Seems that TI now
> claims that this indicated failure is spurious because they were not properly
> testing the case where the GPIO bus had been configured as a PCI bus, and one
> pin that floats on the GPIO bus was being pulled for the PCI case, resulting
> in a spurious indication of failure.
>
> That may be true, but Code Composer is mature enough that I am a bit
> skeptical.
>
>>>> Now, while working with the previous board, I had added a number of
>>>> code hacks while trying to deal with the inconsistency in behavior,
>>>> before finally determining absolutely that mine was a hardware problem.
>>>>
>>>> So, I removed these hacks. Instantly, the thing ran exactly the way I
>>>> wanted it to run.
>>>>
>>>> My PCI DMA Master code is derived from the asynch_pcitest code provided
>>>> with the driver development environment, and described in spru616.pdf,
>>>> and it now works...almost.
>>>
>>> I assume that your DSP-side code does master reads from the host, is it
>>> close to the truth?
>>>
>>>> Well, it DOES work, but there is a bug which will become a problem
>>>> eventually (though I'll get through my demo on Monday).
>>
>> It's not Monday any more, I wonder how the demo went... :-)
>>
> My demo to my client went as planned. His demo to his client (who is funding
> the whole thing) got postponed.
>
>>>> The one problem I still have that I know about is this. My Linux driver
>>>> (written from scratch) triggers an interrupt to the DSP to tell it to
>>>> do a DMA transfer. The DSP reconfigures the interrupts and registers as
>>>> required (the async code does this), transfers the data, then my
>>>> program reconfigures the interrupts to what I need and sends an
>>>> interrupt to the Linux system telling it the data is transferred.
>>>
>>> Can you be more specific here? I am not certain what you mean by
>>> reconfiguring interrupts. Does the DSP-side program remap interrupts
>>> on the fly? What do you mean by DMA transfer? Is it simply a PCI Master
>>> read, or have you set up a separate DMA transfer to move the data that has
>>> been read by the PCI controller from an intermediate input buffer into
>>> its final destination?
>>
>> I agree with Andrew. It is not clear what you are trying to accomplish.
>>
> Basically, I am integrating both sides of this interface. On the Linux side,
> I wrote a driver from the ground up, and it works. On the DSP side, I am
> unfamiliar with the environment and needed something working pretty quickly.
>
> I need a program on the DSP side which will function as a PCI bus master and
> do a DMA transfer of arbitrary size to a specified buffer on the Linux host.
> When I say "arbitrary size", that is what I mean; presently I have provided a
> single 4 meg buffer on the Linux host but intend to add another one, and I am
> using two 4 Meg pages on the DSP SDRAM as data buffers, and my ultimate goal
> is to double-buffer the DMA transfer, and ping-pong back and forth between
> the buffers for an indefinite period of time with the DSP doing data
> processing on the buffer which is NOT currently being transferred and the
> Linux host emptying the buffer on its side of the interface which is NOT
> currently being filled.
>
> There is a demo program called async_pci that is provided by TI and uses the
> async extensions to the GIO PCI interface, as described in SPRU616.pdf. I
> started with that demo program, chopped out the demo portions that I didn't
> need, added an interrupt handler so I could trigger it from Linux, put in the
> necessary semaphores and so forth, and ran with it.
>
> The asynchronous capability of this driver is particularly appealing to me; I
> gain the capability to queue up 64K transfers (the maximum the hardware will
> do in one burst) up to the size of the 4 meg buffer and have them run off
> with minimal CPU intervention until the entire buffer is transferred. This
> frees the CPU for other things.
>
> This demo program uses a PCI IOM "mini-driver" named c64xx_pci.c (and an
> associated .h file), which is also provided by TI. I have found and fixed
> three bugs in this code - it simply didn't work right as shipped. This
> mini-driver is invoked from a library (CSL library) provided by TI for which
> I do not have the source. Thus, my calls to, for instance, ASYNC_write get
> translated into GIO calls, then pass through the CSL library and reappear in
> c64xx_pci.c. I can, of course, do source level debugging in my top-level
> code and in c64xx_pci.c, but not in the intermediate level which is a
> library.
>
> I can read/write assembler, but given this is object code on a platform which
> I am just learning, sorting out exactly what it is doing is a very, very time
> consuming process.
>
> Now, I configure interrupts so that INT13 is my Host to DSP interrupt, which
> I use to trigger an interrupt handler which then kicks off my DMA transfer
> code, which is heavily derived from the async_pci demo code. Over the course
> of the GIO calls, routines in c64xx_pci.c are repeatedly invoked, and some of
> those routines reconfigure the interrupt structure in a fashion that is
> apparently required by the ASYNC system. The interrupts are apparently not
> properly restored to their original state when the ASYNC system is finished,
> and this requires me to reconfigure them when my program is done with all
> transfers.
>
> After all transfers for this buffer are completed, my driver then sets a DSP
> to Host interrupt so that the Linux host knows a buffer has been filled
> (which may or may not be the end of the transfer, but Linux has to empty that
> buffer).
>
> All of this appears to be working, but some counter is apparently not being
> reset, and as a consequence the DSP to Host interrupt is getting set N times,
> where N is the number of times my DSP driver has gone through the main
> transfer loop since the last time the DSP was powered up.
>
> Why is this happening? I have no idea. I actually trigger the interrupt with
> the PCI_dspIntReqSet() routine out of the CSL library.
>
> Actually, the entire segment of code which resets the interrupts to the way I
> want them, then triggers the interrupt is this:
>
> IRQ_disable(IRQ_EVT_DSPINT);
> IRQ_map(IRQ_EVT_DSPINT, 13);
> IRQ_enable(IRQ_EVT_DSPINT);
> *datastatusregister = *datastatusregister | 0x2;
> PCI_dspIntReqClear();
> PCI_dspIntReqSet();
>
> This happens at the bottom of the main processing loop and immediately after
> this the program branches back to the top of that loop and sleeps on a
> semaphore waiting for the next interrupt from the host.
>
>>>> Now, my Linux driver expects to share the interrupt, and if the
>>>> interrupt really is for it, it tests to see if the interrupt is valid.
>>>> If the interrupt is not valid, it writes a message into the log saying
>>>> that it received an interrupt for an unknown reason, then clears the
>>>> interrupt and takes no action.
>>>
>>> To share the interrupt with whom? Do you mean that there are several PCI
>>> devices on your host machine, or that there are several processes on the DSP
>>> side that may assert the PCI interrupt? How does your host-side code
>>> decide whether the interrupt is for it and is valid?
>>
>> Andrew,
>> This is part of the PCI spec. Two separate PCI drivers must be able
>> to share a single interrupt. I don't know if you were around in the
>> early days of PCs transitioning to PCI. You sometimes [depending on
>> interrupt assignments] couldn't run certain "board combinations" [it
>> was actually driver combinations].
>>
>> mikedunn
>>
> Yes. Presently only the one device is using the interrupt, but I am not
> willing to assume this will always be the case. My Linux driver incorporates
> the code to process the interrupt to determine if the interrupt really is for
> it, and if not it just passes it on.
>
>>> The problem looks to me like a sync problem. The code (both host and DSP
>>> side) does not work in concert; the two sides do not synchronize their
>>> actions with each other. Have you thought over the communication protocol,
>>> ready/not ready/ack software signaling?
>>>
>>> Rgds,
>>>
>>> Andrew
>>>
> Oh my yes. I have taken over several SDRAM memory locations on the DSP side
> of the interface for status information which is tested/set/cleared as
> appropriate by the host or the dsp, and the whole thing is interrupt driven,
> with tests on both sides for valid interrupts. The major problem at this
> time is that for some unknown reason the DSP interrupts, and interrupts, and
> interrupts. The Linux driver discards these spurious interrupts, but at some
> point they'll slow the bus down enough that it matters.
>
>>>> So, here is the bug. After loading the DSP program, the first time my
>>>> Linux client program orders the DSP to do a DMA transfer, I get one
>>>> message in the Linux log file saying that an interrupt was received for
>>>> an unknown reason.
>>>>
>>>> The second time my client program orders a DMA transfer, I get two
>>>> messages in the log file about an interrupt for an unknown reason
>>>>
>>>> The tenth time, I get ten messages in the log (actually I get one
>>>> message, then another message which says: "the previous message was
>>>> repeated 9 times").
>>>>
>>>> The 100th time, 100 messages.
>>>>
>>>> What this means is that something someplace in the DSP is counting and
>>>> not clearing, and it is setting NOT EMIFA as many times as it has in
>>>> its count. Eventually, this is going to slow down the PCI bus to a
>>>> noticeable degree.
>>>>
>>>> Given that I am using the async demo code (so far, I might add, I have
>>>> found and fixed no fewer than three bugs in that code) as the basis for
>>>> the DSP side of my PCI interface driver, does anyone have any idea what
>>>> is counting?
>>>>
>>>> On Wednesday 24 October 2007 12:08:22 you wrote:
>>>>> Jim,
>>>>>
>>>>> On 10/24/07, jim wrote:
>>>>>> On Wednesday 24 October 2007 11:06:48 you wrote:
>>>>>>> Jim,
>>>>>>>
>>>>>>> On 10/24/07, jim wrote:
>>>>>>>> Well, I did find at least part of my problem.
>>>>>>>>
>>>>>>>> There seems to be a hardware issue with the memory address/refresh
>>>>>>>> logic. I've been wrestling with inconsistent behavior, and while I
>>>>>>>> have strongly suspected hardware, that isn't an easy call to make
>>>>>>>> particularly when programming at such a low level.
>>>>>>>
>>>>>>> Make sure that you go through the EMIF setup parameters, clocks,
>>>>>>> etc. to be sure that it is setup correctly. If you used the
>>>>>>> 'delivered EMIF settings' and changed any of the clock
>>>>>>> configuration, your refresh rate could be too slow.
>>>>>>>
>>>>>>> mikedunn
>>>>>>
>>>>>> I will certainly look at that. I would not have thought it would be
>>>>>> possible to configure the refresh rate from outside. It is certainly
>>>>>> not something I would have ever looked for.
>>>>>
>>>>> I am not sure what you mean by "from outside" [from outside the
>>>>> RAM??]. Hopefully your [or someone's] low level initialization code is
>>>>> taking care of the EMIF setup. Are you using DSP/BIOS?? or some other
>>>>> executive??
>>>>>
>>>>>> That actually is a possibility???
>>>>>
>>>>> Yes, and I have the "have I lost my mind??" experience and the gray
>>>>> hair to prove it. I abstain from pulling my hair out :-)
>>>>>
>>>>>>>> However, I can now document some random changes in program code
>>>>>>>> (which cannot be accounted for by wild pointers), and using code
>>>>>>>> composer, I watched the program change a local unsigned int
>>>>>>>> variable (which it was supposed to do), which also caused the next
>>>>>>>> local unsigned int variable on the same stack to change as well to
>>>>>>>> the same value (which should be impossible). The variable that was
>>>>>>>> supposed to be changed was a return from an exec function
>>>>>>>> (HWI_disable), so if this change is due to some pointer problem,
>>>>>>>> the problem must be in the HWI_disable function.
>>>>>>>>
>>>>>>>> This accounts for a lot of things, and I still have some hair left,
>>>>>>>> having not pulled it all out.
>>>>>>>>
>>>>>>>> On Tuesday 23 October 2007 11:42:31 you wrote:
>>>>>>>>> Hi Jim,
>>>>>>>>>
>>>>>>>>> Please look at page 77 of SPRU581C.pdf, at bit 4 (INTRST) of
>>>>>>>>> RSTSRC. It says that "This bit must be asserted before another
>>>>>>>>> host interrupt can be generated."
>>>>>>>>>
>>>>>>>>> Next, it would help to parse all the bits in the PCIIS (even if
>>>>>>>>> they are all disabled in the PCIIEN) and clear them in the PCI
>>>>>>>>> ISR.
>>>>>>>>>
>>>>>>>>> Third, it is recommended to parse an interrupt source register
>>>>>>>>> (any of them, not only the PCIIS) in a loop inside an ISR and
>>>>>>>>> clear any set bits until the register becomes zero, e.g. for the
>>>>>>>>> PCI:
>>>>>>>>>
>>>>>>>>> volatile uint32 temp;
>>>>>>>>>
>>>>>>>>> while (temp = PCIIS) // this reads the PCIIS
>>>>>>>>> {
>>>>>>>>>     test if bit[0], ..., bit[n] is set
>>>>>>>>>     if set, clear bit[0], ..., bit[n] and perform the necessary
>>>>>>>>>     actions, e.g. if bit[3] HOSTSW was set, then clear bit[4]
>>>>>>>>>     INTRST in RSTSRC
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> // PCIIS is clear now, exit the ISR
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>>> 11a. ISR falls off cliff...sometimes.
>>>>>>>>>> Posted by: "jim" j...@justsosoftware.com jiml8
>>>>>>>>>> Date: Mon Oct 22, 2007 8:42 pm ((PDT))
>>>>>>>>>>
>>>>>>>>>> I have defined an ISR that responds to an interrupt on the PCI
>>>>>>>>>> bus from a host. The current code for this ISR is this:
>>>>>>>>>>
>>>>>>>>>> interrupt void DMAtoHost(void)
>>>>>>>>>> {
>>>>>>>>>>     unsigned int intval, *datastatusregister;
>>>>>>>>>>     puts("interrupted\n");
>>>>>>>>>>     datastatusregister = (unsigned int *)DATASTATUSREGISTER;
>>>>>>>>>>     intval = *datastatusregister & 0x04;
>>>>>>>>>>     if (intval == 4) { /* is this a host interrupt? */
>>>>>>>>>>         intval = *datastatusregister;
>>>>>>>>>>         intval = intval & 0xfffffffb;
>>>>>>>>>>         *datastatusregister = intval; /* if so, clear it */
>>>>>>>>>>         SEM_post(HostDMASem);
>>>>>>>>>>         puts("resetting PCIIS\n");
>>>>>>>>>>     }
>>>>>>>>>>     PCI_RSET(PCIIS, 0x00000008);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Now, DATASTATUSREGISTER is a memory location on the DSP SDRAM
>>>>>>>>>> that I have taken over for tracking status information between my
>>>>>>>>>> Linux driver and my DSP interface. Here I am testing a flag in
>>>>>>>>>> that register to make sure that this interrupt really was set by
>>>>>>>>>> the host (my linux driver both sets the interrupt and sets this
>>>>>>>>>> flag to tell that it did it), and if so, I clear that location as
>>>>>>>>>> well as clearing the PCIIS register.
>>>>>>>>>>
>>>>>>>>>> Basically, if this really was an interrupt from the host, this
>>>>>>>>>> ISR posts to a semaphore called HostDMASem, and there is a task
>>>>>>>>>> sleeping on this semaphore waiting to be told to transfer data.
>>>>>>>>>>
>>>>>>>>>> I have defined this ISR statically using the Code Composer
>>>>>>>>>> configuration tool, and it is hooked to interrupt 13.
>>>>>>>>>>
>>>>>>>>>> I have tried various ways to set that semaphore; my current
>>>>>>>>>> iteration has the semaphore defined statically using the
>>>>>>>>>> configuration tool, and it is initialized as one of the very
>>>>>>>>>> first things done in the main routine of the program when it
>>>>>>>>>> starts, like this:
>>>>>>>>>>
>>>>>>>>>> SEM_new(HostDMASem,0);
>>>>>>>>>>
>>>>>>>>>> That main task then spawns a new task called buildall_tsk, which
>>>>>>>>>> is the task which winds up sleeping on the semaphore. This task
>>>>>>>>>> arrives at the semaphore through a subroutine call, like this:
>>>>>>>>>>
>>>>>>>>>> void WaitForDMA(void)
>>>>>>>>>> {
>>>>>>>>>>     Bool semstatus;
>>>>>>>>>>     SEM_reset(HostDMASem, 0);
>>>>>>>>>>     semstatus = SEM_pend(HostDMASem, SYS_FOREVER);
>>>>>>>>>>     if (!semstatus) {
>>>>>>>>>>         puts("semaphore bombed\n");
>>>>>>>>>>     } else {
>>>>>>>>>>         puts("semaphore worked\n");
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> My problem is this. When an interrupt from the host is set, this
>>>>>>>>>> ISR invokes apparently correctly. However, sometimes when it
>>>>>>>>>> reaches the end, it falls off the cliff (apparently the semaphore
>>>>>>>>>> is not being recognized properly or some such) and nothing
>>>>>>>>>> happens.
>>>>>>>>>>
>>>>>>>>>> Further, when the routine falls off the cliff, subsequent
>>>>>>>>>> interrupts are apparently ignored; when the routine falls off the
>>>>>>>>>> cliff, it doesn't get invoked again until I completely reset the
>>>>>>>>>> system (which sometimes involves cycling power as well as
>>>>>>>>>> restarting Code Composer).
>>>>>>>>>>
>>>>>>>>>> This seems to be erratic and I can't define specific conditions
>>>>>>>>>> that cause it or not. I suspect some initialization thing that
>>>>>>>>>> Code Composer is doing, but I have no idea what.
>>>>>>>>>>
>>>>>>>>>> The card is a Spectrum Digital 6416 card.
>>>>>>>>>>
>>>>>>>>>> I am tearing my hair out over this, and trust me; I don't look
>>>>>>>>>> good bald. Anyone here have any idea what is going on?
>>>
As I investigate this further, it appears that the ISR defined in
c64xx_pci.c is never being invoked. This implies (I suppose) that the
interrupts are not being set up properly by the mini-driver.

I have constructed another version of this program, with my ISR and the ISR
provided in c64xx_pci.c combined into one routine. To do this I broke a lot
of the modularity of the mini-driver, but for testing purposes that is OK.

In this version, too, the interrupt routine is never called while the DMA
transfers are underway. Also, this version interrupts exactly once; after
that, the interrupt apparently does not get reset, and the routine is not
invoked again.
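
The single-ISR arrangement can be sketched as below. This is a
host-compilable model, not the c64xx_pci.c code: fake_pciis, pciis_clear,
host_requests and transfer_events are hypothetical stand-ins, and the
write-1-to-clear behaviour is an assumption modelled in software. The only
value taken from this thread is 0x00000008, the mask the original ISR writes
to PCIIS.

```c
#include <stdint.h>

/* Stand-ins for hardware state. On the real part these would be the
   memory-mapped PCIIS register and a semaphore post. Hypothetical names. */
static volatile uint32_t fake_pciis;   /* simulated interrupt source register */
static unsigned host_requests;         /* times the host-interrupt branch ran */
static unsigned transfer_events;       /* times any other (driver) branch ran */

#define HOSTSW_BIT 0x00000008u         /* bit 3, as written by the original ISR */

/* Assumed write-1-to-clear behaviour, modelled in software. */
static void pciis_clear(uint32_t mask)
{
    fake_pciis &= ~mask;
}

/* One ISR for the PCI interrupt: re-read the source register and handle
   every set bit until the register reads zero. */
void pci_isr(void)
{
    uint32_t pending;

    while ((pending = fake_pciis) != 0) {
        if (pending & HOSTSW_BIT) {
            pciis_clear(HOSTSW_BIT);
            host_requests++;           /* real code: SEM_post(HostDMASem) */
        }
        if (pending & ~HOSTSW_BIT) {
            pciis_clear(pending & ~HOSTSW_BIT);
            transfer_events++;         /* real code: the mini-driver's handling */
        }
    }
}
```

Because the loop exits only when the register reads zero, a bit that gets set
while an earlier one is being serviced is still handled before the ISR
returns, and there is only one routine that can ever see a given event.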

On Wednesday 07 November 2007 07:16:16 you wrote:
> Hi Jim,
>
> Apparently your version of the md is more recent than mine:
>
> /* "@(#) DDK 1.10.00.23 07-02-03 (ddk-b12)" */
>
> A quick question: where is the ASYNC_read/write() definition? I didn't find
> it in either the BIOS or the CSL APIs.
>
> After a quick inspection of the code, I think that the most dangerous part
> so far is the use of TWO interrupt routines to process the same PCI
> interrupt. You've got to move your specific processing code into the md's
> ISR, under the handling of the relevant PCIIS bits.
>
> It is a real problem to try to switch between two interrupt processing
> routines on the fly, as you will never know which one is going to be
> used to process what is logically the same event.
>
> More comments will follow.
>
> (My notes will always concern the DSP side, for simplicity and because
> I have no information on the host side logic).
>
> Rgds,
>
> Andrew
I continue to have computer problems. New hard drive is on order. :(

On Saturday 03 November 2007 04:30:51 you wrote:
> Hello Jim,
>
> Before you finally sort out the hardware issues, a quick question:
> could you set up a bit pattern on both sides, do two PCI
> transfers (PC->DSP and DSP->PC), and check with the debuggers that
> the data arrives correctly in both directions? I do not mean sending
> megabytes; 4-8 bytes is enough for now to assume that PCI is OK.
>
I do that all the time, in both directions. It routinely happens several
times over the course of one transaction. If it wasn't working correctly,
the linux interface would complain, and the TI interface should complain.

> Thanks for the thorough description of how your system works.
> (Apparently it does not work the way you planned :)
> Assuming that the linear growth in the number of interrupts sent
> to the host is not a hardware issue, the problem is in the
> interaction of the application code with the minidriver (md) code. Thus
> I need to look at the code, as even the detailed description
> you posted is not enough to track the problem down.
>
> I need three parts: the minidriver c64xx_pci (I probably have
> a very similar one, evm642_pci.c, but without your modifications),
> your PCI interrupt routine, and the loop that initiates PCI
> transactions.
>
Attached. As I continue to debug this, it looks more and more like there
could be some timing/synchronization issue in the software. This must be
happening someplace in the TI code, either due to a bug or due to my failure
to properly comprehend how it is supposed to be working. I have found bugs
in the TI code that grossly impaired its functioning; I cannot rule out
another bug that I haven't identified yet.

Basically, it seems that many of the 64K transfers are incomplete. Not all of
them, but (depending on how I set my configuration) as many as 9 out of 10 -
actually every tenth one is complete - or with other configurations, 2 out of
3 are short. I don't know for sure if this is hardware or a race condition
in the software (there shouldn't be ANY races), but a network analyzer will
answer the question for me.
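
One way to pin down which 64K chunks arrive short, independent of an
analyzer, is to have the DSP send a known pattern and scan the host buffer
afterwards. The sketch below assumes a test pattern that is not part of the
actual project: word i of the buffer holds the value i. CHUNK_BYTES,
chunk_valid_words and count_short_chunks are names invented for the
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES (64u * 1024u)                  /* one burst, per the thread */
#define CHUNK_WORDS (CHUNK_BYTES / sizeof(uint32_t))

/* Assumed test pattern: word i of the whole buffer holds the value i.
   Returns how many leading words of chunk `chunk` match that pattern;
   CHUNK_WORDS means the chunk arrived complete. */
size_t chunk_valid_words(const uint32_t *buf, size_t chunk)
{
    size_t base = chunk * CHUNK_WORDS;
    size_t i;

    for (i = 0; i < CHUNK_WORDS; i++)
        if (buf[base + i] != (uint32_t)(base + i))
            break;
    return i;
}

/* Count the chunks that did not arrive complete. */
size_t count_short_chunks(const uint32_t *buf, size_t chunks)
{
    size_t c, n = 0;

    for (c = 0; c < chunks; c++)
        if (chunk_valid_words(buf, c) != CHUNK_WORDS)
            n++;
    return n;
}
```

The host buffer should be filled with a poison value such as 0xFFFFFFFF
before each run, so that stale data from a previous transfer cannot pass the
check. The per-chunk word count also shows where each short transfer stopped,
which may help separate a hardware cutoff from a software race.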

> What really makes me curious is the linear growth of the number
> of the DSP->PC interrupts. So far I think that this is a certain
> combination of semaphore signaling(?), switching between two interrupt
> routines(?) and firing of 64x64K packets(?) that makes it happen. Without
> looking at the code it is really difficult to figure it out.
>
It is difficult to debug even WITH the code! :) I now have a network analyzer
sitting here, and when I figure out how to get the bloody thing to connect
(the custom PCI riser on this project is making that tough), I will be able
to look at the bus, which probably will help me sort this out.

I am not sure the callback function called out as necessary in the ASYNC
documentation is being called, and I do need to sort out how it should work
to synchronize this whole thing.
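
If the completion callback is meant to be the synchronization point, the
bookkeeping might look something like the following. None of this is the TI
ASYNC or GIO API; xfer_tracker and its functions are hypothetical, and the
sketch only shows the idea of counting completions per buffer and notifying
the host exactly once, when the last queued transfer finishes.

```c
/* Hypothetical bookkeeping for one buffer's worth of queued transfers.
   Nothing here is the TI ASYNC/GIO API; it only models the idea. */
typedef struct {
    unsigned queued;      /* transfers submitted for this buffer */
    unsigned completed;   /* completion callbacks seen so far */
    unsigned notified;    /* times the "buffer done" action fired */
} xfer_tracker;

/* Start tracking a new buffer. The completion count is reset here, so
   nothing accumulates from one buffer to the next. */
void tracker_begin(xfer_tracker *t, unsigned queued)
{
    t->queued = queued;
    t->completed = 0;
}

/* Called once per finished transfer (the role a completion callback plays).
   Fires the notification only when the last outstanding transfer finishes;
   extra calls after that are ignored. */
void tracker_on_complete(xfer_tracker *t)
{
    if (t->completed < t->queued && ++t->completed == t->queued)
        t->notified++;    /* real code: raise the DSP-to-host interrupt once */
}
```

With a 4 meg buffer and 64K bursts, tracker_begin would be called with 64. On
the DSP, the counter updates would need to be protected from interrupts if
the callback can run in interrupt context.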

> Does the c64xx_pci code have a line "#define ISR_VECTOR_ID 4" ?

Yes.

> Can you check that the CCS debugger does not enter these two
> cases (if they are present in the c64xx_pci code):
> case C64XX_PCI_DSP_INT_REQ_SET:
> or
> case C64XX_PCI_DSP_INT_REQ_CLEAR:
> inside the mdControlChan() function?
>
It is not entering these cases.

> What is the value of the size parameter to the GIO_write() call,
> in case you use it?
>
This variable was not getting set by my code, although it was getting defined
and passed into the GIO calls. If I read the documentation correctly, it
should be redundant in my case.

> Regards,
>
> Andrew
>
> > Subject: Re: ISR falls off cliff...sometimes.
> > Posted by: "jim" j...@justsosoftware.com jiml8
> > Date: Thu Nov 1, 2007 3:14 pm ((PDT))
> >
> > Sorry for late response. I have been having some severe computer
> > problems. Comes from dropping the system HD onto a concrete floor a
> > couple of months ago while replacing power supply, then not noticing that
> > some I/O errors were appearing on drive, then trying to do major system
> > software update. I use SCSI drives in my workstation, and they
> > practically never fail - and when they do they tend to do it gracefully.
> > My fault that I wasn't running diagnostics on this one periodically,
> > which would have given me a heads-up about problems coming.
> >
> > I have it more or less sorted out now, though.
> >
> > One important clarification.
> >
> > I have the DSP configured to do DMA Master WRITES (Not reads) from the
> > DSP to the Linux host. This DSP is going to be buried deep in a system
> > which does a LOT of things. Actually, data will be arriving at the DSP
> > via a serial port (I get to write that code next) from the backend of an
> > antenna/downconvert system. It gets processed in the DSP based upon the
> > mode of operation, then passed on to the Linux host for further
> > processing, storage, and display.
> >
> > I want the whole thing to be asynchronous; any synchronicity is going to
> > bite me in the a$$ further down the road.
> >
> > On Tuesday 30 October 2007 08:20:05 you wrote:
> >> Jim,
> >>
> >> On 10/29/07, Andrew Nesterov wrote:
> >>>> Subject: Re: ISR falls off cliff...sometimes.
> >>>> Posted by: "jim" j...@justsosoftware.com jiml8
> >>>> Date: Sun Oct 28, 2007 7:20 am ((PDT))
> >>>>
> >>>> ... And now...the REST of the story!
> >>>>
> >>>> My client came up with a brand new board, and we wired it to be a PCI
> >>>> interface (rather than GPIO) and plugged it in. Behavior of this new
> >>>> board was consistent; it would transfer data two or three times, then
> >>>> hang. I checked the initialization EMIF stuff; the board was set to
> >>>> run at 500 MHz when it was a 1000 MHz board. So I turned up the speed.
> >>>> Behavior remained exactly the same as it had been; two or three
> >>>> correct transfers, then hang.
> >>>
> > >>> I saw that Spectrum Digital's C6416 EVM does not have a PCI connector,
> > >>> does it? So you needed to wire its on-board connector into an
> > >>> external PCI connector, right? Perhaps this on-board connector has
> > >>> muxed GPIO and PCI pins - this is probably what Jeff was asking, if
> > >>> I understood him correctly.
> >>
> >> The brand new board was designed by the customer. Correct??
> >> The Spectrum Digital board was probably using HPI as opposed to the
> >> stated 'GPIO'. Correct??
> >>
> >> Just trying to get calibrated.
> >
> > No, the new board was another Spectrum Digital board. The GPIO connector
> > provided on that board can be turned into a PCI connector by the addition
> > of a couple of jumpers.
> >
> > Turns out that there is some problem in the PCI interface. After
> > installing the new board, my software drivers worked perfectly on both
> > sides of the interface. This lasted about two days, then I started
> > seeing the same symptoms on the new board that I had been seeing on the
> > old one. This has to be a hardware issue, and the only possibility that
> > I or my client can see is some kind of mismatch on that bus that is
> > causing a failure.
> >
> > It seems that my client was wiring these jumpers in place to pull up some
> > pins on the GPIO bus, making it a PCI bus, by wiring directly to the
> > +3.3V power line without using a pullup resistor. I proposed to him that
> > this lack of a resistor (and the consequent lack of current limiting on
> > that pullup line) could be the cause. However, I have not tested for that
> > (he is testing for that now) and I am not at all certain that this is right.
> >
> > It also seems that we have uncovered a bug in the TI diagnostic program;
> > the original "symptom" we were seeing involved memory errors, accompanied
> > by a failure of Flash indicated by the TI diagnostic software. Seems
> > that TI now claims that this indicated failure is spurious because they
> > were not properly testing the case where the GPIO bus had been configured
> > as a PCI bus, and one pin that floats on the GPIO bus was being pulled
> > for the PCI case, resulting in a spurious indication of failure.
> >
> > That may be true, but Code Composer is mature enough that I am a bit
> > skeptical.
> >
> >>>> Now, while working with the previous board, I had added a number of
> >>>> code hacks while trying to deal with the inconsistency in behavior,
> >>>> before finally determining absolutely that mine was a hardware
> >>>> problem.
> >>>>
> >>>> So, I removed these hacks. Instantly, the thing ran exactly the way I
> >>>> wanted it to run.
> >>>>
> >>>> My PCI DMA Master code is derived from the asynch_pcitest code
> >>>> provided with the driver development environment, and described in
> >>>> spru616.pdf, and it now works...almost.
> >>>
> >>> I assume that your DSP-side code does master reads from the host, is it
> >>> close to the truth?
> >>>
> >>>> Well, it DOES work, but there is a bug which will become a problem
> >>>> eventually (though I'll get through my demo on Monday).
> >>
> >> It's not Monday any more, I wonder how the demo went... :-)
> >
> > My demo to my client went as planned. His demo to his client (who is
> > funding the whole thing) got postponed.
> >
> >>>> The one problem I still have that I know about is this. My Linux
> >>>> driver (written from scratch) triggers an interrupt to the DSP to tell
> >>>> it to do a DMA transfer. The DSP reconfigures the interrupts and
> >>>> registers as required (the async code does this), transfers the data,
> >>>> then my program reconfigures the interrupts to what I need and sends
> >>>> an interrupt to the Linux system telling it the data is transferred.
> >>>
> >>> Can you be more specific here? I am not certain what do you mean by
> >>> reconfiguring interrupts. Does the DSP-side program remap interrupts
> >>> on the fly? What do you mean by DMA transfer? Is it simply a PCI Master
> >>> read or have you set up a separate DMA transfer to move the data that had
> >>> been read by the PCI controller into an input buffer from this
> >>> intermediate buffer into a final destination?
> >>
> >> I agree with Andrew. It is not clear what you are trying to accomplish.
> >
> > Basically, I am integrating both sides of this interface. On the Linux
> > side, I wrote a driver from the ground up, and it works. On the DSP
> > side, I am unfamiliar with the environment and needed something working
> > pretty quickly.
> >
> > I need a program on the DSP side which will function as a PCI bus master
> > and do a DMA transfer of arbitrary size to a specified buffer on the
> > Linux host. When I say "arbitrary size", that is what I mean; presently I
> > have provided a single 4 meg buffer on the Linux host but intend to add
> > another one, and I am using two 4 Meg pages on the DSP SDRAM as data
> > buffers, and my ultimate goal is to double-buffer the DMA transfer, and
> > ping-pong back and forth between the buffers for an indefinite period of
> > time with the DSP doing data processing on the buffer which is NOT
> > currently being transferred and the Linux host emptying the buffer on its
> > side of the interface which is NOT currently being filled.
> >
> > There is a demo program called async_pci that is provided by TI and uses
> > the async extensions to the GIO PCI interface, as described in
> > SPRU616.pdf. I started with that demo program, chopped out the demo
> > portions that I didn't need, added an interrupt handler so I could
> > trigger it from Linux, put in the necessary semaphores and so forth, and
> > ran with it.
> >
> > The asynchronous capability of this driver is particularly appealing to
> > me; I gain the capability to queue up 64K transfers (the maximum the
> > hardware will do in one burst) up to the size of the 4 meg buffer and
> > have them run off with minimal CPU intervention until the entire buffer
> > is transferred. This frees the CPU for other things.
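
A minimal sketch of the chunking described above, in plain C so it can be run off-target. It assumes the per-transfer limit is passed in as `max_chunk` (64K in the text). `submit_fn`, `queue_chunks` and `sum_bytes` are hypothetical names invented for this illustration; they are not part of TI's ASYNC/GIO API, and the callback merely stands in for whatever call actually queues one transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: queues one transfer of `len` bytes starting at
   `offset` within the buffer. Returns 0 on success, non-zero on failure. */
typedef int (*submit_fn)(size_t offset, size_t len, void *ctx);

/* Split `total` bytes into pieces of at most `max_chunk` bytes and hand
   each piece to `submit`. Returns the number of pieces queued, or -1 if
   a submission fails. */
long queue_chunks(size_t total, size_t max_chunk, submit_fn submit, void *ctx)
{
    size_t offset = 0;
    long count = 0;

    assert(max_chunk > 0 && submit != NULL);

    while (offset < total) {
        size_t len = total - offset;
        if (len > max_chunk)
            len = max_chunk;            /* never exceed the hardware limit */
        if (submit(offset, len, ctx) != 0)
            return -1;                  /* stop at the first failure */
        offset += len;
        count++;
    }
    return count;
}

/* Example callback for off-target testing: only adds up the bytes. */
int sum_bytes(size_t offset, size_t len, void *ctx)
{
    (void)offset;
    *(size_t *)ctx += len;
    return 0;
}
```

With a 4 meg buffer and a 64K limit this queues 64 pieces; the final piece is simply shorter when the total is not a multiple of the limit.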
> >
> > This demo program uses a PCI IOM "mini-driver" named c64xx_pci.c (and an
> > associated .h file), which is also provided by TI. I have found and
> > fixed three bugs in this code - it simply didn't work right as shipped.
> > This mini-driver is invoked from a library (CSL library) provided by TI
> > for which I do not have the source. Thus, my calls to, for instance,
> > ASYNC_write get translated into GIO calls, then pass through the CSL
> > library and reappear in c64xx_pci.c. I can, of course, do source level
> > debugging in my top-level code and in c64xx_pci.c, but not in the
> > intermediate level which is a library.
> >
> > I can read/write assembler, but given this is object code on a platform
> > which I am just learning, sorting out exactly what it is doing is a very,
> > very time consuming process.
> >
> > Now, I configure interrupts so that INT13 is my Host to DSP interrupt,
> > which I use to trigger an interrupt handler which then kicks off my DMA
> > transfer code, which is heavily derived from the async_pci demo code.
> > Over the course of the GIO calls, routines in c64xx_pci.c are repeatedly
> > invoked, and some of those routines reconfigure the interrupt structure
> > in a fashion that is apparently required by the ASYNC system. The
> > interrupts are apparently not properly restored to their original state
> > when the ASYNC system is finished, and this requires me to reconfigure
> > them when my program is done with all transfers.
> >
> > After all transfers for this buffer are completed, my driver then sets a
> > DSP to Host interrupt so that the Linux host knows a buffer has been
> > filled (which may or may not be the end of the transfer, but Linux has to
> > empty that buffer).
> >
> > All of this appears to be working, but it looks as though some counter is
> > not being reset, and as a consequence the DSP to Host interrupt is getting
> > set N times, where N is the number of times my DSP driver has gone
> > through the main transfer loop since the last time the DSP was powered
> > up.
> >
> > Why is this happening? I have no idea. I actually trigger the interrupt
> > with the PCI_dspIntReqSet() routine out of the CSL library.
> >
> > Actually, the entire segment of code which resets the interrupts to the
> > way I want them, then triggers the interrupt is this:
> >
> > IRQ_disable(IRQ_EVT_DSPINT);
> > IRQ_map(IRQ_EVT_DSPINT, 13);
> > IRQ_enable(IRQ_EVT_DSPINT);
> > *datastatusregister = *datastatusregister | 0x2;
> > PCI_dspIntReqClear();
> > PCI_dspIntReqSet();
> >
> > This happens at the bottom of the main processing loop and immediately
> > after this the program branches back to the top of that loop and sleeps
> > on a semaphore waiting for the next interrupt from the host.
> >
> >>>> Now, my Linux driver expects to share the interrupt, and if the
> >>>> interrupt really is for it, it tests to see if the interrupt is valid.
> >>>> If the interrupt is not valid, it writes a message into the log saying
> >>>> that it received an interrupt for an unknown reason, then clears the
> >>>> interrupt and takes no action.
> >>>
> >>> To share the interrupt with whom? Do you mean that there are several
> >>> PCI devices on your host machine, or there are several processes on the
> >>> DSP side that may assert the PCI interrupt? How does your host-side
> >>> code decide whether the interrupt is for it and is valid?
> >>
> >> Andrew,
> >> This is part of the PCI spec. Two separate PCI drivers must be able
> >> to share a single interrupt. I don't know if you were around in the
> >> early days of PCs transitioning to PCI. You sometimes [depending on
> >> interrupt assignments] couldn't run certain "board combinations" [it
> >> was actually driver combinations].
> >>
> >> mikedunn
> >
> > Yes. Presently only the one device is using the interrupt, but I am not
> > willing to assume this will always be the case. My Linux driver
> > incorporates the code to process the interrupt to determine if the
> > interrupt really is for it, and if not it just passes it on.
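
A rough sketch of the "is this interrupt really mine?" test on a shared line, modelled in plain C rather than as real kernel code. Everything here is hypothetical: `struct dev_state`, `DEV_IRQ_PENDING` and the return codes are stand-ins for a device status flag and for the handled / not-handled convention (a Linux handler on a shared line returns `IRQ_NONE` or `IRQ_HANDLED`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for a handler on a shared interrupt line. */
enum irq_result { IRQ_NOT_MINE = 0, IRQ_WAS_MINE = 1 };

/* Assumed flag the device sets when it raised the interrupt. */
#define DEV_IRQ_PENDING 0x1u

/* Hypothetical per-device state. */
struct dev_state {
    volatile uint32_t status;   /* stand-in for a device status word */
    unsigned long handled;      /* interrupts this driver claimed */
};

/* Claim the interrupt only if our device flagged it; otherwise report
   "not mine" so the next driver sharing the line gets a look. */
enum irq_result shared_irq_handler(struct dev_state *dev)
{
    assert(dev != NULL);

    if ((dev->status & DEV_IRQ_PENDING) == 0)
        return IRQ_NOT_MINE;

    dev->status &= ~DEV_IRQ_PENDING;    /* acknowledge before any work */
    dev->handled++;
    return IRQ_WAS_MINE;
}
```

The point of acknowledging first is that a second call with nothing pending falls straight through to "not mine" instead of being counted twice.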
> >
> >>> The problem looks to me like a sync problem. The code (both host and
> >>> DSP side) do not work in concert, do not synchronize their actions with
> >>> each other. Have you thought over the communication protocol, ready/not
> >>> ready /ack software signaling?
> >>>
> >>> Rgds,
> >>>
> >>> Andrew
> >
> > Oh my yes. I have taken over several SDRAM memory locations on the DSP
> > side of the interface for status information which is tested/set/cleared
> > as appropriate by the host or the dsp, and the whole thing is interrupt
> > driven, with tests on both sides for valid interrupts. The major problem
> > at this time is that for some unknown reason the DSP interrupts, and
> > interrupts, and interrupts. The Linux driver discards these spurious
> > interrupts, but at some point they'll slow the bus down enough that it
> > matters.
> >
> >>>> So, here is the bug. After loading the DSP program, the first time my
> >>>> Linux client program orders the DSP to do a DMA transfer, I get one
> >>>> message in the Linux log file saying that an interrupt was received
> >>>> for an unknown reason.
> >>>>
> >>>> The second time my client program orders a DMA transfer, I get two
> >>>> messages in the log file about an interrupt for an unknown reason.
> >>>>
> >>>> The tenth time, I get ten messages in the log (actually I get one
> >>>> message, then another message which says: "the previous message was
> >>>> repeated 9 times").
> >>>>
> >>>> The 100th time, 100 messages.
> >>>>
> >>>> What this means is that something someplace in the DSP is counting and
> >>>> not clearing, and it is setting NOT EMIFA as many times as it has in
> >>>> its count. Eventually, this is going to slow down the PCI bus to a
> >>>> noticeable degree.
> >>>>
> >>>> Given that I am using the async demo code (so far, I might add, I have
> >>>> found and fixed no fewer than three bugs in that code) as the basis
> >>>> for the DSP side of my PCI interface driver, does anyone have any idea
> >>>> what is counting?
> >>>>
> >>>> On Wednesday 24 October 2007 12:08:22 you wrote:
> >>>>> Jim,
> >>>>>
> >>>>> On 10/24/07, jim wrote:
> >>>>>> On Wednesday 24 October 2007 11:06:48 you wrote:
> >>>>>>> Jim,
> >>>>>>>
> >>>>>>> On 10/24/07, jim wrote:
> >>>>>>>> Well, I did find at least part of my problem.
> >>>>>>>>
> >>>>>>>> There seems to be a hardware issue with the memory address/refresh
> >>>>>>>> logic. I've been wrestling with inconsistent behavior, and while I
> >>>>>>>> have strongly suspected hardware, that isn't an easy call to make
> >>>>>>>> particularly when programming at such a low level.
> >>>>>>>
> >>>>>>> Make sure that you go through the EMIF setup parameters, clocks,
> >>>>>>> etc. to be sure that it is setup correctly. If you used the
> >>>>>>> 'delivered EMIF settings' and changed any of the clock
> >>>>>>> configuration, your refresh rate could be too slow.
> >>>>>>>
> >>>>>>> mikedunn
> >>>>>>
> >>>>>> I will certainly look at that. I would not have thought it would be
> >>>>>> possible to configure the refresh rate from outside. It is certainly
> >>>>>> not something I would have ever looked for.
> >>>>>
> >>>>> I am not sure what you mean by "from outside" [from outside the
> >>>>> RAM??]. Hopefully your [or someone's] low level initialization code
> >>>>> is taking care of the EMIF setup. Are you using DSP/BIOS?? or some
> >>>>> other executive??
> >>>>>
> >>>>>> That actually is a possibility???
> >>>>>
> >>>>> Yes, and I have the "have I lost my mind??" experience and the gray
> >>>>> hair to prove it. I abstain from pulling my hair out :-)
> >>>>>
> >>>>>>>> However, I can now document some random changes in program code
> >>>>>>>> (which cannot be accounted for by wild pointers), and using code
> >>>>>>>> composer, I watched the program change a local unsigned int
> >>>>>>>> variable (which it was supposed to do), which also caused the next
> >>>>>>>> local unsigned int variable on the same stack to change as well to
> >>>>>>>> the same value (which should be impossible). The variable that was
> >>>>>>>> supposed to be changed was a return from an exec function
> >>>>>>>> (HWI_disable), so if this change is due to some pointer problem,
> >>>>>>>> the problem must be in the HWI_disable function.
> >>>>>>>>
> >>>>>>>> This accounts for a lot of things, and I still have some hair
> >>>>>>>> left, having not pulled it all out.
> >>>>>>>>
> >>>>>>>> On Tuesday 23 October 2007 11:42:31 you wrote:
> >>>>>>>>> Hi Jim,
> >>>>>>>>>
> >>>>>>>>> Please look at the page 77 of SPRU581C.pdf, on the bit 4 INTRST
> >>>>>>>>> of RSTSRC. It says that "This bit must be asserted before another
> >>>>>>>>> host interrupt can be generated."
> >>>>>>>>>
> >>>>>>>>> Next, it would help to parse all the bits in the PCIIS (even if
> >>>>>>>>> they are all disabled in the PCIIEN) and clear them in the PCI
> >>>>>>>>> ISR.
> >>>>>>>>>
> >>>>>>>>> Third, it is recommended to parse an interrupt source register
> >>>>>>>>> (any of them, not only the PCIIS) in a loop inside an ISR and
> >>>>>>>>> clear any set bits until the register becomes zero, e.g. for the
> >>>>>>>>> PCI:
> >>>>>>>>>
> >>>>>>>>> volatile uint32 temp;
> >>>>>>>>>
> >>>>>>>>> while (temp = PCIIS) // this reads the PCIIS
> >>>>>>>>> {
> >>>>>>>>>     // test if bit[0], ..., bit[n] is set
> >>>>>>>>>     // if set, clear bit[0], ..., bit[n] and perform the necessary
> >>>>>>>>>     // actions, e.g. if bit[3] HOSTSW was set then clear bit[4]
> >>>>>>>>>     // INTRST in RSTSRC
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> // PCIIS is clear now, exit the ISR
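
The loop above can be sketched as compilable C against a simulated register. This is only an illustration under stated assumptions: `fake_status` is an ordinary variable standing in for a memory-mapped interrupt source register, `clear_bits` models write-one-to-clear behaviour, and both names (along with `drain_status`) are hypothetical rather than CSL calls.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated interrupt source register. On hardware this would be a
   memory-mapped register, not a plain variable. */
volatile uint32_t fake_status;

/* Assumed write-one-to-clear behaviour, modelled on the variable. */
void clear_bits(uint32_t mask)
{
    assert(mask != 0);
    fake_status &= ~mask;
}

/* Re-read the register until it reads zero, clearing every set bit on
   each pass. Returns the OR of all bits that were serviced. */
uint32_t drain_status(void)
{
    uint32_t pending;
    uint32_t serviced = 0;

    while ((pending = fake_status) != 0) {
        unsigned bit;
        for (bit = 0; bit < 32; bit++) {
            uint32_t mask = (uint32_t)1 << bit;
            if (pending & mask) {
                clear_bits(mask);   /* acknowledge this source */
                serviced |= mask;   /* per-source work would go here */
            }
        }
    }
    return serviced;
}
```

Re-reading the register at the top of the loop is what catches a source that fires while an earlier one is still being serviced.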
> >>>>>>>>>
> >>>>>>>>> Hope this helps,
> >>>>>>>>>
> >>>>>>>>> Andrew
> >>>>>>>>>
> >>>>>>>>>> 11a. ISR falls off cliff...sometimes.
> >>>>>>>>>> Posted by: "jim" j...@justsosoftware.com jiml8
> >>>>>>>>>> Date: Mon Oct 22, 2007 8:42 pm ((PDT))
> >>>>>>>>>>
> >>>>>>>>>> I have defined an ISR that responds to an interrupt on the PCI
> >>>>>>>>>> bus from a host. The current code for this ISR is this:
> >>>>>>>>>>
> >>>>>>>>>> interrupt void DMAtoHost(void)
> >>>>>>>>>> {
> >>>>>>>>>>     unsigned int intval,*datastatusregister;
> >>>>>>>>>>     puts("interrupted\n");
> >>>>>>>>>>     datastatusregister = (unsigned int *)DATASTATUSREGISTER;
> >>>>>>>>>>     intval = *datastatusregister & 0x04;
> >>>>>>>>>>     if(intval == 4) { /* is this a host interrupt */
> >>>>>>>>>>         intval = *datastatusregister;
> >>>>>>>>>>         intval = intval & 0xfffffffb;
> >>>>>>>>>>         *datastatusregister = intval; /* if so clear it */
> >>>>>>>>>>         SEM_post(HostDMASem);
> >>>>>>>>>>         puts("resetting PCIIS\n");
> >>>>>>>>>>     }
> >>>>>>>>>>     PCI_RSET(PCIIS,0x00000008);
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> Now, DATASTATUSREGISTER is a memory location on the DSP SDRAM
> >>>>>>>>>> that I have taken over for tracking status information between
> >>>>>>>>>> my Linux driver and my DSP interface. Here I am testing a flag
> >>>>>>>>>> in that register to make sure that this interrupt really was set
> >>>>>>>>>> by the host (my linux driver both sets the interrupt and sets
> >>>>>>>>>> this flag to tell that it did it), and if so, I clear that
> >>>>>>>>>> location as well as clearing the PCIIS register.
> >>>>>>>>>>
> >>>>>>>>>> Basically, if this really was an interrupt from the host, this
> >>>>>>>>>> ISR posts to a semaphore called HostDMASem, and there is a task
> >>>>>>>>>> sleeping on this semaphore waiting to be told to transfer data.
> >>>>>>>>>>
> >>>>>>>>>> I have defined this ISR statically using the Code Composer
> >>>>>>>>>> configuration tool, and it is hooked to interrupt 13.
> >>>>>>>>>>
> >>>>>>>>>> I have tried various ways to set that semaphore; my current
> >>>>>>>>>> iteration has the semaphore defined statically using the
> >>>>>>>>>> configuration tool, and it is initialized as one of the very
> >>>>>>>>>> first things done in the main routine of the program when it
> >>>>>>>>>> starts, like this:
> >>>>>>>>>>
> >>>>>>>>>> SEM_new(HostDMASem,0);
> >>>>>>>>>>
> >>>>>>>>>> That main task then spawns a new task called buildall_tsk, which
> >>>>>>>>>> is the task which winds up sleeping on the semaphore. This task
> >>>>>>>>>> arrives at the semaphore through a subroutine call, like this:
> >>>>>>>>>>
> >>>>>>>>>> void WaitForDMA(void)
> >>>>>>>>>> {
> >>>>>>>>>>     Bool semstatus;
> >>>>>>>>>>     SEM_reset(HostDMASem,0);
> >>>>>>>>>>     semstatus = SEM_pend(HostDMASem,SYS_FOREVER);
> >>>>>>>>>>     if(!semstatus){puts("semaphore bombed\n");}
> >>>>>>>>>>     else {puts("semaphore worked\n");}
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> My problem is this. When an interrupt from the host is set, this
> >>>>>>>>>> ISR invokes apparently correctly. However, sometimes when it
> >>>>>>>>>> reaches the end, it falls off the cliff (apparently the
> >>>>>>>>>> semaphore is not being recognized properly or some such) and
> >>>>>>>>>> nothing happens.
> >>>>>>>>>>
> >>>>>>>>>> Further, when the routine falls off the cliff, subsequent
> >>>>>>>>>> interrupts are apparently ignored; when the routine falls off
> >>>>>>>>>> the cliff, it doesn't get invoked again until I completely reset
> >>>>>>>>>> the system (which sometimes involves cycling power as well as
> >>>>>>>>>> restarting Code Composer).
> >>>>>>>>>>
> >>>>>>>>>> This seems to be erratic and I can't define specific conditions
> >>>>>>>>>> that cause it or not. I suspect some initialization thing that
> >>>>>>>>>> Code Composer is doing, but I have no idea what.
> >>>>>>>>>>
> >>>>>>>>>> The card is a Spectrum Digital 6416 card.
> >>>>>>>>>>
> >>>>>>>>>> I am tearing my hair out over this, and trust me; I don't look
> >>>>>>>>>> good bald. Anyone here have any idea what is going on?
On Tuesday 06 November 2007 10:48:11 you wrote:
> On 11/6/07, jim wrote:
> > Basically, it seems that many of the 64K transfers are incomplete. Not
> > all of them, but (depending on how I set my configuration) as many as 9
> > out of 10 - actually every tenth one is complete - or with other
> > configurations, 2 out of 3 are short. I don't know for sure if this is
> > hardware or a race condition in the software (there shouldn't be ANY
> > races), but a network analyzer will answer the question for me.
>
> I am going to respond to just this part since I have seen similar symptoms
> with a custom DM642 board. The thing I would see on the PC side when
> transferring data from the DSP to the PC main memory through PCI would be
> that the last transfer seemed to be working fine but the preceding ones
> would only show small sections of the beginning done. The problem ended up
> being a problem with starting more than one pci master transaction from the
> DSP without waiting correctly for it to complete, then sending another
> transfer request from the DSP.
>
> I hope this might help you diagnose your particular problem.
>
> Gregory Lee

I continue to diagnose, but I have formed the opinion that this is indeed the
case.

However, all of that is being handled (supposedly) by code provided by TI,
using the core of a demo provided by TI that was supposed to do these things.
So either the TI code is broken or my understanding of it is faulty, or (more
likely) both.
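
To make the failure mode concrete, here is a toy model in plain C of a bus-master engine that can only run one transfer at a time. It is a sketch under stated assumptions, not the TI driver: `struct master`, `master_start`, `master_tick` and the two `send_*` routines are all hypothetical, and "ticks" stand in for time passing on the bus. Starting a new transfer while one is in flight abandons the old one, which reproduces the symptom Gregory describes (only the last transfer is complete, the earlier ones show just their beginning).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a master engine with one transfer in flight. */
struct master {
    int busy;            /* non-zero while a transfer is in flight */
    size_t remaining;    /* bytes left in the current transfer */
    size_t moved;        /* bytes actually moved so far */
    unsigned clobbered;  /* transfers cut short by an early restart */
};

/* Start a transfer; doing so while busy abandons the one in flight. */
void master_start(struct master *m, size_t len)
{
    if (m->busy)
        m->clobbered++;
    m->busy = 1;
    m->remaining = len;
}

/* Move up to `step` bytes of the current transfer. */
void master_tick(struct master *m, size_t step)
{
    size_t n;
    if (!m->busy)
        return;
    n = (step < m->remaining) ? step : m->remaining;
    m->remaining -= n;
    m->moved += n;
    if (m->remaining == 0)
        m->busy = 0;
}

/* Wait for each transfer to finish before starting the next. On real
   hardware the wait would be a completion interrupt or semaphore. */
void send_serialized(struct master *m, size_t len, unsigned n, size_t step)
{
    unsigned i;
    assert(step > 0);
    for (i = 0; i < n; i++) {
        master_start(m, len);
        while (m->busy)
            master_tick(m, step);
    }
}

/* Start the next transfer after a single tick, without waiting. */
void send_unserialized(struct master *m, size_t len, unsigned n, size_t step)
{
    unsigned i;
    assert(step > 0);
    for (i = 0; i < n; i++) {
        master_start(m, len);
        master_tick(m, step);
    }
    while (m->busy)                 /* let the last transfer finish */
        master_tick(m, step);
}
```

In this model, ten 64K transfers sent serialized move every byte, while the unserialized version completes only the tenth and leaves the first nine with a single tick's worth of data each.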