DSPRelated.com
Forums

Reload issue with SummitICE on 21161

Started by chris_tieline June 12, 2003
We have been observing some strange behaviour on our hardware
platform when reloading an image using the ICE.

The board is booting a simple application from FLASH on
powerup/hardware reset (to configure an FPGA). When VisualDSP is
loaded, everything appears normal- we can read memory from all
external bus peripherals, including the SDRAM.

Problem is, *sometimes* when a new image is loaded, the nRD and nWR
strobes become inactive for external bus accesses on MS1, MS2 and
MS3. I verified this with a scope. Strange thing is, the SDRAM can
still be accessed (i.e. it's not the BS0 bit- that kills SDRAM as well).

The application executes from internal memory, and SDRAM without any
problems, except that accesses to external memory banks other than 0
do not function.

The condition can be cleared by performing a hardware reset, and
reloading- a few times. Eventually it loads OK without screwing up
the rd and wr strobe function.

I have confirmed that it is not a hardware conflict issue- the only
peripheral capable of driving onto the rd and wr lines is the FPGA.
I forced it into unconfigured mode to confirm that it was not
driving out accidentally, but the problem persisted. The strobes never
hint at being forced externally- the level is always rock steady 3V3
with no dips.

I checked the PSU rails- they are clean. We did find that the target
platform has to be well-grounded externally to the PC (via a strap)
for the ICE to work 'reliably'. Initially our target ran from a
floating PSU. The ICE clearly doesn't handle this.

I tried looking for errata, but AD have changed their website
(again) to make things even harder for us to use their products
effectively.

Has anyone else had similar experiences, or found a solution? We
have been accepting this as another AD "feature", but it's a real
pain having to spend a few minutes reloading software to get it
working when it decides to have a bad day.

cheers,

Chris



On Thu, 12 Jun 2003, chris_tieline wrote:

> We have been observing some strange behaviour on our hardware
> platform when reloading an image using the ICE.
>
> The board is booting a simple application from FLASH on
> powerup/hardware reset (to configure an FPGA). When VisualDSP is
> loaded, everything appears normal- we can read memory from all
> external bus peripherals, including the SDRAM.
>
> Problem is, *sometimes* when a new image is loaded, the nRD and nWR
> strobes become inactive for external bus accesses on MS1, MS2 and
> MS3. I verified this with a scope. Strange thing is, the SDRAM can
> still be accessed (i.e. it's not the BS0 bit- that kills SDRAM as well).

What are your DMA lines hooked up to? Do you use the DMA engine at all?

> The application executes from internal memory, and SDRAM without any
> problems, except that accesses to external memory banks other than 0
> do not function.
>
> The condition can be cleared by performing a hardware reset, and
> reloading- a few times. Eventually it loads OK without screwing up
> the rd and wr strobe function.
>
> I have confirmed that it is not a hardware conflict issue- the only
> peripheral capable of driving onto the rd and wr lines is the FPGA.
> I forced it into unconfigured mode to confirm that it was not
> driving out accidentally, but the problem persisted. The strobes never
> hint at being forced externally- the level is always rock steady 3V3
> with no dips.

I have locked up the rd and wr lines with the dma engine. Reset always
clears it. I could fix the problem by attaching a logic analyzer probe
to the dmar line (which kind of tells me the problem is dirty signals,
but sure doesn't help find them!!)

> I checked the PSU rails- they are clean. We did find that the target
> platform has to be well-grounded externally to the PC (via a strap)
> for the ICE to work 'reliably'. Initially our target ran from a
> floating PSU. The ICE clearly doesn't handle this.
>
> I tried looking for errata, but AD have changed their website
> (again) to make things even harder for us to use their products
> effectively.

Great engineering is not great marketing. We'd all be rich if it
were :-)

> Has anyone else had similar experiences, or found a solution? We
> have been accepting this as another AD "feature", but it's a real
> pain having to spend a few minutes reloading software to get it
> working when it decides to have a bad day.

Unfortunately I gave up on the DMA and used IRQs instead. Check the
timing on the load signals, check the levels on the DMARx and DMAGx
lines, and might as well check the host port lines too. You may have
just one line floating someplace, and if it floats the wrong way the
whole system locks up. I've seen it happen, but in a totally different
way. Definitely a pain in the butt.

Patience, persistence, truth,
Dr. mike



Hi Mike,

Thanks for the info. The lack of read or write strobe suggests that it's
going into a 'slave' mode, which could happen if either DMARx line
asserts. These were floating (internal pull-ups too weak?) so I tried
tying them directly to VCC- no difference.

I went through every IO pin and made sure they were directly connected
to VCC if they were input-only, and pulled the rest up with 10K where an
internal pullup resistor was not specified.

It appears to end up in several different states each time the program
is loaded through JTAG. One of these treats the entire external memory
space as a single register (?!!). Chip selects appear to be functioning
correctly, but the data does not get through: I write a value
anywhere, and any address in external memory reads that value back.
SDRAM and internal memory still function normally. It's as if the
external memory controller is disconnected from the internal bus.

Another weird condition just stops all accesses to external memory- no
read or write strobes. The strange thing is it appears to read the last
value that was read from external memory before the JTAG load operation.
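A minimal sketch (not from the thread) of a readback test that separates the "single register" state from a working bus: write a value that differs from word to word across a region in one of the affected banks, then read the whole region back only after every write has finished. The function name and test pattern are hypothetical, the base address of the MS1..MS3 region is board-specific and so is left as a parameter, and the use of uint32_t assumes the toolchain provides <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the thread. Writes an address-derived
 * pattern to every word of a test region, then reads it all back.
 * Returns the number of words that did not read back what was written. */
size_t ext_mem_alias_test(volatile uint32_t *base, size_t nwords)
{
    size_t i, errors = 0;

    assert(base != NULL && nwords > 1);

    /* Pass 1: every word gets a value that differs from its neighbours. */
    for (i = 0; i < nwords; i++)
        base[i] = (uint32_t)(0xA5000000u ^ (uint32_t)i);

    /* Pass 2: read back only after all writes, so aliasing shows up. */
    for (i = 0; i < nwords; i++)
        if (base[i] != (uint32_t)(0xA5000000u ^ (uint32_t)i))
            errors++;

    return errors;
}
```

On a healthy bus the count is 0. If every address returns the last value written, only the final word matches, so the count comes out at nwords - 1; if reads return a stale value from before the load, it will typically equal nwords.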

The only way it can recover is if I physically repower the DSP. The
hardware reset does not fix it, even if I pull off the emulator as well.

Could it be that the JTAG data is getting corrupted, and the dodgy data
is putting the DSP into some weird, unrecoverable state? The SummitICE
emulators we use have always been notoriously unreliable when it comes
to getting a connection. This does not inspire much confidence.

This problem seems closely related to the size of the executable. Small
projects seem to have fewer problems. Large ones very rarely succeed.

My thoughts are that it might be a JTAG problem. Hard to confirm though!
The only difference between this and other platforms using 21161 is the
distance of the connector to the DSP. Ours is a whole 2" (5cm) away!
Anyone know of any issues with this?

It still hasn't crashed when booting from FLASH.

Regards,

Chris



Our platform has a single DSP running on ID 1. All inputs are either
pulled up or rely on the internal pull-ups.

-----Original Message-----
From: Mike Rosing
Sent: Thursday, June 12, 2003 9:36 PM
To: chris_tieline
Subject: Re: [adsp] Reload issue with SummitICE on 21161



On Fri, 13 Jun 2003, Chris Lockwood wrote:

> Thanks for the info. The lack of read or write strobe suggests that it's
> going into a 'slave' mode, which could happen if either DMARx line
> asserts. These were floating (internal pull-ups too weak?) so I tried
> tying them directly to VCC- no difference.
>
> I went through every IO pin and made sure they were directly connected
> to VCC if they were input-only, and pulled the rest up with 10K where an
> internal pullup resistor was not specified.

It was a good try anyway!

> It appears to end up in several different states each time the program
> is loaded through JTAG. One of these treats the entire external memory
> space as a single register (?!!). Chip selects appear to be functioning
> correctly, but the data does not get through: I write a value
> anywhere, and any address in external memory reads that value back.
> SDRAM and internal memory still function normally. It's as if the
> external memory controller is disconnected from the internal bus.

Either that or the internal emulator functions are messed up. The
PX register is used by the emulator to transfer data to the JTAG port,
as well as to external memory. If the address decoder is locked up,
the PX register is all you can see.

> Another weird condition just stops all accesses to external memory- no
> read or write strobes. The strange thing is it appears to read the last
> value that was read from external memory before the JTAG load operation.
>
> The only way it can recover is if I physically repower the DSP. The
> hardware reset does not fix it, even if I pull off the emulator as well.

The internal emulator is screwed. If you can replicate it, it's worth
talking to the ADI support guys about it. It may be the current supply
or it may be a bug in the emulator.

> Could it be that the JTAG data is getting corrupted, and the dodgy data
> is putting the DSP into some weird, unrecoverable state? The SummitICE
> emulators we use have always been notoriously unreliable when it comes
> to getting a connection. This does not inspire much confidence.

I don't think the data is corrupt; I think the internal logic of the
emulator is messed up somehow.

> This problem seems closely related to the size of the executable. Small
> projects seem to have fewer problems. Large ones very rarely succeed.

That was my experience with the DMA lock-up too. That could be a current
limit problem. What does the power supply look like during the loading?
Maybe look at each Vcc or Vdd and see if they are all rock solid.

> My thoughts are that it might be a JTAG problem. Hard to confirm though!
> The only difference between this and other platforms using 21161 is the
> distance of the connector to the DSP. Ours is a whole 2" (5cm) away!
> Anyone know of any issues with this?

Short cables are good, short lines are good too. 5cm shouldn't be much
of a problem, especially if you have ground planes.

> It still hasn't crashed when booting from FLASH.

That's because the emulator logic isn't invoked. If your power supplies
look good on every pin, then start talking with ADI support. They may
have a workaround for a known bug already.

Patience, persistence, truth,
Dr. mike



Hi Mike,

Thanks for your help so far, but we still haven't solved the problem.
It's been a week of (expensive) hell! I contacted AD the other day and
provided them with info. Still haven't heard back from them (apart from
the automated response).

Since last post-
1) Supplies are OK (within the 100mV noise floor of our HP CRO) at all
times.

2) I added another 10 decoupling caps under the DSP just to be sure! No
difference.

3) The problem is not the emulator. It happens without the emulator ever
being attached after a power-up, with the JTAG lines jumpered to ground,
booting from FLASH. The problem actually happens consistently with some
applications during the bootload operation's copying phase. The problem
is related to the data being moved around.

4) The problem is marginal at times. It ran for over 12 hours last night,
with a full TCP stack pinging away. It runs entirely from SDRAM-
program, stack, and heap are all in SDRAM for this module. If it runs
from SDRAM for 12 or more hours, I doubt it's a bus timing problem.
Besides- this should not cause DSP silicon to lock up. The code was
loaded through the emulator, set running, and then JTAG pins jumpered to
ground.

5) The bootloader DMAs 6 bytes at a time from FLASH using the BMS line,
packs this into 48 bits, then writes to SDRAM (48 bits wide). The
strobes die at different places in the operation, but ALWAYS after a
successful DMA transfer (i.e. 6-byte boundaries are OK). All subsequent
DMA transfers copy the same value- being the last byte successfully read
from FLASH, indicating the point where the strobes died.

6) The last data value successfully written to SDRAM before it dies
always has mostly 0xFs in it (i.e. 0xXXXXFFFFFFXX). The preceding value
written to SDRAM is mostly 0x0s (i.e. 0xXXXX000000XX). It happens at
different parts of the image with this pattern- this pattern doesn't
always cause it to fail, but every failure happens with this pattern.
Coincidence?

7) During bootload, the FPGA is in unconfigured state- it does not pull
any of the bus lines. The only other device attached to the strobes is
the FLASH. The data and address lines are hooked to the SDRAM, FLASH,
FPGA (inactive), and the upper 16 bits to a LAN device (input only on
these pins).
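To check whether failures line up with the 0x00-to-0xFF transitions in item 6, a host-side sketch like the following could scan the 48-bit words of a loader image. It assumes the six FLASH bytes are packed least-significant byte first (an assumption; the 21161 boot documentation gives the real ordering), and all function names are hypothetical. For the pattern described, at least the middle 24 of the 48 bits flip between consecutive words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 48-bit mask: the words written to SDRAM are 48 bits wide. */
#define WORD48_MASK 0xFFFFFFFFFFFFull

/* Pack six FLASH bytes into one 48-bit word.
 * ASSUMPTION: the first byte read lands in the least-significant byte. */
uint64_t pack48(const uint8_t b[6])
{
    uint64_t w = 0;
    int k;

    for (k = 5; k >= 0; k--)
        w = (w << 8) | b[k];   /* b[5] ends up in bits 47..40 */
    return w;
}

/* How many of the 48 bits change between two consecutive words. */
unsigned toggles48(uint64_t prev, uint64_t next)
{
    uint64_t x = (prev ^ next) & WORD48_MASK;
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;            /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the first word that flips at least `threshold` bits relative
 * to the word before it, or n if there is none. */
size_t first_heavy_toggle(const uint64_t *words, size_t n, unsigned threshold)
{
    size_t i;

    assert(words != NULL || n == 0);
    for (i = 1; i < n; i++)
        if (toggles48(words[i - 1], words[i]) >= threshold)
            return i;
    return n;
}
```

A threshold around 24 flags transitions like the one described; comparing the reported indices with the points where the strobes died would show whether every failure really coincides with a heavy toggle.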

Any other ideas?????

Next is to try a different revision of the silicon. We are using 1.1-
both prototypes use devices from the same tray. We have a few old protos
with 1.0, and we never saw this problem running very similar code, and
similar hardware design (different layout). Not too keen on trying to
recycle a BGA device- we have no new 1.0s.

Chris

-----Original Message-----
From: Mike Rosing
Sent: Friday, June 13, 2003 11:28 PM
To: Chris Lockwood
Subject: RE: [adsp] Reload issue with SummitICE on 21161

_____________________________________
Note: If you do a simple "reply" with your email client, only the author
of this message will receive your answer. You need to do a "reply all"
if you want your answer to be distributed to the entire group.

_____________________________________
About this discussion group:

To Join: Send an email to

To Post: Send an email to

To Leave: Send an email to

Archives: http://groups.yahoo.com/group/adsp

Other Groups: http://www.dsprelated.com/groups.php3
http://docs.yahoo.com/info/terms/




Hi Chris,

We had a problem with SDRAM access in that the DSP's minimum SDRAM data
input hold time tHDSDK was larger than expected; see anomaly #31 of the
ADSP21161 anomaly sheet.

This resulted in corrupt data being read from the SDRAM, but only when
reads occurred in consecutive cycles.

The solution, approved by ADI, was to add a small capacitor from the SDRAM
clock line to ground; this skewed the clock and moved the time window for
valid data.

Our specific solution was to place the cap (22pF, surface-mount NP0) near
the SDRAM. As the anomaly sheet explains, the value and location may need
to be different for each board layout.
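A rough first-order estimate of how much delay such a capacitor adds (an editorial sketch, not taken from the anomaly sheet): treat the clock driver as a source resistance R_drv charging the added capacitance C, and find the time to reach the 50% switching threshold. R_drv = 25 ohms is an assumed value; the real shift depends on the driver, the trace, and the SDRAM input, which is why the value and location need tuning per layout.

```latex
\begin{aligned}
V(t) &= V_{DD}\left(1 - e^{-t/(R_{\mathrm{drv}} C)}\right) \\
V(t_d) = \tfrac{1}{2} V_{DD} \;&\Rightarrow\; t_d = R_{\mathrm{drv}} C \ln 2 \approx 0.69\, R_{\mathrm{drv}} C \\
R_{\mathrm{drv}} = 25\,\Omega,\; C = 22\,\mathrm{pF} \;&\Rightarrow\; t_d \approx 0.69 \times 550\,\mathrm{ps} \approx 0.38\,\mathrm{ns}
\end{aligned}
```

Delaying the clock at the SDRAM presumably makes its read data change later relative to the DSP's own sampling clock, which is where the extra hold margin would come from.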

Hope this helps,

Alex Young
DSP software Engineer
Consultant for Philips Digital Systems Laboratories
"Chris Lockwood", 20/06/03 08:58
To: "'Mike Rosing'"
cc: (bcc: Alex Young/LEU/PDSL/PHILIPS)
Subject: RE: [adsp] Reload issue with SummitICE on 21161


On Fri, 20 Jun 2003, Chris Lockwood wrote:

> Thanks for your help so far, but we still haven't solved the problem.
> It's been a week of (expensive) hell! I contacted AD the other day and
> provided them with info. Still haven't heard back from them (apart from
> the automated response).
>
> Since last post-
> 1) Supplies are OK (within the 100mV noise floor of our HP CRO) at all
> times.

That's pretty good.

> 2) I added another 10 decoupling caps under the DSP just to be sure! No
> difference.

That sure eliminates current feed from the outside. Might be a problem
inside the chip. Nothing you can do about it tho, that's ADI's problem.

> 3) The problem is not the emulator. It happens without the emulator ever
> being attached after a power-up, with the JTAG lines jumpered to ground,
> booting from FLASH. The problem actually happens consistently with some
> applications during the bootload operation's copying phase. The problem
> is related to the data being moved around.

So it's just the DMA engine itself. Ouch.

> 4) The problem is marginal at times. It ran for over 12 hours last night,
> with a full TCP stack pinging away. It runs entirely from SDRAM-
> program, stack, and heap are all in SDRAM for this module. If it runs
> from SDRAM for 12 or more hours, I doubt it's a bus timing problem.
> Besides- this should not cause DSP silicon to lock up. The code was
> loaded through the emulator, set running, and then JTAG pins jumpered to
> ground.

Something is right at the margin. What kind of heat sink is on the chip?
I can't imagine this would cause a heat load, or that heat should cause
a problem, but it's at least something to beat up ADI with.

> 5) The bootloader DMAs 6 bytes at a time from FLASH using the BMS line,
> packs this into 48 bits, then writes to SDRAM (48 bits wide). The
> strobes die at different places in the operation, but ALWAYS after a
> successful DMA transfer (i.e. 6-byte boundaries are OK). All subsequent
> DMA transfers copy the same value- being the last byte successfully read
> from FLASH, indicating the point where the strobes died.

So it's definitely the DMA engine, and only on a write of the full bus.

> 6) The last data value successfully written to SDRAM before it dies
> always has mostly 0xFs in it (i.e. 0xXXXXFFFFFFXX). The preceding value
> written to SDRAM is mostly 0x0s (i.e. 0xXXXX000000XX). It happens at
> different parts of the image with this pattern- this pattern doesn't
> always cause it to fail, but every failure happens with this pattern.
> Coincidence?

That's why I'm thinking it has to be a current related problem.
Flipping all those bits takes a lot of power, and I bet there's a
weird restriction in the copper layer someplace in the chip that prevents
all the registers from flipping, and that chokes the external port logic.
Given your supplies are ok, and all the extra caps, it sure sounds like
an internal problem on the chip.

> 7) During bootload, the FPGA is in unconfigured state- it does not pull
> any of the bus lines. The only other device attached to the strobes is
> the FLASH. The data and address lines are hooked to the SDRAM, FLASH,
> FPGA (inactive), and the upper 16 bits to a LAN device (input only on
> these pins).

It ain't external. It's internal. I don't think there's much more
you can do but cool it with water!

> Any other ideas?????
>
> Next is to try a different revision of the silicon. We are using 1.1-
> both prototypes use devices from the same tray. We have a few old protos
> with 1.0, and we never saw this problem running very similar code, and
> similar hardware design (different layout). Not too keen on trying to
> recycle a BGA device- we have no new 1.0s.

I would definitely write this up as a formal bug report for ADI. Get the
local FAEs and distributors involved too so you have some loud voices
on it. If you can change back to the previous revision and show the
bug goes away, it's pretty damn good proof they have an internal current
limit problem. I've never built chips, but I imagine it's a tough job
making sure everything gets enough current when all the bits have to
flip at once, and it sure seems like that's where you see the problem.

I doubt cooling will help, but what the hell, it's worth a try!

Patience, persistence, truth,
Dr. mike



--On Friday, June 20, 2003 6:52 AM -0700 Mike Rosing wrote:

> That's why I'm thinking it has to be a current related problem.
> Flipping all those bits takes a lot of power, and I bet there's a
> weird restriction in the copper layer someplace in the chip that prevents
> all the registers from flipping, and that chokes the external port logic.
> Given your supplies are ok, and all the extra caps, it sure sounds like
> an internal problem on the chip.

Sounds like a promising diagnosis. You could try forcing this by creating an
array with alternating zero and -1 values. My guess is that the initial load
would fail shortly after hitting this.
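A minimal sketch of that experiment in C, assuming a 32-bit int so that -1 is an all-ones word: the table is initialised data, so the loader has to copy it, and consecutive words flip every bit. The macros, array, and check function are hypothetical. Placing the table in SDRAM, or making it large, would need the toolchain's section mechanism and is not shown. The reported failures involved 48-bit words, so whether 32-bit data reproduces them is an open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stress table: alternating all-zeros and all-ones words. */
#define PAIRS_1   0, -1
#define PAIRS_4   PAIRS_1, PAIRS_1, PAIRS_1, PAIRS_1
#define PAIRS_16  PAIRS_4, PAIRS_4, PAIRS_4, PAIRS_4
#define PAIRS_64  PAIRS_16, PAIRS_16, PAIRS_16, PAIRS_16

/* volatile so the compiler cannot fold the checks away. */
static const volatile int stress_pattern[] = { PAIRS_64, PAIRS_64 }; /* 256 words */

#define STRESS_LEN (sizeof stress_pattern / sizeof stress_pattern[0])

/* After boot, return the index of the first corrupted word,
 * or -1 if the table arrived intact. */
long check_stress_pattern(void)
{
    size_t i;

    assert(STRESS_LEN % 2 == 0);
    for (i = 0; i < STRESS_LEN; i++) {
        int expected = (i % 2 == 0) ? 0 : -1;
        if (stress_pattern[i] != expected)
            return (long)i;
    }
    return -1;
}
```

After booting, a return value other than -1 gives the first word that arrived corrupted. If the load dies partway through, the program may never get as far as calling the check, which is itself an answer.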