
Re: problem with vdsp

Started by Andor Bariska April 12, 2000

-----Original Message-----
From: Ganesan Ramachandran [mailto:]
Sent: Friday, 7 April 2000 11:31
To:
Subject: [adsp] problem with vdsp

>hi group,
...
>i've put my entire program code into external memory and defined the
>seg_pmco accordingly. is there anything to be taken care while using
>external memory as program memory?

Hi Ganesan,
putting your code into external memory is an unwise thing to do. At
best, you get half the nominal processor MIPS (that is, when your SDRAM
is optimally configured for the SHARC, and is capable of continuous full
speed data transfer), because of the 48bit instruction word width of the
SHARC (i.e. requiring two 32bit word transfers from memory). At worst,
you have a loop crossing SDRAM page boundaries, which requires about 8
or more cycles to fetch one instruction (depending on SDRAM make),
effectively reducing your SHARC to a 5MHz processor.
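
The arithmetic behind these figures can be sketched in a few lines of C. This is only a back-of-the-envelope model, not a measurement: it assumes a 40 MHz core clock (inferred from the "8 cycles gives 5MHz" figure above; the post does not state the clock) and a fixed number of core cycles per instruction fetch. The name effective_rate_mhz is made up for illustration.

```c
/* Hypothetical helper: effective instruction rate in MHz when every
 * instruction fetch costs `cycles_per_fetch` core cycles.
 * Assumes a constant fetch cost; real SDRAM behaviour varies. */
double effective_rate_mhz(double core_clock_mhz, unsigned cycles_per_fetch)
{
    if (cycles_per_fetch == 0) {
        return 0.0; /* a zero-cycle fetch is not meaningful here */
    }
    return core_clock_mhz / (double)cycles_per_fetch;
}
```

Under these assumptions, effective_rate_mhz(40.0, 2) gives 20 for the best case (two 32-bit transfers per instruction) and effective_rate_mhz(40.0, 8) gives 5 for a loop that crosses an SDRAM page boundary.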

So if you run into timing problems due to processing speed, try to copy
the code into internal PM memory and then run it from there.

I've written quite a couple of SHARC programs (in assembler), and the
largest of them required just a little more than 2k instruction words.
The 61 has 8k words instruction space, plenty if you program in
assembler. Maybe it would be worth it for you to switch from C.

Regards,
Andor Bariska
WEISS ENGINEERING LTD. - Professional Digital Audio Products
Florastrasse 42, 8610 Uster, Switzerland
phone: +41 1 940 20 06, fax: +41 1 940 22 14
mailto: web: http://www.weiss.ch
You *can* afford the best



Hello,

Andor Bariska wrote:
> >hi group,
> ...
> > i've put my entire program code into external memory and defined the
> > seg_pmco accordingly. is there anything to be taken care while using
> > external memory as program memory?
>
> Hi Ganesan,
> putting your code into external memory is an unwise thing to do. At
> best, you get half the nominal processor MIPS (that is, when your SDRAM
> is optimally configured for the SHARC, and is capable of continuous full
> speed data transfer), because of the 48bit instruction word width of the
> SHARC (i.e. requiring two 32bit word transfers from memory). At worst,
> you have a loop crossing SDRAM page boundaries, which requires about 8
> or more cycles to fetch one instruction (depending on SDRAM make),
> effectively reducing your SHARC to a 5MHz processor.

well, first i want to mention that most sharcs can fetch
instructions from external 48-bit wide SRAM with zero waitstates. so
ideally an external instruction can be fetched and executed in the
same time as an internal instruction.

the exceptions to this are:
- collisions on the external bus (adding one extra cycle)
- the 21065
- the 21160 (don't know too much about this one yet)

the second point (21065) is well described by Andor above, although
i'd like to mention that if you do use external SRAM instead of
SDRAM then the result will be a bit better.
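
The difference between a 48-bit wide memory and a 32-bit wide one comes down to how many bus transfers a single 48-bit instruction needs. A minimal sketch, assuming one transfer per bus-width chunk and ignoring wait states, bus collisions and SDRAM page effects; transfers_per_instruction is a hypothetical helper, not part of any vendor library.

```c
/* Hypothetical helper: number of external bus transfers needed to
 * fetch one instruction of `instr_bits` over a bus of `bus_bits`.
 * This is a plain ceiling division; it models bus width only. */
unsigned transfers_per_instruction(unsigned instr_bits, unsigned bus_bits)
{
    if (bus_bits == 0) {
        return 0; /* no bus, no transfers; guards the division */
    }
    return (instr_bits + bus_bits - 1) / bus_bits;
}
```

For a 48-bit instruction this yields one transfer on a 48-bit bus and two on a 32-bit bus, which is where the "half the nominal MIPS" best case in the quoted post comes from.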

another point is, every instruction (external as well as internal)
is cached if it conflicts with a data move on the pm-bus.
i remember a case (a DCT) where the execution time of the
external code was "only" a factor of about two above the internally
executed code, though all data was internal then. and i have to
admit, this was nearly my best case. the worst case was a factor
of about 25: code fetch, pm-data fetch and dm-data fetch from the
external SDRAM with one hold cycle (due to problems with the
on-chip SDRAM controller of silicon rev 0.1).

BTW the best case was memcpyPD() with code external and all
data internal. with large arrays you nearly get 1 cycle/copy.

> So if you run into timing problems due to processing speed, try to copy
> the code into internal PM memory and then run it from there.
>
> I've written quite a couple of SHARC programs (in assembler), and the
> largest of them required just a little more than 2k instruction words.
> The 61 has 8k words instruction space, plenty if you program in
> assembler. Maybe it would be worth it for you to switch from C.

three things come to my mind here:
- you can also use bank1 of the internal memory for additional
code (but it eats up data memory).
- the overlay manager. i tried it about a year ago and IIRC it
couldn't handle C functions. this might have changed.
- there are applications which are larger than 2k code even if
written in assembler. and large programs are definitely
easier to debug when written in a high level language like C
(which blows up the code size even more).

so for the last point: _sometimes_ you can't get away without
placing code externally, especially on the 21065 ...

Regards,
Michael
