On Apr 29, 10:48 pm, Brad Griffis <bradgrif...@hotmail.com> wrote:
> jleslie48 wrote:
> > On Apr 29, 12:18 pm, "William C Bonner <wbon...@wimsworld.com>"
> > <wimbon...@gmail.com> wrote:
> >> On Apr 28, 5:27 pm, jleslie48 <j...@jonathanleslie.com> wrote:
>
> >>> On Apr 28, 7:09 pm, "William C Bonner <wbon...@wimsworld.com>"
> >>> <wimbon...@gmail.com> wrote:
> >>>> Are you writing a program that utilizes the TI DSP BIOS, or just runs
> >>>> straight on the processor with no BIOS?
> >>>> Are you using all 256k of DSP ram, or have you reserved up to 64k of
> >>>> it for cache, and possibly enabled the cache controller?
> >>>> How many of your variables are automatic / stack variables, vs how
> >>>> many are global?
> >>>> I believe the pragma before any variables you want to label for a
> >>>> different section is #pragma DATA_SECTION("sectionname") but I don't
> >>>> have my compiler in front of me so I can't be sure. If that's correct,
> >>>> then it goes along with #pragma CODE_SECTION("othersectionname").
> >>>> Take note that the C++ does not declare the variable in the pragma,
> >>>> while the C version does.
> >>>> On Apr 28, 1:41 pm, jleslie48 <j...@jonathanleslie.com> wrote:
> >>>>> I'm running way too slow, but I know I've done a few things
> >>>>> inefficiently in my C program.
> >>>>> 1) I ran out of IRAM memory so I moved all my variables to ERAM.
> >>>>> What did this cost me?
> >>>>> 1A) How can I pick and choose where in memory my C++ variables
> >>>>> reside?
> >>>>> 2) instead of using 'float variablea;' I used 'double variablea;'
> >>>>> what did this cost me and what can I expect by changing all my
> >>>>> variables to float (32 bit vs 64 bit.) ?
> >>>>> 3) How else can I affect the runtime of my program? I see there is a
> >>>>> clock properties setting, and I know that removing all my fprintf's
> >>>>> saves some 20% of the runtime. What about switching from debug to
> >>>>> release mode, or something else I haven't considered?
> >>> I'm using the TI DSP with BIOS.
> >>> "Are you using all 256k of DSP ram, or have you reserved up to 64k of
> >>> it for cache, and possibly enabled the cache controller?"
> >>> I don't know about the 256k DSP ram, or the reserved 64k cache. How
> >>> would I check, and for that matter what are the implications of it? I'm
> >>> new to DSP programming, and I've gotten as far as to get my routines
> >>> and algorithms to run, but now I need to optimize and I'm not sure how
> >>> to proceed.
> >>> " How many of your variables are automatic / stack variables, vs how
> >>> many are global?"
> >>> I was using mostly global static variables, specifically an array of
> >>> 10,000 of a structure consisting of several double values:
> >>> typedef struct {
> >>>     double dtimeindex;
> >>>     double damplitude;
> >>>     double dampfrombestfitline;
> >>>     double dsdvalue;
> >>> } itmlstrec_type;
> >>> itmlstrec_type itemlist[10000];
> >>> this is of course over the top, but in the PC world of 2gb memory
> >>> machines, out of sight, out of mind. Now that I'm dealing with a real
> >>> machine, I have to remember my roots and build and design clean. The
> >>> above structure is clearly full of pork, and needs to be trimmed. #1)
> >>> float will cut the size in half, and #2) as I understand it, the C6713
> >>> chip has floating point math that's fast, but only for the 32-bit
> >>> version; with the 64-bit floating point precision values, I'm on the
> >>> slow boat to China...
> >>> "#pragma CODE_SECTION("othersectionname").
> >>> Take note that the C++ does not declare the variable in the pragma,
> >>> while the C version does."
> >>> pragma is fine with me, I've seen it used before but I've personally
> >>> never had the need to use it. I'm programming in straight C.
> >> I've spent the last two years porting code to a DSP from a Windows
> >> environment, so have gone through many of the problems you are facing
> >> now.
>
> >> I'm not using the BIOS environment. My build environment may be
> >> significantly different because of that. I'm used to using a linker
> >> command file with both a SECTIONS and MEMORY chunk in it, that map the
> >> memory on my board, and then map the symbols into various memory
> >> chunks. If you are using the BIOS, I'm not sure whether those items are
> >> configured graphically in the environment instead of in a text cmd
> >> file.
>
> >> I'm manually including the CSL (Chip Support Library) headers and
> >> linking to the csl library. My memory map uses only 192k of internal
> >> ram, and then I explicitly call the CSL call to enable caching of my
> >> external ram using the other 64k of internal ram. (I'm using the 6713
> >> DSP, which has 256k of internal L2 ram, up to 64k of which can be used
> >> by the cache controller)
>
> >> Your simple structure above takes up 32 bytes, so an array of 10,000
> >> is taking up 320k, or 0x4E200. That would mean that it won't fit in
> >> internal memory at all on a 6713. Converting to use floats instead of
> >> doubles would drop you to 160k, which would at least fit in internal
> >> ram, and leave you at least 32k for other code or variables (or 96k
> >> if you are not using caching).
>
> >> On a completely different subject, I think you started asking
> >> questions on a DSP mailing list that I follow and were flamed by one
> >> person for asking uninformed questions. I'd recommend sticking with
> >> that list, and just fine tuning your questions a bit more with as much
> >> supporting evidence as possible. For me, reading that list is a much
> >> more common happening than news feeds. Usually the worst that happens
> >> is to be completely ignored.
>
> >> Wim.
>
> > Wim,
>
> > Thanks very much for your analysis. Monday morning will see how well
> > I run in float mode, and whether I fit back into IRAM memory space,
> > and most importantly if my runtime changes significantly. One more
> > follow up, what exactly is the Caching setting and where do I change
> > it?
>
> > Jonathan
>
> In BIOS tcf GUI right-click on Global Settings and go to Properties.
> Then on 621x/671x tab you can set the cache mode to the setting you
> want. The 4-way cache corresponds to L2 being split as 64k cache and
> 192k SRAM. Be sure that you also set the MAR bitmask to 0x0001 such
> that your external SDRAM is cacheable. That will make a big difference
> when it comes to your performance with external code/data. Note that if
> you select the 4-way cache you must MANUALLY make sure that your IRAM
> section is not bigger than 192k or else you will probably have some
> run-time failure.
>
> In terms of performance a single-precision floating point multiply ties
> up a functional unit for one cycle whereas double-precision floating
> point multiply ties up the functional unit for 4 cycles. There's also
> the obvious size differences which can affect your performance by using
> up more registers, etc.
>
> Brad
Well, converting from double to float was easy enough, but it didn't
change my results. I'm still way over budget on time. I clocked
memset(data,0,40000) at 6.6ms in ERAM. There's no way I can do any
calculations in ERAM and still fit in my time budget of 15ms.
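
To put the memset figure in perspective, here is a small worked calculation in C. It uses only the numbers quoted above (40000 bytes, 6.6 ms, a 15 ms budget); the function names are made up for illustration, and nothing here is measured on the target.

```c
#include <stdio.h>

/* Figures taken from the post: memset of 40000 bytes took 6.6 ms in ERAM,
   and the whole job has a 15 ms budget. */
#define MEMSET_BYTES 40000.0
#define MEMSET_MS    6.6
#define BUDGET_MS    15.0

/* Bytes written per microsecond at the measured rate. */
double bytes_per_us(double bytes, double ms)
{
    return bytes / (ms * 1000.0);
}

/* Nanoseconds spent per byte at the measured rate. */
double ns_per_byte(double bytes, double ms)
{
    return (ms * 1.0e6) / bytes;
}

/* Number of complete passes of pass_ms that fit inside budget_ms. */
int passes_in_budget(double budget_ms, double pass_ms)
{
    return (int)(budget_ms / pass_ms);
}

/* Hypothetical helper that prints the three derived figures. */
void print_eram_budget(void)
{
    printf("rate: %.2f bytes/us\n", bytes_per_us(MEMSET_BYTES, MEMSET_MS));
    printf("cost: %.1f ns/byte\n", ns_per_byte(MEMSET_BYTES, MEMSET_MS));
    printf("full passes in budget: %d\n",
           passes_in_budget(BUDGET_MS, MEMSET_MS));
}
```

At 165 ns per byte, only two such passes over the buffer fit in 15 ms before any arithmetic is done, which is why uncached ERAM access dominates the budget.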
Initially, when I build using IRAM only, I get an out of .far memory
link error. I reduced my array to struct struca[1000] just to see if
1/10 of the memory would fit. I didn't get a .far memory link error,
but a whole host of other ones popped up. Any idea why?
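
A quick way to sanity-check the array sizes against internal RAM is to let the compiler do the arithmetic. This is a sketch only: the struct names and helper functions are hypothetical, the 192k and 256k limits are read here as 192*1024 and 256*1024 bytes, and the check covers the array alone, not the code, constants and library sections that share IRAM with it.

```c
#include <stddef.h>

/* The record from earlier in the thread, in its double and float forms.
   The type names are made up for this comparison. */
typedef struct {
    double dtimeindex;
    double damplitude;
    double dampfrombestfitline;
    double dsdvalue;
} itmlstrec_double;

typedef struct {
    float dtimeindex;
    float damplitude;
    float dampfrombestfitline;
    float dsdvalue;
} itmlstrec_float;

/* Assumed internal RAM limits on a 6713: all 256k as SRAM,
   or 192k when 64k is given to the cache controller. */
#define IRAM_NO_CACHE_BYTES   ((size_t)256 * 1024)
#define IRAM_WITH_CACHE_BYTES ((size_t)192 * 1024)

/* Total bytes for count records of each form. */
size_t double_array_bytes(size_t count)
{
    return count * sizeof(itmlstrec_double);
}

size_t float_array_bytes(size_t count)
{
    return count * sizeof(itmlstrec_float);
}

/* Returns 1 if bytes fits within region_bytes, else 0. */
int fits_in(size_t bytes, size_t region_bytes)
{
    return bytes <= region_bytes;
}
```

With 10,000 records, the double form needs 320,000 bytes and does not fit even with caching off, while the float form needs 160,000 bytes and fits under the 192k limit. Fitting on paper does not guarantee a clean link, since everything else placed in IRAM competes for the same space.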
I see in the TCF that a whole lot of stuff is loaded into IRAM (printf,
const, string functions). Can I realistically move them to ERAM, since
all I care about is the number crunching? And will I really move from
millisecond runtimes to microsecond timeframes by moving all data
to IRAM instead of ERAM?
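
For reference, a minimal sketch of steering just the hot array into its own section from C, using the DATA_SECTION pragma mentioned earlier in the thread. In the TI C form the symbol is named first and the section second, and the pragma comes before the definition. The section name ".fastdata" is hypothetical; it still has to be mapped onto IRAM in the linker command file or the BIOS configuration, which is not shown here. The `_TMS320C6X` guard is only there so the file also compiles on a host compiler.

```c
#include <stddef.h>

/* Float version of the record from the thread. */
typedef struct {
    float dtimeindex;
    float damplitude;
    float dampfrombestfitline;
    float dsdvalue;
} itmlstrec_type;

#define ITEM_COUNT 1000

/* On the TI C6000 compiler, place itemlist in a named section.
   ".fastdata" is an assumed name, not a standard section. */
#if defined(_TMS320C6X)
#pragma DATA_SECTION(itemlist, ".fastdata")
#endif
itmlstrec_type itemlist[ITEM_COUNT];

/* Size of the array in bytes, for checking against the space
   reserved for the section. */
size_t itemlist_bytes(void)
{
    return sizeof itemlist;
}
```

The idea is to keep only the number-crunching data in IRAM and leave printf, string functions and other rarely used pieces in cached ERAM, rather than moving everything one way or the other.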