speeding up my runtime on a c6713.

Started by jleslie48 April 28, 2007
I'm running way too slow, but I know I've done a few things
inefficiently in my C program.

1) I ran out of IRAM memory, so I moved all my variables to ERAM.
What did this cost me?
1A) How can I pick and choose where in memory my C++ variables
reside?

2) Instead of using 'float variablea;' I used 'double variablea;'.
What did this cost me, and what can I expect from changing all my
variables to float (32-bit vs. 64-bit)?

3) How else can I affect the runtime of my program? I see there is a
clock properties setting, and I know that removing all my fprintf's
saves some 20% of the runtime. What about switching from debug to
release mode, or something else I haven't considered?

Are you writing a program that utilizes the TI DSP BIOS, or just runs
straight on the processor with no BIOS?

Are you using all 256k of DSP ram, or have you reserved up to 64k of
it for cache, and possibly enabled the cache controller?

How many of your variables are automatic / stack variables, vs how
many are global?

I believe the pragma before any variables you want to label for a
different section is #pragma DATA_SECTION("sectionname") but I don't
have my compiler in front of me so I can't be sure. If that's correct,
then it goes along with #pragma CODE_SECTION("othersectionname").
Take note that the C++ form does not name the variable in the pragma,
while the C form does.
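
For plain C, the pragma takes the symbol name first and the section name second. Here is a minimal sketch of that form, not a drop-in file: the record type, the array names, and the section names `.ext_data` and `.fast_data` are made up for illustration, and each section still has to be mapped to a memory range in the linker command file (or the BIOS configuration) before it has any effect.

```c
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
typedef struct {
    float time_index;
    float amplitude;
} sample_rec;

/* The TI compiler defines __TI_COMPILER_VERSION__; other compilers skip
   the pragmas, so the file still builds on a PC for testing. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(big_table, ".ext_data")   /* C form: symbol, then section */
#endif
sample_rec big_table[1000];

#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(hot_buffer, ".fast_data") /* hypothetical section name */
#endif
float hot_buffer[256];

/* Total bytes occupied by the two arrays placed above. */
size_t placed_bytes(void)
{
    return sizeof(big_table) + sizeof(hot_buffer);
}
```

The idea is to leave the large, rarely touched table in external memory and keep the small, frequently used buffer in internal RAM.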

I'm using the TI DSP with BIOS.

"Are you using all 256k of DSP ram, or have you reserved up to 64k of
it for cache, and possibly enabled the cache controller?"

I don't know about the 256k of DSP RAM or the 64k reserved for cache. How
would I check, and for that matter, what are the implications? I'm
new to DSP programming, and I've gotten as far as getting my routines
and algorithms to run, but now I need to optimize and I'm not sure how
to proceed.

"How many of your variables are automatic / stack variables, vs how
many are global?"

I was using mostly global static variables, specifically an array of
10,000 of a structure consisting of several double values:

    typedef struct {
        double dtimeindex;
        double damplitude;
        double dampfrombestfitline;
        double dsdvalue;
    } itmlstrec_type;

    itmlstrec_type itemlist [10000];

This is of course over the top, but in the PC world of 2 GB memory
machines, it's out of sight, out of mind. Now that I'm dealing with a real
machine, I have to remember my roots and build and design clean. The
above structure is clearly full of pork and needs to be trimmed: #1)
float will cut the size in half, and #2) as I understand it, the C6713
chip has floating-point math that's fast, but only for the 32-bit
version; with 64-bit floating-point values, I'm on the
slow boat to China...

"#pragma CODE_SECTION("othersectionname").
Take note that the C++ does not declare the variable in the pragma,
while the C version does."

Pragma is fine with me. I've seen it used before, but I've personally
never had the need to use it. I'm programming in straight C.

I've spent the last two years porting code to a DSP from a Windows
environment, so I have gone through many of the problems you are facing
now.

I'm not using the BIOS environment, so my build environment may be
significantly different. I'm used to using a linker
command file with both a SECTIONS and a MEMORY chunk in it, which map the
memory on my board and then map the symbols into various memory
chunks. If you are using the BIOS, I'm not sure whether those items are
configured graphically in the environment instead of in a text cmd
file.

I'm manually including the CSL (Chip Support Library) headers and
linking to the CSL library. My memory map uses only 192k of internal
RAM, and then I explicitly call the CSL function to enable caching of my
external RAM using the other 64k of internal RAM. (I'm using the 6713
DSP, which has 256k of internal L2 RAM, up to 64k of which can be used
by the cache controller.)

Your simple structure above takes up 32 bytes, so an array of 10,000
takes up 320k, or 0x4E200 bytes. That means it won't fit in
internal memory at all on a 6713. Converting to floats instead of
doubles would drop you to 160k, which would at least fit in internal
RAM and leave you at least 32k for other code or variables (or 96k
if you are not using caching).

On a completely different subject, I think you started asking
questions on a DSP mailing list that I follow and were flamed by one
person for asking uninformed questions. I'd recommend sticking with
that list and just fine-tuning your questions a bit more, with as much
supporting evidence as possible. For me, reading that list is a much
more common happening than news feeds. Usually the worst that happens
is to be completely ignored.

Wim.
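
The size arithmetic above can be checked with a few lines of C. This is only a sketch: the type names `rec_double` and `rec_float` and the helper functions are invented for illustration, it assumes 8-byte doubles and 4-byte floats, and it ignores everything else (code, stack, BIOS objects) that also has to share internal RAM.

```c
#include <stddef.h>

#define ITEM_COUNT      10000u
#define L2_TOTAL_BYTES  (256u * 1024u)  /* internal L2 RAM on the 6713 */
#define L2_CACHE_BYTES  (64u * 1024u)   /* portion that can be given to cache */

/* The record from the thread, with doubles. */
typedef struct {
    double dtimeindex;
    double damplitude;
    double dampfrombestfitline;
    double dsdvalue;
} rec_double;

/* The same record converted to floats. */
typedef struct {
    float dtimeindex;
    float damplitude;
    float dampfrombestfitline;
    float dsdvalue;
} rec_float;

/* Bytes needed for ITEM_COUNT records of the given size. */
size_t table_bytes(size_t record_size)
{
    return record_size * ITEM_COUNT;
}

/* Nonzero if the table alone fits in internal RAM, with or without
   64k of it reserved for cache. */
int fits_in_internal_ram(size_t bytes, int cache_reserved)
{
    size_t available = L2_TOTAL_BYTES;
    if (cache_reserved) {
        available -= L2_CACHE_BYTES;
    }
    return bytes <= available;
}
```

With doubles, `table_bytes(sizeof(rec_double))` comes to 320,000 bytes, which fails the check either way. With floats it comes to 160,000 bytes, which passes even with the cache reserved.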

Wim,

Thanks very much for your analysis. Monday morning I'll see how well
I run in float mode, whether I fit back into IRAM memory space,
and most importantly whether my runtime changes significantly. One more
follow-up: what exactly is the caching setting, and where do I change
it?

Jonathan

In the BIOS tcf GUI, right-click on Global Settings and go to Properties.
Then on the 621x/671x tab you can set the cache mode to the setting you
want. The 4-way cache corresponds to L2 being split as 64k cache and
192k SRAM. Be sure that you also set the MAR bitmask to 0x0001 so
that your external SDRAM is cacheable. That will make a big difference
to your performance with external code/data. Note that if
you select the 4-way cache, you must MANUALLY make sure that your IRAM
section is not bigger than 192k, or else you will probably have some
run-time failure.

In terms of performance, a single-precision floating-point multiply ties
up a functional unit for one cycle, whereas a double-precision floating-point
multiply ties up the functional unit for 4 cycles. There are also
the obvious size differences, which can affect your performance by using
up more registers, etc.

Brad

Well, converting from double to float was easy enough, but it didn't
change my results. I'm still way over budget on time. I clocked
memset(data,0,40000) at 6.6 ms in ERAM. There's no way I can do any
calculations in ERAM and still fit in my time budget of 15 ms.

Initially, when I build using IRAM only, I get an out of .far memory
link error. I reduced my array to struct struca[1000] just to see if
1/10 of the memory would fit. I didn't get a .far memory link error,
but a whole host of other ones popped up. Any idea why?

I see in the TCF that a whole lot of stuff is loaded into IRAM (printf,
const, string functions). Can I realistically move them to ERAM, since
all I care about is the number crunching? And will I really move from
millisecond runtimes to microsecond timeframes by moving all data
to IRAM vs. ERAM?

"jleslie48" <jon@jonathanleslie.com> wrote in message
news:1177977325.831800.118960@o5g2000hsb.googlegroups.com...

> well, converting from double to float was easy enough, but it didn't
> change my results. I'm still way overbudget on time. I clocked
> memset(data,0,40000) at 6.6ms in ERAM. There's no way I can do any
> calculations in ERAM and still fit in my time budget of 15ms.
> Initially when I build using IRAM only, I get an out of .far memory
> link error. I reduced my array to struct struca[1000] just to see if
> 1/10 of the memory would fit. I didn't get a .far memory link error,
> but a whole host of other ones popped up. Any Idea why?

Trying to fit too much in?

> I see in the TCF a whole lot of stuff is loaded into IRAM, (printf,
> const, string functions) can I realistically move them to ERAM since
> all I care about is the number crunching, and will I really move from
> milli Second runtimes to micro Second timeframes by moving all data
> to IRAM vs ERAM?

Do you really need printf?
Are you using printf or log printf?

Move the least used variables to ERAM, not all of them.
Move the string functions first.

Look at your linker file and modify the SECTIONS one at a time and test.

Moving to IRAM, you should get a speed-up.

What compiler settings are you using, and have you tested with both debug and
release?

Have you selected "speed most critical"? See the compiler build options tab.

Are you using compiler intrinsics where available?
Have you removed or turned off the unnecessary BIOS functions,
like RTDX etc.?

Keep your operations simple so the compiler can optimise.
Sometimes it can help to add intermediate operations so the compiler
can split things onto the different data paths.

Alex
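
One common way to read the last suggestion is to break a single running total into independent partial sums, so that consecutive iterations don't all wait on the same accumulator. The function below is a sketch of that idea only: `sum_products` is an invented name, not code from this thread, and whether it actually helps on the C6713 is an assumption to check with a timed build. Note that reordering floating-point additions can change the last bits of the result.

```c
#include <stddef.h>

/* Sum of a[i] * b[i], using two independent accumulators so the
   compiler is free to schedule the two multiply-add chains in parallel. */
float sum_products(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f;   /* even-indexed terms */
    float acc1 = 0.0f;   /* odd-indexed terms */
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }
    if (i < n) {         /* leftover element when n is odd */
        acc0 += a[i] * b[i];
    }
    return acc0 + acc1;
}
```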

Alex,

Thanks for your suggestions. All those parameters will be tested as
time goes on, but I just can't help thinking that I'm overlooking
something incredibly obvious; I am off from my expectations by a
factor of 100. For example, I did some time trials with data in ERAM,
debug mode:

    for (current_sample_index = 0; current_sample_index < 4; current_sample_index++) {
        memset (section_value, 0, sizeof(section_value));
        // sizeof section_value == 10*7*4 == 280 bytes
        memset (pulsemarker, 0, sizeof(pulsemarker));
        // sizeof pulsemarker == 10000*5*4 == 200,000 bytes
    }

This routine took 6.6 milliseconds. Is that consistent with ERAM memory
on the C6713?
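
To put that measurement in comparable units, the figures in the post can be turned into bytes per second. This is just the arithmetic, using the sizes and the 6.6 ms quoted above; the helper names are invented for illustration. Whether the result is reasonable for a given board depends on the EMIF clock, the SDRAM, and the cache settings, none of which are stated here.

```c
/* Figures quoted in the post. */
#define PASSES             4u
#define SECTION_BYTES      280u
#define PULSEMARKER_BYTES  200000u

/* Total bytes written by the four passes of the two memset calls. */
unsigned long bytes_cleared(void)
{
    return (unsigned long)PASSES * (SECTION_BYTES + PULSEMARKER_BYTES);
}

/* Average time per byte, in nanoseconds. */
double ns_per_byte(unsigned long bytes, double elapsed_ms)
{
    return elapsed_ms * 1.0e6 / (double)bytes;
}

/* Average throughput, in millions of bytes per second. */
double mbytes_per_second(unsigned long bytes, double elapsed_ms)
{
    return ((double)bytes / 1.0e6) / (elapsed_ms / 1.0e3);
}
```

The loop writes 801,120 bytes in total, so 6.6 ms works out to a little over 8 ns per byte, or roughly 120 million bytes per second.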