Hi,

I am using the C31 DSK and Code Composer, and as I am quite new to the DSP field, I have some "newbie" questions.

How can I calculate the execution speed of the C31? What would be the execution time of the following piece of code:

    for (n=0; n <= (100); n++) { S[n]=0; }

Is there a way at all to predict the execution time? In which TI document can I find more about this topic?

As the DSK has only 2K of RAM, I'd like to know how much my C code uses. Is there a way to check this within Code Composer?

If I define an array:

    long double a[50];

how much memory will be reserved for it? What is the connection between float/double/long double and the short/single-precision/extended-precision floating-point formats mentioned in the TMS320C3x User's Guide?

regards,
Andreas
processing time and memory usage
Started by ●July 11, 2002
Reply by ●July 12, 2002
Hello Andreas,

Your first question, 'how fast is memory filling', depends on a few things. The answers can be found in the C3x User's Guide (the device guide), but as you are still learning, here are some answers that you can compare against as you go along.

External memory writes take a minimum of 2 cycles each (0 ws) and cannot be parallelized since there is only one bus! Look at the timing diagrams in the UG or data sheet. You may want to ask why reads are 1 cycle and writes are 2 cycles. Hint: writes to the bus are 'posted', which allows the CPU to continue running internally, and as long as there are no conflicts there is no speed penalty. A 2-cycle write also makes for an easy /CE-controlled write interface.

Internal writes (reads or write/read) can be parallelized, achieving 2 writes per cycle, or 4x that of external 0-ws memory. I try to point this out as much as I can since this can be a huge advantage compared to external memory!

If your loop counter is global, static, or not fully optimized as a repeat block value, it will likely be kept in memory and updated on each loop (i.e. slow).

The compiler often needs some help in setting up parallel writes. In particular, consider how the code you have shown will be looked at by the compiler. Basically, you did not provide many hints. You need to remember that the compiler is trying to minimize resources as well as cycles. In addition, some things are simply not that easy to teach the compiler. For example:

- Parallel writes only come in pairs. If the repeat count is not even, you can't expect the compiler to perform a parallel write.

Instead, consider the code shown after the table below, which does achieve parallel write code generation (using the -o3 -mr and -ou optimizations). Excluding setup, and assuming the destination(s) is on chip and there are no other code or DMA conflicts, this will take 50 cycles. However, if these conditions are not met, your mileage could vary considerably!
Program        Data           My code   Your code   Notes
                              (cycles)  (cycles)
internal       internal         50        100       Different memory spaces
internal       internal         50        100       Code/data same memory spaces, RPTS
internal       internal        100        200       Code/data same memory spaces, RPTB
external(0ws)  internal         50        100
internal       external(0ws)   200        200
external(0ws)  external(0ws)   200        200       RPTS loop
external(0ws)  external(0ws)   300        300       RPTB loop (must also fetch code)

    int main(void)
    {
        int *p1, *p2;
        int x;

        p1 = (int *) 0x809800;         /* first half of the destination   */
        p2 = (int *) 0x809800 + 50;    /* second half, 50 words further   */

        for (x = 0; x < 100/2; x++)    /* even count: two stores per pass */
        {
            *p1++ = 0;
            *p2++ = 0;
        }

        return 0;
    }

Code Size
---------
Generate a map file (an output from the linker) and open it as a text document in CC. You may be surprised how small many programs can be. Again, just like the memory fill example, knowing how to minimize things can help considerably. It's not that hard to do once you have completed a few simple programs.

long doubles
------------
'float' and 'double' are both treated by the compiler as 32-bit floats. Internally the C3x registers are 40 bits, with the bottom 8 extended-precision bits not normally being saved. Additionally, 32-bit inputs from memory are of only 32-bit precision. When 'long double' is selected, a two-word structure is created which fully saves the 40-bit registers. Your 'long double a[50]' would therefore take 100 words, against 50 words for a float or double array. This option is great for higher-precision needs such as computing coefficients, but should be avoided for high-speed DSP work. If you feel that you absolutely need the precision, it often turns out that there are ways to get that precision by rewriting your code.

Taking advantage of floating point
----------------------------------
My best example of how to take advantage of floating point is using differential compression on audio, video and other data (TI and I have a patent on this). What I found out is that floating point is inherently the same as ADPCM when constrained to log base 2, but with the additional advantage of having a huge dynamic range (fixed point does not have an exponent).
A simple differential (first derivative) at the front end and an integration at the back end is all that is needed, and you can process the data even when it's 'compressed', since the data is merely a pre-filtered floating-point data stream.

The PAR_EQ.C example that now ships with the DSK software shows just how effective this can be. It is set up to LDC compress the incoming audio data, chop off most of the mantissa bits, and then pass that data stream through 10 stages of IIR filtering. Normally this would make almost any data anomaly surface with ease, but most people can't hear the effect even when there are NO mantissa bits, so imagine what happens when you don't chop them all out.

Float/Int/Long Double Connection in hardware
--------------------------------------------
Have a look at how the floating-point extended-precision registers are set up in the register file. You should quickly notice that these 40-bit registers are accessed as integers using the lower 32 bits and as floats using the upper 32 bits. The 'fix' and 'float' assembler codes are then used to convert from one format to another.

Hope this helps

Best regards,
Keith Larson

+------------------------------------------------+
| Keith Larson                                   |
| Member Group Technical Staff                   |
| Texas Instruments Incorporated                 |
|                                                |
| 281-274-3288                                   |
|                                                |
| www.micro.ti.com/~klarson                      |
|------------------------------------------------|
| TMS320C3x/C4x/VC33 Applications                |
|                                                |
| TMS320VC33                                     |
| The lowest cost and lowest power 500 uw/mflop  |
| floating point DSP on the planet!              |
+------------------------------------------------+